Sample records for optimized regression models

  1. Multidisciplinary Design Optimization for Aeropropulsion Engines and Solid Modeling/Animation via the Integrated Force Method

    NASA Technical Reports Server (NTRS)

    2004-01-01

    The grant closure report is organized into chapters as follows. The first chapter describes the two research areas: design optimization and solid mechanics. Ten journal publications are listed in the second chapter. Five highlights are the subject matter of chapter three. CHAPTER 1. The Design Optimization Test Bed CometBoards. CHAPTER 2. Solid Mechanics: Integrated Force Method of Analysis. CHAPTER 3. Five Highlights: Neural Network and Regression Methods Demonstrated in the Design Optimization of a Subsonic Aircraft; Neural Network and Regression Soft Model Extended for the PX-300 Aircraft Engine; Engine with Regression and Neural Network Approximators Designed; Cascade Optimization Strategy with Neural Network and Regression Approximations Demonstrated on a Preliminary Aircraft Engine Design; Neural Network and Regression Approximations Used in Aircraft Design.

  2. Optimizing separate phase light hydrocarbon recovery from contaminated unconfined aquifers

    NASA Astrophysics Data System (ADS)

    Cooper, Grant S.; Peralta, Richard C.; Kaluarachchi, Jagath J.

    A modeling approach is presented that optimizes separate phase recovery of light non-aqueous phase liquids (LNAPL) for a single dual-extraction well in a homogeneous, isotropic unconfined aquifer. A simulation/regression/optimization (S/R/O) model is developed to predict, analyze, and optimize the oil recovery process. The approach combines detailed simulation, nonlinear regression, and optimization. The S/R/O model utilizes nonlinear regression equations describing system response to time-varying water pumping and oil skimming. Regression equations are developed for residual oil volume and free oil volume. The S/R/O model determines optimized time-varying (stepwise) pumping rates which minimize residual oil volume and maximize free oil recovery while causing free oil volume to decrease a specified amount. This S/R/O modeling approach implicitly immobilizes the free product plume by reversing the water table gradient while achieving containment. Application to a simple representative problem illustrates the S/R/O model's utility for problem analysis and remediation design. When compared with the best steady pumping strategies, the optimal stepwise pumping strategy improves free oil recovery by 11.5% and reduces the amount of residual oil left in the system due to pumping by 15%. The S/R/O model approach offers promise for enhancing the design of free phase LNAPL recovery systems and for helping hydrogeologists, engineers, and regulators make cost-effective operation and management decisions.

  3. SCI model structure determination program (OSR) user's guide. [Optimal Subset Regression]

    NASA Technical Reports Server (NTRS)

    1979-01-01

    The computer program OSR (Optimal Subset Regression), which estimates models for rotorcraft body and rotor force and moment coefficients, is described. The technique used is based on the subset regression algorithm. Given time histories of aerodynamic coefficients, aerodynamic variables, and control inputs, the program computes correlations between the various time histories. The model structure determination is based on these correlations. Inputs and outputs of the program are given.
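The subset-regression idea behind programs like OSR can be sketched with a small exhaustive search. This is not the OSR program itself, only a generic illustration: candidate regressor subsets are fitted by ordinary least squares and ranked by an information criterion (AIC is assumed here; OSR's actual correlation-based structure determination differs).

```python
from itertools import combinations
import numpy as np

def best_subset(X, y, max_terms=3):
    """Exhaustive subset regression: fit OLS on every regressor subset
    up to max_terms and keep the subset with the lowest AIC."""
    n, p = X.shape
    best = (np.inf, None, None)
    for k in range(1, max_terms + 1):
        for cols in combinations(range(p), k):
            A = np.column_stack([np.ones(n), X[:, cols]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = np.sum((y - A @ coef) ** 2)
            aic = n * np.log(rss / n) + 2 * (k + 1)  # Gaussian AIC up to a constant
            if aic < best[0]:
                best = (aic, cols, coef)
    return best

# Synthetic demo: y depends only on columns 0 and 2 of X
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] - 3.0 * X[:, 2] + 0.1 * rng.normal(size=200)
aic, cols, coef = best_subset(X, y)
print(sorted(cols))  # the true regressors 0 and 2 should be included
```

With a strong signal-to-noise ratio, the selected subset contains the true regressors; the AIC penalty discourages (but does not strictly forbid) spurious extra terms.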

  4. Analysis of Sting Balance Calibration Data Using Optimized Regression Models

    NASA Technical Reports Server (NTRS)

    Ulbrich, N.; Bader, Jon B.

    2010-01-01

    Calibration data of a wind tunnel sting balance were processed using a candidate math model search algorithm that recommends an optimized regression model for the data analysis. During the calibration the normal force and the moment at the balance moment center were selected as independent calibration variables. The sting balance itself had two moment gages. Therefore, after analyzing the connection between calibration loads and gage outputs, it was decided to choose the difference and the sum of the gage outputs as the two responses that best describe the behavior of the balance. The math model search algorithm was applied to these two responses. An optimized regression model was obtained for each response. Classical strain gage balance load transformations and the equations of the deflection of a cantilever beam under load are used to show that the search algorithm's two optimized regression models are supported by a theoretical analysis of the relationship between the applied calibration loads and the measured gage outputs. The analysis of the sting balance calibration data set is a rare example of a situation when terms of a regression model of a balance can directly be derived from first principles of physics. In addition, it is interesting to note that the search algorithm recommended the correct regression model term combinations using only a set of statistical quality metrics that were applied to the experimental data during the algorithm's term selection process.

  5. Fatigue design of a cellular phone folder using regression model-based multi-objective optimization

    NASA Astrophysics Data System (ADS)

    Kim, Young Gyun; Lee, Jongsoo

    2016-08-01

    In a folding cellular phone, the folding device is repeatedly opened and closed by the user, which eventually results in fatigue damage, particularly to the front of the folder. Hence, it is important to improve the safety and endurance of the folder while also reducing its weight. This article presents an optimal design for the folder front that maximizes its fatigue endurance while minimizing its thickness. Design data for analysis and optimization were obtained experimentally using a test jig. Multi-objective optimization was carried out using a nonlinear regression model. Three regression methods were employed: back-propagation neural networks, logistic regression and support vector machines. The AdaBoost ensemble technique was also used to improve the approximation. Two-objective Pareto-optimal solutions were identified using the non-dominated sorting genetic algorithm (NSGA-II). Finally, a numerically optimized solution was validated against experimental product data, in terms of both fatigue endurance and thickness index.
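The Pareto step at the heart of NSGA-II can be illustrated in isolation. The sketch below extracts the non-dominated set for a two-objective minimization (think thickness and negated fatigue endurance); the sample points are made up for illustration, and the full NSGA-II machinery (crowding distance, genetic operators) is omitted.

```python
import numpy as np

def pareto_front(objectives):
    """Return indices of non-dominated points when every column of
    `objectives` (n_points x n_objectives) is to be minimized."""
    obj = np.asarray(objectives, dtype=float)
    n = len(obj)
    keep = []
    for i in range(n):
        dominated = False
        for j in range(n):
            # j dominates i if it is no worse in all objectives
            # and strictly better in at least one
            if j != i and np.all(obj[j] <= obj[i]) and np.any(obj[j] < obj[i]):
                dominated = True
                break
        if not dominated:
            keep.append(i)
    return keep

# Two objectives, e.g. (thickness index, negated endurance index)
pts = [(1.0, 5.0), (2.0, 3.0), (3.0, 1.0), (3.0, 4.0), (2.5, 2.5)]
print(pareto_front(pts))  # → [0, 1, 2, 4]
```

Point 3 is dominated by point 1 (worse in both objectives); the remaining four points form the Pareto front among which a designer trades off the two goals.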

  6. ℓ(p)-Norm multikernel learning approach for stock market price forecasting.

    PubMed

    Shao, Xigao; Wu, Kun; Liao, Bifeng

    2012-01-01

    A linear multiple kernel learning model has been used for predicting financial time series. However, ℓ(1)-norm multiple support vector regression is rarely observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtures that generalize well, we adopt ℓ(p)-norm multiple kernel support vector regression (1 ≤ p < ∞) as a stock price prediction model. The optimization problem is decomposed into smaller subproblems, and an interleaved optimization strategy is employed to solve the regression model. The model is evaluated on forecasting the daily closing prices of the Shanghai Stock Index in China. Experimental results show that our proposed model performs better than the ℓ(1)-norm multiple support vector regression model.
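A heavily simplified sketch of the ℓ(p)-norm kernel-mixture idea: several RBF kernels are combined with weights normalized so that ||μ||_p = 1, and the mixture is used in kernel ridge regression as a stand-in for support vector regression. The interleaved optimization of the weights is omitted (they are held uniform), and all bandwidths and constants below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def rbf(X, Z, gamma):
    """Gaussian (RBF) kernel matrix between row sets X and Z."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def lp_normalize(mu, p=2.0):
    """Scale kernel weights so that ||mu||_p = 1."""
    mu = np.asarray(mu, dtype=float)
    return mu / (np.sum(mu ** p) ** (1.0 / p))

def fit_predict(X, y, X_new, gammas=(0.1, 1.0, 10.0), p=2.0, lam=1e-3):
    # Uniform, lp-normalized mixture weights (no interleaved weight update)
    mu = lp_normalize(np.ones(len(gammas)), p)
    K = sum(m * rbf(X, X, g) for m, g in zip(mu, gammas))
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)  # kernel ridge, not SVR
    K_new = sum(m * rbf(X_new, X, g) for m, g in zip(mu, gammas))
    return K_new @ alpha

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(60, 1))
y = np.sin(3 * X[:, 0])           # toy target standing in for a price series
pred = fit_predict(X, y, X)
print(float(np.max(np.abs(pred - y))))
```

A full MKL solver would alternate between solving for α with the mixture fixed and updating μ under the ℓ(p) constraint, which is the "interleaved" strategy the abstract refers to.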

  7. A computer tool for a minimax criterion in binary response and heteroscedastic simple linear regression models.

    PubMed

    Casero-Alonso, V; López-Fidalgo, J; Torsney, B

    2017-01-01

    Binary response models are used in many real applications. For these models the Fisher information matrix (FIM) is proportional to the FIM of a weighted simple linear regression model. The same is also true when the weight function has a finite integral. Thus, optimal designs for one binary model are also optimal for the corresponding weighted linear regression model. The main objective of this paper is to provide a tool for the construction of MV-optimal designs, minimizing the maximum of the variances of the estimates, for a general design space. MV-optimality is a potentially difficult criterion because of its nondifferentiability at equal variance designs. A methodology for obtaining MV-optimal designs where the design space is a compact interval [a, b] will be given for several standard weight functions. The methodology will allow us to build a user-friendly computer tool based on Mathematica to compute MV-optimal designs. Some illustrative examples will show a representation of MV-optimal designs in the Euclidean plane, taking a and b as the axes. The applet will be explained using two relevant models. In the first one the case of a weighted linear regression model is considered, where the weight function is directly chosen from a typical family. In the second example a binary response model is assumed, where the probability of the outcome is given by a typical probability distribution. Practitioners can use the provided applet to identify the solution and to know the exact support points and design weights.

  8. Optimization of Regression Models of Experimental Data Using Confirmation Points

    NASA Technical Reports Server (NTRS)

    Ulbrich, N.

    2010-01-01

    A new search metric is discussed that may be used to better assess the predictive capability of different math term combinations during the optimization of a regression model of experimental data. The new search metric can be determined for each tested math term combination if the given experimental data set is split into two subsets. The first subset consists of data points that are only used to determine the coefficients of the regression model. The second subset consists of confirmation points that are exclusively used to test the regression model. The new search metric value is assigned after comparing two values that describe the quality of the fit of each subset. The first value is the standard deviation of the PRESS residuals of the data points. The second value is the standard deviation of the response residuals of the confirmation points. The greater of the two values is used as the new search metric value. This choice guarantees that both standard deviations are always less than or equal to the value that is used during the optimization. Experimental data from the calibration of a wind tunnel strain-gage balance is used to illustrate the application of the new search metric. The new search metric ultimately generates an optimized regression model that has already been tested at regression-model-independent confirmation points before it is ever used to predict an unknown response from a set of regressors.
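For a linear model, the two standard deviations that feed this search metric can be computed directly: the PRESS (leave-one-out) residuals follow from the hat-matrix identity e_i/(1 - h_ii), and the confirmation residuals come from the withheld points. A minimal sketch is below; the real algorithm searches over many term combinations, whereas a single straight-line model and synthetic data are assumed here.

```python
import numpy as np

def search_metric(x_fit, y_fit, x_conf, y_conf):
    """Greater of (a) the standard deviation of the PRESS residuals of the
    fitting points and (b) that of the confirmation-point residuals."""
    A = np.column_stack([np.ones(len(x_fit)), x_fit])
    coef, *_ = np.linalg.lstsq(A, y_fit, rcond=None)
    h = np.diag(A @ np.linalg.pinv(A))          # leverages h_ii of the hat matrix
    press = (y_fit - A @ coef) / (1.0 - h)      # leave-one-out (PRESS) residuals
    A_conf = np.column_stack([np.ones(len(x_conf)), x_conf])
    e_conf = y_conf - A_conf @ coef
    return max(float(np.std(press, ddof=1)), float(np.std(e_conf, ddof=1)))

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=40)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, size=40)   # noise std 0.5
metric = search_metric(x[:30], y[:30], x[30:], y[30:])
print(metric)
```

Taking the larger of the two values penalizes a term combination that fits its own data well but generalizes poorly to the confirmation points.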

  9. ℓp-Norm Multikernel Learning Approach for Stock Market Price Forecasting

    PubMed Central

    Shao, Xigao; Wu, Kun; Liao, Bifeng

    2012-01-01

    A linear multiple kernel learning model has been used for predicting financial time series. However, ℓ1-norm multiple support vector regression is rarely observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtures that generalize well, we adopt ℓp-norm multiple kernel support vector regression (1 ≤ p < ∞) as a stock price prediction model. The optimization problem is decomposed into smaller subproblems, and an interleaved optimization strategy is employed to solve the regression model. The model is evaluated on forecasting the daily closing prices of the Shanghai Stock Index in China. Experimental results show that our proposed model performs better than the ℓ1-norm multiple support vector regression model. PMID:23365561

  10. Real estate value prediction using multivariate regression models

    NASA Astrophysics Data System (ADS)

    Manjula, R.; Jain, Shubham; Srivastava, Sharad; Rajiv Kher, Pranav

    2017-11-01

    The real estate market is one of the most competitive in terms of pricing, and prices tend to vary significantly with a large number of factors, making it a prime field for applying machine learning to predict prices with high accuracy. In this paper, we therefore present important features to use when predicting housing prices accurately. We describe regression models that use various features to achieve a lower residual sum of squares error. When using features in a regression model, some feature engineering is required for better prediction. Often a set of features (multiple regression) or polynomial regression (raising the features to various powers) is used to improve the model fit. Because these models are susceptible to overfitting, ridge regression is used to reduce it. This paper thus points to the best application of regression models, in addition to other techniques, to optimize the result.
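The ridge-regularized polynomial fit described above has a closed form. A minimal sketch follows; the degree, penalty, and the toy "price vs. floor area" curve are illustrative assumptions, not the paper's data.

```python
import numpy as np

def poly_ridge_fit(x, y, degree=3, lam=1.0):
    """Polynomial ridge regression via the closed form
    (X^T X + lam * P)^-1 X^T y, leaving the intercept unpenalized."""
    X = np.vander(x, degree + 1, increasing=True)   # columns [1, x, x^2, ...]
    P = lam * np.eye(degree + 1)
    P[0, 0] = 0.0                                   # do not shrink the intercept
    return np.linalg.solve(X.T @ X + P, X.T @ y)

def poly_predict(coef, x):
    return np.vander(x, len(coef), increasing=True) @ coef

# Toy price curve with noise (standing in for engineered housing features)
rng = np.random.default_rng(3)
x = rng.uniform(0, 2, size=100)
y = 50 + 30 * x + 5 * x**2 + rng.normal(0, 2, size=100)
coef = poly_ridge_fit(x, y, degree=3, lam=0.1)
rss = np.sum((y - poly_predict(coef, x)) ** 2)
print(rss / len(y))
```

The penalty `lam` trades a little bias for a large reduction in coefficient variance, which is exactly the overfitting control the abstract attributes to ridge regression.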

  11. Application of nonlinear least-squares regression to ground-water flow modeling, west-central Florida

    USGS Publications Warehouse

    Yobbi, D.K.

    2000-01-01

    A nonlinear least-squares regression technique for estimation of ground-water flow model parameters was applied to an existing model of the regional aquifer system underlying west-central Florida. The regression technique minimizes the differences between measured and simulated water levels. Regression statistics, including parameter sensitivities and correlations, were calculated for reported parameter values in the existing model. Optimal parameter values for selected hydrologic variables of interest are estimated by nonlinear regression. Optimal estimates of parameter values range from about 140 times greater than to about 0.01 times the reported values. Independently estimating all parameters by nonlinear regression was impossible, given the existing zonation structure and number of observations, because of parameter insensitivity and correlation. Although the model yields parameter values similar to those estimated by other methods and reproduces the measured water levels reasonably accurately, a simpler parameter structure should be considered. Some possible ways of improving model calibration are to: (1) modify the defined parameter-zonation structure by omitting and/or combining parameters to be estimated; (2) carefully eliminate observation data based on evidence that they are likely to be biased; (3) collect additional water-level data; (4) assign values to insensitive parameters; and (5) estimate the most sensitive parameters first, then, using the optimized values for these parameters, estimate the entire data set.
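Nonlinear least-squares parameter estimation of this kind is typically solved with Gauss-Newton iterations. A minimal sketch on a made-up exponential-decay model follows; the model form, parameter values, and noise level are illustrative assumptions, not the Florida flow model.

```python
import numpy as np

def gauss_newton(residual_jac, theta0, n_iter=20):
    """Minimize ||r(theta)||^2 by full Gauss-Newton steps."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        r, J = residual_jac(theta)
        theta = theta - np.linalg.solve(J.T @ J, J.T @ r)
    return theta

# Toy observation model h = a * exp(-b * x) + noise
x = np.linspace(0, 4, 40)
rng = np.random.default_rng(4)
h = 10.0 * np.exp(-0.7 * x) + rng.normal(0, 0.05, size=x.size)

def residual_jac(theta):
    a, b = theta
    e = np.exp(-b * x)
    r = a * e - h                            # residuals (simulated minus measured)
    J = np.column_stack([e, -a * x * e])     # Jacobian d r / d theta
    return r, J

est = gauss_newton(residual_jac, [8.0, 0.5])  # start near the truth (10, 0.7)
print(np.round(est, 2))
```

The columns of the Jacobian are exactly the parameter sensitivities the abstract mentions; near-collinear columns are what make some parameters jointly unidentifiable.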

  12. [Extraction Optimization of Rhizome of Curcuma longa by Response Surface Methodology and Support Vector Regression].

    PubMed

    Zhou, Pei-pei; Shan, Jin-feng; Jiang, Jian-lan

    2015-12-01

    To optimize the microwave-assisted extraction of curcuminoids from Curcuma longa. On the basis of single-factor experiments, the ethanol concentration, the ratio of liquid to solid, and the microwave time were selected for further optimization. Support Vector Regression (SVR) and Central Composite Design-Response Surface Methodology (CCD) algorithms were utilized to design and establish models respectively, while Particle Swarm Optimization (PSO) was introduced to optimize the parameters of the SVR models and to search for the optimal points of the models. The sum of curcumin, demethoxycurcumin and bisdemethoxycurcumin, determined by HPLC, was used as the evaluation indicator. The optimal parameters of microwave-assisted extraction were as follows: ethanol concentration of 69%, ratio of liquid to solid of 21 : 1, and microwave time of 55 s. Under those conditions, the sum of the three curcuminoids was 28.97 mg/g (per gram of rhizome powder). Both the CCD model and the SVR model were credible, as they predicted similar process conditions, and the deviations in yield were less than 1.2%.
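The response-surface part of such a study can be sketched as a quadratic fit whose stationary point gives the predicted optimum. Two factors are used here instead of the paper's three, and the synthetic yield surface is an assumption made for illustration.

```python
import numpy as np

def fit_quadratic_surface(X, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 + b11*x1^2 + b22*x2^2 + b12*x1*x2."""
    x1, x2 = X[:, 0], X[:, 1]
    A = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def stationary_point(coef):
    """Solve grad = 0: [[2*b11, b12], [b12, 2*b22]] @ [x1, x2] = -[b1, b2]."""
    _, b1, b2, b11, b22, b12 = coef
    H = np.array([[2 * b11, b12], [b12, 2 * b22]])
    return np.linalg.solve(H, -np.array([b1, b2]))

# Synthetic yield surface (coded factors) peaking at (0.69, 0.55)
rng = np.random.default_rng(5)
X = rng.uniform(0, 1, size=(50, 2))
y = 29 - 40 * (X[:, 0] - 0.69) ** 2 - 25 * (X[:, 1] - 0.55) ** 2 \
    + rng.normal(0, 0.1, 50)
opt = stationary_point(fit_quadratic_surface(X, y))
print(np.round(opt, 2))
```

In an actual CCD study the design points come from the central composite layout rather than random sampling, but the quadratic fit and stationary-point step are the same.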

  13. Design Sensitivity for a Subsonic Aircraft Predicted by Neural Network and Regression Models

    NASA Technical Reports Server (NTRS)

    Hopkins, Dale A.; Patnaik, Surya N.

    2005-01-01

    A preliminary methodology was obtained for the design optimization of a subsonic aircraft by coupling NASA Langley Research Center's Flight Optimization System (FLOPS) with NASA Glenn Research Center's design optimization testbed (COMETBOARDS with regression and neural network analysis approximators). The aircraft modeled can carry 200 passengers at a cruise speed of Mach 0.85 over a range of 2500 n mi and can operate on standard 6000-ft takeoff and landing runways. The design simulation was extended to evaluate the optimal airframe and engine parameters for the subsonic aircraft to operate on nonstandard runways. Regression and neural network approximators were used to examine aircraft operation on runways ranging in length from 4500 to 7500 ft.

  14. An optimal sample data usage strategy to minimize overfitting and underfitting effects in regression tree models based on remotely-sensed data

    USGS Publications Warehouse

    Gu, Yingxin; Wylie, Bruce K.; Boyte, Stephen; Picotte, Joshua J.; Howard, Danny; Smith, Kelcy; Nelson, Kurtis

    2016-01-01

    Regression tree models have been widely used for remote sensing-based ecosystem mapping. Improper use of the sample data (model training and testing data) may cause overfitting and underfitting effects in the model. The goal of this study is to develop an optimal sampling data usage strategy for any dataset and identify an appropriate number of rules in the regression tree model that will improve its accuracy and robustness. Landsat 8 data and Moderate-Resolution Imaging Spectroradiometer-scaled Normalized Difference Vegetation Index (NDVI) were used to develop regression tree models. A Python procedure was designed to generate random replications of model parameter options across a range of model development data sizes and rule number constraints. The mean absolute difference (MAD) between the predicted and actual NDVI (scaled NDVI, value from 0–200) and its variability across the different randomized replications were calculated to assess the accuracy and stability of the models. In our case study, a six-rule regression tree model developed from 80% of the sample data had the lowest MAD (MADtraining = 2.5 and MADtesting = 2.4), which was suggested as the optimal model. This study demonstrates how the training data and rule number selections impact model accuracy and provides important guidance for future remote-sensing-based ecosystem modeling.
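The train/test MAD assessment in this record can be sketched generically: hold out part of the sample, fit, and compare the mean absolute difference on both subsets. A polynomial model and a synthetic NDVI-like series are assumed below as stand-ins for the regression tree and remote-sensing data.

```python
import numpy as np

def mad(y_true, y_pred):
    """Mean absolute difference between predicted and actual values."""
    return float(np.mean(np.abs(y_true - y_pred)))

def train_test_mad(x, y, train_frac, degree):
    """Fit a polynomial on a training share of the data and report the
    MAD on the training and held-out testing subsets."""
    n = len(x)
    idx = np.random.default_rng(6).permutation(n)
    tr, te = idx[:int(train_frac * n)], idx[int(train_frac * n):]
    coef = np.polyfit(x[tr], y[tr], degree)
    return (mad(y[tr], np.polyval(coef, x[tr])),
            mad(y[te], np.polyval(coef, x[te])))

# Synthetic seasonal NDVI-like signal, scaled roughly 0-200 with noise
x = np.linspace(0, 1, 200)
y = 100 + 50 * np.sin(2 * np.pi * x) + np.random.default_rng(7).normal(0, 2, 200)
mad_tr, mad_te = train_test_mad(x, y, train_frac=0.8, degree=5)
print(mad_tr, mad_te)
```

An overfit model shows a training MAD far below the testing MAD; scanning `train_frac` and model complexity (here, degree; in the paper, the rule count) for a small and stable gap is the essence of the proposed strategy.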

  15. Bayesian Regression with Network Prior: Optimal Bayesian Filtering Perspective

    PubMed Central

    Qian, Xiaoning; Dougherty, Edward R.

    2017-01-01

    The recently introduced intrinsically Bayesian robust filter (IBRF) provides fully optimal filtering relative to a prior distribution over an uncertainty class of joint random process models, whereas formerly the theory was limited to model-constrained Bayesian robust filters, for which optimization was limited to the filters that are optimal for models in the uncertainty class. This paper extends the IBRF theory to the situation where there are both a prior on the uncertainty class and sample data. The result is optimal Bayesian filtering (OBF), where optimality is relative to the posterior distribution derived from the prior and the data. The IBRF theories for effective characteristics and canonical expansions extend to the OBF setting. A salient focus of the present work is to demonstrate the advantages of Bayesian regression within the OBF setting over the classical Bayesian approach in the context of linear Gaussian models. PMID:28824268

  16. OPC modeling by genetic algorithm

    NASA Astrophysics Data System (ADS)

    Huang, W. C.; Lai, C. M.; Luo, B.; Tsai, C. K.; Tsay, C. S.; Lai, C. W.; Kuo, C. C.; Liu, R. G.; Lin, H. T.; Lin, B. J.

    2005-05-01

    Optical proximity correction (OPC) is usually used to pre-distort mask layouts to make the printed patterns as close to the desired shapes as possible. For model-based OPC, a lithographic model to predict critical dimensions after lithographic processing is needed. The model is usually obtained via a regression of parameters based on experimental data containing optical proximity effects. When the parameters involve a mix of the continuous (optical and resist models) and the discrete (kernel numbers) sets, the traditional numerical optimization method may have difficulty handling model fitting. In this study, an artificial-intelligence optimization method was used to regress the parameters of the lithographic models for OPC. The implemented phenomenological models were constant-threshold models that combine diffused aerial image models with loading effects. Optical kernels decomposed from Hopkins' equation were used to calculate aerial images on the wafer. Similarly, the numbers of optical kernels were treated as regression parameters. This way, good regression results were obtained with different sets of optical proximity effect data.
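A toy version of genetic-algorithm regression: a minimal real-coded GA (tournament selection, blend crossover, Gaussian mutation, one-elite survival) fits the parameters of a two-parameter stand-in model. The lithographic model itself is not reproduced; everything below, including the bounds and mutation scale, is an illustrative assumption.

```python
import numpy as np

def ga_minimize(loss, bounds, pop_size=40, n_gen=60, seed=8):
    """Minimal real-coded genetic algorithm with tournament selection,
    blend crossover, Gaussian mutation, and one-elite survival."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    pop = rng.uniform(lo, hi, size=(pop_size, len(lo)))

    def tournament(fit):
        i, j = rng.integers(pop_size, size=2)
        return pop[i] if fit[i] < fit[j] else pop[j]

    for _ in range(n_gen):
        fit = np.array([loss(p) for p in pop])
        new = [pop[np.argmin(fit)]]                           # keep the elite
        while len(new) < pop_size:
            child = 0.5 * (tournament(fit) + tournament(fit))  # blend crossover
            child += rng.normal(0, 0.05, size=len(lo))         # mutation
            new.append(np.clip(child, lo, hi))
        pop = np.array(new)
    fit = np.array([loss(p) for p in pop])
    return pop[np.argmin(fit)]

# Regress the parameters of a toy model y = p0 + p1 * x^2 against a target
x = np.linspace(0, 1, 50)
target = 0.3 + 0.5 * x**2
best = ga_minimize(lambda p: float(np.sum((p[0] + p[1] * x**2 - target) ** 2)),
                   bounds=[(0, 1), (0, 1)])
print(np.round(best, 2))
```

Because the GA only evaluates the loss, it handles the mixed continuous/discrete parameter sets the abstract mentions simply by encoding the discrete choices (e.g., kernel counts) as rounded genes.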

  17. A Subsonic Aircraft Design Optimization With Neural Network and Regression Approximators

    NASA Technical Reports Server (NTRS)

    Patnaik, Surya N.; Coroneos, Rula M.; Guptill, James D.; Hopkins, Dale A.; Haller, William J.

    2004-01-01

    The Flight-Optimization-System (FLOPS) code encountered difficulty in analyzing a subsonic aircraft. The limitation made the design optimization problematic. The deficiencies have been alleviated through use of neural network and regression approximations. The insight gained from using the approximators is discussed in this paper. The FLOPS code is reviewed. Analysis models are developed and validated for each approximator. The regression method appears to hug the data points, while the neural network approximation follows a mean path. For an analysis cycle, the approximate model required milliseconds of central processing unit (CPU) time versus seconds by the FLOPS code. Performance of the approximators was satisfactory for aircraft analysis. A design optimization capability has been created by coupling the derived analyzers to the optimization test bed CometBoards. The approximators were efficient reanalysis tools in the aircraft design optimization. Instability encountered in the FLOPS analyzer was eliminated. The convergence characteristics were improved for the design optimization. The CPU time required to calculate the optimum solution, measured in hours with the FLOPS code, was reduced to minutes with the neural network approximation and to seconds with the regression method. Generation of the approximators required the manipulation of a very large quantity of data. Design sensitivity with respect to the bounds of aircraft constraints is easily generated.

  18. Neural Network and Regression Methods Demonstrated in the Design Optimization of a Subsonic Aircraft

    NASA Technical Reports Server (NTRS)

    Hopkins, Dale A.; Lavelle, Thomas M.; Patnaik, Surya

    2003-01-01

    The neural network and regression methods of NASA Glenn Research Center's COMETBOARDS design optimization testbed were used to generate approximate analysis and design models for a subsonic aircraft operating at Mach 0.85 cruise speed. The analytical model is defined by nine design variables: wing aspect ratio, engine thrust, wing area, sweep angle, chord-thickness ratio, turbine temperature, pressure ratio, bypass ratio, and fan pressure; and eight response parameters: weight, landing velocity, takeoff and landing field lengths, approach thrust, overall efficiency, and compressor pressure and temperature. The variables were adjusted to optimally balance the engines to the airframe. The solution strategy included a sensitivity model and the soft analysis model. Researchers generated the sensitivity model by training the approximators to predict an optimum design. The trained neural network predicted all response variables within 5-percent error. This was reduced to 1 percent by the regression method. The soft analysis model was developed to replace aircraft analysis as the reanalyzer in design optimization. Soft models have been generated for a neural network method, a regression method, and a hybrid method obtained by combining the approximators. The performance of the models is graphed for aircraft weight versus thrust as well as for wing area and turbine temperature. The regression method followed the analytical solution with little error. The neural network exhibited 5-percent maximum error over all parameters. Performance of the hybrid method was intermediate in comparison to the individual approximators. Error in the response variable is smaller than that shown in the figure because of a distortion scale factor.
The overall performance of the approximators was considered to be satisfactory because aircraft analysis with NASA Langley Research Center's FLOPS (Flight Optimization System) code is a synthesis of diverse disciplines: weight estimation, aerodynamic analysis, engine cycle analysis, propulsion data interpolation, mission performance, airfield length for landing and takeoff, noise footprint, and others.

  19. Strategies of experiment standardization and response optimization in a rat model of hemorrhagic shock and chronic hypertension.

    PubMed

    Reynolds, Penny S; Tamariz, Francisco J; Barbee, Robert Wayne

    2010-04-01

    Exploratory pilot studies are crucial to best practice in research but are frequently conducted without a systematic method for maximizing the amount and quality of information obtained. We describe the use of response surface regression models and simultaneous optimization methods to develop a rat model of hemorrhagic shock in the context of chronic hypertension, a clinically relevant comorbidity. A response surface regression model was applied to determine optimal levels of two inputs--dietary NaCl concentration (0.49%, 4%, and 8%) and time on the diet (4, 6, 8 weeks)--to achieve clinically realistic and stable target measures of systolic blood pressure while simultaneously maximizing critical oxygen delivery (a measure of vulnerability to hemorrhagic shock) and body mass M. Simultaneous optimization of the three response variables was performed through a dimensionality reduction strategy involving calculation of a single aggregate measure, the "desirability" function. Optimal conditions for inducing systolic blood pressure of 208 mmHg, critical oxygen delivery of 4.03 mL/min, and M of 290 g were determined to be 4% [NaCl] for 5 weeks. Rats on the 8% diet did not survive past 7 weeks. Response surface regression and simultaneous optimization techniques are commonly used in process engineering but have found little application to date in animal pilot studies. These methods will ensure both the scientific and ethical integrity of experimental trials involving animals and provide powerful tools for the development of novel models of clinically interacting comorbidities with shock.
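The desirability aggregation can be sketched directly. A Derringer-Suich-style "target is best" function maps each response onto [0, 1], and the overall desirability is their geometric mean. The limits and targets below are invented for illustration; only the three response values echo the abstract.

```python
import numpy as np

def desirability_target(y, low, target, high):
    """Derringer-Suich-style desirability for a 'target is best' response:
    0 outside [low, high], 1 at the target, linear in between."""
    if y <= low or y >= high:
        return 0.0
    if y <= target:
        return (y - low) / (target - low)
    return (high - y) / (high - target)

def overall_desirability(ds):
    """Aggregate individual desirabilities by their geometric mean."""
    ds = np.asarray(ds, dtype=float)
    return float(np.prod(ds) ** (1.0 / len(ds)))

# Toy limits/targets for (systolic BP, critical O2 delivery, body mass)
d = [desirability_target(208, 180, 210, 240),
     desirability_target(4.03, 3.0, 4.0, 5.0),
     desirability_target(290, 250, 300, 350)]
print(round(overall_desirability(d), 3))  # → 0.898
```

Because the geometric mean is zero whenever any single desirability is zero, the aggregate cannot trade a completely unacceptable response against good values of the others, which is why it suits simultaneous optimization.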

  20. Portfolio optimization for index tracking modelling in Malaysia stock market

    NASA Astrophysics Data System (ADS)

    Siew, Lam Weng; Jaaman, Saiful Hafizah; Ismail, Hamizun

    2016-06-01

    Index tracking is an investment strategy in portfolio management which aims to construct an optimal portfolio that generates a mean return similar to that of the stock market index without purchasing all of the stocks that make up the index. The objective of this paper is to construct an optimal portfolio using an optimization model which adopts a regression approach to tracking the benchmark stock market index return. In this study, the data consist of the weekly prices of the stocks in the Malaysian market index, the FTSE Bursa Malaysia Kuala Lumpur Composite Index (FBMKLCI), from January 2010 until December 2013. The results of this study show that the optimal portfolio is able to track the FBMKLCI Index at a minimum tracking error of 1.0027% with a 0.0290% excess mean return over the mean return of the FBMKLCI Index. The significance of this study is the construction of the optimal portfolio using an optimization model which adopts a regression approach to tracking the stock market index without purchasing all index components.
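The regression approach to index tracking can be sketched with ordinary least squares: regress the index's returns on the returns of a stock subset and measure the tracking error as the residual standard deviation. The return series below are simulated, and portfolio constraints such as full investment or no short sales are omitted from this sketch.

```python
import numpy as np

def tracking_weights(stock_returns, index_returns):
    """Least-squares weights regressing the index on a stock subset."""
    w, *_ = np.linalg.lstsq(stock_returns, index_returns, rcond=None)
    return w

def tracking_error(stock_returns, index_returns, w):
    """Standard deviation of the portfolio-minus-index return difference."""
    return float(np.std(stock_returns @ w - index_returns, ddof=1))

rng = np.random.default_rng(9)
R = rng.normal(0.001, 0.02, size=(208, 8))   # ~4 years of weekly returns, 8 stocks
idx = R @ np.full(8, 1 / 8) + rng.normal(0, 0.001, 208)  # equal-weighted "index"
w = tracking_weights(R[:, :5], idx)          # track using only 5 of the 8 stocks
te = tracking_error(R[:, :5], idx, w)
print(te)
```

Because only five of the eight index constituents are held, the tracking error cannot reach zero; the regression finds the weights that minimize it in the least-squares sense.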

  1. Hybrid ABC Optimized MARS-Based Modeling of the Milling Tool Wear from Milling Run Experimental Data

    PubMed Central

    García Nieto, Paulino José; García-Gonzalo, Esperanza; Ordóñez Galán, Celestino; Bernardo Sánchez, Antonio

    2016-01-01

    Milling cutters are important cutting tools used in milling machines to perform milling operations, which are prone to wear and subsequent failure. In this paper, a practical new hybrid model to predict the milling tool wear in a regular cut, as well as entry cut and exit cut, of a milling tool is proposed. The model was based on the optimization tool termed artificial bee colony (ABC) in combination with the multivariate adaptive regression splines (MARS) technique. This optimization mechanism involved the parameter setting in the MARS training procedure, which significantly influences the regression accuracy. Therefore, an ABC–MARS-based model was successfully used here to predict the milling tool flank wear (output variable) as a function of the following input variables: the time duration of the experiment, depth of cut, feed, type of material, etc. Regression with optimal hyperparameters was performed and a determination coefficient of 0.94 was obtained. The ABC–MARS-based model's goodness of fit to experimental data confirmed the good performance of this model. This new model also allowed us to ascertain the most influential parameters on the milling tool flank wear with a view to proposing improvements to milling machines. Finally, the conclusions of this study are presented. PMID:28787882

  2. Hybrid ABC Optimized MARS-Based Modeling of the Milling Tool Wear from Milling Run Experimental Data.

    PubMed

    García Nieto, Paulino José; García-Gonzalo, Esperanza; Ordóñez Galán, Celestino; Bernardo Sánchez, Antonio

    2016-01-28

    Milling cutters are important cutting tools used in milling machines to perform milling operations, which are prone to wear and subsequent failure. In this paper, a practical new hybrid model to predict the milling tool wear in a regular cut, as well as entry cut and exit cut, of a milling tool is proposed. The model was based on the optimization tool termed artificial bee colony (ABC) in combination with the multivariate adaptive regression splines (MARS) technique. This optimization mechanism involved the parameter setting in the MARS training procedure, which significantly influences the regression accuracy. Therefore, an ABC-MARS-based model was successfully used here to predict the milling tool flank wear (output variable) as a function of the following input variables: the time duration of the experiment, depth of cut, feed, type of material, etc. Regression with optimal hyperparameters was performed and a determination coefficient of 0.94 was obtained. The ABC-MARS-based model's goodness of fit to experimental data confirmed the good performance of this model. This new model also allowed us to ascertain the most influential parameters on the milling tool flank wear with a view to proposing improvements to milling machines. Finally, the conclusions of this study are presented.

  3. Analysis of Multivariate Experimental Data Using A Simplified Regression Model Search Algorithm

    NASA Technical Reports Server (NTRS)

    Ulbrich, Norbert M.

    2013-01-01

    A new regression model search algorithm was developed that may be applied to both general multivariate experimental data sets and wind tunnel strain-gage balance calibration data. The algorithm is a simplified version of a more complex algorithm that was originally developed for the NASA Ames Balance Calibration Laboratory. The new algorithm performs regression model term reduction to prevent overfitting of data. It has the advantage that it needs only about one tenth of the original algorithm's CPU time for the completion of a regression model search. In addition, extensive testing showed that the prediction accuracy of math models obtained from the simplified algorithm is similar to the prediction accuracy of math models obtained from the original algorithm. The simplified algorithm, however, cannot guarantee that search constraints related to a set of statistical quality requirements are always satisfied in the optimized regression model. Therefore, the simplified algorithm is not intended to replace the original algorithm. Instead, it may be used to generate an alternate optimized regression model of experimental data whenever the application of the original search algorithm fails or requires too much CPU time. Data from a machine calibration of NASA's MK40 force balance is used to illustrate the application of the new search algorithm.

  4. Surrogate Model Application to the Identification of Optimal Groundwater Exploitation Scheme Based on Regression Kriging Method—A Case Study of Western Jilin Province

    PubMed Central

    An, Yongkai; Lu, Wenxi; Cheng, Weiguo

    2015-01-01

    This paper introduces a surrogate model to identify an optimal exploitation scheme, with western Jilin Province selected as the study area. A numerical simulation model of groundwater flow was established first, and four exploitation wells were set in Tongyu County and Qian Gorlos County to supply water to Daan County. Second, the Latin Hypercube Sampling (LHS) method was used to collect data in the feasible region of the input variables. A surrogate model of the numerical simulation model of groundwater flow was developed using the regression kriging method. An optimization model was established to search for an optimal groundwater exploitation scheme, using the minimum average drawdown of the groundwater table and the minimum cost of groundwater exploitation as multi-objective functions. Finally, the surrogate model was invoked by the optimization model in the process of solving the optimization problem. Results show that the relative error and root mean square error of the groundwater table drawdown between the simulation model and the surrogate model for 10 validation samples are both lower than 5%, indicating high approximation accuracy. A comparison between the surrogate-based simulation optimization model and the conventional simulation optimization model for solving the same optimization problem shows that the former needs only 5.5 hours whereas the latter needs 25 days. These results indicate that the surrogate model developed in this study can considerably reduce the computational burden of the simulation optimization process while maintaining high computational accuracy, thus providing an effective method for identifying an optimal groundwater exploitation scheme quickly and accurately. PMID:26264008
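
    Latin Hypercube Sampling, used above to collect training data for the surrogate, can be sketched in a few lines: each input dimension is split into equal-probability strata and exactly one sample is drawn per stratum, with strata paired at random across dimensions. The four-variable bounds below are illustrative, not the pumping-rate ranges from the study.

```python
import numpy as np

def latin_hypercube(n_samples, bounds, rng):
    """LHS: one sample per equal-probability stratum in every dimension."""
    d = len(bounds)
    # Row i starts in stratum [i/n, (i+1)/n) for every dimension
    u = (rng.random((n_samples, d)) + np.arange(n_samples)[:, None]) / n_samples
    for j in range(d):
        rng.shuffle(u[:, j])                      # decouple the strata across dimensions
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    return lo + u * (hi - lo)

rng = np.random.default_rng(42)
# Hypothetical bounds on four well pumping rates (not the study's ranges)
X = latin_hypercube(20, [(0.0, 500.0)] * 4, rng)
```

    Each column of `X` then contains exactly one sample in every one of the 20 strata, which is what gives LHS its even coverage with few model runs.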

  5. A computational approach to compare regression modelling strategies in prediction research.

    PubMed

    Pajouheshnia, Romin; Pestman, Wiebe R; Teerenstra, Steven; Groenwold, Rolf H H

    2016-08-25

    It is often unclear which approach to fitting, assessing, and adjusting a model will yield the most accurate prediction model. We present an extension of an approach for comparing modelling strategies in linear regression to the setting of logistic regression and demonstrate its application in clinical prediction research. A framework for comparing logistic regression modelling strategies by their likelihoods was formulated using a wrapper approach. Five different modelling strategies, including simple shrinkage methods, were compared in four empirical data sets to illustrate the concept of a priori strategy comparison. Simulations were performed in both randomly generated data and empirical data to investigate the influence of data characteristics on strategy performance. We applied the comparison framework in a case study setting: optimal strategies were selected based on the results of a priori comparisons in a clinical data set, and the performance of models built according to each strategy was assessed using the Brier score and calibration plots. The performance of modelling strategies was highly dependent on the characteristics of the development data in both linear and logistic regression settings. A priori comparisons in four empirical data sets found that no strategy consistently outperformed the others. The percentage of times that a model adjustment strategy outperformed a logistic model ranged from 3.9% to 94.9%, depending on the strategy and data set. However, in our case study setting the a priori selection of optimal methods did not result in detectable improvement in model performance when assessed in an external data set. The performance of prediction modelling strategies is a data-dependent process and can be highly variable between data sets within the same clinical domain. A priori strategy comparison can be used to determine an optimal logistic regression modelling strategy for a given data set before selecting a final modelling approach.
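
    The Brier score used above to assess the strategies is simply the mean squared difference between predicted risks and observed binary outcomes. A minimal sketch, with made-up outcomes and a hypothetical shrunken variant of the predictions:

```python
import numpy as np

def brier_score(y, p):
    """Mean squared difference between predicted risk and observed outcome."""
    return float(np.mean((np.asarray(p) - np.asarray(y)) ** 2))

# Made-up binary outcomes and risks from a hypothetical model
y = np.array([0, 0, 1, 1, 1, 0])
p_full = np.array([0.1, 0.3, 0.8, 0.7, 0.9, 0.2])
# One simple shrinkage variant: pull predictions toward the overall event rate
p_shrunk = 0.8 * p_full + 0.2 * y.mean()

b_full, b_shrunk = brier_score(y, p_full), brier_score(y, p_shrunk)
```

    Whether shrinkage helps or hurts this score is exactly the kind of data-dependent question the a priori comparison framework is meant to answer before the final model is chosen.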

  6. Reader reaction to "a robust method for estimating optimal treatment regimes" by Zhang et al. (2012).

    PubMed

    Taylor, Jeremy M G; Cheng, Wenting; Foster, Jared C

    2015-03-01

    A recent article (Zhang et al., 2012, Biometrics 68, 1010-1018) compares regression-based and inverse-probability-based methods of estimating an optimal treatment regime and shows, for a small number of covariates, that inverse probability weighted methods are more robust to model misspecification than regression methods. We demonstrate that using models that fit the data better reduces the concern about non-robustness for the regression methods. We extend the simulation study of Zhang et al. (2012, Biometrics 68, 1010-1018), also considering the situation of a larger number of covariates, and show that incorporating random forests into both regression-based and inverse probability weighted methods improves their properties. © 2014, The International Biometric Society.

  7. A hybrid PSO-SVM-based method for predicting the friction coefficient between aircraft tire and coating

    NASA Astrophysics Data System (ADS)

    Zhan, Liwei; Li, Chengwei

    2017-02-01

    A hybrid PSO-SVM-based model is proposed to predict the friction coefficient between aircraft tire and coating. The presented hybrid model combines a support vector machine (SVM) with the particle swarm optimization (PSO) technique. SVM has been successfully adopted to solve regression problems, and its regression accuracy depends strongly on the choice of parameters such as the regularization constant C, the RBF kernel parameter γ, and the insensitivity parameter ε used in the SVM training procedure. However, SVM-based prediction of the friction coefficient between aircraft tire and coating has yet to be explored. The experiments reveal that drop height and tire rotational speed are the factors affecting the friction coefficient. With this in mind, the friction coefficient can be predicted by the hybrid PSO-SVM-based model trained on the measured friction coefficients between aircraft tire and coating. To compare regression accuracy, a grid search (GS) method and a genetic algorithm (GA) are also used to optimize the relevant parameters (C, γ, and ε). The regression accuracy is measured by the coefficient of determination (R²). The results show that the hybrid PSO-RBF-SVM-based model has better accuracy than the GS-RBF-SVM- and GA-RBF-SVM-based models. The agreement of this model (PSO-RBF-SVM) with the experimental data confirms its good performance.
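
    A bare-bones version of the PSO search over the SVM hyperparameters can be sketched as follows. The cross-validation error of the SVM is replaced by a hypothetical quadratic stand-in over (log C, log γ, log ε), since retraining an SVM per particle is outside the scope of this sketch.

```python
import numpy as np

rng = np.random.default_rng(3)

def cv_error(params):
    # Hypothetical stand-in for the SVM cross-validation error over
    # (log C, log gamma, log epsilon); a real run would retrain the SVM here.
    return float(np.sum((params - np.array([2.0, -1.0, -3.0])) ** 2))

def pso_minimize(f, dim, n_particles=20, n_iter=100, w=0.7, c1=1.5, c2=1.5):
    x = rng.uniform(-5, 5, (n_particles, dim))    # particle positions
    v = np.zeros_like(x)                          # particle velocities
    pbest, pcost = x.copy(), np.array([f(p) for p in x])
    g = pbest[np.argmin(pcost)]                   # global best position
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, dim))
        # Inertia + attraction to personal best + attraction to global best
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = x + v
        cost = np.array([f(p) for p in x])
        better = cost < pcost
        pbest[better], pcost[better] = x[better], cost[better]
        g = pbest[np.argmin(pcost)]
    return g, f(g)

best, err = pso_minimize(cv_error, 3)
```

    The grid search and genetic algorithm baselines mentioned above would plug into the same stand-in objective, which is what makes the accuracy comparison between the three optimizers fair.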

  8. Analysis of Multivariate Experimental Data Using A Simplified Regression Model Search Algorithm

    NASA Technical Reports Server (NTRS)

    Ulbrich, Norbert Manfred

    2013-01-01

    A new regression model search algorithm was developed in 2011 that may be used to analyze both general multivariate experimental data sets and wind tunnel strain-gage balance calibration data. The new algorithm is a simplified version of a more complex search algorithm that was originally developed at the NASA Ames Balance Calibration Laboratory. The new algorithm has the advantage that it needs only about one tenth of the original algorithm's CPU time for the completion of a search. In addition, extensive testing showed that the prediction accuracy of math models obtained from the simplified algorithm is similar to the prediction accuracy of math models obtained from the original algorithm. The simplified algorithm, however, cannot guarantee that search constraints related to a set of statistical quality requirements are always satisfied in the optimized regression models. Therefore, the simplified search algorithm is not intended to replace the original search algorithm. Instead, it may be used to generate an alternate optimized regression model of experimental data whenever the application of the original search algorithm either fails or requires too much CPU time. Data from a machine calibration of NASA's MK40 force balance is used to illustrate the application of the new regression model search algorithm.

  9. Predictive and mechanistic multivariate linear regression models for reaction development

    PubMed Central

    Santiago, Celine B.; Guo, Jing-Yao

    2018-01-01

    Multivariate Linear Regression (MLR) models utilizing computationally-derived and empirically-derived physical organic molecular descriptors are described in this review. Several reports demonstrating the effectiveness of this methodological approach towards reaction optimization and mechanistic interrogation are discussed. A detailed protocol to access quantitative and predictive MLR models is provided as a guide for model development and parameter analysis. PMID:29719711

  10. Improved model of the retardance in citric acid coated ferrofluids using stepwise regression

    NASA Astrophysics Data System (ADS)

    Lin, J. F.; Qiu, X. R.

    2017-06-01

    Citric acid (CA) coated Fe3O4 ferrofluids (FFs) have been developed for biomedical applications. The magneto-optical retardance of CA coated FFs was measured by a Stokes polarimeter. Optimization and multiple regression of the retardance in FFs were previously carried out with the Taguchi method and Microsoft Excel, and the F value of the regression model was large enough; however, the model built in Excel was not systematic. We therefore adopted stepwise regression to model the retardance of CA coated FFs. The results of stepwise regression in MATLAB show that the developed model is highly predictive, with an F value of 2.55897e+7 and a correlation coefficient of one. The average absolute error between predicted and measured retardances was just 0.0044%. Using the genetic algorithm (GA) in MATLAB, the optimized parametric combination was determined as [4.709 0.12 39.998 70.006], corresponding to the pH of the suspension, the molar ratio of CA to Fe3O4, the CA volume, and the coating temperature. The maximum retardance was found to be 31.712°, close to that obtained by the evolutionary solver in Excel, with a relative error of -0.013%. In summary, the stepwise regression method was successfully used to model the retardance of CA coated FFs, and the maximum global retardance was determined by the use of GA.

  11. Efficient design of gain-flattened multi-pump Raman fiber amplifiers using least squares support vector regression

    NASA Astrophysics Data System (ADS)

    Chen, Jing; Qiu, Xiaojie; Yin, Cunyi; Jiang, Hao

    2018-02-01

    An efficient method to design broadband gain-flattened Raman fiber amplifiers with multiple pumps is proposed based on least squares support vector regression (LS-SVR). A multi-input multi-output LS-SVR model is introduced to replace the complicated solving process of the nonlinear coupled Raman amplification equations. The proposed approach contains two stages: an offline training stage and an online optimization stage. During the offline stage, the LS-SVR model is trained. Owing to the good generalization capability of LS-SVR, the net gain spectrum can be obtained directly and accurately for any combination of pump wavelengths and powers fed to the well-trained model. During the online stage, the LS-SVR model is incorporated into the particle swarm optimization algorithm to find the optimal pump configuration. The design results demonstrate that the proposed method greatly shortens the computation time and enhances the efficiency of pump parameter optimization for Raman fiber amplifier design.
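
    Training an LS-SVR model reduces to solving one linear system (in the Suykens formulation), which is what makes it cheap enough to stand in for the coupled amplification equations. A single-output sketch on a toy function, with illustrative hyperparameters rather than the paper's:

```python
import numpy as np

rng = np.random.default_rng(7)

def rbf_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma**2))

def lssvr_fit(X, y, gamma=100.0, sigma=1.0):
    """LS-SVR training (Suykens formulation): one symmetric linear system
    in the bias b and the dual coefficients alpha."""
    n = len(y)
    K = rbf_kernel(X, X, sigma)
    A = np.block([[np.zeros((1, 1)), np.ones((1, n))],
                  [np.ones((n, 1)), K + np.eye(n) / gamma]])
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    b, alpha = sol[0], sol[1:]
    return lambda Xq: rbf_kernel(Xq, X, sigma) @ alpha + b

# Toy single-output regression problem standing in for one gain-spectrum output
X = rng.uniform(-2, 2, (80, 1))
y = np.sin(2 * X[:, 0])
predict = lssvr_fit(X, y)
```

    The multi-output model of the paper would fit one such predictor per wavelength sample of the net gain spectrum, after which each evaluation inside the PSO loop costs only a kernel product.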

  12. NARMAX model identification of a palm oil biodiesel engine using multi-objective optimization differential evolution

    NASA Astrophysics Data System (ADS)

    Mansor, Zakwan; Zakaria, Mohd Zakimi; Nor, Azuwir Mohd; Saad, Mohd Sazli; Ahmad, Robiah; Jamaluddin, Hishamuddin

    2017-09-01

    This paper presents the black-box modelling of a palm oil biodiesel engine (POB) using the multi-objective optimization differential evolution (MOODE) algorithm. Two objective functions are considered in the algorithm: minimizing the number of terms in a model structure and minimizing the mean square error between actual and predicted outputs. The mathematical model used in this study to represent the POB system is the nonlinear auto-regressive moving average with exogenous input (NARMAX) model. Finally, model validity tests are applied to validate the candidate models obtained from the MOODE algorithm, leading to the selection of an optimal model.

  13. Efficient Regressions via Optimally Combining Quantile Information*

    PubMed Central

    Zhao, Zhibiao; Xiao, Zhijie

    2014-01-01

    We develop a generally applicable framework for constructing efficient estimators of regression models via quantile regressions. The proposed method is based on optimally combining information over multiple quantiles and can be applied to a broad range of parametric and nonparametric settings. When combining information over a fixed number of quantiles, we derive an upper bound on the distance between the efficiency of the proposed estimator and the Fisher information. As the number of quantiles increases, this upper bound decreases and the asymptotic variance of the proposed estimator approaches the Cramér-Rao lower bound under appropriate conditions. In the case of non-regular statistical estimation, the proposed estimator leads to super-efficient estimation. We illustrate the proposed method for several widely used regression models. Both asymptotic theory and Monte Carlo experiments show the superior performance over existing methods. PMID:25484481

  14. Optimization of Bioethanol Production Using Whole Plant of Water Hyacinth as Substrate in Simultaneous Saccharification and Fermentation Process

    PubMed Central

    Zhang, Qiuzhuo; Weng, Chen; Huang, Huiqin; Achal, Varenyam; Wang, Duanchao

    2016-01-01

    Water hyacinth was used as the substrate for bioethanol production in the present study. The combination of acid pretreatment and enzymatic hydrolysis was the most effective process for sugar production, yielding 402.93 mg of reducing sugar under optimal conditions. A regression model was built to optimize the fermentation factors according to the response surface method in the simultaneous saccharification and fermentation (SSF) process. The optimized condition for ethanol production by the SSF process was fermentation at 38.87°C for 81.87 h with a 6.11 ml yeast inoculum, producing 1.291 g/L bioethanol according to the model. Meanwhile, 1.289 g/L ethanol was produced during experimentation, demonstrating the reliability of the regression model presented in this research. The optimization method discussed in the present study, leading to relatively high bioethanol production, could provide a promising way to utilize alien invasive species with high cellulose content. PMID:26779125
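
    The response-surface step can be illustrated generically: fit a second-order polynomial to designed experiments and solve for the stationary point of the fitted surface. The factors, ranges and yields below are synthetic stand-ins, not the study's data.

```python
import numpy as np

rng = np.random.default_rng(11)

# Synthetic design points: temperature (deg C) and fermentation time (h);
# the yield surface below is made up, with a maximum near T=39, t=82
T = rng.uniform(30, 45, 30)
t = rng.uniform(48, 96, 30)
y = 1.3 - 0.002 * (T - 39) ** 2 - 0.0002 * (t - 82) ** 2 + rng.normal(0, 0.002, 30)

# Second-order response surface: y ~ b0 + b1*T + b2*t + b3*T^2 + b4*t^2 + b5*T*t
X = np.column_stack([np.ones_like(T), T, t, T**2, t**2, T * t])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# Stationary point of the fitted quadratic surface: solve grad = 0
H = np.array([[2 * b[3], b[5]], [b[5], 2 * b[4]]])
g = -np.array([b[1], b[2]])
T_opt, t_opt = np.linalg.solve(H, g)
```

    With a third factor (inoculum volume) the same normal-equations fit and stationary-point solve apply; one simply extends the design matrix and the Hessian to three variables.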

  15. Optimization of fixture layouts of glass laser optics using multiple kernel regression.

    PubMed

    Su, Jianhua; Cao, Enhua; Qiao, Hong

    2014-05-10

    We aim to build an integrated fixturing model to describe the structural properties and thermal properties of the support frame of glass laser optics. Therefore, (a) a near global optimal set of clamps can be computed to minimize the surface shape error of the glass laser optic based on the proposed model, and (b) a desired surface shape error can be obtained by adjusting the clamping forces under various environmental temperatures based on the model. To construct the model, we develop a new multiple kernel learning method and call it multiple kernel support vector functional regression. The proposed method uses two layer regressions to group and order the data sources by the weights of the kernels and the factors of the layers. Because of that, the influences of the clamps and the temperature can be evaluated by grouping them into different layers.

  16. A Fast Gradient Method for Nonnegative Sparse Regression With Self-Dictionary

    NASA Astrophysics Data System (ADS)

    Gillis, Nicolas; Luce, Robert

    2018-01-01

    A nonnegative matrix factorization (NMF) can be computed efficiently under the separability assumption, which asserts that all the columns of the given input data matrix belong to the cone generated by a (small) subset of them. The provably most robust methods to identify these conic basis columns are based on nonnegative sparse regression and self dictionaries, and require the solution of large-scale convex optimization problems. In this paper we study a particular nonnegative sparse regression model with self dictionary. As opposed to previously proposed models, this model yields a smooth optimization problem where the sparsity is enforced through linear constraints. We show that the Euclidean projection on the polyhedron defined by these constraints can be computed efficiently, and propose a fast gradient method to solve our model. We compare our algorithm with several state-of-the-art methods on synthetic data sets and real-world hyperspectral images.

  17. QSRR modeling for diverse drugs using different feature selection methods coupled with linear and nonlinear regressions.

    PubMed

    Goodarzi, Mohammad; Jensen, Richard; Vander Heyden, Yvan

    2012-12-01

    A Quantitative Structure-Retention Relationship (QSRR) is proposed to estimate the chromatographic retention of 83 diverse drugs on a Unisphere polybutadiene (PBD) column, using isocratic elutions at pH 11.7. Previous work generated QSRR models for these drugs using Classification And Regression Trees (CART). In this work, Ant Colony Optimization (ACO) is used as a feature selection method to find the best molecular descriptors from a large pool. In addition, several other selection methods were applied, such as Genetic Algorithms, Stepwise Regression and the Relief method, not only to evaluate ACO as a feature selection method but also to investigate its ability to find the important descriptors in QSRR. Multiple Linear Regression (MLR) and Support Vector Machines (SVMs) were applied as linear and nonlinear regression methods, respectively, giving excellent correlation between the experimental logarithms of the retention factors of the drugs (log k(w)), i.e. extrapolated to a mobile phase consisting of pure water, and the predicted values. The overall best model was the SVM one built using descriptors selected by ACO. Copyright © 2012 Elsevier B.V. All rights reserved.

  18. Image interpolation via regularized local linear regression.

    PubMed

    Liu, Xianming; Zhao, Debin; Xiong, Ruiqin; Ma, Siwei; Gao, Wen; Sun, Huifang

    2011-12-01

    The linear regression model is a very attractive tool for designing effective image interpolation schemes. Some regression-based image interpolation algorithms have been proposed in the literature, in which the objective functions are optimized by ordinary least squares (OLS). However, interpolation with OLS may have some undesirable properties from a robustness point of view: even small amounts of outliers can dramatically affect the estimates. To address these issues, in this paper we propose a novel image interpolation algorithm based on regularized local linear regression (RLLR). Starting from the linear regression model, replacing the OLS error norm with the moving least squares (MLS) error norm leads to a robust estimator of local image structure. To keep the solution stable and avoid overfitting, we incorporate the l2-norm as the estimator complexity penalty. Moreover, motivated by recent progress on manifold-based semi-supervised learning, we explicitly consider the intrinsic manifold structure by making use of both measured and unmeasured data points. Specifically, our framework incorporates the geometric structure of the marginal probability distribution induced by unmeasured samples as an additional local smoothness preserving constraint. The optimal model parameters can be obtained in closed form by solving a convex optimization problem. Experimental results on benchmark test images demonstrate that the proposed method achieves very competitive performance with state-of-the-art interpolation algorithms, especially in image edge structure preservation. © 2011 IEEE
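
    The core estimator can be sketched in one dimension: at each query location, fit a local linear model under moving-least-squares weights with an l2 penalty on the coefficients. The manifold-regularization term of the full method is omitted here, and the kernel bandwidth and penalty below are illustrative.

```python
import numpy as np

def rllr_interpolate(x_known, y_known, x_query, h=1.0, lam=1e-3):
    """At each query point: local linear fit with Gaussian MLS weights
    and an l2 (ridge) penalty on the coefficients."""
    out = []
    for xq in np.atleast_1d(x_query):
        w = np.exp(-((x_known - xq) ** 2) / (2.0 * h**2))   # MLS weights
        X = np.column_stack([np.ones_like(x_known), x_known - xq])
        A = X.T @ (w[:, None] * X) + lam * np.eye(2)        # regularized normal equations
        beta = np.linalg.solve(A, X.T @ (w * y_known))
        out.append(beta[0])                                  # intercept = estimate at xq
    return np.array(out)

# Local linear fits reproduce linear data almost exactly (up to the penalty)
x = np.linspace(0.0, 4.0, 9)
y = 3.0 * x + 1.0
est = rllr_interpolate(x, y, [1.0, 2.5])
```

    For image interpolation the same closed-form solve runs per missing pixel over a 2-D neighborhood, which is why the method stays efficient despite being locally adaptive.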

  19. Analysis of Sting Balance Calibration Data Using Optimized Regression Models

    NASA Technical Reports Server (NTRS)

    Ulbrich, Norbert; Bader, Jon B.

    2009-01-01

    Calibration data of a wind tunnel sting balance were processed using a search algorithm that identifies an optimized regression model for the data analysis. The selected sting balance had two moment gages that were mounted forward and aft of the balance moment center. The difference and the sum of the two gage outputs were fitted in the least squares sense using the normal force and the pitching moment at the balance moment center as independent variables. The regression model search algorithm predicted that the difference of the gage outputs should be modeled using the intercept and the normal force. The sum of the two gage outputs, on the other hand, should be modeled using the intercept, the pitching moment, and the square of the pitching moment. Equations for the deflection of a cantilever beam are used to show that the search algorithm's two recommended math models can also be obtained from a rigorous theoretical analysis of the deflection of the sting balance under load. The analysis of the sting balance calibration data set is a rare example of a situation in which regression models of balance calibration data can be derived directly from first principles of physics and engineering. In addition, it is interesting that the search algorithm recommended the same regression models for the data analysis using only a set of statistical quality metrics.
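
    The two recommended math models amount to two small least-squares fits. The sketch below generates synthetic gage data consistent with those model forms (all coefficients and load ranges are made up for illustration) and recovers them with `numpy.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical calibration loads: normal force N and pitching moment M
N = rng.uniform(-100, 100, 50)
M = rng.uniform(-50, 50, 50)

# Synthetic gage outputs consistent with the two recommended models
# (all coefficients below are made up for illustration)
d_gage = 0.5 + 0.02 * N + rng.normal(0, 1e-4, 50)                  # intercept + normal force
s_gage = -0.1 + 0.03 * M + 1e-4 * M**2 + rng.normal(0, 1e-4, 50)   # intercept + M + M^2

coef_diff, *_ = np.linalg.lstsq(np.column_stack([np.ones_like(N), N]), d_gage, rcond=None)
coef_sum, *_ = np.linalg.lstsq(np.column_stack([np.ones_like(M), M, M**2]), s_gage, rcond=None)
```

    In the actual calibration the regressors come from applied loads and the responses from measured gage outputs; the point of the beam-deflection analysis is that exactly these two design matrices fall out of first principles.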

  20. [Optimization of processing technology for semen cuscuta by uniform and regression analysis].

    PubMed

    Li, Chun-yu; Luo, Hui-yu; Wang, Shu; Zhai, Ya-nan; Tian, Shu-hui; Zhang, Dan-shen

    2011-02-01

    To optimize the preparation technology with respect to the contents of total flavonoids and polysaccharides and the percentages of water-soluble and alcohol-soluble components in Semen Cuscuta herb processing. UV spectrophotometry was applied to determine the contents of total flavonoids and polysaccharides extracted from Semen Cuscuta, and the processing was optimized by means of uniform design and contour maps. The best preparation technology satisfied the following conditions: baking temperature 150 degrees C, baking time 140 seconds. The regression models are significant and reasonable, and can forecast results precisely.

  1. Parametric optimization of multiple quality characteristics in laser cutting of Inconel-718 by using hybrid approach of multiple regression analysis and genetic algorithm

    NASA Astrophysics Data System (ADS)

    Shrivastava, Prashant Kumar; Pandey, Arun Kumar

    2018-06-01

    Inconel-718 is in high demand in various industries due to its superior mechanical properties. Traditional cutting methods face difficulties in cutting this alloy due to its low thermal conductivity, low elasticity and high chemical affinity at elevated temperatures. Traditional machining also struggles with the machining and/or finishing of unusual shapes and/or sizes in such materials. Laser beam cutting may be applied for miniaturization and ultra-precision cutting and/or finishing through appropriate control of the different process parameters. This paper presents the multi-objective optimization of kerf deviation, kerf width and kerf taper in the laser cutting of Inconel-718 sheet. Second-order regression models have been developed for the different quality characteristics using the data obtained through experimentation. The regression models have been used as objective functions for multi-objective optimization based on the hybrid approach of multiple regression analysis and genetic algorithm. The comparison of optimization results to experimental results shows an improvement of 88%, 10.63% and 42.15% in kerf deviation, kerf width and kerf taper, respectively. Finally, the effects of the different process parameters on the quality characteristics are also discussed.

  2. Network anomaly detection system with optimized DS evidence theory.

    PubMed

    Liu, Yuan; Wang, Xiaofeng; Liu, Kaiyu

    2014-01-01

    Network anomaly detection has attracted increasing attention with the rapid development of computer networks. Some researchers have applied fusion methods and DS evidence theory to network anomaly detection, but with low performance, and without considering that network features are complicated and varied. To achieve a high detection rate, we present a novel network anomaly detection system with optimized Dempster-Shafer evidence theory (ODS) and a regression basic probability assignment (RBPA) function. In this model, we add a weight for each sensor to optimize DS evidence theory according to its previous prediction accuracy, and RBPA employs each sensor's regression ability to address complex networks. Through four kinds of experiments, we find that our novel network anomaly detection model has a better detection rate, and that the RBPA and ODS optimization methods can improve system performance significantly.

  3. Network Anomaly Detection System with Optimized DS Evidence Theory

    PubMed Central

    Liu, Yuan; Wang, Xiaofeng; Liu, Kaiyu

    2014-01-01

    Network anomaly detection has attracted increasing attention with the rapid development of computer networks. Some researchers have applied fusion methods and DS evidence theory to network anomaly detection, but with low performance, and without considering that network features are complicated and varied. To achieve a high detection rate, we present a novel network anomaly detection system with optimized Dempster-Shafer evidence theory (ODS) and a regression basic probability assignment (RBPA) function. In this model, we add a weight for each sensor to optimize DS evidence theory according to its previous prediction accuracy, and RBPA employs each sensor's regression ability to address complex networks. Through four kinds of experiments, we find that our novel network anomaly detection model has a better detection rate, and that the RBPA and ODS optimization methods can improve system performance significantly. PMID:25254258
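
    The evidence-combination machinery can be sketched generically: Dempster's rule over the frame {normal, anomaly}, plus classical reliability discounting as one simple way to weight a sensor by its past accuracy. The paper's RBPA function and its specific weight-update scheme are not reproduced here; the masses below are hypothetical.

```python
def discount(m, w):
    """Classical reliability discounting: scale a sensor's masses by weight w
    and move the remainder onto total ignorance ('NA' = whole frame)."""
    out = {k: w * v for k, v in m.items()}
    out["NA"] = 1.0 - w + w * m["NA"]
    return out

def dempster_combine(m1, m2):
    """Dempster's rule over the frame {'N' (normal), 'A' (anomaly)};
    'NA' denotes the whole frame."""
    frames = ["N", "A", "NA"]
    combined = {f: 0.0 for f in frames}
    conflict = 0.0
    for a in frames:
        for b in frames:
            inter = set(a) & set(b)
            mass = m1[a] * m2[b]
            if not inter:
                conflict += mass                  # conflicting evidence
            else:
                key = "".join(sorted(inter, key="NA".index))
                combined[key] += mass
    return {f: v / (1.0 - conflict) for f, v in combined.items()}

# Two hypothetical sensor BPAs; the second sensor is down-weighted to 0.8
m1 = {"N": 0.6, "A": 0.3, "NA": 0.1}
m2 = discount({"N": 0.5, "A": 0.4, "NA": 0.1}, 0.8)
fused = dempster_combine(m1, m2)
```

    Discounting a less reliable sensor shifts its mass toward ignorance, so its evidence influences the fused belief less, which is the intuition behind accuracy-based sensor weights in the optimized DS scheme.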

  4. The Naïve Overfitting Index Selection (NOIS): A new method to optimize model complexity for hyperspectral data

    NASA Astrophysics Data System (ADS)

    Rocha, Alby D.; Groen, Thomas A.; Skidmore, Andrew K.; Darvishzadeh, Roshanak; Willemen, Louise

    2017-11-01

    The growing number of narrow spectral bands in hyperspectral remote sensing improves the capacity to describe and predict biological processes in ecosystems. However, it also poses a challenge for fitting empirical models based on such high-dimensional data, which often contain correlated and noisy predictors. As sample sizes for training and validating empirical models do not seem to be increasing at the same rate, overfitting has become a serious concern. Overly complex models lead to overfitting by capturing more than the underlying relationship and by fitting random noise in the data. Many regression techniques claim to overcome these problems by using different strategies to constrain complexity, such as limiting the number of terms in the model, creating latent variables or shrinking parameter coefficients. This paper proposes a new method, named Naïve Overfitting Index Selection (NOIS), which makes use of artificially generated spectra to quantify the relative model overfitting and to select an optimal model complexity supported by the data. The robustness of this new method is assessed by comparing it to a traditional model selection based on cross-validation. The optimal model complexity is determined for seven different regression techniques, such as partial least squares regression, support vector machine, artificial neural network and tree-based regressions, using five hyperspectral datasets. The NOIS method selects less complex models, which achieve accuracies similar to the cross-validation method. The NOIS method reduces the chance of overfitting, thereby avoiding models that yield accurate predictions that are only valid for the data used, and that are too complex to support inferences about the underlying process.

  5. Geodesic regression on orientation distribution functions with its application to an aging study.

    PubMed

    Du, Jia; Goh, Alvina; Kushnarev, Sergey; Qiu, Anqi

    2014-02-15

    In this paper, we treat orientation distribution functions (ODFs) derived from high angular resolution diffusion imaging (HARDI) as elements of a Riemannian manifold and present a method for geodesic regression on this manifold. In order to find the optimal regression model, we pose this as a least-squares problem involving the sum-of-squared geodesic distances between observed ODFs and their model fitted data. We derive the appropriate gradient terms and employ gradient descent to find the minimizer of this least-squares optimization problem. In addition, we show how to perform statistical testing for determining the significance of the relationship between the manifold-valued regressors and the real-valued regressands. Experiments on both synthetic and real human data are presented. In particular, we examine aging effects on HARDI via geodesic regression of ODFs in normal adults aged 22 years old and above. © 2013 Elsevier Inc. All rights reserved.

  6. Modeling Stationary Lithium-Ion Batteries for Optimization and Predictive Control

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Baker, Kyri A; Shi, Ying; Christensen, Dane T

    Accurately modeling stationary battery storage behavior is crucial to understand and predict its limitations in demand-side management scenarios. In this paper, a lithium-ion battery model was derived to estimate lifetime and state-of-charge for building-integrated use cases. The proposed battery model aims to balance speed and accuracy when modeling battery behavior for real-time predictive control and optimization. In order to achieve these goals, a mixed modeling approach was taken, which incorporates regression fits to experimental data and an equivalent circuit to model battery behavior. A comparison of the proposed battery model output to actual data from the manufacturer validates the modeling approach taken in the paper. Additionally, a dynamic test case demonstrates the effects of using regression models to represent internal resistance and capacity fading.
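
    The mixed modeling approach can be sketched as a regression fit feeding an equivalent circuit: here a hypothetical quadratic dependence of internal resistance on state of charge is fitted and then used in a simple Rint model. All numbers are illustrative, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(9)

# Hypothetical measurements: internal resistance (ohm) vs. state of charge
soc = np.linspace(0.1, 1.0, 19)
r_meas = 0.05 + 0.02 * (1 - soc) ** 2 + rng.normal(0, 5e-4, soc.size)

# Regression fit: quadratic polynomial for internal resistance as a function of SOC
r_fit = np.poly1d(np.polyfit(soc, r_meas, 2))

def terminal_voltage(soc_now, current, ocv=3.7):
    """Equivalent-circuit (Rint) model: V = OCV - I * R(SOC).
    The constant OCV is a simplification; a real model would fit OCV(SOC) too."""
    return ocv - current * r_fit(soc_now)

v = terminal_voltage(0.5, 2.0)
```

    Keeping the circuit algebraic and the empirical behavior in cheap polynomial fits is what makes this kind of model fast enough for real-time predictive control; capacity fading can be handled the same way with a second regression in cycle count.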

  7. Modeling Stationary Lithium-Ion Batteries for Optimization and Predictive Control: Preprint

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Raszmann, Emma; Baker, Kyri; Shi, Ying

    Accurately modeling stationary battery storage behavior is crucial to understand and predict its limitations in demand-side management scenarios. In this paper, a lithium-ion battery model was derived to estimate lifetime and state-of-charge for building-integrated use cases. The proposed battery model aims to balance speed and accuracy when modeling battery behavior for real-time predictive control and optimization. In order to achieve these goals, a mixed modeling approach was taken, which incorporates regression fits to experimental data and an equivalent circuit to model battery behavior. A comparison of the proposed battery model output to actual data from the manufacturer validates the modeling approach taken in the paper. Additionally, a dynamic test case demonstrates the effects of using regression models to represent internal resistance and capacity fading.

  8. [Quantitative structure-gas chromatographic retention relationship of polycyclic aromatic sulfur heterocycles using molecular electronegativity-distance vector].

    PubMed

    Li, Zhenghua; Cheng, Fansheng; Xia, Zhining

    2011-01-01

    The chemical structures of 114 polycyclic aromatic sulfur heterocycles (PASHs) were studied by the molecular electronegativity-distance vector (MEDV). The linear relationships between the gas chromatographic retention index and the MEDV were established by a multiple linear regression (MLR) model. Variable selection by stepwise multiple regression (SMR) and appraisal of predictive ability by leave-one-out cross-validation showed that the optimized model, with a correlation coefficient (R) of 0.9947 and a cross-validated correlation coefficient (Rcv) of 0.9940, possessed the best statistical quality. Furthermore, when the 114 PASH compounds were divided into calibration and test sets in a 2:1 ratio, statistical analysis showed that the models possess nearly equal statistical quality, very similar regression coefficients, and good robustness. The quantitative structure-retention relationship (QSRR) model established here may provide a convenient and powerful method for predicting the gas chromatographic retention of PASHs.
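
    The R vs. Rcv comparison above rests on ordinary MLR plus leave-one-out cross-validation. A minimal sketch with synthetic descriptors (not MEDV values) shows the two statistics being computed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for structural descriptors (X) and retention indices (y).
n, p = 30, 3
X = rng.normal(size=(n, p))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=n)

def fit_predict(X_tr, y_tr, X_te):
    # MLR with intercept via least squares.
    A = np.column_stack([np.ones(len(X_tr)), X_tr])
    beta, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    return np.column_stack([np.ones(len(X_te)), X_te]) @ beta

# Calibration correlation R and leave-one-out cross-validated Rcv.
y_fit = fit_predict(X, y, X)
loo = np.array([
    fit_predict(np.delete(X, i, 0), np.delete(y, i), X[i:i+1])[0]
    for i in range(n)
])
R   = np.corrcoef(y, y_fit)[0, 1]
Rcv = np.corrcoef(y, loo)[0, 1]
```

    Rcv close to R, as reported in the abstract, indicates the model is not overfitting.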

  9. A fuzzy adaptive network approach to parameter estimation in cases where independent variables come from an exponential distribution

    NASA Astrophysics Data System (ADS)

    Dalkilic, Turkan Erbay; Apaydin, Aysen

    2009-11-01

    In a regression analysis, it is assumed that the observations come from a single class in a data cluster and that the simple functional relationship between the dependent and independent variables can be expressed using the general model Y = f(X) + ε. However, a data cluster may consist of a combination of observations that have different distributions derived from different clusters. When estimating a regression model for fuzzy inputs derived from different distributions, the model is termed a 'switching regression model'. Here l_i indicates the class number of each independent variable and p is the number of independent variables [J.R. Jang, ANFIS: Adaptive-network-based fuzzy inference system, IEEE Transactions on Systems, Man and Cybernetics 23 (3) (1993) 665-685; M. Michel, Fuzzy clustering and switching regression models using ambiguity and distance rejects, Fuzzy Sets and Systems 122 (2001) 363-399; E.Q. Richard, A new approach to estimating switching regressions, Journal of the American Statistical Association 67 (338) (1972) 306-310]. In this study, adaptive networks have been used to construct a model formed by gathering the obtained models. There are methods that suggest the class numbers of independent variables heuristically. Alternatively, a suggested validity criterion for fuzzy clustering is used here to define the optimal class number of independent variables. For the case in which the independent variables have an exponential distribution, an algorithm is suggested for defining the unknown parameters of the switching regression model and for obtaining the estimated values after an optimal membership function suitable for the exponential distribution has been obtained.

  10. Using Quantile and Asymmetric Least Squares Regression for Optimal Risk Adjustment.

    PubMed

    Lorenz, Normann

    2017-06-01

    In this paper, we analyze optimal risk adjustment for direct risk selection (DRS). Integrating insurers' activities for risk selection into a discrete choice model of individuals' health insurance choice shows that DRS has the structure of a contest. For the contest success function (csf) used in most of the contest literature (the Tullock-csf), optimal transfers for a risk adjustment scheme have to be determined by means of a restricted quantile regression, irrespective of whether insurers are primarily engaged in positive DRS (attracting low risks) or negative DRS (repelling high risks). This is at odds with the common practice of determining transfers by means of a least squares regression. However, this common practice can be rationalized for a new csf, but only if positive and negative DRSs are equally important; if they are not, optimal transfers have to be calculated by means of a restricted asymmetric least squares regression. Using data from German and Swiss health insurers, we find considerable differences between the three types of regressions. Optimal transfers therefore critically depend on which csf represents insurers' incentives for DRS and, if it is not the Tullock-csf, whether insurers are primarily engaged in positive or negative DRS. Copyright © 2016 John Wiley & Sons, Ltd.
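
    The contrast the abstract draws between quantile regression and asymmetric least squares can be seen in the intercept-only case, where the check loss recovers a quantile and the asymmetric squared loss recovers an expectile. A sketch on a made-up right-skewed "risk" sample (not the German/Swiss insurer data):

```python
import numpy as np

def quantile_loss_min(y, tau, grid):
    # Minimizing the asymmetric absolute (check) loss over a grid
    # recovers the tau-quantile.
    losses = [np.mean(np.where(y > c, tau * (y - c), (1 - tau) * (c - y)))
              for c in grid]
    return grid[int(np.argmin(losses))]

def expectile(y, tau, iters=100):
    # Asymmetric least squares: fixed-point iteration for the tau-expectile,
    # reweighting observations above/below the current estimate.
    m = y.mean()
    for _ in range(iters):
        w = np.where(y > m, tau, 1 - tau)
        m = np.sum(w * y) / np.sum(w)
    return m

rng = np.random.default_rng(1)
y = rng.exponential(size=2000)          # right-skewed synthetic risks
grid = np.linspace(0, 10, 2001)
q90 = quantile_loss_min(y, 0.9, grid)   # quantile-regression intercept
e90 = expectile(y, 0.9)                 # asymmetric-least-squares intercept
```

    The two loss functions give systematically different answers on skewed data, which is why the choice of csf (and hence loss) matters for the transfers.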

  11. Modeling Group Differences in OLS and Orthogonal Regression: Implications for Differential Validity Studies

    ERIC Educational Resources Information Center

    Kane, Michael T.; Mroch, Andrew A.

    2010-01-01

    In evaluating the relationship between two measures across different groups (i.e., in evaluating "differential validity") it is necessary to examine differences in correlation coefficients and in regression lines. Ordinary least squares (OLS) regression is the standard method for fitting lines to data, but its criterion for optimal fit…
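
    The OLS vs. orthogonal regression distinction above can be illustrated numerically: when both measures carry error, OLS (vertical residuals) attenuates the slope while orthogonal regression (perpendicular residuals, via total least squares) does not. A sketch with synthetic data, not the article's:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
latent = rng.normal(size=n)
# Errors in BOTH variables: the setting where the two criteria diverge.
x = latent + rng.normal(scale=0.5, size=n)
y = 2.0 * latent + rng.normal(scale=0.5, size=n)

# OLS slope minimizes vertical errors; attenuated toward zero
# because x is measured with error.
C = np.cov(x, y)
slope_ols = C[0, 1] / C[0, 0]

# Orthogonal (total least squares) slope via SVD of the centered data:
# the last right singular vector is the normal to the fitted line.
Z = np.column_stack([x - x.mean(), y - y.mean()])
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
v = Vt[-1]
slope_tls = -v[0] / v[1]
```

    With equal error variances in both variables, the TLS slope recovers the true value 2 while OLS shrinks toward roughly 1.6, so group comparisons of fitted lines can differ by method.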

  12. Crude oil price forecasting based on hybridizing wavelet multiple linear regression model, particle swarm optimization techniques, and principal component analysis.

    PubMed

    Shabri, Ani; Samsudin, Ruhaidah

    2014-01-01

    Crude oil prices play a significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelets and multiple linear regression (MLR) is proposed for crude oil price forecasting. In this model, the Mallat wavelet transform is first used to decompose an original time series into several subseries with different scales. Then, principal component analysis (PCA) is used to process the subseries data in the MLR for crude oil price forecasting. Particle swarm optimization (PSO) is used to select the optimal parameters of the MLR model. To assess the effectiveness of this model, the daily West Texas Intermediate (WTI) crude oil market has been used as the case study. The prediction performance of the WMLR model is compared with the MLR, ARIMA, and GARCH models using various statistical measures. The experimental results show that the proposed model outperforms the individual models in forecasting the crude oil price series.

  13. Crude Oil Price Forecasting Based on Hybridizing Wavelet Multiple Linear Regression Model, Particle Swarm Optimization Techniques, and Principal Component Analysis

    PubMed Central

    Shabri, Ani; Samsudin, Ruhaidah

    2014-01-01

    Crude oil prices play a significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelets and multiple linear regression (MLR) is proposed for crude oil price forecasting. In this model, the Mallat wavelet transform is first used to decompose an original time series into several subseries with different scales. Then, principal component analysis (PCA) is used to process the subseries data in the MLR for crude oil price forecasting. Particle swarm optimization (PSO) is used to select the optimal parameters of the MLR model. To assess the effectiveness of this model, the daily West Texas Intermediate (WTI) crude oil market has been used as the case study. The prediction performance of the WMLR model is compared with the MLR, ARIMA, and GARCH models using various statistical measures. The experimental results show that the proposed model outperforms the individual models in forecasting the crude oil price series. PMID:24895666
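
    The PSO step in the two records above tunes the MLR parameters by minimizing forecast error. A minimal PSO sketch on a toy linear-model objective (the wavelet/PCA stages are omitted; the data and settings are made up):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy stand-in for the MLR fitting step: PSO searches for the
# coefficient vector (two slopes + intercept) minimizing squared error.
X = rng.normal(size=(100, 2))
y = X @ np.array([1.5, -0.7]) + 0.3

def sse(beta):
    pred = X @ beta[:2] + beta[2]
    return np.sum((y - pred) ** 2)

def pso(obj, dim=3, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5):
    pos = rng.uniform(-3, 3, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array([obj(p) for p in pos])
    gbest = pbest[np.argmin(pbest_val)].copy()
    for _ in range(iters):
        r1 = rng.random(size=pos.shape)
        r2 = rng.random(size=pos.shape)
        # Standard velocity update: inertia + cognitive + social terms.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        vals = np.array([obj(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest

beta_hat = pso(sse)
```

    In the papers' setting the swarm would instead search over the hybrid model's tunable parameters, with out-of-sample error as the objective.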

  14. Modeling landslide susceptibility in data-scarce environments using optimized data mining and statistical methods

    NASA Astrophysics Data System (ADS)

    Lee, Jung-Hyun; Sameen, Maher Ibrahim; Pradhan, Biswajeet; Park, Hyuck-Jin

    2018-02-01

    This study evaluated the generalizability of five models to select a suitable approach for landslide susceptibility modeling in data-scarce environments. In total, 418 landslide inventories and 18 landslide conditioning factors were analyzed. Multicollinearity and factor optimization were investigated before data modeling, and two experiments were then conducted. In each experiment, five susceptibility maps were produced based on support vector machine (SVM), random forest (RF), weight-of-evidence (WoE), ridge regression (Rid_R), and robust regression (RR) models. The highest accuracy (AUC = 0.85) was achieved with the SVM model when either the full or the limited landslide inventory was used. Furthermore, the RF and WoE models were severely affected when fewer landslide samples were used for training. The other models were affected only slightly when the training samples were limited.

  15. Fast determination of total ginsenosides content in ginseng powder by near infrared reflectance spectroscopy

    NASA Astrophysics Data System (ADS)

    Chen, Hua-cai; Chen, Xing-dan; Lu, Yong-jun; Cao, Zhi-qiang

    2006-01-01

    Near infrared (NIR) reflectance spectroscopy was used to develop a fast determination method for total ginsenosides in ginseng (Panax ginseng) powder. The spectra were analyzed with the multiplicative signal correction (MSC) correlation method. The spectral regions best correlated with the total ginsenosides content were 1660-1880 nm and 2230-2380 nm. NIR calibration models for the ginsenosides were built with multiple linear regression (MLR), principal component regression (PCR), and partial least squares (PLS) regression, respectively. The results showed that the best calibration model was built with PLS combined with MSC over the optimal spectral region. The correlation coefficient and the root mean square error of calibration (RMSEC) of the best model were 0.98 and 0.15%, respectively. The optimal spectral region for calibration was 1204-2014 nm. The results suggest that using NIR to rapidly determine the total ginsenosides content in ginseng powder is feasible.
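
    Of the three calibration methods compared above, principal component regression is the simplest to sketch: project the centered spectra onto leading principal components, then regress the analyte content on the scores. The synthetic "spectra" below are made up, not NIR data:

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic stand-in for NIR spectra: 40 samples x 50 collinear wavelengths
# driven by two latent chemical factors.
n, w = 40, 50
latent = rng.normal(size=(n, 2))
spectra = latent @ rng.normal(size=(2, w)) + 0.01 * rng.normal(size=(n, w))
content = latent @ np.array([0.8, -0.3]) + 0.01 * rng.normal(size=n)

# PCR: SVD of the centered spectra, regress content on the first k scores.
Xc = spectra - spectra.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2                                   # number of retained components
scores = Xc @ Vt[:k].T
gamma, *_ = np.linalg.lstsq(scores, content - content.mean(), rcond=None)
y_pred = scores @ gamma + content.mean()

r = np.corrcoef(content, y_pred)[0, 1]          # calibration correlation
rmsec = np.sqrt(np.mean((content - y_pred) ** 2))  # calibration error
```

    PLS differs in that the components are chosen to maximize covariance with the response rather than spectral variance alone, which is why it edged out PCR in the study.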

  16. An efficient surrogate-based simulation-optimization method for calibrating a regional MODFLOW model

    NASA Astrophysics Data System (ADS)

    Chen, Mingjie; Izady, Azizallah; Abdalla, Osman A.

    2017-01-01

    The simulation-optimization method entails a large number of model simulations, which is computationally intensive or even prohibitive if each model simulation is extremely time-consuming. Statistical models have been examined as surrogates of the high-fidelity physical model during the simulation-optimization process to tackle this problem. Among them, Multivariate Adaptive Regression Splines (MARS), a non-parametric adaptive regression method, is superior in overcoming problems of high dimensionality and discontinuities in the data. Furthermore, the stability and accuracy of a MARS model can be improved by bootstrap aggregating, namely bagging. In this paper, the Bagging MARS (BMARS) method is integrated into a surrogate-based simulation-optimization framework to calibrate a three-dimensional MODFLOW model, which was developed to simulate groundwater flow in an arid hardrock-alluvium region in northwestern Oman. The physical MODFLOW model is surrogated by a statistical model developed using the BMARS algorithm. The surrogate model, which is fitted and validated using a training dataset generated by the physical model, can approximate solutions rapidly. An efficient Sobol' method is employed to calculate global sensitivities of the head outputs to the input parameters, which are used to analyze their spatiotemporal importance for the model outputs. Only sensitive parameters are included in the calibration process to further improve computational efficiency. The normalized root mean square error (NRMSE) between measured and simulated heads at observation wells is used as the objective function to be minimized during optimization. The reasonable history match between the simulated and observed heads demonstrates the feasibility of this highly efficient calibration framework.

  17. Prediction of silicon oxynitride plasma etching using a generalized regression neural network

    NASA Astrophysics Data System (ADS)

    Kim, Byungwhan; Lee, Byung Teak

    2005-08-01

    A prediction model of silicon oxynitride (SiON) etching was constructed using a neural network. Model prediction performance was improved by means of a genetic algorithm. The etching was conducted in a C2F6 inductively coupled plasma. A 2^4 full factorial experiment was employed to systematically characterize parameter effects on SiON etching. The process parameters include radio frequency source power, bias power, pressure, and C2F6 flow rate. To test the appropriateness of the trained model, 16 additional experiments were conducted. For comparison, four types of statistical regression models were built. Compared to the best regression model, the optimized neural network model demonstrated an improvement of about 52%. The optimized model was used to infer etch mechanisms as a function of the parameters. The pressure effect was noticeably large only when relatively large ion bombardment was maintained in the process chamber. Ion-bombardment-activated polymer deposition played the most significant role in interpreting the complex effects of bias power and C2F6 flow rate. Moreover, [CF2] was expected to be the predominant precursor to polymer deposition.
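
    A generalized regression neural network, as used in the record above, is in essence a Gaussian-kernel weighted average of the training targets (Nadaraya-Watson regression). A minimal sketch on a made-up smooth response surface, not the etch data:

```python
import numpy as np

def grnn_predict(X_train, y_train, X_query, sigma=0.3):
    # GRNN prediction: each query is a weighted average of training
    # targets, weighted by a Gaussian kernel on input distance.
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return (w @ y_train) / w.sum(axis=1)

rng = np.random.default_rng(5)
# Toy stand-in for factorial etch data: 2 normalized inputs, smooth response.
X = rng.uniform(-1, 1, size=(80, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2

Xq = np.array([[0.0, 0.0], [0.5, 0.5]])
pred = grnn_predict(X, y, Xq)
```

    The smoothing width sigma is the only free parameter; in the paper's setup a genetic algorithm would tune such hyperparameters against held-out error.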

  18. Harmony search optimization in dimensional accuracy of die sinking EDM process using SS316L stainless steel

    NASA Astrophysics Data System (ADS)

    Deris, A. M.; Zain, A. M.; Sallehuddin, R.; Sharif, S.

    2017-09-01

    Electric discharge machining (EDM) is one of the most widely used nonconventional machining processes for hard and difficult-to-machine materials. Due to the large number of machining parameters in EDM and its complicated structure, the selection of the optimal machining parameters remains a challenging task for researchers. This paper presents an experimental investigation and optimization of machining parameters for the EDM process on a stainless steel 316L workpiece using the Harmony Search (HS) algorithm. A mathematical model was developed based on a regression approach, with four input parameters (pulse on time, peak current, servo voltage, and servo speed) and dimensional accuracy (DA) as the output response. The optimal result of the HS approach was compared with the regression analysis, and it was found that HS gave a better result by yielding a lower DA value than the regression approach.
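
    Harmony search, used in the record above, maintains a memory of candidate solutions and improvises new ones from it. A minimal sketch minimizing a hypothetical quadratic DA response surface (standing in for the paper's fitted regression model, whose equation is not given here):

```python
import random

def harmony_search(obj, bounds, hms=10, hmcr=0.9, par=0.3,
                   iters=2000, bw=0.05, seed=7):
    rng = random.Random(seed)
    dim = len(bounds)
    memory = [[rng.uniform(*bounds[i]) for i in range(dim)]
              for _ in range(hms)]
    costs = [obj(h) for h in memory]
    for _ in range(iters):
        h = []
        for i in range(dim):
            if rng.random() < hmcr:                 # memory consideration
                x = rng.choice(memory)[i]
                if rng.random() < par:              # pitch adjustment
                    x += rng.uniform(-bw, bw)
                x = min(max(x, bounds[i][0]), bounds[i][1])
            else:                                   # random selection
                x = rng.uniform(*bounds[i])
            h.append(x)
        c = obj(h)
        worst = max(range(hms), key=costs.__getitem__)
        if c < costs[worst]:                        # replace worst harmony
            memory[worst], costs[worst] = h, c
    best = min(range(hms), key=costs.__getitem__)
    return memory[best], costs[best]

# Hypothetical DA surface over two normalized machining parameters.
da_model = lambda p: (p[0] - 0.4) ** 2 + (p[1] - 0.7) ** 2 + 0.01
best, best_da = harmony_search(da_model, bounds=[(0, 1), (0, 1)])
```

    In the paper the objective would be the four-parameter regression equation fitted to the EDM experiments rather than this toy surface.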

  19. Using exploratory regression to identify optimal driving factors for cellular automaton modeling of land use change.

    PubMed

    Feng, Yongjiu; Tong, Xiaohua

    2017-09-22

    Defining transition rules is an important issue in cellular automaton (CA)-based land use modeling because these models incorporate highly correlated driving factors. Multicollinearity among correlated driving factors may produce negative effects that must be eliminated from the modeling. Using exploratory regression under pre-defined criteria, we identified all possible combinations of factors from the candidate factors affecting land use change. Three combinations incorporating five driving factors that met the pre-defined criteria were assessed. With the selected combinations of factors, three logistic regression-based CA models were built to simulate dynamic land use change in Shanghai, China, from 2000 to 2015. For comparative purposes, a CA model with all candidate factors was also applied to simulate the land use change. Simulations using the three CA models with multicollinearity eliminated performed better (with accuracy improvements of about 3.6%) than the model incorporating all candidate factors. Our results showed that not all candidate factors are necessary for accurate CA modeling and that the simulations were not sensitive to changes in statistically non-significant driving factors. We conclude that exploratory regression is an effective method to search for the optimal combinations of driving factors, leading to better land use change models that are devoid of multicollinearity. We suggest identifying dominant factors and eliminating multicollinearity before building land change models, making it possible to simulate more realistic outcomes.
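
    The multicollinearity screening described above is commonly done with variance inflation factors: regress each candidate factor on the others and reject factors whose VIF exceeds a pre-defined criterion. A sketch with made-up factor data and an assumed cutoff of 7.5 (the paper's exact criteria are not restated here):

```python
import numpy as np

def vif(X):
    # VIF of each column: 1 / (1 - R^2) from regressing that column
    # on all the other columns (intercept included).
    out = []
    n, p = X.shape
    for j in range(p):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ beta
        r2 = 1 - resid.var() / X[:, j].var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(8)
n = 200
slope = rng.normal(size=n)
elevation = 3.0 * slope + rng.normal(scale=0.1, size=n)  # nearly collinear
rainfall = rng.normal(size=n)
factors = np.column_stack([slope, elevation, rainfall])

scores = vif(factors)
keep = scores < 7.5   # factors surviving the multicollinearity criterion
```

    Here the collinear slope/elevation pair is flagged while the independent rainfall factor survives, mirroring why only some factor combinations pass exploratory regression.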

  20. glmnetLRC f/k/a lrc package: Logistic Regression Classification

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2016-06-09

    Methods for fitting and predicting logistic regression classifiers (LRC) with an arbitrary loss function using elastic net or best subsets. This package adds model fitting features to the existing glmnet and bestglm R packages. It was created to perform the analyses described in Amidan BG, Orton DJ, LaMarche BL, et al. 2014. Signatures for Mass Spectrometry Data Quality. Journal of Proteome Research. 13(4), 2215-2222. It makes the model fitting available in the glmnet and bestglm packages more general by identifying optimal model parameters via cross-validation with a customizable loss function. It also identifies the optimal threshold for binary classification.
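
    The "optimal threshold with a customizable loss" idea can be sketched independently of the R package: given held-out class probabilities from a fitted classifier, scan thresholds and pick the one minimizing an asymmetric loss. The probabilities and costs below are made up:

```python
import numpy as np

rng = np.random.default_rng(9)

# Hypothetical held-out probabilities from a fitted logistic classifier
# (in the package, cross-validation folds would supply these).
y_true = rng.integers(0, 2, size=1000)
p_hat = np.clip(0.7 * y_true + 0.15 + 0.2 * rng.normal(size=1000), 0, 1)

def expected_loss(threshold, cost_fn=5.0, cost_fp=1.0):
    # Customizable loss: false negatives cost 5x false positives here.
    pred = (p_hat >= threshold).astype(int)
    fn = np.sum((pred == 0) & (y_true == 1))
    fp = np.sum((pred == 1) & (y_true == 0))
    return (cost_fn * fn + cost_fp * fp) / len(y_true)

thresholds = np.linspace(0.05, 0.95, 91)
losses = np.array([expected_loss(t) for t in thresholds])
best_t = thresholds[np.argmin(losses)]
```

    With false negatives weighted heavily, the optimal threshold lands below the default 0.5, which is exactly the kind of adjustment a custom loss buys over plain accuracy.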

  1. Modelling fourier regression for time series data- a case study: modelling inflation in foods sector in Indonesia

    NASA Astrophysics Data System (ADS)

    Prahutama, Alan; Suparti; Wahyu Utami, Tiani

    2018-03-01

    Regression analysis models the relationship between response variables and predictor variables. The parametric approach to the regression model is very strict in its assumptions, whereas the nonparametric regression model does not require an assumed model form. Time series data are observations of a variable recorded over time, so if time series data are to be modeled by regression, the response and predictor variables must first be determined. The response variable is the value at time t (y_t), while the predictor variables are the significant lags. In nonparametric regression modeling, one developing approach is the Fourier series approach. One advantage of the nonparametric regression approach using Fourier series is its ability to handle data with a trigonometric pattern. Modeling with a Fourier series requires the parameter K, and the number of parameters K can be determined using the Generalized Cross Validation method. In inflation modeling for the transportation, communication, and financial services sectors, the Fourier series yields an optimal K of 120 parameters with an R-square of 99%, whereas multiple linear regression yields an R-square of 90%.
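
    The K-selection step above can be sketched as follows: fit a truncated Fourier basis by least squares for each candidate K and pick the K minimizing Generalized Cross Validation. The basis form (intercept, trend, cosine terms) and the synthetic series are assumptions for illustration, not the paper's specification:

```python
import numpy as np

def fourier_design(t, K):
    # One common truncated Fourier-series regression basis:
    # intercept, linear trend, and K cosine terms.
    cols = [np.ones_like(t), t]
    cols += [np.cos(k * t) for k in range(1, K + 1)]
    return np.column_stack(cols)

def gcv(t, y, K):
    # GCV(K) = (RSS/n) / (1 - tr(H)/n)^2 with hat matrix H.
    X = fourier_design(t, K)
    H = X @ np.linalg.pinv(X)
    resid = y - H @ y
    n = len(y)
    return (np.sum(resid ** 2) / n) / (1 - np.trace(H) / n) ** 2

rng = np.random.default_rng(10)
t = np.linspace(0, 2 * np.pi, 200)
y = 1.0 + 0.5 * np.cos(2 * t) + 0.3 * np.cos(5 * t) \
    + 0.1 * rng.normal(size=200)

K_best = min(range(1, 13), key=lambda K: gcv(t, y, K))
```

    GCV keeps the basis large enough to capture the highest genuine frequency (here 5) while penalizing the effective number of parameters.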

  2. Cascade Optimization Strategy with Neural Network and Regression Approximations Demonstrated on a Preliminary Aircraft Engine Design

    NASA Technical Reports Server (NTRS)

    Hopkins, Dale A.; Patnaik, Surya N.

    2000-01-01

    A preliminary aircraft engine design methodology is being developed that utilizes a cascade optimization strategy together with neural network and regression approximation methods. The cascade strategy employs different optimization algorithms in a specified sequence. The neural network and regression methods are used to approximate solutions obtained from the NASA Engine Performance Program (NEPP), which implements engine thermodynamic cycle and performance analysis models. The new methodology is proving to be more robust and computationally efficient than the conventional optimization approach of using a single optimization algorithm with direct reanalysis. The methodology has been demonstrated on a preliminary design problem for a novel subsonic turbofan engine concept that incorporates a wave rotor as a cycle-topping device. Computations of maximum thrust were obtained for a specific design point in the engine mission profile. The results (depicted in the figure) show a significant improvement in the maximum thrust obtained using the new methodology in comparison to benchmark solutions obtained using NEPP in a manual design mode.

  3. Linear Multivariable Regression Models for Prediction of Eddy Dissipation Rate from Available Meteorological Data

    NASA Technical Reports Server (NTRS)

    MCKissick, Burnell T. (Technical Monitor); Plassman, Gerald E.; Mall, Gerald H.; Quagliano, John R.

    2005-01-01

    Linear multivariable regression models for predicting day and night Eddy Dissipation Rate (EDR) from available meteorological data sources are defined and validated. Model definition is based on a combination of 1997-2000 Dallas/Fort Worth (DFW) data sources, EDR from Aircraft Vortex Spacing System (AVOSS) deployment data, and regression variables primarily from corresponding Automated Surface Observation System (ASOS) data. Model validation is accomplished through EDR predictions on a similar combination of 1994-1995 Memphis (MEM) AVOSS and ASOS data. Model forms include an intercept plus a single term of fixed optimal power for each of these regression variables: 30-minute forward averaged mean and variance of near-surface wind speed and temperature, variance of wind direction, and a discrete cloud cover metric. Distinct day and night models, regressing on EDR and the natural log of EDR respectively, yield the best performance and avoid model discontinuity over day/night data boundaries.

  4. The analytical representation of viscoelastic material properties using optimization techniques

    NASA Technical Reports Server (NTRS)

    Hill, S. A.

    1993-01-01

    This report presents a technique to model viscoelastic material properties with a function of the form of the Prony series. Generally, the method employed to determine the function constants requires assuming values for the exponential constants of the function and then resolving the remaining constants through linear least-squares techniques. The technique presented here allows all the constants to be determined analytically through optimization techniques. This technique is employed in a computer program named PRONY and makes use of a commercially available optimization tool developed by VMA Engineering, Inc. The PRONY program was used to compare the technique against previously determined models for the solid rocket motor TP-H1148 propellant and V747-75 Viton fluoroelastomer. In both cases, the optimization technique generated functions that modeled the test data with at least an order of magnitude better correlation. This technique has demonstrated the capability to use small or large data sets, with uniformly or nonuniformly spaced data pairs. The reduction of experimental data to accurate mathematical models is a vital part of most scientific and engineering research. This technique of regression through optimization can be applied to other mathematical models that are difficult to fit to experimental data through traditional regression techniques.
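
    The "general" method the report improves upon is easy to sketch: assume the exponential time constants tau_i, then the remaining Prony constants fall out of linear least squares. The data below are made up; the report's contribution is to free the tau_i as well via optimization, which this linear step does not do:

```python
import numpy as np

# Prony-series form for a relaxation modulus:
#   E(t) = e0 + sum_i e_i * exp(-t / tau_i)
tau = np.array([0.1, 1.0, 10.0])        # ASSUMED relaxation times
t = np.logspace(-2, 2, 50)
# Synthetic "measured" modulus generated from known constants.
E_data = (2.0 + 5.0 * np.exp(-t / 0.1)
              + 3.0 * np.exp(-t / 1.0)
              + 1.0 * np.exp(-t / 10.0))

# With tau fixed, the model is linear in (e0, e_i): solve by least squares.
A = np.column_stack([np.ones_like(t)] + [np.exp(-t / ti) for ti in tau])
coeffs, *_ = np.linalg.lstsq(A, E_data, rcond=None)
e0, e = coeffs[0], coeffs[1:]
```

    When the assumed tau_i are wrong, this linear fit degrades, which is precisely the weakness the report's all-constants optimization addresses.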

  5. A Bayesian model averaging method for the derivation of reservoir operating rules

    NASA Astrophysics Data System (ADS)

    Zhang, Jingwen; Liu, Pan; Wang, Hao; Lei, Xiaohui; Zhou, Yanlai

    2015-09-01

    Because the intrinsic dynamics among optimal decision making, inflow processes, and reservoir characteristics are complex, the functional forms of reservoir operating rules are usually determined subjectively. As a result, the uncertainty of selecting the form and/or model involved in reservoir operating rules must be analyzed and evaluated. In this study, we analyze the uncertainty of reservoir operating rules using the Bayesian model averaging (BMA) model. Three popular operating rules, namely piecewise linear regression, surface fitting, and a least-squares support vector machine, are established based on the optimal deterministic reservoir operation. These individual models provide three-member decisions for the BMA combination, enabling the 90% release interval to be estimated by Markov Chain Monte Carlo simulation. A case study of China's Baise reservoir shows that: (1) the optimal deterministic reservoir operation, superior to any reservoir operating rules, is used as the sample set from which the rules are derived; (2) the least-squares support vector machine model is more effective than both piecewise linear regression and surface fitting; (3) BMA outperforms any individual model of operating rules based on the optimal trajectories. The proposed model can reduce the uncertainty of operating rules, which is of great potential benefit in evaluating the confidence intervals of decisions.
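
    The core BMA mechanics, weighting member models by how well they fit and averaging their predictions, can be sketched with three stand-in "operating rule" models. The models, data, and BIC-style weight approximation below are illustrative assumptions, not the paper's MCMC-based implementation:

```python
import numpy as np

rng = np.random.default_rng(11)

# Stand-in for optimal-operation samples: release as a function of state.
x = rng.uniform(0, 1, size=300)
truth = np.sin(2 * x) + 0.05 * rng.normal(size=300)

# Three-member ensemble (crude stand-ins for the paper's three rules).
preds = np.column_stack([
    0.9 * x + 0.25,      # linear-style rule
    np.sin(2 * x),       # well-specified rule
    np.full_like(x, 0.5) # poor constant rule
])

# BMA weights via a BIC-style Gaussian-likelihood approximation:
#   log w_k ∝ -0.5 * n * ln(SSE_k / n)
n = len(truth)
sse = ((preds - truth[:, None]) ** 2).sum(axis=0)
log_w = -0.5 * n * np.log(sse / n)
log_w -= log_w.max()                 # stabilize before exponentiating
weights = np.exp(log_w) / np.exp(log_w).sum()

bma_pred = preds @ weights           # weighted-average decision
```

    The weights concentrate on the best-fitting member, and the full posterior (estimated by MCMC in the paper) is what yields the 90% release interval.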

  6. Optimal weighted combinatorial forecasting model of QT dispersion of ECGs in Chinese adults.

    PubMed

    Wen, Zhang; Miao, Ge; Xinlei, Liu; Minyi, Cen

    2016-07-01

    This study aims to provide a scientific basis for unifying the reference value standard of QT dispersion of ECGs in Chinese adults. Three predictive models, a regression model, a principal component model, and an artificial neural network model, are combined to establish an optimal weighted combination model. The optimal weighted combination model and the single models are verified and compared. The optimal weighted combinatorial model can reduce the prediction risk of a single model and improve the prediction precision. The geographical distribution of the reference values of QT dispersion in Chinese adults was mapped precisely using kriging methods. When the geographical factors of a particular area are obtained, the reference value of QT dispersion of Chinese adults in that area can be estimated using the optimal weighted combinatorial model, and the reference value anywhere in China can be obtained from the geographical distribution map.

  7. Optimizing landslide susceptibility zonation: Effects of DEM spatial resolution and slope unit delineation on logistic regression models

    NASA Astrophysics Data System (ADS)

    Schlögel, R.; Marchesini, I.; Alvioli, M.; Reichenbach, P.; Rossi, M.; Malet, J.-P.

    2018-01-01

    We perform landslide susceptibility zonation with slope units using three digital elevation models (DEMs) of varying spatial resolution of the Ubaye Valley (South French Alps). In so doing, we applied a recently developed algorithm automating slope unit delineation, given a number of parameters, in order to optimize simultaneously the partitioning of the terrain and the performance of a logistic regression susceptibility model. The method allowed us to obtain optimal slope units for each available DEM spatial resolution. For each resolution, we studied the susceptibility model performance by analyzing in detail the relevance of the conditioning variables. The analysis is based on landslide morphology data, considering either the whole landslide or only the source area outline as inputs. The procedure allowed us to select the most useful information, in terms of DEM spatial resolution, thematic variables and landslide inventory, in order to obtain the most reliable slope unit-based landslide susceptibility assessment.

  8. Wildlife tradeoffs based on landscape models of habitat preference

    USGS Publications Warehouse

    Loehle, C.; Mitchell, M.S.; White, M.

    2000-01-01

    Wildlife tradeoffs based on landscape models of habitat preference were presented. Multiscale logistic regression models were used and based on these models a spatial optimization technique was utilized to generate optimal maps. The tradeoffs were analyzed by gradually increasing the weighting on a single species in the objective function over a series of simulations. Results indicated that efficiency of habitat management for species diversity could be maximized for small landscapes by incorporating spatial context.

  9. Retargeted Least Squares Regression Algorithm.

    PubMed

    Zhang, Xu-Yao; Wang, Lingfeng; Xiang, Shiming; Liu, Cheng-Lin

    2015-09-01

    This brief presents a framework of retargeted least squares regression (ReLSR) for multicategory classification. The core idea is to learn the regression targets directly from the data rather than using the traditional zero-one matrix as regression targets. The learned target matrix can guarantee a large-margin constraint for the requirement of correct classification of each data point. Compared with traditional least squares regression (LSR) and a recently proposed discriminative LSR model, ReLSR is much more accurate in measuring the classification error of the regression model. Furthermore, ReLSR is a single and compact model, hence there is no need to train two-class (binary) machines that are independent of each other. The convex optimization problem of ReLSR is solved elegantly and efficiently with an alternating procedure including regression and retargeting as substeps. The experimental evaluation over a range of databases demonstrates the validity of our method.

  10. Multi-objective optimization of combustion, performance and emission parameters in a jatropha biodiesel engine using Non-dominated sorting genetic algorithm-II

    NASA Astrophysics Data System (ADS)

    Dhingra, Sunil; Bhushan, Gian; Dubey, Kashyap Kumar

    2014-03-01

    The present work studies and identifies the different variables that affect the output parameters involved in a single cylinder direct injection compression ignition (CI) engine using jatropha biodiesel. Response surface methodology based on Central composite design (CCD) is used to design the experiments. Mathematical models are developed for combustion parameters (Brake specific fuel consumption (BSFC) and peak cylinder pressure (Pmax)), performance parameter brake thermal efficiency (BTE) and emission parameters (CO, NO x , unburnt HC and smoke) using regression techniques. These regression equations are further utilized for simultaneous optimization of combustion (BSFC, Pmax), performance (BTE) and emission (CO, NO x , HC, smoke) parameters. As the objective is to maximize BTE and minimize BSFC, Pmax, CO, NO x , HC, smoke, a multiobjective optimization problem is formulated. Nondominated sorting genetic algorithm-II is used in predicting the Pareto optimal sets of solution. Experiments are performed at suitable optimal solutions for predicting the combustion, performance and emission parameters to check the adequacy of the proposed model. The Pareto optimal sets of solution can be used as guidelines for the end users to select optimal combination of engine output and emission parameters depending upon their own requirements.

  11. Penalized nonparametric scalar-on-function regression via principal coordinates

    PubMed Central

    Reiss, Philip T.; Miller, David L.; Wu, Pei-Shien; Hua, Wen-Yu

    2016-01-01

    A number of classical approaches to nonparametric regression have recently been extended to the case of functional predictors. This paper introduces a new method of this type, which extends intermediate-rank penalized smoothing to scalar-on-function regression. In the proposed method, which we call principal coordinate ridge regression, one regresses the response on leading principal coordinates defined by a relevant distance among the functional predictors, while applying a ridge penalty. Our publicly available implementation, based on generalized additive modeling software, allows for fast optimal tuning parameter selection and for extensions to multiple functional predictors, exponential family-valued responses, and mixed-effects models. In an application to signature verification data, principal coordinate ridge regression, with dynamic time warping distance used to define the principal coordinates, is shown to outperform a functional generalized linear model. PMID:29217963

  12. Numerically accurate computational techniques for optimal estimator analyses of multi-parameter models

    NASA Astrophysics Data System (ADS)

    Berger, Lukas; Kleinheinz, Konstantin; Attili, Antonio; Bisetti, Fabrizio; Pitsch, Heinz; Mueller, Michael E.

    2018-05-01

    Modelling unclosed terms in partial differential equations typically involves two steps: First, a set of known quantities needs to be specified as input parameters for a model, and second, a specific functional form needs to be defined to model the unclosed terms by the input parameters. Both steps involve a certain modelling error, with the former known as the irreducible error and the latter referred to as the functional error. Typically, only the total modelling error, which is the sum of functional and irreducible error, is assessed, but the concept of the optimal estimator enables the separate analysis of the total and the irreducible errors, yielding a systematic modelling error decomposition. This work focuses on the computational techniques required in practice to evaluate irreducible errors. Typically, histograms are used for optimal estimator analyses, but this technique is found to add a non-negligible spurious contribution to the irreducible error if models with multiple input parameters are assessed. Thus, the error decomposition of an optimal estimator analysis becomes inaccurate, and misleading conclusions concerning modelling errors may be drawn. In this work, numerically accurate techniques for optimal estimator analyses are identified and a suitable evaluation of irreducible errors is presented. Four different computational techniques are considered: a histogram technique, artificial neural networks, multivariate adaptive regression splines, and an additive model based on a kernel method. For multiple input parameter models, only artificial neural networks and multivariate adaptive regression splines are found to yield satisfactorily accurate results. Beyond a certain number of input parameters, the assessment of models in an optimal estimator analysis even becomes practically infeasible if histograms are used.
The optimal estimator analysis in this paper is applied to modelling the filtered soot intermittency in large eddy simulations using a dataset of a direct numerical simulation of a non-premixed sooting turbulent flame.
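    In one input parameter, the histogram technique the paper starts from is straightforward: the irreducible error is the variance of the target quantity around the binned conditional mean E[q | phi]. The synthetic data and bin count below are illustrative; the paper's point is that exactly this binning approach degrades as the number of input parameters grows.

```python
import numpy as np

rng = np.random.default_rng(1)
phi = rng.uniform(0, 1, 20000)                 # model input parameter
q = np.sin(2 * np.pi * phi) + 0.1 * rng.standard_normal(phi.size)  # unclosed term

bins = np.linspace(0, 1, 41)
idx = np.digitize(phi, bins) - 1               # bin index of each sample
cond_mean = np.array([q[idx == b].mean() for b in range(len(bins) - 1)])

irreducible = np.mean((q - cond_mean[idx]) ** 2)   # error of the optimal estimator
total = np.mean((q - q.mean()) ** 2)               # error of a constant model
```

    Here `irreducible` recovers roughly the injected noise variance (0.01) plus a small binning bias; in several input dimensions the bins empty out and that bias becomes the spurious contribution the paper warns about.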

  13. Improvement of Storm Forecasts Using Gridded Bayesian Linear Regression for Northeast United States

    NASA Astrophysics Data System (ADS)

    Yang, J.; Astitha, M.; Schwartz, C. S.

    2017-12-01

    Bayesian linear regression (BLR) is a post-processing technique in which regression coefficients are derived and used to correct raw forecasts based on pairs of observation-model values. This study presents the development and application of a gridded Bayesian linear regression (GBLR) as a new post-processing technique to improve numerical weather prediction (NWP) of rain and wind storm forecasts over the northeastern United States. Ten controlled variables produced from ten ensemble members of the National Center for Atmospheric Research (NCAR) real-time prediction system are used for a GBLR model. In the GBLR framework, leave-one-storm-out cross-validation is utilized to study the performance of the post-processing technique in a database composed of 92 storms. To estimate the regression coefficients of the GBLR, optimization procedures that minimize the systematic and random error of predicted atmospheric variables (wind speed, precipitation, etc.) are implemented for the modeled-observed pairs of training storms. The regression coefficients calculated for meteorological stations of the National Weather Service are interpolated back to the model domain. An analysis of forecast improvements based on error reductions during the storms will demonstrate the value of the GBLR approach. This presentation will also illustrate how the variances are optimized for the training partition in GBLR and discuss the verification strategy for grid points where no observations are available. The new post-processing technique is successful in improving wind speed and precipitation storm forecasts using past event-based data and has the potential to be implemented in real time.
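    The basic correction step can be sketched with a conjugate Bayesian linear regression: given observation-forecast pairs from training storms, compute a posterior over the coefficients of obs ~ a + b * forecast, then correct a new raw forecast with the posterior mean. The synthetic wind-speed numbers and the prior/noise settings are illustrative assumptions, not the study's configuration.

```python
import numpy as np

rng = np.random.default_rng(2)
raw = rng.uniform(5, 25, 200)                       # raw forecast wind speed (m/s)
obs = 1.5 + 0.8 * raw + rng.standard_normal(200)    # paired station observations

X = np.column_stack([np.ones_like(raw), raw])
tau2, sigma2 = 100.0, 1.0                # prior variance, observation-error variance
A = X.T @ X / sigma2 + np.eye(2) / tau2  # posterior precision matrix
coef = np.linalg.solve(A, X.T @ obs / sigma2)       # posterior mean of (a, b)

corrected = coef[0] + coef[1] * 18.0     # correct a new raw forecast of 18 m/s
```

    In the gridded version, coefficients like `coef` are estimated at stations and then interpolated back to every model grid point.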

  14. Nonconvex Sparse Logistic Regression With Weakly Convex Regularization

    NASA Astrophysics Data System (ADS)

    Shen, Xinyue; Gu, Yuantao

    2018-06-01

    In this work we propose to fit a sparse logistic regression model by solving a weakly convex regularized nonconvex optimization problem. The idea is based on the finding that a weakly convex function, as an approximation of the $\ell_0$ pseudo norm, is able to induce sparsity better than the commonly used $\ell_1$ norm. For a class of weakly convex sparsity-inducing functions, we prove the nonconvexity of the corresponding sparse logistic regression problem, and study its local optimality conditions and the choice of the regularization parameter to exclude trivial solutions. Despite the nonconvexity, a method based on proximal gradient descent is used to solve the general weakly convex sparse logistic regression, and its convergence behavior is studied theoretically. Then the general framework is applied to a specific weakly convex function, and a necessary and sufficient local optimality condition is provided. The solution method is instantiated in this case as an iterative firm-shrinkage algorithm, and its effectiveness is demonstrated in numerical experiments on both randomly generated and real datasets.
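    An iterative firm-shrinkage algorithm of this general shape alternates a gradient step on the logistic loss with the firm-thresholding operator (zero below a lower threshold, identity above an upper one, linear in between). The thresholds, step size and synthetic data below are illustrative assumptions, not the paper's exact instantiation.

```python
import numpy as np

def firm(z, lam1, lam2):
    """Firm thresholding: 0 below lam1, identity above lam2, linear between."""
    a = np.abs(z)
    return np.where(a <= lam1, 0.0,
           np.where(a <= lam2, np.sign(z) * lam2 * (a - lam1) / (lam2 - lam1), z))

rng = np.random.default_rng(3)
n, p = 200, 50
X = rng.standard_normal((n, p))
w_true = np.zeros(p)
w_true[:3] = 2.0                                   # three active features
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-X @ w_true))).astype(float)

w = np.zeros(p)
step = 1.0 / (0.25 * np.linalg.norm(X, 2) ** 2)    # 1/L for the logistic loss
for _ in range(500):
    p_hat = 1 / (1 + np.exp(-X @ w))
    grad = X.T @ (p_hat - y)                       # logistic loss gradient
    w = firm(w - step * grad, lam1=15 * step, lam2=45 * step)
```

    Unlike soft thresholding, coefficients that clear the upper threshold are left unshrunk, which is the debiasing property the weakly convex penalty buys.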

  15. Q-learning residual analysis: application to the effectiveness of sequences of antipsychotic medications for patients with schizophrenia.

    PubMed

    Ertefaie, Ashkan; Shortreed, Susan; Chakraborty, Bibhas

    2016-06-15

    Q-learning is a regression-based approach that uses longitudinal data to construct dynamic treatment regimes, which are sequences of decision rules that use patient information to inform future treatment decisions. An optimal dynamic treatment regime is composed of a sequence of decision rules that indicate how to optimally individualize treatment using the patients' baseline and time-varying characteristics to optimize the final outcome. Constructing optimal dynamic regimes using Q-learning depends heavily on the assumption that regression models at each decision point are correctly specified; yet model checking in the context of Q-learning has been largely overlooked in the current literature. In this article, we show that residual plots obtained from standard Q-learning models may fail to adequately check the quality of the model fit. We present a modified Q-learning procedure that accommodates residual analyses using standard tools. We present simulation studies showing the advantage of the proposed modification over standard Q-learning. We illustrate this new Q-learning approach using data collected from a sequential multiple assignment randomized trial of patients with schizophrenia. Copyright © 2016 John Wiley & Sons, Ltd.
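    Standard two-stage Q-learning, the method being modified here, can be sketched as backward induction with two regressions: fit the stage-2 Q-function, replace each patient's outcome with the best achievable stage-2 value (the pseudo-outcome), and regress that on stage-1 history. The generative model and linear Q-functions below are illustrative, not the trial's data.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
s1 = rng.standard_normal(n)                 # baseline severity
a1 = rng.choice([-1, 1], n)                 # stage-1 treatment
s2 = 0.5 * s1 + 0.3 * a1 + 0.5 * rng.standard_normal(n)   # intermediate state
a2 = rng.choice([-1, 1], n)                 # stage-2 treatment
y = 1 + s2 + a2 * (0.7 - s2) + rng.standard_normal(n)     # final outcome

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Stage 2: Q2(s2, a2) = b0 + b1*s2 + a2*(b2 + b3*s2)
X2 = np.column_stack([np.ones(n), s2, a2, a2 * s2])
b = ols(X2, y)
q2_opt = b[0] + b[1] * s2 + np.abs(b[2] + b[3] * s2)    # max over a2 in {-1, +1}

# Stage 1: regress the pseudo-outcome on stage-1 history
X1 = np.column_stack([np.ones(n), s1, a1, a1 * s1])
c = ols(X1, q2_opt)

rule = np.sign(b[2] + b[3] * s2)            # estimated optimal stage-2 rule
```

    The paper's residual-analysis concern arises because `q2_opt` is a maximum of fitted values, so standard residual plots at stage 1 behave unusually.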

  16. Development of Optimal Stressor Scenarios for New Operational Energy Systems

    DTIC Science & Technology

    2017-12-01

    Analyzing the previous model using a design of experiments (DOE) and regression analysis provides critical information about the associated operational...from experimentation. The resulting system requirements can be used to revisit the design requirements and develop a more robust system. This process...stressor scenarios for acceptance testing.

  17. Aircraft Engine Thrust Estimator Design Based on GSA-LSSVM

    NASA Astrophysics Data System (ADS)

    Sheng, Hanlin; Zhang, Tianhong

    2017-08-01

    In view of the necessity of a highly precise and reliable thrust estimator to achieve direct thrust control of an aircraft engine, a GSA-LSSVM-based thrust estimator design solution is proposed, built on support vector regression (SVR), the least squares support vector machine (LSSVM) and a new optimization algorithm, the gravitational search algorithm (GSA), through integrated modelling and parameter optimization. The results show that, compared to the particle swarm optimization (PSO) algorithm, GSA finds the unknown optimization parameters better and yields a model with better prediction and generalization ability. The model can better predict aircraft engine thrust and thus fulfills the need for direct thrust control of aircraft engines.

  18. Application of mathematical model methods for optimization tasks in construction materials technology

    NASA Astrophysics Data System (ADS)

    Fomina, E. V.; Kozhukhova, N. I.; Sverguzova, S. V.; Fomin, A. E.

    2018-05-01

    In this paper, the regression equations method for design of construction material was studied. Regression and polynomial equations representing the correlation between the studied parameters were proposed. The logic design and software interface of the regression equations method focused on parameter optimization to provide the energy-saving effect at the stage of autoclave aerated concrete design considering the replacement of traditionally used quartz sand by coal mining by-product such as argillite. The mathematical model represented by a quadratic polynomial for the design of experiment was obtained using calculated and experimental data. This allowed the estimation of relationship between the composition and final properties of the aerated concrete. The surface response graphically presented in a nomogram allowed the estimation of concrete properties in response to variation of composition within the x-space. The optimal range of argillite content was obtained leading to a reduction of raw materials demand, development of target plastic strength of aerated concrete as well as a reduction of curing time before autoclave treatment. Generally, this method allows the design of autoclave aerated concrete with required performance without additional resource and time costs.

  19. Hybrid Support Vector Regression and Autoregressive Integrated Moving Average Models Improved by Particle Swarm Optimization for Property Crime Rates Forecasting with Economic Indicators

    PubMed Central

    Alwee, Razana; Hj Shamsuddin, Siti Mariyam; Sallehuddin, Roselina

    2013-01-01

    Crime forecasting is an important area in the field of criminology. Linear models, such as regression and econometric models, are commonly applied in crime forecasting. However, real crime data commonly consist of both linear and nonlinear components. A single model may not be sufficient to identify all the characteristics of the data. The purpose of this study is to introduce a hybrid model that combines support vector regression (SVR) and autoregressive integrated moving average (ARIMA) to be applied in crime rates forecasting. SVR is very robust with small training data and high-dimensional problems. Meanwhile, ARIMA has the ability to model several types of time series. However, the accuracy of the SVR model depends on the values of its parameters, while ARIMA is not robust when applied to small data sets. Therefore, to overcome this problem, particle swarm optimization is used to estimate the parameters of the SVR and ARIMA models. The proposed hybrid model is used to forecast the property crime rates of the United States based on economic indicators. The experimental results show that the proposed hybrid model is able to produce more accurate forecasting results as compared to the individual models. PMID:23766729

  20. Hybrid support vector regression and autoregressive integrated moving average models improved by particle swarm optimization for property crime rates forecasting with economic indicators.

    PubMed

    Alwee, Razana; Shamsuddin, Siti Mariyam Hj; Sallehuddin, Roselina

    2013-01-01

    Crime forecasting is an important area in the field of criminology. Linear models, such as regression and econometric models, are commonly applied in crime forecasting. However, real crime data commonly consist of both linear and nonlinear components. A single model may not be sufficient to identify all the characteristics of the data. The purpose of this study is to introduce a hybrid model that combines support vector regression (SVR) and autoregressive integrated moving average (ARIMA) to be applied in crime rates forecasting. SVR is very robust with small training data and high-dimensional problems. Meanwhile, ARIMA has the ability to model several types of time series. However, the accuracy of the SVR model depends on the values of its parameters, while ARIMA is not robust when applied to small data sets. Therefore, to overcome this problem, particle swarm optimization is used to estimate the parameters of the SVR and ARIMA models. The proposed hybrid model is used to forecast the property crime rates of the United States based on economic indicators. The experimental results show that the proposed hybrid model is able to produce more accurate forecasting results as compared to the individual models.
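    The key moving part in these two records is the particle swarm optimization loop that searches the model's parameter space. Below is a minimal PSO of that kind; the objective is a hypothetical stand-in for a cross-validation error surface over SVR-style parameters (its minimum sits at C=10, gamma=0.1), and the swarm size, inertia and acceleration constants are common defaults, not the paper's settings.

```python
import numpy as np

def cv_error(params):                      # hypothetical validation-error surface
    c, g = params
    return (np.log10(c) - 1.0) ** 2 + (np.log10(g) + 1.0) ** 2

rng = np.random.default_rng(5)
n_particles, dim = 20, 2
lo, hi = np.array([0.1, 0.001]), np.array([100.0, 1.0])   # (C, gamma) bounds
pos = rng.uniform(lo, hi, (n_particles, dim))
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_val = np.array([cv_error(p) for p in pos])
gbest = pbest[pbest_val.argmin()].copy()

w, c1, c2 = 0.7, 1.5, 1.5                  # inertia and acceleration constants
for _ in range(100):
    r1, r2 = rng.uniform(size=(2, n_particles, dim))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, lo, hi)
    vals = np.array([cv_error(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmin()].copy()
```

    In the hybrid model, `cv_error` would be replaced by the actual forecasting error of an SVR (or ARIMA) fit at the candidate parameters.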

  1. Mathematical Modelling of Optimization of Structures of Monolithic Coverings Based on Liquid Rubbers

    NASA Astrophysics Data System (ADS)

    Turgumbayeva, R. Kh; Abdikarimov, M. N.; Mussabekov, R.; Sartayev, D. T.

    2018-05-01

    The paper considers optimization of monolithic coating compositions using a computer and MPE methods. The goal of the paper was to construct a mathematical model of the complete factorial experiment taking into account its plan and conditions. Several regression equations were obtained. The dependence between the content of the components and the parameters of the rubber, as well as the quantity of rubber crumb, was considered. An optimal composition for manufacturing the material of monolithic coatings was recommended based on experimental data.

  2. Optimism on quality of life in Portuguese chronic patients: moderator/mediator?

    PubMed

    Vilhena, Estela; Pais-Ribeiro, José; Silva, Isabel; Pedro, Luísa; Meneses, Rute F; Cardoso, Helena; Silva, António Martins da; Mendonça, Denisa

    2014-07-01

    Optimism is an important variable that has consistently been shown to affect adjustment to quality of life in chronic diseases. This study aims to clarify whether dispositional optimism exerts a moderating or a mediating influence on the personality traits-quality of life association in Portuguese chronic patients. Multiple regression models were used to test the moderation and mediation effects of dispositional optimism on quality of life. A sample of 729 patients was recruited in Portugal's main hospitals and completed self-reported questionnaires assessing socio-demographic and clinical variables, personality, dispositional optimism, quality of life (QoL) and subjective well-being (SWB). The results of the regression models showed that dispositional optimism did not moderate the relationships between personality traits and quality of life. After controlling for gender, age, education level and severity of disease perception, the effects of personality traits on QoL and SWB were mediated by dispositional optimism (partially and completely), except for the links between neuroticism/openness to experience and physical health. Dispositional optimism is more likely to play a mediating, rather than a moderating, role in the personality traits-quality of life pathway in Portuguese chronic patients, suggesting that "the expectation that good things will happen" contributes to a better quality of life and subjective well-being.

  3. Wavelet regression model in forecasting crude oil price

    NASA Astrophysics Data System (ADS)

    Hamid, Mohd Helmie; Shabri, Ani

    2017-05-01

    This study presents the performance of the wavelet multiple linear regression (WMLR) technique in daily crude oil price forecasting. The WMLR model was developed by integrating the discrete wavelet transform (DWT) and the multiple linear regression (MLR) model. The original time series was decomposed into sub-series at different scales by wavelet theory. Correlation analysis was conducted to assist in the selection of optimal decomposed components as inputs for the WMLR model. The daily WTI crude oil price series was used to test the prediction capability of the proposed model. The forecasting performance of the WMLR model was also compared with regular multiple linear regression (MLR), autoregressive integrated moving average (ARIMA) and generalized autoregressive conditional heteroscedasticity (GARCH) models using root mean square error (RMSE) and mean absolute error (MAE). Based on the experimental results, the WMLR model performs better than the other forecasting techniques tested in this study.
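    The WMLR idea reduces to: split the series into wavelet components, then use lagged components as regressors for the next value. The sketch below uses a single-level (undecimated) Haar split and a synthetic "price" series as illustrative simplifications of the DWT-plus-MLR pipeline; it is not the paper's multi-level decomposition.

```python
import numpy as np

rng = np.random.default_rng(6)
t = np.arange(400)
price = 60 + 10 * np.sin(t / 30) + np.cumsum(0.3 * rng.standard_normal(400))

# One-level Haar split: smooth (approximation) + detail, with price = A + D.
A = np.convolve(price, [0.5, 0.5], mode="same")   # approximation sub-series
D = price - A                                     # detail sub-series

# MLR: regress price[t] on the lagged components A[t-1], D[t-1].
X = np.column_stack([np.ones(399), A[:-1], D[:-1]])
y = price[1:]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ beta
rmse = np.sqrt(np.mean((pred - y) ** 2))
```

    In the full method, several decomposition levels are computed and only the sub-series most correlated with the target are kept as regressors.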

  4. A Short-Term and High-Resolution System Load Forecasting Approach Using Support Vector Regression with Hybrid Parameters Optimization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jiang, Huaiguang

    This work proposes an approach for distribution system load forecasting, which aims to provide highly accurate short-term load forecasting with high resolution utilizing a support vector regression (SVR) based forecaster and a two-step hybrid parameters optimization method. Specifically, because the load profiles in distribution systems contain abrupt deviations, a data normalization is designed as the pretreatment for the collected historical load data. Then an SVR model is trained by the load data to forecast the future load. For better performance of SVR, a two-step hybrid optimization algorithm is proposed to determine the best parameters. In the first step of the hybrid optimization algorithm, a designed grid traverse algorithm (GTA) is used to narrow the parameter search space from a global to a local space. In the second step, based on the result of the GTA, particle swarm optimization (PSO) is used to determine the best parameters in the local parameter space. After the best parameters are determined, the SVR model is used to forecast the short-term load deviation in the distribution system.
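    The two-step idea can be sketched compactly: a coarse grid traverse narrows the parameter search space, then a local search refines within the best cell. The error surface below is a hypothetical stand-in for forecasting error as a function of SVR parameters (C, epsilon), and plain random search stands in for the PSO of the second step.

```python
import numpy as np

def err(c, e):                       # hypothetical validation-error surface
    return (np.log10(c) - 2) ** 2 + 10 * (e - 0.05) ** 2

# Step 1: grid traverse over the global, log-spaced parameter space.
Cs = np.logspace(-1, 4, 8)
Es = np.linspace(0.0, 0.5, 8)
grid = [(c, e) for c in Cs for e in Es]
c0, e0 = min(grid, key=lambda p: err(*p))

# Step 2: local refinement around the best grid cell (random search here
# stands in for PSO).
rng = np.random.default_rng(7)
best = (c0, e0)
for _ in range(500):
    c = c0 * 10 ** rng.uniform(-0.5, 0.5)      # within one grid step in log10(C)
    e = np.clip(e0 + rng.uniform(-0.05, 0.05), 0, None)
    if err(c, e) < err(*best):
        best = (c, e)
```

    The coarse step keeps the global search cheap; the local step recovers the precision a fine global grid would have cost.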

  5. Modeling the milling tool wear by using an evolutionary SVM-based model from milling runs experimental data

    NASA Astrophysics Data System (ADS)

    Nieto, Paulino José García; García-Gonzalo, Esperanza; Vilán, José Antonio Vilán; Robleda, Abraham Segade

    2015-12-01

    The main aim of this research work is to build a new practical hybrid regression model to predict the milling tool wear in a regular cut as well as entry cut and exit cut of a milling tool. The model was based on Particle Swarm Optimization (PSO) in combination with support vector machines (SVMs). This optimization mechanism involved kernel parameter setting in the SVM training procedure, which significantly influences the regression accuracy. Bearing this in mind, a PSO-SVM-based model, which is based on the statistical learning theory, was successfully used here to predict the milling tool flank wear (output variable) as a function of the following input variables: the time duration of experiment, depth of cut, feed, type of material, etc. To accomplish the objective of this study, the experimental dataset represents experiments from runs on a milling machine under various operating conditions. In this way, data sampled by three different types of sensors (acoustic emission sensor, vibration sensor and current sensor) were acquired at several positions. A second aim is to determine the factors with the greatest bearing on the milling tool flank wear with a view to proposing improvements to the milling machine. Firstly, this hybrid PSO-SVM-based regression model captures the main perception of statistical learning theory in order to obtain a good prediction of the dependence between the flank wear (output variable) and the input variables (time, depth of cut, feed, etc.). Indeed, regression with optimal hyperparameters was performed and a determination coefficient of 0.95 was obtained. The agreement of this model with experimental data confirmed its good performance. Secondly, the main advantages of this PSO-SVM-based model are its capacity to produce a simple, easy-to-interpret model, its ability to estimate the contributions of the input variables, and its computational efficiency. Finally, the main conclusions of this study are presented.

  6. Stochastic search, optimization and regression with energy applications

    NASA Astrophysics Data System (ADS)

    Hannah, Lauren A.

    Designing clean energy systems will be an important task over the next few decades. One of the major roadblocks is a lack of mathematical tools to economically evaluate those energy systems. However, solutions to these mathematical problems are also of interest to the operations research and statistical communities in general. This thesis studies three problems that are of interest to the energy community itself or provide support for solution methods: R&D portfolio optimization, nonparametric regression and stochastic search with an observable state variable. First, we consider the one-stage R&D portfolio optimization problem to avoid the sequential decision process associated with the multi-stage problem. The one-stage problem is still difficult because of a non-convex, combinatorial decision space and a non-convex objective function. We propose a heuristic solution method that uses marginal project values---which depend on the selected portfolio---to create a linear objective function. In conjunction with the 0-1 decision space, this new problem can be solved as a knapsack linear program. This method scales well to large decision spaces. We also propose an alternate, provably convergent algorithm that does not exploit problem structure. These methods are compared on a solid oxide fuel cell R&D portfolio problem. Next, we propose Dirichlet process mixtures of generalized linear models (DP-GLM), a new method of nonparametric regression that accommodates continuous and categorical inputs, and responses that can be modeled by a generalized linear model. We prove conditions for the asymptotic unbiasedness of the DP-GLM regression mean function estimate. We also give examples for when those conditions hold, including models for compactly supported continuous distributions and a model with continuous covariates and categorical response.
We empirically analyze the properties of the DP-GLM and why it provides better results than existing Dirichlet process mixture regression models. We evaluate DP-GLM on several data sets, comparing it to modern methods of nonparametric regression like CART, Bayesian trees and Gaussian processes. Compared to existing techniques, the DP-GLM provides a single model (and corresponding inference algorithms) that performs well in many regression settings. Finally, we study convex stochastic search problems where a noisy objective function value is observed after a decision is made. There are many stochastic search problems whose behavior depends on an exogenous state variable which affects the shape of the objective function. Currently, there is no general purpose algorithm to solve this class of problems. We use nonparametric density estimation to take observations from the joint state-outcome distribution and use them to infer the optimal decision for a given query state. We propose two solution methods that depend on the problem characteristics: function-based and gradient-based optimization. We examine two weighting schemes, kernel-based weights and Dirichlet process-based weights, for use with the solution methods. The weights and solution methods are tested on a synthetic multi-product newsvendor problem and the hour-ahead wind commitment problem. Our results show that in some cases Dirichlet process weights offer substantial benefits over kernel-based weights and more generally that nonparametric estimation methods provide good solutions to otherwise intractable problems.

  7. Membrane Introduction Mass Spectrometry Combined with an Orthogonal Partial-Least Squares Calibration Model for Mixture Analysis.

    PubMed

    Li, Min; Zhang, Lu; Yao, Xiaolong; Jiang, Xingyu

    2017-01-01

    The emerging membrane introduction mass spectrometry (MIMS) technique has been successfully used to detect benzene, toluene, ethyl benzene and xylene (BTEX), but overlapped spectra have unfortunately hindered its further application to the analysis of mixtures. Multivariate calibration, an efficient method to analyze mixtures, has been widely applied. In this paper, we compared univariate and multivariate analyses for quantification of the individual components of mixture samples. The results showed that the univariate analysis creates poor models with regression coefficients of 0.912, 0.867, 0.440 and 0.351 for BTEX, respectively. For multivariate analysis, a comparison to the partial-least squares (PLS) model shows that the orthogonal partial-least squares (OPLS) regression exhibits an optimal performance with regression coefficients of 0.995, 0.999, 0.980 and 0.976, favorable calibration parameters (RMSEC and RMSECV) and a favorable validation parameter (RMSEP). Furthermore, the OPLS exhibits a good recovery of 73.86-122.20% and a relative standard deviation (RSD) of repeatability of 1.14-4.87%. Thus, MIMS coupled with the OPLS regression provides an optimal approach for quantitative BTEX mixture analysis in monitoring and predicting water pollution.

  8. Estimating suspended sediment load with multivariate adaptive regression spline, teaching-learning based optimization, and artificial bee colony models.

    PubMed

    Yilmaz, Banu; Aras, Egemen; Nacar, Sinan; Kankal, Murat

    2018-05-23

    The functional life of a dam is often determined by the rate of sediment delivery to its reservoir. Therefore, an accurate estimate of the sediment load in rivers with dams is essential for designing and predicting a dam's useful lifespan. The most credible method is direct measurement of sediment input, but this can be very costly and it cannot always be implemented at all gauging stations. In this study, we tested various regression models to estimate suspended sediment load (SSL) at two gauging stations on the Çoruh River in Turkey, including artificial bee colony (ABC), teaching-learning-based optimization (TLBO), and multivariate adaptive regression splines (MARS). These models were also compared with one another and with classical regression analyses (CRA). Streamflow values and previously collected SSL data were used as model inputs, with predicted SSL data as output. Two different training and testing dataset configurations were used to reinforce the model accuracy. For the MARS method, the root mean square error value was found to range between 35% and 39% for the two test gauging stations, which was lower than the errors of the other models. Error values were even lower (7% to 15%) using another dataset. Our results indicate that simultaneous measurements of streamflow with SSL provide the most effective parameter for obtaining accurate predictive models and that MARS is the most accurate model for predicting SSL. Copyright © 2017 Elsevier B.V. All rights reserved.

  9. Finding Bayesian Optimal Designs for Nonlinear Models: A Semidefinite Programming-Based Approach.

    PubMed

    Duarte, Belmiro P M; Wong, Weng Kee

    2015-08-01

    This paper uses semidefinite programming (SDP) to construct Bayesian optimal design for nonlinear regression models. The setup here extends the formulation of the optimal designs problem as an SDP problem from linear to nonlinear models. Gaussian quadrature formulas (GQF) are used to compute the expectation in the Bayesian design criterion, such as D-, A- or E-optimality. As an illustrative example, we demonstrate the approach using the power-logistic model and compare results in the literature. Additionally, we investigate how the optimal design is impacted by different discretising schemes for the design space, different amounts of uncertainty in the parameter values, different choices of GQF and different prior distributions for the vector of model parameters, including normal priors with and without correlated components. Further applications to find Bayesian D-optimal designs with two regressors for a logistic model and a two-variable generalised linear model with a gamma distributed response are discussed, and some limitations of our approach are noted.

  10. Finding Bayesian Optimal Designs for Nonlinear Models: A Semidefinite Programming-Based Approach

    PubMed Central

    Duarte, Belmiro P. M.; Wong, Weng Kee

    2014-01-01

    Summary This paper uses semidefinite programming (SDP) to construct Bayesian optimal design for nonlinear regression models. The setup here extends the formulation of the optimal designs problem as an SDP problem from linear to nonlinear models. Gaussian quadrature formulas (GQF) are used to compute the expectation in the Bayesian design criterion, such as D-, A- or E-optimality. As an illustrative example, we demonstrate the approach using the power-logistic model and compare results in the literature. Additionally, we investigate how the optimal design is impacted by different discretising schemes for the design space, different amounts of uncertainty in the parameter values, different choices of GQF and different prior distributions for the vector of model parameters, including normal priors with and without correlated components. Further applications to find Bayesian D-optimal designs with two regressors for a logistic model and a two-variable generalised linear model with a gamma distributed response are discussed, and some limitations of our approach are noted. PMID:26512159

  11. Population heterogeneity in the salience of multiple risk factors for adolescent delinquency.

    PubMed

    Lanza, Stephanie T; Cooper, Brittany R; Bray, Bethany C

    2014-03-01

    To present mixture regression analysis as an alternative to more standard regression analysis for predicting adolescent delinquency. We demonstrate how mixture regression analysis allows for the identification of population subgroups defined by the salience of multiple risk factors. We identified population subgroups (i.e., latent classes) of individuals based on their coefficients in a regression model predicting adolescent delinquency from eight previously established risk indices drawn from the community, school, family, peer, and individual levels. The study included N = 37,763 10th-grade adolescents who participated in the Communities That Care Youth Survey. Standard, zero-inflated, and mixture Poisson and negative binomial regression models were considered. Standard and mixture negative binomial regression models were selected as optimal. The five-class regression model was interpreted based on the class-specific regression coefficients, indicating that risk factors had varying salience across classes of adolescents. Standard regression showed that all risk factors were significantly associated with delinquency. Mixture regression provided more nuanced information, suggesting a unique set of risk factors that were salient for different subgroups of adolescents. Implications for the design of subgroup-specific interventions are discussed. Copyright © 2014 Society for Adolescent Health and Medicine. Published by Elsevier Inc. All rights reserved.

  12. Incorporation of prior information on parameters into nonlinear regression groundwater flow models: 1. Theory

    USGS Publications Warehouse

    Cooley, Richard L.

    1982-01-01

    Prior information on the parameters of a groundwater flow model can be used to improve parameter estimates obtained from nonlinear regression solution of a modeling problem. Two scales of prior information can be available: (1) prior information having known reliability (that is, bias and random error structure) and (2) prior information consisting of best available estimates of unknown reliability. A regression method that incorporates the second scale of prior information assumes the prior information to be fixed for any particular analysis to produce improved, although biased, parameter estimates. Approximate optimization of two auxiliary parameters of the formulation is used to help minimize the bias, which is almost always much smaller than that resulting from standard ridge regression. It is shown that if both scales of prior information are available, then a combined regression analysis may be made.
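    The second scale of prior information described above amounts to a penalized regression that shrinks toward the prior estimates rather than toward zero, as standard ridge regression does. The sketch below replaces the groundwater flow model with a generic linear model, and the weight k is an illustrative fixed value (the paper instead tunes such auxiliary parameters approximately to limit bias):

```python
import numpy as np

rng = np.random.default_rng(8)
n, p = 30, 3
X = rng.standard_normal((n, p))
b_true = np.array([2.0, -1.0, 0.5])           # "true" model parameters
y = X @ b_true + 0.5 * rng.standard_normal(n)

b_prior = np.array([1.8, -0.8, 0.4])          # best available prior estimates
k = 20.0                                      # weight on the prior information
# Minimize ||y - X b||^2 + k ||b - b_prior||^2  (shrinks toward b_prior).
b_hat = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y + k * b_prior)
# Standard ridge with the same weight shrinks toward zero instead.
b_ridge = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)
```

    When the prior estimates are close to the truth, shrinking toward them biases the estimate far less than shrinking toward zero, which is the paper's contrast with standard ridge regression.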

  13. Optimizing pulsed Nd:YAG laser beam welding process parameters to attain maximum ultimate tensile strength for thin AISI316L sheet using response surface methodology and simulated annealing algorithm

    NASA Astrophysics Data System (ADS)

    Torabi, Amir; Kolahan, Farhad

    2018-07-01

    Pulsed laser welding is a powerful technique especially suitable for joining thin sheet metals. In this study, based on experimental data, pulsed laser welding of thin AISI316L austenitic stainless steel sheet has been modeled and optimized. The experimental data required for modeling are gathered as per a Central Composite Design matrix in Response Surface Methodology (RSM) with full replication of 31 runs. Ultimate Tensile Strength (UTS) is considered as the main quality measure in laser welding. Furthermore, the important process parameters including peak power, pulse duration, pulse frequency and welding speed are selected as input process parameters. The relation between input parameters and the output response is established via full quadratic response surface regression with a confidence level of 95%. The adequacy of the regression model was verified using Analysis of Variance (ANOVA) results. The main effects of each factor and its interaction effects with other factors were analyzed graphically in contour and surface plots. Next, to maximize joint UTS, the best combinations of parameter levels were specified using RSM. Moreover, the mathematical model is embedded into a Simulated Annealing (SA) optimization algorithm to determine the optimal values of process parameters. The results obtained by both SA and RSM optimization techniques are in good agreement. The optimal parameter settings of peak power 1800 W, pulse duration 4.5 ms, frequency 4.2 Hz and welding speed 0.5 mm/s would result in a welded joint with 96% of the base metal UTS. Computational results clearly demonstrate that the proposed modeling and optimization procedures perform quite well for the pulsed laser welding process.
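
    The RSM-plus-SA workflow can be sketched with SciPy's simulated-annealing variant: fit (or assume) a quadratic response surface for UTS, then anneal over the parameter bounds. The surface coefficients below are invented for illustration and are not the paper's fitted model:

```python
import numpy as np
from scipy.optimize import dual_annealing

# Hypothetical quadratic response surface for UTS (made-up coefficients):
# inputs are peak power [W] and pulse duration [ms].
def uts(params):
    power, duration = params
    return 400.0 - 1e-3 * (power - 1800.0) ** 2 - 20.0 * (duration - 4.5) ** 2

# Simulated annealing minimizes, so negate the response to maximize UTS.
result = dual_annealing(lambda p: -uts(p), bounds=[(1500, 2100), (3.0, 6.0)])
print(result.x)   # near the surface optimum (1800 W, 4.5 ms)
```

    With the full four-parameter model, only the bounds list and the surface function change.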

  14. Optimization to the Culture Conditions for Phellinus Production with Regression Analysis and Gene-Set Based Genetic Algorithm

    PubMed Central

    Li, Zhongwei; Xin, Yuezhen; Wang, Xun; Sun, Beibei; Xia, Shengyu; Li, Hui

    2016-01-01

    Phellinus is a genus of fungi known as a component of drugs intended to prevent cancer. With the purpose of finding optimized culture conditions for Phellinus production in the laboratory, many single-factor experiments were performed and a large amount of experimental data was generated. In this work, we use the data collected from these experiments for regression analysis, obtaining a mathematical model that predicts Phellinus production. Subsequently, a gene-set based genetic algorithm is developed to optimize the values of the parameters involved in the culture conditions, including inoculum size, pH value, initial liquid volume, temperature, seed age, fermentation time, and rotation speed. The optimized parameter values are in accordance with biological experimental results, which indicates that our method has good predictive power for culture condition optimization. PMID:27610365
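
    A plain real-valued genetic algorithm over culture-condition parameters can be sketched as follows; the yield model, the parameter ranges, and the GA settings are hypothetical stand-ins for the paper's gene-set based variant and fitted regression:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical regression model of yield vs. pH and temperature
# (made-up coefficients for illustration only; optimum at pH 6, 28 degrees).
def predicted_yield(pop):
    ph, temp = pop[:, 0], pop[:, 1]
    return 10.0 - (ph - 6.0) ** 2 - 0.1 * (temp - 28.0) ** 2

low, high = np.array([3.0, 20.0]), np.array([9.0, 36.0])
pop = rng.uniform(low, high, size=(60, 2))            # initial population

for _ in range(80):
    fit = predicted_yield(pop)
    # Tournament selection: keep the fitter of two random individuals.
    i, j = rng.integers(0, 60, 60), rng.integers(0, 60, 60)
    parents = np.where((fit[i] > fit[j])[:, None], pop[i], pop[j])
    # Arithmetic crossover between consecutive parents.
    alpha = rng.uniform(size=(60, 1))
    children = alpha * parents + (1 - alpha) * np.roll(parents, 1, axis=0)
    # Gaussian mutation, clipped to the feasible culture-condition ranges.
    children += rng.normal(scale=0.1, size=children.shape)
    pop = np.clip(children, low, high)

best = pop[np.argmax(predicted_yield(pop))]
print(best)   # approaches the model optimum near pH 6, 28 degrees
```

    Replacing the toy fitness with the fitted regression model turns this into the optimization step described in the abstract.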

  15. Computation of nonlinear least squares estimator and maximum likelihood using principles in matrix calculus

    NASA Astrophysics Data System (ADS)

    Mahaboob, B.; Venkateswarlu, B.; Sankar, J. Ravi; Balasiddamuni, P.

    2017-11-01

    This paper uses matrix calculus techniques to obtain the Nonlinear Least Squares Estimator (NLSE), the Maximum Likelihood Estimator (MLE) and a linear pseudo model for a nonlinear regression model. David Pollard and Peter Radchenko [1] explained analytic techniques to compute the NLSE. The present research paper, however, introduces an innovative method to compute the NLSE using principles in multivariate calculus. This study is concerned with very new optimization techniques used to compute the MLE and NLSE. Anh [2] derived the NLSE and MLE of a heteroscedastic regression model. Lemcoff [3] discussed a procedure to obtain a linear pseudo model for a nonlinear regression model. In this research article a new technique is developed to obtain the linear pseudo model for a nonlinear regression model using multivariate calculus. The linear pseudo model of Edmond Malinvaud [4] has been explained in a very different way in this paper. David Pollard et al. used empirical process techniques to study the asymptotics of the least-squares estimator (LSE) for fitting a nonlinear regression function in 2006. Jae Myung [13] provided a good conceptual introduction to maximum likelihood estimation in his work "Tutorial on maximum likelihood estimation".
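
    In practice the NLSE is usually computed numerically; a brief sketch using SciPy's trust-region least-squares solver on a hypothetical exponential model (not the paper's derivation):

```python
import numpy as np
from scipy.optimize import least_squares

# Synthetic data from the model y = a * exp(b * x) + noise.
rng = np.random.default_rng(2)
x = np.linspace(0.0, 2.0, 50)
y = 3.0 * np.exp(0.7 * x) + 0.05 * rng.normal(size=50)

def residuals(theta):
    a, b = theta
    return a * np.exp(b * x) - y

# Minimizes the sum of squared residuals from the starting point x0.
fit = least_squares(residuals, x0=[1.0, 0.0])
print(fit.x)   # close to the true parameters (3.0, 0.7)
```

    Under Gaussian noise the NLSE coincides with the MLE, which is why the same solver serves both estimators here.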

  16. Genetic Programming Transforms in Linear Regression Situations

    NASA Astrophysics Data System (ADS)

    Castillo, Flor; Kordon, Arthur; Villa, Carlos

    The chapter summarizes the use of Genetic Programming (GP) in Multiple Linear Regression (MLR) to address multicollinearity and Lack of Fit (LOF). The basis of the proposed method is applying appropriate input transforms (model respecification) that deal with these issues while preserving the information content of the original variables. The transforms are selected from symbolic regression models with an optimal trade-off between accuracy of prediction and expressional complexity, generated by multiobjective Pareto-front GP. The chapter includes a comparative study of the GP-generated transforms with Ridge Regression, a variant of ordinary Multiple Linear Regression, which has been a useful and commonly employed approach for reducing multicollinearity. The advantages of GP-generated model respecification are clearly defined and demonstrated. Some recommendations for transform selection are given as well. The application benefits of the proposed approach are illustrated with a real industrial application in one of the broadest empirical modeling areas in manufacturing - robust inferential sensors. The chapter contributes to increasing the awareness of the potential of GP in statistical model building by MLR.

  17. Mathematical model of optical signals emitted by electrical discharges occurring in electroinsulating oil

    NASA Astrophysics Data System (ADS)

    Kozioł, Michał

    2017-10-01

    The article presents a parametric model describing the registered spectral distributions of optical radiation emitted by electrical discharges generated in three systems: needle-needle, needle-plate, and a system for surface discharges. Generation of electrical discharges and registration of the emitted radiation were carried out in three different electrical insulating oils: factory-new, operated (used), and operated with air bubbles. For registration of optical spectra in the ultraviolet, visible and near-infrared ranges, a high-resolution spectrophotometer was used. The proposed mathematical model was developed in a regression procedure using a Gauss-sigmoid type function. The dependent variable was the intensity of the recorded optical signals. In order to estimate the optimal parameters of the model, an evolutionary algorithm was used. The optimization procedure was performed in the Matlab environment. The coefficient of determination R2 was applied to determine how well the theoretical parameters of the regression function match the empirical data.

  18. A Novel Multiobjective Evolutionary Algorithm Based on Regression Analysis

    PubMed Central

    Song, Zhiming; Wang, Maocai; Dai, Guangming; Vasile, Massimiliano

    2015-01-01

    As is known, the Pareto set of a continuous multiobjective optimization problem with m objective functions is a piecewise continuous (m − 1)-dimensional manifold in the decision space under some mild conditions. How to utilize this regularity to design multiobjective optimization algorithms has become a research focus. In this paper, based on this regularity, a model-based multiobjective evolutionary algorithm with regression analysis (MMEA-RA) is put forward to solve continuous multiobjective optimization problems with variable linkages. In the algorithm, the optimization problem is modelled as a promising area in the decision space by a probability distribution, the centroid of which is an (m − 1)-dimensional piecewise continuous manifold. The least squares method is used to construct such a model. A selection strategy based on nondominated sorting is used to choose the individuals for the next generation. The new algorithm is tested and compared with NSGA-II and RM-MEDA. The results show that MMEA-RA outperforms RM-MEDA and NSGA-II on the test instances with variable linkages. At the same time, MMEA-RA has higher efficiency than the other two algorithms. A few shortcomings of MMEA-RA have also been identified and discussed in this paper. PMID:25874246
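
    The nondominated sorting used in such selection strategies can be sketched directly; this is a generic O(n²) implementation for minimization problems, not MMEA-RA's code:

```python
import numpy as np

def nondominated_rank(F):
    """Rank objective vectors (rows of F, minimized): rank 0 = Pareto front."""
    n = F.shape[0]
    # dominates[i, j] is True when point i dominates point j.
    dominates = np.all(F[:, None, :] <= F[None, :, :], axis=2) & \
                np.any(F[:, None, :] <  F[None, :, :], axis=2)
    rank = np.full(n, -1)
    remaining = np.ones(n, dtype=bool)
    r = 0
    while remaining.any():
        # A point joins the current front if no remaining point dominates it.
        dominated = dominates[remaining][:, remaining].any(axis=0)
        front = np.flatnonzero(remaining)[~dominated]
        rank[front] = r
        remaining[front] = False
        r += 1
    return rank

F = np.array([[1.0, 4.0], [2.0, 2.0], [4.0, 1.0], [3.0, 3.0], [4.0, 4.0]])
print(nondominated_rank(F))   # [0 0 0 1 2]
```

    Selection then fills the next generation front by front, exactly as in NSGA-II-style algorithms.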

  19. Improving virtual screening predictive accuracy of Human kallikrein 5 inhibitors using machine learning models.

    PubMed

    Fang, Xingang; Bagui, Sikha; Bagui, Subhash

    2017-08-01

    The readily available high throughput screening (HTS) data from the PubChem database provides an opportunity for mining of small molecules in a variety of biological systems using machine learning techniques. From the thousands of available molecular descriptors developed to encode useful chemical information representing the characteristics of molecules, descriptor selection is an essential step in building an optimal quantitative structure-activity relationship (QSAR) model. For the development of a systematic descriptor selection strategy, we need an understanding of the relationship between: (i) the descriptor selection; (ii) the choice of the machine learning model; and (iii) the characteristics of the target bio-molecule. In this work, we employed the Signature descriptor to generate a dataset on the Human kallikrein 5 (hK 5) inhibition confirmatory assay data and compared multiple classification models including logistic regression, support vector machine, random forest and k-nearest neighbor. Under optimal conditions, the logistic regression model provided extremely high overall accuracy (98%) and precision (90%), with good sensitivity (65%) in the cross validation test. In testing the primary HTS screening data with more than 200K molecular structures, the logistic regression model exhibited the capability of eliminating more than 99.9% of the inactive structures. As part of our exploration of the descriptor-model-target relationship, the excellent predictive performance of the combination of the Signature descriptor and the logistic regression model on the assay data of the Human kallikrein 5 (hK 5) target suggested a feasible descriptor/model selection strategy for similar targets. Copyright © 2017 Elsevier Ltd. All rights reserved.
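
    A bare-bones version of the winning model class, logistic regression with the precision and sensitivity metrics quoted above, can be sketched on synthetic data (no PubChem descriptors involved):

```python
import numpy as np

def fit_logistic(X, y, n_iter=20):
    """Fit logistic regression by Newton's method (intercept included)."""
    X = np.column_stack([np.ones(len(y)), X])
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        W = p * (1 - p)                       # working weights
        w += np.linalg.solve(X.T @ (X * W[:, None]) + 1e-6 * np.eye(X.shape[1]),
                             X.T @ (y - p))
    return w

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 2))
y = (X @ np.array([2.0, -1.0]) + 0.5 * rng.normal(size=400) > 0).astype(float)

w = fit_logistic(X, y)
pred = (np.column_stack([np.ones(400), X]) @ w) > 0
tp = np.sum(pred & (y == 1))
fp = np.sum(pred & (y == 0))
fn = np.sum(~pred & (y == 1))
print("precision:", tp / (tp + fp), "sensitivity:", tp / (tp + fn))
```

    In a real QSAR setting the two feature columns would be replaced by the selected molecular descriptors.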

  20. Mammalian cell culture monitoring using in situ spectroscopy: Is your method really optimised?

    PubMed

    André, Silvère; Lagresle, Sylvain; Hannas, Zahia; Calvosa, Éric; Duponchel, Ludovic

    2017-03-01

    In recent years, as a result of the process analytical technology initiative of the US Food and Drug Administration, many different works have been carried out on direct and in situ monitoring of critical parameters for mammalian cell cultures by Raman spectroscopy and multivariate regression techniques. However, despite interesting results, it cannot be said that the proposed monitoring strategies are really optimized in ways that would reduce the errors of the regression models and thus the confidence limits of the predictions. Hence, the aim of this article is to optimize some critical steps of spectroscopic acquisition and data treatment in order to reach a higher level of accuracy and robustness of bioprocess monitoring. In this way, we first propose an original strategy to assess the most suitable Raman acquisition time for the processes involved. In the second part, we demonstrate the importance of interbatch variability for the accuracy of the predictive models, with a particular focus on optical probe adjustment. Finally, we propose a methodology for optimizing spectral variable selection in order to decrease the prediction errors of multivariate regressions. © 2017 American Institute of Chemical Engineers Biotechnol. Prog., 33:308-316, 2017. © 2017 American Institute of Chemical Engineers.

  1. Order Selection for General Expression of Nonlinear Autoregressive Model Based on Multivariate Stepwise Regression

    NASA Astrophysics Data System (ADS)

    Shi, Jinfei; Zhu, Songqing; Chen, Ruwen

    2017-12-01

    An order selection method based on multiple stepwise regressions is proposed for the General Expression of the Nonlinear Autoregressive (GNAR) model, which converts the model order problem into variable selection for a multiple linear regression equation. The partial autocorrelation function is adopted to define the linear terms in the GNAR model. The result is set as the initial model, and the nonlinear terms are then introduced gradually. Statistics are chosen to measure how the newly introduced and originally present variables improve the model characteristics, and these statistics are used to decide which model variables to retain or eliminate. The optimal model is thus obtained through data-fitting effect measurement or significance testing. Simulation and classic time-series data experiments show that the proposed method is simple, reliable and applicable to practical engineering.
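
    Stepwise variable selection of the kind described can be sketched with a simple AIC-guided greedy loop; this uses ordinary linear regression on synthetic data, not the GNAR statistics from the paper:

```python
import numpy as np

def ols_aic(X, y):
    """AIC of an OLS fit (Gaussian likelihood, up to an additive constant)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    n, k = X.shape
    return n * np.log(rss / n) + 2 * k

def forward_stepwise(candidates, y):
    """Greedily add the candidate column that lowers the AIC the most."""
    n = len(y)
    chosen, current = [], np.ones((n, 1))        # start from intercept only
    best_aic = ols_aic(current, y)
    improved = True
    while improved:
        improved = False
        for j in set(range(candidates.shape[1])) - set(chosen):
            trial = np.column_stack([current, candidates[:, j]])
            aic = ols_aic(trial, y)
            if aic < best_aic:
                best_aic, best_j, improved = aic, j, True
        if improved:
            chosen.append(best_j)
            current = np.column_stack([current, candidates[:, best_j]])
    return sorted(chosen)

rng = np.random.default_rng(4)
Z = rng.normal(size=(200, 5))
y = 1.5 * Z[:, 1] - 2.0 * Z[:, 3] + 0.3 * rng.normal(size=200)
print(forward_stepwise(Z, y))   # includes the truly active columns 1 and 3
```

    In the GNAR setting, the candidate columns would be the nonlinear autoregressive terms introduced after the PACF-defined linear part.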

  2. Neuro-fuzzy and neural network techniques for forecasting sea level in Darwin Harbor, Australia

    NASA Astrophysics Data System (ADS)

    Karimi, Sepideh; Kisi, Ozgur; Shiri, Jalal; Makarynskyy, Oleg

    2013-03-01

    Accurate predictions of sea level with different forecast horizons are important for coastal and ocean engineering applications, as well as in land drainage and reclamation studies. The methodology of tidal harmonic analysis, which is generally used for obtaining a mathematical description of the tides, is data demanding, requiring processing of tidal observations collected over several years. In the present study, hourly sea levels for Darwin Harbor, Australia were predicted using two different data-driven techniques, the adaptive neuro-fuzzy inference system (ANFIS) and the artificial neural network (ANN). The multiple linear regression (MLR) technique was used for selecting the optimal input combinations (lag times) of hourly sea level. The optimal input combination comprises the current sea level as well as five previous level values. For the ANFIS models, five different membership functions, namely triangular, trapezoidal, generalized bell, Gaussian and two-sided Gaussian, were tested and employed for predicting sea level for the next 1 h, 24 h, 48 h and 72 h. The ANN models were trained using three different algorithms, namely, Levenberg-Marquardt, conjugate gradient and gradient descent. Predictions of the optimal ANFIS and ANN models were compared with those of the optimal auto-regressive moving average (ARMA) models. The coefficient of determination, root mean square error and variance account statistics were used as comparison criteria. The obtained results indicated that the triangular membership function was optimal for predictions with the ANFIS models, while an adaptive learning rate and Levenberg-Marquardt were most suitable for training the ANN models. Consequently, ANFIS and ANN models gave similar forecasts and performed better than the ARMA models developed for the same purpose for all the prediction intervals.

  3. Modeling and process optimization of electrospinning of chitosan-collagen nanofiber by response surface methodology

    NASA Astrophysics Data System (ADS)

    Amiri, Nafise; Moradi, Ali; Abolghasem Sajjadi Tabasi, Sayyed; Movaffagh, Jebrail

    2018-04-01

    Chitosan-collagen composite nanofiber is of great interest to researchers in biomedical fields. Since electrospinning is the most popular method for nanofiber production, a comprehensive knowledge of the electrospinning process is beneficial. Modeling techniques are precious tools for managing variables in the electrospinning process, prior to the more time-consuming and expensive experimental techniques. In this study, a central composite design of response surface methodology (RSM) was employed to develop a statistical model as well as to define the optimum condition for fabrication of chitosan-collagen nanofiber with minimum diameter. The individual and the interaction effects of applied voltage (10–25 kV), flow rate (0.5–1.5 mL h‑1), and needle-to-collector distance (15–25 cm) on the fiber diameter were investigated. ATR-FTIR and a cell study were performed to evaluate the optimized nanofibers. According to the RSM, a two-factor interaction (2FI) model was the most suitable model. The high regression coefficient value (R2 ≥ 0.9666) of the fitted regression model and insignificant lack of fit (P = 0.0715) indicated that the model was highly adequate in predicting chitosan-collagen nanofiber diameter. The optimization process showed that a chitosan-collagen nanofiber diameter of 156.05 nm could be obtained at 9 kV, 0.2 mL h‑1, and 25 cm, which was confirmed by experiment (155.92 ± 18.95 nm). The ATR-FTIR and cell study confirmed the structure and biocompatibility of the optimized membrane. The presented model could assist researchers in fabricating chitosan-collagen electrospun scaffolds with a predictable fiber diameter, and the optimized chitosan-collagen nanofibrous mat could be a potential candidate for wound healing and tissue engineering.

  4. Combined genetic algorithm and multiple linear regression (GA-MLR) optimizer: Application to multi-exponential fluorescence decay surface.

    PubMed

    Fisz, Jacek J

    2006-12-07

    The optimization approach based on the genetic algorithm (GA) combined with the multiple linear regression (MLR) method is discussed. The GA-MLR optimizer is designed for nonlinear least-squares problems in which the model functions are linear combinations of nonlinear functions. GA optimizes the nonlinear parameters, and the linear parameters are calculated from MLR. GA-MLR is an intuitive optimization approach and it exploits all the advantages of the genetic algorithm technique. This optimization method results from an appropriate combination of two well-known optimization methods. The MLR method is embedded in the GA optimizer, and the linear and nonlinear model parameters are optimized in parallel. The MLR method is the only strictly mathematical "tool" involved in GA-MLR. The GA-MLR approach simplifies and accelerates the optimization process considerably because the linear parameters are not among the fitted ones. Its properties are exemplified by the analysis of the kinetic biexponential fluorescence decay surface corresponding to a two-excited-state interconversion process. A short discussion of the variable projection (VP) algorithm, designed for the same class of optimization problems, is presented. VP is a very advanced mathematical formalism that involves the methods of nonlinear functionals, the algebra of linear projectors, and the formalism of Fréchet derivatives and pseudo-inverses. Additional explanatory comments are added on the application of the recently introduced GA-NR optimizer to the simultaneous recovery of linear and weakly nonlinear parameters occurring in the same optimization problem together with nonlinear parameters. The GA-NR optimizer combines the GA method with the NR method, in which the minimum-value condition for the quadratic approximation to chi(2), obtained from the Taylor series expansion of chi(2), is recovered by means of the Newton-Raphson algorithm. The application of the GA-NR optimizer to model functions which are multi-linear combinations of nonlinear functions is indicated. The VP algorithm does not distinguish the weakly nonlinear parameters from the nonlinear ones, and it does not apply to model functions which are multi-linear combinations of nonlinear functions.
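
    The separable idea behind GA-MLR, an evolutionary search over only the nonlinear parameters with the linear amplitudes recovered by linear least squares at every candidate, can be sketched on a biexponential decay. SciPy's differential evolution stands in for the GA here; the data are synthetic:

```python
import numpy as np
from scipy.optimize import differential_evolution

# Synthetic biexponential fluorescence decay with small noise.
rng = np.random.default_rng(5)
t = np.linspace(0.0, 5.0, 100)
y = 2.0 * np.exp(-0.5 * t) + 1.0 * np.exp(-3.0 * t) + 0.01 * rng.normal(size=100)

def sse(rates):
    """Residual sum of squares: amplitudes come from the linear (MLR) step."""
    k1, k2 = rates
    B = np.column_stack([np.exp(-k1 * t), np.exp(-k2 * t)])   # linear basis
    amps, *_ = np.linalg.lstsq(B, y, rcond=None)              # MLR step
    return np.sum((y - B @ amps) ** 2)

# Evolutionary search over the two nonlinear decay rates only.
res = differential_evolution(sse, bounds=[(0.01, 10.0)] * 2, seed=0)
print(sorted(res.x))   # decay rates near (0.5, 3.0), in either order
```

    Halving the search space in this way is exactly what makes the separable approach faster than fitting all four parameters with the evolutionary algorithm.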

  5. Prospective Teachers' Future Time Perspective and Professional Plans about Teaching: The Mediating Role of Academic Optimism

    ERIC Educational Resources Information Center

    Eren, Altay

    2012-01-01

    This study aimed to examine the mediating role of prospective teachers' academic optimism in the relationship between their future time perspective and professional plans about teaching. A total of 396 prospective teachers voluntarily participated in the study. Correlation, regression, and structural equation modeling analyses were conducted in…

  6. Divergent estimation error in portfolio optimization and in linear regression

    NASA Astrophysics Data System (ADS)

    Kondor, I.; Varga-Haszonits, I.

    2008-08-01

    The problem of estimation error in portfolio optimization is discussed, in the limit where the portfolio size N and the sample size T go to infinity such that their ratio is fixed. The estimation error strongly depends on the ratio N/T and diverges for a critical value of this parameter. This divergence is the manifestation of an algorithmic phase transition; it is accompanied by a number of critical phenomena and displays universality. As the structure of a large number of multidimensional regression and modelling problems is very similar to portfolio optimization, the scope of the above observations extends far beyond finance, and covers a large number of problems in operations research, machine learning, bioinformatics, medical science, economics, and technology.

  7. Towards smart energy systems: application of kernel machine regression for medium term electricity load forecasting.

    PubMed

    Alamaniotis, Miltiadis; Bargiotas, Dimitrios; Tsoukalas, Lefteri H

    2016-01-01

    Integration of energy systems with information technologies has facilitated the realization of smart energy systems that utilize information to optimize system operation. To that end, accurate, ahead-of-time forecasting of load demand is crucial for optimizing energy system operation. In particular, load forecasting allows planning of system expansion, and decision making for enhancing system safety and reliability. In this paper, the application of two types of kernel machines for medium term load forecasting (MTLF) is presented and their performance is recorded based on a set of historical electricity load demand data. The two kernel machine models, namely Gaussian process regression (GPR) and relevance vector regression (RVR), are utilized for making predictions of future load demand. Both models, i.e., GPR and RVR, are equipped with a Gaussian kernel and are tested on daily predictions for a 30-day-ahead horizon taken from the New England Area. Furthermore, their performance is compared to the ARMA(2,2) model with respect to mean average percentage error and squared correlation coefficient. Results demonstrate the superiority of RVR over the other forecasting models in performing MTLF.
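
    A minimal GPR predictor with a Gaussian (RBF) kernel can be written in a few lines; the load curve below is synthetic, not the New England data:

```python
import numpy as np

def gpr_predict(X, y, Xs, length=1.0, sigma_n=0.1):
    """Posterior mean of GP regression with an RBF kernel and noise sigma_n."""
    def k(A, B):
        d2 = (A[:, None] - B[None, :]) ** 2
        return np.exp(-0.5 * d2 / length ** 2)
    K = k(X, X) + sigma_n ** 2 * np.eye(len(X))   # noisy training covariance
    return k(Xs, X) @ np.linalg.solve(K, y)       # mean of the GP posterior

# Hypothetical daily load curve (arbitrary units).
x = np.linspace(0.0, 10.0, 25)
load = np.sin(x) + 5.0
x_new = np.array([2.5, 7.5])
print(gpr_predict(x, load, x_new))   # close to sin(x_new) + 5
```

    A full implementation would also return the posterior variance, which is what gives GPR its built-in forecast uncertainty bands.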

  8. Regression Model Optimization for the Analysis of Experimental Data

    NASA Technical Reports Server (NTRS)

    Ulbrich, N.

    2009-01-01

    A candidate math model search algorithm was developed at Ames Research Center that determines a recommended math model for the multivariate regression analysis of experimental data. The search algorithm is applicable to classical regression analysis problems as well as wind tunnel strain gage balance calibration analysis applications. The algorithm compares the predictive capability of different regression models using the standard deviation of the PRESS residuals of the responses as a search metric. This search metric is minimized during the search. Singular value decomposition is used during the search to reject math models that lead to a singular solution of the regression analysis problem. Two threshold dependent constraints are also applied. The first constraint rejects math models with insignificant terms. The second constraint rejects math models with near-linear dependencies between terms. The math term hierarchy rule may also be applied as an optional constraint during or after the candidate math model search. The final term selection of the recommended math model depends on the regressor and response values of the data set, the user's function class combination choice, the user's constraint selections, and the result of the search metric minimization. A frequently used regression analysis example from the literature is used to illustrate the application of the search algorithm to experimental data.
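
    The PRESS residuals underlying the search metric need only a single fit: with the hat matrix H = X (X'X)^-1 X', the leave-one-out residual is e_i / (1 - h_ii). A sketch on hypothetical data:

```python
import numpy as np

def press_residuals(X, y):
    """Leave-one-out (PRESS) residuals from one OLS fit: e_i / (1 - h_ii)."""
    H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
    e = y - H @ y                           # ordinary residuals
    return e / (1.0 - np.diag(H))

rng = np.random.default_rng(6)
X = np.column_stack([np.ones(30), rng.normal(size=(30, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + 0.2 * rng.normal(size=30)

press = press_residuals(X, y)
print(np.std(press))   # the search metric: std deviation of PRESS residuals
```

    The identity reproduces brute-force leave-one-out refits exactly, so the search metric can be evaluated cheaply for every candidate math model.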

  9. Computational Intelligence Modeling of the Macromolecules Release from PLGA Microspheres-Focus on Feature Selection.

    PubMed

    Zawbaa, Hossam M; Szlȩk, Jakub; Grosan, Crina; Jachowicz, Renata; Mendyk, Aleksander

    2016-01-01

    Poly-lactide-co-glycolide (PLGA) is a copolymer of lactic and glycolic acid. Drug release from PLGA microspheres depends not only on polymer properties but also on drug type, particle size, morphology of microspheres, release conditions, etc. Selecting a subset of relevant properties for PLGA is a challenging machine learning task as there are over three hundred features to consider. In this work, we formulate the selection of critical attributes for PLGA as a multiobjective optimization problem with the aim of minimizing the error of predicting the dissolution profile while reducing the number of attributes selected. Four bio-inspired optimization algorithms: antlion optimization, binary version of antlion optimization, grey wolf optimization, and social spider optimization are used to select the optimal feature set for predicting the dissolution profile of PLGA. Besides these, LASSO algorithm is also used for comparisons. Selection of crucial variables is performed under the assumption that both predictability and model simplicity are of equal importance to the final result. During the feature selection process, a set of input variables is employed to find minimum generalization error across different predictive models and their settings/architectures. The methodology is evaluated using predictive modeling for which various tools are chosen, such as Cubist, random forests, artificial neural networks (monotonic MLP, deep learning MLP), multivariate adaptive regression splines, classification and regression tree, and hybrid systems of fuzzy logic and evolutionary computations (fugeR). The experimental results are compared with the results reported by Szlȩk. We obtain a normalized root mean square error (NRMSE) of 15.97% versus 15.4%, and the number of selected input features is smaller, nine versus eleven.

  10. Computational Intelligence Modeling of the Macromolecules Release from PLGA Microspheres—Focus on Feature Selection

    PubMed Central

    Zawbaa, Hossam M.; Szlȩk, Jakub; Grosan, Crina; Jachowicz, Renata; Mendyk, Aleksander

    2016-01-01

    Poly-lactide-co-glycolide (PLGA) is a copolymer of lactic and glycolic acid. Drug release from PLGA microspheres depends not only on polymer properties but also on drug type, particle size, morphology of microspheres, release conditions, etc. Selecting a subset of relevant properties for PLGA is a challenging machine learning task as there are over three hundred features to consider. In this work, we formulate the selection of critical attributes for PLGA as a multiobjective optimization problem with the aim of minimizing the error of predicting the dissolution profile while reducing the number of attributes selected. Four bio-inspired optimization algorithms: antlion optimization, binary version of antlion optimization, grey wolf optimization, and social spider optimization are used to select the optimal feature set for predicting the dissolution profile of PLGA. Besides these, LASSO algorithm is also used for comparisons. Selection of crucial variables is performed under the assumption that both predictability and model simplicity are of equal importance to the final result. During the feature selection process, a set of input variables is employed to find minimum generalization error across different predictive models and their settings/architectures. The methodology is evaluated using predictive modeling for which various tools are chosen, such as Cubist, random forests, artificial neural networks (monotonic MLP, deep learning MLP), multivariate adaptive regression splines, classification and regression tree, and hybrid systems of fuzzy logic and evolutionary computations (fugeR). The experimental results are compared with the results reported by Szlȩk. We obtain a normalized root mean square error (NRMSE) of 15.97% versus 15.4%, and the number of selected input features is smaller, nine versus eleven. PMID:27315205

  11. Time series modeling by a regression approach based on a latent process.

    PubMed

    Chamroukhi, Faicel; Samé, Allou; Govaert, Gérard; Aknin, Patrice

    2009-01-01

    Time series are used in many domains including finance, engineering, economics and bioinformatics, generally to represent the change of a measurement over time. Modeling techniques may then be used to give a synthetic representation of such data. A new approach for time series modeling is proposed in this paper. It consists of a regression model incorporating a discrete hidden logistic process that allows different polynomial regression models to be activated smoothly or abruptly. The model parameters are estimated by the maximum likelihood method performed by a dedicated Expectation Maximization (EM) algorithm. The M step of the EM algorithm uses a multi-class Iterative Reweighted Least-Squares (IRLS) algorithm to estimate the hidden process parameters. To evaluate the proposed approach, an experimental study on simulated data and real-world data was performed using two alternative approaches: a heteroskedastic piecewise regression model using a global optimization algorithm based on dynamic programming, and a Hidden Markov Regression Model whose parameters are estimated by the Baum-Welch algorithm. Finally, in the context of the remote monitoring of components of the French railway infrastructure, and more particularly the switch mechanism, the proposed approach has been applied to modeling and classifying time series representing the condition measurements acquired during switch operations.

  12. Rapid performance modeling and parameter regression of geodynamic models

    NASA Astrophysics Data System (ADS)

    Brown, J.; Duplyakin, D.

    2016-12-01

    Geodynamic models run in a parallel environment have many parameters with complicated effects on performance and scientifically-relevant functionals. Manually choosing an efficient machine configuration and mapping out the parameter space requires a great deal of expert knowledge and time-consuming experiments. We propose an active learning technique based on Gaussian process regression to automatically select experiments to map out the performance landscape with respect to scientific and machine parameters. The resulting performance model is then used to select optimal experiments for improving the accuracy of a reduced order model per unit of computational cost. We present the framework and evaluate its quality and capability using popular lithospheric dynamics models.

  13. Analytical Modelling and Optimization of the Temperature-Dependent Dynamic Mechanical Properties of Fused Deposition Fabricated Parts Made of PC-ABS.

    PubMed

    Mohamed, Omar Ahmed; Masood, Syed Hasan; Bhowmik, Jahar Lal

    2016-11-04

    Fused deposition modeling (FDM) additive manufacturing has been intensively used for many industrial applications due to its attractive advantages over traditional manufacturing processes. The process parameters used in FDM have significant influence on the part quality and its properties. This process produces the plastic part through complex mechanisms and it involves complex relationships between the manufacturing conditions and the quality of the processed part. In the present study, the influence of multi-level manufacturing parameters on the temperature-dependent dynamic mechanical properties of FDM processed parts was investigated using IV-optimality response surface methodology (RSM) and multilayer feed-forward neural networks (MFNNs). The process parameters considered for optimization and investigation are slice thickness, raster to raster air gap, deposition angle, part print direction, bead width, and number of perimeters. Storage compliance and loss compliance were considered as response variables. The effect of each process parameter was investigated using developed regression models and multiple regression analysis. The surface characteristics are studied using scanning electron microscope (SEM). Furthermore, performance of optimum conditions was determined and validated by conducting confirmation experiment. The comparison between the experimental values and the predicted values by IV-Optimal RSM and MFNN was conducted for each experimental run and results indicate that the MFNN provides better predictions than IV-Optimal RSM.

  14. Analytical Modelling and Optimization of the Temperature-Dependent Dynamic Mechanical Properties of Fused Deposition Fabricated Parts Made of PC-ABS

    PubMed Central

    Mohamed, Omar Ahmed; Masood, Syed Hasan; Bhowmik, Jahar Lal

    2016-01-01

    Fused deposition modeling (FDM) additive manufacturing has been intensively used for many industrial applications due to its attractive advantages over traditional manufacturing processes. The process parameters used in FDM have a significant influence on the part quality and its properties. This process produces the plastic part through complex mechanisms, and it involves complex relationships between the manufacturing conditions and the quality of the processed part. In the present study, the influence of multi-level manufacturing parameters on the temperature-dependent dynamic mechanical properties of FDM processed parts was investigated using IV-optimality response surface methodology (RSM) and multilayer feed-forward neural networks (MFNNs). The process parameters considered for optimization and investigation are slice thickness, raster to raster air gap, deposition angle, part print direction, bead width, and number of perimeters. Storage compliance and loss compliance were considered as response variables. The effect of each process parameter was investigated using the developed regression models and multiple regression analysis. The surface characteristics were studied using a scanning electron microscope (SEM). Furthermore, the performance of the optimum conditions was determined and validated by conducting a confirmation experiment. The comparison between the experimental values and the values predicted by IV-Optimal RSM and MFNN was conducted for each experimental run, and the results indicate that the MFNN provides better predictions than IV-Optimal RSM. PMID:28774019

  15. Response Surface Methodology Using a Fullest Balanced Model: A Re-Analysis of a Dataset in the Korean Journal for Food Science of Animal Resources.

    PubMed

    Rheem, Sungsue; Rheem, Insoo; Oh, Sejong

    2017-01-01

    Response surface methodology (RSM) is a useful set of statistical techniques for modeling and optimizing responses in research studies of food science. In the analysis of response surface data, a second-order polynomial regression model is usually used. However, sometimes we encounter situations where the fit of the second-order model is poor. If the model fitted to the data has a poor fit, including a lack of fit, the modeling and optimization results might not be accurate. In such a case, using a fullest balanced model, which has no lack of fit, can fix such a problem, enhancing the accuracy of the response surface modeling and optimization. This article presents how to develop and use such a model for better modeling and optimization of the response through an illustrative re-analysis of a dataset in Park et al. (2014) published in the Korean Journal for Food Science of Animal Resources.

  16. Optimization of process variables for decolorization of Disperse Yellow 211 by Bacillus subtilis using Box-Behnken design.

    PubMed

    Sharma, Praveen; Singh, Lakhvinder; Dilbaghi, Neeraj

    2009-05-30

    Decolorization of the textile azo dye Disperse Yellow 211 (DY 211) was carried out from simulated aqueous solution by the bacterial strain Bacillus subtilis. Response surface methodology (RSM), involving a Box-Behnken design matrix in the three most important operating variables (temperature, pH, and initial dye concentration), was successfully employed for the study and optimization of the decolorization process. A total of 17 experiments were conducted toward the construction of a quadratic model. According to analysis of variance (ANOVA) results, the proposed model can be used to navigate the design space. Under optimized conditions the bacterial strain was able to decolorize DY 211 up to 80%. The model indicated that an initial dye concentration of 100 mg l(-1), pH 7, and a temperature of 32.5 degrees C were optimal for maximum percent decolorization. The very high regression coefficient between the variables and the response (R(2)=0.9930) indicated excellent fitting of the experimental data by the polynomial regression model. The combination of the three variables predicted through RSM was confirmed through confirmatory experiments; hence the bacterial strain holds great potential for the treatment of colored textile effluents.
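    Designs such as the Box-Behnken matrix above exist to support a second-order (quadratic) polynomial fit in the coded factors. As a minimal illustration, with an invented 3^2 factorial design and invented coefficients rather than the paper's data, the fit and its R(2) can be computed by ordinary least squares:

```python
import numpy as np

def quad_features(X):
    # Full second-order model terms for two coded factors x1, x2.
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1 * x2, x1 ** 2, x2 ** 2])

def fit_rsm(X, y):
    # Ordinary least-squares fit of the quadratic response surface.
    A = quad_features(X)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
    return coef, r2

# Coded design points of a 3^2 factorial (illustrative, not the paper's runs):
X = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1],
              [-1, 0], [1, 0], [0, -1], [0, 1], [0, 0]], dtype=float)
true_coef = np.array([5.0, 1.0, -2.0, 0.5, -1.0, -1.5])  # known toy surface
y = quad_features(X) @ true_coef
coef, r2 = fit_rsm(X, y)
```

    With noiseless responses the nine design points identify all six coefficients exactly; with real data the same fit would be followed by the ANOVA checks the abstract describes.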

  17. A non-linear data mining parameter selection algorithm for continuous variables

    PubMed Central

    Razavi, Marianne; Brady, Sean

    2017-01-01

    In this article, we propose a new data mining algorithm, by which one can both capture the non-linearity in data and also find the best subset model. To produce an enhanced subset of the original variables, a preferred selection method should have the potential of adding a supplementary level of regression analysis that would capture complex relationships in the data via mathematical transformation of the predictors and exploration of synergistic effects of combined variables. The method that we present here has the potential to produce an optimal subset of variables, rendering the overall process of model selection more efficient. This algorithm introduces interpretable parameters by transforming the original inputs and also provides a faithful fit to the data. The core objective of this paper is to introduce a new estimation technique for the classical least squares regression framework. This new automatic variable transformation and model selection method could offer an optimal and stable model that minimizes the mean square error and variability, while combining all possible subset selection methodology with the inclusion of variable transformations and interactions. Moreover, this method controls multicollinearity, leading to an optimal set of explanatory variables. PMID:29131829
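    The article's actual estimation technique is not reproduced here, but the general idea of combining subset selection with predictor transformations can be illustrated by a toy exhaustive search. The candidate transformation set, the data, and every name below are invented for the example:

```python
import itertools
import numpy as np

TRANSFORMS = {"id": lambda v: v, "log": np.log, "sqrt": np.sqrt, "sq": np.square}

def ols_mse(A, y):
    # Mean squared error of an ordinary least-squares fit of y on columns A.
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ coef
    return float(r @ r) / len(y)

def best_transformed_subset(X, y):
    # Enumerate every subset of predictors and every per-column transform,
    # keeping the combination with the smallest in-sample MSE.
    n, p = X.shape
    best_desc, best_mse = None, float("inf")
    for size in range(1, p + 1):
        for cols in itertools.combinations(range(p), size):
            for names in itertools.product(TRANSFORMS, repeat=size):
                A = np.column_stack(
                    [np.ones(n)] + [TRANSFORMS[t](X[:, c]) for c, t in zip(cols, names)])
                mse = ols_mse(A, y)
                if mse < best_mse:
                    best_desc, best_mse = tuple(zip(cols, names)), mse
    return best_desc, best_mse

rng = np.random.default_rng(0)
X = rng.uniform(1.0, 10.0, size=(40, 2))
y = 3.0 * np.log(X[:, 0]) + 0.5        # the truth uses only log(x1)
sel, mse = best_transformed_subset(X, y)
```

    In practice one would score candidates on held-out data or with an information criterion rather than raw in-sample MSE, which only ever decreases as variables are added.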

  18. Intelligent Optimization of the Film-to-Fiber Ratio of a Degradable Braided Bicomponent Ureteral Stent

    PubMed Central

    Liu, Xiaoyan; Li, Feng; Ding, Yongsheng; Zou, Ting; Wang, Lu; Hao, Kuangrong

    2015-01-01

    A hierarchical support vector regression (SVR) model (HSVRM) was employed to correlate the compositions and mechanical properties of bicomponent stents composed of poly(lactic-co-glycolic acid) (PGLA) film and poly(glycolic acid) (PGA) fibers for urethral repair for the first time. PGLA film and PGA fibers could provide ureteral stents with good compressive and tensile properties, respectively. In bicomponent stents, high film content led to high stiffness, while high fiber content resulted in poor compressional properties. To simplify the procedures to optimize the ratio of PGLA film and PGA fiber in the stents, a hierarchical support vector regression model (HSVRM) and particle swarm optimization (PSO) algorithm were used to construct relationships between the film-to-fiber weight ratio and the measured compressional/tensile properties of the stents. The experimental data and simulated data fit well, proving that the HSVRM could closely reflect the relationship between the component ratio and performance properties of the ureteral stents. PMID:28793658

  19. Optimization by response surface methodology of lutein recovery from paprika leaves using accelerated solvent extraction.

    PubMed

    Kang, Jae-Hyun; Kim, Suna; Moon, BoKyung

    2016-08-15

    In this study, we used response surface methodology (RSM) to optimize the extraction conditions for recovering lutein from paprika leaves using accelerated solvent extraction (ASE). The lutein content was quantitatively analyzed using a UPLC equipped with a BEH C18 column. A central composite design (CCD) was employed for experimental design to obtain the optimized combination of extraction temperature (°C), static time (min), and solvent (EtOH, %). The experimental data obtained from a twenty-sample set were fitted to a second-order polynomial equation using multiple regression analysis. The adjusted coefficient of determination (R(2)) for the lutein extraction model was 0.9518, and the probability value (p=0.0000) demonstrated a high significance for the regression model. The optimum extraction conditions for lutein were temperature: 93.26°C, static time: 5 min, and solvent: 79.63% EtOH. Under these conditions, the predicted extraction yield of lutein was 232.60 μg/g. Copyright © 2016 Elsevier Ltd. All rights reserved.
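    Optimum settings such as those reported above are typically read off the fitted second-order model at its stationary point, where the gradient of the quadratic surface vanishes. A minimal sketch with invented coefficients (not the paper's fit): writing the model as y = b0 + b·x + x'Bx, the stationary point solves b + 2Bx = 0.

```python
import numpy as np

def rsm_stationary_point(b, B):
    # Stationary point of the fitted surface y = b0 + b.x + x'Bx,
    # obtained from the gradient condition b + 2Bx = 0.
    return np.linalg.solve(2.0 * B, -b)

# Illustrative fitted coefficients (not the paper's): B is negative
# definite here, so the stationary point is the predicted maximum.
b = np.array([4.0, 2.0])
B = np.array([[-2.0, 0.5], [0.5, -1.0]])
x_opt = rsm_stationary_point(b, B)
```

    Checking the eigenvalues of B tells whether the stationary point is a maximum, a minimum, or a saddle; in the saddle case a constrained search over the design region is used instead.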

  20. Empirical Likelihood in Nonignorable Covariate-Missing Data Problems.

    PubMed

    Xie, Yanmei; Zhang, Biao

    2017-04-20

    Missing covariate data occurs often in regression analysis, which frequently arises in the health and social sciences as well as in survey sampling. We study methods for the analysis of a nonignorable covariate-missing data problem in an assumed conditional mean function when some covariates are completely observed but other covariates are missing for some subjects. We adopt the semiparametric perspective of Bartlett et al. (Improving upon the efficiency of complete case analysis when covariates are MNAR. Biostatistics 2014;15:719-30) on regression analyses with nonignorable missing covariates, in which they have introduced the use of two working models, the working probability model of missingness and the working conditional score model. In this paper, we study an empirical likelihood approach to nonignorable covariate-missing data problems with the objective of effectively utilizing the two working models in the analysis of covariate-missing data. We propose a unified approach to constructing a system of unbiased estimating equations, where there are more equations than unknown parameters of interest. One useful feature of these unbiased estimating equations is that they naturally incorporate the incomplete data into the data analysis, making it possible to seek efficient estimation of the parameter of interest even when the working regression function is not specified to be the optimal regression function. We apply the general methodology of empirical likelihood to optimally combine these unbiased estimating equations. We propose three maximum empirical likelihood estimators of the underlying regression parameters and compare their efficiencies with other existing competitors. We present a simulation study to compare the finite-sample performance of various methods with respect to bias, efficiency, and robustness to model misspecification. The proposed empirical likelihood method is also illustrated by an analysis of a data set from the US National Health and Nutrition Examination Survey (NHANES).

  1. Optimized endogenous post-stratification in forest inventories

    Treesearch

    Paul L. Patterson

    2012-01-01

    An example of endogenous post-stratification is the use of remote sensing data with a sample of ground data to build a logistic regression model to predict the probability that a plot is forested and using the predicted probabilities to form categories for post-stratification. An optimized endogenous post-stratified estimator of the proportion of forest has been...
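    The estimator described above can be sketched in miniature: model-predicted probabilities stratify the plots, and the within-stratum means are combined with weights proportional to stratum size. The probabilities, cutoff, and observations below are invented for illustration:

```python
def post_stratified_mean(p_hat, y, cut=0.5):
    # Stratify plots by predicted forest probability (here just two strata,
    # below/above the cutoff), then combine the within-stratum means of the
    # ground observations, weighted by stratum size.
    groups = {}
    for p, yi in zip(p_hat, y):
        groups.setdefault(p >= cut, []).append(yi)
    n = len(y)
    return sum(len(g) / n * (sum(g) / len(g)) for g in groups.values())

p_hat = [0.1, 0.2, 0.8, 0.9, 0.55, 0.45]   # modelled forest probabilities
y     = [0,   0,   1,   1,   1,    0]      # observed forested / not
est = post_stratified_mean(p_hat, y)
```

    When ground truth is available for every unit, as in this toy, the estimate equals the plain mean; the benefit of post-stratification shows up in the variance when only a sample carries ground data.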

  2. Guidelines 13 and 14—Prediction uncertainty

    USGS Publications Warehouse

    Hill, Mary C.; Tiedeman, Claire

    2005-01-01

    An advantage of using optimization for model development and calibration is that optimization provides methods for evaluating and quantifying prediction uncertainty. Both deterministic and statistical methods can be used. Guideline 13 discusses using regression and post-audits, which we classify as deterministic methods. Guideline 14 discusses inferential statistics and Monte Carlo methods, which we classify as statistical methods.

  3. Central Plant Optimization for Waste Energy Reduction (CPOWER). ESTCP Cost and Performance Report

    DTIC Science & Technology

    2016-12-01

    in the regression models. The solar radiation data did not appear reliable in the weather dataset for the location, and hence it was not used. The...and additional factors (e.g., solar insolation) may be needed to obtain a better model. 2. Inputs to optimizer: During several periods of... Location: North Carolina. Energy consumption cost savings: $443,698.00 (analysis type: FEMP; PV of total savings: $215,698.00; base date: April 1).

  4. Robust learning for optimal treatment decision with NP-dimensionality

    PubMed Central

    Shi, Chengchun; Song, Rui; Lu, Wenbin

    2016-01-01

    In order to identify important variables that are involved in making optimal treatment decisions, Lu, Zhang and Zeng (2013) proposed a penalized least squares regression framework for a fixed number of predictors, which is robust against misspecification of the conditional mean model. Two problems arise: (i) in a world of explosively big data, effective methods are needed to handle ultra-high dimensional data sets, for example, where the dimension of the predictors is of non-polynomial (NP) order in the sample size; (ii) both the propensity score and conditional mean models need to be estimated from data under NP dimensionality. In this paper, we propose a robust procedure for estimating the optimal treatment regime under NP dimensionality. In both steps, penalized regressions are employed with a non-concave penalty function, where the conditional mean model of the response given the predictors may be misspecified. The asymptotic properties, such as weak oracle properties, selection consistency and oracle distributions, of the proposed estimators are investigated. In addition, we study the limiting distribution of the estimated value function for the obtained optimal treatment regime. The empirical performance of the proposed estimation method is evaluated by simulations and an application to a depression dataset from the STAR*D study. PMID:28781717
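    Non-concave penalties of the kind mentioned above, such as SCAD, act in the orthonormal-design case through a closed-form thresholding rule that zeroes small coefficients (variable selection) while leaving large ones nearly unbiased, unlike the lasso's constant shrinkage. A sketch of Fan and Li's SCAD threshold (the inputs are illustrative):

```python
def scad_threshold(z, lam, a=3.7):
    # SCAD thresholding rule applied to a marginal OLS estimate z with
    # penalty level lam: kill small coefficients, interpolate in the middle
    # band, and leave large coefficients untouched.
    s = (z > 0) - (z < 0)
    az = abs(z)
    if az <= 2 * lam:
        return s * max(az - lam, 0.0)      # soft-threshold region
    if az <= a * lam:
        return ((a - 1) * z - s * a * lam) / (a - 2)  # transition region
    return float(z)                         # no shrinkage for large z
```

    Applying the rule coordinate-wise to a preliminary fit shows the qualitative behaviour: estimates below the threshold vanish, mid-sized ones are partially shrunk, and large ones pass through unchanged.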

  5. An approach of traffic signal control based on NLRSQP algorithm

    NASA Astrophysics Data System (ADS)

    Zou, Yuan-Yang; Hu, Yu

    2017-11-01

    This paper presents a linear program model with linear complementarity constraints (LPLCC) to solve the traffic signal optimization problem. The objective function of the model is the minimization of the total weighted queue length at the end of each cycle. Then, a combination algorithm based on nonlinear least-squares regression and sequential quadratic programming (NLRSQP) is proposed, by which a local optimal solution can be obtained. Furthermore, four numerical experiments are presented to study how to set the initial solution of the algorithm so that a better local optimal solution can be obtained more quickly. In particular, the results of the numerical experiments show that the model is effective for different arrival rates and weight factors, and that the lower the initial solution, the better the local optimal solution that can be obtained.

  6. A Combination of Geographically Weighted Regression, Particle Swarm Optimization and Support Vector Machine for Landslide Susceptibility Mapping: A Case Study at Wanzhou in the Three Gorges Area, China

    PubMed Central

    Yu, Xianyu; Wang, Yi; Niu, Ruiqing; Hu, Youjian

    2016-01-01

    In this study, a novel coupling model for landslide susceptibility mapping is presented. In practice, environmental factors may have different impacts at a local scale in study areas. To provide better predictions, a geographically weighted regression (GWR) technique is firstly used in our method to segment study areas into a series of prediction regions with appropriate sizes. Meanwhile, a support vector machine (SVM) classifier is exploited in each prediction region for landslide susceptibility mapping. To further improve the prediction performance, the particle swarm optimization (PSO) algorithm is used in the prediction regions to obtain optimal parameters for the SVM classifier. To evaluate the prediction performance of our model, several SVM-based prediction models are utilized for comparison on a study area of the Wanzhou district in the Three Gorges Reservoir. Experimental results, based on three objective quantitative measures and visual qualitative evaluation, indicate that our model can achieve better prediction accuracies and is more effective for landslide susceptibility mapping. For instance, our model can achieve an overall prediction accuracy of 91.10%, which is 7.8%–19.1% higher than the traditional SVM-based models. In addition, the obtained landslide susceptibility map by our model can demonstrate an intensive correlation between the classified very high-susceptibility zone and the previously investigated landslides. PMID:27187430

  7. A Combination of Geographically Weighted Regression, Particle Swarm Optimization and Support Vector Machine for Landslide Susceptibility Mapping: A Case Study at Wanzhou in the Three Gorges Area, China.

    PubMed

    Yu, Xianyu; Wang, Yi; Niu, Ruiqing; Hu, Youjian

    2016-05-11

    In this study, a novel coupling model for landslide susceptibility mapping is presented. In practice, environmental factors may have different impacts at a local scale in study areas. To provide better predictions, a geographically weighted regression (GWR) technique is firstly used in our method to segment study areas into a series of prediction regions with appropriate sizes. Meanwhile, a support vector machine (SVM) classifier is exploited in each prediction region for landslide susceptibility mapping. To further improve the prediction performance, the particle swarm optimization (PSO) algorithm is used in the prediction regions to obtain optimal parameters for the SVM classifier. To evaluate the prediction performance of our model, several SVM-based prediction models are utilized for comparison on a study area of the Wanzhou district in the Three Gorges Reservoir. Experimental results, based on three objective quantitative measures and visual qualitative evaluation, indicate that our model can achieve better prediction accuracies and is more effective for landslide susceptibility mapping. For instance, our model can achieve an overall prediction accuracy of 91.10%, which is 7.8%-19.1% higher than the traditional SVM-based models. In addition, the obtained landslide susceptibility map by our model can demonstrate an intensive correlation between the classified very high-susceptibility zone and the previously investigated landslides.
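    The PSO step used here to tune the SVM can be sketched generically. The snippet below is a minimal 1-D particle swarm minimizer with a toy objective standing in for the cross-validation error surface; none of it is the authors' implementation, and the inertia and acceleration constants are conventional defaults:

```python
import random

def pso(f, lo, hi, n=20, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    # Minimal 1-D particle swarm optimizer: each particle is pulled toward
    # its personal best and the swarm's global best, with inertia w.
    rng = random.Random(seed)
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    vs = [0.0] * n
    pbest = xs[:]                 # personal best positions
    gbest = min(xs, key=f)        # global best position
    for _ in range(iters):
        for i in range(n):
            vs[i] = (w * vs[i]
                     + c1 * rng.random() * (pbest[i] - xs[i])
                     + c2 * rng.random() * (gbest - xs[i]))
            xs[i] = min(hi, max(lo, xs[i] + vs[i]))
            if f(xs[i]) < f(pbest[i]):
                pbest[i] = xs[i]
            if f(xs[i]) < f(gbest):
                gbest = xs[i]
    return gbest

# Toy stand-in for a hyperparameter-vs-error curve with its minimum at 3:
best = pso(lambda x: (x - 3.0) ** 2, 0.0, 10.0)
```

    In the paper's setting f(x) would be the SVM's cross-validation error at parameter value x, and the search would run over the multi-dimensional (C, gamma) space rather than an interval.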

  8. Method optimization for drug impurity profiling in supercritical fluid chromatography: Application to a pharmaceutical mixture.

    PubMed

    Muscat Galea, Charlene; Didion, David; Clicq, David; Mangelings, Debby; Vander Heyden, Yvan

    2017-12-01

    A supercritical fluid chromatographic method for the separation of a drug and its impurities has been developed and optimized applying an experimental design approach and chromatogram simulations. Stationary phase screening was followed by optimization of the modifier and injection solvent composition. A design-of-experiments (DoE) approach was then used to optimize column temperature, back-pressure and the gradient slope simultaneously. Regression models for the retention times and peak widths of all mixture components were built. The factor levels for different grid points were then used to predict the retention times and peak widths of the mixture components using the regression models, and the best separation for the worst separated peak pair in the experimental domain was identified. A plot of the minimal resolutions was used to help identify the factor levels leading to the highest resolution between consecutive peaks. The effects of the DoE factors were visualized in a way that is familiar to the analytical chemist, i.e. by simulating the resulting chromatogram. The mixture of an active ingredient and seven impurities was separated in less than eight minutes. The approach discussed in this paper demonstrates how SFC methods can be developed and optimized efficiently using simple concepts and tools. Copyright © 2017 Elsevier B.V. All rights reserved.

  9. A short-term and high-resolution distribution system load forecasting approach using support vector regression with hybrid parameters optimization

    DOE PAGES

    Jiang, Huaiguang; Zhang, Yingchen; Muljadi, Eduard; ...

    2016-01-01

    This paper proposes an approach for distribution system load forecasting, which aims to provide highly accurate short-term load forecasting with high resolution utilizing a support vector regression (SVR) based forecaster and a two-step hybrid parameters optimization method. Specifically, because the load profiles in distribution systems contain abrupt deviations, a data normalization is designed as the pretreatment for the collected historical load data. Then an SVR model is trained by the load data to forecast the future load. For better performance of SVR, a two-step hybrid optimization algorithm is proposed to determine the best parameters. In the first step of the hybrid optimization algorithm, a designed grid traverse algorithm (GTA) is used to narrow the parameters searching area from a global to local space. In the second step, based on the result of the GTA, particle swarm optimization (PSO) is used to determine the best parameters in the local parameter space. After the best parameters are determined, the SVR model is used to forecast the short-term load deviation in the distribution system. The performance of the proposed approach is compared to some classic methods in later sections of the paper.
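    The two-step idea, a coarse global scan followed by a local search inside the winning region, can be sketched as follows. This is a toy 1-D illustration: golden-section search stands in for the PSO second step, and the objective is an invented function, not an SVR error surface:

```python
def grid_then_refine(f, lo, hi, coarse=10, iters=60):
    # Step 1 (grid-traverse-like): a coarse scan narrows the search from
    # the global range [lo, hi] to the cell around the best grid point.
    step = (hi - lo) / coarse
    centers = [lo + (i + 0.5) * step for i in range(coarse)]
    c = min(centers, key=f)
    a, b = c - step, c + step
    # Step 2: golden-section search refines within that cell (a stand-in
    # for the local PSO used in the paper).
    g = (5.0 ** 0.5 - 1.0) / 2.0
    x1, x2 = b - g * (b - a), a + g * (b - a)
    for _ in range(iters):
        if f(x1) < f(x2):
            b, x2 = x2, x1
            x1 = b - g * (b - a)
        else:
            a, x1 = x1, x2
            x2 = a + g * (b - a)
    return (a + b) / 2.0

x_star = grid_then_refine(lambda x: (x - 2.2) ** 2, 0.0, 10.0)
```

    The coarse step trades a few cheap evaluations for a much smaller region in which the expensive local optimizer has to work, which is the point of the GTA/PSO split.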

  10. Ensemble of surrogates-based optimization for identifying an optimal surfactant-enhanced aquifer remediation strategy at heterogeneous DNAPL-contaminated sites

    NASA Astrophysics Data System (ADS)

    Jiang, Xue; Lu, Wenxi; Hou, Zeyu; Zhao, Haiqing; Na, Jin

    2015-11-01

    The purpose of this study was to identify an optimal surfactant-enhanced aquifer remediation (SEAR) strategy for aquifers contaminated by dense non-aqueous phase liquid (DNAPL) based on an ensemble of surrogates-based optimization technique. A saturated heterogeneous medium contaminated by nitrobenzene was selected as a case study. A new kind of surrogate-based SEAR optimization employing an ensemble surrogate (ES) model together with a genetic algorithm (GA) is presented. Four methods, namely radial basis function artificial neural network (RBFANN), kriging (KRG), support vector regression (SVR), and kernel extreme learning machines (KELM), were used to create four individual surrogate models, which were then compared. The comparison enabled us to select the two most accurate models (KELM and KRG) to establish an ES model of the SEAR simulation model, and the developed ES model as well as these four stand-alone surrogate models was compared. The results showed that the average relative error of the average nitrobenzene removal rates between the ES model and the simulation model for 20 test samples was 0.8%, which is a high approximation accuracy, and which indicates that the ES model provides more accurate predictions than the stand-alone surrogate models. Then, a nonlinear optimization model was formulated for the minimum cost, and the developed ES model was embedded into this optimization model as a constrained condition. In addition, GA was used to solve the optimization model to provide the optimal SEAR strategy. The developed ensemble surrogate-optimization approach was effective in seeking a cost-effective SEAR strategy for heterogeneous DNAPL-contaminated sites. This research is expected to enrich and develop the theoretical and technical implications for the analysis of remediation strategy optimization of DNAPL-contaminated aquifers.
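    The inverse-error weighting behind an ensemble surrogate can be sketched with two deliberately simple stand-in surrogates (a linear fit and a nearest-neighbour lookup rather than KRG and KELM); the training and validation data are invented:

```python
import numpy as np

def fit_linear(X, y):
    # Linear surrogate y ~ a + b*x (a stand-in for kriging).
    A = np.column_stack([np.ones(len(X)), X])
    c, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ c

def fit_nearest(X, y):
    # 1-nearest-neighbour surrogate (a stand-in for KELM).
    return lambda Z: np.array([y[np.argmin(np.abs(X - z))] for z in Z])

def ensemble(models, Xv, yv):
    # Weight each surrogate by its inverse validation RMSE and return the
    # weighted-average predictor.
    errs = [np.sqrt(np.mean((m(Xv) - yv) ** 2)) for m in models]
    w = np.array([1.0 / (e + 1e-12) for e in errs])
    w /= w.sum()
    return lambda Z: sum(wi * m(Z) for wi, m in zip(w, models))

Xt = np.linspace(0.0, 1.0, 11)       # training designs
yt = 2.0 * Xt + 1.0                  # toy "simulator" output
Xv = np.linspace(0.05, 0.95, 10)     # validation designs
yv = 2.0 * Xv + 1.0
es = ensemble([fit_linear(Xt, yt), fit_nearest(Xt, yt)], Xv, yv)
```

    Because the toy response is exactly linear, the linear surrogate receives nearly all the weight; on a genuinely nonlinear simulator the weights would split, which is the ensemble's advantage over any single surrogate.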

  11. Ensemble of Surrogates-based Optimization for Identifying an Optimal Surfactant-enhanced Aquifer Remediation Strategy at Heterogeneous DNAPL-contaminated Sites

    NASA Astrophysics Data System (ADS)

    Lu, W., Sr.; Xin, X.; Luo, J.; Jiang, X.; Zhang, Y.; Zhao, Y.; Chen, M.; Hou, Z.; Ouyang, Q.

    2015-12-01

    The purpose of this study was to identify an optimal surfactant-enhanced aquifer remediation (SEAR) strategy for aquifers contaminated by dense non-aqueous phase liquid (DNAPL) based on an ensemble of surrogates-based optimization technique. A saturated heterogeneous medium contaminated by nitrobenzene was selected as a case study. A new kind of surrogate-based SEAR optimization employing an ensemble surrogate (ES) model together with a genetic algorithm (GA) is presented. Four methods, namely radial basis function artificial neural network (RBFANN), kriging (KRG), support vector regression (SVR), and kernel extreme learning machines (KELM), were used to create four individual surrogate models, which were then compared. The comparison enabled us to select the two most accurate models (KELM and KRG) to establish an ES model of the SEAR simulation model, and the developed ES model as well as these four stand-alone surrogate models was compared. The results showed that the average relative error of the average nitrobenzene removal rates between the ES model and the simulation model for 20 test samples was 0.8%, which is a high approximation accuracy, and which indicates that the ES model provides more accurate predictions than the stand-alone surrogate models. Then, a nonlinear optimization model was formulated for the minimum cost, and the developed ES model was embedded into this optimization model as a constrained condition. In addition, GA was used to solve the optimization model to provide the optimal SEAR strategy. The developed ensemble surrogate-optimization approach was effective in seeking a cost-effective SEAR strategy for heterogeneous DNAPL-contaminated sites. This research is expected to enrich and develop the theoretical and technical implications for the analysis of remediation strategy optimization of DNAPL-contaminated aquifers.

  12. Statistical modelling for precision agriculture: A case study in optimal environmental schedules for Agaricus Bisporus production via variable domain functional regression.

    PubMed

    Panayi, Efstathios; Peters, Gareth W; Kyriakides, George

    2017-01-01

    Quantifying the effects of environmental factors over the duration of the growing process on Agaricus Bisporus (button mushroom) yields has been difficult, as common functional data analysis approaches require fixed length functional data. The data available from commercial growers, however, is of variable duration, due to commercial considerations. We employ a recently proposed regression technique termed Variable-Domain Functional Regression in order to be able to accommodate these irregular-length datasets. In this way, we are able to quantify the contribution of covariates such as temperature, humidity and water spraying volumes across the growing process, and for different lengths of growing processes. Our results indicate that optimal oxygen and temperature levels vary across the growing cycle and we propose environmental schedules for these covariates to optimise overall yields.

  13. Statistical modelling for precision agriculture: A case study in optimal environmental schedules for Agaricus Bisporus production via variable domain functional regression

    PubMed Central

    Panayi, Efstathios; Kyriakides, George

    2017-01-01

    Quantifying the effects of environmental factors over the duration of the growing process on Agaricus Bisporus (button mushroom) yields has been difficult, as common functional data analysis approaches require fixed length functional data. The data available from commercial growers, however, is of variable duration, due to commercial considerations. We employ a recently proposed regression technique termed Variable-Domain Functional Regression in order to be able to accommodate these irregular-length datasets. In this way, we are able to quantify the contribution of covariates such as temperature, humidity and water spraying volumes across the growing process, and for different lengths of growing processes. Our results indicate that optimal oxygen and temperature levels vary across the growing cycle and we propose environmental schedules for these covariates to optimise overall yields. PMID:28961254

  14. Parameterized data-driven fuzzy model based optimal control of a semi-batch reactor.

    PubMed

    Kamesh, Reddi; Rani, K Yamuna

    2016-09-01

    A parameterized data-driven fuzzy (PDDF) model structure is proposed for semi-batch processes, and its application for optimal control is illustrated. The orthonormally parameterized input trajectories, initial states and process parameters are the inputs to the model, which predicts the output trajectories in terms of Fourier coefficients. Fuzzy rules are formulated based on the signs of a linear data-driven model, while the defuzzification step incorporates a linear regression model to shift the domain from input to output domain. The fuzzy model is employed to formulate an optimal control problem for single rate as well as multi-rate systems. Simulation study on a multivariable semi-batch reactor system reveals that the proposed PDDF modeling approach is capable of capturing the nonlinear and time-varying behavior inherent in the semi-batch system fairly accurately, and the results of operating trajectory optimization using the proposed model are found to be comparable to the results obtained using the exact first principles model, and are also found to be comparable to or better than parameterized data-driven artificial neural network model based optimization results. Copyright © 2016 ISA. Published by Elsevier Ltd. All rights reserved.

  15. Estimation Methods for Non-Homogeneous Regression - Minimum CRPS vs Maximum Likelihood

    NASA Astrophysics Data System (ADS)

    Gebetsberger, Manuel; Messner, Jakob W.; Mayr, Georg J.; Zeileis, Achim

    2017-04-01

    Non-homogeneous regression models are widely used to statistically post-process numerical weather prediction models. Such regression models correct for errors in mean and variance and are capable of forecasting a full probability distribution. In order to estimate the corresponding regression coefficients, CRPS minimization has been performed in many meteorological post-processing studies over the last decade. In contrast to maximum likelihood estimation, CRPS minimization is claimed to yield more calibrated forecasts. Theoretically, both scoring rules used as an optimization score should be able to locate a similar and unknown optimum. Discrepancies might result from a wrong distributional assumption about the observed quantity. To address this theoretical concept, this study compares maximum likelihood and minimum CRPS estimation for different distributional assumptions. First, a synthetic case study shows that, for an appropriate distributional assumption, both estimation methods yield similar regression coefficients. The log-likelihood estimator is slightly more efficient. A real-world case study for surface temperature forecasts at different sites in Europe confirms these results but shows that surface temperature does not always follow the classical assumption of a Gaussian distribution. KEYWORDS: ensemble post-processing, maximum likelihood estimation, CRPS minimization, probabilistic temperature forecasting, distributional regression models
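    For a Gaussian predictive distribution the CRPS has a closed form (Gneiting and Raftery), which makes the comparison between minimum-CRPS and maximum likelihood estimation easy to reproduce on toy data. A sketch, where the sample and the fixed spread are invented:

```python
import math

def crps_gaussian(mu, sigma, y):
    # Closed-form CRPS of a N(mu, sigma^2) forecast for observation y:
    # sigma * [ z*(2*Phi(z) - 1) + 2*phi(z) - 1/sqrt(pi) ], z = (y - mu)/sigma.
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))

def mean_crps(mu, sigma, ys):
    # Average CRPS of a fixed forecast distribution over a sample.
    return sum(crps_gaussian(mu, sigma, y) for y in ys) / len(ys)

# Minimum-CRPS estimate of the location parameter on a toy sample; with a
# symmetric sample and a correct Gaussian assumption it coincides with the
# maximum likelihood estimate (the sample mean).
ys = [0.8, 0.9, 1.0, 1.1, 1.2]
mu_hat = min((i / 100 for i in range(201)), key=lambda m: mean_crps(m, 0.15, ys))
```

    Repeating the grid search on a skewed sample, or with a deliberately wrong distributional family, is where the two estimators begin to disagree, which is the effect the study investigates.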

  16. Optimization of operating parameters in polysilicon chemical vapor deposition reactor with response surface methodology

    NASA Astrophysics Data System (ADS)

    An, Li-sha; Liu, Chun-jiao; Liu, Ying-wen

    2018-05-01

    In the polysilicon chemical vapor deposition reactor, the operating parameters affect the polysilicon output in complex, coupled ways. Therefore, it is very important to address the coupling problem of multiple parameters and solve the optimization in a computationally efficient manner. Here, we adopted Response Surface Methodology (RSM) to analyze the complex coupling effects of different operating parameters on the silicon deposition rate (R) and further achieve effective optimization of the silicon CVD system. Based on finite numerical experiments, an accurate RSM regression model is obtained and applied to predict R for different operating parameters, including temperature (T), pressure (P), inlet velocity (V), and inlet mole fraction of H2 (M). The analysis of variance is conducted to describe the rationality of the regression model and examine the statistical significance of each factor. Consequently, the optimum combination of operating parameters for the silicon CVD reactor is: T = 1400 K, P = 3.82 atm, V = 3.41 m/s, M = 0.91. The validation tests and optimum solution show that the results are in good agreement with those from the CFD model and that the deviations of the predicted values are less than 4.19%. This work provides theoretical guidance for operating the polysilicon CVD process.

  17. An Update on Statistical Boosting in Biomedicine.

    PubMed

    Mayr, Andreas; Hofner, Benjamin; Waldmann, Elisabeth; Hepp, Tobias; Meyer, Sebastian; Gefeller, Olaf

    2017-01-01

    Statistical boosting algorithms have triggered a lot of research during the last decade. They combine a powerful machine learning approach with classical statistical modelling, offering various practical advantages like automated variable selection and implicit regularization of effect estimates. They are extremely flexible, as the underlying base-learners (regression functions defining the type of effect for the explanatory variables) can be combined with any kind of loss function (target function to be optimized, defining the type of regression setting). In this review article, we highlight the most recent methodological developments on statistical boosting regarding variable selection, functional regression, and advanced time-to-event modelling. Additionally, we provide a short overview on relevant applications of statistical boosting in biomedicine.
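    The core mechanism of statistical boosting, componentwise base-learner selection with a small step size, can be sketched in a few lines. This is a generic L2-boosting illustration on synthetic data (base-learner = simple linear regression on one covariate), not the authors' software.

```python
import numpy as np

rng = np.random.default_rng(42)
n, p = 200, 10
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0, 0.5] + [0.0] * 7)  # only 3 informative variables
y = X @ beta_true + rng.normal(0.0, 0.1, n)

# Component-wise L2 boosting: at each step, fit the current residuals with the
# single best base-learner and take a small step toward it. The small step size
# provides the implicit regularization; unselected variables stay at zero.
nu, n_steps = 0.1, 300
beta = np.zeros(p)
resid = y.copy()
for _ in range(n_steps):
    slopes = X.T @ resid / (X ** 2).sum(axis=0)          # per-covariate LS slope
    sse = ((resid[:, None] - X * slopes) ** 2).sum(axis=0)
    j = np.argmin(sse)                                   # best-fitting component
    beta[j] += nu * slopes[j]                            # shrunken update
    resid = y - X @ beta

print(np.round(beta, 2))  # weight concentrates on the 3 informative variables
```

    Stopping the loop early performs the automated variable selection mentioned in the abstract: variables never selected keep a coefficient of exactly zero.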

  18. Reduction of shock induced noise in imperfectly expanded supersonic jets using convex optimization

    NASA Astrophysics Data System (ADS)

    Adhikari, Sam

    2007-11-01

    Imperfectly expanded jets generate screech noise. The imbalance between the backpressure and the exit pressure of an imperfectly expanded jet produces shock cells and expansion or compression waves from the nozzle. The instability waves and the shock cells interact to generate the screech sound. The mathematical model consists of cylindrical-coordinate full Navier-Stokes equations and large-eddy-simulation turbulence modeling. Analytical and computational analysis of the three-dimensional helical effects provides a model that relates several parameters to shock cell patterns, screech frequency, and the distribution of shock generation locations. Convex optimization techniques minimize the shock cell patterns and the instability waves. The objective functions are (convex) quadratic and the constraint functions are affine. In the quadratic optimization programs, minimization of the quadratic functions over a set of polyhedra provides the optimal result. Various industry-standard methods such as regression analysis, distance between polyhedra, bounding variance, Markowitz optimization, and second-order cone programming are used for the quadratic optimization.
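    For a convex quadratic objective with affine equality constraints, the KKT conditions are linear, so the optimum is found by solving a single linear system. A minimal sketch with illustrative numbers (a 2-variable problem, not the jet model):

```python
import numpy as np

# Minimise f(x) = 0.5 x^T Q x - b^T x  subject to the affine constraint A x = c.
Q = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite => convex
b = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])               # constraint: x1 + x2 = 1
c = np.array([1.0])

# KKT system:  [Q  A^T] [x  ]   [b]
#              [A   0 ] [lam] = [c]
K = np.block([[Q, A.T], [A, np.zeros((1, 1))]])
sol = np.linalg.solve(K, np.concatenate([b, c]))
x_opt, lam = sol[:2], sol[2:]
print(x_opt, A @ x_opt)  # optimum satisfies the constraint exactly
```

    Inequality (polyhedral) constraints need an active-set or interior-point method on top of this, but each iteration of those methods solves a system of exactly this form.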

  19. Factors affecting species distribution predictions: A simulation modeling experiment

    Treesearch

    Gordon C. Reese; Kenneth R. Wilson; Jennifer A. Hoeting; Curtis H. Flather

    2005-01-01

    Geospatial species sample data (e.g., records with location information from natural history museums or annual surveys) are rarely collected optimally, yet are increasingly used for decisions concerning our biological heritage. Using computer simulations, we examined factors that could affect the performance of autologistic regression (ALR) models that predict species...

  20. Efficient least angle regression for identification of linear-in-the-parameters models

    PubMed Central

    Beach, Thomas H.; Rezgui, Yacine

    2017-01-01

    Least angle regression, as a promising model selection method, differentiates itself from conventional stepwise and stagewise methods in that it is neither too greedy nor too slow. It is closely related to L1 norm optimization, which achieves low prediction variance by sacrificing some model bias in order to enhance model generalization capability. In this paper, we propose an efficient least angle regression algorithm for model selection for a large class of linear-in-the-parameters models with the purpose of accelerating the model selection process. The entire algorithm works completely in a recursive manner, where the correlations between model terms and residuals, the evolving directions and other pertinent variables are derived explicitly and updated successively at every subset selection step. The model coefficients are only computed when the algorithm finishes. Direct matrix inversion is thereby avoided. A detailed computational complexity analysis indicates that the proposed algorithm possesses significant computational efficiency, compared with the original approach in which the well-known efficient Cholesky decomposition is involved in solving least angle regression. Three artificial and real-world examples are employed to demonstrate the effectiveness, efficiency and numerical stability of the proposed algorithm. PMID:28293140

  1. On approaches to analyze the sensitivity of simulated hydrologic fluxes to model parameters in the community land model

    DOE PAGES

    Bao, Jie; Hou, Zhangshuan; Huang, Maoyi; ...

    2015-12-04

    Here, effective sensitivity analysis approaches are needed to identify important parameters or factors and their uncertainties in complex Earth system models composed of multi-phase multi-component phenomena and multiple biogeophysical-biogeochemical processes. In this study, the impacts of 10 hydrologic parameters in the Community Land Model on simulations of runoff and latent heat flux are evaluated using data from a watershed. Different metrics, including residual statistics, the Nash-Sutcliffe coefficient, and log mean square error, are used as alternative measures of the deviations between the simulated and field observed values. Four sensitivity analysis (SA) approaches, including analysis of variance based on the generalized linear model, generalized cross validation based on the multivariate adaptive regression splines model, standardized regression coefficients based on a linear regression model, and analysis of variance based on support vector machine, are investigated. Results suggest that these approaches show consistent measurement of the impacts of major hydrologic parameters on response variables, but with differences in the relative contributions, particularly for the secondary parameters. The convergence behaviors of the SA with respect to the number of sampling points are also examined with different combinations of input parameter sets and output response variables and their alternative metrics. This study helps identify the optimal SA approach, provides guidance for the calibration of the Community Land Model parameters to improve the model simulations of land surface fluxes, and approximates the magnitudes to be adjusted in the parameter values during parametric model optimization.
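    Of the four SA approaches listed, standardized regression coefficients (SRC) are the simplest to sketch: regress the standardized response on standardized inputs and rank parameters by |SRC|. The three "parameters" below are synthetic with known influence, not CLM output.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
# Three hypothetical model parameters with different scales and different
# true influence on the response (stand-ins for hydrologic parameters).
params = rng.normal(size=(n, 3)) * np.array([1.0, 2.0, 0.5])
y = (3.0 * params[:, 0] + 1.0 * params[:, 1] + 0.2 * params[:, 2]
     + rng.normal(0.0, 0.5, n))

# Standardize inputs and response, then fit a linear model; the coefficient
# magnitudes are scale-free measures of parameter importance.
Z = (params - params.mean(axis=0)) / params.std(axis=0)
yz = (y - y.mean()) / y.std()
src, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), Z]), yz, rcond=None)
ranking = np.argsort(-np.abs(src[1:]))
print(np.round(src[1:], 2), ranking)
```

    SRC-based SA is only reliable when the response is close to linear in the parameters, which is why the study compares it against variance-based and spline-based alternatives.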

  2. Optimization of Fish Protection System to Increase Technosphere Safety

    NASA Astrophysics Data System (ADS)

    Khetsuriani, E. D.; Fesenko, L. N.; Larin, D. S.

    2017-11-01

    The article is concerned with field study data. Drawing upon prior information and considering structural features of fish protection devices, we decided to conduct experimental research while varying three parameters: process pressure PCT, stream velocity Vp, and washer nozzle inclination angle αc. The variability intervals of the examined factors are shown in Table 1. The conicity angle was held constant. The box design B3 was chosen as a baseline because its statistical characteristics are close to those of D-optimal designs. The number of device rotations and its fish fry protection efficiency were taken as the output functions of optimization. The numerical values of the regression coefficients of the quadratic equations describing the behavior of optimization functions Y1 and Y2, and their errors, were calculated from the test results in accordance with the planning matrix. The adequacy of the obtained quadratic regression model is judged by checking whether Fexp ≤ Ftheor.

  3. A comparative research of different ensemble surrogate models based on set pair analysis for the DNAPL-contaminated aquifer remediation strategy optimization.

    PubMed

    Hou, Zeyu; Lu, Wenxi; Xue, Haibo; Lin, Jin

    2017-08-01

    The surrogate-based simulation-optimization technique is an effective approach for optimizing the surfactant enhanced aquifer remediation (SEAR) strategy for clearing DNAPLs. The performance of the surrogate model, which replaces the simulation model in order to reduce the computational burden, is the key to such studies. However, previous studies are generally based on a stand-alone surrogate model and rarely attempt to improve the surrogate model's approximation accuracy by combining several methods. In this regard, we present set pair analysis (SPA) as a new method to build an ensemble surrogate (ES) model, and conducted a comparative study to select a better ES modeling pattern for SEAR strategy optimization problems. Surrogate models were developed using radial basis function artificial neural network (RBFANN), support vector regression (SVR), and Kriging. One ES model assembles the RBFANN, SVR, and Kriging models using set pair weights according to their performance; the other assembles several Kriging models (the best of the three surrogate modeling methods) built with different training sample datasets. Finally, an optimization model, in which the ES model was embedded, was established to obtain the optimal remediation strategy. The results showed that the residuals of the outputs between the best ES model and the simulation model for 100 testing samples were lower than 1.5%. Using an ES model instead of the simulation model was critical for considerably reducing the computation time of the simulation-optimization process while maintaining high computational accuracy. Copyright © 2017 Elsevier B.V. All rights reserved.
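    A performance-weighted ensemble surrogate can be sketched as follows. Inverse-MSE weights are used here as a simple stand-in for the set pair analysis weights, and the three "surrogates" are synthetic predictors with different noise levels rather than trained RBFANN/SVR/Kriging models.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 60)
truth = np.sin(2.0 * np.pi * x)          # stand-in for simulation-model output

# Three hypothetical stand-alone surrogates with different accuracy levels
preds = [truth + rng.normal(0.0, s, x.size) for s in (0.30, 0.15, 0.05)]

# Weight each surrogate inversely to its validation MSE (a simple stand-in
# for the set-pair-analysis weights), then form the weighted average.
mse = np.array([np.mean((p - truth) ** 2) for p in preds])
w = (1.0 / mse) / (1.0 / mse).sum()
ensemble = sum(wi * p for wi, p in zip(w, preds))

rmse = np.sqrt(mse)
ens_rmse = np.sqrt(np.mean((ensemble - truth) ** 2))
print(np.round(rmse, 3), np.round(ens_rmse, 3))
```

    With errors that are roughly independent across surrogates, the weighted ensemble typically matches or beats the best stand-alone member, which is the motivation for ensembling in the first place.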

  4. A variable structure fuzzy neural network model of squamous dysplasia and esophageal squamous cell carcinoma based on a global chaotic optimization algorithm.

    PubMed

    Moghtadaei, Motahareh; Hashemi Golpayegani, Mohammad Reza; Malekzadeh, Reza

    2013-02-07

    Identification of squamous dysplasia and esophageal squamous cell carcinoma (ESCC) is of great importance in the prevention of cancer incidence. Computer-aided algorithms can be very useful for identifying people at higher risk of squamous dysplasia and ESCC. Such methods can limit clinical screening to people at higher risk. Different regression methods have been used to predict ESCC and dysplasia. In this paper, a Fuzzy Neural Network (FNN) model is selected for ESCC and dysplasia prediction, with the risk factors as the classifier inputs. Since the relation between risk factors in the tumor system shows complex nonlinear behavior, in comparison to most ordinary data, the cost function of its model can have more local optima, which highlights the need for global optimization methods. The method proposed in this paper is a Chaotic Optimization Algorithm (COA) followed by the common Error Back Propagation (EBP) local method. Since the model has many parameters, we use a strategy to reduce the dependency among parameters caused by the chaotic series generator, a dependency not considered in previous COA methods. The algorithm is compared with the logistic regression model, among the most successful methods to date for ESCC and dysplasia prediction. The results show a more precise prediction with lower mean and variance of error. Copyright © 2012 Elsevier Ltd. All rights reserved.

  5. Hyperparameterization of soil moisture statistical models for North America with Ensemble Learning Models (Elm)

    NASA Astrophysics Data System (ADS)

    Steinberg, P. D.; Brener, G.; Duffy, D.; Nearing, G. S.; Pelissier, C.

    2017-12-01

    Hyperparameterization of statistical models, i.e. automated model scoring and selection through methods such as evolutionary algorithms, grid searches, and randomized searches, can improve forecast model skill by reducing errors associated with model parameterization, model structure, and the statistical properties of training data. Ensemble Learning Models (Elm), and the related Earthio package, provide a flexible interface for automating the selection of parameters and model structure for machine learning models common in climate science and land cover classification, offering convenient tools for loading NetCDF, HDF, Grib, or GeoTiff files, decomposition methods like PCA and manifold learning, and parallel training and prediction with unsupervised and supervised classification, clustering, and regression estimators. Continuum Analytics is using Elm to experiment with statistical soil moisture forecasting based on meteorological forcing data from NASA's North American Land Data Assimilation System (NLDAS). There, Elm uses the NSGA-2 multiobjective optimization algorithm to optimize statistical preprocessing of the forcing data and improve goodness-of-fit of the statistical models (i.e. feature engineering). This presentation will discuss Elm and its components, including dask (distributed task scheduling), xarray (data structures for n-dimensional arrays), and scikit-learn (statistical preprocessing, clustering, classification, regression), and will show how NSGA-2 is used to automate the selection of soil moisture forecast models for North America.

  6. Prediction of Depression in Cancer Patients With Different Classification Criteria, Linear Discriminant Analysis versus Logistic Regression.

    PubMed

    Shayan, Zahra; Mohammad Gholi Mezerji, Naser; Shayan, Leila; Naseri, Parisa

    2015-11-03

    Logistic regression (LR) and linear discriminant analysis (LDA) are two popular statistical models for prediction of group membership. Although they are very similar, LDA makes more assumptions about the data. When categorical and continuous variables are used simultaneously, the optimal choice between the two models is questionable. In most studies, classification error (CE) is used to discriminate between subjects in several groups, but this index is not suitable for predicting the accuracy of the outcome. The present study compared the LR and LDA models using classification indices. This cross-sectional study selected 243 cancer patients. Sample sets of different sizes (n = 50, 100, 150, 200, 220) were randomly selected and the CE, B, and Q classification indices were calculated for the LR and LDA models. CE revealed no superiority of one model over the other, but the results showed that LR performed better than LDA on the B and Q indices in all situations. No significant effect of sample size on CE was noted in selecting an optimal model. Assessment of the accuracy of prediction on real data indicated that the B and Q indices are appropriate for selecting an optimal model. The results of this study showed that, based on CE, LR performs better in some cases and LDA in others. The CE index is not appropriate for classification, whereas the B and Q indices performed better and offer more efficient criteria for comparison and discrimination between groups.

  7. Optimization and Prediction of Ultimate Tensile Strength in Metal Active Gas Welding.

    PubMed

    Ampaiboon, Anusit; Lasunon, On-Uma; Bubphachot, Bopit

    2015-01-01

    We investigated the effect of welding parameters on the ultimate tensile strength of structural steel, ST37-2, welded by Metal Active Gas welding. A fractional factorial design was used to determine the significance of six parameters: wire feed rate, welding voltage, welding speed, travel angle, tip-to-work distance, and shielding gas flow rate. A regression model to predict ultimate tensile strength was developed. Finally, we verified the optimized process parameters experimentally. We achieved an optimum tensile strength of 558 MPa; wire feed rate (19 m/min) had the greatest effect, followed by tip-to-work distance (7 mm), welding speed (200 mm/min), welding voltage (30 V), and travel angle (60°). Shielding gas flow rate (10 L/min) was slightly better but had little effect in the 10-20 L/min range. Tests showed that our regression model was able to predict the ultimate tensile strength to within 4%.

  8. An integrated Gaussian process regression for prediction of remaining useful life of slow speed bearings based on acoustic emission

    NASA Astrophysics Data System (ADS)

    Aye, S. A.; Heyns, P. S.

    2017-02-01

    This paper proposes an optimal Gaussian process regression (GPR) for the prediction of remaining useful life (RUL) of slow speed bearings based on a novel degradation assessment index obtained from acoustic emission signals. The optimal GPR is obtained from an integration or combination of existing simple mean and covariance functions in order to capture the observed trend of the bearing degradation as well as the irregularities in the data. The resulting integrated GPR model provides an excellent fit to the data and improves over the simple GPR models that are based on simple mean and covariance functions. In addition, it achieves a low percentage error in predicting the remaining useful life of slow speed bearings. These findings are robust under varying operating conditions such as loading and speed and can be applied to nonlinear and nonstationary machine response signals, making them useful for effective preventive machine maintenance.

  9. Optimizing methods for linking cinematic features to fMRI data.

    PubMed

    Kauttonen, Janne; Hlushchuk, Yevhen; Tikka, Pia

    2015-04-15

    One of the challenges of naturalistic neurosciences using movie-viewing experiments is how to interpret observed brain activations in relation to the multiplicity of time-locked stimulus features. As previous studies have shown less inter-subject synchronization across viewers of random video footage than story-driven films, new methods need to be developed for analysis of less story-driven contents. To optimize the linkage between our fMRI data collected during viewing of a deliberately non-narrative silent film 'At Land' by Maya Deren (1944) and its annotated content, we combined the method of elastic-net regularization with model-driven linear regression and the well-established data-driven independent component analysis (ICA) and inter-subject correlation (ISC) methods. In the linear regression analysis, both IC and region-of-interest (ROI) time-series were fitted with the time-series of a total of 36 binary-valued annotations and one real-valued tactile annotation of film features. Elastic-net regularization and cross-validation were applied in the ordinary least-squares linear regression in order to avoid over-fitting due to the multicollinearity of regressors; the results were compared against both partial least-squares (PLS) regression and un-regularized full-model regression. A non-parametric permutation testing scheme was applied to evaluate the statistical significance of the regression. We found statistically significant correlation between the annotation model and 9 of the 40 ICs. Regression analysis was also repeated for a large set of cubic ROIs covering the grey matter. Both IC- and ROI-based regression analyses revealed activations in parietal and occipital regions, with additional smaller clusters in the frontal lobe. Furthermore, we found elastic-net based regression more sensitive than PLS and un-regularized regression, since it detected a larger number of significant ICs and ROIs.
Along with the ISC ranking methods, our regression analysis proved a feasible method for ordering the ICs based on their functional relevance to the annotated cinematic features. The novelty of our method is - in comparison to the hypothesis-driven manual pre-selection and observation of some individual regressors biased by choice - in applying data-driven approach to all content features simultaneously. We found especially the combination of regularized regression and ICA useful when analyzing fMRI data obtained using non-narrative movie stimulus with a large set of complex and correlated features. Copyright © 2015. Published by Elsevier Inc.
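    Elastic-net regularization, the key ingredient against regressor multicollinearity here, can be sketched with its standard coordinate-descent update: soft-thresholding handles the L1 part and an extra shrinkage term handles the L2 part. Data and penalty settings below are synthetic and illustrative, not the fMRI analysis itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [1.5, -2.0, 1.0]               # only 3 relevant regressors
y = X @ beta_true + rng.normal(0.0, 0.1, n)

def elastic_net(X, y, lam, alpha, n_iter=200):
    """Coordinate descent for
    0.5/n * ||y - Xb||^2 + lam * (alpha * ||b||_1 + 0.5 * (1 - alpha) * ||b||^2)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]     # partial residual excluding j
            rho = X[:, j] @ r / n
            # soft-threshold for L1; the L2 term enlarges the denominator
            b[j] = (np.sign(rho) * max(abs(rho) - lam * alpha, 0.0)
                    / (col_sq[j] + lam * (1.0 - alpha)))
    return b

b = elastic_net(X, y, lam=0.05, alpha=0.9)
print(np.round(b, 2))  # near zero outside the 3 relevant regressors
```

    The L1 part zeroes out irrelevant regressors while the L2 part stabilizes the fit when regressors are correlated, which is exactly the failure mode of un-regularized full-model regression with collinear annotations.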

  10. Identifying the optimal segmentors for mass classification in mammograms

    NASA Astrophysics Data System (ADS)

    Zhang, Yu; Tomuro, Noriko; Furst, Jacob; Raicu, Daniela S.

    2015-03-01

    In this paper, we present the results of our investigation on identifying the optimal segmentor(s) from an ensemble of weak segmentors, used in a Computer-Aided Diagnosis (CADx) system which classifies suspicious masses in mammograms as benign or malignant. This is an extension of our previous work, where we applied various parameter settings of image enhancement techniques to each suspicious mass (region of interest (ROI)) to obtain several enhanced images, then applied segmentation to each image to obtain several contours of a given mass. Each segmentation in this ensemble is essentially a "weak segmentor" because no single segmentation can produce the optimal result for all images. Then, after shape features are computed from the segmented contours, the final classification model was built using logistic regression. The work in this paper focuses on identifying the optimal segmentor(s) from an ensemble mix of weak segmentors. For our purpose, optimal segmentors are those in the ensemble mix which contribute the most to the overall classification, rather than the ones that produced high-precision segmentation. To measure the segmentors' contribution, we examined the weights on the features in the derived logistic regression model and computed the average feature weight for each segmentor. The results showed that, while in general the segmentors with higher segmentation success rates had higher feature weights, some segmentors with lower segmentation rates had high classification feature weights as well.

  11. Transformation Model Choice in Nonlinear Regression Analysis of Fluorescence-based Serial Dilution Assays

    PubMed Central

    Fong, Youyi; Yu, Xuesong

    2016-01-01

    Many modern serial dilution assays are based on fluorescence intensity (FI) readouts. We study optimal transformation model choice for fitting five parameter logistic curves (5PL) to FI-based serial dilution assay data. We first develop a generalized least squares-pseudolikelihood type algorithm for fitting heteroscedastic logistic models. Next we show that the 5PL and log 5PL functions can approximate each other well. We then compare four 5PL models with different choices of log transformation and variance modeling through a Monte Carlo study and real data. Our findings are that the optimal choice depends on the intended use of the fitted curves. PMID:27642502
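    The 5PL function and its log-dose form can be written down directly; the identity (x/c)^b = exp(b(log x − log c)) is what lets the 5PL and "log 5PL" approximate each other closely. Parameter names below follow a common convention and are illustrative, not the authors' notation.

```python
import math

def fpl(x, a, d, c, b, g):
    """Five-parameter logistic (5PL): d = one asymptote, a = the other,
    c = inflection-related scale, b = slope, g = asymmetry."""
    return d + (a - d) / (1.0 + (x / c) ** b) ** g

def fpl_log(u, a, d, c, b, g):
    """Same curve in u = log(x): (x/c)^b == exp(b * (u - log c)), so the 5PL
    is exactly a (generalized) logistic function of log concentration."""
    return d + (a - d) / (1.0 + math.exp(b * (u - math.log(c)))) ** g

x = 3.7
v1 = fpl(x, 1.0, 0.01, 2.0, 1.3, 0.8)
v2 = fpl_log(math.log(x), 1.0, 0.01, 2.0, 1.3, 0.8)
print(v1, v2)  # identical up to floating-point rounding
```

    Fitting then reduces to nonlinear least squares (or the generalized least-squares scheme of the paper, to accommodate the heteroscedastic FI readouts) over these five parameters.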

  12. Support vector methods for survival analysis: a comparison between ranking and regression approaches.

    PubMed

    Van Belle, Vanya; Pelckmans, Kristiaan; Van Huffel, Sabine; Suykens, Johan A K

    2011-10-01

    To compare and evaluate ranking, regression and combined machine learning approaches for the analysis of survival data. The literature describes two approaches based on support vector machines to deal with censored observations. In the first approach the key idea is to rephrase the task as a ranking problem via the concordance index, a problem which can be solved efficiently in a context of structural risk minimization and convex optimization techniques. In the second approach, one uses a regression approach, dealing with censoring by means of inequality constraints. The goal of this paper is then twofold: (i) introducing a new model combining the ranking and regression strategy, which retains the link with existing survival models such as the proportional hazards model via transformation models; and (ii) comparison of the three techniques on 6 clinical and 3 high-dimensional datasets, discussing the relevance of these techniques over classical approaches for survival data. We compare svm-based survival models based on ranking constraints, based on regression constraints, and models based on both ranking and regression constraints. The performance of the models is compared by means of three different measures: (i) the concordance index, measuring the model's discriminating ability; (ii) the logrank test statistic, indicating whether patients with a prognostic index lower than the median prognostic index have significantly different survival than patients with a prognostic index higher than the median; and (iii) the hazard ratio after normalization to restrict the prognostic index between 0 and 1. Our results indicate significantly better performance for models including regression constraints over models based only on ranking constraints. This work gives empirical evidence that svm-based models using regression constraints perform significantly better than svm-based models based on ranking constraints.
Our experiments show a comparable performance for methods including only regression or both regression and ranking constraints on clinical data. On high dimensional data, the former model performs better. However, this approach does not have a theoretical link with standard statistical models for survival data. This link can be made by means of transformation models when ranking constraints are included. Copyright © 2011 Elsevier B.V. All rights reserved.

  13. Enhanced index tracking modeling in portfolio optimization with mixed-integer programming approach

    NASA Astrophysics Data System (ADS)

    Siew, Lam Weng; Jaaman, Saiful Hafizah Hj.; Ismail, Hamizun bin

    2014-09-01

    Enhanced index tracking is a popular form of portfolio management in stock market investment. Enhanced index tracking aims to construct an optimal portfolio that generates excess return over the return achieved by the stock market index without purchasing all of the stocks that make up the index. The objective of this paper is to construct an optimal portfolio using a mixed-integer programming model which adopts a regression approach in order to generate a higher portfolio mean return than the stock market index return. In this study, the data consist of 24 component stocks of the Malaysia market index, the FTSE Bursa Malaysia Kuala Lumpur Composite Index, from January 2010 until December 2012. The results of this study show that the optimal portfolio of the mixed-integer programming model is able to generate a higher mean return than the FTSE Bursa Malaysia Kuala Lumpur Composite Index while selecting only 30% of the total stock market index components.
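    The integer part of such a model, deciding which stocks enter the portfolio, can be illustrated by brute-force enumeration on a toy problem; a real MIP solver replaces the loop (and jointly optimizes the weights) at scale. All data below are synthetic, and equal weights stand in for optimized weights.

```python
import itertools
import numpy as np

rng = np.random.default_rng(11)
n_stocks, n_periods, k = 8, 60, 3        # pick 3 of 8 stocks (toy "30%" rule)
stock_r = rng.normal(0.001, 0.02, (n_periods, n_stocks))
index_r = stock_r.mean(axis=1)           # equal-weighted toy index

# Enumerate stock subsets (the integer decision); for each, keep the
# portfolio with the highest mean excess return subject to a cap on
# tracking error relative to the index.
best, best_excess = None, -np.inf
for subset in itertools.combinations(range(n_stocks), k):
    port_r = stock_r[:, subset].mean(axis=1)
    tracking_error = np.std(port_r - index_r)
    excess = port_r.mean() - index_r.mean()
    if tracking_error < 0.012 and excess > best_excess:
        best, best_excess = subset, excess

print(best, round(float(best_excess), 5))
```

    With 24 stocks and a 30% cardinality limit the subset space is far too large to enumerate, which is precisely why the paper formulates the problem as a mixed-integer program.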

  14. Predictive modelling of flow in a two-dimensional intermediate-scale, heterogeneous porous media

    USGS Publications Warehouse

    Barth, Gilbert R.; Hill, M.C.; Illangasekare, T.H.; Rajaram, H.

    2000-01-01

    To better understand the role of sedimentary structures in flow through porous media, and to determine how small-scale laboratory-measured values of hydraulic conductivity relate to in situ values, this work deterministically examines flow through simple, artificial structures constructed for a series of intermediate-scale (10 m long), two-dimensional, heterogeneous, laboratory experiments. Nonlinear regression was used to determine optimal values of in situ hydraulic conductivity, which were compared to laboratory-measured values. Despite explicit numerical representation of the heterogeneity, the optimized values were generally greater than the laboratory-measured values. Discrepancies between measured and optimal values varied depending on the sand sieve size, but their contribution to error in the predicted flow was fairly consistent for all sands. Results indicate that, even under these controlled circumstances, laboratory-measured values of hydraulic conductivity need to be applied to models cautiously.

  15. Regression modeling and prediction of road sweeping brush load characteristics from finite element analysis and experimental results.

    PubMed

    Wang, Chong; Sun, Qun; Wahab, Magd Abdel; Zhang, Xingyu; Xu, Limin

    2015-09-01

    Rotary cup brushes mounted on each side of a road sweeper undertake heavy debris removal tasks, but their characteristics have not been well known until recently. A Finite Element (FE) model that can analyze brush deformation and predict brush characteristics has been developed to investigate sweeping efficiency and to assist controller design. However, the FE model requires a large amount of CPU time to simulate each brush design and operating scenario, which may limit its application in a real-time system. This study develops a mathematical regression model to summarize the FE model results. The complex brush load characteristic curves were statistically analyzed to quantify the effects of cross-section, length, mounting angle, displacement, rotational speed, etc. The data were then fitted by a multiple-variable regression model using the maximum likelihood method. The fitted results showed good agreement with the FE analysis and experimental results, suggesting that the mathematical regression model may be used directly in a real-time system to predict the characteristics of different brushes under varying operating conditions. The methodology may also be used in the design and optimization of rotary brush tools. Copyright © 2015 Elsevier Ltd. All rights reserved.

  16. Stemflow estimation in a redwood forest using model-based stratified random sampling

    Treesearch

    Jack Lewis

    2003-01-01

    Model-based stratified sampling is illustrated by a case study of stemflow volume in a redwood forest. The approach is actually a model-assisted sampling design in which auxiliary information (tree diameter) is utilized in the design of stratum boundaries to optimize the efficiency of a regression or ratio estimator. The auxiliary information is utilized in both the...

  17. Optimization of large animal MI models; a systematic analysis of control groups from preclinical studies.

    PubMed

    Zwetsloot, P P; Kouwenberg, L H J A; Sena, E S; Eding, J E; den Ruijter, H M; Sluijter, J P G; Pasterkamp, G; Doevendans, P A; Hoefer, I E; Chamuleau, S A J; van Hout, G P J; Jansen Of Lorkeers, S J

    2017-10-27

    Large animal models are essential for the development of novel therapeutics for myocardial infarction. To optimize translation, we need to assess the effect of experimental design on disease outcome and model experimental design to resemble the clinical course of MI. The aim of this study is therefore to systematically investigate how experimental decisions affect outcome measurements in large animal MI models. We used control-animal data from two independent meta-analyses of large animal MI models. All variables of interest were pre-defined. We performed univariable and multivariable meta-regression to analyze whether these variables influenced infarct size and ejection fraction. Our analyses incorporated 246 relevant studies. Multivariable meta-regression revealed that infarct size and cardiac function were influenced independently by choice of species, sex, co-medication, occlusion type, occluded vessel, quantification method, ischemia duration and follow-up duration. We provide strong systematic evidence that commonly used endpoints significantly depend on study design and biological variation. This makes direct comparison of different study results difficult and calls for standardized models. Researchers should take this into account when designing large animal studies to most closely mimic the clinical course of MI and enable translational success.

  18. Hyperspectral analysis of soil organic matter in coal mining regions using wavelets, correlations, and partial least squares regression.

    PubMed

    Lin, Lixin; Wang, Yunjia; Teng, Jiyao; Wang, Xuchen

    2016-02-01

    Hyperspectral estimation of soil organic matter (SOM) in coal mining regions is an important tool for enhancing fertilization in soil restoration programs. The correlation-partial least squares regression (PLSR) method effectively solves the information loss problem of correlation-multiple linear stepwise regression, but results of the correlation analysis must be optimized to improve precision. This study considers the relationship between spectral reflectance and SOM based on spectral reflectance curves of soil samples collected from coal mining regions. Based on the major absorption troughs in the 400-1006 nm spectral range, PLSR analysis was performed using 289 independent bands of the second derivative (SDR) with three levels and measured SOM values. A wavelet-correlation-PLSR (W-C-PLSR) model was then constructed. By amplifying useful information that was previously obscured by noise, the W-C-PLSR model was optimal for estimating SOM content, with smaller prediction errors in both calibration (R2 = 0.970, root mean square error (RMSEC) = 3.10, and mean relative error (MREC) = 8.75) and validation (RMSEV = 5.85 and MREV = 14.32) analyses, as compared with other models. Results indicate that W-C-PLSR has great potential to estimate SOM in coal mining regions.
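
The core PLSR idea — projecting many correlated spectral bands onto a few latent scores that covary with the response — can be sketched with a minimal one-component NIPALS-style fit. Everything below is simulated stand-in data (random "spectra" and SOM values); it is not the paper's wavelet pipeline, just the regression step it builds on.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_bands = 300, 40
X = rng.normal(size=(n_samples, n_bands))        # stand-in derivative spectra
w_true = np.zeros(n_bands)
w_true[10:20] = 1.0                              # ten informative "absorption" bands
y = X @ w_true + rng.normal(0, 0.3, n_samples)   # stand-in SOM values

Xc = X - X.mean(axis=0)                          # center predictors
yc = y - y.mean()

w = Xc.T @ yc                                    # PLS weight: covariance direction
w /= np.linalg.norm(w)
t = Xc @ w                                       # latent scores
b = (t @ yc) / (t @ t)                           # regress response on scores
y_hat = t * b + y.mean()
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
```

Even a single latent component captures most of the variance here because the informative bands dominate the covariance direction.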

  19. Conservative strategy-based ensemble surrogate model for optimal groundwater remediation design at DNAPLs-contaminated sites

    NASA Astrophysics Data System (ADS)

    Ouyang, Qi; Lu, Wenxi; Lin, Jin; Deng, Wenbing; Cheng, Weiguo

    2017-08-01

    The surrogate-based simulation-optimization techniques are frequently used for optimal groundwater remediation design. When this technique is used, surrogate errors caused by surrogate-modeling uncertainty may lead to generation of infeasible designs. In this paper, a conservative strategy that pushes the optimal design into the feasible region was used to address surrogate-modeling uncertainty. In addition, chance-constrained programming (CCP) was adopted to compare with the conservative strategy in addressing this uncertainty. Three methods, multi-gene genetic programming (MGGP), Kriging (KRG) and support vector regression (SVR), were used to construct surrogate models for a time-consuming multi-phase flow model. To improve the performance of the surrogate model, ensemble surrogates were constructed based on combinations of different stand-alone surrogate models. The results show that: (1) the surrogate-modeling uncertainty was successfully addressed by the conservative strategy, which means that this method is promising for addressing surrogate-modeling uncertainty. (2) The ensemble surrogate model that combines MGGP with KRG showed the most favorable performance, which indicates that this ensemble surrogate can utilize both stand-alone surrogate models to improve the performance of the surrogate model.
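
The ensemble-surrogate idea can be sketched as an inverse-error weighted combination of stand-alone surrogates. In this hypothetical illustration, two cheap polynomial fits stand in for the paper's MGGP and KRG surrogates, and `expensive_model` is a placeholder for the time-consuming multi-phase flow model.

```python
import numpy as np

def expensive_model(x):                          # placeholder simulator
    return np.sin(x) + 0.1 * x ** 2

x_train = np.linspace(0.0, 4.0, 15)
y_train = expensive_model(x_train)

surrogates = [np.polyfit(x_train, y_train, 1),   # stand-alone surrogate 1 (linear)
              np.polyfit(x_train, y_train, 3)]   # stand-alone surrogate 2 (cubic)

x_val = np.linspace(0.2, 3.8, 50)                # held-out validation points
y_val = expensive_model(x_val)
preds = [np.polyval(p, x_val) for p in surrogates]
mses = np.array([np.mean((y_val - p) ** 2) for p in preds])

weights = (1.0 / mses) / np.sum(1.0 / mses)      # inverse-error weighting
ensemble = weights[0] * preds[0] + weights[1] * preds[1]
mse_ensemble = np.mean((y_val - ensemble) ** 2)
```

By convexity, the weighted ensemble can never do worse than the worst member, and the weighting automatically leans on whichever stand-alone surrogate validates better.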

  20. Modern modelling techniques are data hungry: a simulation study for predicting dichotomous endpoints.

    PubMed

    van der Ploeg, Tjeerd; Austin, Peter C; Steyerberg, Ewout W

    2014-12-22

    Modern modelling techniques may potentially provide more accurate predictions of binary outcomes than classical techniques. We aimed to study the predictive performance of different modelling techniques in relation to the effective sample size ("data hungriness"). We performed simulation studies based on three clinical cohorts: 1282 patients with head and neck cancer (with 46.9% 5 year survival), 1731 patients with traumatic brain injury (22.3% 6 month mortality) and 3181 patients with minor head injury (7.6% with CT scan abnormalities). We compared three relatively modern modelling techniques: support vector machines (SVM), neural nets (NN), and random forests (RF) and two classical techniques: logistic regression (LR) and classification and regression trees (CART). We created three large artificial databases with 20-fold, 10-fold and 6-fold replication of subjects, where we generated dichotomous outcomes according to different underlying models. We applied each modelling technique to increasingly larger development parts (100 repetitions). The area under the ROC-curve (AUC) indicated the performance of each model in the development part and in an independent validation part. Data hungriness was defined by plateauing of AUC and small optimism (difference between the mean apparent AUC and the mean validated AUC <0.01). We found that a stable AUC was reached by LR at approximately 20 to 50 events per variable, followed by CART, SVM, NN and RF models. Optimism decreased with increasing sample sizes and the same ranking of techniques. The RF, SVM and NN models showed instability and a high optimism even with >200 events per variable. Modern modelling techniques such as SVM, NN and RF may need over 10 times as many events per variable to achieve a stable AUC and a small optimism as classical modelling techniques such as LR. This implies that such modern techniques should only be used in medical prediction problems if very large data sets are available.
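
The "optimism" quantity used above (apparent AUC minus validated AUC) is easy to demonstrate. This hedged sketch uses simulated data and a plain linear score fit by least squares as a stand-in for the study's models: a small development set with many noise predictors yields an inflated apparent AUC relative to the AUC on a large independent validation set.

```python
import numpy as np

rng = np.random.default_rng(2)

def make_data(n, p=20):
    X = rng.normal(size=(n, p))
    logit = X[:, 0] + X[:, 1]                    # only 2 informative predictors
    y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(float)
    return X, y

def auc(scores, labels):
    """Mann-Whitney form of the area under the ROC curve."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return np.mean(diff > 0) + 0.5 * np.mean(diff == 0)

X_dev, y_dev = make_data(60)                     # small development set
X_val, y_val = make_data(2000)                   # large validation set

Xd = np.column_stack([np.ones(len(y_dev)), X_dev])
beta, *_ = np.linalg.lstsq(Xd, y_dev, rcond=None)   # linear-score model

apparent = auc(Xd @ beta, y_dev)
validated = auc(np.column_stack([np.ones(len(y_val)), X_val]) @ beta, y_val)
optimism = apparent - validated
```

With 20 candidate predictors and only ~30 events, the gap between apparent and validated AUC is substantial, mirroring the events-per-variable argument in the abstract.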

  1. Modeling and optimization of red currants vacuum drying process by response surface methodology (RSM).

    PubMed

    Šumić, Zdravko; Vakula, Anita; Tepić, Aleksandra; Čakarević, Jelena; Vitas, Jasmina; Pavlić, Branimir

    2016-07-15

    Fresh red currants were dried by vacuum drying process under different drying conditions. Box-Behnken experimental design with response surface methodology was used for optimization of the drying process in terms of physical (moisture content, water activity, total color change, firmness and rehydration power) and chemical (total phenols, total flavonoids, monomeric anthocyanins and ascorbic acid content and antioxidant activity) properties of dried samples. Temperature (48-78 °C), pressure (30-330 mbar) and drying time (8-16 h) were investigated as independent variables. Experimental results were fitted to a second-order polynomial model where regression analysis and analysis of variance were used to determine model fitness and optimal drying conditions. The optimal conditions of simultaneously optimized responses were temperature of 70.2 °C, pressure of 39 mbar and drying time of 8 h. It could be concluded that vacuum drying provides samples with good physico-chemical properties, similar to lyophilized sample and better than conventionally dried sample. Copyright © 2016 Elsevier Ltd. All rights reserved.
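
The RSM step — fitting a second-order polynomial to designed experiments and locating its optimum — can be sketched as below. The "true" response surface is invented for illustration (with its maximum placed near the reported optimum of roughly 70 °C and 40 mbar), and a simple 3x3 factorial grid stands in for the Box-Behnken design.

```python
import numpy as np

def response(T, P):                              # hypothetical quadratic surface
    return 100.0 - (T - 70.0) ** 2 / 50.0 - (P - 40.0) ** 2 / 200.0

# 3 x 3 factorial stand-in for the experimental design points
T_pts = np.array([48.0, 48.0, 48.0, 63.0, 63.0, 63.0, 78.0, 78.0, 78.0])
P_pts = np.array([30.0, 180.0, 330.0, 30.0, 180.0, 330.0, 30.0, 180.0, 330.0])
y = response(T_pts, P_pts)

# full second-order model: y = b0 + b1*T + b2*P + b3*T^2 + b4*P^2 + b5*T*P
X = np.column_stack([np.ones(9), T_pts, P_pts, T_pts**2, P_pts**2, T_pts*P_pts])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# locate the fitted optimum on a grid over the experimental region
Tg, Pg = np.meshgrid(np.linspace(48, 78, 61), np.linspace(30, 330, 61))
Yg = b[0] + b[1]*Tg + b[2]*Pg + b[3]*Tg**2 + b[4]*Pg**2 + b[5]*Tg*Pg
idx = np.argmax(Yg)
T_opt, P_opt = Tg.flat[idx], Pg.flat[idx]
```

Because the simulated surface is exactly quadratic, the fitted model recovers it and the grid search lands on the planted optimum.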

  2. Modeling and managing risk early in software development

    NASA Technical Reports Server (NTRS)

    Briand, Lionel C.; Thomas, William M.; Hetmanski, Christopher J.

    1993-01-01

    In order to improve the quality of the software development process, we need to be able to build empirical multivariate models based on data collectable early in the software process. These models need to be both useful for prediction and easy to interpret, so that remedial actions may be taken in order to control and optimize the development process. We present an automated modeling technique which can be used as an alternative to regression techniques. We show how it can be used to facilitate the identification and aid the interpretation of the significant trends which characterize 'high risk' components in several Ada systems. Finally, we evaluate the effectiveness of our technique based on a comparison with logistic regression based models.

  3. Subsonic Aircraft With Regression and Neural-Network Approximators Designed

    NASA Technical Reports Server (NTRS)

    Patnaik, Surya N.; Hopkins, Dale A.

    2004-01-01

    At the NASA Glenn Research Center, NASA Langley Research Center's Flight Optimization System (FLOPS) and the design optimization testbed COMETBOARDS with regression and neural-network-analysis approximators have been coupled to obtain a preliminary aircraft design methodology. For a subsonic aircraft, the optimal design, that is the airframe-engine combination, is obtained by the simulation. The aircraft is powered by two high-bypass-ratio engines with a nominal thrust of about 35,000 lbf. It is to carry 150 passengers at a cruise speed of Mach 0.8 over a range of 3000 n mi and to operate on a 6000-ft runway. The aircraft design utilized a neural network and a regression-approximations-based analysis tool, along with a multioptimizer cascade algorithm that uses sequential linear programming, sequential quadratic programming, the method of feasible directions, and then sequential quadratic programming again. Optimal aircraft weight versus the number of design iterations is shown. The central processing unit (CPU) time to solution is given. It is shown that the regression-method-based analyzer exhibited a smoother convergence pattern than the FLOPS code. The optimum weight obtained by the approximation technique and the FLOPS code differed by 1.3 percent. Prediction by the approximation technique exhibited no error for the aircraft wing area and turbine entry temperature, whereas it was within 2 percent for most other parameters. Cascade strategy was required by FLOPS as well as the approximators. The regression method had a tendency to hug the data points, whereas the neural network exhibited a propensity to follow a mean path. The performance of the neural network and regression methods was considered adequate. It was at about the same level for small, standard, and large models with redundancy ratios (defined as the number of input-output pairs to the number of unknown coefficients) of 14, 28, and 57, respectively. 
In an SGI Octane workstation (Silicon Graphics, Inc., Mountain View, CA), the regression training required a fraction of a CPU second, whereas neural network training took between 1 and 9 min. For a single analysis cycle, the 3-sec CPU time required by the FLOPS code was reduced to milliseconds by the approximators. For design calculations, the time with the FLOPS code was 34 min. It was reduced to 2 sec with the regression method and to 4 min by the neural network technique. The performance of the regression and neural network methods was found to be satisfactory for the analysis and design optimization of the subsonic aircraft.

  4. Design optimization of tailor-rolled blank thin-walled structures based on ɛ-support vector regression technique and genetic algorithm

    NASA Astrophysics Data System (ADS)

    Duan, Libin; Xiao, Ning-cong; Li, Guangyao; Cheng, Aiguo; Chen, Tao

    2017-07-01

    Tailor-rolled blank thin-walled (TRB-TH) structures have become important vehicle components owing to their advantages of light weight and crashworthiness. The purpose of this article is to provide an efficient lightweight design for improving the energy-absorbing capability of TRB-TH structures under dynamic loading. A finite element (FE) model for TRB-TH structures is established and validated by performing a dynamic axial crash test. Different material properties for individual parts with different thicknesses are considered in the FE model. Then, a multi-objective crashworthiness design of the TRB-TH structure is constructed based on the ɛ-support vector regression (ɛ-SVR) technique and non-dominated sorting genetic algorithm-II. The key parameters (C, ɛ and σ) are optimized to further improve the predictive accuracy of ɛ-SVR under limited sample points. Finally, the technique for order preference by similarity to the ideal solution method is used to rank the solutions in Pareto-optimal frontiers and find the best compromise optima. The results demonstrate that the light weight and crashworthiness performance of the optimized TRB-TH structures are superior to their uniform thickness counterparts. The proposed approach provides useful guidance for designing TRB-TH energy absorbers for vehicle bodies.

  5. SU-F-BRD-01: A Logistic Regression Model to Predict Objective Function Weights in Prostate Cancer IMRT

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Boutilier, J; Chan, T; Lee, T

    2014-06-15

    Purpose: To develop a statistical model that predicts optimization objective function weights from patient geometry for intensity-modulated radiotherapy (IMRT) of prostate cancer. Methods: A previously developed inverse optimization method (IOM) is applied retrospectively to determine optimal weights for 51 treated patients. We use an overlap volume ratio (OVR) of bladder and rectum for different PTV expansions in order to quantify patient geometry in explanatory variables. Using the optimal weights as ground truth, we develop and train a logistic regression (LR) model to predict the rectum weight and thus the bladder weight. Post hoc, we fix the weights of the left femoral head, right femoral head, and an artificial structure that encourages conformity to the population average while normalizing the bladder and rectum weights accordingly. The population average of objective function weights is used for comparison. Results: The OVR at 0.7cm was found to be the most predictive of the rectum weights. The LR model performance is statistically significant when compared to the population average over a range of clinical metrics including bladder/rectum V53Gy, bladder/rectum V70Gy, and mean voxel dose to the bladder, rectum, CTV, and PTV. On average, the LR model predicted bladder and rectum weights that are both 63% closer to the optimal weights compared to the population average. The treatment plans resulting from the LR weights have, on average, a rectum V70Gy that is 35% closer to the clinical plan and a bladder V70Gy that is 43% closer. Similar results are seen for bladder V54Gy and rectum V54Gy. Conclusion: Statistical modelling from patient anatomy can be used to determine objective function weights in IMRT for prostate cancer. Our method allows the treatment planners to begin the personalization process from an informed starting point, which may lead to more consistent clinical plans and reduce overall planning time.
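
The statistical core here — a logistic model relating an OVR feature to a planning outcome — can be sketched with a maximum-likelihood fit by gradient ascent. The OVR values, the binary label, and all coefficients below are simulated for illustration only; they are not the study's data or its actual weight-prediction model.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
ovr = rng.uniform(0.0, 1.0, n)                   # OVR at a fixed PTV expansion
p_true = 1.0 / (1.0 + np.exp(-(4.0 * ovr - 2.0)))
label = (rng.uniform(size=n) < p_true).astype(float)

# logistic regression via plain gradient ascent on the log-likelihood
X = np.column_stack([np.ones(n), ovr])
w = np.zeros(2)
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w += 0.5 * X.T @ (label - p) / n             # gradient of mean log-likelihood
```

The fitted slope recovers the positive association between overlap and the simulated label, which is the kind of geometry-to-weight relationship the abstract exploits.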

  6. A CNN Regression Approach for Real-Time 2D/3D Registration.

    PubMed

    Shun Miao; Wang, Z Jane; Rui Liao

    2016-05-01

    In this paper, we present a Convolutional Neural Network (CNN) regression approach to address the two major limitations of existing intensity-based 2-D/3-D registration technology: 1) slow computation and 2) small capture range. Different from optimization-based methods, which iteratively optimize the transformation parameters over a scalar-valued metric function representing the quality of the registration, the proposed method exploits the information embedded in the appearances of the digitally reconstructed radiograph and X-ray images, and employs CNN regressors to directly estimate the transformation parameters. An automatic feature extraction step is introduced to calculate 3-D pose-indexed features that are sensitive to the variables to be regressed while robust to other factors. The CNN regressors are then trained for local zones and applied in a hierarchical manner to break down the complex regression task into multiple simpler sub-tasks that can be learned separately. Weight sharing is furthermore employed in the CNN regression model to reduce the memory footprint. The proposed approach has been quantitatively evaluated on 3 potential clinical applications, demonstrating its significant advantage in providing highly accurate real-time 2-D/3-D registration with a significantly enlarged capture range when compared to intensity-based methods.

  7. A novel hybrid decomposition-and-ensemble model based on CEEMD and GWO for short-term PM2.5 concentration forecasting

    NASA Astrophysics Data System (ADS)

    Niu, Mingfei; Wang, Yufang; Sun, Shaolong; Li, Yongwu

    2016-06-01

    To enhance prediction reliability and accuracy, a hybrid model based on the promising principle of "decomposition and ensemble" and a recently proposed meta-heuristic called grey wolf optimizer (GWO) is introduced for daily PM2.5 concentration forecasting. Compared with existing PM2.5 forecasting methods, this proposed model has improved the prediction accuracy and hit rates of directional prediction. The proposed model involves three main steps, i.e., decomposing the original PM2.5 series into several intrinsic mode functions (IMFs) via complementary ensemble empirical mode decomposition (CEEMD) for simplifying the complex data; individually predicting each IMF with support vector regression (SVR) optimized by GWO; integrating all predicted IMFs for the ensemble result as the final prediction by another SVR optimized by GWO. Seven benchmark models, including single artificial intelligence (AI) models, other decomposition-ensemble models with different decomposition methods and models with the same decomposition-ensemble method but optimized by different algorithms, are considered to verify the superiority of the proposed hybrid model. The empirical study indicates that the proposed hybrid decomposition-ensemble model is remarkably superior to all considered benchmark models for its higher prediction accuracy and hit rates of directional prediction.
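
A toy version of the "decomposition and ensemble" principle is sketched below on a simulated concentration series: a moving-average split into a smooth trend and an oscillatory remainder stands in for CEEMD, a lagged linear model stands in for the GWO-tuned SVR, and the component forecasts are summed. This is an assumption-laden simplification of the paper's pipeline, not a reproduction of it.

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(400)
series = 10.0 + 0.02 * t + 2.0 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 0.3, 400)

kernel = np.ones(25) / 25.0
trend = np.convolve(series, kernel, mode="same")  # smooth component
osc = series - trend                              # oscillatory remainder
components = [trend, osc]                         # reconstruction is exact by construction

def one_step_forecast(x, lags=3):
    """Fit x_k ~ (x_{k-3}, x_{k-2}, x_{k-1}) by least squares, forecast one step."""
    X = np.column_stack([x[i:len(x) - lags + i] for i in range(lags)])
    coef, *_ = np.linalg.lstsq(X, x[lags:], rcond=None)
    return float(x[-lags:] @ coef)

forecast = sum(one_step_forecast(c) for c in components)
```

Forecasting each simpler component separately and aggregating is the essence of the decomposition-ensemble strategy; the real method replaces both stand-ins with CEEMD and GWO-optimized SVR.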

  8. Non-linear auto-regressive models for cross-frequency coupling in neural time series

    PubMed Central

    Tallot, Lucille; Grabot, Laetitia; Doyère, Valérie; Grenier, Yves; Gramfort, Alexandre

    2017-01-01

    We address the issue of reliably detecting and quantifying cross-frequency coupling (CFC) in neural time series. Based on non-linear auto-regressive models, the proposed method provides a generative and parametric model of the time-varying spectral content of the signals. As this method models the entire spectrum simultaneously, it avoids the pitfalls related to incorrect filtering or the use of the Hilbert transform on wide-band signals. As the model is probabilistic, it also provides a score of the model “goodness of fit” via the likelihood, enabling easy and legitimate model selection and parameter comparison; this data-driven feature is unique to our model-based approach. Using three datasets obtained with invasive neurophysiological recordings in humans and rodents, we demonstrate that these models are able to replicate previous results obtained with other metrics, but also reveal new insights such as the influence of the amplitude of the slow oscillation. Using simulations, we demonstrate that our parametric method can reveal neural couplings with shorter signals than non-parametric methods. We also show how the likelihood can be used to find optimal filtering parameters, suggesting new properties on the spectrum of the driving signal, but also to estimate the optimal delay between the coupled signals, enabling a directionality estimation in the coupling. PMID:29227989

  9. Research on Fault Rate Prediction Method of T/R Component

    NASA Astrophysics Data System (ADS)

    Hou, Xiaodong; Yang, Jiangping; Bi, Zengjun; Zhang, Yu

    2017-07-01

    The T/R component is an important part of a large phased-array radar antenna; because these components are numerous and have a high fault rate, fault prediction for them is of practical significance. To address the problems of the traditional grey model GM(1,1) in practical operation, this paper establishes a discrete grey model based on the original model, introduces an optimization factor to optimize the background value, and adds a linear term to the prediction model, yielding an improved discrete grey model with linear regression. Finally, an example is simulated and compared with other models. The results show that the proposed method has higher accuracy, is simple to solve, and has a wider scope of application.
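
For context, the classic GM(1,1) baseline that such improved models build on can be implemented in a few lines: accumulate the series, fit the development coefficient and grey input by least squares on the background values, and difference the exponential response back. The fault-rate numbers below are invented for illustration.

```python
import numpy as np

def gm11_forecast(x0, steps=1):
    """Classic GM(1,1) grey forecast of a short positive series."""
    x1 = np.cumsum(x0)                           # accumulated generating series
    z = 0.5 * (x1[1:] + x1[:-1])                 # background values
    B = np.column_stack([-z, np.ones_like(z)])
    a, b = np.linalg.lstsq(B, x0[1:], rcond=None)[0]
    k = np.arange(len(x0) + steps)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a
    return np.concatenate([[x1_hat[0]], np.diff(x1_hat)])  # restore original scale

rates = np.array([2.87, 3.28, 3.69, 4.17, 4.72])  # hypothetical fault-rate series
pred = gm11_forecast(rates, steps=1)
```

For a monotonically growing series the model extrapolates the growth; the paper's refinements target exactly the background-value term `z` computed here.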

  10. Convex Regression with Interpretable Sharp Partitions

    PubMed Central

    Petersen, Ashley; Simon, Noah; Witten, Daniela

    2016-01-01

    We consider the problem of predicting an outcome variable on the basis of a small number of covariates, using an interpretable yet non-additive model. We propose convex regression with interpretable sharp partitions (CRISP) for this task. CRISP partitions the covariate space into blocks in a data-adaptive way, and fits a mean model within each block. Unlike other partitioning methods, CRISP is fit using a non-greedy approach by solving a convex optimization problem, resulting in low-variance fits. We explore the properties of CRISP, and evaluate its performance in a simulation study and on a housing price data set. PMID:27635120

  11. ToxiM: A Toxicity Prediction Tool for Small Molecules Developed Using Machine Learning and Chemoinformatics Approaches.

    PubMed

    Sharma, Ashok K; Srivastava, Gopal N; Roy, Ankita; Sharma, Vineet K

    2017-01-01

    The experimental methods for the prediction of molecular toxicity are tedious and time-consuming tasks. Thus, the computational approaches could be used to develop alternative methods for toxicity prediction. We have developed a tool for the prediction of molecular toxicity along with the aqueous solubility and permeability of any molecule/metabolite. Using a comprehensive and curated set of toxin molecules as a training set, the different chemical and structural based features such as descriptors and fingerprints were exploited for feature selection, optimization and development of machine learning based classification and regression models. The compositional differences in the distribution of atoms were apparent between toxins and non-toxins, and hence, the molecular features were used for the classification and regression. On 10-fold cross-validation, the descriptor-based, fingerprint-based and hybrid-based classification models showed similar accuracy (93%) and Matthews's correlation coefficient (0.84). The performances of all the three models were comparable (Matthews's correlation coefficient = 0.84-0.87) on the blind dataset. In addition, the regression-based models using descriptors as input features were also compared and evaluated on the blind dataset. Random forest based regression model for the prediction of solubility performed better (R2 = 0.84) than the multi-linear regression (MLR) and partial least square regression (PLSR) models, whereas, the partial least squares based regression model for the prediction of permeability (caco-2) performed better (R2 = 0.68) in comparison to the random forest and MLR based regression models. The performance of final classification and regression models was evaluated using the two validation datasets including the known toxins and commonly used constituents of health products, which attests to its accuracy. 
The ToxiM web server would be a highly useful and reliable tool for the prediction of toxicity, solubility, and permeability of small molecules.

  13. Modelling and multi objective optimization of WEDM of commercially Monel super alloy using evolutionary algorithms

    NASA Astrophysics Data System (ADS)

    Varun, Sajja; Reddy, Kalakada Bhargav Bal; Vardhan Reddy, R. R. Vishnu

    2016-09-01

    In this research work, development of a multi-response optimization technique has been undertaken, using traditional desirability analysis and non-traditional particle swarm optimization techniques (for different customers' priorities) in wire electrical discharge machining (WEDM). Monel 400 has been selected as the work material for experimentation. The effects of key process parameters such as pulse-on time (TON), pulse-off time (TOFF), peak current (IP), and wire feed (WF) on material removal rate (MRR) and surface roughness (SR) in the WEDM operation were investigated. Further, the responses such as MRR and SR were modelled empirically through regression analysis. The developed models can be used by machinists to predict the MRR and SR over a wide range of input parameters. The optimization of multiple responses has been done for satisfying the priorities of multiple users by using the Taguchi-desirability function method and the particle swarm optimization technique. The analysis of variance (ANOVA) is also applied to investigate the effect of influential parameters. Finally, confirmation experiments were conducted for the optimal set of machining parameters, and the improvement has been verified.

  14. Development of a hybrid proximal sensing method for rapid identification of petroleum contaminated soils.

    PubMed

    Chakraborty, Somsubhra; Weindorf, David C; Li, Bin; Ali Aldabaa, Abdalsamad Abdalsatar; Ghosh, Rakesh Kumar; Paul, Sathi; Nasim Ali, Md

    2015-05-01

    Using 108 petroleum contaminated soil samples, this pilot study proposed a new analytical approach of combining visible near-infrared diffuse reflectance spectroscopy (VisNIR DRS) and portable X-ray fluorescence spectrometry (PXRF) for rapid and improved quantification of soil petroleum contamination. Results indicated that an advanced fused model, in which VisNIR DRS spectra-based penalized spline regression (PSR) was used to predict total petroleum hydrocarbon and PXRF elemental data-based random forest regression was used to model the PSR residuals, outperformed (R2 = 0.78, residual prediction deviation (RPD) = 2.19) all other models tested, even producing better generalization than using VisNIR DRS alone (RPDs of 1.64, 1.86, and 1.96 for random forest, penalized spline regression, and partial least squares regression, respectively). Additionally, unsupervised principal component analysis using the PXRF+VisNIR DRS system qualitatively separated contaminated soils from control samples. Fusion of PXRF elemental data and VisNIR derivative spectra produced an optimized model for total petroleum hydrocarbon quantification in soils. Copyright © 2015 Elsevier B.V. All rights reserved.
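
The two-stage fusion idea — fit one sensor's model, then model its residuals with the other sensor's data — can be sketched on simulated data. Plain least squares stands in for both the paper's penalized spline regression (stage 1) and its random forest (stage 2); features and coefficients are invented.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 120
spectra = rng.normal(size=(n, 5))                # stand-in VisNIR DRS features
elements = rng.normal(size=(n, 3))               # stand-in PXRF elemental features
tph = (spectra @ np.array([1.0, 0.5, 0, 0, 0])
       + elements @ np.array([2.0, 0, 1.0])
       + rng.normal(0, 0.3, n))                  # simulated TPH response

# stage 1: regress TPH on spectral features only
X1 = np.column_stack([np.ones(n), spectra])
b1, *_ = np.linalg.lstsq(X1, tph, rcond=None)
resid = tph - X1 @ b1

# stage 2: model the stage-1 residuals with the elemental features
X2 = np.column_stack([np.ones(n), elements])
b2, *_ = np.linalg.lstsq(X2, resid, rcond=None)

def r2(y, y_hat):
    return 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

r2_stage1 = r2(tph, X1 @ b1)
r2_fused = r2(tph, X1 @ b1 + X2 @ b2)
```

Because the second sensor explains variance the first cannot see, the fused fit dominates the stage-1 fit, which is the mechanism behind the reported RPD improvement.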

  15. Calibrated Multivariate Regression with Application to Neural Semantic Basis Discovery.

    PubMed

    Liu, Han; Wang, Lie; Zhao, Tuo

    2015-08-01

    We propose a calibrated multivariate regression method named CMR for fitting high dimensional multivariate regression models. Compared with existing methods, CMR calibrates regularization for each regression task with respect to its noise level so that it simultaneously attains improved finite-sample performance and tuning insensitiveness. Theoretically, we provide sufficient conditions under which CMR achieves the optimal rate of convergence in parameter estimation. Computationally, we propose an efficient smoothed proximal gradient algorithm with a worst-case numerical rate of convergence O(1/ϵ), where ϵ is a pre-specified accuracy of the objective function value. We conduct thorough numerical simulations to illustrate that CMR consistently outperforms other high dimensional multivariate regression methods. We also apply CMR to solve a brain activity prediction problem and find that it is as competitive as a handcrafted model created by human experts. The R package camel implementing the proposed method is available on the Comprehensive R Archive Network http://cran.r-project.org/web/packages/camel/.

  16. FIRE: an SPSS program for variable selection in multiple linear regression analysis via the relative importance of predictors.

    PubMed

    Lorenzo-Seva, Urbano; Ferrando, Pere J

    2011-03-01

    We provide an SPSS program that implements currently recommended techniques and recent developments for selecting variables in multiple linear regression analysis via the relative importance of predictors. The approach consists of: (1) optimally splitting the data for cross-validation, (2) selecting the final set of predictors to be retained in the regression equation, and (3) assessing the behavior of the chosen model using standard indices and procedures. The SPSS syntax, a short manual, and data files related to this article are available as supplemental materials from brm.psychonomic-journals.org/content/supplemental.

  17. Optimal Reflectance, Transmittance, and Absorptance Wavebands and Band Ratios for the Estimation of Leaf Chlorophyll Concentration

    NASA Technical Reports Server (NTRS)

    Carter, Gregory A.; Spiering, Bruce A.

    2000-01-01

    The present study utilized regression analysis to identify wavebands and band ratios within the 400-850 nm range that could be used to estimate total chlorophyll concentration with minimal error, and to determine which simple regression models were most effective in estimating that concentration. Reflectance, transmittance, absorptance, and chlorophyll concentrations were measured for two broadleaved species, a broadleaved vine, a needle-leaved conifer, and a representative of the grass family. Overall, reflectance, transmittance, and absorptance corresponded most precisely with chlorophyll concentration at wavelengths near 700 nm, although regressions were strong as well in the 550-625 nm range.

  18. Detecting influential observations in nonlinear regression modeling of groundwater flow

    USGS Publications Warehouse

    Yager, Richard M.

    1998-01-01

    Nonlinear regression is used to estimate optimal parameter values in models of groundwater flow to ensure that differences between predicted and observed heads and flows do not result from nonoptimal parameter values. Parameter estimates can be affected, however, by observations that disproportionately influence the regression, such as outliers that exert undue leverage on the objective function. Certain statistics developed for linear regression can be used to detect influential observations in nonlinear regression if the models are approximately linear. This paper discusses the application of Cook's D, which measures the effect of omitting a single observation on a set of estimated parameter values, and the statistical parameter DFBETAS, which quantifies the influence of an observation on each parameter. The influence statistics were used to (1) identify the influential observations in the calibration of a three-dimensional, groundwater flow model of a fractured-rock aquifer through nonlinear regression, and (2) quantify the effect of omitting influential observations on the set of estimated parameter values. Comparison of the spatial distribution of Cook's D with plots of model sensitivity shows that influential observations correspond to areas where the model heads are most sensitive to certain parameters, and where predicted groundwater flow rates are largest. Five of the six discharge observations were identified as influential, indicating that reliable measurements of groundwater flow rates are valuable data in model calibration. DFBETAS are computed and examined for an alternative model of the aquifer system to identify a parameterization error in the model design that resulted in overestimation of the effect of anisotropy on horizontal hydraulic conductivity.
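
Cook's D, as described above, is straightforward to compute for a linear (or approximately linear) model from the residuals and the hat-matrix leverages. This sketch plants one high-leverage outlier in simulated data and flags it; the data and the outlier are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(11)
n, p = 30, 2
x = rng.uniform(0.0, 10.0, n)
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.4, n)
x[0], y[0] = 10.0, 12.0                          # planted influential outlier

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
h = np.sum((X @ XtX_inv) * X, axis=1)            # leverages (hat-matrix diagonal)
beta = XtX_inv @ X.T @ y
e = y - X @ beta                                 # residuals
s2 = e @ e / (n - p)                             # residual variance estimate
cooks_d = e**2 * h / (s2 * p * (1 - h) ** 2)     # Cook's distance per observation
```

The planted point combines a large residual with high leverage, so its Cook's D dominates; DFBETAS refines this by attributing the influence to individual parameters.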

  19. The dynamic model of enterprise revenue management

    NASA Astrophysics Data System (ADS)

    Mitsel, A. A.; Kataev, M. Yu; Kozlov, S. V.; Korepanov, K. V.

    2017-01-01

    The article presents a dynamic model of enterprise revenue management based on a quadratic criterion and a linear control law, and founded on a multiple regression that links revenue with the financial performance of the enterprise. As a result, optimal management that provides the given enterprise revenue is obtained; that is, the values of the financial indicators that ensure the planned profit of the organization are determined.

  20. Incorporating Uncertainty into Spacecraft Mission and Trajectory Design

    NASA Astrophysics Data System (ADS)

    Feldhacker, Juliana D.

    The complex nature of many astrodynamic systems often leads to high computational costs or degraded accuracy in the analysis and design of spacecraft missions, and the incorporation of uncertainty into the trajectory optimization process often becomes intractable. This research applies mathematical modeling techniques to reduce computational cost and improve tractability for design, optimization, uncertainty quantification (UQ) and sensitivity analysis (SA) in astrodynamic systems and develops a method for trajectory optimization under uncertainty (OUU). This thesis demonstrates the use of surrogate regression models and polynomial chaos expansions for the purpose of design and UQ in the complex three-body system. Results are presented for the application of the models to the design of mid-field rendezvous maneuvers for spacecraft in three-body orbits. The models are shown to provide high accuracy with no a priori knowledge on the sample size required for convergence. Additionally, a method is developed for the direct incorporation of system uncertainties into the design process for the purpose of OUU and robust design; these methods are also applied to the rendezvous problem. It is shown that the models can be used for constrained optimization with orders of magnitude fewer samples than is required for a Monte Carlo approach to the same problem. Finally, this research considers an application for which regression models are not well-suited, namely UQ for the kinetic deflection of potentially hazardous asteroids under the assumptions of real asteroid shape models and uncertainties in the impact trajectory and the surface material properties of the asteroid, which produce a non-smooth system response. An alternate set of models is presented that enables analytic computation of the uncertainties in the imparted momentum from impact.
Use of these models for a survey of asteroids allows conclusions to be drawn on the effects of an asteroid's shape on the ability to successfully divert the asteroid via kinetic impactor.

  1. The Bayesian group lasso for confounded spatial data

    USGS Publications Warehouse

    Hefley, Trevor J.; Hooten, Mevin B.; Hanks, Ephraim M.; Russell, Robin E.; Walsh, Daniel P.

    2017-01-01

    Generalized linear mixed models for spatial processes are widely used in applied statistics. In many applications of the spatial generalized linear mixed model (SGLMM), the goal is to obtain inference about regression coefficients while achieving optimal predictive ability. When implementing the SGLMM, multicollinearity among covariates and the spatial random effects can make computation challenging and influence inference. We present a Bayesian group lasso prior with a single tuning parameter that can be chosen to optimize predictive ability of the SGLMM and jointly regularize the regression coefficients and spatial random effect. We implement the group lasso SGLMM using efficient Markov chain Monte Carlo (MCMC) algorithms and demonstrate how multicollinearity among covariates and the spatial random effect can be monitored as a derived quantity. To test our method, we compared several parameterizations of the SGLMM using simulated data and two examples from plant ecology and disease ecology. In all examples, problematic levels of multicollinearity occurred and influenced sampling efficiency and inference. We found that the group lasso prior resulted in roughly twice the effective sample size for MCMC samples of regression coefficients, and yielded higher and less variable predictive accuracy on out-of-sample data when compared with the standard SGLMM.

  2. Demonstration of leapfrogging for implementing nonlinear model predictive control on a heat exchanger.

    PubMed

    Sridhar, Upasana Manimegalai; Govindarajan, Anand; Rhinehart, R Russell

    2016-01-01

    This work reveals the applicability of a relatively new optimization technique, Leapfrogging, for both nonlinear regression modeling and a methodology for nonlinear model-predictive control. Both are relatively simple, yet effective. The application on a nonlinear, pilot-scale, shell-and-tube heat exchanger reveals practicability of the techniques. Copyright © 2015 ISA. Published by Elsevier Ltd. All rights reserved.

  3. A consensus least squares support vector regression (LS-SVR) for analysis of near-infrared spectra of plant samples.

    PubMed

    Li, Yankun; Shao, Xueguang; Cai, Wensheng

    2007-04-15

    Consensus modeling, which combines the results of multiple independent models to produce a single prediction, avoids the instability of a single model. Based on this principle, a consensus least squares support vector regression (LS-SVR) method for calibrating near-infrared (NIR) spectra is proposed. In the proposed approach, NIR spectra of plant samples were first preprocessed using the discrete wavelet transform (DWT) to filter the spectral background and noise, and the consensus LS-SVR technique was then used to build the calibration model. With an optimization of the parameters involved in the modeling, a satisfactory model was achieved for predicting the content of reducing sugar in plant samples. The predicted results show that the consensus LS-SVR model is more robust and reliable than the conventional partial least squares (PLS) and LS-SVR methods.
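    The consensus idea, averaging the predictions of many independently trained models, can be sketched compactly. Ridge regression stands in below for the LS-SVR base learner (an assumption made for brevity; the paper's base model is LS-SVR on DWT-filtered spectra):

    ```python
    import numpy as np

    def consensus_predict(X, y, X_new, n_models=25, lam=1.0, seed=0):
        """Consensus regression: average the predictions of many independent
        base models, each fit on a bootstrap resample of the calibration set.
        Closed-form ridge regression stands in for the LS-SVR base learner."""
        rng = np.random.default_rng(seed)
        n, p = X.shape
        preds = []
        for _ in range(n_models):
            idx = rng.integers(0, n, size=n)      # bootstrap resample
            Xb, yb = X[idx], y[idx]
            beta = np.linalg.solve(Xb.T @ Xb + lam * np.eye(p), Xb.T @ yb)
            preds.append(X_new @ beta)
        return np.mean(preds, axis=0)             # consensus = average
    ```

    Averaging over resampled fits is what damps the instability of any single calibration model.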

  4. A dynamic multi-level optimal design method with embedded finite-element modeling for power transformers

    NASA Astrophysics Data System (ADS)

    Zhang, Yunpeng; Ho, Siu-lau; Fu, Weinong

    2018-05-01

    This paper proposes a dynamic multi-level optimal design method for power transformer design optimization (TDO) problems. A response surface generated by second-order polynomial regression analysis is updated dynamically by adding more design points, which are selected by Shifted Hammersley Method (SHM) and calculated by finite-element method (FEM). The updating stops when the accuracy requirement is satisfied, and optimized solutions of the preliminary design are derived simultaneously. The optimal design level is modulated through changing the level of error tolerance. Based on the response surface of the preliminary design, a refined optimal design is added using multi-objective genetic algorithm (MOGA). The effectiveness of the proposed optimal design method is validated through a classic three-phase power TDO problem.

  5. Modeling and Predicting the Electrical Conductivity of Composite Cathode for Solid Oxide Fuel Cell by Using Support Vector Regression

    NASA Astrophysics Data System (ADS)

    Tang, J. L.; Cai, C. Z.; Xiao, T. T.; Huang, S. J.

    2012-07-01

    The electrical conductivity of the solid oxide fuel cell (SOFC) cathode is one of the most important indices affecting the efficiency of the SOFC. To improve the performance of the fuel cell system, it is advantageous to have an accurate model with which one can predict the electrical conductivity. In this paper, a model utilizing the support vector regression (SVR) approach combined with the particle swarm optimization (PSO) algorithm for parameter optimization was established to model and predict the electrical conductivity of the Ba0.5Sr0.5Co0.8Fe0.2O3-δ-xSm0.5Sr0.5CoO3-δ (BSCF-xSSC) composite cathode as a function of two influence factors: operating temperature (T) and SSC content (x) in the BSCF-xSSC composite cathode. The leave-one-out cross-validation (LOOCV) test result strongly supports the high generalization ability of the SVR model. The absolute percentage error (APE) of 27 samples does not exceed 0.05%, the mean absolute percentage error (MAPE) of all 30 samples is only 0.09%, and the correlation coefficient (R2) is as high as 0.999. This investigation suggests that the hybrid PSO-SVR approach may be not only a promising and practical methodology for simulating the properties of a fuel cell system, but also a powerful tool for optimally designing or controlling the operating process of an SOFC system.
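    The PSO half of such a hybrid PSO-SVR scheme can be sketched generically. In the paper's setting the objective f would be the cross-validated error of the SVR model as a function of its hyperparameters; here f is any black-box function, and the SVR part is omitted:

    ```python
    import numpy as np

    def pso_minimize(f, bounds, n_particles=30, n_iter=100, seed=0):
        """Particle swarm optimization: each particle remembers its personal
        best and is pulled toward the swarm's global best. In a PSO-SVR
        hybrid, f(p) would be the cross-validated error for hyperparameters
        p; here f is a generic black-box objective."""
        rng = np.random.default_rng(seed)
        lo, hi = np.array(bounds).T
        dim = len(lo)
        x = rng.uniform(lo, hi, size=(n_particles, dim))
        v = np.zeros_like(x)
        pbest, pbest_val = x.copy(), np.array([f(p) for p in x])
        g = pbest[np.argmin(pbest_val)]
        w, c1, c2 = 0.7, 1.5, 1.5          # inertia and attraction weights
        for _ in range(n_iter):
            r1, r2 = rng.random((2, n_particles, dim))
            v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
            x = np.clip(x + v, lo, hi)     # keep particles inside the box
            vals = np.array([f(p) for p in x])
            better = vals < pbest_val
            pbest[better], pbest_val[better] = x[better], vals[better]
            g = pbest[np.argmin(pbest_val)]
        return g, pbest_val.min()
    ```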

  6. Glucose Oxidase Biosensor Modeling and Predictors Optimization by Machine Learning Methods.

    PubMed

    Gonzalez-Navarro, Felix F; Stilianova-Stoytcheva, Margarita; Renteria-Gutierrez, Livier; Belanche-Muñoz, Lluís A; Flores-Rios, Brenda L; Ibarra-Esquer, Jorge E

    2016-10-26

    Biosensors are small analytical devices incorporating a biological recognition element and a physico-chemical transducer to convert a biological signal into an electrical reading. Nowadays, their technological appeal resides in their fast performance, high sensitivity and continuous measuring capabilities; however, a full understanding is still under research. This paper aims to contribute to this growing field of biotechnology, with a focus on Glucose-Oxidase Biosensor (GOB) modeling through statistical learning methods from a regression perspective. We model the amperometric response of a GOB with dependent variables under different conditions, such as temperature, benzoquinone, pH and glucose concentrations, by means of several machine learning algorithms. Since the sensitivity of a GOB response is strongly related to these dependent variables, their interactions should be optimized to maximize the output signal, for which a genetic algorithm and simulated annealing are used. We report a model that shows a good generalization error and is consistent with the optimization.

  7. Optimal population prediction of sandhill crane recruitment based on climate-mediated habitat limitations

    USGS Publications Warehouse

    Gerber, Brian D.; Kendall, William L.; Hooten, Mevin B.; Dubovsky, James A.; Drewien, Roderick C.

    2015-01-01

    Prediction is fundamental to scientific enquiry and application; however, ecologists tend to favour explanatory modelling. We discuss a predictive modelling framework to evaluate ecological hypotheses and to explore novel/unobserved environmental scenarios to assist conservation and management decision-makers. We apply this framework to develop an optimal predictive model for juvenile (<1 year old) sandhill crane Grus canadensis recruitment of the Rocky Mountain Population (RMP). We consider spatial climate predictors motivated by hypotheses of how drought across multiple time-scales and spring/summer weather affects recruitment. Our predictive modelling framework focuses on developing a single model that includes all relevant predictor variables, regardless of collinearity. This model is then optimized for prediction by controlling model complexity using a data-driven approach that marginalizes or removes irrelevant predictors from the model. Specifically, we highlight two approaches of statistical regularization, Bayesian least absolute shrinkage and selection operator (LASSO) and ridge regression. Our optimal predictive Bayesian LASSO and ridge regression models were similar and on average 37% superior in predictive accuracy to an explanatory modelling approach. Our predictive models confirmed a priori hypotheses that drought and cold summers negatively affect juvenile recruitment in the RMP. The effects of long-term drought can be alleviated by short-term wet spring–summer months; however, the alleviation of long-term drought has a much greater positive effect on juvenile recruitment. The number of freezing days and snowpack during the summer months can also negatively affect recruitment, while spring snowpack has a positive effect. Breeding habitat, mediated through climate, is a limiting factor on population growth of sandhill cranes in the RMP, which could become more limiting with a changing climate (i.e. increased drought).
These effects are likely not unique to cranes. The alteration of hydrological patterns and water levels by drought may impact many migratory, wetland nesting birds in the Rocky Mountains and beyond. Generalizable predictive models (trained by out-of-sample fit and based on ecological hypotheses) are needed by conservation and management decision-makers. Statistical regularization improves predictions and provides a general framework for fitting models with a large number of predictors, even those with collinearity, to simultaneously identify an optimal predictive model while conducting rigorous Bayesian model selection. Our framework is important for understanding population dynamics under a changing climate and has direct applications for making harvest and habitat management decisions.
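    Statistical regularization of the kind highlighted above can be illustrated with a frequentist LASSO fit by cyclic coordinate descent. The paper itself places Bayesian LASSO/ridge priors inside MCMC; this sketch only shows how the L1 penalty shrinks irrelevant predictors toward zero:

    ```python
    import numpy as np

    def lasso_cd(X, y, lam, n_iter=200):
        """LASSO by cyclic coordinate descent (X assumed standardized).
        Minimizes 0.5 * ||y - X b||^2 + lam * ||b||_1."""
        n, p = X.shape
        b = np.zeros(p)
        col_ss = (X**2).sum(axis=0)
        for _ in range(n_iter):
            for j in range(p):
                r = y - X @ b + X[:, j] * b[j]   # partial residual for j
                rho = X[:, j] @ r
                # soft-thresholding: small correlations are zeroed out
                b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_ss[j]
        return b
    ```

    With a well-chosen penalty, predictors that carry no signal are removed entirely, which is exactly the "marginalize or remove irrelevant predictors" behaviour the abstract describes.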

  8. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Boutilier, Justin J., E-mail: j.boutilier@mail.utoronto.ca; Lee, Taewoo; Craig, Tim

    Purpose: To develop and evaluate the clinical applicability of advanced machine learning models that simultaneously predict multiple optimization objective function weights from patient geometry for intensity-modulated radiation therapy of prostate cancer. Methods: A previously developed inverse optimization method was applied retrospectively to determine optimal objective function weights for 315 treated patients. The authors used an overlap volume ratio (OV) of bladder and rectum for different PTV expansions and overlap volume histogram slopes (OVSR and OVSB for the rectum and bladder, respectively) as explanatory variables that quantify patient geometry. Using the optimal weights as ground truth, the authors trained and applied three prediction models: logistic regression (LR), multinomial logistic regression (MLR), and weighted K-nearest neighbor (KNN). The population average of the optimal objective function weights was also calculated. Results: The OV at 0.4 cm and OVSR at 0.1 cm features were found to be the most predictive of the weights. The authors observed comparable performance (i.e., no statistically significant difference) between LR, MLR, and KNN methodologies, with LR appearing to perform the best. All three machine learning models outperformed the population average by a statistically significant amount over a range of clinical metrics including bladder/rectum V53Gy, bladder/rectum V70Gy, and dose to the bladder, rectum, CTV, and PTV. When comparing the weights directly, the LR model predicted bladder and rectum weights that had, on average, a 73% and 74% relative improvement over the population average weights, respectively. The treatment plans resulting from the LR weights had, on average, a rectum V70Gy that was 35% closer to the clinical plan and a bladder V70Gy that was 29% closer, compared to the population average weights. Similar results were observed for all other clinical metrics. 
Conclusions: The authors demonstrated that the KNN and MLR weight prediction methodologies perform comparably to the LR model and can produce clinical quality treatment plans by simultaneously predicting multiple weights that capture trade-offs associated with sparing multiple OARs.

  9. Data mining-based coefficient of influence factors optimization of test paper reliability

    NASA Astrophysics Data System (ADS)

    Xu, Peiyao; Jiang, Huiping; Wei, Jieyao

    2018-05-01

    Testing is a significant part of the teaching process; it demonstrates the final outcome of school teaching through teachers' teaching level and students' scores. Test paper analysis is a complex operation in which paper length, time duration and degree of difficulty are related non-linearly, so it is difficult with general methods to optimize the coefficients of the influence factors under different conditions so as to obtain test papers with clearly higher reliability [1]. With data mining techniques such as Support Vector Regression (SVR) and the Genetic Algorithm (GA), the test paper analysis can be modeled and the coefficients of the impact factors optimized for higher reliability. The test results show that the combination of SVR and GA achieves an effective improvement in reliability. The optimized coefficients of the influence factors are practical in actual application, and the whole optimizing procedure can provide a model basis for test paper analysis.

  10. Support vector machine firefly algorithm based optimization of lens system.

    PubMed

    Shamshirband, Shahaboddin; Petković, Dalibor; Pavlović, Nenad T; Ch, Sudheer; Altameem, Torki A; Gani, Abdullah

    2015-01-01

    Lens system design is an important factor in image quality. The main aspect of the lens system design methodology is the optimization procedure. Since optimization is a complex, nonlinear task, soft computing optimization algorithms can be used. There are many tools that can be employed to measure optical performance, but the spot diagram is the most useful: it gives an indication of the image of a point object. In this paper, the spot size radius is considered as the optimization criterion. An intelligent soft computing scheme coupling support vector machines (SVMs) with the firefly algorithm (FFA) is implemented. The performance of the proposed estimators is confirmed by the simulation results. The results of the proposed SVM-FFA model were compared with support vector regression (SVR), artificial neural networks, and genetic programming methods. The results show that the SVM-FFA model performs more accurately than the other methodologies; therefore, SVM-FFA can be used as an efficient soft computing technique in the optimization of lens system designs.

  11. Optimism, well-being, depressive symptoms, and perceived physical health: a study among stroke survivors.

    PubMed

    Shifren, Kim; Anzaldi, Kristen

    2018-01-01

    The investigation of the relation of positive personality characteristics to mental and physical health among stroke survivors has been a neglected area of research. The purpose of this study was to examine the relationship between optimism, well-being, depressive symptoms, and perceived physical health among stroke survivors. It was hypothesized that stroke survivors' optimism would explain variance in their physical health above and beyond the variance explained by demographic variables, diagnostic variables, and mental health. One hundred seventy-six stroke survivors (97 females, 79 males) completed the Revised Life Orientation Test, the Center for Epidemiological Studies Depression Scale, two items on perceived physical health from the 36-item Short Form of the Medical Outcomes study, and the Identity scale of the Illness Perception Questionnaire. Pearson correlations, hierarchical regression analyses, and the PROCESS approach to determining mediators were used to assess hypothesized relations between variables. Stroke survivors' level of optimism explained additional variance in overall health in regression models controlling for demographic and diagnostic variables, and mental health. Analyses revealed that optimism played a partial mediator role between the mental health variables (well-being, depressive symptoms and total score on the CES-D) and overall health.

  12. Optimal Fermentation Conditions of Hyaluronidase Inhibition Activity on Asparagus cochinchinensis Merrill by Weissella cibaria.

    PubMed

    Kim, Minji; Kim, Won-Baek; Koo, Kyoung Yoon; Kim, Bo Ram; Kim, Doohyun; Lee, Seoyoun; Son, Hong Joo; Hwang, Dae Youn; Kim, Dong Seob; Lee, Chung Yeoul; Lee, Heeseob

    2017-04-28

    This study was conducted to evaluate the hyaluronidase (HAase) inhibition activity of Asparagus cochinchinensis (AC) extracts following fermentation by Weissella cibaria, using response surface methodology. To optimize the HAase inhibition activity, a central composite design was introduced based on four variables: the concentration of AC extract (X1: 1-5%), amount of starter culture (X2: 1-5%), pH (X3: 4-8), and fermentation time (X4: 0-10 days). The experimental data were fitted to quadratic regression equations, the accuracy of the equations was analyzed by ANOVA, and the regression coefficients for the surface quadratic model of HAase inhibition activity in the fermented AC extract were estimated by the F test and the corresponding p values. The HAase inhibition activity indicated that fermentation time was the most significant of the parameters within the conditions tested. To validate the model, two different conditions among those generated by the Design Expert program were selected. Under both conditions, predicted and experimental data agreed well. Moreover, the content of protodioscin (a well-known compound related to anti-inflammatory activity) was elevated after fermentation of the AC extract under the optimized fermentation conditions.
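    The quadratic response-surface fitting step used in studies like this one can be sketched as ordinary least squares on an expanded design matrix. This is a generic second-order model, not the Design Expert output:

    ```python
    import numpy as np

    def fit_quadratic_surface(Xf, y):
        """Fit the full second-order response-surface model
        y = b0 + sum_i bi*xi + sum_{i<j} bij*xi*xj + sum_i bii*xi^2
        by least squares. Xf: (n, k) factor settings.
        Returns coefficients ordered [1, linear..., cross..., squared...]."""
        n, k = Xf.shape
        cols = [np.ones(n)]
        cols += [Xf[:, i] for i in range(k)]                        # linear
        cols += [Xf[:, i] * Xf[:, j]
                 for i in range(k) for j in range(i + 1, k)]        # cross
        cols += [Xf[:, i]**2 for i in range(k)]                     # squared
        A = np.column_stack(cols)
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        return coef
    ```

    With the surface in hand, the stationary point (the candidate optimum of the fermentation conditions) follows from setting the gradient of the fitted quadratic to zero.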

  13. [Optimal extraction of effective constituents from Aralia elata by central composite design and response surface methodology].

    PubMed

    Lv, Shao-Wa; Liu, Dong; Hu, Pan-Pan; Ye, Xu-Yan; Xiao, Hong-Bin; Kuang, Hai-Xue

    2010-03-01

    To optimize the process of extracting effective constituents from Aralia elata by response surface methodology. The independent variables were ethanol concentration, reflux time and solvent fold; the dependent variable was the extraction rate of total saponins from Aralia elata. Linear or non-linear mathematical models were used to estimate the relationship between the independent and dependent variables, and response surface methodology was used to optimize the extraction process. The prediction was validated by comparing observed and predicted values. The regression coefficient of the fitted binomial (second-order) model was as high as 0.9617, and the optimum extraction conditions were 70% ethanol, 2.5 hours of reflux, 20-fold solvent and 3 extractions. The bias between observed and predicted values was -2.41%, indicating that the optimized model is highly predictive.

  14. Statistical optimization of medium components and growth conditions by response surface methodology to enhance phenol degradation by Pseudomonas putida.

    PubMed

    Annadurai, Gurusamy; Ling, Lai Yi; Lee, Jiunn-Fwu

    2008-02-28

    In this work, a four-factor Box-Behnken design was employed in combination with response surface methodology (RSM) to optimize the medium composition for the degradation of phenol by Pseudomonas putida (ATCC 31800). A mathematical model was then developed to show the effect of each medium component and their interactions on the biodegradation of phenol. The response surface method used four factors (glucose, yeast extract, ammonium sulfate and sodium chloride), which also enabled the identification of significant interaction effects in the batch studies. The biodegradation of phenol by Pseudomonas putida (ATCC 31800) was determined to be pH-dependent, and the degradation capacity of the microorganism was maximal at 30 degrees C when the phenol concentration was 0.2 g/L and the pH of the solution was 7.0. A second-order polynomial regression model was used for analysis of the experiment. Cubic and quadratic terms were incorporated into the regression model through variable selection procedures. The experimental values were in good agreement with the predicted values, and the correlation coefficient was found to be 0.9980.

  15. Cross-validation pitfalls when selecting and assessing regression and classification models.

    PubMed

    Krstajic, Damjan; Buturovic, Ljubomir J; Leahy, David E; Thomas, Simon

    2014-03-29

    We address the problem of selecting and assessing classification and regression models using cross-validation. Current state-of-the-art methods can yield models with high variance, rendering them unsuitable for a number of practical applications including QSAR. In this paper we describe and evaluate best practices which improve reliability and increase confidence in selected models. A key operational component of the proposed methods is cloud computing which enables routine use of previously infeasible approaches. We describe in detail an algorithm for repeated grid-search V-fold cross-validation for parameter tuning in classification and regression, and we define a repeated nested cross-validation algorithm for model assessment. As regards variable selection and parameter tuning we define two algorithms (repeated grid-search cross-validation and double cross-validation), and provide arguments for using the repeated grid-search in the general case. We show results of our algorithms on seven QSAR datasets. The variation of the prediction performance, which is the result of choosing different splits of the dataset in V-fold cross-validation, needs to be taken into account when selecting and assessing classification and regression models. We demonstrate the importance of repeating cross-validation when selecting an optimal model, as well as the importance of repeating nested cross-validation when assessing a prediction error.
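    A minimal version of the repeated V-fold cross-validation loop described above. It is model-agnostic (fit/predict are passed in as callables) and averages the per-fold test error over several independent re-partitions of the data:

    ```python
    import numpy as np

    def repeated_vfold_mse(X, y, fit, predict, v=5, repeats=10, seed=0):
        """Repeated V-fold cross-validation: re-split the data `repeats`
        times and average the per-fold test MSE, so the estimate does not
        hinge on one arbitrary partition of the data."""
        rng = np.random.default_rng(seed)
        n = len(y)
        scores = []
        for _ in range(repeats):
            idx = rng.permutation(n)               # a fresh random split
            folds = np.array_split(idx, v)
            for test_idx in folds:
                train_idx = np.setdiff1d(idx, test_idx)
                model = fit(X[train_idx], y[train_idx])
                err = y[test_idx] - predict(model, X[test_idx])
                scores.append(np.mean(err**2))
        return np.mean(scores), np.std(scores)
    ```

    The spread of the per-fold scores makes the split-to-split variation visible, which is precisely the variability the paper argues must be accounted for when selecting a model.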

  16. Plateletpheresis efficiency and mathematical correction of software-derived platelet yield prediction: A linear regression and ROC modeling approach.

    PubMed

    Jaime-Pérez, José Carlos; Jiménez-Castillo, Raúl Alberto; Vázquez-Hernández, Karina Elizabeth; Salazar-Riojas, Rosario; Méndez-Ramírez, Nereida; Gómez-Almaguer, David

    2017-10-01

    Advances in automated cell separators have improved the efficiency of plateletpheresis and the possibility of obtaining double products (DP). We assessed cell processor accuracy of predicted platelet (PLT) yields with the goal of a better prediction of DP collections. This retrospective proof-of-concept study included 302 plateletpheresis procedures performed on a Trima Accel v6.0 at the apheresis unit of a hematology department. Donor variables, software-predicted yield and actual PLT yield were statistically evaluated. Software prediction was optimized by linear regression analysis and its optimal cut-off to obtain a DP assessed by receiver operating characteristic (ROC) curve modeling. Three hundred and two plateletpheresis procedures were performed; in 271 (89.7%) of these the donors were men and in 31 (10.3%) women. Pre-donation PLT count had the best direct correlation with actual PLT yield (r = 0.486, P < .001). Means of software machine-derived values differed significantly from actual PLT yield, 4.72 × 10^11 vs. 6.12 × 10^11, respectively (P < .001). The following equation was developed to adjust these values: actual PLT yield = 0.221 + (1.254 × theoretical platelet yield). The ROC curve model showed an optimal apheresis device software prediction cut-off of 4.65 × 10^11 to obtain a DP, with a sensitivity of 82.2%, specificity of 93.3%, and an area under the curve (AUC) of 0.909. Trima Accel v6.0 software consistently underestimated PLT yields. Simple correction derived from linear regression analysis accurately corrected this underestimation and ROC analysis identified a precise cut-off to reliably predict a DP. © 2016 Wiley Periodicals, Inc.
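    The reported correction equation and the cutoff logic can be sketched directly. The double-product threshold used below (6 × 10^11) is illustrative only, not a value taken from the study:

    ```python
    import numpy as np

    def corrected_yield(software_pred):
        """Linear correction of the device's predicted PLT yield (×10^11),
        using the regression equation reported in the abstract."""
        return 0.221 + 1.254 * np.asarray(software_pred)

    def sens_spec_at_cutoff(pred, actual, cutoff=4.65, dp_threshold=6.0):
        """Sensitivity/specificity of flagging a double product (actual
        yield >= dp_threshold) when the raw software prediction exceeds
        the cutoff. dp_threshold is an assumed illustrative value."""
        pred, actual = np.asarray(pred), np.asarray(actual)
        is_dp = actual >= dp_threshold
        flagged = pred >= cutoff
        sens = np.mean(flagged[is_dp]) if is_dp.any() else np.nan
        spec = np.mean(~flagged[~is_dp]) if (~is_dp).any() else np.nan
        return sens, spec
    ```

    Applying the correction to the mean software value in the abstract (4.72 × 10^11) recovers roughly the mean actual yield it reports (≈6.12-6.14 × 10^11), which is the point of the regression adjustment.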

  17. A Machine-Learning and Filtering Based Data Assimilation Framework for Geologic Carbon Sequestration Monitoring Optimization

    NASA Astrophysics Data System (ADS)

    Chen, B.; Harp, D. R.; Lin, Y.; Keating, E. H.; Pawar, R.

    2017-12-01

    Monitoring is a crucial aspect of geologic carbon sequestration (GCS) risk management. It has gained importance as a means to ensure CO2 is safely and permanently stored underground throughout the lifecycle of a GCS project. Three issues are often involved in a monitoring project: (i) where is the optimal location to place the monitoring well(s), (ii) what type of data (pressure, rate and/or CO2 concentration) should be measured, and (iii) what is the optimal frequency to collect the data. In order to address these important issues, a filtering-based data assimilation procedure is developed to perform the monitoring optimization. The optimal monitoring strategy is selected based on the uncertainty reduction of the objective of interest (e.g., cumulative CO2 leak) over all potential monitoring strategies. To reduce the computational cost of the filtering-based data assimilation process, two machine-learning algorithms, Support Vector Regression (SVR) and Multivariate Adaptive Regression Splines (MARS), are used to develop computationally efficient reduced-order models (ROMs) from full numerical simulations of CO2 and brine flow. The proposed framework for GCS monitoring optimization is demonstrated with two examples: a simple 3D synthetic case and a real field case, the Rock Spring Uplift carbon storage site in southwestern Wyoming.

  18. Product unit neural network models for predicting the growth limits of Listeria monocytogenes.

    PubMed

    Valero, A; Hervás, C; García-Gimeno, R M; Zurera, G

    2007-08-01

    A new approach to predicting the growth/no growth interface of Listeria monocytogenes as a function of storage temperature, pH, citric acid (CA) and ascorbic acid (AA) is presented. A linear logistic regression procedure was performed, and a non-linear model was obtained by adding new variables generated by a Neural Network model based on Product Units (PUNN). The classification efficiency on the training data set and the generalization data of the new Logistic Regression PUNN model (LRPU) was compared with Linear Logistic Regression (LLR) and Polynomial Logistic Regression (PLR) models. 92% of the total cases from the LRPU model were correctly classified, an improvement on the percentage obtained using the PLR model (90%) and significantly higher than the results obtained with the LLR model (80%). Moreover, the predictions of the LRPU model were closer to the observed data, which makes it possible to design proper formulations for minimally processed foods. This novel methodology can be applied in predictive microbiology to describe the growth/no growth interface of food-borne microorganisms such as L. monocytogenes. The optimal balance lies in finding models with acceptable interpretability and a good ability to fit the data at the boundaries of the variable ranges. The results suggest that these kinds of models might well be a very valuable tool for mathematical modeling.
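    The linear logistic regression baseline (LLR) referenced above can be sketched with plain gradient descent; product-unit terms from a PUNN would enter simply as extra columns of the design matrix. This is a generic sketch, not the authors' implementation:

    ```python
    import numpy as np

    def fit_logistic(X, y, lr=0.1, n_iter=2000):
        """Linear logistic regression by gradient ascent on the
        log-likelihood: the baseline growth/no-growth boundary model.
        Non-linear (e.g. product-unit) features would be appended to X."""
        Xb = np.column_stack([np.ones(len(y)), X])   # add intercept
        w = np.zeros(Xb.shape[1])
        for _ in range(n_iter):
            p = 1.0 / (1.0 + np.exp(-Xb @ w))        # growth probability
            w += lr * Xb.T @ (y - p) / len(y)        # likelihood gradient
        return w

    def predict_growth(w, X):
        """Classify growth (1) vs no growth (0) at the 0.5 boundary."""
        Xb = np.column_stack([np.ones(len(X)), X])
        return (1.0 / (1.0 + np.exp(-Xb @ w)) >= 0.5).astype(int)
    ```

    The decision boundary w·x = 0 is the fitted growth/no-growth interface; richer feature sets move it from a hyperplane to a curved surface.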

  19. Integrated design and manufacturing for the high speed civil transport (a combined aerodynamics/propulsion optimization study)

    NASA Technical Reports Server (NTRS)

    Baecher, Juergen; Bandte, Oliver; DeLaurentis, Dan; Lewis, Kemper; Sicilia, Jose; Soboleski, Craig

    1995-01-01

    This report documents the efforts of a Georgia Tech High Speed Civil Transport (HSCT) aerospace student design team in completing a design methodology demonstration under NASA's Advanced Design Program (ADP). Aerodynamic and propulsion analyses are integrated into the synthesis code FLOPS in order to improve its prediction accuracy. Executing the integrated product and process development (IPPD) methodology proposed at the Aerospace Systems Design Laboratory (ASDL), an improved sizing process is described followed by a combined aero-propulsion optimization, where the objective function, average yield per revenue passenger mile ($/RPM), is constrained by flight stability, noise, approach speed, and field length restrictions. Primary goals include successful demonstration of the application of the response surface methodology (RSM) to parameter design, introduction to higher fidelity disciplinary analysis than normally feasible at the conceptual and early preliminary level, and investigations of relationships between aerodynamic and propulsion design parameters and their effect on the objective function, $/RPM. A unique approach to aircraft synthesis is developed in which statistical methods, specifically design of experiments and the RSM, are used to more efficiently search the design space for optimum configurations. In particular, two uses of these techniques are demonstrated. First, response model equations are formed which represent complex analysis in the form of a regression polynomial. Next, a second regression equation is constructed, not for modeling purposes, but instead for the purpose of optimization at the system level. Such an optimization problem with the given tools normally would be difficult due to the need for hard connections between the various complex codes involved. The statistical methodology presents an alternative and is demonstrated via an example of aerodynamic modeling and planform optimization for a HSCT.

  20. TH-E-BRF-06: Kinetic Modeling of Tumor Response to Fractionated Radiotherapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhong, H; Gordon, J; Chetty, I

    2014-06-15

Purpose: Accurate calibration of radiobiological parameters is crucial to predicting radiation treatment response. Modeling differences may have a significant impact on calibrated parameters. In this study, we have integrated two existing models with kinetic differential equations to formulate a new tumor regression model for calibrating radiobiological parameters for individual patients. Methods: A system of differential equations that characterizes the birth-and-death process of tumor cells in radiation treatment was analytically solved. The solution of this system was used to construct an iterative model (Z-model). The model consists of three parameters: tumor doubling time Td, half-life of dying cells Tr and cell survival fraction SFD under dose D. The Jacobian determinant of this model was proposed as a constraint to optimize the three parameters for six head and neck cancer patients. The derived parameters were compared with those generated from the two existing models, Chvetsov model (C-model) and Lim model (L-model). The C-model and L-model were optimized with the parameter Td fixed. Results: With the Jacobian-constrained Z-model, the mean of the optimized cell survival fractions is 0.43±0.08, and the half-life of dying cells averaged over the six patients is 17.5±3.2 days. The parameters Tr and SFD optimized with the Z-model differ by 1.2% and 20.3% from those optimized with the Td-fixed C-model, and by 32.1% and 112.3% from those optimized with the Td-fixed L-model, respectively. Conclusion: The Z-model was analytically constructed from the cell-population differential equations to describe changes in the number of different tumor cells during the course of fractionated radiation treatment. The Jacobian constraints were proposed to optimize the three radiobiological parameters. The developed modeling and optimization methods may help develop high-quality treatment regimens for individual patients.

  1. A surrogate model for thermal characteristics of stratospheric airship

    NASA Astrophysics Data System (ADS)

    Zhao, Da; Liu, Dongxu; Zhu, Ming

    2018-06-01

A simple and accurate surrogate model is urgently needed to reduce the complexity of analyzing the thermal characteristics of a stratospheric airship. In this paper, a surrogate model based on Least Squares Support Vector Regression (LSSVR) is proposed, with the Gravitational Search Algorithm (GSA) used to optimize its hyperparameters. A novel framework consisting of a preprocessing classifier and two regression models is designed to train the surrogate model. Various temperature datasets of the airship envelope and the internal gas are obtained from a three-dimensional transient thermal model. Using these thermal datasets, two-factor and multi-factor surrogate models are trained and several comparison simulations are conducted. Results illustrate that the surrogate models based on LSSVR-GSA have good fitting and generalization abilities. The pre-treated classification strategy proposed in this paper plays a significant role in improving the accuracy of the surrogate model.
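The LSSVR-GSA pipeline above can be approximated with standard tools. A minimal sketch, assuming synthetic envelope-temperature data, substituting kernel ridge regression (the least-squares analogue of SVR) for LSSVR and a plain grid search for the Gravitational Search Algorithm:

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical envelope-temperature data: temperature (K) as a smooth function
# of solar elevation (deg) and float altitude (km); this stands in for the
# output of the paper's three-dimensional transient thermal model.
X = rng.uniform([0.0, 18.0], [90.0, 22.0], size=(300, 2))
y = 220.0 + 0.5 * X[:, 0] - 2.0 * (X[:, 1] - 20.0) ** 2 + rng.normal(0, 0.5, 300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Kernel ridge with an RBF kernel is the least-squares analogue of SVR; a
# plain cross-validated grid search stands in for the GSA hyperparameter tuner.
search = GridSearchCV(
    Pipeline([("scale", StandardScaler()), ("kr", KernelRidge(kernel="rbf"))]),
    {"kr__alpha": [1e-3, 1e-2, 1e-1], "kr__gamma": [0.1, 1.0, 10.0]},
    cv=5,
)
search.fit(X_tr, y_tr)
r2 = search.score(X_te, y_te)
print(f"held-out R^2 = {r2:.3f}")
```

The held-out R^2 here plays the role of the paper's generalization check; the preprocessing classifier stage is omitted.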

  2. Explicit criteria for prioritization of cataract surgery

    PubMed Central

    Ma Quintana, José; Escobar, Antonio; Bilbao, Amaia

    2006-01-01

    Background Consensus techniques have been used previously to create explicit criteria to prioritize cataract extraction; however, the appropriateness of the intervention was not included explicitly in previous studies. We developed a prioritization tool for cataract extraction according to the RAND method. Methods Criteria were developed using a modified Delphi panel judgment process. A panel of 11 ophthalmologists was assembled. Ratings were analyzed regarding the level of agreement among panelists. We studied the effect of all variables on the final panel score using general linear and logistic regression models. Priority scoring systems were developed by means of optimal scaling and general linear models. The explicit criteria developed were summarized by means of regression tree analysis. Results Eight variables were considered to create the indications. Of the 310 indications that the panel evaluated, 22.6% were considered high priority, 52.3% intermediate priority, and 25.2% low priority. Agreement was reached for 31.9% of the indications and disagreement for 0.3%. Logistic regression and general linear models showed that the preoperative visual acuity of the cataractous eye, visual function, and anticipated visual acuity postoperatively were the most influential variables. Alternative and simple scoring systems were obtained by optimal scaling and general linear models where the previous variables were also the most important. The decision tree also shows the importance of the previous variables and the appropriateness of the intervention. Conclusion Our results showed acceptable validity as an evaluation and management tool for prioritizing cataract extraction. It also provides easy algorithms for use in clinical practice. PMID:16512893

  3. A modified temporal criterion to meta-optimize the extended Kalman filter for land cover classification of remotely sensed time series

    NASA Astrophysics Data System (ADS)

    Salmon, B. P.; Kleynhans, W.; Olivier, J. C.; van den Bergh, F.; Wessels, K. J.

    2018-05-01

Humans are transforming land cover at an ever-increasing rate. Accurate geographical maps of land cover, especially of rural and urban settlements, are essential to planning sustainable development. Time series extracted from MODerate resolution Imaging Spectroradiometer (MODIS) land surface reflectance products have been used to differentiate land cover classes by analyzing the seasonal patterns in reflectance values. Properly fitting a parametric model to these time series usually requires several adjustments to the regression method. To reduce the workload, the regression method's parameters are usually set globally for a geographical area. In this work we have modified a meta-optimization approach to set the regression method's parameters on a per-time-series basis. The standard deviation of the model parameters and the magnitude of the residuals are used as the scoring function. We successfully fitted a triply modulated model to the seasonal patterns of our study area using a non-linear extended Kalman filter (EKF). The approach uses temporal information, which significantly reduces the processing time and storage requirements for each time series. It also derives reliability metrics for each time series individually. The features extracted using the proposed method are classified with a support vector machine, and the performance of the method is compared to the original approach on our ground truth data.
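A minimal sketch of fitting a triply modulated cosine (mean, amplitude, phase) to a simulated seasonal series with an EKF, assuming a known annual frequency and hypothetical reflectance values; the real study uses MODIS data and a richer meta-optimized scoring step:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated 8-day composite reflectance series with an annual cycle
# (hypothetical values; the study uses MODIS land-surface reflectance).
n, omega = 138, 2 * np.pi / 46.0          # 46 observations per year, 3 years
t = np.arange(n)
mu_true, alpha_true, phi_true = 0.30, 0.12, 0.8
y = mu_true + alpha_true * np.cos(omega * t + phi_true) + rng.normal(0, 0.01, n)

# EKF state x = (mean, amplitude, phase) of the triply modulated cosine.
x = np.array([0.2, 0.05, 0.0])            # rough initial guess
P = np.eye(3) * 0.1
Q = np.eye(3) * 1e-6                      # slow random-walk process noise
R = 0.01 ** 2                             # observation noise variance

for k in range(n):
    P = P + Q                             # predict (identity state transition)
    mu, a, phi = x
    h = mu + a * np.cos(omega * k + phi)  # predicted observation
    # Jacobian of h with respect to the state (linearization for the EKF).
    H = np.array([1.0, np.cos(omega * k + phi), -a * np.sin(omega * k + phi)])
    S = H @ P @ H + R
    K = P @ H / S                         # Kalman gain
    x = x + K * (y[k] - h)                # update with the innovation
    P = P - np.outer(K, H @ P)

print("estimated (mean, amplitude):", x[0].round(3), x[1].round(3))
```

The filtered state after one pass gives the per-series model parameters; the paper's meta-optimization would then score them by parameter stability and residual size.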

  4. Mortality risk prediction in burn injury: Comparison of logistic regression with machine learning approaches.

    PubMed

    Stylianou, Neophytos; Akbarov, Artur; Kontopantelis, Evangelos; Buchan, Iain; Dunn, Ken W

    2015-08-01

    Predicting mortality from burn injury has traditionally employed logistic regression models. Alternative machine learning methods have been introduced in some areas of clinical prediction as the necessary software and computational facilities have become accessible. Here we compare logistic regression and machine learning predictions of mortality from burn. An established logistic mortality model was compared to machine learning methods (artificial neural network, support vector machine, random forests and naïve Bayes) using a population-based (England & Wales) case-cohort registry. Predictive evaluation used: area under the receiver operating characteristic curve; sensitivity; specificity; positive predictive value and Youden's index. All methods had comparable discriminatory abilities, similar sensitivities, specificities and positive predictive values. Although some machine learning methods performed marginally better than logistic regression the differences were seldom statistically significant and clinically insubstantial. Random forests were marginally better for high positive predictive value and reasonable sensitivity. Neural networks yielded slightly better prediction overall. Logistic regression gives an optimal mix of performance and interpretability. The established logistic regression model of burn mortality performs well against more complex alternatives. Clinical prediction with a small set of strong, stable, independent predictors is unlikely to gain much from machine learning outside specialist research contexts. Copyright © 2015 Elsevier Ltd and ISBI. All rights reserved.
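The comparison protocol can be sketched as follows, assuming a public dataset in place of the (non-distributed) England & Wales burn registry and two of the compared model families:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# A public binary-outcome dataset stands in for the burn registry.
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "logistic": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
aucs = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    # Area under the ROC curve, the paper's primary discrimination metric.
    aucs[name] = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {aucs[name]:.3f}")
```

On well-behaved tabular data the two AUCs typically land close together, which mirrors the abstract's finding that the methods' discriminatory abilities are comparable.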

  5. Information Search in Judgment Tasks: The Effects of Unequal Cue Validity and Cost.

    DTIC Science & Technology

    1984-05-01

bookbag before betting on the contents of the bag being sampled (Edwards, 1965). They proposed an alternative model for the regression (or continuous...ly displaced vertically for clarity.) The analogous relationship for the Bayesian model is developed by Edwards (1965). Snapper and Peterson (1971...A regression model and some preliminary findings." Organizational Behavior and Human Performance, 1982, 30, 330-350. Edwards, W.: "Optimal

  6. Measures of Residual Risk with Connections to Regression, Risk Tracking, Surrogate Models, and Ambiguity

    DTIC Science & Technology

    2015-05-20

original variable. Residual risk can be exemplified as a quantification of the improved situation faced by a hedging investor compared to that of a...distributional information about Yx for every x as well as the computational cost of evaluating R(Yx) for numerous x, for example within an optimization...Still, when g is costly to evaluate, it might be desirable to develop an approximation of R(Yx), x ∈ IRn through regression based on observations {xj

  7. An RSM Study of the Effects of Simulation Work and Metamodel Specification on the Statistical Quality of Metamodel Estimates

    DTIC Science & Technology

    1994-03-01

optimize, and perform "what-if" analysis on a complicated simulation model of the greenhouse effect. Regression metamodels were applied to several modules of...the large integrated assessment model of the greenhouse effect. In this study, the metamodels gave "acceptable forecast errors" and were shown to

  8. Launch Vehicle Propulsion Design with Multiple Selection Criteria

    NASA Technical Reports Server (NTRS)

    Shelton, Joey D.; Frederick, Robert A.; Wilhite, Alan W.

    2005-01-01

The techniques described herein define an optimization and evaluation approach for a liquid hydrogen/liquid oxygen single-stage-to-orbit system. The method uses Monte Carlo simulations, genetic algorithm solvers, a propulsion thermo-chemical code, power series regression curves for historical data, and statistical models in order to optimize a vehicle system. The system, including parameters for engine chamber pressure, area ratio, and oxidizer/fuel ratio, was modeled and optimized to determine the best design for seven separate design weight and cost cases by varying design and technology parameters. Significant model results show that a 53% increase in Design, Development, Test and Evaluation cost results in a 67% reduction in Gross Liftoff Weight. Other key findings show the sensitivity of propulsion parameters, technology factors, and cost factors, and how these parameters differ when cost and weight are optimized separately. Each of the three key propulsion parameters (chamber pressure, area ratio, and oxidizer/fuel ratio) is optimized in the seven design cases, and results are plotted to show impacts to engine mass and overall vehicle mass.

  9. Box-Behnken design for investigation of microwave-assisted extraction of patchouli oil

    NASA Astrophysics Data System (ADS)

    Kusuma, Heri Septya; Mahfud, Mahfud

    2015-12-01

The microwave-assisted extraction (MAE) technique was employed to extract the essential oil from patchouli (Pogostemon cablin). The optimal conditions for microwave-assisted extraction of patchouli oil were determined by response surface methodology. A Box-Behnken design (BBD) was applied to evaluate the effects of three independent variables (microwave power (A: 400-800 W), plant material to solvent ratio (B: 0.10-0.20 g mL-1) and extraction time (C: 20-60 min)) on the extraction yield of patchouli oil. The correlation analysis of the mathematical-regression model indicated that a quadratic polynomial model could be employed to optimize the microwave extraction of patchouli oil. The optimal extraction conditions for patchouli oil were a microwave power of 634.024 W, a plant material to solvent ratio of 0.147648 g mL-1 and an extraction time of 51.6174 min. The maximum patchouli oil yield was 2.80516% under these optimal conditions. Under these conditions, the experimental values agreed with the results predicted by analysis of variance, indicating the high fitness of the model used and the success of response surface methodology in optimizing and reflecting the expected extraction conditions.
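A hedged sketch of the RSM workflow, assuming a hypothetical yield surface over coded (-1 to 1) factors rather than the paper's measured data: fit a full quadratic polynomial to the design runs, then maximize the fitted surface inside the design region:

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(2)

# Hypothetical yield surface in coded factors: power, ratio, time.
def true_yield(x):
    return (2.5 + 0.3 * x[..., 0] + 0.2 * x[..., 2]
            - 0.4 * x[..., 0] ** 2 - 0.3 * x[..., 1] ** 2
            - 0.2 * x[..., 2] ** 2)

X = rng.uniform(-1, 1, size=(15, 3))        # stand-in for the 15 BBD runs
y = true_yield(X) + rng.normal(0, 0.01, 15)

poly = PolynomialFeatures(degree=2)         # full quadratic polynomial model
model = LinearRegression().fit(poly.fit_transform(X), y)

# Maximize the fitted response surface inside the coded design region.
res = minimize(lambda x: -model.predict(poly.transform(x.reshape(1, -1)))[0],
               x0=np.zeros(3), bounds=[(-1, 1)] * 3)
print("coded optimum:", res.x.round(2), "predicted yield:", round(-res.fun, 3))
```

Decoding the optimum back to physical units (W, g mL-1, min) is a linear rescaling of each coded factor to its original range.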

  10. A review and comparison of Bayesian and likelihood-based inferences in beta regression and zero-or-one-inflated beta regression.

    PubMed

    Liu, Fang; Eugenio, Evercita C

    2018-04-01

    Beta regression is an increasingly popular statistical technique in medical research for modeling of outcomes that assume values in (0, 1), such as proportions and patient reported outcomes. When outcomes take values in the intervals [0,1), (0,1], or [0,1], zero-or-one-inflated beta (zoib) regression can be used. We provide a thorough review on beta regression and zoib regression in the modeling, inferential, and computational aspects via the likelihood-based and Bayesian approaches. We demonstrate the statistical and practical importance of correctly modeling the inflation at zero/one rather than ad hoc replacing them with values close to zero/one via simulation studies; the latter approach can lead to biased estimates and invalid inferences. We show via simulation studies that the likelihood-based approach is computationally faster in general than MCMC algorithms used in the Bayesian inferences, but runs the risk of non-convergence, large biases, and sensitivity to starting values in the optimization algorithm especially with clustered/correlated data, data with sparse inflation at zero and one, and data that warrant regularization of the likelihood. The disadvantages of the regular likelihood-based approach make the Bayesian approach an attractive alternative in these cases. Software packages and tools for fitting beta and zoib regressions in both the likelihood-based and Bayesian frameworks are also reviewed.
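A minimal sketch of likelihood-based beta regression on simulated data (no inflation at zero or one, hypothetical coefficients), maximizing the beta log-likelihood directly with a general-purpose optimizer:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import betaln, expit

rng = np.random.default_rng(3)

# Simulate outcomes in (0, 1): logit(mean) = b0 + b1 * x, precision phi.
n, b0, b1, phi = 500, -0.5, 1.0, 20.0
x = rng.normal(size=n)
mu = expit(b0 + b1 * x)
y = rng.beta(mu * phi, (1 - mu) * phi)

def negloglik(theta):
    """Negative log-likelihood of the mean/precision beta parameterization."""
    mu = expit(theta[0] + theta[1] * x)
    phi = np.exp(theta[2])                 # log-precision keeps phi > 0
    a, b = mu * phi, (1 - mu) * phi
    return -np.sum((a - 1) * np.log(y) + (b - 1) * np.log1p(-y) - betaln(a, b))

fit = minimize(negloglik, x0=np.zeros(3), method="BFGS")
print("coefficients:", fit.x[:2].round(2), "phi:", np.exp(fit.x[2]).round(1))
```

With observations exactly at 0 or 1 this likelihood is undefined, which is precisely the situation the zoib model handles by adding point masses at the boundaries.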

  11. Studies in Software Cost Model Behavior: Do We Really Understand Cost Model Performance?

    NASA Technical Reports Server (NTRS)

    Lum, Karen; Hihn, Jairus; Menzies, Tim

    2006-01-01

While there exists extensive literature on software cost estimation techniques, industry practice continues to rely upon standard regression-based algorithms. These software effort models are typically calibrated or tuned to local conditions using local data. This paper cautions that current approaches to model calibration often produce sub-optimal models because of the large variance problem inherent in cost data and because they include far more effort multipliers than the data supports. Building optimal models requires that a wider range of models be considered, while correctly calibrating these models requires rejection rules that prune variables and records and use multiple criteria for evaluating model performance. The main contribution of this paper is to document a standard method that integrates formal model identification, estimation, and validation. It also documents what we call the large variance problem, a leading cause of cost model brittleness and instability.

  12. SPReM: Sparse Projection Regression Model For High-dimensional Linear Regression *

    PubMed Central

    Sun, Qiang; Zhu, Hongtu; Liu, Yufeng; Ibrahim, Joseph G.

    2014-01-01

The aim of this paper is to develop a sparse projection regression modeling (SPReM) framework to perform multivariate regression modeling with a large number of responses and a multivariate covariate of interest. We propose two novel heritability ratios to simultaneously perform dimension reduction, response selection, estimation, and testing, while explicitly accounting for correlations among multivariate responses. Our SPReM is devised to specifically address the low statistical power issue of many standard statistical approaches, such as the Hotelling’s T2 test statistic or a mass univariate analysis, for high-dimensional data. We formulate the estimation problem of SPReM as a novel sparse unit rank projection (SURP) problem and propose a fast optimization algorithm for SURP. Furthermore, we extend SURP to the sparse multi-rank projection (SMURP) by adopting a sequential SURP approximation. Theoretically, we have systematically investigated the convergence properties of SURP and the convergence rate of SURP estimates. Our simulation results and real data analysis have shown that SPReM outperforms other state-of-the-art methods. PMID:26527844

  13. Efficient logistic regression designs under an imperfect population identifier.

    PubMed

    Albert, Paul S; Liu, Aiyi; Nansel, Tonja

    2014-03-01

    Motivated by actual study designs, this article considers efficient logistic regression designs where the population is identified with a binary test that is subject to diagnostic error. We consider the case where the imperfect test is obtained on all participants, while the gold standard test is measured on a small chosen subsample. Under maximum-likelihood estimation, we evaluate the optimal design in terms of sample selection as well as verification. We show that there may be substantial efficiency gains by choosing a small percentage of individuals who test negative on the imperfect test for inclusion in the sample (e.g., verifying 90% test-positive cases). We also show that a two-stage design may be a good practical alternative to a fixed design in some situations. Under optimal and nearly optimal designs, we compare maximum-likelihood and semi-parametric efficient estimators under correct and misspecified models with simulations. The methodology is illustrated with an analysis from a diabetes behavioral intervention trial. © 2013, The International Biometric Society.

  14. RRegrs: an R package for computer-aided model selection with multiple regression models.

    PubMed

    Tsiliki, Georgia; Munteanu, Cristian R; Seoane, Jose A; Fernandez-Lozano, Carlos; Sarimveis, Haralambos; Willighagen, Egon L

    2015-01-01

Predictive regression models can be created with many different modelling approaches. Choices need to be made for data set splitting, cross-validation methods, specific regression parameters and best-model criteria, as they all affect the accuracy and efficiency of the produced predictive models, thereby raising model reproducibility and comparison issues. Cheminformatics and bioinformatics make extensive use of predictive modelling and exhibit a need for standardization of these methodologies in order to assist model selection and speed up the process of predictive model development. A tool accessible to all users, irrespective of their statistical knowledge, would be valuable if it tested several simple and complex regression models and validation schemes, produced unified reports, and offered the option to be integrated into more extensive studies. Additionally, such a methodology should be implemented as a free programming package, in order to be continuously adapted and redistributed by others. We propose an integrated framework for creating multiple regression models, called RRegrs. The tool offers the option of ten simple and complex regression methods combined with repeated 10-fold and leave-one-out cross-validation. Methods include Multiple Linear regression, Generalized Linear Model with Stepwise Feature Selection, Partial Least Squares regression, Lasso regression, and Support Vector Machines Recursive Feature Elimination. The new framework is an automated, fully validated procedure which produces standardized reports to quickly oversee the impact of choices in modelling algorithms and assess the model and cross-validation results. The methodology was implemented as an open source R package, available at https://www.github.com/enanomapper/RRegrs, by reusing and extending the caret package. The universality of the new methodology is demonstrated using five standard data sets from different scientific fields. Its efficiency in cheminformatics and QSAR modelling is shown with three use cases: proteomics data for surface-modified gold nanoparticles, nano-metal oxides descriptor data, and molecular descriptors for acute aquatic toxicity data. The results show that for all data sets RRegrs reports models with equal or better performance for both training and test sets than those reported in the original publications. Its good performance, as well as its adaptability in terms of parameter optimization, could make RRegrs a popular framework to assist the initial exploration of predictive models and, with that, the design of more comprehensive in silico screening applications. Graphical abstract: RRegrs is a computer-aided model selection framework for R multiple regression models; it is a fully validated procedure with application to QSAR modelling.
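RRegrs itself is an R package; its core workflow of comparing several regression methods under repeated 10-fold cross-validation can be sketched in Python, assuming a public dataset and three representative methods with fixed (not tuned) hyperparameters:

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# A public dataset stands in for the paper's five benchmark data sets; three
# model families loosely mirror a subset of RRegrs' ten methods.
X, y = load_diabetes(return_X_y=True)
cv = RepeatedKFold(n_splits=10, n_repeats=2, random_state=0)

models = {
    "multiple linear": LinearRegression(),
    "lasso": make_pipeline(StandardScaler(), Lasso(alpha=0.1)),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=0),
}
means = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
    means[name] = scores.mean()
    print(f"{name}: mean R^2 = {scores.mean():.3f} (sd {scores.std():.3f})")
```

Reporting the mean and spread per method over the same repeated folds is what makes the comparison fair; the alpha for the Lasso is an arbitrary illustrative choice, where RRegrs would tune it.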

  15. Mechanistic analysis of challenge-response experiments.

    PubMed

    Shotwell, M S; Drake, K J; Sidorov, V Y; Wikswo, J P

    2013-09-01

    We present an application of mechanistic modeling and nonlinear longitudinal regression in the context of biomedical response-to-challenge experiments, a field where these methods are underutilized. In this type of experiment, a system is studied by imposing an experimental challenge, and then observing its response. The combination of mechanistic modeling and nonlinear longitudinal regression has brought new insight, and revealed an unexpected opportunity for optimal design. Specifically, the mechanistic aspect of our approach enables the optimal design of experimental challenge characteristics (e.g., intensity, duration). This article lays some groundwork for this approach. We consider a series of experiments wherein an isolated rabbit heart is challenged with intermittent anoxia. The heart responds to the challenge onset, and recovers when the challenge ends. The mean response is modeled by a system of differential equations that describe a candidate mechanism for cardiac response to anoxia challenge. The cardiac system behaves more variably when challenged than when at rest. Hence, observations arising from this experiment exhibit complex heteroscedasticity and sharp changes in central tendency. We present evidence that an asymptotic statistical inference strategy may fail to adequately account for statistical uncertainty. Two alternative methods are critiqued qualitatively (i.e., for utility in the current context), and quantitatively using an innovative Monte-Carlo method. We conclude with a discussion of the exciting opportunities in optimal design of response-to-challenge experiments. © 2013, The International Biometric Society.

  16. Optimism predicts positive health in repatriated prisoners of war.

    PubMed

    Segovia, Francine; Moore, Jeffrey L; Linnville, Steven E; Hoyt, Robert E

    2015-05-01

    "Positive health," defined as a state beyond the mere absence of disease, was used as a model to examine factors for enhancing health despite extreme trauma. The study examined the United States' longest detained American prisoners of war, those held in Vietnam in the 1960s through early 1970s. Positive health was measured using a physical and a psychological composite score for each individual, based on 9 physical and 9 psychological variables. Physical and psychological health was correlated with optimism obtained postrepatriation (circa 1973). Linear regressions were employed to determine which variables contributed most to health ratings. Optimism was the strongest predictor of physical health (β = -.33, t = -2.73, p = .008), followed by fewer sleep complaints (β = -.29, t = -2.52, p = .01). This model accounted for 25% of the variance. Optimism was also the strongest predictor of psychological health (β = -.41, t = -2.87, p = .006), followed by Minnesota Multiphasic Personality Inventory-Psychopathic Deviate (MMPI-PD; McKinley & Hathaway, 1944) scores (β = -.23, t = -1.88, p = .07). This model strongly suggests that optimism is a significant predictor of positive physical and psychological health, and optimism also provides long-term protective benefits. These findings and the utility of this model suggest a promising area for future research and intervention. (c) 2015 APA, all rights reserved).

  17. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kwon, Deukwoo; Little, Mark P.; Miller, Donald L.

Purpose: To determine more accurate regression formulas for estimating peak skin dose (PSD) from reference air kerma (RAK) or kerma-area product (KAP). Methods: After grouping of the data from 21 procedures into 13 clinically similar groups, assessments were made of optimal clustering using the Bayesian information criterion to obtain the optimal linear regressions of (log-transformed) PSD vs RAK, PSD vs KAP, and PSD vs RAK and KAP. Results: Three clusters of clinical groups were optimal in regression of PSD vs RAK, seven clusters of clinical groups were optimal in regression of PSD vs KAP, and six clusters of clinical groups were optimal in regression of PSD vs RAK and KAP. Prediction of PSD using both RAK and KAP is significantly better than prediction of PSD with either RAK or KAP alone. The regression of PSD vs RAK provided better predictions of PSD than the regression of PSD vs KAP. The partial-pooling (clustered) method yields smaller mean squared errors compared with the complete-pooling method. Conclusion: PSD distributions for interventional radiology procedures are log-normal. Estimates of PSD derived from RAK and KAP jointly are most accurate, followed closely by estimates derived from RAK alone. Estimates of PSD derived from KAP alone are the least accurate. Using a stochastic search approach, it is possible to cluster together certain dissimilar types of procedures to minimize the total error sum of squares.
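A hedged sketch of the log-linear regressions being compared, using simulated log-normal dose indices with hypothetical coefficients; as the abstract reports, the joint RAK+KAP model should fit best, followed by RAK alone:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)

# Simulated log-transformed dose indices (hypothetical units and
# coefficients): PSD is log-normal given RAK and KAP.
n = 400
log_rak = rng.normal(0.0, 0.6, n)
log_kap = 0.8 * log_rak + rng.normal(0.0, 0.4, n)   # KAP correlates with RAK
log_psd = 0.2 + 0.7 * log_rak + 0.2 * log_kap + rng.normal(0.0, 0.15, n)

fits = {
    "RAK+KAP": np.column_stack([log_rak, log_kap]),
    "RAK": log_rak.reshape(-1, 1),
    "KAP": log_kap.reshape(-1, 1),
}
scores = {}
for name, Z in fits.items():
    scores[name] = LinearRegression().fit(Z, log_psd).score(Z, log_psd)
    print(f"{name}: in-sample R^2 = {scores[name]:.3f}")
```

Because the RAK-only model is nested in the joint model, its in-sample R^2 can never exceed the joint fit; the ordering of RAK versus KAP depends on the (here hypothetical) coefficients.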

  18. Development and Testing of Building Energy Model Using Non-Linear Auto Regression Artificial Neural Networks

    NASA Astrophysics Data System (ADS)

    Arida, Maya Ahmad

The concept of sustainable development emerged in 1972 and has since become one of the most important approaches to conserving natural resources and energy. With rising energy costs and increasing awareness of the effects of global warming, developing building energy-saving methods and models has become ever more necessary for a sustainable future. According to the U.S. Energy Information Administration (EIA), buildings in the U.S. today consume 72 percent of the electricity produced and 55 percent of U.S. natural gas. Buildings account for about 40 percent of the energy consumed in the United States, more than industry or transportation, and of this energy, heating and cooling systems use about 55 percent. If energy-use trends continue, buildings will become the largest consumer of global energy by 2025. This thesis proposes procedures and analysis techniques for modeling building energy systems, together with optimization methods, using time-series auto-regressive artificial neural networks. The model predicts whole-building energy consumption as a function of four input variables: dry-bulb and wet-bulb outdoor air temperatures, hour of day, and type of day. The proposed model and the optimization process are tested using data collected from an existing building located in Greensboro, NC. The testing results show that the model captures the system performance very well. An optimization method was also developed to automate the process of finding the model structure that produces the most accurate predictions against the actual data. The results show that the developed model provides results sufficiently accurate for use in various energy-efficiency and savings-estimation applications.
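A minimal sketch of an auto-regressive neural-network energy model, assuming synthetic hourly data (temperature, hour of day, and lagged loads as inputs) rather than the Greensboro measurements, and a small feed-forward network in place of the thesis's NARX structure search:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)

# Synthetic hourly load: a daily temperature cycle drives cooling demand
# (hypothetical building, for illustration only).
hours = np.arange(24 * 60)
temp = 15 + 8 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 0.5, hours.size)
load = 50 + 1.5 * np.clip(temp - 18, 0, None) + 5 * np.sin(2 * np.pi * hours / 24 + 1)
load += rng.normal(0, 0.5, hours.size)

# Auto-regressive design matrix: current temperature, hour of day, lagged loads.
lags = 3
rows = [[temp[k], hours[k] % 24, *load[k - lags:k]] for k in range(lags, hours.size)]
X, y = np.array(rows), load[lags:]

split = int(0.8 * len(y))                  # chronological train/test split
model = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                                   random_state=0))
model.fit(X[:split], y[:split])
r2 = model.score(X[split:], y[split:])
print(f"test R^2 = {r2:.3f}")
```

The thesis additionally searches over network structures (lags, layer sizes) to maximize this held-out accuracy; here the structure is fixed for brevity.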

  19. Optimal Control Method of Robot End Position and Orientation Based on Dynamic Tracking Measurement

    NASA Astrophysics Data System (ADS)

    Liu, Dalong; Xu, Lijuan

    2018-01-01

In order to improve the accuracy of robot pose positioning and control, this paper proposes an optimal control method for robot end position and orientation based on dynamic tracking measurement. Starting from the actually measured D-H parameters of the robot, feedback compensation is applied to the kinematic parameters. Using the geometric parameters obtained by pose tracking measurement, an improved multi-sensor information fusion extended Kalman filter method with continuous self-optimal regression exploits the geometric relationships between the joint axes to estimate the kinematic model parameters. The identified link model parameters are fed back to the robot in a timely manner to implement parameter correction and compensation, yielding the optimal attitude angle and realizing optimized pose control. Experiments were performed on an independently developed 6R joint robot under dynamic tracking control. The simulation results show that the control method improves robot positioning accuracy, and that it has the advantages of versatility, simplicity, and ease of operation.

  20. Impacts of land use and population density on seasonal surface water quality using a modified geographically weighted regression.

    PubMed

    Chen, Qiang; Mei, Kun; Dahlgren, Randy A; Wang, Ting; Gong, Jian; Zhang, Minghua

    2016-12-01

As an important regulator of pollutants in overland flow and interflow, land use has become an essential research component for determining the relationships between surface water quality and pollution sources. This study investigated the use of ordinary least squares (OLS) and geographically weighted regression (GWR) models to identify the impact of land use and population density on surface water quality in the Wen-Rui Tang River watershed of eastern China. A manual variable excluding-selecting method was explored to resolve multicollinearity issues. Standard regression coefficient analysis coupled with cluster analysis was introduced to determine which variable had the greatest influence on water quality. Results showed that: (1) Impact of land use on water quality varied with spatial and seasonal scales. Both positive and negative effects for certain land-use indicators were found in different subcatchments. (2) Urban land was the dominant factor influencing N, P and chemical oxygen demand (COD) in highly urbanized regions, but the relationship was weak as the pollutants were mainly from point sources. Agricultural land was the primary factor influencing N and P in suburban and rural areas; the relationship was strong as the pollutants were mainly from agricultural surface runoff. Subcatchments located in suburban areas were identified with urban land as the primary influencing factor during the wet season while agricultural land was identified as a more prevalent influencing factor during the dry season. (3) Adjusted R2 values in OLS models using the manual variable excluding-selecting method averaged 14.3% higher than using stepwise multiple linear regressions. However, the corresponding GWR models had adjusted R2 values ~59.2% higher than the optimal OLS models, confirming that GWR models demonstrated better prediction accuracy.
Based on our findings, water resource protection policies should consider site-specific land-use conditions within each watershed to optimize mitigation strategies for contrasting land-use characteristics and seasonal variations. Copyright © 2016 Elsevier B.V. All rights reserved.
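
    The mechanics behind the OLS/GWR comparison above — one global least-squares fit versus a weighted least-squares refit at every location with a distance-decay kernel — can be sketched in a few lines of NumPy. The coordinates, the single land-use-style predictor, and the Gaussian kernel bandwidth below are illustrative assumptions, not the paper's watershed data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
coords = rng.uniform(0, 10, size=(n, 2))           # monitoring-site locations
x = rng.normal(size=n)                             # e.g. a land-use indicator
beta_true = 0.5 + 0.2 * coords[:, 0]               # effect drifts across the region
y = beta_true * x + rng.normal(scale=0.1, size=n)

X = np.column_stack([np.ones(n), x])

# Global OLS: a single coefficient for the whole region
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

def gwr_coefs(i, bandwidth=2.0):
    """Weighted least squares centred on location i with a Gaussian kernel."""
    d = np.linalg.norm(coords - coords[i], axis=1)
    w = np.exp(-0.5 * (d / bandwidth) ** 2)
    XtW = X.T * w                                  # X' W with diagonal weights
    return np.linalg.solve(XtW @ X, XtW @ y)

b_local = np.array([gwr_coefs(i) for i in range(n)])
print("global OLS slope:", round(float(b_ols[1]), 2))
print("local GWR slope range:",
      round(float(b_local[:, 1].min()), 2), "to", round(float(b_local[:, 1].max()), 2))
```

    A real GWR application would also select the bandwidth (e.g. by cross-validation) rather than fix it, but the local slopes already track the spatial drift that a single OLS coefficient averages away.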

  1. A New Predictive Model of Centerline Segregation in Continuous Cast Steel Slabs by Using Multivariate Adaptive Regression Splines Approach

    PubMed Central

    García Nieto, Paulino José; González Suárez, Victor Manuel; Álvarez Antón, Juan Carlos; Mayo Bayón, Ricardo; Sirgo Blanco, José Ángel; Díaz Fernández, Ana María

    2015-01-01

    The aim of this study was to obtain a predictive model able to perform an early detection of central segregation severity in continuous cast steel slabs. Segregation in steel cast products is an internal defect that can be very harmful when slabs are rolled in heavy plate mills. In this research work, the central segregation was studied successfully using a data mining methodology based on the multivariate adaptive regression splines (MARS) technique. For this purpose, the most important physical-chemical parameters were considered. The results of the present study are two-fold. In the first place, the significance of each physical-chemical variable on the segregation is presented through the model. Second, a model for forecasting segregation is obtained. Regression with optimal hyperparameters was performed, and coefficients of determination of 0.93 for continuity factor estimation and 0.95 for average width estimation were obtained when the MARS technique was applied to the experimental dataset. The agreement between experimental data and the model confirmed the good performance of the latter.
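
    MARS builds its fit from mirrored hinge functions max(0, x − t) and max(0, t − x) placed at data-driven knots. A minimal sketch of the forward knot-search step on synthetic data with one kink (the response shape and knot grid are assumptions for illustration; full MARS also adds interaction terms and a backward pruning pass):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 300)
# True response has a kink at x = 4 — the kind of shape hinge bases capture
y = np.where(x < 4, 1.0 * x, 4.0 + 3.0 * (x - 4)) + rng.normal(scale=0.2, size=300)

def hinge_design(x, knots):
    """Design matrix with an intercept and a mirrored hinge pair per knot."""
    cols = [np.ones_like(x)]
    for t in knots:
        cols.append(np.maximum(0, x - t))
        cols.append(np.maximum(0, t - x))
    return np.column_stack(cols)

# Forward step of MARS, reduced to a 1-D knot search by residual sum of squares
candidates = np.linspace(1, 9, 81)

def sse(t):
    B = hinge_design(x, [t])
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    return float(((y - B @ coef) ** 2).sum())

best_knot = min(candidates, key=sse)
print("selected knot:", round(float(best_knot), 1))  # should land near the true kink
```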

  2. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Singh, Kunwar P., E-mail: kpsingh_52@yahoo.com; Gupta, Shikha

    Ensemble learning approach based decision treeboost (DTB) and decision tree forest (DTF) models are introduced in order to establish a quantitative structure–toxicity relationship (QSTR) for the prediction of toxicity of 1450 diverse chemicals. Eight non-quantum mechanical molecular descriptors were derived. Structural diversity of the chemicals was evaluated using the Tanimoto similarity index. DTB and DTF models, supplemented by stochastic gradient boosting and bagging algorithms, were constructed for classification and function optimization problems using the toxicity end-point in T. pyriformis. Special attention was drawn to the prediction ability and robustness of the models, investigated both in external and 10-fold cross validation processes. On the complete data, optimal DTB and DTF models rendered accuracies of 98.90% and 98.83% in two-category and 98.14% and 98.14% in four-category toxicity classifications. Both models further yielded classification accuracies of 100% on external toxicity data of T. pyriformis. The constructed regression models (DTB and DTF) using five descriptors yielded correlation coefficients (R²) of 0.945 and 0.944 between the measured and predicted toxicities, with mean squared errors (MSEs) of 0.059 and 0.064, on the complete T. pyriformis data. The T. pyriformis regression models (DTB and DTF) applied to the external toxicity data sets yielded R² and MSE values of 0.637, 0.655; 0.534, 0.507 (marine bacteria) and 0.741, 0.691; 0.155, 0.173 (algae). The results suggest wide applicability of the inter-species models in predicting toxicity of new chemicals for regulatory purposes. These approaches provide a useful strategy and robust tools for screening the ecotoxicological risk or environmental hazard potential of chemicals. - Graphical abstract: Importance of input variables in DTB and DTF classification models for (a) two-category, and (b) four-category toxicity intervals in T. pyriformis data. 
Generalization and predictive abilities of the constructed (c) DTB and (d) DTF regression models to predict the T. pyriformis toxicity of diverse chemicals. - Highlights: • Ensemble learning (EL) based models constructed for toxicity prediction of chemicals. • Predictive models used a few simple non-quantum mechanical molecular descriptors. • EL-based DTB/DTF models successfully discriminated toxic and non-toxic chemicals. • DTB/DTF regression models precisely predicted toxicity of chemicals in multi-species. • Proposed EL based models can be used as tools to predict toxicity of new chemicals.

  3. Optimizing complex phenotypes through model-guided multiplex genome engineering

    DOE PAGES

    Kuznetsov, Gleb; Goodman, Daniel B.; Filsinger, Gabriel T.; ...

    2017-05-25

    Here, we present a method for identifying genomic modifications that optimize a complex phenotype through multiplex genome engineering and predictive modeling. We apply our method to identify six single nucleotide mutations that recover 59% of the fitness defect exhibited by the 63-codon E. coli strain C321.ΔA. By introducing targeted combinations of changes in multiplex, we generate rich genotypic and phenotypic diversity and characterize clones using whole-genome sequencing and doubling time measurements. Regularized multivariate linear regression accurately quantifies individual allelic effects and overcomes bias from hitchhiking mutations and context-dependence of genome editing efficiency that would confound other strategies.
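
    The regularized multivariate linear regression described above can be illustrated with a ridge (L2-penalized) fit on a toy genotype matrix. The allele matrix, effect sizes, noise level and penalty weight below are invented for illustration, not data from the study:

```python
import numpy as np

rng = np.random.default_rng(2)
n_clones, n_alleles = 120, 30
G = rng.integers(0, 2, size=(n_clones, n_alleles)).astype(float)  # 0/1 allele calls
true_effect = np.zeros(n_alleles)
true_effect[:6] = [-0.30, -0.20, -0.15, -0.10, -0.10, -0.05]      # six causal sites
doubling_time = 1.0 + G @ true_effect + rng.normal(scale=0.05, size=n_clones)

# Ridge estimate on centered data: (G'G + lam I)^-1 G'y
Gc = G - G.mean(axis=0)
yc = doubling_time - doubling_time.mean()
lam = 1.0
beta = np.linalg.solve(Gc.T @ Gc + lam * np.eye(n_alleles), Gc.T @ yc)

# Most negative estimated effects = largest fitness benefit
print("top alleles by estimated effect:", sorted(np.argsort(beta)[:6]))
```

    The penalty stabilizes the joint fit, so each allele's effect is estimated while conditioning on all the others — the property that lets the regression separate causal sites from hitchhikers.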

  5. Optimal design of clinical trials with biologics using dose-time-response models.

    PubMed

    Lange, Markus R; Schmidli, Heinz

    2014-12-30

    Biologics, in particular monoclonal antibodies, are important therapies in serious diseases such as cancer, psoriasis, multiple sclerosis, or rheumatoid arthritis. While most conventional drugs are given daily, the effect of monoclonal antibodies often lasts for months, and hence, these biologics require less frequent dosing. A good understanding of the time-changing effect of the biologic for different doses is needed to determine both an adequate dose and an appropriate time-interval between doses. Clinical trials provide data to estimate the dose-time-response relationship with semi-mechanistic nonlinear regression models. We investigate how to best choose the doses and corresponding sample size allocations in such clinical trials, so that the nonlinear dose-time-response model can be precisely estimated. We consider both local and conservative Bayesian D-optimality criteria for the design of clinical trials with biologics. For determining the optimal designs, computer-intensive numerical methods are needed, and we focus here on the particle swarm optimization algorithm. This metaheuristic optimizer has been successfully used in various areas but has only recently been applied in the optimal design context. The equivalence theorem is used to verify the optimality of the designs. The methodology is illustrated based on results from a clinical study in patients with gout, treated by a monoclonal antibody. Copyright © 2014 John Wiley & Sons, Ltd.
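
    A minimal sketch of particle swarm optimization applied to a local D-optimality criterion, using a simple three-parameter Emax dose-response model as a stand-in for the paper's richer dose-time-response models (all parameter values and the three-point design structure are assumptions):

```python
import numpy as np

# Emax dose-response: E0 + Emax * d / (ED50 + d); assumed local parameter values
E0, Emax, ED50, dmax = 0.0, 1.0, 50.0, 500.0

def grad(d):
    """Gradient of the Emax model w.r.t. (E0, Emax, ED50) at dose d."""
    return np.array([1.0, d / (ED50 + d), -Emax * d / (ED50 + d) ** 2])

def neg_log_det(doses):
    """Negative log-determinant of the information matrix (equal weights)."""
    M = sum(np.outer(grad(d), grad(d)) for d in doses) / len(doses)
    sign, logdet = np.linalg.slogdet(M)
    return np.inf if sign <= 0 else -logdet

# Plain particle swarm over three support doses in [0, dmax]
rng = np.random.default_rng(3)
n_particles, n_iter = 40, 200
pos = rng.uniform(0, dmax, size=(n_particles, 3))
vel = np.zeros_like(pos)
pbest, pbest_val = pos.copy(), np.array([neg_log_det(p) for p in pos])
gbest = pbest[pbest_val.argmin()].copy()

for _ in range(n_iter):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, 0, dmax)
    vals = np.array([neg_log_det(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmin()].copy()

print("D-optimal support doses:", np.sort(gbest).round(1))
```

    The swarm maximizes the log-determinant of the information matrix over three equally weighted support doses; the equivalence theorem mentioned in the abstract would then be used to verify that the candidate design is truly optimal.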

  6. Improving regression-model-based streamwater constituent load estimates derived from serially correlated data

    USGS Publications Warehouse

    Aulenbach, Brent T.

    2013-01-01

    A regression-model based approach is a commonly used, efficient method for estimating streamwater constituent load when there is a relationship between streamwater constituent concentration and continuous variables such as streamwater discharge, season and time. A subsetting experiment using a 30-year dataset of daily suspended sediment observations from the Mississippi River at Thebes, Illinois, was performed to determine optimal sampling frequency, model calibration period length, and regression model methodology, as well as to determine the effect of serial correlation of model residuals on load estimate precision. Two regression-based methods were used to estimate streamwater loads, the Adjusted Maximum Likelihood Estimator (AMLE) and the composite method, a hybrid load estimation approach. While both methods accurately and precisely estimated loads at the model's calibration period time scale, precisions were progressively worse at shorter reporting periods, from annually to monthly. Serial correlation in model residuals resulted in observed AMLE precision being significantly worse than the model-calculated standard errors of prediction. The composite method effectively improved upon AMLE loads for shorter reporting periods, but required a sampling interval of 15 days or shorter when the serial correlations in the observed load residuals were greater than 0.15. AMLE precision was better at shorter sampling intervals and when using the shortest model calibration periods, such that the regression models better fit the temporal changes in the concentration–discharge relationship. The models with the largest errors typically had poor high-flow sampling coverage, resulting in unrepresentative models. Increasing sampling frequency and/or targeted high-flow sampling are more efficient approaches to ensure sufficient sampling and to avoid poorly performing models than increasing calibration period length.

  7. Comparative study of surrogate models for groundwater contamination source identification at DNAPL-contaminated sites

    NASA Astrophysics Data System (ADS)

    Hou, Zeyu; Lu, Wenxi

    2018-05-01

    Knowledge of groundwater contamination sources is critical for effectively protecting groundwater resources, estimating risks, mitigating disaster, and designing remediation strategies. Many methods for groundwater contamination source identification (GCSI) have been developed in recent years, including the simulation-optimization technique. This study proposes utilizing a support vector regression (SVR) model and a kernel extreme learning machine (KELM) model as surrogate models. The surrogate model is key in replacing the simulation model, reducing the huge computational burden of iterations in the simulation-optimization technique used to solve GCSI problems, especially for aquifers contaminated by dense nonaqueous phase liquids (DNAPLs). A comparative study between the Kriging, SVR, and KELM models is reported. Additionally, the influence of parameter optimization and of the structure of the training sample dataset on the approximation accuracy of the surrogate model is analyzed. It was found that the KELM model was the most accurate surrogate model, and its performance was significantly improved after parameter optimization. The approximation accuracy of the surrogate model to the simulation model did not always improve with increasing numbers of training samples. Using the appropriate number of training samples was critical for improving the performance of the surrogate model and avoiding unnecessary computational workload. It was concluded that the KELM model developed in this work could reasonably predict system responses under given operating conditions. Replacing the simulation model with a KELM model considerably reduced the computational burden of the simulation-optimization process while maintaining high computational accuracy.
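
    A KELM prediction reduces to a single regularized linear solve against a kernel matrix, β = (I/C + Ω)⁻¹y, which is what makes it attractive as a cheap surrogate for an expensive simulator. A self-contained sketch with a stand-in "simulation" (the test function, RBF kernel width, and regularization constant C are assumptions, not the paper's transport model):

```python
import numpy as np

rng = np.random.default_rng(4)

def simulator(x):
    """Stand-in for an expensive contaminant-transport run (assumed)."""
    return np.sin(3 * x[:, 0]) + 0.5 * x[:, 1] ** 2

X_train = rng.uniform(-1, 1, size=(60, 2))
y_train = simulator(X_train)

def rbf(A, B, gamma=4.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Kernel extreme learning machine: beta = (I/C + Omega)^-1 y
C = 100.0
Omega = rbf(X_train, X_train)
beta = np.linalg.solve(np.eye(len(X_train)) / C + Omega, y_train)

def surrogate(X_new):
    return rbf(X_new, X_train) @ beta

X_test = rng.uniform(-1, 1, size=(200, 2))
err = np.abs(surrogate(X_test) - simulator(X_test)).mean()
print("mean abs surrogate error:", round(float(err), 3))
```

    Once trained, each surrogate evaluation is a single matrix-vector product, so the thousands of simulator calls inside a simulation-optimization loop collapse to inexpensive algebra.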

  8. Attrition in Psychotherapy: A Survival Analysis

    ERIC Educational Resources Information Center

    Roseborough, David John; McLeod, Jeffrey T.; Wright, Florence I.

    2016-01-01

    Purpose: Attrition is a common problem in psychotherapy and can be defined as clients ending treatment before achieving an optimal response. Method: This longitudinal, archival study utilized data for 3,728 clients, using the Outcome Questionnaire 45.2. A Cox regression proportional hazards (hazard ratios) model was used in order to better…

  9. Optimizing Treatment of Lung Cancer Patients with Comorbidities

    DTIC Science & Technology

    2017-10-01

    of treatment options, comorbid illness, age, sex, histology, and tumor size. We will simulate base case scenarios for stage I NSCLC for all possible...fitting adjusted logistic regression models controlling for age, sex and cancer stage. Results Overall, 5,644 (80.4%) and 1,377 (19.6%) patients

  10. Optimization design of LED heat dissipation structure based on strip fins

    NASA Astrophysics Data System (ADS)

    Xue, Lingyun; Wan, Wenbin; Chen, Qingguang; Rao, Huanle; Xu, Ping

    2018-03-01

    To solve the heat dissipation problem of LEDs, a radiator structure based on strip fins is designed, and a method to optimize the structure parameters of the strip fins is proposed in this paper. The combination of RBF neural networks and the particle swarm optimization (PSO) algorithm is used for modeling and optimization, respectively. During the experiment, 150 datasets of LED junction temperature for different values of the structure parameters (the number of strip fins and the length, width and height of the fins) were obtained with ANSYS software. Then an RBF neural network is applied to build the non-linear regression model, and parameter optimization of the structure based on the particle swarm optimization algorithm is performed with this model. The experimental results show that the lowest LED junction temperature reaches 43.88 °C when the number of hidden layer nodes in the RBF neural network is 10, the two learning factors in the particle swarm optimization algorithm are both 0.5, the inertia factor is 1 and the maximum number of iterations is 100; the corresponding number of fins is 64, the distribution structure is 8*8, and the length, width and height of the fins are 4.3 mm, 4.48 mm and 55.3 mm respectively. To check the modeling and optimization results, the LED junction temperature at the optimized structure parameters was simulated; the result, 43.592 °C, approximately equals the optimal result. Compared with an ordinary plate-fin-type radiator structure, whose temperature is 56.38 °C, the optimized structure greatly enhances the heat dissipation performance.
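
    The paper's pipeline — sample the simulator, fit an RBF regression surrogate, then search the surrogate for the minimum-temperature design — can be sketched as follows. The synthetic temperature function, the reduction to two of the four geometry factors, and a random search standing in for PSO are all simplifying assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)

def junction_temp(params):
    """Assumed stand-in for the ANSYS response: temperature vs fin geometry."""
    n_fins, height = params[:, 0], params[:, 1]
    return (60 - 0.2 * n_fins + 0.002 * n_fins ** 2
            - 0.15 * height + 0.0015 * height ** 2)

# 150 sampled designs, as in the paper; two geometry factors for brevity
X = np.column_stack([rng.uniform(10, 100, 150), rng.uniform(10, 80, 150)])
y = junction_temp(X)

# Gaussian RBF interpolation as the regression surrogate
centers, gamma, lam = X, 5e-3, 1e-6
Phi = np.exp(-gamma * ((X[:, None] - centers[None]) ** 2).sum(-1))
w = np.linalg.solve(Phi + lam * np.eye(len(X)), y)

def surrogate(P):
    return np.exp(-gamma * ((P[:, None] - centers[None]) ** 2).sum(-1)) @ w

# Optimize over the surrogate (random search standing in for PSO)
cand = np.column_stack([rng.uniform(10, 100, 20000), rng.uniform(10, 80, 20000)])
best = cand[surrogate(cand).argmin()]
print("best (n_fins, height):", best.round(1))
```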

  11. Adaptive adjustment of interval predictive control based on combined model and application in shell brand petroleum distillation tower

    NASA Astrophysics Data System (ADS)

    Sun, Chao; Zhang, Chunran; Gu, Xinfeng; Liu, Bin

    2017-10-01

    Constraints of the optimization objective often cannot be met when predictive control is applied to an industrial production process; the online predictive controller will then fail to find a feasible, or a globally optimal, solution. To solve this problem, based on a Back Propagation-Auto Regressive with exogenous inputs (BP-ARX) combined control model, a nonlinear programming method is used to discuss the feasibility of constrained predictive control, a feasibility decision theorem for the optimization objective is proposed, and a solution method for the soft constraint slack variables is given for the case when the optimization objective is not feasible. On this basis, for interval control requirements on the controlled variables, the solved slack variables are introduced and an adaptive weighted interval predictive control algorithm is proposed, achieving adaptive regulation of the optimization objective and automatic adjustment of the infeasible interval range, expanding the scope of the feasible region, and ensuring the feasibility of the interval optimization objective. Finally, the feasibility and effectiveness of the algorithm are validated through simulation comparison experiments.

  12. Two-dimensional advective transport in ground-water flow parameter estimation

    USGS Publications Warehouse

    Anderman, E.R.; Hill, M.C.; Poeter, E.P.

    1996-01-01

    Nonlinear regression is useful in ground-water flow parameter estimation, but problems of parameter insensitivity and correlation often exist given commonly available hydraulic-head and head-dependent flow (for example, stream and lake gain or loss) observations. To address this problem, advective-transport observations are added to the ground-water flow, parameter-estimation model MODFLOWP using particle-tracking methods. The resulting model is used to investigate the importance of advective-transport observations relative to head-dependent flow observations when either or both are used in conjunction with hydraulic-head observations in a simulation of the sewage-discharge plume at Otis Air Force Base, Cape Cod, Massachusetts, USA. The analysis procedure for evaluating the probable effect of new observations on the regression results consists of two steps: (1) parameter sensitivities and correlations calculated at initial parameter values are used to assess the model parameterization and expected relative contributions of different types of observations to the regression; and (2) optimal parameter values are estimated by nonlinear regression and evaluated. In the Cape Cod parameter-estimation model, advective-transport observations did not significantly increase the overall parameter sensitivity; however: (1) inclusion of advective-transport observations decreased parameter correlation enough for more unique parameter values to be estimated by the regression; (2) realistic uncertainties in advective-transport observations had a small effect on parameter estimates relative to the precision with which the parameters were estimated; and (3) the regression results and sensitivity analysis provided insight into the dynamics of the ground-water flow system, especially the importance of accurate boundary conditions. 
In this work, advective-transport observations improved the calibration of the model and the estimation of ground-water flow parameters, and use of regression and related techniques produced significant insight into the physical system.

  13. Optimal population prediction of sandhill crane recruitment based on climate-mediated habitat limitations.

    PubMed

    Gerber, Brian D; Kendall, William L; Hooten, Mevin B; Dubovsky, James A; Drewien, Roderick C

    2015-09-01

    1. Prediction is fundamental to scientific enquiry and application; however, ecologists tend to favour explanatory modelling. We discuss a predictive modelling framework to evaluate ecological hypotheses and to explore novel/unobserved environmental scenarios to assist conservation and management decision-makers. We apply this framework to develop an optimal predictive model for juvenile (<1 year old) sandhill crane Grus canadensis recruitment of the Rocky Mountain Population (RMP). We consider spatial climate predictors motivated by hypotheses of how drought across multiple time-scales and spring/summer weather affects recruitment. 2. Our predictive modelling framework focuses on developing a single model that includes all relevant predictor variables, regardless of collinearity. This model is then optimized for prediction by controlling model complexity using a data-driven approach that marginalizes or removes irrelevant predictors from the model. Specifically, we highlight two approaches of statistical regularization, Bayesian least absolute shrinkage and selection operator (LASSO) and ridge regression. 3. Our optimal predictive Bayesian LASSO and ridge regression models were similar and on average 37% superior in predictive accuracy to an explanatory modelling approach. Our predictive models confirmed a priori hypotheses that drought and cold summers negatively affect juvenile recruitment in the RMP. The effects of long-term drought can be alleviated by short-term wet spring-summer months; however, the alleviation of long-term drought has a much greater positive effect on juvenile recruitment. The number of freezing days and snowpack during the summer months can also negatively affect recruitment, while spring snowpack has a positive effect. 4. Breeding habitat, mediated through climate, is a limiting factor on population growth of sandhill cranes in the RMP, which could become more limiting with a changing climate (i.e. increased drought). 
These effects are likely not unique to cranes. The alteration of hydrological patterns and water levels by drought may impact many migratory, wetland nesting birds in the Rocky Mountains and beyond. 5. Generalizable predictive models (trained by out-of-sample fit and based on ecological hypotheses) are needed by conservation and management decision-makers. Statistical regularization improves predictions and provides a general framework for fitting models with a large number of predictors, even those with collinearity, to simultaneously identify an optimal predictive model while conducting rigorous Bayesian model selection. Our framework is important for understanding population dynamics under a changing climate and has direct applications for making harvest and habitat management decisions. Published 2015. This article is a U.S. Government work and is in the public domain in the USA.
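
    Statistical regularization of the kind used here can be sketched with a hand-rolled LASSO (coordinate descent on the L1-penalized least-squares objective) next to a closed-form ridge solve. The collinear climate-like predictors and penalty values are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 150, 8
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=n)       # deliberately collinear pair
beta_true = np.array([1.0, 0.0, -0.5, 0.0, 0.0, 0.3, 0.0, 0.0])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

def soft(z, t):
    """Soft-thresholding operator used by the LASSO update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_sweeps=300):
    """Coordinate descent for 0.5*||y - Xb||^2 / n + lam*||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0) / n
    r = y.copy()                                   # residual with current b
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * b[j]                    # remove feature j's contribution
            b[j] = soft(X[:, j] @ r / n, lam) / col_ss[j]
            r -= X[:, j] * b[j]
    return b

b_lasso = lasso_cd(X, y, lam=0.05)
b_ridge = np.linalg.solve(X.T @ X + 1.0 * np.eye(p), X.T @ y)
print("lasso coefficients set exactly to zero:", int(np.sum(np.abs(b_lasso) < 1e-8)))
```

    The LASSO drives irrelevant (and some collinear) coefficients exactly to zero while ridge only shrinks them — the remove-versus-marginalize distinction the authors draw between the two regularizers.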

  14. Global Positioning System (GPS) Precipitable Water in Forecasting Lightning at Spaceport Canaveral

    NASA Technical Reports Server (NTRS)

    Kehrer, Kristen C.; Graf, Brian; Roeder, William

    2006-01-01

    This paper evaluates the use of precipitable water (PW) from the Global Positioning System (GPS) in lightning prediction. Additional independent verification of an earlier model is performed. This earlier model used binary logistic regression with the following four predictor variables optimally selected from a list of 23 candidate predictors: the current precipitable water value for a given time of the day, the change in GPS-PW over the past 9 hours, the K-Index, and the electric field mill value. This earlier model was not optimized for any specific forecast interval, but showed promise for 6 hour and 1.5 hour forecasts. Two new models were developed and verified. These new models were optimized for two operationally significant forecast intervals. The first model was optimized for the 0.5 hour lightning advisories issued by the 45th Weather Squadron. An additional 1.5 hours was allowed for sensor dwell, communication, calculation, analysis, and advisory decision by the forecaster. Therefore the 0.5 hour advisory model became a 2 hour forecast model for lightning within the 45th Weather Squadron advisory areas. The second model was optimized for major ground processing operations supported by the 45th Weather Squadron, which can require lightning forecasts with a lead-time of up to 7.5 hours. Using the same 1.5 hour lag as in the other new model, this became a 9 hour forecast model for lightning within 37 km (20 NM) of the 45th Weather Squadron advisory areas. The two new models were built using binary logistic regression from a list of 26 candidate predictor variables: the current GPS-PW value, the change of GPS-PW over 0.5 hour increments from 0.5 to 12 hours, and the K-Index. The new 2 hour model found the following four predictors to be statistically significant, listed in decreasing order of contribution to the forecast: the 0.5 hour change in GPS-PW, the 7.5 hour change in GPS-PW, the current GPS-PW value, and the K-Index. 
The new 9 hour forecast model found the following five independent variables to be statistically significant, listed in decreasing order of contribution to the forecast: the current GPS-PW value, the 8.5 hour change in GPS-PW, the 3.5 hour change in GPS-PW, the 12 hour change in GPS-PW, and the K-Index. In both models, the GPS-PW parameters had better correlation to the lightning forecast than the K-Index, a widely used thunderstorm index. Possible future improvements to this study are discussed.
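
    Binary logistic regression of the kind used for these lightning models is typically fitted by Newton-Raphson (equivalently, iteratively reweighted least squares). A minimal sketch on synthetic predictors standing in for GPS-PW, its change, and the K-Index (the coefficients and sample size are assumptions):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
# Intercept plus three toy predictors (e.g. GPS-PW, its 0.5 h change, K-Index)
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
beta_true = np.array([-1.0, 1.2, 0.8, 0.5])
p = 1 / (1 + np.exp(-X @ beta_true))
lightning = rng.random(n) < p            # 1 = lightning observed in the window

# Newton-Raphson / IRLS for the logistic log-likelihood
beta = np.zeros(4)
for _ in range(25):
    mu = 1 / (1 + np.exp(-X @ beta))     # predicted probabilities
    W = mu * (1 - mu)                    # IRLS weights
    grad = X.T @ (lightning - mu)
    H = X.T @ (X * W[:, None])           # Fisher information
    beta += np.linalg.solve(H, grad)

print("fitted coefficients:", beta.round(2))
```

    The same machinery handles predictor screening: candidate variables are kept or dropped according to the significance of their fitted coefficients, as in the four- and five-predictor models above.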

  15. Post-processing through linear regression

    NASA Astrophysics Data System (ADS)

    van Schaeybroeck, B.; Vannitsem, S.

    2011-03-01

    Various post-processing techniques are compared for both deterministic and ensemble forecasts, all based on linear regression between forecast data and observations. In order to evaluate the quality of the regression methods, three criteria are proposed, related to the effective correction of forecast error, the optimal variability of the corrected forecast and multicollinearity. The regression schemes under consideration include the ordinary least-square (OLS) method, a new time-dependent Tikhonov regularization (TDTR) method, the total least-square method, a new geometric-mean regression (GM), a recently introduced error-in-variables (EVMOS) method and, finally, a "best member" OLS method. The advantages and drawbacks of each method are clarified. These techniques are applied in the context of the Lorenz '63 system, whose model version is affected by both initial condition and model errors. For short forecast lead times, the number and choice of predictors play an important role. Contrary to the other techniques, GM degrades when the number of predictors increases. At intermediate lead times, linear regression is unable to provide corrections to the forecast and can sometimes degrade the performance (GM and the best member OLS with noise). At long lead times the regression schemes (EVMOS, TDTR), which yield the correct variability and the largest correlation between ensemble error and spread, should be preferred.

  16. Chain pooling to minimize prediction error in subset regression. [Monte Carlo studies using population models

    NASA Technical Reports Server (NTRS)

    Holms, A. G.

    1974-01-01

    Monte Carlo studies using population models intended to represent response surface applications are reported. Simulated experiments were generated by adding pseudo random normally distributed errors to population values to generate observations. Model equations were fitted to the observations and the decision procedure was used to delete terms. Comparison of values predicted by the reduced models with the true population values enabled the identification of deletion strategies that are approximately optimal for minimizing prediction errors.

  17. Optimization of critical quality attributes in continuous twin-screw wet granulation via design space validated with pilot scale experimental data.

    PubMed

    Liu, Huolong; Galbraith, S C; Ricart, Brendon; Stanton, Courtney; Smith-Goettler, Brandye; Verdi, Luke; O'Connor, Thomas; Lee, Sau; Yoon, Seongkyu

    2017-06-15

    In this study, the influence of key process variables (screw speed, throughput and liquid-to-solid (L/S) ratio) of continuous twin-screw wet granulation (TSWG) was investigated using a central composite face-centered (CCF) experimental design method. Regression models were developed to predict the process responses (motor torque, granule residence time), granule properties (size distribution, volume average diameter, yield, relative width, flowability) and tablet properties (tensile strength). The effects of the three key process variables were analyzed via contour and interaction plots. The experimental results demonstrated that all the process responses, granule properties and tablet properties are influenced by changing the screw speed, throughput and L/S ratio. The TSWG process was optimized to produce granules with a specific volume average diameter of 150 μm and a yield of 95% based on the developed regression models. A design space (DS) was built based on a volume average granule diameter between 90 and 200 μm and a granule yield larger than 75%, with a failure probability analysis using Monte Carlo simulations. Validation experiments successfully confirmed the robustness and accuracy of the DS generated using the CCF experimental design in optimizing a continuous TSWG process. Copyright © 2017 Elsevier B.V. All rights reserved.
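
    The failure-probability construction of a design space can be sketched as follows: propagate uncertainty in the fitted regression coefficients by Monte Carlo, then keep the factor settings whose probability of violating the specifications (here, diameter in 90–200 μm and yield > 75%) stays below a threshold. The response-surface coefficients and their uncertainties below are invented placeholders, not the fitted TSWG models:

```python
import numpy as np

rng = np.random.default_rng(8)

# Assumed response-surface models in coded units (CCF-style fits)
def diameter(speed, ls_ratio, coef):
    return 150 + coef[0] * speed + coef[1] * ls_ratio + coef[2] * speed * ls_ratio

def yield_pct(speed, ls_ratio, coef):
    return 85 + coef[0] * speed + coef[1] * ls_ratio

def failure_probability(speed, ls_ratio, n_mc=5000):
    """Monte Carlo draws over coefficient uncertainty (assumed std. errors)."""
    c_d = rng.normal([-20, 35, 5], [3, 3, 2], size=(n_mc, 3))
    c_y = rng.normal([4, -6], [1.5, 1.5], size=(n_mc, 2))
    d = diameter(speed, ls_ratio, c_d.T)
    yld = yield_pct(speed, ls_ratio, c_y.T)
    ok = (90 <= d) & (d <= 200) & (yld > 75)
    return 1 - ok.mean()

# Scan coded factor settings; keep those with <10% failure probability
grid = [(s, l) for s in np.linspace(-1, 1, 5) for l in np.linspace(-1, 1, 5)]
design_space = [(s, l) for s, l in grid if failure_probability(s, l) < 0.10]
print(f"{len(design_space)} of {len(grid)} settings inside the design space")
```

    Settings whose nominal prediction sits close to a specification edge fail a large fraction of the Monte Carlo draws and are excluded, which is exactly how the probabilistic design space shrinks relative to a naive mean-response region.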

  18. Modeling and optimization of dough recipe for breadsticks

    NASA Astrophysics Data System (ADS)

    Krivosheev, A. Yu; Ponomareva, E. I.; Zhuravlev, A. A.; Lukina, S. I.; Alekhina, N. N.

    2018-05-01

    In this work, the authors studied the combined effect of non-traditional raw materials on quality indicators of breadsticks, applying mathematical methods of experiment planning. The main factors chosen were the dosages of flaxseed flour and grape seed oil. The output parameters were the swelling factor of the products and their strength. Optimization of the formulation composition of the dough for breadsticks was carried out by experimental-statistical methods. As a result of the experiment, mathematical models were constructed in the form of regression equations adequately describing the process under study. The statistical processing of the experimental data was carried out using the Student, Cochran and Fisher criteria (with a confidence probability of 0.95). A mathematical interpretation of the regression equations was given. Optimization of the dough formulation was carried out by the method of undetermined Lagrange multipliers. The rational values of the factors were determined: a dosage of flaxseed flour of 14.22% and of grape seed oil of 7.8%, ensuring the production of products with the best combination of swelling ratio and strength. On the basis of the data obtained, a recipe and a method for the production of breadsticks "Idea" were proposed (TU (Russian Technical Specifications) 9117-443-02068106-2017).

  19. Robust experimental design for optimizing the microbial inhibitor test for penicillin detection in milk.

    PubMed

    Nagel, O G; Molina, M P; Basílico, J C; Zapata, M L; Althaus, R L

    2009-06-01

    To use experimental design techniques and a multiple logistic regression model to optimize a microbiological inhibition test with dichotomous response for the detection of Penicillin G in milk. A 2³ × 2² robust experimental design with two replications was used. The effects of three control factors (V: culture medium volume, S: spore concentration of Geobacillus stearothermophilus, I: indicator concentration), two noise factors (Dt: diffusion time, Ip: incubation period) and their interactions were studied. The V, S, Dt and Ip factors and the V × S, V × Ip and S × Ip interactions showed significant effects. The use of a 100 μl culture medium volume, 2 × 10⁵ spores ml⁻¹, a 60 min diffusion time and a 3 h incubation period is recommended. Under these conditions, the penicillin detection limit was 3.9 μg l⁻¹, similar to the maximum residue limit (MRL). Of the two noise factors studied, the incubation period can be controlled by means of the culture medium volume and spore concentration. We were able to optimize bioassays with dichotomous response using an experimental design and a logistic regression model for the detection of residues at the level of the MRL, aiding in the avoidance of health problems in the consumer.

  20. Glucose Oxidase Biosensor Modeling and Predictors Optimization by Machine Learning Methods †

    PubMed Central

    Gonzalez-Navarro, Felix F.; Stilianova-Stoytcheva, Margarita; Renteria-Gutierrez, Livier; Belanche-Muñoz, Lluís A.; Flores-Rios, Brenda L.; Ibarra-Esquer, Jorge E.

    2016-01-01

    Biosensors are small analytical devices incorporating a biological recognition element and a physico-chemical transducer to convert a biological signal into an electrical reading. Nowadays, their technological appeal resides in their fast performance, high sensitivity and continuous measuring capabilities; however, a full understanding is still under research. This paper aims to contribute to this growing field of biotechnology, with a focus on Glucose-Oxidase Biosensor (GOB) modeling through statistical learning methods from a regression perspective. We model the amperometric response of a GOB with dependent variables under different conditions, such as temperature, benzoquinone, pH and glucose concentrations, by means of several machine learning algorithms. Since the sensitivity of a GOB response is strongly related to these dependent variables, their interactions should be optimized to maximize the output signal, for which a genetic algorithm and simulated annealing are used. We report a model that shows a good generalization error and is consistent with the optimization. PMID:27792165

  1. Fast function-on-scalar regression with penalized basis expansions.

    PubMed

    Reiss, Philip T; Huang, Lei; Mennes, Maarten

    2010-01-01

    Regression models for functional responses and scalar predictors are often fitted by means of basis functions, with quadratic roughness penalties applied to avoid overfitting. The fitting approach described by Ramsay and Silverman in the 1990s amounts to a penalized ordinary least squares (P-OLS) estimator of the coefficient functions. We recast this estimator as a generalized ridge regression estimator, and present a penalized generalized least squares (P-GLS) alternative. We describe algorithms by which both estimators can be implemented, with automatic selection of optimal smoothing parameters, in a more computationally efficient manner than has heretofore been available. We discuss pointwise confidence intervals for the coefficient functions, simultaneous inference by permutation tests, and model selection, including a novel notion of pointwise model selection. P-OLS and P-GLS are compared in a simulation study. Our methods are illustrated with an analysis of age effects in a functional magnetic resonance imaging data set, as well as a reanalysis of a now-classic Canadian weather data set. An R package implementing the methods is publicly available.
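
    A minimal numerical sketch of the generalized-ridge view of P-OLS, assuming a Gaussian bump basis and a second-difference roughness penalty in place of the paper's spline machinery: the estimator is beta = (B'B + lam * D'D)^(-1) B'y.

```python
import numpy as np

rng = np.random.default_rng(0)

# Expand a smooth signal in a basis B and penalize roughness with a
# second-difference penalty D'D, giving the generalized ridge estimator.
# The Gaussian bump basis and the sine target are illustrative choices.
n, k = 100, 15
t = np.linspace(0, 1, n)
centers = np.linspace(0, 1, k)
B = np.exp(-0.5 * ((t[:, None] - centers[None, :]) / 0.08) ** 2)

y = np.sin(2 * np.pi * t) + rng.normal(0, 0.2, n)   # noisy functional response

D = np.diff(np.eye(k), n=2, axis=0)                 # second-difference operator
lam = 1.0
beta = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
fit = B @ beta

mse = float(np.mean((fit - np.sin(2 * np.pi * t)) ** 2))  # error vs true curve
```

    Increasing `lam` trades fidelity for smoothness, which is exactly the smoothing parameter the paper selects automatically.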

  2. Sequential optimization of a terrestrial biosphere model constrained by multiple satellite based products

    NASA Astrophysics Data System (ADS)

    Ichii, K.; Kondo, M.; Wang, W.; Hashimoto, H.; Nemani, R. R.

    2012-12-01

    Various satellite-based spatial products such as evapotranspiration (ET) and gross primary productivity (GPP) are now produced by integrating ground and satellite observations. Effective use of these multiple satellite-based products in terrestrial biosphere models is an important step toward a better understanding of terrestrial carbon and water cycles. However, due to the complexity of terrestrial biosphere models and their large number of parameters, applying these spatial data sets to such models is difficult. In this study, we established an effective but simple framework to refine a terrestrial biosphere model, Biome-BGC, using multiple satellite-based products as constraints. We tested the framework in the monsoon Asia region covered by AsiaFlux observations. The framework is based on hierarchical analysis (Wang et al. 2009) with model parameter optimization constrained by satellite-based spatial data. The Biome-BGC model is separated into several tiers to minimize the freedom of model parameter selection and maximize independence from the whole model. For example, the snow sub-model is optimized first using the MODIS snow cover product, followed by the soil water sub-model optimized against satellite-based ET (estimated by an empirical upscaling method, Support Vector Regression (SVR); Yang et al. 2007), the photosynthesis model optimized against satellite-based GPP (also based on the SVR method), and the respiration and residual carbon cycle models optimized against biomass data. In an initial assessment, we found that most of the default sub-models (e.g. snow, water cycle and carbon cycle) showed large deviations from remote sensing observations. These biases were removed by applying the proposed framework: for example, gross primary productivities were initially underestimated in boreal and temperate forests and overestimated in tropical forests, but the parameter optimization scheme successfully reduced these biases. Our analysis shows that terrestrial carbon and water cycle simulations in monsoon Asia were greatly improved, and that using multiple satellite observations within this framework is an effective way of improving terrestrial biosphere models.

  3. Using an optimal CC-PLSR-RBFNN model and NIR spectroscopy for the starch content determination in corn

    NASA Astrophysics Data System (ADS)

    Jiang, Hao; Lu, Jiangang

    2018-05-01

    Corn starch is an important material that has traditionally been used in the food and chemical industries. To enhance the speed and reliability of starch content determination in corn, a methodology is proposed in this work using an optimal CC-PLSR-RBFNN calibration model and near-infrared (NIR) spectroscopy. The proposed model was developed based on the optimal selection of crucial parameters and the combination of the correlation coefficient method (CC), partial least squares regression (PLSR) and a radial basis function neural network (RBFNN). To test the performance of the model, a standard NIR spectroscopy data set was introduced, containing spectral information and chemical reference measurements for 80 corn samples. For comparison, several other models based on the identical data set were also briefly discussed. In this process, the root mean square error of prediction (RMSEP) and the coefficient of determination (Rp2) in the prediction set were used for evaluation. As a result, the proposed model presented the best predictive performance, with the smallest RMSEP (0.0497%) and the highest Rp2 (0.9968). Therefore, the proposed method combining NIR spectroscopy with the optimal CC-PLSR-RBFNN model can help determine starch content in corn.
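
    The correlation-coefficient (CC) wavelength-selection step can be sketched as below: keep only spectral variables whose absolute correlation with the reference starch values exceeds a threshold, then fit a calibration on the survivors. Ordinary least squares stands in here for the PLSR and RBFNN stages, and the spectra are synthetic, not the 80-sample corn NIR data set.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic calibration set: 60 samples, 200 wavelengths, with an
# informative band (columns 50-59) tied to the reference starch values.
n_samples, n_wavelengths = 60, 200
starch = rng.uniform(60, 70, n_samples)                  # reference values (%)
spectra = rng.normal(0, 1, (n_samples, n_wavelengths))
spectra[:, 50:60] += 0.5 * starch[:, None]               # informative band

# CC step: correlation of each wavelength with the reference values
corr = np.array([np.corrcoef(spectra[:, j], starch)[0, 1]
                 for j in range(n_wavelengths)])
selected = np.where(np.abs(corr) > 0.5)[0]

# OLS calibration on the selected wavelengths (stand-in for PLSR-RBFNN)
X = np.column_stack([np.ones(n_samples), spectra[:, selected]])
coef, *_ = np.linalg.lstsq(X, starch, rcond=None)
pred = X @ coef
rmsep = float(np.sqrt(np.mean((pred - starch) ** 2)))
```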

  4. Comparison of response surface methodology and artificial neural network to enhance the release of reducing sugars from non-edible seed cake by autoclave assisted HCl hydrolysis.

    PubMed

    Shet, Vinayaka B; Palan, Anusha M; Rao, Shama U; Varun, C; Aishwarya, Uday; Raja, Selvaraj; Goveas, Louella Concepta; Vaman Rao, C; Ujwal, P

    2018-02-01

    In the current investigation, statistical approaches were adopted to optimize the autoclave-assisted HCl hydrolysis of non-edible seed cake (NESC) of Pongamia by response surface methodology (RSM). Through the RSM approach, the optimized conditions were found to be 1.17% v/v HCl concentration and 54.12 min hydrolysis time. Under optimized conditions, the release of reducing sugars was found to be 53.03 g/L. The RSM data were used to train an artificial neural network (ANN), and the predictive ability of both models was compared by calculating various statistical parameters. A three-layered ANN model with a 2:12:1 topology was developed; its response indicates that it is precise when compared with the RSM model. The fit of the models was expressed with the regression coefficient R(2), which was found to be 0.975 and 0.888 for the ANN and RSM models, respectively. This further demonstrated that the performance of the ANN was better than that of the RSM.

  5. Neural Network and Regression Soft Model Extended for PAX-300 Aircraft Engine

    NASA Technical Reports Server (NTRS)

    Patnaik, Surya N.; Hopkins, Dale A.

    2002-01-01

    In fiscal year 2001, the neural network and regression capabilities of NASA Glenn Research Center's COMETBOARDS design optimization testbed were extended to generate approximate models for the PAX-300 aircraft engine. The analytical model of the engine is defined through nine variables: the fan efficiency factor, the low pressure of the compressor, the high pressure of the compressor, the high pressure of the turbine, the low pressure of the turbine, the operating pressure, and three critical temperatures (T(sub 4), T(sub vane), and T(sub metal)). Numerical Propulsion System Simulation (NPSS) calculations of the thrust-specific fuel consumption (TSFC) as a function of these variables can become time consuming, and numerical instabilities can occur during these design calculations. "Soft" models can alleviate both deficiencies. These approximate models are generated from a set of high-fidelity input-output pairs obtained from the NPSS code and a design-of-experiments strategy. A neural network and a regression model with 45 weight factors were trained on the input-output pairs. The trained models were then validated through a comparison with the original NPSS code. Comparisons of TSFC versus the operating pressure and of TSFC versus the three temperatures (T(sub 4), T(sub vane), and T(sub metal)) are depicted in the figures. The overall performance was satisfactory for both the regression and the neural network model. The regression model required fewer calculations than the neural network model, and it produced marginally superior results. Training the approximate methods is time consuming; once trained, however, they generate solutions with trivial computational effort, reducing the solution time from hours to less than a minute.
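
    The role of such a "soft" surrogate can be sketched as follows: sample input-output pairs from an expensive simulator, fit a regression response surface, and query it at negligible cost. The simulator function, its two inputs and the quadratic form are invented stand-ins for the NPSS engine model and its nine variables.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulator(pressure, temp):
    # cheap stand-in for an NPSS-style TSFC evaluation (exactly quadratic here)
    return 0.5 + 0.01 * (pressure - 30) ** 2 + 2e-5 * (temp - 1500) ** 2

# design of experiments: random sampling over the operating envelope
P = rng.uniform(20, 40, 80)
T = rng.uniform(1300, 1700, 80)
y = simulator(P, T)

# quadratic regression basis in the two inputs
X = np.column_stack([np.ones_like(P), P, T, P**2, T**2, P * T])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

def surrogate(p, t):
    """Trained 'soft' model: evaluates in microseconds instead of simulator time."""
    return np.array([1.0, p, t, p**2, t**2, p * t]) @ coef

err = abs(surrogate(30.0, 1500.0) - simulator(30.0, 1500.0))
```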

  6. R programming for parameters estimation of geographically weighted ordinal logistic regression (GWOLR) model based on Newton Raphson

    NASA Astrophysics Data System (ADS)

    Zuhdi, Shaifudin; Saputro, Dewi Retno Sari

    2017-03-01

    The GWOLR model represents the relationship between a dependent variable whose categories are ordinal and independent variables influenced by the geographical location of the observation site. Maximum-likelihood estimation of the GWOLR model parameters yields a system of nonlinear equations that is hard to solve analytically. Solving it amounts to an optimization problem, which can be tackled by numerical approximation, for instance with the Newton-Raphson method. The purpose of this research is to construct the Newton-Raphson iteration algorithm and implement it in the R software to estimate the GWOLR model. The research shows that the parameters of the GWOLR model can be estimated in R by building a syntax program around the "while" command.
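
    The Newton-Raphson iteration for a likelihood can be sketched as below. Binary logistic regression stands in here for the ordinal, geographically weighted model of the paper; the structure of the update, beta_new = beta + (X'WX)^(-1) X'(y - p), is the same.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic binary data from a known logistic model
n = 500
X = np.column_stack([np.ones(n), rng.normal(0, 1, n)])
true_beta = np.array([-0.5, 2.0])
p_true = 1.0 / (1.0 + np.exp(-X @ true_beta))
y = (rng.uniform(0, 1, n) < p_true).astype(float)

beta = np.zeros(2)
for _ in range(25):                        # the "while" loop of the paper's R code
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    W = p * (1.0 - p)                      # diagonal Hessian weights
    grad = X.T @ (y - p)                   # score vector
    hess = X.T @ (X * W[:, None])          # observed information
    step = np.linalg.solve(hess, grad)
    beta = beta + step
    if np.max(np.abs(step)) < 1e-8:        # convergence check
        break
```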

  7. On the calibration process of film dosimetry: OLS inverse regression versus WLS inverse prediction.

    PubMed

    Crop, F; Van Rompaye, B; Paelinck, L; Vakaet, L; Thierens, H; De Wagter, C

    2008-07-21

    The purpose of this study was both to put forward a statistically correct model for film calibration and to optimize this process. A reliable calibration is needed in order to perform accurate reference dosimetry with radiographic (Gafchromic) film. Sometimes an ordinary least squares simple linear (in the parameters) regression is applied to the dose-optical-density (OD) curve, with the dose as a function of OD (inverse regression) or with OD as a function of dose (inverse prediction). The application of a simple linear regression fit is an invalid method because the heteroscedasticity of the data is not taken into account. This can lead to erroneous results originating from the calibration process itself, and thus to lower accuracy. In this work, we compare the ordinary least squares (OLS) inverse regression method with the correct weighted least squares (WLS) inverse prediction method for creating calibration curves. We found that the OLS inverse regression method could lead to a prediction bias of up to 7.3 cGy at 300 cGy and total prediction errors of 3% or more for Gafchromic EBT film. Application of the WLS inverse prediction method resulted in a maximum prediction bias of 1.4 cGy and total prediction errors below 2% in a 0-400 cGy range. We developed a Monte-Carlo-based process to optimize calibrations, depending on the needs of the experiment. This type of thorough analysis can lead to a higher accuracy for film dosimetry.
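
    A minimal sketch of the WLS inverse-prediction step: fit OD as a function of dose with weights 1/variance (the noise grows with dose, i.e. heteroscedasticity), then invert the calibration curve to read a dose from an OD. The linear calibration model and noise law are illustrative, not the EBT film model from the study.

```python
import numpy as np

rng = np.random.default_rng(4)

# Calibration design: 9 dose levels (cGy), 5 films per level,
# with noise standard deviation increasing with dose
dose = np.repeat(np.linspace(0, 400, 9), 5)
sigma = 0.002 + 1e-5 * dose
od = 0.05 + 0.004 * dose + rng.normal(0, sigma)

# WLS fit of OD on dose: beta = (X'WX)^(-1) X'W od, weights = 1/sigma^2
w = 1.0 / sigma**2
X = np.column_stack([np.ones_like(dose), dose])
beta = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * od))

def dose_from_od(od_value):
    """Inverse prediction: solve beta0 + beta1 * dose = od for dose."""
    return (od_value - beta[0]) / beta[1]

recovered = dose_from_od(0.05 + 0.004 * 300.0)   # noiseless OD at 300 cGy
```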

  8. The cross-validated AUC for MCP-logistic regression with high-dimensional data.

    PubMed

    Jiang, Dingfeng; Huang, Jian; Zhang, Ying

    2013-10-01

    We propose a cross-validated area under the receiver operating characteristic (ROC) curve (CV-AUC) criterion for tuning parameter selection for penalized methods in sparse, high-dimensional logistic regression models. We use this criterion in combination with the minimax concave penalty (MCP) method for variable selection. The CV-AUC criterion is specifically designed for optimizing the classification performance for binary outcome data. To implement the proposed approach, we derive an efficient coordinate descent algorithm to compute the MCP-logistic regression solution surface. Simulation studies are conducted to evaluate the finite sample performance of the proposed method and to compare it with existing methods, including the Akaike information criterion (AIC), the Bayesian information criterion (BIC) and the extended BIC (EBIC). The model selected based on the CV-AUC criterion tends to have a larger predictive AUC and a smaller classification error than those with tuning parameters selected using the AIC, BIC or EBIC. We illustrate the application of MCP-logistic regression with the CV-AUC criterion on three microarray datasets from studies that attempt to identify genes related to cancers. Our simulation studies and data examples demonstrate that CV-AUC is an attractive method for tuning parameter selection for penalized methods in high-dimensional logistic regression models.
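
    The criterion rests on computing the AUC itself; a rank-based (Mann-Whitney) computation is sketched below and used to compare two candidate models' held-out scores, which is the core of CV-AUC model selection. The scores and labels are synthetic placeholders, not the MCP solution surface of the paper.

```python
import numpy as np

def auc(scores, labels):
    """Probability that a random positive is scored above a random negative."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # count pairwise wins; ties count half
    wins = (pos[:, None] > neg[None, :]).sum() \
        + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

# Held-out scores from two hypothetical tuning-parameter choices
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
model_a = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.9, 0.6, 0.95])  # good separation
model_b = np.array([0.5, 0.6, 0.7, 0.8, 0.4, 0.5, 0.6, 0.7])    # poor separation

# CV-AUC-style selection: keep the candidate with the larger held-out AUC
best = "A" if auc(model_a, labels) > auc(model_b, labels) else "B"
```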

  9. Prediction of monthly rainfall in Victoria, Australia: Clusterwise linear regression approach

    NASA Astrophysics Data System (ADS)

    Bagirov, Adil M.; Mahmood, Arshad; Barton, Andrew

    2017-05-01

    This paper develops the Clusterwise Linear Regression (CLR) technique for the prediction of monthly rainfall. CLR is a combination of clustering and regression techniques. It is formulated as an optimization problem, and an incremental algorithm is designed to solve it. The algorithm is applied to predict monthly rainfall in Victoria, Australia, using rainfall data with five input meteorological variables over the period 1889-2014 from eight geographically diverse weather stations. The prediction performance of the CLR method is evaluated by comparing observed and predicted rainfall values using four measures of forecast accuracy. Computational results are also used to compare the proposed method with CLR estimated in the maximum likelihood framework via the expectation-maximization algorithm, with multiple linear regression, artificial neural networks and support vector machines for regression. The results demonstrate that the proposed algorithm outperforms the other methods in most locations.
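
    The combination of clustering and regression can be sketched as below: alternate between assigning each point to the line that fits it best and refitting a least-squares line per cluster. The two-regime synthetic data and the residual-sign initialization are illustrative; this is not the paper's incremental algorithm or its rainfall series.

```python
import numpy as np

rng = np.random.default_rng(5)

# Two linear regimes with the same slope but different intercepts
n = 200
x = rng.uniform(0, 10, n)
group = rng.integers(0, 2, n)
y = np.where(group == 0, 2.0 * x + 1.0, 2.0 * x + 8.0) + rng.normal(0, 0.3, n)
X = np.column_stack([np.ones(n), x])

# Initialize the partition from the sign of the residual of one global fit
beta_all, *_ = np.linalg.lstsq(X, y, rcond=None)
assign = (y - X @ beta_all > 0).astype(int)

for _ in range(10):
    # refit one least-squares line per cluster
    betas = [np.linalg.lstsq(X[assign == k], y[assign == k], rcond=None)[0]
             for k in (0, 1)]
    # reassign each point to its best-fitting line
    resid = np.stack([np.abs(y - X @ b) for b in betas])
    assign = resid.argmin(axis=0)

intercepts = sorted(b[0] for b in betas)
```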

  10. Geometrical quality evaluation in laser cutting of Inconel-718 sheet by using Taguchi based regression analysis and particle swarm optimization

    NASA Astrophysics Data System (ADS)

    Shrivastava, Prashant Kumar; Pandey, Arun Kumar

    2018-03-01

    Inconel-718 is one of the most demanded advanced engineering materials because of its superior properties. Conventional machining techniques face many problems in cutting intricate profiles in this material due to its low thermal conductivity, low elasticity and high chemical affinity at elevated temperatures. Laser beam cutting is one of the advanced cutting methods that can achieve geometrical accuracy with high precision through suitable management of the input process parameters. In this research work, an experimental investigation of the pulsed Nd:YAG laser cutting of Inconel-718 has been carried out. The experiments were conducted using a well-planned L27 orthogonal array. The experimentally measured values of different quality characteristics have been used to develop second-order regression models of bottom kerf deviation (KD), bottom kerf width (KW) and kerf taper (KT). The developed models of the different quality characteristics have been utilized as quality functions for single-objective optimization by the particle swarm optimization (PSO) method. The optimum results obtained by the proposed hybrid methodology have been compared with the experimental results; the comparison shows individual improvements of 75%, 12.67% and 33.70% in bottom kerf deviation, bottom kerf width and kerf taper, respectively. The parametric effects of the most significant input process parameters on the quality characteristics have also been discussed.
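
    Using a fitted regression model as the objective of a particle swarm can be sketched as below. The quadratic coefficients, the two coded process parameters and the "kerf" response are illustrative, not the fitted Inconel-718 models of the paper.

```python
import numpy as np

rng = np.random.default_rng(6)

def kerf_model(x):
    """Illustrative second-order regression model of a quality characteristic
    in two coded process parameters (to be minimized)."""
    p, f = x[..., 0], x[..., 1]
    return 1.0 + 0.8 * (p - 0.3) ** 2 + 0.5 * (f + 0.2) ** 2 + 0.2 * p * f

n_particles, iters = 30, 100
pos = rng.uniform(-1, 1, (n_particles, 2))
vel = np.zeros((n_particles, 2))
pbest = pos.copy()
pbest_val = kerf_model(pbest)
gbest = pbest[pbest_val.argmin()].copy()

w, c1, c2 = 0.7, 1.5, 1.5                  # inertia and acceleration constants
for _ in range(iters):
    r1, r2 = rng.uniform(size=(2, n_particles, 2))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, -1, 1)        # keep particles in the coded range
    val = kerf_model(pos)
    improved = val < pbest_val
    pbest[improved] = pos[improved]
    pbest_val[improved] = val[improved]
    gbest = pbest[pbest_val.argmin()].copy()

best_val = float(kerf_model(gbest))
```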

  11. Forecasting urban water demand: A meta-regression analysis.

    PubMed

    Sebri, Maamar

    2016-12-01

    Water managers and planners require accurate water demand forecasts over the short, medium and long term for many purposes. These range from assessing water supply needs over spatial and temporal patterns to optimizing future investments and planning future allocations across competing sectors. This study surveys the empirical literature on urban water demand forecasting using the meta-analytical approach. Specifically, using more than 600 estimates, a meta-regression analysis is conducted to identify explanations for the cross-study variation in the accuracy of urban water demand forecasting. Our study finds that accuracy depends significantly on study characteristics, including demand periodicity, modeling method, forecasting horizon, model specification and sample size. The meta-regression results remain robust to the different estimators employed as well as to a series of sensitivity checks. The importance of these findings lies in the conclusions and implications drawn for regulators and policymakers and for academics alike.

  12. NCC-AUC: an AUC optimization method to identify multi-biomarker panel for cancer prognosis from genomic and clinical data.

    PubMed

    Zou, Meng; Liu, Zhaoqi; Zhang, Xiang-Sun; Wang, Yong

    2015-10-15

    In prognosis and survival studies, an important goal is to identify multi-biomarker panels with predictive power using molecular characteristics or clinical observations. Such analysis is often challenged by censored, small-sample-size but high-dimensional genomic profiles or clinical data. Therefore, sophisticated models and algorithms are in pressing need. In this study, we propose a novel Area Under Curve (AUC) optimization method for multi-biomarker panel identification named Nearest Centroid Classifier for AUC optimization (NCC-AUC). Our method is motivated by the connection between the AUC score for classification accuracy evaluation and Harrell's concordance index in survival analysis. This connection allows us to convert the survival time regression problem into a binary classification problem. An optimization model is then formulated to directly maximize the AUC while minimizing the number of selected features used to construct a predictor in the nearest centroid classifier framework. NCC-AUC shows strong performance in validation on both genomic data of breast cancer and clinical data of stage IB Non-Small-Cell Lung Cancer (NSCLC). For the genomic data, NCC-AUC outperforms Support Vector Machine (SVM) and Support Vector Machine-based Recursive Feature Elimination (SVM-RFE) in classification accuracy. It tends to select a multi-biomarker panel with low average redundancy and enriched biological meaning. NCC-AUC also separates low- and high-risk cohorts more significantly than the widely used Cox model (Cox proportional-hazards regression model) and the L1-Cox model (L1-penalized Cox model). These performance gains of NCC-AUC are quite robust across the 5 subtypes of breast cancer. Further, in independent clinical data, NCC-AUC outperforms SVM and SVM-RFE in predictive accuracy and is consistently better than the Cox model and the L1-Cox model in grouping patients into high- and low-risk categories. In summary, NCC-AUC provides a rigorous optimization framework to systematically reveal multi-biomarker panels from genomic and clinical data. It can serve as a useful tool to identify prognostic biomarkers for survival analysis. NCC-AUC is available at http://doc.aporc.org/wiki/NCC-AUC. Supplementary data are available at Bioinformatics online.

  13. Pareto fronts for multiobjective optimization design on materials data

    NASA Astrophysics Data System (ADS)

    Gopakumar, Abhijith; Balachandran, Prasanna; Gubernatis, James E.; Lookman, Turab

    Optimizing multiple properties simultaneously is vital in materials design. Here we apply information-driven, statistical optimization strategies blended with machine learning methods to address multi-objective optimization tasks on materials data. These strategies aim to find the Pareto front consisting of non-dominated data points from a set of candidate compounds with known characteristics. The objective is to find the Pareto front in as few additional measurements or calculations as possible. We show how exploration of the data space to find the front is achieved by using uncertainties in predictions from regression models. We test our proposed design strategies on multiple, independent data sets including those from computations as well as experiments. These include data sets for MAX phases, piezoelectrics and multicomponent alloys.
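
    Extracting the Pareto front, the set of non-dominated candidates, can be sketched as below for two properties to be maximized; the property values are synthetic placeholders, not the materials data sets of the paper.

```python
import numpy as np

def pareto_front(points):
    """Indices of points not dominated by any other point
    (both columns are to be maximized)."""
    idx = []
    for i, p in enumerate(points):
        dominated = any(
            np.all(q >= p) and np.any(q > p)
            for j, q in enumerate(points) if j != i
        )
        if not dominated:
            idx.append(i)
    return idx

# Hypothetical candidate compounds with two properties each
props = np.array([
    [1.0, 5.0],   # non-dominated
    [2.0, 4.0],   # non-dominated
    [2.0, 3.0],   # dominated by [2, 4]
    [3.0, 1.0],   # non-dominated
    [0.5, 4.5],   # dominated by [1, 5]
])
front = pareto_front(props)
```

    In the design loop described above, each new measurement updates the candidate set and the front is recomputed.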

  14. Analysis and improvement measures of flight delay in China

    NASA Astrophysics Data System (ADS)

    Zang, Yuhang

    2017-03-01

    Firstly, this paper establishes a principal component regression model to analyze the data quantitatively, using principal component analysis to obtain three principal component factors of flight delays. The least squares method is then used to analyze these factors, and substitution yields the regression equation, which shows that the main cause of flight delays is the airlines themselves, followed by weather and traffic. To address these problems, this paper improves the controllable aspect, traffic flow control. For traffic flow control, an adaptive genetic queuing model is established for the runway terminal area, and an optimization method is developed for fifteen planes landing on three runways, based on Beijing Capital International Airport. Comparing the results with the existing FCFS (first-come, first-served) algorithm demonstrates the superiority of the model.

  15. Multi-parameters monitoring during traditional Chinese medicine concentration process with near infrared spectroscopy and chemometrics

    NASA Astrophysics Data System (ADS)

    Liu, Ronghua; Sun, Qiaofeng; Hu, Tian; Li, Lian; Nie, Lei; Wang, Jiayue; Zhou, Wanhui; Zang, Hengchang

    2018-03-01

    As a powerful process analytical technology (PAT) tool, near infrared (NIR) spectroscopy has been widely used in real-time monitoring. In this study, NIR spectroscopy was applied to monitor multiple parameters of the traditional Chinese medicine (TCM) Shenzhiling oral liquid during the concentration process, to guarantee the quality of the products. Five lab-scale batches were employed to construct quantitative models for five chemical ingredients and one physical property (sample density) during the concentration process. Paeoniflorin, albiflorin, liquiritin and sample density were modeled by partial least squares regression (PLSR), while the contents of glycyrrhizic acid and cinnamic acid were modeled by support vector machine regression (SVMR). Standard normal variate (SNV) and/or Savitzky-Golay (SG) smoothing with derivative methods were adopted for spectral pretreatment. Variable selection methods including the correlation coefficient (CC), competitive adaptive reweighted sampling (CARS) and interval partial least squares regression (iPLS) were performed to optimize the models. The results indicated that NIR spectroscopy is an effective tool for monitoring the concentration process of Shenzhiling oral liquid.
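
    The SNV pretreatment step mentioned above can be sketched in a few lines: each spectrum is centered and scaled by its own mean and standard deviation, which removes multiplicative scatter and offset effects before PLSR/SVMR modeling. The spectra below are synthetic, not the oral-liquid NIR data.

```python
import numpy as np

rng = np.random.default_rng(8)

# Ten synthetic spectra sharing one shape, distorted per sample by a
# multiplicative scatter factor and an additive offset
base = np.sin(np.linspace(0, 3, 100))
scatter = rng.uniform(0.5, 2.0, (10, 1))
offset = rng.uniform(-0.2, 0.2, (10, 1))
spectra = scatter * base + offset

def snv(X):
    """Standard normal variate: row-wise centering and scaling."""
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

corrected = snv(spectra)
# After SNV every spectrum has zero mean and unit variance, so the scatter
# differences between samples collapse onto one common shape.
spread = float(corrected.std(axis=0).max())
```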

  16. A hybrid neural network model for noisy data regression.

    PubMed

    Lee, Eric W M; Lim, Chee Peng; Yuen, Richard K K; Lo, S M

    2004-04-01

    A hybrid neural network model, based on the fusion of fuzzy adaptive resonance theory (FA) and the general regression neural network (GRNN), is proposed in this paper. Both FA and the GRNN are incremental learning systems and are very fast in network training. The proposed hybrid model, denoted GRNNFA, retains these advantages while reducing the computational requirements for calculating and storing information about the kernels. A clustering version of the GRNN is designed, with data compression by FA for noise removal. An adaptive gradient-based kernel width optimization algorithm has also been devised. Convergence of the gradient descent algorithm can be accelerated by geometric incremental growth of the updating factor. A series of experiments with four benchmark datasets has been conducted to assess and compare the effectiveness of GRNNFA with other approaches. The GRNNFA model is also employed in a novel application task: predicting the evacuation time of patrons at typical karaoke centers in Hong Kong in the event of fire. The results positively demonstrate the applicability of GRNNFA to noisy data regression problems.
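
    The GRNN component can be sketched as a Nadaraya-Watson kernel-weighted average of stored training targets. The fixed kernel width below stands in for the paper's adaptive width optimization, and the one-dimensional sine data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

# Noisy training data from a known smooth function
x_train = np.linspace(0, 2 * np.pi, 40)
y_train = np.sin(x_train) + rng.normal(0, 0.1, 40)

def grnn_predict(x, sigma=0.3):
    """GRNN prediction: Gaussian-kernel-weighted average of training targets."""
    w = np.exp(-((x - x_train) ** 2) / (2 * sigma**2))
    return float(np.sum(w * y_train) / np.sum(w))

pred = grnn_predict(np.pi / 2)   # true value is sin(pi/2) = 1
```

    The FA clustering stage of GRNNFA would replace the raw training pairs with compressed cluster prototypes, shrinking the sums above.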

  17. Comparative analysis of used car price evaluation models

    NASA Astrophysics Data System (ADS)

    Chen, Chuancan; Hao, Lulu; Xu, Cong

    2017-05-01

    An accurate used car price evaluation is a catalyst for the healthy development of the used car market. Data mining has been applied to predict used car prices in several articles; however, little has been studied on comparing different algorithms for used car price estimation. This paper collects more than 100,000 used car dealing records throughout China for an empirical, thorough comparison of two algorithms: linear regression and random forest. These two algorithms are used to predict used car prices in three different models: a model for a certain car make, a model for a certain car series, and a universal model. Results show that random forest has a stable but not ideal effect in the price evaluation model for a certain car make, but it shows a great advantage in the universal model compared with linear regression. This indicates that random forest is an optimal algorithm when handling complex models with a large number of variables and samples, yet it shows no obvious advantage when coping with simple models with fewer variables.

  18. Modeling the BOD of Danube River in Serbia using spatial, temporal, and input variables optimized artificial neural network models.

    PubMed

    Šiljić Tomić, Aleksandra N; Antanasijević, Davor Z; Ristić, Mirjana Đ; Perić-Grujić, Aleksandra A; Pocajt, Viktor V

    2016-05-01

    This paper describes the application of artificial neural network models for the prediction of biological oxygen demand (BOD) levels in the Danube River. Eighteen regularly monitored water quality parameters at 17 stations on the river stretch passing through Serbia were used as input variables. The optimization of the model was performed in three consecutive steps: firstly, the spatial influence of a monitoring station was examined; secondly, the monitoring period necessary to reach satisfactory performance was determined; and lastly, correlation analysis was applied to evaluate the relationship among water quality parameters. Root-mean-square error (RMSE) was used to evaluate model performance in the first two steps, whereas in the last step, multiple statistical indicators of performance were utilized. As a result, two optimized models were developed, a general regression neural network model (labeled GRNN-1) that covers the monitoring stations from the Danube inflow to the city of Novi Sad and a GRNN model (labeled GRNN-2) that covers the stations from the city of Novi Sad to the border with Romania. Both models demonstrated good agreement between the predicted and actually observed BOD values.

  19. A statistical experiment design approach for optimizing biodegradation of weathered crude oil in coastal sediments.

    PubMed

    Mohajeri, Leila; Aziz, Hamidi Abdul; Isa, Mohamed Hasnain; Zahed, Mohammad Ali

    2010-02-01

    This work studied the bioremediation of weathered crude oil (WCO) in coastal sediment samples using a central composite face-centered design (CCFD) under response surface methodology (RSM). Initial oil concentration and biomass, nitrogen and phosphorus concentrations were used as independent variables (factors) and oil removal as the dependent variable (response) in a 60-day trial. A statistically significant model for WCO removal was obtained. The coefficient of determination (R(2)=0.9732) and probability value (P<0.0001) demonstrated the significance of the regression model. Numerical optimization based on a desirability function was carried out for initial oil concentrations of 2, 16 and 30 g per kg sediment, and 83.13, 78.06 and 69.92 per cent removal was observed respectively, compared to 77.13, 74.17 and 69.87 per cent removal for the un-optimized results.

  20. Multi-objective Optimization of Pulsed Gas Metal Arc Welding Process Using Neuro NSGA-II

    NASA Astrophysics Data System (ADS)

    Pal, Kamal; Pal, Surjya K.

    2018-05-01

    Weld quality is a critical issue in fabrication industries where products are custom-designed. Multi-objective optimization yields a set of solutions on the Pareto-optimal front. Optimization methods based on mathematical regression models are often found to be inadequate for highly non-linear arc welding processes. Thus, various global evolutionary approaches such as artificial neural networks and genetic algorithms (GA) have been developed. The present work applies the elitist non-dominated sorting GA (NSGA-II) to the optimization of the pulsed gas metal arc welding process, using back-propagation neural network (BPNN) based weld quality feature models. The primary objective in maintaining butt joint weld quality is the maximization of tensile strength with minimum plate distortion. The BPNN is used to compute the fitness of each solution after adequate training, whereas the NSGA-II algorithm generates the optimum solutions for the two conflicting objectives. Welding experiments were conducted on low carbon steel using response surface methodology. The Pareto-optimal front with three ranked solutions after 20 generations was considered the best, as no further improvement was observed. The joint strength as well as the transverse shrinkage was found to be drastically improved over the design-of-experiments results, as per the validated Pareto-optimal solutions obtained.

  1. [Application of simulated annealing method and neural network on optimizing soil sampling schemes based on road distribution].

    PubMed

    Han, Zong-wei; Huang, Wei; Luo, Yun; Zhang, Chun-di; Qi, Da-cheng

    2015-03-01

    Taking the soil organic matter in eastern Zhongxiang County, Hubei Province, as the research object, thirteen sample sets from different regions were arranged around the road network, and their spatial configuration was optimized by the simulated annealing approach. The topographic factors of these thirteen sample sets, including slope, plan curvature, profile curvature, topographic wetness index, stream power index and sediment transport index, were extracted by terrain analysis. Based on the results of the optimization, a multiple linear regression model with topographic factors as independent variables was built. At the same time, a multilayer perceptron model based on the neural network approach was implemented, and the two models were then compared. The results revealed that the proposed approach is practicable for optimizing soil sampling schemes. The optimal configuration was capable of capturing soil-landscape knowledge exactly, and its accuracy was better than that of the original samples. This study designed a sampling configuration for studying the soil attribute distribution by referring to the spatial layout of the road network, historical samples, and digital elevation data, which provides an effective means, as well as a theoretical basis, for determining sampling configurations and mapping the spatial distribution of soil organic matter at low cost and with high efficiency.

  2. A deep belief network with PLSR for nonlinear system modeling.

    PubMed

    Qiao, Junfei; Wang, Gongming; Li, Wenjing; Li, Xiaoli

    2018-08-01

Nonlinear system modeling plays an important role in practical engineering, and the deep learning-based deep belief network (DBN) is now popular in nonlinear system modeling and identification because of its strong learning ability. However, the existing weight optimization for DBN is gradient-based, which often leads to a local optimum and a poor training result. In this paper, a DBN with partial least squares regression (PLSR-DBN) is proposed for nonlinear system modeling, which focuses on the problem of optimizing the DBN weights using PLSR. First, the unsupervised contrastive divergence (CD) algorithm is used for weight initialization. Second, the initial weights derived from the CD algorithm are optimized through layer-by-layer PLSR modeling from the top layer to the bottom layer. Instead of a gradient method, PLSR-DBN determines the optimal weights using several PLSR models, so that better performance is achieved. A convergence analysis is then given to guarantee the effectiveness of the proposed PLSR-DBN model. Finally, the proposed PLSR-DBN is tested on two benchmark nonlinear systems, an actual wastewater treatment system, and a handwritten digit recognition task (nonlinear mapping and modeling) with high-dimensional input data. The experimental results show that the proposed PLSR-DBN performs better in both time and accuracy on nonlinear system modeling than other methods. Copyright © 2017 Elsevier Ltd. All rights reserved.

  3. School Cost Functions: A Meta-Regression Analysis

    ERIC Educational Resources Information Center

    Colegrave, Andrew D.; Giles, Margaret J.

    2008-01-01

    The education cost literature includes econometric studies attempting to determine economies of scale, or estimate an optimal school or district size. Not only do their results differ, but the studies use dissimilar data, techniques, and models. To derive value from these studies requires that the estimates be made comparable. One method to do…

  4. Optimization and modeling of flow characteristics of low-oil DDGS using regression techniques

    USDA-ARS?s Scientific Manuscript database

    Storage conditions such as temperature, relative humidity (RH), consolidation pressure (CP), and time affect flow behavior of bulk solids like distillers dried grains with solubles (DDGS), which is widely used as animal feed by the U.S. cattle and swine industries. The typical dry grind DDGS product...

  5. Bayesian Optimization for Neuroimaging Pre-processing in Brain Age Classification and Prediction

    PubMed Central

    Lancaster, Jenessa; Lorenz, Romy; Leech, Rob; Cole, James H.

    2018-01-01

Neuroimaging-based age prediction using machine learning is proposed as a biomarker of brain aging, relating to cognitive performance, health outcomes and progression of neurodegenerative disease. However, even leading age-prediction algorithms contain measurement error, motivating efforts to improve experimental pipelines. T1-weighted MRI is commonly used for age prediction, and the pre-processing of these scans involves normalization to a common template and resampling to a common voxel size, followed by spatial smoothing. Resampling parameters are often selected arbitrarily. Here, we sought to improve brain-age prediction accuracy by optimizing resampling parameters using Bayesian optimization. Using data on N = 2003 healthy individuals (aged 16–90 years) we trained support vector machines to (i) distinguish between young (<22 years) and old (>50 years) brains (classification) and (ii) predict chronological age (regression). We also evaluated generalisability of the age-regression model to an independent dataset (CamCAN, N = 648, aged 18–88 years). Bayesian optimization was used to identify optimal voxel size and smoothing kernel size for each task. This procedure adaptively samples the parameter space to evaluate accuracy across a range of possible parameters, using independent sub-samples to iteratively assess different parameter combinations to arrive at optimal values. When distinguishing between young and old brains a classification accuracy of 88.1% was achieved (optimal voxel size = 11.5 mm³, smoothing kernel = 2.3 mm). For predicting chronological age, a mean absolute error (MAE) of 5.08 years was achieved (optimal voxel size = 3.73 mm³, smoothing kernel = 3.68 mm). This was compared to performance using default values of 1.5 mm³ and 4 mm respectively, resulting in MAE = 5.48 years, though this 7.3% improvement was not statistically significant.
When assessing generalisability, best performance was achieved when applying the entire Bayesian optimization framework to the new dataset, out-performing the parameters optimized for the initial training dataset. Our study outlines the proof-of-principle that neuroimaging models for brain-age prediction can use Bayesian optimization to derive case-specific pre-processing parameters. Our results suggest that different pre-processing parameters are selected when optimization is conducted in specific contexts. This potentially motivates use of optimization techniques at many different points during the experimental process, which may improve statistical sensitivity and reduce opportunities for experimenter-led bias. PMID:29483870
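
    The adaptive-sampling loop described above can be sketched as standard Gaussian-process Bayesian optimization over a one-dimensional pre-processing parameter; the synthetic error surface, kernel, and acquisition settings here are illustrative assumptions, not the study's pipeline:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(1)

def mae(s):
    """Hypothetical validation MAE as a function of smoothing kernel size (mm)."""
    return 5.1 + 0.4 * (s - 3.7) ** 2 + 0.02 * rng.normal()

grid = np.linspace(0.5, 8.0, 200).reshape(-1, 1)
X = np.array([[1.0], [4.0], [7.0]])            # initial evaluations
y = np.array([mae(x[0]) for x in X])

for _ in range(10):
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-3,
                                  normalize_y=True).fit(X, y)
    mu, sd = gp.predict(grid, return_std=True)
    best = y.min()
    # expected improvement acquisition (for minimization)
    z = (best - mu) / np.maximum(sd, 1e-9)
    ei = (best - mu) * norm.cdf(z) + sd * norm.pdf(z)
    x_next = grid[np.argmax(ei)]               # evaluate most promising point
    X = np.vstack([X, x_next])
    y = np.append(y, mae(x_next[0]))

s_opt = float(X[np.argmin(y), 0])
```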

  6. Spectral Estimation Model Construction of Heavy Metals in Mining Reclamation Areas

    PubMed Central

    Dong, Jihong; Dai, Wenting; Xu, Jiren; Li, Songnian

    2016-01-01

The study reported here examined, as the research subject, surface soils in the Liuxin mining area of Xuzhou, and explored the heavy metal content and spectral data by establishing quantitative models with Multivariable Linear Regression (MLR), Generalized Regression Neural Network (GRNN) and Sequential Minimal Optimization for Support Vector Machine (SMO-SVM) methods. The study results are as follows: (1) the estimations of the spectral inversion models established based on MLR, GRNN and SMO-SVM are satisfactory, and even the MLR model, which provides the worst estimation, has an R² of more than 0.46. This result suggests that the stress-sensitive bands of heavy metal pollution contain enough effective spectral information; (2) the GRNN model can simulate the data from small samples more effectively than the MLR model, and the R² between the contents of the five heavy metals estimated by the GRNN model and the measured values is approximately 0.7; (3) the stability and accuracy of the spectral estimation using the SMO-SVM model are obviously better than those of the GRNN and MLR models. Among all five heavy metals, the estimation for cadmium (Cd) is the best when using the SMO-SVM model, and its R² value reaches 0.8628; (4) using the optimal model to invert the Cd content in wheat planted on mine reclamation soil, the R² and RMSE between the measured and the estimated values are 0.6683 and 0.0489, respectively. This result suggests that the method of using the SMO-SVM model to estimate the contents of heavy metals in wheat samples is feasible. PMID:27367708
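
    The SVM regression step can be sketched on synthetic band/concentration data (scikit-learn's SVR uses an SMO-style libsvm solver internally); the features, target relation, and hyperparameters below are assumptions for illustration only:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(2)

# synthetic "spectral band" features and a Cd-like target (illustrative)
X = rng.uniform(0, 1, size=(120, 5))
cd = 0.3 + 0.8 * X[:, 0] - 0.5 * X[:, 3] + 0.05 * rng.normal(size=120)

# train on 90 samples, evaluate on the held-out 30
svr = SVR(kernel='rbf', C=10.0, epsilon=0.01).fit(X[:90], cd[:90])
pred = svr.predict(X[90:])
r2 = 1 - np.sum((cd[90:] - pred) ** 2) / np.sum((cd[90:] - cd[90:].mean()) ** 2)
```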

  8. A hybrid constructed wetland for organic-material and nutrient removal from sewage: Process performance and multi-kinetic models.

    PubMed

    Nguyen, X Cuong; Chang, S Woong; Nguyen, Thi Loan; Ngo, H Hao; Kumar, Gopalakrishnan; Banu, J Rajesh; Vu, M Cuong; Le, H Sinh; Nguyen, D Duc

    2018-09-15

A pilot-scale hybrid constructed wetland with vertical flow and horizontal flow in series was constructed and used to investigate organic-material and nutrient removal rate constants for wastewater treatment, and to establish a practical predictive model. For this purpose, the performance of multiple parameters was statistically evaluated and predictive models were suggested. The kinetic rate constants were measured using first-order and Monod kinetics, each paired with a plug flow reactor (PFR) and a continuously stirred tank reactor (CSTR). Both the Lindeman, Merenda, and Gold (LMG) analysis and the Bayesian model averaging (BMA) method were employed to identify the relative importance of variables and their optimal multiple regression (MR). The results showed that the first-order PFR model (M2) did not fit the data (P > 0.05 and R² < 0.5), whereas the first-order CSTR model (M1) for chemical oxygen demand (CODCr) and the Monod-CSTR model (M3) for CODCr and ammonium nitrogen (NH4-N) showed a high correlation with the experimental data (R² > 0.5). The pollutant removal rate for M1 was 0.19 m/d (CODCr), and the rates for M3 were 25.2 g/m²·d for CODCr and 2.63 g/m²·d for NH4-N. By applying a multi-variable linear regression method, optimal empirical models were established for predicting the final effluent concentrations of five-day biochemical oxygen demand (BOD5) and NH4-N. In general, the hydraulic loading rate was considered an important variable with high relative importance, appearing in all the optimal predictive models. Copyright © 2018 Elsevier Ltd. All rights reserved.
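
    Fitting a first-order CSTR rate constant to effluent/influent ratios is a small nonlinear regression. A sketch with hypothetical retention times and ratios — the model form Ce/Ci = 1/(1 + k·t) is the standard first-order CSTR solution, but the numbers are not the paper's:

```python
import numpy as np
from scipy.optimize import curve_fit

# first-order CSTR: effluent/influent concentration ratio as a function of
# hydraulic retention time t (d) and rate constant k (1/d)
def cstr(t, k):
    return 1.0 / (1.0 + k * t)

t = np.array([0.5, 1.0, 2.0, 3.0, 5.0])           # hypothetical HRTs (d)
ratio = np.array([0.67, 0.51, 0.33, 0.26, 0.17])  # hypothetical Ce/Ci

k_fit, _ = curve_fit(cstr, t, ratio, p0=[1.0])
k = float(k_fit[0])
```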

  9. Successive Projections Algorithm-Multivariable Linear Regression Classifier for the Detection of Contaminants on Chicken Carcasses in Hyperspectral Images

    NASA Astrophysics Data System (ADS)

    Wu, W.; Chen, G. Y.; Kang, R.; Xia, J. C.; Huang, Y. P.; Chen, K. J.

    2017-07-01

During slaughtering and further processing, chicken carcasses are inevitably contaminated by microbial pathogen contaminants. Due to food safety concerns, many countries implement a zero-tolerance policy that forbids the placement of visibly contaminated carcasses in ice-water chiller tanks during processing. Manual detection of contaminants is labor-intensive and imprecise. Here, a successive projections algorithm (SPA)-multivariable linear regression (MLR) classifier based on an optimal performance threshold was developed for automatic detection of contaminants on chicken carcasses. Hyperspectral images were obtained using a hyperspectral imaging system. A regression model for the classifier was established by MLR based on twelve characteristic wavelengths (505, 537, 561, 562, 564, 575, 604, 627, 656, 665, 670, and 689 nm) selected by SPA, and the optimal threshold T = 1 was obtained from receiver operating characteristic (ROC) analysis. The SPA-MLR classifier provided the best detection results when compared with the SPA-partial least squares (PLS) regression classifier and the SPA-least squares support vector machine (LS-SVM) classifier. The true positive rate (TPR) of 100% and the false positive rate (FPR) of 0.392% indicate that the SPA-MLR classifier can utilize spatial and spectral information to effectively detect contaminants on chicken carcasses.
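
    The classifier design — an MLR score thresholded at a value chosen from ROC analysis — can be sketched as follows. The synthetic two-class "band intensity" data and the Youden's J criterion used to pick the threshold are assumptions, not the paper's exact procedure:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import roc_curve

rng = np.random.default_rng(3)

# synthetic band intensities: contaminated pixels (label 1) score higher
X = np.vstack([rng.normal(0.3, 0.1, (100, 3)), rng.normal(0.6, 0.1, (100, 3))])
labels = np.array([0] * 100 + [1] * 100)

mlr = LinearRegression().fit(X, labels)           # MLR used as a classifier
scores = mlr.predict(X)
fpr, tpr, thr = roc_curve(labels, scores)
t_opt = thr[np.argmax(tpr - fpr)]                 # Youden's J threshold
pred = (scores > t_opt).astype(int)
acc = (pred == labels).mean()
```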

  10. Optimal Sparse Upstream Sensor Placement for Hydrokinetic Turbines

    NASA Astrophysics Data System (ADS)

    Cavagnaro, Robert; Strom, Benjamin; Ross, Hannah; Hill, Craig; Polagye, Brian

    2016-11-01

Accurate measurement of the flow field incident upon a hydrokinetic turbine is critical for performance evaluation during testing and setting boundary conditions in simulation. Additionally, turbine controllers may leverage real-time flow measurements. Particle image velocimetry (PIV) is capable of rendering a flow field over a wide spatial domain in a controlled, laboratory environment. However, PIV's lack of suitability for natural marine environments, high cost, and intensive post-processing diminish its potential for control applications. Conversely, sensors such as acoustic Doppler velocimeters (ADVs) are designed for field deployment and real-time measurement, but over a small spatial domain. Sparsity-promoting regression analysis such as LASSO is utilized to improve the efficacy of point measurements for real-time applications by determining optimal spatial placement for a small number of ADVs using a training set of PIV velocity fields and turbine data. The study is conducted in a flume (0.8 m² cross-sectional area, 1 m/s flow) with laboratory-scale axial and cross-flow turbines. Predicted turbine performance utilizing the optimal sparse sensor network and associated regression model is compared to actual performance with corresponding PIV measurements.
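
    Sparsity-promoting regression for sensor placement can be sketched with LASSO: regress turbine power on many candidate velocity points and keep only the locations with nonzero coefficients. The synthetic flow fields, true sensor locations, and regularization strength are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(4)

# 50 candidate velocity-measurement points; power depends on only a few
V = rng.normal(1.0, 0.1, size=(300, 50))      # synthetic "PIV training" fields
power = 3.0 * V[:, 5] + 2.0 * V[:, 20] + 0.05 * rng.normal(size=300)

# L1 penalty drives most coefficients exactly to zero
lasso = Lasso(alpha=0.01).fit(V, power)
chosen = np.flatnonzero(np.abs(lasso.coef_) > 1e-6)   # sparse sensor locations
```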

  11. Batch and semi-continuous anaerobic co-digestion of goose manure with alkali solubilized wheat straw: A case of carbon to nitrogen ratio and organic loading rate regression optimization.

    PubMed

    Hassan, Muhammad; Ding, Weimin; Umar, Muhammad; Rasool, Ghulam

    2017-04-01

The present study focused on optimizing the carbon to nitrogen ratio (C/N) and organic loading rate (OLR) for co-digestion of goose manure (GM) and wheat straw (WS). For industrial-scale anaerobic digestion of poultry manure, the optimum C/N (mixing ratio) and OLR (daily feeding concentration) are of significant importance but remain under-reported in the literature. Therefore, batch and CSTR co-digestion experiments on GM and WS were carried out under mesophilic conditions. Alkali (NaOH) solubilization pretreatment of the WS greatly enhanced its anaerobic digestibility. In the batch experiments, the highest methane production occurred at C/N between 20 and 30, while for the CSTRs the second applied OLR (3 g VS/L·d) proved optimal, with a maximum methane production of 254.65 mL/g VS for reactor B at a C/N of 25. Regression optimization models for C/N and OLR were developed for commercial-scale use. Copyright © 2017 Elsevier Ltd. All rights reserved.
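
    A regression optimization of this kind fits a response surface to measured yields and takes its stationary point. A one-factor quadratic sketch with hypothetical C/N data (the yields below are invented for illustration):

```python
import numpy as np

# hypothetical methane yields (mL/g VS) measured at several C/N ratios
cn = np.array([10.0, 15.0, 20.0, 25.0, 30.0, 35.0])
yield_ = np.array([170.0, 215.0, 245.0, 255.0, 240.0, 205.0])

# fit a quadratic regression model y = a*cn^2 + b*cn + c
a, b, c = np.polyfit(cn, yield_, 2)
cn_opt = -b / (2 * a)            # stationary point of the fitted parabola
```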

  12. Identifying Interacting Genetic Variations by Fish-Swarm Logic Regression

    PubMed Central

    Yang, Aiyuan; Yan, Chunxia; Zhu, Feng; Zhao, Zhongmeng; Cao, Zhi

    2013-01-01

Understanding associations between genotypes and complex traits is a fundamental problem in human genetics. A major open problem in mapping phenotypes is that of identifying a set of interacting genetic variants, which might contribute to complex traits. Logic regression (LR) is a powerful multivariate association tool. Several LR-based approaches have been successfully applied to different datasets. However, these approaches are not adequate with regard to accuracy and efficiency. In this paper, we propose a new LR-based approach, called fish-swarm logic regression (FSLR), which improves the logic regression process by incorporating swarm optimization. In our approach, a school of fish agents is run in parallel. Each fish agent holds a regression model, while the school searches for better models through various preset behaviors. The swarm algorithm improves accuracy and efficiency by speeding up convergence and preventing it from dropping into local optima. We apply our approach to a real screening dataset and a series of simulation scenarios. Compared to three existing LR-based approaches, our approach outperforms them by having lower type I and type II error rates, being able to identify more preset causal sites, and running at faster speeds. PMID:23984382
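
    The school-of-fish search can be caricatured as a swarm of candidate models drifting toward the best-scoring member while exploring randomly. This toy continuous version (a simple quadratic score surface with hand-picked step sizes) illustrates only the shared-best update, not the paper's logic-regression encoding:

```python
import random

random.seed(5)

def fitness(x):
    """Toy model-score surface to minimize; optimum at x = 2."""
    return (x - 2.0) ** 2

# each "fish" holds a candidate model parameter; the school shares its best
school = [random.uniform(-5, 5) for _ in range(20)]
best = min(school, key=fitness)
for _ in range(100):
    # random exploration plus attraction toward the school's best member
    school = [x + 0.5 * random.gauss(0, 1) + 0.2 * (best - x) for x in school]
    cand = min(school, key=fitness)
    if fitness(cand) < fitness(best):
        best = cand
```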

  13. From leaf longevity to canopy seasonality: a carbon optimality phenology model for tropical evergreen forests

    NASA Astrophysics Data System (ADS)

    Xu, X.; Medvigy, D.; Wu, J.; Wright, S. J.; Kitajima, K.; Pacala, S. W.

    2016-12-01

Tropical evergreen forests play a key role in the global carbon, water and energy cycles. Despite their apparent evergreenness, this biome shows strong seasonality in leaf litter and photosynthesis. Recent studies have suggested that this seasonality is not directly related to environmental variability but is dominated by seasonal changes in leaf development and senescence. Meanwhile, current terrestrial biosphere models (TBMs) cannot capture this pattern because the leaf life cycle is highly underrepresented. One challenge in modeling this leaf life cycle is the remarkable diversity in leaf longevity, ranging from several weeks to multiple years. Ecologists have proposed models in which leaf longevity is regarded as a strategy to optimize carbon gain. However, previous optimality models cannot be readily integrated into TBMs because (i) there are still large biases in predicted leaf longevity and (ii) it had never been tested whether a carbon optimality model can capture the observed seasonality in leaf demography and canopy photosynthesis. In this study, we develop a new carbon optimality model for leaf demography. The novelty of our approach is two-fold. First, we incorporate a mechanistic photosynthesis model that better estimates leaf carbon gain. Second, we consider the interspecific variation in leaf senescence rate, which strongly influences the modelled optimal carbon gain. We test our model with a leaf trait database for Panamanian evergreen forests. We then apply the model at the seasonal scale and compare the simulated seasonality of leaf litter and canopy photosynthesis with in-situ observations from several Amazonian forest sites. We find (i) that, compared with the original optimality model, the regression slope between observed and predicted leaf longevity increases from 0.15 to 1.04 in our new model and (ii) that our new model captures the observed seasonal variations of leaf demography and canopy photosynthesis.
Our results suggest that the phenology of tropical evergreen forests might result from plant adaptation to optimize canopy carbon gain. Finally, the proposed trait-driven prognostic phenology model could be incorporated into next-generation TBMs to improve the simulation of carbon and water fluxes in the tropics.
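
    The carbon-optimality idea — leaf longevity maximizes mean carbon gain per unit time, trading construction cost against age-related photosynthetic decline — can be sketched numerically. The linear-decline form and all parameter values below are illustrative assumptions, not the paper's model:

```python
import numpy as np

def net_gain_rate(L, A0=10.0, decline=0.25, cost=30.0):
    """Mean carbon gain per month for a leaf kept L months.

    Photosynthesis declines linearly with age; `cost` is a one-time
    construction cost per unit leaf mass (all units hypothetical).
    """
    assim = max(A0 - 0.5 * decline * L, 0.0) * L   # integral of declining rate
    return (assim - cost) / L                      # gain averaged over lifespan

Ls = np.arange(1, 80)
gains = np.array([net_gain_rate(L) for L in Ls])
L_opt = int(Ls[np.argmax(gains)])
```

    Analytically the optimum of this toy form is L* = sqrt(2·cost/decline) ≈ 15.5 months; slower senescence or cheaper construction lengthens the optimal lifespan, which is the interspecific pattern the abstract emphasizes.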

  14. Random regression analysis for body weights and main morphological traits in genetically improved farmed tilapia (Oreochromis niloticus).

    PubMed

    He, Jie; Zhao, Yunfeng; Zhao, Jingli; Gao, Jin; Xu, Pao; Yang, Runqing

    2018-02-01

To genetically analyse growth traits in genetically improved farmed tilapia (GIFT), the body weight (BWE) and main morphological traits, including body length (BL), body depth (BD), body width (BWI), head length (HL) and length of the caudal peduncle (CPL), were measured six times during the growth period on 1451 fish from 45 mixed families of full and half sibs. A random regression model (RRM) was used to model genetic changes in the growth traits with days of age and to estimate the heritability at any growth point and the genetic correlations between pairwise growth points. Using the covariance function based on the optimal RRMs, the heritabilities were estimated to be from 0.102 to 0.662 for BWE, 0.157 to 0.591 for BL, 0.047 to 0.621 for BD, 0.018 to 0.577 for BWI, 0.075 to 0.597 for HL and 0.032 to 0.610 for CPL between 60 and 140 days of age. All genetic correlations between pairwise growth points exceeded 0.5. Moreover, the traits at early days of age showed less correlation with those at later days of age. With phenotypes observed repeatedly, model comparison showed that the optimal RRMs could predict breeding values at a specific growth time more precisely than repeatability models or multiple-trait animal models, which enhanced the efficiency of selection for BWE and the main morphological traits.

  15. Ozone indices based on simple meteorological parameters: potentials and limitations of regression and neural network models

    NASA Astrophysics Data System (ADS)

    Soja, G.; Soja, A.-M.

This study tested the usefulness of extremely simple meteorological models for the prediction of ozone indices. The models were developed with daily maximum temperature and sunshine duration as input parameters and are based on a data collection period of three years. For a rural environment in eastern Austria, the meteorological and ozone data of three summer periods were used to develop functions describing three ozone exposure indices (daily maximum, 7 h mean 9.00-16.00 h, accumulated ozone dose AOT40). Data sets for other years or stations not included in the development of the models were used as test data to validate the performance of the models. Generally, optimized regression models performed better than the simplest linear models, especially in the case of AOT40. For the description of the summer period from May to September, the mean absolute daily differences between observed and calculated indices were 8±6 ppb for the maximum half-hour mean value, 6±5 ppb for the 7 h mean and 41±40 ppb h for the AOT40. When the parameters were further optimized to describe individual months separately, the mean absolute residuals decreased by ⩽10%. Neural network models did not always perform better than the regression models. This is attributed to the low number of inputs in this comparison and to the simple architecture of these models (2-2-1). Further factorial analyses of those days when the residuals were higher than the mean plus one standard deviation should reveal possible reasons why the models did not perform well on certain days. It was observed that overestimations by the models mainly occurred on days with partly overcast, hazy or very windy conditions. Underestimations occurred more frequently on weekdays than on weekends.
It is suggested that the application of this kind of meteorological model will be more successful in topographically homogeneous regions and in rural environments with relatively constant rates of emission and long-range transport of ozone precursors. Under conditions too demanding for advanced physico/chemical models, the presented models may offer useful alternatives to derive ecologically relevant ozone indices directly from meteorological parameters.
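
    A minimal version of such a meteorological model is an ordinary least-squares fit of a daily ozone index on maximum temperature and sunshine duration; the synthetic data and coefficients below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)

# hypothetical summer days: daily max temperature (°C) and sunshine (h)
tmax = rng.uniform(15, 35, 120)
sun = rng.uniform(0, 14, 120)
o3max = 10 + 1.8 * tmax + 1.2 * sun + rng.normal(0, 5, 120)  # synthetic index

# two-input regression model: o3 = b0 + b1*tmax + b2*sun
A = np.column_stack([np.ones_like(tmax), tmax, sun])
coef, *_ = np.linalg.lstsq(A, o3max, rcond=None)
pred = A @ coef
mad = np.mean(np.abs(o3max - pred))    # mean absolute daily residual
```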

  16. Baseline Correction of Diffuse Reflection Near-Infrared Spectra Using Searching Region Standard Normal Variate (SRSNV).

    PubMed

    Genkawa, Takuma; Shinzawa, Hideyuki; Kato, Hideaki; Ishikawa, Daitaro; Murayama, Kodai; Komiyama, Makoto; Ozaki, Yukihiro

    2015-12-01

An alternative baseline correction method for diffuse reflection near-infrared (NIR) spectra, searching region standard normal variate (SRSNV), is proposed. Standard normal variate (SNV) is an effective pretreatment method for baseline correction of diffuse reflection NIR spectra of powder and granular samples; however, its baseline correction performance depends on the NIR region used for the SNV calculation. To search for an optimal NIR region for baseline correction using SNV, SRSNV employs moving window partial least squares regression (MWPLSR), and an optimal NIR region is identified based on the root mean square error (RMSE) of cross-validation of the partial least squares regression (PLSR) models with the first latent variable (LV). The performance of SRSNV was evaluated using diffuse reflection NIR spectra of mixture samples consisting of wheat flour and granular glucose (0-100% glucose at 5% intervals). From the obtained NIR spectra of the mixture in the 10 000-4000 cm⁻¹ region at 4 cm⁻¹ intervals (1501 spectral channels), a series of spectral windows consisting of 80 spectral channels was constructed, and SNV spectra were then calculated for each spectral window. Using these SNV spectra, a series of PLSR models with the first LV for glucose concentration was built. A plot of RMSE versus spectral window position obtained using the PLSR models revealed that the 8680–8364 cm⁻¹ region was optimal for baseline correction using SNV. In the SNV spectra calculated using the 8680–8364 cm⁻¹ region (SRSNV spectra), a remarkable relative intensity change between a band due to wheat flour at 8500 cm⁻¹ and one due to glucose at 8364 cm⁻¹ was observed, owing to successful baseline correction using SNV.
A PLSR model with the first LV based on the SRSNV spectra yielded a determination coefficient (R²) of 0.999 and an RMSE of 0.70%, while a PLSR model with three LVs based on SNV spectra calculated over the full spectral region gave an R² of 0.995 and an RMSE of 2.29%. Additional evaluation of SRSNV was carried out using diffuse reflection NIR spectra of marzipan and corn samples, and PLSR models based on SRSNV spectra showed good prediction results. These evaluation results indicate that SRSNV is effective for baseline correction of diffuse reflection NIR spectra and provides regression models with good prediction accuracy.
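
    The SNV pretreatment at the heart of SRSNV is simply a per-spectrum centering and scaling, shown here on synthetic spectra with additive baseline offsets and slopes (illustrative data, not the paper's); in SRSNV the same transform is recomputed inside each moving spectral window:

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row)."""
    mu = spectra.mean(axis=1, keepdims=True)
    sd = spectra.std(axis=1, keepdims=True)
    return (spectra - mu) / sd

rng = np.random.default_rng(7)

# synthetic spectra: a shared peak shape plus per-sample baseline offset/slope
x = np.linspace(0, 1, 100)
peak = np.exp(-((x - 0.5) ** 2) / 0.005)
raw = np.stack([peak + off + slope * x
                for off, slope in rng.uniform(0, 2, size=(10, 2))])
corrected = snv(raw)
```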

  17. Spatial interpolation schemes of daily precipitation for hydrologic modeling

    USGS Publications Warehouse

    Hwang, Y.; Clark, M.R.; Rajagopalan, B.; Leavesley, G.

    2012-01-01

Distributed hydrologic models typically require spatial estimates of precipitation interpolated from sparsely located observational points to the specific grid points. We compare and contrast the performance of regression-based statistical methods for the spatial estimation of precipitation in two hydrologically different basins and confirm that widely used regression-based estimation schemes fail to describe the realistic spatial variability of the daily precipitation field. The methods assessed are: (1) inverse distance weighted average; (2) multiple linear regression (MLR); (3) climatological MLR; and (4) locally weighted polynomial regression (LWP). In order to improve the performance of the interpolations, the authors propose a two-step regression technique for effective daily precipitation estimation. In this simple two-step estimation process, precipitation occurrence is first generated via a logistic regression model before the amount of precipitation is estimated separately on wet days. This process generated the precipitation occurrence, amount, and spatial correlation effectively. A distributed hydrologic model (PRMS) was used for the impact analysis in daily time-step simulation. Multiple simulations suggested noticeable differences between the input alternatives generated by the three different interpolation schemes. Differences are shown in overall simulation error against the observations, degree of explained variability, and seasonal volumes. Simulated streamflows also showed different characteristics in mean, maximum, minimum, and peak flows. Given the same parameter optimization technique, LWP input showed the least streamflow error in the Alapaha basin and CMLR input showed the least error (still very close to LWP) in the Animas basin. All of the two-step interpolation inputs resulted in lower streamflow error compared to the directly interpolated inputs. © 2011 Springer-Verlag.
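
    The two-step scheme — logistic regression for occurrence, then a separate regression for amounts on wet days only — can be sketched as follows; the synthetic predictors and the log-scale amount model are assumptions, not the paper's covariates:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.default_rng(8)

# synthetic grid-point predictors (e.g., elevation, neighboring-gauge signal)
X = rng.normal(size=(500, 2))
wet = (X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=500)) > 0
amount = np.where(wet, np.exp(0.8 * X[:, 0] + 0.2 * rng.normal(size=500)), 0.0)

# step 1: precipitation occurrence via logistic regression
occ = LogisticRegression().fit(X, wet)
# step 2: amounts regressed on wet days only (log scale)
amt = LinearRegression().fit(X[wet], np.log(amount[wet]))

est_wet = occ.predict(X)
acc = (est_wet == wet).mean()
```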

  18. Multiple local feature representations and their fusion based on an SVR model for iris recognition using optimized Gabor filters

    NASA Astrophysics Data System (ADS)

    He, Fei; Liu, Yuanning; Zhu, Xiaodong; Huang, Chun; Han, Ye; Dong, Hongxing

    2014-12-01

    Gabor descriptors have been widely used in iris texture representations. However, fixed basic Gabor functions cannot match the changing nature of diverse iris datasets. Furthermore, a single form of iris feature cannot overcome difficulties in iris recognition, such as illumination variations, environmental conditions, and device variations. This paper provides multiple local feature representations and their fusion scheme based on a support vector regression (SVR) model for iris recognition using optimized Gabor filters. In our iris system, a particle swarm optimization (PSO)- and a Boolean particle swarm optimization (BPSO)-based algorithm is proposed to provide suitable Gabor filters for each involved test dataset without predefinition or manual modulation. Several comparative experiments on JLUBR-IRIS, CASIA-I, and CASIA-V4-Interval iris datasets are conducted, and the results show that our work can generate improved local Gabor features by using optimized Gabor filters for each dataset. In addition, our SVR fusion strategy may make full use of their discriminative ability to improve accuracy and reliability. Other comparative experiments show that our approach may outperform other popular iris systems.

  19. Noninvasive and fast measurement of blood glucose in vivo by near infrared (NIR) spectroscopy

    NASA Astrophysics Data System (ADS)

    Jintao, Xue; Liming, Ye; Yufei, Liu; Chunyan, Li; Han, Chen

    2017-05-01

This research aimed to develop a method for noninvasive and fast blood glucose assay in vivo. Near-infrared (NIR) spectroscopy, a more promising technique compared to other methods, was investigated in rats with diabetes and normal rats. Calibration models were generated by two different multivariate strategies: partial least squares (PLS) as a linear regression method and artificial neural networks (ANN) as a non-linear regression method. The PLS model was optimized by considering the spectral range, spectral pretreatment methods and number of model factors, while the ANN model was tuned by selecting spectral pretreatment methods, network topology parameters, the number of hidden neurons, and the number of training epochs. The validation results showed that the two models were robust, accurate and repeatable. Compared to the ANN model, the performance of the PLS model was much better, with a lower root mean square error of prediction (RMSEP) of 0.419 and a higher correlation coefficient (R) of 96.22%.

  20. Decision Support Model for Optimal Management of Coastal Gate

    NASA Astrophysics Data System (ADS)

    Ditthakit, Pakorn; Chittaladakorn, Suwatana

    2010-05-01

Coastal areas are intensely settled by human beings owing to their rich natural resources. At present, however, those areas face water scarcity problems: inadequate water and poor water quality as a result of saltwater intrusion and inappropriate land-use management. To solve these problems, several measures have been exploited. Coastal gate construction is a structural measure widely used in several countries, and it requires a plan for suitably operating the gates. Coastal gate operation is a complicated task that usually concerns the management of multiple purposes, which generally conflict with one another. This paper delineates the methodology and theories used to develop a decision support model for coastal gate operation scheduling. The developed model couples a simulation model with an optimization model. The weighting optimization technique based on Differential Evolution (DE) was selected for solving the multiple-objective problem. The hydrodynamic and water quality models were repeatedly invoked while searching for the optimal gate operations. In addition, two forecasting models, an autoregressive (AR) model and a harmonic analysis (HA) model, were applied to forecast water levels and tide levels, respectively. To demonstrate the applicability of the developed model, it was applied to plan the operations of a hypothetical version of the Pak Phanang coastal gate system, located in Nakhon Si Thammarat province in the southern part of Thailand. It was found that the proposed model could satisfactorily assist decision-makers in operating coastal gates under various environmental, ecological and hydraulic conditions.
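
    The harmonic-analysis forecasting component amounts to least-squares fitting of sine/cosine pairs at known tidal periods. A sketch with a synthetic hourly record; the constituent periods and amplitudes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(9)

# synthetic hourly tide record: semidiurnal (12.42 h) plus diurnal (23.93 h)
t = np.arange(0, 24 * 30, 1.0)                       # 30 days, hours
tide = (0.8 * np.cos(2 * np.pi * t / 12.42 - 0.4)
        + 0.3 * np.cos(2 * np.pi * t / 23.93 + 1.1)
        + 0.05 * rng.normal(size=t.size))

# harmonic analysis: least-squares fit of cos/sin at the known frequencies
periods = [12.42, 23.93]
cols = [np.ones_like(t)]
for p in periods:
    cols += [np.cos(2 * np.pi * t / p), np.sin(2 * np.pi * t / p)]
A = np.column_stack(cols)
coef, *_ = np.linalg.lstsq(A, tide, rcond=None)
amp_semidiurnal = np.hypot(coef[1], coef[2])         # amplitude of 12.42 h term
```

    Once the coefficients are known, future tide levels follow by evaluating the same sinusoids at later times, which is what makes the HA model usable for forecasting.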

  1. Application of principal component regression and artificial neural network in FT-NIR soluble solids content determination of intact pear fruit

    NASA Astrophysics Data System (ADS)

    Ying, Yibin; Liu, Yande; Fu, Xiaping; Lu, Huishan

    2005-11-01

    Artificial neural networks (ANNs) have been used successfully in applications such as pattern recognition, image processing, automation and control. The majority of today's ANN applications, however, use the back-propagation feed-forward ANN (BP-ANN). In this paper, BP-ANNs were applied to model the soluble solids content (SSC) of intact pears from their Fourier transform near infrared (FT-NIR) spectra. One hundred and sixty-four pear samples were used to build the calibration models and evaluate their predictive ability. The results were compared to classical calibration approaches, i.e. principal component regression (PCR), partial least squares (PLS) and non-linear PLS (NPLS). The effects of the training-parameter choices on the prediction model were also investigated. In terms of predictive ability, BP-ANN combined with principal component regression (PCR) consistently outperformed the classical PCR, PLS and weighted-PLS methods. Based on these results, it can be concluded that FT-NIR spectroscopy with BP-ANN models can properly be employed for rapid and nondestructive determination of internal fruit quality.
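    The BP-ANN model above takes principal-component scores as its inputs; a minimal numpy-only sketch of plain principal component regression (the classical baseline the paper compares against) on invented low-rank "spectra" might look like this. The latent-factor construction and the number of retained components are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for FT-NIR spectra: low-rank data from latent factors,
# with soluble solids content (y) depending on two of the factors.
n, p = 164, 60
T = rng.normal(size=(n, 5))
P = rng.normal(size=(5, p))
X = T @ P + 0.05 * rng.normal(size=(n, p))
y = T @ np.array([0.8, -0.5, 0.0, 0.0, 0.0]) + rng.normal(scale=0.1, size=n)

# Principal component regression: project centered spectra onto the
# leading principal components, then regress y on the scores.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 5                                   # retained components (assumed)
scores = Xc @ Vt[:k].T
beta, *_ = np.linalg.lstsq(scores, y - y.mean(), rcond=None)
y_hat = scores @ beta + y.mean()

rmse = np.sqrt(np.mean((y_hat - y) ** 2))
print(f"PCR calibration RMSE = {rmse:.3f}")
```

    The BP-ANN variant replaces the final least-squares step with a small feed-forward network trained on the same scores.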

  2. Large-area landslide susceptibility with optimized slope-units

    NASA Astrophysics Data System (ADS)

    Alvioli, Massimiliano; Marchesini, Ivan; Reichenbach, Paola; Rossi, Mauro; Ardizzone, Francesca; Fiorucci, Federica; Guzzetti, Fausto

    2017-04-01

    A Slope-Unit (SU) is a type of morphological terrain unit bounded by drainage and divide lines that maximizes the within-unit homogeneity and the between-unit heterogeneity across distinct physical and geographical boundaries [1]. Compared to other terrain subdivisions, SU are morphological terrain units well related to the natural (i.e., geological, geomorphological, hydrological) processes that shape and characterize natural slopes. This makes SU easily recognizable in the field or on topographic base maps, and well suited for environmental and geomorphological analysis, in particular for landslide susceptibility (LS) modelling. An optimal subdivision of an area into a set of SU depends on multiple factors: the size and complexity of the study area, the quality and resolution of the available terrain elevation data, the purpose of the terrain subdivision, and the scale and resolution of the phenomena for which SU are delineated. We use the recently developed r.slopeunits software [2,3] for the automatic, parametric delineation of SU within the open source GRASS GIS, based on terrain elevation data and a small number of user-defined parameters. The software produces subdivisions consisting of SU with different shapes and sizes as a function of the input parameters. In this work, we describe a procedure for the optimal selection of the user parameters through the production of a large number of realizations of the LS model. We tested the software and the optimization procedure in a 2,000 km² area in Umbria, Central Italy. For LS zonation we adopt a logistic regression model implemented in a well-known software package [4,5], using about 50 independent variables. To select the optimal SU partition for LS zonation, we define a metric that quantifies simultaneously: (i) slope-unit internal homogeneity, (ii) slope-unit external heterogeneity, and (iii) landslide susceptibility model performance.
To this end, we define a comprehensive objective function S as the product of three normalized objective functions addressing points (i)-(iii) independently. We use an intra-segment variance function V, Moran's autocorrelation index I, and the AUCROC function R arising from the application of the logistic regression model. Maximizing the objective function S = f(I,V,R) as a function of the r.slopeunits input parameters provides an objective and reproducible way to select the optimal parameter combination for a proper SU subdivision for LS modelling. We further analyze the statistical significance of the LS models as a function of the r.slopeunits input parameters, focusing on the degree of coarseness of each subdivision. We find that the logistic regression model, when applied to subdivisions with large average SU size, has very poor statistical significance: only a few (5%, typically lithological) variables are used in the regression, owing to the large heterogeneity of all variables within each unit, while up to 35% of the variables are used when SU are very small. This behavior was largely expected and provides further evidence that an objective method to select SU size is highly desirable. [1] Guzzetti, F. et al., Geomorphology 31 (1999), 181-216. [2] Alvioli, M. et al., Geoscientific Model Development 9 (2016), 3975-3991. [3] http://geomorphology.irpi.cnr.it/tools/slope-units [4] Rossi, M. et al., Geomorphology 114 (2010), 129-142. [5] Rossi, M. and Reichenbach, P., Geoscientific Model Development 9 (2016), 3533-3543.
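    A toy sketch of the comprehensive objective described above: normalize the three candidate scores and take their product, so that the best subdivision maximizes S. The candidate values, the orientations chosen here (lower variance and lower autocorrelation treated as better), and the min-max normalization are illustrative assumptions, not the authors' exact formulas.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical scores for 20 candidate slope-unit subdivisions:
# V = intra-segment variance (lower is better in this sketch)
# I = Moran's autocorrelation index (lower is better in this sketch)
# R = AUCROC of the logistic-regression susceptibility model (higher is better)
V = rng.uniform(0.1, 0.9, 20)
I = rng.uniform(-0.2, 0.6, 20)
R = rng.uniform(0.6, 0.95, 20)

def norm01(x):
    """Min-max normalization to [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())

# Comprehensive objective: product of three normalized terms, oriented
# so that larger S is better (a sketch of S = f(I, V, R)).
S = (1 - norm01(V)) * (1 - norm01(I)) * norm01(R)
best = int(np.argmax(S))
print(f"optimal subdivision index = {best}, S = {S[best]:.3f}")
```
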

  3. Finite Element Vibration Modeling and Experimental Validation for an Aircraft Engine Casing

    NASA Astrophysics Data System (ADS)

    Rabbitt, Christopher

    This thesis presents a procedure for the development and validation of a theoretical vibration model, applies this procedure to a pair of aircraft engine casings, and compares select parameters from experimental testing of those casings to those from a theoretical model using the Modal Assurance Criterion (MAC) and linear regression coefficients. A novel method of determining the optimal MAC between axisymmetric results is developed and employed. It is concluded that the dynamic finite element models developed as part of this research are fully capable of modelling the modal parameters within the frequency range of interest. Confidence intervals calculated in this research for correlation coefficients provide important information regarding the reliability of predictions, and it is recommended that these intervals be calculated for all comparable coefficients. The procedure outlined for aligning mode shapes around an axis of symmetry proved useful, and the results are promising for the development of further optimization techniques.

  4. Identification of immune correlates of protection in Shigella infection by application of machine learning.

    PubMed

    Arevalillo, Jorge M; Sztein, Marcelo B; Kotloff, Karen L; Levine, Myron M; Simon, Jakub K

    2017-10-01

    Immunologic correlates of protection are important in vaccine development because they give insight into mechanisms of protection, assist in the identification of promising vaccine candidates, and serve as endpoints in bridging clinical vaccine studies. Our goal is the development of a methodology to identify immunologic correlates of protection using the Shigella challenge as a model. The proposed methodology utilizes the Random Forests (RF) machine learning algorithm as well as Classification and Regression Trees (CART) to detect immune markers that predict protection, identify interactions between variables, and define optimal cutoffs. Logistic regression modeling is applied to estimate the probability of protection and the confidence interval (CI) for such a probability is computed by bootstrapping the logistic regression models. The results demonstrate that the combination of Classification and Regression Trees and Random Forests complements the standard logistic regression and uncovers subtle immune interactions. Specific levels of immunoglobulin IgG antibody in blood on the day of challenge predicted protection in 75% (95% CI 67-86). Of those subjects that did not have blood IgG at or above a defined threshold, 100% were protected if they had IgA antibody secreting cells above a defined threshold. Comparison with the results obtained by applying only logistic regression modeling with standard Akaike Information Criterion for model selection shows the usefulness of the proposed method. Given the complexity of the immune system, the use of machine learning methods may enhance traditional statistical approaches. When applied together, they offer a novel way to quantify important immune correlates of protection that may help the development of vaccines. Copyright © 2017 Elsevier Inc. All rights reserved.
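    A schematic of the described pipeline on fabricated challenge data: a depth-1 CART ("stump") recovers a single marker cutoff, and a bootstrapped logistic regression gives a protection probability with a percentile CI. The sample sizes, effect sizes, and stump-based cutoff search are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

# Synthetic challenge data: an IgG titer (log units) and a binary
# protection outcome whose probability rises with the titer.
n = 200
igg = rng.normal(5.0, 1.5, n)
p_true = 1 / (1 + np.exp(-(igg - 5.0) * 1.5))
protected = (rng.uniform(size=n) < p_true).astype(int)
X = igg.reshape(-1, 1)

# A depth-1 CART recovers a single optimal IgG cutoff.
stump = DecisionTreeClassifier(max_depth=1).fit(X, protected)
cutoff = stump.tree_.threshold[0]

# Logistic regression for the protection probability at the cutoff,
# with a bootstrap percentile CI, as in the described methodology.
def prob_at(x, Xb, yb):
    m = LogisticRegression().fit(Xb, yb)
    return m.predict_proba([[x]])[0, 1]

boots = [prob_at(cutoff, X[idx], protected[idx])
         for idx in (rng.integers(0, n, n) for _ in range(200))]
lo, hi = np.percentile(boots, [2.5, 97.5])
print(f"IgG cutoff = {cutoff:.2f}, P(protected) = "
      f"{prob_at(cutoff, X, protected):.2f} (95% CI {lo:.2f}-{hi:.2f})")
```
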

  5. Optimization of turning process through the analytic flank wear modelling

    NASA Astrophysics Data System (ADS)

    Del Prete, A.; Franchi, R.; De Lorenzis, D.

    2018-05-01

    In the present work, the approach used to optimize the process capabilities for machining Oil&Gas components is described. These components are machined by turning stainless steel castings. For this purpose, a proper Design Of Experiments (DOE) plan was designed and executed; as the output of the experimentation, data on tool wear were collected. The DOE was designed starting from the cutting speed and feed values recommended by the tool manufacturer; the depth of cut was kept constant. Wear data were obtained by observing the tool flank wear under an optical microscope, with data acquisition carried out at regular intervals of working time. Through statistical data and regression analysis, analytical models of the flank wear and the tool life were obtained. The optimization approach is a multi-objective optimization that minimizes the production time and the number of cutting tools used, under a constraint on a defined flank wear level. The technique used to solve the optimization problem is Multi Objective Particle Swarm Optimization (MOPS). The optimization results, validated by a further experimental campaign, highlighted the reliability of the work and confirmed the usability of the optimized process parameters and their potential benefit for the company.

  6. ELASTIC NET FOR COX'S PROPORTIONAL HAZARDS MODEL WITH A SOLUTION PATH ALGORITHM.

    PubMed

    Wu, Yichao

    2012-01-01

    For least squares regression, Efron et al. (2004) proposed an efficient solution path algorithm, the least angle regression (LAR). They showed that a slight modification of the LAR leads to the whole LASSO solution path. Both the LAR and LASSO solution paths are piecewise linear. Recently Wu (2011) extended the LAR to generalized linear models and the quasi-likelihood method. In this work we extend the LAR further to handle Cox's proportional hazards model. The goal is to develop a solution path algorithm for the elastic net penalty (Zou and Hastie (2005)) in Cox's proportional hazards model. This goal is achieved in two steps. First we extend the LAR to optimizing the log partial likelihood plus a fixed small ridge term. Then we define a path modification, which leads to the solution path of the elastic net regularized log partial likelihood. Our solution path is exact and piecewise determined by ordinary differential equation systems.

  7. An integrated simulation and optimization approach for managing human health risks of atmospheric pollutants by coal-fired power plants.

    PubMed

    Dai, C; Cai, X H; Cai, Y P; Guo, H C; Sun, W; Tan, Q; Huang, G H

    2014-06-01

    This research developed a simulation-aided nonlinear programming model (SNPM). The model incorporates pollutant dispersion modeling and the management of coal blending and the related human health risks within a general modeling framework. In SNPM, the simulation effort (i.e., the California puff model [CALPUFF]) was used to forecast the fate of air pollutants and quantify the health risk under various conditions, while the optimization studies identified optimal coal blending strategies from a number of alternatives. To solve the model, a surrogate-based indirect search approach was proposed, in which support vector regression (SVR) was used to create a set of easy-to-use and rapid-response surrogates capturing the functional relationships between coal-blending operating conditions and health risks. By replacing CALPUFF and the corresponding hazard quotient equation with the surrogates, computational efficiency could be improved. The developed SNPM was applied to minimize the human health risk associated with air pollutants discharged from the Gaojing and Shijingshan power plants in the west of Beijing. Solution results indicated that it could be used for reducing the health risk to the public in the vicinity of the two power plants, identifying desired coal blending strategies for decision makers, and striking a proper balance between coal purchase cost and human health risk. In summary: a simulation-aided nonlinear programming model (SNPM) is developed that integrates the advantages of CALPUFF and nonlinear programming; a surrogate-based indirect search approach combining support vector regression and a genetic algorithm is proposed to solve it; and, applied to the Gaojing and Shijingshan power plants, it proves useful for generating coal blending schemes, reducing public health risk, and reflecting the trade-off between coal purchase cost and health risk.
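    The surrogate-based indirect search can be sketched in miniature: sample an "expensive" risk function (standing in for CALPUFF plus the hazard quotient), train an SVR surrogate, and optimize the cheap surrogate instead. The one-dimensional blending variable, the toy risk function, and the grid search replacing the genetic algorithm are all assumptions.

```python
import numpy as np
from sklearn.svm import SVR

# Stand-in for an expensive dispersion simulation: health risk as a
# function of a single coal-blending ratio x in [0, 1] (hypothetical).
def expensive_risk(x):
    return (x - 0.3) ** 2 + 0.05 * np.sin(12 * x)

# 1) Sample the simulator at a few design points.
x_train = np.linspace(0, 1, 15)
y_train = expensive_risk(x_train)

# 2) Train an SVR surrogate of the risk response.
svr = SVR(kernel="rbf", C=100.0, gamma=10.0, epsilon=0.001)
svr.fit(x_train.reshape(-1, 1), y_train)

# 3) Search the cheap surrogate instead of the simulator
#    (a dense grid stands in for the genetic-algorithm step).
x_grid = np.linspace(0, 1, 1001)
x_opt = x_grid[np.argmin(svr.predict(x_grid.reshape(-1, 1)))]
print(f"surrogate-optimal blending ratio ~ {x_opt:.3f}")
```

    Every surrogate evaluation is microseconds, so a population-based optimizer can afford thousands of them where the simulator could afford dozens.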

  8. Local Composite Quantile Regression Smoothing for Harris Recurrent Markov Processes

    PubMed Central

    Li, Degui; Li, Runze

    2016-01-01

    In this paper, we study the local polynomial composite quantile regression (CQR) smoothing method for nonlinear and nonparametric models under the Harris recurrent Markov chain framework. The local polynomial CQR method is a robust alternative to the widely used local polynomial method, and has been well studied for stationary time series. In this paper, we relax the stationarity restriction on the model and allow the regressors to be generated by a general Harris recurrent Markov process, which includes both the stationary (positive recurrent) and nonstationary (null recurrent) cases. Under some mild conditions, we establish the asymptotic theory for the proposed local polynomial CQR estimator of the mean regression function, and show that the convergence rate of the estimator in the nonstationary case is slower than that in the stationary case. Furthermore, a weighted local polynomial CQR estimator is provided to improve the estimation efficiency, and a data-driven bandwidth selection is introduced to choose the optimal bandwidth involved in the nonparametric estimators. Finally, we present some numerical studies examining the finite sample performance of the developed methodology and theory. PMID:27667894

  9. Quantum State Tomography via Linear Regression Estimation

    PubMed Central

    Qi, Bo; Hou, Zhibo; Li, Li; Dong, Daoyi; Xiang, Guoyong; Guo, Guangcan

    2013-01-01

    A simple yet efficient state reconstruction algorithm of linear regression estimation (LRE) is presented for quantum state tomography. In this method, quantum state reconstruction is converted into a parameter estimation problem of a linear regression model and the least-squares method is employed to estimate the unknown parameters. An asymptotic mean squared error (MSE) upper bound for all possible states to be estimated is given analytically, which depends explicitly upon the involved measurement bases. This analytical MSE upper bound can guide one to choose optimal measurement sets. The computational complexity of LRE is O(d^4), where d is the dimension of the quantum state. Numerical examples show that LRE is much faster than maximum-likelihood estimation for quantum state tomography. PMID:24336519
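    For a single qubit, the LRE idea reduces to ordinary least squares on the Bloch vector, since Pauli expectation values are linear in the state parameters. The specific state, measurement design, and noise level below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

# Single-qubit illustration of linear regression estimation (LRE):
# rho = (I + r.sigma)/2, and measured Pauli expectation values are
# linear in the Bloch vector r, so least squares recovers the state.
r_true = np.array([0.3, -0.4, 0.5])          # invented "unknown" state

# Design matrix: 100 repetitions of the three Pauli measurement axes;
# responses are expectation values with finite-sample noise.
A = np.tile(np.eye(3), (100, 1))
y = A @ r_true + rng.normal(scale=0.05, size=A.shape[0])

r_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
mse = np.mean((r_hat - r_true) ** 2)
print(f"estimated Bloch vector {np.round(r_hat, 3)}, MSE = {mse:.2e}")
```

    Because the estimator is a single linear solve, its cost scales polynomially with the number of state parameters, which is the source of the speed advantage the abstract reports.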

  10. SMURC: High-Dimension Small-Sample Multivariate Regression With Covariance Estimation.

    PubMed

    Bayar, Belhassen; Bouaynaya, Nidhal; Shterenberg, Roman

    2017-03-01

    We consider a high-dimension low sample-size multivariate regression problem that accounts for correlation of the response variables. The system is underdetermined as there are more parameters than samples. We show that the maximum likelihood approach with covariance estimation is senseless because the likelihood diverges. We subsequently propose a normalization of the likelihood function that guarantees convergence. We call this method small-sample multivariate regression with covariance (SMURC) estimation. We derive an optimization problem and its convex approximation to compute SMURC. Simulation results show that the proposed algorithm outperforms the regularized likelihood estimator with known covariance matrix and the sparse conditional Gaussian graphical model. We also apply SMURC to the inference of the wing-muscle gene network of the Drosophila melanogaster (fruit fly).

  11. [Hyperspectral Estimation of Apple Tree Canopy LAI Based on SVM and RF Regression].

    PubMed

    Han, Zhao-ying; Zhu, Xi-cun; Fang, Xian-yi; Wang, Zhuo-yuan; Wang, Ling; Zhao, Geng-Xing; Jiang, Yuan-mao

    2016-03-01

    Leaf area index (LAI) is a dynamic index of crop population size. Hyperspectral technology can be used to estimate apple canopy LAI rapidly and nondestructively, providing a reference for monitoring tree growth and estimating yield. Red Fuji apple trees at the full fruit-bearing stage were the research objects. The canopy spectral reflectance and LAI values of ninety apple trees in thirty orchards were measured with an ASD FieldSpec 3 spectrometer and an LAI-2200 over two consecutive years in the Qixia research area of Shandong Province. The optimal vegetation indices were selected by correlation analysis of the original spectral reflectance and vegetation indices. Models predicting LAI were built with the multivariate regression methods of support vector machine (SVM) and random forest (RF). The new vegetation indices GNDVI527, NDVI676, RVI682, FD-NDVI656 and GRVI517, together with the two main previously reported vegetation indices NDVI670 and NDVI705, correlated well with LAI. In the RF regression model, the calibration set determination coefficient C-R2 of 0.920 and validation set determination coefficient V-R2 of 0.889 were higher than those of the SVM regression model by 0.045 and 0.033, respectively. The calibration set root mean square error C-RMSE of 0.249 and the validation set root mean square error V-RMSE of 0.236 were lower than those of the SVM regression model by 0.054 and 0.058, respectively. The calibration and validation RPD values, C-RPD and V-RPD, reached 3.363 and 2.520, higher than those of the SVM regression model by 0.598 and 0.262, respectively. The slopes C-S and V-S of the trend lines between measured and predicted values for the calibration and validation sets were close to 1. The RF regression model thus estimates better than the SVM model and can be used to estimate the LAI of Red Fuji apple trees in the full fruit period.
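    A minimal sketch of the RF side of this comparison on synthetic index/LAI data, reporting a validation R2, RMSE, and an RPD-style ratio as in the abstract. The feature construction and the train/validation split are assumptions; only the sample size of ninety echoes the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(6)

# Hypothetical vegetation indices (NDVI-style band ratios) vs LAI.
n = 90                                   # ninety canopies, as in the study
X = rng.uniform(0, 1, size=(n, 5))       # five index features (assumed)
lai = 1.0 + 3.0 * X[:, 0] + 1.5 * X[:, 1] ** 2 + rng.normal(scale=0.15, size=n)

X_cal, X_val, y_cal, y_val = train_test_split(X, lai, test_size=0.3,
                                              random_state=0)

rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_cal, y_cal)
pred = rf.predict(X_val)

rmse = np.sqrt(np.mean((pred - y_val) ** 2))
rpd = y_val.std() / rmse                 # RPD-style ratio
r2 = 1 - np.sum((pred - y_val) ** 2) / np.sum((y_val - y_val.mean()) ** 2)
print(f"V-R2 = {r2:.3f}, V-RMSE = {rmse:.3f}, V-RPD = {rpd:.2f}")
```
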

  12. Nonlinear Recurrent Neural Network Predictive Control for Energy Distribution of a Fuel Cell Powered Robot

    PubMed Central

    Chen, Qihong; Long, Rong; Quan, Shuhai

    2014-01-01

    This paper presents a neural network predictive control strategy to optimize power distribution for the fuel cell/ultracapacitor hybrid power system of a robot. We model the nonlinear power system with a time-variant auto-regressive moving average with exogenous input (ARMAX) model, using a recurrent neural network to represent the complicated coefficients of the ARMAX model. Because the dynamics of the system are viewed in this framework as operating-state-dependent, time-varying, locally linear behavior, a linear constrained model predictive control algorithm is developed to optimize the power split between the fuel cell and the ultracapacitor. The proposed algorithm significantly simplifies implementation of the controller and can handle multiple constraints, such as limiting substantial fluctuation of the fuel cell current. Experiment and simulation results demonstrate that the control strategy can optimally split power between the fuel cell and the ultracapacitor and limit the rate of change of the fuel cell current, so as to extend the lifetime of the fuel cell. PMID:24707206

  13. Optimization and comparison of three spatial interpolation methods for electromagnetic levels in the AM band within an urban area.

    PubMed

    Rufo, Montaña; Antolín, Alicia; Paniagua, Jesús M; Jiménez, Antonio

    2018-04-01

    A comparative study was made of three interpolation methods, inverse distance weighting (IDW), spline and ordinary kriging, after optimization of their characteristic parameters. These interpolation methods were used to represent the electric field levels for three emission frequencies (774 kHz, 900 kHz, and 1107 kHz) and for the electrical stimulation quotient, Q_E, characteristic of complex electromagnetic environments. Measurements were made with a spectrum analyser in a village in the vicinity of medium-wave radio broadcasting antennas. The accuracy of the models was quantified by comparing their predictions with levels measured at control points not used to generate the models. The results showed that optimizing the characteristic parameters of each interpolation method allows any of them to be used. However, the best results in terms of the regression coefficient between each model's predictions and the actual control-point field measurements were obtained with the IDW method. Copyright © 2018 Elsevier Inc. All rights reserved.
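    IDW, the method that performed best here, is simple to state: each prediction is a weighted mean of the observations with weights 1/d^p, where the power p is the characteristic parameter being optimized. A self-contained sketch on synthetic field levels, with held-out control points as in the study:

```python
import numpy as np

rng = np.random.default_rng(7)

def idw(xy_known, z_known, xy_query, power=2.0):
    """Inverse distance weighting; `power` is the tunable exponent
    that the study optimizes before comparing methods."""
    d = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=2)
    d = np.where(d == 0, 1e-12, d)       # exact hits keep their value
    w = 1.0 / d ** power
    return (w @ z_known) / w.sum(axis=1)

# Synthetic electric-field levels at scattered measurement points.
pts = rng.uniform(0, 100, size=(40, 2))
field = 2.0 + 0.03 * pts[:, 0] + 0.01 * pts[:, 1]

# Hold out control points (not used to build the model) and compare.
query = rng.uniform(0, 100, size=(10, 2))
truth = 2.0 + 0.03 * query[:, 0] + 0.01 * query[:, 1]
est = idw(pts, field, query, power=2.0)
print("mean abs error at control points:", np.mean(np.abs(est - truth)))
```

    Optimizing p amounts to repeating the control-point comparison over a range of exponents and keeping the one with the best fit statistic.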

  14. Modeling soil parameters using hyperspectral image reflectance in subtropical coastal wetlands

    NASA Astrophysics Data System (ADS)

    Anne, Naveen J. P.; Abd-Elrahman, Amr H.; Lewis, David B.; Hewitt, Nicole A.

    2014-12-01

    Developing spectral models of soil properties is an important frontier in remote sensing and soil science. Several studies have focused on modeling soil properties, such as total pools of soil organic matter and carbon, in bare soils. We extended this effort to model soil parameters in areas densely covered with coastal vegetation. Moreover, we investigated soil properties indicative of soil functions such as nutrient and organic matter turnover and storage. These properties include the partitioning of mineral and organic soil between particulate (>53 μm) and fine size classes, and the partitioning of soil carbon and nitrogen pools between stable and labile fractions. Soil samples were obtained from Avicennia germinans mangrove forest and Juncus roemerianus salt marsh plots on the west coast of central Florida. Spectra corresponding to the field plot locations were extracted from a Hyperion hyperspectral image and analyzed. The spectral information was regressed against the soil variables to determine the best single bands and the optimal band combinations for the simple ratio (SR) and normalized difference index (NDI) indices. The regression analysis yielded correlations with R2 values ranging from 0.21 to 0.47 for the best individual bands, 0.28 to 0.81 for two-band indices, and 0.53 to 0.96 for partial least-squares (PLS) regressions on the Hyperion image data. Spectral models using Hyperion data adequately (RPD > 1.4) predicted particulate organic matter (POM), silt + clay, labile carbon (C), and labile nitrogen (N), where RPD is the ratio of standard deviation to root mean square error of cross-validation (RMSECV). The SR (0.53 μm, 2.11 μm) model of labile N, with R2 = 0.81, RMSECV = 0.28, and RPD = 1.94, produced the best results in this study. Our results provide optimism that remote-sensing spectral models can successfully predict soil properties indicative of ecosystem nutrient and organic matter turnover and storage, and can do so in areas with dense canopy cover.
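    The two-band NDI search described above is an exhaustive scan over band pairs; a small synthetic sketch (invented reflectances and soil property) shows the pattern:

```python
import numpy as np

rng = np.random.default_rng(8)

# Toy reflectance spectra (50 plots x 30 bands) and a soil property
# driven by two bands, mimicking the exhaustive two-band index search.
n, b = 50, 30
R = rng.uniform(0.05, 0.6, size=(n, b))
soil = ((R[:, 4] - R[:, 20]) / (R[:, 4] + R[:, 20])
        + rng.normal(scale=0.02, size=n))

# Scan all band pairs, keeping the NDI with the highest R^2 vs soil.
best = (None, -1.0)
for i in range(b):
    for j in range(i + 1, b):
        ndi = (R[:, i] - R[:, j]) / (R[:, i] + R[:, j])
        r2 = np.corrcoef(ndi, soil)[0, 1] ** 2
        if r2 > best[1]:
            best = ((i, j), r2)

(bi, bj), r2 = best
print(f"best NDI bands: ({bi}, {bj}), R^2 = {r2:.2f}")
```

    The SR search is identical with the index formula swapped for a band ratio; PLS, by contrast, uses all bands at once.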

  15. Optimal Estimation of Clock Values and Trends from Finite Data

    NASA Technical Reports Server (NTRS)

    Greenhall, Charles

    2005-01-01

    We show how to solve two problems of optimal linear estimation from a finite set of phase data. Clock noise is modeled as a stochastic process with stationary dth increments. The covariance properties of such a process are contained in the generalized autocovariance function (GACV). We set up two principles for optimal estimation: with the help of the GACV, these principles lead to a set of linear equations for the regression coefficients and some auxiliary parameters. The mean square errors of the estimators are easily calculated. The method can be used to check the results of other methods and to find good suboptimal estimators based on a small subset of the available data.

  16. Study on optimization method of test conditions for fatigue crack detection using lock-in vibrothermography

    NASA Astrophysics Data System (ADS)

    Min, Qing-xu; Zhu, Jun-zhen; Feng, Fu-zhou; Xu, Chao; Sun, Ji-wei

    2017-06-01

    In this paper, lock-in vibrothermography (LVT) is utilized for defect detection. Specifically, for a metal plate with an artificial fatigue crack, the temperature rise of the defective area is used to analyze the influence of different test conditions, i.e. engagement force, excitation intensity, and modulation frequency. Multivariate nonlinear and logistic regression models are employed to estimate the POD (probability of detection) and POA (probability of alarm) for the fatigue crack, respectively. The resulting optimal selection of test conditions is presented. The study aims to provide an optimized method for selecting test conditions in a vibrothermography system so as to enhance its detection ability.
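    The logistic POD model can be sketched on fabricated hit/miss data; the "a90" summary below (the excitation level giving 90% detection probability) is a common POD quantity but an assumption here, not something the abstract reports:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(9)

# Fabricated hit/miss trials: detection becomes more likely as the
# excitation intensity grows (all numbers are assumptions).
n = 300
intensity = rng.uniform(0, 10, n)
p_detect = 1 / (1 + np.exp(-(intensity - 4.0)))      # true POD curve
hit = (rng.uniform(size=n) < p_detect).astype(int)

# Logistic regression gives the estimated POD curve.
pod = LogisticRegression().fit(intensity.reshape(-1, 1), hit)
b0, b1 = pod.intercept_[0], pod.coef_[0, 0]

# a90: the intensity at which the estimated POD reaches 90%,
# solved from logit(0.9) = b0 + b1 * a90.
a90 = (np.log(9.0) - b0) / b1
print(f"estimated a90 ~ {a90:.2f}")
```

    Repeating the fit over combinations of force, intensity, and frequency is what turns this into the multivariate condition-selection problem the abstract describes.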

  17. Gaussian process regression for geometry optimization

    NASA Astrophysics Data System (ADS)

    Denzel, Alexander; Kästner, Johannes

    2018-03-01

    We implemented a geometry optimizer based on Gaussian process regression (GPR) to find minimum structures on potential energy surfaces. We tested both a twice-differentiable form of the Matérn kernel and the squared exponential kernel; the Matérn kernel performs much better. We give a detailed description of the optimization procedures, which include overshooting the step resulting from GPR in order to obtain a higher degree of interpolation vs. extrapolation. In a benchmark against the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) optimizer of the DL-FIND library on 26 test systems, we found the new optimizer to generally reduce the number of required optimization steps.
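    The core loop of a GPR-based minimizer can be sketched with scikit-learn's Matérn ν = 5/2 kernel (a twice-differentiable Matérn form, matching the kernel class the abstract favors) on a one-dimensional model potential. The potential, the grid-based surrogate minimization, and the omission of the overshooting step are simplifications for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# 1D model potential standing in for a PES (assumed for illustration).
def energy(x):
    return (x - 1.2) ** 2 + 0.3 * np.sin(3 * x)

X = np.array([[-1.0], [0.0], [2.0]])     # initial geometries
y = energy(X.ravel())

for _ in range(8):
    # GPR surrogate with the twice-differentiable Matern nu=5/2 kernel;
    # small alpha jitter keeps the fit stable near duplicate points.
    gpr = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6,
                                   normalize_y=True)
    gpr.fit(X, y)
    # Step to the minimum of the surrogate mean, then evaluate there.
    grid = np.linspace(-2, 3, 501).reshape(-1, 1)
    x_new = grid[np.argmin(gpr.predict(grid))]
    X = np.vstack([X, [x_new]])
    y = np.append(y, energy(x_new[0]))

x_min = X[np.argmin(y), 0]
print(f"best geometry found near x = {x_min:.3f}")
```

    A real implementation would also feed gradients to the GP and overshoot the surrogate step, as the abstract describes.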

  18. Selection of optimal complexity for ENSO-EMR model by minimum description length principle

    NASA Astrophysics Data System (ADS)

    Loskutov, E. M.; Mukhin, D.; Mukhina, A.; Gavrilov, A.; Kondrashov, D. A.; Feigin, A. M.

    2012-12-01

    One of the main problems arising in modeling data taken from natural systems is finding a phase space suitable for constructing a model of the evolution operator. Since we usually deal with strongly high-dimensional behavior, we are forced to construct a model working in some projection of the system phase space corresponding to the time scales of interest. Selecting the optimal projection is a non-trivial problem, since there are many ways to reconstruct phase variables from a given time series, especially in the case of a spatio-temporal data field. Actually, finding the optimal projection is a significant part of model selection because, on the one hand, the transformation of data to some phase-variable vector can be considered a required component of the model. On the other hand, such an optimization of the phase space makes sense only in relation to the parametrization of the model we use, i.e. the representation of the evolution operator, so we should find an optimal structure of the model together with the phase-variable vector. In this paper we propose to use the minimum description length principle (Molkov et al., 2009) for selecting models of optimal complexity. The proposed method is applied to the optimization of an Empirical Model Reduction (EMR) of the ENSO phenomenon (Kravtsov et al., 2005; Kondrashov et al., 2005). This model operates within a subset of leading EOFs constructed from the spatio-temporal field of SST in the Equatorial Pacific, and has the form of multi-level stochastic differential equations (SDE) with polynomial parameterization of the right-hand side. Optimal values for the number of EOFs, the order of the polynomial and the number of levels are estimated from the Equatorial Pacific SST dataset. References: Ya. Molkov, D. Mukhin, E. Loskutov, G. Fidelin and A. Feigin, Using the minimum description length principle for global reconstruction of dynamic systems from noisy time series, Phys. Rev. E 80, 046207, 2009. Kravtsov S., Kondrashov D., Ghil M., 2005: Multilevel regression modeling of nonlinear processes: Derivation and applications to climatic variability. J. Climate, 18 (21): 4404-4424. D. Kondrashov, S. Kravtsov, A. W. Robertson and M. Ghil, 2005: A hierarchy of data-based ENSO models. J. Climate, 18, 4425-4444.
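    In the same spirit, complexity selection by a description-length-style criterion can be illustrated with polynomial order selection, using BIC as a simple stand-in for the full MDL machinery (an acknowledged simplification):

```python
import numpy as np

rng = np.random.default_rng(10)

# Data generated by a cubic law; we select the polynomial order that
# minimizes a description-length-style criterion (BIC used here as a
# simple proxy for the MDL principle described above).
n = 200
x = rng.uniform(-1, 1, n)
y = 1.0 - 2.0 * x + 0.5 * x ** 3 + rng.normal(scale=0.1, size=n)

def bic(order):
    """Fit a polynomial of the given order and score it by BIC:
    n*log(RSS/n) + k*log(n), trading fit quality against complexity."""
    V = np.vander(x, order + 1)
    coef, *_ = np.linalg.lstsq(V, y, rcond=None)
    rss = np.sum((V @ coef - y) ** 2)
    return n * np.log(rss / n) + (order + 1) * np.log(n)

best_order = min(range(1, 9), key=bic)
print(f"selected polynomial order = {best_order}")
```

    The EMR setting adds two more complexity axes (number of EOFs and number of stochastic levels), but the selection logic, penalize description length and keep the minimizer, is the same.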

  19. Estimating overall exposure effects for the clustered and censored outcome using random effect Tobit regression models.

    PubMed

    Wang, Wei; Griswold, Michael E

    2016-11-30

    The random effect Tobit model is a regression model that accommodates both left- and/or right-censoring and within-cluster dependence of the outcome variable. Regression coefficients of random effect Tobit models have conditional interpretations on a constructed latent dependent variable and do not provide inference about overall exposure effects on the original outcome scale. The marginalized random effects model (MREM) permits likelihood-based estimation of marginal mean parameters for clustered data. For random effect Tobit models, we extend the MREM to marginalize over both the random effects and the normal-space and boundary components of the censored response to estimate overall exposure effects at the population level. We also extend the 'Average Predicted Value' method to estimate the model-predicted marginal means for each person under different exposure statuses in a designated reference group by integrating over the random effects, and then use the calculated difference to assess the overall exposure effect. Maximum likelihood estimation is proposed utilizing a quasi-Newton optimization algorithm with Gauss-Hermite quadrature to approximate the integration over the random effects. We use these methods to carefully analyze two real datasets. Copyright © 2016 John Wiley & Sons, Ltd.

  20. Modeling and optimization of trihalomethanes formation potential of surface water (a drinking water source) using Box-Behnken design.

    PubMed

    Singh, Kunwar P; Rai, Premanjali; Pandey, Priyanka; Sinha, Sarita

    2012-01-01

    The present research aims to investigate the individual and interactive effects of the chlorine dose/dissolved organic carbon ratio, pH, temperature, bromide concentration, and reaction time on trihalomethanes (THMs) formation in surface water (a drinking water source) during disinfection by chlorination, in a prototype laboratory-scale simulation, and to develop a model for the prediction and optimization of THMs levels in chlorinated water for their effective control. A five-factor Box-Behnken experimental design combined with response surface and optimization modeling was used to predict the THMs levels in chlorinated water. The adequacy of the selected model and the statistical significance of the regression coefficients, independent variables, and their interactions were tested by analysis of variance and t-test statistics. The THMs levels predicted by the model were very close to the experimental values (R² = 0.95). The maximum THMs formation level (192 μg/l, the highest-risk case) predicted by optimization modeling during chlorination was very close to the experimental value (186.8 ± 1.72 μg/l) determined in laboratory experiments. The pH of the water, followed by reaction time and temperature, were the most significant factors affecting THMs formation during chlorination. The developed model can be used to determine the optimum characteristics of raw water and chlorination conditions for maintaining THMs levels within the safe limit.
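    The modeling pattern, fit a full quadratic response surface to designed experiments and then optimize it, can be sketched as follows. The two coded factors, the synthetic concave response, and the grid optimizer are illustrative assumptions (the study used five factors and a Box-Behnken design):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(11)

def design_matrix(X):
    """Full quadratic RSM model: intercept, linear, interaction,
    and squared terms for each factor."""
    cols = [np.ones(len(X))]
    cols += [X[:, i] for i in range(X.shape[1])]
    cols += [X[:, i] * X[:, j] for i, j in combinations(range(X.shape[1]), 2)]
    cols += [X[:, i] ** 2 for i in range(X.shape[1])]
    return np.column_stack(cols)

# Synthetic "THM experiments" over two coded factors in [-1, 1]
# (think pH and reaction time), with a concave true response.
X = rng.uniform(-1, 1, size=(30, 2))
y = (180 - 40 * (X[:, 0] - 0.4) ** 2 - 25 * (X[:, 1] + 0.2) ** 2
     + 5 * X[:, 0] * X[:, 1] + rng.normal(scale=1.0, size=30))

beta, *_ = np.linalg.lstsq(design_matrix(X), y, rcond=None)

# Optimize the fitted surface on a grid of coded factor levels.
g = np.linspace(-1, 1, 101)
grid = np.array([[a, b] for a in g for b in g])
pred = design_matrix(grid) @ beta
i = np.argmax(pred)
print(f"predicted maximum THMs = {pred[i]:.1f} at coded levels {grid[i]}")
```

    In the study the same surface is optimized in both directions: maximized to locate the worst-case THMs level, and interrogated to find conditions keeping THMs within the safe limit.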

  1. Process optimization for osmo-dehydrated carambola (Averrhoa carambola L) slices and its storage studies.

    PubMed

    Roopa, N; Chauhan, O P; Raju, P S; Das Gupta, D K; Singh, R K R; Bawa, A S

    2014-10-01

    An osmotic-dehydration process protocol for carambola (Averrhoa carambola L.), an exotic star-shaped tropical fruit, was developed. The process was optimized using Response Surface Methodology (RSM) following a Central Composite Rotatable Design (CCRD). The experimental variables selected for the optimization were soak solution concentration (°Brix), soaking temperature (°C) and soaking time (min), with 6 experiments at the central point. The effect of the process variables on solid gain and water loss during the osmotic dehydration process was studied. The data obtained were analyzed employing multiple regression techniques to generate suitable mathematical models. Quadratic models were found to fit well (R(2), 95.58-98.64%) in describing the effect of the variables on the responses studied. The optimized levels of the process variables were 70 °Brix, 48 °C and 144 min for soak solution concentration, soaking temperature and soaking time, respectively. The predicted and experimental results at the optimized levels of the variables showed high correlation. The osmo-dehydrated product prepared under optimized conditions showed a shelf-life of 10, 8 and 6 months at 5 °C, ambient temperature (30 ± 2 °C) and 37 °C, respectively.

  2. A combined geostatistical-optimization model for the optimal design of a groundwater quality monitoring network

    NASA Astrophysics Data System (ADS)

    Kolosionis, Konstantinos; Papadopoulou, Maria P.

    2017-04-01

    Monitoring networks provide essential information for water resources management, especially in areas with significant groundwater exploitation due to extensive agricultural activities. In this work, a simulation-optimization framework is developed based on heuristic optimization methodologies and geostatistical modeling approaches to obtain an optimal design for a groundwater quality monitoring network. Groundwater quantity and quality data obtained from 43 existing observation locations at 3 different hydrological periods in the Mires basin in Crete, Greece are used in the proposed framework, via regression kriging, to develop the spatial distribution of nitrate concentration in the aquifer of interest. Based on the existing groundwater quality mapping, the proposed optimization tool determines a cost-effective observation well network that contributes significant information to water managers and authorities. The elimination of observation wells that add little or no beneficial information to groundwater level and quality mapping of the area can be obtained using estimation uncertainty and statistical error metrics without affecting the assessment of the groundwater quality. Given the high maintenance cost of groundwater monitoring networks, the proposed tool could be used by water regulators in the decision-making process to obtain an efficient network design.

  3. SU-G-BRA-08: Diaphragm Motion Tracking Based On KV CBCT Projections with a Constrained Linear Regression Optimization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wei, J; Chao, M

    2016-06-15

    Purpose: To develop a novel strategy to extract the respiratory motion of the thoracic diaphragm from kilovoltage cone beam computed tomography (CBCT) projections by a constrained linear regression optimization technique. Methods: A parabolic function was identified as the geometric model and was employed to fit the shape of the diaphragm on the CBCT projections. The search was initialized by five manually placed seeds on a pre-selected projection image. Temporal redundancies inherent in the dynamic properties of diaphragm motion, the enabling phenomenology in video compression and encoding techniques, were integrated with the geometric shape of the diaphragm boundary and an associated algebraic constraint, significantly reducing the search space of viable parabolic parameters; the resulting problem can be effectively optimized by constrained linear regression on the subsequent projections. The algebraic constraints stipulating the kinetic range of the motion, together with a spatial constraint preventing unphysical deviations, made it possible to obtain the optimal contour of the diaphragm with minimal initialization. The algorithm was assessed with a fluoroscopic movie acquired at a fixed anterior-posterior direction and with kilovoltage CBCT projection image sets from four lung and two liver patients. Automatic tracking by the proposed algorithm and manual tracking by a human operator were compared in both the space and frequency domains. Results: The error between the estimated and manual detections for the fluoroscopic movie was 0.54 mm with a standard deviation (SD) of 0.45 mm, while the average error for the CBCT projections was 0.79 mm with an SD of 0.64 mm across all enrolled patients. This submillimeter accuracy shows the promise of the proposed constrained linear regression approach for tracking diaphragm motion on rotational projection images.
    Conclusion: The new algorithm provides a potential solution for rendering diaphragm motion and ultimately improving tumor motion management in radiation therapy of cancer patients.
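    One way to realize a constrained linear fit of a parabolic boundary is bounded least squares. The sketch below uses scipy.optimize.lsq_linear with box constraints standing in for the paper's algebraic/kinetic constraints; the edge points and bounds are synthetic, not patient data:

```python
import numpy as np
from scipy.optimize import lsq_linear

# Synthetic "diaphragm edge" points on a known parabola y = a*x^2 + b*x + c.
x = np.linspace(-40, 40, 41)
a_true, b_true, c_true = 0.02, -0.5, 120.0
y = (a_true * x**2 + b_true * x + c_true
     + np.random.default_rng(1).normal(0, 0.3, x.size))

A = np.column_stack([x**2, x, np.ones_like(x)])
# Box constraints encode prior knowledge (curvature sign/magnitude, a
# plausible vertical range), shrinking the space of viable parabolas.
res = lsq_linear(A, y, bounds=([0.0, -5.0, 100.0], [0.1, 5.0, 140.0]))
print(np.round(res.x, 3))  # ≈ [0.02, -0.5, 120.0]
```

Because the model is linear in its parameters, the constrained problem stays convex and the fit is cheap enough to repeat on every subsequent projection.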

  4. Modeling of urban growth using cellular automata (CA) optimized by Particle Swarm Optimization (PSO)

    NASA Astrophysics Data System (ADS)

    Khalilnia, M. H.; Ghaemirad, T.; Abbaspour, R. A.

    2013-09-01

    In this paper, two satellite images of Tehran, the capital city of Iran, taken by TM and ETM+ in 1988 and 2010, are used as the base information layers to study changes in the urban patterns of this metropolis. The patterns of urban growth for the city of Tehran are extracted over this 22-year period using cellular automata, with logistic regression functions as the transition functions. Furthermore, the weighting coefficients of the parameters affecting urban growth, i.e. distance from urban centers, distance from rural centers, distance from agricultural centers, and neighborhood effects, were selected using PSO. In order to evaluate the results of the prediction, the percent correct match index is calculated. According to the results, by combining optimization techniques with a cellular automata model, urban growth patterns can be predicted with accuracy up to 75%.
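    The PSO calibration step can be sketched generically. Below is a minimal particle swarm optimizer applied to a stand-in error surface; the objective, bounds and hyperparameters are illustrative, not those of the study:

```python
import numpy as np

def pso(f, bounds, n_particles=30, n_iter=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimizer for box-bounded minimization."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds[0], float), np.asarray(bounds[1], float)
    x = rng.uniform(lo, hi, (n_particles, lo.size))
    v = np.zeros_like(x)
    pbest, pval = x.copy(), np.apply_along_axis(f, 1, x)
    gbest = pbest[np.argmin(pval)].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        # Velocity blends inertia, pull toward personal best, pull toward global best.
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        val = np.apply_along_axis(f, 1, x)
        improved = val < pval
        pbest[improved], pval[improved] = x[improved], val[improved]
        gbest = pbest[np.argmin(pval)].copy()
    return gbest, pval.min()

# Calibrating two hypothetical CA weighting coefficients by minimizing a
# stand-in error surface with a known optimum at (0.6, 0.3).
best, err = pso(lambda p: (p[0] - 0.6) ** 2 + (p[1] - 0.3) ** 2,
                ([0, 0], [1, 1]))
print(np.round(best, 2))
```

In the study's setting, `f` would instead run the CA transition model with candidate weights and return a disagreement measure against the observed 2010 urban pattern.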

  5. Application of Fourier transform near-infrared spectroscopy to optimization of green tea steaming process conditions.

    PubMed

    Ono, Daiki; Bamba, Takeshi; Oku, Yuichi; Yonetani, Tsutomu; Fukusaki, Eiichiro

    2011-09-01

    In this study, we constructed prediction models by metabolic fingerprinting of fresh green tea leaves using Fourier transform near-infrared (FT-NIR) spectroscopy and partial least squares (PLS) regression analysis to objectively optimize the steaming process conditions in green tea manufacture. The steaming process is the most important step for manufacturing high quality green tea products. However, the parameter setting of the steamer is currently determined subjectively by the manufacturer. Therefore, a simple and robust system that can be used to objectively set the steaming process parameters is necessary. We focused on FT-NIR spectroscopy because of its simple operation, quick measurement, and low running costs. After removal of noise in the spectral data by principal component analysis (PCA), PLS regression analysis was performed using the spectral information as independent variables and the steaming parameters set by experienced manufacturers as dependent variables. The prediction models were successfully constructed with satisfactory accuracy. Moreover, the results of the demonstration experiment suggested that the green tea steaming process parameters could be predicted on a larger manufacturing scale. This technique will contribute to improvement of the quality and productivity of green tea because it can objectively optimize the complicated green tea steaming process and will be suitable for practical use in green tea manufacture. Copyright © 2011 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.

  6. [Application of ordinary Kriging method in entomologic ecology].

    PubMed

    Zhang, Runjie; Zhou, Qiang; Chen, Cuixian; Wang, Shousong

    2003-01-01

    Geostatistics is a statistical method based on regionalized variables that uses the variogram as a tool to analyze the spatial structure and patterns of organisms. When simulating the variogram over a large range, an optimal fit cannot always be obtained automatically, but an interactive human-computer simulation method can be used to optimize the parameters of spherical models. In this paper, this method and weighted polynomial regression were used to fit a one-step spherical model, a two-step spherical model and a linear function model, and the available nearby samples were used in the ordinary kriging procedure, which provides the best linear unbiased estimate under the unbiasedness constraint. The sums of squared deviations between the estimated and measured values for the various theoretical models were computed, and the corresponding graphs are shown. The results showed that the fit based on the two-step spherical model was the best, and that the one-step spherical model was better than the linear function model.
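    The ordinary kriging machinery described above can be shown on a toy configuration: a spherical variogram model, plus the augmented kriging system whose unbiasedness constraint forces the weights to sum to one. All coordinates and variogram parameters below are illustrative:

```python
import numpy as np

def spherical(h, nugget, sill, a):
    """One-step spherical variogram model gamma(h) with range a."""
    h = np.asarray(h, float)
    g = nugget + (sill - nugget) * (1.5 * h / a - 0.5 * (h / a) ** 3)
    return np.where(h == 0, 0.0, np.where(h >= a, sill, g))

# Ordinary kriging weights for 3 sample points and one target location.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
target = np.array([0.4, 0.4])
nug, sill, rng_a = 0.1, 1.0, 2.0

D = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
G = spherical(D, nug, sill, rng_a)
# Augment with a Lagrange-multiplier row/column for the constraint sum(w) = 1.
K = np.block([[G, np.ones((3, 1))], [np.ones((1, 3)), np.zeros((1, 1))]])
g0 = np.append(spherical(np.linalg.norm(pts - target, axis=1), nug, sill, rng_a), 1.0)
lam = np.linalg.solve(K, g0)[:3]
print(np.round(lam, 3), round(float(lam.sum()), 6))  # weights sum to 1
```

The constrained solution weights the nearest sample most heavily, which is the "best linear unbiased estimate" property the abstract refers to.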

  7. Multiple balance tests improve the assessment of postural stability in subjects with Parkinson's disease

    PubMed Central

    Jacobs, J V; Horak, F B; Tran, V K; Nutt, J G

    2006-01-01

    Objectives Clinicians often base the implementation of therapies on the presence of postural instability in subjects with Parkinson's disease (PD). These decisions are frequently based on the pull test from the Unified Parkinson's Disease Rating Scale (UPDRS). We sought to determine whether combining the pull test, the one‐leg stance test, the functional reach test, and UPDRS items 27–29 (arise from chair, posture, and gait) predicts balance confidence and falling better than any test alone. Methods The study included 67 subjects with PD. Subjects performed the one‐leg stance test, the functional reach test, and the UPDRS motor exam. Subjects also responded to the Activities‐specific Balance Confidence (ABC) scale and reported how many times they fell during the previous year. Regression models determined the combination of tests that optimally predicted mean ABC scores or categorised fall frequency. Results When all tests were included in a stepwise linear regression, only gait (UPDRS item 29), the pull test (UPDRS item 30), and the one‐leg stance test, in combination, represented significant predictor variables for mean ABC scores (r2 = 0.51). A multinomial logistic regression model including the one‐leg stance test and gait represented the model with the fewest significant predictor variables that correctly identified the most subjects as fallers or non‐fallers (85% of subjects were correctly identified). Conclusions Multiple balance tests (including the one‐leg stance test, and the gait and pull test items of the UPDRS) that assess different types of postural stress provide an optimal assessment of postural stability in subjects with PD. PMID:16484639

  8. Optimization of CMCase production from sorghum straw by Aspergillus terreus SUK-1 under solid substrate fermentation using response surface methodology

    NASA Astrophysics Data System (ADS)

    Tibin, El Mubarak Musa; Al-Shorgani, Najeeb Kaid Naseer; Abuelhassan, Nawal Noureldaim; Hamid, Aidil Abdul; Kalil, Mohd Sahaid; Yusoff, Wan Mohtar Wan

    2013-11-01

    Cellulase production using sorghum straw as substrate by a fungal culture of Aspergillus terreus SUK-1 was investigated in solid substrate fermentation (SSF). Optimum CMCase production was achieved by testing the most influential fermentation parameters, namely incubation temperature, pH and moisture content, using Response Surface Methodology (RSM) based on a Central Composite Design (CCD). Carboxymethyl cellulase (CMCase) activity was measured as the defining factor. The results were analysed by analysis of variance (ANOVA) and a quadratic regression model was obtained. The model was found to be significant (p<0.05); the effects of temperature (25-40 °C) and pH (4-7) on CMCase activity were not significant, whereas moisture content was significant under the SSF conditions employed. The highest predicted CMCase activity (0.2 U/ml) was obtained under the optimized conditions (temperature 40 °C, pH 5.4 and moisture content of 80%). The model was validated by applying the optimized conditions, and it was found to be valid.

  9. Optimality in Microwave-Assisted Drying of Aloe Vera ( Aloe barbadensis Miller) Gel using Response Surface Methodology and Artificial Neural Network Modeling

    NASA Astrophysics Data System (ADS)

    Das, Chandan; Das, Arijit; Kumar Golder, Animes

    2016-10-01

    The present work illustrates the Microwave-Assisted Drying (MWAD) characteristics of aloe vera gel combined with process optimization and artificial neural network modeling. The influence of microwave power (160-480 W), gel quantity (4-8 g) and drying time (1-9 min) on the moisture ratio was investigated. The drying of aloe gel exhibited typical diffusion-controlled characteristics with a predominant interaction between input power and drying time. A falling rate period was observed throughout the MWAD of aloe gel. A Face-centered Central Composite Design (FCCD) was used to develop a regression model evaluating the effects of the variables on the moisture ratio. The optimal MWAD conditions were established as a microwave power of 227.9 W, sample amount of 4.47 g and 5.78 min drying time, corresponding to a moisture ratio of 0.15. A computer-simulated Artificial Neural Network (ANN) model was generated for mapping between the process variables and the desired response. The Levenberg-Marquardt back propagation algorithm with a 3-5-1 architecture gave the best prediction, and it showed a clear superiority over FCCD.

  10. On the Asymptotic Relative Efficiency of Planned Missingness Designs.

    PubMed

    Rhemtulla, Mijke; Savalei, Victoria; Little, Todd D

    2016-03-01

    In planned missingness (PM) designs, certain data are set a priori to be missing. PM designs can increase validity and reduce cost; however, little is known about the loss of efficiency that accompanies them. The present paper compares PM designs to reduced-sample (RN) designs that have the same total number of data points concentrated in fewer participants. In 4 studies, we consider models for both observed and latent variables, designs that do or do not include an "X set" of variables with complete data, and a full range of between- and within-set correlation values. All results are obtained using asymptotic relative efficiency formulas, and thus no data are generated; this novel approach allows us to examine whether PM designs have theoretical advantages over RN designs once the impact of sampling error is removed. Our primary findings are that (a) in manifest variable regression models, estimates of regression coefficients have much lower relative efficiency in PM designs as compared to RN designs, (b) relative efficiency of factor correlation or latent regression coefficient estimates is maximized when the indicators of each latent variable come from different sets, and (c) the addition of an X set improves efficiency in manifest variable regression models only for the parameters that directly involve the X-set variables, but it substantially improves efficiency of most parameters in latent variable models. We conclude that PM designs can be beneficial when the model of interest is a latent variable model; recommendations are made for how to optimize such a design.

  11. Development of a chromatographic method with multi-criteria decision making design for simultaneous determination of nifedipine and atenolol in content uniformity testing.

    PubMed

    Ahmed, Sameh; Alqurshi, Abdulmalik; Mohamed, Abdel-Maaboud Ismail

    2018-07-01

    A new robust and reliable high-performance liquid chromatography (HPLC) method with a multi-criteria decision making (MCDM) approach was developed to allow simultaneous quantification of atenolol (ATN) and nifedipine (NFD) in content uniformity testing. Felodipine (FLD) was used as an internal standard (I.S.) in this study. A novel marriage between a new interactive response optimizer and an HPLC method was proposed for multiple response optimization of the target responses. The interactive response optimizer was used as a decision and prediction tool for the optimal settings of the target responses, according to specified criteria, based on Derringer's desirability. Four independent variables were considered in this study: acetonitrile percentage, buffer pH, buffer concentration, and column temperature. Eight responses were optimized: the retention times of ATN, NFD, and FLD, the resolutions between ATN/NFD and NFD/FLD, and the plate numbers for ATN, NFD, and FLD. Multiple regression analysis was applied in order to screen the most significant variables for the regression models. The experimental design was set to give minimum retention times and maximum resolution and plate numbers. The interactive response optimizer allowed prediction of optimum conditions according to these criteria with a good composite desirability value of 0.98156. The developed method was validated according to the International Conference on Harmonization (ICH) guidelines with the aid of the experimental design. The developed MCDM-HPLC method showed superior robustness and resolution in a short analysis time, allowing successful simultaneous content uniformity testing of ATN and NFD in marketed capsules. The current work presents an interactive response optimizer as an efficient platform to optimize, predict responses, and validate HPLC methodology with a tolerable design space for assays in quality control laboratories. Copyright © 2018 Elsevier B.V. All rights reserved.
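    Derringer's desirability, the basis of the optimizer's composite score, is straightforward to sketch: each response is mapped to [0, 1] by a desirability function and the composite is the geometric mean. The response names, ranges and values below are hypothetical, not the study's chromatographic data:

```python
import numpy as np

def d_smaller_is_better(y, low, high, s=1.0):
    """Desirability for a response to be minimized (e.g., a retention time):
    1 at or below `low`, 0 at or above `high`."""
    d = ((high - y) / (high - low)) ** s
    return float(np.clip(d, 0.0, 1.0))

def d_larger_is_better(y, low, high, s=1.0):
    """Desirability for a response to be maximized (e.g., a resolution):
    0 at or below `low`, 1 at or above `high`."""
    d = ((y - low) / (high - low)) ** s
    return float(np.clip(d, 0.0, 1.0))

# Two hypothetical responses: a retention time to minimize and a
# resolution to maximize; composite desirability = geometric mean.
d1 = d_smaller_is_better(4.0, low=2.0, high=10.0)   # 0.75
d2 = d_larger_is_better(2.5, low=1.5, high=3.0)     # 2/3
composite = (d1 * d2) ** 0.5
print(round(float(composite), 3))
```

An optimizer then searches the factor space (here, acetonitrile fraction, pH, buffer concentration, temperature) for settings maximizing this composite, which is how a single value such as 0.98156 arises.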

  12. Method validation using weighted linear regression models for quantification of UV filters in water samples.

    PubMed

    da Silva, Claudia Pereira; Emídio, Elissandro Soares; de Marchi, Mary Rosa Rodrigues

    2015-01-01

    This paper describes the validation of a method consisting of solid-phase extraction followed by gas chromatography-tandem mass spectrometry for the analysis of the ultraviolet (UV) filters benzophenone-3, ethylhexyl salicylate, ethylhexyl methoxycinnamate and octocrylene. The method validation criteria included evaluation of selectivity, analytical curve, trueness, precision, limits of detection and limits of quantification. The non-weighted linear regression model has traditionally been used for calibration, but it is not necessarily the optimal model in all cases. Because the assumption of homoscedasticity was not met for the analytical data in this work, a weighted least squares linear regression was used for the calibration method. The evaluated analytical parameters were satisfactory for the analytes and showed recoveries at four fortification levels between 62% and 107%, with relative standard deviations less than 14%. The detection limits ranged from 7.6 to 24.1 ng L(-1). The proposed method was used to determine the amount of UV filters in water samples from water treatment plants in Araraquara and Jau in São Paulo, Brazil. Copyright © 2014 Elsevier B.V. All rights reserved.
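    The weighted least squares calibration can be sketched directly from the weighted normal equations, with each point weighted by the reciprocal of its variance. The concentrations, signals and variance model below are illustrative, not the paper's data:

```python
import numpy as np

# Hypothetical calibration data: signal variance grows with the signal,
# violating homoscedasticity, so each point gets weight 1/variance.
conc = np.array([10.0, 50.0, 100.0, 500.0, 1000.0])   # e.g. ng/L
signal = np.array([21.0, 99.0, 205.0, 1002.0, 1985.0])
var = (0.02 * signal) ** 2                            # assumed 2% relative SD
w = 1.0 / var

# Solve the weighted normal equations (X' W X) beta = X' W y.
X = np.column_stack([np.ones_like(conc), conc])
XtW = X.T * w
beta = np.linalg.solve(XtW @ X, XtW @ signal)
print(np.round(beta, 3))  # [intercept, slope]
```

Compared with unweighted least squares, this keeps the low-concentration points (near the limit of quantification) from being swamped by the large absolute residuals at the top of the curve.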

  13. Soil Cd, Cr, Cu, Ni, Pb and Zn sorption and retention models using SVM: Variable selection and competitive model.

    PubMed

    González Costa, J J; Reigosa, M J; Matías, J M; Covelo, E F

    2017-09-01

    The aim of this study was to model the sorption and retention of Cd, Cr, Cu, Ni, Pb and Zn in soils. To that end, the sorption and retention of these metals were studied and the soil characterization was performed separately. Multiple stepwise regression was used to produce multivariate models with linear techniques and with support vector machines, all of which included 15 explanatory variables characterizing the soils. When the R-squared values are plotted, two distinct groups are noticed. Cr, Cu and Pb sorption and retention show a higher R-squared, the most explanatory variables being humified organic matter, Al oxides and, in some cases, cation-exchange capacity (CEC). The other group of metals (Cd, Ni and Zn) shows a lower R-squared, and clays are the most explanatory variables, including the percentages of vermiculite and silt. In some cases, quartz, plagioclase or hematite percentages also show some explanatory capacity. Support Vector Machine (SVM) regression shows that the different models are not as regular as in multiple regression in terms of the number of variables, the regression for nickel adsorption being the one with the highest number of variables in its optimal model. On the other hand, there are cases where the most explanatory variables are the same for two metals, as happens with Cd and Cr adsorption; a similar adsorption mechanism is thus postulated. These patterns of introduction of variables into the models allow us to create explainability sequences. Those most similar to the selectivity sequences obtained by Covelo (2005) are Mn oxides in multiple regression and exchange capacity in SVM. Among all the variables, the only one that is explanatory for all the metals after applying the maximum parsimony principle is the percentage of sand in the retention process.
    In the competitive model arising from the aforementioned sequences, the most intense competition for the adsorption and retention of the different metals appears between Cr and Cd, Cu and Zn in multiple regression, and between Cr and Cd in SVM regression. Copyright © 2017 Elsevier B.V. All rights reserved.

  14. Trend analysis of salt load and evaluation of the frequency of water-quality measurements for the Gunnison, the Colorado, and the Dolores rivers in Colorado and Utah

    USGS Publications Warehouse

    Kircher, J.E.; Dinicola, Richard S.; Middelburg, R.F.

    1984-01-01

    Monthly values were computed for water-quality constituents at four streamflow gaging stations in the Upper Colorado River basin for the determination of trends. Seasonal regression and seasonal Kendall trend analysis techniques were applied to two monthly data sets at each station site for four different time periods. A recently developed method for determining optimal water-discharge data-collection frequency was also applied to the monthly water-quality data. Trend analysis results varied with each monthly load computational method, period of record, and trend detection model used. No conclusions could be reached regarding which computational method was best to use in trend analysis. Time-period selection for analysis was found to be important with regard to intended use of the results. Seasonal Kendall procedures were found to be applicable to most data sets. Seasonal regression models were more difficult to apply and were sometimes of questionable validity; however, those results were more informative than seasonal Kendall results. The best model to use depends upon the characteristics of the data and the amount of trend information needed. The measurement-frequency optimization method had potential for application to water-quality data, but refinements are needed. (USGS)
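    The seasonal Kendall statistic referred to above sums the Mann-Kendall S statistic computed within each season, so that seasonal cycles in water quality do not masquerade as trend. A minimal sketch with toy monthly load data (the numbers are invented for illustration):

```python
import numpy as np

def mann_kendall_s(x):
    """Mann-Kendall S statistic: sum of sign(x_j - x_i) over all pairs i < j."""
    x = np.asarray(x, float)
    s = 0
    for i in range(len(x) - 1):
        s += int(np.sign(x[i + 1:] - x[i]).sum())
    return s

def seasonal_kendall_s(values, seasons):
    """Sum the Mann-Kendall S statistic within each season (e.g., month)."""
    values, seasons = np.asarray(values, float), np.asarray(seasons)
    return sum(mann_kendall_s(values[seasons == m]) for m in np.unique(seasons))

# Hypothetical loads over 4 years in 2 seasons, both trending upward:
# each season's series [1,2,3,4] and [5,6,7,8] contributes S = 6.
loads   = [1.0, 5.0, 2.0, 6.0, 3.0, 7.0, 4.0, 8.0]
seasons = [1,   2,   1,   2,   1,   2,   1,   2]
print(seasonal_kendall_s(loads, seasons))  # 6 + 6 = 12
```

A positive total indicates an upward trend; in practice the statistic is then compared against its null variance to obtain a significance level.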

  15. Estimating effects of limiting factors with regression quantiles

    USGS Publications Warehouse

    Cade, B.S.; Terrell, J.W.; Schroeder, R.L.

    1999-01-01

    In a recent Concepts paper in Ecology, Thomson et al. emphasized that assumptions of conventional correlation and regression analyses fundamentally conflict with the ecological concept of limiting factors, and they called for new statistical procedures to address this problem. The analytical issue is that unmeasured factors may be the active limiting constraint and may induce a pattern of unequal variation in the biological response variable through an interaction with the measured factors. Consequently, changes near the maxima, rather than at the center of response distributions, are better estimates of the effects expected when the observed factor is the active limiting constraint. Regression quantiles provide estimates for linear models fit to any part of a response distribution, including near the upper bounds, and require minimal assumptions about the form of the error distribution. Regression quantiles extend the concept of one-sample quantiles to the linear model by solving an optimization problem of minimizing an asymmetric function of absolute errors. Rank-score tests for regression quantiles provide tests of hypotheses and confidence intervals for parameters in linear models with heteroscedastic errors, conditions likely to occur in models of limiting ecological relations. We used selected regression quantiles (e.g., 5th, 10th, ..., 95th) and confidence intervals to test hypotheses that parameters equal zero for estimated changes in average annual acorn biomass due to forest canopy cover of oak (Quercus spp.) and oak species diversity. Regression quantiles also were used to estimate changes in glacier lily (Erythronium grandiflorum) seedling numbers as a function of lily flower numbers, rockiness, and pocket gopher (Thomomys talpoides fossor) activity, data that motivated the query by Thomson et al. for new statistical procedures. 
Both example applications showed that effects of limiting factors estimated by changes in some upper regression quantile (e.g., 90-95th) were greater than if effects were estimated by changes in the means from standard linear model procedures. Estimating a range of regression quantiles (e.g., 5-95th) provides a comprehensive description of biological response patterns for exploratory and inferential analyses in observational studies of limiting factors, especially when sampling large spatial and temporal scales.
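    The optimization problem behind regression quantiles, minimizing an asymmetric function of absolute errors (the pinball loss), can be sketched for a straight-line model. The heteroscedastic toy data below mimic the unequal variation described above; this is an illustration, not the authors' estimation code, and production work would use a dedicated quantile regression routine:

```python
import numpy as np
from scipy.optimize import minimize

def fit_quantile_line(x, y, tau):
    """Fit y = b0 + b1*x at quantile tau by minimizing the pinball loss."""
    def loss(beta):
        r = y - (beta[0] + beta[1] * x)
        return np.sum(np.where(r >= 0, tau * r, (tau - 1) * r))
    # Start from the ordinary least squares line for a good initial simplex.
    x0 = np.polyfit(x, y, 1)[::-1]
    return minimize(loss, x0=x0, method="Nelder-Mead").x

# Toy data whose upper bound rises faster than its mean -- the pattern
# expected when an unmeasured factor is sometimes the limiting constraint.
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 400)
y = 1.0 + 0.5 * x + rng.uniform(0, 1 + 0.8 * x)

b_low = fit_quantile_line(x, y, 0.10)
b_high = fit_quantile_line(x, y, 0.90)
print(round(b_low[1], 2), round(b_high[1], 2))
```

As the abstract argues, the estimated slope near the upper bound (τ = 0.90) is substantially steeper than the lower-quantile slope, so a mean regression would understate the effect of the limiting factor.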

  16. Central composite rotatable design for investigation of microwave-assisted extraction of okra pod hydrocolloid.

    PubMed

    Samavati, Vahid

    2013-10-01

    Microwave-assisted extraction (MAE) technique was employed to extract the hydrocolloid from okra pods (OPH). The optimal conditions for microwave-assisted extraction of OPH were determined by response surface methodology. A central composite rotatable design (CCRD) was applied to evaluate the effects of three independent variables (microwave power (X1: 100-500 W), extraction time (X2: 30-90 min), and extraction temperature (X3: 40-90 °C)) on the extraction yield of OPH. The correlation analysis of the mathematical-regression model indicated that quadratic polynomial model could be employed to optimize the microwave extraction of OPH. The optimal conditions to obtain the highest recovery of OPH (14.911±0.27%) were as follows: microwave power, 395.56 W; extraction time, 67.11 min and extraction temperature, 73.33 °C. Under these optimal conditions, the experimental values agreed with the predicted ones by analysis of variance. It indicated high fitness of the model used and the success of response surface methodology for optimizing OPH extraction. After method development, the DPPH radical scavenging activity of the OPH was evaluated. MAE showed obvious advantages in terms of high extraction efficiency and radical scavenging activity of extract within the shorter extraction time. Copyright © 2013 Elsevier B.V. All rights reserved.

  17. The alarming problems of confounding equivalence using logistic regression models in the perspective of causal diagrams.

    PubMed

    Yu, Yuanyuan; Li, Hongkai; Sun, Xiaoru; Su, Ping; Wang, Tingting; Liu, Yi; Yuan, Zhongshang; Liu, Yanxun; Xue, Fuzhong

    2017-12-28

    Confounders can produce spurious associations between exposure and outcome in observational studies. For the majority of epidemiologists, adjusting for confounders with a logistic regression model is the habitual method, though it has problems of accuracy and precision. It is therefore important to highlight the problems of logistic regression and to search for alternative methods. Four causal diagram models were defined to summarize confounding equivalence. Both theoretical proofs and simulation studies were performed to verify whether conditioning on different confounding equivalence sets had the same bias-reducing potential, and then to select the optimum adjustment strategy; the logistic regression model and the inverse probability weighting based marginal structural model (IPW-based-MSM) were compared. The "do-calculus" was used to calculate the true causal effect of exposure on outcome, and the bias and standard error were used to evaluate the performances of the different strategies. Adjusting for different sets of confounding equivalence, as judged by identical Markov boundaries, produced different bias-reducing potential in the logistic regression model. For the sets satisfying G-admissibility, adjusting for the set including all the confounders reduced the bias to the same level as adjusting for the set containing the parent nodes of the outcome, while the bias after adjusting for the parent nodes of exposure was not equivalent to them. In addition, all causal effect estimates from logistic regression were biased, although the estimate after adjusting for the parent nodes of exposure was nearest to the true causal effect. However, conditioning on different confounding equivalence sets had the same bias-reducing potential under IPW-based-MSM.
    Compared with logistic regression, IPW-based-MSM obtained unbiased causal effect estimates when the adjusted confounders satisfied G-admissibility, and the optimal strategy was to adjust for the parent nodes of the outcome, which gave the highest precision. All adjustment strategies through logistic regression were biased for causal effect estimation, while IPW-based-MSM could always obtain unbiased estimates when the adjusted set satisfied G-admissibility. Thus, IPW-based-MSM is recommended for adjusting for confounder sets.
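    The contrast between a confounded (crude) estimate and an IPW-based marginal estimate can be sketched on simulated data with a single binary confounder and a known causal effect of 2.0. All quantities are illustrative, and for simplicity the propensity is taken as known rather than fitted:

```python
import numpy as np

# Simulated data: binary confounder L raises both exposure A and outcome Y.
rng = np.random.default_rng(3)
n = 200_000
L = rng.binomial(1, 0.5, n)
A = rng.binomial(1, np.where(L == 1, 0.8, 0.2))   # exposure depends on L
Y = 2.0 * A + 3.0 * L + rng.normal(0, 1, n)       # true causal effect = 2.0

naive = Y[A == 1].mean() - Y[A == 0].mean()       # confounded contrast

# Inverse probability weights from the (here, known) propensity P(A=1 | L);
# weighting creates a pseudo-population in which L no longer predicts A.
p = np.where(L == 1, 0.8, 0.2)
w = np.where(A == 1, 1 / p, 1 / (1 - p))
ipw = (np.average(Y[A == 1], weights=w[A == 1])
       - np.average(Y[A == 0], weights=w[A == 0]))
print(round(naive, 2), round(ipw, 2))  # naive is biased upward; IPW ≈ 2.0
```

In real applications the propensity would itself be estimated (for example by a logistic model for exposure), which is the "IPW-based-MSM" construction the abstract compares against outcome-model adjustment.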

  18. Chance-constrained multi-objective optimization of groundwater remediation design at DNAPLs-contaminated sites using a multi-algorithm genetically adaptive method

    NASA Astrophysics Data System (ADS)

    Ouyang, Qi; Lu, Wenxi; Hou, Zeyu; Zhang, Yu; Li, Shuai; Luo, Jiannan

    2017-05-01

    In this paper, a multi-algorithm genetically adaptive multi-objective (AMALGAM) method is proposed as a multi-objective optimization solver. It was implemented in the multi-objective optimization of a groundwater remediation design at sites contaminated by dense non-aqueous phase liquids. In this study, there were two objectives: minimization of the total remediation cost, and minimization of the remediation time. A non-dominated sorting genetic algorithm II (NSGA-II) was adopted for comparison with the proposed method. For efficiency, the time-consuming surfactant-enhanced aquifer remediation simulation model was replaced by a surrogate model constructed by a multi-gene genetic programming (MGGP) technique. Similarly, two other surrogate modeling methods, support vector regression (SVR) and Kriging (KRG), were employed to make comparisons with MGGP. In addition, the surrogate-modeling uncertainty was incorporated in the optimization model by chance-constrained programming (CCP). The results showed that, for the problem considered in this study, (1) the solutions obtained by AMALGAM incurred less remediation cost and required less time than those of NSGA-II, indicating that AMALGAM outperformed NSGA-II. It was additionally shown that (2) the MGGP surrogate model was more accurate than SVR and KRG; and (3) the remediation cost and time increased with the confidence level, which can enable decision makers to make a suitable choice by considering the given budget, remediation time, and reliability.

  19. Breast Radiotherapy with Mixed Energy Photons; a Model for Optimal Beam Weighting.

    PubMed

    Birgani, Mohammadjavad Tahmasebi; Fatahiasl, Jafar; Hosseini, Seyed Mohammad; Bagheri, Ali; Behrooz, Mohammad Ali; Zabiehzadeh, Mansour; Meskani, Reza; Gomari, Maryam Talaei

    2015-01-01

    Utilization of high energy photons (>10 MV) with an optimal weight, using a mixed energy technique, is a practical way to generate a homogeneous dose distribution while maintaining adequate target coverage in intact breast radiotherapy. This study presents a model for estimating this optimal weight for day-to-day clinical use. For this purpose, treatment planning computed tomography scans of thirty-three consecutive early stage breast cancer patients following breast conservation surgery were analyzed. After delineation of the breast clinical target volume (CTV) and placement of opposed wedge-paired isocentric tangential portals, dosimetric calculations were conducted and dose volume histograms (DVHs) were generated, first with pure 6 MV photons; the calculations were then repeated ten times in each individual patient, incorporating 18 MV photons in ten-percent weight increments. For each calculation, two indices, the maximum dose in the breast CTV (Dmax) and the volume of the CTV covered by the 95% isodose line (VCTV,95%IDL), were measured from the DVH data, and the normalized values were plotted in a graph. The optimal weight of 18 MV photons was defined as the intersection point of the Dmax and VCTV,95%IDL curves. To create a model predicting this optimal weight, multiple linear regression analysis was used with breast and tangential field parameters as candidate predictors. The best fitting model for predicting the optimal 18 MV photon weight incorporated chest wall separation plus central lung distance (adjusted R²=0.776). In conclusion, this study presents a model for estimating optimal beam weighting in breast radiotherapy using a mixed photon energy technique for routine day-to-day clinical use.
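The intersection-point definition of the optimal weight can be sketched numerically; the curves below are invented linear trends standing in for the normalized Dmax and VCTV,95%IDL data, not the study's measurements:

```python
import numpy as np

# Hypothetical normalized curves vs. 18 MV weight w (0..1 in 10% steps):
# Dmax rises and CTV coverage falls as more 18 MV is mixed in (assumed trends).
w = np.linspace(0.0, 1.0, 11)
dmax = 1.00 + 0.08 * w          # normalized maximum CTV dose
coverage = 1.06 - 0.10 * w      # normalized V_CTV,95%IDL

# Find where the two piecewise-linear curves cross.
diff = dmax - coverage
i = np.where(np.diff(np.sign(diff)))[0][0]                 # bracketing interval
w_opt = w[i] - diff[i] * (w[i + 1] - w[i]) / (diff[i + 1] - diff[i])
print(f"optimal 18 MV weight = {w_opt:.3f}")
```

In the study, this crossing point was then regressed on patient geometry (chest wall separation plus central lung distance), so the curves need not be recomputed for every patient.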

  20. Spectral imaging using consumer-level devices and kernel-based regression.

    PubMed

    Heikkinen, Ville; Cámara, Clara; Hirvonen, Tapani; Penttinen, Niko

    2016-06-01

    Hyperspectral reflectance factor image estimations were performed in the 400-700 nm wavelength range using a portable consumer-level laptop display as an adjustable light source for a trichromatic camera. Targets of interest were ColorChecker Classic samples, Munsell Matte samples, geometrically challenging tempera icon paintings from the turn of the 20th century, and human hands. Measurements and simulations were performed using a Nikon D80 RGB camera and a Dell Vostro 2520 laptop screen as the light source. Estimations were performed without the spectral characteristics of the devices, emphasizing simplicity in the training sets and in estimation model optimization. Spectral and color error images are shown for the estimations, using line-scanned hyperspectral images as the ground truth. Estimations were performed using kernel-based regression models via a first-degree inhomogeneous polynomial kernel and a Matérn kernel; in the latter case, the median heuristic approach for model optimization and a link function for bounded estimation were evaluated. Results suggest modest requirements for a training set and show that all estimation models have markedly improved accuracy with respect to the DE00 color distance (up to 99% for paintings and hands) and the Pearson distance (up to 98% for paintings and 99% for hands) over the weak training set (Digital ColorChecker SG) case when small representative training data were used in the estimation.
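A kernel regression of the kind described (first-degree inhomogeneous polynomial kernel) can be sketched with plain ridge algebra; the RGB-to-reflectance data below are synthetic stand-ins, not the measured spectra, and the regularization value is invented:

```python
import numpy as np

# Kernel ridge regression with the inhomogeneous linear kernel k(u, v) = u.v + 1.
def kernel(A, B):
    return A @ B.T + 1.0

rng = np.random.default_rng(5)
X = rng.uniform(size=(30, 3))                  # 30 pixels x 3 camera channels
y = X @ np.array([0.5, 0.3, 0.2])              # toy reflectance target

lam = 1e-6                                     # ridge regularization
alpha = np.linalg.solve(kernel(X, X) + lam * np.eye(30), y)

x_new = np.array([[0.2, 0.4, 0.6]])
pred = (kernel(x_new, X) @ alpha)[0]
print(f"predicted reflectance = {pred:.4f}")
```

A Matérn kernel with a median-heuristic length scale would simply replace `kernel`; the solve-and-predict algebra is unchanged.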

  1. [Screen potential CYP450 2E1 inhibitors from Chinese herbal medicine based on support vector regression and molecular docking method].

    PubMed

    Chen, Xi; Lu, Fang; Jiang, Lu-di; Cai, Yi-Lian; Li, Gong-Yu; Zhang, Yan-Ling

    2016-07-01

    Inhibition of cytochrome P450 (CYP450) enzymes is one of the most common reasons for drug interactions, so early prediction of CYP inhibitors can help decrease the incidence of adverse reactions caused by drug interactions. CYP450 2E1 (CYP2E1), which plays a key role in drug metabolism, has a broad spectrum of substrates. In this study, 32 CYP2E1 inhibitors were collected for the construction of a support vector regression (SVR) model. Test set data were used to verify the CYP2E1 quantitative models and to obtain the optimal prediction model for CYP2E1 inhibitors. Meanwhile, a molecular docking program, CDOCKER, was utilized to analyze the interaction pattern between positive compounds and the active pocket, to establish the optimal screening model for CYP2E1 inhibitors. The SVR model and the molecular docking prediction model were combined to screen the traditional Chinese medicine database (TCMD), which improved the calculation efficiency and prediction accuracy. 6376 traditional Chinese medicine (TCM) compounds predicted by the SVR model were obtained; after further verification with the molecular docking model, 247 TCM compounds with potential inhibitory activity against CYP2E1 were finally retained. Some of them have been verified by experiments. The results demonstrate that this study can provide guidance for the virtual screening of CYP450 inhibitors and the prediction of CYP-mediated DDIs, and also provide a reference for rational clinical drug use. Copyright© by the Chinese Pharmaceutical Association.
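An SVR activity model of this general shape can be sketched with scikit-learn; the descriptor matrix and activities below are randomly generated stand-ins for the 32 collected inhibitors, and the hyperparameters are invented:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 5))                   # 32 compounds x 5 descriptors
y = X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=32)   # toy activities

model = SVR(kernel="rbf", C=10.0, epsilon=0.05).fit(X, y)
r2 = model.score(X, y)          # training fit; a real study scores a test set
print(f"training R^2 = {r2:.3f}")
```

In the described workflow, compounds passing a score cutoff from such a model would then go on to docking-based verification.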

  2. Development of a partial least squares-artificial neural network (PLS-ANN) hybrid model for the prediction of consumer liking scores of ready-to-drink green tea beverages.

    PubMed

    Yu, Peigen; Low, Mei Yin; Zhou, Weibiao

    2018-01-01

    In order to develop products that would be preferred by consumers, the effects of the chemical compositions of ready-to-drink green tea beverages on consumer liking were studied through regression analyses. Green tea model systems were prepared by dosing solutions of 0.1% green tea extract with differing concentrations of eight flavour keys deemed to be important for green tea aroma and taste, based on a D-optimal experimental design, before undergoing commercial sterilisation. Sensory evaluation of the green tea model system was carried out using an untrained consumer panel to obtain hedonic liking scores of the samples. Regression models were subsequently trained to objectively predict the consumer liking scores of the green tea model systems. A linear partial least squares (PLS) regression model was developed to describe the effects of the eight flavour keys on consumer liking, with a coefficient of determination (R²) of 0.733, and a root-mean-square error (RMSE) of 3.53%. The PLS model was further augmented with an artificial neural network (ANN) to establish a PLS-ANN hybrid model. The established hybrid model was found to give a better prediction of consumer liking scores, based on its R² (0.875) and RMSE (2.41%). Copyright © 2017 Elsevier Ltd. All rights reserved.

  3. Uremic Pruritus, Dialysis Adequacy, and Metabolic Profiles in Hemodialysis Patients: A Prospective 5-Year Cohort Study

    PubMed Central

    Chen, Hung-Yuan; Chiu, Yen-Ling; Hsu, Shih-Ping; Pai, Mei-Fen; Yang, Ju-Yeh; Lai, Chun-Fu; Lu, Hui-Min; Huang, Shu-Chen; Yang, Shao-Yu; Wen, Su-Yin; Chiu, Hsien-Ching; Hu, Fu-Chang; Peng, Yu-Sen; Jee, Shiou-Hwa

    2013-01-01

    Background: Uremic pruritus is a common and intractable symptom in patients on chronic hemodialysis, but factors associated with the severity of pruritus remain unclear. This study aimed to explore the associations of metabolic factors and dialysis adequacy with the aggravation of pruritus. Methods: We conducted a 5-year prospective cohort study on patients with maintenance hemodialysis. A visual analogue scale (VAS) was used to assess the intensity of pruritus. Patient demographic and clinical characteristics, laboratory parameters, dialysis adequacy (assessed by Kt/V), and pruritus intensity were recorded at baseline and follow-up. Change score analysis of the difference score of VAS between baseline and follow-up was performed using multiple linear regression models. The optimal threshold of Kt/V associated with the aggravation of uremic pruritus was determined by generalized additive models and receiver operating characteristic analysis. Results: A total of 111 patients completed the study. Linear regression analysis showed that lower Kt/V and use of a low-flux dialyzer were significantly associated with the aggravation of pruritus after adjusting for the baseline pruritus intensity and a variety of confounding factors. The optimal threshold value of Kt/V for pruritus was 1.5, as suggested by both generalized additive models and receiver operating characteristic analysis. Conclusions: Hemodialysis with a target of Kt/V ≥1.5 and use of a high-flux dialyzer may reduce the intensity of pruritus in patients on chronic hemodialysis. Further clinical trials are required to determine the optimal dialysis dose and regimen for uremic pruritus. PMID:23940749
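Choosing a threshold from a ROC analysis is often done with the Youden index; a self-contained toy version with invented Kt/V values and outcomes (not the cohort's data):

```python
import numpy as np

# Toy Kt/V values and aggravation outcomes (1 = pruritus aggravated);
# all numbers are invented.
ktv = np.array([1.2, 1.3, 1.35, 1.4, 1.45, 1.5, 1.55, 1.6, 1.7, 1.8])
worse = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])

best_t, best_j = None, -1.0
for t in ktv:
    pred = (ktv < t).astype(int)      # low adequacy predicts aggravation
    sens = (pred & worse).sum() / worse.sum()
    spec = ((1 - pred) & (1 - worse)).sum() / (1 - worse).sum()
    j = sens + spec - 1.0             # Youden index
    if j > best_j:
        best_t, best_j = t, j
print(f"optimal Kt/V threshold = {best_t}, Youden J = {best_j:.2f}")
```

The threshold maximizing sensitivity plus specificity plays the same role as the study's ROC-derived Kt/V cutoff.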

  4. Comparing spatial regression to random forests for large ...

    EPA Pesticide Factsheets

    Environmental data may be “large” due to number of records, number of covariates, or both. Random forests has a reputation for good predictive performance when using many covariates, whereas spatial regression, when using reduced rank methods, has a reputation for good predictive performance when using many records. In this study, we compare these two techniques using a data set containing the macroinvertebrate multimetric index (MMI) at 1859 stream sites with over 200 landscape covariates. Our primary goal is predicting MMI at over 1.1 million perennial stream reaches across the USA. For spatial regression modeling, we develop two new methods to accommodate large data: (1) a procedure that estimates optimal Box-Cox transformations to linearize covariate relationships; and (2) a computationally efficient covariate selection routine that takes into account spatial autocorrelation. We show that our new methods lead to cross-validated performance similar to random forests, but that there is an advantage for spatial regression when quantifying the uncertainty of the predictions. Simulations are used to clarify advantages for each method. This research investigates different approaches for modeling and mapping national stream condition. We use MMI data from the EPA's National Rivers and Streams Assessment and predictors from StreamCat (Hill et al., 2015). Previous studies have focused on modeling the MMI condition classes (i.e., good, fair, and poor).
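Step (1), estimating an optimal Box-Cox transformation to linearize a skewed covariate, can be sketched with SciPy on synthetic data (the lognormal covariate here is invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.lognormal(mean=0.0, sigma=1.0, size=500)   # right-skewed covariate

x_bc, lam = stats.boxcox(x)        # maximum-likelihood Box-Cox exponent
print(f"estimated lambda = {lam:.2f}")
```

For right-skewed (here lognormal) data the estimated exponent comes out near 0, i.e. essentially a log transform, and the transformed covariate is far less skewed.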

  5. Analysis of low flows and selected methods for estimating low-flow characteristics at partial-record and ungaged stream sites in western Washington

    USGS Publications Warehouse

    Curran, Christopher A.; Eng, Ken; Konrad, Christopher P.

    2012-01-01

    Regional low-flow regression models for estimating Q7,10 at ungaged stream sites are developed from the records of daily discharge at 65 continuous gaging stations (including 22 discontinued gaging stations) for the purpose of evaluating explanatory variables. By incorporating the base-flow recession time constant τ as an explanatory variable in the regression model, the root-mean-square error for estimating Q7,10 at ungaged sites can be lowered to 72 percent (for known values of τ), which is 42 percent less than if only basin area and mean annual precipitation are used as explanatory variables. If partial-record sites are included in the regression data set, τ must be estimated from pairs of discharge measurements made during continuous periods of declining low flows. Eight measurement pairs are optimal for estimating τ at partial-record sites, and result in a lowering of the root-mean-square error by 25 percent. A low-flow survey strategy that includes paired measurements at partial-record sites requires additional effort and planning beyond a standard strategy, but could be used to enhance regional estimates of τ and potentially reduce the error of regional regression models for estimating low-flow characteristics at ungaged sites.
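Under the usual exponential recession model Q(t) = Q0·exp(-t/τ), one measurement pair taken during a continuous recession gives τ directly; the discharges below are illustrative numbers, and a survey following the study's recommendation would average eight such pairs:

```python
import math

def recession_tau(q1, q2, dt_days):
    """tau (days) from two discharges measured dt_days apart (q1 > q2 > 0)."""
    return dt_days / math.log(q1 / q2)

# Illustrative pair: flow halves over 20 days of continuous recession.
tau = recession_tau(q1=10.0, q2=5.0, dt_days=20.0)
print(f"tau = {tau:.1f} days")
```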

  6. Modelling and optimization of semi-solid processing of 7075 Al alloy

    NASA Astrophysics Data System (ADS)

    Binesh, B.; Aghaie-Khafri, M.

    2017-09-01

    The new modified strain-induced melt activation (SIMA) process presented by Binesh and Aghaie-Khafri was optimized using a response surface methodology to improve the thixotropic characteristics of semi-solid 7075 alloy. The responses, namely the average grain size and the shape factor, were considered as functions of three independent input variables: effective strain, isothermal holding temperature and time. Mathematical models for the responses were developed using the regression analysis technique, and the adequacy of the models was validated by the analysis of variance method. The calculated results correlated fairly well with the experiments. It was found that all the first- and second-order terms of the independent parameters and the interactive terms of the effective strain and holding time were statistically significant for the responses. In order to simultaneously optimize the responses, the desirable values for the effective strain, holding temperature and time were predicted to be 5.1, 609 °C and 14 min, respectively, when employing the desirability function approach. Based on the optimization results, a significant improvement in the average grain size and shape factor of the semi-solid slurry prepared by the new modified SIMA process was observed.
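The response-surface fit behind such an optimization is a full quadratic model estimated by least squares; a two-factor sketch on synthetic data (the coefficients and noise level are invented):

```python
import numpy as np

rng = np.random.default_rng(3)
x1, x2 = rng.uniform(-1, 1, 50), rng.uniform(-1, 1, 50)   # coded factors
y = (40 - 5 * x1 - 3 * x2 + 4 * x1**2 + 2 * x2**2
     + 1.5 * x1 * x2 + 0.2 * rng.normal(size=50))         # e.g. grain size

# Full quadratic design matrix: intercept, linear, square, interaction terms.
A = np.column_stack([np.ones(50), x1, x2, x1**2, x2**2, x1 * x2])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(coef, 2))
```

With three factors (strain, temperature, time) the design matrix simply gains the extra linear, square, and cross terms; desirability-based optimization is then run on the fitted surface.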

  7. A system methodology for optimization design of the structural crashworthiness of a vehicle subjected to a high-speed frontal crash

    NASA Astrophysics Data System (ADS)

    Xia, Liang; Liu, Weiguo; Lv, Xiaojiang; Gu, Xianguang

    2018-04-01

    The structural crashworthiness design of vehicles has become an important research direction to ensure the safety of the occupants. To effectively improve the structural safety of a vehicle in a frontal crash, a system methodology is presented in this study. The surrogate model of Online support vector regression (Online-SVR) is adopted to approximate crashworthiness criteria and different kernel functions are selected to enhance the accuracy of the model. The Online-SVR model is demonstrated to have the advantages of solving highly nonlinear problems and saving training costs, and can effectively be applied for vehicle structural crashworthiness design. By combining the non-dominated sorting genetic algorithm II and Monte Carlo simulation, both deterministic optimization and reliability-based design optimization (RBDO) are conducted. The optimization solutions are further validated by finite element analysis, which shows the effectiveness of the RBDO solution in the structural crashworthiness design process. The results demonstrate the advantages of using RBDO, resulting in not only increased energy absorption and decreased structural weight from a baseline design, but also a significant improvement in the reliability of the design.
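The reliability half of RBDO is typically a Monte Carlo estimate of failure probability; a minimal sketch with an invented limit-state function and input distributions (in the paper, the response would come from the Online-SVR surrogate):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200_000
x1 = rng.normal(3.0, 0.3, n)       # e.g. a thickness variable (assumed dist.)
x2 = rng.normal(2.0, 0.2, n)       # e.g. a strength factor (assumed dist.)
g = x1 * x2 - 5.0                  # limit state: failure when g < 0
pf = (g < 0).mean()
print(f"estimated failure probability = {pf:.4f}")
```

An RBDO loop repeats this estimate at each candidate design and constrains the failure probability below a target value.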

  8. Distributed collaborative probabilistic design for turbine blade-tip radial running clearance using support vector machine of regression

    NASA Astrophysics Data System (ADS)

    Fei, Cheng-Wei; Bai, Guang-Chen

    2014-12-01

    To improve the computational precision and efficiency of probabilistic design for mechanical dynamic assemblies such as the blade-tip radial running clearance (BTRRC) of a gas turbine, a distributed collaborative probabilistic design method based on support vector machine regression (called DCSRM) is proposed by integrating the distributed collaborative response surface method with the support vector machine regression model. The mathematical model of DCSRM is established and its probabilistic design idea is introduced. The dynamic assembly probabilistic design of an aeroengine high-pressure turbine (HPT) BTRRC was carried out to verify the proposed DCSRM. The analysis results reveal that the optimal static blade-tip clearance of the HPT is obtained for designing the BTRRC, improving the performance and reliability of the aeroengine. The comparison of methods shows that DCSRM has high computational accuracy and efficiency in BTRRC probabilistic analysis. The present research offers an effective way for the reliability design of mechanical dynamic assemblies and enriches mechanical reliability theory and methods.

  9. Structure-activity relationships between sterols and their thermal stability in oil matrix.

    PubMed

    Hu, Yinzhou; Xu, Junli; Huang, Weisu; Zhao, Yajing; Li, Maiquan; Wang, Mengmeng; Zheng, Lufei; Lu, Baiyi

    2018-08-30

    Structure-activity relationships between 20 sterols and their thermal stabilities were studied in a model oil system. All sterol degradations were found to be consistent with a first-order kinetic model, with coefficients of determination (R²) higher than 0.9444. The number of double bonds in the sterol structure was negatively correlated with the thermal stability of the sterol, whereas the length of the branch chain was positively correlated with it. A quantitative structure-activity relationship (QSAR) model to predict the thermal stability of sterols was developed using partial least squares regression (PLSR) combined with a genetic algorithm (GA). A regression model was built with an R² of 0.806. Almost all sterol degradation constants could be predicted accurately, with a cross-validation R² of 0.680. Four important variables were selected in the optimal QSAR model; the selected variables were related to information indices, RDF descriptors, and 3D-MoRSE descriptors. Copyright © 2018 Elsevier Ltd. All rights reserved.

  10. Optimizing the models for rapid determination of chlorogenic acid, scopoletin and rutin in plant samples by near-infrared diffuse reflectance spectroscopy

    NASA Astrophysics Data System (ADS)

    Mao, Zhiyi; Shan, Ruifeng; Wang, Jiajun; Cai, Wensheng; Shao, Xueguang

    2014-07-01

    Polyphenols in plant samples have been extensively studied because phenolic compounds are ubiquitous in plants and can be used as antioxidants in promoting human health. A method for the rapid determination of three phenolic compounds (chlorogenic acid, scopoletin and rutin) in plant samples using near-infrared diffuse reflectance spectroscopy (NIRDRS) is studied in this work. Partial least squares (PLS) regression was used for building the calibration models, and the effects of spectral preprocessing and variable selection on the models were investigated to optimize them. The results show that individual spectral preprocessing or variable selection techniques have only a slight influence on the models, but their combination can significantly improve the models. The combination of continuous wavelet transform (CWT) for removing the varying background, multiplicative scatter correction (MSC) for correcting the scattering effect, and randomization test (RT) for selecting the informative variables was found to be the best way to build the optimal models. For validation of the models, the polyphenol contents in an independent sample set were predicted. The correlation coefficients between the predicted values and the contents determined by high performance liquid chromatography (HPLC) analysis are as high as 0.964, 0.948 and 0.934 for chlorogenic acid, scopoletin and rutin, respectively.
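Of the preprocessing steps combined here, MSC is easy to show in full: each spectrum is regressed on a reference (mean) spectrum and rescaled. The spectra below are synthetic affine distortions of one invented base curve:

```python
import numpy as np

base = np.sin(np.linspace(0.0, 3.0, 100)) + 2.0     # "true" spectrum (toy)
# Three measured spectra = affine (scatter) distortions of the base.
spectra = np.array([a * base + b
                    for a, b in [(0.8, 0.1), (1.2, -0.2), (1.0, 0.3)]])

ref = spectra.mean(axis=0)                          # reference spectrum
corrected = np.empty_like(spectra)
for i, s in enumerate(spectra):
    a, b = np.polyfit(ref, s, 1)                    # fit s ~ a * ref + b
    corrected[i] = (s - b) / a                      # undo slope and offset
print(f"residual spread after MSC: {corrected.std(axis=0).max():.2e}")
```

Because the distortions here are exactly affine, MSC collapses all three spectra onto the reference; on real spectra it removes most, not all, of the scatter variation.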

  11. Optimizing Discharge Capacity of Li-O 2 Batteries by Design of Air-Electrode Porous Structure: Multifidelity Modeling and Optimization

    DOE PAGES

    Pan, Wenxiao; Yang, Xiu; Bao, Jie; ...

    2017-01-01

    We develop a new mathematical framework to study the optimal design of air electrode microstructures for lithium-oxygen (Li-O2) batteries. It can effectively reduce the number of expensive experiments for testing different air electrodes, thereby minimizing the cost in the design of Li-O2 batteries. The design parameters to characterize an air-electrode microstructure include the porosity, surface-to-volume ratio, and parameters associated with the pore-size distribution. A surrogate model (also known as a response surface) for discharge capacity is first constructed as a function of these design parameters. The surrogate model is accurate and easy to evaluate, such that an optimization can be performed based on it. In particular, a Gaussian process regression method, co-kriging, is employed due to its accuracy and efficiency in predicting high-dimensional responses from a combination of multifidelity data. Specifically, a small amount of data from high-fidelity simulations is combined with a large number of data obtained from computationally efficient low-fidelity simulations. The high-fidelity simulation is based on a multiscale modeling approach that couples the microscale (pore-scale) and macroscale (device-scale) models, whereas the low-fidelity simulation is based on an empirical macroscale model. The constructed response surface provides quantitative understanding and prediction of how air electrode microstructures affect the discharge performance of Li-O2 batteries. The succeeding sensitivity analysis via Sobol indices and optimization via a genetic algorithm ultimately offer reliable guidance on the optimal design of air electrode microstructures. The proposed mathematical framework can be generalized to investigate other new energy storage techniques and materials.
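A single-fidelity kriging (Gaussian process regression) surrogate, the building block that co-kriging extends to multifidelity data, can be sketched in a few lines of NumPy; the single design variable, kernel length scale, and "capacity" values are all invented:

```python
import numpy as np

def rbf(a, b, ls=0.3):
    """Squared-exponential kernel on 1-D inputs."""
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ls**2)

x_train = np.array([0.0, 0.25, 0.5, 0.75, 1.0])   # e.g. scaled porosity
y_train = np.sin(2 * np.pi * x_train)             # stand-in "capacity" data

K = rbf(x_train, x_train) + 1e-8 * np.eye(5)      # jitter for stability
alpha = np.linalg.solve(K, y_train)               # weights of posterior mean

x_new = np.array([0.1])
y_pred = (rbf(x_new, x_train) @ alpha)[0]         # posterior mean at x_new
print(f"predicted capacity at 0.1: {y_pred:.3f}")
```

Co-kriging would add a second, cheaper low-fidelity data set and a cross-covariance model linking the two fidelities, but the solve-and-predict structure stays the same.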

  12. Optimizing Discharge Capacity of Li-O 2 Batteries by Design of Air-Electrode Porous Structure: Multifidelity Modeling and Optimization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pan, Wenxiao; Yang, Xiu; Bao, Jie

    We develop a new mathematical framework to study the optimal design of air electrode microstructures for lithium-oxygen (Li-O2) batteries. It can effectively reduce the number of expensive experiments for testing different air electrodes, thereby minimizing the cost in the design of Li-O2 batteries. The design parameters to characterize an air-electrode microstructure include the porosity, surface-to-volume ratio, and parameters associated with the pore-size distribution. A surrogate model (also known as a response surface) for discharge capacity is first constructed as a function of these design parameters. The surrogate model is accurate and easy to evaluate, such that an optimization can be performed based on it. In particular, a Gaussian process regression method, co-kriging, is employed due to its accuracy and efficiency in predicting high-dimensional responses from a combination of multifidelity data. Specifically, a small amount of data from high-fidelity simulations is combined with a large number of data obtained from computationally efficient low-fidelity simulations. The high-fidelity simulation is based on a multiscale modeling approach that couples the microscale (pore-scale) and macroscale (device-scale) models, whereas the low-fidelity simulation is based on an empirical macroscale model. The constructed response surface provides quantitative understanding and prediction of how air electrode microstructures affect the discharge performance of Li-O2 batteries. The succeeding sensitivity analysis via Sobol indices and optimization via a genetic algorithm ultimately offer reliable guidance on the optimal design of air electrode microstructures. The proposed mathematical framework can be generalized to investigate other new energy storage techniques and materials.

  13. A comparative analysis of predictive models of morbidity in intensive care unit after cardiac surgery - part II: an illustrative example.

    PubMed

    Cevenini, Gabriele; Barbini, Emanuela; Scolletta, Sabino; Biagioli, Bonizella; Giomarelli, Pierpaolo; Barbini, Paolo

    2007-11-22

    Popular predictive models for estimating morbidity probability after heart surgery are compared critically in a unitary framework. The study is divided into two parts. In the first part, modelling techniques and the intrinsic strengths and weaknesses of different approaches were discussed from a theoretical point of view. In this second part, the performances of the same models are evaluated in an illustrative example. Eight models were developed: Bayes linear and quadratic models, a k-nearest neighbour model, a logistic regression model, the Higgins and direct scoring systems, and two feed-forward artificial neural networks with one and two layers. Cardiovascular, respiratory, neurological, renal, infectious and hemorrhagic complications were defined as morbidity. Training and testing sets, each of 545 cases, were used. The optimal set of predictors was chosen from a collection of 78 preoperative, intraoperative and postoperative variables by a stepwise procedure. Discrimination and calibration were evaluated by the area under the receiver operating characteristic curve and the Hosmer-Lemeshow goodness-of-fit test, respectively. Scoring systems and the logistic regression model required the largest set of predictors, while the Bayesian and k-nearest neighbour models were much more parsimonious. In the testing data, all models showed acceptable discrimination capacity; however, the Bayes quadratic model, using only three predictors, provided the best performance. All models showed satisfactory generalization ability: again the Bayes quadratic model exhibited the best generalization, while the artificial neural networks and scoring systems gave the worst results. Finally, poor calibration was obtained with the scoring systems, the k-nearest neighbour model and the artificial neural networks, while the Bayes (after recalibration) and logistic regression models gave adequate results.
Although all the predictive models showed acceptable discrimination performance in the example considered, the Bayes and logistic regression models seemed better than the others, because they also had good generalization and calibration. The Bayes quadratic model seemed to be a convincing alternative to the much more usual Bayes linear and logistic regression models. It showed its capacity to identify a minimum core of predictors generally recognized as essential to pragmatically evaluate the risk of developing morbidity after heart surgery.
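Discrimination in these comparisons is the area under the ROC curve, which equals the probability that a randomly chosen morbid case is scored above a randomly chosen non-morbid one; a minimal rank-based implementation on invented scores:

```python
import numpy as np

def auc(scores, labels):
    """AUC as the fraction of correctly ordered (positive, negative) pairs."""
    pos = scores[labels == 1][:, None]
    neg = scores[labels == 0][None, :]
    return (pos > neg).mean() + 0.5 * (pos == neg).mean()

# Invented risk scores and morbidity outcomes for eight patients.
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1])
labels = np.array([1, 1, 1, 0, 1, 0, 0, 0])
print(f"AUC = {auc(scores, labels):.4f}")
```

This pairwise definition is model-agnostic, which is why it can rank Bayes, k-NN, logistic and neural network classifiers on equal footing.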

  14. Statistical Optimality in Multipartite Ranking and Ordinal Regression.

    PubMed

    Uematsu, Kazuki; Lee, Yoonkyung

    2015-05-01

    Statistical optimality in multipartite ranking is investigated as an extension of bipartite ranking. We consider the optimality of ranking algorithms through minimization of the theoretical risk which combines pairwise ranking errors of ordinal categories with differential ranking costs. The extension shows that for a certain class of convex loss functions including exponential loss, the optimal ranking function can be represented as a ratio of weighted conditional probability of upper categories to lower categories, where the weights are given by the misranking costs. This result also bridges traditional ranking methods such as proportional odds model in statistics with various ranking algorithms in machine learning. Further, the analysis of multipartite ranking with different costs provides a new perspective on non-smooth list-wise ranking measures such as the discounted cumulative gain and preference learning. We illustrate our findings with simulation study and real data analysis.
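The described representation of the optimal ranking function can be written out; this is a sketch in assumed notation (the abstract itself gives no symbols), not the paper's exact statement:

```latex
% Sketch (notation assumed): with ordinal categories k = 1, ..., K,
% misranking costs c_k, and a cutoff j separating "lower" from "upper"
% categories, the described optimal ranking function is a weighted
% probability ratio:
f_j^{*}(x) \;\propto\;
  \frac{\sum_{k > j} c_k \, P(Y = k \mid X = x)}
       {\sum_{k \le j} c_k \, P(Y = k \mid X = x)}
```

Setting all costs equal recovers the familiar odds of "upper versus lower" categories, which is the bridge to the proportional odds model mentioned in the abstract.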

  15. Variable selection based near infrared spectroscopy quantitative and qualitative analysis on wheat wet gluten

    NASA Astrophysics Data System (ADS)

    Lü, Chengxu; Jiang, Xunpeng; Zhou, Xingfan; Zhang, Yinqiao; Zhang, Naiqian; Wei, Chongfeng; Mao, Wenhua

    2017-10-01

    Wet gluten is a useful quality indicator for wheat, and short-wave near infrared spectroscopy (NIRS) is a high-performance technique with the advantages of being economical, rapid, and nondestructive. To study the feasibility of short-wave NIRS for analyzing wet gluten directly from wheat seed, 54 representative wheat seed samples were collected and scanned by a spectrometer. Eight spectral pretreatment methods and a genetic algorithm (GA) variable selection method were used to optimize the analysis. Both quantitative and qualitative models of wet gluten were built by partial least squares regression and discriminant analysis. For quantitative analysis, normalization was the optimal pretreatment method; 17 wet-gluten-sensitive variables were selected by GA, and the GA model performed better than the all-variable model, with R2V=0.88 and RMSEV=1.47. For qualitative analysis, automatic weighted least squares baseline was the optimal pretreatment method, and the all-variable models performed better than the GA models. The correct classification rates for the three classes of <24%, 24-30%, and >30% wet gluten content were 95.45, 84.52, and 90.00%, respectively. The short-wave NIRS technique shows potential for both quantitative and qualitative analysis of wet gluten in wheat seed.

  16. Elaborate ligand-based modeling reveal new submicromolar Rho kinase inhibitors

    NASA Astrophysics Data System (ADS)

    Shahin, Rand; AlQtaishat, Saja; Taha, Mutasem O.

    2012-02-01

    Rho Kinase (ROCKII) has been recently implicated in several cardiovascular diseases, prompting several attempts to discover and optimize new ROCKII inhibitors. Towards this end, we explored the pharmacophoric space of 138 ROCKII inhibitors to identify high quality pharmacophores. The pharmacophoric models were subsequently allowed to compete within a quantitative structure-activity relationship (QSAR) context. Genetic algorithm and multiple linear regression analysis were employed to select an optimal combination of pharmacophoric models and 2D physicochemical descriptors capable of yielding a self-consistent QSAR of optimal predictive potential (r(77) = 0.84, F = 18.18, r²(LOO) = 0.639, r²(PRESS) against 19 external test inhibitors = 0.494). Two orthogonal pharmacophores emerged in the QSAR equation, suggesting the existence of at least two binding modes accessible to ligands within the ROCKII binding pocket. Receiver operating characteristic (ROC) curve analyses established the validity of the QSAR-selected pharmacophores. Moreover, the successful pharmacophore models were found to be comparable with the crystallographically resolved ROCKII binding pocket. We employed the pharmacophoric models and associated QSAR equation to screen the National Cancer Institute (NCI) list of compounds. Eight submicromolar ROCKII inhibitors were identified. The most potent inhibitors gave IC50 values of 0.7 and 1.0 μM.

  17. [Application of near infrared spectroscopy combined with particle swarm optimization based least squares support vector machine to rapid quantitative analysis of Corni Fructus].

    PubMed

    Liu, Xue-song; Sun, Fen-fang; Jin, Ye; Wu, Yong-jiang; Gu, Zhi-xin; Zhu, Li; Yan, Dong-lan

    2015-12-01

    A novel method was developed for the rapid determination of multiple quality indicators in Corni Fructus by means of near infrared (NIR) spectroscopy. A particle swarm optimization (PSO) based least squares support vector machine (LS-SVM) was investigated to raise the level of quality control. Calibration models of moisture, extractum, morroniside and loganin were established using the PSO-LS-SVM algorithm. The performance of the PSO-LS-SVM models was compared with partial least squares regression (PLSR) and back-propagation artificial neural network (BP-ANN). The calibration and validation results of PSO-LS-SVM were superior to those of both PLSR and BP-ANN. For the PSO-LS-SVM models, the correlation coefficients (r) of the calibrations were all above 0.942. The optimal prediction results were also achieved by the PSO-LS-SVM models, with RMSEP (root mean square error of prediction) and RSEP (relative standard error of prediction) less than 1.176 and 15.5%, respectively. The results suggest that the PSO-LS-SVM algorithm has good model performance and high prediction accuracy. NIR spectroscopy has potential value for the rapid determination of multiple indicators in Corni Fructus.
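The PSO component can be sketched generically; a bare-bones swarm minimizing a stand-in objective (in the paper, the objective would be an LS-SVM cross-validation error over its hyperparameters, and all swarm parameters here are invented):

```python
import numpy as np

rng = np.random.default_rng(4)
n, dim, iters = 20, 2, 100
pos = rng.uniform(-5.0, 5.0, (n, dim))     # particle positions
vel = np.zeros((n, dim))                   # particle velocities

def f(p):
    """Stand-in objective; a real run would return a CV error per particle."""
    return ((p - np.array([1.0, -2.0]))**2).sum(axis=1)

pbest, pbest_val = pos.copy(), f(pos)      # personal bests
gbest = pbest[pbest_val.argmin()].copy()   # global best

for _ in range(iters):
    r1, r2 = rng.uniform(size=(2, n, dim))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = pos + vel
    val = f(pos)
    improved = val < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], val[improved]
    gbest = pbest[pbest_val.argmin()].copy()

print(f"best point found: {np.round(gbest, 3)}")
```

Each particle is pulled toward its own best position and the swarm's best, so the swarm converges on the minimizer without gradient information.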

  18. Multivariate research in areas of phosphorus cast-iron brake shoes manufacturing using the statistical analysis and the multiple regression equations

    NASA Astrophysics Data System (ADS)

    Kiss, I.; Cioată, V. G.; Alexa, V.; Raţiu, S. A.

    2017-05-01

    The braking system is one of the most important and complex subsystems of railway vehicles, especially when it comes to safety. Installing efficient, safe brakes on modern railway vehicles is therefore essential. Considerable attention is nowadays devoted to problems connected with the use of high-performance brake materials and their impact on the thermal and mechanical loading of railway wheels. The main factor that influences the selection of a friction material for railway applications is the performance criterion, because the interaction between the brake block and the wheel produces complex thermo-mechanical phenomena. In this work, the investigated subjects are cast-iron brake shoes, which are still widely used on freight wagons. The cast-iron brake shoes - with lamellar graphite and a high phosphorus content (0.8-1.1%) - therefore need a special investigation. In order to establish the optimal condition for the cast-iron brake shoes, we propose a mathematical modelling study using statistical analysis and multiple regression equations. Multivariate research is important in cast-iron brake shoe manufacturing because many variables interact with each other simultaneously. Multivariate visualization comes to the fore when researchers have difficulty comprehending many dimensions at one time. Technological data (hardness and chemical composition) obtained from cast-iron brake shoes were used for this purpose. In order to settle the multiple correlation between the hardness of the cast-iron brake shoes and the chemical composition elements, several types of regression equations have been proposed. Because a three-dimensional surface with variables on three axes is a common way to illustrate multivariate data, in which the maximum and minimum values are easily highlighted, we plotted graphical representations of the regression equations in order to explain the interaction of the variables and locate the optimal level of each variable for maximal response. For the calculation of the regression coefficients, dispersion and correlation coefficients, the software Matlab was used.
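The idea of fitting a regression equation and reading off the optimal level of each variable can be sketched with a second-order model on synthetic composition/hardness data (all values invented for illustration, not the study's measurements):

```python
import numpy as np

rng = np.random.default_rng(2)
x1 = rng.uniform(0.8, 1.1, 100)     # e.g. phosphorus content, %
x2 = rng.uniform(2.8, 3.6, 100)     # e.g. carbon content, %
# Invented response surface with a maximum at (0.95, 3.2).
y = 200 - 50 * (x1 - 0.95) ** 2 - 30 * (x2 - 3.2) ** 2 \
    + rng.normal(scale=0.1, size=100)

# Second-order polynomial regression fitted by least squares.
A = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Stationary point of the fitted quadratic surface (candidate optimum).
b1, b2, b11, b22, b12 = coef[1], coef[2], coef[3], coef[4], coef[5]
H = np.array([[2 * b11, b12], [b12, 2 * b22]])
x_opt = np.linalg.solve(H, -np.array([b1, b2]))
print(x_opt)   # close to the true maximum (0.95, 3.2)
```

Checking that the Hessian H is negative definite confirms the stationary point is a maximum, the same diagnostic one applies when reading optima off a plotted regression surface.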

  19. Bayesian Ensemble Trees (BET) for Clustering and Prediction in Heterogeneous Data

    PubMed Central

    Duan, Leo L.; Clancy, John P.; Szczesniak, Rhonda D.

    2016-01-01

    We propose a novel “tree-averaging” model that utilizes the ensemble of classification and regression trees (CART). Each constituent tree is estimated with a subset of similar data. We treat this grouping of subsets as Bayesian Ensemble Trees (BET) and model them as a Dirichlet process. We show that BET determines the optimal number of trees by adapting to the data heterogeneity. Compared with other ensemble methods, BET requires far fewer trees and shows equivalent prediction accuracy using weighted averaging. Moreover, each tree in BET provides a variable selection criterion and interpretation for each subset. We develop an efficient estimating procedure with improved estimation strategies in both CART and mixture models. We demonstrate these advantages of BET with simulations and illustrate the approach with a real-world data example involving regression of lung function measurements obtained from patients with cystic fibrosis. Supplemental materials are available online. PMID:27524872

  20. Accounting for informatively missing data in logistic regression by means of reassessment sampling.

    PubMed

    Lin, Ji; Lyles, Robert H

    2015-05-20

    We explore the 'reassessment' design in a logistic regression setting, where a second wave of sampling is applied to recover a portion of the missing data on a binary exposure and/or outcome variable. We construct a joint likelihood function based on the original model of interest and a model for the missing data mechanism, with emphasis on non-ignorable missingness. The estimation is carried out by numerical maximization of the joint likelihood function with close approximation of the accompanying Hessian matrix, using sharable programs that take advantage of general optimization routines in standard software. We show how likelihood ratio tests can be used for model selection and how they facilitate direct hypothesis testing for whether missingness is at random. Examples and simulations are presented to demonstrate the performance of the proposed method. Copyright © 2015 John Wiley & Sons, Ltd.
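The numerical maximization of a logistic likelihood that underlies the reassessment design can be sketched, for the complete-data case only, with Newton-Raphson on synthetic data; the paper's joint outcome-plus-missingness likelihood adds parameters but is maximized in the same way:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x = rng.normal(size=n)
true_beta = np.array([-0.5, 1.2])              # intercept, slope (invented)
p = 1 / (1 + np.exp(-(true_beta[0] + true_beta[1] * x)))
y = rng.binomial(1, p)

# Newton-Raphson maximization of the logistic log-likelihood.
X = np.column_stack([np.ones(n), x])
beta = np.zeros(2)
for _ in range(25):
    mu = 1 / (1 + np.exp(-(X @ beta)))
    grad = X.T @ (y - mu)                      # score vector
    hess = -(X.T * (mu * (1 - mu))) @ X        # negated information matrix
    beta = beta - np.linalg.solve(hess, grad)

print(beta)   # close to the true (-0.5, 1.2)
```

The Hessian computed in the final iteration is the "close approximation of the accompanying Hessian matrix" from which standard errors are obtained; general-purpose optimizers in standard software automate exactly this loop.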

  1. A multimedia retrieval framework based on semi-supervised ranking and relevance feedback.

    PubMed

    Yang, Yi; Nie, Feiping; Xu, Dong; Luo, Jiebo; Zhuang, Yueting; Pan, Yunhe

    2012-04-01

    We present a new framework for multimedia content analysis and retrieval which consists of two independent algorithms. First, we propose a new semi-supervised algorithm called ranking with Local Regression and Global Alignment (LRGA) to learn a robust Laplacian matrix for data ranking. In LRGA, for each data point, a local linear regression model is used to predict the ranking scores of its neighboring points. A unified objective function is then proposed to globally align the local models from all the data points so that an optimal ranking score can be assigned to each data point. Second, we propose a semi-supervised long-term Relevance Feedback (RF) algorithm to refine the multimedia data representation. The proposed long-term RF algorithm utilizes both the multimedia data distribution in multimedia feature space and the history RF information provided by users. A trace ratio optimization problem is then formulated and solved by an efficient algorithm. The algorithms have been applied to several content-based multimedia retrieval applications, including cross-media retrieval, image retrieval, and 3D motion/pose data retrieval. Comprehensive experiments on four data sets have demonstrated its advantages in precision, robustness, scalability, and computational efficiency.

  2. Time-Gated Raman Spectroscopy for Quantitative Determination of Solid-State Forms of Fluorescent Pharmaceuticals.

    PubMed

    Lipiäinen, Tiina; Pessi, Jenni; Movahedi, Parisa; Koivistoinen, Juha; Kurki, Lauri; Tenhunen, Mari; Yliruusi, Jouko; Juppo, Anne M; Heikkonen, Jukka; Pahikkala, Tapio; Strachan, Clare J

    2018-04-03

    Raman spectroscopy is widely used for quantitative pharmaceutical analysis, but a common obstacle to its use is sample fluorescence masking the Raman signal. Time-gating provides an instrument-based method for rejecting fluorescence through temporal resolution of the spectral signal and allows Raman spectra of fluorescent materials to be obtained. An additional practical advantage is that analysis is possible in ambient lighting. This study assesses the efficacy of time-gated Raman spectroscopy for the quantitative measurement of fluorescent pharmaceuticals. Time-gated Raman spectroscopy with a 128 × (2) × 4 CMOS SPAD detector was applied for quantitative analysis of ternary mixtures of solid-state forms of the model drug, piroxicam (PRX). Partial least-squares (PLS) regression allowed quantification, with Raman-active time domain selection (based on visual inspection) improving performance. Model performance was further improved by using kernel-based regularized least-squares (RLS) regression with greedy feature selection in which the data use in both the Raman shift and time dimensions was statistically optimized. Overall, time-gated Raman spectroscopy, especially with optimized data analysis in both the spectral and time dimensions, shows potential for sensitive and relatively routine quantitative analysis of photoluminescent pharmaceuticals during drug development and manufacturing.
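Partial least-squares regression of the kind used for quantification here can be sketched with the classical NIPALS algorithm on synthetic "spectra" (invented data; the paper's time-gated spectra and its regularized least-squares variant are not reproduced):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 40, 120                       # more "wavenumbers" than samples
S = rng.normal(size=(n, 3))          # three latent spectral components
X = S @ rng.normal(size=(3, p)) + 0.01 * rng.normal(size=(n, p))
y = S @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=n)

def pls1(X, y, n_comp):
    """PLS regression (NIPALS, single response): returns coefficients."""
    X = X - X.mean(0); y = y - y.mean()
    Xk, yk = X.copy(), y.copy()
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xk.T @ yk; w /= np.linalg.norm(w)       # weight vector
        t = Xk @ w                                  # scores
        pk = Xk.T @ t / (t @ t)                     # X loadings
        qk = yk @ t / (t @ t)                       # y loading
        Xk = Xk - np.outer(t, pk); yk = yk - qk * t # deflation
        W.append(w); P.append(pk); q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)          # regression coefficients

beta = pls1(X, y, 3)
Xc, yc = X - X.mean(0), y - y.mean()
resid = yc - Xc @ beta
print(1 - resid @ resid / (yc @ yc))  # near 1: three components suffice
```

PLS handles the p > n situation typical of spectroscopy by regressing on a few latent components instead of all wavenumbers directly.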

  3. Real-time model learning using Incremental Sparse Spectrum Gaussian Process Regression.

    PubMed

    Gijsberts, Arjan; Metta, Giorgio

    2013-05-01

    Novel applications in unstructured and non-stationary human environments require robots that learn from experience and adapt autonomously to changing conditions. Predictive models therefore not only need to be accurate, but should also be updated incrementally in real-time and require minimal human intervention. Incremental Sparse Spectrum Gaussian Process Regression is an algorithm that is targeted specifically for use in this context. Rather than developing a novel algorithm from the ground up, the method is based on the thoroughly studied Gaussian Process Regression algorithm, therefore ensuring a solid theoretical foundation. Non-linearity and a bounded update complexity are achieved simultaneously by means of a finite dimensional random feature mapping that approximates a kernel function. As a result, the computational cost for each update remains constant over time. Finally, algorithmic simplicity and support for automated hyperparameter optimization ensures convenience when employed in practice. Empirical validation on a number of synthetic and real-life learning problems confirms that the performance of Incremental Sparse Spectrum Gaussian Process Regression is superior with respect to the popular Locally Weighted Projection Regression, while computational requirements are found to be significantly lower. The method is therefore particularly suited for learning with real-time constraints or when computational resources are limited. Copyright © 2012 Elsevier Ltd. All rights reserved.
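The finite-dimensional random feature mapping with constant-cost updates can be sketched as random Fourier features combined with recursive least squares (a simplified stand-in for the published algorithm, on a synthetic one-dimensional stream):

```python
import numpy as np

rng = np.random.default_rng(5)

# Random Fourier features approximating an RBF kernel; weights are
# updated with recursive least squares, so each update costs O(D^2)
# regardless of how many samples have been seen.
D, d, lam = 100, 1, 1e-3
Omega = rng.normal(scale=1.0, size=(d, D))    # spectral frequencies
phase = rng.uniform(0, 2 * np.pi, D)

def features(x):
    return np.sqrt(2.0 / D) * np.cos(x @ Omega + phase)

w = np.zeros(D)
P = np.eye(D) / lam                           # inverse covariance estimate
for _ in range(500):                          # stream of samples
    x = rng.uniform(-3, 3, size=(1, d))
    y = np.sin(x[0, 0])                       # toy target function
    z = features(x)[0]
    Pz = P @ z
    k = Pz / (1.0 + z @ Pz)                   # gain vector
    w += k * (y - z @ w)                      # constant-time update
    P -= np.outer(k, Pz)

xq = np.linspace(-3, 3, 7).reshape(-1, 1)
err = np.abs(features(xq) @ w - np.sin(xq[:, 0])).max()
print(err)   # maximum absolute error on a small test grid
```

The update cost depends only on the number of random features D, never on the number of samples, which is what makes the approach suitable for real-time model learning.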

  4. Multi-Target Regression via Robust Low-Rank Learning.

    PubMed

    Zhen, Xiantong; Yu, Mengyang; He, Xiaofei; Li, Shuo

    2018-02-01

    Multi-target regression has recently regained great popularity due to its capability of simultaneously learning multiple relevant regression tasks and its wide applications in data mining, computer vision and medical image analysis, while great challenges arise from jointly handling inter-target correlations and input-output relationships. In this paper, we propose Multi-layer Multi-target Regression (MMR) which enables simultaneously modeling intrinsic inter-target correlations and nonlinear input-output relationships in a general framework via robust low-rank learning. Specifically, the MMR can explicitly encode inter-target correlations in a structure matrix by matrix elastic nets (MEN); the MMR can work in conjunction with the kernel trick to effectively disentangle highly complex nonlinear input-output relationships; the MMR can be efficiently solved by a new alternating optimization algorithm with guaranteed convergence. The MMR leverages the strength of kernel methods for nonlinear feature learning and the structural advantage of multi-layer learning architectures for inter-target correlation modeling. More importantly, it offers a new multi-layer learning paradigm for multi-target regression which is endowed with high generality, flexibility and expressive ability. Extensive experimental evaluation on 18 diverse real-world datasets demonstrates that our MMR can achieve consistently high performance and outperforms representative state-of-the-art algorithms, which shows its great effectiveness and generality for multivariate prediction.

  5. Identification of extremely premature infants at high risk of rehospitalization.

    PubMed

    Ambalavanan, Namasivayam; Carlo, Waldemar A; McDonald, Scott A; Yao, Qing; Das, Abhik; Higgins, Rosemary D

    2011-11-01

    Extremely low birth weight infants often require rehospitalization during infancy. Our objective was to identify at the time of discharge which extremely low birth weight infants are at higher risk for rehospitalization. Data from extremely low birth weight infants in Eunice Kennedy Shriver National Institute of Child Health and Human Development Neonatal Research Network centers from 2002-2005 were analyzed. The primary outcome was rehospitalization by the 18- to 22-month follow-up, and secondary outcome was rehospitalization for respiratory causes in the first year. Using variables and odds ratios identified by stepwise logistic regression, scoring systems were developed with scores proportional to odds ratios. Classification and regression-tree analysis was performed by recursive partitioning and automatic selection of optimal cutoff points of variables. A total of 3787 infants were evaluated (mean ± SD birth weight: 787 ± 136 g; gestational age: 26 ± 2 weeks; 48% male, 42% black). Forty-five percent of the infants were rehospitalized by 18 to 22 months; 14.7% were rehospitalized for respiratory causes in the first year. Both regression models (area under the curve: 0.63) and classification and regression-tree models (mean misclassification rate: 40%-42%) were moderately accurate. Predictors for the primary outcome by regression were shunt surgery for hydrocephalus, hospital stay of >120 days for pulmonary reasons, necrotizing enterocolitis stage II or higher or spontaneous gastrointestinal perforation, higher fraction of inspired oxygen at 36 weeks, and male gender. By classification and regression-tree analysis, infants with hospital stays of >120 days for pulmonary reasons had a 66% rehospitalization rate compared with 42% without such a stay. The scoring systems and classification and regression-tree analysis models identified infants at higher risk of rehospitalization and might assist planning for care after discharge.
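A scoring system with points proportional to odds ratios can be sketched as follows; the risk factors are taken from the abstract, but the odds-ratio values used here are invented for illustration:

```python
# Hypothetical scoring sketch: points proportional to each predictor's
# odds ratio (the paper's actual odds ratios are not reproduced).
odds_ratios = {
    "shunt_surgery": 2.0,
    "pulm_stay_gt_120d": 2.5,
    "nec_or_perforation": 1.8,
    "male": 1.3,
}
points = {k: round(v * 10) for k, v in odds_ratios.items()}

def risk_score(infant):
    """Sum the points of the risk factors present at discharge."""
    return sum(points[k] for k, present in infant.items() if present)

infant = {"shunt_surgery": False, "pulm_stay_gt_120d": True,
          "nec_or_perforation": True, "male": True}
print(risk_score(infant))   # 25 + 18 + 13 = 56
```

A threshold on such a score then flags higher-risk infants, much as a single classification-and-regression-tree split on long pulmonary stays did in the study.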

  6. Identification of Extremely Premature Infants at High Risk of Rehospitalization

    PubMed Central

    Carlo, Waldemar A.; McDonald, Scott A.; Yao, Qing; Das, Abhik; Higgins, Rosemary D.

    2011-01-01

    OBJECTIVE: Extremely low birth weight infants often require rehospitalization during infancy. Our objective was to identify at the time of discharge which extremely low birth weight infants are at higher risk for rehospitalization. METHODS: Data from extremely low birth weight infants in Eunice Kennedy Shriver National Institute of Child Health and Human Development Neonatal Research Network centers from 2002–2005 were analyzed. The primary outcome was rehospitalization by the 18- to 22-month follow-up, and secondary outcome was rehospitalization for respiratory causes in the first year. Using variables and odds ratios identified by stepwise logistic regression, scoring systems were developed with scores proportional to odds ratios. Classification and regression-tree analysis was performed by recursive partitioning and automatic selection of optimal cutoff points of variables. RESULTS: A total of 3787 infants were evaluated (mean ± SD birth weight: 787 ± 136 g; gestational age: 26 ± 2 weeks; 48% male, 42% black). Forty-five percent of the infants were rehospitalized by 18 to 22 months; 14.7% were rehospitalized for respiratory causes in the first year. Both regression models (area under the curve: 0.63) and classification and regression-tree models (mean misclassification rate: 40%–42%) were moderately accurate. Predictors for the primary outcome by regression were shunt surgery for hydrocephalus, hospital stay of >120 days for pulmonary reasons, necrotizing enterocolitis stage II or higher or spontaneous gastrointestinal perforation, higher fraction of inspired oxygen at 36 weeks, and male gender. By classification and regression-tree analysis, infants with hospital stays of >120 days for pulmonary reasons had a 66% rehospitalization rate compared with 42% without such a stay. 
CONCLUSIONS: The scoring systems and classification and regression-tree analysis models identified infants at higher risk of rehospitalization and might assist planning for care after discharge. PMID:22007016

  7. Compatible Models of Carbon Content of Individual Trees on a Cunninghamia lanceolata Plantation in Fujian Province, China

    PubMed Central

    Zhuo, Lin; Tao, Hong; Wei, Hong; Chengzhen, Wu

    2016-01-01

    We tried to establish compatible carbon content models of individual trees for a Chinese fir (Cunninghamia lanceolata (Lamb.) Hook.) plantation from Fujian province in southeast China. In general, compatibility requires that the sum of components equal the whole tree, meaning that the sum of percentages calculated from component equations should equal 100%. Thus, we used multiple approaches to simulate carbon content in boles, branches, foliage leaves, roots and the whole individual trees. The approaches included (i) single optimal fitting (SOF), (ii) nonlinear adjustment in proportion (NAP) and (iii) nonlinear seemingly unrelated regression (NSUR). These approaches were used in combination with variables relating diameter at breast height (D) and tree height (H), such as D, D2H, DH and D&H (where D&H means two separate variables in bivariate model). Power, exponential and polynomial functions were tested as well as a new general function model was proposed by this study. Weighted least squares regression models were employed to eliminate heteroscedasticity. Model performances were evaluated by using mean residuals, residual variance, mean square error and the determination coefficient. The results indicated that models with two dimensional variables (DH, D2H and D&H) were always superior to those with a single variable (D). The D&H variable combination was found to be the most useful predictor. Of all the approaches, SOF could establish a single optimal model separately, but there were deviations in estimating results due to existing incompatibilities, while NAP and NSUR could ensure predictions compatibility. Simultaneously, we found that the new general model had better accuracy than others. In conclusion, we recommend that the new general model be used to estimate carbon content for Chinese fir and considered for other vegetation types as well. PMID:26982054

  8. Always looking on the bright side of life? Exploring optimism and health in three UK post-industrial urban settings.

    PubMed

    Walsh, David; McCartney, Gerry; McCullough, Sarah; van der Pol, Marjon; Buchanan, Duncan; Jones, Russell

    2015-09-01

    Many theories have been proposed to explain the high levels of 'excess' mortality (i.e. higher mortality over and above that explained by differences in socio-economic circumstances) shown in Scotland-and, especially, in its largest city, Glasgow-compared with elsewhere in the UK. One such proposal relates to differences in optimism, given previously reported evidence of the health benefits of an optimistic outlook. A representative survey of Glasgow, Liverpool and Manchester was undertaken in 2011. Optimism was measured by the Life Orientation Test (Revised) (LOT-R), and compared between the cities by means of multiple linear regression models, adjusting for any differences in sample characteristics. Unadjusted analyses showed LOT-R scores to be similar in Glasgow and Liverpool (mean score (SD): 14.7 (4.0) for both), but lower in Manchester (13.9 (3.8)). This was consistent in analyses by age, gender and social class. Multiple regression confirmed the city results: compared with Glasgow, optimism was either similar (Liverpool: adjusted difference in mean score: -0.16 (95% CI -0.45 to 0.13)) or lower (Manchester: -0.85 (-1.14 to -0.56)). The reasons for high levels of Scottish 'excess' mortality remain unclear. However, differences in psychological outlook such as optimism appear to be an unlikely explanation. © The Author 2015. Published by Oxford University Press on behalf of Faculty of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  9. Low-dose computed tomography scans with automatic exposure control for patients of different ages undergoing cardiac PET/CT and SPECT/CT.

    PubMed

    Yang, Ching-Ching; Yang, Bang-Hung; Tu, Chun-Yuan; Wu, Tung-Hsin; Liu, Shu-Hsin

    2017-06-01

    This study aimed to evaluate the efficacy of automatic exposure control (AEC) in order to optimize low-dose computed tomography (CT) protocols for patients of different ages undergoing cardiac PET/CT and single-photon emission computed tomography/computed tomography (SPECT/CT). One PET/CT and one SPECT/CT were used to acquire CT images for four anthropomorphic phantoms representative of 1-year-old, 5-year-old and 10-year-old children and an adult. For the hybrid systems investigated in this study, the radiation dose and image quality of cardiac CT scans performed with AEC activated depend mainly on the selection of a predefined image quality index. Multiple linear regression methods were used to analyse image data from anthropomorphic phantom studies to investigate the effects of body size and predefined image quality index on CT radiation dose in cardiac PET/CT and SPECT/CT scans. The regression relationships have a coefficient of determination larger than 0.9, indicating a good fit to the data. According to the regression models, low-dose protocols using the AEC technique were optimized for patients of different ages. In comparison with the standard protocol with AEC activated for adult cardiac examinations used in our clinical routine practice, the optimized paediatric protocols in PET/CT allow 32.2, 63.7 and 79.2% CT dose reductions for anthropomorphic phantoms simulating 10-year-old, 5-year-old and 1-year-old children, respectively. The corresponding results for cardiac SPECT/CT are 8.4, 51.5 and 72.7%. AEC is a practical way to reduce CT radiation dose in cardiac PET/CT and SPECT/CT, but the AEC settings should be determined properly for optimal effect. Our results show that AEC does not eliminate the need for paediatric protocols and CT examinations using the AEC technique should be optimized for paediatric patients to reduce the radiation dose as low as reasonably achievable.

  10. [Study on application of SVM in prediction of coronary heart disease].

    PubMed

    Zhu, Yue; Wu, Jianghua; Fang, Ying

    2013-12-01

    Based on blood pressure, plasma lipid, Glu and UA data from physical examinations, a Support Vector Machine (SVM) was applied to distinguish coronary heart disease (CHD) patients from non-CHD individuals in a south China population, to guide further prevention and treatment of the disease. Firstly, SVM classifiers were built using a radial basis kernel function, a linear kernel function and a polynomial kernel function, respectively. Secondly, the SVM penalty factor C and kernel parameter sigma were optimized by particle swarm optimization (PSO) and then employed to diagnose and predict CHD. Compared with a back-propagation (BP) artificial neural network, linear discriminant analysis, logistic regression and non-optimized SVM, the overall results demonstrated that the classification performance of the optimized RBF-SVM model was superior to the other classifier algorithms, with higher accuracy, sensitivity and specificity of 94.51%, 92.31% and 96.67%, respectively. It is therefore concluded that SVM could be used as a valid method for assisting the diagnosis of CHD.
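The particle swarm optimization used to tune the SVM penalty factor C and kernel parameter sigma can be sketched with a minimal PSO; here a simple quadratic bowl stands in for the cross-validated classification error:

```python
import numpy as np

rng = np.random.default_rng(6)

def pso(f, bounds, n_particles=20, iters=60, w=0.7, c1=1.5, c2=1.5):
    """Minimal particle swarm optimizer over box bounds."""
    lo, hi = np.array(bounds).T
    x = rng.uniform(lo, hi, size=(n_particles, len(lo)))
    v = np.zeros_like(x)
    pbest, pbest_f = x.copy(), np.array([f(p) for p in x])
    g = pbest[pbest_f.argmin()].copy()            # global best position
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        fx = np.array([f(p) for p in x])
        improved = fx < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], fx[improved]
        g = pbest[pbest_f.argmin()].copy()
    return g, pbest_f.min()

# Stand-in objective: in the paper this would be the cross-validated
# error of the SVM as a function of (C, sigma); minimum at (2, -1).
g, fval = pso(lambda p: (p[0] - 2) ** 2 + (p[1] + 1) ** 2,
              [(-5, 5), (-5, 5)])
print(g)   # near (2, -1)
```

In practice the objective is noisy and non-convex, which is exactly why a derivative-free swarm search is preferred over gradient methods for kernel hyperparameters.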

  11. Optimization of Hyaluronidase Inhibition Activity from Prunus davidiana (Carriere) Franch Fruit Extract Fermented by its Isolated Bacillus subtilis Strain SPF4211.

    PubMed

    Kim, Won-Baek; Park, So Hae; Koo, Kyoung Yoon; Kim, Bo Ram; Kim, Minji; Lee, Heeseob

    2016-09-28

    Strain SPF4211, having hyaluronidase (HAase) inhibition activity, was isolated from P. davidiana (Carriere) Franch fruit (PrDF) sugar extract. 16S rDNA sequencing together with phenotypic and biochemical profiling using an API 50 CHB kit suggested that the organism was B. subtilis. To optimize the HAase inhibition activity of PrDF extract fermented by strain SPF4211, a central composite design (CCD) was introduced based on three variables: concentration of PrDF extract (X₁: 1-5%), amount of starter culture (X₂: 1-5%), and fermentation time (X₃: 0-7 days). The experimental data were fitted with quadratic regression equations, and the accuracy of the equations was analyzed by ANOVA. The statistical model predicted the highest HAase inhibition activity of 37.936% under the optimal conditions of X₁ = 1%, X₂ = 2.53%, and X₃ = 7 days. The optimized conditions were validated by observation of an actual HAase inhibition activity of 38.367% from extract of PrDF fermented by SPF4211. These results agree well with the predicted model value.
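The three-variable central composite design mentioned above has a standard coded structure (factorial cube, axial "star", and centre points); a sketch that generates the coded design matrix follows (mapping coded levels back to the stated ranges, e.g. 1-5%, is omitted):

```python
import numpy as np
from itertools import product

def central_composite(k, alpha=None):
    """Coded design points of a CCD for k factors: a 2^k factorial cube,
    2k axial (star) points at distance alpha, and one centre point."""
    if alpha is None:
        alpha = (2 ** k) ** 0.25           # rotatable choice of alpha
    cube = np.array(list(product([-1.0, 1.0], repeat=k)))
    star = np.zeros((2 * k, k))
    for i in range(k):
        star[2 * i, i], star[2 * i + 1, i] = -alpha, alpha
    centre = np.zeros((1, k))
    return np.vstack([cube, star, centre])

D = central_composite(3)                   # the study's three variables
print(D.shape)   # (15, 3): 8 cube + 6 star + 1 centre point
```

The quadratic regression equation fitted by ANOVA in the study is estimated from responses measured at exactly these design points (centre points are usually replicated to estimate pure error).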

  12. Evaluation of multivariate linear regression and artificial neural networks in prediction of water quality parameters

    PubMed Central

    2014-01-01

    This paper examined the efficiency of multivariate linear regression (MLR) and artificial neural network (ANN) models in the prediction of two major water quality parameters in a wastewater treatment plant. Biochemical oxygen demand (BOD) and chemical oxygen demand (COD), as indirect indicators of organic matter, are representative parameters for sewer water quality. Performance of the models was evaluated using the coefficient of correlation (r), root mean square error (RMSE) and bias values. The values of BOD and COD computed by the ANN and regression models were in close agreement with their respective measured values. Results showed that the ANN model performed better than the MLR model. Comparative indices of the optimized ANN with input values of temperature (T), pH, total suspended solids (TSS) and total solids (TS) were RMSE = 25.1 mg/L and r = 0.83 for prediction of BOD, and RMSE = 49.4 mg/L and r = 0.81 for prediction of COD. It was found that the ANN model could be employed successfully in estimating the BOD and COD in the inlet of wastewater biochemical treatment plants. Moreover, sensitivity analysis showed that the pH parameter had more effect on BOD and COD prediction than the other parameters. Also, both models predicted BOD better than COD. PMID:24456676

  13. A new surrogate modeling technique combining Kriging and polynomial chaos expansions – Application to uncertainty analysis in computational dosimetry

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kersaudy, Pierric, E-mail: pierric.kersaudy@orange.com; Whist Lab, 38 avenue du Général Leclerc, 92130 Issy-les-Moulineaux; ESYCOM, Université Paris-Est Marne-la-Vallée, 5 boulevard Descartes, 77700 Marne-la-Vallée

    2015-04-01

    In numerical dosimetry, the recent advances in high performance computing led to a strong reduction of the required computational time to assess the specific absorption rate (SAR) characterizing the human exposure to electromagnetic waves. However, this procedure remains time-consuming and a single simulation can request several hours. As a consequence, the influence of uncertain input parameters on the SAR cannot be analyzed using crude Monte Carlo simulation. The solution presented here to perform such an analysis is surrogate modeling. This paper proposes a novel approach to build such a surrogate model from a design of experiments. Considering a sparse representation of the polynomial chaos expansions using least-angle regression as a selection algorithm to retain the most influential polynomials, this paper proposes to use the selected polynomials as regression functions for the universal Kriging model. The leave-one-out cross validation is used to select the optimal number of polynomials in the deterministic part of the Kriging model. The proposed approach, called LARS-Kriging-PC modeling, is applied to three benchmark examples and then to a full-scale metamodeling problem involving the exposure of a numerical fetus model to a femtocell device. The performance of the LARS-Kriging-PC is compared to an ordinary Kriging model and to a classical sparse polynomial chaos expansion. The LARS-Kriging-PC appears to have better performance than the two other approaches. A significant accuracy improvement is observed compared to the ordinary Kriging or to the sparse polynomial chaos depending on the studied case. This approach seems to be an optimal solution between the two other classical approaches. A global sensitivity analysis is finally performed on the LARS-Kriging-PC model of the fetus exposure problem.
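The leave-one-out cross validation used to pick the number of polynomials in the Kriging trend can be computed for a linear regression without any refitting, via the hat matrix; a sketch selecting a polynomial degree on synthetic data (not the paper's polynomial chaos basis):

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.uniform(-1, 1, 30)
y = 1.0 + 2.0 * x - 1.5 * x**2 + 0.05 * rng.normal(size=30)

def loo_press(A, y):
    """Leave-one-out PRESS via the hat matrix: no refitting required."""
    H = A @ np.linalg.solve(A.T @ A, A.T)
    resid = y - H @ y
    return np.sum((resid / (1.0 - np.diag(H))) ** 2)

# Select the number of polynomial regression functions that minimizes
# the leave-one-out error, analogous to the trend selection above.
scores = []
for deg in range(6):
    A = np.vander(x, deg + 1, increasing=True)
    scores.append(loo_press(A, y))
print(int(np.argmin(scores)))   # a low degree capturing the quadratic
```

The shortcut relies on the identity e_i / (1 - h_ii) for the deleted residual of a least-squares fit, which is why LOO is cheap for the deterministic (regression) part of a Kriging model.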

  14. Integration of measurements with atmospheric dispersion models: Source term estimation for dispersal of (239)Pu due to non-nuclear detonation of high explosive

    NASA Astrophysics Data System (ADS)

    Edwards, L. L.; Harvey, T. F.; Freis, R. P.; Pitovranov, S. E.; Chernokozhin, E. V.

    1992-10-01

    The accuracy associated with assessing the environmental consequences of an accidental release of radioactivity is highly dependent on our knowledge of the source term characteristics and, in the case when the radioactivity is condensed on particles, the particle size distribution, all of which are generally poorly known. This paper reports on the development of a numerical technique that integrates the radiological measurements with atmospheric dispersion modeling, resulting in more accurate estimates of the particle-size distribution and particle injection height when compared with measurements of high explosive dispersal of (239)Pu. The estimation model is based on a non-linear least squares regression scheme coupled with the ARAC three-dimensional atmospheric dispersion models. The viability of the approach is evaluated by estimation of ADPIC model input parameters such as the particle mean aerodynamic diameter, the geometric standard deviation, and the largest size. Additionally, we estimate an optimal 'coupling coefficient' between the particles and an explosive cloud rise model. The experimental data are taken from the Clean Slate 1 field experiment conducted during 1963 at the Tonopah Test Range in Nevada. The regression technique optimizes the agreement between the measured and model-predicted concentrations of (239)Pu by varying the model input parameters within their respective ranges of uncertainty. The technique generally estimated the measured concentrations within a factor of 1.5, with the worst estimate being within a factor of 5, very good in view of the complexity of the concentration measurements, the uncertainties associated with the meteorological data, and the limitations of the models. The best fit also suggests a smaller mean diameter and a smaller geometric standard deviation of the particle size, as well as a slightly weaker particle-to-cloud coupling, than previously reported.
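The regression step, varying model input parameters to match measured concentrations, can be sketched as nonlinear least squares by Gauss-Newton on a toy forward model (the actual ARAC/ADPIC dispersion codes are far more complex; all data here are invented):

```python
import numpy as np

rng = np.random.default_rng(8)

# Toy forward "dispersion" model: concentration at distance r depends on
# two unknown source parameters, strength q and spread s.
def model(theta, r):
    q, s = theta
    return q * np.exp(-(r / s) ** 2)

r_obs = np.linspace(0.5, 4.0, 12)
theta_true = np.array([5.0, 1.5])
c_obs = model(theta_true, r_obs) * (1 + 0.02 * rng.normal(size=12))

theta = np.array([3.0, 1.2])                 # initial guess
for _ in range(20):                          # Gauss-Newton iterations
    res = c_obs - model(theta, r_obs)
    # Finite-difference Jacobian of the model w.r.t. the parameters.
    J = np.empty((len(r_obs), 2))
    for j in range(2):
        dt = np.zeros(2); dt[j] = 1e-6
        J[:, j] = (model(theta + dt, r_obs) - model(theta, r_obs)) / 1e-6
    theta = theta + np.linalg.lstsq(J, res, rcond=None)[0]

print(theta)   # close to the true (5.0, 1.5)
```

Bounding each parameter to its physical range of uncertainty, as the paper does, turns this into a constrained least-squares problem but leaves the basic iteration unchanged.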

  15. Prediction of soil organic carbon with different parent materials development using visible-near infrared spectroscopy.

    PubMed

    Liu, Jinbao; Han, Jichang; Zhang, Yang; Wang, Huanyuan; Kong, Hui; Shi, Lei

    2018-06-05

    The storage of soil organic carbon (SOC) improves soil fertility. Conventional determination of SOC is expensive and tedious. Visible-near infrared reflectance spectroscopy is a practical and cost-effective approach that has been successfully used to estimate SOC concentration, and a soil spectral inversion model can determine SOC content quickly and efficiently. This paper presents a study of SOC estimation through the combination of soil spectroscopy with stepwise multiple linear regression (SMLR), partial least squares regression (PLSR), and principal component regression (PCR). Spectral measurements for 106 soil samples were acquired using an ASD FieldSpec 4 standard-res spectroradiometer (350-2500 nm). Six types of spectral transformations and the three regression methods were applied to build models for soils developed from different parent materials. The results show that (1) for soil developed from basaltic volcanic clastics, the SOC spectral response bands are located at 500 nm and 800 nm, whereas for soil developed from trachytic volcanic clastics they are located at 405 nm, 465 nm, 575 nm, and 1105 nm. (2) For soil developed from basaltic volcanic debris, the maximum correlation coefficient of the first derivative is 0.8898; for soil developed from trachytic volcanic debris, the maximum correlation coefficient of the first derivative of the logarithm of reciprocal reflectance is 0.9029. (3) The optimal prediction model for the SOC content of soil developed from basaltic volcanic clastics is based on SMLR applied to the first derivative of the inverse logarithm of spectral reflectance, with 7 independent variables, Rv² = 0.9720, RMSEP = 2.0590, and sig = 0.003. The optimal prediction model for soil developed from trachytic volcanic clastics is based on PLSR applied to the first derivative of the inverse logarithm of spectral reflectance, with Pc = 5 independent variables, Rc = 0.9872, Rc² = 0.9745, RMSEC = 0.4821, SEC = 0.4906, prediction determination coefficient Rv² = 0.9702, RMSEP = 0.9563, SEP = 0.9711, and Bias = 0.0637. Copyright © 2018 Elsevier B.V. All rights reserved.
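    The pre-treatment-plus-regression pipeline described above is easy to sketch. The sketch below is a toy illustration, not the paper's data or band selection: it applies a log(1/R) transform and a first-difference derivative, then regresses on a single transformed band with ordinary least squares (the study's SMLR would select several such bands stepwise).

```python
import math

def log_inverse_reflectance(reflectance):
    # log(1/R) transform, one of the spectral pre-treatments named above
    return [math.log10(1.0 / r) for r in reflectance]

def first_derivative(values, wavelengths):
    # simple first-difference approximation of the spectral derivative
    return [(values[i + 1] - values[i]) / (wavelengths[i + 1] - wavelengths[i])
            for i in range(len(values) - 1)]

def ols(x, y):
    # ordinary least squares for y = a + b*x (one selected band vs. SOC)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b
```

    In the study, several such transformed bands would enter the model; one band is enough to show how transformed reflectance is regressed against a soil property.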

  16. On-Board Real-Time Optimization Control for Turbo-Fan Engine Life Extending

    NASA Astrophysics Data System (ADS)

    Zheng, Qiangang; Zhang, Haibo; Miao, Lizhen; Sun, Fengyong

    2017-11-01

    A real-time optimization control method is proposed to extend turbo-fan engine service life. The real-time optimization control is based on an on-board engine model devised with MRR-LSSVR (a multi-input multi-output recursive reduced least squares support vector regression method). To solve the optimization problem, a FSQP (feasible sequential quadratic programming) algorithm is utilized. Thermal mechanical fatigue is taken into account during the optimization process, and, to describe engine life decay, a thermal mechanical fatigue model of the engine acceleration process is established. The optimization objective function not only contains a sub-item that yields fast engine response, but also includes a sub-item for the total mechanical strain range, which is positively related to engine fatigue life. Finally, simulations of both the conventional optimization control, which considers only engine acceleration performance, and the proposed optimization method have been conducted. The simulations demonstrate that the time from idle to 99.5% of maximum power is equal for the two control methods. However, the engine life using the proposed optimization method is increased by 36.17% compared with that using the conventional optimization control.

  17. The influence of changes in land use and landscape patterns on soil erosion in a watershed.

    PubMed

    Zhang, Shanghong; Fan, Weiwei; Li, Yueqiang; Yi, Yujun

    2017-01-01

    It is very important to have a good understanding of the relationship between soil erosion and landscape patterns so that soil and water conservation in river basins can be optimized. In this study, this relationship was explored using the Liusha River Watershed, China, as a case study. A distributed water and sediment model based on the Soil and Water Assessment Tool (SWAT) was developed to simulate soil erosion from different land use types in each sub-basin of the Liusha River Watershed. Observed runoff and sediment data from 1985 to 2005 and land use maps from 1986, 1995, and 2000 were used to calibrate and validate the model. The erosion modulus for each sub-basin was calculated from the SWAT model results using the different land use maps, and 12 landscape indices were chosen and calculated to describe the land use in each sub-basin for the different years. The variations in, rather than the absolute amounts of, the erosion modulus and the landscape indices for each sub-basin were used as the dependent and independent variables, respectively, in regression equations derived by multiple linear regression. The results indicated that the variations in the erosion modulus were closely related to changes in the largest patch index, patch cohesion index, modified Simpson's evenness index, and the aggregation index. From the regression equation and the corresponding landscape indices, it was found that watershed erosion can be reduced by decreasing the physical connectivity between patches, improving the evenness of the landscape patch types, enriching landscape types, and enhancing the degree of aggregation between the landscape patches. These findings will be useful for water and soil conservation and for optimizing the management of watershed landscapes. Copyright © 2016 Elsevier B.V. All rights reserved.

  18. A fast identification algorithm for Box-Cox transformation based radial basis function neural network.

    PubMed

    Hong, Xia

    2006-07-01

    In this letter, a Box-Cox transformation-based radial basis function (RBF) neural network is introduced, using the RBF neural network to represent the transformed system output. Initially a fixed and moderately sized RBF model base is derived based on a rank-revealing orthogonal matrix triangularization (QR decomposition). Then a new fast identification algorithm is introduced that uses the Gauss-Newton algorithm to derive the required Box-Cox transformation, based on a maximum likelihood estimator. The main contribution of this letter is to exploit the special structure of the proposed RBF neural network for computational efficiency by utilizing the block matrix inversion lemma. Finally, the Box-Cox transformation-based RBF neural network, with good generalization and sparsity, is identified based on the derived optimal Box-Cox transformation and a D-optimality-based orthogonal forward regression algorithm. The proposed algorithm and its efficacy are demonstrated with an illustrative example in comparison with support vector machine regression.
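    The Box-Cox transformation at the heart of this letter can be illustrated with a plain maximum-likelihood grid search. The letter itself embeds a Gauss-Newton iteration inside an RBF model; the standalone sketch below only shows how the profile log-likelihood selects the transformation parameter λ on toy data.

```python
import math

def boxcox(y, lam):
    # Box-Cox transform; the lam -> 0 limit is the natural logarithm
    if abs(lam) < 1e-12:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]

def profile_loglik(y, lam):
    # profile log-likelihood of lam under a normality assumption,
    # including the Jacobian term (lam - 1) * sum(log y)
    z = boxcox(y, lam)
    n = len(z)
    mu = sum(z) / n
    var = sum((v - mu) ** 2 for v in z) / n
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in y)

def best_lambda(y, grid):
    # grid-search maximum likelihood estimate of the transformation parameter
    return max(grid, key=lambda lam: profile_loglik(y, lam))
```

    For geometrically growing data (lognormal-like), the likelihood favours λ near 0, i.e. a log transform, which is the kind of decision the letter's Gauss-Newton step automates.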

  19. Statistical optimization of process parameters for the production of tannase by Aspergillus flavus under submerged fermentation.

    PubMed

    Mohan, S K; Viruthagiri, T; Arunkumar, C

    2014-04-01

    Production of tannase by Aspergillus flavus (MTCC 3783) using tamarind seed powder as substrate was studied in submerged fermentation. A Plackett-Burman design was applied for the screening of 12 medium nutrients; from the results, the significant nutrients were identified as tannic acid, magnesium sulfate, ferrous sulfate and ammonium sulfate. Further optimization of the process parameters was carried out using response surface methodology (RSM), which was applied to design experiments for evaluating the interactive effects through a full 3⁴ factorial design. The optimum conditions were tannic acid concentration, 3.22%; fermentation period, 96 h; temperature, 35.1 °C; and pH 5.4. The high value of the coefficient of determination (R² = 0.9638) indicates that the second-order polynomial regression model described the experimental data well. The RSM revealed that a maximum tannase production of 139.3 U/ml was obtained at the optimum conditions.
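    The second-order polynomial fitting behind RSM can be sketched for a single factor. The data below are invented, not the fermentation measurements; the fit uses the normal equations and then locates the stationary point, which is how RSM identifies optimum conditions.

```python
def solve(A, rhs):
    # Gaussian elimination with partial pivoting for a small linear system
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

def fit_quadratic(xs, ys):
    # least-squares fit of y = b0 + b1*x + b2*x^2 via the normal equations
    X = [[1.0, x, x * x] for x in xs]
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
    Xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(3)]
    return solve(XtX, Xty)

def stationary_point(b):
    # the factor level at which the fitted quadratic response is optimal
    return -b[1] / (2.0 * b[2])
```

    A real RSM model would include all factors plus their interactions; the single-factor case already shows how the fitted surface yields optimum settings.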

  20. Model inversion via multi-fidelity Bayesian optimization: a new paradigm for parameter estimation in haemodynamics, and beyond.

    PubMed

    Perdikaris, Paris; Karniadakis, George Em

    2016-05-01

    We present a computational framework for model inversion based on multi-fidelity information fusion and Bayesian optimization. The proposed methodology targets the accurate construction of response surfaces in parameter space, and the efficient pursuit to identify global optima while keeping the number of expensive function evaluations at a minimum. We train families of correlated surrogates on available data using Gaussian processes and auto-regressive stochastic schemes, and exploit the resulting predictive posterior distributions within a Bayesian optimization setting. This enables a smart adaptive sampling procedure that uses the predictive posterior variance to balance the exploration versus exploitation trade-off, and is a key enabler for practical computations under limited budgets. The effectiveness of the proposed framework is tested on three parameter estimation problems. The first two involve the calibration of outflow boundary conditions of blood flow simulations in arterial bifurcations using multi-fidelity realizations of one- and three-dimensional models, whereas the last one aims to identify the forcing term that generated a particular solution to an elliptic partial differential equation. © 2016 The Author(s).
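    A minimal version of the machinery in this abstract, a Gaussian-process posterior and a variance-driven sampling rule, can be sketched in one dimension. This is a pure-exploration simplification of the exploration/exploitation balance the authors describe, on toy data with an assumed RBF kernel.

```python
import math

def rbf(a, b, length=1.0):
    # squared-exponential (RBF) covariance between two scalar inputs
    return math.exp(-((a - b) ** 2) / (2.0 * length ** 2))

def solve(A, rhs):
    # Gaussian elimination with partial pivoting
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

def gp_posterior(xs, ys, xq, jitter=1e-9):
    # posterior mean and variance of a zero-mean GP at query point xq
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) + (jitter if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    kstar = [rbf(x, xq) for x in xs]
    alpha = solve(K, ys)
    v = solve(K, kstar)
    mean = sum(k * a for k, a in zip(kstar, alpha))
    var = rbf(xq, xq) - sum(k * vi for k, vi in zip(kstar, v))
    return mean, var

def next_sample(xs, ys, candidates):
    # pure-exploration acquisition: pick the candidate with highest posterior variance
    return max(candidates, key=lambda c: gp_posterior(xs, ys, c)[1])
```

    Practical Bayesian optimization would trade the posterior mean against the variance (e.g. expected improvement); picking the highest-variance candidate shows the adaptive-sampling idea in its simplest form.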

  1. Model inversion via multi-fidelity Bayesian optimization: a new paradigm for parameter estimation in haemodynamics, and beyond

    PubMed Central

    Perdikaris, Paris; Karniadakis, George Em

    2016-01-01

    We present a computational framework for model inversion based on multi-fidelity information fusion and Bayesian optimization. The proposed methodology targets the accurate construction of response surfaces in parameter space, and the efficient pursuit to identify global optima while keeping the number of expensive function evaluations at a minimum. We train families of correlated surrogates on available data using Gaussian processes and auto-regressive stochastic schemes, and exploit the resulting predictive posterior distributions within a Bayesian optimization setting. This enables a smart adaptive sampling procedure that uses the predictive posterior variance to balance the exploration versus exploitation trade-off, and is a key enabler for practical computations under limited budgets. The effectiveness of the proposed framework is tested on three parameter estimation problems. The first two involve the calibration of outflow boundary conditions of blood flow simulations in arterial bifurcations using multi-fidelity realizations of one- and three-dimensional models, whereas the last one aims to identify the forcing term that generated a particular solution to an elliptic partial differential equation. PMID:27194481

  2. ELASTIC NET FOR COX’S PROPORTIONAL HAZARDS MODEL WITH A SOLUTION PATH ALGORITHM

    PubMed Central

    Wu, Yichao

    2012-01-01

    For least squares regression, Efron et al. (2004) proposed an efficient solution path algorithm, the least angle regression (LAR). They showed that a slight modification of the LAR leads to the whole LASSO solution path. Both the LAR and LASSO solution paths are piecewise linear. Recently Wu (2011) extended the LAR to generalized linear models and the quasi-likelihood method. In this work we extend the LAR further to handle Cox’s proportional hazards model. The goal is to develop a solution path algorithm for the elastic net penalty (Zou and Hastie (2005)) in Cox’s proportional hazards model. This goal is achieved in two steps. First we extend the LAR to optimizing the log partial likelihood plus a fixed small ridge term. Then we define a path modification, which leads to the solution path of the elastic net regularized log partial likelihood. Our solution path is exact and piecewise determined by ordinary differential equation systems. PMID:23226932
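    The paper's algorithm is a LAR-style path for Cox's partial likelihood, which is beyond a short sketch; the elastic net penalty itself, however, can be made concrete with coordinate descent on ordinary least squares. The soft-thresholding (lasso) and denominator inflation (ridge) below are the two components of the penalty of Zou and Hastie; this is a common alternative solver, not the paper's path algorithm.

```python
def soft(z, g):
    # soft-thresholding operator arising from the l1 part of the penalty
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0

def elastic_net(X, y, lam, alpha, sweeps=500):
    # coordinate descent for (1/2n)||y - Xb||^2 + lam*(alpha*|b|_1 + (1-alpha)/2*|b|_2^2)
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # partial residual: add feature j's current contribution back in
            rho = sum(X[i][j] * (y[i] - sum(X[i][k] * b[k] for k in range(p))
                                 + X[i][j] * b[j]) for i in range(n)) / n
            zj = sum(X[i][j] ** 2 for i in range(n)) / n
            b[j] = soft(rho, lam * alpha) / (zj + lam * (1.0 - alpha))
    return b
```

    With lam = 0 this reduces to ordinary least squares; increasing lam shrinks coefficients, and the alpha parameter mixes the sparsity-inducing l1 and stabilizing l2 terms.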

  3. Modeling and optimization by particle swarm embedded neural network for adsorption of zinc (II) by palm kernel shell based activated carbon from aqueous environment.

    PubMed

    Karri, Rama Rao; Sahu, J N

    2018-01-15

    Zn(II) is one of the common pollutants among heavy metals found in industrial effluents. Removal of pollutants from industrial effluents can be accomplished by various techniques, of which adsorption was found to be an efficient method, although its application is limited by the high cost of adsorbents. In this regard, a low-cost adsorbent produced from palm oil kernel shell based agricultural waste is examined for its efficiency in removing Zn(II) from waste water and aqueous solution. The influence of independent process variables such as initial concentration, pH, residence time, activated carbon (AC) dosage and process temperature on the removal of Zn(II) by palm kernel shell based AC in a batch adsorption process is studied systematically. Based on the design-of-experiments matrix, 50 experimental runs are performed with each process variable within the experimental range. The optimal values of the process variables to achieve maximum removal efficiency are studied using response surface methodology (RSM) and artificial neural network (ANN) approaches. A quadratic model, consisting of first-order and second-order regression terms, is developed using analysis of variance within the RSM central composite design (CCD) framework. Particle swarm optimization, a meta-heuristic method, is embedded in the ANN architecture to optimize the search space of the neural network. The optimized trained neural network depicts the testing data and validation data well, with R² equal to 0.9106 and 0.9279 respectively. The outcomes indicate the superiority of the ANN-PSO model predictions over the quadratic model predictions provided by RSM. Copyright © 2017 Elsevier Ltd. All rights reserved.
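    Coupling PSO to full ANN training is involved; the PSO core used for such searches can be sketched on a toy quadratic objective (all parameter values below are common defaults, not those of the paper).

```python
import random

def pso(f, dim, bounds, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=1):
    # minimal particle swarm optimization minimizing f over a box domain
    rng = random.Random(seed)
    lo, hi = bounds
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in xs]           # each particle's best position
    pbest_f = [f(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]   # swarm-wide best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                # inertia + cognitive pull (own best) + social pull (swarm best)
                vs[i][d] = (w * vs[i][d]
                            + c1 * rng.random() * (pbest[i][d] - xs[i][d])
                            + c2 * rng.random() * (gbest[d] - xs[i][d]))
                xs[i][d] = min(hi, max(lo, xs[i][d] + vs[i][d]))
            fx = f(xs[i])
            if fx < pbest_f[i]:
                pbest[i], pbest_f[i] = xs[i][:], fx
                if fx < gbest_f:
                    gbest, gbest_f = xs[i][:], fx
    return gbest, gbest_f
```

    In an ANN-PSO hybrid like the one described, f would be the network's prediction error and each particle a candidate weight vector; the swarm update itself is unchanged.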

  4. The use of geographic information system and 1860s cadastral data to model agricultural suitability before heavy mechanization. A case study from Malta.

    PubMed

    Alberti, Gianmarco; Grima, Reuben; Vella, Nicholas C

    2018-01-01

    The present study seeks to understand the determinants of agricultural land suitability in Malta before heavy mechanization. A GIS-based Logistic Regression model is built on the basis of data from mid-1800s cadastral maps (cabreo). This is the first time that such data are being used for the purpose of building a predictive model. The maps record the agricultural quality of parcels (ranging from good to lowest), which is represented by different colours. The study treats the agricultural quality as a dependent variable with two levels: optimal (corresponding to the good class) vs. non-optimal quality (mediocre, bad, low, and lowest classes). Seventeen predictors are isolated on the basis of a literature review and data availability. Logistic Regression is used to isolate the predictors that can be considered determinants of agricultural quality. Our model has an optimal discriminatory power (AUC: 0.92). The positive effect on land agricultural quality of the following predictors is considered and discussed: sine of the aspect (odds ratio 1.42), coast distance (2.46), Brown Rendzinas (2.31), Carbonate Raw (2.62) and Xerorendzinas (9.23) soils, distance to minor roads (4.88). Predictors found to have a negative effect are: terrain elevation (0.96), slope (0.97), distance to the nearest geological fault lines (0.09), Terra Rossa soil (0.46), distance to secondary roads (0.19) and footpaths (0.41). The model isolates a host of topographic and cultural variables, the latter related to human mobility and landscape accessibility, which differentially contributed to the agricultural suitability, providing the basis for the creation of the fragmented and extremely variegated agricultural landscape that is the hallmark of the Maltese Islands. Our findings are also useful to suggest new questions that may be posed to the more meagre evidence from earlier periods.
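    The odds ratios reported above are simply exponentiated logistic-regression coefficients. A minimal single-predictor sketch on invented data (not the cadastral parcels) shows the relationship.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, lr=0.1, iters=5000):
    # batch gradient descent on the negative log-likelihood,
    # one predictor plus an intercept
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(iters):
        g0 = sum(sigmoid(b0 + b1 * xi) - yi for xi, yi in zip(x, y)) / n
        g1 = sum((sigmoid(b0 + b1 * xi) - yi) * xi for xi, yi in zip(x, y)) / n
        b0 -= lr * g0
        b1 -= lr * g1
    return b0, b1

def odds_ratio(b):
    # a one-unit increase in the predictor multiplies the odds by exp(b)
    return math.exp(b)
```

    For a coefficient b, an odds ratio exp(b) > 1 indicates a positive effect on the odds of the optimal class (as with the 1.42 reported for sine of the aspect), and exp(b) < 1 a negative effect.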

  5. The use of geographic information system and 1860s cadastral data to model agricultural suitability before heavy mechanization. A case study from Malta

    PubMed Central

    Grima, Reuben; Vella, Nicholas C.

    2018-01-01

    The present study seeks to understand the determinants of agricultural land suitability in Malta before heavy mechanization. A GIS-based Logistic Regression model is built on the basis of data from mid-1800s cadastral maps (cabreo). This is the first time that such data are being used for the purpose of building a predictive model. The maps record the agricultural quality of parcels (ranging from good to lowest), which is represented by different colours. The study treats the agricultural quality as a dependent variable with two levels: optimal (corresponding to the good class) vs. non-optimal quality (mediocre, bad, low, and lowest classes). Seventeen predictors are isolated on the basis of a literature review and data availability. Logistic Regression is used to isolate the predictors that can be considered determinants of agricultural quality. Our model has an optimal discriminatory power (AUC: 0.92). The positive effect on land agricultural quality of the following predictors is considered and discussed: sine of the aspect (odds ratio 1.42), coast distance (2.46), Brown Rendzinas (2.31), Carbonate Raw (2.62) and Xerorendzinas (9.23) soils, distance to minor roads (4.88). Predictors found to have a negative effect are: terrain elevation (0.96), slope (0.97), distance to the nearest geological fault lines (0.09), Terra Rossa soil (0.46), distance to secondary roads (0.19) and footpaths (0.41). The model isolates a host of topographic and cultural variables, the latter related to human mobility and landscape accessibility, which differentially contributed to the agricultural suitability, providing the basis for the creation of the fragmented and extremely variegated agricultural landscape that is the hallmark of the Maltese Islands. Our findings are also useful to suggest new questions that may be posed to the more meagre evidence from earlier periods. PMID:29415059

  6. Chance-constrained multi-objective optimization of groundwater remediation design at DNAPLs-contaminated sites using a multi-algorithm genetically adaptive method.

    PubMed

    Ouyang, Qi; Lu, Wenxi; Hou, Zeyu; Zhang, Yu; Li, Shuai; Luo, Jiannan

    2017-05-01

    In this paper, a multi-algorithm genetically adaptive multi-objective (AMALGAM) method is proposed as a multi-objective optimization solver. It was implemented in the multi-objective optimization of a groundwater remediation design at sites contaminated by dense non-aqueous phase liquids. In this study, there were two objectives: minimization of the total remediation cost, and minimization of the remediation time. A non-dominated sorting genetic algorithm II (NSGA-II) was adopted for comparison with the proposed method. For efficiency, the time-consuming surfactant-enhanced aquifer remediation simulation model was replaced by a surrogate model constructed by a multi-gene genetic programming (MGGP) technique. Similarly, two other surrogate modeling methods, support vector regression (SVR) and Kriging (KRG), were employed to make comparisons with MGGP. In addition, the surrogate-modeling uncertainty was incorporated in the optimization model by chance-constrained programming (CCP). The results showed that, for the problem considered in this study, (1) the solutions obtained by AMALGAM incurred less remediation cost and required less time than those of NSGA-II, indicating that AMALGAM outperformed NSGA-II. It was additionally shown that (2) the MGGP surrogate model was more accurate than SVR and KRG; and (3) the remediation cost and time increased with the confidence level, which can enable decision makers to make a suitable choice by considering the given budget, remediation time, and reliability. Copyright © 2017 Elsevier B.V. All rights reserved.

  7. Framework for Smart Electronic Health Record-Linked Predictive Models to Optimize Care for Complex Digestive Diseases

    DTIC Science & Technology

    2012-06-01

    indices was independent of the presence or absence of hepatic steatosis on abdominal imaging. Key Research Accomplishments Abstracts presented at...transplant; 3) hepatic encephalopathy; and 4) hepatocellular carcinoma. Logistic regression analysis confirmed that the clinical predictive value of the...Gumus S, Saul,MI Bae KT. Noninvasive Hepatic Fibrosis Scores Predict Liver-Related Outcomes in Diabetic Patients [abstract]. Gastroenterology. 2012

  8. A New Mathematical Framework for Design Under Uncertainty

    DTIC Science & Technology

    2016-05-05

    blending multiple information sources via auto-regressive stochastic modeling. A computationally efficient machine learning framework is developed based on...sion and machine learning approaches; see Fig. 1. This will lead to a comprehensive description of system performance with less uncertainty than in the...Bayesian optimization of super-cavitating hy- drofoils The goal of this study is to demonstrate the capabilities of statistical learning and

  9. Design and statistical optimization of glipizide loaded lipospheres using response surface methodology.

    PubMed

    Shivakumar, Hagalavadi Nanjappa; Patel, Pragnesh Bharat; Desai, Bapusaheb Gangadhar; Ashok, Purnima; Arulmozhi, Sinnathambi

    2007-09-01

    A 3² factorial design was employed to produce glipizide lipospheres by the emulsification phase separation technique using paraffin wax and stearic acid as retardants. The effect of critical formulation variables, namely the level of paraffin wax (X1) and the proportion of stearic acid in the wax (X2), on geometric mean diameter (dg), percent encapsulation efficiency (% EE), release at the end of 12 h (rel12) and time taken for 50% of drug release (t50) was evaluated using the F-test. Mathematical models containing only the significant terms were generated for each response parameter using multiple linear regression analysis (MLRA) and analysis of variance (ANOVA). Both formulation variables studied exerted a significant influence (p < 0.05) on the response parameters. Numerical optimization using the desirability approach was employed to develop an optimized formulation by setting constraints on the dependent and independent variables. The experimental values of dg, % EE, rel12 and t50 for the optimized formulation were found to be 57.54 ± 1.38 μm, 86.28 ± 1.32%, 77.23 ± 2.78% and 5.60 ± 0.32 h, respectively, which were in close agreement with those predicted by the mathematical models. The drug release from lipospheres followed first-order kinetics and was characterized by the Higuchi diffusion model. The optimized liposphere formulation developed was found to produce sustained anti-diabetic activity following oral administration in rats.

  10. Improved helicopter aeromechanical stability analysis using segmented constrained layer damping and hybrid optimization

    NASA Astrophysics Data System (ADS)

    Liu, Qiang; Chattopadhyay, Aditi

    2000-06-01

    Aeromechanical stability plays a critical role in helicopter design, and lead-lag damping is crucial to this design. In this paper, the use of segmented constrained layer (SCL) damping treatment and composite tailoring is investigated for improved rotor aeromechanical stability using a formal optimization technique. The principal load-carrying member in the rotor blade is represented by a composite box beam, of arbitrary thickness, with surface-bonded SCLs. A comprehensive theory is used to model the smart box beam. A ground resonance analysis model and an air resonance analysis model are implemented for the rotor blade built around the composite box beam with SCLs. The Pitt-Peters dynamic inflow model is used in the air resonance analysis under hover conditions. A hybrid optimization technique is used to investigate the optimum design of the composite box beam with surface-bonded SCLs for improved damping characteristics. Parameters such as the stacking sequence of the composite laminates and the placement of the SCLs are used as design variables. Detailed numerical studies are presented for the aeromechanical stability analysis. It is shown that the optimum blade design yields a significant increase in rotor lead-lag regressive modal damping compared to the initial system.

  11. Hyperspectral Imaging for Predicting the Internal Quality of Kiwifruits Based on Variable Selection Algorithms and Chemometric Models.

    PubMed

    Zhu, Hongyan; Chu, Bingquan; Fan, Yangyang; Tao, Xiaoya; Yin, Wenxin; He, Yong

    2017-08-10

    We investigated the feasibility and potentiality of determining firmness, soluble solids content (SSC), and pH in kiwifruits using hyperspectral imaging, combined with variable selection methods and calibration models. The images were acquired by a push-broom hyperspectral reflectance imaging system covering two spectral ranges. Weighted regression coefficients (BW), the successive projections algorithm (SPA) and genetic algorithm-partial least squares (GAPLS) were compared and evaluated for the selection of effective wavelengths. Moreover, multiple linear regression (MLR), partial least squares regression and least squares support vector machine (LS-SVM) were developed to predict quality attributes quantitatively using effective wavelengths. The established models, particularly SPA-MLR, SPA-LS-SVM and GAPLS-LS-SVM, performed well. The SPA-MLR models for firmness (Rpre = 0.9812, RPD = 5.17) and SSC (Rpre = 0.9523, RPD = 3.26) at 380-1023 nm showed excellent performance, whereas GAPLS-LS-SVM was the optimal model at 874-1734 nm for predicting pH (Rpre = 0.9070, RPD = 2.60). Image processing algorithms were developed to apply the predictive model to every pixel to generate prediction maps that visualize the spatial distribution of firmness and SSC. Hence, the results clearly demonstrated that hyperspectral imaging has potential as a fast and non-invasive method to predict the quality attributes of kiwifruits.

  12. Applying complex models to poultry production in the future--economics and biology.

    PubMed

    Talpaz, H; Cohen, M; Fancher, B; Halley, J

    2013-09-01

    The ability to determine the optimal broiler feed nutrient density that maximizes margin over feeding cost (MOFC) has obvious economic value. To determine optimal feed nutrient density, one must consider ingredient prices, meat values, the product mix being marketed, and the projected biological performance. A series of 8 feeding trials was conducted to estimate biological responses to changes in ME and amino acid (AA) density. Eight different genotypes of sex-separate reared broilers were fed diets varying in ME (2,723-3,386 kcal of ME/kg) and AA (0.89-1.65% digestible lysine, with all essential AA indexed to lysine) levels. Broilers were processed to determine carcass component yield at many different BW (1.09-4.70 kg). The trial data generated were used in a model constructed to discover the dietary levels of ME and AA that maximize MOFC on a per-broiler or per-broiler annualized basis (bird × number of cycles/year). The model was designed to estimate the effects of dietary nutrient concentration on broiler live weight, feed conversion, mortality, and carcass component yield. Estimated coefficients from the step-wise regression process are subsequently used to predict the optimal ME and AA concentrations that maximize MOFC. The effects of changing feed or meat prices across a wide spectrum on optimal ME and AA levels can be evaluated via parametric analysis. The model can rapidly compare both the biological and economic implications of changing from current practice to the simulated optimal solution. The model can be exploited to enhance decision making under volatile market conditions.

  13. A new approach to synthesis of benzyl cinnamate: Optimization by response surface methodology.

    PubMed

    Zhang, Dong-Hao; Zhang, Jiang-Yan; Che, Wen-Cai; Wang, Yun

    2016-09-01

    In this work, a new approach to the synthesis of benzyl cinnamate by enzymatic esterification of cinnamic acid with benzyl alcohol is optimized by response surface methodology. The effects of various reaction conditions, including temperature, enzyme loading, substrate molar ratio of benzyl alcohol to cinnamic acid, and reaction time, are investigated. A 5-level-4-factor central composite design is employed to search for the optimal yield of benzyl cinnamate. A quadratic polynomial regression model is used to analyze the experimental data at a 95% confidence level (P<0.05). The coefficient of determination of this model is found to be 0.9851. Three sets of optimum reaction conditions are established, and verified experimental trials are performed to validate the optimum points. Under the optimum conditions (40°C, 31 mg/mL enzyme loading, 2.6:1 molar ratio, 27 h), the yield reaches 97.7%, which provides an efficient process for the industrial production of benzyl cinnamate. Copyright © 2016 Elsevier Ltd. All rights reserved.

  14. Multi-criteria optimization for ultrasonic-assisted extraction of antioxidants from Pericarpium Citri Reticulatae using response surface methodology, an activity-based approach.

    PubMed

    Zeng, Shanshan; Wang, Lu; Zhang, Lei; Qu, Haibin; Gong, Xingchu

    2013-06-01

    An activity-based approach to optimize the ultrasonic-assisted extraction of antioxidants from Pericarpium Citri Reticulatae (Chenpi in Chinese) was developed. Response surface optimization based on a quantitative composition-activity relationship model showed the relationships among product chemical composition, antioxidant activity of the extract, and parameters of the extraction process. Three parameters of ultrasonic-assisted extraction, including the ethanol/water ratio, Chenpi amount, and alkaline amount, were investigated to give optimum extraction conditions for antioxidants of Chenpi: ethanol/water 70:30 v/v, Chenpi amount of 10 g, and alkaline amount of 28 mg. The experimental antioxidant yield under the optimum conditions was found to be 196.5 mg/g Chenpi, and the antioxidant activity was 2023.8 μmol Trolox equivalents/g of the Chenpi powder. The results agreed well with the second-order polynomial regression model. The presented approach shows great application potential in both the food and pharmaceutical industries. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  15. Quantum algorithm for linear regression

    NASA Astrophysics Data System (ADS)

    Wang, Guoming

    2017-07-01

    We present a quantum algorithm for fitting a linear regression model to a given data set using the least-squares approach. Unlike previous algorithms, which yield a quantum state encoding the optimal parameters, our algorithm outputs these numbers in classical form. So by running it once, one completely determines the fitted model and can then use it to make predictions on new data at little cost. Moreover, our algorithm works in the standard oracle model, and can handle data sets with nonsparse design matrices. It runs in time poly(log₂(N), d, κ, 1/ε), where N is the size of the data set, d is the number of adjustable parameters, κ is the condition number of the design matrix, and ε is the desired precision in the output. We also show that the polynomial dependence on d and κ is necessary; thus, our algorithm cannot be significantly improved. Furthermore, we also give a quantum algorithm that estimates the quality of the least-squares fit (without computing its parameters explicitly). This algorithm runs faster than the one for finding the fit, and can be used to check whether the given data set qualifies for linear regression in the first place.

  16. Spatial analysis of relative humidity during ungauged periods in a mountainous region

    NASA Astrophysics Data System (ADS)

    Um, Myoung-Jin; Kim, Yeonjoo

    2017-08-01

    Although atmospheric humidity influences environmental and agricultural conditions, thereby influencing plant growth, human health, and air pollution, efforts to develop spatial maps of atmospheric humidity using statistical approaches have thus far been limited. This study therefore aims to develop statistical approaches for inferring the spatial distribution of relative humidity (RH) for a mountainous island, for which data are not uniformly available across the region. A multiple regression analysis based on various mathematical models was used to identify the optimal model for estimating monthly RH by incorporating not only temperature but also location and elevation. Based on the regression analysis, we extended the monthly RH data from weather stations to cover the ungauged periods when no RH observations were available. Then, two different types of station-based data, the observational data and the data extended via the regression model, were used to form grid-based data with a resolution of 100 m. The grid-based data that used the extended station-based data captured the increasing RH trend along an elevation gradient. Furthermore, annual RH values averaged over the regions were examined. Decreasing temporal trends were found in most cases, with magnitudes varying based on the season and region.

  17. Influence of soil pH on the sorption of ionizable chemicals: modeling advances.

    PubMed

    Franco, Antonio; Fu, Wenjing; Trapp, Stefan

    2009-03-01

    The soil-water distribution coefficient of ionizable chemicals (K(d)) depends on the soil acidity, mainly because the pH governs speciation. Using pH-specific K(d) values normalized to organic carbon (K(OC)) from the literature, a method was developed to estimate the K(OC) of monovalent organic acids and bases. The regression considers pH-dependent speciation and species-specific partition coefficients, calculated from the dissociation constant (pK(a)) and the octanol-water partition coefficient of the neutral molecule (log P(n)). Probably because of the lower pH near the organic colloid-water interface, the optimal pH to model dissociation was lower than the bulk soil pH. The knowledge of the soil pH allows calculation of the fractions of neutral and ionic molecules in the system, thus improving the existing regression for acids. The same approach was not successful with bases, for which the impact of pH on the total sorption is contrasting. In fact, the shortcomings of the model assumptions affect the predictive power for acids and for bases differently. We evaluated accuracy and limitations of the regressions for their use in the environmental fate assessment of ionizable chemicals.
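    The speciation-weighting idea behind this regression can be made concrete for a monovalent acid. The paper's method is more elaborate (species-specific K(OC) values estimated from pK(a) and log P(n), with a pH correction near the colloid surface); the sketch below, with hypothetical sorption values, only shows how the pH-dependent neutral fraction weights the two species.

```python
def neutral_fraction_acid(pH, pKa):
    # Henderson-Hasselbalch speciation: fraction of a monovalent
    # acid present in the neutral (undissociated) form
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))

def koc_effective(pH, pKa, koc_neutral, koc_ionic):
    # speciation-weighted sorption coefficient for a monovalent organic acid;
    # koc_neutral and koc_ionic are hypothetical species-specific values
    fn = neutral_fraction_acid(pH, pKa)
    return fn * koc_neutral + (1.0 - fn) * koc_ionic
```

    Because the neutral species typically sorbs more strongly, the effective coefficient falls as pH rises above pK(a), which is the behaviour the regression captures for acids.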

  18. A stepwise model to predict monthly streamflow

    NASA Astrophysics Data System (ADS)

    Mahmood Al-Juboori, Anas; Guven, Aytac

    2016-12-01

    In this study, a stepwise model empowered with genetic programming is developed to predict the monthly flows of the Hurman River in Turkey and the Diyalah and Lesser Zab Rivers in Iraq. The model divides the monthly flow data into twelve intervals representing the number of months in a year. The flow of month t is considered a function of the antecedent month's flow (t - 1) and is predicted by multiplying the antecedent monthly flow by a constant value called K. The optimum value of K is obtained by a stepwise procedure which employs Gene Expression Programming (GEP) and Nonlinear Generalized Reduced Gradient Optimization (NGRGO) as alternatives to the traditional nonlinear regression technique. The coefficient of determination and the root mean squared error are used to evaluate the performance of the proposed models. The results of the proposed model are compared with the conventional Markovian and Auto Regressive Integrated Moving Average (ARIMA) models based on observed monthly flow data. The comparison results, based on five different statistical measures, show that the proposed stepwise model performed better than the Markovian and ARIMA models. The R2 values of the proposed model range between 0.81 and 0.92 for the three rivers in this study.
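    For a single calendar month, the q_t = K * q_(t-1) form admits a closed-form least-squares estimate of K, which can stand in here for the GEP/NGRGO search described above. The flow values are invented for illustration:

```python
import numpy as np

# Hypothetical observed flows (m^3/s) for one calendar month across several years,
# paired with the preceding month's flows.
q_prev = np.array([12.0, 9.5, 14.2, 11.0, 10.3])
q_curr = np.array([10.1, 8.0, 12.5, 9.4, 8.9])

# For the model q_t = K * q_{t-1}, the least-squares K has a closed form.
K = float(np.dot(q_prev, q_curr) / np.dot(q_prev, q_prev))
pred = K * q_prev
rmse = float(np.sqrt(np.mean((pred - q_curr) ** 2)))
```

    Repeating this for each of the twelve months yields the stepwise set of K values the abstract describes.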

  19. Polynomial order selection in random regression models via penalizing adaptively the likelihood.

    PubMed

    Corrales, J D; Munilla, S; Cantet, R J C

    2015-08-01

    Orthogonal Legendre polynomials (LP) are used to model the shape of additive genetic and permanent environmental effects in random regression models (RRM). Frequently, the Akaike (AIC) and the Bayesian (BIC) information criteria are employed to select LP order. However, it has been theoretically shown that neither AIC nor BIC is simultaneously optimal in terms of consistency and efficiency. Thus, the goal was to introduce a method, 'penalizing adaptively the likelihood' (PAL), as a criterion to select LP order in RRM. Four simulated data sets and real data (60,513 records, 6675 Colombian Holstein cows) were employed. Nested models were fitted to the data, and AIC, BIC and PAL were calculated for all of them. Results showed that PAL and BIC identified with probability one the true LP order for the additive genetic and permanent environmental effects, but AIC tended to favour over-parameterized models. Conversely, when the true model was unknown, PAL selected the best model with higher probability than AIC. In the latter case, BIC never favoured the best model. To summarize, PAL selected a correct model order regardless of whether the 'true' model was within the set of candidates. © 2015 Blackwell Verlag GmbH.
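    The criteria being compared can be illustrated on an ordinary polynomial regression with a Gaussian likelihood, a simplified stand-in for the Legendre-polynomial random regression setting:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 200)
y = 1.0 + 0.5 * x - 2.0 * x**2 + rng.normal(0, 0.1, x.size)  # true order is 2

def fit_ic(order):
    """Fit a polynomial of the given order; return (AIC, BIC) under a Gaussian likelihood."""
    X = np.vander(x, order + 1)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = x.size, order + 1
    sigma2 = np.mean(resid**2)                       # ML estimate of the error variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return (-2 * loglik + 2 * k, -2 * loglik + k * np.log(n))

# BIC's heavier penalty (k*log n vs 2k) typically resists over-parameterization.
best_bic = min(range(1, 6), key=lambda m: fit_ic(m)[1])
```

    PAL, as the abstract describes it, replaces these fixed penalties with one adapted to the data.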

  20. Seasonal forecasting for water resource management: the example of CNR Genissiat dam on the Rhone River in France

    NASA Astrophysics Data System (ADS)

    Dommanget, Etienne; Bellier, Joseph; Ben Daoud, Aurélien; Graff, Benjamin

    2014-05-01

    Compagnie Nationale du Rhône (CNR) has been granted the concession to operate the Rhone River from the Swiss border to the Mediterranean Sea since 1933 and carries out three interdependent missions: navigation, irrigation and hydropower production. Nowadays, CNR generates one quarter of France's hydropower electricity. The convergence of public and private interests around optimizing the management of water resources throughout the French Rhone valley led CNR to develop hydrological models dedicated to seasonal discharge forecasting. Indeed, seasonal forecasting is a major issue for CNR and water resource management, in order to optimize long-term investments related to the produced electricity, plan dam maintenance operations and anticipate low-water periods. Seasonal forecasting models have been developed for the Genissiat dam. With an installed capacity of 420 MW, Genissiat dam is the first of CNR's 19 hydropower plants. Discharge forecasting at Genissiat dam is strategic, since its inflows contribute 20% of the total average Rhone discharge and consequently 40% of the total Rhone hydropower production. Forecasts are based on statistical hydrological models. Discharges on the main Rhone River tributaries upstream of Genissiat dam are forecast from 1 to 6 months ahead using multiple linear regressions. Input data for these regressions are identified depending on river hydrological regimes and periods of the year. For the melting season, from spring to summer, snow water equivalent (SWE) data are of major importance. SWE data are calculated from the Crocus model (Météo France) and the SLF model (Switzerland). CNR hydro-meteorological forecasters assess meteorological trends in precipitation for the coming months. These trends are used to stochastically generate precipitation scenarios that complement the regression data set. 
This probabilistic approach provides decision-making support for CNR's water resource management team, supplying seasonal forecasts together with their confidence intervals. After a presentation of CNR's methodology, results for the years 2011 and 2013 will illustrate the ability of CNR's seasonal forecasting models. These years are of particular interest for water resource management, as they were, respectively, unusually dry and snowy. Model performance will be assessed against historical climatology using the CRPS skill score.

  1. Neural Network and Regression Approximations in High Speed Civil Transport Aircraft Design Optimization

    NASA Technical Reports Server (NTRS)

    Patnaik, Surya N.; Guptill, James D.; Hopkins, Dale A.; Lavelle, Thomas M.

    1998-01-01

    Nonlinear mathematical-programming-based design optimization can be an elegant method. However, the calculations required to generate the merit function, constraints, and their gradients, which are frequently required, can make the process computationally intensive. The computational burden can be greatly reduced by using approximating analyzers derived from an original analyzer utilizing neural networks and linear regression methods. The experience gained from using both of these approximation methods in the design optimization of a high speed civil transport aircraft is the subject of this paper. The Langley Research Center's Flight Optimization System was selected for the aircraft analysis. This software was exercised to generate a set of training data with which a neural network and a regression method were trained, thereby producing the two approximating analyzers. The derived analyzers were coupled to the Lewis Research Center's CometBoards test bed to provide the optimization capability. With the combined software, both approximation methods were examined for use in aircraft design optimization, and both performed satisfactorily. The CPU time for solution of the problem, which had been measured in hours, was reduced to minutes with the neural network approximation and to seconds with the regression method. Instability encountered in the aircraft analysis software at certain design points was also eliminated. On the other hand, there were costs and difficulties associated with training the approximating analyzers. The CPU time required to generate the input-output pairs and to train the approximating analyzers was seven times that required for solution of the problem.

  2. [Optimization of ultrasonic-assisted extraction of total flavonoids from leaves of the Artocarpus heterophyllus by response surface methodology].

    PubMed

    Wang, Hong-wu; Liu, Yan-qing; Wang, Yuan-hong

    2011-07-01

    To optimize the ultrasonic-assisted extraction of total flavonoids from leaves of Artocarpus heterophyllus, the effects of ethanol concentration, extraction time, and liquid-solid ratio on flavonoid yield were investigated. A 17-run response surface design involving three factors at three levels was generated by the Design-Expert software, and the experimental data obtained were subjected to quadratic regression analysis to create a mathematical model describing flavonoid extraction. The optimum ultrasonic-assisted extraction conditions were an ethanol volume fraction of 69.4% and a liquid-solid ratio of 22.6:1 for 32 min. Under these optimized conditions, the yield of flavonoids was 7.55 mg/g. The Box-Behnken design and response surface analysis can well optimize the ultrasonic-assisted extraction of total flavonoids from Artocarpus heterophyllus.
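    A quadratic response-surface fit of the kind produced by a Box-Behnken / Design-Expert workflow can be sketched as follows; the coded design points and yields are hypothetical (only two factors shown for brevity), not the study's data:

```python
import numpy as np

# Hypothetical design points (coded -1/0/+1 levels) and flavonoid yields (mg/g):
# x1 = ethanol fraction, x2 = liquid-solid ratio.
X1, X2 = np.meshgrid([-1, 0, 1], [-1, 0, 1])
x1, x2 = X1.ravel(), X2.ravel()
y = 7.0 - 0.8 * x1**2 - 0.5 * x2**2 + 0.2 * x1 \
    + np.random.default_rng(1).normal(0, 0.05, 9)

# Full quadratic response surface: y ~ b0 + b1*x1 + b2*x2 + b11*x1^2 + b22*x2^2 + b12*x1*x2.
A = np.column_stack([np.ones(9), x1, x2, x1**2, x2**2, x1 * x2])
b, *_ = np.linalg.lstsq(A, y, rcond=None)

# Locate the maximum of the fitted surface on a fine grid of coded levels.
g1, g2 = np.meshgrid(np.linspace(-1, 1, 101), np.linspace(-1, 1, 101))
surf = b[0] + b[1]*g1 + b[2]*g2 + b[3]*g1**2 + b[4]*g2**2 + b[5]*g1*g2
i = np.unravel_index(np.argmax(surf), surf.shape)
opt = (g1[i], g2[i])   # coded optimum, decoded back to real units in practice
```

    Decoding the optimum levels back to physical units gives conditions analogous to the 69.4% ethanol / 22.6:1 ratio reported above.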

  3. Cascade Optimization for Aircraft Engines With Regression and Neural Network Analysis - Approximators

    NASA Technical Reports Server (NTRS)

    Patnaik, Surya N.; Guptill, James D.; Hopkins, Dale A.; Lavelle, Thomas M.

    2000-01-01

    The NASA Engine Performance Program (NEPP) can configure and analyze almost any type of gas turbine engine that can be generated through the interconnection of a set of standard physical components. In addition, the code can optimize engine performance by changing adjustable variables under a set of constraints. However, for engine cycle problems at certain operating points, the NEPP code can encounter difficulties: nonconvergence in the currently implemented Powell's optimization algorithm and deficiencies in the Newton-Raphson solver during engine balancing. A project was undertaken to correct these deficiencies. Nonconvergence was avoided through a cascade optimization strategy, and deficiencies associated with engine balancing were eliminated through neural network and linear regression methods. An approximation-interspersed cascade strategy was used to optimize the engine's operation over its flight envelope. Replacement of Powell's algorithm by the cascade strategy improved the optimization segment of the NEPP code. The performance of the linear regression and neural network methods as alternative engine analyzers was found to be satisfactory. This report considers two examples, a supersonic mixed-flow turbofan engine and a subsonic wave-rotor-topped engine, to illustrate the results, and it discusses insights gained from the improved version of the NEPP code.

  4. Optimization of isotherm models for pesticide sorption on biopolymer-nanoclay composite by error analysis.

    PubMed

    Narayanan, Neethu; Gupta, Suman; Gajbhiye, V T; Manjaiah, K M

    2017-04-01

    A carboxy methyl cellulose-nano organoclay (nano montmorillonite modified with 35-45 wt % dimethyl dialkyl (C14-C18) amine (DMDA)) composite was prepared by the solution intercalation method. The prepared composite was characterized by Fourier transform infrared spectroscopy (FTIR), X-ray diffraction (XRD) and scanning electron microscopy (SEM). The composite was evaluated for its sorption efficiency for the pesticides atrazine, imidacloprid and thiamethoxam. The sorption data were fitted to Langmuir and Freundlich isotherms using linear and nonlinear methods. The linear regression method suggested that the sorption data best fitted the Type II Langmuir and Freundlich isotherms. In order to avoid the bias resulting from linearization, seven different error parameters were also analyzed by the nonlinear regression method. The nonlinear error analysis suggested that the sorption data fitted the Langmuir model better than the Freundlich model. The maximum sorption capacity, Q0 (μg/g), was highest for imidacloprid (2000), followed by thiamethoxam (1667) and atrazine (1429). The study suggests that the coefficient of determination of linear regression alone cannot be used for comparing the fit of the Langmuir and Freundlich models, and nonlinear error analysis needs to be done to avoid inaccurate results. Copyright © 2017 Elsevier Ltd. All rights reserved.
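    The abstract's caution about linearization can be made concrete. Below, the Type II Langmuir linearization, c/q = 1/(Q0*b) + c/Q0, is fitted by ordinary least squares and the error analysis is then carried out on the original nonlinear scale; the isotherm data are hypothetical values generated near a Langmuir curve, not the study's measurements:

```python
import numpy as np

# Hypothetical equilibrium data: solution concentration c (ug/mL) vs sorbed q (ug/g),
# generated near a Langmuir isotherm with Q0 = 2000 and b = 0.35.
c = np.array([0.5, 1.0, 2.0, 5.0, 10.0, 20.0])
q = np.array([298.0, 519.0, 824.0, 1273.0, 1556.0, 1750.0])

# Type II linearization: c/q = 1/(Q0*b) + c/Q0, fitted by ordinary least squares.
A = np.column_stack([np.ones_like(c), c])
(intercept, slope), *_ = np.linalg.lstsq(A, c / q, rcond=None)
Q0 = 1.0 / slope
b = slope / intercept            # since intercept = 1/(Q0*b)

# Error analysis on the ORIGINAL nonlinear scale, as the abstract advocates.
pred = Q0 * b * c / (1.0 + b * c)
sse = float(np.sum((q - pred) ** 2))
```

    Comparing such nonlinear-scale error measures across isotherm models, rather than the R2 of the linearized fits, is the point the abstract makes.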

  5. Iterative integral parameter identification of a respiratory mechanics model.

    PubMed

    Schranz, Christoph; Docherty, Paul D; Chiew, Yeong Shiong; Möller, Knut; Chase, J Geoffrey

    2012-07-18

    Patient-specific respiratory mechanics models can support the evaluation of optimal lung protective ventilator settings during ventilation therapy. Clinical application requires that the individual's model parameter values must be identified with information available at the bedside. Multiple linear regression or gradient-based parameter identification methods are highly sensitive to noise and initial parameter estimates. Thus, they are difficult to apply at the bedside to support therapeutic decisions. An iterative integral parameter identification method is applied to a second order respiratory mechanics model. The method is compared to the commonly used regression methods and error-mapping approaches using simulated and clinical data. The clinical potential of the method was evaluated on data from 13 Acute Respiratory Distress Syndrome (ARDS) patients. The iterative integral method converged to error minima 350 times faster than the Simplex Search Method using simulation data sets and 50 times faster using clinical data sets. Established regression methods reported erroneous results due to sensitivity to noise. In contrast, the iterative integral method was effective independent of initial parameter estimations, and converged successfully in each case tested. These investigations reveal that the iterative integral method is beneficial with respect to computing time, operator independence and robustness, and thus applicable at the bedside for this clinical application.

  6. Evaluation of energy consumption during aerobic sewage sludge treatment in dairy wastewater treatment plant.

    PubMed

    Dąbrowski, Wojciech; Żyłka, Radosław; Malinowski, Paweł

    2017-02-01

    The subject of the research conducted in an operating dairy wastewater treatment plant (WWTP) was to examine electric energy consumption during sewage sludge treatment. The excess sewage sludge was aerobically stabilized and dewatered with a screw press. Organic matter varied from 48% to 56% in sludge after stabilization and dewatering. This indicates that the sludge was properly stabilized and could be applied as a fertilizer. Measurement factors for electric energy consumption for mechanically dewatered sewage sludge were determined, which ranged between 0.94 and 1.5 kWh/m3, with an average value of 1.17 kWh/m3. The shares of the devices used for sludge dewatering and aerobic stabilization in the total energy consumption of the plant were also established, at 3% and 25%, respectively. A model of energy consumption during sewage sludge treatment was estimated from experimental data. Two models were applied: linear regression for the dewatering process and segmented linear regression for aerobic stabilization. The segmented linear regression model was also applied to total energy consumption during sewage sludge treatment in the examined dairy WWTP. The research constitutes an introduction for further studies on defining a mathematical model used to optimize electric energy consumption by dairy WWTPs. Copyright © 2016 Elsevier Inc. All rights reserved.
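    Segmented (piecewise) linear regression of the kind used for the aerobic-stabilization energy model can be sketched with a hinge basis and a grid search over the breakpoint; the volume and energy figures below are invented for illustration:

```python
import numpy as np

# Hypothetical daily data: sludge volume processed (m3) vs energy used (kWh),
# with a change in slope once aeration shifts to a higher-load regime.
x = np.array([1.0, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([2.1, 3.0, 4.2, 5.0, 6.1, 8.9, 12.2, 15.0, 18.1, 20.9])

def segmented_fit(x, y, bp):
    """Continuous two-segment linear fit with a breakpoint at bp."""
    # Basis: intercept, x, and a hinge term max(0, x - bp).
    A = np.column_stack([np.ones_like(x), x, np.maximum(0.0, x - bp)])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    sse = float(np.sum((A @ beta - y) ** 2))
    return beta, sse

# Grid-search the breakpoint that minimizes the residual sum of squares.
bps = np.linspace(2.0, 9.0, 71)
best_bp = min(bps, key=lambda bp: segmented_fit(x, y, bp)[1])
```

    The hinge coefficient beta[2] is the change in slope at the breakpoint, so beta[1] + beta[2] is the energy rate in the second regime.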

  7. Regression models evaluating THMs, HAAs and HANs formation upon chloramination of source water collected from Yangtze River Delta Region, China.

    PubMed

    Lin, Jiajia; Chen, Xi; Ansheng, Zhu; Hong, Huachang; Liang, Yan; Sun, Hongjie; Lin, Hongjun; Chen, Jianrong

    2018-09-30

    The present study aimed to generate multiple regression models to estimate the formation of trihalomethanes (THMs), haloacetonitriles (HANs) and haloacetic acids (HAAs) during chloramination of source water obtained from the Yangtze River Delta Region, China. The results showed that the regression models for trichloromethane (TCM), dichloroacetonitrile (DCAN), dichloroacetic acid (DCAA), dihaloacetic acids (DHAAs), the 5 HAA species regulated by the U.S. EPA (HAA5) and total haloacetic acids (HAA9) had good predictive ability (prediction accuracy of 81-94%), while the models for total haloacetonitriles (HAN4), trichloroacetic acid (TCAA), trihaloacetic acids (THAAs) and total trihalomethanes (THM4) showed relatively low prediction accuracy (58-72%). For THMs, dissolved organic nitrogen (DON) was the key organic precursor, while for HANs, DHAAs and THAAs, UVA254 played the dominant role. The other key factors influencing DBP formation included bromide (THM4, DHAAs and HAA9), reaction time (DCAN, HAN4) and chloramine dose (TCM, DCAA, TCAA, HAA5 and THAAs). These results provide important information for waterworks seeking to optimize treatment processes to control DBPs, and offer a method for evaluating DBP levels when estimating the health risks associated with DBP exposure during chloramination. Copyright © 2018 Elsevier Inc. All rights reserved.

  8. Modelling of salad plants growth and physiological status in vitamin space greenhouse during lighting regime optimization

    NASA Astrophysics Data System (ADS)

    Konovalova, Irina; Berkovich, Yuliy A.; Smolyanina, Svetlana; Erokhin, Alexei; Yakovleva, Olga; Lapach, Sergij; Radchenko, Stanislav; Znamenskii, Artem; Tarakanov, Ivan

    2016-07-01

    The efficiency of the photoautotrophic element as part of bio-engineering life-support systems is determined substantially by the lighting regime. Optimizing an artificial light regime is complex because light controls a wide range of plant physiological functions: trophic, informational, biosynthetic, etc. The effects of average photosynthetic photon flux density (PPFD), light spectral composition and pulsed light on crop growth and plant physiological status were studied in a multivariate experiment comprising 16 independent experiments in 3 replicates. Chinese cabbage plants (Brassica chinensis L.), cultivar Vesnianka, were grown for 24 days in a climatic chamber under white and red light-emitting diodes (LEDs): photoperiod 24 h, PPFD from 260 to 500 µmol/(m²·s), red light share in the spectrum varying from 33% to 73%, and pulsed (pulse period from 30 to 501 µs) and non-pulsed lighting. Regressions of plant photosynthetic and biochemical indexes, as well as crop specific productivity, in response to the selected lighting-regime parameters were calculated. The developed models of crop net photosynthesis and dark respiration revealed that the most intense gas exchange corresponded to a PPFD level of 450-500 µmol/(m²·s) with a red light share in the spectrum of about 60% and a pulse length of 30 µs with a pulse period from 300 to 400 µs. Shoot dry weight increased monotonically in response to increasing PPFD and changed depending on the pulse period under a stabilized PPFD level. An increase in ascorbic acid content in the shoot biomass was revealed when increasing the red light share in the spectrum from 33% to 73%. The lighting regime optimization criterion (Q) was defined for the vitamin space greenhouse as the maximum of the crop yield per unit area multiplied by its ascorbic acid concentration, divided by the light energy consumption. The regression model of the optimization criterion was constructed based on the experimental data. 
The analysis of the model made it possible to determine the optimal lighting regime for the space greenhouse: a PPFD level of about 430 µmol/(m²·s), a red light share in the spectrum around 73%, and non-pulsed lighting. At the same PPFD, the Q-criterion value for the selected lighting regime was 1.5 times higher than under white LEDs, and slightly (about 15%) lower than under high-pressure sodium lamp light.

  9. The effects of optimism and gratitude on adherence, functioning and mental health following an acute coronary syndrome.

    PubMed

    Millstein, Rachel A; Celano, Christopher M; Beale, Eleanor E; Beach, Scott R; Suarez, Laura; Belcher, Arianna M; Januzzi, James L; Huffman, Jeff C

    This study examined the effects of optimism and gratitude on self-reported health behavior adherence, physical functioning and emotional well-being after an acute coronary syndrome (ACS). Among 156 patients, we examined associations between optimism and gratitude measured 2 weeks post-ACS and 6-month outcomes: adherence to medical recommendations, mental and physical health-related quality of life (HRQoL), physical functioning, depressive symptoms and anxiety. Multivariable linear regression models were used, controlling for increasing levels of adjustment. Optimism [β=.11, standard error (S.E.)=.05, P=.038] and gratitude (β=.10, S.E.=.05, P=.027) at 2 weeks were associated with subsequent self-reported adherence to medical recommendations (diet, exercise, medication adherence, stress reduction) at 6 months in fully adjusted models. Two-week optimism and gratitude were associated with improvements in mental HRQoL (optimism: β=.44, S.E.=.13, P=.001; gratitude: β=.33, S.E.=.12, P=.005) and reductions in symptoms of depression (optimism: β=-.11, S.E.=.05, P=.039; gratitude: β=-.10, S.E.=.05, P=.028) and anxiety (optimism: β=-.15, S.E.=.05, P=.004; gratitude: β=-.10, S.E.=.05, P=.034) at 6 months. Optimism and gratitude at 2 weeks post-ACS were associated with higher self-reported adherence and improved emotional well-being 6 months later, independent of negative emotional states. Optimism and gratitude may help recovery from an ACS. Interventions promoting these positive constructs could help improve adherence and well-being. Copyright © 2016 Elsevier Inc. All rights reserved.

  10. Using modified fruit fly optimisation algorithm to perform the function test and case studies

    NASA Astrophysics Data System (ADS)

    Pan, Wen-Tsao

    2013-06-01

    Evolutionary computation is a computing paradigm that simulates natural evolutionary processes in the spirit of Darwinian theory, and it is a common research method. The main contribution of this paper is to strengthen the ability of the fruit fly optimization algorithm (FOA) to search for the optimal solution, so as to avoid converging on local extremum solutions. Evolutionary computation has grown to include concepts of animal foraging behaviour and group behaviour. This study discussed three common evolutionary computation methods and compared them with the modified fruit fly optimization algorithm (MFOA). It further investigated the algorithms' ability to find the extreme values of three mathematical functions, their execution speed, and the forecast ability of the forecasting model built using the optimized general regression neural network (GRNN) parameters. The findings indicated that there was no obvious difference between particle swarm optimization and the MFOA with regard to the ability to compute extreme values; however, both were better than the artificial fish swarm algorithm and the FOA. In addition, the MFOA executed faster than particle swarm optimization, and the forecast ability of the forecasting model built using the MFOA's GRNN parameters was better than that of the other three forecasting models.

  11. Selective principal component regression analysis of fluorescence hyperspectral image to assess aflatoxin contamination in corn

    USDA-ARS?s Scientific Manuscript database

    Selective principal component regression analysis (SPCR) uses a subset of the original image bands for principal component transformation and regression. For optimal band selection before the transformation, this paper used genetic algorithms (GA). In this case, the GA process used the regression co...

  12. Application of Differential Evolutionary Optimization Methodology for Parameter Structure Identification in Groundwater Modeling

    NASA Astrophysics Data System (ADS)

    Chiu, Y.; Nishikawa, T.

    2013-12-01

    With the increasing complexity of parameter-structure identification (PSI) in groundwater modeling, there is a need for robust, fast, and accurate optimizers in the groundwater-hydrology field. For this work, PSI is defined as identifying parameter dimension, structure, and value. In this study, Voronoi tessellation and differential evolution (DE) are used to solve the optimal PSI problem. Voronoi tessellation is used for automatic parameterization, whereby stepwise regression and the error covariance matrix are used to determine the optimal parameter dimension. DE is a novel global optimizer that can be used to solve nonlinear, nondifferentiable, and multimodal optimization problems. It can be viewed as an improved version of genetic algorithms and employs a simple cycle of mutation, crossover, and selection operations. DE is used to estimate the optimal parameter structure and its associated values. A synthetic numerical experiment of continuous hydraulic conductivity distribution was conducted to demonstrate the proposed methodology. The results indicate that DE can identify the global optimum effectively and efficiently. A sensitivity analysis of the control parameters (i.e., the population size, mutation scaling factor, crossover rate, and mutation schemes) was performed to examine their influence on the objective function. The proposed DE was then applied to solve a complex parameter-estimation problem for a small desert groundwater basin in Southern California. Hydraulic conductivity, specific yield, specific storage, fault conductance, and recharge components were estimated simultaneously. Comparison of DE and a traditional gradient-based approach (PEST) shows DE to be more robust and efficient. The results of this work not only provide an alternative for PSI in groundwater models, but also extend DE applications towards solving complex, regional-scale water management optimization problems.
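    The mutation-crossover-selection cycle described above can be written as a minimal DE/rand/1/bin loop. This is a generic sketch, not the study's implementation, demonstrated on the multimodal Rastrigin function:

```python
import numpy as np

def differential_evolution(f, bounds, pop_size=30, F=0.8, CR=0.9, gens=200, seed=0):
    """Minimal DE/rand/1/bin: the mutation, crossover, selection cycle."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds).T
    dim = lo.size
    pop = lo + rng.random((pop_size, dim)) * (hi - lo)
    fit = np.array([f(p) for p in pop])
    for _ in range(gens):
        for i in range(pop_size):
            a, b, c = pop[rng.choice([j for j in range(pop_size) if j != i],
                                     3, replace=False)]
            mutant = np.clip(a + F * (b - c), lo, hi)        # mutation
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True                  # keep at least one gene
            trial = np.where(cross, mutant, pop[i])          # crossover
            ft = f(trial)
            if ft <= fit[i]:                                 # greedy selection
                pop[i], fit[i] = trial, ft
    best = pop[np.argmin(fit)]
    return best, float(fit.min())

# Multimodal test function (Rastrigin); global minimum 0 at the origin.
rastrigin = lambda v: 10 * v.size + float(np.sum(v**2 - 10 * np.cos(2 * np.pi * v)))
best, val = differential_evolution(rastrigin, [(-5.12, 5.12)] * 2)
```

    In the PSI application, f would instead return the calibration misfit of a groundwater model run with the trial parameter structure.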

  13. Optimization of bio-ethanol autothermal reforming and carbon monoxide removal processes

    NASA Astrophysics Data System (ADS)

    Markova, D.; Bazbauers, G.; Valters, K.; Alhucema Arias, R.; Weuffen, C.; Rochlitz, L.

    Experimental investigation of bio-ethanol autothermal reforming (ATR) and water-gas shift (WGS) processes for hydrogen production, together with regression analysis of the data, is performed in this study. The main goal was to obtain, from the experimental data, regression relations between the most critical dependent variables, such as the hydrogen, carbon monoxide and methane content of the reformate gas, and independent factors such as the air-to-fuel ratio (λ), steam-to-carbon ratio (S/C), inlet temperature of the reactants into the reforming process (T_ATRin), and the pressure (p) and temperature (T_ATR) in the ATR reactor. The purpose of the regression models is to provide the optimum values of the process factors that give the maximum amount of hydrogen. The experimental ATR system consisted of an evaporator, an ATR reactor and a one-stage WGS reactor. Empirical relations between the hydrogen, carbon monoxide and methane content and the controlling parameters downstream of the ATR reactor are shown in the work. The optimization results show that, within the considered range of the process factors, a maximum hydrogen concentration of 42 dry vol.% and a yield of 3.8 mol/mol of ethanol downstream of the ATR reactor can be achieved at S/C = 2.5, λ = 0.20-0.23, p = 0.4 bar, T_ATRin = 230 °C and T_ATR = 640 °C.

  14. HYPOTHESIS TESTING FOR HIGH-DIMENSIONAL SPARSE BINARY REGRESSION

    PubMed Central

    Mukherjee, Rajarshi; Pillai, Natesh S.; Lin, Xihong

    2015-01-01

    In this paper, we study the detection boundary for minimax hypothesis testing in the context of high-dimensional, sparse binary regression models. Motivated by genetic sequencing association studies for rare variant effects, we investigate the complexity of the hypothesis testing problem when the design matrix is sparse. We observe a new phenomenon in the behavior of detection boundary which does not occur in the case of Gaussian linear regression. We derive the detection boundary as a function of two components: a design matrix sparsity index and signal strength, each of which is a function of the sparsity of the alternative. For any alternative, if the design matrix sparsity index is too high, any test is asymptotically powerless irrespective of the magnitude of signal strength. For binary design matrices with the sparsity index that is not too high, our results are parallel to those in the Gaussian case. In this context, we derive detection boundaries for both dense and sparse regimes. For the dense regime, we show that the generalized likelihood ratio is rate optimal; for the sparse regime, we propose an extended Higher Criticism Test and show it is rate optimal and sharp. We illustrate the finite sample properties of the theoretical results using simulation studies. PMID:26246645

  15. PM(10) emission forecasting using artificial neural networks and genetic algorithm input variable optimization.

    PubMed

    Antanasijević, Davor Z; Pocajt, Viktor V; Povrenović, Dragan S; Ristić, Mirjana Đ; Perić-Grujić, Aleksandra A

    2013-01-15

    This paper describes the development of an artificial neural network (ANN) model for the forecasting of annual PM(10) emissions at the national level, using widely available sustainability and economic/industrial parameters as inputs. The inputs for the model were selected and optimized using a genetic algorithm, and the ANN was trained using the following variables: gross domestic product, gross inland energy consumption, incineration of wood, motorization rate, production of paper and paperboard, sawn wood production, production of refined copper, production of aluminum, production of pig iron and production of crude steel. The wide availability of the input parameters used in this model can overcome a lack of data and basic environmental indicators in many countries, which can prevent or seriously impede PM emission forecasting. The model was trained and validated with the data for 26 EU countries for the period from 1999 to 2006. PM(10) emission data, collected through the Convention on Long-range Transboundary Air Pollution - CLRTAP and the EMEP Programme or as emission estimations by the Regional Air Pollution Information and Simulation (RAINS) model, were obtained from Eurostat. The ANN model has shown very good performance and demonstrated that the forecast of PM(10) emission up to two years can be made successfully and accurately. The mean absolute error for two-year PM(10) emission prediction was only 10%, which is more than three times better than the predictions obtained from the conventional multi-linear regression and principal component regression models that were trained and tested using the same datasets and input variables. Copyright © 2012 Elsevier B.V. All rights reserved.

  16. Functional Data Analysis Applied to Modeling of Severe Acute Mucositis and Dysphagia Resulting From Head and Neck Radiation Therapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dean, Jamie A., E-mail: jamie.dean@icr.ac.uk; Wong, Kee H.; Gay, Hiram

    Purpose: Current normal tissue complication probability modeling using logistic regression suffers from bias and high uncertainty in the presence of highly correlated radiation therapy (RT) dose data. This hinders robust estimates of dose-response associations and, hence, optimal normal tissue–sparing strategies from being elucidated. Using functional data analysis (FDA) to reduce the dimensionality of the dose data could overcome this limitation. Methods and Materials: FDA was applied to modeling of severe acute mucositis and dysphagia resulting from head and neck RT. Functional partial least squares regression (FPLS) and functional principal component analysis were used for dimensionality reduction of the dose-volume histogram data. The reduced dose data were input into functional logistic regression models (functional partial least squares–logistic regression [FPLS-LR] and functional principal component–logistic regression [FPC-LR]) along with clinical data. This approach was compared with penalized logistic regression (PLR) in terms of predictive performance and the significance of treatment covariate–response associations, assessed using bootstrapping. Results: The area under the receiver operating characteristic curve for the PLR, FPC-LR, and FPLS-LR models was 0.65, 0.69, and 0.67, respectively, for mucositis (internal validation) and 0.81, 0.83, and 0.83, respectively, for dysphagia (external validation). The calibration slopes/intercepts for the PLR, FPC-LR, and FPLS-LR models were 1.6/−0.67, 0.45/0.47, and 0.40/0.49, respectively, for mucositis (internal validation) and 2.5/−0.96, 0.79/−0.04, and 0.79/0.00, respectively, for dysphagia (external validation). The bootstrapped odds ratios indicated significant associations between RT dose and severe toxicity in the mucositis and dysphagia FDA models. Cisplatin was significantly associated with severe dysphagia in the FDA models. 
None of the covariates was significantly associated with severe toxicity in the PLR models. Dose levels greater than approximately 1.0 Gy/fraction were most strongly associated with severe acute mucositis and dysphagia in the FDA models. Conclusions: FPLS and functional principal component analysis marginally improved predictive performance compared with PLR and provided robust dose-response associations. FDA is recommended for use in normal tissue complication probability modeling.« less

  17. Functional Data Analysis Applied to Modeling of Severe Acute Mucositis and Dysphagia Resulting From Head and Neck Radiation Therapy.

    PubMed

    Dean, Jamie A; Wong, Kee H; Gay, Hiram; Welsh, Liam C; Jones, Ann-Britt; Schick, Ulrike; Oh, Jung Hun; Apte, Aditya; Newbold, Kate L; Bhide, Shreerang A; Harrington, Kevin J; Deasy, Joseph O; Nutting, Christopher M; Gulliford, Sarah L

    2016-11-15

    Current normal tissue complication probability modeling using logistic regression suffers from bias and high uncertainty in the presence of highly correlated radiation therapy (RT) dose data. This hinders robust estimates of dose-response associations and, hence, optimal normal tissue-sparing strategies from being elucidated. Using functional data analysis (FDA) to reduce the dimensionality of the dose data could overcome this limitation. FDA was applied to modeling of severe acute mucositis and dysphagia resulting from head and neck RT. Functional partial least squares regression (FPLS) and functional principal component analysis were used for dimensionality reduction of the dose-volume histogram data. The reduced dose data were input into functional logistic regression models (functional partial least squares-logistic regression [FPLS-LR] and functional principal component-logistic regression [FPC-LR]) along with clinical data. This approach was compared with penalized logistic regression (PLR) in terms of predictive performance and the significance of treatment covariate-response associations, assessed using bootstrapping. The area under the receiver operating characteristic curve for the PLR, FPC-LR, and FPLS-LR models was 0.65, 0.69, and 0.67, respectively, for mucositis (internal validation) and 0.81, 0.83, and 0.83, respectively, for dysphagia (external validation). The calibration slopes/intercepts for the PLR, FPC-LR, and FPLS-LR models were 1.6/-0.67, 0.45/0.47, and 0.40/0.49, respectively, for mucositis (internal validation) and 2.5/-0.96, 0.79/-0.04, and 0.79/0.00, respectively, for dysphagia (external validation). The bootstrapped odds ratios indicated significant associations between RT dose and severe toxicity in the mucositis and dysphagia FDA models. Cisplatin was significantly associated with severe dysphagia in the FDA models. None of the covariates was significantly associated with severe toxicity in the PLR models. 
Dose levels greater than approximately 1.0 Gy/fraction were most strongly associated with severe acute mucositis and dysphagia in the FDA models. FPLS and functional principal component analysis marginally improved predictive performance compared with PLR and provided robust dose-response associations. FDA is recommended for use in normal tissue complication probability modeling. Copyright © 2016 The Author(s). Published by Elsevier Inc. All rights reserved.
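    As a rough sketch of the dimensionality-reduction step above, the snippet below applies SVD-based principal component analysis to synthetic, highly correlated dose-volume-style curves and extracts low-dimensional scores of the kind that would feed the logistic model. Ordinary PCA stands in for the functional version, and all data shapes and values are invented:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic dose-volume-histogram-like curves: 50 patients x 100 dose bins,
# built from two smooth modes so neighbouring bins are highly correlated,
# as in real RT dose data.
t = np.linspace(0, 1, 100)
modes = np.vstack([np.exp(-3 * t), t * np.exp(-t)])
scores_true = rng.normal(size=(50, 2))
X = scores_true @ modes + 0.01 * rng.normal(size=(50, 100))

# PCA via SVD as a stand-in for functional PCA: centre the curves and keep
# the leading component scores as low-dimensional regression inputs.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)
fpc_scores = Xc @ Vt[:2].T   # inputs for a downstream logistic model

print(explained[:2].sum())   # the two leading components dominate
```

    The correlated 100-bin curves collapse to two scores per patient, which is exactly the property that stabilizes the downstream logistic fit.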

  18. Mixed oxidizer hybrid propulsion system optimization under uncertainty using applied response surface methodology and Monte Carlo simulation

    NASA Astrophysics Data System (ADS)

    Whitehead, James Joshua

    The analysis documented herein provides an integrated approach for the conduct of optimization under uncertainty (OUU) using Monte Carlo Simulation (MCS) techniques coupled with response surface-based methods for characterization of mixture-dependent variables. This novel methodology provides an innovative means of conducting optimization studies under uncertainty in propulsion system design. Analytic inputs are based upon empirical regression rate information obtained from design of experiments (DOE) mixture studies utilizing a mixed oxidizer hybrid rocket concept. Hybrid fuel regression rate was selected as the target response variable for optimization under uncertainty, with maximization of regression rate chosen as the driving objective. Characteristic operational conditions and propellant mixture compositions from experimental efforts conducted during previous foundational work were combined with elemental uncertainty estimates as input variables. Response surfaces for mixture-dependent variables and their associated uncertainty levels were developed using quadratic response equations incorporating single and two-factor interactions. These analysis inputs, response surface equations and associated uncertainty contributions were applied to a probabilistic MCS to develop dispersed regression rates as a function of operational and mixture input conditions within design space. Illustrative case scenarios were developed and assessed using this analytic approach including fully and partially constrained operational condition sets over all of design mixture space. In addition, optimization sets were performed across an operationally representative region in operational space and across all investigated mixture combinations. These scenarios were selected as representative examples relevant to propulsion system optimization, particularly for hybrid and solid rocket platforms. 
Ternary diagrams, including contour and surface plots, were developed and utilized to aid in visualization. The concept of Expanded-Durov diagrams was also adopted and adapted to this study to aid in visualization of uncertainty bounds. Regions of maximum regression rate and associated uncertainties were determined for each set of case scenarios. Application of response surface methodology coupled with probabilistic-based MCS allowed for flexible and comprehensive interrogation of mixture and operating design space during optimization cases. Analyses were also conducted to assess sensitivity of uncertainty to variations in key elemental uncertainty estimates. The methodology developed during this research provides an innovative optimization tool for future propulsion design efforts.

  19. [Variable selection methods combined with local linear embedding theory used for optimization of near infrared spectral quantitative models].

    PubMed

    Hao, Yong; Sun, Xu-Dong; Yang, Qiang

    2012-12-01

    A variable selection strategy combined with local linear embedding (LLE) was introduced for the analysis of complex samples by near infrared spectroscopy (NIRS). Three methods, Monte Carlo uninformative variable elimination (MCUVE), the successive projections algorithm (SPA), and MCUVE combined with SPA, were used for eliminating redundant spectral variables. Partial least squares regression (PLSR) and LLE-PLSR were used for modeling complex samples. The results showed that MCUVE can both extract effective informative variables and improve the precision of models. Compared with PLSR models, LLE-PLSR models achieve more accurate analysis results. MCUVE combined with LLE-PLSR is an effective modeling method for NIRS quantitative analysis.
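    The stability idea behind MCUVE can be sketched in a few lines: coefficients are refit on random sample subsets and variables are ranked by the ratio of their mean to their standard deviation. The data are synthetic, and ordinary least squares stands in for the PLS regression coefficients a real MCUVE implementation would rank:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic NIR-like data: 80 samples x 30 wavelengths; only the first five
# wavelengths carry information about the property of interest y.
X = rng.normal(size=(80, 30))
y = X[:, :5] @ np.array([1.0, 0.8, -0.6, 0.5, 0.4]) + 0.1 * rng.normal(size=80)

# Simplified MCUVE-style screening: refit least squares on random sample
# subsets and score each wavelength by the stability of its coefficient.
B = []
for _ in range(200):
    idx = rng.choice(80, size=60, replace=False)
    b, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    B.append(b)
B = np.array(B)
stability = np.abs(B.mean(axis=0)) / B.std(axis=0)
selected = np.sort(np.argsort(stability)[-5:])   # most stable variables

print(selected)   # recovers the informative wavelengths
```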

  20. Perceptual control models of pursuit manual tracking demonstrate individual specificity and parameter consistency.

    PubMed

    Parker, Maximilian G; Tyson, Sarah F; Weightman, Andrew P; Abbott, Bruce; Emsley, Richard; Mansell, Warren

    2017-11-01

    Computational models that simulate individuals' movements in pursuit-tracking tasks have been used to elucidate mechanisms of human motor control. Whilst there is evidence that individuals demonstrate idiosyncratic control-tracking strategies, it remains unclear whether models can be sensitive to these idiosyncrasies. Perceptual control theory (PCT) provides a unique model architecture with an internally set reference value parameter, and can be optimized to fit an individual's tracking behavior. The current study investigated whether PCT models could show temporal stability and individual specificity over time. Twenty adults completed three blocks of 15 one-minute pursuit-tracking trials. Two blocks (training and post-training) were completed in one session and the third was completed after 1 week (follow-up). The target moved in a one-dimensional, pseudorandom pattern. PCT models were optimized to the training data using a least-mean-squares algorithm, and validated with data from post-training and follow-up. We found significant inter-individual variability (partial η2: .464-.697) and intra-individual consistency (Cronbach's α: .880-.976) in parameter estimates. Polynomial regression revealed that all model parameters, including the reference value parameter, contribute to simulation accuracy. Participants' tracking performances were significantly more accurately simulated by models developed from their own tracking data than by models developed from other participants' data. We conclude that PCT models can be optimized to simulate the performance of an individual and that the test-retest reliability of individual models is a necessary criterion for evaluating computational models of human performance.
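    A minimal PCT-style tracking model and its parameter fit might look like the following sketch, with a single gain parameter optimized by grid search over simulated trajectories. The published models also fit a reference value and other parameters; everything here, including the "participant", is synthetic:

```python
import numpy as np

rng = np.random.default_rng(3)

# One-dimensional pursuit tracking: a PCT-style controller moves the cursor
# to cancel the perceived error between target and cursor (reference value
# fixed at zero in this sketch).
def simulate(gain, target, dt=0.01):
    cursor = np.zeros_like(target)
    for i in range(1, len(target)):
        error = target[i - 1] - cursor[i - 1]   # controlled perception
        cursor[i] = cursor[i - 1] + gain * error * dt
    return cursor

t = np.arange(0, 30, 0.01)
target = np.sin(0.5 * t) + 0.3 * np.sin(1.7 * t + 1.0)   # pseudorandom-ish

# A synthetic "participant" with true gain 8, plus observation noise.
observed = simulate(8.0, target) + 0.01 * rng.normal(size=len(t))

# Fit the gain by minimising squared tracking error over a grid (the study
# used a least-mean-squares algorithm for the same purpose).
gains = np.linspace(1, 15, 71)
sse = [np.sum((simulate(g, target) - observed)**2) for g in gains]
best_gain = gains[int(np.argmin(sse))]
print(best_gain)   # recovers the participant's gain
```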

  1. [Optimization of diagnosis indicator selection and inspection plan by 3.0T MRI in breast cancer].

    PubMed

    Jiang, Zhongbiao; Wang, Yunhua; He, Zhong; Zhang, Lejun; Zheng, Kai

    2013-08-01

    To optimize the 3.0T MRI diagnostic indicators for breast cancer and to select the best MRI scan program. Forty-five patients with breast cancer were enrolled, and another 35 patients with benign breast tumors served as the control group. All patients underwent 3.0T MRI, including T1-weighted imaging (T1WI), fat-suppressed T2-weighted imaging (T2WI), diffusion weighted imaging (DWI), 1H magnetic resonance spectroscopy (1H-MRS) and dynamic contrast enhanced (DCE) sequences. With surgical pathology as the gold standard for the diagnosis of breast disease, the benign or malignant pathological result served as the dependent variable, and the MRI diagnostic indicators were taken as the independent variables. All MRI indicators were entered into a logistic regression analysis, the logistic model was established, and the diagnostic indicators were optimized to further improve MRI scanning for breast cancer. By logistic regression analysis, the indicators selected into the equation were the edge feature of the tumor, the time-signal intensity curve (TIC) type and the apparent diffusion coefficient (ADC) value when b=500 s/mm2. The regression equation was Logit (P)=-21.936+20.478X6+3.267X7+21.488X3. Valuable indicators in the diagnosis of breast cancer are the edge feature of the tumor, the TIC type and the ADC value when b=500 s/mm2. Combining conventional MRI scanning, DWI and dynamic enhanced MRI is a better examination program, while MRS is complementary when diagnosis is difficult.
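    The reported equation can be turned directly into a scoring function. The 0/1 predictor codings below are hypothetical, chosen only to illustrate how the logistic model converts the three indicators into a malignancy probability:

```python
import math

# Reported model: Logit(P) = -21.936 + 20.478*X6 + 3.267*X7 + 21.488*X3,
# where X6 is the tumour edge feature, X7 the TIC type and X3 the ADC-based
# indicator (b = 500 s/mm2). The 0/1 codings used here are hypothetical.
def malignancy_probability(x6, x7, x3):
    logit = -21.936 + 20.478*x6 + 3.267*x7 + 21.488*x3
    return 1.0 / (1.0 + math.exp(-logit))

print(malignancy_probability(1, 1, 1))   # all indicators suspicious
print(malignancy_probability(0, 0, 0))   # all indicators benign
```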

  2. Modeling the compliance of polyurethane nanofiber tubes for artificial common bile duct

    NASA Astrophysics Data System (ADS)

    Moazeni, Najmeh; Vadood, Morteza; Semnani, Dariush; Hasani, Hossein

    2018-02-01

    The common bile duct is one of the body's most sensitive organs, and a polyurethane nanofiber tube can be used as a prosthesis for it. Compliance is one of the most important properties of the prosthesis, which should remain adequately compliant for as long as possible to preserve its behavioral integrity. In the present paper, the prosthesis compliance was measured and modeled using a regression method and an artificial neural network (ANN) based on electrospinning process parameters such as polymer concentration, voltage, tip-to-collector distance and flow rate. Because the ANN model contains several parameters that directly affect prediction accuracy, a genetic algorithm (GA) was used to optimize the ANN parameters. Finally, it was observed that the GA-optimized ANN model can predict the compliance with high accuracy (mean absolute percentage error = 8.57%). Moreover, the contribution of the variables to the compliance was investigated through relative importance analysis, and the optimum parameter values for ideal compliance were determined.

  3. Estimating leaf nitrogen accumulation in maize based on canopy hyperspectrum data

    NASA Astrophysics Data System (ADS)

    Gu, Xiaohe; Wang, Lizhi; Song, Xiaoyu; Xu, Xingang

    2016-10-01

    Leaf nitrogen accumulation (LNA) has an important influence on the formation of crop yield and grain protein. Monitoring the LNA of the crop canopy quantitatively and in real time helps assess crop nutrition status, diagnose group growth and manage fertilization precisely. The study aimed to develop a universal method to monitor the LNA of maize from hyperspectral data, which could provide mechanistic support for mapping the LNA of maize at the county scale. The correlations between LNA and hyperspectral reflectivity and its mathematical transformations were analyzed. The feature bands and their transformations were then screened to develop the optimal model for estimating LNA based on the multiple linear regression method. In-situ samples were used to evaluate the accuracy of the estimating model. Results showed that the estimating model built on the first-derivative logarithmic transformation (lgP') of reflectivity reached the highest correlation coefficient (0.889) with the lowest RMSE (0.646 g·m-2), and was considered the optimal model for estimating LNA in maize. The determination coefficient (R2) of the testing samples was 0.831, while the RMSE was 1.901 g·m-2. This indicated that the first-derivative logarithmic transformation of the hyperspectra responded well to the LNA of maize. Based on this transformation, the optimal estimating model of LNA could reach good accuracy with high stability.
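    The lgP'-style transformation and the multiple linear regression step can be sketched on synthetic spectra as follows; the simulated absorption feature and the selected band indices are illustrative, not the study's:

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic canopy spectra (60 samples x 120 bands): leaf nitrogen
# accumulation (LNA) modulates the depth of one absorption feature.
bands = np.linspace(400, 900, 120)
lna = rng.uniform(1, 6, 60)                      # g/m2, hypothetical range
feature = np.exp(-((bands - 670) / 30)**2)       # absorption feature shape
refl = 0.5 - 0.05 * np.outer(lna, feature) + 0.002 * rng.normal(size=(60, 120))

# lgP'-style transformation: first difference of the log reflectance.
lgp_prime = np.diff(np.log10(refl), axis=1)

# Multiple linear regression on a few feature bands (indices illustrative).
A = np.column_stack([lgp_prime[:, [60, 64, 68]], np.ones(60)])
coef, *_ = np.linalg.lstsq(A, lna, rcond=None)
r = np.corrcoef(A @ coef, lna)[0, 1]
print(r)   # estimated vs. true LNA correlate strongly
```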

  4. A Computational approach in optimizing process parameters of GTAW for SA 106 Grade B steel pipes using Response surface methodology

    NASA Astrophysics Data System (ADS)

    Sumesh, A.; Sai Ramnadh, L. V.; Manish, P.; Harnath, V.; Lakshman, V.

    2016-09-01

    Welding is one of the most common metal joining techniques used in industry for decades. In the global manufacturing scenario, products must be cost effective, so selecting the right process with optimal parameters helps industry minimize production costs. SA 106 Grade B steel has wide application in automobile chassis structures, boiler tubes and pressure vessel industries. Employing a central composite design, the process parameters for gas tungsten arc welding were optimized. The input parameters chosen were weld current, peak current and frequency. The joint tensile strength was the response considered in this study. Analysis of variance was performed to determine the statistical significance of the parameters, and a regression analysis was performed to determine the effect of the input parameters on the response. From the experiment, the maximum tensile strength obtained was 95 kN, reported for a weld current of 95 A, a frequency of 50 Hz and a peak current of 100 A. With the aim of maximizing the joint strength, a target value of 100 kN was selected in the response optimizer and the regression models were optimized. The output results are achievable with a weld current of 62.6148 A, a frequency of 23.1821 Hz and a peak current of 65.9104 A. Using the dye penetration test, the weld joints were also classified into two categories, good welds and welds with defects. This will also help in obtaining a defect-free joint when welding is performed using the GTAW process.
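    A response-optimizer-style target search, i.e. fit a quadratic surface and then scan the coded design space for settings whose predicted response matches a target, can be sketched like this. The surface coefficients and target are invented, not the fitted GTAW model:

```python
import numpy as np

# Illustrative quadratic response surface for joint tensile strength (kN) in
# coded GTAW factors: c (weld current), f (frequency), p (peak current).
# Coefficients are invented, not the study's fitted values.
def strength(x):
    c, f, p = x
    return 95 + 4*c + 2*f + 3*p - 3*c**2 - 2*f**2 - 2.5*p**2 + 0.5*c*f

# Response-optimizer-style target search: scan the coded design space
# [-1, 1]^3 for settings whose predicted strength is closest to a target.
grid = np.linspace(-1, 1, 21)
target = 96.0
best, best_err = None, np.inf
for c in grid:
    for f in grid:
        for p in grid:
            err = abs(strength((c, f, p)) - target)
            if err < best_err:
                best, best_err = (c, f, p), err
print(best, best_err)   # coded settings achieving the target strength
```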

  5. Socioeconomic differences in adolescent stress: the role of psychological resources.

    PubMed

    Finkelstein, Daniel M; Kubzansky, Laura D; Capitman, John; Goodman, Elizabeth

    2007-02-01

    To investigate whether psychological resources influenced the association between parent education (PE), a marker of socioeconomic status (SES), and perceived stress. Cross-sectional analyses were conducted in a sample of 1167 non-Hispanic black and white junior and senior high school students from a Midwestern public school district in 2002-2003. Hierarchical multivariable regression analyses examined relationships between PE (high school graduate or less = E1, > high school, < college = E2, college graduate = E3, and professional degree = E4), and psychological resources (optimism and coping style) on teens' perceived stress. Greater optimism and adaptive coping were hypothesized to influence (i.e., mediate or moderate) the relationship between higher PE and lower stress. Relative to adolescents from families with a professionally educated parent, adolescents with lower parent education had higher perceived stress (E3 beta = 1.70, p < .01, E2 beta = 1.94, p < .01, E1 beta = 3.19, p < .0001). Both psychological resources were associated with stress: higher optimism (beta = -.58, p < .0001) and engagement coping (beta = -.19, p < .0001) were associated with less stress and higher disengagement coping was associated with more stress (beta = .09, p < .01). Adding optimism to the regression model attenuated the effect of SES by nearly 30%, suggesting that optimism partially mediates the inverse SES-stress relationship. Mediation was confirmed using a Sobel test (p < .01). Adolescents from families with lower parent education are less optimistic than teens from more educated families. This pessimism may be a mechanism through which lower SES increases stress in adolescence.
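    The Sobel test used to confirm mediation is a one-line computation on the mediation path coefficients. The values below are hypothetical, loosely inspired by the reported betas, purely to illustrate the formula:

```python
import math

# Sobel test for mediation: a is the effect of SES on the mediator
# (optimism), b the effect of the mediator on stress controlling for SES;
# sa and sb are their standard errors. All four values are hypothetical.
def sobel_z(a, sa, b, sb):
    return (a * b) / math.sqrt(b**2 * sa**2 + a**2 * sb**2)

z = sobel_z(a=0.90, sa=0.20, b=-0.58, sb=0.08)
print(z)   # |z| > 1.96 indicates significant mediation at p < .05
```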

  6. Linear regression models for solvent accessibility prediction in proteins.

    PubMed

    Wagner, Michael; Adamczak, Rafał; Porollo, Aleksey; Meller, Jarosław

    2005-04-01

    The relative solvent accessibility (RSA) of an amino acid residue in a protein structure is a real number that represents the solvent exposed surface area of this residue in relative terms. The problem of predicting the RSA from the primary amino acid sequence can therefore be cast as a regression problem. Nevertheless, RSA prediction has so far typically been cast as a classification problem. Consequently, various machine learning techniques have been used within the classification framework to predict whether a given amino acid exceeds some (arbitrary) RSA threshold and would thus be predicted to be "exposed," as opposed to "buried." We have recently developed novel methods for RSA prediction using nonlinear regression techniques which provide accurate estimates of the real-valued RSA and outperform classification-based approaches with respect to commonly used two-class projections. However, while their performance seems to provide a significant improvement over previously published approaches, these Neural Network (NN) based methods are computationally expensive to train and involve several thousand parameters. In this work, we develop alternative regression models for RSA prediction which are computationally much less expensive, involve orders-of-magnitude fewer parameters, and are still competitive in terms of prediction quality. In particular, we investigate several regression models for RSA prediction using linear L1-support vector regression (SVR) approaches as well as standard linear least squares (LS) regression. Using rigorously derived validation sets of protein structures and extensive cross-validation analysis, we compare the performance of the SVR with that of LS regression and NN-based methods. In particular, we show that the flexibility of the SVR (as encoded by metaparameters such as the error insensitivity and the error penalization terms) can be very beneficial to optimize the prediction accuracy for buried residues. 
We conclude that the simple and computationally much more efficient linear SVR performs comparably to nonlinear models and thus can be used in order to facilitate further attempts to design more accurate RSA prediction methods, with applications to fold recognition and de novo protein structure prediction methods.
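    A linear epsilon-insensitive SVR can be sketched with a simple subgradient solver and compared against ordinary least squares. This toy solver stands in for the proper L1-SVR optimizer described in the paper, and the data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy RSA-style regression: predict a real-valued target from 10 features.
X = rng.normal(size=(200, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.1 * rng.normal(size=200)

# Ordinary least squares baseline.
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# Linear epsilon-insensitive SVR trained by subgradient descent: residuals
# inside the epsilon tube contribute no gradient (the "insensitivity").
def svr_fit(X, y, eps=0.05, lam=1e-3, lr=0.01, iters=2000):
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        r = X @ w - y
        g = np.where(np.abs(r) > eps, np.sign(r), 0.0)
        w -= lr * (X.T @ g / len(y) + lam * w)
    return w

w_svr = svr_fit(X, y)
print(np.abs(w_svr - w_ls).max())   # the two linear fits largely agree
```

    On well-behaved data the two fits nearly coincide; the SVR's advantage appears when the error distribution is heavy-tailed or asymmetric, as for buried residues.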

  7. Optimal Wavelength Selection on Hyperspectral Data with Fused Lasso for Biomass Estimation of Tropical Rain Forest

    NASA Astrophysics Data System (ADS)

    Takayama, T.; Iwasaki, A.

    2016-06-01

    Above-ground biomass prediction of tropical rain forest using remote sensing data is of paramount importance to continuous large-area forest monitoring. Hyperspectral data can provide rich spectral information for biomass prediction; however, prediction accuracy suffers from the small-sample-size problem, which commonly manifests as overfitting when using high-dimensional data in which the number of training samples is smaller than the dimensionality of the samples, owing to the time, cost and human resources required for field surveys. A common approach to this problem is reducing the dimensionality of the dataset. In addition, acquired hyperspectral data usually have a low signal-to-noise ratio due to narrow bandwidths, and exhibit local or global peak shifts due to instrumental instability or small differences in practical measurement conditions. In this work, we propose a methodology based on fused lasso regression that selects optimal bands for the biomass prediction model by encouraging sparsity and grouping: the sparsity addresses the small-sample-size problem through dimensionality reduction, and the grouping addresses the noise and peak-shift problems. The prediction model provided higher accuracy, with a root-mean-square error (RMSE) of 66.16 t/ha in cross-validation, than other methods: multiple linear regression, partial least squares regression and lasso regression. Furthermore, fusing the spectral information with spatial information derived from a texture index increased the prediction accuracy, with an RMSE of 62.62 t/ha. This analysis demonstrates the efficiency of the fused lasso and image texture in biomass estimation of tropical forests.
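    The grouping effect of a fusion penalty can be illustrated with a closed-form variant that penalizes squared differences of adjacent coefficients; this is a tractable stand-in for the l1 fused-lasso penalty, which requires a dedicated solver, and the data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hyperspectral-style design: 40 samples, 60 bands; the true coefficients
# form one contiguous active group of adjacent bands.
X = rng.normal(size=(40, 60))
b_true = np.zeros(60)
b_true[20:30] = 1.0
y = X @ b_true + 0.1 * rng.normal(size=40)

# Squared-difference fusion penalty (a closed-form stand-in for the l1
# fused-lasso penalty): minimise ||y - Xb||^2 + lam * ||Db||^2 with D the
# first-difference operator, encouraging grouping of adjacent bands.
D = np.eye(60, k=1)[:-1] - np.eye(60)[:-1]
lam = 5.0
b_fused = np.linalg.solve(X.T @ X + lam * (D.T @ D), X.T @ y)
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]   # minimum-norm fit, n < p

def roughness(b):
    return np.sum(np.diff(b)**2)

print(roughness(b_fused), roughness(b_ols))   # fused estimate is smoother
```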

  8. VARIABLE SELECTION FOR REGRESSION MODELS WITH MISSING DATA

    PubMed Central

    Garcia, Ramon I.; Ibrahim, Joseph G.; Zhu, Hongtu

    2009-01-01

    We consider the variable selection problem for a class of statistical models with missing data, including missing covariate and/or response data. We investigate the smoothly clipped absolute deviation penalty (SCAD) and adaptive LASSO and propose a unified model selection and estimation procedure for use in the presence of missing data. We develop a computationally attractive algorithm for simultaneously optimizing the penalized likelihood function and estimating the penalty parameters. Particularly, we propose to use a model selection criterion, called the ICQ statistic, for selecting the penalty parameters. We show that the variable selection procedure based on ICQ automatically and consistently selects the important covariates and leads to efficient estimates with oracle properties. The methodology is very general and can be applied to numerous situations involving missing data, from covariates missing at random in arbitrary regression models to nonignorably missing longitudinal responses and/or covariates. Simulations are given to demonstrate the methodology and examine the finite sample performance of the variable selection procedures. Melanoma data from a cancer clinical trial is presented to illustrate the proposed methodology. PMID:20336190
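    Setting aside the paper's missing-data machinery and ICQ tuning, the adaptive LASSO at the heart of the procedure can be sketched with coordinate descent and data-driven penalty weights; all settings below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(11)

# Complete-data adaptive LASSO sketch: penalty weights are inversely
# proportional to initial least-squares estimates, which yields
# oracle-like selection (noise coefficients are driven exactly to zero).
n, p = 100, 10
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [1.5, -1.0, 0.8]          # only the first three are informative
y = X @ beta + 0.1 * rng.normal(size=n)

b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
w_pen = 1.0 / np.abs(b_ols)          # adaptive penalty weights

def lasso_cd(X, y, lam, w_pen, iters=200):
    """Coordinate descent with per-coefficient soft-thresholding."""
    b = np.zeros(X.shape[1])
    col_ss = (X**2).sum(axis=0)
    for _ in range(iters):
        for j in range(X.shape[1]):
            r = y - X @ b + X[:, j] * b[j]      # partial residual
            rho = X[:, j] @ r
            b[j] = np.sign(rho) * max(abs(rho) - lam * w_pen[j], 0) / col_ss[j]
    return b

b_alasso = lasso_cd(X, y, lam=2.0, w_pen=w_pen)
print(np.round(b_alasso, 3))   # noise coefficients are exactly zero
```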

  9. A new epidemic modeling approach: Multi-regions discrete-time model with travel-blocking vicinity optimal control strategy.

    PubMed

    Zakary, Omar; Rachik, Mostafa; Elmouki, Ilias

    2017-08-01

    First, we devise a multi-region discrete-time model that describes the spatio-temporal spread of an epidemic which starts in one region and enters regions connected with their neighbors by any kind of anthropological movement. We suppose homogeneous Susceptible-Infected-Removed (SIR) populations, and we consider in our simulations a grid of colored cells representing the whole domain affected by the epidemic, where each cell can represent a sub-domain or region. Second, in order to minimize the number of infected individuals in one region, we propose an optimal control approach based on a travel-blocking vicinity strategy that aims to control only one cell by restricting the movements of infected people coming from all neighboring cells. We thus show the influence of the optimal control approach on the controlled cell. We also note that the cellular modeling approach proposed here can describe the infection dynamics of regions that are not necessarily attached to one another, even if no empty space can be seen between cells. The theoretical method we follow for the characterization of the travel-blocking optimal controls is based on a discrete version of Pontryagin's maximum principle, while the numerical approach applied to the resulting multi-point boundary value problems is based on discrete progressive-regressive iterative schemes. We illustrate our modeling and control approaches with an example of 100 regions.
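    A minimal discrete-time, multi-region SIR grid with a travel-blocking control on one cell might look like the sketch below. Parameters and grid size are illustrative, and the control is hard-coded here, whereas the paper derives it from Pontryagin's maximum principle:

```python
import numpy as np

# Discrete-time SIR dynamics on a grid of cells (regions). New infections in
# a cell depend on its own infected fraction plus infected travellers from
# the four neighbouring cells; the travel-blocking control removes the
# neighbour contribution for one protected cell. Parameters are illustrative.
def step(S, I, R, beta=0.4, gamma=0.1, travel=0.1, blocked=None):
    nbr = np.zeros_like(I)
    nbr[1:, :] += I[:-1, :]; nbr[:-1, :] += I[1:, :]
    nbr[:, 1:] += I[:, :-1]; nbr[:, :-1] += I[:, 1:]
    force = beta * (I + travel * nbr)
    if blocked is not None:
        force[blocked] = beta * I[blocked]   # no infected travellers admitted
    new_inf = force * S
    return S - new_inf, I + new_inf - gamma * I, R + gamma * I

S = np.ones((5, 5)); I = np.zeros((5, 5)); R = np.zeros((5, 5))
S[0, 0], I[0, 0] = 0.99, 0.01            # epidemic starts in one corner cell
Sb, Ib, Rb = S.copy(), I.copy(), R.copy()
for _ in range(100):
    S, I, R = step(S, I, R)
    Sb, Ib, Rb = step(Sb, Ib, Rb, blocked=(2, 2))
print(R[2, 2], Rb[2, 2])   # blocking sharply reduces final size in that cell
```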

  10. Retro-regression--another important multivariate regression improvement.

    PubMed

    Randić, M

    2001-01-01

    We review the serious problem associated with instabilities of the coefficients of regression equations, referred to as the MRA (multivariate regression analysis) "nightmare of the first kind". This is manifested when in a stepwise regression a descriptor is included or excluded from a regression. The consequence is an unpredictable change of the coefficients of the descriptors that remain in the regression equation. We follow with consideration of an even more serious problem, referred to as the MRA "nightmare of the second kind", arising when optimal descriptors are selected from a large pool of descriptors. This process typically causes at different steps of the stepwise regression a replacement of several previously used descriptors by new ones. We describe a procedure that resolves these difficulties. The approach is illustrated on boiling points of nonanes which are considered (1) by using an ordered connectivity basis; (2) by using an ordering resulting from application of greedy algorithm; and (3) by using an ordering derived from an exhaustive search for optimal descriptors. A novel variant of multiple regression analysis, called retro-regression (RR), is outlined showing how it resolves the ambiguities associated with both "nightmares" of the first and the second kind of MRA.
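    The "nightmare of the first kind" is easy to reproduce numerically: adding a highly correlated descriptor to a regression shifts the coefficient of the descriptor already in the model. The data below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(8)

# Two highly correlated molecular descriptors and a boiling-point-like
# response: including the second descriptor causes a large shift in the
# first descriptor's coefficient.
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)          # nearly collinear with x1
y = 2.0*x1 + 1.0*x2 + 0.1 * rng.normal(size=n)

A1 = np.column_stack([x1, np.ones(n)])
b1 = np.linalg.lstsq(A1, y, rcond=None)[0]  # model with x1 only

A2 = np.column_stack([x1, x2, np.ones(n)])
b2 = np.linalg.lstsq(A2, y, rcond=None)[0]  # x2 added to the regression

print(b1[0], b2[0])   # the x1 coefficient drops substantially
```

    Orthogonalized descriptor orderings, as in retro-regression, are one way to make such coefficients stable under inclusion or exclusion of descriptors.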

  11. Geodesic regression for image time-series.

    PubMed

    Niethammer, Marc; Huang, Yang; Vialard, François-Xavier

    2011-01-01

    Registration of image-time series has so far been accomplished (i) by concatenating registrations between image pairs, (ii) by solving a joint estimation problem resulting in piecewise geodesic paths between image pairs, (iii) by kernel based local averaging or (iv) by augmenting the joint estimation with additional temporal irregularity penalties. Here, we propose a generative model extending least squares linear regression to the space of images by using a second-order dynamic formulation for image registration. Unlike previous approaches, the formulation allows for a compact representation of an approximation to the full spatio-temporal trajectory through its initial values. The method also opens up possibilities to design image-based approximation algorithms. The resulting optimization problem is solved using an adjoint method.

  12. Application of response surface methodology to optimize pressurized liquid extraction of antioxidant compounds from sage (Salvia officinalis L.), basil (Ocimum basilicum L.) and thyme (Thymus vulgaris L.).

    PubMed

    Hossain, M B; Brunton, N P; Martin-Diana, A B; Barry-Ryan, C

    2010-12-01

    The present study optimized pressurized liquid extraction (PLE) conditions using Dionex ASE® 200, USA to maximize the antioxidant activity [Ferric ion Reducing Antioxidant Power (FRAP)] and total polyphenol content (TP) of the extracts from three spices of Lamiaceae family (sage, basil and thyme). Optimal conditions with regard to extraction temperature (66-129 °C) and solvent concentration (32-88% methanol) were identified using response surface methodology (RSM). For all three spices, results showed that 129 °C was the optimum temperature with regard to antioxidant activity. Optimal methanol concentrations with respect to the antioxidant activity of sage and basil extracts were 58% and 60% respectively. Thyme showed a different trend with regard to methanol concentration and was optimally extracted at 33%. Antioxidant activity yields of the optimal PLE were significantly (p < 0.05) higher than solid/liquid extracts. Predicted models were highly significant (p < 0.05) for both total phenol (TP) and FRAP values in all the spices with high regression coefficients (R(2)) ranging from 0.651 to 0.999.

  13. Downscaling soil moisture over East Asia through multi-sensor data fusion and optimization of regression trees

    NASA Astrophysics Data System (ADS)

    Park, Seonyoung; Im, Jungho; Park, Sumin; Rhee, Jinyoung

    2017-04-01

    Soil moisture is one of the most important keys for understanding regional and global climate systems. Soil moisture is directly related to agricultural processes as well as hydrological processes because soil moisture highly influences vegetation growth and determines water supply in the agroecosystem. Accurate monitoring of the spatiotemporal pattern of soil moisture is therefore important. Soil moisture has generally been provided through in situ measurements at stations. Although field survey from in situ measurements provides accurate soil moisture with high temporal resolution, it is costly and does not provide the spatial distribution of soil moisture over large areas. Microwave satellite-based approaches (e.g., the Advanced Microwave Scanning Radiometer 2 (AMSR2), the Advanced Scatterometer (ASCAT), and Soil Moisture Active Passive (SMAP)) and numerical models such as the Global Land Data Assimilation System (GLDAS) and the Modern-Era Retrospective Analysis for Research and Applications (MERRA) provide spatiotemporally continuous soil moisture products at global scale. However, since these global soil moisture products have coarse spatial resolution (25-40 km), their applications for agriculture and water resources at local and regional scales are very limited. Thus, soil moisture downscaling is needed to overcome the limitation of the spatial resolution of soil moisture products. In this study, GLDAS soil moisture data were downscaled to 1 km spatial resolution through the integration of AMSR2 and ASCAT soil moisture data, the Shuttle Radar Topography Mission (SRTM) Digital Elevation Model (DEM), and Moderate Resolution Imaging Spectroradiometer (MODIS) data—Land Surface Temperature, Normalized Difference Vegetation Index, and Land Cover—using modified regression trees over East Asia from 2013 to 2015. Modified regression trees were implemented using Cubist, a commercial software tool based on machine learning.
An optimization based on pruning of the rules derived from the modified regression trees was conducted. Root mean square error (RMSE) and correlation coefficients (r) were used to optimize the rules, and finally 59 rules from the modified regression trees were produced. The results show a high validation r (0.79) and a low validation RMSE (0.0556 m3/m3). The 1 km downscaled soil moisture was evaluated using ground soil moisture data at 14 stations, and the two soil moisture datasets showed similar temporal patterns (average r=0.51 and average RMSE=0.041). The spatial distribution of the 1 km downscaled soil moisture corresponded well with the GLDAS soil moisture, capturing both extremely dry and wet regions. The correlation between GLDAS and the 1 km downscaled soil moisture during the growing season was positive (mean r=0.35) in most regions.
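    The regression-tree idea behind the downscaling can be sketched with a single-split regression stump relating soil moisture to a fine-scale predictor; this is a severe simplification of Cubist's rule-based model trees, and the data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(9)

# Toy downscaling relation: soil moisture (sm) driven by 1 km predictors
# NDVI and land surface temperature (LST), plus noise. All values invented.
ndvi = rng.uniform(0, 1, 300)
lst = rng.uniform(280, 320, 300)
sm = 0.1 + 0.2*ndvi - 0.002*(lst - 280) + 0.01*rng.normal(size=300)

def best_stump(x, y):
    """Exhaustive search for the split minimising within-node variance."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best = (np.inf, None)
    for i in range(10, len(xs) - 10):
        sse = ys[:i].var()*i + ys[i:].var()*(len(ys) - i)
        if sse < best[0]:
            best = (sse, xs[i])
    return best[1]

split = best_stump(ndvi, sm)
left, right = sm[ndvi < split].mean(), sm[ndvi >= split].mean()
pred = np.where(ndvi < split, left, right)
rmse = np.sqrt(np.mean((pred - sm)**2))
print(split, rmse)   # even one split beats the global mean predictor
```

    A full tree recursively repeats this split search in each node, and Cubist additionally fits a linear model within each rule.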

  14. Development of brief versions of the Wechsler Intelligence Scale for schizophrenia: considerations of the structure and predictability of intelligence.

    PubMed

    Sumiyoshi, Chika; Uetsuki, Miki; Suga, Motomu; Kasai, Kiyoto; Sumiyoshi, Tomiki

    2013-12-30

    Short forms (SFs) of the Wechsler Intelligence Scale have been developed to enhance its practicality. However, only a few studies have addressed Wechsler Adult Intelligence Scale-Revised (WAIS-R) SFs based on data from patients with schizophrenia. The current study was conducted to develop WAIS-R SFs for these patients based on the intelligence structure and the predictability of the Full IQ (FIQ). Relations to demographic and clinical variables were also examined when selecting plausible subtests. The WAIS-R was administered to 90 Japanese patients with schizophrenia. Exploratory factor analysis (EFA) and multiple regression analysis were conducted to find potential subtests. EFA extracted two dominant factors corresponding to the Verbal IQ and Performance IQ measures. Subtests with higher factor loadings on those factors were initially nominated. Regression analysis was carried out to reach the model containing all the nominated subtests. The optimality of the potential subtests included in that model was evaluated from the perspectives of representativeness of the intelligence structure, FIQ predictability, and relations with demographic and clinical variables. Taken together, the dyad of Vocabulary and Block Design was considered the optimal WAIS-R SF for patients with schizophrenia, reflecting both intelligence structure and FIQ predictability. © 2013 Elsevier Ireland Ltd. All rights reserved.

  15. Groundwater depth prediction in a shallow aquifer in north China by a quantile regression model

    NASA Astrophysics Data System (ADS)

    Li, Fawen; Wei, Wan; Zhao, Yong; Qiao, Jiale

    2017-01-01

    There is a close relationship between the groundwater level in a shallow aquifer and the surface ecological environment; hence, it is important to accurately simulate and predict the groundwater level in eco-environmental construction projects. The multiple linear regression (MLR) model is one of the most widely used methods to predict groundwater level (depth); however, the values predicted by this model only reflect the mean distribution of the observations and cannot effectively fit extreme-distribution data (outliers). The study reported here builds a prediction model of groundwater-depth dynamics in a shallow aquifer using the quantile regression (QR) method on the basis of observed groundwater depth and related factors. The proposed approach was applied to five sites in Tianjin city, north China, and the groundwater depth was calculated at different quantiles, from which the optimal quantile was screened out according to the box plot method and compared to the values predicted by the MLR model. The results showed that the related factors at the five sites did not follow the standard normal distribution and that there were outliers in the precipitation and last-month (initial state) groundwater-depth factors; hence, the basic assumptions of the MLR model were not satisfied, causing errors. These conditions had no effect on the QR model, which could more effectively describe the distribution of the original data and fitted the outliers with higher precision.
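
    The contrast the abstract draws between MLR (which fits the conditional mean) and quantile regression can be seen in the pinball (check) loss: over a sample, the constant that minimizes it is exactly the tau-th empirical quantile, whereas squared error recovers the mean. A minimal numpy illustration on toy data:

    ```python
    import numpy as np

    # Pinball (check) loss of quantile regression.
    def pinball(y, q, tau):
        r = y - q
        return float(np.mean(np.maximum(tau * r, (tau - 1.0) * r)))

    y = np.arange(0.0, 101.0)      # toy "groundwater depth" observations
    tau = 0.9

    # Search candidate constants among the observed values; the minimizer is
    # the empirical 0.9-quantile of the sample.
    losses = [pinball(y, q, tau) for q in y]
    q_hat = float(y[int(np.argmin(losses))])
    ```

    Replacing the constant with a linear function of the predictors and minimizing the same loss gives linear quantile regression, which is what makes the QR model insensitive to the outliers that violate the MLR assumptions.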

  16. Oil Formation Volume Factor Determination Through a Fused Intelligence

    NASA Astrophysics Data System (ADS)

    Gholami, Amin

    2016-12-01

    The change in oil volume between reservoir conditions and standard surface conditions is called the oil formation volume factor (FVF), which is time-, cost-, and labor-intensive to determine experimentally. This study proposes an accurate, rapid, and cost-effective approach for determining FVF from reservoir temperature, dissolved gas-oil ratio, and the specific gravities of oil and dissolved gas. First, the structural risk minimization (SRM) principle of support vector regression (SVR) was employed to construct a robust model for estimating FVF from the aforementioned inputs. Subsequently, alternating conditional expectation (ACE) was used to approximate the optimal transformations of the input/output data into more highly correlated forms, and a sophisticated model was developed between the transformed data. Eventually, a committee machine combining the SVR and ACE models was constructed through a hybrid genetic algorithm-pattern search (GA-PS). The committee machine integrates the ACE and SVR models in an optimal linear combination that benefits from both methods. A group of 342 data points was used for model development, and a group of 219 data points was used for blind testing of the constructed model. Results indicated that the committee machine performed better than the individual models.
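
    The committee-machine idea, combining the two experts in an optimal linear combination, can be sketched as follows. The paper optimizes the combination with a hybrid GA-pattern search; for a purely linear combination the same optimum is available in closed form via least squares, which the sketch uses instead. Both "expert" predictions are synthetic stand-ins.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    # Synthetic FVF-like target with two imperfect "expert" predictors standing
    # in for the SVR and ACE models (both purely illustrative).
    y = rng.uniform(1.0, 2.0, size=300)
    p_svr = y + rng.normal(0.0, 0.05, size=300)               # small noise
    p_ace = 0.9 * y + 0.1 + rng.normal(0.0, 0.08, size=300)   # biased, noisier

    # Committee machine: optimal linear combination w1*p1 + w2*p2 + b,
    # solved in closed form by ordinary least squares.
    A = np.column_stack([p_svr, p_ace, np.ones_like(y)])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    p_committee = A @ w

    def mse(p):
        return float(np.mean((p - y) ** 2))

    mse_svr, mse_ace, mse_comm = mse(p_svr), mse(p_ace), mse(p_committee)
    ```

    Because each individual expert is itself a point in the search space (for example w = (1, 0, 0) recovers the first expert), the optimized committee can never do worse than either member on the fitting data.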

  17. Effects of homogenization process parameters on physicochemical properties of astaxanthin nanodispersions prepared using a solvent-diffusion technique

    PubMed Central

    Anarjan, Navideh; Jafarizadeh-Malmiri, Hoda; Nehdi, Imededdine Arbi; Sbihi, Hassen Mohamed; Al-Resayes, Saud Ibrahim; Tan, Chin Ping

    2015-01-01

    Nanodispersion systems allow incorporation of lipophilic bioactives, such as astaxanthin (a fat soluble carotenoid) into aqueous systems, which can improve their solubility, bioavailability, and stability, and widen their uses in water-based pharmaceutical and food products. In this study, response surface methodology was used to investigate the influences of homogenization time (0.5–20 minutes) and speed (1,000–9,000 rpm) in the formation of astaxanthin nanodispersions via the solvent-diffusion process. The product was characterized for particle size and astaxanthin concentration using laser diffraction particle size analysis and high performance liquid chromatography, respectively. Relatively high determination coefficients (ranging from 0.896 to 0.969) were obtained for all suggested polynomial regression models. The overall optimal homogenization conditions were determined by multiple response optimization analysis to be 6,000 rpm for 7 minutes. In vitro cellular uptake of astaxanthin from the suggested individual and multiple optimized astaxanthin nanodispersions was also evaluated. The cellular uptake of astaxanthin was found to be considerably increased (by more than five times) as it became incorporated into optimum nanodispersion systems. The lack of a significant difference between predicted and experimental values confirms the suitability of the regression equations connecting the response variables studied to the independent parameters. PMID:25709435
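
    The response-surface workflow, fitting a second-order polynomial to the design points and then locating its stationary point, can be sketched as below. The coefficients and the optimum near 7 minutes and 6,000 rpm are taken as hypothetical ground truth here, not the paper's fitted surface.

    ```python
    import numpy as np

    # Design grid over homogenization time (minutes) and speed (thousand rpm).
    t = np.linspace(0.5, 20.0, 15)
    s = np.linspace(1.0, 9.0, 15)
    T, S = np.meshgrid(t, s)

    # Hypothetical quadratic response with a known optimum at 7 min, 6 krpm.
    Z = 100.0 - 0.8 * (T - 7.0) ** 2 - 2.0 * (S - 6.0) ** 2

    # Fit the standard second-order RSM polynomial by least squares.
    x1, x2, z = T.ravel(), S.ravel(), Z.ravel()
    A = np.column_stack([np.ones_like(x1), x1, x2, x1 ** 2, x2 ** 2, x1 * x2])
    b, *_ = np.linalg.lstsq(A, z, rcond=None)

    # Stationary point: solve grad = 0, i.e. [2*b3, b5; b5, 2*b4] x = -[b1, b2].
    H = np.array([[2 * b[3], b[5]], [b[5], 2 * b[4]]])
    opt = np.linalg.solve(H, -b[1:3])
    ```

    With experimental data the fitted coefficients carry noise, and the determination coefficients quoted in the abstract (0.896 to 0.969) quantify how much of the measured response the polynomial actually explains.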

  18. Optimal design and experimental analyses of a new micro-vibration control payload-platform

    NASA Astrophysics Data System (ADS)

    Sun, Xiaoqing; Yang, Bintang; Zhao, Long; Sun, Xiaofen

    2016-07-01

    This paper presents a new payload-platform for precision devices that is capable of isolating complex space micro-vibrations in the low-frequency range below 5 Hz. The novel payload-platform, equipped with smart-material actuators, is designed through an optimization strategy based on the minimum energy loss rate, with the aim of achieving high drive efficiency and reducing the effect of magnetic-circuit nonlinearity. The dynamic model of the driving element is then established using the Lagrange method, and the performance of the designed payload-platform is further examined by combining a controlled autoregressive moving average (CARMA) model with a modified generalized predictive control (MGPC) algorithm. Finally, an experimental prototype is developed and tested. The experimental results demonstrate that the payload-platform has impressive potential for micro-vibration isolation.

  19. Correcting for Blood Arrival Time in Global Mean Regression Enhances Functional Connectivity Analysis of Resting State fMRI-BOLD Signals.

    PubMed

    Erdoğan, Sinem B; Tong, Yunjie; Hocke, Lia M; Lindsey, Kimberly P; deB Frederick, Blaise

    2016-01-01

    Resting state functional connectivity analysis is a widely used method for mapping intrinsic functional organization of the brain. Global signal regression (GSR) is commonly employed for removing systemic global variance from resting state BOLD-fMRI data; however, recent studies have demonstrated that GSR may introduce spurious negative correlations within and between functional networks, calling into question the meaning of anticorrelations reported between some networks. In the present study, we propose that global signal from resting state fMRI is composed primarily of systemic low frequency oscillations (sLFOs) that propagate with cerebral blood circulation throughout the brain. We introduce a novel systemic noise removal strategy for resting state fMRI data, "dynamic global signal regression" (dGSR), which applies a voxel-specific optimal time delay to the global signal prior to regression from voxel-wise time series. We test our hypothesis on two functional systems that are suggested to be intrinsically organized into anticorrelated networks: the default mode network (DMN) and task positive network (TPN). We evaluate the efficacy of dGSR and compare its performance with the conventional "static" global regression (sGSR) method in terms of (i) explaining systemic variance in the data and (ii) enhancing specificity and sensitivity of functional connectivity measures. dGSR increases the amount of BOLD signal variance being modeled and removed relative to sGSR while reducing spurious negative correlations introduced in reference regions by sGSR, and attenuating inflated positive connectivity measures. We conclude that incorporating time delay information for sLFOs into global noise removal strategies is of crucial importance for optimal noise removal from resting state functional connectivity maps.
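
    The core of dGSR, finding a voxel-specific delay of the global signal before regressing it out, can be sketched on synthetic data. The signal shapes, the 3-sample lag, and the noise level below are all illustrative, not derived from the study's fMRI data.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)

    # Synthetic "global signal": a slow systemic oscillation; the voxel sees the
    # same oscillation arriving 3 samples late, plus local noise.
    n = 500
    gs = np.sin(2 * np.pi * np.arange(n) / 20.0)
    true_lag = 3
    voxel = 0.8 * np.roll(gs, true_lag) + 0.05 * rng.normal(size=n)

    # dGSR sketch: pick the voxel-specific delay maximizing correlation with
    # the global signal, then regress out the delayed copy.
    lags = np.arange(0, 11)
    corrs = [np.corrcoef(np.roll(gs, int(L)), voxel)[0, 1] for L in lags]
    best_lag = int(lags[int(np.argmax(corrs))])

    A = np.column_stack([np.roll(gs, best_lag), np.ones(n)])
    beta, *_ = np.linalg.lstsq(A, voxel, rcond=None)
    resid_dynamic = voxel - A @ beta

    # Static GSR for comparison: zero-lag regression of the same global signal.
    A0 = np.column_stack([gs, np.ones(n)])
    beta0, *_ = np.linalg.lstsq(A0, voxel, rcond=None)
    resid_static = voxel - A0 @ beta0

    var_dyn, var_stat = float(np.var(resid_dynamic)), float(np.var(resid_static))
    ```

    The delayed regressor removes more of the systemic variance than the zero-lag one, which mirrors the paper's finding that dGSR models and removes more BOLD variance than sGSR.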

  20. A hybrid artificial neural network and particle swarm optimization for prediction of removal of hazardous dye brilliant green from aqueous solution using zinc sulfide nanoparticle loaded on activated carbon.

    PubMed

    Ghaedi, M; Ansari, A; Bahari, F; Ghaedi, A M; Vafaei, A

    2015-02-25

    In the present study, zinc sulfide nanoparticles loaded on activated carbon (ZnS-NP-AC) were synthesized in the presence of ultrasound and characterized using techniques such as SEM and BET analysis. This material was then used for brilliant green (BG) removal. The dependency of the BG removal percentage on various parameters, including pH, adsorbent dosage, initial dye concentration, and contact time, was examined and optimized. The mechanism and rate of adsorption were ascertained by fitting experimental data at various times to conventional kinetic models such as the pseudo-first-order, pseudo-second-order, Elovich, and intra-particle diffusion models. Comparison according to general criteria such as the relative error in adsorption capacity and the correlation coefficient confirms the suitability of the pseudo-second-order kinetic model for explaining the data. The Langmuir model efficiently explains the behavior of the adsorption system and gives full information about the interaction of BG with ZnS-NP-AC. A multiple linear regression (MLR) model and a hybrid artificial neural network and particle swarm optimization (ANN-PSO) model were used for prediction of brilliant green adsorption onto ZnS-NP-AC. Comparison of the results obtained using the offered models confirms the higher ability of the ANN-PSO model compared with the MLR model for prediction of BG adsorption onto ZnS-NP-AC. Using the optimal ANN-PSO model, the coefficients of determination (R(2)) were 0.9610 and 0.9506, and the mean squared error (MSE) values were 0.0020 and 0.0022 for the training and testing data sets, respectively. Copyright © 2014 Elsevier B.V. All rights reserved.
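
    A minimal particle swarm optimizer in the spirit of the ANN-PSO hybrid can be sketched as follows. To keep it short, the swarm fits the weights of a linear removal model rather than a neural network, and all adsorption data (dosage, concentration, coefficients) are synthetic.

    ```python
    import numpy as np

    rng = np.random.default_rng(3)

    # Synthetic adsorption data: removal fraction as a linear function of
    # adsorbent dosage and (normalized) dye concentration; numbers illustrative.
    n = 200
    dose = rng.uniform(0.1, 1.0, n)
    conc = rng.uniform(0.1, 1.0, n)        # concentration rescaled to [0, 1]
    removal = 0.4 + 0.5 * dose - 0.2 * conc

    def loss(w):
        pred = w[0] + w[1] * dose + w[2] * conc
        return float(np.mean((pred - removal) ** 2))

    # Global-best particle swarm optimization of the model weights.
    n_p, iters = 40, 300
    pos = rng.uniform(-1.0, 1.0, size=(n_p, 3))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([loss(p) for p in pos])
    g = pbest[np.argmin(pbest_val)].copy()

    for _ in range(iters):
        r1, r2 = rng.random((n_p, 3)), rng.random((n_p, 3))
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (g - pos)
        pos = pos + vel
        vals = np.array([loss(p) for p in pos])
        better = vals < pbest_val
        pbest[better] = pos[better]
        pbest_val[better] = vals[better]
        g = pbest[np.argmin(pbest_val)].copy()

    best_mse = loss(g)
    ```

    In the hybrid of the paper, the same swarm update drives the weights of a neural network instead of a linear model; the gradient-free search is what lets PSO escape poor local minima that plain backpropagation can get stuck in.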

  1. A hybrid artificial neural network and particle swarm optimization for prediction of removal of hazardous dye brilliant green from aqueous solution using zinc sulfide nanoparticle loaded on activated carbon

    NASA Astrophysics Data System (ADS)

    Ghaedi, M.; Ansari, A.; Bahari, F.; Ghaedi, A. M.; Vafaei, A.

    2015-02-01

    In the present study, zinc sulfide nanoparticles loaded on activated carbon (ZnS-NP-AC) were synthesized in the presence of ultrasound and characterized using techniques such as SEM and BET analysis. This material was then used for brilliant green (BG) removal. The dependency of the BG removal percentage on various parameters, including pH, adsorbent dosage, initial dye concentration, and contact time, was examined and optimized. The mechanism and rate of adsorption were ascertained by fitting experimental data at various times to conventional kinetic models such as the pseudo-first-order, pseudo-second-order, Elovich, and intra-particle diffusion models. Comparison according to general criteria such as the relative error in adsorption capacity and the correlation coefficient confirms the suitability of the pseudo-second-order kinetic model for explaining the data. The Langmuir model efficiently explains the behavior of the adsorption system and gives full information about the interaction of BG with ZnS-NP-AC. A multiple linear regression (MLR) model and a hybrid artificial neural network and particle swarm optimization (ANN-PSO) model were used for prediction of brilliant green adsorption onto ZnS-NP-AC. Comparison of the results obtained using the offered models confirms the higher ability of the ANN-PSO model compared with the MLR model for prediction of BG adsorption onto ZnS-NP-AC. Using the optimal ANN-PSO model, the coefficients of determination (R2) were 0.9610 and 0.9506, and the mean squared error (MSE) values were 0.0020 and 0.0022 for the training and testing data sets, respectively.

  2. Development of Ensemble Model Based Water Demand Forecasting Model

    NASA Astrophysics Data System (ADS)

    Kwon, Hyun-Han; So, Byung-Jin; Kim, Seong-Hyeon; Kim, Byung-Seop

    2014-05-01

    The Smart Water Grid (SWG) concept has emerged globally over the last decade and has gained significant recognition in South Korea. In particular, there has been growing interest in water demand forecasting and optimal pump operation, which has led to various studies on energy saving and improvement of water supply reliability. Existing water demand forecasting models fall into two groups in terms of how they model and predict behavior in time series. One considers embedded patterns such as seasonality, periodicity, and trends; the other is the autoregressive model, which uses short-memory Markovian processes (Emmanuel et al., 2012). The main disadvantage of the abovementioned models is that their predictability of water demand is limited at about the sub-daily scale because the system is nonlinear. In this regard, this study aims to develop a nonlinear ensemble model for hourly water demand forecasting which allows us to estimate uncertainties across different model classes. The proposed model consists of two parts. One is a multi-model scheme based on a combination of independent prediction models. The other is a cross-validation scheme, the bagging approach introduced by Breiman (1996), used to derive weighting factors for the individual models. The individual forecasting models used in this study are a linear regression model, polynomial regression, multivariate adaptive regression splines (MARS), and a support vector machine (SVM). The concepts are demonstrated through application to data observed from water plants at several locations in South Korea. Keywords: water demand, nonlinear model, ensemble forecasting model, uncertainty. Acknowledgements: This work is supported by the Korea Ministry of Environment under the "Projects for Developing Eco-Innovation Technologies" (GT-11-G-02-001-6).
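
    The two-part scheme, independent forecasters plus validation-derived weights, can be sketched as follows. The snippet weights each model by its inverse validation MSE, a simplification of the Breiman-style bagging weights used in the study; the hourly-demand series and both forecasters are synthetic.

    ```python
    import numpy as np

    rng = np.random.default_rng(4)

    # Synthetic hourly demand: trend + daily cycle + noise (illustrative only).
    t = np.arange(0.0, 24 * 14)               # two weeks of hourly steps
    y = 100 + 0.05 * t + 10 * np.sin(2 * np.pi * t / 24) + rng.normal(0.0, 1.0, t.size)

    split = 24 * 10                           # first ten days for fitting
    t_tr, y_tr, t_te, y_te = t[:split], y[:split], t[split:], y[split:]

    def fit_predict(design):
        A_tr, A_te = design(t_tr), design(t_te)
        c, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
        return A_te @ c

    # Two independent forecasters: linear trend, and trend + daily harmonic.
    designs = [
        lambda x: np.column_stack([np.ones_like(x), x]),
        lambda x: np.column_stack([np.ones_like(x), x,
                                   np.sin(2 * np.pi * x / 24),
                                   np.cos(2 * np.pi * x / 24)]),
    ]
    preds = [fit_predict(d) for d in designs]
    errs = np.array([float(np.mean((p - y_te) ** 2)) for p in preds])

    # Weight each model by its inverse validation error (weights sum to one).
    w = (1.0 / errs) / np.sum(1.0 / errs)
    ensemble = sum(wi * p for wi, p in zip(w, preds))
    mse_ens = float(np.mean((ensemble - y_te) ** 2))
    ```

    By convexity of the squared error, the weighted ensemble can never be worse than the worst member, and the inverse-error weights shift most of the mass onto the forecaster that captures the daily cycle.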

  3. Adaptive Modeling Procedure Selection by Data Perturbation.

    PubMed

    Zhang, Yongli; Shen, Xiaotong

    2015-10-01

    Many procedures have been developed to deal with the high-dimensional problem that is emerging in various business and economics areas. To evaluate and compare these procedures, modeling uncertainty caused by model selection and parameter estimation has to be assessed and integrated into a modeling process. To do this, a data perturbation method estimates the modeling uncertainty inherited in a selection process by perturbing the data. Critical to data perturbation is the size of perturbation, as the perturbed data should resemble the original dataset. To account for the modeling uncertainty, we derive the optimal size of perturbation, which adapts to the data, the model space, and other relevant factors in the context of linear regression. On this basis, we develop an adaptive data-perturbation method that, unlike its nonadaptive counterpart, performs well in different situations. This leads to a data-adaptive model selection method. Both theoretical and numerical analysis suggest that the data-adaptive model selection method adapts to distinct situations in that it yields consistent model selection and optimal prediction, without knowing which situation exists a priori. The proposed method is applied to real data from the commodity market and outperforms its competitors in terms of price forecasting accuracy.
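
    The data-perturbation idea can be illustrated on ordinary least squares, where the perturbation-based estimate of model complexity (how strongly the fitted values track an injected perturbation of size tau) should recover the number of parameters. A minimal sketch under that simplified linear-regression setting:

    ```python
    import numpy as np

    rng = np.random.default_rng(5)

    # Ordinary least squares with p = 3 coefficients.
    n, p = 60, 3
    X = rng.normal(size=(n, p))
    beta = np.array([1.0, -2.0, 0.5])
    y = X @ beta + rng.normal(size=n)

    def fit_predict(yy):
        c, *_ = np.linalg.lstsq(X, yy, rcond=None)
        return X @ c

    # Data perturbation: inject noise of size tau, refit, and measure how
    # strongly the fitted values track the perturbation.  For OLS the resulting
    # complexity estimate (generalized degrees of freedom) should recover p.
    tau, reps = 0.1, 400
    base = fit_predict(y)
    acc = 0.0
    for _ in range(reps):
        delta = tau * rng.normal(size=n)
        acc += float(delta @ (fit_predict(y + delta) - base))
    df_hat = acc / (reps * tau ** 2)
    ```

    The paper's contribution is choosing tau adaptively, large enough for a stable estimate yet small enough that the perturbed data still resemble the original sample, so that the same device can score competing model-selection procedures.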

  4. Application of nonlinear-regression methods to a ground-water flow model of the Albuquerque Basin, New Mexico

    USGS Publications Warehouse

    Tiedeman, C.R.; Kernodle, J.M.; McAda, D.P.

    1998-01-01

    This report documents the application of nonlinear-regression methods to a numerical model of ground-water flow in the Albuquerque Basin, New Mexico. In the Albuquerque Basin, ground water is the primary source for most water uses. Ground-water withdrawal has steadily increased since the 1940's, resulting in large declines in water levels in the Albuquerque area. A ground-water flow model was developed in 1994 and revised and updated in 1995 for the purpose of managing basin ground-water resources. In the work presented here, nonlinear-regression methods were applied to a modified version of the previous flow model. Goals of this work were to use regression methods to calibrate the model with each of six different configurations of the basin subsurface and to assess and compare optimal parameter estimates, model fit, and model error among the resulting calibrations. The Albuquerque Basin is one in a series of north trending structural basins within the Rio Grande Rift, a region of Cenozoic crustal extension. Mountains, uplifts, and fault zones bound the basin, and rock units within the basin include pre-Santa Fe Group deposits, Tertiary Santa Fe Group basin fill, and post-Santa Fe Group volcanics and sediments. The Santa Fe Group is greater than 14,000 feet (ft) thick in the central part of the basin. During deposition of the Santa Fe Group, crustal extension resulted in development of north trending normal faults with vertical displacements of as much as 30,000 ft. Ground-water flow in the Albuquerque Basin occurs primarily in the Santa Fe Group and post-Santa Fe Group deposits. Water flows between the ground-water system and surface-water bodies in the inner valley of the basin, where the Rio Grande, a network of interconnected canals and drains, and Cochiti Reservoir are located. 
Recharge to the ground-water flow system occurs as infiltration of precipitation along mountain fronts and infiltration of stream water along tributaries to the Rio Grande; subsurface flow from adjacent regions; irrigation and septic field seepage; and leakage through the Rio Grande, canal, and Cochiti Reservoir beds. Ground water is discharged from the basin by withdrawal; evapotranspiration; subsurface flow; and flow to the Rio Grande, canals, and drains. The transient, three-dimensional numerical model of ground-water flow to which nonlinear-regression methods were applied simulates flow in the Albuquerque Basin from 1900 to March 1995. Six different basin subsurface configurations are considered in the model. These configurations are designed to test the effects of (1) varying the simulated basin thickness, (2) including a hypothesized hydrogeologic unit with large hydraulic conductivity in the western part of the basin (the west basin high-K zone), and (3) substantially lowering the simulated hydraulic conductivity of a fault in the western part of the basin (the low-K fault zone). The model with each of the subsurface configurations was calibrated using a nonlinear least- squares regression technique. The calibration data set includes 802 hydraulic-head measurements that provide broad spatial and temporal coverage of basin conditions, and one measurement of net flow from the Rio Grande and drains to the ground-water system in the Albuquerque area. Data are weighted on the basis of estimates of the standard deviations of measurement errors. The 10 to 12 parameters to which the calibration data as a whole are generally most sensitive were estimated by nonlinear regression, whereas the remaining model parameter values were specified. 
Results of model calibration indicate that the optimal parameter estimates as a whole are most reasonable in calibrations of the model with configurations 3 (which contains 1,600-ft-thick basin deposits and the west basin high-K zone), 4 (which contains 5,000-ft-thick basin de
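
    The weighted nonlinear-regression machinery used for calibration (residuals weighted by the inverse of the measurement-error standard deviation) can be sketched with SciPy on a toy head-decline model; the model form, parameters, and error structure below are hypothetical, not those of the Albuquerque Basin model.

    ```python
    import numpy as np
    from scipy.optimize import least_squares

    rng = np.random.default_rng(6)

    # Toy calibration target: head decline h(t) = a * (1 - exp(-t / b)); a and b
    # stand in for the hydraulic parameters estimated by regression.
    t_obs = np.linspace(1.0, 50.0, 25)
    a_true, b_true = 12.0, 15.0
    sd = 0.2 + 0.02 * t_obs                    # assumed measurement-error std devs
    h_obs = a_true * (1 - np.exp(-t_obs / b_true)) + sd * rng.normal(size=t_obs.size)

    def weighted_residuals(theta):
        a, b = theta
        # Dividing by sd weights each observation by the inverse of its
        # measurement-error standard deviation, as in the report.
        return (a * (1 - np.exp(-t_obs / b)) - h_obs) / sd

    fit = least_squares(weighted_residuals, x0=[5.0, 5.0])
    a_hat, b_hat = fit.x
    ```

    In the actual calibration the same weighted least-squares criterion is minimized over the 10 to 12 most sensitive model parameters, with the 802 head observations and one flow observation entering through their estimated error standard deviations.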

  5. Nursing workload, patient safety incidents and mortality: an observational study from Finland

    PubMed Central

    Kinnunen, Marina; Saarela, Jan

    2018-01-01

    Objective To investigate whether the daily workload per nurse (Oulu Patient Classification (OPCq)/nurse) as measured by the RAFAELA system correlates with different types of patient safety incidents and with patient mortality, and to compare the results with regressions based on the standard patients/nurse measure. Setting We obtained data from 36 units from four Finnish hospitals. One was a tertiary acute care hospital, and the three others were secondary acute care hospitals. Participants Patients’ nursing intensity (249 123 classifications), nursing resources, patient safety incidents and patient mortality were collected on a daily basis during 1 year, corresponding to 12 475 data points. Associations between OPC/nurse and patient safety incidents or mortality were estimated using unadjusted logistic regression models, and models that adjusted for ward-specific effects, and effects of day of the week, holiday and season. Primary and secondary outcome measures Main outcome measures were patient safety incidents and death of a patient. Results When OPC/nurse was above the assumed optimal level, the adjusted odds for a patient safety incident were 1.24 (95% CI 1.08 to 1.42) that of the assumed optimal level, and 0.79 (95% CI 0.67 to 0.93) if it was below the assumed optimal level. Corresponding estimates for patient mortality were 1.43 (95% CI 1.18 to 1.73) and 0.78 (95% CI 0.60 to 1.00), respectively. As compared with the patients/nurse classification, models estimated on basis of the RAFAELA classification system generally provided larger effect sizes, greater statistical power and better model fit, although the difference was not very large. Net benefits as calculated on the basis of decision analysis did not provide any clear evidence on which measure to prefer. Conclusions We have demonstrated an association between daily workload per nurse and patient safety incidents and mortality. Current findings need to be replicated by future studies. PMID:29691240

  6. The value of a statistical life: a meta-analysis with a mixed effects regression model.

    PubMed

    Bellavance, François; Dionne, Georges; Lebeau, Martin

    2009-03-01

    The value of a statistical life (VSL) is a very controversial topic, but one which is essential to the optimization of governmental decisions. We see a great variability in the values obtained from different studies. The source of this variability needs to be understood, in order to offer public decision-makers better guidance in choosing a value and to set clearer guidelines for future research on the topic. This article presents a meta-analysis based on 39 observations obtained from 37 studies (from nine different countries) which all use a hedonic wage method to calculate the VSL. Our meta-analysis is innovative in that it is the first to use the mixed effects regression model [Raudenbush, S.W., 1994. Random effects models. In: Cooper, H., Hedges, L.V. (Eds.), The Handbook of Research Synthesis. Russel Sage Foundation, New York] to analyze studies on the value of a statistical life. We conclude that the variability found in the values studied stems in large part from differences in methodologies.

  7. Measuring the Impact of Financial Intermediation: Linking Contract Theory to Econometric Policy Evaluation *

    PubMed Central

    Townsend, Robert M.; Urzua, Sergio S.

    2010-01-01

    We study the impact that financial intermediation can have on productivity through the alleviation of credit constraints in occupation choice and/or an improved allocation of risk, using both static and dynamic structural models as well as reduced form OLS and IV regressions. Our goal in this paper is to bring these two strands of the literature together. Even though, under certain assumptions, IV regressions can recover accurately the true model-generated local average treatment effect, these are quantitatively different, in order of magnitude and even sign, from other policy impact parameters (e.g., ATE and TT). We also show that laying out clearly alternative models can guide the search for instruments. On the other hand adding more margins of decision, i.e., occupation choice and intermediation jointly, or adding more periods with promised utilities as key state variables, as in optimal multi-period contracts, can cause the misinterpretation of IV as the causal effect of interest. PMID:20436953

  8. Generalized Scalar-on-Image Regression Models via Total Variation.

    PubMed

    Wang, Xiao; Zhu, Hongtu

    2017-01-01

    The use of imaging markers to predict clinical outcomes can have a great impact on public health. The aim of this paper is to develop a class of generalized scalar-on-image regression models via total variation (GSIRM-TV), in the sense of generalized linear models, for a scalar response and an imaging predictor in the presence of scalar covariates. A key novelty of GSIRM-TV is the assumption that the slope function (or image) of GSIRM-TV belongs to the space of bounded total variation, in order to explicitly account for the piecewise smooth nature of most imaging data. We develop an efficient penalized total variation optimization to estimate the unknown slope function and other parameters. We also establish nonasymptotic error bounds on the excess risk. These bounds are explicitly specified in terms of sample size, image size, and image smoothness. Our simulations demonstrate a superior performance of GSIRM-TV against many existing approaches. We apply GSIRM-TV to the analysis of hippocampus data obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset.

  9. Multi-fidelity Gaussian process regression for prediction of random fields

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Parussini, L.; Venturi, D., E-mail: venturi@ucsc.edu; Perdikaris, P.

    We propose a new multi-fidelity Gaussian process regression (GPR) approach for prediction of random fields based on observations of surrogate models or hierarchies of surrogate models. Our method builds upon recent work on recursive Bayesian techniques, in particular recursive co-kriging, and extends it to vector-valued fields and various types of covariances, including separable and non-separable ones. The framework we propose is general and can be used to perform uncertainty propagation and quantification in model-based simulations, multi-fidelity data fusion, and surrogate-based optimization. We demonstrate the effectiveness of the proposed recursive GPR techniques through various examples. Specifically, we study the stochastic Burgers equation and the stochastic Oberbeck–Boussinesq equations describing natural convection within a square enclosure. In both cases we find that the standard deviation of the Gaussian predictors as well as the absolute errors relative to benchmark stochastic solutions are very small, suggesting that the proposed multi-fidelity GPR approaches can yield highly accurate results.
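
    A two-level recursive co-kriging sketch in the spirit of the approach: a GP on the cheap low-fidelity data, a scale factor rho, and a second GP on the residual at the expensive points. The Forrester benchmark pair serves as an illustrative low/high-fidelity model; the kernel, length scale, and sample sizes are assumptions, not the paper's settings.

    ```python
    import numpy as np

    # RBF kernel with a fixed, hand-picked length scale.
    def k(a, b, ls=0.15):
        return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

    def gp_fit(x, y, jitter=1e-6):
        # Noise-free GP interpolator; jitter stabilizes the Cholesky factor.
        L = np.linalg.cholesky(k(x, x) + jitter * np.eye(len(x)))
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
        return lambda xq: k(xq, x) @ alpha

    def f_hi(x): return (6 * x - 2) ** 2 * np.sin(12 * x - 4)
    def f_lo(x): return 0.5 * f_hi(x) + 10 * (x - 0.5) - 5

    x_lo = np.linspace(0.0, 1.0, 11)          # many cheap evaluations
    x_hi = np.array([0.0, 0.4, 0.6, 1.0])     # few expensive evaluations

    mu_lo = gp_fit(x_lo, f_lo(x_lo))          # level 1: GP on low-fidelity data

    y_hi = f_hi(x_hi)
    # Level 2: scale factor rho (least squares), then a GP on the discrepancy.
    rho = float(np.sum(mu_lo(x_hi) * y_hi) / np.sum(mu_lo(x_hi) ** 2))
    mu_d = gp_fit(x_hi, y_hi - rho * mu_lo(x_hi))

    def mu_mf(xq):
        return rho * mu_lo(xq) + mu_d(xq)

    # The multi-fidelity predictor reproduces the expensive data it was given.
    train_err = float(np.max(np.abs(mu_mf(x_hi) - y_hi)))
    ```

    The recursive structure is what makes the scheme cheap: each level conditions only on its own data plus the posterior mean of the level below, rather than forming one large joint covariance across all fidelities.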

  10. Parametric Human Body Reconstruction Based on Sparse Key Points.

    PubMed

    Cheng, Ke-Li; Tong, Ruo-Feng; Tang, Min; Qian, Jing-Ye; Sarkis, Michel

    2016-11-01

    We propose an automatic parametric human body reconstruction algorithm which can efficiently construct a model using a single Kinect sensor. A user needs to stand still in front of the sensor for a couple of seconds to measure the range data. The user's body shape and pose will then be automatically constructed in several seconds. Traditional methods optimize dense correspondences between range data and meshes. In contrast, our proposed scheme relies on sparse key points for the reconstruction. It employs regression to find the corresponding key points between the scanned range data and some annotated training data. We design two kinds of feature descriptors as well as corresponding regression stages to make the regression robust and accurate. Our scheme follows with dense refinement where a pre-factorization method is applied to improve the computational efficiency. Compared with other methods, our scheme achieves similar reconstruction accuracy but significantly reduces runtime.

  11. Optimal selection of MULTI-model downscaled ensembles for interannual and seasonal climate prediction in the eastern seaboard of Thailand

    NASA Astrophysics Data System (ADS)

    Bejranonda, W.; Koch, M.

    2010-12-01

    Because of the imminent threat to the water resources of the eastern seaboard of Thailand, a climate impact study has been carried out there. To that end, a hydrological watershed model is being used to simulate the future water availability in the wake of possible climate change in the region. The hydrological model is forced by predictions from global climate models (GCMs) that are to be downscaled in an appropriate manner. The challenge at that stage of the climate impact analysis lies in the choice of the best GCM and the (statistical) downscaling method. In this study, the transfer of information from the coarse-grid output of the GCMs to the fine grid of the local climate-hydrology is achieved by cross-correlation and multiple linear regression, using meteorological data observed in the eastern seaboard of Thailand between 1970-1999. The grids of 20 atmosphere/ocean global climate models (AOGCM), covering latitude 12.5-15.0 N and longitude 100.0-102.5 E, were examined using the Climate-Change Scenario Generator (SCENGEN). With that tool the model efficiency of the prediction of daily precipitation and mean temperature was calculated by comparing the 1980-1999 ECMWF reanalysis predictions with the observed data during that time period. The root mean square errors of the predictions were considered and ranked to select the top 5 models, namely BCCR-BCM2.0, GISS-ER, ECHO-G, ECHAM5/MPI-OM and PCM. The daily time-series of 338 predictors in 9 runs of the 5 selected models were gathered from the CMIP3 multi-model database. Monthly time-series cross-correlations between the climate predictors and the meteorological measurements from 25 rainfall, 4 minimum and maximum temperature, 4 humidity and 2 solar radiation stations in the study area were then computed and ranked. Using the ranked predictors, a multiple linear regression model (downscaling transfer model) to forecast the local climate was set up. 
To improve the predictive power of this GCM downscaling approach, the regression equations were treated as a dynamic regression model whose predictors can change with the season. The possible seasonal effect was examined for the 1974-1999 period, which was equally divided into a calibration and a verification sub-period. The model calibrated on the whole observed time-series was compared with models separated into 2 seasons (dry and wet), 3 seasons (winter, summer and rainy) and 4 seasons (dry, pre-monsoon, first monsoon and second monsoon). The verification power of the various model variants was measured using Akaike's information criterion (AIC) and the Nash-Sutcliffe coefficient of the corresponding model fit. The results show that the 4-season variant predicts best. The highest efficiency for the prediction of rainfall is achieved for the dry season, Oct-Mar, whereas the smallest efficiency is obtained in the monsoon seasons. The number of predictors giving top efficiency lies between 3 and 20 in the regression models. In the next, still ongoing stage of the climate impact study the predictions from this new, seasonally optimized downscaling transfer model are being used in simulations of the future hydrological water budget in that region of Thailand.
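The season-split model comparison described in this record can be sketched numerically. The following is a minimal illustration with synthetic monthly data and a two-season split (the study itself uses Thai station records and up to four seasons); the least-squares AIC form n·log(RSS/n) + 2k is assumed, and all variable names and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic monthly predictor/predictand series (hypothetical stand-in for
# GCM predictors and station rainfall; not the study's data).
n = 312  # 26 years of months
month = np.arange(n) % 12
x = rng.normal(size=n)
# Seasonally varying relationship: the slope differs between the two halves
y = np.where(month < 6, 1.5 * x, 0.5 * x) + rng.normal(scale=0.3, size=n)

def fit_ols(x, y):
    """Least-squares fit y = a + b*x; return residual sum of squares and k."""
    A = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ coef) ** 2)
    return rss, A.shape[1]

def aic(rss, n, k):
    return n * np.log(rss / n) + 2 * k

# One regression for the whole year
rss_all, k = fit_ols(x, y)
aic_single = aic(rss_all, n, k)

# Separate regressions per season; the AIC sums over the sub-models
aic_seasonal = 0.0
for mask in (month < 6, month >= 6):
    rss, k = fit_ols(x[mask], y[mask])
    aic_seasonal += aic(rss, int(mask.sum()), k)

print(aic_single, aic_seasonal)
```

Because the simulated slope genuinely changes with the season, the summed AIC of the seasonal sub-models comes out lower than the single all-year model, mirroring the study's finding that the 4-season variant wins.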

  12. Development of a funding, cost, and spending model for satellite projects

    NASA Technical Reports Server (NTRS)

    Johnson, Jesse P.

    1989-01-01

The need for a predictive budget/funding model is obvious. The current models used by the Resource Analysis Office (RAO) predict the total costs of satellite projects. An effort was conducted to extend the modeling capabilities from total budget analysis to analysis of the total budget and budget outlays over time. A statistically based, data-driven methodology was used to derive and develop the model. The budget data for the last 18 GSFC-sponsored satellite projects were analyzed and used to build a funding model describing the historical spending patterns. The raw data consisted of dollars spent in each specific year and their 1989 dollar equivalents. These data were converted to the standard format used by the RAO group and placed in a database. A simple statistical analysis was performed to calculate the gross statistics associated with project length and project cost and the conditional statistics on project length and project cost. The modeling approach is derived from the theory of embedded statistics, which states that properly analyzed data will produce the underlying generating function. The process of funding large-scale projects over extended periods of time is described by Life Cycle Cost Models (LCCM). The data were analyzed to find a model in the generic form of an LCCM. The model developed is based on a Weibull function whose parameters are found by both nonlinear optimization and nonlinear regression. In order to use this model it is necessary to transform the problem from a dollar/time space to a percentage-of-total-budget/time space. This transformation is equivalent to moving to a probability space. By using the basic rules of probability, the validity of both the optimization and the regression steps is ensured. This statistically significant model is then integrated and inverted. The resulting output represents a project schedule which relates the amount of money spent to the percentage of project completion.
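The core of this record, fitting a Weibull-shaped cumulative spending curve in percentage-of-budget/time space by nonlinear regression and then inverting it for scheduling, can be sketched as follows. The outlay profile below is illustrative, not the GSFC data.

```python
import numpy as np
from scipy.optimize import curve_fit

def weibull_cdf(t, lam, k):
    """Cumulative fraction of the total budget spent by normalized time t."""
    return 1.0 - np.exp(-(t / lam) ** k)

# Hypothetical outlay profile for an 8-year project, expressed as fractions
# of the total budget (i.e., already transformed to probability space).
t = np.linspace(0.125, 1.0, 8)
spent = np.array([0.02, 0.10, 0.28, 0.50, 0.70, 0.85, 0.94, 0.99])

# Nonlinear regression for the Weibull parameters
(lam, k), _ = curve_fit(weibull_cdf, t, spent, p0=(0.5, 2.0))

# Invert the fitted curve: at what fraction of project time is 50% spent?
t50 = lam * (-np.log(1 - 0.5)) ** (1 / k)
print(lam, k, t50)
```

Inverting the fitted cumulative curve, as in the last step, is what turns the spending model into a schedule relating money spent to percentage of project completion.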

  13. Wavelet analysis techniques applied to removing varying spectroscopic background in calibration model for pear sugar content

    NASA Astrophysics Data System (ADS)

    Liu, Yande; Ying, Yibin; Lu, Huishan; Fu, Xiaping

    2005-11-01

A new method is proposed to eliminate varying background and noise simultaneously for multivariate calibration of Fourier transform near infrared (FT-NIR) spectral signals. An ideal spectrum signal prototype was constructed based on the FT-NIR spectrum of fruit sugar content measurement. The performances of wavelet-based threshold de-noising approaches with different combinations of wavelet base functions were compared. Three families of wavelet base functions (Daubechies, Symlets and Coiflets) were applied to assess the performance of the wavelet bases and threshold selection rules in a series of experiments. The experimental results show that the best de-noising performance is reached with the Daubechies 4 or Symlet 4 wavelet base functions. Based on the optimized parameters, wavelet regression models for the sugar content of pear were also developed and resulted in a smaller prediction error than a traditional partial least squares regression (PLSR) model.
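Wavelet threshold de-noising of the kind this record compares can be sketched without any wavelet library by using a one-level Haar transform. Haar is a deliberate stand-in here for the Daubechies/Symlet/Coiflet bases the study evaluates, and the signal is a synthetic spectrum-like curve, not FT-NIR data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical smooth "spectrum" with additive noise (stand-in for an FT-NIR
# signal with a varying background removed).
n = 256
x = np.linspace(0, 1, n)
clean = np.exp(-((x - 0.5) ** 2) / 0.02)
noisy = clean + rng.normal(scale=0.05, size=n)

def haar_denoise(sig, thresh):
    """One-level Haar DWT, soft-threshold the detail band, inverse transform."""
    s = sig.reshape(-1, 2)
    approx = (s[:, 0] + s[:, 1]) / np.sqrt(2)
    detail = (s[:, 0] - s[:, 1]) / np.sqrt(2)
    detail = np.sign(detail) * np.maximum(np.abs(detail) - thresh, 0.0)
    out = np.empty_like(sig)
    out[0::2] = (approx + detail) / np.sqrt(2)
    out[1::2] = (approx - detail) / np.sqrt(2)
    return out

denoised = haar_denoise(noisy, thresh=0.1)
rmse_before = np.sqrt(np.mean((noisy - clean) ** 2))
rmse_after = np.sqrt(np.mean((denoised - clean) ** 2))
print(rmse_before, rmse_after)
```

Shrinking the detail coefficients suppresses the noise while the smooth signal, which lives mostly in the approximation band, survives; choosing the basis and threshold rule is exactly the comparison the paper runs.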

  14. Non-linear quantitative structure-activity relationship for adenine derivatives as competitive inhibitors of adenosine deaminase

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sadat Hayatshahi, Sayyed Hamed; Abdolmaleki, Parviz; Safarian, Shahrokh

    2005-12-16

Logistic regression and artificial neural networks have been developed as two non-linear models to establish quantitative structure-activity relationships between structural descriptors and the biochemical activity of adenosine-based competitive inhibitors of adenosine deaminase. The training set included 24 compounds with known Ki values. The models were trained to solve two-class problems. Unlike the previous work in which multiple linear regression was used, the highest positive charge on the molecules was recognized to be closely related to their inhibition activity, while the electric charge on atom N1 of adenosine was found to be a poor descriptor. Consequently, the previously developed equation was improved, and the newly formed one could predict the class of 91.66% of compounds correctly. Optimized 2-3-1 and 3-4-1 neural networks increased this rate to 95.83%.
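The two-class logistic-regression setup this record describes can be sketched as follows. The descriptor matrix is simulated: column 0 plays the role of the informative "highest positive charge" descriptor and column 1 a weak descriptor; none of it is the paper's data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

# Hypothetical descriptor matrix for 24 "compounds": column 0 stands in for
# the highest positive charge (informative), column 1 for a poor descriptor.
X = rng.normal(size=(24, 2))
y = (X[:, 0] + 0.1 * rng.normal(size=24) > 0).astype(int)  # active vs inactive

clf = LogisticRegression().fit(X, y)
accuracy = clf.score(X, y)
print(accuracy, clf.coef_)
```

With the class label driven almost entirely by the first descriptor, the fitted model classifies nearly all training compounds correctly and puts most of its weight on that descriptor, the same diagnosis the paper makes about charge descriptors.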

  15. Learning-based 3D surface optimization from medical image reconstruction

    NASA Astrophysics Data System (ADS)

    Wei, Mingqiang; Wang, Jun; Guo, Xianglin; Wu, Huisi; Xie, Haoran; Wang, Fu Lee; Qin, Jing

    2018-04-01

Mesh optimization has been studied from the graphical point of view: It often focuses on 3D surfaces obtained by optical and laser scanners. This is despite the fact that isosurfaced meshes of medical image reconstruction suffer from both staircases and noise: Isotropic filters lead to shape distortion, while anisotropic ones maintain pseudo-features. We present a data-driven method for automatically removing these medical artifacts while not introducing additional ones. We consider mesh optimization as a combination of vertex filtering and facet filtering in two stages: Offline training and runtime optimization. Specifically, we first detect staircases based on the scanning direction of CT/MRI scanners, and design a staircase-sensitive Laplacian filter (vertex-based) to remove them; we then design a unilateral filtered facet normal descriptor (uFND) for measuring the geometry features around each facet of a given mesh, and learn regression functions from a set of medical meshes and their high-resolution reference counterparts for mapping the uFNDs to the facet normals of the reference meshes (facet-based). At runtime, we first apply the staircase-sensitive Laplacian filter to an input MC (Marching Cubes) mesh, then filter the mesh facet normal field using the learned regression functions, and finally deform the mesh to match the new normal field, obtaining a compact approximation of the high-resolution reference model. Tests show that our algorithm achieves higher quality results than previous approaches regarding surface smoothness and surface accuracy.

  16. Does high optimism protect against the inter-generational transmission of high BMI? The Cardiovascular Risk in Young Finns Study.

    PubMed

    Serlachius, Anna; Pulkki-Råback, Laura; Juonala, Markus; Sabin, Matthew; Lehtimäki, Terho; Raitakari, Olli; Elovainio, Marko

    2017-09-01

The transmission of overweight from one generation to the next is well established; however, little is known about what psychosocial factors may protect against this familial risk. The aim of this study was to examine whether optimism plays a role in the intergenerational transmission of obesity. Our sample included 1043 participants from the prospective Cardiovascular Risk in Young Finns Study. Optimism was measured in early adulthood (2001) when the cohort was aged 24-39 years. BMI was measured in 2001 (baseline) and 2012 when they were aged 35-50 years. Parental BMI was measured in 1980. Hierarchical linear regression and logistic regression were used to examine the association between optimism and future BMI/obesity, and whether an interaction existed between optimism and parental BMI when predicting BMI/obesity 11 years later. High optimism in young adulthood demonstrated a negative relationship with high BMI in mid-adulthood, but only in women (β=-0.127, p=0.001). The optimism × maternal BMI interaction term was a significant predictor of future BMI in women (β=-0.588, p=0.036). The logistic regression results confirmed that high optimism predicted reduced obesity in women (OR=0.68, 95% CI, 0.55-0.86); however, the optimism × maternal obesity interaction term was not a significant predictor (OR=0.50, 95% CI, 0.10-2.48). Our findings supported our hypothesis that high optimism mitigated the intergenerational transmission of high BMI, but only in women. These findings also provided evidence that positive psychosocial factors such as optimism are associated with long-term protective effects on BMI in women. Copyright © 2017 Elsevier Inc. All rights reserved.
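The moderation analysis this record reports, a main-effects regression extended with an optimism × parental-BMI interaction term, can be sketched with simulated data. All variables and the interaction coefficient (-0.3) are hypothetical, chosen only so the recovery is visible.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical cohort: does optimism moderate the parent-BMI -> offspring-BMI
# link?  Simulated with a negative optimism x parental-BMI interaction.
n = 1000
optimism = rng.normal(size=n)
parent_bmi = rng.normal(size=n)
bmi = (25 + 0.5 * parent_bmi - 0.2 * optimism
       - 0.3 * optimism * parent_bmi + rng.normal(scale=1.0, size=n))

# Hierarchical-style OLS: intercept, main effects, then the interaction term
A = np.column_stack([np.ones(n), optimism, parent_bmi, optimism * parent_bmi])
coef, *_ = np.linalg.lstsq(A, bmi, rcond=None)
print(coef)
```

A negative estimated coefficient on the product column is what a "protective" moderation looks like: the parent-BMI slope shrinks as optimism rises, which is the pattern the study reports for women.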

  17. Optimization of processing parameters for the preparation of phytosterol microemulsions by the solvent displacement method.

    PubMed

    Leong, Wai Fun; Che Man, Yaakob B; Lai, Oi Ming; Long, Kamariah; Misran, Misni; Tan, Chin Ping

    2009-09-23

The purpose of this study was to optimize the parameters involved in the production of water-soluble phytosterol microemulsions for use in the food industry. In this study, response surface methodology (RSM) was employed to model and optimize four of the processing parameters, namely, the number of cycles of high-pressure homogenization (1-9 cycles), the pressure used for high-pressure homogenization (100-500 bar), the evaporation temperature (30-70 degrees C), and the concentration ratio of microemulsions (1-5). All responses, namely particle size (PS), polydispersity index (PDI), and percent ethanol residual (%ER), were well fit by a reduced cubic model obtained by multiple regression after manual elimination. The coefficient of determination (R(2)) and absolute average deviation (AAD) values for PS, PDI, and %ER were 0.9628 and 0.5398%, 0.9953 and 0.7077%, and 0.9989 and 1.0457%, respectively. The optimized processing parameters were 4.88 (approximately 5) cycles of high-pressure homogenization, a homogenization pressure of 400 bar, an evaporation temperature of 44.5 degrees C, and a concentration ratio of microemulsions of 2.34 (approximately 2). The corresponding responses for the optimized preparation condition were a minimal particle size of 328 nm, minimal polydispersity index of 0.159, and <0.1% of ethanol residual. The chi-square test verified the model, whereby the experimental values of PS, PDI, and %ER agreed with the predicted values at a 0.05 level of significance.
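The RSM workflow underlying this record, fit a low-order polynomial response surface by multiple regression, then optimize it to locate the best settings, can be sketched in two coded factors. The quadratic (rather than reduced cubic) model, the factor meanings, and the optimum at (0.4, -0.2) are all assumptions of the sketch.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)

# Hypothetical response surface in two coded factors (think: homogenization
# pressure and cycle count): true optimum at (0.4, -0.2), plus noise.
X = rng.uniform(-1, 1, size=(30, 2))
y = 1 + (X[:, 0] - 0.4) ** 2 + (X[:, 1] + 0.2) ** 2 \
    + rng.normal(scale=0.01, size=30)

def design(X):
    """Second-order polynomial design matrix: 1, x1, x2, x1*x2, x1^2, x2^2."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1 * x2, x1**2, x2**2])

# Multiple regression for the response-surface coefficients
coef, *_ = np.linalg.lstsq(design(X), y, rcond=None)

# Minimize the fitted polynomial to locate the optimum processing settings
f = lambda p: float(design(p.reshape(1, 2)) @ coef)
opt = minimize(f, x0=np.zeros(2)).x
print(opt)
```

The located minimum of the fitted surface, not of the raw data, is what RSM reports as the "optimized processing parameters."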

  18. A Comparative Investigation of the Combined Effects of Pre-Processing, Wavelength Selection, and Regression Methods on Near-Infrared Calibration Model Performance.

    PubMed

    Wan, Jian; Chen, Yi-Chieh; Morris, A Julian; Thennadil, Suresh N

    2017-07-01

Near-infrared (NIR) spectroscopy is being widely used in various fields ranging from pharmaceutics to the food industry for analyzing chemical and physical properties of the substances concerned. Its advantages over other analytical techniques include available physical interpretation of spectral data, nondestructive nature and high speed of measurements, and little or no need for sample preparation. The successful application of NIR spectroscopy relies on three main aspects: pre-processing of spectral data to eliminate nonlinear variations due to temperature, light scattering effects and many others; selection of those wavelengths that contribute useful information; and identification of suitable calibration models using linear/nonlinear regression. Several methods have been developed for each of these three aspects and many comparative studies of different methods exist for an individual aspect or some combinations. However, there is still a lack of comparative studies for the interactions among these three aspects, which can shed light on what role each aspect plays in the calibration and how to combine various methods of each aspect together to obtain the best calibration model. This paper aims to provide such a comparative study based on four benchmark data sets using three typical pre-processing methods, namely, orthogonal signal correction (OSC), extended multiplicative signal correction (EMSC) and optical path-length estimation and correction (OPLEC); two existing wavelength selection methods, namely, stepwise forward selection (SFS) and genetic algorithm optimization combined with partial least squares regression for spectral data (GAPLSSP); four popular regression methods, namely, partial least squares (PLS), least absolute shrinkage and selection operator (LASSO), least squares support vector machine (LS-SVM), and Gaussian process regression (GPR). 
The comparative study indicates that, in general, pre-processing of spectral data can play a significant role in the calibration while wavelength selection plays a marginal role and the combination of certain pre-processing, wavelength selection, and nonlinear regression methods can achieve superior performance over traditional linear regression-based calibration.

  19. Optimization of the trienzyme extraction for the microbiological assay of folate in vegetables.

    PubMed

    Chen, Liwen; Eitenmiller, Ronald R

    2007-05-16

    Response surface methodology was applied to optimize the trienzyme digestion for the extraction of folate from vegetables. Trienzyme extraction is a combined enzymatic digestion by protease, alpha-amylase, and conjugase (gamma-glutamyl hydrolase) to liberate the carbohydrate and protein-bound folates from food matrices for total folate analysis. It is the extraction method used in AOAC Official Method 2004.05 for assay of total folate in cereal grain products. Certified reference material (CRM) 485 mixed vegetables was used to represent the matrix of vegetables. Regression and ridge analysis were performed by statistical analysis software. The predicted second-order polynomial model was adequate (R2 = 0.947), without significant lack of fit (p > 0.1). Both protease and alpha-amylase have significant effects on the extraction. Ridge analysis gave an optimum trienzyme digestion time: Pronase, 1.5 h; alpha-amylase, 1.5 h; and conjugase, 3 h. The experimental value for CRM 485 under this optimum digestion was close to the predicted value from the model, confirming the validity and adequacy of the model. The optimized trienzyme digestion condition was applied to five vegetables and yielded higher folate levels than the trienzyme digestion parameters employed in AOAC Official Method 2004.05.

  20. Analyzing multicomponent receptive fields from neural responses to natural stimuli

    PubMed Central

    Rowekamp, Ryan; Sharpee, Tatyana O

    2011-01-01

    The challenge of building increasingly better models of neural responses to natural stimuli is to accurately estimate the multiple stimulus features that may jointly affect the neural spike probability. The selectivity for combinations of features is thought to be crucial for achieving classical properties of neural responses such as contrast invariance. The joint search for these multiple stimulus features is difficult because estimating spike probability as a multidimensional function of stimulus projections onto candidate relevant dimensions is subject to the curse of dimensionality. An attractive alternative is to search for relevant dimensions sequentially, as in projection pursuit regression. Here we demonstrate using analytic arguments and simulations of model cells that different types of sequential search strategies exhibit systematic biases when used with natural stimuli. Simulations show that joint optimization is feasible for up to three dimensions with current algorithms. When applied to the responses of V1 neurons to natural scenes, models based on three jointly optimized dimensions had better predictive power in a majority of cases compared to dimensions optimized sequentially, with different sequential methods yielding comparable results. Thus, although the curse of dimensionality remains, at least several relevant dimensions can be estimated by joint information maximization. PMID:21780916

  1. The extraction process optimization of antioxidant polysaccharides from Marshmallow (Althaea officinalis L.) roots.

    PubMed

    Pakrokh Ghavi, Peyman

    2015-04-01

Response surface methodology (RSM) with a central composite rotatable design (CCRD) based on five levels was employed to model and optimize four experimental operating conditions of extraction temperature (10-90 °C) and time (6-30 h), particle size (6-24 mm) and water to solid (W/S, 10-50) ratio, obtaining polysaccharides from Althaea officinalis roots with high yield and antioxidant activity. For each response, a second-order polynomial model with high R(2) values (> 0.966) was developed using multiple linear regression analysis. Results showed that the most significant (P < 0.05) extraction conditions affecting the yield and antioxidant activity of extracted polysaccharides were the main effect of extraction temperature and the interaction effect of the particle size and W/S ratio. The optimum conditions to maximize yield (10.80%) and antioxidant activity (84.09%) for polysaccharide extraction from A. officinalis roots were an extraction temperature of 60.90 °C, extraction time of 12.01 h, particle size of 12.0 mm and W/S ratio of 40.0. The experimental values were found to be in agreement with those predicted, indicating the models' suitability for optimizing the polysaccharide extraction conditions. Copyright © 2015 Elsevier B.V. All rights reserved.

  2. Multi-Objective Optimization of Friction Stir Welding Process Parameters of AA6061-T6 and AA7075-T6 Using a Biogeography Based Optimization Algorithm

    PubMed Central

    Tamjidy, Mehran; Baharudin, B. T. Hang Tuah; Paslar, Shahla; Matori, Khamirul Amin; Sulaiman, Shamsuddin; Fadaeifard, Firouz

    2017-01-01

    The development of Friction Stir Welding (FSW) has provided an alternative approach for producing high-quality welds, in a fast and reliable manner. This study focuses on the mechanical properties of the dissimilar friction stir welding of AA6061-T6 and AA7075-T6 aluminum alloys. The FSW process parameters such as tool rotational speed, tool traverse speed, tilt angle, and tool offset influence the mechanical properties of the friction stir welded joints significantly. A mathematical regression model is developed to determine the empirical relationship between the FSW process parameters and mechanical properties, and the results are validated. In order to obtain the optimal values of process parameters that simultaneously optimize the ultimate tensile strength, elongation, and minimum hardness in the heat affected zone (HAZ), a metaheuristic, multi objective algorithm based on biogeography based optimization is proposed. The Pareto optimal frontiers for triple and dual objective functions are obtained and the best optimal solution is selected through using two different decision making techniques, technique for order of preference by similarity to ideal solution (TOPSIS) and Shannon’s entropy. PMID:28772893

  3. Multi-Objective Optimization of Friction Stir Welding Process Parameters of AA6061-T6 and AA7075-T6 Using a Biogeography Based Optimization Algorithm.

    PubMed

    Tamjidy, Mehran; Baharudin, B T Hang Tuah; Paslar, Shahla; Matori, Khamirul Amin; Sulaiman, Shamsuddin; Fadaeifard, Firouz

    2017-05-15

    The development of Friction Stir Welding (FSW) has provided an alternative approach for producing high-quality welds, in a fast and reliable manner. This study focuses on the mechanical properties of the dissimilar friction stir welding of AA6061-T6 and AA7075-T6 aluminum alloys. The FSW process parameters such as tool rotational speed, tool traverse speed, tilt angle, and tool offset influence the mechanical properties of the friction stir welded joints significantly. A mathematical regression model is developed to determine the empirical relationship between the FSW process parameters and mechanical properties, and the results are validated. In order to obtain the optimal values of process parameters that simultaneously optimize the ultimate tensile strength, elongation, and minimum hardness in the heat affected zone (HAZ), a metaheuristic, multi objective algorithm based on biogeography based optimization is proposed. The Pareto optimal frontiers for triple and dual objective functions are obtained and the best optimal solution is selected through using two different decision making techniques, technique for order of preference by similarity to ideal solution (TOPSIS) and Shannon's entropy.
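The TOPSIS decision-making step named in this record (selecting the best compromise from a Pareto set) can be sketched directly. The candidate solutions, criteria, and weights below are illustrative placeholders, not values from the paper.

```python
import numpy as np

# Hypothetical Pareto candidates from a multi-objective weld optimization:
# columns = ultimate tensile strength (max), elongation (max),
# hardness deviation in the HAZ (min). Illustrative numbers only.
F = np.array([
    [210.0, 8.5, 12.0],
    [225.0, 7.0, 15.0],
    [218.0, 9.2, 14.0],
    [205.0, 6.8, 10.0],
])
benefit = np.array([True, True, False])  # maximize, maximize, minimize
w = np.array([0.4, 0.3, 0.3])            # assumed criterion weights

# TOPSIS: normalize, weight, then measure distance to ideal/anti-ideal points
R = F / np.linalg.norm(F, axis=0)
V = R * w
ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))
anti = np.where(benefit, V.min(axis=0), V.max(axis=0))
d_pos = np.linalg.norm(V - ideal, axis=1)
d_neg = np.linalg.norm(V - anti, axis=1)
closeness = d_neg / (d_pos + d_neg)
best = int(np.argmax(closeness))
print(closeness, best)
```

The alternative with the highest relative closeness to the ideal solution is the recommended compromise; swapping the weight vector for Shannon's entropy weights is the paper's second decision-making variant.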

  4. Secure Logistic Regression Based on Homomorphic Encryption: Design and Evaluation

    PubMed Central

    Song, Yongsoo; Wang, Shuang; Xia, Yuhou; Jiang, Xiaoqian

    2018-01-01

Background Learning a model without accessing raw data has been an intriguing idea to security and machine learning researchers for years. In an ideal setting, we want to encrypt sensitive data to store them on a commercial cloud and run certain analyses without ever decrypting the data to preserve privacy. Homomorphic encryption technique is a promising candidate for secure data outsourcing, but it is a very challenging task to support real-world machine learning tasks. Existing frameworks can only handle simplified cases with low-degree polynomials such as linear means classifier and linear discriminative analysis. Objective The goal of this study is to provide practical support for mainstream learning models (eg, logistic regression). Methods We adapted a novel homomorphic encryption scheme optimized for real numbers computation. We devised (1) the least squares approximation of the logistic function for accuracy and efficiency (ie, reduce computation cost) and (2) new packing and parallelization techniques. Results Using real-world datasets, we evaluated the performance of our model and demonstrated its feasibility in speed and memory consumption. For example, it took approximately 116 minutes to obtain the training model from the homomorphically encrypted Edinburgh dataset. In addition, it gives fairly accurate predictions on the testing dataset. Conclusions We present the first homomorphically encrypted logistic regression outsourcing model based on the critical observation that the precision loss of classification models is sufficiently small that the decision plane remains essentially unchanged. PMID:29666041
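The least-squares approximation of the logistic function mentioned in the Methods can be sketched in the clear: homomorphic schemes evaluate only additions and multiplications, so the sigmoid is replaced by a low-degree polynomial. The degree (3) and fitting interval ([-8, 8]) below are this sketch's assumptions, not necessarily the paper's exact choices.

```python
import numpy as np

# Fit a degree-3 least-squares polynomial to the logistic (sigmoid) function
# on a bounded interval, so it can be evaluated with only + and *.
x = np.linspace(-8, 8, 401)
sigmoid = 1.0 / (1.0 + np.exp(-x))

deg3 = np.polynomial.polynomial.polyfit(x, sigmoid, 3)  # coeffs, low to high
approx = np.polynomial.polynomial.polyval(x, deg3)
max_err = np.max(np.abs(approx - sigmoid))
print(deg3, max_err)
```

The constant term comes out near 0.5 by symmetry, and the worst-case error over the interval is small enough that, as the Conclusions argue, the classifier's decision plane is essentially unaffected.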

  5. Estimating the global incidence of traumatic spinal cord injury.

    PubMed

    Fitzharris, M; Cripps, R A; Lee, B B

    2014-02-01

    Population modelling--forecasting. To estimate the global incidence of traumatic spinal cord injury (TSCI). An initiative of the International Spinal Cord Society (ISCoS) Prevention Committee. Regression techniques were used to derive regional and global estimates of TSCI incidence. Using the findings of 31 published studies, a regression model was fitted using a known number of TSCI cases as the dependent variable and the population at risk as the single independent variable. In the process of deriving TSCI incidence, an alternative TSCI model was specified in an attempt to arrive at an optimal way of estimating the global incidence of TSCI. The global incidence of TSCI was estimated to be 23 cases per 1,000,000 persons in 2007 (179,312 cases per annum). World Health Organization's regional results are provided. Understanding the incidence of TSCI is important for health service planning and for the determination of injury prevention priorities. In the absence of high-quality epidemiological studies of TSCI in each country, the estimation of TSCI obtained through population modelling can be used to overcome known deficits in global spinal cord injury (SCI) data. The incidence of TSCI is context specific, and an alternative regression model demonstrated how TSCI incidence estimates could be improved with additional data. The results highlight the need for data standardisation and comprehensive reporting of national level TSCI data. A step-wise approach from the collation of conventional epidemiological data through to population modelling is suggested.
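The regression structure this record describes, known TSCI case counts as the dependent variable and population at risk as the single independent variable, can be sketched with made-up study-level numbers (the real model pools 31 published studies).

```python
import numpy as np

# Hypothetical study-level data: annual TSCI cases regressed on population
# at risk (single predictor, no intercept), illustrative numbers only.
pop = np.array([2.1e6, 5.4e6, 0.9e6, 8.8e6, 3.3e6, 12.5e6])   # persons
cases = np.array([50.0, 120.0, 22.0, 208.0, 70.0, 300.0])     # per annum

# No-intercept least squares: cases ~ rate * population
rate = (pop @ cases) / (pop @ pop)
per_million = rate * 1e6
print(round(per_million, 1))

# Extrapolate to a region lacking epidemiological studies
print(round(rate * 20e6))  # expected annual cases for 20 million people
```

The fitted rate, expressed per million persons, is the quantity the study reports globally (23 per 1,000,000 in 2007); the extrapolation step is how population modelling fills gaps where no national data exist.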

  6. Simulation of groundwater level variations using wavelet combined with neural network, linear regression and support vector machine

    NASA Astrophysics Data System (ADS)

    Ebrahimi, Hadi; Rajaee, Taher

    2017-01-01

Simulation of groundwater level (GWL) fluctuations is an important task in management of groundwater resources. In this study, the effect of wavelet analysis on the training of the artificial neural network (ANN), multi linear regression (MLR) and support vector regression (SVR) approaches was investigated, and the ANN, MLR and SVR along with the wavelet-ANN (WNN), wavelet-MLR (WLR) and wavelet-SVR (WSVR) models were compared in simulating GWL one month ahead. The only variable used to develop the models was the monthly GWL data recorded over a period of 11 years from two wells in the Qom plain, Iran. The results showed that decomposing the GWL time series into several sub-time series substantially improved the training of the models. For both wells 1 and 2, the Meyer and Db5 wavelets produced better results than the other wavelets, indicating that wavelet types behave similarly in similar case studies. The optimal number of delays was 6 months, which seems to be due to natural phenomena. The best WNN model, using the Meyer mother wavelet with two decomposition levels, simulated one month ahead with RMSE values of 0.069 m and 0.154 m for wells 1 and 2, respectively. The RMSE values for the WLR model were 0.058 m and 0.111 m, and for the WSVR model were 0.136 m and 0.060 m for wells 1 and 2, respectively.

  7. Water quality parameter measurement using spectral signatures

    NASA Technical Reports Server (NTRS)

    White, P. E.

    1973-01-01

    Regression analysis is applied to the problem of measuring water quality parameters from remote sensing spectral signature data. The equations necessary to perform regression analysis are presented and methods of testing the strength and reliability of a regression are described. An efficient algorithm for selecting an optimal subset of the independent variables available for a regression is also presented.
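The "optimal subset of independent variables" step this record mentions is classically a greedy forward selection. A minimal sketch with synthetic data follows; the spectral-band framing and the choice of which bands carry signal are this sketch's assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical spectral bands vs. a water-quality parameter: only bands
# 2 and 5 actually carry signal.
n, p = 120, 8
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 2] - 1.5 * X[:, 5] + rng.normal(scale=0.5, size=n)

def rss(cols):
    """Residual sum of squares of an OLS fit on the given column subset."""
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.sum((y - A @ coef) ** 2)

# Greedy forward selection: add the band that most reduces RSS each round
selected, remaining = [], list(range(p))
for _ in range(2):
    best_j = min(remaining, key=lambda j: rss(selected + [j]))
    selected.append(best_j)
    remaining.remove(best_j)
print(selected)
```

With two genuinely informative bands, the greedy search recovers exactly those two, which is the efficiency argument for subset-selection algorithms over exhaustive search.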

  8. Active Learning to Understand Infectious Disease Models and Improve Policy Making

    PubMed Central

    Vladislavleva, Ekaterina; Broeckhove, Jan; Beutels, Philippe; Hens, Niel

    2014-01-01

    Modeling plays a major role in policy making, especially for infectious disease interventions but such models can be complex and computationally intensive. A more systematic exploration is needed to gain a thorough systems understanding. We present an active learning approach based on machine learning techniques as iterative surrogate modeling and model-guided experimentation to systematically analyze both common and edge manifestations of complex model runs. Symbolic regression is used for nonlinear response surface modeling with automatic feature selection. First, we illustrate our approach using an individual-based model for influenza vaccination. After optimizing the parameter space, we observe an inverse relationship between vaccination coverage and cumulative attack rate reinforced by herd immunity. Second, we demonstrate the use of surrogate modeling techniques on input-response data from a deterministic dynamic model, which was designed to explore the cost-effectiveness of varicella-zoster virus vaccination. We use symbolic regression to handle high dimensionality and correlated inputs and to identify the most influential variables. Provided insight is used to focus research, reduce dimensionality and decrease decision uncertainty. We conclude that active learning is needed to fully understand complex systems behavior. Surrogate models can be readily explored at no computational expense, and can also be used as emulator to improve rapid policy making in various settings. PMID:24743387

  9. Active learning to understand infectious disease models and improve policy making.

    PubMed

    Willem, Lander; Stijven, Sean; Vladislavleva, Ekaterina; Broeckhove, Jan; Beutels, Philippe; Hens, Niel

    2014-04-01

    Modeling plays a major role in policy making, especially for infectious disease interventions but such models can be complex and computationally intensive. A more systematic exploration is needed to gain a thorough systems understanding. We present an active learning approach based on machine learning techniques as iterative surrogate modeling and model-guided experimentation to systematically analyze both common and edge manifestations of complex model runs. Symbolic regression is used for nonlinear response surface modeling with automatic feature selection. First, we illustrate our approach using an individual-based model for influenza vaccination. After optimizing the parameter space, we observe an inverse relationship between vaccination coverage and cumulative attack rate reinforced by herd immunity. Second, we demonstrate the use of surrogate modeling techniques on input-response data from a deterministic dynamic model, which was designed to explore the cost-effectiveness of varicella-zoster virus vaccination. We use symbolic regression to handle high dimensionality and correlated inputs and to identify the most influential variables. Provided insight is used to focus research, reduce dimensionality and decrease decision uncertainty. We conclude that active learning is needed to fully understand complex systems behavior. Surrogate models can be readily explored at no computational expense, and can also be used as emulator to improve rapid policy making in various settings.
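The surrogate-modeling idea at the heart of this record, fit a cheap emulator to a handful of expensive simulator runs, then explore the emulator freely, can be sketched with a toy simulator. The attack-rate curve, design points, and polynomial emulator (standing in for symbolic regression) are all hypothetical.

```python
import numpy as np

def simulator(coverage):
    """Stand-in for an expensive individual-based model run: cumulative
    attack rate falls as vaccination coverage rises (toy herd-immunity shape)."""
    return 0.6 * (1 - coverage) ** 2

design_pts = np.linspace(0, 1, 9)   # a few "expensive" runs
runs = simulator(design_pts)

# Cheap polynomial emulator fitted to the runs (symbolic regression would
# search a richer model space; a fixed quadratic is used here for brevity)
coef = np.polynomial.polynomial.polyfit(design_pts, runs, 2)
emulate = lambda c: np.polynomial.polynomial.polyval(c, coef)

# The emulator can now be queried densely at no simulation cost
grid = np.linspace(0, 1, 1001)
worst = grid[np.argmax(emulate(grid))]
print(worst)
```

Once fitted, the emulator reproduces the simulator's inverse coverage-attack-rate relationship across the whole input grid and locates the worst case (zero coverage) without any further simulator calls, which is exactly the "explored at no computational expense" claim.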

  10. Highly predictive and interpretable models for PAMPA permeability.

    PubMed

    Sun, Hongmao; Nguyen, Kimloan; Kerns, Edward; Yan, Zhengyin; Yu, Kyeong Ri; Shah, Pranav; Jadhav, Ajit; Xu, Xin

    2017-02-01

    Cell membrane permeability is an important determinant for oral absorption and bioavailability of a drug molecule. An in silico model predicting drug permeability is described, which is built based on a large permeability dataset of 7488 compound entries or 5435 structurally unique molecules measured by the same lab using parallel artificial membrane permeability assay (PAMPA). On the basis of customized molecular descriptors, the support vector regression (SVR) model trained with 4071 compounds with quantitative data is able to predict the remaining 1364 compounds with the qualitative data with an area under the curve of receiver operating characteristic (AUC-ROC) of 0.90. The support vector classification (SVC) model trained with half of the whole dataset comprised of both the quantitative and the qualitative data produced accurate predictions to the remaining data with the AUC-ROC of 0.88. The results suggest that the developed SVR model is highly predictive and provides medicinal chemists a useful in silico tool to facilitate design and synthesis of novel compounds with optimal drug-like properties, and thus accelerate the lead optimization in drug discovery. Copyright © 2016 Elsevier Ltd. All rights reserved.
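The evaluation pattern this record uses, train a support vector regression on quantitative measurements, then score its ranking of a qualitative (high/low permeability) set with AUC-ROC, can be sketched on simulated descriptors; nothing below is the PAMPA data.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(7)

# Hypothetical molecular descriptors and a quantitative "permeability"
n, p = 400, 5
X = rng.normal(size=(n, p))
perm = X @ np.array([1.0, -0.5, 0.3, 0.0, 0.0]) + rng.normal(scale=0.3, size=n)

X_train, y_train = X[:300], perm[:300]   # quantitative training data
X_test = X[300:]
labels = (perm[300:] > 0).astype(int)    # qualitative high/low classes

model = SVR(kernel="rbf", C=10.0).fit(X_train, y_train)
auc = roc_auc_score(labels, model.predict(X_test))
print(round(auc, 3))
```

Scoring a regressor's continuous predictions against binary labels via AUC-ROC is what lets the paper report a 0.90 AUC for its SVR model on the qualitative subset.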

  11. Nondestructive evaluation of soluble solid content in strawberry by near infrared spectroscopy

    NASA Astrophysics Data System (ADS)

    Guo, Zhiming; Huang, Wenqian; Chen, Liping; Wang, Xiu; Peng, Yankun

    This paper demonstrates the feasibility of using near infrared (NIR) spectroscopy combined with synergy interval partial least squares (siPLS) algorithms as a rapid nondestructive method to estimate the soluble solid content (SSC) of strawberry. Spectral preprocessing methods were selected by cross-validation during model calibration. The partial least squares (PLS) algorithm was used to calibrate the regression model. The performance of the final model was evaluated by the root mean square error of calibration (RMSEC) and correlation coefficient (R2c) in the calibration set, and tested by the root mean square error of prediction (RMSEP) and correlation coefficient (R2p) in the prediction set. The optimal siPLS model was obtained after first-derivative spectral preprocessing. The best model achieved RMSEC = 0.2259 and R2c = 0.9590 in the calibration set, and RMSEP = 0.2892 and R2p = 0.9390 in the prediction set. This work demonstrates that NIR spectroscopy and siPLS with efficient spectral preprocessing are a useful tool for nondestructive evaluation of SSC in strawberry.
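
    The siPLS calibration rests on ordinary PLS regression. Below is a minimal PLS1 (NIPALS) sketch in numpy with mock spectra in place of real NIR data; the interval-selection step of siPLS is omitted, and all data are synthetic.

```python
import numpy as np

def pls1_nipals(X, y, n_components=3):
    """Minimal PLS1 (NIPALS) sketch: regress one response on spectra X."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    Xk = Xc.copy()
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yc
        w /= np.linalg.norm(w)        # weight vector
        t = Xk @ w                    # scores
        p = Xk.T @ t / (t @ t)        # X loadings
        q = (yc @ t) / (t @ t)        # regression of y on scores
        Xk = Xk - np.outer(t, p)      # deflate X
        W.append(w); P.append(p); Q.append(q)
    W, P = np.array(W).T, np.array(P).T
    # Regression coefficients in the original (centered) X space.
    return W @ np.linalg.solve(P.T @ W, np.array(Q))

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 30))         # mock spectra (60 samples, 30 bands)
b_true = np.zeros(30); b_true[:3] = [1.0, -0.5, 0.25]
y = X @ b_true + rng.normal(scale=0.05, size=60)

b = pls1_nipals(X, y)
pred = (X - X.mean(axis=0)) @ b + y.mean()
rmsec = np.sqrt(np.mean((pred - y) ** 2))   # calibration-set RMSE
```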

  12. Development of hybrid genetic-algorithm-based neural networks using regression trees for modeling air quality inside a public transportation bus.

    PubMed

    Kadiyala, Akhil; Kaur, Devinder; Kumar, Ashok

    2013-02-01

    The present study developed a novel approach to modeling the indoor air quality (IAQ) of a public transportation bus through hybrid genetic-algorithm-based neural networks (also known as evolutionary neural networks) with input variables optimized using regression trees, referred to as the GART approach. This study validated the applicability of the GART modeling approach to complex nonlinear systems by accurately predicting the monitored contaminants of carbon dioxide (CO2), carbon monoxide (CO), nitric oxide (NO), sulfur dioxide (SO2), 0.3-0.4 microm sized particle numbers, 0.4-0.5 microm sized particle numbers, particulate matter (PM) concentrations less than 1.0 microm (PM1.0), and PM concentrations less than 2.5 microm (PM2.5) inside a public transportation bus operating on 20% grade biodiesel in Toledo, OH. First, the important variables affecting each monitored in-bus contaminant were determined using regression trees. Second, analysis of variance was used as a complementary sensitivity analysis to the regression tree results to determine a subset of statistically significant variables affecting each monitored in-bus contaminant. Finally, the identified subsets of statistically significant variables were used as inputs to develop three artificial neural network (ANN) models: a regression-tree-based back-propagation network (BPN-RT), a regression-tree-based radial basis function network (RBFN-RT), and the GART model. Performance measures were used to validate the predictive capacity of the developed IAQ models, and the results were compared with those obtained from a theoretical approach and from a generalized practicable approach that considered additional independent variables when developing the aforementioned ANN models. The hybrid GART models were able to capture the majority of the variance in the monitored in-bus contaminants, and the genetic-algorithm-based neural network IAQ models outperformed the traditional ANN methods of back-propagation and radial basis function networks. The novelty of this research lies in modeling vehicular indoor air quality by integrating the advanced methods of genetic algorithms, regression trees, and analysis of variance for the monitored in-vehicle gaseous and particulate matter contaminants, and in comparing the results with conventional artificial intelligence techniques of back-propagation networks and radial basis function networks. The newly developed approach was validated using holdout and threefold cross-validation. These results are of great interest to scientists, researchers, and the public in understanding the various aspects of modeling an indoor microenvironment, and the methodology can easily be extended to other fields of study.
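
    The regression-tree screening step ranks candidate inputs by how much a single split reduces response variance. The sketch below illustrates that criterion with hypothetical predictor names (speed, wind, humidity) and a synthetic CO2 response; it is not the paper's pipeline, which adds ANOVA screening and evolutionary neural networks on top.

```python
import numpy as np

def stump_importance(x, y):
    """Variance reduction achieved by the best single split on x
    (the criterion a regression tree uses to rank candidate inputs)."""
    ys = y[np.argsort(x)]
    base = np.var(y) * len(y)
    best = 0.0
    for i in range(1, len(y)):
        left, right = ys[:i], ys[i:]
        sse = len(left) * np.var(left) + len(right) * np.var(right)
        best = max(best, base - sse)
    return best / base                 # fraction of variance explained

rng = np.random.default_rng(2)
n = 300
wind = rng.normal(size=n)              # hypothetical predictors
speed = rng.normal(size=n)
humidity = rng.normal(size=n)
co2 = 3.0 * (speed > 0) + 0.2 * wind + rng.normal(scale=0.5, size=n)

scores = {name: stump_importance(v, co2)
          for name, v in [("speed", speed), ("wind", wind),
                          ("humidity", humidity)]}
top = max(scores, key=scores.get)      # speed dominates by construction
```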

  13. Evaluation of laser cutting process with auxiliary gas pressure by soft computing approach

    NASA Astrophysics Data System (ADS)

    Lazov, Lyubomir; Nikolić, Vlastimir; Jovic, Srdjan; Milovančević, Miloš; Deneva, Heristina; Teirumenieka, Erika; Arsic, Nebojsa

    2018-06-01

    Evaluation of the optimal laser cutting parameters is very important for high cut quality. Laser cutting is a highly nonlinear process with many parameters, which is the main challenge in the optimization process. Data mining is one of the most versatile methodologies that can be used for laser cutting process optimization. The support vector regression (SVR) procedure was implemented since it is a versatile and robust technique for highly nonlinear data regression. The goal of this study was to determine the optimal laser cutting parameters to ensure robust conditions for minimization of average surface roughness. Three cutting parameters, the cutting speed, the laser power, and the assist gas pressure, were used in the investigation. A TruLaser 1030 technological system was used as the laser source, with nitrogen as the assist gas in the laser cutting process. Prediction accuracy was very high according to the coefficient of determination (R2) and root mean square error (RMSE): R2 = 0.9975 and RMSE = 0.0337. Therefore, the data mining approach could be used effectively to determine the optimal conditions of the laser cutting process.
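
    As an illustration of the SVR idea, here is a linear epsilon-SVR trained by subgradient descent on mock process data. The study used a (likely kernelized) SVR implementation; the data, parameter values, and training scheme below are illustrative assumptions only.

```python
import numpy as np

def linear_svr(X, y, epsilon=0.05, C=10.0, lr=0.01, epochs=2000):
    """Linear epsilon-SVR fitted by batch subgradient descent."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for t in range(epochs):
        step = lr / (1.0 + 0.005 * t)   # decaying step size
        r = X @ w + b - y
        # Subgradient of the epsilon-insensitive loss.
        g = np.where(r > epsilon, 1.0, np.where(r < -epsilon, -1.0, 0.0))
        w -= step * (w / C + X.T @ g / n)
        b -= step * g.mean()
    return w, b

rng = np.random.default_rng(3)
# Mock process data: speed, power, gas pressure -> surface roughness.
X = rng.uniform(size=(80, 3))
y = 0.8 - 0.5 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.02, size=80)

w, b = linear_svr(X, y)
pred = X @ w + b
rmse = np.sqrt(np.mean((y - pred) ** 2))
r2 = 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
```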

  14. Gradient descent for robust kernel-based regression

    NASA Astrophysics Data System (ADS)

    Guo, Zheng-Chu; Hu, Ting; Shi, Lei

    2018-06-01

    In this paper, we study the gradient descent algorithm generated by a robust loss function over a reproducing kernel Hilbert space (RKHS). The loss function is defined by a windowing function G and a scale parameter σ, and can include a wide range of commonly used robust losses for regression. There is still a gap between the theoretical analysis and the optimization process of empirical risk minimization based on this loss: the estimator needs to be globally optimal in the theoretical analysis, while the optimization method cannot ensure the global optimality of its solutions. In this paper, we aim to fill this gap by developing a novel theoretical analysis of the performance of estimators generated by the gradient descent algorithm. We demonstrate that with an appropriately chosen scale parameter σ, the gradient update with early stopping rules can approximate the regression function. Our error analysis leads to convergence in both the standard L2 norm and the strong RKHS norm, both of which are optimal in the minimax sense. We show that the scale parameter σ plays an important role in providing robustness as well as fast convergence. Numerical experiments on synthetic examples and a real data set also support our theoretical results.
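
    The scheme can be sketched directly: gradient descent on a Welsch-type robust loss over a Gaussian-kernel RKHS, where the windowing function G down-weights large residuals and T plays the role of the early-stopping rule. The synthetic data, kernel bandwidth, and parameter choices are illustrative only, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
X = rng.uniform(-1.0, 1.0, size=(n, 1))
y = np.sin(3.0 * X[:, 0]) + rng.normal(scale=0.1, size=n)
y[::10] += 3.0                          # inject gross outliers

# Gaussian RKHS kernel matrix.
d2 = (X[:, None, 0] - X[None, :, 0]) ** 2
K = np.exp(-d2 / (2 * 0.25 ** 2))

# Gradient descent on a Welsch-type robust loss: the windowing
# function G (a Gaussian with scale sigma) down-weights outliers.
alpha, eta, sigma, T = np.zeros(n), 1.0, 1.0, 500  # T: early-stopping time
for _ in range(T):
    resid = K @ alpha - y
    gweight = np.exp(-resid ** 2 / (2 * sigma ** 2))
    alpha -= eta * (gweight * resid) / n

fit = K @ alpha                          # estimate of the regression function
rmse_clean = np.sqrt(np.mean((fit - np.sin(3.0 * X[:, 0])) ** 2))
```

    Because the outliers' residuals receive near-zero weight from the start, the fit tracks the clean sine signal rather than the contaminated observations.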

  15. Multi Objective Optimization of Multi Wall Carbon Nanotube Based Nanogrinding Wheel Using Grey Relational and Regression Analysis

    NASA Astrophysics Data System (ADS)

    Sethuramalingam, Prabhu; Vinayagam, Babu Kupusamy

    2016-07-01

    A carbon nanotube mixed grinding wheel was used in the grinding process to analyze the surface characteristics of AISI D2 tool steel. To date, no work has been reported using a carbon nanotube based grinding wheel. Such a wheel has excellent thermal conductivity and good mechanical properties, which are used to improve the surface finish of the workpiece. In the present study, multi-response optimization of process parameters, namely surface roughness and metal removal rate, for grinding with single-wall carbon nanotube (CNT) mixed cutting fluids was undertaken using an orthogonal array with grey relational analysis. Experiments were performed under the grinding conditions designated by the L9 orthogonal array. Based on the results of the grey relational analysis, a set of optimum grinding parameters was obtained. The significant machining parameters were identified using the analysis of variance approach. An empirical model for the prediction of output parameters was developed using regression analysis, and the results were compared for grinding with and without the CNT grinding wheel.
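
    The grey relational computation itself is mechanical. A sketch with a hypothetical L9-style results table (surface roughness Ra treated as smaller-the-better, metal removal rate MRR as larger-the-better) and the customary distinguishing coefficient ζ = 0.5; the numbers are invented for illustration.

```python
import numpy as np

# Hypothetical L9 results: rows = experiments, columns = (Ra, MRR).
data = np.array([
    [0.42, 12.0], [0.35, 10.5], [0.50, 15.0],
    [0.31,  9.0], [0.44, 14.0], [0.38, 11.0],
    [0.29, 13.5], [0.47,  8.5], [0.33, 12.5],
])

# Normalize: smaller-the-better for Ra, larger-the-better for MRR.
ra, mrr = data[:, 0], data[:, 1]
n_ra = (ra.max() - ra) / (ra.max() - ra.min())
n_mrr = (mrr - mrr.min()) / (mrr.max() - mrr.min())
norm = np.column_stack([n_ra, n_mrr])

# Grey relational coefficients against the ideal sequence (all ones).
delta = np.abs(1.0 - norm)
zeta = 0.5                                # distinguishing coefficient
grc = (delta.min() + zeta * delta.max()) / (delta + zeta * delta.max())

grade = grc.mean(axis=1)                  # equal-weight grey relational grade
best_run = int(np.argmax(grade)) + 1      # 1-based experiment number
```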

  16. Prediction of random-regression coefficient for daily milk yield after 305 days in milk by using the regression-coefficient estimates from the first 305 days.

    PubMed

    Yamazaki, Takeshi; Takeda, Hisato; Hagiya, Koichi; Yamaguchi, Satoshi; Sasaki, Osamu

    2018-03-13

    Because lactation periods in dairy cows lengthen with increasing total milk production, it is important to predict individual productivity after 305 days in milk (DIM) to determine the optimal lactation period. We therefore examined whether the random regression (RR) coefficients from 306 to 450 DIM (M2) can be predicted from those during the first 305 DIM (M1) by using a random regression model. We analyzed test-day milk records from 85690 Holstein cows in their first lactations and 131727 cows in their later (second to fifth) lactations. Data in M1 and M2 were analyzed separately by using different single-trait RR animal models. We then performed a multiple regression analysis of the RR coefficients of M2 on those of M1 during the first and later lactations. First-order Legendre polynomials were practical covariates of random regression for the milk yields of M2. All RR coefficients for the additive genetic (AG) effect and the intercept for the permanent environmental (PE) effect of M2 had moderate to strong correlations with the intercept for the AG effect of M1. The coefficients of determination for multiple regression of the combined intercepts for the AG and PE effects of M2 on the coefficients for the AG effect of M1 were moderate to high. The daily milk yields of M2 predicted by using the RR coefficients for the AG effect of M1 were highly correlated with those obtained by using the coefficients of M2. Milk production after 305 DIM can therefore be predicted from the RR coefficient estimates of the AG effect during the first 305 DIM.

  17. Retrieving relevant factors with exploratory SEM and principal-covariate regression: A comparison.

    PubMed

    Vervloet, Marlies; Van den Noortgate, Wim; Ceulemans, Eva

    2018-02-12

    Behavioral researchers often linearly regress a criterion on multiple predictors, aiming to gain insight into the relations between the criterion and predictors. Obtaining this insight from the ordinary least squares (OLS) regression solution may be troublesome, because OLS regression weights show only the effect of a predictor on top of the effects of other predictors. Moreover, when the number of predictors grows larger, it becomes likely that the predictors will be highly collinear, which makes the regression weights' estimates unstable (i.e., the "bouncing beta" problem). Among other procedures, dimension-reduction-based methods have been proposed for dealing with these problems. These methods yield insight into the data by reducing the predictors to a smaller number of summarizing variables and regressing the criterion on these summarizing variables. Two promising methods are principal-covariate regression (PCovR) and exploratory structural equation modeling (ESEM). Both simultaneously optimize reduction and prediction, but they are based on different frameworks; the resulting solutions had not yet been compared, so the strengths and weaknesses of each method were unclear. In this article, we focus on the extent to which PCovR and ESEM are able to extract the factors that truly underlie the predictor scores and can predict a single criterion. The results of two simulation studies showed that for a typical behavioral dataset, ESEM (using the BIC for model selection) is more often successful in this regard than PCovR. Yet in 93% of the datasets PCovR performed equally well, and in the case of 48 predictors, 100 observations, and large differences in the strengths of the factors, PCovR even outperformed ESEM.
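
    PCovR balances reconstruction of the predictors against prediction of the criterion; with the weight placed fully on reconstruction it reduces to principal-component regression, which the sketch below implements on synthetic factor data. This is a simplification for illustration, not PCovR proper.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, k = 100, 20, 2
F = rng.normal(size=(n, k))            # true latent factors
load = rng.normal(size=(k, p))
X = F @ load + rng.normal(scale=0.3, size=(n, p))
y = F @ np.array([1.0, -0.5]) + rng.normal(scale=0.2, size=n)

# Reduce X to k summarizing components (the reduction step)...
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
T = Xc @ Vt[:k].T                      # component scores

# ...then regress the criterion on the scores (the prediction step).
D = np.column_stack([np.ones(n), T])
beta, *_ = np.linalg.lstsq(D, y, rcond=None)
pred = D @ beta
r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
```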

  18. celerite: Scalable 1D Gaussian Processes in C++, Python, and Julia

    NASA Astrophysics Data System (ADS)

    Foreman-Mackey, Daniel; Agol, Eric; Ambikasaran, Sivaram; Angus, Ruth

    2017-09-01

    celerite provides fast and scalable Gaussian Process (GP) Regression in one dimension and is implemented in C++, Python, and Julia. The celerite API is designed to be familiar to users of george and, like george, celerite is designed to efficiently evaluate the marginalized likelihood of a dataset under a GP model. This can then be used alongside a non-linear optimization or posterior inference library for the best results.
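
    The quantity celerite accelerates is the GP marginal log-likelihood. The naive O(N^3) version below, with an exponential kernel (a member of celerite's model class), shows what is being computed; this is a generic numpy sketch with made-up data, not the celerite API, which achieves the same result in O(N).

```python
import numpy as np

def gp_log_likelihood(t, y, yerr, amp=1.0, tau=1.0):
    """Naive O(n^3) GP marginal log-likelihood with an exponential kernel."""
    K = amp * np.exp(-np.abs(t[:, None] - t[None, :]) / tau)
    K[np.diag_indices_from(K)] += yerr ** 2     # observational noise
    L = np.linalg.cholesky(K)
    z = np.linalg.solve(L, y)                   # whiten the data
    return (-0.5 * (z @ z) - np.log(np.diag(L)).sum()
            - 0.5 * len(t) * np.log(2 * np.pi))

rng = np.random.default_rng(6)
t = np.sort(rng.uniform(0, 10, size=50))        # irregular 1D sampling
y = np.sin(t) + rng.normal(scale=0.1, size=50)

ll = gp_log_likelihood(t, y, np.full(50, 0.1))
```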

  19. Optimization of microwave-assisted extraction (MAE) of coriander phenolic antioxidants - response surface methodology approach.

    PubMed

    Zeković, Zoran; Vladić, Jelena; Vidović, Senka; Adamović, Dušan; Pavlić, Branimir

    2016-10-01

    Microwave-assisted extraction (MAE) of polyphenols from coriander seeds was optimized by simultaneous maximization of total phenolic (TP) and total flavonoid (TF) yields, as well as maximized antioxidant activity determined by 1,1-diphenyl-2-picrylhydrazyl and reducing power assays. A Box-Behnken experimental design with response surface methodology (RSM) was used for optimization of MAE. Extraction time (X1, 15-35 min), ethanol concentration (X2, 50-90% w/w) and irradiation power (X3, 400-800 W) were investigated as independent variables. Experimentally obtained values of the investigated responses were fitted to a second-order polynomial model, and multiple regression analysis and analysis of variance were used to determine the fitness of the model and the optimal conditions. The optimal MAE conditions for simultaneous maximization of polyphenol yield and antioxidant activity were an extraction time of 19 min, an ethanol concentration of 63% and an irradiation power of 570 W, while predicted values of TP, TF, IC50 and EC50 at optimal MAE conditions were 311.23 mg gallic acid equivalent per 100 g dry weight (DW), 213.66 mg catechin equivalent per 100 g DW, 0.0315 mg mL-1 and 0.1311 mg mL-1 respectively. RSM was successfully used for multi-response optimization of coriander seed polyphenols. Comparison of optimized MAE with conventional extraction techniques confirmed that MAE provides significantly higher polyphenol yields and extracts with increased antioxidant activity. © 2016 Society of Chemical Industry.
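
    The RSM workflow, fitting a full second-order polynomial and solving for its stationary point, can be sketched on mock extraction data. The response function and its optimum below are invented for illustration (in coded -1 to +1 units), not the paper's fitted model.

```python
import numpy as np

rng = np.random.default_rng(7)
# Coded levels for time, ethanol concentration, irradiation power.
X = rng.uniform(-1, 1, size=(30, 3))

def tp_yield(x):                         # hypothetical response surface
    opt = np.array([-0.6, -0.35, -0.15])  # assumed optimum in coded units
    return 310.0 - 40.0 * np.sum((x - opt) ** 2, axis=1)

y = tp_yield(X) + rng.normal(scale=1.0, size=30)

# Full second-order model: intercept, linear, squared, interaction terms.
x1, x2, x3 = X.T
D = np.column_stack([np.ones(30), x1, x2, x3, x1**2, x2**2, x3**2,
                     x1*x2, x1*x3, x2*x3])
b, *_ = np.linalg.lstsq(D, y, rcond=None)

# Stationary point: set the gradient of the fitted quadratic to zero.
A = np.array([[2*b[4], b[7],   b[8]],
              [b[7],   2*b[5], b[9]],
              [b[8],   b[9],   2*b[6]]])
x_opt = np.linalg.solve(A, -b[1:4])      # recovered coded optimum
```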

  20. An Experimental Investigation into the Optimal Processing Conditions for the CO2 Laser Cladding of 20 MnCr5 Steel Using Taguchi Method and ANN

    NASA Astrophysics Data System (ADS)

    Mondal, Subrata; Bandyopadhyay, Asish; Pal, Pradip Kumar

    2010-10-01

    This paper presents the prediction and evaluation of the laser clad profile formed by a CO2 laser, applying the Taguchi method and an artificial neural network (ANN). Laser cladding is a surface modification technology by which desired surface characteristics of a component, such as good corrosion resistance, wear resistance and hardness, can be achieved. The laser is used as a heat source to melt the anti-corrosive powder of Inconel-625 (a superalloy) to form a coating on a 20 MnCr5 substrate. A parametric study of the technique is also attempted. The data obtained from experiments have been used to develop a linear regression equation and then a neural network model; data obtained from the regression equations have also been used as supporting data to train the neural network. The ANN is used to establish the relationship between the input and output parameters of the process, and the established ANN model is then indirectly integrated with the optimization technique. The developed neural network model shows a good degree of agreement with experimental data. In order to obtain the combination of process parameters, such as laser power, scan speed and powder feed rate, for which the output parameters become optimum, the experimental data have been used to develop response surfaces.

  1. In situ forming biodegradable poly(ε-caprolactone) microsphere systems: a challenge for transarterial embolization therapy. In vitro and preliminary ex vivo studies.

    PubMed

    Salis, Andrea; Porcu, Elena P; Gavini, Elisabetta; Fois, Giulia R; Icaro Cornaglia, Antonia; Rassu, Giovanna; Diana, Marco; Maestri, Marcello; Giunchedi, Paolo; Nikolakakis, Ioannis

    2017-04-01

    An in situ forming biodegradable poly(ε-caprolactone) (PCL) microsphere (PCL-ISM) system was developed as a novel embolic agent for transarterial embolization (TAE) therapy of hepatocellular carcinoma (HCC). Ibuprofen sodium (Ibu-Na) was loaded on this platform to evaluate its potential for the treatment of post-embolization syndrome. The influence of formulation parameters on size/shape, encapsulation efficiency and drug release was investigated using a mixture experimental design. Regression models were derived and used to optimize the formulation for particle size, encapsulation efficiency and drug release profile for TAE therapy. An ex vivo model using isolated rat livers was established to assess the in situ formation of microspheres. All PCL-ISM components affected the studied properties, and the fitting indices of the regression models were high (R2adj = 0.810 for size, 0.964 for encapsulation efficiency, and 0.993 and 0.971 for drug release at 30 min and 48 h, respectively). The optimized composition was: PCL = 4%, NMP = 43.1%, oil = 48.9%, surfactant = 2% and drug = 2%. Ex vivo studies revealed that PCL-ISM was able to form microspheres in the hepatic arterial bed. The PCL-ISM system provides a novel tool for the treatment of HCC and post-embolization syndrome, and is capable of forming microspheres with the desired size and Ibu-Na release profile after injection into blood vessels.

  2. Sentinel node status prediction by four statistical models: results from a large bi-institutional series (n = 1132).

    PubMed

    Mocellin, Simone; Thompson, John F; Pasquali, Sandro; Montesco, Maria C; Pilati, Pierluigi; Nitti, Donato; Saw, Robyn P; Scolyer, Richard A; Stretch, Jonathan R; Rossi, Carlo R

    2009-12-01

    To improve selection for sentinel node (SN) biopsy (SNB) in patients with cutaneous melanoma using statistical models predicting SN status. About 80% of patients currently undergoing SNB are node negative. In the absence of conclusive evidence of an SNB-associated survival benefit, these patients may be over-treated. Here, we tested the efficiency of 4 different models in predicting SN status. The clinicopathologic data (age, gender, tumor thickness, Clark level, regression, ulceration, histologic subtype, and mitotic index) of 1132 melanoma patients who had undergone SNB at institutions in Italy and Australia were analyzed. Logistic regression, classification tree, random forest, and support vector machine models were fitted to the data. The predictive models were built with the aim of maximizing the negative predictive value (NPV) and reducing the rate of SNB procedures while minimizing the error rate. After cross-validation, the logistic regression, classification tree, random forest, and support vector machine predictive models obtained clinically relevant NPVs (93.6%, 94.0%, 97.1%, and 93.0%, respectively), SNB reductions (27.5%, 29.8%, 18.2%, and 30.1%, respectively), and error rates (1.8%, 1.8%, 0.5%, and 2.1%, respectively). Using commonly available clinicopathologic variables, predictive models can preoperatively identify a proportion of patients (approximately 25%) who might be spared SNB, with an acceptable (1%-2%) error. If validated in large prospective series, these models might be implemented in the clinical setting for improved patient selection, which ultimately would lead to better quality of life for patients and optimization of resource allocation for the health care system.
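
    The NPV versus SNB-reduction trade-off can be illustrated with synthetic, well-calibrated risk predictions; the threshold and data below are hypothetical stand-ins, not the paper's fitted models.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 1132
p_pos = rng.uniform(size=n)             # hypothetical predicted SN-positivity risk
# Outcomes drawn so the risks are well calibrated by construction.
sn_pos = (rng.uniform(size=n) < p_pos).astype(int)

thr = 0.1                               # spare SNB when predicted risk < thr
spared = p_pos < thr
npv = 1 - sn_pos[spared].mean()         # fraction of spared patients truly negative
reduction = spared.mean()               # share of SNB procedures avoided
```

    Raising the threshold spares more patients (greater reduction) at the cost of a lower NPV, which is exactly the balance the four models were tuned for.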

  3. Serum levels of the immune activation marker neopterin change with age and gender and are modified by race, BMI, and percentage of body fat.

    PubMed

    Spencer, Monique E; Jain, Alka; Matteini, Amy; Beamer, Brock A; Wang, Nae-Yuh; Leng, Sean X; Punjabi, Naresh M; Walston, Jeremy D; Fedarko, Neal S

    2010-08-01

    Neopterin, a GTP metabolite expressed by macrophages, is a marker of immune activation. We hypothesize that levels of this serum marker alter with donor age, reflecting increased chronic immune activation in normal aging. In addition to age, we assessed gender, race, body mass index (BMI), and percentage of body fat (%fat) as potential covariates. Serum was obtained from 426 healthy participants whose age ranged from 18 to 87 years. Anthropometric measures included %fat and BMI. Neopterin concentrations were measured by competitive ELISA. The paired associations between neopterin and age, BMI, or %fat were analyzed by Spearman's correlation or by linear regression of log-transformed neopterin, whereas overall associations were modeled by multiple regression of log-transformed neopterin as a function of age, gender, race, BMI, %fat, and interaction terms. Across all participants, neopterin exhibited a positive association with age, BMI, and %fat. Multiple regression modeling of neopterin in women and men as a function of age, BMI, and race revealed that each covariate contributed significantly to neopterin values and that optimal modeling required an interaction term between race and BMI. The covariate %fat was highly correlated with BMI and could be substituted for BMI to yield similar regression coefficients. The association of age and gender with neopterin levels and their modification by race, BMI, or %fat reflect the biology underlying chronic immune activation and perhaps gender differences in disease incidence, morbidity, and mortality.
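
    A sketch of the core modeling step, multiple regression of log-transformed neopterin with a race-by-BMI interaction term, on synthetic data. The coefficients, coding, and sample generation are invented for illustration and are not the study's estimates.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 426
age = rng.uniform(18, 87, size=n)
bmi = rng.normal(27, 4, size=n)
race = rng.integers(0, 2, size=n)        # illustrative 0/1 coding

# Hypothetical generative model: log(neopterin) rises with age and BMI,
# with a race-by-BMI interaction as described in the abstract.
log_neo = (1.0 + 0.01 * age + 0.02 * bmi + 0.03 * race * bmi
           + rng.normal(scale=0.1, size=n))

# Design matrix with main effects plus the interaction term.
D = np.column_stack([np.ones(n), age, bmi, race, race * bmi])
coef, *_ = np.linalg.lstsq(D, log_neo, rcond=None)
```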

  4. Travel Demand Modeling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Southworth, Frank; Garrow, Dr. Laurie

    This chapter describes the principal types of both passenger and freight demand models in use today, providing a brief history of model development supported by references to a number of popular texts on the subject, and directing the reader to papers covering some of the more recent technical developments in the area. Over the past half century a variety of methods have been used to estimate and forecast travel demands, drawing concepts from economic/utility maximization theory, transportation system optimization and spatial interaction theory, using and often combining solution techniques as varied as Box-Jenkins methods, non-linear multivariate regression, non-linear mathematical programming, and agent-based microsimulation.

  5. Automated Screening of Children With Obstructive Sleep Apnea Using Nocturnal Oximetry: An Alternative to Respiratory Polygraphy in Unattended Settings

    PubMed Central

    Álvarez, Daniel; Alonso-Álvarez, María L.; Gutiérrez-Tobal, Gonzalo C.; Crespo, Andrea; Kheirandish-Gozal, Leila; Hornero, Roberto; Gozal, David; Terán-Santos, Joaquín; Del Campo, Félix

    2017-01-01

    Study Objectives: Nocturnal oximetry has become known as a simple, readily available, and potentially useful diagnostic tool for childhood obstructive sleep apnea (OSA). However, at-home respiratory polygraphy (HRP) remains the preferred alternative to polysomnography (PSG) in unattended settings. The aim of this study was twofold: (1) to design and assess a novel methodology for pediatric OSA screening based on automated analysis of at-home oxyhemoglobin saturation (SpO2), and (2) to compare its diagnostic performance with HRP. Methods: SpO2 recordings were parameterized by means of time, frequency, and conventional oximetric measures. Logistic regression models were optimized using genetic algorithms (GAs) for three cutoffs for OSA: 1, 3, and 5 events/h. The diagnostic performance of the logistic regression models, the manual obstructive apnea-hypopnea index (OAHI) from HRP, and the conventional oxygen desaturation index ≥ 3% (ODI3) was assessed. Results: For a cutoff of 1 event/h, the optimal logistic regression model significantly outperformed both the conventional HRP-derived OAHI and the ODI3: 85.5% accuracy (HRP 74.6%; ODI3 65.9%) and 0.97 area under the receiver operating characteristic curve (AUC) (HRP 0.78; ODI3 0.75) were reached. For a cutoff of 3 events/h, the logistic regression model achieved 83.4% accuracy (HRP 85.0%; ODI3 74.5%) and 0.96 AUC (HRP 0.93; ODI3 0.85), whereas using a cutoff of 5 events/h, oximetry reached 82.8% accuracy (HRP 85.1%; ODI3 76.7%) and 0.97 AUC (HRP 0.95; ODI3 0.84). Conclusions: Automated analysis of at-home SpO2 recordings provides accurate detection of children with high pretest probability of OSA. Thus, unsupervised nocturnal oximetry may enable a simple and effective alternative to HRP and PSG in unattended settings. Citation: Álvarez D, Alonso-Álvarez ML, Gutiérrez-Tobal GC, Crespo A, Kheirandish-Gozal L, Hornero R, Gozal D, Terán-Santos J, Del Campo F. Automated screening of children with obstructive sleep apnea using nocturnal oximetry: an alternative to respiratory polygraphy in unattended settings. J Clin Sleep Med. 2017;13(5):693-702. PMID:28356177
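
    A minimal sketch of the classifier idea: gradient-descent logistic regression plus a rank-based AUC, on synthetic stand-ins for the oximetric indices. The paper's genetic-algorithm feature search is omitted, and the data and variable names are assumptions.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Plain gradient-descent logistic regression."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def auc(scores, labels):
    """AUC computed as the rank-sum (Mann-Whitney) statistic."""
    ranks = np.empty(len(scores), dtype=float)
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

rng = np.random.default_rng(9)
n = 400
odi3 = rng.normal(size=n)                # mock oximetric indices
spec = rng.normal(size=n)
osa = (0.8 * odi3 + 0.8 * spec + rng.normal(size=n) > 0).astype(float)

X = np.column_stack([np.ones(n), odi3, spec])
w = fit_logistic(X, osa)
model_auc = auc(X @ w, osa)
```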

  6. Treatment of systematic errors in land data assimilation systems

    NASA Astrophysics Data System (ADS)

    Crow, W. T.; Yilmaz, M.

    2012-12-01

    Data assimilation systems are generally designed to minimize the influence of random error on the estimation of system states. Yet, experience with land data assimilation systems has also revealed the presence of large systematic differences between model-derived and remotely-sensed estimates of land surface states. Such differences are commonly resolved prior to data assimilation through implementation of a pre-processing rescaling step whereby observations are scaled (or non-linearly transformed) to somehow "match" comparable predictions made by an assimilation model. While the rationale for removing systematic differences in means (i.e., bias) between models and observations is well-established, relatively little theoretical guidance is currently available to determine the appropriate treatment of higher-order moments during rescaling. This talk presents a simple analytical argument to define an optimal linear-rescaling strategy for observations prior to their assimilation into a land surface model. While a technique based on triple collocation theory is shown to replicate this optimal strategy, commonly-applied rescaling techniques (e.g., so-called "least-squares regression" and "variance matching" approaches) are shown to represent only sub-optimal approximations to it. Since the triple collocation approach is likely infeasible in many real-world circumstances, general advice for deciding between various feasible (yet sub-optimal) rescaling approaches will be presented, with an emphasis on the implications of this work for the case of directly assimilating satellite radiances. While the bulk of the analysis will deal with linear rescaling techniques, its extension to nonlinear cases will also be discussed.
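
    The "variance matching" rescaling discussed here is one line: shift and scale the observations to the model's mean and variance before assimilation. A numpy sketch with synthetic soil-moisture-like series (the generative model is invented for illustration); the talk's point is that this is only a sub-optimal approximation to the triple-collocation-based strategy.

```python
import numpy as np

rng = np.random.default_rng(10)
truth = rng.normal(size=500)                              # unobserved true state
model = 0.8 * truth + rng.normal(scale=0.3, size=500)     # model estimate
obs = 2.0 + 0.5 * truth + rng.normal(scale=0.2, size=500) # biased, damped retrieval

# Variance matching: remove systematic mean and variance differences
# so only (presumed) random errors remain before assimilation.
obs_rescaled = (obs - obs.mean()) * (model.std() / obs.std()) + model.mean()
```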

  7. Correlates of motivation to change in pathological gamblers completing cognitive-behavioral group therapy.

    PubMed

    Gómez-Peña, Mónica; Penelo, Eva; Granero, Roser; Fernández-Aranda, Fernando; Alvarez-Moya, Eva; Santamaría, Juan José; Moragas, Laura; Neus Aymamí, Maria; Gunnard, Katarina; Menchón, José M; Jimenez-Murcia, Susana

    2012-07-01

    The present study analyzes the association between the motivation to change and the cognitive-behavioral group intervention, in terms of dropouts and relapses, in a sample of male pathological gamblers. The specific objectives were as follows: (a) to estimate the predictive value of baseline University of Rhode Island Change Assessment scale (URICA) scores (i.e., at the start of the study) as regards the risk of relapse and dropout during treatment and (b) to assess the incremental predictive ability of URICA scores, as regards the mean change produced in the clinical status of patients between the start and finish of treatment. The relationship between the URICA and the response to treatment was analyzed by means of a pre-post design applied to a sample of 191 patients who were consecutively receiving cognitive-behavioral group therapy. The statistical analysis included logistic regression models and hierarchical multiple linear regression models. The discriminative ability of the models including the four URICA scores regarding the likelihood of relapse and dropout was acceptable (area under the receiver operating characteristic curve: .73 and .71, respectively). No significant predictive ability was found as regards the differences between baseline and posttreatment scores (changes in R2 below 5% in the multiple regression models). The availability of useful measures of motivation to change would enable treatment outcomes to be optimized through the application of specific therapeutic interventions. © 2012 Wiley Periodicals, Inc.

  8. 3D statistical shape models incorporating 3D random forest regression voting for robust CT liver segmentation

    NASA Astrophysics Data System (ADS)

    Norajitra, Tobias; Meinzer, Hans-Peter; Maier-Hein, Klaus H.

    2015-03-01

    During image segmentation, 3D Statistical Shape Models (SSM) usually conduct a limited search for target landmarks within one-dimensional search profiles perpendicular to the model surface. In addition, landmark appearance is modeled only locally based on linear profiles and weak learners, altogether leading to segmentation errors from landmark ambiguities and limited search coverage. We present a new method for 3D SSM segmentation based on 3D Random Forest Regression Voting. For each surface landmark, a Random Regression Forest is trained that learns a 3D spatial displacement function between the according reference landmark and a set of surrounding sample points, based on an infinite set of non-local randomized 3D Haar-like features. Landmark search is then conducted omni-directionally within 3D search spaces, where voxelwise forest predictions on landmark position contribute to a common voting map which reflects the overall position estimate. Segmentation experiments were conducted on a set of 45 CT volumes of the human liver, of which 40 images were randomly chosen for training and 5 for testing. Without parameter optimization, using a simple candidate selection and a single resolution approach, excellent results were achieved, while faster convergence and better concavity segmentation were observed, altogether underlining the potential of our approach in terms of increased robustness from distinct landmark detection and from better search coverage.

  9. Application of support vector regression for optimization of vibration flow field of high-density polyethylene melts characterized by small angle light scattering

    NASA Astrophysics Data System (ADS)

    Xian, Guangming

    2018-03-01

    In this paper, the vibration flow field parameters of polymer melts in a visual slit die are optimized by using an intelligent algorithm. Experimental small angle light scattering (SALS) patterns are shown to characterize the processing process. In order to capture the scattered light, a polarizer and an analyzer are placed before and after the polymer melts. The results reported in this study are obtained using high-density polyethylene (HDPE) at a rotation speed of 28 rpm. In addition, the support vector regression (SVR) analytical method is introduced for optimizing the parameters of the vibration flow field. This work establishes the general applicability of SVR for predicting the optimal parameters of vibration flow fields.

  10. Modeling Reservoir-River Networks in Support of Optimizing Seasonal-Scale Reservoir Operations

    NASA Astrophysics Data System (ADS)

    Villa, D. L.; Lowry, T. S.; Bier, A.; Barco, J.; Sun, A.

    2011-12-01

    HydroSCOPE (Hydropower Seasonal Concurrent Optimization of Power and the Environment) is a seasonal time-scale tool for scenario analysis and optimization of reservoir-river networks. Developed in MATLAB, HydroSCOPE is an object-oriented model that simulates basin-scale dynamics with an objective of optimizing reservoir operations to maximize revenue from power generation, reliability in the water supply, environmental performance, and flood control. HydroSCOPE is part of a larger toolset that is being developed through a Department of Energy multi-laboratory project. This project's goal is to provide conventional hydropower decision makers with better information to execute their day-ahead and seasonal operations and planning activities by integrating water balance and operational dynamics across a wide range of spatial and temporal scales. This presentation details the modeling approach and functionality of HydroSCOPE. HydroSCOPE consists of a river-reservoir network model and an optimization routine. The river-reservoir network model simulates the heat and water balance of river-reservoir networks for time-scales up to one year. The optimization routine software, DAKOTA (Design Analysis Kit for Optimization and Terascale Applications - dakota.sandia.gov), is seamlessly linked to the network model and is used to optimize daily volumetric releases from the reservoirs to best meet a set of user-defined constraints, such as maximizing revenue while minimizing environmental violations. The network model uses 1-D approximations for both the reservoirs and river reaches and is able to account for surface and sediment heat exchange as well as ice dynamics for both models. The reservoir model also accounts for inflow, density, and withdrawal zone mixing, and diffusive heat exchange. 
Routing for the river reaches is accomplished using a modified Muskingum-Cunge approach that automatically calculates the internal timestep and sub-reach lengths to match the conditions of each timestep and minimize computational overhead. Power generation for each reservoir is estimated using a 2-dimensional regression that accounts for both the available head and turbine efficiency. The object-oriented architecture makes run configuration easy to update. The dynamic model inputs include inflow and meteorological forecasts while static inputs include bathymetry data, reservoir and power generation characteristics, and topological descriptors. Ensemble forecasts of hydrological and meteorological conditions are supplied in real-time by Pacific Northwest National Laboratory and are used as a proxy for uncertainty, which is carried through the simulation and optimization process to produce output that describes the probability that different operational scenarios will be optimal. The full toolset, which includes HydroSCOPE, is currently being tested on the Feather River system in Northern California and the Upper Colorado Storage Project.
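    The routing step can be illustrated with the classical Muskingum recurrence; HydroSCOPE's modified Muskingum-Cunge variant additionally adapts the internal timestep and sub-reach lengths, which this fixed-parameter sketch omits.

```python
import numpy as np

def muskingum_route(inflow, K=12.0, X=0.2, dt=6.0):
    """Route an inflow hydrograph through one reach (K and dt in hours)."""
    denom = 2.0 * K * (1.0 - X) + dt
    c0 = (dt - 2.0 * K * X) / denom
    c1 = (dt + 2.0 * K * X) / denom
    c2 = (2.0 * K * (1.0 - X) - dt) / denom   # c0 + c1 + c2 == 1
    outflow = np.empty_like(inflow)
    outflow[0] = inflow[0]                    # assume an initial steady state
    for t in range(1, len(inflow)):
        outflow[t] = c0 * inflow[t] + c1 * inflow[t - 1] + c2 * outflow[t - 1]
    return outflow

# A triangular flood wave: routing attenuates and delays the peak.
inflow = np.concatenate([np.linspace(10.0, 100.0, 10), np.linspace(95.0, 10.0, 18)])
outflow = muskingum_route(inflow)
```

    The attenuated, delayed outflow peak is the physically expected behavior of channel storage, which is why this recurrence is a common building block in reservoir-river network models.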

  11. Optimization and performance evaluation for nutrient removal from palm oil mill effluent wastewater using microalgae

    NASA Astrophysics Data System (ADS)

    Ibrahim, Raheek I.; Wong, Z. H.; Mohammad, A. W.

    2015-04-01

    Palm oil mill effluent (POME) wastewater is produced in huge quantities in Malaysia, and if discharged into the environment it causes serious problems owing to its high nutrient content. This study was devoted to POME wastewater treatment with microalgae. The main objective was to find the optimum conditions (retention time and pH) for the microalgae treatment of POME wastewater, with retention time being the most important parameter in algae treatment: beyond the optimum conditions, time and pH have an adverse effect and the process becomes costly. To our knowledge, no existing study has optimized retention time and pH against the percentage removal of nutrients (ammonia nitrogen NH3-N and orthophosphate PO43-) for microalgae treatment of POME wastewater. To perform the optimization, a central composite rotatable design with a second-order polynomial model was used; regression coefficients and goodness-of-fit results for the removal percentages of nutrients (NH3-N and PO43-) were estimated. The WinQSB technique was used to optimize the response-surface objective function for the developed model, and experiments were performed to validate the model results. The optimum conditions were found to be an 18-day retention time and a pH of 9.22 for ammonia nitrogen, while for orthophosphate, 15 days was the optimum retention time with a pH of 9.2.
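    The fitting step can be sketched as an ordinary least-squares fit of the full second-order polynomial in retention time and pH, followed by a grid search for the predicted optimum. The data and coefficients below are synthetic placeholders (peaking near 18 days and pH 9.2 to mirror the reported optimum), not the study's measurements.

```python
import numpy as np

rng = np.random.default_rng(2)
t = rng.uniform(5.0, 25.0, 60)    # retention time, days
ph = rng.uniform(7.0, 11.0, 60)
# Synthetic % removal surface with its maximum near t = 18 days, pH = 9.2.
removal = (90.0 - 0.3 * (t - 18.0) ** 2 - 5.0 * (ph - 9.2) ** 2
           + rng.normal(0.0, 0.5, 60))

# Design matrix of the full second-order model: 1, t, pH, t*pH, t^2, pH^2.
A = np.column_stack([np.ones_like(t), t, ph, t * ph, t ** 2, ph ** 2])
coef, *_ = np.linalg.lstsq(A, removal, rcond=None)

# Locate the predicted optimum over the experimental region.
tg, pg = np.meshgrid(np.linspace(5.0, 25.0, 201), np.linspace(7.0, 11.0, 201))
pred = (coef[0] + coef[1] * tg + coef[2] * pg + coef[3] * tg * pg
        + coef[4] * tg ** 2 + coef[5] * pg ** 2)
i = np.unravel_index(np.argmax(pred), pred.shape)
best_t, best_ph = tg[i], pg[i]
```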

  12. Improving predicted protein loop structure ranking using a Pareto-optimality consensus method.

    PubMed

    Li, Yaohang; Rata, Ionel; Chiu, See-wing; Jakobsson, Eric

    2010-07-20

    Accurate protein loop structure models are important to understand functions of many proteins. Identifying the native or near-native models by distinguishing them from the misfolded ones is a critical step in protein loop structure prediction. We have developed a Pareto Optimal Consensus (POC) method, which is a consensus model ranking approach to integrate multiple knowledge- or physics-based scoring functions. The procedure of identifying the models of best quality in a model set includes: 1) identifying the models at the Pareto optimal front with respect to a set of scoring functions, and 2) ranking them based on the fuzzy dominance relationship to the rest of the models. We apply the POC method to a large number of decoy sets for loops of 4 to 12 residues in length using a functional space composed of several carefully-selected scoring functions: Rosetta, DOPE, DDFIRE, OPLS-AA, and a triplet backbone dihedral potential developed in our lab. Our computational results show that the sets of Pareto-optimal decoys, which are typically composed of approximately 20% or less of the overall decoys in a set, have a good coverage of the best or near-best decoys in more than 99% of the loop targets. Compared to the individual scoring function yielding the best selection accuracy in the decoy sets, the POC method yields 23%, 37%, and 64% fewer false positives in distinguishing the native conformation, identifying a near-native model (RMSD < 0.5 Å from the native) as top-ranked, and selecting at least one near-native model in the top-5-ranked models, respectively. Similar effectiveness of the POC method is also found in the decoy sets from membrane protein loops. Furthermore, the POC method outperforms other popularly-used consensus strategies in model ranking, such as rank-by-number, rank-by-rank, rank-by-vote, and regression-based methods. 
By integrating multiple knowledge- and physics-based scoring functions based on Pareto optimality and fuzzy dominance, the POC method is effective in distinguishing the best loop models from the other ones within a loop model set.
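    Step (1) of the POC procedure, identifying the Pareto-optimal front of a decoy set, can be sketched directly (assuming lower is better for every scoring function; the fuzzy-dominance ranking of step (2) is omitted):

```python
import numpy as np

def pareto_front(scores):
    """Boolean mask of non-dominated rows; scores has shape (n_models, n_scores)."""
    n = scores.shape[0]
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        for j in range(n):
            # j dominates i if it is no worse on every score and strictly
            # better on at least one.
            if i != j and np.all(scores[j] <= scores[i]) and np.any(scores[j] < scores[i]):
                mask[i] = False
                break
    return mask

# Four decoys scored by two functions; the last is dominated by [2.0, 2.0].
scores = np.array([[1.0, 4.0],
                   [2.0, 2.0],
                   [4.0, 1.0],
                   [3.0, 3.0]])
front = pareto_front(scores)
```

    The front keeps every decoy that is best under at least one trade-off among the scoring functions, which is why it reliably covers the near-native models while discarding the bulk of the set.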

  13. Improving predicted protein loop structure ranking using a Pareto-optimality consensus method

    PubMed Central

    2010-01-01

    Background Accurate protein loop structure models are important to understand functions of many proteins. Identifying the native or near-native models by distinguishing them from the misfolded ones is a critical step in protein loop structure prediction. Results We have developed a Pareto Optimal Consensus (POC) method, which is a consensus model ranking approach to integrate multiple knowledge- or physics-based scoring functions. The procedure of identifying the models of best quality in a model set includes: 1) identifying the models at the Pareto optimal front with respect to a set of scoring functions, and 2) ranking them based on the fuzzy dominance relationship to the rest of the models. We apply the POC method to a large number of decoy sets for loops of 4 to 12 residues in length using a functional space composed of several carefully-selected scoring functions: Rosetta, DOPE, DDFIRE, OPLS-AA, and a triplet backbone dihedral potential developed in our lab. Our computational results show that the sets of Pareto-optimal decoys, which are typically composed of ~20% or less of the overall decoys in a set, have a good coverage of the best or near-best decoys in more than 99% of the loop targets. Compared to the individual scoring function yielding the best selection accuracy in the decoy sets, the POC method yields 23%, 37%, and 64% fewer false positives in distinguishing the native conformation, identifying a near-native model (RMSD < 0.5 Å from the native) as top-ranked, and selecting at least one near-native model in the top-5-ranked models, respectively. Similar effectiveness of the POC method is also found in the decoy sets from membrane protein loops. Furthermore, the POC method outperforms other popularly-used consensus strategies in model ranking, such as rank-by-number, rank-by-rank, rank-by-vote, and regression-based methods. 
Conclusions By integrating multiple knowledge- and physics-based scoring functions based on Pareto optimality and fuzzy dominance, the POC method is effective in distinguishing the best loop models from the other ones within a loop model set. PMID:20642859

  14. Systems and methods for knowledge discovery in spatial data

    DOEpatents

    Obradovic, Zoran; Fiez, Timothy E.; Vucetic, Slobodan; Lazarevic, Aleksandar; Pokrajac, Dragoljub; Hoskinson, Reed L.

    2005-03-08

    Systems and methods are provided for knowledge discovery in spatial data as well as to systems and methods for optimizing recipes used in spatial environments such as may be found in precision agriculture. A spatial data analysis and modeling module is provided which allows users to interactively and flexibly analyze and mine spatial data. The spatial data analysis and modeling module applies spatial data mining algorithms through a number of steps. The data loading and generation module obtains or generates spatial data and allows for basic partitioning. The inspection module provides basic statistical analysis. The preprocessing module smoothes and cleans the data and allows for basic manipulation of the data. The partitioning module provides for more advanced data partitioning. The prediction module applies regression and classification algorithms on the spatial data. The integration module enhances prediction methods by combining and integrating models. The recommendation module provides the user with site-specific recommendations as to how to optimize a recipe for a spatial environment such as a fertilizer recipe for an agricultural field.

  15. Optimum extrusion-cooking conditions for improving physical properties of fish-cereal based snacks by response surface methodology.

    PubMed

    Singh, R K Ratankumar; Majumdar, Ranendra K; Venkateshwarlu, G

    2014-09-01

    To establish the effect of barrel temperature, screw speed, total moisture and fish flour content on the expansion ratio and bulk density of the fish-based extrudates, response surface methodology was adopted in this study. The experiments were optimized using a five-level, four-factor central composite design. Analysis of variance was carried out to study the effects of the main factors and the interaction effects of the various factors, and regression analysis was carried out to explain the variability. A second-order model with coded variables was fitted for each response. The response surface plots were developed as a function of two independent variables while keeping the other two independent variables at optimal values. Based on the ANOVA, the fitted model was confirmed for both dependent variables. The highest organoleptic score was obtained with the combination of temperature 110 °C, screw speed 480 rpm, moisture 18% and fish flour 20%.

  16. Memory control beliefs and everyday forgetfulness in adulthood: the effects of selection, optimization, and compensation strategies.

    PubMed

    Scheibner, Gunnar Benjamin; Leathem, Janet

    2012-01-01

    Controlling for age, gender, education, and self-rated health, the present study used regression analyses to examine the relationships between memory control beliefs and self-reported forgetfulness in the context of the meta-theory of Selective Optimization with Compensation (SOC). Findings from this online survey (N = 409) indicate that, among adult New Zealanders, a higher sense of memory control accounts for a 22.7% reduction in self-reported forgetfulness. Similarly, optimization was found to account for a 5% reduction in forgetfulness while the strategies of selection and compensation were not related to self-reports of forgetfulness. Optimization partially mediated the beneficial effects that some memory beliefs (e.g., believing that memory decline is inevitable and believing in the potential for memory improvement) have on forgetfulness. It was concluded that memory control beliefs are important predictors of self-reported forgetfulness while the support for the SOC model in the context of memory controllability and everyday forgetfulness is limited.

  17. Fatalism, optimism, spirituality, depressive symptoms and stroke outcome: A population-based analysis

    PubMed Central

    Morgenstern, Lewis B.; Sánchez, Brisa N.; Skolarus, Lesli E.; Garcia, Nelda; Risser, Jan M.H.; Wing, Jeffrey J.; Smith, Melinda A.; Zahuranec, Darin B.; Lisabeth, Lynda D.

    2011-01-01

    Background and Purpose We sought to describe the association of spirituality, optimism, fatalism and depressive symptoms with initial stroke severity, stroke recurrence and post-stroke mortality. Methods Stroke cases June 2004–December 2008 were ascertained in Nueces County, Texas. Patients without aphasia were queried on their recall of depressive symptoms, fatalism, optimism, and non-organizational spirituality before stroke using validated scales. The association between scales and stroke outcomes was studied using multiple linear regression with log-transformed NIHSS and Cox proportional hazards regression for recurrence and mortality. Results 669 patients participated, 48.7% were women. In fully adjusted models, an increase in fatalism from the first to third quartile was associated with all-cause mortality (HR=1.41, 95%CI: 1.06, 1.88), marginally associated with risk of recurrence (HR=1.35, 95%CI: 0.97, 1.88), but not stroke severity. Similarly, an increase in depressive symptoms was associated with increased mortality (HR=1.32, 95%CI: 1.02, 1.72), marginally associated with stroke recurrence (HR=1.22, CI: 0.93, 1.62), and with a 9.0% increase in stroke severity (95%CI: 0.01, 18.0). Depressive symptoms altered the fatalism-mortality association such that the association of fatalism and mortality was more pronounced for patients reporting no depressive symptoms. Neither spirituality nor optimism conferred a significant effect on stroke severity, recurrence or mortality. Conclusions Among patients who have already had a stroke, self-described pre-stroke depressive symptoms and fatalism, but not optimism or spirituality, are associated with increased risk of stroke recurrence and mortality. Unconventional risk factors may explain some of the variability in stroke outcomes observed in populations, and may be novel targets for intervention. PMID:21940963

  18. Detection of Oil Chestnuts Infected by Blue Mold Using Near-Infrared Hyperspectral Imaging Combined with Artificial Neural Networks.

    PubMed

    Feng, Lei; Zhu, Susu; Lin, Fucheng; Su, Zhenzhu; Yuan, Kangpei; Zhao, Yiying; He, Yong; Zhang, Chu

    2018-06-15

    Mildew damage is a major cause of poor chestnut quality and yield loss. In this study, a near-infrared hyperspectral imaging system covering the 874-1734 nm spectral range was applied to detect the mildew damage to chestnuts caused by blue mold. Principal component analysis (PCA) score images were first employed to qualitatively and intuitively distinguish moldy chestnuts from healthy chestnuts. Spectral data were extracted from the hyperspectral images. A successive projections algorithm (SPA) was used to select 12 optimal wavelengths. Artificial neural networks, including back propagation neural network (BPNN), evolutionary neural network (ENN), extreme learning machine (ELM), general regression neural network (GRNN) and radial basis neural network (RBNN), were used to build models on the full spectra and the optimal wavelengths to distinguish moldy chestnuts. BPNN and ENN models using full spectra and optimal wavelengths obtained satisfactory performances, with classification accuracies all surpassing 99%. The results indicate the potential for rapid and non-destructive detection of moldy chestnuts by hyperspectral imaging, which would help to develop an online detection system for healthy and blue mold infected chestnuts.
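    The classification stage can be sketched with PCA followed by a small feed-forward network. The "spectra" below are synthetic (the two classes differ by a baseline shift), standing in for the 874-1734 nm hyperspectral data, and scikit-learn's MLPClassifier stands in for the BPNN.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(3)
# Synthetic "spectra" over 100 bands: moldy samples have higher absorbance.
healthy = rng.normal(1.0, 0.05, size=(80, 100))
moldy = rng.normal(1.3, 0.05, size=(80, 100))
X = np.vstack([healthy, moldy])
y = np.array([0] * 80 + [1] * 80)

# Reduce to a few principal components, then train the classifier.
X_pca = PCA(n_components=5).fit_transform(X)
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
clf.fit(X_pca, y)
accuracy = clf.score(X_pca, y)
```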

  19. A statistical approach to optimizing concrete mixture design.

    PubMed

    Ahmad, Shamsad; Alghamdi, Saeid A

    2014-01-01

    A step-by-step statistical approach is proposed to obtain optimum proportioning of concrete mixtures using the data obtained through a statistically planned experimental program. The utility of the proposed approach for optimizing the design of concrete mixture is illustrated considering a typical case in which trial mixtures were considered according to a full factorial experiment design involving three factors and their three levels (3³). A total of 27 concrete mixtures with three replicates (81 specimens) were considered by varying the levels of key factors affecting compressive strength of concrete, namely, water/cementitious materials ratio (0.38, 0.43, and 0.48), cementitious materials content (350, 375, and 400 kg/m³), and fine/total aggregate ratio (0.35, 0.40, and 0.45). The experimental data were utilized to carry out analysis of variance (ANOVA) and to develop a polynomial regression model for compressive strength in terms of the three design factors considered in this study. The developed statistical model was used to show how optimization of concrete mixtures can be carried out with different possible options.

  20. A Statistical Approach to Optimizing Concrete Mixture Design

    PubMed Central

    Alghamdi, Saeid A.

    2014-01-01

    A step-by-step statistical approach is proposed to obtain optimum proportioning of concrete mixtures using the data obtained through a statistically planned experimental program. The utility of the proposed approach for optimizing the design of concrete mixture is illustrated considering a typical case in which trial mixtures were considered according to a full factorial experiment design involving three factors and their three levels (3³). A total of 27 concrete mixtures with three replicates (81 specimens) were considered by varying the levels of key factors affecting compressive strength of concrete, namely, water/cementitious materials ratio (0.38, 0.43, and 0.48), cementitious materials content (350, 375, and 400 kg/m³), and fine/total aggregate ratio (0.35, 0.40, and 0.45). The experimental data were utilized to carry out analysis of variance (ANOVA) and to develop a polynomial regression model for compressive strength in terms of the three design factors considered in this study. The developed statistical model was used to show how optimization of concrete mixtures can be carried out with different possible options. PMID:24688405

  1. A nonparametric multiple imputation approach for missing categorical data.

    PubMed

    Zhou, Muhan; He, Yulei; Yu, Mandi; Hsu, Chiu-Hsieh

    2017-06-06

    Incomplete categorical variables with more than two categories are common in public health data. However, most of the existing missing-data methods do not use the information from nonresponse (missingness) probabilities. We propose a nearest-neighbour multiple imputation approach to impute a missing at random categorical outcome and to estimate the proportion of each category. The donor set for imputation is formed by measuring distances between each missing value with other non-missing values. The distance function is calculated based on a predictive score, which is derived from two working models: one fits a multinomial logistic regression for predicting the missing categorical outcome (the outcome model) and the other fits a logistic regression for predicting missingness probabilities (the missingness model). A weighting scheme is used to accommodate contributions from two working models when generating the predictive score. A missing value is imputed by randomly selecting one of the non-missing values with the smallest distances. We conduct a simulation to evaluate the performance of the proposed method and compare it with several alternative methods. A real-data application is also presented. The simulation study suggests that the proposed method performs well when missingness probabilities are not extreme under some misspecifications of the working models. However, the calibration estimator, which is also based on two working models, can be highly unstable when missingness probabilities for some observations are extremely high. In this scenario, the proposed method produces more stable and better estimates. In addition, proper weights need to be chosen to balance the contributions from the two working models and achieve optimal results for the proposed method. 
We conclude that the proposed multiple imputation method is a reasonable approach to dealing with missing categorical outcome data with more than two levels for assessing the distribution of the outcome. In terms of the choices for the working models, we suggest a multinomial logistic regression for predicting the missing outcome and a binary logistic regression for predicting the missingness probability.
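    The core idea can be sketched with a binary outcome for brevity (the paper's outcome model is multinomial): two working logistic models produce a blended predictive score, and each missing value is imputed from a donor among its nearest observed neighbours in score space. The variable names, the 5-donor neighbourhood and the 0.5 weight are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n = 400
x = rng.normal(size=(n, 1))                              # fully observed covariate
y = (x[:, 0] + rng.normal(scale=0.8, size=n) > 0).astype(int)
miss = rng.random(n) < 1.0 / (1.0 + np.exp(-x[:, 0]))    # MAR missingness
obs = ~miss

# Two working models: one for the outcome, one for missingness.
outcome_model = LogisticRegression().fit(x[obs], y[obs])
missing_model = LogisticRegression().fit(x, miss.astype(int))

# Predictive score: a weighted blend of the two working-model predictions.
w = 0.5
score = (w * outcome_model.predict_proba(x)[:, 1]
         + (1.0 - w) * missing_model.predict_proba(x)[:, 1])

# Impute each missing outcome from a donor among its 5 nearest observed
# neighbours in score space.
y_imp = y.copy()
obs_idx = np.flatnonzero(obs)
for i in np.flatnonzero(miss):
    nearest = obs_idx[np.argsort(np.abs(score[obs_idx] - score[i]))[:5]]
    y_imp[i] = y[rng.choice(nearest)]
```

    Repeating the donor draw several times would give the multiple imputations over which estimates and their variances are combined.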

  2. Discovering variable fractional orders of advection-dispersion equations from field data using multi-fidelity Bayesian optimization

    NASA Astrophysics Data System (ADS)

    Pang, Guofei; Perdikaris, Paris; Cai, Wei; Karniadakis, George Em

    2017-11-01

    The fractional advection-dispersion equation (FADE) can describe accurately the solute transport in groundwater but its fractional order has to be determined a priori. Here, we employ multi-fidelity Bayesian optimization to obtain the fractional order under various conditions, and we obtain more accurate results compared to previously published data. Moreover, the present method is very efficient as we use different levels of resolution to construct a stochastic surrogate model and quantify its uncertainty. We consider two different problem setups. In the first setup, we obtain variable fractional orders of the one-dimensional FADE, considering both synthetic and field data. In the second setup, we identify constant fractional orders of the two-dimensional FADE using synthetic data. We employ multi-resolution simulations using two-level and three-level Gaussian process regression models to construct the surrogates.
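    A single-fidelity sketch of the surrogate ingredient: Gaussian process regression over candidate fractional orders of a stand-in 1D objective (not the FADE misfit), whose posterior mean and uncertainty could drive an acquisition step in a full Bayesian-optimization loop. The paper's multi-level, multi-fidelity construction is not reproduced here.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def misfit(alpha):
    # Stand-in objective with its minimum at alpha = 1.4 (illustrative only).
    return (alpha - 1.4) ** 2

X_train = np.linspace(1.0, 2.0, 8).reshape(-1, 1)   # candidate fractional orders
y_train = misfit(X_train).ravel()

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.3), normalize_y=True)
gp.fit(X_train, y_train)

# Posterior mean and std over a dense grid; the std quantifies surrogate
# uncertainty and would feed an acquisition function.
X_test = np.linspace(1.0, 2.0, 201).reshape(-1, 1)
mean, std = gp.predict(X_test, return_std=True)
alpha_best = X_test[np.argmin(mean), 0]
```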

  3. The Design of Hand Gestures for Human-Computer Interaction: Lessons from Sign Language Interpreters.

    PubMed

    Rempel, David; Camilleri, Matt J; Lee, David L

    2015-10-01

    The design and selection of 3D modeled hand gestures for human-computer interaction should follow principles of natural language combined with the need to optimize gesture contrast and recognition. The selection should also consider the discomfort and fatigue associated with distinct hand postures and motions, especially for common commands. Sign language interpreters have extensive and unique experience forming hand gestures and many suffer from hand pain while gesturing. Professional sign language interpreters (N=24) rated discomfort for hand gestures associated with 47 characters and words and 33 hand postures. Clear associations of discomfort with hand postures were identified. In a nominal logistic regression model, high discomfort was associated with gestures requiring a flexed wrist, discordant adjacent fingers, or extended fingers. These and other findings should be considered in the design of hand gestures to optimize the relationship between human cognitive and physical processes and computer gesture recognition systems for human-computer input.

  4. A structural regression model for relationship between indoor air quality with dissatisfaction of occupants in education environment

    NASA Astrophysics Data System (ADS)

    Hosseini, Hamid Reza; Yunos, Mohd Yazid Mohd; Ismail, Sumarni; Yaman, Maheran

    2017-12-01

    This paper analyzes the effects of indoor air elements on the dissatisfaction of occupants in education environments. It seeks an equation model that improves understanding of these effects and optimizes occupant satisfaction with the indoor environment, and thereby the performance of students, lecturers and staff. As the method, a satisfaction questionnaire (SQ) and measurement of environment elements (MEE) were conducted with 143 respondents across five classrooms, four staff rooms and five lecture rooms. Temperature, air velocity and humidity (TVH) were used as independent variables and dissatisfaction as the dependent variable. The hypothesis of a significant relationship between the variables was tested and the analysis applied. The results show that indoor air quality has direct effects on occupant dissatisfaction and indirect effects on performance, with temperature showing the strongest effect. These results may help to optimize efficiency and effectiveness in education environments.

  5. Response surface analysis and modeling of n-alkanes removal through bioremediation of weathered crude oil.

    PubMed

    Mohajeri, Leila; Abdul Aziz, Hamidi; Ali Zahed, Mohammad; Mohajeri, Soraya; Mohamed Kutty, Shamsul Rahman; Hasnain Isa, Mohamed

    2011-01-01

    Central composite design (CCD) and response surface methodology (RSM) were employed to optimize four important variables, i.e. amounts of oil, bacterial inoculum, nitrogen and phosphorus, for the removal of selected n-alkanes during bioremediation of weathered crude oil in coastal sediments using laboratory bioreactors over a 60-day experimental period. The reactors contained 1 kg of soil with different oil, microorganism and nutrient concentrations. The F value of 26.89 and the probability value (P < 0.0001) demonstrated the significance of the regression model. For crude oil concentrations of 2, 16 and 30 g per kg of sediment under optimized conditions, n-alkane removal was 97.38, 93.14 and 90.21%, respectively. Natural attenuation removed 30.07, 25.92 and 23.09% of n-alkanes from 2, 16 and 30 g oil/kg sediment, respectively. Excessive nutrient addition was found to inhibit bioremediation.

  6. A Pareto-optimal moving average multigene genetic programming model for daily streamflow prediction

    NASA Astrophysics Data System (ADS)

    Danandeh Mehr, Ali; Kahya, Ercan

    2017-06-01

    Genetic programming (GP) is able to systematically explore alternative model structures of different accuracy and complexity from observed input and output data. The effectiveness of GP in hydrological system identification has been recognized in recent studies. However, selecting a parsimonious (accurate and simple) model from such alternatives still remains a question. This paper proposes a Pareto-optimal moving average multigene genetic programming (MA-MGGP) approach to develop a parsimonious model for single-station streamflow prediction. The three main components of the approach that take us from observed data to a validated model are: (1) data pre-processing, (2) system identification and (3) system simplification. The data pre-processing ingredient uses a simple moving average filter to diminish the lagged prediction effect of stand-alone data-driven models. The multigene ingredient of the model tends to identify the underlying nonlinear system with expressions simpler than classical monolithic GP, and eventually the simplification component exploits a Pareto front plot to select a parsimonious model through an interactive complexity-efficiency trade-off. The approach was tested using the daily streamflow records from a station on Senoz Stream, Turkey. Compared with stand-alone GP, MGGP, and conventional multiple linear regression prediction models as benchmarks, the proposed Pareto-optimal MA-MGGP model puts forward a parsimonious solution of notable practical value. In addition, the approach allows the user to bring human insight into the problem to examine evolved models and pick the best-performing programs for further analysis.
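    The pre-processing ingredient is just a moving-average filter applied to the streamflow series before lagged predictors are built; a minimal version is shown below (the multigene GP itself requires a dedicated toolbox and is not shown).

```python
import numpy as np

def moving_average(series, window=3):
    """Smooth a streamflow series with a simple moving-average filter."""
    kernel = np.ones(window) / window
    return np.convolve(series, kernel, mode="valid")

flow = np.array([10.0, 12.0, 15.0, 40.0, 35.0, 20.0, 14.0, 11.0])
smooth = moving_average(flow, window=3)   # each value is a 3-day mean
```

    Smoothing dampens the sharp peaks that otherwise make a data-driven model fall back on simply repeating the previous day's flow, which is the "lagged prediction effect" the paper refers to.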

  7. TH-E-BRF-01: Exploiting Tumor Shrinkage in Split-Course Radiotherapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Unkelbach, J; Craft, D; Hong, T

    2014-06-15

    Purpose: In split-course radiotherapy, a patient is treated in several stages separated by weeks or months. This regimen has been motivated by radiobiological considerations. However, using modern image-guidance, it also provides an approach to reduce normal tissue dose by exploiting tumor shrinkage. In this work, we consider the optimal design of split-course treatments, motivated by the clinical management of large liver tumors for which normal liver dose constraints prohibit the administration of an ablative radiation dose in a single treatment. Methods: We introduce a dynamic tumor model that incorporates three factors: radiation-induced cell kill, tumor shrinkage, and tumor cell repopulation. The design of split-course radiotherapy is formulated as a mathematical optimization problem in which the total dose to the liver is minimized, subject to delivering the prescribed dose to the tumor. Based on the model, we gain insight into the optimal administration of radiation over time, i.e. the optimal treatment gaps and dose levels. Results: We analyze treatments consisting of two stages in detail. The analysis confirms the intuition that the second stage should be delivered just before the tumor size reaches a minimum and repopulation overcompensates shrinking. Furthermore, it was found that, for a large range of model parameters, approximately one third of the dose should be delivered in the first stage. The projected benefit of split-course treatments in terms of liver sparing depends on model assumptions. However, the model predicts large liver dose reductions by more than a factor of two for plausible model parameters. Conclusion: The analysis of the tumor model suggests that substantial reduction in normal tissue dose can be achieved by exploiting tumor shrinkage via an optimal design of multi-stage treatments. 
This suggests taking a fresh look at split-course radiotherapy for selected disease sites where substantial tumor regression translates into reduced target volumes.
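    A toy version of the three model ingredients (radiation-induced cell kill, tumor shrinkage and repopulation) illustrates why splitting the course can spare normal liver; all parameter values are illustrative assumptions, not the paper's fitted model.

```python
import numpy as np

ALPHA = 0.3    # cell kill per Gy (illustrative)
LAM = 0.02     # repopulation rate per day (illustrative)
SHRINK = 0.05  # daily relaxation of volume toward the surviving fraction

def two_stage(d1, d2, gap_days):
    """Return (surviving cell fraction, normal-liver dose proxy)."""
    cells, volume = 1.0, 1.0
    cells *= np.exp(-ALPHA * d1)              # stage-1 cell kill
    for _ in range(gap_days):                 # between the two stages
        cells *= np.exp(LAM)                  # repopulation
        volume += SHRINK * (cells - volume)   # volume lags the cell number
    cells *= np.exp(-ALPHA * d2)              # stage-2 cell kill
    # Liver-dose proxy: each stage's dose weighted by tumor volume at delivery.
    liver_dose = d1 * 1.0 + d2 * volume
    return cells, liver_dose

# Same total tumor cell kill, but stage 2 hits a much smaller volume:
cells_single, liver_single = two_stage(60.0, 0.0, 0)
cells_split, liver_split = two_stage(20.0, 44.0, 60)
```

    With these illustrative numbers the split schedule matches the single-course cell kill while the liver-dose proxy drops by more than half, and roughly one third of the total dose lands in the first stage, in line with the abstract's observation.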

  8. Modeling and optimization of ultrasound-assisted extraction of polyphenolic compounds from Aronia melanocarpa by-products from filter-tea factory.

    PubMed

    Ramić, Milica; Vidović, Senka; Zeković, Zoran; Vladić, Jelena; Cvejin, Aleksandra; Pavlić, Branimir

    2015-03-01

    Aronia melanocarpa by-products from a filter-tea factory were used for the preparation of extracts with a high content of bioactive compounds. The extraction process was accelerated by sonication. A three-level, three-variable face-centered cubic experimental design (FCD) with response surface methodology (RSM) was used to optimize the extraction for maximized yields of total phenolics (TP), flavonoids (TF), anthocyanins (MA) and proanthocyanidins (TPA). Ultrasonic power (X₁: 72-216 W), temperature (X₂: 30-70 °C) and extraction time (X₃: 30-90 min) were investigated as independent variables. Experimental results were fitted to a second-order polynomial model, and multiple regression analysis and analysis of variance were used to determine the fitness of the model and the optimal conditions for the investigated responses. Three-dimensional surface plots were generated from the mathematical models. The optimal conditions for ultrasound-assisted extraction of TP, TF, MA and TPA were: X₁=206.64 W, X₂=70 °C, X₃=80.1 min; X₁=210.24 W, X₂=70 °C, X₃=75 min; X₁=216 W, X₂=70 °C, X₃=45.6 min; and X₁=199.44 W, X₂=70 °C, X₃=89.7 min, respectively. The model predicted values of TP, TF, MA and TPA of 15.41 mg GAE/ml, 9.86 mg CE/ml, 2.26 mg C3G/ml and 20.67 mg CE/ml, respectively. Experimental validation was performed and close agreement between experimental and predicted values was found (within the 95% confidence interval).

  9. Optimal design of minimum mean-square error noise reduction algorithms using the simulated annealing technique.

    PubMed

    Bai, Mingsian R; Hsieh, Ping-Ju; Hur, Kur-Nan

    2009-02-01

    The performance of the minimum mean-square error noise reduction (MMSE-NR) algorithm in conjunction with time-recursive averaging (TRA) for noise estimation is found to be very sensitive to the choice of two recursion parameters. To address this problem in a more systematic manner, this paper proposes an optimization method to efficiently search for the optimal parameters of the MMSE-TRA-NR algorithms. The objective function is based on a regression model, whereas the optimization process is carried out with the simulated annealing algorithm, which is well suited to problems with many local optima. Another NR algorithm proposed in the paper employs linear prediction coding as a preprocessor for extracting the correlated portion of human speech. Objective and subjective tests were undertaken to compare the optimized MMSE-TRA-NR algorithm with several conventional NR algorithms. The results of the subjective tests were processed using analysis of variance to assess statistical significance. A post hoc test, Tukey's Honestly Significant Difference, was conducted to further assess the pairwise differences between the NR algorithms.
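The optimization idea (simulated annealing over two recursion parameters, with a regression-model objective) can be sketched generically; the cost surface below is a made-up surrogate with several local minima, standing in for the paper's regression objective, and the parameter bounds are hypothetical.

```python
# Generic simulated annealing over two bounded parameters (a, b) in [0, 1].
# The cost surface is a hypothetical surrogate with many local minima --
# a stand-in for the regression-based objective of MMSE-TRA-NR tuning.
import math
import random

random.seed(0)

def cost(a, b):
    # Smooth bowl centred at (0.8, 0.98) plus oscillations adding local minima
    return (a - 0.8)**2 + (b - 0.98)**2 + 0.05 * math.cos(20 * a) * math.cos(20 * b)

def anneal(steps=20000, t0=1.0):
    x = [random.random(), random.random()]
    f = cost(*x)
    best, f_best = list(x), f
    for k in range(steps):
        t = t0 * (1 - k / steps) + 1e-6  # linear cooling schedule
        cand = [min(1.0, max(0.0, v + random.gauss(0, 0.1 * t))) for v in x]
        f_cand = cost(*cand)
        # Accept improvements always; accept uphill moves with Boltzmann probability
        if f_cand < f or random.random() < math.exp((f - f_cand) / t):
            x, f = cand, f_cand
            if f < f_best:
                best, f_best = list(x), f
    return best, f_best

params, f_min = anneal()
```

The high early temperature lets the search escape the oscillation-induced local minima; as it cools, the accepted moves concentrate around the global basin.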

  10. Response surface optimization of medium components for naringinase production from Staphylococcus xylosus MAK2.

    PubMed

    Puri, Munish; Kaur, Aneet; Singh, Ram Sarup; Singh, Anubhav

    2010-09-01

    Response surface methodology was used to optimize the fermentation medium for enhancing naringinase production by Staphylococcus xylosus. The first step of this process involved the individual adjustment and optimization of various medium components at shake flask level. Sources of carbon (sucrose) and nitrogen (sodium nitrate), as well as an inducer (naringin) and pH levels, were all found to be important factors significantly affecting naringinase production. In the second step, a 2² full factorial central composite design was applied to determine the optimal levels of each of the significant variables. A second-order polynomial was derived by multiple regression analysis on the experimental data. Using this methodology, the optimum values for the critical components were obtained as follows: sucrose, 10.0%; sodium nitrate, 10.0%; pH 5.6; biomass concentration, 1.58%; and naringin, 0.50% (w/v). Under optimal conditions, the experimental naringinase production was 8.45 U/mL. The determination coefficients (R²) were 0.9908 and 0.9950 for naringinase activity and biomass production, respectively, indicating an adequate degree of reliability in the model.
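A 2² full factorial central composite design amounts to a small set of coded design points: factorial corners, axial (star) points, and replicated centre runs. A minimal sketch follows, with the axial distance and number of centre replicates chosen for illustration rather than taken from the paper.

```python
# Coded design points for a two-factor central composite design (CCD):
# 2^2 factorial corners, 4 axial (star) points, replicated centre runs.
# alpha ~= sqrt(2) makes the design rotatable; the coded levels are later
# mapped onto the actual medium-component ranges.
from itertools import product

def central_composite(alpha=1.414, n_center=5):
    corners = [list(p) for p in product([-1.0, 1.0], repeat=2)]
    axial = [[alpha, 0.0], [-alpha, 0.0], [0.0, alpha], [0.0, -alpha]]
    center = [[0.0, 0.0] for _ in range(n_center)]
    return corners + axial + center

design = central_composite()  # 4 + 4 + 5 = 13 runs
```

Each coded row is then translated into actual component levels, the runs are performed, and the second-order polynomial is fitted to the observed responses.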

  11. Seasonal mean pressure reconstruction for the North Atlantic (1750-1850) based on early marine data

    NASA Astrophysics Data System (ADS)

    Gallego, D.; Garcia-Herrera, R.; Ribera, P.; Jones, P. D.

    2005-12-01

    Measurements of wind strength and direction abstracted from European ships' logbooks during the recently finished CLIWOC project have been used to produce the first gridded Sea Level Pressure (SLP) reconstruction for the 1750-1850 period over the North Atlantic based solely on marine data. The reconstruction is based on a spatial regression analysis calibrated using data taken from the ICOADS database. An objective methodology has been developed to select the optimal calibration period and spatial domain of the reconstruction by testing several thousand possible models. The final selected area, limited by the performance of the regression equations and by the availability of data, covers the region between 28° N and 52° N close to the European coast and between 28° N and 44° N in the open ocean. The results provide a direct measure of the strength and extension of the Azores High during the 101 years of the study period. The comparison with the recent land-based SLP reconstruction by Luterbacher et al. (2002) indicates the presence of a common signal. The interannual variability of the CLIWOC reconstructions is rather high due to the current scarcity of abstracted wind data in the areas with the best response in the regression. Guidelines are proposed to optimize the efficiency of future abstraction work.

  13. Factors associated with developing a fear of falling in subjects with primary open-angle glaucoma.

    PubMed

    Adachi, Sayaka; Yuki, Kenya; Awano-Tanabe, Sachiko; Ono, Takeshi; Shiba, Daisuke; Murata, Hiroshi; Asaoka, Ryo; Tsubota, Kazuo

    2018-02-13

    To investigate the relationship between clinical risk factors, including visual field (VF) defects and visual acuity, and a fear of falling, among patients with primary open-angle glaucoma (POAG). All participants answered the following question at a baseline ophthalmic examination: Are you afraid of falling? The same question was then answered every 12 months for 3 years. A binocular integrated visual field was calculated by merging a patient's monocular Humphrey field analyzer VFs, using the 'best sensitivity' method. The means of total deviation values in the whole, superior peripheral, superior central, inferior central, and inferior peripheral VFs were calculated. The relationship between these mean VF measurements, and various clinical factors, against patients' baseline fear of falling and future fear of falling was analyzed using multiple logistic regression. Among 392 POAG subjects, 342 patients (87.2%) responded to the fear of falling question at least twice during the 3-year study period. The optimal regression model for patients' baseline fear of falling included age, gender, mean of total deviation values in the inferior peripheral VF and number of previous falls. The optimal regression equation for future fear of falling included age, gender, mean of total deviation values in the inferior peripheral VF and number of previous falls. Defects in the inferior peripheral VF area are significantly related to the development of a fear of falling.
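The multiple-logistic-regression step can be sketched generically; the predictors below (standardized age, number of prior falls) echo the kind of covariates reported, but the data and coefficients are simulated purely to illustrate fitting and odds-ratio interpretation, not the study's results.

```python
# Minimal multiple logistic regression by gradient ascent on the
# log-likelihood. The cohort is simulated with made-up coefficients.
import math
import random

random.seed(1)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Simulated cohort: true model logit(p) = -0.5 + 0.8*age + 0.9*prior_falls
subjects = [(random.gauss(0, 1), random.randint(0, 3)) for _ in range(400)]
fear = [1 if random.random() < sigmoid(-0.5 + 0.8 * a + 0.9 * f) else 0
        for a, f in subjects]

b = [0.0, 0.0, 0.0]  # intercept, age, prior falls
lr = 0.5
for _ in range(800):
    grad = [0.0, 0.0, 0.0]
    for (a, f), yi in zip(subjects, fear):
        resid = yi - sigmoid(b[0] + b[1] * a + b[2] * f)
        for j, xj in enumerate((1.0, a, f)):
            grad[j] += resid * xj
    b = [bj + lr * gj / len(subjects) for bj, gj in zip(b, grad)]

or_prior_falls = math.exp(b[2])  # odds ratio per additional prior fall
```

The fitted coefficients recover the simulated effects, and exponentiating a coefficient gives the odds ratio reported in studies of this kind.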

  14. Optimism, Cynical Hostility, Falls, and Fractures: The Women's Health Initiative Observational Study (WHI-OS).

    PubMed

    Cauley, Jane A; Smagula, Stephen F; Hovey, Kathleen M; Wactawski-Wende, Jean; Andrews, Christopher A; Crandall, Carolyn J; LeBoff, Meryl S; Li, Wenjun; Coday, Mace; Sattari, Maryam; Tindle, Hilary A

    2017-02-01

    Traits of optimism and cynical hostility are features of personality that could influence the risk of falls and fractures by influencing risk-taking behaviors, health behaviors, or inflammation. To test the hypothesis that personality influences falls and fracture risk, we studied 87,342 women enrolled in WHI-OS. Optimism was assessed by the Life Orientation Test-Revised, and cynical hostility by the cynicism subscale of the Cook-Medley questionnaire. Higher scores indicate greater optimism and hostility. Optimism and hostility were correlated at r = -0.31, p < 0.001. Annual self-report of falling ≥2 times in the past year was modeled using repeated measures logistic regression. Cox proportional hazards models were used for the fracture outcomes. We examined the risk of falls and fractures across the quartiles (Q) of optimism and hostility with tests for trends; Q1 formed the referent group. The average follow-up for fractures was 11.4 years and for falls was 7.6 years. In multivariable (MV)-adjusted models, women with the highest optimism scores (Q4) were 11% less likely to report ≥2 falls in the past year (odds ratio [OR] = 0.89; 95% confidence interval [CI] 0.85-0.90). Women in Q4 for hostility had a 12% higher risk of ≥2 falls (OR = 1.12; 95% CI 1.07-1.17). Higher optimism scores were also associated with a 10% lower risk of fractures, but this association was attenuated in MV models. Women with the greatest hostility (Q4) had a modest increased risk of any fracture (MV-adjusted hazard ratio = 1.05; 95% CI 1.01-1.09), but there was no association with specific fracture sites. In conclusion, optimism was independently associated with a decreased risk of ≥2 falls, and hostility with an increased risk of ≥2 falls, independent of traditional risk factors. The magnitude of the association was similar to aging 5 years. Whether interventions aimed at attitudes could reduce fall risks remains to be determined. © 2016 American Society for Bone and Mineral Research.

  15. Parameterizing sorption isotherms using a hybrid global-local fitting procedure.

    PubMed

    Matott, L Shawn; Singh, Anshuman; Rabideau, Alan J

    2017-05-01

    Predictive modeling of the transport and remediation of groundwater contaminants requires an accurate description of the sorption process, which is usually provided by fitting an isotherm model to site-specific laboratory data. Commonly used calibration procedures, listed in order of increasing sophistication, include: trial-and-error, linearization, non-linear regression, global search, and hybrid global-local search. Given the considerable variability in fitting procedures applied in published isotherm studies, we investigated the importance of algorithm selection through a series of numerical experiments involving 13 previously published sorption datasets. These datasets, considered representative of the state of the art for isotherm experiments, had been previously analyzed using trial-and-error, linearization, or non-linear regression methods. The isotherm expressions were re-fit using a 3-stage hybrid global-local search procedure (i.e. global search using particle swarm optimization followed by Powell's derivative-free local search method and Gauss-Marquardt-Levenberg non-linear regression). The re-fitted expressions were then compared to previously published fits in terms of the optimized weighted sum of squared residuals (WSSR) fitness function, the final estimated parameters, and the influence on contaminant transport predictions - where easily computed concentration-dependent contaminant retardation factors served as a surrogate measure of likely transport behavior. Results suggest that many of the previously published calibrated isotherm parameter sets were local minima. In some cases, the updated hybrid global-local search yielded order-of-magnitude reductions in the fitness function. In particular, of the candidate isotherms, the Polanyi-type models were most likely to benefit from the use of the hybrid fitting procedure.
In some cases, improvements in fitness function were associated with slight (<10%) changes in parameter values, but in other cases significant (>50%) changes in parameter values were noted. Despite these differences, the influence of isotherm misspecification on contaminant transport predictions was quite variable and difficult to predict from inspection of the isotherms. Copyright © 2017 Elsevier B.V. All rights reserved.
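The hybrid global-local strategy can be sketched on a toy Freundlich isotherm, q = Kf·Cⁿ: crude random sampling stands in for the particle swarm stage, and a shrinking-step coordinate search stands in for the Powell and Gauss-Marquardt-Levenberg refinements. The data are synthetic and noise-free, not one of the 13 published datasets.

```python
# Toy hybrid global-local fit of a Freundlich isotherm q = Kf * C**n
# by minimizing the (unweighted) sum of squared residuals, a simple
# stand-in for the WSSR fitness function. Synthetic data: Kf=2.0, n=0.7.
import random

random.seed(2)

C = [0.5, 1.0, 2.0, 5.0, 10.0, 20.0]  # aqueous concentrations
q_obs = [2.0 * c**0.7 for c in C]     # sorbed concentrations

def ssr(p):
    kf, n = p
    return sum((qo - kf * c**n)**2 for c, qo in zip(C, q_obs))

# Stage 1: crude global sampling (stand-in for particle swarm optimization)
best = min(([random.uniform(0.1, 5.0), random.uniform(0.1, 1.5)]
            for _ in range(2000)), key=ssr)

# Stage 2: derivative-free local refinement (stand-in for Powell's method
# and gradient-based polishing): coordinate search with a shrinking step
step = 0.1
for _ in range(100):
    improved = False
    for i in (0, 1):
        for d in (step, -step):
            cand = list(best)
            cand[i] += d
            if ssr(cand) < ssr(best):
                best, improved = cand, True
    if not improved:
        step *= 0.5
```

The global stage lands in the right basin and the local stage polishes the estimate, mirroring why hybrid fits escaped the local minima found in the re-analysis.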

  16. Support vector regression-guided unravelling: antioxidant capacity and quantitative structure-activity relationship predict reduction and promotion effects of flavonoids on acrylamide formation

    PubMed Central

    Huang, Mengmeng; Wei, Yan; Wang, Jun; Zhang, Yu

    2016-01-01

    We used the support vector regression (SVR) approach to predict and unravel the reduction/promotion effects of characteristic flavonoids on acrylamide formation under a low-moisture Maillard reaction system. Results demonstrated the reduction/promotion effects by flavonoids at addition levels of 1–10000 μmol/L. The maximal inhibition rates (51.7%, 68.8% and 26.1%) and promotion rates (57.7%, 178.8% and 27.5%) caused by flavones, flavonols and isoflavones were observed at addition levels of 100 μmol/L and 10000 μmol/L, respectively. The reduction/promotion effects were closely related to the change of trolox equivalent antioxidant capacity (ΔTEAC) and well predicted by triple ΔTEAC measurements via SVR models (R: 0.633–0.900). Flavonols exhibit stronger effects on acrylamide formation than flavones and isoflavones as well as their O-glycoside derivatives, which may be attributed to the number and position of phenolic and 3-enolic hydroxyls. The reduction/promotion effects were well predicted by using optimized quantitative structure-activity relationship (QSAR) descriptors and SVR models (R: 0.926–0.994). Compared to artificial neural network and multi-linear regression models, SVR models exhibited better fitting performance for both TEAC-dependent and QSAR descriptor-dependent prediction tasks. These observations demonstrate that SVR models are competent predictive tools and may inform the future use of natural antioxidants for decreasing acrylamide formation. PMID:27586851
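The SVR step can be illustrated with a minimal stand-in: a full SVR fits a kernel model by solving a quadratic program, but its defining ε-insensitive loss can be shown with a plain linear fit trained by subgradient descent. The data below are synthetic (y = 2x + 1), not the flavonoid/ΔTEAC measurements.

```python
# Epsilon-insensitive ("SVR-style") fit of a linear toy model by
# subgradient descent: points inside the epsilon tube contribute no loss
# and trigger no update. A real SVR adds kernels and a QP solver.
data = [(x / 10.0, 2 * (x / 10.0) + 1.0) for x in range(-20, 21)]

w, b = 0.0, 0.0
lr, eps = 0.01, 0.1
for _ in range(500):
    for x, y in data:
        r = y - (w * x + b)
        if abs(r) > eps:              # only points outside the tube update
            s = 1.0 if r > 0 else -1.0
            w += lr * s * x
            b += lr * s
```

Once every residual falls inside the ε tube the parameters stop moving, which is the mechanism that gives SVR its sparse, robust fits.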

  18. Non-metallic coating thickness prediction using artificial neural network and support vector machine with time resolved thermography

    NASA Astrophysics Data System (ADS)

    Wang, Hongjin; Hsieh, Sheng-Jen; Peng, Bo; Zhou, Xunfei

    2016-07-01

    A method that does not require knowledge of the thermal properties of either the coating or the substrate would be of interest in industrial applications. Supervised machine learning regression may provide a possible solution to the problem. This paper compares the performance of two regression models (artificial neural networks (ANN) and support vector machines for regression (SVM)) with respect to coating thickness estimates based on surface temperature increments collected via time-resolved thermography. We describe the role of SVM in coating thickness prediction. Non-dimensional analyses are conducted to illustrate the effects of coating thickness and various other factors on surface temperature increments; it is theoretically possible to correlate coating thickness with the surface temperature increment. Based on these analyses, the laser power is selected such that, during heating, the temperature increment is high enough to resolve variations in coating thickness but low enough to avoid surface melting. Sixty-one paint-coated samples with coating thicknesses varying from 63.5 μm to 571 μm are used to train the models. Hyper-parameters of the models are optimized by 10-fold cross-validation. Another 28 data sets are then collected to test the performance of the models. The study shows that SVM can provide reliable predictions of unknown data, due to its deterministic characteristics, and that it works well with small training sets. The SVM model generates more accurate coating thickness estimates than the ANN model.
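The hyper-parameter tuning relies on 10-fold cross-validation; the index bookkeeping behind such a split can be sketched as follows (61 samples to match the training set size, but the splitting logic itself is generic).

```python
# Index splits for k-fold cross-validation: each sample appears in
# exactly one test fold, and fold sizes differ by at most one.
def k_fold_indices(n, k=10):
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, test))
        start += size
    return splits

splits = k_fold_indices(61, 10)  # 61 coated samples, 10 folds
```

Each candidate hyper-parameter setting is trained on the 9 training folds and scored on the held-out fold; the setting with the best average score is kept.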

  19. Perceived Parenting Styles on College Students' Optimism

    ERIC Educational Resources Information Center

    Baldwin, Debora R.; McIntyre, Anne; Hardaway, Elizabeth

    2007-01-01

    The purpose of this study was to examine the relationship between perceived parenting styles and levels of optimism in undergraduate college students. Sixty-three participants were administered surveys measuring dispositional optimism and perceived parental Authoritative and Authoritarian styles. Multiple regression analysis revealed that both…

  20. Identification of Alfalfa Leaf Diseases Using Image Recognition Technology

    PubMed Central

    Qin, Feng; Liu, Dongxia; Sun, Bingda; Ruan, Liu; Ma, Zhanhong; Wang, Haiguang

    2016-01-01

    Common leaf spot (caused by Pseudopeziza medicaginis), rust (caused by Uromyces striatus), Leptosphaerulina leaf spot (caused by Leptosphaerulina briosiana) and Cercospora leaf spot (caused by Cercospora medicaginis) are the four common types of alfalfa leaf diseases. Timely and accurate diagnoses of these diseases are critical for disease management, alfalfa quality control and the healthy development of the alfalfa industry. In this study, the identification and diagnosis of the four types of alfalfa leaf diseases were investigated using pattern recognition algorithms based on image-processing technology. A sub-image with one or multiple typical lesions was obtained by artificial cutting from each acquired digital disease image. Then the sub-images were segmented using twelve lesion segmentation methods integrated with clustering algorithms (including K_means clustering, fuzzy C-means clustering and K_median clustering) and supervised classification algorithms (including logistic regression analysis, Naive Bayes algorithm, classification and regression tree, and linear discriminant analysis). After a comprehensive comparison, the segmentation method integrating the K_median clustering algorithm and linear discriminant analysis was chosen to obtain lesion images. After the lesion segmentation using this method, a total of 129 texture, color and shape features were extracted from the lesion images. Based on the features selected using three methods (ReliefF, 1R and correlation-based feature selection), disease recognition models were built using three supervised learning methods, including the random forest, support vector machine (SVM) and K-nearest neighbor methods. A comparison of the recognition results of the models was conducted. The results showed that when the ReliefF method was used for feature selection, the SVM model built with the most important 45 features (selected from a total of 129 features) was the optimal model. 
For this SVM model, the recognition accuracies of the training set and the testing set were 97.64% and 94.74%, respectively. Semi-supervised models for disease recognition were built based on the 45 effective features that were used for building the optimal SVM model. For the optimal semi-supervised models built with three ratios of labeled to unlabeled samples in the training set, the recognition accuracies of the training set and the testing set were both approximately 80%. The results indicated that image recognition of the four alfalfa leaf diseases can be implemented with high accuracy. This study provides a feasible solution for lesion image segmentation and image recognition of alfalfa leaf disease. PMID:27977767
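The lesion-segmentation step built on K_median clustering can be sketched in one dimension: cluster pixel intensities into a dark (lesion-like) and a bright (healthy-leaf-like) group by k-medians. The intensity values below are synthetic, not taken from the alfalfa images.

```python
# Toy 1-D k-medians clustering of pixel intensities: a dark mode (~60)
# stands in for lesion pixels, a bright mode (~200) for healthy leaf.
import random

random.seed(3)

pixels = ([random.gauss(60, 10) for _ in range(200)] +
          [random.gauss(200, 15) for _ in range(600)])

def k_medians(data, k=2, iters=50):
    centers = random.sample(data, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda i: abs(x - centers[i]))
            clusters[nearest].append(x)
        # Update each center to its cluster median (robust to outliers)
        centers = [sorted(c)[len(c) // 2] if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

low, high = k_medians(pixels)
```

The recovered centers sit near the two intensity modes; in the study the same idea runs on image color features, combined with linear discriminant analysis, to delineate lesion pixels.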
