Sample records for variability methods modeling

  1. Balancing precision and risk: should multiple detection methods be analyzed separately in N-mixture models?

    USGS Publications Warehouse

    Graves, Tabitha A.; Royle, J. Andrew; Kendall, Katherine C.; Beier, Paul; Stetz, Jeffrey B.; Macleod, Amy C.

    2012-01-01

    Using multiple detection methods can increase the number, kind, and distribution of individuals sampled, which may increase accuracy and precision and reduce cost of population abundance estimates. However, when variables influencing abundance are of interest, if individuals detected via different methods are influenced by the landscape differently, separate analysis of multiple detection methods may be more appropriate. We evaluated the effects of combining two detection methods on the identification of variables important to local abundance using detections of grizzly bears with hair traps (systematic) and bear rubs (opportunistic). We used hierarchical abundance models (N-mixture models) with separate model components for each detection method. If both methods sample the same population, the use of either data set alone should (1) lead to the selection of the same variables as important and (2) provide similar estimates of relative local abundance. We hypothesized that the inclusion of 2 detection methods versus either method alone should (3) yield more support for variables identified in single method analyses (i.e. fewer variables and models with greater weight), and (4) improve precision of covariate estimates for variables selected in both separate and combined analyses because sample size is larger. As expected, joint analysis of both methods increased precision as well as certainty in variable and model selection. However, the single-method analyses identified different variables and the resulting predicted abundances had different spatial distributions. We recommend comparing single-method and jointly modeled results to identify the presence of individual heterogeneity between detection methods in N-mixture models, along with consideration of detection probabilities, correlations among variables, and tolerance to risk of failing to identify variables important to a subset of the population. The benefits of increased precision should be weighed against those risks. The analysis framework presented here will be useful for other species exhibiting heterogeneity by detection method.

  2. An Efficient Variable Screening Method for Effective Surrogate Models for Reliability-Based Design Optimization

    DTIC Science & Technology

    2014-04-01

    surrogate model generation is difficult for high -dimensional problems, due to the curse of dimensionality. Variable screening methods have been...a variable screening model was developed for the quasi-molecular treatment of ion-atom collision [16]. In engineering, a confidence interval of...for high -level radioactive waste [18]. Moreover, the design sensitivity method can be extended to the variable screening method because vital

  3. A Time-Series Water Level Forecasting Model Based on Imputation and Variable Selection Method.

    PubMed

    Yang, Jun-He; Cheng, Ching-Hsue; Chan, Chia-Pan

    2017-01-01

    Reservoirs are important for households and impact the national economy. This paper proposed a time-series forecasting model based on estimating a missing value followed by variable selection to forecast the reservoir's water level. This study collected data from the Taiwan Shimen Reservoir as well as daily atmospheric data from 2008 to 2015. The two datasets are concatenated into an integrated dataset based on ordering of the data as a research dataset. The proposed time-series forecasting model summarily has three foci. First, this study uses five imputation methods to directly delete the missing value. Second, we identified the key variable via factor analysis and then deleted the unimportant variables sequentially via the variable selection method. Finally, the proposed model uses a Random Forest to build the forecasting model of the reservoir's water level. This was done to compare with the listing method under the forecasting error. These experimental results indicate that the Random Forest forecasting model when applied to variable selection with full variables has better forecasting performance than the listing model. In addition, this experiment shows that the proposed variable selection can help determine five forecast methods used here to improve the forecasting capability.

  4. [Variable selection methods combined with local linear embedding theory used for optimization of near infrared spectral quantitative models].

    PubMed

    Hao, Yong; Sun, Xu-Dong; Yang, Qiang

    2012-12-01

    Variables selection strategy combined with local linear embedding (LLE) was introduced for the analysis of complex samples by using near infrared spectroscopy (NIRS). Three methods include Monte Carlo uninformation variable elimination (MCUVE), successive projections algorithm (SPA) and MCUVE connected with SPA were used for eliminating redundancy spectral variables. Partial least squares regression (PLSR) and LLE-PLSR were used for modeling complex samples. The results shown that MCUVE can both extract effective informative variables and improve the precision of models. Compared with PLSR models, LLE-PLSR models can achieve more accurate analysis results. MCUVE combined with LLE-PLSR is an effective modeling method for NIRS quantitative analysis.

  5. Variable Density Effects in Stochastic Lagrangian Models for Turbulent Combustion

    DTIC Science & Technology

    2016-07-20

    PDF methods in dealing with chemical reaction and convection are preserved irrespective of density variation. Since the density variation in a typical...combustion process may be as large as factor of seven, including variable- density effects in PDF methods is of significance. Conventionally, the...strategy of modelling variable density flows in PDF methods is similar to that used for second-moment closure models (SMCM): models are developed based on

  6. Computational Fluid Dynamics Modeling of a Supersonic Nozzle and Integration into a Variable Cycle Engine Model

    NASA Technical Reports Server (NTRS)

    Connolly, Joseph W.; Friedlander, David; Kopasakis, George

    2015-01-01

    This paper covers the development of an integrated nonlinear dynamic simulation for a variable cycle turbofan engine and nozzle that can be integrated with an overall vehicle Aero-Propulso-Servo-Elastic (APSE) model. A previously developed variable cycle turbofan engine model is used for this study and is enhanced here to include variable guide vanes allowing for operation across the supersonic flight regime. The primary focus of this study is to improve the fidelity of the model's thrust response by replacing the simple choked flow equation convergent-divergent nozzle model with a MacCormack method based quasi-1D model. The dynamic response of the nozzle model using the MacCormack method is verified by comparing it against a model of the nozzle using the conservation element/solution element method. A methodology is also presented for the integration of the MacCormack nozzle model with the variable cycle engine.

  7. Computational Fluid Dynamics Modeling of a Supersonic Nozzle and Integration into a Variable Cycle Engine Model

    NASA Technical Reports Server (NTRS)

    Connolly, Joseph W.; Friedlander, David; Kopasakis, George

    2014-01-01

    This paper covers the development of an integrated nonlinear dynamic simulation for a variable cycle turbofan engine and nozzle that can be integrated with an overall vehicle Aero-Propulso-Servo-Elastic (APSE) model. A previously developed variable cycle turbofan engine model is used for this study and is enhanced here to include variable guide vanes allowing for operation across the supersonic flight regime. The primary focus of this study is to improve the fidelity of the model's thrust response by replacing the simple choked flow equation convergent-divergent nozzle model with a MacCormack method based quasi-1D model. The dynamic response of the nozzle model using the MacCormack method is verified by comparing it against a model of the nozzle using the conservation element/solution element method. A methodology is also presented for the integration of the MacCormack nozzle model with the variable cycle engine.

  8. Identification of solid state fermentation degree with FT-NIR spectroscopy: Comparison of wavelength variable selection methods of CARS and SCARS.

    PubMed

    Jiang, Hui; Zhang, Hang; Chen, Quansheng; Mei, Congli; Liu, Guohai

    2015-01-01

    The use of wavelength variable selection before partial least squares discriminant analysis (PLS-DA) for qualitative identification of solid state fermentation degree by FT-NIR spectroscopy technique was investigated in this study. Two wavelength variable selection methods including competitive adaptive reweighted sampling (CARS) and stability competitive adaptive reweighted sampling (SCARS) were employed to select the important wavelengths. PLS-DA was applied to calibrate identified model using selected wavelength variables by CARS and SCARS for identification of solid state fermentation degree. Experimental results showed that the number of selected wavelength variables by CARS and SCARS were 58 and 47, respectively, from the 1557 original wavelength variables. Compared with the results of full-spectrum PLS-DA, the two wavelength variable selection methods both could enhance the performance of identified models. Meanwhile, compared with CARS-PLS-DA model, the SCARS-PLS-DA model achieved better results with the identification rate of 91.43% in the validation process. The overall results sufficiently demonstrate the PLS-DA model constructed using selected wavelength variables by a proper wavelength variable method can be more accurate identification of solid state fermentation degree. Copyright © 2015 Elsevier B.V. All rights reserved.

  9. Identification of solid state fermentation degree with FT-NIR spectroscopy: Comparison of wavelength variable selection methods of CARS and SCARS

    NASA Astrophysics Data System (ADS)

    Jiang, Hui; Zhang, Hang; Chen, Quansheng; Mei, Congli; Liu, Guohai

    2015-10-01

    The use of wavelength variable selection before partial least squares discriminant analysis (PLS-DA) for qualitative identification of solid state fermentation degree by FT-NIR spectroscopy technique was investigated in this study. Two wavelength variable selection methods including competitive adaptive reweighted sampling (CARS) and stability competitive adaptive reweighted sampling (SCARS) were employed to select the important wavelengths. PLS-DA was applied to calibrate identified model using selected wavelength variables by CARS and SCARS for identification of solid state fermentation degree. Experimental results showed that the number of selected wavelength variables by CARS and SCARS were 58 and 47, respectively, from the 1557 original wavelength variables. Compared with the results of full-spectrum PLS-DA, the two wavelength variable selection methods both could enhance the performance of identified models. Meanwhile, compared with CARS-PLS-DA model, the SCARS-PLS-DA model achieved better results with the identification rate of 91.43% in the validation process. The overall results sufficiently demonstrate the PLS-DA model constructed using selected wavelength variables by a proper wavelength variable method can be more accurate identification of solid state fermentation degree.

  10. Robust check loss-based variable selection of high-dimensional single-index varying-coefficient model

    NASA Astrophysics Data System (ADS)

    Song, Yunquan; Lin, Lu; Jian, Ling

    2016-07-01

    Single-index varying-coefficient model is an important mathematical modeling method to model nonlinear phenomena in science and engineering. In this paper, we develop a variable selection method for high-dimensional single-index varying-coefficient models using a shrinkage idea. The proposed procedure can simultaneously select significant nonparametric components and parametric components. Under defined regularity conditions, with appropriate selection of tuning parameters, the consistency of the variable selection procedure and the oracle property of the estimators are established. Moreover, due to the robustness of the check loss function to outliers in the finite samples, our proposed variable selection method is more robust than the ones based on the least squares criterion. Finally, the method is illustrated with numerical simulations.

  11. Two-Point Turbulence Closure Applied to Variable Resolution Modeling

    NASA Technical Reports Server (NTRS)

    Girimaji, Sharath S.; Rubinstein, Robert

    2011-01-01

    Variable resolution methods have become frontline CFD tools, but in order to take full advantage of this promising new technology, more formal theoretical development is desirable. Two general classes of variable resolution methods can be identified: hybrid or zonal methods in which RANS and LES models are solved in different flow regions, and bridging or seamless models which interpolate smoothly between RANS and LES. This paper considers the formulation of bridging methods using methods of two-point closure theory. The fundamental problem is to derive a subgrid two-equation model. We compare and reconcile two different approaches to this goal: the Partially Integrated Transport Model, and the Partially Averaged Navier-Stokes method.

  12. Assessing the accuracy and stability of variable selection methods for random forest modeling in ecology.

    PubMed

    Fox, Eric W; Hill, Ryan A; Leibowitz, Scott G; Olsen, Anthony R; Thornbrugh, Darren J; Weber, Marc H

    2017-07-01

    Random forest (RF) modeling has emerged as an important statistical learning method in ecology due to its exceptional predictive performance. However, for large and complex ecological data sets, there is limited guidance on variable selection methods for RF modeling. Typically, either a preselected set of predictor variables are used or stepwise procedures are employed which iteratively remove variables according to their importance measures. This paper investigates the application of variable selection methods to RF models for predicting probable biological stream condition. Our motivating data set consists of the good/poor condition of n = 1365 stream survey sites from the 2008/2009 National Rivers and Stream Assessment, and a large set (p = 212) of landscape features from the StreamCat data set as potential predictors. We compare two types of RF models: a full variable set model with all 212 predictors and a reduced variable set model selected using a backward elimination approach. We assess model accuracy using RF's internal out-of-bag estimate, and a cross-validation procedure with validation folds external to the variable selection process. We also assess the stability of the spatial predictions generated by the RF models to changes in the number of predictors and argue that model selection needs to consider both accuracy and stability. The results suggest that RF modeling is robust to the inclusion of many variables of moderate to low importance. We found no substantial improvement in cross-validated accuracy as a result of variable reduction. Moreover, the backward elimination procedure tended to select too few variables and exhibited numerous issues such as upwardly biased out-of-bag accuracy estimates and instabilities in the spatial predictions. We use simulations to further support and generalize results from the analysis of real data. A main purpose of this work is to elucidate issues of model selection bias and instability to ecologists interested in using RF to develop predictive models with large environmental data sets.

  13. Affected States Soft Independent Modeling by Class Analogy from the Relation Between Independent Variables, Number of Independent Variables and Sample Size

    PubMed Central

    Kanık, Emine Arzu; Temel, Gülhan Orekici; Erdoğan, Semra; Kaya, İrem Ersöz

    2013-01-01

    Objective: The aim of study is to introduce method of Soft Independent Modeling of Class Analogy (SIMCA), and to express whether the method is affected from the number of independent variables, the relationship between variables and sample size. Study Design: Simulation study. Material and Methods: SIMCA model is performed in two stages. In order to determine whether the method is influenced by the number of independent variables, the relationship between variables and sample size, simulations were done. Conditions in which sample sizes in both groups are equal, and where there are 30, 100 and 1000 samples; where the number of variables is 2, 3, 5, 10, 50 and 100; moreover where the relationship between variables are quite high, in medium level and quite low were mentioned. Results: Average classification accuracy of simulation results which were carried out 1000 times for each possible condition of trial plan were given as tables. Conclusion: It is seen that diagnostic accuracy results increase as the number of independent variables increase. SIMCA method is a method in which the relationship between variables are quite high, the number of independent variables are many in number and where there are outlier values in the data that can be used in conditions having outlier values. PMID:25207065

  14. Assessing the accuracy and stability of variable selection ...

    EPA Pesticide Factsheets

    Random forest (RF) modeling has emerged as an important statistical learning method in ecology due to its exceptional predictive performance. However, for large and complex ecological datasets there is limited guidance on variable selection methods for RF modeling. Typically, either a preselected set of predictor variables are used, or stepwise procedures are employed which iteratively add/remove variables according to their importance measures. This paper investigates the application of variable selection methods to RF models for predicting probable biological stream condition. Our motivating dataset consists of the good/poor condition of n=1365 stream survey sites from the 2008/2009 National Rivers and Stream Assessment, and a large set (p=212) of landscape features from the StreamCat dataset. Two types of RF models are compared: a full variable set model with all 212 predictors, and a reduced variable set model selected using a backwards elimination approach. We assess model accuracy using RF's internal out-of-bag estimate, and a cross-validation procedure with validation folds external to the variable selection process. We also assess the stability of the spatial predictions generated by the RF models to changes in the number of predictors, and argue that model selection needs to consider both accuracy and stability. The results suggest that RF modeling is robust to the inclusion of many variables of moderate to low importance. We found no substanti

  15. Can the super model (SUMO) method improve hydrological simulations? Exploratory tests with the GR hydrological models

    NASA Astrophysics Data System (ADS)

    Santos, Léonard; Thirel, Guillaume; Perrin, Charles

    2017-04-01

    Errors made by hydrological models may come from a problem in parameter estimation, uncertainty on observed measurements, numerical problems and from the model conceptualization that simplifies the reality. Here we focus on this last issue of hydrological modeling. One of the solutions to reduce structural uncertainty is to use a multimodel method, taking advantage of the great number and the variability of existing hydrological models. In particular, because different models are not similarly good in all situations, using multimodel approaches can improve the robustness of modeled outputs. Traditionally, in hydrology, multimodel methods are based on the output of the model (the simulated flow series). The aim of this poster is to introduce a different approach based on the internal variables of the models. The method is inspired by the SUper MOdel (SUMO, van den Berge et al., 2011) developed for climatology. The idea of the SUMO method is to correct the internal variables of a model taking into account the values of the internal variables of (an)other model(s). This correction is made bilaterally between the different models. The ensemble of the different models constitutes a super model in which all the models exchange information on their internal variables with each other at each time step. Due to this continuity in the exchanges, this multimodel algorithm is more dynamic than traditional multimodel methods. The method will be first tested using two GR4J models (in a state-space representation) with different parameterizations. The results will be presented and compared to traditional multimodel methods that will serve as benchmarks. In the future, other rainfall-runoff models will be used in the super model. References van den Berge, L. A., Selten, F. M., Wiegerinck, W., and Duane, G. S. (2011). A multi-model ensemble method that combines imperfect models through learning. Earth System Dynamics, 2(1) :161-177.

  16. Assessing the accuracy and stability of variable selection methods for random forest modeling in ecology

    EPA Science Inventory

    Random forest (RF) modeling has emerged as an important statistical learning method in ecology due to its exceptional predictive performance. However, for large and complex ecological datasets there is limited guidance on variable selection methods for RF modeling. Typically, e...

  17. Implementation of Kalman filter algorithm on models reduced using singular pertubation approximation method and its application to measurement of water level

    NASA Astrophysics Data System (ADS)

    Rachmawati, Vimala; Khusnul Arif, Didik; Adzkiya, Dieky

    2018-03-01

    The systems contained in the universe often have a large order. Thus, the mathematical model has many state variables that affect the computation time. In addition, generally not all variables are known, so estimations are needed to measure the magnitude of the system that cannot be measured directly. In this paper, we discuss the model reduction and estimation of state variables in the river system to measure the water level. The model reduction of a system is an approximation method of a system with a lower order without significant errors but has a dynamic behaviour that is similar to the original system. The Singular Perturbation Approximation method is one of the model reduction methods where all state variables of the equilibrium system are partitioned into fast and slow modes. Then, The Kalman filter algorithm is used to estimate state variables of stochastic dynamic systems where estimations are computed by predicting state variables based on system dynamics and measurement data. Kalman filters are used to estimate state variables in the original system and reduced system. Then, we compare the estimation results of the state and computational time between the original and reduced system.

  18. Stock price forecasting for companies listed on Tehran stock exchange using multivariate adaptive regression splines model and semi-parametric splines technique

    NASA Astrophysics Data System (ADS)

    Rounaghi, Mohammad Mahdi; Abbaszadeh, Mohammad Reza; Arashi, Mohammad

    2015-11-01

    One of the most important topics of interest to investors is stock price changes. Investors whose goals are long term are sensitive to stock price and its changes and react to them. In this regard, we used multivariate adaptive regression splines (MARS) model and semi-parametric splines technique for predicting stock price in this study. The MARS model as a nonparametric method is an adaptive method for regression and it fits for problems with high dimensions and several variables. semi-parametric splines technique was used in this study. Smoothing splines is a nonparametric regression method. In this study, we used 40 variables (30 accounting variables and 10 economic variables) for predicting stock price using the MARS model and using semi-parametric splines technique. After investigating the models, we select 4 accounting variables (book value per share, predicted earnings per share, P/E ratio and risk) as influencing variables on predicting stock price using the MARS model. After fitting the semi-parametric splines technique, only 4 accounting variables (dividends, net EPS, EPS Forecast and P/E Ratio) were selected as variables effective in forecasting stock prices.

  19. Affected States soft independent modeling by class analogy from the relation between independent variables, number of independent variables and sample size.

    PubMed

    Kanık, Emine Arzu; Temel, Gülhan Orekici; Erdoğan, Semra; Kaya, Irem Ersöz

    2013-03-01

    The aim of study is to introduce method of Soft Independent Modeling of Class Analogy (SIMCA), and to express whether the method is affected from the number of independent variables, the relationship between variables and sample size. Simulation study. SIMCA model is performed in two stages. In order to determine whether the method is influenced by the number of independent variables, the relationship between variables and sample size, simulations were done. Conditions in which sample sizes in both groups are equal, and where there are 30, 100 and 1000 samples; where the number of variables is 2, 3, 5, 10, 50 and 100; moreover where the relationship between variables are quite high, in medium level and quite low were mentioned. Average classification accuracy of simulation results which were carried out 1000 times for each possible condition of trial plan were given as tables. It is seen that diagnostic accuracy results increase as the number of independent variables increase. SIMCA method is a method in which the relationship between variables are quite high, the number of independent variables are many in number and where there are outlier values in the data that can be used in conditions having outlier values.

  20. Binary recursive partitioning: background, methods, and application to psychology.

    PubMed

    Merkle, Edgar C; Shaffer, Victoria A

    2011-02-01

    Binary recursive partitioning (BRP) is a computationally intensive statistical method that can be used in situations where linear models are often used. Instead of imposing many assumptions to arrive at a tractable statistical model, BRP simply seeks to accurately predict a response variable based on values of predictor variables. The method outputs a decision tree depicting the predictor variables that were related to the response variable, along with the nature of the variables' relationships. No significance tests are involved, and the tree's 'goodness' is judged based on its predictive accuracy. In this paper, we describe BRP methods in a detailed manner and illustrate their use in psychological research. We also provide R code for carrying out the methods.

  1. A non-linear data mining parameter selection algorithm for continuous variables

    PubMed Central

    Razavi, Marianne; Brady, Sean

    2017-01-01

    In this article, we propose a new data mining algorithm, by which one can both capture the non-linearity in data and also find the best subset model. To produce an enhanced subset of the original variables, a preferred selection method should have the potential of adding a supplementary level of regression analysis that would capture complex relationships in the data via mathematical transformation of the predictors and exploration of synergistic effects of combined variables. The method that we present here has the potential to produce an optimal subset of variables, rendering the overall process of model selection more efficient. This algorithm introduces interpretable parameters by transforming the original inputs and also a faithful fit to the data. The core objective of this paper is to introduce a new estimation technique for the classical least square regression framework. This new automatic variable transformation and model selection method could offer an optimal and stable model that minimizes the mean square error and variability, while combining all possible subset selection methodology with the inclusion variable transformations and interactions. Moreover, this method controls multicollinearity, leading to an optimal set of explanatory variables. PMID:29131829

  2. Variable-intercept panel model for deformation zoning of a super-high arch dam.

    PubMed

    Shi, Zhongwen; Gu, Chongshi; Qin, Dong

    2016-01-01

    This study determines dam deformation similarity indexes based on an analysis of deformation zoning features and panel data clustering theory, with comprehensive consideration to the actual deformation law of super-high arch dams and the spatial-temporal features of dam deformation. Measurement methods of these indexes are studied. Based on the established deformation similarity criteria, the principle used to determine the number of dam deformation zones is constructed through entropy weight method. This study proposes the deformation zoning method for super-high arch dams and the implementation steps, analyzes the effect of special influencing factors of different dam zones on the deformation, introduces dummy variables that represent the special effect of dam deformation, and establishes a variable-intercept panel model for deformation zoning of super-high arch dams. Based on different patterns of the special effect in the variable-intercept panel model, two panel analysis models were established to monitor fixed and random effects of dam deformation. Hausman test method of model selection and model effectiveness assessment method are discussed. Finally, the effectiveness of established models is verified through a case study.

  3. A Geometric Method for Model Reduction of Biochemical Networks with Polynomial Rate Functions.

    PubMed

    Samal, Satya Swarup; Grigoriev, Dima; Fröhlich, Holger; Weber, Andreas; Radulescu, Ovidiu

    2015-12-01

    Model reduction of biochemical networks relies on the knowledge of slow and fast variables. We provide a geometric method, based on the Newton polytope, to identify slow variables of a biochemical network with polynomial rate functions. The gist of the method is the notion of tropical equilibration that provides approximate descriptions of slow invariant manifolds. Compared to extant numerical algorithms such as the intrinsic low-dimensional manifold method, our approach is symbolic and utilizes orders of magnitude instead of precise values of the model parameters. Application of this method to a large collection of biochemical network models supports the idea that the number of dynamical variables in minimal models of cell physiology can be small, in spite of the large number of molecular regulatory actors.

  4. Input variable selection and calibration data selection for storm water quality regression models.

    PubMed

    Sun, Siao; Bertrand-Krajewski, Jean-Luc

    2013-01-01

    Storm water quality models are useful tools in storm water management. Interest has been growing in analyzing existing data for developing models for urban storm water quality evaluations. It is important to select appropriate model inputs when many candidate explanatory variables are available. Model calibration and verification are essential steps in any storm water quality modeling. This study investigates input variable selection and calibration data selection in storm water quality regression models. The two selection problems are mutually interacted. A procedure is developed in order to fulfil the two selection tasks in order. The procedure firstly selects model input variables using a cross validation method. An appropriate number of variables are identified as model inputs to ensure that a model is neither overfitted nor underfitted. Based on the model input selection results, calibration data selection is studied. Uncertainty of model performances due to calibration data selection is investigated with a random selection method. An approach using the cluster method is applied in order to enhance model calibration practice based on the principle of selecting representative data for calibration. The comparison between results from the cluster selection method and random selection shows that the former can significantly improve performances of calibrated models. It is found that the information content in calibration data is important in addition to the size of calibration data.

  5. Bayesian Techniques for Comparing Time-dependent GRMHD Simulations to Variable Event Horizon Telescope Observations

    NASA Astrophysics Data System (ADS)

    Kim, Junhan; Marrone, Daniel P.; Chan, Chi-Kwan; Medeiros, Lia; Özel, Feryal; Psaltis, Dimitrios

    2016-12-01

    The Event Horizon Telescope (EHT) is a millimeter-wavelength, very-long-baseline interferometry (VLBI) experiment that is capable of observing black holes with horizon-scale resolution. Early observations have revealed variable horizon-scale emission in the Galactic Center black hole, Sagittarius A* (Sgr A*). Comparing such observations to time-dependent general relativistic magnetohydrodynamic (GRMHD) simulations requires statistical tools that explicitly consider the variability in both the data and the models. We develop here a Bayesian method to compare time-resolved simulation images to variable VLBI data, in order to infer model parameters and perform model comparisons. We use mock EHT data based on GRMHD simulations to explore the robustness of this Bayesian method and contrast it to approaches that do not consider the effects of variability. We find that time-independent models lead to offset values of the inferred parameters with artificially reduced uncertainties. Moreover, neglecting the variability in the data and the models often leads to erroneous model selections. We finally apply our method to the early EHT data on Sgr A*.

  6. BAYESIAN TECHNIQUES FOR COMPARING TIME-DEPENDENT GRMHD SIMULATIONS TO VARIABLE EVENT HORIZON TELESCOPE OBSERVATIONS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kim, Junhan; Marrone, Daniel P.; Chan, Chi-Kwan

    2016-12-01

    The Event Horizon Telescope (EHT) is a millimeter-wavelength, very-long-baseline interferometry (VLBI) experiment that is capable of observing black holes with horizon-scale resolution. Early observations have revealed variable horizon-scale emission in the Galactic Center black hole, Sagittarius A* (Sgr A*). Comparing such observations to time-dependent general relativistic magnetohydrodynamic (GRMHD) simulations requires statistical tools that explicitly consider the variability in both the data and the models. We develop here a Bayesian method to compare time-resolved simulation images to variable VLBI data, in order to infer model parameters and perform model comparisons. We use mock EHT data based on GRMHD simulations to explore themore » robustness of this Bayesian method and contrast it to approaches that do not consider the effects of variability. We find that time-independent models lead to offset values of the inferred parameters with artificially reduced uncertainties. Moreover, neglecting the variability in the data and the models often leads to erroneous model selections. We finally apply our method to the early EHT data on Sgr A*.« less

  7. Comparison of climate envelope models developed using expert-selected variables versus statistical selection

    USGS Publications Warehouse

    Brandt, Laura A.; Benscoter, Allison; Harvey, Rebecca G.; Speroterra, Carolina; Bucklin, David N.; Romañach, Stephanie; Watling, James I.; Mazzotti, Frank J.

    2017-01-01

    Climate envelope models are widely used to describe potential future distribution of species under different climate change scenarios. It is broadly recognized that there are both strengths and limitations to using climate envelope models and that outcomes are sensitive to initial assumptions, inputs, and modeling methods Selection of predictor variables, a central step in modeling, is one of the areas where different techniques can yield varying results. Selection of climate variables to use as predictors is often done using statistical approaches that develop correlations between occurrences and climate data. These approaches have received criticism in that they rely on the statistical properties of the data rather than directly incorporating biological information about species responses to temperature and precipitation. We evaluated and compared models and prediction maps for 15 threatened or endangered species in Florida based on two variable selection techniques: expert opinion and a statistical method. We compared model performance between these two approaches for contemporary predictions, and the spatial correlation, spatial overlap and area predicted for contemporary and future climate predictions. In general, experts identified more variables as being important than the statistical method and there was low overlap in the variable sets (<40%) between the two methods Despite these differences in variable sets (expert versus statistical), models had high performance metrics (>0.9 for area under the curve (AUC) and >0.7 for true skill statistic (TSS). Spatial overlap, which compares the spatial configuration between maps constructed using the different variable selection techniques, was only moderate overall (about 60%), with a great deal of variability across species. Difference in spatial overlap was even greater under future climate projections, indicating additional divergence of model outputs from different variable selection techniques. Our work is in agreement with other studies which have found that for broad-scale species distribution modeling, using statistical methods of variable selection is a useful first step, especially when there is a need to model a large number of species or expert knowledge of the species is limited. Expert input can then be used to refine models that seem unrealistic or for species that experts believe are particularly sensitive to change. It also emphasizes the importance of using multiple models to reduce uncertainty and improve map outputs for conservation planning. Where outputs overlap or show the same direction of change there is greater certainty in the predictions. Areas of disagreement can be used for learning by asking why the models do not agree, and may highlight areas where additional on-the-ground data collection could improve the models.

  8. On the upscaling of process-based models in deltaic applications

    NASA Astrophysics Data System (ADS)

    Li, L.; Storms, J. E. A.; Walstra, D. J. R.

    2018-03-01

    Process-based numerical models are increasingly used to study the evolution of marine and terrestrial depositional environments. Whilst a detailed description of small-scale processes provides an accurate representation of reality, application on geological timescales is restrained by the associated increase in computational time. In order to reduce the computational time, a number of acceleration methods are combined and evaluated for a schematic supply-driven delta (static base level) and an accommodation-driven delta (variable base level). The performance of the combined acceleration methods is evaluated by comparing the morphological indicators such as distributary channel networking and delta volumes derived from the model predictions for various levels of acceleration. The results of the accelerated models are compared to the outcomes from a series of simulations to capture autogenic variability. Autogenic variability is quantified by re-running identical models on an initial bathymetry with 1 cm added noise. The overall results show that the variability of the accelerated models fall within the autogenic variability range, suggesting that the application of acceleration methods does not significantly affect the simulated delta evolution. The Time-scale compression method (the acceleration method introduced in this paper) results in an increased computational efficiency of 75% without adversely affecting the simulated delta evolution compared to a base case. The combination of the Time-scale compression method with the existing acceleration methods has the potential to extend the application range of process-based models towards geologic timescales.

  9. A Framework for Multifaceted Evaluation of Student Models

    ERIC Educational Resources Information Center

    Huang, Yun; González-Brenes, José P.; Kumar, Rohit; Brusilovsky, Peter

    2015-01-01

    Latent variable models, such as the popular Knowledge Tracing method, are often used to enable adaptive tutoring systems to personalize education. However, finding optimal model parameters is usually a difficult non-convex optimization problem when considering latent variable models. Prior work has reported that latent variable models obtained…

  10. Analytic uncertainty and sensitivity analysis of models with input correlations

    NASA Astrophysics Data System (ADS)

    Zhu, Yueying; Wang, Qiuping A.; Li, Wei; Cai, Xu

    2018-03-01

    Probabilistic uncertainty analysis is a common means of evaluating mathematical models. In mathematical modeling, the uncertainty in input variables is specified through distribution laws. Its contribution to the uncertainty in model response is usually analyzed by assuming that input variables are independent of each other. However, correlated parameters are often happened in practical applications. In the present paper, an analytic method is built for the uncertainty and sensitivity analysis of models in the presence of input correlations. With the method, it is straightforward to identify the importance of the independence and correlations of input variables in determining the model response. This allows one to decide whether or not the input correlations should be considered in practice. Numerical examples suggest the effectiveness and validation of our analytic method in the analysis of general models. A practical application of the method is also proposed to the uncertainty and sensitivity analysis of a deterministic HIV model.

  11. Datamining approaches for modeling tumor control probability.

    PubMed

    Naqa, Issam El; Deasy, Joseph O; Mu, Yi; Huang, Ellen; Hope, Andrew J; Lindsay, Patricia E; Apte, Aditya; Alaly, James; Bradley, Jeffrey D

    2010-11-01

    Tumor control probability (TCP) to radiotherapy is determined by complex interactions between tumor biology, tumor microenvironment, radiation dosimetry, and patient-related variables. The complexity of these heterogeneous variable interactions constitutes a challenge for building predictive models for routine clinical practice. We describe a datamining framework that can unravel the higher order relationships among dosimetric dose-volume prognostic variables, interrogate various radiobiological processes, and generalize to unseen data before when applied prospectively. Several datamining approaches are discussed that include dose-volume metrics, equivalent uniform dose, mechanistic Poisson model, and model building methods using statistical regression and machine learning techniques. Institutional datasets of non-small cell lung cancer (NSCLC) patients are used to demonstrate these methods. The performance of the different methods was evaluated using bivariate Spearman rank correlations (rs). Over-fitting was controlled via resampling methods. Using a dataset of 56 patients with primary NCSLC tumors and 23 candidate variables, we estimated GTV volume and V75 to be the best model parameters for predicting TCP using statistical resampling and a logistic model. Using these variables, the support vector machine (SVM) kernel method provided superior performance for TCP prediction with an rs=0.68 on leave-one-out testing compared to logistic regression (rs=0.4), Poisson-based TCP (rs=0.33), and cell kill equivalent uniform dose model (rs=0.17). The prediction of treatment response can be improved by utilizing datamining approaches, which are able to unravel important non-linear complex interactions among model variables and have the capacity to predict on unseen data for prospective clinical applications.

  12. Variable pixel size ionospheric tomography

    NASA Astrophysics Data System (ADS)

    Zheng, Dunyong; Zheng, Hongwei; Wang, Yanjun; Nie, Wenfeng; Li, Chaokui; Ao, Minsi; Hu, Wusheng; Zhou, Wei

    2017-06-01

    A novel ionospheric tomography technique based on variable pixel size was developed for the tomographic reconstruction of the ionospheric electron density (IED) distribution. In variable pixel size computerized ionospheric tomography (VPSCIT) model, the IED distribution is parameterized by a decomposition of the lower and upper ionosphere with different pixel sizes. Thus, the lower and upper IED distribution may be very differently determined by the available data. The variable pixel size ionospheric tomography and constant pixel size tomography are similar in most other aspects. There are some differences between two kinds of models with constant and variable pixel size respectively, one is that the segments of GPS signal pay should be assigned to the different kinds of pixel in inversion; the other is smoothness constraint factor need to make the appropriate modified where the pixel change in size. For a real dataset, the variable pixel size method distinguishes different electron density distribution zones better than the constant pixel size method. Furthermore, it can be non-chided that when the effort is spent to identify the regions in a model with best data coverage. The variable pixel size method can not only greatly improve the efficiency of inversion, but also produce IED images with high fidelity which are the same as a used uniform pixel size method. In addition, variable pixel size tomography can reduce the underdetermined problem in an ill-posed inverse problem when the data coverage is irregular or less by adjusting quantitative proportion of pixels with different sizes. In comparison with constant pixel size tomography models, the variable pixel size ionospheric tomography technique achieved relatively good results in a numerical simulation. A careful validation of the reliability and superiority of variable pixel size ionospheric tomography was performed. Finally, according to the results of the statistical analysis and quantitative comparison, the proposed method offers an improvement of 8% compared with conventional constant pixel size tomography models in the forward modeling.

  13. Person Re-Identification via Distance Metric Learning With Latent Variables.

    PubMed

    Sun, Chong; Wang, Dong; Lu, Huchuan

    2017-01-01

    In this paper, we propose an effective person re-identification method with latent variables, which represents a pedestrian as the mixture of a holistic model and a number of flexible models. Three types of latent variables are introduced to model uncertain factors in the re-identification problem, including vertical misalignments, horizontal misalignments and leg posture variations. The distance between two pedestrians can be determined by minimizing a given distance function with respect to latent variables, and then be used to conduct the re-identification task. In addition, we develop a latent metric learning method for learning the effective metric matrix, which can be solved via an iterative manner: once latent information is specified, the metric matrix can be obtained based on some typical metric learning methods; with the computed metric matrix, the latent variables can be determined by searching the state space exhaustively. Finally, extensive experiments are conducted on seven databases to evaluate the proposed method. The experimental results demonstrate that our method achieves better performance than other competing algorithms.

  14. Predictive models for Escherichia coli concentrations at inland lake beaches and relationship of model variables to pathogen detection

    EPA Science Inventory

    Methods are needed improve the timeliness and accuracy of recreational water‐quality assessments. Traditional culture methods require 18–24 h to obtain results and may not reflect current conditions. Predictive models, based on environmental and water quality variables, have been...

  15. Variable selection in subdistribution hazard frailty models with competing risks data

    PubMed Central

    Do Ha, Il; Lee, Minjung; Oh, Seungyoung; Jeong, Jong-Hyeon; Sylvester, Richard; Lee, Youngjo

    2014-01-01

    The proportional subdistribution hazards model (i.e. Fine-Gray model) has been widely used for analyzing univariate competing risks data. Recently, this model has been extended to clustered competing risks data via frailty. To the best of our knowledge, however, there has been no literature on variable selection method for such competing risks frailty models. In this paper, we propose a simple but unified procedure via a penalized h-likelihood (HL) for variable selection of fixed effects in a general class of subdistribution hazard frailty models, in which random effects may be shared or correlated. We consider three penalty functions (LASSO, SCAD and HL) in our variable selection procedure. We show that the proposed method can be easily implemented using a slight modification to existing h-likelihood estimation approaches. Numerical studies demonstrate that the proposed procedure using the HL penalty performs well, providing a higher probability of choosing the true model than LASSO and SCAD methods without losing prediction accuracy. The usefulness of the new method is illustrated using two actual data sets from multi-center clinical trials. PMID:25042872

  16. The relationship between venture capital investment and macro economic variables via statistical computation method

    NASA Astrophysics Data System (ADS)

    Aygunes, Gunes

    2017-07-01

    The objective of this paper is to survey and determine the macroeconomic factors affecting the level of venture capital (VC) investments in a country. The literary depends on venture capitalists' quality and countries' venture capital investments. The aim of this paper is to give relationship between venture capital investment and macro economic variables via statistical computation method. We investigate the countries and macro economic variables. By using statistical computation method, we derive correlation between venture capital investments and macro economic variables. According to method of logistic regression model (logit regression or logit model), macro economic variables are correlated with each other in three group. Venture capitalists regard correlations as a indicator. Finally, we give correlation matrix of our results.

  17. Rebuilding DEMATEL threshold value: an example of a food and beverage information system.

    PubMed

    Hsieh, Yi-Fang; Lee, Yu-Cheng; Lin, Shao-Bin

    2016-01-01

    This study demonstrates how a decision-making trial and evaluation laboratory (DEMATEL) threshold value can be quickly and reasonably determined in the process of combining DEMATEL and decomposed theory of planned behavior (DTPB) models. Models are combined to identify the key factors of a complex problem. This paper presents a case study of a food and beverage information system as an example. The analysis of the example indicates that, given direct and indirect relationships among variables, if a traditional DTPB model only simulates the effects of the variables without considering that the variables will affect the original cause-and-effect relationships among the variables, then the original DTPB model variables cannot represent a complete relationship. For the food and beverage example, a DEMATEL method was employed to reconstruct a DTPB model and, more importantly, to calculate reasonable DEMATEL threshold value for determining additional relationships of variables in the original DTPB model. This study is method-oriented, and the depth of investigation into any individual case is limited. Therefore, the methods proposed in various fields of study should ideally be used to identify deeper and more practical implications.

  18. Bayesian dynamical systems modelling in the social sciences.

    PubMed

    Ranganathan, Shyam; Spaiser, Viktoria; Mann, Richard P; Sumpter, David J T

    2014-01-01

    Data arising from social systems is often highly complex, involving non-linear relationships between the macro-level variables that characterize these systems. We present a method for analyzing this type of longitudinal or panel data using differential equations. We identify the best non-linear functions that capture interactions between variables, employing Bayes factor to decide how many interaction terms should be included in the model. This method punishes overly complicated models and identifies models with the most explanatory power. We illustrate our approach on the classic example of relating democracy and economic growth, identifying non-linear relationships between these two variables. We show how multiple variables and variable lags can be accounted for and provide a toolbox in R to implement our approach.

  19. Understanding the context of healthcare utilization: assessing environmental and provider-related variables in the behavioral model of utilization.

    PubMed Central

    Phillips, K A; Morrison, K R; Andersen, R; Aday, L A

    1998-01-01

    OBJECTIVE: The behavioral model of utilization, developed by Andersen, Aday, and others, is one of the most frequently used frameworks for analyzing the factors that are associated with patient utilization of healthcare services. However, the use of the model for examining the context within which utilization occurs-the role of the environment and provider-related factors-has been largely neglected. OBJECTIVE: To conduct a systematic review and analysis to determine if studies of medical care utilization that have used the behavioral model during the last 20 years have included environmental and provider-related variables and the methods used to analyze these variables. We discuss barriers to the use of these contextual variables and potential solutions. DATA SOURCES: The Social Science Citation Index and Science Citation Index. We included all articles from 1975-1995 that cited any of three key articles on the behavioral model, that included all articles that were empirical analyses and studies of formal medical care utilization, and articles that specifically stated their use of the behavioral model (n = 139). STUDY DESIGN: Design was a systematic literature review. DATA ANALYSIS: We used a structured review process to code articles on whether they included contextual variables: (1) environmental variables (characteristics of the healthcare delivery system, external environment, and community-level enabling factors); and (2) provider-related variables (patient factors that may be influenced by providers and provider characteristics that interact with patient characteristics to influence utilization). We also examined the methods used in studies that included contextual variables. PRINCIPAL FINDINGS: Forty-five percent of the studies included environmental variables and 51 percent included provider-related variables. Few studies examined specific measures of the healthcare system or provider characteristics or used methods other than simple regression analysis with hierarchical entry of variables. Only 14 percent of studies analyzed the context of healthcare by including both environmental and provider-related variables as well as using relevant methods. CONCLUSIONS: By assessing whether and how contextual variables are used, we are able to highlight the contributions made by studies using these approaches, to identify variables and methods that have been relatively underused, and to suggest solutions to barriers in using contextual variables. PMID:9685123

  20. Modeling the human development index and the percentage of poor people using quantile smoothing splines

    NASA Astrophysics Data System (ADS)

    Mulyani, Sri; Andriyana, Yudhie; Sudartianto

    2017-03-01

    Mean regression is a statistical method to explain the relationship between the response variable and the predictor variable based on the central tendency of the data (mean) of the response variable. The parameter estimation in mean regression (with Ordinary Least Square or OLS) generates a problem if we apply it to the data with a symmetric, fat-tailed, or containing outlier. Hence, an alternative method is necessary to be used to that kind of data, for example quantile regression method. The quantile regression is a robust technique to the outlier. This model can explain the relationship between the response variable and the predictor variable, not only on the central tendency of the data (median) but also on various quantile, in order to obtain complete information about that relationship. In this study, a quantile regression is developed with a nonparametric approach such as smoothing spline. Nonparametric approach is used if the prespecification model is difficult to determine, the relation between two variables follow the unknown function. We will apply that proposed method to poverty data. Here, we want to estimate the Percentage of Poor People as the response variable involving the Human Development Index (HDI) as the predictor variable.

  1. Input variable selection for data-driven models of Coriolis flowmeters for two-phase flow measurement

    NASA Astrophysics Data System (ADS)

    Wang, Lijuan; Yan, Yong; Wang, Xue; Wang, Tao

    2017-03-01

    Input variable selection is an essential step in the development of data-driven models for environmental, biological and industrial applications. Through input variable selection to eliminate the irrelevant or redundant variables, a suitable subset of variables is identified as the input of a model. Meanwhile, through input variable selection the complexity of the model structure is simplified and the computational efficiency is improved. This paper describes the procedures of the input variable selection for the data-driven models for the measurement of liquid mass flowrate and gas volume fraction under two-phase flow conditions using Coriolis flowmeters. Three advanced input variable selection methods, including partial mutual information (PMI), genetic algorithm-artificial neural network (GA-ANN) and tree-based iterative input selection (IIS) are applied in this study. Typical data-driven models incorporating support vector machine (SVM) are established individually based on the input candidates resulting from the selection methods. The validity of the selection outcomes is assessed through an output performance comparison of the SVM based data-driven models and sensitivity analysis. The validation and analysis results suggest that the input variables selected from the PMI algorithm provide more effective information for the models to measure liquid mass flowrate while the IIS algorithm provides a fewer but more effective variables for the models to predict gas volume fraction.

  2. Experimental verification of a real-time tuning method of a model-based controller by perturbations to its poles

    NASA Astrophysics Data System (ADS)

    Kajiwara, Itsuro; Furuya, Keiichiro; Ishizuka, Shinichi

    2018-07-01

    Model-based controllers with adaptive design variables are often used to control an object with time-dependent characteristics. However, the controller's performance is influenced by many factors such as modeling accuracy and fluctuations in the object's characteristics. One method to overcome these negative factors is to tune model-based controllers. Herein we propose an online tuning method to maintain control performance for an object that exhibits time-dependent variations. The proposed method employs the poles of the controller as design variables because the poles significantly impact performance. Specifically, we use the simultaneous perturbation stochastic approximation (SPSA) to optimize a model-based controller with multiple design variables. Moreover, a vibration control experiment of an object with time-dependent characteristics as the temperature is varied demonstrates that the proposed method allows adaptive control and stably maintains the closed-loop characteristics.

  3. Review of Factors, Methods, and Outcome Definition in Designing Opioid Abuse Predictive Models.

    PubMed

    Alzeer, Abdullah H; Jones, Josette; Bair, Matthew J

    2018-05-01

    Several opioid risk assessment tools are available to prescribers to evaluate opioid analgesic abuse among chronic patients. The objectives of this study are to 1) identify variables available in the literature to predict opioid abuse; 2) explore and compare methods (population, database, and analysis) used to develop statistical models that predict opioid abuse; and 3) understand how outcomes were defined in each statistical model predicting opioid abuse. The OVID database was searched for this study. The search was limited to articles written in English and published from January 1990 to April 2016. This search generated 1,409 articles. Only seven studies and nine models met our inclusion-exclusion criteria. We found nine models and identified 75 distinct variables. Three studies used administrative claims data, and four studies used electronic health record data. The majority, four out of seven articles (six out of nine models), were primarily dependent on the presence or absence of opioid abuse or dependence (ICD-9 diagnosis code) to define opioid abuse. However, two articles used a predefined list of opioid-related aberrant behaviors. We identified variables used to predict opioid abuse from electronic health records and administrative data. Medication variables are the recurrent variables in the articles reviewed (33 variables). Age and gender are the most consistent demographic variables in predicting opioid abuse. Overall, there is similarity in the sampling method and inclusion/exclusion criteria (age, number of prescriptions, follow-up period, and data analysis methods). Intuitive research to utilize unstructured data may increase opioid abuse models' accuracy.

  4. A Comparison of Methods for Estimating Quadratic Effects in Nonlinear Structural Equation Models

    ERIC Educational Resources Information Center

    Harring, Jeffrey R.; Weiss, Brandi A.; Hsu, Jui-Chen

    2012-01-01

    Two Monte Carlo simulations were performed to compare methods for estimating and testing hypotheses of quadratic effects in latent variable regression models. The methods considered in the current study were (a) a 2-stage moderated regression approach using latent variable scores, (b) an unconstrained product indicator approach, (c) a latent…

  5. Stochastic Time Models of Syllable Structure

    PubMed Central

    Shaw, Jason A.; Gafos, Adamantios I.

    2015-01-01

    Drawing on phonology research within the generative linguistics tradition, stochastic methods, and notions from complex systems, we develop a modelling paradigm linking phonological structure, expressed in terms of syllables, to speech movement data acquired with 3D electromagnetic articulography and X-ray microbeam methods. The essential variable in the models is syllable structure. When mapped to discrete coordination topologies, syllabic organization imposes systematic patterns of variability on the temporal dynamics of speech articulation. We simulated these dynamics under different syllabic parses and evaluated simulations against experimental data from Arabic and English, two languages claimed to parse similar strings of segments into different syllabic structures. Model simulations replicated several key experimental results, including the fallibility of past phonetic heuristics for syllable structure, and exposed the range of conditions under which such heuristics remain valid. More importantly, the modelling approach consistently diagnosed syllable structure proving resilient to multiple sources of variability in experimental data including measurement variability, speaker variability, and contextual variability. Prospects for extensions of our modelling paradigm to acoustic data are also discussed. PMID:25996153

  6. The intermediate endpoint effect in logistic and probit regression

    PubMed Central

    MacKinnon, DP; Lockwood, CM; Brown, CH; Wang, W; Hoffman, JM

    2010-01-01

    Background An intermediate endpoint is hypothesized to be in the middle of the causal sequence relating an independent variable to a dependent variable. The intermediate variable is also called a surrogate or mediating variable and the corresponding effect is called the mediated, surrogate endpoint, or intermediate endpoint effect. Clinical studies are often designed to change an intermediate or surrogate endpoint and through this intermediate change influence the ultimate endpoint. In many intermediate endpoint clinical studies the dependent variable is binary, and logistic or probit regression is used. Purpose The purpose of this study is to describe a limitation of a widely used approach to assessing intermediate endpoint effects and to propose an alternative method, based on products of coefficients, that yields more accurate results. Methods The intermediate endpoint model for a binary outcome is described for a true binary outcome and for a dichotomization of a latent continuous outcome. Plots of true values and a simulation study are used to evaluate the different methods. Results Distorted estimates of the intermediate endpoint effect and incorrect conclusions can result from the application of widely used methods to assess the intermediate endpoint effect. The same problem occurs for the proportion of an effect explained by an intermediate endpoint, which has been suggested as a useful measure for identifying intermediate endpoints. A solution to this problem is given based on the relationship between latent variable modeling and logistic or probit regression. Limitations More complicated intermediate variable models are not addressed in the study, although the methods described in the article can be extended to these more complicated models. Conclusions Researchers are encouraged to use an intermediate endpoint method based on the product of regression coefficients. A common method based on difference in coefficient methods can lead to distorted conclusions regarding the intermediate effect. PMID:17942466

  7. Variable selection in discrete survival models including heterogeneity.

    PubMed

    Groll, Andreas; Tutz, Gerhard

    2017-04-01

    Several variable selection procedures are available for continuous time-to-event data. However, if time is measured in a discrete way and therefore many ties occur models for continuous time are inadequate. We propose penalized likelihood methods that perform efficient variable selection in discrete survival modeling with explicit modeling of the heterogeneity in the population. The method is based on a combination of ridge and lasso type penalties that are tailored to the case of discrete survival. The performance is studied in simulation studies and an application to the birth of the first child.

  8. Variable selection based near infrared spectroscopy quantitative and qualitative analysis on wheat wet gluten

    NASA Astrophysics Data System (ADS)

    Lü, Chengxu; Jiang, Xunpeng; Zhou, Xingfan; Zhang, Yinqiao; Zhang, Naiqian; Wei, Chongfeng; Mao, Wenhua

    2017-10-01

    Wet gluten is a useful quality indicator for wheat, and short wave near infrared spectroscopy (NIRS) is a high performance technique with the advantage of economic rapid and nondestructive test. To study the feasibility of short wave NIRS analyzing wet gluten directly from wheat seed, 54 representative wheat seed samples were collected and scanned by spectrometer. 8 spectral pretreatment method and genetic algorithm (GA) variable selection method were used to optimize analysis. Both quantitative and qualitative model of wet gluten were built by partial least squares regression and discriminate analysis. For quantitative analysis, normalization is the optimized pretreatment method, 17 wet gluten sensitive variables are selected by GA, and GA model performs a better result than that of all variable model, with R2V=0.88, and RMSEV=1.47. For qualitative analysis, automatic weighted least squares baseline is the optimized pretreatment method, all variable models perform better results than those of GA models. The correct classification rates of 3 class of <24%, 24-30%, >30% wet gluten content are 95.45, 84.52, and 90.00%, respectively. The short wave NIRS technique shows potential for both quantitative and qualitative analysis of wet gluten for wheat seed.

  9. Impacts of correcting the inter-variable correlation of climate model outputs on hydrological modeling

    NASA Astrophysics Data System (ADS)

    Chen, Jie; Li, Chao; Brissette, François P.; Chen, Hua; Wang, Mingna; Essou, Gilles R. C.

    2018-05-01

    Bias correction is usually implemented prior to using climate model outputs for impact studies. However, bias correction methods that are commonly used treat climate variables independently and often ignore inter-variable dependencies. The effects of ignoring such dependencies on impact studies need to be investigated. This study aims to assess the impacts of correcting the inter-variable correlation of climate model outputs on hydrological modeling. To this end, a joint bias correction (JBC) method which corrects the joint distribution of two variables as a whole is compared with an independent bias correction (IBC) method; this is considered in terms of correcting simulations of precipitation and temperature from 26 climate models for hydrological modeling over 12 watersheds located in various climate regimes. The results show that the simulated precipitation and temperature are considerably biased not only in the individual distributions, but also in their correlations, which in turn result in biased hydrological simulations. In addition to reducing the biases of the individual characteristics of precipitation and temperature, the JBC method can also reduce the bias in precipitation-temperature (P-T) correlations. In terms of hydrological modeling, the JBC method performs significantly better than the IBC method for 11 out of the 12 watersheds over the calibration period. For the validation period, the advantages of the JBC method are greatly reduced as the performance becomes dependent on the watershed, GCM and hydrological metric considered. For arid/tropical and snowfall-rainfall-mixed watersheds, JBC performs better than IBC. For snowfall- or rainfall-dominated watersheds, however, the two methods behave similarly, with IBC performing somewhat better than JBC. Overall, the results emphasize the advantages of correcting the P-T correlation when using climate model-simulated precipitation and temperature to assess the impact of climate change on watershed hydrology. However, a thorough validation and a comparison with other methods are recommended before using the JBC method, since it may perform worse than the IBC method for some cases due to bias nonstationarity of climate model outputs.

  10. Examining Parallelism of Sets of Psychometric Measures Using Latent Variable Modeling

    ERIC Educational Resources Information Center

    Raykov, Tenko; Patelis, Thanos; Marcoulides, George A.

    2011-01-01

    A latent variable modeling approach that can be used to examine whether several psychometric tests are parallel is discussed. The method consists of sequentially testing the properties of parallel measures via a corresponding relaxation of parameter constraints in a saturated model or an appropriately constructed latent variable model. The…

  11. Variable Selection for Regression Models of Percentile Flows

    NASA Astrophysics Data System (ADS)

    Fouad, G.

    2017-12-01

    Percentile flows describe the flow magnitude equaled or exceeded for a given percent of time, and are widely used in water resource management. However, these statistics are normally unavailable since most basins are ungauged. Percentile flows of ungauged basins are often predicted using regression models based on readily observable basin characteristics, such as mean elevation. The number of these independent variables is too large to evaluate all possible models. A subset of models is typically evaluated using automatic procedures, like stepwise regression. This ignores a large variety of methods from the field of feature (variable) selection and physical understanding of percentile flows. A study of 918 basins in the United States was conducted to compare an automatic regression procedure to the following variable selection methods: (1) principal component analysis, (2) correlation analysis, (3) random forests, (4) genetic programming, (5) Bayesian networks, and (6) physical understanding. The automatic regression procedure only performed better than principal component analysis. Poor performance of the regression procedure was due to a commonly used filter for multicollinearity, which rejected the strongest models because they had cross-correlated independent variables. Multicollinearity did not decrease model performance in validation because of a representative set of calibration basins. Variable selection methods based strictly on predictive power (numbers 2-5 from above) performed similarly, likely indicating a limit to the predictive power of the variables. Similar performance was also reached using variables selected based on physical understanding, a finding that substantiates recent calls to emphasize physical understanding in modeling for predictions in ungauged basins. The strongest variables highlighted the importance of geology and land cover, whereas widely used topographic variables were the weakest predictors. Variables suffered from a high degree of multicollinearity, possibly illustrating the co-evolution of climatic and physiographic conditions. Given the ineffectiveness of many variables used here, future work should develop new variables that target specific processes associated with percentile flows.

  12. Firmness prediction in Prunus persica 'Calrico' peaches by visible/short-wave near infrared spectroscopy and acoustic measurements using optimised linear and non-linear chemometric models.

    PubMed

    Lafuente, Victoria; Herrera, Luis J; Pérez, María del Mar; Val, Jesús; Negueruela, Ignacio

    2015-08-15

    In this work, near infrared spectroscopy (NIR) and an acoustic measure (AWETA) (two non-destructive methods) were applied in Prunus persica fruit 'Calrico' (n = 260) to predict Magness-Taylor (MT) firmness. Separate and combined use of these measures was evaluated and compared using partial least squares (PLS) and least squares support vector machine (LS-SVM) regression methods. Also, a mutual-information-based variable selection method, seeking to find the most significant variables to produce optimal accuracy of the regression models, was applied to a joint set of variables (NIR wavelengths and AWETA measure). The newly proposed combined NIR-AWETA model gave good values of the determination coefficient (R(2)) for PLS and LS-SVM methods (0.77 and 0.78, respectively), improving the reliability of MT firmness prediction in comparison with separate NIR and AWETA predictions. The three variables selected by the variable selection method (AWETA measure plus NIR wavelengths 675 and 697 nm) achieved R(2) values 0.76 and 0.77, PLS and LS-SVM. These results indicated that the proposed mutual-information-based variable selection algorithm was a powerful tool for the selection of the most relevant variables. © 2014 Society of Chemical Industry.

  13. [Correlation coefficient-based classification method of hydrological dependence variability: With auto-regression model as example].

    PubMed

    Zhao, Yu Xi; Xie, Ping; Sang, Yan Fang; Wu, Zi Yi

    2018-04-01

    Hydrological process evaluation is temporal dependent. Hydrological time series including dependence components do not meet the data consistency assumption for hydrological computation. Both of those factors cause great difficulty for water researches. Given the existence of hydrological dependence variability, we proposed a correlationcoefficient-based method for significance evaluation of hydrological dependence based on auto-regression model. By calculating the correlation coefficient between the original series and its dependence component and selecting reasonable thresholds of correlation coefficient, this method divided significance degree of dependence into no variability, weak variability, mid variability, strong variability, and drastic variability. By deducing the relationship between correlation coefficient and auto-correlation coefficient in each order of series, we found that the correlation coefficient was mainly determined by the magnitude of auto-correlation coefficient from the 1 order to p order, which clarified the theoretical basis of this method. With the first-order and second-order auto-regression models as examples, the reasonability of the deduced formula was verified through Monte-Carlo experiments to classify the relationship between correlation coefficient and auto-correlation coefficient. This method was used to analyze three observed hydrological time series. The results indicated the coexistence of stochastic and dependence characteristics in hydrological process.

  14. Attributing runoff changes to climate variability and human activities: uncertainty analysis using four monthly water balance models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Shuai; Xiong, Lihua; Li, Hong-Yi

    2015-05-26

    Hydrological simulations to delineate the impacts of climate variability and human activities are subjected to uncertainties related to both parameter and structure of the hydrological models. To analyze the impact of these uncertainties on the model performance and to yield more reliable simulation results, a global calibration and multimodel combination method that integrates the Shuffled Complex Evolution Metropolis (SCEM) and Bayesian Model Averaging (BMA) of four monthly water balance models was proposed. The method was applied to the Weihe River Basin (WRB), the largest tributary of the Yellow River, to determine the contribution of climate variability and human activities tomore » runoff changes. The change point, which was used to determine the baseline period (1956-1990) and human-impacted period (1991-2009), was derived using both cumulative curve and Pettitt’s test. Results show that the combination method from SCEM provides more skillful deterministic predictions than the best calibrated individual model, resulting in the smallest uncertainty interval of runoff changes attributed to climate variability and human activities. This combination methodology provides a practical and flexible tool for attribution of runoff changes to climate variability and human activities by hydrological models.« less

  15. An Integrated Optimization Design Method Based on Surrogate Modeling Applied to Diverging Duct Design

    NASA Astrophysics Data System (ADS)

    Hanan, Lu; Qiushi, Li; Shaobin, Li

    2016-12-01

    This paper presents an integrated optimization design method in which uniform design, response surface methodology and genetic algorithm are used in combination. In detail, uniform design is used to select the experimental sampling points in the experimental domain and the system performance is evaluated by means of computational fluid dynamics to construct a database. After that, response surface methodology is employed to generate a surrogate mathematical model relating the optimization objective and the design variables. Subsequently, genetic algorithm is adopted and applied to the surrogate model to acquire the optimal solution in the case of satisfying some constraints. The method has been applied to the optimization design of an axisymmetric diverging duct, dealing with three design variables including one qualitative variable and two quantitative variables. The method of modeling and optimization design performs well in improving the duct aerodynamic performance and can be also applied to wider fields of mechanical design and seen as a useful tool for engineering designers, by reducing the design time and computation consumption.

  16. Statistical methods and regression analysis of stratospheric ozone and meteorological variables in Isfahan

    NASA Astrophysics Data System (ADS)

    Hassanzadeh, S.; Hosseinibalam, F.; Omidvari, M.

    2008-04-01

    Data of seven meteorological variables (relative humidity, wet temperature, dry temperature, maximum temperature, minimum temperature, ground temperature and sun radiation time) and ozone values have been used for statistical analysis. Meteorological variables and ozone values were analyzed using both multiple linear regression and principal component methods. Data for the period 1999-2004 are analyzed jointly using both methods. For all periods, temperature dependent variables were highly correlated, but were all negatively correlated with relative humidity. Multiple regression analysis was used to fit the meteorological variables using the meteorological variables as predictors. A variable selection method based on high loading of varimax rotated principal components was used to obtain subsets of the predictor variables to be included in the linear regression model of the meteorological variables. In 1999, 2001 and 2002 one of the meteorological variables was weakly influenced predominantly by the ozone concentrations. However, the model did not predict that the meteorological variables for the year 2000 were not influenced predominantly by the ozone concentrations that point to variation in sun radiation. This could be due to other factors that were not explicitly considered in this study.

  17. The Houdini Transformation: True, but Illusory.

    PubMed

    Bentler, Peter M; Molenaar, Peter C M

    2012-01-01

    Molenaar (2003, 2011) showed that a common factor model could be transformed into an equivalent model without factors, involving only observed variables and residual errors. He called this invertible transformation the Houdini transformation. His derivation involved concepts from time series and state space theory. This paper verifies the Houdini transformation on a general latent variable model using algebraic methods. The results show that the Houdini transformation is illusory, in the sense that the Houdini transformed model remains a latent variable model. Contrary to common knowledge, a model that is a path model with only observed variables and residual errors may, in fact, be a latent variable model.

  18. The Houdini Transformation: True, but Illusory

    PubMed Central

    Bentler, Peter M.; Molenaar, Peter C. M.

    2012-01-01

    Molenaar (2003, 2011) showed that a common factor model could be transformed into an equivalent model without factors, involving only observed variables and residual errors. He called this invertible transformation the Houdini transformation. His derivation involved concepts from time series and state space theory. This paper verifies the Houdini transformation on a general latent variable model using algebraic methods. The results show that the Houdini transformation is illusory, in the sense that the Houdini transformed model remains a latent variable model. Contrary to common knowledge, a model that is a path model with only observed variables and residual errors may, in fact, be a latent variable model. PMID:23180888

  19. Covariate Selection for Multilevel Models with Missing Data

    PubMed Central

    Marino, Miguel; Buxton, Orfeu M.; Li, Yi

    2017-01-01

    Missing covariate data hampers variable selection in multilevel regression settings. Current variable selection techniques for multiply-imputed data commonly address missingness in the predictors through list-wise deletion and stepwise-selection methods which are problematic. Moreover, most variable selection methods are developed for independent linear regression models and do not accommodate multilevel mixed effects regression models with incomplete covariate data. We develop a novel methodology that is able to perform covariate selection across multiply-imputed data for multilevel random effects models when missing data is present. Specifically, we propose to stack the multiply-imputed data sets from a multiple imputation procedure and to apply a group variable selection procedure through group lasso regularization to assess the overall impact of each predictor on the outcome across the imputed data sets. Simulations confirm the advantageous performance of the proposed method compared with the competing methods. We applied the method to reanalyze the Healthy Directions-Small Business cancer prevention study, which evaluated a behavioral intervention program targeting multiple risk-related behaviors in a working-class, multi-ethnic population. PMID:28239457

  20. A method for simplifying the analysis of traffic accidents injury severity on two-lane highways using Bayesian networks.

    PubMed

    Mujalli, Randa Oqab; de Oña, Juan

    2011-10-01

    This study describes a method for reducing the number of variables frequently considered in modeling the severity of traffic accidents. The method's efficiency is assessed by constructing Bayesian networks (BN). It is based on a two stage selection process. Several variable selection algorithms, commonly used in data mining, are applied in order to select subsets of variables. BNs are built using the selected subsets and their performance is compared with the original BN (with all the variables) using five indicators. The BNs that improve the indicators' values are further analyzed for identifying the most significant variables (accident type, age, atmospheric factors, gender, lighting, number of injured, and occupant involved). A new BN is built using these variables, where the results of the indicators indicate, in most of the cases, a statistically significant improvement with respect to the original BN. It is possible to reduce the number of variables used to model traffic accidents injury severity through BNs without reducing the performance of the model. The study provides the safety analysts a methodology that could be used to minimize the number of variables used in order to determine efficiently the injury severity of traffic accidents without reducing the performance of the model. Copyright © 2011 Elsevier Ltd. All rights reserved.

  1. Comparison of methods for the analysis of relatively simple mediation models.

    PubMed

    Rijnhart, Judith J M; Twisk, Jos W R; Chinapaw, Mai J M; de Boer, Michiel R; Heymans, Martijn W

    2017-09-01

    Statistical mediation analysis is an often used method in trials, to unravel the pathways underlying the effect of an intervention on a particular outcome variable. Throughout the years, several methods have been proposed, such as ordinary least square (OLS) regression, structural equation modeling (SEM), and the potential outcomes framework. Most applied researchers do not know that these methods are mathematically equivalent when applied to mediation models with a continuous mediator and outcome variable. Therefore, the aim of this paper was to demonstrate the similarities between OLS regression, SEM, and the potential outcomes framework in three mediation models: 1) a crude model, 2) a confounder-adjusted model, and 3) a model with an interaction term for exposure-mediator interaction. Secondary data analysis of a randomized controlled trial that included 546 schoolchildren. In our data example, the mediator and outcome variable were both continuous. We compared the estimates of the total, direct and indirect effects, proportion mediated, and 95% confidence intervals (CIs) for the indirect effect across OLS regression, SEM, and the potential outcomes framework. OLS regression, SEM, and the potential outcomes framework yielded the same effect estimates in the crude mediation model, the confounder-adjusted mediation model, and the mediation model with an interaction term for exposure-mediator interaction. Since OLS regression, SEM, and the potential outcomes framework yield the same results in three mediation models with a continuous mediator and outcome variable, researchers can continue using the method that is most convenient to them.

  2. A new adaptive estimation method of spacecraft thermal mathematical model with an ensemble Kalman filter

    NASA Astrophysics Data System (ADS)

    Akita, T.; Takaki, R.; Shima, E.

    2012-04-01

    An adaptive estimation method of spacecraft thermal mathematical model is presented. The method is based on the ensemble Kalman filter, which can effectively handle the nonlinearities contained in the thermal model. The state space equations of the thermal mathematical model is derived, where both temperature and uncertain thermal characteristic parameters are considered as the state variables. In the method, the thermal characteristic parameters are automatically estimated as the outputs of the filtered state variables, whereas, in the usual thermal model correlation, they are manually identified by experienced engineers using trial-and-error approach. A numerical experiment of a simple small satellite is provided to verify the effectiveness of the presented method.

  3. Systems and methods for modeling and analyzing networks

    DOEpatents

    Hill, Colin C; Church, Bruce W; McDonagh, Paul D; Khalil, Iya G; Neyarapally, Thomas A; Pitluk, Zachary W

    2013-10-29

    The systems and methods described herein utilize a probabilistic modeling framework for reverse engineering an ensemble of causal models, from data and then forward simulating the ensemble of models to analyze and predict the behavior of the network. In certain embodiments, the systems and methods described herein include data-driven techniques for developing causal models for biological networks. Causal network models include computational representations of the causal relationships between independent variables such as a compound of interest and dependent variables such as measured DNA alterations, changes in mRNA, protein, and metabolites to phenotypic readouts of efficacy and toxicity.

  4. Flexible and scalable methods for quantifying stochastic variability in the era of massive time-domain astronomical data sets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kelly, Brandon C.; Becker, Andrew C.; Sobolewska, Malgosia

    2014-06-10

    We present the use of continuous-time autoregressive moving average (CARMA) models as a method for estimating the variability features of a light curve, and in particular its power spectral density (PSD). CARMA models fully account for irregular sampling and measurement errors, making them valuable for quantifying variability, forecasting and interpolating light curves, and variability-based classification. We show that the PSD of a CARMA model can be expressed as a sum of Lorentzian functions, which makes them extremely flexible and able to model a broad range of PSDs. We present the likelihood function for light curves sampled from CARMA processes, placingmore » them on a statistically rigorous foundation, and we present a Bayesian method to infer the probability distribution of the PSD given the measured light curve. Because calculation of the likelihood function scales linearly with the number of data points, CARMA modeling scales to current and future massive time-domain data sets. We conclude by applying our CARMA modeling approach to light curves for an X-ray binary, two active galactic nuclei, a long-period variable star, and an RR Lyrae star in order to illustrate their use, applicability, and interpretation.« less

  5. General methods for sensitivity analysis of equilibrium dynamics in patch occupancy models

    USGS Publications Warehouse

    Miller, David A.W.

    2012-01-01

    Sensitivity analysis is a useful tool for the study of ecological models that has many potential applications for patch occupancy modeling. Drawing from the rich foundation of existing methods for Markov chain models, I demonstrate new methods for sensitivity analysis of the equilibrium state dynamics of occupancy models. Estimates from three previous studies are used to illustrate the utility of the sensitivity calculations: a joint occupancy model for a prey species, its predators, and habitat used by both; occurrence dynamics from a well-known metapopulation study of three butterfly species; and Golden Eagle occupancy and reproductive dynamics. I show how to deal efficiently with multistate models and how to calculate sensitivities involving derived state variables and lower-level parameters. In addition, I extend methods to incorporate environmental variation by allowing for spatial and temporal variability in transition probabilities. The approach used here is concise and general and can fully account for environmental variability in transition parameters. The methods can be used to improve inferences in occupancy studies by quantifying the effects of underlying parameters, aiding prediction of future system states, and identifying priorities for sampling effort.

  6. Soft sensor modeling based on variable partition ensemble method for nonlinear batch processes

    NASA Astrophysics Data System (ADS)

    Wang, Li; Chen, Xiangguang; Yang, Kai; Jin, Huaiping

    2017-01-01

    Batch processes are always characterized by nonlinear and system uncertain properties, therefore, the conventional single model may be ill-suited. A local learning strategy soft sensor based on variable partition ensemble method is developed for the quality prediction of nonlinear and non-Gaussian batch processes. A set of input variable sets are obtained by bootstrapping and PMI criterion. Then, multiple local GPR models are developed based on each local input variable set. When a new test data is coming, the posterior probability of each best performance local model is estimated based on Bayesian inference and used to combine these local GPR models to get the final prediction result. The proposed soft sensor is demonstrated by applying to an industrial fed-batch chlortetracycline fermentation process.

  7. A numerical solution for a variable-order reaction-diffusion model by using fractional derivatives with non-local and non-singular kernel

    NASA Astrophysics Data System (ADS)

    Coronel-Escamilla, A.; Gómez-Aguilar, J. F.; Torres, L.; Escobar-Jiménez, R. F.

    2018-02-01

    A reaction-diffusion system can be represented by the Gray-Scott model. The reaction-diffusion dynamic is described by a pair of time and space dependent Partial Differential Equations (PDEs). In this paper, a generalization of the Gray-Scott model by using variable-order fractional differential equations is proposed. The variable-orders were set as smooth functions bounded in (0 , 1 ] and, specifically, the Liouville-Caputo and the Atangana-Baleanu-Caputo fractional derivatives were used to express the time differentiation. In order to find a numerical solution of the proposed model, the finite difference method together with the Adams method were applied. The simulations results showed the chaotic behavior of the proposed model when different variable-orders are applied.

  8. The prediction of nonlinear dynamic loads on helicopters from flight variables using artificial neural networks

    NASA Technical Reports Server (NTRS)

    Cook, A. B.; Fuller, C. R.; O'Brien, W. F.; Cabell, R. H.

    1992-01-01

    A method of indirectly monitoring component loads through common flight variables is proposed which requires an accurate model of the underlying nonlinear relationships. An artificial neural network (ANN) model learns relationships through exposure to a database of flight variable records and corresponding load histories from an instrumented military helicopter undergoing standard maneuvers. The ANN model, utilizing eight standard flight variables as inputs, is trained to predict normalized time-varying mean and oscillatory loads on two critical components over a range of seven maneuvers. Both interpolative and extrapolative capabilities are demonstrated with agreement between predicted and measured loads on the order of 90 percent to 95 percent. This work justifies pursuing the ANN method of predicting loads from flight variables.

  9. Variable selection under multiple imputation using the bootstrap in a prognostic study

    PubMed Central

    Heymans, Martijn W; van Buuren, Stef; Knol, Dirk L; van Mechelen, Willem; de Vet, Henrica CW

    2007-01-01

    Background Missing data is a challenging problem in many prognostic studies. Multiple imputation (MI) accounts for imputation uncertainty that allows for adequate statistical testing. We developed and tested a methodology combining MI with bootstrapping techniques for studying prognostic variable selection. Method In our prospective cohort study we merged data from three different randomized controlled trials (RCTs) to assess prognostic variables for chronicity of low back pain. Among the outcome and prognostic variables data were missing in the range of 0 and 48.1%. We used four methods to investigate the influence of respectively sampling and imputation variation: MI only, bootstrap only, and two methods that combine MI and bootstrapping. Variables were selected based on the inclusion frequency of each prognostic variable, i.e. the proportion of times that the variable appeared in the model. The discriminative and calibrative abilities of prognostic models developed by the four methods were assessed at different inclusion levels. Results We found that the effect of imputation variation on the inclusion frequency was larger than the effect of sampling variation. When MI and bootstrapping were combined at the range of 0% (full model) to 90% of variable selection, bootstrap corrected c-index values of 0.70 to 0.71 and slope values of 0.64 to 0.86 were found. Conclusion We recommend to account for both imputation and sampling variation in sets of missing data. The new procedure of combining MI with bootstrapping for variable selection, results in multivariable prognostic models with good performance and is therefore attractive to apply on data sets with missing values. PMID:17629912

  10. Automatic variable selection method and a comparison for quantitative analysis in laser-induced breakdown spectroscopy

    NASA Astrophysics Data System (ADS)

    Duan, Fajie; Fu, Xiao; Jiang, Jiajia; Huang, Tingting; Ma, Ling; Zhang, Cong

    2018-05-01

    In this work, an automatic variable selection method for quantitative analysis of soil samples using laser-induced breakdown spectroscopy (LIBS) is proposed, which is based on full spectrum correction (FSC) and modified iterative predictor weighting-partial least squares (mIPW-PLS). The method features automatic selection without artificial processes. To illustrate the feasibility and effectiveness of the method, a comparison with genetic algorithm (GA) and successive projections algorithm (SPA) for different elements (copper, barium and chromium) detection in soil was implemented. The experimental results showed that all the three methods could accomplish variable selection effectively, among which FSC-mIPW-PLS required significantly shorter computation time (12 s approximately for 40,000 initial variables) than the others. Moreover, improved quantification models were got with variable selection approaches. The root mean square errors of prediction (RMSEP) of models utilizing the new method were 27.47 (copper), 37.15 (barium) and 39.70 (chromium) mg/kg, which showed comparable prediction effect with GA and SPA.

  11. Free variable selection QSPR study to predict 19F chemical shifts of some fluorinated organic compounds using Random Forest and RBF-PLS methods

    NASA Astrophysics Data System (ADS)

    Goudarzi, Nasser

    2016-04-01

    In this work, two new and powerful chemometrics methods are applied for the modeling and prediction of the 19F chemical shift values of some fluorinated organic compounds. The radial basis function-partial least square (RBF-PLS) and random forest (RF) are employed to construct the models to predict the 19F chemical shifts. In this study, we didn't used from any variable selection method and RF method can be used as variable selection and modeling technique. Effects of the important parameters affecting the ability of the RF prediction power such as the number of trees (nt) and the number of randomly selected variables to split each node (m) were investigated. The root-mean-square errors of prediction (RMSEP) for the training set and the prediction set for the RBF-PLS and RF models were 44.70, 23.86, 29.77, and 23.69, respectively. Also, the correlation coefficients of the prediction set for the RBF-PLS and RF models were 0.8684 and 0.9313, respectively. The results obtained reveal that the RF model can be used as a powerful chemometrics tool for the quantitative structure-property relationship (QSPR) studies.

  12. Expanding the occupational health methodology: A concatenated artificial neural network approach to model the burnout process in Chinese nurses.

    PubMed

    Ladstätter, Felix; Garrosa, Eva; Moreno-Jiménez, Bernardo; Ponsoda, Vicente; Reales Aviles, José Manuel; Dai, Junming

    2016-01-01

    Artificial neural networks are sophisticated modelling and prediction tools capable of extracting complex, non-linear relationships between predictor (input) and predicted (output) variables. This study explores this capacity by modelling non-linearities in the hardiness-modulated burnout process with a neural network. Specifically, two multi-layer feed-forward artificial neural networks are concatenated in an attempt to model the composite non-linear burnout process. Sensitivity analysis, a Monte Carlo-based global simulation technique, is then utilised to examine the first-order effects of the predictor variables on the burnout sub-dimensions and consequences. Results show that (1) this concatenated artificial neural network approach is feasible to model the burnout process, (2) sensitivity analysis is a prolific method to study the relative importance of predictor variables and (3) the relationships among variables involved in the development of burnout and its consequences are to different degrees non-linear. Many relationships among variables (e.g., stressors and strains) are not linear, yet researchers use linear methods such as Pearson correlation or linear regression to analyse these relationships. Artificial neural network analysis is an innovative method to analyse non-linear relationships and in combination with sensitivity analysis superior to linear methods.

  13. Variable selection in semiparametric cure models based on penalized likelihood, with application to breast cancer clinical trials.

    PubMed

    Liu, Xiang; Peng, Yingwei; Tu, Dongsheng; Liang, Hua

    2012-10-30

    Survival data with a sizable cure fraction are commonly encountered in cancer research. The semiparametric proportional hazards cure model has been recently used to analyze such data. As seen in the analysis of data from a breast cancer study, a variable selection approach is needed to identify important factors in predicting the cure status and risk of breast cancer recurrence. However, no specific variable selection method for the cure model is available. In this paper, we present a variable selection approach with penalized likelihood for the cure model. The estimation can be implemented easily by combining the computational methods for penalized logistic regression and the penalized Cox proportional hazards models with the expectation-maximization algorithm. We illustrate the proposed approach on data from a breast cancer study. We conducted Monte Carlo simulations to evaluate the performance of the proposed method. We used and compared different penalty functions in the simulation studies. Copyright © 2012 John Wiley & Sons, Ltd.

  14. Diversified models for portfolio selection based on uncertain semivariance

    NASA Astrophysics Data System (ADS)

    Chen, Lin; Peng, Jin; Zhang, Bo; Rosyida, Isnaini

    2017-02-01

    Since the financial markets are complex, sometimes the future security returns are represented mainly based on experts' estimations due to lack of historical data. This paper proposes a semivariance method for diversified portfolio selection, in which the security returns are given subjective to experts' estimations and depicted as uncertain variables. In the paper, three properties of the semivariance of uncertain variables are verified. Based on the concept of semivariance of uncertain variables, two types of mean-semivariance diversified models for uncertain portfolio selection are proposed. Since the models are complex, a hybrid intelligent algorithm which is based on 99-method and genetic algorithm is designed to solve the models. In this hybrid intelligent algorithm, 99-method is applied to compute the expected value and semivariance of uncertain variables, and genetic algorithm is employed to seek the best allocation plan for portfolio selection. At last, several numerical examples are presented to illustrate the modelling idea and the effectiveness of the algorithm.

  15. Selection of climate change scenario data for impact modelling.

    PubMed

    Sloth Madsen, M; Maule, C Fox; MacKellar, N; Olesen, J E; Christensen, J Hesselbjerg

    2012-01-01

    Impact models investigating climate change effects on food safety often need detailed climate data. The aim of this study was to select climate change projection data for selected crop phenology and mycotoxin impact models. Using the ENSEMBLES database of climate model output, this study illustrates how the projected climate change signal of important variables as temperature, precipitation and relative humidity depends on the choice of the climate model. Using climate change projections from at least two different climate models is recommended to account for model uncertainty. To make the climate projections suitable for impact analysis at the local scale a weather generator approach was adopted. As the weather generator did not treat all the necessary variables, an ad-hoc statistical method was developed to synthesise realistic values of missing variables. The method is presented in this paper, applied to relative humidity, but it could be adopted to other variables if needed.

  16. Estimating and Interpreting Latent Variable Interactions: A Tutorial for Applying the Latent Moderated Structural Equations Method

    ERIC Educational Resources Information Center

    Maslowsky, Julie; Jager, Justin; Hemken, Douglas

    2015-01-01

    Latent variables are common in psychological research. Research questions involving the interaction of two variables are likewise quite common. Methods for estimating and interpreting interactions between latent variables within a structural equation modeling framework have recently become available. The latent moderated structural equations (LMS)…

  17. Evaluation of internal noise methods for Hotelling observers

    NASA Astrophysics Data System (ADS)

    Zhang, Yani; Pham, Binh T.; Eckstein, Miguel P.

    2005-04-01

    Including internal noise in computer model observers to degrade model observer performance to human levels is a common method to allow for quantitatively comparisons of human and model performance. In this paper, we studied two different types of methods for injecting internal noise to Hotelling model observers. The first method adds internal noise to the output of the individual channels: a) Independent non-uniform channel noise, b) Independent uniform channel noise. The second method adds internal noise to the decision variable arising from the combination of channel responses: a) internal noise standard deviation proportional to decision variable's standard deviation due to the external noise, b) internal noise standard deviation proportional to decision variable's variance caused by the external noise. We tested the square window Hotelling observer (HO), channelized Hotelling observer (CHO), and Laguerre-Gauss Hotelling observer (LGHO). The studied task was detection of a filling defect of varying size/shape in one of four simulated arterial segment locations with real x-ray angiography backgrounds. Results show that the internal noise method that leads to the best prediction of human performance differs across the studied models observers. The CHO model best predicts human observer performance with the channel internal noise. The HO and LGHO best predict human observer performance with the decision variable internal noise. These results might help explain why previous studies have found different results on the ability of each Hotelling model to predict human performance. Finally, the present results might guide researchers with the choice of method to include internal noise into their Hotelling models.

  18. Group Comparisons in the Presence of Missing Data Using Latent Variable Modeling Techniques

    ERIC Educational Resources Information Center

    Raykov, Tenko; Marcoulides, George A.

    2010-01-01

    A latent variable modeling approach for examining population similarities and differences in observed variable relationship and mean indexes in incomplete data sets is discussed. The method is based on the full information maximum likelihood procedure of model fitting and parameter estimation. The procedure can be employed to test group identities…

  19. Regression Methods for Categorical Dependent Variables: Effects on a Model of Student College Choice

    ERIC Educational Resources Information Center

    Rapp, Kelly E.

    2012-01-01

    The use of categorical dependent variables with the classical linear regression model (CLRM) violates many of the model's assumptions and may result in biased estimates (Long, 1997; O'Connell, Goldstein, Rogers, & Peng, 2008). Many dependent variables of interest to educational researchers (e.g., professorial rank, educational attainment) are…

  20. Analysis and modeling of wafer-level process variability in 28 nm FD-SOI using split C-V measurements

    NASA Astrophysics Data System (ADS)

    Pradeep, Krishna; Poiroux, Thierry; Scheer, Patrick; Juge, André; Gouget, Gilles; Ghibaudo, Gérard

    2018-07-01

    This work details the analysis of wafer level global process variability in 28 nm FD-SOI using split C-V measurements. The proposed approach initially evaluates the native on wafer process variability using efficient extraction methods on split C-V measurements. The on-wafer threshold voltage (VT) variability is first studied and modeled using a simple analytical model. Then, a statistical model based on the Leti-UTSOI compact model is proposed to describe the total C-V variability in different bias conditions. This statistical model is finally used to study the contribution of each process parameter to the total C-V variability.

  1. Motivational antecedents to contraceptive method change following a pregnancy scare: a couple analysis.

    PubMed

    Miller, W B; Pasta, D J

    2001-01-01

    In this study we develop and then test a couple model of contraceptive method choice decision-making following a pregnancy scare. The central constructs in our model are satisfaction with one's current method and confidence in the use of it. Downstream in the decision sequence, satisfaction and confidence predict desires and intentions to change methods. Upstream they are predicted by childbearing motivations, contraceptive attitudes, and the residual effects of the couples' previous method decisions. We collected data from 175 mostly unmarried and racially/ethnically diverse couples who were seeking pregnancy tests. We used LISREL and its latent variable capacity to estimate a structural equation model of the couple decision-making sequence leading to a change (or not) in contraceptive method. Results confirm most elements in our model and demonstrate a number of important cross-partner effects. Almost one-half of the sample had positive pregnancy tests and the base model fitted to this subsample indicates less accuracy in partner perception and greater influence of the female partner on method change decision-making. The introduction of some hypothesis-generating exogenous variables to our base couple model, together with some unexpected findings for the contraceptive attitude variables, suggest interesting questions that require further exploration.

  2. Influential input classification in probabilistic multimedia models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Maddalena, Randy L.; McKone, Thomas E.; Hsieh, Dennis P.H.

    1999-05-01

    Monte Carlo analysis is a statistical simulation method that is often used to assess and quantify the outcome variance in complex environmental fate and effects models. Total outcome variance of these models is a function of (1) the uncertainty and/or variability associated with each model input and (2) the sensitivity of the model outcome to changes in the inputs. To propagate variance through a model using Monte Carlo techniques, each variable must be assigned a probability distribution. The validity of these distributions directly influences the accuracy and reliability of the model outcome. To efficiently allocate resources for constructing distributions onemore » should first identify the most influential set of variables in the model. Although existing sensitivity and uncertainty analysis methods can provide a relative ranking of the importance of model inputs, they fail to identify the minimum set of stochastic inputs necessary to sufficiently characterize the outcome variance. In this paper, we describe and demonstrate a novel sensitivity/uncertainty analysis method for assessing the importance of each variable in a multimedia environmental fate model. Our analyses show that for a given scenario, a relatively small number of input variables influence the central tendency of the model and an even smaller set determines the shape of the outcome distribution. For each input, the level of influence depends on the scenario under consideration. This information is useful for developing site specific models and improving our understanding of the processes that have the greatest influence on the variance in outcomes from multimedia models.« less

  3. Vulnerability in Determining the Cost of Information System Project to Avoid Loses

    NASA Astrophysics Data System (ADS)

    Haryono, Kholid; Ikhsani, Zulfa Amalia

    2018-03-01

    Context: This study discusses the priority of cost required in software development projects. Objectives: To show the costing models, the variables involved, and how practitioners assess and decide the priorities of each variable. To strengthen the information, each variable also confirmed the risk if ignored. Method: The method is done by two approaches. First, systematic literature reviews to find the models and variables used to decide the cost of software development. Second, confirm and take judgments about the level of importance and risk of each variable to the software developer. Result: Obtained about 54 variables that appear on the 10 models discussed. The variables are categorized into 15 groups based on the similarity of meaning. Each group becomes a variable. Confirmation results with practitioners on the level of importance and risk. It shown there are two variables that are considered very important and high risk if ignored. That is duration and effort. Conclusion: The relationship of variable rates between the results of literature studies and confirmation of practitioners contributes to the use of software business actors in considering project cost variables.

  4. Permutation importance: a corrected feature importance measure.

    PubMed

    Altmann, André; Toloşi, Laura; Sander, Oliver; Lengauer, Thomas

    2010-05-15

    In life sciences, interpretability of machine learning models is as important as their prediction accuracy. Linear models are probably the most frequently used methods for assessing feature relevance, despite their relative inflexibility. However, in the past years effective estimators of feature relevance have been derived for highly complex or non-parametric models such as support vector machines and RandomForest (RF) models. Recently, it has been observed that RF models are biased in such a way that categorical variables with a large number of categories are preferred. In this work, we introduce a heuristic for normalizing feature importance measures that can correct the feature importance bias. The method is based on repeated permutations of the outcome vector for estimating the distribution of measured importance for each variable in a non-informative setting. The P-value of the observed importance provides a corrected measure of feature importance. We apply our method to simulated data and demonstrate that (i) non-informative predictors do not receive significant P-values, (ii) informative variables can successfully be recovered among non-informative variables and (iii) P-values computed with permutation importance (PIMP) are very helpful for deciding the significance of variables, and therefore improve model interpretability. Furthermore, PIMP was used to correct RF-based importance measures for two real-world case studies. We propose an improved RF model that uses the significant variables with respect to the PIMP measure and show that its prediction accuracy is superior to that of other existing models. R code for the method presented in this article is available at http://www.mpi-inf.mpg.de/ approximately altmann/download/PIMP.R CONTACT: altmann@mpi-inf.mpg.de, laura.tolosi@mpi-inf.mpg.de Supplementary data are available at Bioinformatics online.

  5. Novel harmonic regularization approach for variable selection in Cox's proportional hazards model.

    PubMed

    Chu, Ge-Jin; Liang, Yong; Wang, Jia-Xuan

    2014-01-01

    Variable selection is an important issue in regression and a number of variable selection methods have been proposed involving nonconvex penalty functions. In this paper, we investigate a novel harmonic regularization method, which can approximate nonconvex Lq  (1/2 < q < 1) regularizations, to select key risk factors in the Cox's proportional hazards model using microarray gene expression data. The harmonic regularization method can be efficiently solved using our proposed direct path seeking approach, which can produce solutions that closely approximate those for the convex loss function and the nonconvex regularization. Simulation results based on the artificial datasets and four real microarray gene expression datasets, such as real diffuse large B-cell lymphoma (DCBCL), the lung cancer, and the AML datasets, show that the harmonic regularization method can be more accurate for variable selection than existing Lasso series methods.

  6. Modelling Geomechanical Heterogeneity of Rock Masses Using Direct and Indirect Geostatistical Conditional Simulation Methods

    NASA Astrophysics Data System (ADS)

    Eivazy, Hesameddin; Esmaieli, Kamran; Jean, Raynald

    2017-12-01

    An accurate characterization and modelling of rock mass geomechanical heterogeneity can lead to more efficient mine planning and design. Using deterministic approaches and random field methods for modelling rock mass heterogeneity is known to be limited in simulating the spatial variation and spatial pattern of the geomechanical properties. Although the applications of geostatistical techniques have demonstrated improvements in modelling the heterogeneity of geomechanical properties, geostatistical estimation methods such as Kriging result in estimates of geomechanical variables that are not fully representative of field observations. This paper reports on the development of 3D models for spatial variability of rock mass geomechanical properties using geostatistical conditional simulation method based on sequential Gaussian simulation. A methodology to simulate the heterogeneity of rock mass quality based on the rock mass rating is proposed and applied to a large open-pit mine in Canada. Using geomechanical core logging data collected from the mine site, a direct and an indirect approach were used to model the spatial variability of rock mass quality. The results of the two modelling approaches were validated against collected field data. The study aims to quantify the risks of pit slope failure and provides a measure of uncertainties in spatial variability of rock mass properties in different areas of the pit.

  7. Multiple imputation by chained equations for systematically and sporadically missing multilevel data.

    PubMed

    Resche-Rigon, Matthieu; White, Ian R

    2018-06-01

    In multilevel settings such as individual participant data meta-analysis, a variable is 'systematically missing' if it is wholly missing in some clusters and 'sporadically missing' if it is partly missing in some clusters. Previously proposed methods to impute incomplete multilevel data handle either systematically or sporadically missing data, but frequently both patterns are observed. We describe a new multiple imputation by chained equations (MICE) algorithm for multilevel data with arbitrary patterns of systematically and sporadically missing variables. The algorithm is described for multilevel normal data but can easily be extended for other variable types. We first propose two methods for imputing a single incomplete variable: an extension of an existing method and a new two-stage method which conveniently allows for heteroscedastic data. We then discuss the difficulties of imputing missing values in several variables in multilevel data using MICE, and show that even the simplest joint multilevel model implies conditional models which involve cluster means and heteroscedasticity. However, a simulation study finds that the proposed methods can be successfully combined in a multilevel MICE procedure, even when cluster means are not included in the imputation models.

  8. Abstract: Inference and Interval Estimation for Indirect Effects With Latent Variable Models.

    PubMed

    Falk, Carl F; Biesanz, Jeremy C

    2011-11-30

    Models specifying indirect effects (or mediation) and structural equation modeling are both popular in the social sciences. Yet relatively little research has compared methods that test for indirect effects among latent variables and provided precise estimates of the effectiveness of different methods. This simulation study provides an extensive comparison of methods for constructing confidence intervals and for making inferences about indirect effects with latent variables. We compared the percentile (PC) bootstrap, bias-corrected (BC) bootstrap, bias-corrected accelerated (BC a ) bootstrap, likelihood-based confidence intervals (Neale & Miller, 1997), partial posterior predictive (Biesanz, Falk, and Savalei, 2010), and joint significance tests based on Wald tests or likelihood ratio tests. All models included three reflective latent variables representing the independent, dependent, and mediating variables. The design included the following fully crossed conditions: (a) sample size: 100, 200, and 500; (b) number of indicators per latent variable: 3 versus 5; (c) reliability per set of indicators: .7 versus .9; (d) and 16 different path combinations for the indirect effect (α = 0, .14, .39, or .59; and β = 0, .14, .39, or .59). Simulations were performed using a WestGrid cluster of 1680 3.06GHz Intel Xeon processors running R and OpenMx. Results based on 1,000 replications per cell and 2,000 resamples per bootstrap method indicated that the BC and BC a bootstrap methods have inflated Type I error rates. Likelihood-based confidence intervals and the PC bootstrap emerged as methods that adequately control Type I error and have good coverage rates.

  9. Accuracy of latent-variable estimation in Bayesian semi-supervised learning.

    PubMed

    Yamazaki, Keisuke

    2015-09-01

    Hierarchical probabilistic models, such as Gaussian mixture models, are widely used for unsupervised learning tasks. These models consist of observable and latent variables, which represent the observable data and the underlying data-generation process, respectively. Unsupervised learning tasks, such as cluster analysis, are regarded as estimations of latent variables based on the observable ones. The estimation of latent variables in semi-supervised learning, where some labels are observed, will be more precise than that in unsupervised, and one of the concerns is to clarify the effect of the labeled data. However, there has not been sufficient theoretical analysis of the accuracy of the estimation of latent variables. In a previous study, a distribution-based error function was formulated, and its asymptotic form was calculated for unsupervised learning with generative models. It has been shown that, for the estimation of latent variables, the Bayes method is more accurate than the maximum-likelihood method. The present paper reveals the asymptotic forms of the error function in Bayesian semi-supervised learning for both discriminative and generative models. The results show that the generative model, which uses all of the given data, performs better when the model is well specified. Copyright © 2015 Elsevier Ltd. All rights reserved.

  10. Total sulfur determination in residues of crude oil distillation using FT-IR/ATR and variable selection methods

    NASA Astrophysics Data System (ADS)

    Müller, Aline Lima Hermes; Picoloto, Rochele Sogari; Mello, Paola de Azevedo; Ferrão, Marco Flores; dos Santos, Maria de Fátima Pereira; Guimarães, Regina Célia Lourenço; Müller, Edson Irineu; Flores, Erico Marlon Moraes

    2012-04-01

    Total sulfur concentration was determined in atmospheric residue (AR) and vacuum residue (VR) samples obtained from petroleum distillation process by Fourier transform infrared spectroscopy with attenuated total reflectance (FT-IR/ATR) in association with chemometric methods. Calibration and prediction set consisted of 40 and 20 samples, respectively. Calibration models were developed using two variable selection models: interval partial least squares (iPLS) and synergy interval partial least squares (siPLS). Different treatments and pre-processing steps were also evaluated for the development of models. The pre-treatment based on multiplicative scatter correction (MSC) and the mean centered data were selected for models construction. The use of siPLS as variable selection method provided a model with root mean square error of prediction (RMSEP) values significantly better than those obtained by PLS model using all variables. The best model was obtained using siPLS algorithm with spectra divided in 20 intervals and combinations of 3 intervals (911-824, 823-736 and 737-650 cm-1). This model produced a RMSECV of 400 mg kg-1 S and RMSEP of 420 mg kg-1 S, showing a correlation coefficient of 0.990.

  11. Comparing Indirect Effects in SEM: A Sequential Model Fitting Method Using Covariance-Equivalent Specifications

    ERIC Educational Resources Information Center

    Chan, Wai

    2007-01-01

    In social science research, an indirect effect occurs when the influence of an antecedent variable on the effect variable is mediated by an intervening variable. To compare indirect effects within a sample or across different samples, structural equation modeling (SEM) can be used if the computer program supports model fitting with nonlinear…

  12. Selection of Variables in Cluster Analysis: An Empirical Comparison of Eight Procedures

    ERIC Educational Resources Information Center

    Steinley, Douglas; Brusco, Michael J.

    2008-01-01

    Eight different variable selection techniques for model-based and non-model-based clustering are evaluated across a wide range of cluster structures. It is shown that several methods have difficulties when non-informative variables (i.e., random noise) are included in the model. Furthermore, the distribution of the random noise greatly impacts the…

  13. Comparison Between Two Methods for Estimating the Vertical Scale of Fluctuation for Modeling Random Geotechnical Problems

    NASA Astrophysics Data System (ADS)

    Pieczyńska-Kozłowska, Joanna M.

    2015-12-01

    The design process in geotechnical engineering requires the most accurate mapping of soil. The difficulty lies in the spatial variability of soil parameters, which has been a site of investigation of many researches for many years. This study analyses the soil-modeling problem by suggesting two effective methods of acquiring information for modeling that consists of variability from cone penetration test (CPT). The first method has been used in geotechnical engineering, but the second one has not been associated with geotechnics so far. Both methods are applied to a case study in which the parameters of changes are estimated. The knowledge of the variability of parameters allows in a long term more effective estimation, for example, bearing capacity probability of failure.

  14. Manipulating measurement scales in medical statistical analysis and data mining: A review of methodologies

    PubMed Central

    Marateb, Hamid Reza; Mansourian, Marjan; Adibi, Peyman; Farina, Dario

    2014-01-01

    Background: selecting the correct statistical test and data mining method depends highly on the measurement scale of data, type of variables, and purpose of the analysis. Different measurement scales are studied in details and statistical comparison, modeling, and data mining methods are studied based upon using several medical examples. We have presented two ordinal–variables clustering examples, as more challenging variable in analysis, using Wisconsin Breast Cancer Data (WBCD). Ordinal-to-Interval scale conversion example: a breast cancer database of nine 10-level ordinal variables for 683 patients was analyzed by two ordinal-scale clustering methods. The performance of the clustering methods was assessed by comparison with the gold standard groups of malignant and benign cases that had been identified by clinical tests. Results: the sensitivity and accuracy of the two clustering methods were 98% and 96%, respectively. Their specificity was comparable. Conclusion: by using appropriate clustering algorithm based on the measurement scale of the variables in the study, high performance is granted. Moreover, descriptive and inferential statistics in addition to modeling approach must be selected based on the scale of the variables. PMID:24672565

  15. A Priori Subgrid Scale Modeling for a Droplet Laden Temporal Mixing Layer

    NASA Technical Reports Server (NTRS)

    Okongo, Nora; Bellan, Josette

    2000-01-01

    Subgrid analysis of a transitional temporal mixing layer with evaporating droplets has been performed using a direct numerical simulation (DNS) database. The DNS is for a Reynolds number (based on initial vorticity thickness) of 600, with droplet mass loading of 0.2. The gas phase is computed using a Eulerian formulation, with Lagrangian droplet tracking. Since Large Eddy Simulation (LES) of this flow requires the computation of unfiltered gas-phase variables at droplet locations from filtered gas-phase variables at the grid points, it is proposed to model these by assuming the gas-phase variables to be given by the filtered variables plus a correction based on the filtered standard deviation, which can be computed from the sub-grid scale (SGS) standard deviation. This model predicts unfiltered variables at droplet locations better than simply interpolating the filtered variables. Three methods are investigated for modeling the SGS standard deviation: Smagorinsky, gradient and scale-similarity. When properly calibrated, the gradient and scale-similarity methods give results in excellent agreement with the DNS.

  16. Bayesian Normalization Model for Label-Free Quantitative Analysis by LC-MS

    PubMed Central

    Nezami Ranjbar, Mohammad R.; Tadesse, Mahlet G.; Wang, Yue; Ressom, Habtom W.

    2016-01-01

    We introduce a new method for normalization of data acquired by liquid chromatography coupled with mass spectrometry (LC-MS) in label-free differential expression analysis. Normalization of LC-MS data is desired prior to subsequent statistical analysis to adjust variabilities in ion intensities that are not caused by biological differences but experimental bias. There are different sources of bias including variabilities during sample collection and sample storage, poor experimental design, noise, etc. In addition, instrument variability in experiments involving a large number of LC-MS runs leads to a significant drift in intensity measurements. Although various methods have been proposed for normalization of LC-MS data, there is no universally applicable approach. In this paper, we propose a Bayesian normalization model (BNM) that utilizes scan-level information from LC-MS data. Specifically, the proposed method uses peak shapes to model the scan-level data acquired from extracted ion chromatograms (EIC) with parameters considered as a linear mixed effects model. We extended the model into BNM with drift (BNMD) to compensate for the variability in intensity measurements due to long LC-MS runs. We evaluated the performance of our method using synthetic and experimental data. In comparison with several existing methods, the proposed BNM and BNMD yielded significant improvement. PMID:26357332

  17. [Measurement of Water COD Based on UV-Vis Spectroscopy Technology].

    PubMed

    Wang, Xiao-ming; Zhang, Hai-liang; Luo, Wei; Liu, Xue-mei

    2016-01-01

    Ultraviolet/visible (UV/Vis) spectroscopy technology was used to measure water COD. A total of 135 water samples were collected from Zhejiang province. Raw spectra with 3 different pretreatment methods (Multiplicative Scatter Correction (MSC), Standard Normal Variate (SNV) and 1st Derivatives were compared to determine the optimal pretreatment method for analysis. Spectral variable selection is an important strategy in spectrum modeling analysis, because it tends to parsimonious data representation and can lead to multivariate models with better performance. In order to simply calibration models, the preprocessed spectra were then used to select sensitive wavelengths by competitive adaptive reweighted sampling (CARS), Random frog and Successive Genetic Algorithm (GA) methods. Different numbers of sensitive wavelengths were selected by different variable selection methods with SNV preprocessing method. Partial least squares (PLS) was used to build models with the full spectra, and Extreme Learning Machine (ELM) was applied to build models with the selected wavelength variables. The overall results showed that ELM model performed better than PLS model, and the ELM model with the selected wavelengths based on CARS obtained the best results with the determination coefficient (R2), RMSEP and RPD were 0.82, 14.48 and 2.34 for prediction set. The results indicated that it was feasible to use UV/Vis with characteristic wavelengths which were obtained by CARS variable selection method, combined with ELM calibration could apply for the rapid and accurate determination of COD in aquaculture water. Moreover, this study laid the foundation for further implementation of online analysis of aquaculture water and rapid determination of other water quality parameters.

  18. Variable selection for distribution-free models for longitudinal zero-inflated count responses.

    PubMed

    Chen, Tian; Wu, Pan; Tang, Wan; Zhang, Hui; Feng, Changyong; Kowalski, Jeanne; Tu, Xin M

    2016-07-20

    Zero-inflated count outcomes arise quite often in research and practice. Parametric models such as the zero-inflated Poisson and zero-inflated negative binomial are widely used to model such responses. Like most parametric models, they are quite sensitive to departures from assumed distributions. Recently, new approaches have been proposed to provide distribution-free, or semi-parametric, alternatives. These methods extend the generalized estimating equations to provide robust inference for population mixtures defined by zero-inflated count outcomes. In this paper, we propose methods to extend smoothly clipped absolute deviation (SCAD)-based variable selection methods to these new models. Variable selection has been gaining popularity in modern clinical research studies, as determining differential treatment effects of interventions for different subgroups has become the norm, rather the exception, in the era of patent-centered outcome research. Such moderation analysis in general creates many explanatory variables in regression analysis, and the advantages of SCAD-based methods over their traditional counterparts render them a great choice for addressing this important and timely issues in clinical research. We illustrate the proposed approach with both simulated and real study data. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.

  19. Bayesian Group Bridge for Bi-level Variable Selection.

    PubMed

    Mallick, Himel; Yi, Nengjun

    2017-06-01

    A Bayesian bi-level variable selection method (BAGB: Bayesian Analysis of Group Bridge) is developed for regularized regression and classification. This new development is motivated by grouped data, where generic variables can be divided into multiple groups, with variables in the same group being mechanistically related or statistically correlated. As an alternative to frequentist group variable selection methods, BAGB incorporates structural information among predictors through a group-wise shrinkage prior. Posterior computation proceeds via an efficient MCMC algorithm. In addition to the usual ease-of-interpretation of hierarchical linear models, the Bayesian formulation produces valid standard errors, a feature that is notably absent in the frequentist framework. Empirical evidence of the attractiveness of the method is illustrated by extensive Monte Carlo simulations and real data analysis. Finally, several extensions of this new approach are presented, providing a unified framework for bi-level variable selection in general models with flexible penalties.

  20. A novel variable selection approach that iteratively optimizes variable space using weighted binary matrix sampling.

    PubMed

    Deng, Bai-chuan; Yun, Yong-huan; Liang, Yi-zeng; Yi, Lun-zhao

    2014-10-07

    In this study, a new optimization algorithm called the Variable Iterative Space Shrinkage Approach (VISSA) that is based on the idea of model population analysis (MPA) is proposed for variable selection. Unlike most of the existing optimization methods for variable selection, VISSA statistically evaluates the performance of variable space in each step of optimization. Weighted binary matrix sampling (WBMS) is proposed to generate sub-models that span the variable subspace. Two rules are highlighted during the optimization procedure. First, the variable space shrinks in each step. Second, the new variable space outperforms the previous one. The second rule, which is rarely satisfied in most of the existing methods, is the core of the VISSA strategy. Compared with some promising variable selection methods such as competitive adaptive reweighted sampling (CARS), Monte Carlo uninformative variable elimination (MCUVE) and iteratively retaining informative variables (IRIV), VISSA showed better prediction ability for the calibration of NIR data. In addition, VISSA is user-friendly; only a few insensitive parameters are needed, and the program terminates automatically without any additional conditions. The Matlab codes for implementing VISSA are freely available on the website: https://sourceforge.net/projects/multivariateanalysis/files/VISSA/.

  1. An outline of graphical Markov models in dentistry.

    PubMed

    Helfenstein, U; Steiner, M; Menghini, G

    1999-12-01

    In the usual multiple regression model there is one response variable and one block of several explanatory variables. In contrast, in reality there may be a block of several possibly interacting response variables one would like to explain. In addition, the explanatory variables may split into a sequence of several blocks, each block containing several interacting variables. The variables in the second block are explained by those in the first block; the variables in the third block by those in the first and the second block etc. During recent years methods have been developed allowing analysis of problems where the data set has the above complex structure. The models involved are called graphical models or graphical Markov models. The main result of an analysis is a picture, a conditional independence graph with precise statistical meaning, consisting of circles representing variables and lines or arrows representing significant conditional associations. The absence of a line between two circles signifies that the corresponding two variables are independent conditional on the presence of other variables in the model. An example from epidemiology is presented in order to demonstrate application and use of the models. The data set in the example has a complex structure consisting of successive blocks: the variable in the first block is year of investigation; the variables in the second block are age and gender; the variables in the third block are indices of calculus, gingivitis and mutans streptococci and the final response variables in the fourth block are different indices of caries. Since the statistical methods may not be easily accessible to dentists, this article presents them in an introductory form. Graphical models may be of great value to dentists in allowing analysis and visualisation of complex structured multivariate data sets consisting of a sequence of blocks of interacting variables and, in particular, several possibly interacting responses in the final block.

  2. Method and system to estimate variables in an integrated gasification combined cycle (IGCC) plant

    DOEpatents

    Kumar, Aditya; Shi, Ruijie; Dokucu, Mustafa

    2013-09-17

    System and method to estimate variables in an integrated gasification combined cycle (IGCC) plant are provided. The system includes a sensor suite to measure respective plant input and output variables. An extended Kalman filter (EKF) receives sensed plant input variables and includes a dynamic model to generate a plurality of plant state estimates and a covariance matrix for the state estimates. A preemptive-constraining processor is configured to preemptively constrain the state estimates and covariance matrix to be free of constraint violations. A measurement-correction processor may be configured to correct constrained state estimates and a constrained covariance matrix based on processing of sensed plant output variables. The measurement-correction processor is coupled to update the dynamic model with corrected state estimates and a corrected covariance matrix. The updated dynamic model may be configured to estimate values for at least one plant variable not originally sensed by the sensor suite.

  3. Finite element modeling of diffusion and partitioning in biological systems: the infinite composite medium problem.

    PubMed

    Missel, P J

    2000-01-01

    Four methods are proposed for modeling diffusion in heterogeneous media where diffusion and partition coefficients take on differing values in each subregion. The exercise was conducted to validate finite element modeling (FEM) procedures in anticipation of modeling drug diffusion with regional partitioning into ocular tissue, though the approach can be useful for other organs, or for modeling diffusion in laminate devices. Partitioning creates a discontinuous value in the dependent variable (concentration) at an intertissue boundary that is not easily handled by available general-purpose FEM codes, which allow for only one value at each node. The discontinuity is handled using a transformation on the dependent variable based upon the region-specific partition coefficient. Methods were evaluated by their ability to reproduce a known exact result, for the problem of the infinite composite medium (Crank, J. The Mathematics of Diffusion, 2nd ed. New York: Oxford University Press, 1975, pp. 38-39.). The most physically intuitive method is based upon the concept of chemical potential, which is continuous across an interphase boundary (method III). This method makes the equation of the dependent variable highly nonlinear. This can be linearized easily by a change of variables (method IV). Results are also given for a one-dimensional problem simulating bolus injection into the vitreous, predicting time disposition of drug in vitreous and retina.

  4. Variable Selection in the Presence of Missing Data: Imputation-based Methods.

    PubMed

    Zhao, Yize; Long, Qi

    2017-01-01

    Variable selection plays an essential role in regression analysis as it identifies important variables that associated with outcomes and is known to improve predictive accuracy of resulting models. Variable selection methods have been widely investigated for fully observed data. However, in the presence of missing data, methods for variable selection need to be carefully designed to account for missing data mechanisms and statistical techniques used for handling missing data. Since imputation is arguably the most popular method for handling missing data due to its ease of use, statistical methods for variable selection that are combined with imputation are of particular interest. These methods, valid used under the assumptions of missing at random (MAR) and missing completely at random (MCAR), largely fall into three general strategies. The first strategy applies existing variable selection methods to each imputed dataset and then combine variable selection results across all imputed datasets. The second strategy applies existing variable selection methods to stacked imputed datasets. The third variable selection strategy combines resampling techniques such as bootstrap with imputation. Despite recent advances, this area remains under-developed and offers fertile ground for further research.

  5. Sharpening method of satellite thermal image based on the geographical statistical model

    NASA Astrophysics Data System (ADS)

    Qi, Pengcheng; Hu, Shixiong; Zhang, Haijun; Guo, Guangmeng

    2016-04-01

    To improve the effectiveness of thermal sharpening in mountainous regions, paying more attention to the laws of land surface energy balance, a thermal sharpening method based on the geographical statistical model (GSM) is proposed. Explanatory variables were selected from the processes of land surface energy budget and thermal infrared electromagnetic radiation transmission, then high spatial resolution (57 m) raster layers were generated for these variables through spatially simulating or using other raster data as proxies. Based on this, the local adaptation statistical relationship between brightness temperature (BT) and the explanatory variables, i.e., the GSM, was built at 1026-m resolution using the method of multivariate adaptive regression splines. Finally, the GSM was applied to the high-resolution (57-m) explanatory variables; thus, the high-resolution (57-m) BT image was obtained. This method produced a sharpening result with low error and good visual effect. The method can avoid the blind choice of explanatory variables and remove the dependence on synchronous imagery at visible and near-infrared bands. The influences of the explanatory variable combination, sampling method, and the residual error correction on sharpening results were analyzed deliberately, and their influence mechanisms are reported herein.

  6. Assessing Extratropical Influence on Tropical Climatology and Variability with Regional Coupled Data Assimilation

    NASA Astrophysics Data System (ADS)

    Lu, F.; Liu, Z.; Liu, Y.; Zhang, S.; Jacob, R. L.

    2017-12-01

    The Regional Coupled Data Assimilation (RCDA) method is introduced as a tool to study coupled climate dynamics and teleconnections. The RCDA method is built on an ensemble-based coupled data assimilation (CDA) system in a coupled general circulation model (CGCM). The RCDA method limits the data assimilation to the desired model components (e.g. atmosphere) and regions (e.g. the extratropics), and studies the ensemble-mean model response (e.g. tropical response to "observed" extratropical atmospheric variability). When applied to the extratropical influence on tropical climate, the RCDA method has shown some unique advantages, namely the combination of a fully coupled model, real-world observations and an ensemble approach. Tropical variability (e.g. El Niño-Southern Oscillation or ENSO) and climatology (e.g. asymmetric Inter-Tropical Convergence Zone or ITCZ) were initially thought to be determined mostly by local forcing and ocean-atmosphere interaction in the tropics. Since late 20th century, numerous studies have showed that extratropical forcing could affect, or even largely determine some aspects of the tropical climate. Due to the coupled nature of the climate system, however, the challenge of determining and further quantifying the causality of extratropical forcing on the tropical climate remains. Using the RCDA method, we have demonstrated significant control of extratropical atmospheric forcing on ENSO variability in a CGCM, both with model-generated and real-world observation datasets. The RCDA method has also shown robust extratropical impact on the tropical double-ITCZ bias in a CGCM. The RCDA method has provided the first systematic and quantitative assessment of extratropical influence on tropical climatology and variability by incorporating real world observations in a CGCM.

  7. Reliability Overhaul Model

    DTIC Science & Technology

    1989-08-01

    Random variables for the conditional exponential distribution are generated using the inverse transform method. C1) Generate U - UCO,i) (2) Set s - A ln...e - [(x+s - 7)/ n] 0 + [Cx-T)/n]0 c. Random variables from the conditional weibull distribution are generated using the inverse transform method. C1...using a standard normal transformation and the inverse transform method. B - 3 APPENDIX 3 DISTRIBUTIONS SUPPORTED BY THE MODEL (1) Generate Y - PCX S

  8. Statistical downscaling modeling with quantile regression using lasso to estimate extreme rainfall

    NASA Astrophysics Data System (ADS)

    Santri, Dewi; Wigena, Aji Hamim; Djuraidah, Anik

    2016-02-01

    Rainfall is one of the climatic elements with high diversity and has many negative impacts especially extreme rainfall. Therefore, there are several methods that required to minimize the damage that may occur. So far, Global circulation models (GCM) are the best method to forecast global climate changes include extreme rainfall. Statistical downscaling (SD) is a technique to develop the relationship between GCM output as a global-scale independent variables and rainfall as a local- scale response variable. Using GCM method will have many difficulties when assessed against observations because GCM has high dimension and multicollinearity between the variables. The common method that used to handle this problem is principal components analysis (PCA) and partial least squares regression. The new method that can be used is lasso. Lasso has advantages in simultaneuosly controlling the variance of the fitted coefficients and performing automatic variable selection. Quantile regression is a method that can be used to detect extreme rainfall in dry and wet extreme. Objective of this study is modeling SD using quantile regression with lasso to predict extreme rainfall in Indramayu. The results showed that the estimation of extreme rainfall (extreme wet in January, February and December) in Indramayu could be predicted properly by the model at quantile 90th.

  9. Survival curve estimation with dependent left truncated data using Cox's model.

    PubMed

    Mackenzie, Todd

    2012-10-19

    The Kaplan-Meier and closely related Lynden-Bell estimators are used to provide nonparametric estimation of the distribution of a left-truncated random variable. These estimators assume that the left-truncation variable is independent of the time-to-event. This paper proposes a semiparametric method for estimating the marginal distribution of the time-to-event that does not require independence. It models the conditional distribution of the time-to-event given the truncation variable using Cox's model for left truncated data, and uses inverse probability weighting. We report the results of simulations and illustrate the method using a survival study.

  10. The application of the Routh approximation method to turbofan engine models

    NASA Technical Reports Server (NTRS)

    Merrill, W. C.

    1977-01-01

    The Routh approximation technique is applied in the frequency domain to a 16th order state variable turbofan engine model. The results obtained motivate the extension of the frequency domain formulation of the Routh method to the time domain to handle the state variable formulation directly. The time domain formulation is derived, and a characterization, which specifies all possible Routh similarity transformations, is given. The characterization is computed by the solution of two eigenvalue eigenvector problems. The application of the time domain Routh technique to the state variable engine model is described, and some results are given.

  11. Novel Harmonic Regularization Approach for Variable Selection in Cox's Proportional Hazards Model

    PubMed Central

    Chu, Ge-Jin; Liang, Yong; Wang, Jia-Xuan

    2014-01-01

    Variable selection is an important issue in regression and a number of variable selection methods have been proposed involving nonconvex penalty functions. In this paper, we investigate a novel harmonic regularization method, which can approximate nonconvex Lq  (1/2 < q < 1) regularizations, to select key risk factors in the Cox's proportional hazards model using microarray gene expression data. The harmonic regularization method can be efficiently solved using our proposed direct path seeking approach, which can produce solutions that closely approximate those for the convex loss function and the nonconvex regularization. Simulation results based on the artificial datasets and four real microarray gene expression datasets, such as real diffuse large B-cell lymphoma (DCBCL), the lung cancer, and the AML datasets, show that the harmonic regularization method can be more accurate for variable selection than existing Lasso series methods. PMID:25506389

  12. ODE constrained mixture modelling: a method for unraveling subpopulation structures and dynamics.

    PubMed

    Hasenauer, Jan; Hasenauer, Christine; Hucho, Tim; Theis, Fabian J

    2014-07-01

    Functional cell-to-cell variability is ubiquitous in multicellular organisms as well as bacterial populations. Even genetically identical cells of the same cell type can respond differently to identical stimuli. Methods have been developed to analyse heterogeneous populations, e.g., mixture models and stochastic population models. The available methods are, however, either incapable of simultaneously analysing different experimental conditions or are computationally demanding and difficult to apply. Furthermore, they do not account for biological information available in the literature. To overcome disadvantages of existing methods, we combine mixture models and ordinary differential equation (ODE) models. The ODE models provide a mechanistic description of the underlying processes while mixture models provide an easy way to capture variability. In a simulation study, we show that the class of ODE constrained mixture models can unravel the subpopulation structure and determine the sources of cell-to-cell variability. In addition, the method provides reliable estimates for kinetic rates and subpopulation characteristics. We use ODE constrained mixture modelling to study NGF-induced Erk1/2 phosphorylation in primary sensory neurones, a process relevant in inflammatory and neuropathic pain. We propose a mechanistic pathway model for this process and reconstructed static and dynamical subpopulation characteristics across experimental conditions. We validate the model predictions experimentally, which verifies the capabilities of ODE constrained mixture models. These results illustrate that ODE constrained mixture models can reveal novel mechanistic insights and possess a high sensitivity.

  13. Methodological development for selection of significant predictors explaining fatal road accidents.

    PubMed

    Dadashova, Bahar; Arenas-Ramírez, Blanca; Mira-McWilliams, José; Aparicio-Izquierdo, Francisco

    2016-05-01

    Identification of the most relevant factors for explaining road accident occurrence is an important issue in road safety research, particularly for future decision-making processes in transport policy. However model selection for this particular purpose is still an ongoing research. In this paper we propose a methodological development for model selection which addresses both explanatory variable and adequate model selection issues. A variable selection procedure, TIM (two-input model) method is carried out by combining neural network design and statistical approaches. The error structure of the fitted model is assumed to follow an autoregressive process. All models are estimated using Markov Chain Monte Carlo method where the model parameters are assigned non-informative prior distributions. The final model is built using the results of the variable selection. For the application of the proposed methodology the number of fatal accidents in Spain during 2000-2011 was used. This indicator has experienced the maximum reduction internationally during the indicated years thus making it an interesting time series from a road safety policy perspective. Hence the identification of the variables that have affected this reduction is of particular interest for future decision making. The results of the variable selection process show that the selected variables are main subjects of road safety policy measures. Published by Elsevier Ltd.

  14. A Selective Review of Group Selection in High-Dimensional Models

    PubMed Central

    Huang, Jian; Breheny, Patrick; Ma, Shuangge

    2013-01-01

    Grouping structures arise naturally in many statistical modeling problems. Several methods have been proposed for variable selection that respect grouping structure in variables. Examples include the group LASSO and several concave group selection methods. In this article, we give a selective review of group selection concerning methodological developments, theoretical properties and computational algorithms. We pay particular attention to group selection methods involving concave penalties. We address both group selection and bi-level selection methods. We describe several applications of these methods in nonparametric additive models, semiparametric regression, seemingly unrelated regressions, genomic data analysis and genome wide association studies. We also highlight some issues that require further study. PMID:24174707

  15. Optimal variable-grid finite-difference modeling for porous media

    NASA Astrophysics Data System (ADS)

    Liu, Xinxin; Yin, Xingyao; Li, Haishan

    2014-12-01

    Numerical modeling of poroelastic waves by the finite-difference (FD) method is more expensive than that of acoustic or elastic waves. To improve the accuracy and computational efficiency of seismic modeling, variable-grid FD methods have been developed. In this paper, we derived optimal staggered-grid finite difference schemes with variable grid-spacing and time-step for seismic modeling in porous media. FD operators with small grid-spacing and time-step are adopted for low-velocity or small-scale geological bodies, while FD operators with big grid-spacing and time-step are adopted for high-velocity or large-scale regions. The dispersion relations of FD schemes were derived based on the plane wave theory, then the FD coefficients were obtained using the Taylor expansion. Dispersion analysis and modeling results demonstrated that the proposed method has higher accuracy with lower computational cost for poroelastic wave simulation in heterogeneous reservoirs.

  16. A fast collocation method for a variable-coefficient nonlocal diffusion model

    NASA Astrophysics Data System (ADS)

    Wang, Che; Wang, Hong

    2017-02-01

    We develop a fast collocation scheme for a variable-coefficient nonlocal diffusion model, for which a numerical discretization would yield a dense stiffness matrix. The development of the fast method is achieved by carefully handling the variable coefficients appearing inside the singular integral operator and exploiting the structure of the dense stiffness matrix. The resulting fast method reduces the computational work from O (N3) required by a commonly used direct solver to O (Nlog ⁡ N) per iteration and the memory requirement from O (N2) to O (N). Furthermore, the fast method reduces the computational work of assembling the stiffness matrix from O (N2) to O (N). Numerical results are presented to show the utility of the fast method.

  17. Much Ado about Nothing--Or at Best, Very Little

    ERIC Educational Resources Information Center

    Widaman, Keith F.

    2014-01-01

    Latent variable structural equation modeling has become the analytic method of choice in many domains of research in psychology and allied social sciences. One important aspect of a latent variable model concerns the relations hypothesized to hold between latent variables and their indicators. The most common specification of structural equation…

  18. Structural Equations and Path Analysis for Discrete Data.

    ERIC Educational Resources Information Center

    Winship, Christopher; Mare, Robert D.

    1983-01-01

    Presented is an approach to causal models in which some or all variables are discretely measured, showing that path analytic methods permit quantification of causal relationships among variables with the same flexibility and power of interpretation as is feasible in models including only continuous variables. Examples are provided. (Author/IS)

  19. A Direct Latent Variable Modeling Based Method for Point and Interval Estimation of Coefficient Alpha

    ERIC Educational Resources Information Center

    Raykov, Tenko; Marcoulides, George A.

    2015-01-01

    A direct approach to point and interval estimation of Cronbach's coefficient alpha for multiple component measuring instruments is outlined. The procedure is based on a latent variable modeling application with widely circulated software. As a by-product, using sample data the method permits ascertaining whether the population discrepancy…

  20. The conditional moment closure method for modeling lean premixed turbulent combustion

    NASA Astrophysics Data System (ADS)

    Martin, Scott Montgomery

    Natural gas fired lean premixed gas turbines have become the method of choice for new power generation systems due to their high efficiency and low pollutant emissions. As emission regulations for these combustion systems become more stringent, the use of numerical modeling has become an important a priori tool in designing clean and efficient combustors. Here a new turbulent combustion model is developed in an attempt to improve the state of the art. The Conditional Moment Closure (CMC) method is a new theory that has been applied to non-premixed combustion with good success. The application of the CMC method to premixed systems has been proposed, but has not yet been done. The premixed CMC method replaces the species mass fractions as independent variables with the species mass fractions that are conditioned on a reaction progress variable (RPV). Conservation equations for these new variables are then derived and solved. The general idea behind the CMC method is that the behavior of the chemical species is closely coupled to the reaction progress variable. Thus, species conservation equations that are conditioned on the RPV will have terms involving the fluctuating quantities that are much more likely to be negligible. The CMC method accounts for the interaction between scalar dissipation (micromixing) and chemistry, while de-coupling the kinetics from the bulk flow (macromixing). Here the CMC method is combined with a commercial computational fluid dynamics program, which calculates the large-scale fluid motions. The CMC model is validated by comparison to 2-D reacting backward facing step data. Predicted species, temperature and velocity fields are compared to experimental data with good success. The CMC model is also validated against the University of Washington's 3-D jet stirred reactor (JSR) data, which is an idealized lean premixed combustor. The JSR results are encouraging, but not as good as the backward facing step. The largest source of error is from the turbulence models, which are inadequate for the variable density and recirculating flows modeled here. The limitations of the turbulence models affected the calculation of the flow statistics, which are used to calculate the variance of the RPV, the scalar dissipation and the PDF.

  1. Incorporating wind availability into land use regression modelling of air quality in mountainous high-density urban environment.

    PubMed

    Shi, Yuan; Lau, Kevin Ka-Lun; Ng, Edward

    2017-08-01

    Urban air quality serves as an important function of the quality of urban life. Land use regression (LUR) modelling of air quality is essential for conducting health impacts assessment but more challenging in mountainous high-density urban scenario due to the complexities of the urban environment. In this study, a total of 21 LUR models are developed for seven kinds of air pollutants (gaseous air pollutants CO, NO 2 , NO x , O 3 , SO 2 and particulate air pollutants PM 2.5 , PM 10 ) with reference to three different time periods (summertime, wintertime and annual average of 5-year long-term hourly monitoring data from local air quality monitoring network) in Hong Kong. Under the mountainous high-density urban scenario, we improved the traditional LUR modelling method by incorporating wind availability information into LUR modelling based on surface geomorphometrical analysis. As a result, 269 independent variables were examined to develop the LUR models by using the "ADDRESS" independent variable selection method and stepwise multiple linear regression (MLR). Cross validation has been performed for each resultant model. The results show that wind-related variables are included in most of the resultant models as statistically significant independent variables. Compared with the traditional method, a maximum increase of 20% was achieved in the prediction performance of annual averaged NO 2 concentration level by incorporating wind-related variables into LUR model development. Copyright © 2017 Elsevier Inc. All rights reserved.

  2. Order Selection for General Expression of Nonlinear Autoregressive Model Based on Multivariate Stepwise Regression

    NASA Astrophysics Data System (ADS)

    Shi, Jinfei; Zhu, Songqing; Chen, Ruwen

    2017-12-01

    An order selection method based on multiple stepwise regressions is proposed for General Expression of Nonlinear Autoregressive model which converts the model order problem into the variable selection of multiple linear regression equation. The partial autocorrelation function is adopted to define the linear term in GNAR model. The result is set as the initial model, and then the nonlinear terms are introduced gradually. Statistics are chosen to study the improvements of both the new introduced and originally existed variables for the model characteristics, which are adopted to determine the model variables to retain or eliminate. So the optimal model is obtained through data fitting effect measurement or significance test. The simulation and classic time-series data experiment results show that the method proposed is simple, reliable and can be applied to practical engineering.

  3. Model-Averaged ℓ1 Regularization using Markov Chain Monte Carlo Model Composition

    PubMed Central

    Fraley, Chris; Percival, Daniel

    2014-01-01

    Bayesian Model Averaging (BMA) is an effective technique for addressing model uncertainty in variable selection problems. However, current BMA approaches have computational difficulty dealing with data in which there are many more measurements (variables) than samples. This paper presents a method for combining ℓ1 regularization and Markov chain Monte Carlo model composition techniques for BMA. By treating the ℓ1 regularization path as a model space, we propose a method to resolve the model uncertainty issues arising in model averaging from solution path point selection. We show that this method is computationally and empirically effective for regression and classification in high-dimensional datasets. We apply our technique in simulations, as well as to some applications that arise in genomics. PMID:25642001

  4. Model reduction for slow–fast stochastic systems with metastable behaviour

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bruna, Maria, E-mail: bruna@maths.ox.ac.uk; Computational Science Laboratory, Microsoft Research, Cambridge CB1 2FB; Chapman, S. Jonathan

    2014-05-07

    The quasi-steady-state approximation (or stochastic averaging principle) is a useful tool in the study of multiscale stochastic systems, giving a practical method by which to reduce the number of degrees of freedom in a model. The method is extended here to slow–fast systems in which the fast variables exhibit metastable behaviour. The key parameter that determines the form of the reduced model is the ratio of the timescale for the switching of the fast variables between metastable states to the timescale for the evolution of the slow variables. The method is illustrated with two examples: one from biochemistry (a fast-species-mediatedmore » chemical switch coupled to a slower varying species), and one from ecology (a predator–prey system). Numerical simulations of each model reduction are compared with those of the full system.« less

  5. Modeling and forecasting US presidential election using learning algorithms

    NASA Astrophysics Data System (ADS)

    Zolghadr, Mohammad; Niaki, Seyed Armin Akhavan; Niaki, S. T. A.

    2017-09-01

    The primary objective of this research is to obtain an accurate forecasting model for the US presidential election. To identify a reliable model, artificial neural networks (ANN) and support vector regression (SVR) models are compared based on some specified performance measures. Moreover, six independent variables such as GDP, unemployment rate, the president's approval rate, and others are considered in a stepwise regression to identify significant variables. The president's approval rate is identified as the most significant variable, based on which eight other variables are identified and considered in the model development. Preprocessing methods are applied to prepare the data for the learning algorithms. The proposed procedure significantly increases the accuracy of the model by 50%. The learning algorithms (ANN and SVR) proved to be superior to linear regression based on each method's calculated performance measures. The SVR model is identified as the most accurate model among the other models as this model successfully predicted the outcome of the election in the last three elections (2004, 2008, and 2012). The proposed approach significantly increases the accuracy of the forecast.

  6. Specifying and Refining a Complex Measurement Model.

    ERIC Educational Resources Information Center

    Levy, Roy; Mislevy, Robert J.

    This paper aims to describe a Bayesian approach to modeling and estimating cognitive models both in terms of statistical machinery and actual instrument development. Such a method taps the knowledge of experts to provide initial estimates for the probabilistic relationships among the variables in a multivariate latent variable model and refines…

  7. Sensitivity analysis of a sound absorption model with correlated inputs

    NASA Astrophysics Data System (ADS)

    Chai, W.; Christen, J.-L.; Zine, A.-M.; Ichchou, M.

    2017-04-01

    Sound absorption in porous media is a complex phenomenon, which is usually addressed with homogenized models, depending on macroscopic parameters. Since these parameters emerge from the structure at microscopic scale, they may be correlated. This paper deals with sensitivity analysis methods of a sound absorption model with correlated inputs. Specifically, the Johnson-Champoux-Allard model (JCA) is chosen as the objective model with correlation effects generated by a secondary micro-macro semi-empirical model. To deal with this case, a relatively new sensitivity analysis method Fourier Amplitude Sensitivity Test with Correlation design (FASTC), based on Iman's transform, is taken into application. This method requires a priori information such as variables' marginal distribution functions and their correlation matrix. The results are compared to the Correlation Ratio Method (CRM) for reference and validation. The distribution of the macroscopic variables arising from the microstructure, as well as their correlation matrix are studied. Finally the results of tests shows that the correlation has a very important impact on the results of sensitivity analysis. Assessment of correlation strength among input variables on the sensitivity analysis is also achieved.

  8. An Application of Robust Method in Multiple Linear Regression Model toward Credit Card Debt

    NASA Astrophysics Data System (ADS)

    Amira Azmi, Nur; Saifullah Rusiman, Mohd; Khalid, Kamil; Roslan, Rozaini; Sufahani, Suliadi; Mohamad, Mahathir; Salleh, Rohayu Mohd; Hamzah, Nur Shamsidah Amir

    2018-04-01

    Credit card is a convenient alternative replaced cash or cheque, and it is essential component for electronic and internet commerce. In this study, the researchers attempt to determine the relationship and significance variables between credit card debt and demographic variables such as age, household income, education level, years with current employer, years at current address, debt to income ratio and other debt. The provided data covers 850 customers information. There are three methods that applied to the credit card debt data which are multiple linear regression (MLR) models, MLR models with least quartile difference (LQD) method and MLR models with mean absolute deviation method. After comparing among three methods, it is found that MLR model with LQD method became the best model with the lowest value of mean square error (MSE). According to the final model, it shows that the years with current employer, years at current address, household income in thousands and debt to income ratio are positively associated with the amount of credit debt. Meanwhile variables for age, level of education and other debt are negatively associated with amount of credit debt. This study may serve as a reference for the bank company by using robust methods, so that they could better understand their options and choice that is best aligned with their goals for inference regarding to the credit card debt.

  9. Small area estimation for semicontinuous data.

    PubMed

    Chandra, Hukum; Chambers, Ray

    2016-03-01

    Survey data often contain measurements for variables that are semicontinuous in nature, i.e. they either take a single fixed value (we assume this is zero) or they have a continuous, often skewed, distribution on the positive real line. Standard methods for small area estimation (SAE) based on the use of linear mixed models can be inefficient for such variables. We discuss SAE techniques for semicontinuous variables under a two part random effects model that allows for the presence of excess zeros as well as the skewed nature of the nonzero values of the response variable. In particular, we first model the excess zeros via a generalized linear mixed model fitted to the probability of a nonzero, i.e. strictly positive, value being observed, and then model the response, given that it is strictly positive, using a linear mixed model fitted on the logarithmic scale. Empirical results suggest that the proposed method leads to efficient small area estimates for semicontinuous data of this type. We also propose a parametric bootstrap method to estimate the MSE of the proposed small area estimator. These bootstrap estimates of the MSE are compared to the true MSE in a simulation study. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  10. A regularized variable selection procedure in additive hazards model with stratified case-cohort design.

    PubMed

    Ni, Ai; Cai, Jianwen

    2018-07-01

    Case-cohort designs are commonly used in large epidemiological studies to reduce the cost associated with covariate measurement. In many such studies the number of covariates is very large. An efficient variable selection method is needed for case-cohort studies where the covariates are only observed in a subset of the sample. Current literature on this topic has been focused on the proportional hazards model. However, in many studies the additive hazards model is preferred over the proportional hazards model either because the proportional hazards assumption is violated or the additive hazards model provides more relevent information to the research question. Motivated by one such study, the Atherosclerosis Risk in Communities (ARIC) study, we investigate the properties of a regularized variable selection procedure in stratified case-cohort design under an additive hazards model with a diverging number of parameters. We establish the consistency and asymptotic normality of the penalized estimator and prove its oracle property. Simulation studies are conducted to assess the finite sample performance of the proposed method with a modified cross-validation tuning parameter selection methods. We apply the variable selection procedure to the ARIC study to demonstrate its practical use.

  11. Total sulfur determination in residues of crude oil distillation using FT-IR/ATR and variable selection methods.

    PubMed

    Müller, Aline Lima Hermes; Picoloto, Rochele Sogari; de Azevedo Mello, Paola; Ferrão, Marco Flores; de Fátima Pereira dos Santos, Maria; Guimarães, Regina Célia Lourenço; Müller, Edson Irineu; Flores, Erico Marlon Moraes

    2012-04-01

    Total sulfur concentration was determined in atmospheric residue (AR) and vacuum residue (VR) samples obtained from petroleum distillation process by Fourier transform infrared spectroscopy with attenuated total reflectance (FT-IR/ATR) in association with chemometric methods. Calibration and prediction set consisted of 40 and 20 samples, respectively. Calibration models were developed using two variable selection models: interval partial least squares (iPLS) and synergy interval partial least squares (siPLS). Different treatments and pre-processing steps were also evaluated for the development of models. The pre-treatment based on multiplicative scatter correction (MSC) and the mean centered data were selected for models construction. The use of siPLS as variable selection method provided a model with root mean square error of prediction (RMSEP) values significantly better than those obtained by PLS model using all variables. The best model was obtained using siPLS algorithm with spectra divided in 20 intervals and combinations of 3 intervals (911-824, 823-736 and 737-650 cm(-1)). This model produced a RMSECV of 400 mg kg(-1) S and RMSEP of 420 mg kg(-1) S, showing a correlation coefficient of 0.990. Copyright © 2011 Elsevier B.V. All rights reserved.

  12. The Contribution of Vegetation and Landscape Configuration for Predicting Environmental Change Impacts on Iberian Birds

    PubMed Central

    Triviño, Maria; Thuiller, Wilfried; Cabeza, Mar; Hickler, Thomas; Araújo, Miguel B.

    2011-01-01

    Although climate is known to be one of the key factors determining animal species distributions amongst others, projections of global change impacts on their distributions often rely on bioclimatic envelope models. Vegetation structure and landscape configuration are also key determinants of distributions, but they are rarely considered in such assessments. We explore the consequences of using simulated vegetation structure and composition as well as its associated landscape configuration in models projecting global change effects on Iberian bird species distributions. Both present-day and future distributions were modelled for 168 bird species using two ensemble forecasting methods: Random Forests (RF) and Boosted Regression Trees (BRT). For each species, several models were created, differing in the predictor variables used (climate, vegetation, and landscape configuration). Discrimination ability of each model in the present-day was then tested with four commonly used evaluation methods (AUC, TSS, specificity and sensitivity). The different sets of predictor variables yielded similar spatial patterns for well-modelled species, but the future projections diverged for poorly-modelled species. Models using all predictor variables were not significantly better than models fitted with climate variables alone for ca. 50% of the cases. Moreover, models fitted with climate data were always better than models fitted with landscape configuration variables, and vegetation variables were found to correlate with bird species distributions in 26–40% of the cases with BRT, and in 1–18% of the cases with RF. We conclude that improvements from including vegetation and its landscape configuration variables in comparison with climate only variables might not always be as great as expected for future projections of Iberian bird species. PMID:22216263

  13. Decadal predictions of Southern Ocean sea ice : testing different initialization methods with an Earth-system Model of Intermediate Complexity

    NASA Astrophysics Data System (ADS)

    Zunz, Violette; Goosse, Hugues; Dubinkina, Svetlana

    2013-04-01

    The sea ice extent in the Southern Ocean has increased since 1979 but the causes of this expansion have not been firmly identified. In particular, the contribution of internal variability and external forcing to this positive trend has not been fully established. In this region, the lack of observations and the overestimation of internal variability of the sea ice by contemporary General Circulation Models (GCMs) make it difficult to understand the behaviour of the sea ice. Nevertheless, if its evolution is governed by the internal variability of the system and if this internal variability is in some way predictable, a suitable initialization method should lead to simulations results that better fit the reality. Current GCMs decadal predictions are generally initialized through a nudging towards some observed fields. This relatively simple method does not seem to be appropriated to the initialization of sea ice in the Southern Ocean. The present study aims at identifying an initialization method that could improve the quality of the predictions of Southern Ocean sea ice at decadal timescales. We use LOVECLIM, an Earth-system Model of Intermediate Complexity that allows us to perform, within a reasonable computational time, the large amount of simulations required to test systematically different initialization procedures. These involve three data assimilation methods: a nudging, a particle filter and an efficient particle filter. In a first step, simulations are performed in an idealized framework, i.e. data from a reference simulation of LOVECLIM are used instead of observations, herein after called pseudo-observations. In this configuration, the internal variability of the model obviously agrees with the one of the pseudo-observations. This allows us to get rid of the issues related to the overestimation of the internal variability by models compared to the observed one. This way, we can work out a suitable methodology to assess the efficiency of the initialization procedures tested. It also allows us determine the upper limit of improvement that can be expected if more sophisticated initialization methods are used in decadal prediction simulations and if models have an internal variability agreeing with the observed one. Furthermore, since pseudo-observations are available everywhere at any time step, we also analyse the differences between simulations initialized with a complete dataset of pseudo-observations and the ones for which pseudo-observations data are not assimilated everywhere. In a second step, simulations are realized in a realistic framework, i.e. through the use of actual available observations. The same data assimilation methods are tested in order to check if more sophisticated methods can improve the reliability and the accuracy of decadal prediction simulations, even if they are performed with models that overestimate the internal variability of the sea ice extent in the Southern Ocean.

  14. Structural Equation Models in a Redundancy Analysis Framework With Covariates.

    PubMed

    Lovaglio, Pietro Giorgio; Vittadini, Giorgio

    2014-01-01

    A recent method to specify and fit structural equation modeling in the Redundancy Analysis framework based on so-called Extended Redundancy Analysis (ERA) has been proposed in the literature. In this approach, the relationships between the observed exogenous variables and the observed endogenous variables are moderated by the presence of unobservable composites, estimated as linear combinations of exogenous variables. However, in the presence of direct effects linking exogenous and endogenous variables, or concomitant indicators, the composite scores are estimated by ignoring the presence of the specified direct effects. To fit structural equation models, we propose a new specification and estimation method, called Generalized Redundancy Analysis (GRA), allowing us to specify and fit a variety of relationships among composites, endogenous variables, and external covariates. The proposed methodology extends the ERA method, using a more suitable specification and estimation algorithm, by allowing for covariates that affect endogenous indicators indirectly through the composites and/or directly. To illustrate the advantages of GRA over ERA we propose a simulation study of small samples. Moreover, we propose an application aimed at estimating the impact of formal human capital on the initial earnings of graduates of an Italian university, utilizing a structural model consistent with well-established economic theory.

  15. The stock-flow model of spatial data infrastructure development refined by fuzzy logic.

    PubMed

    Abdolmajidi, Ehsan; Harrie, Lars; Mansourian, Ali

    2016-01-01

    The system dynamics technique has been demonstrated to be a proper method by which to model and simulate the development of spatial data infrastructures (SDI). An SDI is a collaborative effort to manage and share spatial data at different political and administrative levels. It is comprised of various dynamically interacting quantitative and qualitative (linguistic) variables. To incorporate linguistic variables and their joint effects in an SDI-development model more effectively, we suggest employing fuzzy logic. Not all fuzzy models are able to model the dynamic behavior of SDIs properly. Therefore, this paper aims to investigate different fuzzy models and their suitability for modeling SDIs. To that end, two inference and two defuzzification methods were used for the fuzzification of the joint effect of two variables in an existing SDI model. The results show that the Average-Average inference and Center of Area defuzzification can better model the dynamics of SDI development.

  16. Rank-based estimation in the {ell}1-regularized partly linear model for censored outcomes with application to integrated analyses of clinical predictors and gene expression data.

    PubMed

    Johnson, Brent A

    2009-10-01

    We consider estimation and variable selection in the partial linear model for censored data. The partial linear model for censored data is a direct extension of the accelerated failure time model, the latter of which is a very important alternative model to the proportional hazards model. We extend rank-based lasso-type estimators to a model that may contain nonlinear effects. Variable selection in such partial linear model has direct application to high-dimensional survival analyses that attempt to adjust for clinical predictors. In the microarray setting, previous methods can adjust for other clinical predictors by assuming that clinical and gene expression data enter the model linearly in the same fashion. Here, we select important variables after adjusting for prognostic clinical variables but the clinical effects are assumed nonlinear. Our estimator is based on stratification and can be extended naturally to account for multiple nonlinear effects. We illustrate the utility of our method through simulation studies and application to the Wisconsin prognostic breast cancer data set.

  17. Improvement of the variable storage coefficient method with water surface gradient as a variable

    USDA-ARS?s Scientific Manuscript database

    The variable storage coefficient (VSC) method has been used for streamflow routing in continuous hydrological simulation models such as the Agricultural Policy/Environmental eXtender (APEX) and the Soil and Water Assessment Tool (SWAT) for more than 30 years. APEX operates on a daily time step and ...

  18. A Method for Modeling the Intrinsic Dynamics of Intraindividual Variability: Recovering the Parameters of Simulated Oscillators in Multi-Wave Panel Data.

    ERIC Educational Resources Information Center

    Boker, Steven M.; Nesselroade, John R.

    2002-01-01

    Examined two methods for fitting models of intrinsic dynamics to intraindividual variability data by testing these techniques' behavior in equations through simulation studies. Among the main results is the demonstration that a local linear approximation of derivatives can accurately recover the parameters of a simulated linear oscillator, with…

  19. Evaluating mediation and moderation effects in school psychology: A presentation of methods and review of current practice

    PubMed Central

    Fairchild, Amanda J.; McQuillin, Samuel D.

    2017-01-01

    Third variable effects elucidate the relation between two other variables, and can describe why they are related or under what conditions they are related. This article demonstrates methods to analyze two third-variable effects: moderation and mediation. The utility of examining moderation and mediation effects in school psychology is described and current use of the analyses in applied school psychology research is reviewed and evaluated. Proper statistical methods to test the effects are presented, and different effect size measures for the models are provided. Extensions of the basic moderator and mediator models are also described. PMID:20006988

  20. Evaluating mediation and moderation effects in school psychology: a presentation of methods and review of current practice.

    PubMed

    Fairchild, Amanda J; McQuillin, Samuel D

    2010-02-01

    Third variable effects elucidate the relation between two other variables, and can describe why they are related or under what conditions they are related. This article demonstrates methods to analyze two third-variable effects: moderation and mediation. The utility of examining moderation and mediation effects in school psychology is described and current use of the analyses in applied school psychology research is reviewed and evaluated. Proper statistical methods to test the effects are presented, and different effect size measures for the models are provided. Extensions of the basic moderator and mediator models are also described.

  1. Causal inference with measurement error in outcomes: Bias analysis and estimation methods.

    PubMed

    Shu, Di; Yi, Grace Y

    2017-01-01

    Inverse probability weighting estimation has been popularly used to consistently estimate the average treatment effect. Its validity, however, is challenged by the presence of error-prone variables. In this paper, we explore the inverse probability weighting estimation with mismeasured outcome variables. We study the impact of measurement error for both continuous and discrete outcome variables and reveal interesting consequences of the naive analysis which ignores measurement error. When a continuous outcome variable is mismeasured under an additive measurement error model, the naive analysis may still yield a consistent estimator; when the outcome is binary, we derive the asymptotic bias in a closed-form. Furthermore, we develop consistent estimation procedures for practical scenarios where either validation data or replicates are available. With validation data, we propose an efficient method for estimation of average treatment effect; the efficiency gain is substantial relative to usual methods of using validation data. To provide protection against model misspecification, we further propose a doubly robust estimator which is consistent even when either the treatment model or the outcome model is misspecified. Simulation studies are reported to assess the performance of the proposed methods. An application to a smoking cessation dataset is presented.

  2. Confirmatory Factor Analysis of Ordinal Variables with Misspecified Models

    ERIC Educational Resources Information Center

    Yang-Wallentin, Fan; Joreskog, Karl G.; Luo, Hao

    2010-01-01

    Ordinal variables are common in many empirical investigations in the social and behavioral sciences. Researchers often apply the maximum likelihood method to fit structural equation models to ordinal data. This assumes that the observed measures have normal distributions, which is not the case when the variables are ordinal. A better approach is…

  3. Censored Hurdle Negative Binomial Regression (Case Study: Neonatorum Tetanus Case in Indonesia)

    NASA Astrophysics Data System (ADS)

    Yuli Rusdiana, Riza; Zain, Ismaini; Wulan Purnami, Santi

    2017-06-01

    Hurdle negative binomial model regression is a method that can be used for discreate dependent variable, excess zero and under- and overdispersion. It uses two parts approach. The first part estimates zero elements from dependent variable is zero hurdle model and the second part estimates not zero elements (non-negative integer) from dependent variable is called truncated negative binomial models. The discrete dependent variable in such cases is censored for some values. The type of censor that will be studied in this research is right censored. This study aims to obtain the parameter estimator hurdle negative binomial regression for right censored dependent variable. In the assessment of parameter estimation methods used Maximum Likelihood Estimator (MLE). Hurdle negative binomial model regression for right censored dependent variable is applied on the number of neonatorum tetanus cases in Indonesia. The type data is count data which contains zero values in some observations and other variety value. This study also aims to obtain the parameter estimator and test statistic censored hurdle negative binomial model. Based on the regression results, the factors that influence neonatorum tetanus case in Indonesia is the percentage of baby health care coverage and neonatal visits.

  4. Neural Network and Regression Methods Demonstrated in the Design Optimization of a Subsonic Aircraft

    NASA Technical Reports Server (NTRS)

    Hopkins, Dale A.; Lavelle, Thomas M.; Patnaik, Surya

    2003-01-01

    The neural network and regression methods of NASA Glenn Research Center s COMETBOARDS design optimization testbed were used to generate approximate analysis and design models for a subsonic aircraft operating at Mach 0.85 cruise speed. The analytical model is defined by nine design variables: wing aspect ratio, engine thrust, wing area, sweep angle, chord-thickness ratio, turbine temperature, pressure ratio, bypass ratio, fan pressure; and eight response parameters: weight, landing velocity, takeoff and landing field lengths, approach thrust, overall efficiency, and compressor pressure and temperature. The variables were adjusted to optimally balance the engines to the airframe. The solution strategy included a sensitivity model and the soft analysis model. Researchers generated the sensitivity model by training the approximators to predict an optimum design. The trained neural network predicted all response variables, within 5-percent error. This was reduced to 1 percent by the regression method. The soft analysis model was developed to replace aircraft analysis as the reanalyzer in design optimization. Soft models have been generated for a neural network method, a regression method, and a hybrid method obtained by combining the approximators. The performance of the models is graphed for aircraft weight versus thrust as well as for wing area and turbine temperature. The regression method followed the analytical solution with little error. The neural network exhibited 5-percent maximum error over all parameters. Performance of the hybrid method was intermediate in comparison to the individual approximators. Error in the response variable is smaller than that shown in the figure because of a distortion scale factor. The overall performance of the approximators was considered to be satisfactory because aircraft analysis with NASA Langley Research Center s FLOPS (Flight Optimization System) code is a synthesis of diverse disciplines: weight estimation, aerodynamic analysis, engine cycle analysis, propulsion data interpolation, mission performance, airfield length for landing and takeoff, noise footprint, and others.

  5. Uncertain dynamic analysis for rigid-flexible mechanisms with random geometry and material properties

    NASA Astrophysics Data System (ADS)

    Wu, Jinglai; Luo, Zhen; Zhang, Nong; Zhang, Yunqing; Walker, Paul D.

    2017-02-01

    This paper proposes an uncertain modelling and computational method to analyze dynamic responses of rigid-flexible multibody systems (or mechanisms) with random geometry and material properties. Firstly, the deterministic model for the rigid-flexible multibody system is built with the absolute node coordinate formula (ANCF), in which the flexible parts are modeled by using ANCF elements, while the rigid parts are described by ANCF reference nodes (ANCF-RNs). Secondly, uncertainty for the geometry of rigid parts is expressed as uniform random variables, while the uncertainty for the material properties of flexible parts is modeled as a continuous random field, which is further discretized to Gaussian random variables using a series expansion method. Finally, a non-intrusive numerical method is developed to solve the dynamic equations of systems involving both types of random variables, which systematically integrates the deterministic generalized-α solver with Latin Hypercube sampling (LHS) and Polynomial Chaos (PC) expansion. The benchmark slider-crank mechanism is used as a numerical example to demonstrate the characteristics of the proposed method.

  6. Probe-specific mixed-model approach to detect copy number differences using multiplex ligation-dependent probe amplification (MLPA)

    PubMed Central

    González, Juan R; Carrasco, Josep L; Armengol, Lluís; Villatoro, Sergi; Jover, Lluís; Yasui, Yutaka; Estivill, Xavier

    2008-01-01

    Background MLPA method is a potentially useful semi-quantitative method to detect copy number alterations in targeted regions. In this paper, we propose a method for the normalization procedure based on a non-linear mixed-model, as well as a new approach for determining the statistical significance of altered probes based on linear mixed-model. This method establishes a threshold by using different tolerance intervals that accommodates the specific random error variability observed in each test sample. Results Through simulation studies we have shown that our proposed method outperforms two existing methods that are based on simple threshold rules or iterative regression. We have illustrated the method using a controlled MLPA assay in which targeted regions are variable in copy number in individuals suffering from different disorders such as Prader-Willi, DiGeorge or Autism showing the best performace. Conclusion Using the proposed mixed-model, we are able to determine thresholds to decide whether a region is altered. These threholds are specific for each individual, incorporating experimental variability, resulting in improved sensitivity and specificity as the examples with real data have revealed. PMID:18522760

  7. Geodesic least squares regression on information manifolds

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Verdoolaege, Geert, E-mail: geert.verdoolaege@ugent.be

    We present a novel regression method targeted at situations with significant uncertainty on both the dependent and independent variables or with non-Gaussian distribution models. Unlike the classic regression model, the conditional distribution of the response variable suggested by the data need not be the same as the modeled distribution. Instead they are matched by minimizing the Rao geodesic distance between them. This yields a more flexible regression method that is less constrained by the assumptions imposed through the regression model. As an example, we demonstrate the improved resistance of our method against some flawed model assumptions and we apply thismore » to scaling laws in magnetic confinement fusion.« less

  8. Apparatus and method for controlling autotroph cultivation

    DOEpatents

    Fuxman, Adrian M; Tixier, Sebastien; Stewart, Gregory E; Haran, Frank M; Backstrom, Johan U; Gerbrandt, Kelsey

    2013-07-02

    A method includes receiving at least one measurement of a dissolved carbon dioxide concentration of a mixture of fluid containing an autotrophic organism. The method also includes determining an adjustment to one or more manipulated variables using the at least one measurement. The method further includes generating one or more signals to modify the one or more manipulated variables based on the determined adjustment. The one or more manipulated variables could include a carbon dioxide flow rate, an air flow rate, a water temperature, and an agitation level for the mixture. At least one model relates the dissolved carbon dioxide concentration to one or more manipulated variables, and the adjustment could be determined by using the at least one model to drive the dissolved carbon dioxide concentration to at least one target that optimize a goal function. The goal function could be to optimize biomass growth rate, nutrient removal and/or lipid production.

  9. TopoSCALE v.1.0: downscaling gridded climate data in complex terrain

    NASA Astrophysics Data System (ADS)

    Fiddes, J.; Gruber, S.

    2014-02-01

    Simulation of land surface processes is problematic in heterogeneous terrain due to the the high resolution required of model grids to capture strong lateral variability caused by, for example, topography, and the lack of accurate meteorological forcing data at the site or scale it is required. Gridded data products produced by atmospheric models can fill this gap, however, often not at an appropriate spatial resolution to drive land-surface simulations. In this study we describe a method that uses the well-resolved description of the atmospheric column provided by climate models, together with high-resolution digital elevation models (DEMs), to downscale coarse-grid climate variables to a fine-scale subgrid. The main aim of this approach is to provide high-resolution driving data for a land-surface model (LSM). The method makes use of an interpolation of pressure-level data according to topographic height of the subgrid. An elevation and topography correction is used to downscale short-wave radiation. Long-wave radiation is downscaled by deriving a cloud-component of all-sky emissivity at grid level and using downscaled temperature and relative humidity fields to describe variability with elevation. Precipitation is downscaled with a simple non-linear lapse and optionally disaggregated using a climatology approach. We test the method in comparison with unscaled grid-level data and a set of reference methods, against a large evaluation dataset (up to 210 stations per variable) in the Swiss Alps. We demonstrate that the method can be used to derive meteorological inputs in complex terrain, with most significant improvements (with respect to reference methods) seen in variables derived from pressure levels: air temperature, relative humidity, wind speed and incoming long-wave radiation. This method may be of use in improving inputs to numerical simulations in heterogeneous and/or remote terrain, especially when statistical methods are not possible, due to lack of observations (i.e. remote areas or future periods).

  10. ODE Constrained Mixture Modelling: A Method for Unraveling Subpopulation Structures and Dynamics

    PubMed Central

    Hasenauer, Jan; Hasenauer, Christine; Hucho, Tim; Theis, Fabian J.

    2014-01-01

    Functional cell-to-cell variability is ubiquitous in multicellular organisms as well as bacterial populations. Even genetically identical cells of the same cell type can respond differently to identical stimuli. Methods have been developed to analyse heterogeneous populations, e.g., mixture models and stochastic population models. The available methods are, however, either incapable of simultaneously analysing different experimental conditions or are computationally demanding and difficult to apply. Furthermore, they do not account for biological information available in the literature. To overcome disadvantages of existing methods, we combine mixture models and ordinary differential equation (ODE) models. The ODE models provide a mechanistic description of the underlying processes while mixture models provide an easy way to capture variability. In a simulation study, we show that the class of ODE constrained mixture models can unravel the subpopulation structure and determine the sources of cell-to-cell variability. In addition, the method provides reliable estimates for kinetic rates and subpopulation characteristics. We use ODE constrained mixture modelling to study NGF-induced Erk1/2 phosphorylation in primary sensory neurones, a process relevant in inflammatory and neuropathic pain. We propose a mechanistic pathway model for this process and reconstructed static and dynamical subpopulation characteristics across experimental conditions. We validate the model predictions experimentally, which verifies the capabilities of ODE constrained mixture models. These results illustrate that ODE constrained mixture models can reveal novel mechanistic insights and possess a high sensitivity. PMID:24992156

  11. Prediction of the birch pollen season characteristics in Cracow, Poland using an 18-year data series.

    PubMed

    Dorota, Myszkowska

    2013-03-01

    The aim of the study was to construct the model forecasting the birch pollen season characteristics in Cracow on the basis of an 18-year data series. The study was performed using the volumetric method (Lanzoni/Burkard trap). The 98/95 % method was used to calculate the pollen season. The Spearman's correlation test was applied to find the relationship between the meteorological parameters and pollen season characteristics. To construct the predictive model, the backward stepwise multiple regression analysis was used including the multi-collinearity of variables. The predictive models best fitted the pollen season start and end, especially models containing two independent variables. The peak concentration value was predicted with the higher prediction error. Also the accuracy of the models predicting the pollen season characteristics in 2009 was higher in comparison with 2010. Both, the multi-variable model and one-variable model for the beginning of the pollen season included air temperature during the last 10 days of February, while the multi-variable model also included humidity at the beginning of April. The models forecasting the end of the pollen season were based on temperature in March-April, while the peak day was predicted using the temperature during the last 10 days of March.

  12. Regression calibration for models with two predictor variables measured with error and their interaction, using instrumental variables and longitudinal data.

    PubMed

    Strand, Matthew; Sillau, Stefan; Grunwald, Gary K; Rabinovitch, Nathan

    2014-02-10

    Regression calibration provides a way to obtain unbiased estimators of fixed effects in regression models when one or more predictors are measured with error. Recent development of measurement error methods has focused on models that include interaction terms between measured-with-error predictors, and separately, methods for estimation in models that account for correlated data. In this work, we derive explicit and novel forms of regression calibration estimators and associated asymptotic variances for longitudinal models that include interaction terms, when data from instrumental and unbiased surrogate variables are available but not the actual predictors of interest. The longitudinal data are fit using linear mixed models that contain random intercepts and account for serial correlation and unequally spaced observations. The motivating application involves a longitudinal study of exposure to two pollutants (predictors) - outdoor fine particulate matter and cigarette smoke - and their association in interactive form with levels of a biomarker of inflammation, leukotriene E4 (LTE 4 , outcome) in asthmatic children. Because the exposure concentrations could not be directly observed, we used measurements from a fixed outdoor monitor and urinary cotinine concentrations as instrumental variables, and we used concentrations of fine ambient particulate matter and cigarette smoke measured with error by personal monitors as unbiased surrogate variables. We applied the derived regression calibration methods to estimate coefficients of the unobserved predictors and their interaction, allowing for direct comparison of toxicity of the different pollutants. We used simulations to verify accuracy of inferential methods based on asymptotic theory. Copyright © 2013 John Wiley & Sons, Ltd.

  13. Spatial generalised linear mixed models based on distances.

    PubMed

    Melo, Oscar O; Mateu, Jorge; Melo, Carlos E

    2016-10-01

    Risk models derived from environmental data have been widely shown to be effective in delineating geographical areas of risk because they are intuitively easy to understand. We present a new method based on distances, which allows the modelling of continuous and non-continuous random variables through distance-based spatial generalised linear mixed models. The parameters are estimated using Markov chain Monte Carlo maximum likelihood, which is a feasible and a useful technique. The proposed method depends on a detrending step built from continuous or categorical explanatory variables, or a mixture among them, by using an appropriate Euclidean distance. The method is illustrated through the analysis of the variation in the prevalence of Loa loa among a sample of village residents in Cameroon, where the explanatory variables included elevation, together with maximum normalised-difference vegetation index and the standard deviation of normalised-difference vegetation index calculated from repeated satellite scans over time. © The Author(s) 2013.

  14. Sensitivity analysis of infectious disease models: methods, advances and their application

    PubMed Central

    Wu, Jianyong; Dhingra, Radhika; Gambhir, Manoj; Remais, Justin V.

    2013-01-01

    Sensitivity analysis (SA) can aid in identifying influential model parameters and optimizing model structure, yet infectious disease modelling has yet to adopt advanced SA techniques that are capable of providing considerable insights over traditional methods. We investigate five global SA methods—scatter plots, the Morris and Sobol’ methods, Latin hypercube sampling-partial rank correlation coefficient and the sensitivity heat map method—and detail their relative merits and pitfalls when applied to a microparasite (cholera) and macroparasite (schistosomaisis) transmission model. The methods investigated yielded similar results with respect to identifying influential parameters, but offered specific insights that vary by method. The classical methods differed in their ability to provide information on the quantitative relationship between parameters and model output, particularly over time. The heat map approach provides information about the group sensitivity of all model state variables, and the parameter sensitivity spectrum obtained using this method reveals the sensitivity of all state variables to each parameter over the course of the simulation period, especially valuable for expressing the dynamic sensitivity of a microparasite epidemic model to its parameters. A summary comparison is presented to aid infectious disease modellers in selecting appropriate methods, with the goal of improving model performance and design. PMID:23864497

  15. The use of generalised additive models (GAM) in dentistry.

    PubMed

    Helfenstein, U; Steiner, M; Menghini, G

    1997-12-01

    Ordinary multiple regression and logistic multiple regression are widely applied statistical methods which allow a researcher to 'explain' or 'predict' a response variable from a set of explanatory variables or predictors. In these models it is usually assumed that quantitative predictors such as age enter linearly into the model. During recent years these methods have been further developed to allow more flexibility in the way explanatory variables 'act' on a response variable. The methods are called 'generalised additive models' (GAM). The rigid linear terms characterising the association between response and predictors are replaced in an optimal way by flexible curved functions of the predictors (the 'profiles'). Plotting the 'profiles' allows the researcher to visualise easily the shape by which predictors 'act' over the whole range of values. The method facilitates detection of particular shapes such as 'bumps', 'U-shapes', 'J-shapes, 'threshold values' etc. Information about the shape of the association is not revealed by traditional methods. The shapes of the profiles may be checked by performing a Monte Carlo simulation ('bootstrapping'). After the presentation of the GAM a relevant case study is presented in order to demonstrate application and use of the method. The dependence of caries in primary teeth on a set of explanatory variables is investigated. Since GAMs may not be easily accessible to dentists, this article presents them in an introductory condensed form. It was thought that a nonmathematical summary and a worked example might encourage readers to consider the methods described. GAMs may be of great value to dentists in allowing visualisation of the shape by which predictors 'act' and obtaining a better understanding of the complex relationships between predictors and response.

  16. A review of global terrestrial evapotranspiration: Observation, modeling, climatology, and climatic variability

    NASA Astrophysics Data System (ADS)

    Wang, Kaicun; Dickinson, Robert E.

    2012-06-01

    This review surveys the basic theories, observational methods, satellite algorithms, and land surface models for terrestrial evapotranspiration, E (or λE, i.e., latent heat flux), including a long-term variability and trends perspective. The basic theories used to estimate E are the Monin-Obukhov similarity theory (MOST), the Bowen ratio method, and the Penman-Monteith equation. The latter two theoretical expressions combine MOST with surface energy balance. Estimates of E can differ substantially between these three approaches because of their use of different input data. Surface and satellite-based measurement systems can provide accurate estimates of diurnal, daily, and annual variability of E. But their estimation of longer time variability is largely not established. A reasonable estimate of E as a global mean can be obtained from a surface water budget method, but its regional distribution is still rather uncertain. Current land surface models provide widely different ratios of the transpiration by vegetation to total E. This source of uncertainty therefore limits the capability of models to provide the sensitivities of E to precipitation deficits and land cover change.

  17. Matrix completion by deep matrix factorization.

    PubMed

    Fan, Jicong; Cheng, Jieyu

    2018-02-01

    Conventional methods of matrix completion are linear methods that are not effective in handling data of nonlinear structures. Recently a few researchers attempted to incorporate nonlinear techniques into matrix completion but there still exists considerable limitations. In this paper, a novel method called deep matrix factorization (DMF) is proposed for nonlinear matrix completion. Different from conventional matrix completion methods that are based on linear latent variable models, DMF is on the basis of a nonlinear latent variable model. DMF is formulated as a deep-structure neural network, in which the inputs are the low-dimensional unknown latent variables and the outputs are the partially observed variables. In DMF, the inputs and the parameters of the multilayer neural network are simultaneously optimized to minimize the reconstruction errors for the observed entries. Then the missing entries can be readily recovered by propagating the latent variables to the output layer. DMF is compared with state-of-the-art methods of linear and nonlinear matrix completion in the tasks of toy matrix completion, image inpainting and collaborative filtering. The experimental results verify that DMF is able to provide higher matrix completion accuracy than existing methods do and DMF is applicable to large matrices. Copyright © 2017 Elsevier Ltd. All rights reserved.

  18. A Novel Information-Theoretic Approach for Variable Clustering and Predictive Modeling Using Dirichlet Process Mixtures

    PubMed Central

    Chen, Yun; Yang, Hui

    2016-01-01

    In the era of big data, there are increasing interests on clustering variables for the minimization of data redundancy and the maximization of variable relevancy. Existing clustering methods, however, depend on nontrivial assumptions about the data structure. Note that nonlinear interdependence among variables poses significant challenges on the traditional framework of predictive modeling. In the present work, we reformulate the problem of variable clustering from an information theoretic perspective that does not require the assumption of data structure for the identification of nonlinear interdependence among variables. Specifically, we propose the use of mutual information to characterize and measure nonlinear correlation structures among variables. Further, we develop Dirichlet process (DP) models to cluster variables based on the mutual-information measures among variables. Finally, orthonormalized variables in each cluster are integrated with group elastic-net model to improve the performance of predictive modeling. Both simulation and real-world case studies showed that the proposed methodology not only effectively reveals the nonlinear interdependence structures among variables but also outperforms traditional variable clustering algorithms such as hierarchical clustering. PMID:27966581

  19. A Novel Information-Theoretic Approach for Variable Clustering and Predictive Modeling Using Dirichlet Process Mixtures.

    PubMed

    Chen, Yun; Yang, Hui

    2016-12-14

    In the era of big data, there are increasing interests on clustering variables for the minimization of data redundancy and the maximization of variable relevancy. Existing clustering methods, however, depend on nontrivial assumptions about the data structure. Note that nonlinear interdependence among variables poses significant challenges on the traditional framework of predictive modeling. In the present work, we reformulate the problem of variable clustering from an information theoretic perspective that does not require the assumption of data structure for the identification of nonlinear interdependence among variables. Specifically, we propose the use of mutual information to characterize and measure nonlinear correlation structures among variables. Further, we develop Dirichlet process (DP) models to cluster variables based on the mutual-information measures among variables. Finally, orthonormalized variables in each cluster are integrated with group elastic-net model to improve the performance of predictive modeling. Both simulation and real-world case studies showed that the proposed methodology not only effectively reveals the nonlinear interdependence structures among variables but also outperforms traditional variable clustering algorithms such as hierarchical clustering.

  20. An improved strategy for regression of biophysical variables and Landsat ETM+ data.

    Treesearch

    Warren B. Cohen; Thomas K. Maiersperger; Stith T. Gower; David P. Turner

    2003-01-01

    Empirical models are important tools for relating field-measured biophysical variables to remote sensing data. Regression analysis has been a popular empirical method of linking these two types of data to provide continuous estimates for variables such as biomass, percent woody canopy cover, and leaf area index (LAI). Traditional methods of regression are not...

  1. [Application of characteristic NIR variables selection in portable detection of soluble solids content of apple by near infrared spectroscopy].

    PubMed

    Fan, Shu-Xiang; Huang, Wen-Qian; Li, Jiang-Bo; Guo, Zhi-Ming; Zhaq, Chun-Jiang

    2014-10-01

    In order to detect the soluble solids content(SSC)of apple conveniently and rapidly, a ring fiber probe and a portable spectrometer were applied to obtain the spectroscopy of apple. Different wavelength variable selection methods, including unin- formative variable elimination (UVE), competitive adaptive reweighted sampling (CARS) and genetic algorithm (GA) were pro- posed to select effective wavelength variables of the NIR spectroscopy of the SSC in apple based on PLS. The back interval LS- SVM (BiLS-SVM) and GA were used to select effective wavelength variables based on LS-SVM. Selected wavelength variables and full wavelength range were set as input variables of PLS model and LS-SVM model, respectively. The results indicated that PLS model built using GA-CARS on 50 characteristic variables selected from full-spectrum which had 1512 wavelengths achieved the optimal performance. The correlation coefficient (Rp) and root mean square error of prediction (RMSEP) for prediction sets were 0.962, 0.403°Brix respectively for SSC. The proposed method of GA-CARS could effectively simplify the portable detection model of SSC in apple based on near infrared spectroscopy and enhance the predictive precision. The study can provide a reference for the development of portable apple soluble solids content spectrometer.

  2. Review of literature on the finite-element solution of the equations of two-dimensional surface-water flow in the horizontal plane

    USGS Publications Warehouse

    Lee, Jonathan K.; Froehlich, David C.

    1987-01-01

    Published literature on the application of the finite-element method to solving the equations of two-dimensional surface-water flow in the horizontal plane is reviewed in this report. The finite-element method is ideally suited to modeling two-dimensional flow over complex topography with spatially variable resistance. A two-dimensional finite-element surface-water flow model with depth and vertically averaged velocity components as dependent variables allows the user great flexibility in defining geometric features such as the boundaries of a water body, channels, islands, dikes, and embankments. The following topics are reviewed in this report: alternative formulations of the equations of two-dimensional surface-water flow in the horizontal plane; basic concepts of the finite-element method; discretization of the flow domain and representation of the dependent flow variables; treatment of boundary conditions; discretization of the time domain; methods for modeling bottom, surface, and lateral stresses; approaches to solving systems of nonlinear equations; techniques for solving systems of linear equations; finite-element alternatives to Galerkin's method of weighted residuals; techniques of model validation; and preparation of model input data. References are listed in the final chapter.

  3. Applications of information theory, genetic algorithms, and neural models to predict oil flow

    NASA Astrophysics Data System (ADS)

    Ludwig, Oswaldo; Nunes, Urbano; Araújo, Rui; Schnitman, Leizer; Lepikson, Herman Augusto

    2009-07-01

    This work introduces a new information-theoretic methodology for choosing variables and their time lags in a prediction setting, particularly when neural networks are used in non-linear modeling. The first contribution of this work is the Cross Entropy Function (XEF) proposed to select input variables and their lags in order to compose the input vector of black-box prediction models. The proposed XEF method is more appropriate than the usually applied Cross Correlation Function (XCF) when the relationship among the input and output signals comes from a non-linear dynamic system. The second contribution is a method that minimizes the Joint Conditional Entropy (JCE) between the input and output variables by means of a Genetic Algorithm (GA). The aim is to take into account the dependence among the input variables when selecting the most appropriate set of inputs for a prediction problem. In short, theses methods can be used to assist the selection of input training data that have the necessary information to predict the target data. The proposed methods are applied to a petroleum engineering problem; predicting oil production. Experimental results obtained with a real-world dataset are presented demonstrating the feasibility and effectiveness of the method.

  4. Estimation of pharmacokinetic parameters from non-compartmental variables using Microsoft Excel.

    PubMed

    Dansirikul, Chantaratsamon; Choi, Malcolm; Duffull, Stephen B

    2005-06-01

    This study was conducted to develop a method, termed 'back analysis (BA)', for converting non-compartmental variables to compartment model dependent pharmacokinetic parameters for both one- and two-compartment models. A Microsoft Excel spreadsheet was implemented with the use of Solver and visual basic functions. The performance of the BA method in estimating pharmacokinetic parameter values was evaluated by comparing the parameter values obtained to a standard modelling software program, NONMEM, using simulated data. The results show that the BA method was reasonably precise and provided low bias in estimating fixed and random effect parameters for both one- and two-compartment models. The pharmacokinetic parameters estimated from the BA method were similar to those of NONMEM estimation.

  5. A multi-segment foot model based on anatomically registered technical coordinate systems: method repeatability in pediatric feet.

    PubMed

    Saraswat, Prabhav; MacWilliams, Bruce A; Davis, Roy B

    2012-04-01

    Several multi-segment foot models to measure the motion of intrinsic joints of the foot have been reported. Use of these models in clinical decision making is limited due to lack of rigorous validation including inter-clinician, and inter-lab variability measures. A model with thoroughly quantified variability may significantly improve the confidence in the results of such foot models. This study proposes a new clinical foot model with the underlying strategy of using separate anatomic and technical marker configurations and coordinate systems. Anatomical landmark and coordinate system identification is determined during a static subject calibration. Technical markers are located at optimal sites for dynamic motion tracking. The model is comprised of the tibia and three foot segments (hindfoot, forefoot and hallux) and inter-segmental joint angles are computed in three planes. Data collection was carried out on pediatric subjects at two sites (Site 1: n=10 subjects by two clinicians and Site 2: five subjects by one clinician). A plaster mold method was used to quantify static intra-clinician and inter-clinician marker placement variability by allowing direct comparisons of marker data between sessions for each subject. Intra-clinician and inter-clinician joint angle variability were less than 4°. For dynamic walking kinematics, intra-clinician, inter-clinician and inter-laboratory variability were less than 6° for the ankle and forefoot, but slightly higher for the hallux. Inter-trial variability accounted for 2-4° of the total dynamic variability. Results indicate the proposed foot model reduces the effects of marker placement variability on computed foot kinematics during walking compared to similar measures in previous models. Copyright © 2011 Elsevier B.V. All rights reserved.

  6. Estimating cavity tree and snag abundance using negative binomial regression models and nearest neighbor imputation methods

    Treesearch

    Bianca N.I. Eskelson; Hailemariam Temesgen; Tara M. Barrett

    2009-01-01

    Cavity tree and snag abundance data are highly variable and contain many zero observations. We predict cavity tree and snag abundance from variables that are readily available from forest cover maps or remotely sensed data using negative binomial (NB), zero-inflated NB, and zero-altered NB (ZANB) regression models as well as nearest neighbor (NN) imputation methods....

  7. A Correlation-Based Transition Model using Local Variables. Part 2; Test Cases and Industrial Applications

    NASA Technical Reports Server (NTRS)

    Langtry, R. B.; Menter, F. R.; Likki, S. R.; Suzen, Y. B.; Huang, P. G.; Volker, S.

    2006-01-01

    A new correlation-based transition model has been developed, which is built strictly on local variables. As a result, the transition model is compatible with modern computational fluid dynamics (CFD) methods using unstructured grids and massive parallel execution. The model is based on two transport equations, one for the intermittency and one for the transition onset criteria in terms of momentum thickness Reynolds number. The proposed transport equations do not attempt to model the physics of the transition process (unlike, e.g., turbulence models), but form a framework for the implementation of correlation-based models into general-purpose CFD methods.

  8. Estimating the Uncertainty In Diameter Growth Model Predictions and Its Effects On The Uncertainty of Annual Inventory Estimates

    Treesearch

    Ronald E. McRoberts; Veronica C. Lessard

    2001-01-01

    Uncertainty in diameter growth predictions is attributed to three general sources: measurement error or sampling variability in predictor variables, parameter covariances, and residual or unexplained variation around model expectations. Using measurement error and sampling variability distributions obtained from the literature and Monte Carlo simulation methods, the...

  9. Extending existing structural identifiability analysis methods to mixed-effects models.

    PubMed

    Janzén, David L I; Jirstrand, Mats; Chappell, Michael J; Evans, Neil D

    2018-01-01

    The concept of structural identifiability for state-space models is expanded to cover mixed-effects state-space models. Two methods applicable for the analytical study of the structural identifiability of mixed-effects models are presented. The two methods are based on previously established techniques for non-mixed-effects models; namely the Taylor series expansion and the input-output form approach. By generating an exhaustive summary, and by assuming an infinite number of subjects, functions of random variables can be derived which in turn determine the distribution of the system's observation function(s). By considering the uniqueness of the analytical statistical moments of the derived functions of the random variables, the structural identifiability of the corresponding mixed-effects model can be determined. The two methods are applied to a set of examples of mixed-effects models to illustrate how they work in practice. Copyright © 2017 Elsevier Inc. All rights reserved.

  10. Pitfalls in statistical landslide susceptibility modelling

    NASA Astrophysics Data System (ADS)

    Schröder, Boris; Vorpahl, Peter; Märker, Michael; Elsenbeer, Helmut

    2010-05-01

    The use of statistical methods is a well-established approach to predict landslide occurrence probabilities and to assess landslide susceptibility. This is achieved by applying statistical methods relating historical landslide inventories to topographic indices as predictor variables. In our contribution, we compare several new and powerful methods developed in machine learning and well-established in landscape ecology and macroecology for predicting the distribution of shallow landslides in tropical mountain rainforests in southern Ecuador (among others: boosted regression trees, multivariate adaptive regression splines, maximum entropy). Although these methods are powerful, we think it is necessary to follow a basic set of guidelines to avoid some pitfalls regarding data sampling, predictor selection, and model quality assessment, especially if a comparison of different models is contemplated. We therefore suggest to apply a novel toolbox to evaluate approaches to the statistical modelling of landslide susceptibility. Additionally, we propose some methods to open the "black box" as an inherent part of machine learning methods in order to achieve further explanatory insights into preparatory factors that control landslides. Sampling of training data should be guided by hypotheses regarding processes that lead to slope failure taking into account their respective spatial scales. This approach leads to the selection of a set of candidate predictor variables considered on adequate spatial scales. This set should be checked for multicollinearity in order to facilitate model response curve interpretation. Model quality assesses how well a model is able to reproduce independent observations of its response variable. This includes criteria to evaluate different aspects of model performance, i.e. model discrimination, model calibration, and model refinement. In order to assess a possible violation of the assumption of independency in the training samples or a possible lack of explanatory information in the chosen set of predictor variables, the model residuals need to be checked for spatial auto¬correlation. Therefore, we calculate spline correlograms. In addition to this, we investigate partial dependency plots and bivariate interactions plots considering possible interactions between predictors to improve model interpretation. Aiming at presenting this toolbox for model quality assessment, we investigate the influence of strategies in the construction of training datasets for statistical models on model quality.

  11. Can climate variability information constrain a hydrological model for an ungauged Costa Rican catchment?

    NASA Astrophysics Data System (ADS)

    Quesada-Montano, Beatriz; Westerberg, Ida K.; Fuentes-Andino, Diana; Hidalgo-Leon, Hugo; Halldin, Sven

    2017-04-01

    Long-term hydrological data are key to understanding catchment behaviour and for decision making within water management and planning. Given the lack of observed data in many regions worldwide, hydrological models are an alternative for reproducing historical streamflow series. Additional types of information - to locally observed discharge - can be used to constrain model parameter uncertainty for ungauged catchments. Climate variability exerts a strong influence on streamflow variability on long and short time scales, in particular in the Central-American region. We therefore explored the use of climate variability knowledge to constrain the simulated discharge uncertainty of a conceptual hydrological model applied to a Costa Rican catchment, assumed to be ungauged. To reduce model uncertainty we first rejected parameter relationships that disagreed with our understanding of the system. We then assessed how well climate-based constraints applied at long-term, inter-annual and intra-annual time scales could constrain model uncertainty. Finally, we compared the climate-based constraints to a constraint on low-flow statistics based on information obtained from global maps. We evaluated our method in terms of the ability of the model to reproduce the observed hydrograph and the active catchment processes in terms of two efficiency measures, a statistical consistency measure, a spread measure and 17 hydrological signatures. We found that climate variability knowledge was useful for reducing model uncertainty, in particular, unrealistic representation of deep groundwater processes. The constraints based on global maps of low-flow statistics provided more constraining information than those based on climate variability, but the latter rejected slow rainfall-runoff representations that the low flow statistics did not reject. The use of such knowledge, together with information on low-flow statistics and constraints on parameter relationships showed to be useful to constrain model uncertainty for an - assumed to be - ungauged basin. This shows that our method is promising for reconstructing long-term flow data for ungauged catchments on the Pacific side of Central America, and that similar methods can be developed for ungauged basins in other regions where climate variability exerts a strong control on streamflow variability.

  12. Reduction of variable-truncation artifacts from beam occlusion during in situ x-ray tomography

    NASA Astrophysics Data System (ADS)

    Borg, Leise; Jørgensen, Jakob S.; Frikel, Jürgen; Sporring, Jon

    2017-12-01

    Many in situ x-ray tomography studies require experimental rigs which may partially occlude the beam and cause parts of the projection data to be missing. In a study of fluid flow in porous chalk using a percolation cell with four metal bars drastic streak artifacts arise in the filtered backprojection (FBP) reconstruction at certain orientations. Projections with non-trivial variable truncation caused by the metal bars are the source of these variable-truncation artifacts. To understand the artifacts a mathematical model of variable-truncation data as a function of metal bar radius and distance to sample is derived and verified numerically and with experimental data. The model accurately describes the arising variable-truncation artifacts across simulated variations of the experimental setup. Three variable-truncation artifact-reduction methods are proposed, all aimed at addressing sinogram discontinuities that are shown to be the source of the streaks. The ‘reduction to limited angle’ (RLA) method simply keeps only non-truncated projections; the ‘detector-directed smoothing’ (DDS) method smooths the discontinuities; while the ‘reflexive boundary condition’ (RBC) method enforces a zero derivative at the discontinuities. Experimental results using both simulated and real data show that the proposed methods effectively reduce variable-truncation artifacts. The RBC method is found to provide the best artifact reduction and preservation of image features using both visual and quantitative assessment. The analysis and artifact-reduction methods are designed in context of FBP reconstruction motivated by computational efficiency practical for large, real synchrotron data. While a specific variable-truncation case is considered, the proposed methods can be applied to general data cut-offs arising in different in situ x-ray tomography experiments.

  13. Parameterizing the Variability and Uncertainty of Wind and Solar in CEMs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Frew, Bethany

    We present current and improved methods for estimating the capacity value and curtailment impacts from variable generation (VG) in capacity expansion models (CEMs). The ideal calculation of these variability metrics is through an explicit co-optimized investment-dispatch model using multiple years of VG and load data. Because of data and computational limitations, existing CEMs typically approximate these metrics using a subset of all hours from a single year and/or using statistical methods, which often do not capture the tail-event impacts or the broader set of interactions between VG, storage, and conventional generators. In our proposed new methods, we use hourly generationmore » and load values across all hours of the year to characterize the (1) contribution of VG to system capacity during high load hours, (2) the curtailment level of VG, and (3) the reduction in VG curtailment due to storage and shutdown of select thermal generators. Using CEM model outputs from a preceding model solve period, we apply these methods to exogenously calculate capacity value and curtailment metrics for the subsequent model solve period. Preliminary results suggest that these hourly methods offer improved capacity value and curtailment representations of VG in the CEM from existing approximation methods without additional computational burdens.« less

  14. The ACCE method: an approach for obtaining quantitative or qualitative estimates of residual confounding that includes unmeasured confounding

    PubMed Central

    Smith, Eric G.

    2015-01-01

    Background:  Nonrandomized studies typically cannot account for confounding from unmeasured factors.  Method:  A method is presented that exploits the recently-identified phenomenon of  “confounding amplification” to produce, in principle, a quantitative estimate of total residual confounding resulting from both measured and unmeasured factors.  Two nested propensity score models are constructed that differ only in the deliberate introduction of an additional variable(s) that substantially predicts treatment exposure.  Residual confounding is then estimated by dividing the change in treatment effect estimate between models by the degree of confounding amplification estimated to occur, adjusting for any association between the additional variable(s) and outcome. Results:  Several hypothetical examples are provided to illustrate how the method produces a quantitative estimate of residual confounding if the method’s requirements and assumptions are met.  Previously published data is used to illustrate that, whether or not the method routinely provides precise quantitative estimates of residual confounding, the method appears to produce a valuable qualitative estimate of the likely direction and general size of residual confounding. Limitations:  Uncertainties exist, including identifying the best approaches for: 1) predicting the amount of confounding amplification, 2) minimizing changes between the nested models unrelated to confounding amplification, 3) adjusting for the association of the introduced variable(s) with outcome, and 4) deriving confidence intervals for the method’s estimates (although bootstrapping is one plausible approach). Conclusions:  To this author’s knowledge, it has not been previously suggested that the phenomenon of confounding amplification, if such amplification is as predictable as suggested by a recent simulation, provides a logical basis for estimating total residual confounding. The method's basic approach is straightforward.  The method's routine usefulness, however, has not yet been established, nor has the method been fully validated. Rapid further investigation of this novel method is clearly indicated, given the potential value of its quantitative or qualitative output. PMID:25580226

  15. Reply to comment by Fred L. Ogden et al. on "Beyond the SCS-CN method: A theoretical framework for spatially lumped rainfall-runoff response"

    NASA Astrophysics Data System (ADS)

    Bartlett, M. S.; Parolari, A. J.; McDonnell, J. J.; Porporato, A.

    2017-07-01

    Though Ogden et al. list several shortcomings of the original SCS-CN method, fit for purpose is a key consideration in hydrological modelling, as shown by the adoption of SCS-CN method in many design standards. The theoretical framework of Bartlett et al. [2016a] reveals a family of semidistributed models, of which the SCS-CN method is just one member. Other members include event-based versions of the Variable Infiltration Capacity (VIC) model and TOPMODEL. This general model allows us to move beyond the limitations of the original SCS-CN method under different rainfall-runoff mechanisms and distributions for soil and rainfall variability. Future research should link this general model approach to different hydrogeographic settings, in line with the call for action proposed by Ogden et al.

  16. An efficient deterministic-probabilistic approach to modeling regional groundwater flow: 1. Theory

    USGS Publications Warehouse

    Yen, Chung-Cheng; Guymon, Gary L.

    1990-01-01

    An efficient probabilistic model is developed and cascaded with a deterministic model for predicting water table elevations in regional aquifers. The objective is to quantify model uncertainty where precise estimates of water table elevations may be required. The probabilistic model is based on the two-point probability method which only requires prior knowledge of uncertain variables mean and coefficient of variation. The two-point estimate method is theoretically developed and compared with the Monte Carlo simulation method. The results of comparisons using hypothetical determinisitic problems indicate that the two-point estimate method is only generally valid for linear problems where the coefficients of variation of uncertain parameters (for example, storage coefficient and hydraulic conductivity) is small. The two-point estimate method may be applied to slightly nonlinear problems with good results, provided coefficients of variation are small. In such cases, the two-point estimate method is much more efficient than the Monte Carlo method provided the number of uncertain variables is less than eight.

  17. An Efficient Deterministic-Probabilistic Approach to Modeling Regional Groundwater Flow: 1. Theory

    NASA Astrophysics Data System (ADS)

    Yen, Chung-Cheng; Guymon, Gary L.

    1990-07-01

    An efficient probabilistic model is developed and cascaded with a deterministic model for predicting water table elevations in regional aquifers. The objective is to quantify model uncertainty where precise estimates of water table elevations may be required. The probabilistic model is based on the two-point probability method which only requires prior knowledge of uncertain variables mean and coefficient of variation. The two-point estimate method is theoretically developed and compared with the Monte Carlo simulation method. The results of comparisons using hypothetical determinisitic problems indicate that the two-point estimate method is only generally valid for linear problems where the coefficients of variation of uncertain parameters (for example, storage coefficient and hydraulic conductivity) is small. The two-point estimate method may be applied to slightly nonlinear problems with good results, provided coefficients of variation are small. In such cases, the two-point estimate method is much more efficient than the Monte Carlo method provided the number of uncertain variables is less than eight.

  18. Background Error Covariance Estimation using Information from a Single Model Trajectory with Application to Ocean Data Assimilation into the GEOS-5 Coupled Model

    NASA Technical Reports Server (NTRS)

    Keppenne, Christian L.; Rienecker, Michele M.; Kovach, Robin M.; Vernieres, Guillaume; Koster, Randal D. (Editor)

    2014-01-01

    An attractive property of ensemble data assimilation methods is that they provide flow dependent background error covariance estimates which can be used to update fields of observed variables as well as fields of unobserved model variables. Two methods to estimate background error covariances are introduced which share the above property with ensemble data assimilation methods but do not involve the integration of multiple model trajectories. Instead, all the necessary covariance information is obtained from a single model integration. The Space Adaptive Forecast error Estimation (SAFE) algorithm estimates error covariances from the spatial distribution of model variables within a single state vector. The Flow Adaptive error Statistics from a Time series (FAST) method constructs an ensemble sampled from a moving window along a model trajectory. SAFE and FAST are applied to the assimilation of Argo temperature profiles into version 4.1 of the Modular Ocean Model (MOM4.1) coupled to the GEOS-5 atmospheric model and to the CICE sea ice model. The results are validated against unassimilated Argo salinity data. They show that SAFE and FAST are competitive with the ensemble optimal interpolation (EnOI) used by the Global Modeling and Assimilation Office (GMAO) to produce its ocean analysis. Because of their reduced cost, SAFE and FAST hold promise for high-resolution data assimilation applications.

  19. Data mining of tree-based models to analyze freeway accident frequency.

    PubMed

    Chang, Li-Yen; Chen, Wen-Chieh

    2005-01-01

    Statistical models, such as Poisson or negative binomial regression models, have been employed to analyze vehicle accident frequency for many years. However, these models have their own model assumptions and pre-defined underlying relationship between dependent and independent variables. If these assumptions are violated, the model could lead to erroneous estimation of accident likelihood. Classification and Regression Tree (CART), one of the most widely applied data mining techniques, has been commonly employed in business administration, industry, and engineering. CART does not require any pre-defined underlying relationship between target (dependent) variable and predictors (independent variables) and has been shown to be a powerful tool, particularly for dealing with prediction and classification problems. This study collected the 2001-2002 accident data of National Freeway 1 in Taiwan. A CART model and a negative binomial regression model were developed to establish the empirical relationship between traffic accidents and highway geometric variables, traffic characteristics, and environmental factors. The CART findings indicated that the average daily traffic volume and precipitation variables were the key determinants for freeway accident frequencies. By comparing the prediction performance between the CART and the negative binomial regression models, this study demonstrates that CART is a good alternative method for analyzing freeway accident frequencies. By comparing the prediction performance between the CART and the negative binomial regression models, this study demonstrates that CART is a good alternative method for analyzing freeway accident frequencies.

  20. Quantifying natural delta variability using a multiple-point geostatistics prior uncertainty model

    NASA Astrophysics Data System (ADS)

    Scheidt, Céline; Fernandes, Anjali M.; Paola, Chris; Caers, Jef

    2016-10-01

    We address the question of quantifying uncertainty associated with autogenic pattern variability in a channelized transport system by means of a modern geostatistical method. This question has considerable relevance for practical subsurface applications as well, particularly those related to uncertainty quantification relying on Bayesian approaches. Specifically, we show how the autogenic variability in a laboratory experiment can be represented and reproduced by a multiple-point geostatistical prior uncertainty model. The latter geostatistical method requires selection of a limited set of training images from which a possibly infinite set of geostatistical model realizations, mimicking the training image patterns, can be generated. To that end, we investigate two methods to determine how many training images and what training images should be provided to reproduce natural autogenic variability. The first method relies on distance-based clustering of overhead snapshots of the experiment; the second method relies on a rate of change quantification by means of a computer vision algorithm termed the demon algorithm. We show quantitatively that with either training image selection method, we can statistically reproduce the natural variability of the delta formed in the experiment. In addition, we study the nature of the patterns represented in the set of training images as a representation of the "eigenpatterns" of the natural system. The eigenpattern in the training image sets display patterns consistent with previous physical interpretations of the fundamental modes of this type of delta system: a highly channelized, incisional mode; a poorly channelized, depositional mode; and an intermediate mode between the two.

  1. Multivariate calibration on NIR data: development of a model for the rapid evaluation of ethanol content in bakery products.

    PubMed

    Bello, Alessandra; Bianchi, Federica; Careri, Maria; Giannetto, Marco; Mori, Giovanni; Musci, Marilena

    2007-11-05

    A new NIR method based on multivariate calibration for determination of ethanol in industrially packed wholemeal bread was developed and validated. GC-FID was used as reference method for the determination of actual ethanol concentration of different samples of wholemeal bread with proper content of added ethanol, ranging from 0 to 3.5% (w/w). Stepwise discriminant analysis was carried out on the NIR dataset, in order to reduce the number of original variables by selecting those that were able to discriminate between the samples of different ethanol concentrations. With the so selected variables a multivariate calibration model was then obtained by multiple linear regression. The prediction power of the linear model was optimized by a new "leave one out" method, so that the number of original variables resulted further reduced.

  2. System and method for anomaly detection

    DOEpatents

    Scherrer, Chad

    2010-06-15

    A system and method for detecting one or more anomalies in a plurality of observations is provided. In one illustrative embodiment, the observations are real-time network observations collected from a stream of network traffic. The method includes performing a discrete decomposition of the observations, and introducing derived variables to increase storage and query efficiencies. A mathematical model, such as a conditional independence model, is then generated from the formatted data. The formatted data is also used to construct frequency tables which maintain an accurate count of specific variable occurrence as indicated by the model generation process. The formatted data is then applied to the mathematical model to generate scored data. The scored data is then analyzed to detect anomalies.

  3. The method of belief scales as a means for dealing with uncertainty in tough regulatory decisions.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pilch, Martin M.

    Modeling and simulation is playing an increasing role in supporting tough regulatory decisions, which are typically characterized by variabilities and uncertainties in the scenarios, input conditions, failure criteria, model parameters, and even model form. Variability exists when there is a statistically significant database that is fully relevant to the application. Uncertainty, on the other hand, is characterized by some degree of ignorance. A simple algebraic problem was used to illustrate how various risk methodologies address variability and uncertainty in a regulatory context. These traditional risk methodologies include probabilistic methods (including frequensic and Bayesian perspectives) and second-order methods where variabilities andmore » uncertainties are treated separately. Representing uncertainties with (subjective) probability distributions and using probabilistic methods to propagate subjective distributions can lead to results that are not logically consistent with available knowledge and that may not be conservative. The Method of Belief Scales (MBS) is developed as a means to logically aggregate uncertain input information and to propagate that information through the model to a set of results that are scrutable, easily interpretable by the nonexpert, and logically consistent with the available input information. The MBS, particularly in conjunction with sensitivity analyses, has the potential to be more computationally efficient than other risk methodologies. The regulatory language must be tailored to the specific risk methodology if ambiguity and conflict are to be avoided.« less

  4. Load estimator (LOADEST): a FORTRAN program for estimating constituent loads in streams and rivers

    USGS Publications Warehouse

    Runkel, Robert L.; Crawford, Charles G.; Cohn, Timothy A.

    2004-01-01

    LOAD ESTimator (LOADEST) is a FORTRAN program for estimating constituent loads in streams and rivers. Given a time series of streamflow, additional data variables, and constituent concentration, LOADEST assists the user in developing a regression model for the estimation of constituent load (calibration). Explanatory variables within the regression model include various functions of streamflow, decimal time, and additional user-specified data variables. The formulated regression model then is used to estimate loads over a user-specified time interval (estimation). Mean load estimates, standard errors, and 95 percent confidence intervals are developed on a monthly and(or) seasonal basis. The calibration and estimation procedures within LOADEST are based on three statistical estimation methods. The first two methods, Adjusted Maximum Likelihood Estimation (AMLE) and Maximum Likelihood Estimation (MLE), are appropriate when the calibration model errors (residuals) are normally distributed. Of the two, AMLE is the method of choice when the calibration data set (time series of streamflow, additional data variables, and concentration) contains censored data. The third method, Least Absolute Deviation (LAD), is an alternative to maximum likelihood estimation when the residuals are not normally distributed. LOADEST output includes diagnostic tests and warnings to assist the user in determining the appropriate estimation method and in interpreting the estimated loads. This report describes the development and application of LOADEST. Sections of the report describe estimation theory, input/output specifications, sample applications, and installation instructions.

  5. Analytical and Numerical solutions of a nonlinear alcoholism model via variable-order fractional differential equations

    NASA Astrophysics Data System (ADS)

    Gómez-Aguilar, J. F.

    2018-03-01

    In this paper, we analyze an alcoholism model which involves the impact of Twitter via Liouville-Caputo and Atangana-Baleanu-Caputo fractional derivatives with constant- and variable-order. Two fractional mathematical models are considered, with and without delay. Special solutions using an iterative scheme via Laplace and Sumudu transform were obtained. We studied the uniqueness and existence of the solutions employing the fixed point postulate. The generalized model with variable-order was solved numerically via the Adams method and the Adams-Bashforth-Moulton scheme. Stability and convergence of the numerical solutions were presented in details. Numerical examples of the approximate solutions are provided to show that the numerical methods are computationally efficient. Therefore, by including both the fractional derivatives and finite time delays in the alcoholism model studied, we believe that we have established a more complete and more realistic indicator of alcoholism model and affect the spread of the drinking.

  6. Introduction to the use of regression models in epidemiology.

    PubMed

    Bender, Ralf

    2009-01-01

    Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed in dependence on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.

  7. Adaptive control of a jet turboshaft engine driving a variable pitch propeller using multiple models

    NASA Astrophysics Data System (ADS)

    Ahmadian, Narjes; Khosravi, Alireza; Sarhadi, Pouria

    2017-08-01

    In this paper, a multiple model adaptive control (MMAC) method is proposed for a gas turbine engine. The model of a twin spool turbo-shaft engine driving a variable pitch propeller includes various operating points. Variations in fuel flow and propeller pitch inputs produce different operating conditions which force the controller to be adopted rapidly. Important operating points are three idle, cruise and full thrust cases for the entire flight envelope. A multi-input multi-output (MIMO) version of second level adaptation using multiple models is developed. Also, stability analysis using Lyapunov method is presented. The proposed method is compared with two conventional first level adaptation and model reference adaptive control techniques. Simulation results for JetCat SPT5 turbo-shaft engine demonstrate the performance and fidelity of the proposed method.

  8. An Equation-Free Reduced-Order Modeling Approach to Tropical Pacific Simulation

    NASA Astrophysics Data System (ADS)

    Wang, Ruiwen; Zhu, Jiang; Luo, Zhendong; Navon, I. M.

    2009-03-01

    The “equation-free” (EF) method is often used in complex, multi-scale problems. In such cases it is necessary to know the closed form of the required evolution equations about oscopic variables within some applied fields. Conceptually such equations exist, however, they are not available in closed form. The EF method can bypass this difficulty. This method can obtain oscopic information by implementing models at a microscopic level. Given an initial oscopic variable, through lifting we can obtain the associated microscopic variable, which may be evolved using Direct Numerical Simulations (DNS) and by restriction, we can obtain the necessary oscopic information and the projective integration to obtain the desired quantities. In this paper we apply the EF POD-assisted method to the reduced modeling of a large-scale upper ocean circulation in the tropical Pacific domain. The computation cost is reduced dramatically. Compared with the POD method, the method provided more accurate results and it did not require the availability of any explicit equations or the right-hand side (RHS) of the evolution equation.

  9. Modeling and simulation of different and representative engineering problems using Network Simulation Method

    PubMed Central

    2018-01-01

    Mathematical models simulating different and representative engineering problem, atomic dry friction, the moving front problems and elastic and solid mechanics are presented in the form of a set of non-linear, coupled or not coupled differential equations. For different parameters values that influence the solution, the problem is numerically solved by the network method, which provides all the variables of the problems. Although the model is extremely sensitive to the above parameters, no assumptions are considered as regards the linearization of the variables. The design of the models, which are run on standard electrical circuit simulation software, is explained in detail. The network model results are compared with common numerical methods or experimental data, published in the scientific literature, to show the reliability of the model. PMID:29518121

  10. Modeling and simulation of different and representative engineering problems using Network Simulation Method.

    PubMed

    Sánchez-Pérez, J F; Marín, F; Morales, J L; Cánovas, M; Alhama, F

    2018-01-01

    Mathematical models simulating different and representative engineering problem, atomic dry friction, the moving front problems and elastic and solid mechanics are presented in the form of a set of non-linear, coupled or not coupled differential equations. For different parameters values that influence the solution, the problem is numerically solved by the network method, which provides all the variables of the problems. Although the model is extremely sensitive to the above parameters, no assumptions are considered as regards the linearization of the variables. The design of the models, which are run on standard electrical circuit simulation software, is explained in detail. The network model results are compared with common numerical methods or experimental data, published in the scientific literature, to show the reliability of the model.

  11. Identification of spectral regions for the quantification of red wine tannins with fourier transform mid-infrared spectroscopy.

    PubMed

    Jensen, Jacob S; Egebo, Max; Meyer, Anne S

    2008-05-28

    Accomplishment of fast tannin measurements is receiving increased interest as tannins are important for the mouthfeel and color properties of red wines. Fourier transform mid-infrared spectroscopy allows fast measurement of different wine components, but quantification of tannins is difficult due to interferences from spectral responses of other wine components. Four different variable selection tools were investigated for the identification of the most important spectral regions which would allow quantification of tannins from the spectra using partial least-squares regression. The study included the development of a new variable selection tool, iterative backward elimination of changeable size intervals PLS. The spectral regions identified by the different variable selection methods were not identical, but all included two regions (1485-1425 and 1060-995 cm(-1)), which therefore were concluded to be particularly important for tannin quantification. The spectral regions identified from the variable selection methods were used to develop calibration models. All four variable selection methods identified regions that allowed an improved quantitative prediction of tannins (RMSEP = 69-79 mg of CE/L; r = 0.93-0.94) as compared to a calibration model developed using all variables (RMSEP = 115 mg of CE/L; r = 0.87). Only minor differences in the performance of the variable selection methods were observed.

  12. Investigating the Performance of Alternate Regression Weights by Studying All Possible Criteria in Regression Models with a Fixed Set of Predictors

    ERIC Educational Resources Information Center

    Waller, Niels; Jones, Jeff

    2011-01-01

    We describe methods for assessing all possible criteria (i.e., dependent variables) and subsets of criteria for regression models with a fixed set of predictors, x (where x is an n x 1 vector of independent variables). Our methods build upon the geometry of regression coefficients (hereafter called regression weights) in n-dimensional space. For a…

  13. Discriminant Analysis of Time Series in the Presence of Within-Group Spectral Variability.

    PubMed

    Krafty, Robert T

    2016-07-01

    Many studies record replicated time series epochs from different groups with the goal of using frequency domain properties to discriminate between the groups. In many applications, there exists variation in cyclical patterns from time series in the same group. Although a number of frequency domain methods for the discriminant analysis of time series have been explored, there is a dearth of models and methods that account for within-group spectral variability. This article proposes a model for groups of time series in which transfer functions are modeled as stochastic variables that can account for both between-group and within-group differences in spectra that are identified from individual replicates. An ensuing discriminant analysis of stochastic cepstra under this model is developed to obtain parsimonious measures of relative power that optimally separate groups in the presence of within-group spectral variability. The approach possess favorable properties in classifying new observations and can be consistently estimated through a simple discriminant analysis of a finite number of estimated cepstral coefficients. Benefits in accounting for within-group spectral variability are empirically illustrated in a simulation study and through an analysis of gait variability.

  14. Background Error Covariance Estimation Using Information from a Single Model Trajectory with Application to Ocean Data Assimilation

    NASA Technical Reports Server (NTRS)

    Keppenne, Christian L.; Rienecker, Michele; Kovach, Robin M.; Vernieres, Guillaume

    2014-01-01

    An attractive property of ensemble data assimilation methods is that they provide flow dependent background error covariance estimates which can be used to update fields of observed variables as well as fields of unobserved model variables. Two methods to estimate background error covariances are introduced which share the above property with ensemble data assimilation methods but do not involve the integration of multiple model trajectories. Instead, all the necessary covariance information is obtained from a single model integration. The Space Adaptive Forecast error Estimation (SAFE) algorithm estimates error covariances from the spatial distribution of model variables within a single state vector. The Flow Adaptive error Statistics from a Time series (FAST) method constructs an ensemble sampled from a moving window along a model trajectory.SAFE and FAST are applied to the assimilation of Argo temperature profiles into version 4.1 of the Modular Ocean Model (MOM4.1) coupled to the GEOS-5 atmospheric model and to the CICE sea ice model. The results are validated against unassimilated Argo salinity data. They show that SAFE and FAST are competitive with the ensemble optimal interpolation (EnOI) used by the Global Modeling and Assimilation Office (GMAO) to produce its ocean analysis. Because of their reduced cost, SAFE and FAST hold promise for high-resolution data assimilation applications.

  15. Variability of basin-scale terrestrial water storage from a novel application of the water budget equation: the Amazon and the Mississippi

    NASA Astrophysics Data System (ADS)

    Yoon, J.; Zeng, N.; Mariotti, A.; Swenson, S.

    2007-12-01

    In an approach termed the P-E-R (or simply PER) method, we apply the basin water budget equation to diagnose the long-term variability of the total terrestrial water storage (TWS). The key input variables are observed precipitation (P) and runoff (R), and estimated evaporation (E). Unlike typical offline land-surface model estimate where only atmospheric variables are used as input, the direct use of observed runoff in the PER method imposes an important constraint on the diagnosed TWS. Although there lack basin-scale observations of evaporation, the tendency of E to have significantly less variability than the difference between precipitation and runoff (P-R) minimizes the uncertainties originating from estimated evaporation. Compared to the more traditional method using atmospheric moisture convergence (MC) minus R (MCR method), the use of observed precipitation in PER method is expected to lead to general improvement, especially in regions atmospheric radiosonde data are too sparse to constrain the atmospheric model analyzed MC such as in the remote tropics. TWS was diagnosed using the PER method for the Amazon (1970-2006) and the Mississippi Basin (1928-2006), and compared with MCR method, land-surface model and reanalyses, and NASA's GRACE satellite gravity data. The seasonal cycle of diagnosed TWS over the Amazon is about 300 mm. The interannual TWS variability in these two basins are 100-200 mm, but multi-dacadal changes can be as large as 600-800 mm. Major droughts such as the Dust Bowl period had large impact with water storage depleted by 500 mm over a decade. Within the short period 2003-2006 when GRACE data was available, PER and GRACE show good agreement both for seasonal cycle and interannual variability, providing potential to cross-validate each other. In contrast, land-surface model results are significantly smaller than PER and GRACE, especially towards longer timescales. While we currently lack independent means to verify these long-term changes, simple error analysis using 3 precipitation datasets and 3 evaporation estimates suggest that the multi-decadal amplitude can be uncertain up to a factor of 2, while the agreement is high on interannual timescales. The large TWS variability implies the remarkable capacity of land-surface in storing and taking up water that may be under-represented in models. The results also suggest the existence of water storage memories on multi-year time scales, significantly longer than typically assumed seasonal timescales associated with surface soil moisture.

  16. Comparing and improving proper orthogonal decomposition (POD) to reduce the complexity of groundwater models

    NASA Astrophysics Data System (ADS)

    Gosses, Moritz; Nowak, Wolfgang; Wöhling, Thomas

    2017-04-01

    Physically-based modeling is a wide-spread tool in understanding and management of natural systems. With the high complexity of many such models and the huge amount of model runs necessary for parameter estimation and uncertainty analysis, overall run times can be prohibitively long even on modern computer systems. An encouraging strategy to tackle this problem are model reduction methods. In this contribution, we compare different proper orthogonal decomposition (POD, Siade et al. (2010)) methods and their potential applications to groundwater models. The POD method performs a singular value decomposition on system states as simulated by the complex (e.g., PDE-based) groundwater model taken at several time-steps, so-called snapshots. The singular vectors with the highest information content resulting from this decomposition are then used as a basis for projection of the system of model equations onto a subspace of much lower dimensionality than the original complex model, thereby greatly reducing complexity and accelerating run times. In its original form, this method is only applicable to linear problems. Many real-world groundwater models are non-linear, tough. These non-linearities are introduced either through model structure (unconfined aquifers) or boundary conditions (certain Cauchy boundaries, like rivers with variable connection to the groundwater table). To date, applications of POD focused on groundwater models simulating pumping tests in confined aquifers with constant head boundaries. In contrast, POD model reduction either greatly looses accuracy or does not significantly reduce model run time if the above-mentioned non-linearities are introduced. We have also found that variable Dirichlet boundaries are problematic for POD model reduction. An extension to the POD method, called POD-DEIM, has been developed for non-linear groundwater models by Stanko et al. (2016). This method uses spatial interpolation points to build the equation system in the reduced model space, thereby allowing the recalculation of system matrices at every time-step necessary for non-linear models while retaining the speed of the reduced model. This makes POD-DEIM applicable for groundwater models simulating unconfined aquifers. However, in our analysis, the method struggled to reproduce variable river boundaries accurately and gave no advantage for variable Dirichlet boundaries compared to the original POD method. We have developed another extension for POD that targets to address these remaining problems by performing a second POD operation on the model matrix on the left-hand side of the equation. The method aims to at least reproduce the accuracy of the other methods where they are applicable while outperforming them for setups with changing river boundaries or variable Dirichlet boundaries. We compared the new extension with original POD and POD-DEIM for different combinations of model structures and boundary conditions. The new method shows the potential of POD extensions for applications to non-linear groundwater systems and complex boundary conditions that go beyond the current, relatively limited range of applications. References: Siade, A. J., Putti, M., and Yeh, W. W.-G. (2010). Snapshot selection for groundwater model reduction using proper orthogonal decomposition. Water Resour. Res., 46(8):W08539. Stanko, Z. P., Boyce, S. E., and Yeh, W. W.-G. (2016). Nonlinear model reduction of unconfined groundwater flow using pod and deim. Advances in Water Resources, 97:130 - 143.

  17. A method for fitting regression splines with varying polynomial order in the linear mixed model.

    PubMed

    Edwards, Lloyd J; Stewart, Paul W; MacDougall, James E; Helms, Ronald W

    2006-02-15

    The linear mixed model has become a widely used tool for longitudinal analysis of continuous variables. The use of regression splines in these models offers the analyst additional flexibility in the formulation of descriptive analyses, exploratory analyses and hypothesis-driven confirmatory analyses. We propose a method for fitting piecewise polynomial regression splines with varying polynomial order in the fixed effects and/or random effects of the linear mixed model. The polynomial segments are explicitly constrained by side conditions for continuity and some smoothness at the points where they join. By using a reparameterization of this explicitly constrained linear mixed model, an implicitly constrained linear mixed model is constructed that simplifies implementation of fixed-knot regression splines. The proposed approach is relatively simple, handles splines in one variable or multiple variables, and can be easily programmed using existing commercial software such as SAS or S-plus. The method is illustrated using two examples: an analysis of longitudinal viral load data from a study of subjects with acute HIV-1 infection and an analysis of 24-hour ambulatory blood pressure profiles.

  18. An Alternative Approach for Nonlinear Latent Variable Models

    ERIC Educational Resources Information Center

    Mooijaart, Ab; Bentler, Peter M.

    2010-01-01

    In the last decades there has been an increasing interest in nonlinear latent variable models. Since the seminal paper of Kenny and Judd, several methods have been proposed for dealing with these kinds of models. This article introduces an alternative approach. The methodology involves fitting some third-order moments in addition to the means and…

  19. Finite element implementation of state variable-based viscoplasticity models

    NASA Technical Reports Server (NTRS)

    Iskovitz, I.; Chang, T. Y. P.; Saleeb, A. F.

    1991-01-01

    The implementation of state variable-based viscoplasticity models is made in a general purpose finite element code for structural applications of metals deformed at elevated temperatures. Two constitutive models, Walker's and Robinson's models, are studied in conjunction with two implicit integration methods: the trapezoidal rule with Newton-Raphson iterations and an asymptotic integration algorithm. A comparison is made between the two integration methods, and the latter method appears to be computationally more appealing in terms of numerical accuracy and CPU time. However, in order to make the asymptotic algorithm robust, it is necessary to include a self adaptive scheme with subincremental step control and error checking of the Jacobian matrix at the integration points. Three examples are given to illustrate the numerical aspects of the integration methods tested.

  20. Multi-model blending

    DOEpatents

    Hamann, Hendrik F.; Hwang, Youngdeok; van Kessel, Theodore G.; Khabibrakhmanov, Ildar K.; Muralidhar, Ramachandran

    2016-10-18

    A method and a system to perform multi-model blending are described. The method includes obtaining one or more sets of predictions of historical conditions, the historical conditions corresponding with a time T that is historical in reference to current time, and the one or more sets of predictions of the historical conditions being output by one or more models. The method also includes obtaining actual historical conditions, the actual historical conditions being measured conditions at the time T, assembling a training data set including designating the two or more set of predictions of historical conditions as predictor variables and the actual historical conditions as response variables, and training a machine learning algorithm based on the training data set. The method further includes obtaining a blended model based on the machine learning algorithm.

  1. Bayesian inference for the genetic control of water deficit tolerance in spring wheat by stochastic search variable selection.

    PubMed

    Safari, Parviz; Danyali, Syyedeh Fatemeh; Rahimi, Mehdi

    2018-06-02

    Drought is the main abiotic stress seriously influencing wheat production. Information about the inheritance of drought tolerance is necessary to determine the most appropriate strategy to develop tolerant cultivars and populations. In this study, generation means analysis to identify the genetic effects controlling grain yield inheritance in water deficit and normal conditions was considered as a model selection problem in a Bayesian framework. Stochastic search variable selection (SSVS) was applied to identify the most important genetic effects and the best fitted models using different generations obtained from two crosses applying two water regimes in two growing seasons. The SSVS is used to evaluate the effect of each variable on the dependent variable via posterior variable inclusion probabilities. The model with the highest posterior probability is selected as the best model. In this study, the grain yield was controlled by the main effects (additive and non-additive effects) and epistatic. The results demonstrate that breeding methods such as recurrent selection and subsequent pedigree method and hybrid production can be useful to improve grain yield.

  2. A comparison of data-driven groundwater vulnerability assessment methods

    USGS Publications Warehouse

    Sorichetta, Alessandro; Ballabio, Cristiano; Masetti, Marco; Robinson, Gilpin R.; Sterlacchini, Simone

    2013-01-01

    Increasing availability of geo-environmental data has promoted the use of statistical methods to assess groundwater vulnerability. Nitrate is a widespread anthropogenic contaminant in groundwater and its occurrence can be used to identify aquifer settings vulnerable to contamination. In this study, multivariate Weights of Evidence (WofE) and Logistic Regression (LR) methods, where the response variable is binary, were used to evaluate the role and importance of a number of explanatory variables associated with nitrate sources and occurrence in groundwater in the Milan District (central part of the Po Plain, Italy). The results of these models have been used to map the spatial variation of groundwater vulnerability to nitrate in the region, and we compare the similarities and differences of their spatial patterns and associated explanatory variables. We modify the standard WofE method used in previous groundwater vulnerability studies to a form analogous to that used in LR; this provides a framework to compare the results of both models and reduces the effect of sampling bias on the results of the standard WofE model. In addition, a nonlinear Generalized Additive Model has been used to extend the LR analysis. Both approaches improved discrimination of the standard WofE and LR models, as measured by the c-statistic. Groundwater vulnerability probability outputs, based on rank-order classification of the respective model results, were similar in spatial patterns and identified similar strong explanatory variables associated with nitrate source (population density as a proxy for sewage systems and septic sources) and nitrate occurrence (groundwater depth).

  3. Variability aware compact model characterization for statistical circuit design optimization

    NASA Astrophysics Data System (ADS)

    Qiao, Ying; Qian, Kun; Spanos, Costas J.

    2012-03-01

    Variability modeling at the compact transistor model level can enable statistically optimized designs in view of limitations imposed by the fabrication technology. In this work we propose an efficient variabilityaware compact model characterization methodology based on the linear propagation of variance. Hierarchical spatial variability patterns of selected compact model parameters are directly calculated from transistor array test structures. This methodology has been implemented and tested using transistor I-V measurements and the EKV-EPFL compact model. Calculation results compare well to full-wafer direct model parameter extractions. Further studies are done on the proper selection of both compact model parameters and electrical measurement metrics used in the method.

  4. A Generalized Stability Analysis of the AMOC in Earth System Models: Implication for Decadal Variability and Abrupt Climate Change

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fedorov, Alexey V.

    2015-01-14

    The central goal of this research project was to understand the mechanisms of decadal and multi-decadal variability of the Atlantic Meridional Overturning Circulation (AMOC) as related to climate variability and abrupt climate change within a hierarchy of climate models ranging from realistic ocean models to comprehensive Earth system models. Generalized Stability Analysis, a method that quantifies the transient and asymptotic growth of perturbations in the system, is one of the main approaches used throughout this project. The topics we have explored range from physical mechanisms that control AMOC variability to the factors that determine AMOC predictability in the Earth systemmore » models, to the stability and variability of the AMOC in past climates.« less

  5. Harmonize input selection for sediment transport prediction

    NASA Astrophysics Data System (ADS)

    Afan, Haitham Abdulmohsin; Keshtegar, Behrooz; Mohtar, Wan Hanna Melini Wan; El-Shafie, Ahmed

    2017-09-01

    In this paper, three modeling approaches using a Neural Network (NN), Response Surface Method (RSM) and response surface method basis Global Harmony Search (GHS) are applied to predict the daily time series suspended sediment load. Generally, the input variables for forecasting the suspended sediment load are manually selected based on the maximum correlations of input variables in the modeling approaches based on NN and RSM. The RSM is improved to select the input variables by using the errors terms of training data based on the GHS, namely as response surface method and global harmony search (RSM-GHS) modeling method. The second-order polynomial function with cross terms is applied to calibrate the time series suspended sediment load with three, four and five input variables in the proposed RSM-GHS. The linear, square and cross corrections of twenty input variables of antecedent values of suspended sediment load and water discharge are investigated to achieve the best predictions of the RSM based on the GHS method. The performances of the NN, RSM and proposed RSM-GHS including both accuracy and simplicity are compared through several comparative predicted and error statistics. The results illustrated that the proposed RSM-GHS is as uncomplicated as the RSM but performed better, where fewer errors and better correlation was observed (R = 0.95, MAE = 18.09 (ton/day), RMSE = 25.16 (ton/day)) compared to the ANN (R = 0.91, MAE = 20.17 (ton/day), RMSE = 33.09 (ton/day)) and RSM (R = 0.91, MAE = 20.06 (ton/day), RMSE = 31.92 (ton/day)) for all types of input variables.

  6. Modelling fourier regression for time series data- a case study: modelling inflation in foods sector in Indonesia

    NASA Astrophysics Data System (ADS)

    Prahutama, Alan; Suparti; Wahyu Utami, Tiani

    2018-03-01

    Regression analysis is an analysis to model the relationship between response variables and predictor variables. The parametric approach to the regression model is very strict with the assumption, but nonparametric regression model isn’t need assumption of model. Time series data is the data of a variable that is observed based on a certain time, so if the time series data wanted to be modeled by regression, then we should determined the response and predictor variables first. Determination of the response variable in time series is variable in t-th (yt), while the predictor variable is a significant lag. In nonparametric regression modeling, one developing approach is to use the Fourier series approach. One of the advantages of nonparametric regression approach using Fourier series is able to overcome data having trigonometric distribution. In modeling using Fourier series needs parameter of K. To determine the number of K can be used Generalized Cross Validation method. In inflation modeling for the transportation sector, communication and financial services using Fourier series yields an optimal K of 120 parameters with R-square 99%. Whereas if it was modeled by multiple linear regression yield R-square 90%.

  7. Influence of Pattern of Missing Data on Performance of Imputation Methods: An Example Using National Data on Drug Injection in Prisons

    PubMed Central

    Haji-Maghsoudi, Saiedeh; Haghdoost, Ali-akbar; Rastegari, Azam; Baneshi, Mohammad Reza

    2013-01-01

    Background: Policy makers need models to be able to detect groups at high risk of HIV infection. Incomplete records and dirty data are frequently seen in national data sets. Presence of missing data challenges the practice of model development. Several studies suggested that performance of imputation methods is acceptable when missing rate is moderate. One of the issues which was of less concern, to be addressed here, is the role of the pattern of missing data. Methods: We used information of 2720 prisoners. Results derived from fitting regression model to whole data were served as gold standard. Missing data were then generated so that 10%, 20% and 50% of data were lost. In scenario 1, we generated missing values, at above rates, in one variable which was significant in gold model (age). In scenario 2, a small proportion of each of independent variable was dropped out. Four imputation methods, under different Event Per Variable (EPV) values, were compared in terms of selection of important variables and parameter estimation. Results: In scenario 2, bias in estimates was low and performances of all methods for handing missing data were similar. All methods at all missing rates were able to detect significance of age. In scenario 1, biases in estimations were increased, in particular at 50% missing rate. Here at EPVs of 10 and 5, imputation methods failed to capture effect of age. Conclusion: In scenario 2, all imputation methods at all missing rates, were able to detect age as being significant. This was not the case in scenario 1. Our results showed that performance of imputation methods depends on the pattern of missing data. PMID:24596839

  8. COMPUTATIONAL METHODS FOR SENSITIVITY AND UNCERTAINTY ANALYSIS FOR ENVIRONMENTAL AND BIOLOGICAL MODELS

    EPA Science Inventory

    This work introduces a computationally efficient alternative method for uncertainty propagation, the Stochastic Response Surface Method (SRSM). The SRSM approximates uncertainties in model outputs through a series expansion in normal random variables (polynomial chaos expansion)...

  9. Model selection bias and Freedman's paradox

    USGS Publications Warehouse

    Lukacs, P.M.; Burnham, K.P.; Anderson, D.R.

    2010-01-01

    In situations where limited knowledge of a system exists and the ratio of data points to variables is small, variable selection methods can often be misleading. Freedman (Am Stat 37:152-155, 1983) demonstrated how common it is to select completely unrelated variables as highly "significant" when the number of data points is similar in magnitude to the number of variables. A new type of model averaging estimator based on model selection with Akaike's AIC is used with linear regression to investigate the problems of likely inclusion of spurious effects and model selection bias, the bias introduced while using the data to select a single seemingly "best" model from a (often large) set of models employing many predictor variables. The new model averaging estimator helps reduce these problems and provides confidence interval coverage at the nominal level while traditional stepwise selection has poor inferential properties. ?? The Institute of Statistical Mathematics, Tokyo 2009.

  10. Time series analysis as input for clinical predictive modeling: Modeling cardiac arrest in a pediatric ICU

    PubMed Central

    2011-01-01

    Background Thousands of children experience cardiac arrest events every year in pediatric intensive care units. Most of these children die. Cardiac arrest prediction tools are used as part of medical emergency team evaluations to identify patients in standard hospital beds that are at high risk for cardiac arrest. There are no models to predict cardiac arrest in pediatric intensive care units though, where the risk of an arrest is 10 times higher than for standard hospital beds. Current tools are based on a multivariable approach that does not characterize deterioration, which often precedes cardiac arrests. Characterizing deterioration requires a time series approach. The purpose of this study is to propose a method that will allow for time series data to be used in clinical prediction models. Successful implementation of these methods has the potential to bring arrest prediction to the pediatric intensive care environment, possibly allowing for interventions that can save lives and prevent disabilities. Methods We reviewed prediction models from nonclinical domains that employ time series data, and identified the steps that are necessary for building predictive models using time series clinical data. We illustrate the method by applying it to the specific case of building a predictive model for cardiac arrest in a pediatric intensive care unit. Results Time course analysis studies from genomic analysis provided a modeling template that was compatible with the steps required to develop a model from clinical time series data. The steps include: 1) selecting candidate variables; 2) specifying measurement parameters; 3) defining data format; 4) defining time window duration and resolution; 5) calculating latent variables for candidate variables not directly measured; 6) calculating time series features as latent variables; 7) creating data subsets to measure model performance effects attributable to various classes of candidate variables; 8) reducing the number of candidate features; 9) training models for various data subsets; and 10) measuring model performance characteristics in unseen data to estimate their external validity. Conclusions We have proposed a ten step process that results in data sets that contain time series features and are suitable for predictive modeling by a number of methods. We illustrated the process through an example of cardiac arrest prediction in a pediatric intensive care setting. PMID:22023778

  11. Learning planar Ising models

    DOE PAGES

    Johnson, Jason K.; Oyen, Diane Adele; Chertkov, Michael; ...

    2016-12-01

    Inference and learning of graphical models are both well-studied problems in statistics and machine learning that have found many applications in science and engineering. However, exact inference is intractable in general graphical models, which suggests the problem of seeking the best approximation to a collection of random variables within some tractable family of graphical models. In this paper, we focus on the class of planar Ising models, for which exact inference is tractable using techniques of statistical physics. Based on these techniques and recent methods for planarity testing and planar embedding, we propose a greedy algorithm for learning the bestmore » planar Ising model to approximate an arbitrary collection of binary random variables (possibly from sample data). Given the set of all pairwise correlations among variables, we select a planar graph and optimal planar Ising model defined on this graph to best approximate that set of correlations. Finally, we demonstrate our method in simulations and for two applications: modeling senate voting records and identifying geo-chemical depth trends from Mars rover data.« less

  12. Learning planar Ising models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Johnson, Jason K.; Oyen, Diane Adele; Chertkov, Michael

    Inference and learning of graphical models are both well-studied problems in statistics and machine learning that have found many applications in science and engineering. However, exact inference is intractable in general graphical models, which suggests the problem of seeking the best approximation to a collection of random variables within some tractable family of graphical models. In this paper, we focus on the class of planar Ising models, for which exact inference is tractable using techniques of statistical physics. Based on these techniques and recent methods for planarity testing and planar embedding, we propose a greedy algorithm for learning the bestmore » planar Ising model to approximate an arbitrary collection of binary random variables (possibly from sample data). Given the set of all pairwise correlations among variables, we select a planar graph and optimal planar Ising model defined on this graph to best approximate that set of correlations. Finally, we demonstrate our method in simulations and for two applications: modeling senate voting records and identifying geo-chemical depth trends from Mars rover data.« less

  13. Study on the variable cycle engine modeling techniques based on the component method

    NASA Astrophysics Data System (ADS)

    Zhang, Lihua; Xue, Hui; Bao, Yuhai; Li, Jijun; Yan, Lan

    2016-01-01

    Based on the structure platform of the gas turbine engine, the components of variable cycle engine were simulated by using the component method. The mathematical model of nonlinear equations correspondeing to each component of the gas turbine engine was established. Based on Matlab programming, the nonlinear equations were solved by using Newton-Raphson steady-state algorithm, and the performance of the components for engine was calculated. The numerical simulation results showed that the model bulit can describe the basic performance of the gas turbine engine, which verified the validity of the model.

  14. Big Data Toolsets to Pharmacometrics: Application of Machine Learning for Time‐to‐Event Analysis

    PubMed Central

    Gong, Xiajing; Hu, Meng

    2018-01-01

    Abstract Additional value can be potentially created by applying big data tools to address pharmacometric problems. The performances of machine learning (ML) methods and the Cox regression model were evaluated based on simulated time‐to‐event data synthesized under various preset scenarios, i.e., with linear vs. nonlinear and dependent vs. independent predictors in the proportional hazard function, or with high‐dimensional data featured by a large number of predictor variables. Our results showed that ML‐based methods outperformed the Cox model in prediction performance as assessed by concordance index and in identifying the preset influential variables for high‐dimensional data. The prediction performances of ML‐based methods are also less sensitive to data size and censoring rates than the Cox regression model. In conclusion, ML‐based methods provide a powerful tool for time‐to‐event analysis, with a built‐in capacity for high‐dimensional data and better performance when the predictor variables assume nonlinear relationships in the hazard function. PMID:29536640

  15. Monte Carlo Bayesian inference on a statistical model of sub-gridcolumn moisture variability using high-resolution cloud observations. Part 1: Method.

    PubMed

    Norris, Peter M; da Silva, Arlindo M

    2016-07-01

    A method is presented to constrain a statistical model of sub-gridcolumn moisture variability using high-resolution satellite cloud data. The method can be used for large-scale model parameter estimation or cloud data assimilation. The gridcolumn model includes assumed probability density function (PDF) intra-layer horizontal variability and a copula-based inter-layer correlation model. The observables used in the current study are Moderate Resolution Imaging Spectroradiometer (MODIS) cloud-top pressure, brightness temperature and cloud optical thickness, but the method should be extensible to direct cloudy radiance assimilation for a small number of channels. The algorithm is a form of Bayesian inference with a Markov chain Monte Carlo (MCMC) approach to characterizing the posterior distribution. This approach is especially useful in cases where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach is not gradient-based and allows jumps into regions of non-zero cloud probability. The current study uses a skewed-triangle distribution for layer moisture. The article also includes a discussion of the Metropolis and multiple-try Metropolis versions of MCMC.

  16. Monte Carlo Bayesian Inference on a Statistical Model of Sub-Gridcolumn Moisture Variability Using High-Resolution Cloud Observations. Part 1: Method

    NASA Technical Reports Server (NTRS)

    Norris, Peter M.; Da Silva, Arlindo M.

    2016-01-01

    A method is presented to constrain a statistical model of sub-gridcolumn moisture variability using high-resolution satellite cloud data. The method can be used for large-scale model parameter estimation or cloud data assimilation. The gridcolumn model includes assumed probability density function (PDF) intra-layer horizontal variability and a copula-based inter-layer correlation model. The observables used in the current study are Moderate Resolution Imaging Spectroradiometer (MODIS) cloud-top pressure, brightness temperature and cloud optical thickness, but the method should be extensible to direct cloudy radiance assimilation for a small number of channels. The algorithm is a form of Bayesian inference with a Markov chain Monte Carlo (MCMC) approach to characterizing the posterior distribution. This approach is especially useful in cases where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach is not gradient-based and allows jumps into regions of non-zero cloud probability. The current study uses a skewed-triangle distribution for layer moisture. The article also includes a discussion of the Metropolis and multiple-try Metropolis versions of MCMC.

  17. Monte Carlo Bayesian inference on a statistical model of sub-gridcolumn moisture variability using high-resolution cloud observations. Part 1: Method

    PubMed Central

    Norris, Peter M.; da Silva, Arlindo M.

    2018-01-01

    A method is presented to constrain a statistical model of sub-gridcolumn moisture variability using high-resolution satellite cloud data. The method can be used for large-scale model parameter estimation or cloud data assimilation. The gridcolumn model includes assumed probability density function (PDF) intra-layer horizontal variability and a copula-based inter-layer correlation model. The observables used in the current study are Moderate Resolution Imaging Spectroradiometer (MODIS) cloud-top pressure, brightness temperature and cloud optical thickness, but the method should be extensible to direct cloudy radiance assimilation for a small number of channels. The algorithm is a form of Bayesian inference with a Markov chain Monte Carlo (MCMC) approach to characterizing the posterior distribution. This approach is especially useful in cases where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach is not gradient-based and allows jumps into regions of non-zero cloud probability. The current study uses a skewed-triangle distribution for layer moisture. The article also includes a discussion of the Metropolis and multiple-try Metropolis versions of MCMC. PMID:29618847

  18. Multinomial N-mixture models improve the applicability of electrofishing for developing population estimates of stream-dwelling Smallmouth Bass

    USGS Publications Warehouse

    Mollenhauer, Robert; Brewer, Shannon K.

    2017-01-01

    Failure to account for variable detection across survey conditions constrains progressive stream ecology and can lead to erroneous stream fish management and conservation decisions. In addition to variable detection’s confounding long-term stream fish population trends, reliable abundance estimates across a wide range of survey conditions are fundamental to establishing species–environment relationships. Despite major advancements in accounting for variable detection when surveying animal populations, these approaches remain largely ignored by stream fish scientists, and CPUE remains the most common metric used by researchers and managers. One notable advancement for addressing the challenges of variable detection is the multinomial N-mixture model. Multinomial N-mixture models use a flexible hierarchical framework to model the detection process across sites as a function of covariates; they also accommodate common fisheries survey methods, such as removal and capture–recapture. Effective monitoring of stream-dwelling Smallmouth Bass Micropterus dolomieu populations has long been challenging; therefore, our objective was to examine the use of multinomial N-mixture models to improve the applicability of electrofishing for estimating absolute abundance. We sampled Smallmouth Bass populations by using tow-barge electrofishing across a range of environmental conditions in streams of the Ozark Highlands ecoregion. Using an information-theoretic approach, we identified effort, water clarity, wetted channel width, and water depth as covariates that were related to variable Smallmouth Bass electrofishing detection. Smallmouth Bass abundance estimates derived from our top model consistently agreed with baseline estimates obtained via snorkel surveys. Additionally, confidence intervals from the multinomial N-mixture models were consistently more precise than those of unbiased Petersen capture–recapture estimates due to the dependency among data sets in the hierarchical framework. We demonstrate the application of this contemporary population estimation method to address a longstanding stream fish management issue. We also detail the advantages and trade-offs of hierarchical population estimation methods relative to CPUE and estimation methods that model each site separately.

  19. Comparison of CTT and Rasch-based approaches for the analysis of longitudinal Patient Reported Outcomes.

    PubMed

    Blanchin, Myriam; Hardouin, Jean-Benoit; Le Neel, Tanguy; Kubis, Gildas; Blanchard, Claire; Mirallié, Eric; Sébille, Véronique

    2011-04-15

    Health sciences frequently deal with Patient Reported Outcomes (PRO) data for the evaluation of concepts, in particular health-related quality of life, which cannot be directly measured and are often called latent variables. Two approaches are commonly used for the analysis of such data: Classical Test Theory (CTT) and Item Response Theory (IRT). Longitudinal data are often collected to analyze the evolution of an outcome over time. The most adequate strategy to analyze longitudinal latent variables, which can be either based on CTT or IRT models, remains to be identified. This strategy must take into account the latent characteristic of what PROs are intended to measure as well as the specificity of longitudinal designs. A simple and widely used IRT model is the Rasch model. The purpose of our study was to compare CTT and Rasch-based approaches to analyze longitudinal PRO data regarding type I error, power, and time effect estimation bias. Four methods were compared: the Score and Mixed models (SM) method based on the CTT approach, the Rasch and Mixed models (RM), the Plausible Values (PV), and the Longitudinal Rasch model (LRM) methods all based on the Rasch model. All methods have shown comparable results in terms of type I error, all close to 5 per cent. LRM and SM methods presented comparable power and unbiased time effect estimations, whereas RM and PV methods showed low power and biased time effect estimations. This suggests that RM and PV methods should be avoided to analyze longitudinal latent variables. Copyright © 2010 John Wiley & Sons, Ltd.

  20. (Bayesian) Inference for X-ray Timing

    NASA Astrophysics Data System (ADS)

    Huppenkothen, Daniela

    2016-07-01

    Fourier techniques have been incredibly successful in describing variability of X-ray binaries (XRBs) and Active Galactic Nuclei (AGN). The detection and characterization of both broadband noise components and quasi-periodic oscillations as well as their behavior in the context of spectral changes during XRB outbursts has become an important tool for studying the physical processes of accretion and ejection in these systems. In this talk, I will review state-of-the-art techniques for characterizing variability in compact objects and show how these methods help us understand the causes of the observed variability and how we may use it to probe fundamental physics. Despite numerous successes, however, it has also become clear that many scientific questions cannot be answered with traditional timing methods alone. I will therefore also present recent advances, some in the time domain like CARMA, to modeling variability with generative models and discuss where these methods might lead us in the future.

  1. A method of fitting the gravity model based on the Poisson distribution.

    PubMed

    Flowerdew, R; Aitkin, M

    1982-05-01

    "In this paper, [the authors] suggest an alternative method for fitting the gravity model. In this method, the interaction variable is treated as the outcome of a discrete probability process, whose mean is a function of the size and distance variables. This treatment seems appropriate when the dependent variable represents a count of the number of items (people, vehicles, shipments) moving from one place to another. It would seem to have special advantages where there are some pairs of places between which few items move. The argument will be illustrated with reference to data on the numbers of migrants moving in 1970-1971 between pairs of the 126 labor market areas defined for Great Britain...." excerpt

  2. Variability And Uncertainty Analysis Of Contaminant Transport Model Using Fuzzy Latin Hypercube Sampling Technique

    NASA Astrophysics Data System (ADS)

    Kumar, V.; Nayagum, D.; Thornton, S.; Banwart, S.; Schuhmacher2, M.; Lerner, D.

    2006-12-01

    Characterization of uncertainty associated with groundwater quality models is often of critical importance, as for example in cases where environmental models are employed in risk assessment. Insufficient data, inherent variability and estimation errors of environmental model parameters introduce uncertainty into model predictions. However, uncertainty analysis using conventional methods such as standard Monte Carlo sampling (MCS) may not be efficient, or even suitable, for complex, computationally demanding models and involving different nature of parametric variability and uncertainty. General MCS or variant of MCS such as Latin Hypercube Sampling (LHS) assumes variability and uncertainty as a single random entity and the generated samples are treated as crisp assuming vagueness as randomness. Also when the models are used as purely predictive tools, uncertainty and variability lead to the need for assessment of the plausible range of model outputs. An improved systematic variability and uncertainty analysis can provide insight into the level of confidence in model estimates, and can aid in assessing how various possible model estimates should be weighed. The present study aims to introduce, Fuzzy Latin Hypercube Sampling (FLHS), a hybrid approach of incorporating cognitive and noncognitive uncertainties. The noncognitive uncertainty such as physical randomness, statistical uncertainty due to limited information, etc can be described by its own probability density function (PDF); whereas the cognitive uncertainty such estimation error etc can be described by the membership function for its fuzziness and confidence interval by ?-cuts. An important property of this theory is its ability to merge inexact generated data of LHS approach to increase the quality of information. The FLHS technique ensures that the entire range of each variable is sampled with proper incorporation of uncertainty and variability. A fuzzified statistical summary of the model results will produce indices of sensitivity and uncertainty that relate the effects of heterogeneity and uncertainty of input variables to model predictions. The feasibility of the method is validated to assess uncertainty propagation of parameter values for estimation of the contamination level of a drinking water supply well due to transport of dissolved phenolics from a contaminated site in the UK.

  3. Applying genetic algorithms to set the optimal combination of forest fire related variables and model forest fire susceptibility based on data mining models. The case of Dayu County, China.

    PubMed

    Hong, Haoyuan; Tsangaratos, Paraskevas; Ilia, Ioanna; Liu, Junzhi; Zhu, A-Xing; Xu, Chong

    2018-07-15

    The main objective of the present study was to utilize Genetic Algorithms (GA) in order to obtain the optimal combination of forest fire related variables and apply data mining methods for constructing a forest fire susceptibility map. In the proposed approach, a Random Forest (RF) and a Support Vector Machine (SVM) was used to produce a forest fire susceptibility map for the Dayu County which is located in southwest of Jiangxi Province, China. For this purpose, historic forest fires and thirteen forest fire related variables were analyzed, namely: elevation, slope angle, aspect, curvature, land use, soil cover, heat load index, normalized difference vegetation index, mean annual temperature, mean annual wind speed, mean annual rainfall, distance to river network and distance to road network. The Natural Break and the Certainty Factor method were used to classify and weight the thirteen variables, while a multicollinearity analysis was performed to determine the correlation among the variables and decide about their usability. The optimal set of variables, determined by the GA limited the number of variables into eight excluding from the analysis, aspect, land use, heat load index, distance to river network and mean annual rainfall. The performance of the forest fire models was evaluated by using the area under the Receiver Operating Characteristic curve (ROC-AUC) based on the validation dataset. Overall, the RF models gave higher AUC values. Also the results showed that the proposed optimized models outperform the original models. Specifically, the optimized RF model gave the best results (0.8495), followed by the original RF (0.8169), while the optimized SVM gave lower values (0.7456) than the RF, however higher than the original SVM (0.7148) model. The study highlights the significance of feature selection techniques in forest fire susceptibility, whereas data mining methods could be considered as a valid approach for forest fire susceptibility modeling. Copyright © 2018 Elsevier B.V. All rights reserved.

  4. Stochastic model search with binary outcomes for genome-wide association studies.

    PubMed

    Russu, Alberto; Malovini, Alberto; Puca, Annibale A; Bellazzi, Riccardo

    2012-06-01

    The spread of case-control genome-wide association studies (GWASs) has stimulated the development of new variable selection methods and predictive models. We introduce a novel Bayesian model search algorithm, Binary Outcome Stochastic Search (BOSS), which addresses the model selection problem when the number of predictors far exceeds the number of binary responses. Our method is based on a latent variable model that links the observed outcomes to the underlying genetic variables. A Markov Chain Monte Carlo approach is used for model search and to evaluate the posterior probability of each predictor. BOSS is compared with three established methods (stepwise regression, logistic lasso, and elastic net) in a simulated benchmark. Two real case studies are also investigated: a GWAS on the genetic bases of longevity, and the type 2 diabetes study from the Wellcome Trust Case Control Consortium. Simulations show that BOSS achieves higher precisions than the reference methods while preserving good recall rates. In both experimental studies, BOSS successfully detects genetic polymorphisms previously reported to be associated with the analyzed phenotypes. BOSS outperforms the other methods in terms of F-measure on simulated data. In the two real studies, BOSS successfully detects biologically relevant features, some of which are missed by univariate analysis and the three reference techniques. The proposed algorithm is an advance in the methodology for model selection with a large number of features. Our simulated and experimental results showed that BOSS proves effective in detecting relevant markers while providing a parsimonious model.

  5. An adaptive sampling method for variable-fidelity surrogate models using improved hierarchical kriging

    NASA Astrophysics Data System (ADS)

    Hu, Jiexiang; Zhou, Qi; Jiang, Ping; Shao, Xinyu; Xie, Tingli

    2018-01-01

    Variable-fidelity (VF) modelling methods have been widely used in complex engineering system design to mitigate the computational burden. Building a VF model generally includes two parts: design of experiments and metamodel construction. In this article, an adaptive sampling method based on improved hierarchical kriging (ASM-IHK) is proposed to refine the improved VF model. First, an improved hierarchical kriging model is developed as the metamodel, in which the low-fidelity model is varied through a polynomial response surface function to capture the characteristics of a high-fidelity model. Secondly, to reduce local approximation errors, an active learning strategy based on a sequential sampling method is introduced to make full use of the already required information on the current sampling points and to guide the sampling process of the high-fidelity model. Finally, two numerical examples and the modelling of the aerodynamic coefficient for an aircraft are provided to demonstrate the approximation capability of the proposed approach, as well as three other metamodelling methods and two sequential sampling methods. The results show that ASM-IHK provides a more accurate metamodel at the same simulation cost, which is very important in metamodel-based engineering design problems.

  6. Data driven model generation based on computational intelligence

    NASA Astrophysics Data System (ADS)

    Gemmar, Peter; Gronz, Oliver; Faust, Christophe; Casper, Markus

    2010-05-01

    The simulation of discharges at a local gauge or the modeling of large scale river catchments are effectively involved in estimation and decision tasks of hydrological research and practical applications like flood prediction or water resource management. However, modeling such processes using analytical or conceptual approaches is made difficult by both complexity of process relations and heterogeneity of processes. It was shown manifold that unknown or assumed process relations can principally be described by computational methods, and that system models can automatically be derived from observed behavior or measured process data. This study describes the development of hydrological process models using computational methods including Fuzzy logic and artificial neural networks (ANN) in a comprehensive and automated manner. Methods We consider a closed concept for data driven development of hydrological models based on measured (experimental) data. The concept is centered on a Fuzzy system using rules of Takagi-Sugeno-Kang type which formulate the input-output relation in a generic structure like Ri : IFq(t) = lowAND...THENq(t+Δt) = ai0 +ai1q(t)+ai2p(t-Δti1)+ai3p(t+Δti2)+.... The rule's premise part (IF) describes process states involving available process information, e.g. actual outlet q(t) is low where low is one of several Fuzzy sets defined over variable q(t). The rule's conclusion (THEN) estimates expected outlet q(t + Δt) by a linear function over selected system variables, e.g. actual outlet q(t), previous and/or forecasted precipitation p(t ?Δtik). In case of river catchment modeling we use head gauges, tributary and upriver gauges in the conclusion part as well. In addition, we consider temperature and temporal (season) information in the premise part. By creating a set of rules R = {Ri|(i = 1,...,N)} the space of process states can be covered as concise as necessary. Model adaptation is achieved by finding on optimal set A = (aij) of conclusion parameters with respect to a defined rating function and experimental data. To find A, we use for example a linear equation solver and RMSE-function. In practical process models, the number of Fuzzy sets and the according number of rules is fairly low. Nevertheless, creating the optimal model requires some experience. Therefore, we improved this development step by methods for automatic generation of Fuzzy sets, rules, and conclusions. Basically, the model achievement depends to a great extend on the selection of the conclusion variables. It is the aim that variables having most influence on the system reaction being considered and superfluous ones being neglected. At first, we use Kohonen maps, a specialized ANN, to identify relevant input variables from the large set of available system variables. A greedy algorithm selects a comprehensive set of dominant and uncorrelated variables. Next, the premise variables are analyzed with clustering methods (e.g. Fuzzy-C-means) and Fuzzy sets are then derived from cluster centers and outlines. The rule base is automatically constructed by permutation of the Fuzzy sets of the premise variables. Finally, the conclusion parameters are calculated and the total coverage of the input space is iteratively tested with experimental data, rarely firing rules are combined and coarse coverage of sensitive process states results in refined Fuzzy sets and rules. Results The described methods were implemented and integrated in a development system for process models. A series of models has already been built e.g. for rainfall-runoff modeling or for flood prediction (up to 72 hours) in river catchments. The models required significantly less development effort and showed advanced simulation results compared to conventional models. The models can be used operationally and simulation takes only some minutes on a standard PC e.g. for a gauge forecast (up to 72 hours) for the whole Mosel (Germany) river catchment.

  7. Vector wind profile gust model

    NASA Technical Reports Server (NTRS)

    Adelfang, S. I.

    1981-01-01

    To enable development of a vector wind gust model suitable for orbital flight test operations and trade studies, hypotheses concerning the distributions of gust component variables were verified. Methods for verification of hypotheses that observed gust variables, including gust component magnitude, gust length, u range, and L range, are gamma distributed and presented. Observed gust modulus has been drawn from a bivariate gamma distribution that can be approximated with a Weibull distribution. Zonal and meridional gust components are bivariate gamma distributed. An analytical method for testing for bivariate gamma distributed variables is presented. Two distributions for gust modulus are described and the results of extensive hypothesis testing of one of the distributions are presented. The validity of the gamma distribution for representation of gust component variables is established.

  8. Efficient Variable Selection Method for Exposure Variables on Binary Data

    NASA Astrophysics Data System (ADS)

    Ohno, Manabu; Tarumi, Tomoyuki

    In this paper, we propose a new variable selection method for "robust" exposure variables. We define "robust" as property that the same variable can select among original data and perturbed data. There are few studies of effective for the selection method. The problem that selects exposure variables is almost the same as a problem that extracts correlation rules without robustness. [Brin 97] is suggested that correlation rules are possible to extract efficiently using chi-squared statistic of contingency table having monotone property on binary data. But the chi-squared value does not have monotone property, so it's is easy to judge the method to be not independent with an increase in the dimension though the variable set is completely independent, and the method is not usable in variable selection for robust exposure variables. We assume anti-monotone property for independent variables to select robust independent variables and use the apriori algorithm for it. The apriori algorithm is one of the algorithms which find association rules from the market basket data. The algorithm use anti-monotone property on the support which is defined by association rules. But independent property does not completely have anti-monotone property on the AIC of independent probability model, but the tendency to have anti-monotone property is strong. Therefore, selected variables with anti-monotone property on the AIC have robustness. Our method judges whether a certain variable is exposure variable for the independent variable using previous comparison of the AIC. Our numerical experiments show that our method can select robust exposure variables efficiently and precisely.

  9. Cost Modeling for Space Telescope

    NASA Technical Reports Server (NTRS)

    Stahl, H. Philip

    2011-01-01

    Parametric cost models are an important tool for planning missions, compare concepts and justify technology investments. This paper presents on-going efforts to develop single variable and multi-variable cost models for space telescope optical telescope assembly (OTA). These models are based on data collected from historical space telescope missions. Standard statistical methods are used to derive CERs for OTA cost versus aperture diameter and mass. The results are compared with previously published models.

  10. Stochastic model search with binary outcomes for genome-wide association studies

    PubMed Central

    Malovini, Alberto; Puca, Annibale A; Bellazzi, Riccardo

    2012-01-01

    Objective The spread of case–control genome-wide association studies (GWASs) has stimulated the development of new variable selection methods and predictive models. We introduce a novel Bayesian model search algorithm, Binary Outcome Stochastic Search (BOSS), which addresses the model selection problem when the number of predictors far exceeds the number of binary responses. Materials and methods Our method is based on a latent variable model that links the observed outcomes to the underlying genetic variables. A Markov Chain Monte Carlo approach is used for model search and to evaluate the posterior probability of each predictor. Results BOSS is compared with three established methods (stepwise regression, logistic lasso, and elastic net) in a simulated benchmark. Two real case studies are also investigated: a GWAS on the genetic bases of longevity, and the type 2 diabetes study from the Wellcome Trust Case Control Consortium. Simulations show that BOSS achieves higher precisions than the reference methods while preserving good recall rates. In both experimental studies, BOSS successfully detects genetic polymorphisms previously reported to be associated with the analyzed phenotypes. Discussion BOSS outperforms the other methods in terms of F-measure on simulated data. In the two real studies, BOSS successfully detects biologically relevant features, some of which are missed by univariate analysis and the three reference techniques. Conclusion The proposed algorithm is an advance in the methodology for model selection with a large number of features. Our simulated and experimental results showed that BOSS proves effective in detecting relevant markers while providing a parsimonious model. PMID:22534080

  11. A site specific model and analysis of the neutral somatic mutation rate in whole-genome cancer data.

    PubMed

    Bertl, Johanna; Guo, Qianyun; Juul, Malene; Besenbacher, Søren; Nielsen, Morten Muhlig; Hornshøj, Henrik; Pedersen, Jakob Skou; Hobolth, Asger

    2018-04-19

    Detailed modelling of the neutral mutational process in cancer cells is crucial for identifying driver mutations and understanding the mutational mechanisms that act during cancer development. The neutral mutational process is very complex: whole-genome analyses have revealed that the mutation rate differs between cancer types, between patients and along the genome depending on the genetic and epigenetic context. Therefore, methods that predict the number of different types of mutations in regions or specific genomic elements must consider local genomic explanatory variables. A major drawback of most methods is the need to average the explanatory variables across the entire region or genomic element. This procedure is particularly problematic if the explanatory variable varies dramatically in the element under consideration. To take into account the fine scale of the explanatory variables, we model the probabilities of different types of mutations for each position in the genome by multinomial logistic regression. We analyse 505 cancer genomes from 14 different cancer types and compare the performance in predicting mutation rate for both regional based models and site-specific models. We show that for 1000 randomly selected genomic positions, the site-specific model predicts the mutation rate much better than regional based models. We use a forward selection procedure to identify the most important explanatory variables. The procedure identifies site-specific conservation (phyloP), replication timing, and expression level as the best predictors for the mutation rate. Finally, our model confirms and quantifies certain well-known mutational signatures. We find that our site-specific multinomial regression model outperforms the regional based models. The possibility of including genomic variables on different scales and patient specific variables makes it a versatile framework for studying different mutational mechanisms. Our model can serve as the neutral null model for the mutational process; regions that deviate from the null model are candidates for elements that drive cancer development.

  12. Evaluation of alternative model selection criteria in the analysis of unimodal response curves using CART

    USGS Publications Warehouse

    Ribic, C.A.; Miller, T.W.

    1998-01-01

    We investigated CART performance with a unimodal response curve for one continuous response and four continuous explanatory variables, where two variables were important (ie directly related to the response) and the other two were not. We explored performance under three relationship strengths and two explanatory variable conditions: equal importance and one variable four times as important as the other. We compared CART variable selection performance using three tree-selection rules ('minimum risk', 'minimum risk complexity', 'one standard error') to stepwise polynomial ordinary least squares (OLS) under four sample size conditions. The one-standard-error and minimum-risk-complexity methods performed about as well as stepwise OLS with large sample sizes when the relationship was strong. With weaker relationships, equally important explanatory variables and larger sample sizes, the one-standard-error and minimum-risk-complexity rules performed better than stepwise OLS. With weaker relationships and explanatory variables of unequal importance, tree-structured methods did not perform as well as stepwise OLS. Comparing performance within tree-structured methods, with a strong relationship and equally important explanatory variables, the one-standard-error-rule was more likely to choose the correct model than were the other tree-selection rules 1) with weaker relationships and equally important explanatory variables; and 2) under all relationship strengths when explanatory variables were of unequal importance and sample sizes were lower.

  13. An Improved Estimation Using Polya-Gamma Augmentation for Bayesian Structural Equation Models with Dichotomous Variables

    ERIC Educational Resources Information Center

    Kim, Seohyun; Lu, Zhenqiu; Cohen, Allan S.

    2018-01-01

    Bayesian algorithms have been used successfully in the social and behavioral sciences to analyze dichotomous data particularly with complex structural equation models. In this study, we investigate the use of the Polya-Gamma data augmentation method with Gibbs sampling to improve estimation of structural equation models with dichotomous variables.…

  14. Durability reliability analysis for corroding concrete structures under uncertainty

    NASA Astrophysics Data System (ADS)

    Zhang, Hao

    2018-02-01

    This paper presents a durability reliability analysis of reinforced concrete structures subject to the action of marine chloride. The focus is to provide insight into the role of epistemic uncertainties on durability reliability. The corrosion model involves a number of variables whose probabilistic characteristics cannot be fully determined due to the limited availability of supporting data. All sources of uncertainty, both aleatory and epistemic, should be included in the reliability analysis. Two methods are available to formulate the epistemic uncertainty: the imprecise probability-based method and the purely probabilistic method in which the epistemic uncertainties are modeled as random variables. The paper illustrates how the epistemic uncertainties are modeled and propagated in the two methods, and shows how epistemic uncertainties govern the durability reliability.

  15. Effect of genetic algorithm as a variable selection method on different chemometric models applied for the analysis of binary mixture of amoxicillin and flucloxacillin: A comparative study

    NASA Astrophysics Data System (ADS)

    Attia, Khalid A. M.; Nassar, Mohammed W. I.; El-Zeiny, Mohamed B.; Serag, Ahmed

    2016-03-01

    Different chemometric models were applied for the quantitative analysis of amoxicillin (AMX), and flucloxacillin (FLX) in their binary mixtures, namely, partial least squares (PLS), spectral residual augmented classical least squares (SRACLS), concentration residual augmented classical least squares (CRACLS) and artificial neural networks (ANNs). All methods were applied with and without variable selection procedure (genetic algorithm GA). The methods were used for the quantitative analysis of the drugs in laboratory prepared mixtures and real market sample via handling the UV spectral data. Robust and simpler models were obtained by applying GA. The proposed methods were found to be rapid, simple and required no preliminary separation steps.

  16. Uncertainty Quantification of Water Quality in Tamsui River in Taiwan

    NASA Astrophysics Data System (ADS)

    Kao, D.; Tsai, C.

    2017-12-01

    In Taiwan, modeling of non-point source pollution is unavoidably associated with uncertainty. The main purpose of this research is to better understand water contamination in the metropolitan Taipei area, and also to provide a new analysis method for government or companies to establish related control and design measures. In this research, three methods are utilized to carry out the uncertainty analysis step by step with Mike 21, which is widely used for hydro-dynamics and water quality modeling, and the study area is focused on Tamsui river watershed. First, a sensitivity analysis is conducted which can be used to rank the order of influential parameters and variables such as Dissolved Oxygen, Nitrate, Ammonia and Phosphorous. Then we use the First-order error method (FOEA) to determine the number of parameters that could significantly affect the variability of simulation results. Finally, a state-of-the-art method for uncertainty analysis called the Perturbance moment method (PMM) is applied in this research, which is more efficient than the Monte-Carlo simulation (MCS). For MCS, the calculations may become cumbersome when involving multiple uncertain parameters and variables. For PMM, three representative points are used for each random variable, and the statistical moments (e.g., mean value, standard deviation) for the output can be presented by the representative points and perturbance moments based on the parallel axis theorem. With the assumption of the independent parameters and variables, calculation time is significantly reduced for PMM as opposed to MCS for a comparable modeling accuracy.

  17. Efficient SRAM yield optimization with mixture surrogate modeling

    NASA Astrophysics Data System (ADS)

    Zhongjian, Jiang; Zuochang, Ye; Yan, Wang

    2016-12-01

    Largely repeated cells such as SRAM cells usually require extremely low failure-rate to ensure a moderate chi yield. Though fast Monte Carlo methods such as importance sampling and its variants can be used for yield estimation, they are still very expensive if one needs to perform optimization based on such estimations. Typically the process of yield calculation requires a lot of SPICE simulation. The circuit SPICE simulation analysis accounted for the largest proportion of time in the process yield calculation. In the paper, a new method is proposed to address this issue. The key idea is to establish an efficient mixture surrogate model. The surrogate model is based on the design variables and process variables. This model construction method is based on the SPICE simulation to get a certain amount of sample points, these points are trained for mixture surrogate model by the lasso algorithm. Experimental results show that the proposed model is able to calculate accurate yield successfully and it brings significant speed ups to the calculation of failure rate. Based on the model, we made a further accelerated algorithm to further enhance the speed of the yield calculation. It is suitable for high-dimensional process variables and multi-performance applications.

  18. Evaluation of variable selection methods for random forests and omics data sets.

    PubMed

    Degenhardt, Frauke; Seifert, Stephan; Szymczak, Silke

    2017-10-16

    Machine learning methods and in particular random forests are promising approaches for prediction based on high dimensional omics data sets. They provide variable importance measures to rank predictors according to their predictive power. If building a prediction model is the main goal of a study, often a minimal set of variables with good prediction performance is selected. However, if the objective is the identification of involved variables to find active networks and pathways, approaches that aim to select all relevant variables should be preferred. We evaluated several variable selection procedures based on simulated data as well as publicly available experimental methylation and gene expression data. Our comparison included the Boruta algorithm, the Vita method, recurrent relative variable importance, a permutation approach and its parametric variant (Altmann) as well as recursive feature elimination (RFE). In our simulation studies, Boruta was the most powerful approach, followed closely by the Vita method. Both approaches demonstrated similar stability in variable selection, while Vita was the most robust approach under a pure null model without any predictor variables related to the outcome. In the analysis of the different experimental data sets, Vita demonstrated slightly better stability in variable selection and was less computationally intensive than Boruta.In conclusion, we recommend the Boruta and Vita approaches for the analysis of high-dimensional data sets. Vita is considerably faster than Boruta and thus more suitable for large data sets, but only Boruta can also be applied in low-dimensional settings. © The Author 2017. Published by Oxford University Press.

  19. Review of Methods for Buildings Energy Performance Modelling

    NASA Astrophysics Data System (ADS)

    Krstić, Hrvoje; Teni, Mihaela

    2017-10-01

    Research presented in this paper gives a brief review of methods used for buildings energy performance modelling. This paper gives also a comprehensive review of the advantages and disadvantages of available methods as well as the input parameters used for modelling buildings energy performance. European Directive EPBD obliges the implementation of energy certification procedure which gives an insight on buildings energy performance via exiting energy certificate databases. Some of the methods for buildings energy performance modelling mentioned in this paper are developed by employing data sets of buildings which have already undergone an energy certification procedure. Such database is used in this paper where the majority of buildings in the database have already gone under some form of partial retrofitting - replacement of windows or installation of thermal insulation but still have poor energy performance. The case study presented in this paper utilizes energy certificates database obtained from residential units in Croatia (over 400 buildings) in order to determine the dependence between buildings energy performance and variables from database by using statistical dependencies tests. Building energy performance in database is presented with building energy efficiency rate (from A+ to G) which is based on specific annual energy needs for heating for referential climatic data [kWh/(m2a)]. Independent variables in database are surfaces and volume of the conditioned part of the building, building shape factor, energy used for heating, CO2 emission, building age and year of reconstruction. Research results presented in this paper give an insight in possibilities of methods used for buildings energy performance modelling. Further on it gives an analysis of dependencies between buildings energy performance as a dependent variable and independent variables from the database. Presented results could be used for development of new building energy performance predictive model.

  20. Analysis of Uncertainty and Variability in Finite Element Computational Models for Biomedical Engineering: Characterization and Propagation

    PubMed Central

    Mangado, Nerea; Piella, Gemma; Noailly, Jérôme; Pons-Prats, Jordi; Ballester, Miguel Ángel González

    2016-01-01

    Computational modeling has become a powerful tool in biomedical engineering thanks to its potential to simulate coupled systems. However, real parameters are usually not accurately known, and variability is inherent in living organisms. To cope with this, probabilistic tools, statistical analysis and stochastic approaches have been used. This article aims to review the analysis of uncertainty and variability in the context of finite element modeling in biomedical engineering. Characterization techniques and propagation methods are presented, as well as examples of their applications in biomedical finite element simulations. Uncertainty propagation methods, both non-intrusive and intrusive, are described. Finally, pros and cons of the different approaches and their use in the scientific community are presented. This leads us to identify future directions for research and methodological development of uncertainty modeling in biomedical engineering. PMID:27872840

  1. Analysis of Uncertainty and Variability in Finite Element Computational Models for Biomedical Engineering: Characterization and Propagation.

    PubMed

    Mangado, Nerea; Piella, Gemma; Noailly, Jérôme; Pons-Prats, Jordi; Ballester, Miguel Ángel González

    2016-01-01

    Computational modeling has become a powerful tool in biomedical engineering thanks to its potential to simulate coupled systems. However, real parameters are usually not accurately known, and variability is inherent in living organisms. To cope with this, probabilistic tools, statistical analysis and stochastic approaches have been used. This article aims to review the analysis of uncertainty and variability in the context of finite element modeling in biomedical engineering. Characterization techniques and propagation methods are presented, as well as examples of their applications in biomedical finite element simulations. Uncertainty propagation methods, both non-intrusive and intrusive, are described. Finally, pros and cons of the different approaches and their use in the scientific community are presented. This leads us to identify future directions for research and methodological development of uncertainty modeling in biomedical engineering.

  2. Variable selection for zero-inflated and overdispersed data with application to health care demand in Germany

    PubMed Central

    Wang, Zhu; Shuangge, Ma; Wang, Ching-Yun

    2017-01-01

    In health services and outcome research, count outcomes are frequently encountered and often have a large proportion of zeros. The zero-inflated negative binomial (ZINB) regression model has important applications for this type of data. With many possible candidate risk factors, this paper proposes new variable selection methods for the ZINB model. We consider maximum likelihood function plus a penalty including the least absolute shrinkage and selection operator (LASSO), smoothly clipped absolute deviation (SCAD) and minimax concave penalty (MCP). An EM (expectation-maximization) algorithm is proposed for estimating the model parameters and conducting variable selection simultaneously. This algorithm consists of estimating penalized weighted negative binomial models and penalized logistic models via the coordinated descent algorithm. Furthermore, statistical properties including the standard error formulae are provided. A simulation study shows that the new algorithm not only has more accurate or at least comparable estimation, also is more robust than the traditional stepwise variable selection. The proposed methods are applied to analyze the health care demand in Germany using an open-source R package mpath. PMID:26059498

  3. Selection of latent variables for multiple mixed-outcome models

    PubMed Central

    ZHOU, LING; LIN, HUAZHEN; SONG, XINYUAN; LI, YI

    2014-01-01

    Latent variable models have been widely used for modeling the dependence structure of multiple outcomes data. However, the formulation of a latent variable model is often unknown a priori, the misspecification will distort the dependence structure and lead to unreliable model inference. Moreover, multiple outcomes with varying types present enormous analytical challenges. In this paper, we present a class of general latent variable models that can accommodate mixed types of outcomes. We propose a novel selection approach that simultaneously selects latent variables and estimates parameters. We show that the proposed estimator is consistent, asymptotically normal and has the oracle property. The practical utility of the methods is confirmed via simulations as well as an application to the analysis of the World Values Survey, a global research project that explores peoples’ values and beliefs and the social and personal characteristics that might influence them. PMID:27642219

  4. A practical approach to Sasang constitutional diagnosis using vocal features

    PubMed Central

    2013-01-01

    Background Sasang constitutional medicine (SCM) is a type of tailored medicine that divides human beings into four Sasang constitutional (SC) types. Diagnosis of SC types is crucial to proper treatment in SCM. Voice characteristics have been used as an essential clue for diagnosing SC types. In the past, many studies tried to extract quantitative vocal features to make diagnosis models; however, these studies were flawed by limited data collected from one or a few sites, long recording time, and low accuracy. We propose a practical diagnosis model having only a few variables, which decreases model complexity. This in turn, makes our model appropriate for clinical applications. Methods A total of 2,341 participants’ voice recordings were used in making a SC classification model and to test the generalization ability of the model. Although the voice data consisted of five vowels and two repeated sentences per participant, we used only the sentence part for our study. A total of 21 features were extracted, and an advanced feature selection method—the least absolute shrinkage and selection operator (LASSO)—was applied to reduce the number of variables for classifier learning. A SC classification model was developed using multinomial logistic regression via LASSO. Results We compared the proposed classification model to the previous study, which used both sentences and five vowels from the same patient’s group. The classification accuracies for the test set were 47.9% and 40.4% for male and female, respectively. Our result showed that the proposed method was superior to the previous study in that it required shorter voice recordings, is more applicable to practical use, and had better generalization performance. Conclusions We proposed a practical SC classification method and showed that our model having fewer variables outperformed the model having many variables in the generalization test. We attempted to reduce the number of variables in two ways: 1) the initial number of candidate features was decreased by considering shorter voice recording, and 2) LASSO was introduced for reducing model complexity. The proposed method is suitable for an actual clinical environment. Moreover, we expect it to yield more stable results because of the model’s simplicity. PMID:24200041

  5. Estimation of mechanical properties of nanomaterials using artificial intelligence methods

    NASA Astrophysics Data System (ADS)

    Vijayaraghavan, V.; Garg, A.; Wong, C. H.; Tai, K.

    2014-09-01

    Computational modeling tools such as molecular dynamics (MD), ab initio, finite element modeling or continuum mechanics models have been extensively applied to study the properties of carbon nanotubes (CNTs) based on given input variables such as temperature, geometry and defects. Artificial intelligence techniques can be used to further complement the application of numerical methods in characterizing the properties of CNTs. In this paper, we have introduced the application of multi-gene genetic programming (MGGP) and support vector regression to formulate the mathematical relationship between the compressive strength of CNTs and input variables such as temperature and diameter. The predictions of compressive strength of CNTs made by these models are compared to those generated using MD simulations. The results indicate that MGGP method can be deployed as a powerful method for predicting the compressive strength of the carbon nanotubes.

  6. Effect of Adding McKenzie Syndrome, Centralization, Directional Preference, and Psychosocial Classification Variables to a Risk-Adjusted Model Predicting Functional Status Outcomes for Patients With Lumbar Impairments.

    PubMed

    Werneke, Mark W; Edmond, Susan; Deutscher, Daniel; Ward, Jason; Grigsby, David; Young, Michelle; McGill, Troy; McClenahan, Brian; Weinberg, Jon; Davidow, Amy L

    2016-09-01

    Study Design Retrospective cohort. Background Patient-classification subgroupings may be important prognostic factors explaining outcomes. Objectives To determine effects of adding classification variables (McKenzie syndrome and pain patterns, including centralization and directional preference; Symptom Checklist Back Pain Prediction Model [SCL BPPM]; and the Fear-Avoidance Beliefs Questionnaire subscales of work and physical activity) to a baseline risk-adjusted model predicting functional status (FS) outcomes. Methods Consecutive patients completed a battery of questionnaires that gathered information on 11 risk-adjustment variables. Physical therapists trained in Mechanical Diagnosis and Therapy methods classified each patient by McKenzie syndromes and pain pattern. Functional status was assessed at discharge by patient-reported outcomes. Only patients with complete data were included. Risk of selection bias was assessed. Prediction of discharge FS was assessed using linear stepwise regression models, allowing 13 variables to enter the model. Significant variables were retained in subsequent models. Model power (R(2)) and beta coefficients for model variables were estimated. Results Two thousand sixty-six patients with lumbar impairments were evaluated. Of those, 994 (48%), 10 (<1%), and 601 (29%) were excluded due to incomplete psychosocial data, McKenzie classification data, and missing FS at discharge, respectively. The final sample for analyses was 723 (35%). Overall R(2) for the baseline prediction FS model was 0.40. Adding classification variables to the baseline model did not result in significant increases in R(2). McKenzie syndrome or pain pattern explained 2.8% and 3.0% of the variance, respectively. When pain pattern and SCL BPPM were added simultaneously, overall model R(2) increased to 0.44. Although none of these increases in R(2) were significant, some classification variables were stronger predictors compared with some other variables included in the baseline model. Conclusion The small added prognostic capabilities identified when combining McKenzie or pain-pattern classifications with the SCL BPPM classification did not significantly improve prediction of FS outcomes in this study. Additional research is warranted to investigate the importance of classification variables compared with those used in the baseline model to maximize predictive power. Level of Evidence Prognosis, level 4. J Orthop Sports Phys Ther 2016;46(9):726-741. Epub 31 Jul 2016. doi:10.2519/jospt.2016.6266.

  7. Estimating trends in the global mean temperature record

    NASA Astrophysics Data System (ADS)

    Poppick, Andrew; Moyer, Elisabeth J.; Stein, Michael L.

    2017-06-01

    Given uncertainties in physical theory and numerical climate simulations, the historical temperature record is often used as a source of empirical information about climate change. Many historical trend analyses appear to de-emphasize physical and statistical assumptions: examples include regression models that treat time rather than radiative forcing as the relevant covariate, and time series methods that account for internal variability in nonparametric rather than parametric ways. However, given a limited data record and the presence of internal variability, estimating radiatively forced temperature trends in the historical record necessarily requires some assumptions. Ostensibly empirical methods can also involve an inherent conflict in assumptions: they require data records that are short enough for naive trend models to be applicable, but long enough for long-timescale internal variability to be accounted for. In the context of global mean temperatures, empirical methods that appear to de-emphasize assumptions can therefore produce misleading inferences, because the trend over the twentieth century is complex and the scale of temporal correlation is long relative to the length of the data record. We illustrate here how a simple but physically motivated trend model can provide better-fitting and more broadly applicable trend estimates and can allow for a wider array of questions to be addressed. In particular, the model allows one to distinguish, within a single statistical framework, between uncertainties in the shorter-term vs. longer-term response to radiative forcing, with implications not only on historical trends but also on uncertainties in future projections. We also investigate the consequence on inferred uncertainties of the choice of a statistical description of internal variability. While nonparametric methods may seem to avoid making explicit assumptions, we demonstrate how even misspecified parametric statistical methods, if attuned to the important characteristics of internal variability, can result in more accurate uncertainty statements about trends.

  8. Fnk Model of Cracking Rate Calculus for a Variable Asymmetry Coefficient

    NASA Astrophysics Data System (ADS)

    Roşca, Vâlcu; Miriţoiu, Cosmin Mihai

    2017-12-01

    In the process of materials fracture, a very important parameter to study is the cracking rate growth da/dN. This paper proposes an analysis of the cracking rate, in a comparative way, by using four mathematical models:1 - polynomial method, by using successive iterations according to the ASTM E647 standard; 2 - model that uses the Paris formula; 3 - Walker formula method; 4 - NASGRO model or Forman - Newman - Konig equation, abbreviated as FNK model. This model is used in the NASA programs studies. For the tests, CT type specimens were made from stainless steel, V2A class, 10TiNiCr175 mark, and loaded to a variable tensile test axial - eccentrically, with the asymmetry coefficients: R= 0.1, 0.3 and 0.5; at the 213K (-60°C) temperature. There are analyzed the cracking rates variations according to the above models, especially through FNK method, highlighting the asymmetry factor variation.

  9. The Role of Auxiliary Variables in Deterministic and Deterministic-Stochastic Spatial Models of Air Temperature in Poland

    NASA Astrophysics Data System (ADS)

    Szymanowski, Mariusz; Kryza, Maciej

    2017-02-01

    Our study examines the role of auxiliary variables in the process of spatial modelling and mapping of climatological elements, with air temperature in Poland used as an example. The multivariable algorithms are the most frequently applied for spatialization of air temperature, and their results in many studies are proved to be better in comparison to those obtained by various one-dimensional techniques. In most of the previous studies, two main strategies were used to perform multidimensional spatial interpolation of air temperature. First, it was accepted that all variables significantly correlated with air temperature should be incorporated into the model. Second, it was assumed that the more spatial variation of air temperature was deterministically explained, the better was the quality of spatial interpolation. The main goal of the paper was to examine both above-mentioned assumptions. The analysis was performed using data from 250 meteorological stations and for 69 air temperature cases aggregated on different levels: from daily means to 10-year annual mean. Two cases were considered for detailed analysis. The set of potential auxiliary variables covered 11 environmental predictors of air temperature. Another purpose of the study was to compare the results of interpolation given by various multivariable methods using the same set of explanatory variables. Two regression models: multiple linear (MLR) and geographically weighted (GWR) method, as well as their extensions to the regression-kriging form, MLRK and GWRK, respectively, were examined. Stepwise regression was used to select variables for the individual models and the cross-validation method was used to validate the results with a special attention paid to statistically significant improvement of the model using the mean absolute error (MAE) criterion. The main results of this study led to rejection of both assumptions considered. Usually, including more than two or three of the most significantly correlated auxiliary variables does not improve the quality of the spatial model. The effects of introduction of certain variables into the model were not climatologically justified and were seen on maps as unexpected and undesired artefacts. The results confirm, in accordance with previous studies, that in the case of air temperature distribution, the spatial process is non-stationary; thus, the local GWR model performs better than the global MLR if they are specified using the same set of auxiliary variables. If only GWR residuals are autocorrelated, the geographically weighted regression-kriging (GWRK) model seems to be optimal for air temperature spatial interpolation.

  10. Mapping forest inventory and analysis data attributes within the framework of double sampling for stratification design

    Treesearch

    David C. Chojnacky; Randolph H. Wynne; Christine E. Blinn

    2009-01-01

    Methodology is lacking to easily map Forest Inventory and Analysis (FIA) inventory statistics for all attribute variables without having to develop separate models and methods for each variable. We developed a mapping method that can directly transfer tabular data to a map on which pixels can be added any way desired to estimate carbon (or any other variable) for a...

  11. Multilevel Models for Binary Data

    ERIC Educational Resources Information Center

    Powers, Daniel A.

    2012-01-01

    The methods and models for categorical data analysis cover considerable ground, ranging from regression-type models for binary and binomial data, count data, to ordered and unordered polytomous variables, as well as regression models that mix qualitative and continuous data. This article focuses on methods for binary or binomial data, which are…

  12. Forecasting seasonal hydrologic response in major river basins

    NASA Astrophysics Data System (ADS)

    Bhuiyan, A. M.

    2014-05-01

    Seasonal precipitation variation due to natural climate variation influences stream flow and the apparent frequency and severity of extreme hydrological conditions such as flood and drought. To study hydrologic response and understand the occurrence of extreme hydrological events, the relevant forcing variables must be identified. This study attempts to assess and quantify the historical occurrence and context of extreme hydrologic flow events and quantify the relation between relevant climate variables. Once identified, the flow data and climate variables are evaluated to identify the primary relationship indicators of hydrologic extreme event occurrence. Existing studies focus on developing basin-scale forecasting techniques based on climate anomalies in El Nino/La Nina episodes linked to global climate. Building on earlier work, the goal of this research is to quantify variations in historical river flows at seasonal temporal-scale, and regional to continental spatial-scale. The work identifies and quantifies runoff variability of major river basins and correlates flow with environmental forcing variables such as El Nino, La Nina, sunspot cycle. These variables are expected to be the primary external natural indicators of inter-annual and inter-seasonal patterns of regional precipitation and river flow. Relations between continental-scale hydrologic flows and external climate variables are evaluated through direct correlations in a seasonal context with environmental phenomenon such as sun spot numbers (SSN), Southern Oscillation Index (SOI), and Pacific Decadal Oscillation (PDO). Methods including stochastic time series analysis and artificial neural networks are developed to represent the seasonal variability evident in the historical records of river flows. River flows are categorized into low, average and high flow levels to evaluate and simulate flow variations under associated climate variable variations. Results demonstrated not any particular method is suited to represent scenarios leading to extreme flow conditions. For selected flow scenarios, the persistence model performance may be comparable to more complex multivariate approaches, and complex methods did not always improve flow estimation. Overall model performance indicates inclusion of river flows and forcing variables on average improve model extreme event forecasting skills. As a means to further refine the flow estimation, an ensemble forecast method is implemented to provide a likelihood-based indication of expected river flow magnitude and variability. Results indicate seasonal flow variations are well-captured in the ensemble range, therefore the ensemble approach can often prove efficient in estimating extreme river flow conditions. The discriminant prediction approach, a probabilistic measure to forecast streamflow, is also adopted to derive model performance. Results show the efficiency of the method in terms of representing uncertainties in the forecasts.

  13. Estimation of Staphylococcus aureus growth parameters from turbidity data: characterization of strain variation and comparison of methods.

    PubMed

    Lindqvist, R

    2006-07-01

    Turbidity methods offer possibilities for generating data required for addressing microorganism variability in risk modeling given that the results of these methods correspond to those of viable count methods. The objectives of this study were to identify the best approach for determining growth parameters based on turbidity data and use of a Bioscreen instrument and to characterize variability in growth parameters of 34 Staphylococcus aureus strains of different biotypes isolated from broiler carcasses. Growth parameters were estimated by fitting primary growth models to turbidity growth curves or to detection times of serially diluted cultures either directly or by using an analysis of variance (ANOVA) approach. The maximum specific growth rates in chicken broth at 17 degrees C estimated by time to detection methods were in good agreement with viable count estimates, whereas growth models (exponential and Richards) underestimated growth rates. Time to detection methods were selected for strain characterization. The variation of growth parameters among strains was best described by either the logistic or lognormal distribution, but definitive conclusions require a larger data set. The distribution of the physiological state parameter ranged from 0.01 to 0.92 and was not significantly different from a normal distribution. Strain variability was important, and the coefficient of variation of growth parameters was up to six times larger among strains than within strains. It is suggested to apply a time to detection (ANOVA) approach using turbidity measurements for convenient and accurate estimation of growth parameters. The results emphasize the need to consider implications of strain variability for predictive modeling and risk assessment.

  14. A Priori Subgrid Analysis of Temporal Mixing Layers with Evaporating Droplets

    NASA Technical Reports Server (NTRS)

    Okongo, Nora; Bellan, Josette

    1999-01-01

    Subgrid analysis of a transitional temporal mixing layer with evaporating droplets has been performed using three sets of results from a Direct Numerical Simulation (DNS) database, with Reynolds numbers (based on initial vorticity thickness) as large as 600 and with droplet mass loadings as large as 0.5. In the DNS, the gas phase is computed using a Eulerian formulation, with Lagrangian droplet tracking. The Large Eddy Simulation (LES) equations corresponding to the DNS are first derived, and key assumptions in deriving them are first confirmed by computing the terms using the DNS database. Since LES of this flow requires the computation of unfiltered gas-phase variables at droplet locations from filtered gas-phase variables at the grid points, it is proposed to model these by assuming the gas-phase variables to be the sum of the filtered variables and a correction based on the filtered standard deviation; this correction is then computed from the Subgrid Scale (SGS) standard deviation. This model predicts the unfiltered variables at droplet locations considerably better than simply interpolating the filtered variables. Three methods are investigated for modeling the SGS standard deviation: the Smagorinsky approach, the Gradient model and the Scale-Similarity formulation. When the proportionality constant inherent in the SGS models is properly calculated, the Gradient and Scale-Similarity methods give results in excellent agreement with the DNS.

  15. Identification of Coffee Varieties Using Laser-Induced Breakdown Spectroscopy and Chemometrics.

    PubMed

    Zhang, Chu; Shen, Tingting; Liu, Fei; He, Yong

    2017-12-31

    We linked coffee quality to its different varieties. This is of interest because the identification of coffee varieties should help coffee trading and consumption. Laser-induced breakdown spectroscopy (LIBS) combined with chemometric methods was used to identify coffee varieties. Wavelet transform (WT) was used to reduce LIBS spectra noise. Partial least squares-discriminant analysis (PLS-DA), radial basis function neural network (RBFNN), and support vector machine (SVM) were used to build classification models. Loadings of principal component analysis (PCA) were used to select the spectral variables contributing most to the identification of coffee varieties. Twenty wavelength variables corresponding to C I, Mg I, Mg II, Al II, CN, H, Ca II, Fe I, K I, Na I, N I, and O I were selected. PLS-DA, RBFNN, and SVM models on selected wavelength variables showed acceptable results. SVM and RBFNN models performed better with a classification accuracy of over 80% in the prediction set, for both full spectra and the selected variables. The overall results indicated that it was feasible to use LIBS and chemometric methods to identify coffee varieties. For further studies, more samples are needed to produce robust classification models, research should be conducted on which methods to use to select spectral peaks that correspond to the elements contributing most to identification, and the methods for acquiring stable spectra should also be studied.

  16. Correction of the significance level when attempting multiple transformations of an explanatory variable in generalized linear models

    PubMed Central

    2013-01-01

    Background In statistical modeling, finding the most favorable coding for an exploratory quantitative variable involves many tests. This process involves multiple testing problems and requires the correction of the significance level. Methods For each coding, a test on the nullity of the coefficient associated with the new coded variable is computed. The selected coding corresponds to that associated with the largest statistical test (or equivalently the smallest pvalue). In the context of the Generalized Linear Model, Liquet and Commenges (Stat Probability Lett,71:33–38,2005) proposed an asymptotic correction of the significance level. This procedure, based on the score test, has been developed for dichotomous and Box-Cox transformations. In this paper, we suggest the use of resampling methods to estimate the significance level for categorical transformations with more than two levels and, by definition those that involve more than one parameter in the model. The categorical transformation is a more flexible way to explore the unknown shape of the effect between an explanatory and a dependent variable. Results The simulations we ran in this study showed good performances of the proposed methods. These methods were illustrated using the data from a study of the relationship between cholesterol and dementia. Conclusion The algorithms were implemented using R, and the associated CPMCGLM R package is available on the CRAN. PMID:23758852

  17. Locally optimal control under unknown dynamics with learnt cost function: application to industrial robot positioning

    NASA Astrophysics Data System (ADS)

    Guérin, Joris; Gibaru, Olivier; Thiery, Stéphane; Nyiri, Eric

    2017-01-01

    Recent methods of Reinforcement Learning have enabled to solve difficult, high dimensional, robotic tasks under unknown dynamics using iterative Linear Quadratic Gaussian control theory. These algorithms are based on building a local time-varying linear model of the dynamics from data gathered through interaction with the environment. In such tasks, the cost function is often expressed directly in terms of the state and control variables so that it can be locally quadratized to run the algorithm. If the cost is expressed in terms of other variables, a model is required to compute the cost function from the variables manipulated. We propose a method to learn the cost function directly from the data, in the same way as for the dynamics. This way, the cost function can be defined in terms of any measurable quantity and thus can be chosen more appropriately for the task to be carried out. With our method, any sensor information can be used to design the cost function. We demonstrate the efficiency of this method through simulating, with the V-REP software, the learning of a Cartesian positioning task on several industrial robots with different characteristics. The robots are controlled in joint space and no model is provided a priori. Our results are compared with another model free technique, consisting in writing the cost function as a state variable.

  18. Identification of Coffee Varieties Using Laser-Induced Breakdown Spectroscopy and Chemometrics

    PubMed Central

    Zhang, Chu; Shen, Tingting

    2017-01-01

    We linked coffee quality to its different varieties. This is of interest because the identification of coffee varieties should help coffee trading and consumption. Laser-induced breakdown spectroscopy (LIBS) combined with chemometric methods was used to identify coffee varieties. Wavelet transform (WT) was used to reduce LIBS spectra noise. Partial least squares-discriminant analysis (PLS-DA), radial basis function neural network (RBFNN), and support vector machine (SVM) were used to build classification models. Loadings of principal component analysis (PCA) were used to select the spectral variables contributing most to the identification of coffee varieties. Twenty wavelength variables corresponding to C I, Mg I, Mg II, Al II, CN, H, Ca II, Fe I, K I, Na I, N I, and O I were selected. PLS-DA, RBFNN, and SVM models on selected wavelength variables showed acceptable results. SVM and RBFNN models performed better with a classification accuracy of over 80% in the prediction set, for both full spectra and the selected variables. The overall results indicated that it was feasible to use LIBS and chemometric methods to identify coffee varieties. For further studies, more samples are needed to produce robust classification models, research should be conducted on which methods to use to select spectral peaks that correspond to the elements contributing most to identification, and the methods for acquiring stable spectra should also be studied. PMID:29301228

  19. Direct Regularized Estimation of Retinal Vascular Oxygen Tension Based on an Experimental Model

    PubMed Central

    Yildirim, Isa; Ansari, Rashid; Yetik, I. Samil; Shahidi, Mahnaz

    2014-01-01

    Phosphorescence lifetime imaging is commonly used to generate oxygen tension maps of retinal blood vessels by classical least squares (LS) estimation method. A spatial regularization method was later proposed and provided improved results. However, both methods obtain oxygen tension values from the estimates of intermediate variables, and do not yield an optimum estimate of oxygen tension values, due to their nonlinear dependence on the ratio of intermediate variables. In this paper, we provide an improved solution by devising a regularized direct least squares (RDLS) method that exploits available knowledge in studies that provide models of oxygen tension in retinal arteries and veins, unlike the earlier regularized LS approach where knowledge about intermediate variables is limited. The performance of the proposed RDLS method is evaluated by investigating and comparing the bias, variance, oxygen tension maps, 1-D profiles of arterial oxygen tension, and mean absolute error with those of earlier methods, and its superior performance both quantitatively and qualitatively is demonstrated. PMID:23732915

  20. Optimal sensor placement for control of a supersonic mixed-compression inlet with variable geometry

    NASA Astrophysics Data System (ADS)

    Moore, Kenneth Thomas

    A method of using fluid dynamics models for the generation of models that are useable for control design and analysis is investigated. The problem considered is the control of the normal shock location in the VDC inlet, which is a mixed-compression, supersonic, variable-geometry inlet of a jet engine. A quasi-one-dimensional set of fluid equations incorporating bleed and moving walls is developed. An object-oriented environment is developed for simulation of flow systems under closed-loop control. A public interface between the controller and fluid classes is defined. A linear model representing the dynamics of the VDC inlet is developed from the finite difference equations, and its eigenstructure is analyzed. The order of this model is reduced using the square root balanced model reduction method to produce a reduced-order linear model that is suitable for control design and analysis tasks. A modification to this method that improves the accuracy of the reduced-order linear model for the purpose of sensor placement is presented and analyzed. The reduced-order linear model is used to develop a sensor placement method that quantifies as a function of the sensor location the ability of a sensor to provide information on the variable of interest for control. This method is used to develop a sensor placement metric for the VDC inlet. The reduced-order linear model is also used to design a closed loop control system to control the shock position in the VDC inlet. The object-oriented simulation code is used to simulate the nonlinear fluid equations under closed-loop control.

  1. The effects of modeling instruction on high school physics academic achievement

    NASA Astrophysics Data System (ADS)

    Wright, Tiffanie L.

    The purpose of this study was to explore whether Modeling Instruction, compared to traditional lecturing, is an effective instructional method to promote academic achievement in selected high school physics classes at a rural middle Tennessee high school. This study used an ex post facto , quasi-experimental research methodology. The independent variables in this study were the instructional methods of teaching. The treatment variable was Modeling Instruction and the control variable was traditional lecture instruction. The Treatment Group consisted of participants in Physical World Concepts who received Modeling Instruction. The Control Group consisted of participants in Physical Science who received traditional lecture instruction. The dependent variable was gains scores on the Force Concepts Inventory (FCI). The participants for this study were 133 students each in both the Treatment and Control Groups (n = 266), who attended a public, high school in rural middle Tennessee. The participants were administered the Force Concepts Inventory (FCI) prior to being taught the mechanics of physics. The FCI data were entered into the computer-based Statistical Package for the Social Science (SPSS). Two independent samples t-tests were conducted to answer the research questions. There was a statistically significant difference between the treatment and control groups concerning the instructional method. Modeling Instructional methods were found to be effective in increasing the academic achievement of students in high school physics. There was no statistically significant difference between FCI gains scores for gender. Gender was found to have no effect on the academic achievement of students in high school physics classes. However, even though there was not a statistically significant difference, female students' gains scores were higher than male students' gains scores when Modeling Instructional methods of teaching were used. Based on these findings, it is recommended that high school science teachers should use Modeling Instructional methods of teaching daily in their classrooms. A recommendation for further research is to expand the Modeling Instructional methods of teaching into different content areas, (i.e., reading and language arts) to explore academic achievement gains.

  2. A network-based approach for semi-quantitative knowledge mining and its application to yield variability

    NASA Astrophysics Data System (ADS)

    Schauberger, Bernhard; Rolinski, Susanne; Müller, Christoph

    2016-12-01

    Variability of crop yields is detrimental for food security. Under climate change its amplitude is likely to increase, thus it is essential to understand the underlying causes and mechanisms. Crop models are the primary tool to project future changes in crop yields under climate change. A systematic overview of drivers and mechanisms of crop yield variability (YV) can thus inform crop model development and facilitate improved understanding of climate change impacts on crop yields. Yet there is a vast body of literature on crop physiology and YV, which makes a prioritization of mechanisms for implementation in models challenging. Therefore this paper takes on a novel approach to systematically mine and organize existing knowledge from the literature. The aim is to identify important mechanisms lacking in models, which can help to set priorities in model improvement. We structure knowledge from the literature in a semi-quantitative network. This network consists of complex interactions between growing conditions, plant physiology and crop yield. We utilize the resulting network structure to assign relative importance to causes of YV and related plant physiological processes. As expected, our findings confirm existing knowledge, in particular on the dominant role of temperature and precipitation, but also highlight other important drivers of YV. More importantly, our method allows for identifying the relevant physiological processes that transmit variability in growing conditions to variability in yield. We can identify explicit targets for the improvement of crop models. The network can additionally guide model development by outlining complex interactions between processes and by easily retrieving quantitative information for each of the 350 interactions. We show the validity of our network method as a structured, consistent and scalable dictionary of literature. The method can easily be applied to many other research fields.

  3. Can Statistical Machine Learning Algorithms Help for Classification of Obstructive Sleep Apnea Severity to Optimal Utilization of Polysomnography Resources?

    PubMed

    Bozkurt, Selen; Bostanci, Asli; Turhan, Murat

    2017-08-11

    The goal of this study is to evaluate the results of machine learning methods for the classification of OSA severity of patients with suspected sleep disorder breathing as normal, mild, moderate and severe based on non-polysomnographic variables: 1) clinical data, 2) symptoms and 3) physical examination. In order to produce classification models for OSA severity, five different machine learning methods (Bayesian network, Decision Tree, Random Forest, Neural Networks and Logistic Regression) were trained while relevant variables and their relationships were derived empirically from observed data. Each model was trained and evaluated using 10-fold cross-validation and to evaluate classification performances of all methods, true positive rate (TPR), false positive rate (FPR), Positive Predictive Value (PPV), F measure and Area Under Receiver Operating Characteristics curve (ROC-AUC) were used. Results of 10-fold cross validated tests with different variable settings promisingly indicated that the OSA severity of suspected OSA patients can be classified, using non-polysomnographic features, with 0.71 true positive rate as the highest and, 0.15 false positive rate as the lowest, respectively. Moreover, the test results of different variables settings revealed that the accuracy of the classification models was significantly improved when physical examination variables were added to the model. Study results showed that machine learning methods can be used to estimate the probabilities of no, mild, moderate, and severe obstructive sleep apnea and such approaches may improve accurate initial OSA screening and help referring only the suspected moderate or severe OSA patients to sleep laboratories for the expensive tests.

  4. Do College Faculty Embrace Web 2.0 Technology?

    ERIC Educational Resources Information Center

    Siha, Samia M.; Bell, Reginald Lamar; Roebuck, Deborah

    2016-01-01

    The authors sought to determine if Rogers's Innovation Decision Process model could analyze Web 2.0 usage within the collegiate environment. The key independent variables studied in relationship to this model were gender, faculty rank, course content delivery method, and age. Chi-square nonparametric tests on the independent variables across…

  5. A Mixed-Methodological Examination of Investment Model Variables among Abused and Nonabused College Women

    ERIC Educational Resources Information Center

    Dardis, Christina M.; Kelley, Erika L.; Edwards, Katie M.; Gidycz, Christine A.

    2013-01-01

    Objective: This study assessed abused and nonabused women's perceptions of Investment Model (IM) variables (ie, relationship investment, satisfaction, commitment, quality of alternatives) utilizing a mixed-methods design. Participants: Participants included 102 college women, approximately half of whom were in abusive dating relationships.…

  6. Evaluation of Validity and Reliability for Hierarchical Scales Using Latent Variable Modeling

    ERIC Educational Resources Information Center

    Raykov, Tenko; Marcoulides, George A.

    2012-01-01

    A latent variable modeling method is outlined, which accomplishes estimation of criterion validity and reliability for a multicomponent measuring instrument with hierarchical structure. The approach provides point and interval estimates for the scale criterion validity and reliability coefficients, and can also be used for testing composite or…

  7. Method for assessing coal-floor water-inrush risk based on the variable-weight model and unascertained measure theory

    NASA Astrophysics Data System (ADS)

    Wu, Qiang; Zhao, Dekang; Wang, Yang; Shen, Jianjun; Mu, Wenping; Liu, Honglei

    2017-11-01

    Water inrush from coal-seam floors greatly threatens mining safety in North China and is a complex process controlled by multiple factors. This study presents a mathematical assessment system for coal-floor water-inrush risk based on the variable-weight model (VWM) and unascertained measure theory (UMT). In contrast to the traditional constant-weight model (CWM), which assigns a fixed weight to each factor, the VWM varies with the factor-state value. The UMT employs the confidence principle, which is more effective in ordered partition problems than the maximum membership principle adopted in the former mathematical theory. The method is applied to the Datang Tashan Coal Mine in North China. First, eight main controlling factors are selected to construct the comprehensive evaluation index system. Subsequently, an incentive-penalty variable-weight model is built to calculate the variable weights of each factor. Then, the VWM-UMT model is established using the quantitative risk-grade divide of each factor according to the UMT. On this basis, the risk of coal-floor water inrush in Tashan Mine No. 8 is divided into five grades. For comparison, the CWM is also adopted for the risk assessment, and a differences distribution map is obtained between the two methods. Finally, the verification of water-inrush points indicates that the VWM-UMT model is powerful and more feasible and reasonable. The model has great potential and practical significance in future engineering applications.

  8. Regression Is a Univariate General Linear Model Subsuming Other Parametric Methods as Special Cases.

    ERIC Educational Resources Information Center

    Vidal, Sherry

    Although the concept of the general linear model (GLM) has existed since the 1960s, other univariate analyses such as the t-test and the analysis of variance models have remained popular. The GLM produces an equation that minimizes the mean differences of independent variables as they are related to a dependent variable. From a computer printout…

  9. Applying multibeam sonar and mathematical modeling for mapping seabed substrate and biota of offshore shallows

    NASA Astrophysics Data System (ADS)

    Herkül, Kristjan; Peterson, Anneliis; Paekivi, Sander

    2017-06-01

    Both basic science and marine spatial planning are in a need of high resolution spatially continuous data on seabed habitats and biota. As conventional point-wise sampling is unable to cover large spatial extents in high detail, it must be supplemented with remote sensing and modeling in order to fulfill the scientific and management needs. The combined use of in situ sampling, sonar scanning, and mathematical modeling is becoming the main method for mapping both abiotic and biotic seabed features. Further development and testing of the methods in varying locations and environmental settings is essential for moving towards unified and generally accepted methodology. To fill the relevant research gap in the Baltic Sea, we used multibeam sonar and mathematical modeling methods - generalized additive models (GAM) and random forest (RF) - together with underwater video to map seabed substrate and epibenthos of offshore shallows. In addition to testing the general applicability of the proposed complex of techniques, the predictive power of different sonar-based variables and modeling algorithms were tested. Mean depth, followed by mean backscatter, were the most influential variables in most of the models. Generally, mean values of sonar-based variables had higher predictive power than their standard deviations. The predictive accuracy of RF was higher than that of GAM. To conclude, we found the method to be feasible and with predictive accuracy similar to previous studies of sonar-based mapping.

  10. Evaluation of model-based methods in estimating respiratory mechanics in the presence of variable patient effort.

    PubMed

    Redmond, Daniel P; Chiew, Yeong Shiong; Major, Vincent; Chase, J Geoffrey

    2016-09-23

    Monitoring of respiratory mechanics is required for guiding patient-specific mechanical ventilation settings in critical care. Many models of respiratory mechanics perform poorly in the presence of variable patient effort. Typical modelling approaches either attempt to mitigate the effect of the patient effort on the airway pressure waveforms, or attempt to capture the size and shape of the patient effort. This work analyses a range of methods to identify respiratory mechanics in volume controlled ventilation modes when there is patient effort. The models are compared using 4 Datasets, each with a sample of 30 breaths before, and 2-3 minutes after sedation has been administered. The sedation will reduce patient efforts, but the underlying pulmonary mechanical properties are unlikely to change during this short time. Model identified parameters from breathing cycles with patient effort are compared to breathing cycles that do not have patient effort. All models have advantages and disadvantages, so model selection may be specific to the respiratory mechanics application. However, in general, the combined method of iterative interpolative pressure reconstruction, and stacking multiple consecutive breaths together has the best performance over the Dataset. The variability of identified elastance when there is patient effort is the lowest with this method, and there is little systematic offset in identified mechanics when sedation is administered. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  11. Dynamic Assessment of Water Quality Based on a Variable Fuzzy Pattern Recognition Model

    PubMed Central

    Xu, Shiguo; Wang, Tianxiang; Hu, Suduan

    2015-01-01

    Water quality assessment is an important foundation of water resource protection and is affected by many indicators. The dynamic and fuzzy changes of water quality lead to problems for proper assessment. This paper explores a method which is in accordance with the water quality changes. The proposed method is based on the variable fuzzy pattern recognition (VFPR) model and combines the analytic hierarchy process (AHP) model with the entropy weight (EW) method. The proposed method was applied to dynamically assess the water quality of Biliuhe Reservoir (Dailan, China). The results show that the water quality level is between levels 2 and 3 and worse in August or September, caused by the increasing water temperature and rainfall. Weights and methods are compared and random errors of the values of indicators are analyzed. It is concluded that the proposed method has advantages of dynamism, fuzzification and stability by considering the interval influence of multiple indicators and using the average level characteristic values of four models as results. PMID:25689998

  12. Dynamic assessment of water quality based on a variable fuzzy pattern recognition model.

    PubMed

    Xu, Shiguo; Wang, Tianxiang; Hu, Suduan

    2015-02-16

    Water quality assessment is an important foundation of water resource protection and is affected by many indicators. The dynamic and fuzzy changes of water quality lead to problems for proper assessment. This paper explores a method which is in accordance with the water quality changes. The proposed method is based on the variable fuzzy pattern recognition (VFPR) model and combines the analytic hierarchy process (AHP) model with the entropy weight (EW) method. The proposed method was applied to dynamically assess the water quality of Biliuhe Reservoir (Dailan, China). The results show that the water quality level is between levels 2 and 3 and worse in August or September, caused by the increasing water temperature and rainfall. Weights and methods are compared and random errors of the values of indicators are analyzed. It is concluded that the proposed method has advantages of dynamism, fuzzification and stability by considering the interval influence of multiple indicators and using the average level characteristic values of four models as results.

  13. City scale pollen concentration variability

    NASA Astrophysics Data System (ADS)

    van der Molen, Michiel; van Vliet, Arnold; Krol, Maarten

    2016-04-01

    Pollen are emitted in the atmosphere both in the country-side and in cities. Yet the majority of the population is exposed to pollen in cities. Allergic reactions may be induced by short-term exposure to pollen. This raises the question how variable pollen concentration in cities are in temporally and spatially, and how much of the pollen in cities are actually produced in the urban region itself. We built a high resolution (1 × 1 km) pollen dispersion model based on WRF-Chem to study a city's pollen budget and the spatial and temporal variability in concentration. It shows that the concentrations are highly variable, as a result of source distribution, wind direction and boundary layer mixing, as well as the release rate as a function of temperature, turbulence intensity and humidity. Hay Fever Forecasts based on such high resolution emission and physical dispersion modelling surpass traditional hay fever warning methods based on temperature sum methods. The model gives new insights in concentration variability, personal and community level exposure and prevention. The model will be developped into a new forecast tool to serve allergic people to minimize their exposure and reduce nuisance, coast of medication and sick leave. This is an innovative approach in hay fever warning systems.

  14. Integrated Approach to Inform the New York City Water Supply System Coupling SAR Remote Sensing Observations and the SWAT Watershed Model

    NASA Astrophysics Data System (ADS)

    Tesser, D.; Hoang, L.; McDonald, K. C.

    2017-12-01

    Efforts to improve municipal water supply systems increasingly rely on an ability to elucidate variables that drive hydrologic dynamics within large watersheds. However, fundamental model variables such as precipitation, soil moisture, evapotranspiration, and soil freeze/thaw state remain difficult to measure empirically across large, heterogeneous watersheds. Satellite remote sensing presents a method to validate these spatially and temporally dynamic variables as well as better inform the watershed models that monitor the water supply for many of the planet's most populous urban centers. PALSAR 2 L-band, Sentinel 1 C-band, and SMAP L-band scenes covering the Cannonsville branch of the New York City (NYC) water supply watershed were obtained for the period of March 2015 - October 2017. The SAR data provides information on soil moisture, free/thaw state, seasonal surface inundation, and variable source areas within the study site. Integrating the remote sensing products with watershed model outputs and ground survey data improves the representation of related processes in the Soil and Water Assessment Tool (SWAT) utilized to monitor the NYC water supply. PALSAR 2 supports accurate mapping of the extent of variable source areas while Sentinel 1 presents a method to model the timing and magnitude of snowmelt runoff events. SMAP Active Radar soil moisture product directly validates SWAT outputs at the subbasin level. This blended approach verifies the distribution of soil wetness classes within the watershed that delineate Hydrologic Response Units (HRUs) in the modified SWAT-Hillslope. The research expands the ability to model the NYC water supply source beyond a subset of the watershed while also providing high resolution information across a larger spatial scale. The global availability of these remote sensing products provides a method to capture fundamental hydrology variables in regions where current modeling efforts and in situ data remain limited.

  15. Soil variability in engineering applications

    NASA Astrophysics Data System (ADS)

    Vessia, Giovanna

    2014-05-01

    Natural geomaterials, as soils and rocks, show spatial variability and heterogeneity of physical and mechanical properties. They can be measured by in field and laboratory testing. The heterogeneity concerns different values of litho-technical parameters pertaining similar lithological units placed close to each other. On the contrary, the variability is inherent to the formation and evolution processes experienced by each geological units (homogeneous geomaterials on average) and captured as a spatial structure of fluctuation of physical property values about their mean trend, e.g. the unit weight, the hydraulic permeability, the friction angle, the cohesion, among others. The preceding spatial variations shall be managed by engineering models to accomplish reliable designing of structures and infrastructures. Materon (1962) introduced the Geostatistics as the most comprehensive tool to manage spatial correlation of parameter measures used in a wide range of earth science applications. In the field of the engineering geology, Vanmarcke (1977) developed the first pioneering attempts to describe and manage the inherent variability in geomaterials although Terzaghi (1943) already highlighted that spatial fluctuations of physical and mechanical parameters used in geotechnical designing cannot be neglected. A few years later, Mandelbrot (1983) and Turcotte (1986) interpreted the internal arrangement of geomaterial according to Fractal Theory. In the same years, Vanmarcke (1983) proposed the Random Field Theory providing mathematical tools to deal with inherent variability of each geological units or stratigraphic succession that can be resembled as one material. In this approach, measurement fluctuations of physical parameters are interpreted through the spatial variability structure consisting in the correlation function and the scale of fluctuation. Fenton and Griffiths (1992) combined random field simulation with the finite element method to produce the Random Finite Element Method (RFEM). This method has been used to investigate the random behavior of soils in the context of a variety of classical geotechnical problems. Afterward, some following studies collected the worldwide variability values of many technical parameters of soils (Phoon and Kulhawy 1999a) and their spatial correlation functions (Phoon and Kulhawy 1999b). In Italy, Cherubini et al. (2007) calculated the spatial variability structure of sandy and clayey soils from the standard cone penetration test readings. The large extent of the worldwide measured spatial variability of soils and rocks heavily affects the reliability of geotechnical designing as well as other uncertainties introduced by testing devices and engineering models. So far, several methods have been provided to deal with the preceding sources of uncertainties in engineering designing models (e.g. First Order Reliability Method, Second Order Reliability Method, Response Surface Method, High Dimensional Model Representation, etc.). Nowadays, the efforts in this field have been focusing on (1) measuring spatial variability of different rocks and soils and (2) developing numerical models that take into account the spatial variability as additional physical variable. References Cherubini C., Vessia G. and Pula W. 2007. Statistical soil characterization of Italian sites for reliability analyses. Proc. 2nd Int. Workshop. on Characterization and Engineering Properties of Natural Soils, 3-4: 2681-2706. Griffiths D.V. and Fenton G.A. 1993. Seepage beneath water retaining structures founded on spatially random soil, Géotechnique, 43(6): 577-587. Mandelbrot B.B. 1983. The Fractal Geometry of Nature. San Francisco: W H Freeman. Matheron G. 1962. Traité de Géostatistique appliquée. Tome 1, Editions Technip, Paris, 334 p. Phoon K.K. and Kulhawy F.H. 1999a. Characterization of geotechnical variability. Can Geotech J, 36(4): 612-624. Phoon K.K. and Kulhawy F.H. 1999b. Evaluation of geotechnical property variability. Can Geotech J, 36(4): 625-639. Terzaghi K. 1943. Theoretical Soil Mechanics. New York: John Wiley and Sons. Turcotte D.L. 1986. Fractals and fragmentation. J Geophys Res, 91: 1921-1926. Vanmarcke E.H. 1977. Probabilistic modeling of soil profiles. J Geotech Eng Div, ASCE, 103: 1227-1246. Vanmarcke E.H. 1983. Random fields: analysis and synthesis. MIT Press, Cambridge.

  16. Predictive models for Escherichia coli concentrations at inland lake beaches and relationship of model variables to pathogen detection

    USGS Publications Warehouse

    Francy, Donna S.; Stelzer, Erin A.; Duris, Joseph W.; Brady, Amie M.G.; Harrison, John H.; Johnson, Heather E.; Ware, Michael W.

    2013-01-01

    Predictive models, based on environmental and water quality variables, have been used to improve the timeliness and accuracy of recreational water quality assessments, but their effectiveness has not been studied in inland waters. Sampling at eight inland recreational lakes in Ohio was done in order to investigate using predictive models for Escherichia coli and to understand the links between E. coli concentrations, predictive variables, and pathogens. Based upon results from 21 beach sites, models were developed for 13 sites, and the most predictive variables were rainfall, wind direction and speed, turbidity, and water temperature. Models were not developed at sites where the E. coli standard was seldom exceeded. Models were validated at nine sites during an independent year. At three sites, the model resulted in increased correct responses, sensitivities, and specificities compared to use of the previous day's E. coli concentration (the current method). Drought conditions during the validation year precluded being able to adequately assess model performance at most of the other sites. Cryptosporidium, adenovirus, eaeA (E. coli), ipaH (Shigella), and spvC (Salmonella) were found in at least 20% of samples collected for pathogens at five sites. The presence or absence of the three bacterial genes was related to some of the model variables but was not consistently related to E. coli concentrations. Predictive models were not effective at all inland lake sites; however, their use at two lakes with high swimmer densities will provide better estimates of public health risk than current methods and will be a valuable resource for beach managers and the public.

  17. Predictive models for Escherichia coli concentrations at inland lake beaches and relationship of model variables to pathogen detection.

    PubMed

    Francy, Donna S; Stelzer, Erin A; Duris, Joseph W; Brady, Amie M G; Harrison, John H; Johnson, Heather E; Ware, Michael W

    2013-03-01

    Predictive models, based on environmental and water quality variables, have been used to improve the timeliness and accuracy of recreational water quality assessments, but their effectiveness has not been studied in inland waters. Sampling at eight inland recreational lakes in Ohio was done in order to investigate using predictive models for Escherichia coli and to understand the links between E. coli concentrations, predictive variables, and pathogens. Based upon results from 21 beach sites, models were developed for 13 sites, and the most predictive variables were rainfall, wind direction and speed, turbidity, and water temperature. Models were not developed at sites where the E. coli standard was seldom exceeded. Models were validated at nine sites during an independent year. At three sites, the model resulted in increased correct responses, sensitivities, and specificities compared to use of the previous day's E. coli concentration (the current method). Drought conditions during the validation year precluded being able to adequately assess model performance at most of the other sites. Cryptosporidium, adenovirus, eaeA (E. coli), ipaH (Shigella), and spvC (Salmonella) were found in at least 20% of samples collected for pathogens at five sites. The presence or absence of the three bacterial genes was related to some of the model variables but was not consistently related to E. coli concentrations. Predictive models were not effective at all inland lake sites; however, their use at two lakes with high swimmer densities will provide better estimates of public health risk than current methods and will be a valuable resource for beach managers and the public.

  18. Multivariate fault isolation of batch processes via variable selection in partial least squares discriminant analysis.

    PubMed

    Yan, Zhengbing; Kuang, Te-Hui; Yao, Yuan

    2017-09-01

    In recent years, multivariate statistical monitoring of batch processes has become a popular research topic, wherein multivariate fault isolation is an important step aiming at the identification of the faulty variables contributing most to the detected process abnormality. Although contribution plots have been commonly used in statistical fault isolation, such methods suffer from the smearing effect between correlated variables. In particular, in batch process monitoring, the high autocorrelations and cross-correlations that exist in variable trajectories make the smearing effect unavoidable. To address such a problem, a variable selection-based fault isolation method is proposed in this research, which transforms the fault isolation problem into a variable selection problem in partial least squares discriminant analysis and solves it by calculating a sparse partial least squares model. As different from the traditional methods, the proposed method emphasizes the relative importance of each process variable. Such information may help process engineers in conducting root-cause diagnosis. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.

  19. Multivariate normal maximum likelihood with both ordinal and continuous variables, and data missing at random.

    PubMed

    Pritikin, Joshua N; Brick, Timothy R; Neale, Michael C

    2018-04-01

    A novel method for the maximum likelihood estimation of structural equation models (SEM) with both ordinal and continuous indicators is introduced using a flexible multivariate probit model for the ordinal indicators. A full information approach ensures unbiased estimates for data missing at random. Exceeding the capability of prior methods, up to 13 ordinal variables can be included before integration time increases beyond 1 s per row. The method relies on the axiom of conditional probability to split apart the distribution of continuous and ordinal variables. Due to the symmetry of the axiom, two similar methods are available. A simulation study provides evidence that the two similar approaches offer equal accuracy. A further simulation is used to develop a heuristic to automatically select the most computationally efficient approach. Joint ordinal continuous SEM is implemented in OpenMx, free and open-source software.

  20. The Mediated MIMIC Model for Understanding the Underlying Mechanism of DIF.

    PubMed

    Cheng, Ying; Shao, Can; Lathrop, Quinn N

    2016-02-01

    Due to its flexibility, the multiple-indicator, multiple-causes (MIMIC) model has become an increasingly popular method for the detection of differential item functioning (DIF). In this article, we propose the mediated MIMIC model method to uncover the underlying mechanism of DIF. This method extends the usual MIMIC model by including one variable or multiple variables that may completely or partially mediate the DIF effect. If complete mediation effect is found, the DIF effect is fully accounted for. Through our simulation study, we find that the mediated MIMIC model is very successful in detecting the mediation effect that completely or partially accounts for DIF, while keeping the Type I error rate well controlled for both balanced and unbalanced sample sizes between focal and reference groups. Because it is successful in detecting such mediation effects, the mediated MIMIC model may help explain DIF and give guidance in the revision of a DIF item.

  1. The Mediated MIMIC Model for Understanding the Underlying Mechanism of DIF

    PubMed Central

    Cheng, Ying; Shao, Can; Lathrop, Quinn N.

    2015-01-01

    Due to its flexibility, the multiple-indicator, multiple-causes (MIMIC) model has become an increasingly popular method for the detection of differential item functioning (DIF). In this article, we propose the mediated MIMIC model method to uncover the underlying mechanism of DIF. This method extends the usual MIMIC model by including one variable or multiple variables that may completely or partially mediate the DIF effect. If complete mediation effect is found, the DIF effect is fully accounted for. Through our simulation study, we find that the mediated MIMIC model is very successful in detecting the mediation effect that completely or partially accounts for DIF, while keeping the Type I error rate well controlled for both balanced and unbalanced sample sizes between focal and reference groups. Because it is successful in detecting such mediation effects, the mediated MIMIC model may help explain DIF and give guidance in the revision of a DIF item.

  2. Improving RNA-Seq expression estimation by modeling isoform- and exon-specific read sequencing rate.

    PubMed

    Liu, Xuejun; Shi, Xinxin; Chen, Chunlin; Zhang, Li

    2015-10-16

    The high-throughput sequencing technology, RNA-Seq, has been widely used to quantify gene and isoform expression in the study of transcriptome in recent years. Accurate expression measurement from the millions or billions of short generated reads is obstructed by difficulties. One is ambiguous mapping of reads to reference transcriptome caused by alternative splicing. This increases the uncertainty in estimating isoform expression. The other is non-uniformity of read distribution along the reference transcriptome due to positional, sequencing, mappability and other undiscovered sources of biases. This violates the uniform assumption of read distribution for many expression calculation approaches, such as the direct RPKM calculation and Poisson-based models. Many methods have been proposed to address these difficulties. Some approaches employ latent variable models to discover the underlying pattern of read sequencing. However, most of these methods make bias correction based on surrounding sequence contents and share the bias models by all genes. They therefore cannot estimate gene- and isoform-specific biases as revealed by recent studies. We propose a latent variable model, NLDMseq, to estimate gene and isoform expression. Our method adopts latent variables to model the unknown isoforms, from which reads originate, and the underlying percentage of multiple spliced variants. The isoform- and exon-specific read sequencing biases are modeled to account for the non-uniformity of read distribution, and are identified by utilizing the replicate information of multiple lanes of a single library run. We employ simulation and real data to verify the performance of our method in terms of accuracy in the calculation of gene and isoform expression. Results show that NLDMseq obtains competitive gene and isoform expression compared to popular alternatives. Finally, the proposed method is applied to the detection of differential expression (DE) to show its usefulness in the downstream analysis. The proposed NLDMseq method provides an approach to accurately estimate gene and isoform expression from RNA-Seq data by modeling the isoform- and exon-specific read sequencing biases. It makes use of a latent variable model to discover the hidden pattern of read sequencing. We have shown that it works well in both simulations and real datasets, and has competitive performance compared to popular methods. The method has been implemented as a freely available software which can be found at https://github.com/PUGEA/NLDMseq.

  3. Separation of the atmospheric variability into non-Gaussian multidimensional sources by projection pursuit techniques

    NASA Astrophysics Data System (ADS)

    Pires, Carlos A. L.; Ribeiro, Andreia F. S.

    2017-02-01

    We develop an expansion of space-distributed time series into statistically independent uncorrelated subspaces (statistical sources) of low-dimension and exhibiting enhanced non-Gaussian probability distributions with geometrically simple chosen shapes (projection pursuit rationale). The method relies upon a generalization of the principal component analysis that is optimal for Gaussian mixed signals and of the independent component analysis (ICA), optimized to split non-Gaussian scalar sources. The proposed method, supported by information theory concepts and methods, is the independent subspace analysis (ISA) that looks for multi-dimensional, intrinsically synergetic subspaces such as dyads (2D) and triads (3D), not separable by ICA. Basically, we optimize rotated variables maximizing certain nonlinear correlations (contrast functions) coming from the non-Gaussianity of the joint distribution. As a by-product, it provides nonlinear variable changes `unfolding' the subspaces into nearly Gaussian scalars of easier post-processing. Moreover, the new variables still work as nonlinear data exploratory indices of the non-Gaussian variability of the analysed climatic and geophysical fields. The method (ISA, followed by nonlinear unfolding) is tested into three datasets. The first one comes from the Lorenz'63 three-dimensional chaotic model, showing a clear separation into a non-Gaussian dyad plus an independent scalar. The second one is a mixture of propagating waves of random correlated phases in which the emergence of triadic wave resonances imprints a statistical signature in terms of a non-Gaussian non-separable triad. Finally the method is applied to the monthly variability of a high-dimensional quasi-geostrophic (QG) atmospheric model, applied to the Northern Hemispheric winter. We find that quite enhanced non-Gaussian dyads of parabolic shape, perform much better than the unrotated variables in which concerns the separation of the four model's centroid regimes (positive and negative phases of the Arctic Oscillation and of the North Atlantic Oscillation). Triads are also likely in the QG model but of weaker expression than dyads due to the imposed shape and dimension. The study emphasizes the existence of nonlinear dyadic and triadic nonlinear teleconnections.

  4. Intercomparison of 3D pore-scale flow and solute transport simulation methods

    DOE PAGES

    Mehmani, Yashar; Schoenherr, Martin; Pasquali, Andrea; ...

    2015-09-28

    Multiple numerical approaches have been developed to simulate porous media fluid flow and solute transport at the pore scale. These include 1) methods that explicitly model the three-dimensional geometry of pore spaces and 2) methods that conceptualize the pore space as a topologically consistent set of stylized pore bodies and pore throats. In previous work we validated a model of the first type, using computational fluid dynamics (CFD) codes employing a standard finite volume method (FVM), against magnetic resonance velocimetry (MRV) measurements of pore-scale velocities. Here we expand that validation to include additional models of the first type based onmore » the lattice Boltzmann method (LBM) and smoothed particle hydrodynamics (SPH), as well as a model of the second type, a pore-network model (PNM). The PNM approach used in the current study was recently improved and demonstrated to accurately simulate solute transport in a two-dimensional experiment. While the PNM approach is computationally much less demanding than direct numerical simulation methods, the effect of conceptualizing complex three-dimensional pore geometries on solute transport in the manner of PNMs has not been fully determined. We apply all four approaches (FVM-based CFD, LBM, SPH and PNM) to simulate pore-scale velocity distributions and (for capable codes) nonreactive solute transport, and intercompare the model results. Comparisons are drawn both in terms of macroscopic variables (e.g., permeability, solute breakthrough curves) and microscopic variables (e.g., local velocities and concentrations). Generally good agreement was achieved among the various approaches, but some differences were observed depending on the model context. The intercomparison work was challenging because of variable capabilities of the codes, and inspired some code enhancements to allow consistent comparison of flow and transport simulations across the full suite of methods. This paper provides support for confidence in a variety of pore-scale modeling methods and motivates further development and application of pore-scale simulation methods.« less

  5. Intercomparison of 3D pore-scale flow and solute transport simulation methods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yang, Xiaofan; Mehmani, Yashar; Perkins, William A.

    2016-09-01

    Multiple numerical approaches have been developed to simulate porous media fluid flow and solute transport at the pore scale. These include 1) methods that explicitly model the three-dimensional geometry of pore spaces and 2) methods that conceptualize the pore space as a topologically consistent set of stylized pore bodies and pore throats. In previous work we validated a model of the first type, using computational fluid dynamics (CFD) codes employing a standard finite volume method (FVM), against magnetic resonance velocimetry (MRV) measurements of pore-scale velocities. Here we expand that validation to include additional models of the first type based onmore » the lattice Boltzmann method (LBM) and smoothed particle hydrodynamics (SPH), as well as a model of the second type, a pore-network model (PNM). The PNM approach used in the current study was recently improved and demonstrated to accurately simulate solute transport in a two-dimensional experiment. While the PNM approach is computationally much less demanding than direct numerical simulation methods, the effect of conceptualizing complex three-dimensional pore geometries on solute transport in the manner of PNMs has not been fully determined. We apply all four approaches (FVM-based CFD, LBM, SPH and PNM) to simulate pore-scale velocity distributions and (for capable codes) nonreactive solute transport, and intercompare the model results. Comparisons are drawn both in terms of macroscopic variables (e.g., permeability, solute breakthrough curves) and microscopic variables (e.g., local velocities and concentrations). Generally good agreement was achieved among the various approaches, but some differences were observed depending on the model context. The intercomparison work was challenging because of variable capabilities of the codes, and inspired some code enhancements to allow consistent comparison of flow and transport simulations across the full suite of methods. This study provides support for confidence in a variety of pore-scale modeling methods and motivates further development and application of pore-scale simulation methods.« less

  6. Global modeling of land water and energy balances. Part III: Interannual variability

    USGS Publications Warehouse

    Shmakin, A.B.; Milly, P.C.D.; Dunne, K.A.

    2002-01-01

    The Land Dynamics (LaD) model is tested by comparison with observations of interannual variations in discharge from 44 large river basins for which relatively accurate time series of monthly precipitation (a primary model input) have recently been computed. When results are pooled across all basins, the model explains 67% of the interannual variance of annual runoff ratio anomalies (i.e., anomalies of annual discharge volume, normalized by long-term mean precipitation volume). The new estimates of basin precipitation appear to offer an improvement over those from a state-of-the-art analysis of global precipitation (the Climate Prediction Center Merged Analysis of Precipitation, CMAP), judging from comparisons of parallel model runs and of analyses of precipitation-discharge correlations. When the new precipitation estimates are used, the performance of the LaD model is comparable to, but not significantly better than, that of a simple, semiempirical water-balance relation that uses only annual totals of surface net radiation and precipitation. This implies that the LaD simulations of interannual runoff variability do not benefit substantially from information on geographical variability of land parameters or seasonal structure of interannual variability of precipitation. The aforementioned analyses necessitated the development of a method for downscaling of long-term monthly precipitation data to the relatively short timescales necessary for running the model. The method merges the long-term data with a reference dataset of 1-yr duration, having high temporal resolution. The success of the method, for the model and data considered here, was demonstrated in a series of model-model comparisons and in the comparisons of modeled and observed interannual variations of basin discharge.

  7. Joint modelling rationale for chained equations

    PubMed Central

    2014-01-01

    Background Chained equations imputation is widely used in medical research. It uses a set of conditional models, so is more flexible than joint modelling imputation for the imputation of different types of variables (e.g. binary, ordinal or unordered categorical). However, chained equations imputation does not correspond to drawing from a joint distribution when the conditional models are incompatible. Concurrently with our work, other authors have shown the equivalence of the two imputation methods in finite samples. Methods Taking a different approach, we prove, in finite samples, sufficient conditions for chained equations and joint modelling to yield imputations from the same predictive distribution. Further, we apply this proof in four specific cases and conduct a simulation study which explores the consequences when the conditional models are compatible but the conditions otherwise are not satisfied. Results We provide an additional “non-informative margins” condition which, together with compatibility, is sufficient. We show that the non-informative margins condition is not satisfied, despite compatible conditional models, in a situation as simple as two continuous variables and one binary variable. Our simulation study demonstrates that as a consequence of this violation order effects can occur; that is, systematic differences depending upon the ordering of the variables in the chained equations algorithm. However, the order effects appear to be small, especially when associations between variables are weak. Conclusions Since chained equations is typically used in medical research for datasets with different types of variables, researchers must be aware that order effects are likely to be ubiquitous, but our results suggest they may be small enough to be negligible. PMID:24559129

  8. A joint modeling and estimation method for multivariate longitudinal data with mixed types of responses to analyze physical activity data generated by accelerometers.

    PubMed

    Li, Haocheng; Zhang, Yukun; Carroll, Raymond J; Keadle, Sarah Kozey; Sampson, Joshua N; Matthews, Charles E

    2017-11-10

    A mixed effect model is proposed to jointly analyze multivariate longitudinal data with continuous, proportion, count, and binary responses. The association of the variables is modeled through the correlation of random effects. We use a quasi-likelihood type approximation for nonlinear variables and transform the proposed model into a multivariate linear mixed model framework for estimation and inference. Via an extension to the EM approach, an efficient algorithm is developed to fit the model. The method is applied to physical activity data, which uses a wearable accelerometer device to measure daily movement and energy expenditure information. Our approach is also evaluated by a simulation study. Copyright © 2017 John Wiley & Sons, Ltd.

  9. Large-scale expensive black-box function optimization

    NASA Astrophysics Data System (ADS)

    Rashid, Kashif; Bailey, William; Couët, Benoît

    2012-09-01

    This paper presents the application of an adaptive radial basis function method to a computationally expensive black-box reservoir simulation model of many variables. An iterative proxy-based scheme is used to tune the control variables, distributed for finer control over a varying number of intervals covering the total simulation period, to maximize asset NPV. The method shows that large-scale simulation-based function optimization of several hundred variables is practical and effective.

  10. Finite-difference modeling with variable grid-size and adaptive time-step in porous media

    NASA Astrophysics Data System (ADS)

    Liu, Xinxin; Yin, Xingyao; Wu, Guochen

    2014-04-01

    Forward modeling of elastic wave propagation in porous media has great importance for understanding and interpreting the influences of rock properties on characteristics of seismic wavefield. However, the finite-difference forward-modeling method is usually implemented with global spatial grid-size and time-step; it consumes large amounts of computational cost when small-scaled oil/gas-bearing structures or large velocity-contrast exist underground. To overcome this handicap, combined with variable grid-size and time-step, this paper developed a staggered-grid finite-difference scheme for elastic wave modeling in porous media. Variable finite-difference coefficients and wavefield interpolation were used to realize the transition of wave propagation between regions of different grid-size. The accuracy and efficiency of the algorithm were shown by numerical examples. The proposed method is advanced with low computational cost in elastic wave simulation for heterogeneous oil/gas reservoirs.

  11. Assessing groundwater vulnerability to agrichemical contamination in the Midwest US

    USGS Publications Warehouse

    Burkart, M.R.; Kolpin, D.W.; James, D.E.

    1999-01-01

    Agrichemicals (herbicides and nitrate) are significant sources of diffuse pollution to groundwater. Indirect methods are needed to assess the potential for groundwater contamination by diffuse sources because groundwater monitoring is too costly to adequately define the geographic extent of contamination at a regional or national scale. This paper presents examples of the application of statistical, overlay and index, and process-based modeling methods for groundwater vulnerability assessments to a variety of data from the Midwest U.S. The principles for vulnerability assessment include both intrinsic (pedologic, climatologic, and hydrogeologic factors) and specific (contaminant and other anthropogenic factors) vulnerability of a location. Statistical methods use the frequency of contaminant occurrence, contaminant concentration, or contamination probability as a response variable. Statistical assessments are useful for defining the relations among explanatory and response variables whether they define intrinsic or specific vulnerability. Multivariate statistical analyses are useful for ranking variables critical to estimating water quality responses of interest. Overlay and index methods involve intersecting maps of intrinsic and specific vulnerability properties and indexing the variables by applying appropriate weights. Deterministic models use process-based equations to simulate contaminant transport and are distinguished from the other methods in their potential to predict contaminant transport in both space and time. An example of a one-dimensional leaching model linked to a geographic information system (GIS) to define a regional metamodel for contamination in the Midwest is included.

  12. Application of the Newtonian nudging data assimilation method for the Biebrza River flow model

    NASA Astrophysics Data System (ADS)

    Miroslaw-Swiatek, Dorota

    2010-05-01

    Data assimilation provides a tool for integrating observations of spatially distributed environmental variables with model predictions. In this paper a simple data assimilation technique, the Newtonian nudging to individual observations method, has been implemented in the 1D St. Venant equations. The method involves adding a term to the prognostic equation. This term is proportional to the difference between the value calculated in the model at a given point in time and space and the one resulted from observations. Improving the model with available measurement observations is accomplished by adequate weighting functions, that can incorporate prior knowledge about the spatial and temporal variability of the state variables being assimilated. The article contains a description of the numerical model, which employs the finite element method (FEM) to solve the 1D St. Venant equations modified by the ‘nudging' method. The developed model was applied to the Biebrza River, situated in the north-eastern part of Poland, flowing through the last extensive, fairly undisturbed river-marginal peatland in Europe. A 41 km long reach of the Lower Biebrza River described by 68 cross-sections was selected for the study. The observed water stage collected by automatic sensors was the subject of the data assimilation in the Newtonian nudging to individual observations method. The water level observation data were collected in four observation points along a river with time interval 6 hours for one year. The obtained results show a prediction with no nudging and influence of the nudging term on water stages forecast. The developed model enables integrating water stage observation with an unsteady river flow model for improved water level prediction.

  13. The forgotten component of sub-glacial heat flow: Upper crustal heat production and resultant total heat flux on the Antarctic Peninsula

    NASA Astrophysics Data System (ADS)

    Burton-Johnson, Alex; Halpin, Jacqueline; Whittaker, Joanne; Watson, Sally

    2017-04-01

    Seismic and magnetic geophysical methods have both been employed to produce estimates of heat flux beneath the Antarctic ice sheet. However, both methods use a homogeneous upper crustal model despite the variable concentration of heat producing elements within its composite lithologies. Using geological and geochemical datasets from the Antarctic Peninsula we have developed a new methodology for incorporating upper crustal heat production in heat flux models and have shown the greater variability this introduces in to estimates of crustal heat flux, with implications for glaciological modelling.

  14. Preserving Flow Variability in Watershed Model Calibrations

    EPA Science Inventory

    Background/Question/Methods Although watershed modeling flow calibration techniques often emphasize a specific flow mode, ecological conditions that depend on flow-ecology relationships often emphasize a range of flow conditions. We used informal likelihood methods to investig...

  15. A phenomenological approach to modeling chemical dynamics in nonlinear and two-dimensional spectroscopy.

    PubMed

    Ramasesha, Krupa; De Marco, Luigi; Horning, Andrew D; Mandal, Aritra; Tokmakoff, Andrei

    2012-04-07

    We present an approach for calculating nonlinear spectroscopic observables, which overcomes the approximations inherent to current phenomenological models without requiring the computational cost of performing molecular dynamics simulations. The trajectory mapping method uses the semi-classical approximation to linear and nonlinear response functions, and calculates spectra from trajectories of the system's transition frequencies and transition dipole moments. It rests on identifying dynamical variables important to the problem, treating the dynamics of these variables stochastically, and then generating correlated trajectories of spectroscopic quantities by mapping from the dynamical variables. This approach allows one to describe non-Gaussian dynamics, correlated dynamics between variables of the system, and nonlinear relationships between spectroscopic variables of the system and the bath such as non-Condon effects. We illustrate the approach by applying it to three examples that are often not adequately treated by existing analytical models--the non-Condon effect in the nonlinear infrared spectra of water, non-Gaussian dynamics inherent to strongly hydrogen bonded systems, and chemical exchange processes in barrier crossing reactions. The methods described are generally applicable to nonlinear spectroscopy throughout the optical, infrared and terahertz regions.

  16. [Compatible biomass models of natural spruce (Picea asperata)].

    PubMed

    Wang, Jin Chi; Deng, Hua Feng; Huang, Guo Sheng; Wang, Xue Jun; Zhang, Lu

    2017-10-01

    By using nonlinear measurement error method, the compatible tree volume and above ground biomass equations were established based on the volume and biomass data of 150 sampling trees of natural spruce (Picea asperata). Two approaches, controlling directly under total aboveground biomass and controlling jointly from level to level, were used to design the compatible system for the total aboveground biomass and the biomass of four components (stem, bark, branch and foliage), and the total ground biomass could be estimated independently or estimated simultaneously in the system. The results showed that the R 2 of the one variable and bivariate compatible tree volume and aboveground biomass equations were all above 0.85, and the maximum value reached 0.99. The prediction effect of the volume equations could be improved significantly when tree height was included as predictor, while it was not significant in biomass estimation. For the compatible biomass systems, the one variable model based on controlling jointly from level to level was better than the model using controlling directly under total above ground biomass, but the bivariate models of the two methods were similar. Comparing the imitative effects of the one variable and bivariate compatible biomass models, the results showed that the increase of explainable variables could significantly improve the fitness of branch and foliage biomass, but had little effect on other components. Besides, there was almost no difference between the two methods of estimation based on the comparison.

  17. Improved Model Fitting for the Empirical Green's Function Approach Using Hierarchical Models

    NASA Astrophysics Data System (ADS)

    Van Houtte, Chris; Denolle, Marine

    2018-04-01

    Stress drops calculated from source spectral studies currently show larger variability than what is implied by empirical ground motion models. One of the potential origins of the inflated variability is the simplified model-fitting techniques used in most source spectral studies. This study examines a variety of model-fitting methods and shows that the choice of method can explain some of the discrepancy. The preferred method is Bayesian hierarchical modeling, which can reduce bias, better quantify uncertainties, and allow additional effects to be resolved. Two case study earthquakes are examined, the 2016 MW7.1 Kumamoto, Japan earthquake and a MW5.3 aftershock of the 2016 MW7.8 Kaikōura earthquake. By using hierarchical models, the variation of the corner frequency, fc, and the falloff rate, n, across the focal sphere can be retrieved without overfitting the data. Other methods commonly used to calculate corner frequencies may give substantial biases. In particular, if fc was calculated for the Kumamoto earthquake using an ω-square model, the obtained fc could be twice as large as a realistic value.

  18. Estimating the Depth of Stratigraphic Units from Marine Seismic Profiles Using Nonstationary Geostatistics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chihi, Hayet; Galli, Alain; Ravenne, Christian

    2000-03-15

    The object of this study is to build a three-dimensional (3D) geometric model of the stratigraphic units of the margin of the Rhone River on the basis of geophysical investigations by a network of seismic profiles at sea. The geometry of these units is described by depth charts of each surface identified by seismic profiling, which is done by geostatistics. The modeling starts by a statistical analysis by which we determine the parameters that enable us to calculate the variograms of the identified surfaces. After having determined the statistical parameters, we calculate the variograms of the variable Depth. By analyzingmore » the behavior of the variogram we then can deduce whether the situation is stationary and if the variable has an anisotropic behavior. We tried the following two nonstationary methods to obtain our estimates: (a) The method of universal kriging if the underlying variogram was directly accessible. (b) The method of increments if the underlying variogram was not directly accessible. After having modeled the variograms of the increments and of the variable itself, we calculated the surfaces by kriging the variable Depth on a small-mesh estimation grid. The two methods then are compared and their respective advantages and disadvantages are discussed, as well as their fields of application. These methods are capable of being used widely in earth sciences for automatic mapping of geometric surfaces or for variables such as a piezometric surface or a concentration, which are not 'stationary,' that is, essentially, possess a gradient or a tendency to develop systematically in space.« less

  19. A hybrid machine learning model to estimate nitrate contamination of production zone groundwater in the Central Valley, California

    NASA Astrophysics Data System (ADS)

    Ransom, K.; Nolan, B. T.; Faunt, C. C.; Bell, A.; Gronberg, J.; Traum, J.; Wheeler, D. C.; Rosecrans, C.; Belitz, K.; Eberts, S.; Harter, T.

    2016-12-01

    A hybrid, non-linear, machine learning statistical model was developed within a statistical learning framework to predict nitrate contamination of groundwater to depths of approximately 500 m below ground surface in the Central Valley, California. A database of 213 predictor variables representing well characteristics, historical and current field and county scale nitrogen mass balance, historical and current landuse, oxidation/reduction conditions, groundwater flow, climate, soil characteristics, depth to groundwater, and groundwater age were assigned to over 6,000 private supply and public supply wells measured previously for nitrate and located throughout the study area. The machine learning method, gradient boosting machine (GBM) was used to screen predictor variables and rank them in order of importance in relation to the groundwater nitrate measurements. The top five most important predictor variables included oxidation/reduction characteristics, historical field scale nitrogen mass balance, climate, and depth to 60 year old water. Twenty-two variables were selected for the final model and final model errors for log-transformed hold-out data were R squared of 0.45 and root mean square error (RMSE) of 1.124. Modeled mean groundwater age was tested separately for error improvement in the model and when included decreased model RMSE by 0.5% compared to the same model without age and by 0.20% compared to the model with all 213 variables. 1D and 2D partial plots were examined to determine how variables behave individually and interact in the model. Some variables behaved as expected: log nitrate decreased with increasing probability of anoxic conditions and depth to 60 year old water, generally decreased with increasing natural landuse surrounding wells and increasing mean groundwater age, generally increased with increased minimum depth to high water table and with increased base flow index value. Other variables exhibited much more erratic or noisy behavior in the model making them more difficult to interpret but highlighting the usefulness of the non-linear machine learning method. 2D interaction plots show probability of anoxic groundwater conditions largely control estimated nitrate concentrations compared to the other predictors.

  20. Modeling of Continuum Manipulators Using Pythagorean Hodograph Curves.

    PubMed

    Singh, Inderjeet; Amara, Yacine; Melingui, Achille; Mani Pathak, Pushparaj; Merzouki, Rochdi

    2018-05-10

    Research on continuum manipulators is increasingly developing in the context of bionic robotics because of their many advantages over conventional rigid manipulators. Due to their soft structure, they have inherent flexibility, which makes it a huge challenge to control them with high performances. Before elaborating a control strategy of such robots, it is essential to reconstruct first the behavior of the robot through development of an approximate behavioral model. This can be kinematic or dynamic depending on the conditions of operation of the robot itself. Kinematically, two types of modeling methods exist to describe the robot behavior; quantitative methods describe a model-based method, and qualitative methods describe a learning-based method. In kinematic modeling of continuum manipulator, the assumption of constant curvature is often considered to simplify the model formulation. In this work, a quantitative modeling method is proposed, based on the Pythagorean hodograph (PH) curves. The aim is to obtain a three-dimensional reconstruction of the shape of the continuum manipulator with variable curvature, allowing the calculation of its inverse kinematic model (IKM). It is noticed that the performances of the PH-based kinematic modeling of continuum manipulators are considerable regarding position accuracy, shape reconstruction, and time/cost of the model calculation, than other kinematic modeling methods, for two cases: free load manipulation and variable load manipulation. This modeling method is applied to the compact bionic handling assistant (CBHA) manipulator for validation. The results are compared with other IKMs developed in case of CBHA manipulator.

  1. The variability of software scoring of the CDMAM phantom associated with a limited number of images

    NASA Astrophysics Data System (ADS)

    Yang, Chang-Ying J.; Van Metter, Richard

    2007-03-01

    Software scoring approaches provide an attractive alternative to human evaluation of CDMAM images from digital mammography systems, particularly for annual quality control testing as recommended by the European Protocol for the Quality Control of the Physical and Technical Aspects of Mammography Screening (EPQCM). Methods for correlating CDCOM-based results with human observer performance have been proposed. A common feature of all methods is the use of a small number (at most eight) of CDMAM images to evaluate the system. This study focuses on the potential variability in the estimated system performance that is associated with these methods. Sets of 36 CDMAM images were acquired under carefully controlled conditions from three different digital mammography systems. The threshold visibility thickness (TVT) for each disk diameter was determined using previously reported post-analysis methods from the CDCOM scorings for a randomly selected group of eight images for one measurement trial. This random selection process was repeated 3000 times to estimate the variability in the resulting TVT values for each disk diameter. The results from using different post-analysis methods, different random selection strategies and different digital systems were compared. Additional variability of the 0.1 mm disk diameter was explored by comparing the results from two different image data sets acquired under the same conditions from the same system. The magnitude and the type of error estimated for experimental data was explained through modeling. The modeled results also suggest a limitation in the current phantom design for the 0.1 mm diameter disks. Through modeling, it was also found that, because of the binomial statistic nature of the CDMAM test, the true variability of the test could be underestimated by the commonly used method of random re-sampling.

  2. Gaussian Mixture Model of Heart Rate Variability

    PubMed Central

    Costa, Tommaso; Boccignone, Giuseppe; Ferraro, Mario

    2012-01-01

    Heart rate variability (HRV) is an important measure of sympathetic and parasympathetic functions of the autonomic nervous system and a key indicator of cardiovascular condition. This paper proposes a novel method to investigate HRV, namely by modelling it as a linear combination of Gaussians. Results show that three Gaussians are enough to describe the stationary statistics of heart variability and to provide a straightforward interpretation of the HRV power spectrum. Comparisons have been made also with synthetic data generated from different physiologically based models showing the plausibility of the Gaussian mixture parameters. PMID:22666386

  3. Post-processing method for wind speed ensemble forecast using wind speed and direction

    NASA Astrophysics Data System (ADS)

    Sofie Eide, Siri; Bjørnar Bremnes, John; Steinsland, Ingelin

    2017-04-01

    Statistical methods are widely applied to enhance the quality of both deterministic and ensemble NWP forecasts. In many situations, like wind speed forecasting, most of the predictive information is contained in one variable in the NWP models. However, in statistical calibration of deterministic forecasts it is often seen that including more variables can further improve forecast skill. For ensembles this is rarely taken advantage of, mainly due to that it is generally not straightforward how to include multiple variables. In this study, it is demonstrated how multiple variables can be included in Bayesian model averaging (BMA) by using a flexible regression method for estimating the conditional means. The method is applied to wind speed forecasting at 204 Norwegian stations based on wind speed and direction forecasts from the ECMWF ensemble system. At about 85 % of the sites the ensemble forecasts were improved in terms of CRPS by adding wind direction as predictor compared to only using wind speed. On average the improvements were about 5 %, but mainly for moderate to strong wind situations. For weak wind speeds adding wind direction had more or less neutral impact.

  4. Time series analysis as input for clinical predictive modeling: modeling cardiac arrest in a pediatric ICU.

    PubMed

    Kennedy, Curtis E; Turley, James P

    2011-10-24

    Thousands of children experience cardiac arrest events every year in pediatric intensive care units. Most of these children die. Cardiac arrest prediction tools are used as part of medical emergency team evaluations to identify patients in standard hospital beds that are at high risk for cardiac arrest. There are no models to predict cardiac arrest in pediatric intensive care units though, where the risk of an arrest is 10 times higher than for standard hospital beds. Current tools are based on a multivariable approach that does not characterize deterioration, which often precedes cardiac arrests. Characterizing deterioration requires a time series approach. The purpose of this study is to propose a method that will allow for time series data to be used in clinical prediction models. Successful implementation of these methods has the potential to bring arrest prediction to the pediatric intensive care environment, possibly allowing for interventions that can save lives and prevent disabilities. We reviewed prediction models from nonclinical domains that employ time series data, and identified the steps that are necessary for building predictive models using time series clinical data. We illustrate the method by applying it to the specific case of building a predictive model for cardiac arrest in a pediatric intensive care unit. Time course analysis studies from genomic analysis provided a modeling template that was compatible with the steps required to develop a model from clinical time series data. The steps include: 1) selecting candidate variables; 2) specifying measurement parameters; 3) defining data format; 4) defining time window duration and resolution; 5) calculating latent variables for candidate variables not directly measured; 6) calculating time series features as latent variables; 7) creating data subsets to measure model performance effects attributable to various classes of candidate variables; 8) reducing the number of candidate features; 9) training models for various data subsets; and 10) measuring model performance characteristics in unseen data to estimate their external validity. We have proposed a ten step process that results in data sets that contain time series features and are suitable for predictive modeling by a number of methods. We illustrated the process through an example of cardiac arrest prediction in a pediatric intensive care setting.

  5. Results from the VALUE perfect predictor experiment: process-based evaluation

    NASA Astrophysics Data System (ADS)

    Maraun, Douglas; Soares, Pedro; Hertig, Elke; Brands, Swen; Huth, Radan; Cardoso, Rita; Kotlarski, Sven; Casado, Maria; Pongracz, Rita; Bartholy, Judit

    2016-04-01

    Until recently, the evaluation of downscaled climate model simulations has typically been limited to surface climatologies, including long term means, spatial variability and extremes. But these aspects are often, at least partly, tuned in regional climate models to match observed climate. The tuning issue is of course particularly relevant for bias corrected regional climate models. In general, a good performance of a model for these aspects in present climate does therefore not imply a good performance in simulating climate change. It is now widely accepted that, to increase our condidence in climate change simulations, it is necessary to evaluate how climate models simulate relevant underlying processes. In other words, it is important to assess whether downscaling does the right for the right reason. Therefore, VALUE has carried out a broad process-based evaluation study based on its perfect predictor experiment simulations: the downscaling methods are driven by ERA-Interim data over the period 1979-2008, reference observations are given by a network of 85 meteorological stations covering all European climates. More than 30 methods participated in the evaluation. In order to compare statistical and dynamical methods, only variables provided by both types of approaches could be considered. This limited the analysis to conditioning local surface variables on variables from driving processes that are simulated by ERA-Interim. We considered the following types of processes: at the continental scale, we evaluated the performance of downscaling methods for positive and negative North Atlantic Oscillation, Atlantic ridge and blocking situations. At synoptic scales, we considered Lamb weather types for selected European regions such as Scandinavia, the United Kingdom, the Iberian Pensinsula or the Alps. At regional scales we considered phenomena such as the Mistral, the Bora or the Iberian coastal jet. Such process-based evaluation helps to attribute biases in surface variables to underlying processes and ultimately to improve climate models.

  6. Big Data Toolsets to Pharmacometrics: Application of Machine Learning for Time-to-Event Analysis.

    PubMed

    Gong, Xiajing; Hu, Meng; Zhao, Liang

    2018-05-01

    Additional value can be potentially created by applying big data tools to address pharmacometric problems. The performances of machine learning (ML) methods and the Cox regression model were evaluated based on simulated time-to-event data synthesized under various preset scenarios, i.e., with linear vs. nonlinear and dependent vs. independent predictors in the proportional hazard function, or with high-dimensional data featured by a large number of predictor variables. Our results showed that ML-based methods outperformed the Cox model in prediction performance as assessed by concordance index and in identifying the preset influential variables for high-dimensional data. The prediction performances of ML-based methods are also less sensitive to data size and censoring rates than the Cox regression model. In conclusion, ML-based methods provide a powerful tool for time-to-event analysis, with a built-in capacity for high-dimensional data and better performance when the predictor variables assume nonlinear relationships in the hazard function. © 2018 The Authors. Clinical and Translational Science published by Wiley Periodicals, Inc. on behalf of American Society for Clinical Pharmacology and Therapeutics.

  7. [Discrimination of varieties of brake fluid using visual-near infrared spectra].

    PubMed

    Jiang, Lu-lu; Tan, Li-hong; Qiu, Zheng-jun; Lu, Jiang-feng; He, Yong

    2008-06-01

    A new method was developed to fast discriminate brands of brake fluid by means of visual-near infrared spectroscopy. Five different brands of brake fluid were analyzed using a handheld near infrared spectrograph, manufactured by ASD Company, and 60 samples were gotten from each brand of brake fluid. The samples data were pretreated using average smoothing and standard normal variable method, and then analyzed using principal component analysis (PCA). A 2-dimensional plot was drawn based on the first and the second principal components, and the plot indicated that the clustering characteristic of different brake fluid is distinct. The foregoing 6 principal components were taken as input variable, and the band of brake fluid as output variable to build the discriminate model by stepwise discriminant analysis method. Two hundred twenty five samples selected randomly were used to create the model, and the rest 75 samples to verify the model. The result showed that the distinguishing rate was 94.67%, indicating that the method proposed in this paper has good performance in classification and discrimination. It provides a new way to fast discriminate different brands of brake fluid.

  8. A Note on the Use of Missing Auxiliary Variables in Full Information Maximum Likelihood-Based Structural Equation Models

    ERIC Educational Resources Information Center

    Enders, Craig K.

    2008-01-01

    Recent missing data studies have argued in favor of an "inclusive analytic strategy" that incorporates auxiliary variables into the estimation routine, and Graham (2003) outlined methods for incorporating auxiliary variables into structural equation analyses. In practice, the auxiliary variables often have missing values, so it is reasonable to…

  9. Plasticity models of material variability based on uncertainty quantification techniques

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jones, Reese E.; Rizzi, Francesco; Boyce, Brad

    The advent of fabrication techniques like additive manufacturing has focused attention on the considerable variability of material response due to defects and other micro-structural aspects. This variability motivates the development of an enhanced design methodology that incorporates inherent material variability to provide robust predictions of performance. In this work, we develop plasticity models capable of representing the distribution of mechanical responses observed in experiments using traditional plasticity models of the mean response and recently developed uncertainty quantification (UQ) techniques. Lastly, we demonstrate that the new method provides predictive realizations that are superior to more traditional ones, and how these UQmore » techniques can be used in model selection and assessing the quality of calibrated physical parameters.« less

  10. Direction dependence analysis: A framework to test the direction of effects in linear models with an implementation in SPSS.

    PubMed

    Wiedermann, Wolfgang; Li, Xintong

    2018-04-16

    In nonexperimental data, at least three possible explanations exist for the association of two variables x and y: (1) x is the cause of y, (2) y is the cause of x, or (3) an unmeasured confounder is present. Statistical tests that identify which of the three explanatory models fits best would be a useful adjunct to the use of theory alone. The present article introduces one such statistical method, direction dependence analysis (DDA), which assesses the relative plausibility of the three explanatory models on the basis of higher-moment information about the variables (i.e., skewness and kurtosis). DDA involves the evaluation of three properties of the data: (1) the observed distributions of the variables, (2) the residual distributions of the competing models, and (3) the independence properties of the predictors and residuals of the competing models. When the observed variables are nonnormally distributed, we show that DDA components can be used to uniquely identify each explanatory model. Statistical inference methods for model selection are presented, and macros to implement DDA in SPSS are provided. An empirical example is given to illustrate the approach. Conceptual and empirical considerations are discussed for best-practice applications in psychological data, and sample size recommendations based on previous simulation studies are provided.

  11. Prepositioning emergency supplies under uncertainty: a parametric optimization method

    NASA Astrophysics Data System (ADS)

    Bai, Xuejie; Gao, Jinwu; Liu, Yankui

    2018-07-01

    Prepositioning of emergency supplies is an effective method for increasing preparedness for disasters and has received much attention in recent years. In this article, the prepositioning problem is studied by a robust parametric optimization method. The transportation cost, supply, demand and capacity are unknown prior to the extraordinary event, which are represented as fuzzy parameters with variable possibility distributions. The variable possibility distributions are obtained through the credibility critical value reduction method for type-2 fuzzy variables. The prepositioning problem is formulated as a fuzzy value-at-risk model to achieve a minimum total cost incurred in the whole process. The key difficulty in solving the proposed optimization model is to evaluate the quantile of the fuzzy function in the objective and the credibility in the constraints. The objective function and constraints can be turned into their equivalent parametric forms through chance constrained programming under the different confidence levels. Taking advantage of the structural characteristics of the equivalent optimization model, a parameter-based domain decomposition method is developed to divide the original optimization problem into six mixed-integer parametric submodels, which can be solved by standard optimization solvers. Finally, to explore the viability of the developed model and the solution approach, some computational experiments are performed on realistic scale case problems. The computational results reported in the numerical example show the credibility and superiority of the proposed parametric optimization method.

  12. Prediction of hourly PM2.5 using a space-time support vector regression model

    NASA Astrophysics Data System (ADS)

    Yang, Wentao; Deng, Min; Xu, Feng; Wang, Hang

    2018-05-01

    Real-time air quality prediction has been an active field of research in atmospheric environmental science. The existing methods of machine learning are widely used to predict pollutant concentrations because of their enhanced ability to handle complex non-linear relationships. However, because pollutant concentration data, as typical geospatial data, also exhibit spatial heterogeneity and spatial dependence, they may violate the assumptions of independent and identically distributed random variables in most of the machine learning methods. As a result, a space-time support vector regression model is proposed to predict hourly PM2.5 concentrations. First, to address spatial heterogeneity, spatial clustering is executed to divide the study area into several homogeneous or quasi-homogeneous subareas. To handle spatial dependence, a Gauss vector weight function is then developed to determine spatial autocorrelation variables as part of the input features. Finally, a local support vector regression model with spatial autocorrelation variables is established for each subarea. Experimental data on PM2.5 concentrations in Beijing are used to verify whether the results of the proposed model are superior to those of other methods.

  13. Evaluation of Reliability Coefficients for Two-Level Models via Latent Variable Analysis

    ERIC Educational Resources Information Center

    Raykov, Tenko; Penev, Spiridon

    2010-01-01

    A latent variable analysis procedure for evaluation of reliability coefficients for 2-level models is outlined. The method provides point and interval estimates of group means' reliability, overall reliability of means, and conditional reliability. In addition, the approach can be used to test simple hypotheses about these parameters. The…

  14. A Comparison of Joint Model and Fully Conditional Specification Imputation for Multilevel Missing Data

    ERIC Educational Resources Information Center

    Mistler, Stephen A.; Enders, Craig K.

    2017-01-01

    Multiple imputation methods can generally be divided into two broad frameworks: joint model (JM) imputation and fully conditional specification (FCS) imputation. JM draws missing values simultaneously for all incomplete variables using a multivariate distribution, whereas FCS imputes variables one at a time from a series of univariate conditional…

  15. Evaluation of Scale Reliability with Binary Measures Using Latent Variable Modeling

    ERIC Educational Resources Information Center

    Raykov, Tenko; Dimitrov, Dimiter M.; Asparouhov, Tihomir

    2010-01-01

    A method for interval estimation of scale reliability with discrete data is outlined. The approach is applicable with multi-item instruments consisting of binary measures, and is developed within the latent variable modeling methodology. The procedure is useful for evaluation of consistency of single measures and of sum scores from item sets…

  16. Intraclass Correlation Coefficients in Hierarchical Designs: Evaluation Using Latent Variable Modeling

    ERIC Educational Resources Information Center

    Raykov, Tenko

    2011-01-01

    Interval estimation of intraclass correlation coefficients in hierarchical designs is discussed within a latent variable modeling framework. A method accomplishing this aim is outlined, which is applicable in two-level studies where participants (or generally lower-order units) are clustered within higher-order units. The procedure can also be…

  17. Evaluation of Weighted Scale Reliability and Criterion Validity: A Latent Variable Modeling Approach

    ERIC Educational Resources Information Center

    Raykov, Tenko

    2007-01-01

    A method is outlined for evaluating the reliability and criterion validity of weighted scales based on sets of unidimensional measures. The approach is developed within the framework of latent variable modeling methodology and is useful for point and interval estimation of these measurement quality coefficients in counseling and education…

  18. Copula Models for Sociology: Measures of Dependence and Probabilities for Joint Distributions

    ERIC Educational Resources Information Center

    Vuolo, Mike

    2017-01-01

    Often in sociology, researchers are confronted with nonnormal variables whose joint distribution they wish to explore. Yet, assumptions of common measures of dependence can fail or estimating such dependence is computationally intensive. This article presents the copula method for modeling the joint distribution of two random variables, including…

  19. A Revised Simplex Method for Test Construction Problems. Research Report 90-5.

    ERIC Educational Resources Information Center

    Adema, Jos J.

    Linear programming models with 0-1 variables are useful for the construction of tests from an item bank. Most solution strategies for these models start with solving the relaxed 0-1 linear programming model, allowing the 0-1 variables to take on values between 0 and 1. Then, a 0-1 solution is found by just rounding, optimal rounding, or a…

  20. Power maximization of variable-speed variable-pitch wind turbines using passive adaptive neural fault tolerant control

    NASA Astrophysics Data System (ADS)

    Habibi, Hamed; Rahimi Nohooji, Hamed; Howard, Ian

    2017-09-01

    Power maximization has always been a practical consideration in wind turbines. The question of how to address optimal power capture, especially when the system dynamics are nonlinear and the actuators are subject to unknown faults, is significant. This paper studies the control methodology for variable-speed variable-pitch wind turbines including the effects of uncertain nonlinear dynamics, system fault uncertainties, and unknown external disturbances. The nonlinear model of the wind turbine is presented, and the problem of maximizing extracted energy is formulated by designing the optimal desired states. With the known system, a model-based nonlinear controller is designed; then, to handle uncertainties, the unknown nonlinearities of the wind turbine are estimated by utilizing radial basis function neural networks. The adaptive neural fault tolerant control is designed passively to be robust on model uncertainties, disturbances including wind speed and model noises, and completely unknown actuator faults including generator torque and pitch actuator torque. The Lyapunov direct method is employed to prove that the closed-loop system is uniformly bounded. Simulation studies are performed to verify the effectiveness of the proposed method.

  1. Integrating models that depend on variable data

    NASA Astrophysics Data System (ADS)

    Banks, A. T.; Hill, M. C.

    2016-12-01

    Models of human-Earth systems are often developed with the goal of predicting the behavior of one or more dependent variables from multiple independent variables, processes, and parameters. Often dependent variable values range over many orders of magnitude, which complicates evaluation of the fit of the dependent variable values to observations. Many metrics and optimization methods have been proposed to address dependent variable variability, with little consensus being achieved. In this work, we evaluate two such methods: log transformation (based on the dependent variable being log-normally distributed with a constant variance) and error-based weighting (based on a multi-normal distribution with variances that tend to increase as the dependent variable value increases). Error-based weighting has the advantage of encouraging model users to carefully consider data errors, such as measurement and epistemic errors, while log-transformations can be a black box for typical users. Placing the log-transformation into the statistical perspective of error-based weighting has not formerly been considered, to the best of our knowledge. To make the evaluation as clear and reproducible as possible, we use multiple linear regression (MLR). Simulations are conducted with MatLab. The example represents stream transport of nitrogen with up to eight independent variables. The single dependent variable in our example has values that range over 4 orders of magnitude. Results are applicable to any problem for which individual or multiple data types produce a large range of dependent variable values. For this problem, the log transformation produced good model fit, while some formulations of error-based weighting worked poorly. Results support previous suggestions fthat error-based weighting derived from a constant coefficient of variation overemphasizes low values and degrades model fit to high values. Applying larger weights to the high values is inconsistent with the log-transformation. Greater consistency is obtained by imposing smaller (by up to a factor of 1/35) weights on the smaller dependent-variable values. From an error-based perspective, the small weights are consistent with large standard deviations. This work considers the consequences of these two common ways of addressing variable data.

  2. Hyperspectral imaging for predicting the allicin and soluble solid content of garlic with variable selection algorithms and chemometric models.

    PubMed

    Rahman, Anisur; Faqeerzada, Mohammad A; Cho, Byoung-Kwan

    2018-03-14

    Allicin and soluble solid content (SSC) in garlic is the responsible for its pungent flavor and odor. However, current conventional methods such as the use of high-pressure liquid chromatography and a refractometer have critical drawbacks in that they are time-consuming, labor-intensive and destructive procedures. The present study aimed to predict allicin and SSC in garlic using hyperspectral imaging in combination with variable selection algorithms and calibration models. Hyperspectral images of 100 garlic cloves were acquired that covered two spectral ranges, from which the mean spectra of each clove were extracted. The calibration models included partial least squares (PLS) and least squares-support vector machine (LS-SVM) regression, as well as different spectral pre-processing techniques, from which the highest performing spectral preprocessing technique and spectral range were selected. Then, variable selection methods, such as regression coefficients, variable importance in projection (VIP) and the successive projections algorithm (SPA), were evaluated for the selection of effective wavelengths (EWs). Furthermore, PLS and LS-SVM regression methods were applied to quantitatively predict the quality attributes of garlic using the selected EWs. Of the established models, the SPA-LS-SVM model obtained an Rpred2 of 0.90 and standard error of prediction (SEP) of 1.01% for SSC prediction, whereas the VIP-LS-SVM model produced the best result with an Rpred2 of 0.83 and SEP of 0.19 mg g -1 for allicin prediction in the range 1000-1700 nm. Furthermore, chemical images of garlic were developed using the best predictive model to facilitate visualization of the spatial distributions of allicin and SSC. The present study clearly demonstrates that hyperspectral imaging combined with an appropriate chemometrics method can potentially be employed as a fast, non-invasive method to predict the allicin and SSC in garlic. © 2018 Society of Chemical Industry. © 2018 Society of Chemical Industry.

  3. Replicates in high dimensions, with applications to latent variable graphical models.

    PubMed

    Tan, Kean Ming; Ning, Yang; Witten, Daniela M; Liu, Han

    2016-12-01

    In classical statistics, much thought has been put into experimental design and data collection. In the high-dimensional setting, however, experimental design has been less of a focus. In this paper, we stress the importance of collecting multiple replicates for each subject in this setting. We consider learning the structure of a graphical model with latent variables, under the assumption that these variables take a constant value across replicates within each subject. By collecting multiple replicates for each subject, we are able to estimate the conditional dependence relationships among the observed variables given the latent variables. To test the null hypothesis of conditional independence between two observed variables, we propose a pairwise decorrelated score test. Theoretical guarantees are established for parameter estimation and for this test. We show that our proposal is able to estimate latent variable graphical models more accurately than some existing proposals, and apply the proposed method to a brain imaging dataset.

  4. High dimensional model representation method for fuzzy structural dynamics

    NASA Astrophysics Data System (ADS)

    Adhikari, S.; Chowdhury, R.; Friswell, M. I.

    2011-03-01

    Uncertainty propagation in multi-parameter complex structures possess significant computational challenges. This paper investigates the possibility of using the High Dimensional Model Representation (HDMR) approach when uncertain system parameters are modeled using fuzzy variables. In particular, the application of HDMR is proposed for fuzzy finite element analysis of linear dynamical systems. The HDMR expansion is an efficient formulation for high-dimensional mapping in complex systems if the higher order variable correlations are weak, thereby permitting the input-output relationship behavior to be captured by the terms of low-order. The computational effort to determine the expansion functions using the α-cut method scales polynomically with the number of variables rather than exponentially. This logic is based on the fundamental assumption underlying the HDMR representation that only low-order correlations among the input variables are likely to have significant impacts upon the outputs for most high-dimensional complex systems. The proposed method is first illustrated for multi-parameter nonlinear mathematical test functions with fuzzy variables. The method is then integrated with a commercial finite element software (ADINA). Modal analysis of a simplified aircraft wing with fuzzy parameters has been used to illustrate the generality of the proposed approach. In the numerical examples, triangular membership functions have been used and the results have been validated against direct Monte Carlo simulations. It is shown that using the proposed HDMR approach, the number of finite element function calls can be reduced without significantly compromising the accuracy.

  5. The Effect of Latent Binary Variables on the Uncertainty of the Prediction of a Dichotomous Outcome Using Logistic Regression Based Propensity Score Matching.

    PubMed

    Szekér, Szabolcs; Vathy-Fogarassy, Ágnes

    2018-01-01

    Logistic regression based propensity score matching is a widely used method in case-control studies to select the individuals of the control group. This method creates a suitable control group if all factors affecting the output variable are known. However, if relevant latent variables exist as well, which are not taken into account during the calculations, the quality of the control group is uncertain. In this paper, we present a statistics-based research in which we try to determine the relationship between the accuracy of the logistic regression model and the uncertainty of the dependent variable of the control group defined by propensity score matching. Our analyses show that there is a linear correlation between the fit of the logistic regression model and the uncertainty of the output variable. In certain cases, a latent binary explanatory variable can result in a relative error of up to 70% in the prediction of the outcome variable. The observed phenomenon calls the attention of analysts to an important point, which must be taken into account when deducting conclusions.

  6. Transferability of optimally-selected climate models in the quantification of climate change impacts on hydrology

    NASA Astrophysics Data System (ADS)

    Chen, Jie; Brissette, François P.; Lucas-Picher, Philippe

    2016-11-01

    Given the ever increasing number of climate change simulations being carried out, it has become impractical to use all of them to cover the uncertainty of climate change impacts. Various methods have been proposed to optimally select subsets of a large ensemble of climate simulations for impact studies. However, the behaviour of optimally-selected subsets of climate simulations for climate change impacts is unknown, since the transfer process from climate projections to the impact study world is usually highly non-linear. Consequently, this study investigates the transferability of optimally-selected subsets of climate simulations in the case of hydrological impacts. Two different methods were used for the optimal selection of subsets of climate scenarios, and both were found to be capable of adequately representing the spread of selected climate model variables contained in the original large ensemble. However, in both cases, the optimal subsets had limited transferability to hydrological impacts. To capture a similar variability in the impact model world, many more simulations have to be used than those that are needed to simply cover variability from the climate model variables' perspective. Overall, both optimal subset selection methods were better than random selection when small subsets were selected from a large ensemble for impact studies. However, as the number of selected simulations increased, random selection often performed better than the two optimal methods. To ensure adequate uncertainty coverage, the results of this study imply that selecting as many climate change simulations as possible is the best avenue. Where this was not possible, the two optimal methods were found to perform adequately.

  7. Comparing multiple imputation methods for systematically missing subject-level data.

    PubMed

    Kline, David; Andridge, Rebecca; Kaizar, Eloise

    2017-06-01

    When conducting research synthesis, the collection of studies that will be combined often do not measure the same set of variables, which creates missing data. When the studies to combine are longitudinal, missing data can occur on the observation-level (time-varying) or the subject-level (non-time-varying). Traditionally, the focus of missing data methods for longitudinal data has been on missing observation-level variables. In this paper, we focus on missing subject-level variables and compare two multiple imputation approaches: a joint modeling approach and a sequential conditional modeling approach. We find the joint modeling approach to be preferable to the sequential conditional approach, except when the covariance structure of the repeated outcome for each individual has homogenous variance and exchangeable correlation. Specifically, the regression coefficient estimates from an analysis incorporating imputed values based on the sequential conditional method are attenuated and less efficient than those from the joint method. Remarkably, the estimates from the sequential conditional method are often less efficient than a complete case analysis, which, in the context of research synthesis, implies that we lose efficiency by combining studies. Copyright © 2015 John Wiley & Sons, Ltd. Copyright © 2015 John Wiley & Sons, Ltd.

  8. A method to encapsulate model structural uncertainty in ensemble projections of future climate: EPIC v1.0

    NASA Astrophysics Data System (ADS)

    Lewis, Jared; Bodeker, Greg E.; Kremser, Stefanie; Tait, Andrew

    2017-12-01

    A method, based on climate pattern scaling, has been developed to expand a small number of projections of fields of a selected climate variable (X) into an ensemble that encapsulates a wide range of indicative model structural uncertainties. The method described in this paper is referred to as the Ensemble Projections Incorporating Climate model uncertainty (EPIC) method. Each ensemble member is constructed by adding contributions from (1) a climatology derived from observations that represents the time-invariant part of the signal; (2) a contribution from forced changes in X, where those changes can be statistically related to changes in global mean surface temperature (Tglobal); and (3) a contribution from unforced variability that is generated by a stochastic weather generator. The patterns of unforced variability are also allowed to respond to changes in Tglobal. The statistical relationships between changes in X (and its patterns of variability) and Tglobal are obtained in a training phase. Then, in an implementation phase, 190 simulations of Tglobal are generated using a simple climate model tuned to emulate 19 different global climate models (GCMs) and 10 different carbon cycle models. Using the generated Tglobal time series and the correlation between the forced changes in X and Tglobal, obtained in the training phase, the forced change in the X field can be generated many times using Monte Carlo analysis. A stochastic weather generator is used to generate realistic representations of weather which include spatial coherence. Because GCMs and regional climate models (RCMs) are less likely to correctly represent unforced variability compared to observations, the stochastic weather generator takes as input measures of variability derived from observations, but also responds to forced changes in climate in a way that is consistent with the RCM projections. This approach to generating a large ensemble of projections is many orders of magnitude more computationally efficient than running multiple GCM or RCM simulations. Such a large ensemble of projections permits a description of a probability density function (PDF) of future climate states rather than a small number of individual story lines within that PDF, which may not be representative of the PDF as a whole; the EPIC method largely corrects for such potential sampling biases. The method is useful for providing projections of changes in climate to users wishing to investigate the impacts and implications of climate change in a probabilistic way. A web-based tool, using the EPIC method to provide probabilistic projections of changes in daily maximum and minimum temperatures for New Zealand, has been developed and is described in this paper.

  9. Selection of a Geostatistical Method to Interpolate Soil Properties of the State Crop Testing Fields using Attributes of a Digital Terrain Model

    NASA Astrophysics Data System (ADS)

    Sahabiev, I. A.; Ryazanov, S. S.; Kolcova, T. G.; Grigoryan, B. R.

    2018-03-01

    The three most common techniques to interpolate soil properties at a field scale—ordinary kriging (OK), regression kriging with multiple linear regression drift model (RK + MLR), and regression kriging with principal component regression drift model (RK + PCR)—were examined. The results of the performed study were compiled into an algorithm of choosing the most appropriate soil mapping technique. Relief attributes were used as the auxiliary variables. When spatial dependence of a target variable was strong, the OK method showed more accurate interpolation results, and the inclusion of the auxiliary data resulted in an insignificant improvement in prediction accuracy. According to the algorithm, the RK + PCR method effectively eliminates multicollinearity of explanatory variables. However, if the number of predictors is less than ten, the probability of multicollinearity is reduced, and application of the PCR becomes irrational. In that case, the multiple linear regression should be used instead.

  10. Comparing daily temperature averaging methods: the role of surface and atmosphere variables in determining spatial and seasonal variability

    NASA Astrophysics Data System (ADS)

    Bernhardt, Jase; Carleton, Andrew M.

    2018-05-01

    The two main methods for determining the average daily near-surface air temperature, twice-daily averaging (i.e., [Tmax+Tmin]/2) and hourly averaging (i.e., the average of 24 hourly temperature measurements), typically show differences associated with the asymmetry of the daily temperature curve. To quantify the relative influence of several land surface and atmosphere variables on the two temperature averaging methods, we correlate data for 215 weather stations across the Contiguous United States (CONUS) for the period 1981-2010 with the differences between the two temperature-averaging methods. The variables are land use-land cover (LULC) type, soil moisture, snow cover, cloud cover, atmospheric moisture (i.e., specific humidity, dew point temperature), and precipitation. Multiple linear regression models explain the spatial and monthly variations in the difference between the two temperature-averaging methods. We find statistically significant correlations between both the land surface and atmosphere variables studied with the difference between temperature-averaging methods, especially for the extreme (i.e., summer, winter) seasons (adjusted R2 > 0.50). Models considering stations with certain LULC types, particularly forest and developed land, have adjusted R2 values > 0.70, indicating that both surface and atmosphere variables control the daily temperature curve and its asymmetry. This study improves our understanding of the role of surface and near-surface conditions in modifying thermal climates of the CONUS for a wide range of environments, and their likely importance as anthropogenic forcings—notably LULC changes and greenhouse gas emissions—continues.

  11. Determination of main fruits in adulterated nectars by ATR-FTIR spectroscopy combined with multivariate calibration and variable selection methods.

    PubMed

    Miaw, Carolina Sheng Whei; Assis, Camila; Silva, Alessandro Rangel Carolino Sales; Cunha, Maria Luísa; Sena, Marcelo Martins; de Souza, Scheilla Vitorino Carvalho

    2018-07-15

    Grape, orange, peach and passion fruit nectars were formulated and adulterated by dilution with syrup, apple and cashew juices at 10 levels for each adulterant. Attenuated total reflectance Fourier transform mid infrared (ATR-FTIR) spectra were obtained. Partial least squares (PLS) multivariate calibration models allied to different variable selection methods, such as interval partial least squares (iPLS), ordered predictors selection (OPS) and genetic algorithm (GA), were used to quantify the main fruits. PLS improved by iPLS-OPS variable selection showed the highest predictive capacity to quantify the main fruit contents. The selected variables in the final models varied from 72 to 100; the root mean square errors of prediction were estimated from 0.5 to 2.6%; the correlation coefficients of prediction ranged from 0.948 to 0.990; and, the mean relative errors of prediction varied from 3.0 to 6.7%. All of the developed models were validated. Copyright © 2018 Elsevier Ltd. All rights reserved.

  12. Cocaine Dependence Treatment Data: Methods for Measurement Error Problems With Predictors Derived From Stationary Stochastic Processes

    PubMed Central

    Guan, Yongtao; Li, Yehua; Sinha, Rajita

    2011-01-01

    In a cocaine dependence treatment study, we use linear and nonlinear regression models to model posttreatment cocaine craving scores and first cocaine relapse time. A subset of the covariates are summary statistics derived from baseline daily cocaine use trajectories, such as baseline cocaine use frequency and average daily use amount. These summary statistics are subject to estimation error and can therefore cause biased estimators for the regression coefficients. Unlike classical measurement error problems, the error we encounter here is heteroscedastic with an unknown distribution, and there are no replicates for the error-prone variables or instrumental variables. We propose two robust methods to correct for the bias: a computationally efficient method-of-moments-based method for linear regression models and a subsampling extrapolation method that is generally applicable to both linear and nonlinear regression models. Simulations and an application to the cocaine dependence treatment data are used to illustrate the efficacy of the proposed methods. Asymptotic theory and variance estimation for the proposed subsampling extrapolation method and some additional simulation results are described in the online supplementary material. PMID:21984854

  13. Sobol‧ sensitivity analysis of NAPL-contaminated aquifer remediation process based on multiple surrogates

    NASA Astrophysics Data System (ADS)

    Luo, Jiannan; Lu, Wenxi

    2014-06-01

    Sobol‧ sensitivity analyses based on different surrogates were performed on a trichloroethylene (TCE)-contaminated aquifer to assess the sensitivity of the design variables of remediation duration, surfactant concentration and injection rates at four wells to remediation efficiency First, the surrogate models of a multi-phase flow simulation model were constructed by applying radial basis function artificial neural network (RBFANN) and Kriging methods, and the two models were then compared. Based on the developed surrogate models, the Sobol‧ method was used to calculate the sensitivity indices of the design variables which affect the remediation efficiency. The coefficient of determination (R2) and the mean square error (MSE) of these two surrogate models demonstrated that both models had acceptable approximation accuracy, furthermore, the approximation accuracy of the Kriging model was slightly better than that of the RBFANN model. Sobol‧ sensitivity analysis results demonstrated that the remediation duration was the most important variable influencing remediation efficiency, followed by rates of injection at wells 1 and 3, while rates of injection at wells 2 and 4 and the surfactant concentration had negligible influence on remediation efficiency. In addition, high-order sensitivity indices were all smaller than 0.01, which indicates that interaction effects of these six factors were practically insignificant. The proposed Sobol‧ sensitivity analysis based on surrogate is an effective tool for calculating sensitivity indices, because it shows the relative contribution of the design variables (individuals and interactions) to the output performance variability with a limited number of runs of a computationally expensive simulation model. The sensitivity analysis results lay a foundation for the optimal groundwater remediation process optimization.

  14. LES/PDF studies of joint statistics of mixture fraction and progress variable in piloted methane jet flames with inhomogeneous inlet flows

    NASA Astrophysics Data System (ADS)

    Zhang, Pei; Barlow, Robert; Masri, Assaad; Wang, Haifeng

    2016-11-01

    The mixture fraction and progress variable are often used as independent variables for describing turbulent premixed and non-premixed flames. There is a growing interest in using these two variables for describing partially premixed flames. The joint statistical distribution of the mixture fraction and progress variable is of great interest in developing models for partially premixed flames. In this work, we conduct predictive studies of the joint statistics of mixture fraction and progress variable in a series of piloted methane jet flames with inhomogeneous inlet flows. The employed models combine large eddy simulations with the Monte Carlo probability density function (PDF) method. The joint PDFs and marginal PDFs are examined in detail by comparing the model predictions and the measurements. Different presumed shapes of the joint PDFs are also evaluated.

  15. Zipf’s Law Arises Naturally When There Are Underlying, Unobserved Variables

    PubMed Central

    Corradi, Nicola

    2016-01-01

    Zipf’s law, which states that the probability of an observation is inversely proportional to its rank, has been observed in many domains. While there are models that explain Zipf’s law in each of them, those explanations are typically domain specific. Recently, methods from statistical physics were used to show that a fairly broad class of models does provide a general explanation of Zipf’s law. This explanation rests on the observation that real world data is often generated from underlying causes, known as latent variables. Those latent variables mix together multiple models that do not obey Zipf’s law, giving a model that does. Here we extend that work both theoretically and empirically. Theoretically, we provide a far simpler and more intuitive explanation of Zipf’s law, which at the same time considerably extends the class of models to which this explanation can apply. Furthermore, we also give methods for verifying whether this explanation applies to a particular dataset. Empirically, these advances allowed us extend this explanation to important classes of data, including word frequencies (the first domain in which Zipf’s law was discovered), data with variable sequence length, and multi-neuron spiking activity. PMID:27997544

  16. Body Fat Percentage Prediction Using Intelligent Hybrid Approaches

    PubMed Central

    Shao, Yuehjen E.

    2014-01-01

    Excess of body fat often leads to obesity. Obesity is typically associated with serious medical diseases, such as cancer, heart disease, and diabetes. Accordingly, knowing the body fat is an extremely important issue since it affects everyone's health. Although there are several ways to measure the body fat percentage (BFP), the accurate methods are often associated with hassle and/or high costs. Traditional single-stage approaches may use certain body measurements or explanatory variables to predict the BFP. Diverging from existing approaches, this study proposes new intelligent hybrid approaches to obtain fewer explanatory variables, and the proposed forecasting models are able to effectively predict the BFP. The proposed hybrid models consist of multiple regression (MR), artificial neural network (ANN), multivariate adaptive regression splines (MARS), and support vector regression (SVR) techniques. The first stage of the modeling includes the use of MR and MARS to obtain fewer but more important sets of explanatory variables. In the second stage, the remaining important variables are served as inputs for the other forecasting methods. A real dataset was used to demonstrate the development of the proposed hybrid models. The prediction results revealed that the proposed hybrid schemes outperformed the typical, single-stage forecasting models. PMID:24723804

  17. Variable selection for zero-inflated and overdispersed data with application to health care demand in Germany.

    PubMed

    Wang, Zhu; Ma, Shuangge; Wang, Ching-Yun

    2015-09-01

    In health services and outcome research, count outcomes are frequently encountered and often have a large proportion of zeros. The zero-inflated negative binomial (ZINB) regression model has important applications for this type of data. With many possible candidate risk factors, this paper proposes new variable selection methods for the ZINB model. We consider maximum likelihood function plus a penalty including the least absolute shrinkage and selection operator (LASSO), smoothly clipped absolute deviation (SCAD), and minimax concave penalty (MCP). An EM (expectation-maximization) algorithm is proposed for estimating the model parameters and conducting variable selection simultaneously. This algorithm consists of estimating penalized weighted negative binomial models and penalized logistic models via the coordinated descent algorithm. Furthermore, statistical properties including the standard error formulae are provided. A simulation study shows that the new algorithm not only has more accurate or at least comparable estimation, but also is more robust than the traditional stepwise variable selection. The proposed methods are applied to analyze the health care demand in Germany using the open-source R package mpath. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  18. Probabilities and statistics for backscatter estimates obtained by a scatterometer

    NASA Technical Reports Server (NTRS)

    Pierson, Willard J., Jr.

    1989-01-01

    Methods for the recovery of winds near the surface of the ocean from measurements of the normalized radar backscattering cross section must recognize and make use of the statistics (i.e., the sampling variability) of the backscatter measurements. Radar backscatter values from a scatterometer are random variables with expected values given by a model. A model relates backscatter to properties of the waves on the ocean, which are in turn generated by the winds in the atmospheric marine boundary layer. The effective wind speed and direction at a known height for a neutrally stratified atmosphere are the values to be recovered from the model. The probability density function for the backscatter values is a normal probability distribution with the notable feature that the variance is a known function of the expected value. The sources of signal variability, the effects of this variability on the wind speed estimation, and criteria for the acceptance or rejection of models are discussed. A modified maximum likelihood method for estimating wind vectors is described. Ways to make corrections for the kinds of errors found for the Seasat SASS model function are described, and applications to a new scatterometer are given.

  19. Study on the medical meteorological forecast of the number of hypertension inpatient based on SVR

    NASA Astrophysics Data System (ADS)

    Zhai, Guangyu; Chai, Guorong; Zhang, Haifeng

    2017-06-01

    The purpose of this study is to build a hypertension prediction model by discussing the meteorological factors for hypertension incidence. The research method is selecting the standard data of relative humidity, air temperature, visibility, wind speed and air pressure of Lanzhou from 2010 to 2012(calculating the maximum, minimum and average value with 5 days as a unit ) as the input variables of Support Vector Regression(SVR) and the standard data of hypertension incidence of the same period as the output dependent variables to obtain the optimal prediction parameters by cross validation algorithm, then by SVR algorithm learning and training, a SVR forecast model for hypertension incidence is built. The result shows that the hypertension prediction model is composed of 15 input independent variables, the training accuracy is 0.005, the final error is 0.0026389. The forecast accuracy based on SVR model is 97.1429%, which is higher than statistical forecast equation and neural network prediction method. It is concluded that SVR model provides a new method for hypertension prediction with its simple calculation, small error as well as higher historical sample fitting and Independent sample forecast capability.

  20. Implementations of geographically weighted lasso in spatial data with multicollinearity (Case study: Poverty modeling of Java Island)

    NASA Astrophysics Data System (ADS)

    Setiyorini, Anis; Suprijadi, Jadi; Handoko, Budhi

    2017-03-01

    Geographically Weighted Regression (GWR) is a regression model that takes into account the spatial heterogeneity effect. In the application of the GWR, inference on regression coefficients is often of interest, as is estimation and prediction of the response variable. Empirical research and studies have demonstrated that local correlation between explanatory variables can lead to estimated regression coefficients in GWR that are strongly correlated, a condition named multicollinearity. It later results on a large standard error on estimated regression coefficients, and, hence, problematic for inference on relationships between variables. Geographically Weighted Lasso (GWL) is a method which capable to deal with spatial heterogeneity and local multicollinearity in spatial data sets. GWL is a further development of GWR method, which adds a LASSO (Least Absolute Shrinkage and Selection Operator) constraint in parameter estimation. In this study, GWL will be applied by using fixed exponential kernel weights matrix to establish a poverty modeling of Java Island, Indonesia. The results of applying the GWL to poverty datasets show that this method stabilizes regression coefficients in the presence of multicollinearity and produces lower prediction and estimation error of the response variable than GWR does.

  1. CALCULATION OF NONLINEAR CONFIDENCE AND PREDICTION INTERVALS FOR GROUND-WATER FLOW MODELS.

    USGS Publications Warehouse

    Cooley, Richard L.; Vecchia, Aldo V.

    1987-01-01

    A method is derived to efficiently compute nonlinear confidence and prediction intervals on any function of parameters derived as output from a mathematical model of a physical system. The method is applied to the problem of obtaining confidence and prediction intervals for manually-calibrated ground-water flow models. To obtain confidence and prediction intervals resulting from uncertainties in parameters, the calibrated model and information on extreme ranges and ordering of the model parameters within one or more independent groups are required. If random errors in the dependent variable are present in addition to uncertainties in parameters, then calculation of prediction intervals also requires information on the extreme range of error expected. A simple Monte Carlo method is used to compute the quantiles necessary to establish probability levels for the confidence and prediction intervals. Application of the method to a hypothetical example showed that inclusion of random errors in the dependent variable in addition to uncertainties in parameters can considerably widen the prediction intervals.

  2. AIC identifies optimal representation of longitudinal dietary variables.

    PubMed

    VanBuren, John; Cavanaugh, Joseph; Marshall, Teresa; Warren, John; Levy, Steven M

    2017-09-01

    The Akaike Information Criterion (AIC) is a well-known tool for variable selection in multivariable modeling as well as a tool to help identify the optimal representation of explanatory variables. However, it has been discussed infrequently in the dental literature. The purpose of this paper is to demonstrate the use of AIC in determining the optimal representation of dietary variables in a longitudinal dental study. The Iowa Fluoride Study enrolled children at birth and dental examinations were conducted at ages 5, 9, 13, and 17. Decayed or filled surfaces (DFS) trend clusters were created based on age 13 DFS counts and age 13-17 DFS increments. Dietary intake data (water, milk, 100 percent-juice, and sugar sweetened beverages) were collected semiannually using a food frequency questionnaire. Multinomial logistic regression models were fit to predict DFS cluster membership (n=344). Multiple approaches could be used to represent the dietary data including averaging across all collected surveys or over different shorter time periods to capture age-specific trends or using the individual time points of dietary data. AIC helped identify the optimal representation. Averaging data for all four dietary variables for the whole period from age 9.0 to 17.0 provided a better representation in the multivariable full model (AIC=745.0) compared to other methods assessed in full models (AICs=750.6 for age 9 and 9-13 increment dietary measurements and AIC=762.3 for age 9, 13, and 17 individual measurements). The results illustrate that AIC can help researchers identify the optimal way to summarize information for inclusion in a statistical model. The method presented here can be used by researchers performing statistical modeling in dental research. This method provides an alternative approach for assessing the propriety of variable representation to significance-based procedures, which could potentially lead to improved research in the dental community. © 2017 American Association of Public Health Dentistry.

  3. Solvency supervision based on a total balance sheet approach

    NASA Astrophysics Data System (ADS)

    Pitselis, Georgios

    2009-11-01

    In this paper we investigate the adequacy of the own funds a company requires in order to remain healthy and avoid insolvency. Two methods are applied here; the quantile regression method and the method of mixed effects models. Quantile regression is capable of providing a more complete statistical analysis of the stochastic relationship among random variables than least squares estimation. The estimated mixed effects line can be considered as an internal industry equation (norm), which explains a systematic relation between a dependent variable (such as own funds) with independent variables (e.g. financial characteristics, such as assets, provisions, etc.). The above two methods are implemented with two data sets.

  4. Large scale air pollution estimation method combining land use regression and chemical transport modeling in a geostatistical framework.

    PubMed

    Akita, Yasuyuki; Baldasano, Jose M; Beelen, Rob; Cirach, Marta; de Hoogh, Kees; Hoek, Gerard; Nieuwenhuijsen, Mark; Serre, Marc L; de Nazelle, Audrey

    2014-04-15

    In recognition that intraurban exposure gradients may be as large as between-city variations, recent air pollution epidemiologic studies have become increasingly interested in capturing within-city exposure gradients. In addition, because of the rapidly accumulating health data, recent studies also need to handle large study populations distributed over large geographic domains. Even though several modeling approaches have been introduced, a consistent modeling framework capturing within-city exposure variability and applicable to large geographic domains is still missing. To address these needs, we proposed a modeling framework based on the Bayesian Maximum Entropy method that integrates monitoring data and outputs from existing air quality models based on Land Use Regression (LUR) and Chemical Transport Models (CTM). The framework was applied to estimate the yearly average NO2 concentrations over the region of Catalunya in Spain. By jointly accounting for the global scale variability in the concentration from the output of CTM and the intraurban scale variability through LUR model output, the proposed framework outperformed more conventional approaches.

  5. [Study on Application of NIR Spectral Information Screening in Identification of Maca Origin].

    PubMed

    Wang, Yuan-zhong; Zhao, Yan-li; Zhang, Ji; Jin, Hang

    2016-02-01

    Medicinal and edible plant Maca is rich in various nutrients and owns great medicinal value. Based on near infrared diffuse reflectance spectra, 139 Maca samples collected from Peru and Yunnan were used to identify their geographical origins. Multiplication signal correction (MSC) coupled with second derivative (SD) and Norris derivative filter (ND) was employed in spectral pretreatment. Spectrum range (7,500-4,061 cm⁻¹) was chosen by spectrum standard deviation. Combined with principal component analysis-mahalanobis distance (PCA-MD), the appropriate number of principal components was selected as 5. Based on the spectrum range and the number of principal components selected, two abnormal samples were eliminated by modular group iterative singular sample diagnosis method. Then, four methods were used to filter spectral variable information, competitive adaptive reweighted sampling (CARS), monte carlo-uninformative variable elimination (MC-UVE), genetic algorithm (GA) and subwindow permutation analysis (SPA). The spectral variable information filtered was evaluated by model population analysis (MPA). The results showed that RMSECV(SPA) > RMSECV(CARS) > RMSECV(MC-UVE) > RMSECV(GA), were 2. 14, 2. 05, 2. 02, and 1. 98, and the spectral variables were 250, 240, 250 and 70, respectively. According to the spectral variable filtered, partial least squares discriminant analysis (PLS-DA) was used to build the model, with random selection of 97 samples as training set, and the other 40 samples as validation set. The results showed that, R²: GA > MC-UVE > CARS > SPA, RMSEC and RMSEP: GA < MC-UVE < CARS

  6. Gaia DR2 documentation Chapter 7: Variability

    NASA Astrophysics Data System (ADS)

    Eyer, L.; Guy, L.; Distefano, E.; Clementini, G.; Mowlavi, N.; Rimoldini, L.; Roelens, M.; Audard, M.; Holl, B.; Lanzafame, A.; Lebzelter, T.; Lecoeur-Taïbi, I.; Molnár, L.; Ripepi, V.; Sarro, L.; Jevardat de Fombelle, G.; Nienartowicz, K.; De Ridder, J.; Juhász, Á.; Molinaro, R.; Plachy, E.; Regibo, S.

    2018-04-01

    This chapter of the Gaia DR2 documentation describes the models and methods used on the 22 months of data to produce the Gaia variable star results for Gaia DR2. The variability processing and analysis was based mostly on the calibrated G and integrated BP and RP photometry. The variability analysis approach to the Gaia data has been described in Eyer et al. (2017), and the Gaia DR2 results are presented in Holl et al. (2018). Detailed methods on specific topics will be published in a number of separate articles. Variability behaviour in the colour magnitude diagram is presented in Gaia Collaboration et al. (2018c).

  7. Delay correlation analysis and representation for vital complaint VHDL models

    DOEpatents

    Rich, Marvin J.; Misra, Ashutosh

    2004-11-09

    A method and system unbind a rise/fall tuple of a VHDL generic variable and create rise time and fall time generics of each generic variable that are independent of each other. Then, according to a predetermined correlation policy, the method and system collect delay values in a VHDL standard delay file, sort the delay values, remove duplicate delay values, group the delay values into correlation sets, and output an analysis file. The correlation policy may include collecting all generic variables in a VHDL standard delay file, selecting each generic variable, and performing reductions on the set of delay values associated with each selected generic variable.

  8. Modeling Predictors of Duties Not Including Flying Status.

    PubMed

    Tvaryanas, Anthony P; Griffith, Converse

    2018-01-01

    The purpose of this study was to reuse available datasets to conduct an analysis of potential predictors of U.S. Air Force aircrew nonavailability in terms of being in "duties not to include flying" (DNIF) status. This study was a retrospective cohort analysis of U.S. Air Force aircrew on active duty during the period from 2003-2012. Predictor variables included age, Air Force Specialty Code (AFSC), clinic location, diagnosis, gender, pay grade, and service component. The response variable was DNIF duration. Nonparametric methods were used for the exploratory analysis and parametric methods were used for model building and statistical inference. Out of a set of 783 potential predictor variables, 339 variables were identified from the nonparametric exploratory analysis for inclusion in the parametric analysis. Of these, 54 variables had significant associations with DNIF duration in the final model fitted to the validation data set. The predicted results of this model for DNIF duration had a correlation of 0.45 with the actual number of DNIF days. Predictor variables included age, 6 AFSCs, 7 clinic locations, and 40 primary diagnosis categories. Specific demographic (i.e., age), occupational (i.e., AFSC), and health (i.e., clinic location and primary diagnosis category) DNIF drivers were identified. Subsequent research should focus on the application of primary, secondary, and tertiary prevention measures to ameliorate the potential impact of these DNIF drivers where possible.Tvaryanas AP, Griffith C Jr. Modeling predictors of duties not including flying status. Aerosp Med Hum Perform. 2018; 89(1):52-57.

  9. Selecting Optimal Random Forest Predictive Models: A Case Study on Predicting the Spatial Distribution of Seabed Hardness

    PubMed Central

    Li, Jin; Tran, Maggie; Siwabessy, Justy

    2016-01-01

    Spatially continuous predictions of seabed hardness are important baseline environmental information for sustainable management of Australia’s marine jurisdiction. Seabed hardness is often inferred from multibeam backscatter data with unknown accuracy and can be inferred from underwater video footage at limited locations. In this study, we classified the seabed into four classes based on two new seabed hardness classification schemes (i.e., hard90 and hard70). We developed optimal predictive models to predict seabed hardness using random forest (RF) based on the point data of hardness classes and spatially continuous multibeam data. Five feature selection (FS) methods that are variable importance (VI), averaged variable importance (AVI), knowledge informed AVI (KIAVI), Boruta and regularized RF (RRF) were tested based on predictive accuracy. Effects of highly correlated, important and unimportant predictors on the accuracy of RF predictive models were examined. Finally, spatial predictions generated using the most accurate models were visually examined and analysed. This study confirmed that: 1) hard90 and hard70 are effective seabed hardness classification schemes; 2) seabed hardness of four classes can be predicted with a high degree of accuracy; 3) the typical approach used to pre-select predictive variables by excluding highly correlated variables needs to be re-examined; 4) the identification of the important and unimportant predictors provides useful guidelines for further improving predictive models; 5) FS methods select the most accurate predictive model(s) instead of the most parsimonious ones, and AVI and Boruta are recommended for future studies; and 6) RF is an effective modelling method with high predictive accuracy for multi-level categorical data and can be applied to ‘small p and large n’ problems in environmental sciences. Additionally, automated computational programs for AVI need to be developed to increase its computational efficiency and caution should be taken when applying filter FS methods in selecting predictive models. PMID:26890307

  10. Selecting Optimal Random Forest Predictive Models: A Case Study on Predicting the Spatial Distribution of Seabed Hardness.

    PubMed

    Li, Jin; Tran, Maggie; Siwabessy, Justy

    2016-01-01

    Spatially continuous predictions of seabed hardness are important baseline environmental information for sustainable management of Australia's marine jurisdiction. Seabed hardness is often inferred from multibeam backscatter data with unknown accuracy and can be inferred from underwater video footage at limited locations. In this study, we classified the seabed into four classes based on two new seabed hardness classification schemes (i.e., hard90 and hard70). We developed optimal predictive models to predict seabed hardness using random forest (RF) based on the point data of hardness classes and spatially continuous multibeam data. Five feature selection (FS) methods that are variable importance (VI), averaged variable importance (AVI), knowledge informed AVI (KIAVI), Boruta and regularized RF (RRF) were tested based on predictive accuracy. Effects of highly correlated, important and unimportant predictors on the accuracy of RF predictive models were examined. Finally, spatial predictions generated using the most accurate models were visually examined and analysed. This study confirmed that: 1) hard90 and hard70 are effective seabed hardness classification schemes; 2) seabed hardness of four classes can be predicted with a high degree of accuracy; 3) the typical approach used to pre-select predictive variables by excluding highly correlated variables needs to be re-examined; 4) the identification of the important and unimportant predictors provides useful guidelines for further improving predictive models; 5) FS methods select the most accurate predictive model(s) instead of the most parsimonious ones, and AVI and Boruta are recommended for future studies; and 6) RF is an effective modelling method with high predictive accuracy for multi-level categorical data and can be applied to 'small p and large n' problems in environmental sciences. Additionally, automated computational programs for AVI need to be developed to increase its computational efficiency and caution should be taken when applying filter FS methods in selecting predictive models.

  11. Probability of Detection (POD) as a statistical model for the validation of qualitative methods.

    PubMed

    Wehling, Paul; LaBudde, Robert A; Brunelle, Sharon L; Nelson, Maria T

    2011-01-01

    A statistical model is presented for use in validation of qualitative methods. This model, termed Probability of Detection (POD), harmonizes the statistical concepts and parameters between quantitative and qualitative method validation. POD characterizes method response with respect to concentration as a continuous variable. The POD model provides a tool for graphical representation of response curves for qualitative methods. In addition, the model allows comparisons between candidate and reference methods, and provides calculations of repeatability, reproducibility, and laboratory effects from collaborative study data. Single laboratory study and collaborative study examples are given.

  12. Sparse modeling of spatial environmental variables associated with asthma

    PubMed Central

    Chang, Timothy S.; Gangnon, Ronald E.; Page, C. David; Buckingham, William R.; Tandias, Aman; Cowan, Kelly J.; Tomasallo, Carrie D.; Arndt, Brian G.; Hanrahan, Lawrence P.; Guilbert, Theresa W.

    2014-01-01

    Geographically distributed environmental factors influence the burden of diseases such as asthma. Our objective was to identify sparse environmental variables associated with asthma diagnosis gathered from a large electronic health record (EHR) dataset while controlling for spatial variation. An EHR dataset from the University of Wisconsin’s Family Medicine, Internal Medicine and Pediatrics Departments was obtained for 199,220 patients aged 5–50 years over a three-year period. Each patient’s home address was geocoded to one of 3,456 geographic census block groups. Over one thousand block group variables were obtained from a commercial database. We developed a Sparse Spatial Environmental Analysis (SASEA). Using this method, the environmental variables were first dimensionally reduced with sparse principal component analysis. Logistic thin plate regression spline modeling was then used to identify block group variables associated with asthma from sparse principal components. The addresses of patients from the EHR dataset were distributed throughout the majority of Wisconsin’s geography. Logistic thin plate regression spline modeling captured spatial variation of asthma. Four sparse principal components identified via model selection consisted of food at home, dog ownership, household size, and disposable income variables. In rural areas, dog ownership and renter occupied housing units from significant sparse principal components were associated with asthma. Our main contribution is the incorporation of sparsity in spatial modeling. SASEA sequentially added sparse principal components to Logistic thin plate regression spline modeling. This method allowed association of geographically distributed environmental factors with asthma using EHR and environmental datasets. SASEA can be applied to other diseases with environmental risk factors. PMID:25533437

  13. Sparse modeling of spatial environmental variables associated with asthma.

    PubMed

    Chang, Timothy S; Gangnon, Ronald E; David Page, C; Buckingham, William R; Tandias, Aman; Cowan, Kelly J; Tomasallo, Carrie D; Arndt, Brian G; Hanrahan, Lawrence P; Guilbert, Theresa W

    2015-02-01

    Geographically distributed environmental factors influence the burden of diseases such as asthma. Our objective was to identify sparse environmental variables associated with asthma diagnosis gathered from a large electronic health record (EHR) dataset while controlling for spatial variation. An EHR dataset from the University of Wisconsin's Family Medicine, Internal Medicine and Pediatrics Departments was obtained for 199,220 patients aged 5-50years over a three-year period. Each patient's home address was geocoded to one of 3456 geographic census block groups. Over one thousand block group variables were obtained from a commercial database. We developed a Sparse Spatial Environmental Analysis (SASEA). Using this method, the environmental variables were first dimensionally reduced with sparse principal component analysis. Logistic thin plate regression spline modeling was then used to identify block group variables associated with asthma from sparse principal components. The addresses of patients from the EHR dataset were distributed throughout the majority of Wisconsin's geography. Logistic thin plate regression spline modeling captured spatial variation of asthma. Four sparse principal components identified via model selection consisted of food at home, dog ownership, household size, and disposable income variables. In rural areas, dog ownership and renter occupied housing units from significant sparse principal components were associated with asthma. Our main contribution is the incorporation of sparsity in spatial modeling. SASEA sequentially added sparse principal components to Logistic thin plate regression spline modeling. This method allowed association of geographically distributed environmental factors with asthma using EHR and environmental datasets. SASEA can be applied to other diseases with environmental risk factors. Copyright © 2014 Elsevier Inc. All rights reserved.

  14. AN IMPROVED STRATEGY FOR REGRESSION OF BIOPHYSICAL VARIABLES AND LANDSAT ETM+ DATA. (R828309)

    EPA Science Inventory

    Empirical models are important tools for relating field-measured biophysical variables to remote sensing data. Regression analysis has been a popular empirical method of linking these two types of data to provide continuous estimates for variables such as biomass, percent wood...

  15. Predictive Models for Escherichia coli Concentrations at Inland Lake Beaches and Relationship of Model Variables to Pathogen Detection

    PubMed Central

    Stelzer, Erin A.; Duris, Joseph W.; Brady, Amie M. G.; Harrison, John H.; Johnson, Heather E.; Ware, Michael W.

    2013-01-01

    Predictive models, based on environmental and water quality variables, have been used to improve the timeliness and accuracy of recreational water quality assessments, but their effectiveness has not been studied in inland waters. Sampling at eight inland recreational lakes in Ohio was done in order to investigate using predictive models for Escherichia coli and to understand the links between E. coli concentrations, predictive variables, and pathogens. Based upon results from 21 beach sites, models were developed for 13 sites, and the most predictive variables were rainfall, wind direction and speed, turbidity, and water temperature. Models were not developed at sites where the E. coli standard was seldom exceeded. Models were validated at nine sites during an independent year. At three sites, the model resulted in increased correct responses, sensitivities, and specificities compared to use of the previous day's E. coli concentration (the current method). Drought conditions during the validation year precluded being able to adequately assess model performance at most of the other sites. Cryptosporidium, adenovirus, eaeA (E. coli), ipaH (Shigella), and spvC (Salmonella) were found in at least 20% of samples collected for pathogens at five sites. The presence or absence of the three bacterial genes was related to some of the model variables but was not consistently related to E. coli concentrations. Predictive models were not effective at all inland lake sites; however, their use at two lakes with high swimmer densities will provide better estimates of public health risk than current methods and will be a valuable resource for beach managers and the public. PMID:23291550

  16. TIMSS 2011 Student and Teacher Predictors for Mathematics Achievement Explored and Identified via Elastic Net.

    PubMed

    Yoo, Jin Eun

    2018-01-01

    A substantial body of research has been conducted on variables relating to students' mathematics achievement with TIMSS. However, most studies have employed conventional statistical methods, and have focused on selected few indicators instead of utilizing hundreds of variables TIMSS provides. This study aimed to find a prediction model for students' mathematics achievement using as many TIMSS student and teacher variables as possible. Elastic net, the selected machine learning technique in this study, takes advantage of both LASSO and ridge in terms of variable selection and multicollinearity, respectively. A logistic regression model was also employed to predict TIMSS 2011 Korean 4th graders' mathematics achievement. Ten-fold cross-validation with mean squared error was employed to determine the elastic net regularization parameter. Among 162 TIMSS variables explored, 12 student and 5 teacher variables were selected in the elastic net model, and the prediction accuracy, sensitivity, and specificity were 76.06, 70.23, and 80.34%, respectively. This study showed that the elastic net method can be successfully applied to educational large-scale data by selecting a subset of variables with reasonable prediction accuracy and finding new variables to predict students' mathematics achievement. Newly found variables via machine learning can shed light on the existing theories from a totally different perspective, which in turn propagates creation of a new theory or complement of existing ones. This study also examined the current scale development convention from a machine learning perspective.

  17. TIMSS 2011 Student and Teacher Predictors for Mathematics Achievement Explored and Identified via Elastic Net

    PubMed Central

    Yoo, Jin Eun

    2018-01-01

    A substantial body of research has been conducted on variables relating to students' mathematics achievement with TIMSS. However, most studies have employed conventional statistical methods, and have focused on selected few indicators instead of utilizing hundreds of variables TIMSS provides. This study aimed to find a prediction model for students' mathematics achievement using as many TIMSS student and teacher variables as possible. Elastic net, the selected machine learning technique in this study, takes advantage of both LASSO and ridge in terms of variable selection and multicollinearity, respectively. A logistic regression model was also employed to predict TIMSS 2011 Korean 4th graders' mathematics achievement. Ten-fold cross-validation with mean squared error was employed to determine the elastic net regularization parameter. Among 162 TIMSS variables explored, 12 student and 5 teacher variables were selected in the elastic net model, and the prediction accuracy, sensitivity, and specificity were 76.06, 70.23, and 80.34%, respectively. This study showed that the elastic net method can be successfully applied to educational large-scale data by selecting a subset of variables with reasonable prediction accuracy and finding new variables to predict students' mathematics achievement. Newly found variables via machine learning can shed light on the existing theories from a totally different perspective, which in turn propagates creation of a new theory or complement of existing ones. This study also examined the current scale development convention from a machine learning perspective. PMID:29599736

  18. Evaluation of Normalization Methods to Pave the Way Towards Large-Scale LC-MS-Based Metabolomics Profiling Experiments

    PubMed Central

    Valkenborg, Dirk; Baggerman, Geert; Vanaerschot, Manu; Witters, Erwin; Dujardin, Jean-Claude; Burzykowski, Tomasz; Berg, Maya

    2013-01-01

    Abstract Combining liquid chromatography-mass spectrometry (LC-MS)-based metabolomics experiments that were collected over a long period of time remains problematic due to systematic variability between LC-MS measurements. Until now, most normalization methods for LC-MS data are model-driven, based on internal standards or intermediate quality control runs, where an external model is extrapolated to the dataset of interest. In the first part of this article, we evaluate several existing data-driven normalization approaches on LC-MS metabolomics experiments, which do not require the use of internal standards. According to variability measures, each normalization method performs relatively well, showing that the use of any normalization method will greatly improve data-analysis originating from multiple experimental runs. In the second part, we apply cyclic-Loess normalization to a Leishmania sample. This normalization method allows the removal of systematic variability between two measurement blocks over time and maintains the differential metabolites. In conclusion, normalization allows for pooling datasets from different measurement blocks over time and increases the statistical power of the analysis, hence paving the way to increase the scale of LC-MS metabolomics experiments. From our investigation, we recommend data-driven normalization methods over model-driven normalization methods, if only a few internal standards were used. Moreover, data-driven normalization methods are the best option to normalize datasets from untargeted LC-MS experiments. PMID:23808607

  19. Development and evaluation of height diameter at breast models for native Chinese Metasequoia.

    PubMed

    Liu, Mu; Feng, Zhongke; Zhang, Zhixiang; Ma, Chenghui; Wang, Mingming; Lian, Bo-Ling; Sun, Renjie; Zhang, Li

    2017-01-01

    Accurate tree height and diameter at breast height (dbh) are important input variables for growth and yield models. A total of 5503 Chinese Metasequoia trees were used in this study. We studied 53 fitted models, of which 7 were linear models and 46 were non-linear models. These models were divided into two groups of single models and multivariate models according to the number of independent variables. The results show that the allometry equation of tree height which has diameter at breast height as independent variable can better reflect the change of tree height; in addition the prediction accuracy of the multivariate composite models is higher than that of the single variable models. Although tree age is not the most important variable in the study of the relationship between tree height and dbh, the consideration of tree age when choosing models and parameters in model selection can make the prediction of tree height more accurate. The amount of data is also an important parameter what can improve the reliability of models. Other variables such as tree height, main dbh and altitude, etc can also affect models. In this study, the method of developing the recommended models for predicting the tree height of native Metasequoias aged 50-485 years is statistically reliable and can be used for reference in predicting the growth and production of mature native Metasequoia.

  20. Development and evaluation of height diameter at breast models for native Chinese Metasequoia

    PubMed Central

    Feng, Zhongke; Zhang, Zhixiang; Ma, Chenghui; Wang, Mingming; Lian, Bo-ling; Sun, Renjie; Zhang, Li

    2017-01-01

    Accurate tree height and diameter at breast height (dbh) are important input variables for growth and yield models. A total of 5503 Chinese Metasequoia trees were used in this study. We studied 53 fitted models, of which 7 were linear models and 46 were non-linear models. These models were divided into two groups of single models and multivariate models according to the number of independent variables. The results show that the allometry equation of tree height which has diameter at breast height as independent variable can better reflect the change of tree height; in addition the prediction accuracy of the multivariate composite models is higher than that of the single variable models. Although tree age is not the most important variable in the study of the relationship between tree height and dbh, the consideration of tree age when choosing models and parameters in model selection can make the prediction of tree height more accurate. The amount of data is also an important parameter what can improve the reliability of models. Other variables such as tree height, main dbh and altitude, etc can also affect models. In this study, the method of developing the recommended models for predicting the tree height of native Metasequoias aged 50–485 years is statistically reliable and can be used for reference in predicting the growth and production of mature native Metasequoia. PMID:28817600

  1. Model reduction method using variable-separation for stochastic saddle point problems

    NASA Astrophysics Data System (ADS)

    Jiang, Lijian; Li, Qiuqi

    2018-02-01

    In this paper, we consider a variable-separation (VS) method to solve the stochastic saddle point (SSP) problems. The VS method is applied to obtain the solution in tensor product structure for stochastic partial differential equations (SPDEs) in a mixed formulation. The aim of such a technique is to construct a reduced basis approximation of the solution of the SSP problems. The VS method attempts to get a low rank separated representation of the solution for SSP in a systematic enrichment manner. No iteration is performed at each enrichment step. In order to satisfy the inf-sup condition in the mixed formulation, we enrich the separated terms for the primal system variable at each enrichment step. For the SSP problems by regularization or penalty, we propose a more efficient variable-separation (VS) method, i.e., the variable-separation by penalty method. This can avoid further enrichment of the separated terms in the original mixed formulation. The computation of the variable-separation method decomposes into offline phase and online phase. Sparse low rank tensor approximation method is used to significantly improve the online computation efficiency when the number of separated terms is large. For the applications of SSP problems, we present three numerical examples to illustrate the performance of the proposed methods.

  2. Path Analysis and Residual Plotting as Methods of Environmental Scanning in Higher Education: An Illustration with Applications and Enrollments.

    ERIC Educational Resources Information Center

    Morcol, Goktug; McLaughlin, Gerald W.

    1990-01-01

    The study proposes using path analysis and residual plotting as methods supporting environmental scanning in strategic planning for higher education institutions. Path models of three levels of independent variables are developed. Dependent variables measuring applications and enrollments at Virginia Polytechnic Institute and State University are…

  3. The magnitude of variability produced by methods used to estimate annual stormwater contaminant loads for highly urbanised catchments.

    PubMed

    Beck, H J; Birch, G F

    2013-06-01

    Stormwater contaminant loading estimates using event mean concentration (EMC), rainfall/runoff relationship calculations and computer modelling (Model of Urban Stormwater Infrastructure Conceptualisation--MUSIC) demonstrated high variability in common methods of water quality assessment. Predictions of metal, nutrient and total suspended solid loadings for three highly urbanised catchments in Sydney estuary, Australia, varied greatly within and amongst methods tested. EMC and rainfall/runoff relationship calculations produced similar estimates (within 1 SD) in a statistically significant number of trials; however, considerable variability within estimates (∼50 and ∼25 % relative standard deviation, respectively) questions the reliability of these methods. Likewise, upper and lower default inputs in a commonly used loading model (MUSIC) produced an extensive range of loading estimates (3.8-8.3 times above and 2.6-4.1 times below typical default inputs, respectively). Default and calibrated MUSIC simulations produced loading estimates that agreed with EMC and rainfall/runoff calculations in some trials (4-10 from 18); however, they were not frequent enough to statistically infer that these methods produced the same results. Great variance within and amongst mean annual loads estimated by common methods of water quality assessment has important ramifications for water quality managers requiring accurate estimates of the quantities and nature of contaminants requiring treatment.

  4. Research on ionospheric tomography based on variable pixel height

    NASA Astrophysics Data System (ADS)

    Zheng, Dunyong; Li, Peiqing; He, Jie; Hu, Wusheng; Li, Chaokui

    2016-05-01

    A novel ionospheric tomography technique based on variable pixel height was developed for the tomographic reconstruction of the ionospheric electron density distribution. The method considers the height of each pixel as an unknown variable, which is retrieved during the inversion process together with the electron density values. In contrast to conventional computerized ionospheric tomography (CIT), which parameterizes the model with a fixed pixel height, the variable-pixel-height computerized ionospheric tomography (VHCIT) model applies a disturbance to the height of each pixel. In comparison with conventional CIT models, the VHCIT technique achieved superior results in a numerical simulation. A careful validation of the reliability and superiority of VHCIT was performed. According to the results of the statistical analysis of the average root mean square errors, the proposed model offers an improvement by 15% compared with conventional CIT models.

  5. New approach to probability estimate of femoral neck fracture by fall (Slovak regression model).

    PubMed

    Wendlova, J

    2009-01-01

    3,216 Slovak women with primary or secondary osteoporosis or osteopenia, aged 20-89 years, were examined with the bone densitometer DXA (dual energy X-ray absorptiometry, GE, Prodigy - Primo), x = 58.9, 95% C.I. (58.42; 59.38). The values of the following variables for each patient were measured: FSI (femur strength index), T-score total hip left, alpha angle - left, theta angle - left, HAL (hip axis length) left, BMI (body mass index) was calculated from the height and weight of the patients. Regression model determined the following order of independent variables according to the intensity of their influence upon the occurrence of values of dependent FSI variable: 1. BMI, 2. theta angle, 3. T-score total hip, 4. alpha angle, 5. HAL. The regression model equation, calculated from the variables monitored in the study, enables a doctor in praxis to determine the probability magnitude (absolute risk) for the occurrence of pathological value of FSI (FSI < 1) in the femoral neck area, i. e., allows for probability estimate of a femoral neck fracture by fall for Slovak women. 1. The Slovak regression model differs from regression models, published until now, in chosen independent variables and a dependent variable, belonging to biomechanical variables, characterising the bone quality. 2. The Slovak regression model excludes the inaccuracies of other models, which are not able to define precisely the current and past clinical condition of tested patients (e.g., to define the length and dose of exposure to risk factors). 3. The Slovak regression model opens the way to a new method of estimating the probability (absolute risk) or the odds for a femoral neck fracture by fall, based upon the bone quality determination. 4. It is assumed that the development will proceed by improving the methods enabling to measure the bone quality, determining the probability of fracture by fall (Tab. 6, Fig. 3, Ref. 22). Full Text (Free, PDF) www.bmj.sk.

  6. New Methods for Estimating Seasonal Potential Climate Predictability

    NASA Astrophysics Data System (ADS)

    Feng, Xia

    This study develops two new statistical approaches to assess the seasonal potential predictability of the observed climate variables. One is the univariate analysis of covariance (ANOCOVA) model, a combination of autoregressive (AR) model and analysis of variance (ANOVA). It has the advantage of taking into account the uncertainty of the estimated parameter due to sampling errors in statistical test, which is often neglected in AR based methods, and accounting for daily autocorrelation that is not considered in traditional ANOVA. In the ANOCOVA model, the seasonal signals arising from external forcing are determined to be identical or not to assess any interannual variability that may exist is potentially predictable. The bootstrap is an attractive alternative method that requires no hypothesis model and is available no matter how mathematically complicated the parameter estimator. This method builds up the empirical distribution of the interannual variance from the resamplings drawn with replacement from the given sample, in which the only predictability in seasonal means arises from the weather noise. These two methods are applied to temperature and water cycle components including precipitation and evaporation, to measure the extent to which the interannual variance of seasonal means exceeds the unpredictable weather noise compared with the previous methods, including Leith-Shukla-Gutzler (LSG), Madden, and Katz. The potential predictability of temperature from ANOCOVA model, bootstrap, LSG and Madden exhibits a pronounced tropical-extratropical contrast with much larger predictability in the tropics dominated by El Nino/Southern Oscillation (ENSO) than in higher latitudes where strong internal variability lowers predictability. Bootstrap tends to display highest predictability of the four methods, ANOCOVA lies in the middle, while LSG and Madden appear to generate lower predictability. Seasonal precipitation from ANOCOVA, bootstrap, and Katz, resembling that for temperature, is more predictable over the tropical regions, and less predictable in extropics. Bootstrap and ANOCOVA are in good agreement with each other, both methods generating larger predictability than Katz. The seasonal predictability of evaporation over land bears considerably similarity with that of temperature using ANOCOVA, bootstrap, LSG and Madden. The remote SST forcing and soil moisture reveal substantial seasonality in their relations with the potentially predictable seasonal signals. For selected regions, either SST or soil moisture or both shows significant relationships with predictable signals, hence providing indirect insight on slowly varying boundary processes involved to enable useful seasonal climate predication. A multivariate analysis of covariance (MANOCOVA) model is established to identify distinctive predictable patterns, which are uncorrelated with each other. Generally speaking, the seasonal predictability from multivariate model is consistent with that from ANOCOVA. Besides unveiling the spatial variability of predictability, MANOCOVA model also reveals the temporal variability of each predictable pattern, which could be linked to the periodic oscillations.

  7. A qualitative comparison of fire spread models incorporating wind and slope effects

    Treesearch

    David R. Weise; Gregory S. Biging

    1997-01-01

    Wind velocity and slope are two critical variables that affect wildland fire rate of spread. The effects of these variables on rate of spread are often combined in rate-of-spread models using vector addition. The various methods used to combine wind and slope effects have seldom been validated or compared due to differences in the models or to lack of data. In this...

  8. Modeling continuous covariates with a "spike" at zero: Bivariate approaches.

    PubMed

    Jenkner, Carolin; Lorenz, Eva; Becher, Heiko; Sauerbrei, Willi

    2016-07-01

    In epidemiology and clinical research, predictors often take value zero for a large amount of observations while the distribution of the remaining observations is continuous. These predictors are called variables with a spike at zero. Examples include smoking or alcohol consumption. Recently, an extension of the fractional polynomial (FP) procedure, a technique for modeling nonlinear relationships, was proposed to deal with such situations. To indicate whether or not a value is zero, a binary variable is added to the model. In a two stage procedure, called FP-spike, the necessity of the binary variable and/or the continuous FP function for the positive part are assessed for a suitable fit. In univariate analyses, the FP-spike procedure usually leads to functional relationships that are easy to interpret. This paper introduces four approaches for dealing with two variables with a spike at zero (SAZ). The methods depend on the bivariate distribution of zero and nonzero values. Bi-Sep is the simplest of the four bivariate approaches. It uses the univariate FP-spike procedure separately for the two SAZ variables. In Bi-D3, Bi-D1, and Bi-Sub, proportions of zeros in both variables are considered simultaneously in the binary indicators. Therefore, these strategies can account for correlated variables. The methods can be used for arbitrary distributions of the covariates. For illustration and comparison of results, data from a case-control study on laryngeal cancer, with smoking and alcohol intake as two SAZ variables, is considered. In addition, a possible extension to three or more SAZ variables is outlined. A combination of log-linear models for the analysis of the correlation in combination with the bivariate approaches is proposed. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  9. Analysis of host response to bacterial infection using error model based gene expression microarray experiments

    PubMed Central

    Stekel, Dov J.; Sarti, Donatella; Trevino, Victor; Zhang, Lihong; Salmon, Mike; Buckley, Chris D.; Stevens, Mark; Pallen, Mark J.; Penn, Charles; Falciani, Francesco

    2005-01-01

    A key step in the analysis of microarray data is the selection of genes that are differentially expressed. Ideally, such experiments should be properly replicated in order to infer both technical and biological variability, and the data should be subjected to rigorous hypothesis tests to identify the differentially expressed genes. However, in microarray experiments involving the analysis of very large numbers of biological samples, replication is not always practical. Therefore, there is a need for a method to select differentially expressed genes in a rational way from insufficiently replicated data. In this paper, we describe a simple method that uses bootstrapping to generate an error model from a replicated pilot study that can be used to identify differentially expressed genes in subsequent large-scale studies on the same platform, but in which there may be no replicated arrays. The method builds a stratified error model that includes array-to-array variability, feature-to-feature variability and the dependence of error on signal intensity. We apply this model to the characterization of the host response in a model of bacterial infection of human intestinal epithelial cells. We demonstrate the effectiveness of error model based microarray experiments and propose this as a general strategy for a microarray-based screening of large collections of biological samples. PMID:15800204

  10. Comparing hierarchical models via the marginalized deviance information criterion.

    PubMed

    Quintero, Adrian; Lesaffre, Emmanuel

    2018-07-20

    Hierarchical models are extensively used in pharmacokinetics and longitudinal studies. When the estimation is performed from a Bayesian approach, model comparison is often based on the deviance information criterion (DIC). In hierarchical models with latent variables, there are several versions of this statistic: the conditional DIC (cDIC) that incorporates the latent variables in the focus of the analysis and the marginalized DIC (mDIC) that integrates them out. Regardless of the asymptotic and coherency difficulties of cDIC, this alternative is usually used in Markov chain Monte Carlo (MCMC) methods for hierarchical models because of practical convenience. The mDIC criterion is more appropriate in most cases but requires integration of the likelihood, which is computationally demanding and not implemented in Bayesian software. Therefore, we consider a method to compute mDIC by generating replicate samples of the latent variables that need to be integrated out. This alternative can be easily conducted from the MCMC output of Bayesian packages and is widely applicable to hierarchical models in general. Additionally, we propose some approximations in order to reduce the computational complexity for large-sample situations. The method is illustrated with simulated data sets and 2 medical studies, evidencing that cDIC may be misleading whilst mDIC appears pertinent. Copyright © 2018 John Wiley & Sons, Ltd.

  11. Multivariate analysis in thoracic research.

    PubMed

    Mengual-Macenlle, Noemí; Marcos, Pedro J; Golpe, Rafael; González-Rivas, Diego

    2015-03-01

    Multivariate analysis is based in observation and analysis of more than one statistical outcome variable at a time. In design and analysis, the technique is used to perform trade studies across multiple dimensions while taking into account the effects of all variables on the responses of interest. The development of multivariate methods emerged to analyze large databases and increasingly complex data. Since the best way to represent the knowledge of reality is the modeling, we should use multivariate statistical methods. Multivariate methods are designed to simultaneously analyze data sets, i.e., the analysis of different variables for each person or object studied. Keep in mind at all times that all variables must be treated accurately reflect the reality of the problem addressed. There are different types of multivariate analysis and each one should be employed according to the type of variables to analyze: dependent, interdependence and structural methods. In conclusion, multivariate methods are ideal for the analysis of large data sets and to find the cause and effect relationships between variables; there is a wide range of analysis types that we can use.

  12. The Benefits of Latent Variable Modeling to Develop Norms for a Translated Version of a Standardized Scale

    ERIC Educational Resources Information Center

    Seo, Hyojeong; Shaw, Leslie A.; Shogren, Karrie A.; Lang, Kyle M.; Little, Todd D.

    2017-01-01

    This article demonstrates the use of structural equation modeling to develop norms for a translated version of a standardized scale, the Supports Intensity Scale-Children's Version (SIS-C). The latent variable norming method proposed is useful when the standardization sample for a translated version is relatively small to derive norms…

  13. Interrater Agreement Evaluation: A Latent Variable Modeling Approach

    ERIC Educational Resources Information Center

    Raykov, Tenko; Dimitrov, Dimiter M.; von Eye, Alexander; Marcoulides, George A.

    2013-01-01

    A latent variable modeling method for evaluation of interrater agreement is outlined. The procedure is useful for point and interval estimation of the degree of agreement among a given set of judges evaluating a group of targets. In addition, the approach allows one to test for identity in underlying thresholds across raters as well as to identify…

  14. Temporal Patterns of Variable Relationships in Person-Oriented Research: Longitudinal Models of Configural Frequency Analysis

    ERIC Educational Resources Information Center

    von Eye, Alexander; Mun, Eun Young; Bogat, G. Anne

    2008-01-01

    This article reviews the premises of configural frequency analysis (CFA), including methods of choosing significance tests and base models, as well as protecting [alpha], and discusses why CFA is a useful approach when conducting longitudinal person-oriented research. CFA operates at the manifest variable level. Longitudinal CFA seeks to identify…

  15. Application of Influence Diagrams in Identifying Soviet Satellite Missions

    DTIC Science & Technology

    1990-12-01

    Probabilities Comparison ......................... 58 35. Continuous Model Variables ............................ 59 36. Sample Inclination Data...diagramming is a method which allows the simple construction of a model to illustrate the interrelationships which exist among variables by capturing an...environmental monitoring systems. The module also contained an array of instruments for geophysical and astrophysical experimentation . 4.3.14.3 Soyuz. The Soyuz

  16. Classical Item Analysis Using Latent Variable Modeling: A Note on a Direct Evaluation Procedure

    ERIC Educational Resources Information Center

    Raykov, Tenko; Marcoulides, George A.

    2011-01-01

    A directly applicable latent variable modeling procedure for classical item analysis is outlined. The method allows one to point and interval estimate item difficulty, item correlations, and item-total correlations for composites consisting of categorical items. The approach is readily employed in empirical research and as a by-product permits…

  17. Sparse High Dimensional Models in Economics

    PubMed Central

    Fan, Jianqing; Lv, Jinchi; Qi, Lei

    2010-01-01

    This paper reviews the literature on sparse high dimensional models and discusses some applications in economics and finance. Recent developments of theory, methods, and implementations in penalized least squares and penalized likelihood methods are highlighted. These variable selection methods are proved to be effective in high dimensional sparse modeling. The limits of dimensionality that regularization methods can handle, the role of penalty functions, and their statistical properties are detailed. Some recent advances in ultra-high dimensional sparse modeling are also briefly discussed. PMID:22022635

  18. Stochastic Earthquake Rupture Modeling Using Nonparametric Co-Regionalization

    NASA Astrophysics Data System (ADS)

    Lee, Kyungbook; Song, Seok Goo

    2017-09-01

    Accurate predictions of the intensity and variability of ground motions are essential in simulation-based seismic hazard assessment. Advanced simulation-based ground motion prediction methods have been proposed to complement the empirical approach, which suffers from the lack of observed ground motion data, especially in the near-source region for large events. It is important to quantify the variability of the earthquake rupture process for future events and to produce a number of rupture scenario models to capture the variability in simulation-based ground motion predictions. In this study, we improved the previously developed stochastic earthquake rupture modeling method by applying the nonparametric co-regionalization, which was proposed in geostatistics, to the correlation models estimated from dynamically derived earthquake rupture models. The nonparametric approach adopted in this study is computationally efficient and, therefore, enables us to simulate numerous rupture scenarios, including large events ( M > 7.0). It also gives us an opportunity to check the shape of true input correlation models in stochastic modeling after being deformed for permissibility. We expect that this type of modeling will improve our ability to simulate a wide range of rupture scenario models and thereby predict ground motions and perform seismic hazard assessment more accurately.

  19. Modification of the Integrated Sasang Constitutional Diagnostic Model

    PubMed Central

    Nam, Jiho

    2017-01-01

    In 2012, the Korea Institute of Oriental Medicine proposed an objective and comprehensive physical diagnostic model to address quantification problems in the existing Sasang constitutional diagnostic method. However, certain issues have been raised regarding a revision of the proposed diagnostic model. In this paper, we propose various methodological approaches to address the problems of the previous diagnostic model. Firstly, more useful variables are selected in each component. Secondly, the least absolute shrinkage and selection operator is used to reduce multicollinearity without the modification of explanatory variables. Thirdly, proportions of SC types and age are considered to construct individual diagnostic models and classify the training set and the test set for reflecting the characteristics of the entire dataset. Finally, an integrated model is constructed with explanatory variables of individual diagnosis models. The proposed integrated diagnostic model significantly improves the sensitivities for both the male SY type (36.4% → 62.0%) and the female SE type (43.7% → 64.5%), which were areas of limitation of the previous integrated diagnostic model. The ideas of these new algorithms are expected to contribute not only to the scientific development of Sasang constitutional medicine in Korea but also to that of other diagnostic methods for traditional medicine. PMID:29317897

  20. Coarse analysis of collective behaviors: Bifurcation analysis of the optimal velocity model for traffic jam formation

    NASA Astrophysics Data System (ADS)

    Miura, Yasunari; Sugiyama, Yuki

    2017-12-01

    We present a general method for analyzing macroscopic collective phenomena observed in many-body systems. For this purpose, we employ diffusion maps, which are one of the dimensionality-reduction techniques, and systematically define a few relevant coarse-grained variables for describing macroscopic phenomena. The time evolution of macroscopic behavior is described as a trajectory in the low-dimensional space constructed by these coarse variables. We apply this method to the analysis of the traffic model, called the optimal velocity model, and reveal a bifurcation structure, which features a transition to the emergence of a moving cluster as a traffic jam.

  1. Detecting multiple outliers in linear functional relationship model for circular variables using clustering technique

    NASA Astrophysics Data System (ADS)

    Mokhtar, Nurkhairany Amyra; Zubairi, Yong Zulina; Hussin, Abdul Ghapor

    2017-05-01

    Outlier detection has been used extensively in data analysis to detect anomalous observation in data and has important application in fraud detection and robust analysis. In this paper, we propose a method in detecting multiple outliers for circular variables in linear functional relationship model. Using the residual values of the Caires and Wyatt model, we applied the hierarchical clustering procedure. With the use of tree diagram, we illustrate the graphical approach of the detection of outlier. A simulation study is done to verify the accuracy of the proposed method. Also, an illustration to a real data set is given to show its practical applicability.

  2. Upgrades to the REA method for producing probabilistic climate change projections

    NASA Astrophysics Data System (ADS)

    Xu, Ying; Gao, Xuejie; Giorgi, Filippo

    2010-05-01

    We present an augmented version of the Reliability Ensemble Averaging (REA) method designed to generate probabilistic climate change information from ensembles of climate model simulations. Compared to the original version, the augmented one includes consideration of multiple variables and statistics in the calculation of the performance-based weights. In addition, the model convergence criterion previously employed is removed. The method is applied to the calculation of changes in mean and variability for temperature and precipitation over different sub-regions of East Asia based on the recently completed CMIP3 multi-model ensemble. Comparison of the new and old REA methods, along with the simple averaging procedure, and the use of different combinations of performance metrics shows that at fine sub-regional scales the choice of weighting is relevant. This is mostly because the models show a substantial spread in performance for the simulation of precipitation statistics, a result that supports the use of model weighting as a useful option to account for wide ranges of quality of models. The REA method, and in particular the upgraded one, provides a simple and flexible framework for assessing the uncertainty related to the aggregation of results from ensembles of models in order to produce climate change information at the regional scale. KEY WORDS: REA method, Climate change, CMIP3

  3. Beyond the SCS curve number: A new stochastic spatial runoff approach

    NASA Astrophysics Data System (ADS)

    Bartlett, M. S., Jr.; Parolari, A.; McDonnell, J.; Porporato, A. M.

    2015-12-01

    The Soil Conservation Service curve number (SCS-CN) method is the standard approach in practice for predicting a storm event runoff response. It is popular because its low parametric complexity and ease of use. However, the SCS-CN method does not describe the spatial variability of runoff and is restricted to certain geographic regions and land use types. Here we present a general theory for extending the SCS-CN method. Our new theory accommodates different event based models derived from alternative rainfall-runoff mechanisms or distributions of watershed variables, which are the basis of different semi-distributed models such as VIC, PDM, and TOPMODEL. We introduce a parsimonious but flexible description where runoff is initiated by a pure threshold, i.e., saturation excess, that is complemented by fill and spill runoff behavior from areas of partial saturation. To facilitate event based runoff prediction, we derive simple equations for the fraction of the runoff source areas, the probability density function (PDF) describing runoff variability, and the corresponding average runoff value (a runoff curve analogous to the SCS-CN). The benefit of the theory is that it unites the SCS-CN method, VIC, PDM, and TOPMODEL as the same model type but with different assumptions for the spatial distribution of variables and the runoff mechanism. The new multiple runoff mechanism description for the SCS-CN enables runoff prediction in geographic regions and site runoff types previously misrepresented by the traditional SCS-CN method. In addition, we show that the VIC, PDM, and TOPMODEL runoff curves may be more suitable than the SCS-CN for different conditions. Lastly, we explore predictions of sediment and nutrient transport by applying the PDF describing runoff variability within our new framework.

  4. Power Analysis for Complex Mediational Designs Using Monte Carlo Methods

    ERIC Educational Resources Information Center

    Thoemmes, Felix; MacKinnon, David P.; Reiser, Mark R.

    2010-01-01

    Applied researchers often include mediation effects in applications of advanced methods such as latent variable models and linear growth curve models. Guidance on how to estimate statistical power to detect mediation for these models has not yet been addressed in the literature. We describe a general framework for power analyses for complex…

  5. Accuracy and precision of polyurethane dental arch models fabricated using a three-dimensional subtractive rapid prototyping method with an intraoral scanning technique.

    PubMed

    Kim, Jae-Hong; Kim, Ki-Baek; Kim, Woong-Chul; Kim, Ji-Hwan; Kim, Hae-Young

    2014-03-01

    This study aimed to evaluate the accuracy and precision of polyurethane (PUT) dental arch models fabricated using a three-dimensional (3D) subtractive rapid prototyping (RP) method with an intraoral scanning technique by comparing linear measurements obtained from PUT models and conventional plaster models. Ten plaster models were duplicated using a selected standard master model and conventional impression, and 10 PUT models were duplicated using the 3D subtractive RP technique with an oral scanner. Six linear measurements were evaluated in terms of x, y, and z-axes using a non-contact white light scanner. Accuracy was assessed using mean differences between two measurements, and precision was examined using four quantitative methods and the Bland-Altman graphical method. Repeatability was evaluated in terms of intra-examiner variability, and reproducibility was assessed in terms of inter-examiner and inter-method variability. The mean difference between plaster models and PUT models ranged from 0.07 mm to 0.33 mm. Relative measurement errors ranged from 2.2% to 7.6% and intraclass correlation coefficients ranged from 0.93 to 0.96, when comparing plaster models and PUT models. The Bland-Altman plot showed good agreement. The accuracy and precision of PUT dental models for evaluating the performance of oral scanner and subtractive RP technology was acceptable. Because of the recent improvements in block material and computerized numeric control milling machines, the subtractive RP method may be a good choice for dental arch models.

  6. Force control compensation method with variable load stiffness and damping of the hydraulic drive unit force control system

    NASA Astrophysics Data System (ADS)

    Kong, Xiangdong; Ba, Kaixian; Yu, Bin; Cao, Yuan; Zhu, Qixin; Zhao, Hualong

    2016-05-01

    Each joint of hydraulic drive quadruped robot is driven by the hydraulic drive unit (HDU), and the contacting between the robot foot end and the ground is complex and variable, which increases the difficulty of force control inevitably. In the recent years, although many scholars researched some control methods such as disturbance rejection control, parameter self-adaptive control, impedance control and so on, to improve the force control performance of HDU, the robustness of the force control still needs improving. Therefore, how to simulate the complex and variable load characteristics of the environment structure and how to ensure HDU having excellent force control performance with the complex and variable load characteristics are key issues to be solved in this paper. The force control system mathematic model of HDU is established by the mechanism modeling method, and the theoretical models of a novel force control compensation method and a load characteristics simulation method under different environment structures are derived, considering the dynamic characteristics of the load stiffness and the load damping under different environment structures. Then, simulation effects of the variable load stiffness and load damping under the step and sinusoidal load force are analyzed experimentally on the HDU force control performance test platform, which provides the foundation for the force control compensation experiment research. In addition, the optimized PID control parameters are designed to make the HDU have better force control performance with suitable load stiffness and load damping, under which the force control compensation method is introduced, and the robustness of the force control system with several constant load characteristics and the variable load characteristics respectively are comparatively analyzed by experiment. The research results indicate that if the load characteristics are known, the force control compensation method presented in this paper has positive compensation effects on the load characteristics variation, i.e., this method decreases the effects of the load characteristics variation on the force control performance and enhances the force control system robustness with the constant PID parameters, thereby, the online PID parameters tuning control method which is complex needs not be adopted. All the above research provides theoretical and experimental foundation for the force control method of the quadruped robot joints with high robustness.

  7. Attributing uncertainty in streamflow simulations due to variable inputs via the Quantile Flow Deviation metric

    NASA Astrophysics Data System (ADS)

    Shoaib, Syed Abu; Marshall, Lucy; Sharma, Ashish

    2018-06-01

    Every model to characterise a real world process is affected by uncertainty. Selecting a suitable model is a vital aspect of engineering planning and design. Observation or input errors make the prediction of modelled responses more uncertain. By way of a recently developed attribution metric, this study is aimed at developing a method for analysing variability in model inputs together with model structure variability to quantify their relative contributions in typical hydrological modelling applications. The Quantile Flow Deviation (QFD) metric is used to assess these alternate sources of uncertainty. The Australian Water Availability Project (AWAP) precipitation data for four different Australian catchments is used to analyse the impact of spatial rainfall variability on simulated streamflow variability via the QFD. The QFD metric attributes the variability in flow ensembles to uncertainty associated with the selection of a model structure and input time series. For the case study catchments, the relative contribution of input uncertainty due to rainfall is higher than that due to potential evapotranspiration, and overall input uncertainty is significant compared to model structure and parameter uncertainty. Overall, this study investigates the propagation of input uncertainty in a daily streamflow modelling scenario and demonstrates how input errors manifest across different streamflow magnitudes.

  8. A drift-diffusion checkpoint model predicts a highly variable and growth-factor-sensitive portion of the cell cycle G1 phase.

    PubMed

    Jones, Zack W; Leander, Rachel; Quaranta, Vito; Harris, Leonard A; Tyson, Darren R

    2018-01-01

    Even among isogenic cells, the time to progress through the cell cycle, or the intermitotic time (IMT), is highly variable. This variability has been a topic of research for several decades and numerous mathematical models have been proposed to explain it. Previously, we developed a top-down, stochastic drift-diffusion+threshold (DDT) model of a cell cycle checkpoint and showed that it can accurately describe experimentally-derived IMT distributions [Leander R, Allen EJ, Garbett SP, Tyson DR, Quaranta V. Derivation and experimental comparison of cell-division probability densities. J. Theor. Biol. 2014;358:129-135]. Here, we use the DDT modeling approach for both descriptive and predictive data analysis. We develop a custom numerical method for the reliable maximum likelihood estimation of model parameters in the absence of a priori knowledge about the number of detectable checkpoints. We employ this method to fit different variants of the DDT model (with one, two, and three checkpoints) to IMT data from multiple cell lines under different growth conditions and drug treatments. We find that a two-checkpoint model best describes the data, consistent with the notion that the cell cycle can be broadly separated into two steps: the commitment to divide and the process of cell division. The model predicts one part of the cell cycle to be highly variable and growth factor sensitive while the other is less variable and relatively refractory to growth factor signaling. Using experimental data that separates IMT into G1 vs. S, G2, and M phases, we show that the model-predicted growth-factor-sensitive part of the cell cycle corresponds to a portion of G1, consistent with previous studies suggesting that the commitment step is the primary source of IMT variability. These results demonstrate that a simple stochastic model, with just a handful of parameters, can provide fundamental insights into the biological underpinnings of cell cycle progression.

  9. Sampling Strategies for Evaluating the Rate of Adventitious Transgene Presence in Non-Genetically Modified Crop Fields.

    PubMed

    Makowski, David; Bancal, Rémi; Bensadoun, Arnaud; Monod, Hervé; Messéan, Antoine

    2017-09-01

    According to E.U. regulations, the maximum allowable rate of adventitious transgene presence in non-genetically modified (GM) crops is 0.9%. We compared four sampling methods for the detection of transgenic material in agricultural non-GM maize fields: random sampling, stratified sampling, random sampling + ratio reweighting, random sampling + regression reweighting. Random sampling involves simply sampling maize grains from different locations selected at random from the field concerned. The stratified and reweighting sampling methods make use of an auxiliary variable corresponding to the output of a gene-flow model (a zero-inflated Poisson model) simulating cross-pollination as a function of wind speed, wind direction, and distance to the closest GM maize field. With the stratified sampling method, an auxiliary variable is used to define several strata with contrasting transgene presence rates, and grains are then sampled at random from each stratum. With the two methods involving reweighting, grains are first sampled at random from various locations within the field, and the observations are then reweighted according to the auxiliary variable. Data collected from three maize fields were used to compare the four sampling methods, and the results were used to determine the extent to which transgene presence rate estimation was improved by the use of stratified and reweighting sampling methods. We found that transgene rate estimates were more accurate and that substantially smaller samples could be used with sampling strategies based on an auxiliary variable derived from a gene-flow model. © 2017 Society for Risk Analysis.

  10. Effects of Ensemble Configuration on Estimates of Regional Climate Uncertainties

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Goldenson, N.; Mauger, G.; Leung, L. R.

    Internal variability in the climate system can contribute substantial uncertainty in climate projections, particularly at regional scales. Internal variability can be quantified using large ensembles of simulations that are identical but for perturbed initial conditions. Here we compare methods for quantifying internal variability. Our study region spans the west coast of North America, which is strongly influenced by El Niño and other large-scale dynamics through their contribution to large-scale internal variability. Using a statistical framework to simultaneously account for multiple sources of uncertainty, we find that internal variability can be quantified consistently using a large ensemble or an ensemble ofmore » opportunity that includes small ensembles from multiple models and climate scenarios. The latter also produce estimates of uncertainty due to model differences. We conclude that projection uncertainties are best assessed using small single-model ensembles from as many model-scenario pairings as computationally feasible, which has implications for ensemble design in large modeling efforts.« less

  11. Calibrating the pixel-level Kepler imaging data with a causal data-driven model

    NASA Astrophysics Data System (ADS)

    Wang, Dun; Foreman-Mackey, Daniel; Hogg, David W.; Schölkopf, Bernhard

    2015-01-01

    In general, astronomical observations are affected by several kinds of noise, each with it's own causal source; there is photon noise, stochastic source variability, and residuals coming from imperfect calibration of the detector or telescope. In particular, the precision of NASA Kepler photometry for exoplanet science—the most precise photometric measurements of stars ever made—appears to be limited by unknown or untracked variations in spacecraft pointing and temperature, and unmodeled stellar variability. Here we present the Causal Pixel Model (CPM) for Kepler data, a data-driven model intended to capture variability but preserve transit signals. The CPM works at the pixel level (not the photometric measurement level); it can capture more fine-grained information about the variation of the spacecraft than is available in the pixel-summed aperture photometry. The basic idea is that CPM predicts each target pixel value from a large number of pixels of other stars sharing the instrument variabilities while not containing any information on possible transits at the target star. In addition, we use the target star's future and past (auto-regression). By appropriately separating the data into training and test sets, we ensure that information about any transit will be perfectly isolated from the fitting of the model. The method has four hyper-parameters (the number of predictor stars, the auto-regressive window size, and two L2-regularization amplitudes for model components), which we set by cross-validation. We determine a generic set of hyper-parameters that works well on most of the stars with 11≤V≤12 mag and apply the method to a corresponding set of target stars with known planet transits. We find that we can consistently outperform (for the purposes of exoplanet detection) the Kepler Pre-search Data Conditioning (PDC) method for exoplanet discovery, often improving the SNR by a factor of two. While we have not yet exhaustively tested the method at other magnitudes, we expect that it should be generally applicable, with positive consequences for subsequent exoplanet detection or stellar variability (in which case we must exclude the autoregressive part to preserve intrinsic variability).

  12. An Ensemble Successive Project Algorithm for Liquor Detection Using Near Infrared Sensor.

    PubMed

    Qu, Fangfang; Ren, Dong; Wang, Jihua; Zhang, Zhong; Lu, Na; Meng, Lei

    2016-01-11

    Spectral analysis technique based on near infrared (NIR) sensor is a powerful tool for complex information processing and high precision recognition, and it has been widely applied to quality analysis and online inspection of agricultural products. This paper proposes a new method to address the instability of small sample sizes in the successive projections algorithm (SPA) as well as the lack of association between selected variables and the analyte. The proposed method is an evaluated bootstrap ensemble SPA method (EBSPA) based on a variable evaluation index (EI) for variable selection, and is applied to the quantitative prediction of alcohol concentrations in liquor using NIR sensor. In the experiment, the proposed EBSPA with three kinds of modeling methods are established to test their performance. In addition, the proposed EBSPA combined with partial least square is compared with other state-of-the-art variable selection methods. The results show that the proposed method can solve the defects of SPA and it has the best generalization performance and stability. Furthermore, the physical meaning of the selected variables from the near infrared sensor data is clear, which can effectively reduce the variables and improve their prediction accuracy.

  13. New insights into soil temperature time series modeling: linear or nonlinear?

    NASA Astrophysics Data System (ADS)

    Bonakdari, Hossein; Moeeni, Hamid; Ebtehaj, Isa; Zeynoddin, Mohammad; Mahoammadian, Abdolmajid; Gharabaghi, Bahram

    2018-03-01

    Soil temperature (ST) is an important dynamic parameter, whose prediction is a major research topic in various fields including agriculture because ST has a critical role in hydrological processes at the soil surface. In this study, a new linear methodology is proposed based on stochastic methods for modeling daily soil temperature (DST). With this approach, the ST series components are determined to carry out modeling and spectral analysis. The results of this process are compared with two linear methods based on seasonal standardization and seasonal differencing in terms of four DST series. The series used in this study were measured at two stations, Champaign and Springfield, at depths of 10 and 20 cm. The results indicate that in all ST series reviewed, the periodic term is the most robust among all components. According to a comparison of the three methods applied to analyze the various series components, it appears that spectral analysis combined with stochastic methods outperformed the seasonal standardization and seasonal differencing methods. In addition to comparing the proposed methodology with linear methods, the ST modeling results were compared with the two nonlinear methods in two forms: considering hydrological variables (HV) as input variables and DST modeling as a time series. In a previous study at the mentioned sites, Kim and Singh Theor Appl Climatol 118:465-479, (2014) applied the popular Multilayer Perceptron (MLP) neural network and Adaptive Neuro-Fuzzy Inference System (ANFIS) nonlinear methods and considered HV as input variables. The comparison results signify that the relative error projected in estimating DST by the proposed methodology was about 6%, while this value with MLP and ANFIS was over 15%. Moreover, MLP and ANFIS models were employed for DST time series modeling. Due to these models' relatively inferior performance to the proposed methodology, two hybrid models were implemented: the weights and membership function of MLP and ANFIS (respectively) were optimized with the particle swarm optimization (PSO) algorithm in conjunction with the wavelet transform and nonlinear methods (Wavelet-MLP & Wavelet-ANFIS). A comparison of the proposed methodology with individual and hybrid nonlinear models in predicting DST time series indicates the lowest Akaike Information Criterion (AIC) index value, which considers model simplicity and accuracy simultaneously at different depths and stations. The methodology presented in this study can thus serve as an excellent alternative to complex nonlinear methods that are normally employed to examine DST.

  14. Uncertainty of streamwater solute fluxes in five contrasting headwater catchments including model uncertainty and natural variability (Invited)

    NASA Astrophysics Data System (ADS)

    Aulenbach, B. T.; Burns, D. A.; Shanley, J. B.; Yanai, R. D.; Bae, K.; Wild, A.; Yang, Y.; Dong, Y.

    2013-12-01

    There are many sources of uncertainty in estimates of streamwater solute flux. Flux is the product of discharge and concentration (summed over time), each of which has measurement uncertainty of its own. Discharge can be measured almost continuously, but concentrations are usually determined from discrete samples, which increases uncertainty dependent on sampling frequency and how concentrations are assigned for the periods between samples. Gaps between samples can be estimated by linear interpolation or by models that that use the relations between concentration and continuously measured or known variables such as discharge, season, temperature, and time. For this project, developed in cooperation with QUEST (Quantifying Uncertainty in Ecosystem Studies), we evaluated uncertainty for three flux estimation methods and three different sampling frequencies (monthly, weekly, and weekly plus event). The constituents investigated were dissolved NO3, Si, SO4, and dissolved organic carbon (DOC), solutes whose concentration dynamics exhibit strongly contrasting behavior. The evaluation was completed for a 10-year period at five small, forested watersheds in Georgia, New Hampshire, New York, Puerto Rico, and Vermont. Concentration regression models were developed for each solute at each of the three sampling frequencies for all five watersheds. Fluxes were then calculated using (1) a linear interpolation approach, (2) a regression-model method, and (3) the composite method - which combines the regression-model method for estimating concentrations and the linear interpolation method for correcting model residuals to the observed sample concentrations. We considered the best estimates of flux to be derived using the composite method at the highest sampling frequencies. We also evaluated the importance of sampling frequency and estimation method on flux estimate uncertainty; flux uncertainty was dependent on the variability characteristics of each solute and varied for different reporting periods (e.g. 10-year, study period vs. annually vs. monthly). The usefulness of the two regression model based flux estimation approaches was dependent upon the amount of variance in concentrations the regression models could explain. Our results can guide the development of optimal sampling strategies by weighing sampling frequency with improvements in uncertainty in stream flux estimates for solutes with particular characteristics of variability. The appropriate flux estimation method is dependent on a combination of sampling frequency and the strength of concentration regression models. Sites: Biscuit Brook (Frost Valley, NY), Hubbard Brook Experimental Forest and LTER (West Thornton, NH), Luquillo Experimental Forest and LTER (Luquillo, Puerto Rico), Panola Mountain (Stockbridge, GA), Sleepers River Research Watershed (Danville, VT)

  15. Independence screening for high dimensional nonlinear additive ODE models with applications to dynamic gene regulatory networks.

    PubMed

    Xue, Hongqi; Wu, Shuang; Wu, Yichao; Ramirez Idarraga, Juan C; Wu, Hulin

    2018-05-02

    Mechanism-driven low-dimensional ordinary differential equation (ODE) models are often used to model viral dynamics at cellular levels and epidemics of infectious diseases. However, low-dimensional mechanism-based ODE models are limited for modeling infectious diseases at molecular levels such as transcriptomic or proteomic levels, which is critical to understand pathogenesis of diseases. Although linear ODE models have been proposed for gene regulatory networks (GRNs), nonlinear regulations are common in GRNs. The reconstruction of large-scale nonlinear networks from time-course gene expression data remains an unresolved issue. Here, we use high-dimensional nonlinear additive ODEs to model GRNs and propose a 4-step procedure to efficiently perform variable selection for nonlinear ODEs. To tackle the challenge of high dimensionality, we couple the 2-stage smoothing-based estimation method for ODEs and a nonlinear independence screening method to perform variable selection for the nonlinear ODE models. We have shown that our method possesses the sure screening property and it can handle problems with non-polynomial dimensionality. Numerical performance of the proposed method is illustrated with simulated data and a real data example for identifying the dynamic GRN of Saccharomyces cerevisiae. Copyright © 2018 John Wiley & Sons, Ltd.

  16. Dissociation Predicts Later Attention Problems in Sexually Abused Children

    ERIC Educational Resources Information Center

    Kaplow, Julie B.; Hall, Erin; Koenen, Karestan C.; Dodge, Kenneth A.; Amaya-Jackson, Lisa

    2008-01-01

    Objective: The goals of this research are to develop and test a prospective model of attention problems in sexually abused children that includes fixed variables (e.g., gender), trauma, and disclosure-related pathways. Methods: At Time 1, fixed variables, trauma variables, and stress reactions upon disclosure were assessed in 156 children aged…

  17. The Potential for Differential Findings among Invariance Testing Strategies for Multisample Measured Variable Path Models

    ERIC Educational Resources Information Center

    Mann, Heather M.; Rutstein, Daisy W.; Hancock, Gregory R.

    2009-01-01

    Multisample measured variable path analysis is used to test whether causal/structural relations among measured variables differ across populations. Several invariance testing approaches are available for assessing cross-group equality of such relations, but the associated test statistics may vary considerably across methods. This study is a…

  18. Tutorial on Using Regression Models with Count Outcomes Using R

    ERIC Educational Resources Information Center

    Beaujean, A. Alexander; Morgan, Grant B.

    2016-01-01

    Education researchers often study count variables, such as times a student reached a goal, discipline referrals, and absences. Most researchers that study these variables use typical regression methods (i.e., ordinary least-squares) either with or without transforming the count variables. In either case, using typical regression for count data can…

  19. Least Principal Components Analysis (LPCA): An Alternative to Regression Analysis.

    ERIC Educational Resources Information Center

    Olson, Jeffery E.

    Often, all of the variables in a model are latent, random, or subject to measurement error, or there is not an obvious dependent variable. When any of these conditions exist, an appropriate method for estimating the linear relationships among the variables is Least Principal Components Analysis. Least Principal Components are robust, consistent,…

  20. Detection of "noisy" chaos in a time series

    NASA Technical Reports Server (NTRS)

    Chon, K. H.; Kanters, J. K.; Cohen, R. J.; Holstein-Rathlou, N. H.

    1997-01-01

    Time series from biological system often displays fluctuations in the measured variables. Much effort has been directed at determining whether this variability reflects deterministic chaos, or whether it is merely "noise". The output from most biological systems is probably the result of both the internal dynamics of the systems, and the input to the system from the surroundings. This implies that the system should be viewed as a mixed system with both stochastic and deterministic components. We present a method that appears to be useful in deciding whether determinism is present in a time series, and if this determinism has chaotic attributes. The method relies on fitting a nonlinear autoregressive model to the time series followed by an estimation of the characteristic exponents of the model over the observed probability distribution of states for the system. The method is tested by computer simulations, and applied to heart rate variability data.

  1. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pichara, Karim; Protopapas, Pavlos

    We present an automatic classification method for astronomical catalogs with missing data. We use Bayesian networks and a probabilistic graphical model that allows us to perform inference to predict missing values given observed data and dependency relationships between variables. To learn a Bayesian network from incomplete data, we use an iterative algorithm that utilizes sampling methods and expectation maximization to estimate the distributions and probabilistic dependencies of variables from data with missing values. To test our model, we use three catalogs with missing data (SAGE, Two Micron All Sky Survey, and UBVI) and one complete catalog (MACHO). We examine howmore » classification accuracy changes when information from missing data catalogs is included, how our method compares to traditional missing data approaches, and at what computational cost. Integrating these catalogs with missing data, we find that classification of variable objects improves by a few percent and by 15% for quasar detection while keeping the computational cost the same.« less

  2. On the Spike Train Variability Characterized by Variance-to-Mean Power Relationship.

    PubMed

    Koyama, Shinsuke

    2015-07-01

    We propose a statistical method for modeling the non-Poisson variability of spike trains observed in a wide range of brain regions. Central to our approach is the assumption that the variance and the mean of interspike intervals are related by a power function characterized by two parameters: the scale factor and exponent. It is shown that this single assumption allows the variability of spike trains to have an arbitrary scale and various dependencies on the firing rate in the spike count statistics, as well as in the interval statistics, depending on the two parameters of the power function. We also propose a statistical model for spike trains that exhibits the variance-to-mean power relationship. Based on this, a maximum likelihood method is developed for inferring the parameters from rate-modulated spike trains. The proposed method is illustrated on simulated and experimental spike trains.

  3. Variable context Markov chains for HIV protease cleavage site prediction.

    PubMed

    Oğul, Hasan

    2009-06-01

    Deciphering the knowledge of HIV protease specificity and developing computational tools for detecting its cleavage sites in protein polypeptide chain are very desirable for designing efficient and specific chemical inhibitors to prevent acquired immunodeficiency syndrome. In this study, we developed a generative model based on a generalization of variable order Markov chains (VOMC) for peptide sequences and adapted the model for prediction of their cleavability by certain proteases. The new method, called variable context Markov chains (VCMC), attempts to identify the context equivalence based on the evolutionary similarities between individual amino acids. It was applied for HIV-1 protease cleavage site prediction problem and shown to outperform existing methods in terms of prediction accuracy on a common dataset. In general, the method is a promising tool for prediction of cleavage sites of all proteases and encouraged to be used for any kind of peptide classification problem as well.

  4. Improved Variable Selection Algorithm Using a LASSO-Type Penalty, with an Application to Assessing Hepatitis B Infection Relevant Factors in Community Residents

    PubMed Central

    Guo, Pi; Zeng, Fangfang; Hu, Xiaomin; Zhang, Dingmei; Zhu, Shuming; Deng, Yu; Hao, Yuantao

    2015-01-01

    Objectives In epidemiological studies, it is important to identify independent associations between collective exposures and a health outcome. The current stepwise selection technique ignores stochastic errors and suffers from a lack of stability. The alternative LASSO-penalized regression model can be applied to detect significant predictors from a pool of candidate variables. However, this technique is prone to false positives and tends to create excessive biases. It remains challenging to develop robust variable selection methods and enhance predictability. Material and methods Two improved algorithms denoted the two-stage hybrid and bootstrap ranking procedures, both using a LASSO-type penalty, were developed for epidemiological association analysis. The performance of the proposed procedures and other methods including conventional LASSO, Bolasso, stepwise and stability selection models were evaluated using intensive simulation. In addition, methods were compared by using an empirical analysis based on large-scale survey data of hepatitis B infection-relevant factors among Guangdong residents. Results The proposed procedures produced comparable or less biased selection results when compared to conventional variable selection models. In total, the two newly proposed procedures were stable with respect to various scenarios of simulation, demonstrating a higher power and a lower false positive rate during variable selection than the compared methods. In empirical analysis, the proposed procedures yielding a sparse set of hepatitis B infection-relevant factors gave the best predictive performance and showed that the procedures were able to select a more stringent set of factors. The individual history of hepatitis B vaccination, family and individual history of hepatitis B infection were associated with hepatitis B infection in the studied residents according to the proposed procedures. Conclusions The newly proposed procedures improve the identification of significant variables and enable us to derive a new insight into epidemiological association analysis. PMID:26214802

  5. Accuracy and precision of polyurethane dental arch models fabricated using a three-dimensional subtractive rapid prototyping method with an intraoral scanning technique

    PubMed Central

    Kim, Jae-Hong; Kim, Ki-Baek; Kim, Woong-Chul; Kim, Ji-Hwan

    2014-01-01

    Objective This study aimed to evaluate the accuracy and precision of polyurethane (PUT) dental arch models fabricated using a three-dimensional (3D) subtractive rapid prototyping (RP) method with an intraoral scanning technique by comparing linear measurements obtained from PUT models and conventional plaster models. Methods Ten plaster models were duplicated using a selected standard master model and conventional impression, and 10 PUT models were duplicated using the 3D subtractive RP technique with an oral scanner. Six linear measurements were evaluated in terms of x, y, and z-axes using a non-contact white light scanner. Accuracy was assessed using mean differences between two measurements, and precision was examined using four quantitative methods and the Bland-Altman graphical method. Repeatability was evaluated in terms of intra-examiner variability, and reproducibility was assessed in terms of inter-examiner and inter-method variability. Results The mean difference between plaster models and PUT models ranged from 0.07 mm to 0.33 mm. Relative measurement errors ranged from 2.2% to 7.6% and intraclass correlation coefficients ranged from 0.93 to 0.96, when comparing plaster models and PUT models. The Bland-Altman plot showed good agreement. Conclusions The accuracy and precision of PUT dental models for evaluating the performance of oral scanner and subtractive RP technology was acceptable. Because of the recent improvements in block material and computerized numeric control milling machines, the subtractive RP method may be a good choice for dental arch models. PMID:24696823

  6. A new state space model for the NASA/JPL 70-meter antenna servo controls

    NASA Technical Reports Server (NTRS)

    Hill, R. E.

    1987-01-01

    A control axis referenced model of the NASA/JPL 70-m antenna structure is combined with the dynamic equations of servo components to produce a comprehansive state variable (matrix) model of the coupled system. An interactive Fortran program for generating the linear system model and computing its salient parameters is described. Results are produced in a state variable, block diagram, and in factored transfer function forms to facilitate design and analysis by classical as well as modern control methods.

  7. Climate, soil or both? Which variables are better predictors of the distributions of Australian shrub species?

    PubMed Central

    Esperón-Rodríguez, Manuel; Baumgartner, John B.; Beaumont, Linda J.

    2017-01-01

    Background Shrubs play a key role in biogeochemical cycles, prevent soil and water erosion, provide forage for livestock, and are a source of food, wood and non-wood products. However, despite their ecological and societal importance, the influence of different environmental variables on shrub distributions remains unclear. We evaluated the influence of climate and soil characteristics, and whether including soil variables improved the performance of a species distribution model (SDM), Maxent. Methods This study assessed variation in predictions of environmental suitability for 29 Australian shrub species (representing dominant members of six shrubland classes) due to the use of alternative sets of predictor variables. Models were calibrated with (1) climate variables only, (2) climate and soil variables, and (3) soil variables only. Results The predictive power of SDMs differed substantially across species, but generally models calibrated with both climate and soil data performed better than those calibrated only with climate variables. Models calibrated solely with soil variables were the least accurate. We found regional differences in potential shrub species richness across Australia due to the use of different sets of variables. Conclusions Our study provides evidence that predicted patterns of species richness may be sensitive to the choice of predictor set when multiple, plausible alternatives exist, and demonstrates the importance of considering soil properties when modeling availability of habitat for plants. PMID:28652933

  8. Using Propensity Score Methods to Approximate Factorial Experimental Designs to Analyze the Relationship between Two Variables and an Outcome

    ERIC Educational Resources Information Center

    Dong, Nianbo

    2015-01-01

    Researchers have become increasingly interested in programs' main and interaction effects of two variables (A and B, e.g., two treatment variables or one treatment variable and one moderator) on outcomes. A challenge for estimating main and interaction effects is to eliminate selection bias across A-by-B groups. I introduce Rubin's causal model to…

  9. Development and Testing of a Coupled Ocean-atmosphere Mesoscale Ensemble Prediction System

    DTIC Science & Technology

    2011-06-28

    wind, temperature, and moisture variables, while the oceanographic ET is derived from ocean current, temperature, and salinity variables. Estimates of...wind, temperature, and moisture variables while the oceanographic ET is derived from ocean current temperature, and salinity variables. Estimates of...uncertainty in the model. Rigorously accurate ensemble methods for describing the distribution of future states given past information include particle

  10. The spatial pattern of suicide in the US in relation to deprivation, fragmentation and rurality.

    PubMed

    Congdon, Peter

    2011-01-01

    Analysis of geographical patterns of suicide and psychiatric morbidity has demonstrated the impact of latent ecological variables (such as deprivation, rurality). Such latent variables may be derived by conventional multivariate techniques from sets of observed indices (for example, by principal components), by composite variable methods or by methods which explicitly consider the spatial framework of areas and, in particular, the spatial clustering of latent risks and outcomes. This article considers a latent random variable approach to explaining geographical contrasts in suicide in the US; and it develops a spatial structural equation model incorporating deprivation, social fragmentation and rurality. The approach allows for such latent spatial constructs to be correlated both within and between areas. Potential effects of area ethnic mix are also included. The model is applied to male and female suicide deaths over 2002–06 in 3142 US counties.

  11. Discrete Choice Modeling (DCM): An Exciting Marketing Research Survey Method for Educational Researchers.

    ERIC Educational Resources Information Center

    Berdie, Doug R.

    Discrete Choice Marketing (DCM), a research technique that has become more popular in recent marketing research, is described. DCM is a method that forces people to look at the combination of relevant variables within each choice domain and, with each option fully defined in terms of the values for those variables, make a choice of options. DCM…

  12. Analyses of the most influential factors for vibration monitoring of planetary power transmissions in pellet mills by adaptive neuro-fuzzy technique

    NASA Astrophysics Data System (ADS)

    Milovančević, Miloš; Nikolić, Vlastimir; Anđelković, Boban

    2017-01-01

    Vibration-based structural health monitoring is widely recognized as an attractive strategy for early damage detection in civil structures. Vibration monitoring and prediction is important for any system since it can save many unpredictable behaviors of the system. If the vibration monitoring is properly managed, that can ensure economic and safe operations. Potentials for further improvement of vibration monitoring lie in the improvement of current control strategies. One of the options is the introduction of model predictive control. Multistep ahead predictive models of vibration are a starting point for creating a successful model predictive strategy. For the purpose of this article, predictive models of are created for vibration monitoring of planetary power transmissions in pellet mills. The models were developed using the novel method based on ANFIS (adaptive neuro fuzzy inference system). The aim of this study is to investigate the potential of ANFIS for selecting the most relevant variables for predictive models of vibration monitoring of pellet mills power transmission. The vibration data are collected by PIC (Programmable Interface Controller) microcontrollers. The goal of the predictive vibration monitoring of planetary power transmissions in pellet mills is to indicate deterioration in the vibration of the power transmissions before the actual failure occurs. The ANFIS process for variable selection was implemented in order to detect the predominant variables affecting the prediction of vibration monitoring. It was also used to select the minimal input subset of variables from the initial set of input variables - current and lagged variables (up to 11 steps) of vibration. The obtained results could be used for simplification of predictive methods so as to avoid multiple input variables. It was preferable to used models with less inputs because of overfitting between training and testing data. While the obtained results are promising, further work is required in order to get results that could be directly applied in practice.

  13. Mining Feature of Data Fusion in the Classification of Beer Flavor Information Using E-Tongue and E-Nose

    PubMed Central

    Men, Hong; Shi, Yan; Fu, Songlin; Jiao, Yanan; Qiao, Yu; Liu, Jingjing

    2017-01-01

    Multi-sensor data fusion can provide more comprehensive and more accurate analysis results. However, it also brings some redundant information, which is an important issue with respect to finding a feature-mining method for intuitive and efficient analysis. This paper demonstrates a feature-mining method based on variable accumulation to find the best expression form and variables’ behavior affecting beer flavor. First, e-tongue and e-nose were used to gather the taste and olfactory information of beer, respectively. Second, principal component analysis (PCA), genetic algorithm-partial least squares (GA-PLS), and variable importance of projection (VIP) scores were applied to select feature variables of the original fusion set. Finally, the classification models based on support vector machine (SVM), random forests (RF), and extreme learning machine (ELM) were established to evaluate the efficiency of the feature-mining method. The result shows that the feature-mining method based on variable accumulation obtains the main feature affecting beer flavor information, and the best classification performance for the SVM, RF, and ELM models with 96.67%, 94.44%, and 98.33% prediction accuracy, respectively. PMID:28753917

  14. A fault isolation method based on the incidence matrix of an augmented system

    NASA Astrophysics Data System (ADS)

    Chen, Changxiong; Chen, Liping; Ding, Jianwan; Wu, Yizhong

    2018-03-01

    A new approach is proposed for isolating faults and fast identifying the redundant sensors of a system in this paper. By introducing fault signal as additional state variable, an augmented system model is constructed by the original system model, fault signals and sensor measurement equations. The structural properties of an augmented system model are provided in this paper. From the viewpoint of evaluating fault variables, the calculating correlations of the fault variables in the system can be found, which imply the fault isolation properties of the system. Compared with previous isolation approaches, the highlights of the new approach are that it can quickly find the faults which can be isolated using exclusive residuals, at the same time, and can identify the redundant sensors in the system, which are useful for the design of diagnosis system. The simulation of a four-tank system is reported to validate the proposed method.

  15. Monte Carlo Bayesian Inference on a Statistical Model of Sub-Gridcolumn Moisture Variability using High-Resolution Cloud Observations

    NASA Astrophysics Data System (ADS)

    Norris, P. M.; da Silva, A. M., Jr.

    2016-12-01

    Norris and da Silva recently published a method to constrain a statistical model of sub-gridcolumn moisture variability using high-resolution satellite cloud data. The method can be used for large-scale model parameter estimation or cloud data assimilation (CDA). The gridcolumn model includes assumed-PDF intra-layer horizontal variability and a copula-based inter-layer correlation model. The observables used are MODIS cloud-top pressure, brightness temperature and cloud optical thickness, but the method should be extensible to direct cloudy radiance assimilation for a small number of channels. The algorithm is a form of Bayesian inference with a Markov chain Monte Carlo (MCMC) approach to characterizing the posterior distribution. This approach is especially useful in cases where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach is not gradient-based and allows jumps into regions of non-zero cloud probability. In the example provided, the method is able to restore marine stratocumulus near the Californian coast where the background state has a clear swath. The new approach not only significantly reduces mean and standard deviation biases with respect to the assimilated observables, but also improves the simulated rotational-Ramman scattering cloud optical centroid pressure against independent (non-assimilated) retrievals from the OMI instrument. One obvious difficulty for the method, and other CDA methods, is the lack of information content in passive cloud observables on cloud vertical structure, beyond cloud-top and thickness, thus necessitating strong dependence on the background vertical moisture structure. It is found that a simple flow-dependent correlation modification due to Riishojgaard is helpful, better honoring inversion structures in the background state.

  16. On Fitting a Multivariate Two-Part Latent Growth Model

    PubMed Central

    Xu, Shu; Blozis, Shelley A.; Vandewater, Elizabeth A.

    2017-01-01

    A 2-part latent growth model can be used to analyze semicontinuous data to simultaneously study change in the probability that an individual engages in a behavior, and if engaged, change in the behavior. This article uses a Monte Carlo (MC) integration algorithm to study the interrelationships between the growth factors of 2 variables measured longitudinally where each variable can follow a 2-part latent growth model. A SAS macro implementing Mplus is developed to estimate the model to take into account the sampling uncertainty of this simulation-based computational approach. A sample of time-use data is used to show how maximum likelihood estimates can be obtained using a rectangular numerical integration method and an MC integration method. PMID:29333054

  17. Estimate variable importance for recurrent event outcomes with an application to identify hypoglycemia risk factors.

    PubMed

    Duan, Ran; Fu, Haoda

    2015-08-30

    Recurrent event data are an important data type for medical research. In particular, many safety endpoints are recurrent outcomes, such as hypoglycemic events. For such a situation, it is important to identify the factors causing these events and rank these factors by their importance. Traditional model selection methods are not able to provide variable importance in this context. Methods that are able to evaluate the variable importance, such as gradient boosting and random forest algorithms, cannot directly be applied to recurrent events data. In this paper, we propose a two-step method that enables us to evaluate the variable importance for recurrent events data. We evaluated the performance of our proposed method by simulations and applied it to a data set from a diabetes study. Copyright © 2015 John Wiley & Sons, Ltd.

  18. Cross-scale modeling of surface temperature and tree seedling establishment inmountain landscapes

    USGS Publications Warehouse

    Dingman, John; Sweet, Lynn C.; McCullough, Ian M.; Davis, Frank W.; Flint, Alan L.; Franklin, Janet; Flint, Lorraine E.

    2013-01-01

    Abstract: Introduction: Estimating surface temperature from above-ground field measurements is important for understanding the complex landscape patterns of plant seedling survival and establishment, processes which occur at heights of only several centimeters. Currently, future climate models predict temperature at 2 m above ground, leaving ground-surface microclimate not well characterized. Methods: Using a network of field temperature sensors and climate models, a ground-surface temperature method was used to estimate microclimate variability of minimum and maximum temperature. Temperature lapse rates were derived from field temperature sensors and distributed across the landscape capturing differences in solar radiation and cold air drainages modeled at a 30-m spatial resolution. Results: The surface temperature estimation method used for this analysis successfully estimated minimum surface temperatures on north-facing, south-facing, valley, and ridgeline topographic settings, and when compared to measured temperatures yielded an R2 of 0.88, 0.80, 0.88, and 0.80, respectively. Maximum surface temperatures generally had slightly more spatial variability than minimum surface temperatures, resulting in R2 values of 0.86, 0.77, 0.72, and 0.79 for north-facing, south-facing, valley, and ridgeline topographic settings. Quasi-Poisson regressions predicting recruitment of Quercus kelloggii (black oak) seedlings from temperature variables were significantly improved using these estimates of surface temperature compared to air temperature modeled at 2 m. Conclusion: Predicting minimum and maximum ground-surface temperatures using a downscaled climate model coupled with temperature lapse rates estimated from field measurements provides a method for modeling temperature effects on plant recruitment. Such methods could be applied to improve projections of species’ range shifts under climate change. Areas of complex topography can provide intricate microclimates that may allow species to redistribute locally as climate changes.

  19. A model for helicopter guidance on spiral trajectories

    NASA Technical Reports Server (NTRS)

    Mendenhall, S.; Slater, G. L.

    1980-01-01

    A point mass model is developed for helicopter guidance on spiral trajectories. A fully coupled set of state equations is developed and perturbation equations suitable for 3-D and 4-D guidance are derived and shown to be amenable to conventional state variable feedback methods. Control variables are chosen to be the magnitude and orientation of the net rotor thrust. Using these variables reference controls for nonlevel accelerating trajectories are easily determined. The effects of constant wind are shown to require significant feedforward correction to some of the reference controls and to the time. Although not easily measured themselves, the controls variables chosen are shown to be easily related to the physical variables available in the cockpit.

  20. Nonlinear Friction Compensation of Ball Screw Driven Stage Based on Variable Natural Length Spring Model and Disturbance Observer

    NASA Astrophysics Data System (ADS)

    Asaumi, Hiroyoshi; Fujimoto, Hiroshi

    Ball screw driven stages are used for industrial equipments such as machine tools and semiconductor equipments. Fast and precise positioning is necessary to enhance productivity and microfabrication technology of the system. The rolling friction of the ball screw driven stage deteriorate the positioning performance. Therefore, the control system based on the friction model is necessary. In this paper, we propose variable natural length spring model (VNLS model) as the friction model. VNLS model is simple and easy to implement as friction controller. Next, we propose multi variable natural length spring model (MVNLS model) as the friction model. MVNLS model can represent friction characteristic of the stage precisely. Moreover, the control system based on MVNLS model and disturbance observer is proposed. Finally, the simulation results and experimental results show the advantages of the proposed method.

  1. A probabilistic approach to modeling erosion for spatially-varied conditions

    Treesearch

    William J. Elliot; Peter R. Robichaud; C. D. Pannkuk

    2001-01-01

    In the years following a major forest disturbance, such as fire, the erosion rate is greatly influenced by variability in weather, in soil properties, and in spatial distribution. This paper presents a method to incorporate these variabilities into the erosion rate predicted by the Water Erosion Prediction Project model. It appears that it is not necessary to describe...

  2. Do Two or More Multicomponent Instruments Measure the Same Construct? Testing Construct Congruence Using Latent Variable Modeling

    ERIC Educational Resources Information Center

    Raykov, Tenko; Marcoulides, George A.; Tong, Bing

    2016-01-01

    A latent variable modeling procedure is discussed that can be used to test if two or more homogeneous multicomponent instruments with distinct components are measuring the same underlying construct. The method is widely applicable in scale construction and development research and can also be of special interest in construct validation studies.…

  3. Developing deterioration models for Wyoming bridges.

    DOT National Transportation Integrated Search

    2016-05-01

    Deterioration models for the Wyoming Bridge Inventory were developed using both stochastic and deterministic models. : The selection of explanatory variables is investigated and a new method using LASSO regression to eliminate human bias : in explana...

  4. Multicollinearity is a red herring in the search for moderator variables: A guide to interpreting moderated multiple regression models and a critique of Iacobucci, Schneider, Popovich, and Bakamitsos (2016).

    PubMed

    McClelland, Gary H; Irwin, Julie R; Disatnik, David; Sivan, Liron

    2017-02-01

    Multicollinearity is irrelevant to the search for moderator variables, contrary to the implications of Iacobucci, Schneider, Popovich, and Bakamitsos (Behavior Research Methods, 2016, this issue). Multicollinearity is like the red herring in a mystery novel that distracts the statistical detective from the pursuit of a true moderator relationship. We show multicollinearity is completely irrelevant for tests of moderator variables. Furthermore, readers of Iacobucci et al. might be confused by a number of their errors. We note those errors, but more positively, we describe a variety of methods researchers might use to test and interpret their moderated multiple regression models, including two-stage testing, mean-centering, spotlighting, orthogonalizing, and floodlighting without regard to putative issues of multicollinearity. We cite a number of recent studies in the psychological literature in which the researchers used these methods appropriately to test, to interpret, and to report their moderated multiple regression models. We conclude with a set of recommendations for the analysis and reporting of moderated multiple regression that should help researchers better understand their models and facilitate generalizations across studies.

  5. Predictors of the number of under-five malnourished children in Bangladesh: application of the generalized poisson regression model

    PubMed Central

    2013-01-01

    Background Malnutrition is one of the principal causes of child mortality in developing countries including Bangladesh. According to our knowledge, most of the available studies, that addressed the issue of malnutrition among under-five children, considered the categorical (dichotomous/polychotomous) outcome variables and applied logistic regression (binary/multinomial) to find their predictors. In this study malnutrition variable (i.e. outcome) is defined as the number of under-five malnourished children in a family, which is a non-negative count variable. The purposes of the study are (i) to demonstrate the applicability of the generalized Poisson regression (GPR) model as an alternative of other statistical methods and (ii) to find some predictors of this outcome variable. Methods The data is extracted from the Bangladesh Demographic and Health Survey (BDHS) 2007. Briefly, this survey employs a nationally representative sample which is based on a two-stage stratified sample of households. A total of 4,460 under-five children is analysed using various statistical techniques namely Chi-square test and GPR model. Results The GPR model (as compared to the standard Poisson regression and negative Binomial regression) is found to be justified to study the above-mentioned outcome variable because of its under-dispersion (variance < mean) property. Our study also identify several significant predictors of the outcome variable namely mother’s education, father’s education, wealth index, sanitation status, source of drinking water, and total number of children ever born to a woman. Conclusions Consistencies of our findings in light of many other studies suggest that the GPR model is an ideal alternative of other statistical models to analyse the number of under-five malnourished children in a family. Strategies based on significant predictors may improve the nutritional status of children in Bangladesh. PMID:23297699

  6. Non-linear scaling of a musculoskeletal model of the lower limb using statistical shape models.

    PubMed

    Nolte, Daniel; Tsang, Chui Kit; Zhang, Kai Yu; Ding, Ziyun; Kedgley, Angela E; Bull, Anthony M J

    2016-10-03

    Accurate muscle geometry for musculoskeletal models is important to enable accurate subject-specific simulations. Commonly, linear scaling is used to obtain individualised muscle geometry. More advanced methods include non-linear scaling using segmented bone surfaces and manual or semi-automatic digitisation of muscle paths from medical images. In this study, a new scaling method combining non-linear scaling with reconstructions of bone surfaces using statistical shape modelling is presented. Statistical Shape Models (SSMs) of femur and tibia/fibula were used to reconstruct bone surfaces of nine subjects. Reference models were created by morphing manually digitised muscle paths to mean shapes of the SSMs using non-linear transformations and inter-subject variability was calculated. Subject-specific models of muscle attachment and via points were created from three reference models. The accuracy was evaluated by calculating the differences between the scaled and manually digitised models. The points defining the muscle paths showed large inter-subject variability at the thigh and shank - up to 26mm; this was found to limit the accuracy of all studied scaling methods. Errors for the subject-specific muscle point reconstructions of the thigh could be decreased by 9% to 20% by using the non-linear scaling compared to a typical linear scaling method. We conclude that the proposed non-linear scaling method is more accurate than linear scaling methods. Thus, when combined with the ability to reconstruct bone surfaces from incomplete or scattered geometry data using statistical shape models our proposed method is an alternative to linear scaling methods. Copyright © 2016 The Author. Published by Elsevier Ltd.. All rights reserved.

  7. On the measurement of stability in over-time data.

    PubMed

    Kenny, D A; Campbell, D T

    1989-06-01

    In this article, autoregressive models and growth curve models are compared. Autoregressive models are useful because they allow for random change, permit scores to increase or decrease, and do not require strong assumptions about the level of measurement. Three previously presented designs for estimating stability are described: (a) time-series, (b) simplex, and (c) two-wave, one-factor methods. A two-wave, multiple-factor model also is presented, in which the variables are assumed to be caused by a set of latent variables. The factor structure does not change over time and so the synchronous relationships are temporally invariant. The factors do not cause each other and have the same stability. The parameters of the model are the factor loading structure, each variable's reliability, and the stability of the factors. We apply the model to two data sets. For eight cognitive skill variables measured at four times, the 2-year stability is estimated to be .92 and the 6-year stability is .83. For nine personality variables, the 3-year stability is .68. We speculate that for many variables there are two components: one component that changes very slowly (the trait component) and another that changes very rapidly (the state component); thus each variable is a mixture of trait and state. Circumstantial evidence supporting this view is presented.

  8. Factor analysis and multiple regression between topography and precipitation on Jeju Island, Korea

    NASA Astrophysics Data System (ADS)

    Um, Myoung-Jin; Yun, Hyeseon; Jeong, Chang-Sam; Heo, Jun-Haeng

    2011-11-01

    SummaryIn this study, new factors that influence precipitation were extracted from geographic variables using factor analysis, which allow for an accurate estimation of orographic precipitation. Correlation analysis was also used to examine the relationship between nine topographic variables from digital elevation models (DEMs) and the precipitation in Jeju Island. In addition, a spatial analysis was performed in order to verify the validity of the regression model. From the results of the correlation analysis, it was found that all of the topographic variables had a positive correlation with the precipitation. The relations between the variables also changed in accordance with a change in the precipitation duration. However, upon examining the correlation matrix, no significant relationship between the latitude and the aspect was found. According to the factor analysis, eight topographic variables (latitude being the exception) were found to have a direct influence on the precipitation. Three factors were then extracted from the eight topographic variables. By directly comparing the multiple regression model with the factors (model 1) to the multiple regression model with the topographic variables (model 3), it was found that model 1 did not violate the limits of statistical significance and multicollinearity. As such, model 1 was considered to be appropriate for estimating the precipitation when taking into account the topography. In the study of model 1, the multiple regression model using factor analysis was found to be the best method for estimating the orographic precipitation on Jeju Island.

  9. An efficient deterministic-probabilistic approach to modeling regional groundwater flow: 2. Application to Owens Valley, California

    USGS Publications Warehouse

    Guymon, Gary L.; Yen, Chung-Cheng

    1990-01-01

    The applicability of a deterministic-probabilistic model for predicting water tables in southern Owens Valley, California, is evaluated. The model is based on a two-layer deterministic model that is cascaded with a two-point probability model. To reduce the potentially large number of uncertain variables in the deterministic model, lumping of uncertain variables was evaluated by sensitivity analysis to reduce the total number of uncertain variables to three variables: hydraulic conductivity, storage coefficient or specific yield, and source-sink function. Results demonstrate that lumping of uncertain parameters reduces computational effort while providing sufficient precision for the case studied. Simulated spatial coefficients of variation for water table temporal position in most of the basin is small, which suggests that deterministic models can predict water tables in these areas with good precision. However, in several important areas where pumping occurs or the geology is complex, the simulated spatial coefficients of variation are over estimated by the two-point probability method.

  10. An efficient deterministic-probabilistic approach to modeling regional groundwater flow: 2. Application to Owens Valley, California

    NASA Astrophysics Data System (ADS)

    Guymon, Gary L.; Yen, Chung-Cheng

    1990-07-01

    The applicability of a deterministic-probabilistic model for predicting water tables in southern Owens Valley, California, is evaluated. The model is based on a two-layer deterministic model that is cascaded with a two-point probability model. To reduce the potentially large number of uncertain variables in the deterministic model, lumping of uncertain variables was evaluated by sensitivity analysis to reduce the total number of uncertain variables to three variables: hydraulic conductivity, storage coefficient or specific yield, and source-sink function. Results demonstrate that lumping of uncertain parameters reduces computational effort while providing sufficient precision for the case studied. Simulated spatial coefficients of variation for water table temporal position in most of the basin is small, which suggests that deterministic models can predict water tables in these areas with good precision. However, in several important areas where pumping occurs or the geology is complex, the simulated spatial coefficients of variation are over estimated by the two-point probability method.

  11. Simulating initial attack with two fire containment models

    Treesearch

    Romain M. Mees

    1985-01-01

    Given a variable rate of fireline construction and an elliptical fire growth model, two methods for estimating the required number of resources, time to containment, and the resulting fire area were compared. Five examples illustrate some of the computational differences between the simple and the complex methods. The equations for the two methods can be used and...

  12. Quantifying and Testing Indirect Effects in Simple Mediation Models when the Constituent Paths Are Nonlinear

    ERIC Educational Resources Information Center

    Hayes, Andrew F.; Preacher, Kristopher J.

    2010-01-01

    Most treatments of indirect effects and mediation in the statistical methods literature and the corresponding methods used by behavioral scientists have assumed linear relationships between variables in the causal system. Here we describe and extend a method first introduced by Stolzenberg (1980) for estimating indirect effects in models of…

  13. The prediction of intelligence in preschool children using alternative models to regression.

    PubMed

    Finch, W Holmes; Chang, Mei; Davis, Andrew S; Holden, Jocelyn E; Rothlisberg, Barbara A; McIntosh, David E

    2011-12-01

    Statistical prediction of an outcome variable using multiple independent variables is a common practice in the social and behavioral sciences. For example, neuropsychologists are sometimes called upon to provide predictions of preinjury cognitive functioning for individuals who have suffered a traumatic brain injury. Typically, these predictions are made using standard multiple linear regression models with several demographic variables (e.g., gender, ethnicity, education level) as predictors. Prior research has shown conflicting evidence regarding the ability of such models to provide accurate predictions of outcome variables such as full-scale intelligence (FSIQ) test scores. The present study had two goals: (1) to demonstrate the utility of a set of alternative prediction methods that have been applied extensively in the natural sciences and business but have not been frequently explored in the social sciences and (2) to develop models that can be used to predict premorbid cognitive functioning in preschool children. Predictions of Stanford-Binet 5 FSIQ scores for preschool-aged children is used to compare the performance of a multiple regression model with several of these alternative methods. Results demonstrate that classification and regression trees provided more accurate predictions of FSIQ scores than does the more traditional regression approach. Implications of these results are discussed.

  14. Instrumental variables estimation of exposure effects on a time-to-event endpoint using structural cumulative survival models.

    PubMed

    Martinussen, Torben; Vansteelandt, Stijn; Tchetgen Tchetgen, Eric J; Zucker, David M

    2017-12-01

    The use of instrumental variables for estimating the effect of an exposure on an outcome is popular in econometrics, and increasingly so in epidemiology. This increasing popularity may be attributed to the natural occurrence of instrumental variables in observational studies that incorporate elements of randomization, either by design or by nature (e.g., random inheritance of genes). Instrumental variables estimation of exposure effects is well established for continuous outcomes and to some extent for binary outcomes. It is, however, largely lacking for time-to-event outcomes because of complications due to censoring and survivorship bias. In this article, we make a novel proposal under a class of structural cumulative survival models which parameterize time-varying effects of a point exposure directly on the scale of the survival function; these models are essentially equivalent with a semi-parametric variant of the instrumental variables additive hazards model. We propose a class of recursive instrumental variable estimators for these exposure effects, and derive their large sample properties along with inferential tools. We examine the performance of the proposed method in simulation studies and illustrate it in a Mendelian randomization study to evaluate the effect of diabetes on mortality using data from the Health and Retirement Study. We further use the proposed method to investigate potential benefit from breast cancer screening on subsequent breast cancer mortality based on the HIP-study. © 2017, The International Biometric Society.

  15. Training the max-margin sequence model with the relaxed slack variables.

    PubMed

    Niu, Lingfeng; Wu, Jianmin; Shi, Yong

    2012-09-01

    Sequence models are widely used in many applications such as natural language processing, information extraction and optical character recognition, etc. We propose a new approach to train the max-margin based sequence model by relaxing the slack variables in this paper. With the canonical feature mapping definition, the relaxed problem is solved by training a multiclass Support Vector Machine (SVM). Compared with the state-of-the-art solutions for the sequence learning, the new method has the following advantages: firstly, the sequence training problem is transformed into a multiclassification problem, which is more widely studied and already has quite a few off-the-shelf training packages; secondly, this new approach reduces the complexity of training significantly and achieves comparable prediction performance compared with the existing sequence models; thirdly, when the size of training data is limited, by assigning different slack variables to different microlabel pairs, the new method can use the discriminative information more frugally and produces more reliable model; last but not least, by employing kernels in the intermediate multiclass SVM, nonlinear feature space can be easily explored. Experimental results on the task of named entity recognition, information extraction and handwritten letter recognition with the public datasets illustrate the efficiency and effectiveness of our method. Copyright © 2012 Elsevier Ltd. All rights reserved.

  16. Modeling potential habitats for alien species Dreissena polymorpha in continental USA

    USGS Publications Warehouse

    Mingyang, Li; Yunwei, Ju; Kumar, Sunil; Stohlgren, Thomas J.

    2008-01-01

    The effective measure to minimize the damage of invasive species is to block the potential invasive species to enter into suitable areas. 1864 occurrence points with GPS coordinates and 34 environmental variables from Daymet datasets were gathered, and 4 modeling methods, i.e., Logistic Regression (LR), Classification and Regression Trees (CART), Genetic Algorithm for Rule-Set Prediction (GARP), and maximum entropy method (Maxent), were introduced to generate potential geographic distributions for invasive species Dreissena polymorpha in Continental USA. Then 3 statistical criteria of the area under the Receiver Operating Characteristic curve (AUC), Pearson correlation (COR) and Kappa value were calculated to evaluate the performance of the models, followed by analyses on major contribution variables. Results showed that in terms of the 3 statistical criteria, the prediction results of the 4 ecological niche models were either excellent or outstanding, in which Maxent outperformed the others in 3 aspects of predicting current distribution habitats, selecting major contribution factors, and quantifying the influence of environmental variables on habitats. Distance to water, elevation, frequency of precipitation and solar radiation were 4 environmental forcing factors. The method suggested in the paper can have some reference meaning for modeling habitats of alien species in China and provide a direction to prevent Mytilopsis sallei on the Chinese coast line.

  17. A method to calculate fission-fragment yields Y(Z,N) versus proton and neutron number in the Brownian shape-motion model

    DOE PAGES

    Moller, Peter; Ichikawa, Takatoshi

    2015-12-23

    In this study, we propose a method to calculate the two-dimensional (2D) fission-fragment yield Y(Z,N) versus both proton and neutron number, with inclusion of odd-even staggering effects in both variables. The approach is to use the Brownian shape-motion on a macroscopic-microscopic potential-energy surface which, for a particular compound system is calculated versus four shape variables: elongation (quadrupole moment Q 2), neck d, left nascent fragment spheroidal deformation ϵ f1, right nascent fragment deformation ϵ f2 and two asymmetry variables, namely proton and neutron numbers in each of the two fragments. The extension of previous models 1) introduces a method tomore » calculate this generalized potential-energy function and 2) allows the correlated transfer of nucleon pairs in one step, in addition to sequential transfer. In the previous version the potential energy was calculated as a function of Z and N of the compound system and its shape, including the asymmetry of the shape. We outline here how to generalize the model from the “compound-system” model to a model where the emerging fragment proton and neutron numbers also enter, over and above the compound system composition.« less

  18. Quantitative Characterization of Spurious Gibbs Waves in 45 CMIP5 Models

    NASA Astrophysics Data System (ADS)

    Geil, K. L.; Zeng, X.

    2014-12-01

    Gibbs oscillations appear in global climate models when representing fields, such as orography, that contain discontinuities or sharp gradients. It has been known for decades that the oscillations are associated with the transformation of the truncated spectral representation of a field to physical space and that the oscillations can also be present in global models that do not use spectral methods. The spurious oscillations are potentially detrimental to model simulations (e.g., over ocean) and this work provides a quantitative characterization of the Gibbs oscillations that appear across the Coupled Model Intercomparison Project Phase 5 (CMIP5) models. An ocean transect running through the South Pacific High toward the Andes is used to characterize the oscillations in ten different variables. These oscillations are found to be stationary and hence are not caused by (physical) waves in the atmosphere. We quantify the oscillation amplitude using the root mean square difference (RMSD) between the transect of a variable and its running mean (rather than the constant mean across the transect). We also compute the RMSD to interannual variability (IAV) ratio, which provides a relative measure of the oscillation amplitude. Of the variables examined, the largest RMSD values exist in the surface pressure field of spectral models, while the smallest RMSD values within the surface pressure field come from models that use finite difference (FD) techniques. Many spectral models have a surface pressure RMSD that is 2 to 15 times greater than IAV over the transect and an RMSD:IAV ratio greater than one for many other variables including surface temperature, incoming shortwave radiation at the surface, incoming longwave radiation at the surface, and total cloud fraction. In general, the FD models out-perform the spectral models, but not all the spectral models have large amplitude oscillations and there are a few FD models where the oscillations do appear. Finally, we present a brief comparison of the numerical methods of a select few models to better understand their Gibbs oscillations.

  19. Dynamics of Longitudinal Impact in the Variable Cross-Section Rods

    NASA Astrophysics Data System (ADS)

    Stepanov, R.; Romenskyi, D.; Tsarenko, S.

    2018-03-01

    Dynamics of longitudinal impact in rods of variable cross-section is considered. Rods of various configurations are used as elements of power pulse systems. There is no single method to the construction of a mathematical model of longitudinal impact on rods. The creation of a general method for constructing a mathematical model of longitudinal impact for rods of variable cross-section is the goal of the article. An elastic rod is considered with a cross-sectional area varying in powers of law from the longitudinal coordinate. The solution of the wave equation is obtained using the Fourier method. Special functions are introduced on the basis of recurrence relations for Bessel functions for solving boundary value problems. The expression for the square of the norm is obtained taking into account the orthogonality property of the eigen functions with weight. For example, the impact of an inelastic mass along the wide end of a conical rod is considered. The expressions for the displacements, forces and stresses of the rod sections are obtained for the cases of sudden velocity communication and the application of force. The proposed mathematical model makes it possible to carry out investigations of the stress-strain state in rods of variable and constant cross-section for various conditions of dynamic effects.

  20. Comparison of two methods for estimating base flow in selected reaches of the South Platte River, Colorado

    USGS Publications Warehouse

    Capesius, Joseph P.; Arnold, L. Rick

    2012-01-01

    The Mass Balance results were quite variable over time such that they appeared suspect with respect to the concept of groundwater flow as being gradual and slow. The large degree of variability in the day-to-day and month-to-month Mass Balance results is likely the result of many factors. These factors could include ungaged stream inflows or outflows, short-term streamflow losses to and gains from temporary bank storage, and any lag in streamflow accounting owing to streamflow lag time of flow within a reach. The Pilot Point time series results were much less variable than the Mass Balance results and extreme values were effectively constrained. Less day-to-day variability, smaller magnitude extreme values, and smoother transitions in base-flow estimates provided by the Pilot Point method are more consistent with a conceptual model of groundwater flow being gradual and slow. The Pilot Point method provided a better fit to the conceptual model of groundwater flow and appeared to provide reasonable estimates of base flow.

  1. Assessment of WENO-extended two-fluid modelling in compressible multiphase flows

    NASA Astrophysics Data System (ADS)

    Kitamura, Keiichi; Nonomura, Taku

    2017-03-01

    The two-fluid modelling based on an advection-upwind-splitting-method (AUSM)-family numerical flux function, AUSM+-up, following the work by Chang and Liou [Journal of Computational Physics 2007;225: 840-873], has been successfully extended to the fifth order by weighted-essentially-non-oscillatory (WENO) schemes. Then its performance is surveyed in several numerical tests. The results showed a desired performance in one-dimensional benchmark test problems: Without relying upon an anti-diffusion device, the higher-order two-fluid method captures the phase interface within a fewer grid points than the conventional second-order method, as well as a rarefaction wave and a very weak shock. At a high pressure ratio (e.g. 1,000), the interpolated variables appeared to affect the performance: the conservative-variable-based characteristic-wise WENO interpolation showed less sharper but more robust representations of the shocks and expansions than the primitive-variable-based counterpart did. In two-dimensional shock/droplet test case, however, only the primitive-variable-based WENO with a huge void fraction realised a stable computation.

  2. Latent variable method for automatic adaptation to background states in motor imagery BCI

    NASA Astrophysics Data System (ADS)

    Dagaev, Nikolay; Volkova, Ksenia; Ossadtchi, Alexei

    2018-02-01

    Objective. Brain-computer interface (BCI) systems are known to be vulnerable to variabilities in background states of a user. Usually, no detailed information on these states is available even during the training stage. Thus there is a need in a method which is capable of taking background states into account in an unsupervised way. Approach. We propose a latent variable method that is based on a probabilistic model with a discrete latent variable. In order to estimate the model’s parameters, we suggest to use the expectation maximization algorithm. The proposed method is aimed at assessing characteristics of background states without any corresponding data labeling. In the context of asynchronous motor imagery paradigm, we applied this method to the real data from twelve able-bodied subjects with open/closed eyes serving as background states. Main results. We found that the latent variable method improved classification of target states compared to the baseline method (in seven of twelve subjects). In addition, we found that our method was also capable of background states recognition (in six of twelve subjects). Significance. Without any supervised information on background states, the latent variable method provides a way to improve classification in BCI by taking background states into account at the training stage and then by making decisions on target states weighted by posterior probabilities of background states at the prediction stage.

  3. Linking landscape characteristics to local grizzly bear abundance using multiple detection methods in a hierarchical model

    USGS Publications Warehouse

    Graves, T.A.; Kendall, Katherine C.; Royle, J. Andrew; Stetz, J.B.; Macleod, A.C.

    2011-01-01

    Few studies link habitat to grizzly bear Ursus arctos abundance and these have not accounted for the variation in detection or spatial autocorrelation. We collected and genotyped bear hair in and around Glacier National Park in northwestern Montana during the summer of 2000. We developed a hierarchical Markov chain Monte Carlo model that extends the existing occupancy and count models by accounting for (1) spatially explicit variables that we hypothesized might influence abundance; (2) separate sub-models of detection probability for two distinct sampling methods (hair traps and rub trees) targeting different segments of the population; (3) covariates to explain variation in each sub-model of detection; (4) a conditional autoregressive term to account for spatial autocorrelation; (5) weights to identify most important variables. Road density and per cent mesic habitat best explained variation in female grizzly bear abundance; spatial autocorrelation was not supported. More female bears were predicted in places with lower road density and with more mesic habitat. Detection rates of females increased with rub tree sampling effort. Road density best explained variation in male grizzly bear abundance and spatial autocorrelation was supported. More male bears were predicted in areas of low road density. Detection rates of males increased with rub tree and hair trap sampling effort and decreased over the sampling period. We provide a new method to (1) incorporate multiple detection methods into hierarchical models of abundance; (2) determine whether spatial autocorrelation should be included in final models. Our results suggest that the influence of landscape variables is consistent between habitat selection and abundance in this system.

  4. Computer program for simulation of variable recharge with the U. S. Geological Survey modular finite-difference ground-water flow model (MODFLOW)

    USGS Publications Warehouse

    Kontis, A.L.

    2001-01-01

    The Variable-Recharge Package is a computerized method designed for use with the U.S. Geological Survey three-dimensional finitedifference ground-water flow model (MODFLOW-88) to simulate areal recharge to an aquifer. It is suitable for simulations of aquifers in which the relation between ground-water levels and land surface can affect the amount and distribution of recharge. The method is based on the premise that recharge to an aquifer cannot occur where the water level is at or above land surface. Consequently, recharge will vary spatially in simulations in which the Variable- Recharge Package is applied, if the water levels are sufficiently high. The input data required by the program for each model cell that can potentially receive recharge includes the average land-surface elevation and a quantity termed ?water available for recharge,? which is equal to precipitation minus evapotranspiration. The Variable-Recharge Package also can be used to simulate recharge to a valley-fill aquifer in which the valley fill and the adjoining uplands are explicitly simulated. Valley-fill aquifers, which are the most common type of aquifer in the glaciated northeastern United States, receive much of their recharge from upland sources as channeled and(or) unchanneled surface runoff and as lateral ground-water flow. Surface runoff in the uplands is generated in the model when the applied water available for recharge is rejected because simulated water levels are at or above land surface. The surface runoff can be distributed to other parts of the model by (1) applying the amount of the surface runoff that flows to upland streams (channeled runoff) to explicitly simulated streams that flow onto the valley floor, and(or) (2) applying the amount that flows downslope toward the valley- fill aquifer (unchanneled runoff) to specified model cells, typically those near the valley wall. An example model of an idealized valley- fill aquifer is presented to demonstrate application of the method and the type of information that can be derived from its use. Documentation of the Variable-Recharge Package is provided in the appendixes and includes listings of model code and of program variables. Comment statements in the program listings provide a narrative of the code. Input-data instructions and printed model output for the package are included.

  5. Generalized Full-Information Item Bifactor Analysis

    PubMed Central

    Cai, Li; Yang, Ji Seung; Hansen, Mark

    2011-01-01

    Full-information item bifactor analysis is an important statistical method in psychological and educational measurement. Current methods are limited to single group analysis and inflexible in the types of item response models supported. We propose a flexible multiple-group item bifactor analysis framework that supports a variety of multidimensional item response theory models for an arbitrary mixing of dichotomous, ordinal, and nominal items. The extended item bifactor model also enables the estimation of latent variable means and variances when data from more than one group are present. Generalized user-defined parameter restrictions are permitted within or across groups. We derive an efficient full-information maximum marginal likelihood estimator. Our estimation method achieves substantial computational savings by extending Gibbons and Hedeker’s (1992) bifactor dimension reduction method so that the optimization of the marginal log-likelihood only requires two-dimensional integration regardless of the dimensionality of the latent variables. We use simulation studies to demonstrate the flexibility and accuracy of the proposed methods. We apply the model to study cross-country differences, including differential item functioning, using data from a large international education survey on mathematics literacy. PMID:21534682

  6. Geomatic methods at the service of water resources modelling

    NASA Astrophysics Data System (ADS)

    Molina, José-Luis; Rodríguez-Gonzálvez, Pablo; Molina, Mª Carmen; González-Aguilera, Diego; Espejo, Fernando

    2014-02-01

    Acquisition, management and/or use of spatial information are crucial for the quality of water resources studies. In this sense, several geomatic methods arise at the service of water modelling, aiming the generation of cartographic products, especially in terms of 3D models and orthophotos. They may also perform as tools for problem solving and decision making. However, choosing the right geomatic method is still a challenge in this field. That is mostly due to the complexity of the different applications and variables involved for water resources management. This study is aimed to provide a guide to best practices in this context by tackling a deep review of geomatic methods and their suitability assessment for the following study types: Surface Hydrology, Groundwater Hydrology, Hydraulics, Agronomy, Morphodynamics and Geotechnical Processes. This assessment is driven by several decision variables grouped in two categories, classified depending on their nature as geometric or radiometric. As a result, the reader comes with the best choice/choices for the method to use, depending on the type of water resources modelling study in hand.

  7. Realistic simulated MRI and SPECT databases. Application to SPECT/MRI registration evaluation.

    PubMed

    Aubert-Broche, Berengere; Grova, Christophe; Reilhac, Anthonin; Evans, Alan C; Collins, D Louis

    2006-01-01

    This paper describes the construction of simulated SPECT and MRI databases that account for realistic anatomical and functional variability. The data is used as a gold-standard to evaluate four SPECT/MRI similarity-based registration methods. Simulation realism was accounted for using accurate physical models of data generation and acquisition. MRI and SPECT simulations were generated from three subjects to take into account inter-subject anatomical variability. Functional SPECT data were computed from six functional models of brain perfusion. Previous models of normal perfusion and ictal perfusion observed in Mesial Temporal Lobe Epilepsy (MTLE) were considered to generate functional variability. We studied the impact noise and intensity non-uniformity in MRI simulations and SPECT scatter correction may have on registration accuracy. We quantified the amount of registration error caused by anatomical and functional variability. Registration involving ictal data was less accurate than registration involving normal data. MR intensity nonuniformity was the main factor decreasing registration accuracy. The proposed simulated database is promising to evaluate many functional neuroimaging methods, involving MRI and SPECT data.

  8. New formulation feed method in tariff model of solar PV in Indonesia

    NASA Astrophysics Data System (ADS)

    Djamal, Muchlishah Hadi; Setiawan, Eko Adhi; Setiawan, Aiman

    2017-03-01

    Geographically, Indonesia has 18 latitudes that correlated strongly with the potential of solar radiation for the implementation of solar photovoltaic (PV) technologies. This is becoming the basis assumption to develop a proportional model of Feed In Tariff (FIT), consequently the FIT will be vary, according to the various of latitudes in Indonesia. This paper proposed a new formulation of solar PV FIT based on the potential of solar radiation and some independent variables such as latitude, longitude, Levelized Cost of Electricity (LCOE), and also socio-economic. The Principal Component Regression (PCR) method is used to analyzed the correlation of six independent variables C1-C6 then three models of FIT are presented. Model FIT-2 is chosen because it has a small residual value and has higher financial benefit compared to the other models. This study reveals the value of variable FIT associated with solar energy potential in each region, can reduce the total FIT to be paid by the state around 80 billion rupiahs in 10 years of 1 MW photovoltaic operation at each 34 provinces in Indonesia.

  9. An Overview of Prognosis Health Management Research at Glenn Research Center for Gas Turbine Engine Structures With Special Emphasis on Deformation and Damage Modeling

    NASA Technical Reports Server (NTRS)

    Arnold, Steven M.; Goldberg, Robert K.; Lerch, Bradley A.; Saleeb, Atef F.

    2009-01-01

    Herein a general, multimechanism, physics-based viscoelastoplastic model is presented in the context of an integrated diagnosis and prognosis methodology which is proposed for structural health monitoring, with particular applicability to gas turbine engine structures. In this methodology, diagnostics and prognostics will be linked through state awareness variable(s). Key technologies which comprise the proposed integrated approach include (1) diagnostic/detection methodology, (2) prognosis/lifing methodology, (3) diagnostic/prognosis linkage, (4) experimental validation, and (5) material data information management system. A specific prognosis lifing methodology, experimental characterization and validation and data information management are the focal point of current activities being pursued within this integrated approach. The prognostic lifing methodology is based on an advanced multimechanism viscoelastoplastic model which accounts for both stiffness and/or strength reduction damage variables. Methods to characterize both the reversible and irreversible portions of the model are discussed. Once the multiscale model is validated the intent is to link it to appropriate diagnostic methods to provide a full-featured structural health monitoring system.

  10. An Overview of Prognosis Health Management Research at GRC for Gas Turbine Engine Structures With Special Emphasis on Deformation and Damage Modeling

    NASA Technical Reports Server (NTRS)

    Arnold, Steven M.; Goldberg, Robert K.; Lerch, Bradley A.; Saleeb, Atef F.

    2009-01-01

    Herein a general, multimechanism, physics-based viscoelastoplastic model is presented in the context of an integrated diagnosis and prognosis methodology which is proposed for structural health monitoring, with particular applicability to gas turbine engine structures. In this methodology, diagnostics and prognostics will be linked through state awareness variable(s). Key technologies which comprise the proposed integrated approach include 1) diagnostic/detection methodology, 2) prognosis/lifing methodology, 3) diagnostic/prognosis linkage, 4) experimental validation and 5) material data information management system. A specific prognosis lifing methodology, experimental characterization and validation and data information management are the focal point of current activities being pursued within this integrated approach. The prognostic lifing methodology is based on an advanced multi-mechanism viscoelastoplastic model which accounts for both stiffness and/or strength reduction damage variables. Methods to characterize both the reversible and irreversible portions of the model are discussed. Once the multiscale model is validated the intent is to link it to appropriate diagnostic methods to provide a full-featured structural health monitoring system.

  11. Online Estimation of Model Parameters of Lithium-Ion Battery Using the Cubature Kalman Filter

    NASA Astrophysics Data System (ADS)

    Tian, Yong; Yan, Rusheng; Tian, Jindong; Zhou, Shijie; Hu, Chao

    2017-11-01

    Online estimation of state variables, including state-of-charge (SOC), state-of-energy (SOE) and state-of-health (SOH) is greatly crucial for the operation safety of lithium-ion battery. In order to improve estimation accuracy of these state variables, a precise battery model needs to be established. As the lithium-ion battery is a nonlinear time-varying system, the model parameters significantly vary with many factors, such as ambient temperature, discharge rate and depth of discharge, etc. This paper presents an online estimation method of model parameters for lithium-ion battery based on the cubature Kalman filter. The commonly used first-order resistor-capacitor equivalent circuit model is selected as the battery model, based on which the model parameters are estimated online. Experimental results show that the presented method can accurately track the parameters variation at different scenarios.

  12. Variable speed limit strategies analysis with link transmission model on urban expressway

    NASA Astrophysics Data System (ADS)

    Li, Shubin; Cao, Danni

    2018-02-01

    The variable speed limit (VSL) is a kind of active traffic management method. Most of the strategies are used in the expressway traffic flow control in order to ensure traffic safety. However, the urban expressway system is the main artery, carrying most traffic pressure. It has similar traffic characteristics with the expressways between cities. In this paper, the improved link transmission model (LTM) combined with VSL strategies is proposed, based on the urban expressway network. The model can simulate the movement of the vehicles and the shock wave, and well balance the relationship between the amount of calculation and accuracy. Furthermore, the optimal VSL strategy can be proposed based on the simulation method. It can provide management strategies for managers. Finally, a simple example is given to illustrate the model and method. The selected indexes are the average density, the average speed and the average flow on the traffic network in the simulation. The simulation results show that the proposed model and method are feasible. The VSL strategy can effectively alleviate traffic congestion in some cases, and greatly promote the efficiency of the transportation system.

  13. [Determination of soluble solids content in Nanfeng Mandarin by Vis/NIR spectroscopy and UVE-ICA-LS-SVM].

    PubMed

    Sun, Tong; Xu, Wen-Li; Hu, Tian; Liu, Mu-Hua

    2013-12-01

    The objective of the present research was to assess soluble solids content (SSC) of Nanfeng mandarin by visible/near infrared (Vis/NIR) spectroscopy combined with new variable selection method, simplify prediction model and improve the performance of prediction model for SSC of Nanfeng mandarin. A total of 300 Nanfeng mandarin samples were used, the numbers of Nanfeng mandarin samples in calibration, validation and prediction sets were 150, 75 and 75, respectively. Vis/NIR spectra of Nanfeng mandarin samples were acquired by a QualitySpec spectrometer in the wavelength range of 350-1000 nm. Uninformative variables elimination (UVE) was used to eliminate wavelength variables that had few information of SSC, then independent component analysis (ICA) was used to extract independent components (ICs) from spectra that eliminated uninformative wavelength variables. At last, least squares support vector machine (LS-SVM) was used to develop calibration models for SSC of Nanfeng mandarin using extracted ICs, and 75 prediction samples that had not been used for model development were used to evaluate the performance of SSC model of Nanfeng mandarin. The results indicate t hat Vis/NIR spectroscopy combinedwith UVE-ICA-LS-SVM is suitable for assessing SSC o f Nanfeng mandarin, and t he precision o f prediction ishigh. UVE--ICA is an effective method to eliminate uninformative wavelength variables, extract important spectral information, simplify prediction model and improve the performance of prediction model. The SSC model developed by UVE-ICA-LS-SVM is superior to that developed by PLS, PCA-LS-SVM or ICA-LS-SVM, and the coefficient of determination and root mean square error in calibration, validation and prediction sets were 0.978, 0.230%, 0.965, 0.301% and 0.967, 0.292%, respectively.

  14. Variable Step-Size Selection Methods for Implicit Integration Schemes

    DTIC Science & Technology

    2005-10-01

    for ρk numerically. 23 4 Examples In this section we explore this variable step-size selection method for two problems, the Lotka - Volterra model and...the Kepler problem. 4.1 The Lotka - Volterra Model For this example we consider the Lotka - Volterra model of a simple predator- prey system from...problems. Consider this variation to the Lotka - Volterra problem:   u̇ v̇   =   u2v(v − 2) v2u(1− u)   = f(u, v); t ∈ [0, 50

  15. Using decision trees to understand structure in missing data

    PubMed Central

    Tierney, Nicholas J; Harden, Fiona A; Harden, Maurice J; Mengersen, Kerrie L

    2015-01-01

    Objectives Demonstrate the application of decision trees—classification and regression trees (CARTs), and their cousins, boosted regression trees (BRTs)—to understand structure in missing data. Setting Data taken from employees at 3 different industrial sites in Australia. Participants 7915 observations were included. Materials and methods The approach was evaluated using an occupational health data set comprising results of questionnaires, medical tests and environmental monitoring. Statistical methods included standard statistical tests and the ‘rpart’ and ‘gbm’ packages for CART and BRT analyses, respectively, from the statistical software ‘R’. A simulation study was conducted to explore the capability of decision tree models in describing data with missingness artificially introduced. Results CART and BRT models were effective in highlighting a missingness structure in the data, related to the type of data (medical or environmental), the site in which it was collected, the number of visits, and the presence of extreme values. The simulation study revealed that CART models were able to identify variables and values responsible for inducing missingness. There was greater variation in variable importance for unstructured as compared to structured missingness. Discussion Both CART and BRT models were effective in describing structural missingness in data. CART models may be preferred over BRT models for exploratory analysis of missing data, and selecting variables important for predicting missingness. BRT models can show how values of other variables influence missingness, which may prove useful for researchers. Conclusions Researchers are encouraged to use CART and BRT models to explore and understand missing data. PMID:26124509

  16. Combining exposure and effect modeling into an integrated probabilistic environmental risk assessment for nanoparticles.

    PubMed

    Jacobs, Rianne; Meesters, Johannes A J; Ter Braak, Cajo J F; van de Meent, Dik; van der Voet, Hilko

    2016-12-01

    There is a growing need for good environmental risk assessment of engineered nanoparticles (ENPs). Environmental risk assessment of ENPs has been hampered by lack of data and knowledge about ENPs, their environmental fate, and their toxicity. This leads to uncertainty in the risk assessment. To deal with uncertainty in the risk assessment effectively, probabilistic methods are advantageous. In the present study, the authors developed a method to model both the variability and the uncertainty in environmental risk assessment of ENPs. This method is based on the concentration ratio and the ratio of the exposure concentration to the critical effect concentration, both considered to be random. In this method, variability and uncertainty are modeled separately so as to allow the user to see which part of the total variation in the concentration ratio is attributable to uncertainty and which part is attributable to variability. The authors illustrate the use of the method with a simplified aquatic risk assessment of nano-titanium dioxide. The authors' method allows a more transparent risk assessment and can also direct further environmental and toxicological research to the areas in which it is most needed. Environ Toxicol Chem 2016;35:2958-2967. © 2016 The Authors. Environmental Toxicology and Chemistry published by Wiley Periodicals, Inc. on behalf of SETAC. © 2016 The Authors. Environmental Toxicology and Chemistry published by Wiley Periodicals, Inc. on behalf of SETAC.

  17. Detection of time delays and directional interactions based on time series from complex dynamical systems

    NASA Astrophysics Data System (ADS)

    Ma, Huanfei; Leng, Siyang; Tao, Chenyang; Ying, Xiong; Kurths, Jürgen; Lai, Ying-Cheng; Lin, Wei

    2017-07-01

    Data-based and model-free accurate identification of intrinsic time delays and directional interactions is an extremely challenging problem in complex dynamical systems and their networks reconstruction. A model-free method with new scores is proposed to be generally capable of detecting single, multiple, and distributed time delays. The method is applicable not only to mutually interacting dynamical variables but also to self-interacting variables in a time-delayed feedback loop. Validation of the method is carried out using physical, biological, and ecological models and real data sets. Especially, applying the method to air pollution data and hospital admission records of cardiovascular diseases in Hong Kong reveals the major air pollutants as a cause of the diseases and, more importantly, it uncovers a hidden time delay (about 30-40 days) in the causal influence that previous studies failed to detect. The proposed method is expected to be universally applicable to ascertaining and quantifying subtle interactions (e.g., causation) in complex systems arising from a broad range of disciplines.

  18. Patterns of Therapist Variability: Therapist Effects and the Contribution of Patient Severity and Risk

    ERIC Educational Resources Information Center

    Saxon, David; Barkham, Michael

    2012-01-01

    Objective: To investigate the size of therapist effects using multilevel modeling (MLM), to compare the outcomes of therapists identified as above and below average, and to consider how key variables--in particular patient severity and risk and therapist caseload--contribute to therapist variability and outcomes. Method: We used a large…

  19. The Construction of a Long Variable of Conceptual Development in Social Education.

    ERIC Educational Resources Information Center

    Doig, Brian

    This paper demonstrates a method for constructing long variables using items that elicit partically correct responses across ages. Long variables may be defined by students at different ages (year levels) attempting common items within a test containing other items considered to be appropriate for each age or year level. A developmental model of…

  20. A method for estimation of bias and variability of continuous gas monitor data: application to carbon monoxide monitor accuracy.

    PubMed

    Shulman, Stanley A; Smith, Jerome P

    2002-01-01

    A method is presented for the evaluation of the bias, variability, and accuracy of gas monitors. This method is based on using the parameters for the fitted response curves of the monitors. Thereby, variability between calibrations, between dates within each calibration period, and between different units can be evaluated at several different standard concentrations. By combining variability information with bias information, accuracy can be assessed. An example using carbon monoxide monitor data is provided. Although the most general statistical software required for these tasks is not available on a spreadsheet, when the same number of dates in a calibration period are evaluated for each monitor unit, the calculations can be done on a spreadsheet. An example of such calculations, together with the formulas needed for their implementation, is provided. In addition, the methods can be extended by use of appropriate statistical models and software to evaluate monitor trends within calibration periods, as well as consider the effects of other variables, such as humidity and temperature, on monitor variability and bias.

  1. Performance of nonlinear mixed effects models in the presence of informative dropout.

    PubMed

    Björnsson, Marcus A; Friberg, Lena E; Simonsson, Ulrika S H

    2015-01-01

    Informative dropout can lead to bias in statistical analyses if not handled appropriately. The objective of this simulation study was to investigate the performance of nonlinear mixed effects models with regard to bias and precision, with and without handling informative dropout. An efficacy variable and dropout depending on that efficacy variable were simulated and model parameters were reestimated, with or without including a dropout model. The Laplace and FOCE-I estimation methods in NONMEM 7, and the stochastic simulations and estimations (SSE) functionality in PsN, were used in the analysis. For the base scenario, bias was low, less than 5% for all fixed effects parameters, when a dropout model was used in the estimations. When a dropout model was not included, bias increased up to 8% for the Laplace method and up to 21% if the FOCE-I estimation method was applied. The bias increased with decreasing number of observations per subject, increasing placebo effect and increasing dropout rate, but was relatively unaffected by the number of subjects in the study. This study illustrates that ignoring informative dropout can lead to biased parameters in nonlinear mixed effects modeling, but even in cases with few observations or high dropout rate, the bias is relatively low and only translates into small effects on predictions of the underlying effect variable. A dropout model is, however, crucial in the presence of informative dropout in order to make realistic simulations of trial outcomes.

  2. Cognitive ability and risk of post-traumatic stress disorder after military deployment: an observational cohort study

    PubMed Central

    Karstoft, Karen-Inge; Vedtofte, Mia S.; Nielsen, Anni B.S.; Osler, Merete; Mortensen, Erik L.; Christensen, Gunhild T.; Andersen, Søren B.

    2017-01-01

    Background Studies of the association between pre-deployment cognitive ability and post-deployment post-traumatic stress disorder (PTSD) have shown mixed results. Aims To study the influence of pre-deployment cognitive ability on PTSD symptoms 6–8 months post-deployment in a large population while controlling for pre-deployment education and deployment-related variables. Method Study linking prospective pre-deployment conscription board data with post-deployment self-reported data in 9695 Danish Army personnel deployed to different war zones in 1997–2013. The association between pre-deployment cognitive ability and post-deployment PTSD was investigated using repeated-measure logistic regression models. Two models with cognitive ability score as the main exposure variable were created (model 1 and model 2). Model 1 was only adjusted for pre-deployment variables, while model 2 was adjusted for both pre-deployment and deployment-related variables. Results When including only variables recorded pre-deployment (cognitive ability score and educational level) and gender (model 1), all variables predicted post-deployment PTSD. When deployment-related variables were added (model 2), this was no longer the case for cognitive ability score. However, when educational level was removed from the model adjusted for deployment-related variables, the association between cognitive ability and post-deployment PTSD became significant. Conclusions Pre-deployment lower cognitive ability did not predict post-deployment PTSD independently of educational level after adjustment for deployment-related variables. Declaration of interest None. Copyright and usage © The Royal College of Psychiatrists 2017. This is an open access article distributed under the terms of the Creative Commons Non-Commercial, No Derivatives (CC BY-NC-ND) license. PMID:29163983

  3. A variable capacitance based modeling and power capability predicting method for ultracapacitor

    NASA Astrophysics Data System (ADS)

    Liu, Chang; Wang, Yujie; Chen, Zonghai; Ling, Qiang

    2018-01-01

    Methods of accurate modeling and power capability predicting for ultracapacitors are of great significance in management and application of lithium-ion battery/ultracapacitor hybrid energy storage system. To overcome the simulation error coming from constant capacitance model, an improved ultracapacitor model based on variable capacitance is proposed, where the main capacitance varies with voltage according to a piecewise linear function. A novel state-of-charge calculation approach is developed accordingly. After that, a multi-constraint power capability prediction is developed for ultracapacitor, in which a Kalman-filter-based state observer is designed for tracking ultracapacitor's real-time behavior. Finally, experimental results verify the proposed methods. The accuracy of the proposed model is verified by terminal voltage simulating results under different temperatures, and the effectiveness of the designed observer is proved by various test conditions. Additionally, the power capability prediction results of different time scales and temperatures are compared, to study their effects on ultracapacitor's power capability.

  4. [Gaussian process regression and its application in near-infrared spectroscopy analysis].

    PubMed

    Feng, Ai-Ming; Fang, Li-Min; Lin, Min

    2011-06-01

    Gaussian process (GP) is applied in the present paper as a chemometric method to explore the complicated relationship between the near infrared (NIR) spectra and ingredients. After the outliers were detected by Monte Carlo cross validation (MCCV) method and removed from dataset, different preprocessing methods, such as multiplicative scatter correction (MSC), smoothing and derivate, were tried for the best performance of the models. Furthermore, uninformative variable elimination (UVE) was introduced as a variable selection technique and the characteristic wavelengths obtained were further employed as input for modeling. A public dataset with 80 NIR spectra of corn was introduced as an example for evaluating the new algorithm. The optimal models for oil, starch and protein were obtained by the GP regression method. The performance of the final models were evaluated according to the root mean square error of calibration (RMSEC), root mean square error of cross-validation (RMSECV), root mean square error of prediction (RMSEP) and correlation coefficient (r). The models give good calibration ability with r values above 0.99 and the prediction ability is also satisfactory with r values higher than 0.96. The overall results demonstrate that GP algorithm is an effective chemometric method and is promising for the NIR analysis.

  5. Direct estimation and correction of bias from temporally variable non-stationary noise in a channelized Hotelling model observer.

    PubMed

    Fetterly, Kenneth A; Favazza, Christopher P

    2016-08-07

    Channelized Hotelling model observer (CHO) methods were developed to assess performance of an x-ray angiography system. The analytical methods included correction for known bias error due to finite sampling. Detectability indices ([Formula: see text]) corresponding to disk-shaped objects with diameters in the range 0.5-4 mm were calculated. Application of the CHO for variable detector target dose (DTD) in the range 6-240 nGy frame(-1) resulted in [Formula: see text] estimates which were as much as 2.9×  greater than expected of a quantum limited system. Over-estimation of [Formula: see text] was presumed to be a result of bias error due to temporally variable non-stationary noise. Statistical theory which allows for independent contributions of 'signal' from a test object (o) and temporally variable non-stationary noise (ns) was developed. The theory demonstrates that the biased [Formula: see text] is the sum of the detectability indices associated with the test object [Formula: see text] and non-stationary noise ([Formula: see text]). Given the nature of the imaging system and the experimental methods, [Formula: see text] cannot be directly determined independent of [Formula: see text]. However, methods to estimate [Formula: see text] independent of [Formula: see text] were developed. In accordance with the theory, [Formula: see text] was subtracted from experimental estimates of [Formula: see text], providing an unbiased estimate of [Formula: see text]. Estimates of [Formula: see text] exhibited trends consistent with expectations of an angiography system that is quantum limited for high DTD and compromised by detector electronic readout noise for low DTD conditions. Results suggest that these methods provide [Formula: see text] estimates which are accurate and precise for [Formula: see text]. Further, results demonstrated that the source of bias was detector electronic readout noise. In summary, this work presents theory and methods to test for the presence of bias in Hotelling model observers due to temporally variable non-stationary noise and correct this bias when the temporally variable non-stationary noise is independent and additive with respect to the test object signal.

  6. Model-Based Control of Observer Bias for the Analysis of Presence-Only Data in Ecology

    PubMed Central

    Warton, David I.; Renner, Ian W.; Ramp, Daniel

    2013-01-01

    Presence-only data, where information is available concerning species presence but not species absence, are subject to bias due to observers being more likely to visit and record sightings at some locations than others (hereafter “observer bias”). In this paper, we describe and evaluate a model-based approach to accounting for observer bias directly – by modelling presence locations as a function of known observer bias variables (such as accessibility variables) in addition to environmental variables, then conditioning on a common level of bias to make predictions of species occurrence free of such observer bias. We implement this idea using point process models with a LASSO penalty, a new presence-only method related to maximum entropy modelling, that implicitly addresses the “pseudo-absence problem” of where to locate pseudo-absences (and how many). The proposed method of bias-correction is evaluated using systematically collected presence/absence data for 62 plant species endemic to the Blue Mountains near Sydney, Australia. It is shown that modelling and controlling for observer bias significantly improves the accuracy of predictions made using presence-only data, and usually improves predictions as compared to pseudo-absence or “inventory” methods of bias correction based on absences from non-target species. Future research will consider the potential for improving the proposed bias-correction approach by estimating the observer bias simultaneously across multiple species. PMID:24260167

  7. Ascertainment-adjusted parameter estimation approach to improve robustness against misspecification of health monitoring methods

    NASA Astrophysics Data System (ADS)

    Juesas, P.; Ramasso, E.

    2016-12-01

    Condition monitoring aims at ensuring system safety which is a fundamental requirement for industrial applications and that has become an inescapable social demand. This objective is attained by instrumenting the system and developing data analytics methods such as statistical models able to turn data into relevant knowledge. One difficulty is to be able to correctly estimate the parameters of those methods based on time-series data. This paper suggests the use of the Weighted Distribution Theory together with the Expectation-Maximization algorithm to improve parameter estimation in statistical models with latent variables with an application to health monotonic under uncertainty. The improvement of estimates is made possible by incorporating uncertain and possibly noisy prior knowledge on latent variables in a sound manner. The latent variables are exploited to build a degradation model of dynamical system represented as a sequence of discrete states. Examples on Gaussian Mixture Models, Hidden Markov Models (HMM) with discrete and continuous outputs are presented on both simulated data and benchmarks using the turbofan engine datasets. A focus on the application of a discrete HMM to health monitoring under uncertainty allows to emphasize the interest of the proposed approach in presence of different operating conditions and fault modes. It is shown that the proposed model depicts high robustness in presence of noisy and uncertain prior.

  8. A waste characterisation procedure for ADM1 implementation based on degradation kinetics.

    PubMed

    Girault, R; Bridoux, G; Nauleau, F; Poullain, C; Buffet, J; Steyer, J-P; Sadowski, A G; Béline, F

    2012-09-01

    In this study, a procedure accounting for degradation kinetics was developed to split the total COD of a substrate into each input state variable required for Anaerobic Digestion Model n°1. The procedure is based on the combination of batch experimental degradation tests ("anaerobic respirometry") and numerical interpretation of the results obtained (optimisation of the ADM1 input state variable set). The effects of the main operating parameters, such as the substrate to inoculum ratio in batch experiments and the origin of the inoculum, were investigated. Combined with biochemical fractionation of the total COD of substrates, this method enabled determination of an ADM1-consistent input state variable set for each substrate with affordable identifiability. The substrate to inoculum ratio in the batch experiments and the origin of the inoculum influenced input state variables. However, based on results modelled for a CSTR fed with the substrate concerned, these effects were not significant. Indeed, if the optimal ranges of these operational parameters are respected, uncertainty in COD fractionation is mainly limited to temporal variability of the properties of the substrates. As the method is based on kinetics and is easy to implement for a wide range of substrates, it is a very promising way to numerically predict the effect of design parameters on the efficiency of an anaerobic CSTR. This method thus promotes the use of modelling for the design and optimisation of anaerobic processes. Copyright © 2012 Elsevier Ltd. All rights reserved.

  9. A reduced successive quadratic programming strategy for errors-in-variables estimation.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tjoa, I.-B.; Biegler, L. T.; Carnegie-Mellon Univ.

    Parameter estimation problems in process engineering represent a special class of nonlinear optimization problems, because the maximum likelihood structure of the objective function can be exploited. Within this class, the errors in variables method (EVM) is particularly interesting. Here we seek a weighted least-squares fit to the measurements with an underdetermined process model. Thus, both the number of variables and degrees of freedom available for optimization increase linearly with the number of data sets. Large optimization problems of this type can be particularly challenging and expensive to solve because, for general-purpose nonlinear programming (NLP) algorithms, the computational effort increases atmore » least quadratically with problem size. In this study we develop a tailored NLP strategy for EVM problems. The method is based on a reduced Hessian approach to successive quadratic programming (SQP), but with the decomposition performed separately for each data set. This leads to the elimination of all variables but the model parameters, which are determined by a QP coordination step. In this way the computational effort remains linear in the number of data sets. Moreover, unlike previous approaches to the EVM problem, global and superlinear properties of the SQP algorithm apply naturally. Also, the method directly incorporates inequality constraints on the model parameters (although not on the fitted variables). This approach is demonstrated on five example problems with up to 102 degrees of freedom. Compared to general-purpose NLP algorithms, large improvements in computational performance are observed.« less

  10. Variable viscosity and density biofilm simulations using an immersed boundary method, part II: Experimental validation and the heterogeneous rheology-IBM

    NASA Astrophysics Data System (ADS)

    Stotsky, Jay A.; Hammond, Jason F.; Pavlovsky, Leonid; Stewart, Elizabeth J.; Younger, John G.; Solomon, Michael J.; Bortz, David M.

    2016-07-01

    The goal of this work is to develop a numerical simulation that accurately captures the biomechanical response of bacterial biofilms and their associated extracellular matrix (ECM). In this, the second of a two-part effort, the primary focus is on formally presenting the heterogeneous rheology Immersed Boundary Method (hrIBM) and validating our model by comparison to experimental results. With this extension of the Immersed Boundary Method (IBM), we use the techniques originally developed in Part I ([19]) to treat biofilms as viscoelastic fluids possessing variable rheological properties anchored to a set of moving locations (i.e., the bacteria locations). In particular, we incorporate spatially continuous variable viscosity and density fields into our model. Although in [14,15], variable viscosity is used in an IBM context to model discrete viscosity changes across interfaces, to our knowledge this work and Part I are the first to apply the IBM to model a continuously variable viscosity field. We validate our modeling approach from Part I by comparing dynamic moduli and compliance moduli computed from our model to data from mechanical characterization experiments on Staphylococcus epidermidis biofilms. The experimental setup is described in [26] in which biofilms are grown and tested in a parallel plate rheometer. In order to initialize the positions of bacteria in the biofilm, experimentally obtained three dimensional coordinate data was used. One of the major conclusions of this effort is that treating the spring-like connections between bacteria as Maxwell or Zener elements provides good agreement with the mechanical characterization data. We also found that initializing the simulations with different coordinate data sets only led to small changes in the mechanical characterization results. Matlab code used to produce results in this paper will be available at https://github.com/MathBioCU/BiofilmSim.

  11. Improvements to an earth observing statistical performance model with applications to LWIR spectral variability

    NASA Astrophysics Data System (ADS)

    Zhao, Runchen; Ientilucci, Emmett J.

    2017-05-01

    Hyperspectral remote sensing systems provide spectral data composed of hundreds of narrow spectral bands. Spectral remote sensing systems can be used to identify targets, for example, without physical interaction. Often it is of interested to characterize the spectral variability of targets or objects. The purpose of this paper is to identify and characterize the LWIR spectral variability of targets based on an improved earth observing statistical performance model, known as the Forecasting and Analysis of Spectroradiometric System Performance (FASSP) model. FASSP contains three basic modules including a scene model, sensor model and a processing model. Instead of using mean surface reflectance only as input to the model, FASSP transfers user defined statistical characteristics of a scene through the image chain (i.e., from source to sensor). The radiative transfer model, MODTRAN, is used to simulate the radiative transfer based on user defined atmospheric parameters. To retrieve class emissivity and temperature statistics, or temperature / emissivity separation (TES), a LWIR atmospheric compensation method is necessary. The FASSP model has a method to transform statistics in the visible (ie., ELM) but currently does not have LWIR TES algorithm in place. This paper addresses the implementation of such a TES algorithm and its associated transformation of statistics.

  12. Automated reverse engineering of nonlinear dynamical systems

    PubMed Central

    Bongard, Josh; Lipson, Hod

    2007-01-01

    Complex nonlinear dynamics arise in many fields of science and engineering, but uncovering the underlying differential equations directly from observations poses a challenging task. The ability to symbolically model complex networked systems is key to understanding them, an open problem in many disciplines. Here we introduce for the first time a method that can automatically generate symbolic equations for a nonlinear coupled dynamical system directly from time series data. This method is applicable to any system that can be described using sets of ordinary nonlinear differential equations, and assumes that the (possibly noisy) time series of all variables are observable. Previous automated symbolic modeling approaches of coupled physical systems produced linear models or required a nonlinear model to be provided manually. The advance presented here is made possible by allowing the method to model each (possibly coupled) variable separately, intelligently perturbing and destabilizing the system to extract its less observable characteristics, and automatically simplifying the equations during modeling. We demonstrate this method on four simulated and two real systems spanning mechanics, ecology, and systems biology. Unlike numerical models, symbolic models have explanatory value, suggesting that automated “reverse engineering” approaches for model-free symbolic nonlinear system identification may play an increasing role in our ability to understand progressively more complex systems in the future. PMID:17553966

  13. Automated reverse engineering of nonlinear dynamical systems.

    PubMed

    Bongard, Josh; Lipson, Hod

    2007-06-12

    Complex nonlinear dynamics arise in many fields of science and engineering, but uncovering the underlying differential equations directly from observations poses a challenging task. The ability to symbolically model complex networked systems is key to understanding them, an open problem in many disciplines. Here we introduce for the first time a method that can automatically generate symbolic equations for a nonlinear coupled dynamical system directly from time series data. This method is applicable to any system that can be described using sets of ordinary nonlinear differential equations, and assumes that the (possibly noisy) time series of all variables are observable. Previous automated symbolic modeling approaches of coupled physical systems produced linear models or required a nonlinear model to be provided manually. The advance presented here is made possible by allowing the method to model each (possibly coupled) variable separately, intelligently perturbing and destabilizing the system to extract its less observable characteristics, and automatically simplifying the equations during modeling. We demonstrate this method on four simulated and two real systems spanning mechanics, ecology, and systems biology. Unlike numerical models, symbolic models have explanatory value, suggesting that automated "reverse engineering" approaches for model-free symbolic nonlinear system identification may play an increasing role in our ability to understand progressively more complex systems in the future.

  14. The theory and method of variable frequency directional seismic wave under the complex geologic conditions

    NASA Astrophysics Data System (ADS)

    Jiang, T.; Yue, Y.

    2017-12-01

    It is well known that the mono-frequency directional seismic wave technology can concentrate seismic waves into a beam. However, little work on the method and effect of variable frequency directional seismic wave under complex geological conditions have been done .We studied the variable frequency directional wave theory in several aspects. Firstly, we studied the relation between directional parameters and the direction of the main beam. Secondly, we analyzed the parameters that affect the beam width of main beam significantly, such as spacing of vibrator, wavelet dominant frequency, and number of vibrator. In addition, we will study different characteristics of variable frequency directional seismic wave in typical velocity models. In order to examine the propagation characteristics of directional seismic wave, we designed appropriate parameters according to the character of direction parameters, which is capable to enhance the energy of the main beam direction. Further study on directional seismic wave was discussed in the viewpoint of power spectral. The results indicate that the energy intensity of main beam direction increased 2 to 6 times for a multi-ore body velocity model. It showed us that the variable frequency directional seismic technology provided an effective way to strengthen the target signals under complex geological conditions. For concave interface model, we introduced complicated directional seismic technology which supports multiple main beams to obtain high quality data. Finally, we applied the 9-element variable frequency directional seismic wave technology to process the raw data acquired in a oil-shale exploration area. The results show that the depth of exploration increased 4 times with directional seismic wave method. Based on the above analysis, we draw the conclusion that the variable frequency directional seismic wave technology can improve the target signals of different geologic conditions and increase exploration depth with little cost. Due to inconvenience of hydraulic vibrators in complicated surface area, we suggest that the combination of high frequency portable vibrator and variable frequency directional seismic wave method is an alternative technology to increase depth of exploration or prospecting.

  15. Bayesian Inference for Functional Dynamics Exploring in fMRI Data.

    PubMed

    Guo, Xuan; Liu, Bing; Chen, Le; Chen, Guantao; Pan, Yi; Zhang, Jing

    2016-01-01

    This paper aims to review state-of-the-art Bayesian-inference-based methods applied to functional magnetic resonance imaging (fMRI) data. Particularly, we focus on one specific long-standing challenge in the computational modeling of fMRI datasets: how to effectively explore typical functional interactions from fMRI time series and the corresponding boundaries of temporal segments. Bayesian inference is a method of statistical inference which has been shown to be a powerful tool to encode dependence relationships among the variables with uncertainty. Here we provide an introduction to a group of Bayesian-inference-based methods for fMRI data analysis, which were designed to detect magnitude or functional connectivity change points and to infer their functional interaction patterns based on corresponding temporal boundaries. We also provide a comparison of three popular Bayesian models, that is, Bayesian Magnitude Change Point Model (BMCPM), Bayesian Connectivity Change Point Model (BCCPM), and Dynamic Bayesian Variable Partition Model (DBVPM), and give a summary of their applications. We envision that more delicate Bayesian inference models will be emerging and play increasingly important roles in modeling brain functions in the years to come.

  16. Binary variable multiple-model multiple imputation to address missing data mechanism uncertainty: Application to a smoking cessation trial

    PubMed Central

    Siddique, Juned; Harel, Ofer; Crespi, Catherine M.; Hedeker, Donald

    2014-01-01

    The true missing data mechanism is never known in practice. We present a method for generating multiple imputations for binary variables that formally incorporates missing data mechanism uncertainty. Imputations are generated from a distribution of imputation models rather than a single model, with the distribution reflecting subjective notions of missing data mechanism uncertainty. Parameter estimates and standard errors are obtained using rules for nested multiple imputation. Using simulation, we investigate the impact of missing data mechanism uncertainty on post-imputation inferences and show that incorporating this uncertainty can increase the coverage of parameter estimates. We apply our method to a longitudinal smoking cessation trial where nonignorably missing data were a concern. Our method provides a simple approach for formalizing subjective notions regarding nonresponse and can be implemented using existing imputation software. PMID:24634315

  17. Model independent particle mass measurements in missing energy events at hadron colliders

    NASA Astrophysics Data System (ADS)

    Park, Myeonghun

    2011-12-01

    This dissertation describes several new kinematic methods to measure the masses of new particles in events with missing transverse energy at hadron colliders. Each method relies on the measurement of some feature (a peak or an endpoint) in the distribution of a suitable kinematic variable. The first method makes use of the "Gator" variable s min , whose peak provides a global and fully inclusive measure of the production scale of the new particles. In the early stage of the LHC, this variable can be used both as an estimator and a discriminator for new physics over the standard model backgrounds. The next method studies the invariant mass distributions of the visible decay products from a cascade decay chain and the shapes and endpoints of those distributions. Given a sufficient number of endpoint measurements, one could in principle attempt to invert and solve for the mass spectrum. However, the non-linear character of the relevant coupled quadratic equations often leads to multiple solutions. In addition, there is a combinatorial ambiguity related to the ordering of the decay products from the cascade decay chain. We propose a new set of invariant mass variables which are less sensitive to these problems. We demonstrate how the new particle mass spectrum can be extracted from the measurement of their kinematic endpoints. The remaining methods described in the dissertation are based on "transverse" invariant mass variables like the "Cambridge" transverse mass MT2, the "Sheffield" contrasverse mass MCT and their corresponding one-dimensional projections MT2⊥, M T2||, MCT⊥ , and MCT|| with respect to the upstream transverse momentum U⃗T . The main advantage of all those methods is that they can be applied to very short (single-stage) decay topologies, as well as to a subsystem of the observed event. The methods can also be generalized to the case of non-identical missing particles, as demonstrated in Chapter 7. A complete set of analytical results for the calculation of the relevant variables in each event, as well as the dependence of their endpoints on the underlying mass spectrum is given for each case. In some circumstances, the whole shape of the differential distribution can be theoretically predicted as well. The methods are illustrated with examples from supersymmetry and from top quark production in the standard model.

  18. A Selective Overview of Variable Selection in High Dimensional Feature Space

    PubMed Central

    Fan, Jianqing

    2010-01-01

    High dimensional statistical problems arise from diverse fields of scientific research and technological development. Variable selection plays a pivotal role in contemporary statistical learning and scientific discoveries. The traditional idea of best subset selection methods, which can be regarded as a specific form of penalized likelihood, is computationally too expensive for many modern statistical applications. Other forms of penalized likelihood methods have been successfully developed over the last decade to cope with high dimensionality. They have been widely applied for simultaneously selecting important variables and estimating their effects in high dimensional statistical inference. In this article, we present a brief account of the recent developments of theory, methods, and implementations for high dimensional variable selection. What limits of the dimensionality such methods can handle, what the role of penalty functions is, and what the statistical properties are rapidly drive the advances of the field. The properties of non-concave penalized likelihood and its roles in high dimensional statistical modeling are emphasized. We also review some recent advances in ultra-high dimensional variable selection, with emphasis on independence screening and two-scale methods. PMID:21572976

  19. When Can Categorical Variables Be Treated as Continuous? A Comparison of Robust Continuous and Categorical SEM Estimation Methods under Suboptimal Conditions

    ERIC Educational Resources Information Center

    Rhemtulla, Mijke; Brosseau-Liard, Patricia E.; Savalei, Victoria

    2012-01-01

    A simulation study compared the performance of robust normal theory maximum likelihood (ML) and robust categorical least squares (cat-LS) methodology for estimating confirmatory factor analysis models with ordinal variables. Data were generated from 2 models with 2-7 categories, 4 sample sizes, 2 latent distributions, and 5 patterns of category…

  20. A coherent discrete variable representation method on a sphere

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yu, Hua -Gen

    Here, the coherent discrete variable representation (ZDVR) has been extended for construct- ing a multidimensional potential-optimized DVR basis on a sphere. In order to deal with the non-constant Jacobian in spherical angles, two direct product primitive basis methods are proposed so that the original ZDVR technique can be properly implemented. The method has been demonstrated by computing the lowest states of a two dimensional (2D) vibrational model. Results show that the extended ZDVR method gives accurate eigenval- ues and exponential convergence with increasing ZDVR basis size.

  1. A coherent discrete variable representation method on a sphere

    DOE PAGES

    Yu, Hua -Gen

    2017-09-05

    Here, the coherent discrete variable representation (ZDVR) has been extended for construct- ing a multidimensional potential-optimized DVR basis on a sphere. In order to deal with the non-constant Jacobian in spherical angles, two direct product primitive basis methods are proposed so that the original ZDVR technique can be properly implemented. The method has been demonstrated by computing the lowest states of a two dimensional (2D) vibrational model. Results show that the extended ZDVR method gives accurate eigenval- ues and exponential convergence with increasing ZDVR basis size.

  2. Multisite-multivariable sensitivity analysis of distributed watershed models: enhancing the perceptions from computationally frugal methods

    USDA-ARS?s Scientific Manuscript database

    This paper assesses the impact of different likelihood functions in identifying sensitive parameters of the highly parameterized, spatially distributed Soil and Water Assessment Tool (SWAT) watershed model for multiple variables at multiple sites. The global one-factor-at-a-time (OAT) method of Morr...

  3. A Correlation-Based Transition Model using Local Variables. Part 1; Model Formation

    NASA Technical Reports Server (NTRS)

    Menter, F. R.; Langtry, R. B.; Likki, S. R.; Suzen, Y. B.; Huang, P. G.; Volker, S.

    2006-01-01

    A new correlation-based transition model has been developed, which is based strictly on local variables. As a result, the transition model is compatible with modern computational fluid dynamics (CFD) approaches, such as unstructured grids and massive parallel execution. The model is based on two transport equations, one for intermittency and one for the transition onset criteria in terms of momentum thickness Reynolds number. The proposed transport equations do not attempt to model the physics of the transition process (unlike, e.g., turbulence models) but from a framework for the implementation of correlation-based models into general-purpose CFD methods.

  4. Analysis of Genome-Wide Association Studies with Multiple Outcomes Using Penalization

    PubMed Central

    Liu, Jin; Huang, Jian; Ma, Shuangge

    2012-01-01

    Genome-wide association studies have been extensively conducted, searching for markers for biologically meaningful outcomes and phenotypes. Penalization methods have been adopted in the analysis of the joint effects of a large number of SNPs (single nucleotide polymorphisms) and marker identification. This study is partly motivated by the analysis of heterogeneous stock mice dataset, in which multiple correlated phenotypes and a large number of SNPs are available. Existing penalization methods designed to analyze a single response variable cannot accommodate the correlation among multiple response variables. With multiple response variables sharing the same set of markers, joint modeling is first employed to accommodate the correlation. The group Lasso approach is adopted to select markers associated with all the outcome variables. An efficient computational algorithm is developed. Simulation study and analysis of the heterogeneous stock mice dataset show that the proposed method can outperform existing penalization methods. PMID:23272092

  5. A Two-moment Radiation Hydrodynamics Module in ATHENA Using a Godunov Method

    NASA Astrophysics Data System (ADS)

    Skinner, M. A.; Ostriker, E. C.

    2013-04-01

    We describe a module for the Athena code that solves the grey equations of radiation hydrodynamics (RHD) using a local variable Eddington tensor (VET) based on the M1 closure of the two-moment hierarchy of the transfer equation. The variables are updated via a combination of explicit Godunov methods to advance the gas and radiation variables including the non-stiff source terms, and a local implicit method to integrate the stiff source terms. We employ the reduced speed of light approximation (RSLA) with subcycling of the radiation variables in order to reduce computational costs. The streaming and diffusion limits are well-described by the M1 closure model, and our implementation shows excellent behavior for problems containing both regimes simultaneously. Our operator-split method is ideally suited for problems with a slowly-varying radiation field and dynamical gas flows, in which the effect of the RSLA is minimal.

  6. An 'Observational Large Ensemble' to compare observed and modeled temperature trend uncertainty due to internal variability.

    NASA Astrophysics Data System (ADS)

    Poppick, A. N.; McKinnon, K. A.; Dunn-Sigouin, E.; Deser, C.

    2017-12-01

    Initial condition climate model ensembles suggest that regional temperature trends can be highly variable on decadal timescales due to characteristics of internal climate variability. Accounting for trend uncertainty due to internal variability is therefore necessary to contextualize recent observed temperature changes. However, while the variability of trends in a climate model ensemble can be evaluated directly (as the spread across ensemble members), internal variability simulated by a climate model may be inconsistent with observations. Observation-based methods for assessing the role of internal variability on trend uncertainty are therefore required. Here, we use a statistical resampling approach to assess trend uncertainty due to internal variability in historical 50-year (1966-2015) winter near-surface air temperature trends over North America. We compare this estimate of trend uncertainty to simulated trend variability in the NCAR CESM1 Large Ensemble (LENS), finding that uncertainty in wintertime temperature trends over North America due to internal variability is largely overestimated by CESM1, on average by a factor of 32%. Our observation-based resampling approach is combined with the forced signal from LENS to produce an 'Observational Large Ensemble' (OLENS). The members of OLENS indicate a range of spatially coherent fields of temperature trends resulting from different sequences of internal variability consistent with observations. The smaller trend variability in OLENS suggests that uncertainty in the historical climate change signal in observations due to internal variability is less than suggested by LENS.

  7. An Open Source modular platform for hydrological model implementation

    NASA Astrophysics Data System (ADS)

    Kolberg, Sjur; Bruland, Oddbjørn

    2010-05-01

    An implementation framework for setup and evaluation of spatio-temporal models is developed, forming a highly modularized distributed model system. The ENKI framework allows building space-time models for hydrological or other environmental purposes, from a suite of separately compiled subroutine modules. The approach makes it easy for students, researchers and other model developers to implement, exchange, and test single routines in a fixed framework. The open-source license and modular design of ENKI will also facilitate rapid dissemination of new methods to institutions engaged in operational hydropower forecasting or other water resource management. Written in C++, ENKI uses a plug-in structure to build a complete model from separately compiled subroutine implementations. These modules contain very little code apart from the core process simulation, and are compiled as dynamic-link libraries (dll). A narrow interface allows the main executable to recognise the number and type of the different variables in each routine. The framework then exposes these variables to the user within the proper context, ensuring that time series exist for input variables, initialisation for states, GIS data sets for static map data, manually or automatically calibrated values for parameters etc. ENKI is designed to meet three different levels of involvement in model construction: • Model application: Running and evaluating a given model. Regional calibration against arbitrary data using a rich suite of objective functions, including likelihood and Bayesian estimation. Uncertainty analysis directed towards input or parameter uncertainty. o Need not: Know the model's composition of subroutines, or the internal variables in the model, or the creation of method modules. • Model analysis: Link together different process methods, including parallel setup of alternative methods for solving the same task. Investigate the effect of different spatial discretization schemes. o Need not: Write or compile computer code, handle file IO for each modules, • Routine implementation and testing. Implementation of new process-simulating methods/equations, specialised objective functions or quality control routines, testing of these in an existing framework. o Need not: Implement user or model interface for the new routine, IO handling, administration of model setup and run, calibration and validation routines etc. From being developed for Norway's largest hydropower producer Statkraft, ENKI is now being turned into an Open Source project. At the time of writing, the licence and the project administration is not established. Also, it remains to port the application to other compilers and computer platforms. However, we hope that ENKI will prove useful for both academic and operational users.

  8. Exact simulation of integrate-and-fire models with exponential currents.

    PubMed

    Brette, Romain

    2007-10-01

    Neural networks can be simulated exactly using event-driven strategies, in which the algorithm advances directly from one spike to the next spike. It applies to neuron models for which we have (1) an explicit expression for the evolution of the state variables between spikes and (2) an explicit test on the state variables that predicts whether and when a spike will be emitted. In a previous work, we proposed a method that allows exact simulation of an integrate-and-fire model with exponential conductances, with the constraint of a single synaptic time constant. In this note, we propose a method, based on polynomial root finding, that applies to integrate-and-fire models with exponential currents, with possibly many different synaptic time constants. Models can include biexponential synaptic currents and spike-triggered adaptation currents.

  9. Quantifying inter- and intra-population niche variability using hierarchical bayesian stable isotope mixing models.

    PubMed

    Semmens, Brice X; Ward, Eric J; Moore, Jonathan W; Darimont, Chris T

    2009-07-09

    Variability in resource use defines the width of a trophic niche occupied by a population. Intra-population variability in resource use may occur across hierarchical levels of population structure from individuals to subpopulations. Understanding how levels of population organization contribute to population niche width is critical to ecology and evolution. Here we describe a hierarchical stable isotope mixing model that can simultaneously estimate both the prey composition of a consumer diet and the diet variability among individuals and across levels of population organization. By explicitly estimating variance components for multiple scales, the model can deconstruct the niche width of a consumer population into relevant levels of population structure. We apply this new approach to stable isotope data from a population of gray wolves from coastal British Columbia, and show support for extensive intra-population niche variability among individuals, social groups, and geographically isolated subpopulations. The analytic method we describe improves mixing models by accounting for diet variability, and improves isotope niche width analysis by quantitatively assessing the contribution of levels of organization to the niche width of a population.

  10. Moderation analysis with missing data in the predictors.

    PubMed

    Zhang, Qian; Wang, Lijuan

    2017-12-01

    The most widely used statistical model for conducting moderation analysis is the moderated multiple regression (MMR) model. In MMR modeling, missing data could pose a challenge, mainly because the interaction term is a product of two or more variables and thus is a nonlinear function of the involved variables. In this study, we consider a simple MMR model, where the effect of the focal predictor X on the outcome Y is moderated by a moderator U. The primary interest is to find ways of estimating and testing the moderation effect with the existence of missing data in X. We mainly focus on cases when X is missing completely at random (MCAR) and missing at random (MAR). Three methods are compared: (a) Normal-distribution-based maximum likelihood estimation (NML); (b) Normal-distribution-based multiple imputation (NMI); and (c) Bayesian estimation (BE). Via simulations, we found that NML and NMI could lead to biased estimates of moderation effects under MAR missingness mechanism. The BE method outperformed NMI and NML for MMR modeling with missing data in the focal predictor, missingness depending on the moderator and/or auxiliary variables, and correctly specified distributions for the focal predictor. In addition, more robust BE methods are needed in terms of the distribution mis-specification problem of the focal predictor. An empirical example was used to illustrate the applications of the methods with a simple sensitivity analysis. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  11. Weather models as virtual sensors to data-driven rainfall predictions in urban watersheds

    NASA Astrophysics Data System (ADS)

    Cozzi, Lorenzo; Galelli, Stefano; Pascal, Samuel Jolivet De Marc; Castelletti, Andrea

    2013-04-01

    Weather and climate predictions are a key element of urban hydrology where they are used to inform water management and assist in flood warning delivering. Indeed, the modelling of the very fast dynamics of urbanized catchments can be substantially improved by the use of weather/rainfall predictions. For example, in Singapore Marina Reservoir catchment runoff processes have a very short time of concentration (roughly one hour) and observational data are thus nearly useless for runoff predictions and weather prediction are required. Unfortunately, radar nowcasting methods do not allow to carrying out long - term weather predictions, whereas numerical models are limited by their coarse spatial scale. Moreover, numerical models are usually poorly reliable because of the fast motion and limited spatial extension of rainfall events. In this study we investigate the combined use of data-driven modelling techniques and weather variables observed/simulated with a numerical model as a way to improve rainfall prediction accuracy and lead time in the Singapore metropolitan area. To explore the feasibility of the approach, we use a Weather Research and Forecast (WRF) model as a virtual sensor network for the input variables (the states of the WRF model) to a machine learning rainfall prediction model. More precisely, we combine an input variable selection method and a non-parametric tree-based model to characterize the empirical relation between the rainfall measured at the catchment level and all possible weather input variables provided by WRF model. We explore different lead time to evaluate the model reliability for different long - term predictions, as well as different time lags to see how past information could improve results. Results show that the proposed approach allow a significant improvement of the prediction accuracy of the WRF model on the Singapore urban area.

  12. Predicting the graft survival for heart-lung transplantation patients: an integrated data mining methodology.

    PubMed

    Oztekin, Asil; Delen, Dursun; Kong, Zhenyu James

    2009-12-01

    Predicting the survival of heart-lung transplant patients has the potential to play a critical role in understanding and improving the matching procedure between the recipient and graft. Although voluminous data related to the transplantation procedures is being collected and stored, only a small subset of the predictive factors has been used in modeling heart-lung transplantation outcomes. The previous studies have mainly focused on applying statistical techniques to a small set of factors selected by the domain-experts in order to reveal the simple linear relationships between the factors and survival. The collection of methods known as 'data mining' offers significant advantages over conventional statistical techniques in dealing with the latter's limitations such as normality assumption of observations, independence of observations from each other, and linearity of the relationship between the observations and the output measure(s). There are statistical methods that overcome these limitations. Yet, they are computationally more expensive and do not provide fast and flexible solutions as do data mining techniques in large datasets. The main objective of this study is to improve the prediction of outcomes following combined heart-lung transplantation by proposing an integrated data-mining methodology. A large and feature-rich dataset (16,604 cases with 283 variables) is used to (1) develop machine learning based predictive models and (2) extract the most important predictive factors. Then, using three different variable selection methods, namely, (i) machine learning methods driven variables-using decision trees, neural networks, logistic regression, (ii) the literature review-based expert-defined variables, and (iii) common sense-based interaction variables, a consolidated set of factors is generated and used to develop Cox regression models for heart-lung graft survival. The predictive models' performance in terms of 10-fold cross-validation accuracy rates for two multi-imputed datasets ranged from 79% to 86% for neural networks, from 78% to 86% for logistic regression, and from 71% to 79% for decision trees. The results indicate that the proposed integrated data mining methodology using Cox hazard models better predicted the graft survival with different variables than the conventional approaches commonly used in the literature. This result is validated by the comparison of the corresponding Gains charts for our proposed methodology and the literature review based Cox results, and by the comparison of Akaike information criteria (AIC) values received from each. Data mining-based methodology proposed in this study reveals that there are undiscovered relationships (i.e. interactions of the existing variables) among the survival-related variables, which helps better predict the survival of the heart-lung transplants. It also brings a different set of variables into the scene to be evaluated by the domain-experts and be considered prior to the organ transplantation.

  13. Group Variable Selection Via Convex Log-Exp-Sum Penalty with Application to a Breast Cancer Survivor Study

    PubMed Central

    Geng, Zhigeng; Wang, Sijian; Yu, Menggang; Monahan, Patrick O.; Champion, Victoria; Wahba, Grace

    2017-01-01

    Summary In many scientific and engineering applications, covariates are naturally grouped. When the group structures are available among covariates, people are usually interested in identifying both important groups and important variables within the selected groups. Among existing successful group variable selection methods, some methods fail to conduct the within group selection. Some methods are able to conduct both group and within group selection, but the corresponding objective functions are non-convex. Such a non-convexity may require extra numerical effort. In this article, we propose a novel Log-Exp-Sum(LES) penalty for group variable selection. The LES penalty is strictly convex. It can identify important groups as well as select important variables within the group. We develop an efficient group-level coordinate descent algorithm to fit the model. We also derive non-asymptotic error bounds and asymptotic group selection consistency for our method in the high-dimensional setting where the number of covariates can be much larger than the sample size. Numerical results demonstrate the good performance of our method in both variable selection and prediction. We applied the proposed method to an American Cancer Society breast cancer survivor dataset. The findings are clinically meaningful and may help design intervention programs to improve the qualify of life for breast cancer survivors. PMID:25257196

  14. Mathematical models of the simplest fuzzy PI/PD controllers with skewed input and output fuzzy sets.

    PubMed

    Mohan, B M; Sinha, Arpita

    2008-07-01

    This paper unveils mathematical models for fuzzy PI/PD controllers which employ two skewed fuzzy sets for each of the two-input variables and three skewed fuzzy sets for the output variable. The basic constituents of these models are Gamma-type and L-type membership functions for each input, trapezoidal/triangular membership functions for output, intersection/algebraic product triangular norm, maximum/drastic sum triangular conorm, Mamdani minimum/Larsen product/drastic product inference method, and center of sums defuzzification method. The existing simplest fuzzy PI/PD controller structures derived via symmetrical fuzzy sets become special cases of the mathematical models revealed in this paper. Finally, a numerical example along with its simulation results are included to demonstrate the effectiveness of the simplest fuzzy PI controllers.

  15. Variable-Domain Functional Regression for Modeling ICU Data.

    PubMed

    Gellar, Jonathan E; Colantuoni, Elizabeth; Needham, Dale M; Crainiceanu, Ciprian M

    2014-12-01

    We introduce a class of scalar-on-function regression models with subject-specific functional predictor domains. The fundamental idea is to consider a bivariate functional parameter that depends both on the functional argument and on the width of the functional predictor domain. Both parametric and nonparametric models are introduced to fit the functional coefficient. The nonparametric model is theoretically and practically invariant to functional support transformation, or support registration. Methods were motivated by and applied to a study of association between daily measures of the Intensive Care Unit (ICU) Sequential Organ Failure Assessment (SOFA) score and two outcomes: in-hospital mortality, and physical impairment at hospital discharge among survivors. Methods are generally applicable to a large number of new studies that record a continuous variables over unequal domains.

  16. PREDICTING TWO-DIMENSIONAL STEADY-STATE SOIL FREEZING FRONTS USING THE CVBEM.

    USGS Publications Warehouse

    Hromadka, T.V.

    1986-01-01

    The complex variable boundary element method (CVBEM) is used instead of a real variable boundary element method due to the available modeling error evaluation techniques developed. The modeling accuracy is evaluated by the model-user in the determination of an approximative boundary upon which the CVBEM provides an exact solution. Although inhomogeneity (and anisotropy) can be included in the CVBEM model, the resulting fully populated matrix system quickly becomes large. Therefore in this paper, the domain is assumed homogeneous and isotropic except for differences in frozen and thawed conduction parameters on either side of the freezing front. The example problems presented were obtained by use of a popular 64K microcomputer (the current version of the program used in this study has the capacity to accommodate 30 nodal points).

  17. Laser system refinements to reduce variability in infarct size in the rat photothrombotic stroke model

    PubMed Central

    Alaverdashvili, Mariam; Paterson, Phyllis G.; Bradley, Michael P.

    2015-01-01

    Background The rat photothrombotic stroke model can induce brain infarcts with reasonable biological variability. Nevertheless, we observed unexplained high inter-individual variability despite using a rigorous protocol. Of the three major determinants of infarct volume, photosensitive dye concentration and illumination period were strictly controlled, whereas undetected fluctuation in laser power output was suspected to account for the variability. New method The frequently utilized Diode Pumped Solid State (DPSS) lasers emitting 532 nm (green) light can exhibit fluctuations in output power due to temperature and input power alterations. The polarization properties of the Nd:YAG and Nd:YVO4 crystals commonly used in these lasers are another potential source of fluctuation, since one means of controlling output power uses a polarizer with a variable transmission axis. Thus, the properties of DPSS lasers and the relationship between power output and infarct size were explored. Results DPSS laser beam intensity showed considerable variation. Either a polarizer or a variable neutral density filter allowed adjustment of a polarized laser beam to the desired intensity. When the beam was unpolarized, the experimenter was restricted to using a variable neutral density filter. Comparison with existing method(s) Our refined approach includes continuous monitoring of DPSS laser intensity via beam sampling using a pellicle beamsplitter and photodiode sensor. This guarantees the desired beam intensity at the targeted brain area during stroke induction, with the intensity controlled either through a polarizer or variable neutral density filter. Conclusions Continuous monitoring and control of laser beam intensity is critical for ensuring consistent infarct size. PMID:25840363

  18. POWER ANALYSIS FOR COMPLEX MEDIATIONAL DESIGNS USING MONTE CARLO METHODS

    PubMed Central

    Thoemmes, Felix; MacKinnon, David P.; Reiser, Mark R.

    2013-01-01

    Applied researchers often include mediation effects in applications of advanced methods such as latent variable models and linear growth curve models. Guidance on how to estimate statistical power to detect mediation for these models has not yet been addressed in the literature. We describe a general framework for power analyses for complex mediational models. The approach is based on the well known technique of generating a large number of samples in a Monte Carlo study, and estimating power as the percentage of cases in which an estimate of interest is significantly different from zero. Examples of power calculation for commonly used mediational models are provided. Power analyses for the single mediator, multiple mediators, three-path mediation, mediation with latent variables, moderated mediation, and mediation in longitudinal designs are described. Annotated sample syntax for Mplus is appended and tabled values of required sample sizes are shown for some models. PMID:23935262

  19. Multi-omics facilitated variable selection in Cox-regression model for cancer prognosis prediction.

    PubMed

    Liu, Cong; Wang, Xujun; Genchev, Georgi Z; Lu, Hui

    2017-07-15

    New developments in high-throughput genomic technologies have enabled the measurement of diverse types of omics biomarkers in a cost-efficient and clinically-feasible manner. Developing computational methods and tools for analysis and translation of such genomic data into clinically-relevant information is an ongoing and active area of investigation. For example, several studies have utilized an unsupervised learning framework to cluster patients by integrating omics data. Despite such recent advances, predicting cancer prognosis using integrated omics biomarkers remains a challenge. There is also a shortage of computational tools for predicting cancer prognosis by using supervised learning methods. The current standard approach is to fit a Cox regression model by concatenating the different types of omics data in a linear manner, while penalty could be added for feature selection. A more powerful approach, however, would be to incorporate data by considering relationships among omics datatypes. Here we developed two methods: a SKI-Cox method and a wLASSO-Cox method to incorporate the association among different types of omics data. Both methods fit the Cox proportional hazards model and predict a risk score based on mRNA expression profiles. SKI-Cox borrows the information generated by these additional types of omics data to guide variable selection, while wLASSO-Cox incorporates this information as a penalty factor during model fitting. We show that SKI-Cox and wLASSO-Cox models select more true variables than a LASSO-Cox model in simulation studies. We assess the performance of SKI-Cox and wLASSO-Cox using TCGA glioblastoma multiforme and lung adenocarcinoma data. In each case, mRNA expression, methylation, and copy number variation data are integrated to predict the overall survival time of cancer patients. Our methods achieve better performance in predicting patients' survival in glioblastoma and lung adenocarcinoma. Copyright © 2017. Published by Elsevier Inc.

  20. A simulation study on Bayesian Ridge regression models for several collinearity levels

    NASA Astrophysics Data System (ADS)

    Efendi, Achmad; Effrihan

    2017-12-01

    When analyzing data with multiple regression model if there are collinearities, then one or several predictor variables are usually omitted from the model. However, there sometimes some reasons, for instance medical or economic reasons, the predictors are all important and should be included in the model. Ridge regression model is not uncommon in some researches to use to cope with collinearity. Through this modeling, weights for predictor variables are used for estimating parameters. The next estimation process could follow the concept of likelihood. Furthermore, for the estimation nowadays the Bayesian version could be an alternative. This estimation method does not match likelihood one in terms of popularity due to some difficulties; computation and so forth. Nevertheless, with the growing improvement of computational methodology recently, this caveat should not at the moment become a problem. This paper discusses about simulation process for evaluating the characteristic of Bayesian Ridge regression parameter estimates. There are several simulation settings based on variety of collinearity levels and sample sizes. The results show that Bayesian method gives better performance for relatively small sample sizes, and for other settings the method does perform relatively similar to the likelihood method.

Top