Sample records for variable selection based

  1. Input variable selection for data-driven models of Coriolis flowmeters for two-phase flow measurement

    NASA Astrophysics Data System (ADS)

    Wang, Lijuan; Yan, Yong; Wang, Xue; Wang, Tao

    2017-03-01

    Input variable selection is an essential step in the development of data-driven models for environmental, biological and industrial applications. By eliminating irrelevant or redundant variables, input variable selection identifies a suitable subset of variables as the input of a model; it also simplifies the model structure and improves computational efficiency. This paper describes the procedures of input variable selection for data-driven models for the measurement of liquid mass flowrate and gas volume fraction under two-phase flow conditions using Coriolis flowmeters. Three advanced input variable selection methods, partial mutual information (PMI), genetic algorithm-artificial neural network (GA-ANN) and tree-based iterative input selection (IIS), are applied in this study. Typical data-driven models incorporating support vector machines (SVM) are established individually based on the input candidates resulting from the selection methods. The validity of the selection outcomes is assessed through an output performance comparison of the SVM-based data-driven models and sensitivity analysis. The validation and analysis results suggest that the input variables selected by the PMI algorithm provide more effective information for the models to measure liquid mass flowrate, while the IIS algorithm provides fewer but more effective variables for the models to predict gas volume fraction.
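
The mutual-information scoring at the heart of PMI-style input selection can be sketched as follows. This is a minimal illustration, not the paper's implementation: the histogram estimator, bin count, and synthetic data are all assumptions made here.

```python
import numpy as np

def mutual_info(x, y, bins=8):
    """Histogram estimate of I(X; Y) in nats."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of X
    py = pxy.sum(axis=0, keepdims=True)   # marginal of Y
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

# Rank five candidate inputs against a synthetic target driven mainly by x0.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = 3.0 * X[:, 0] + 0.5 * X[:, 3] + 0.1 * rng.normal(size=2000)
ranking = sorted(range(5), key=lambda j: mutual_info(X[:, j], y), reverse=True)
```

Variables that carry no information about the target score near zero, so the ranking exposes the relevant candidates; PMI additionally conditions on already-selected inputs, which this one-shot ranking omits.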

  2. Variables selection methods in near-infrared spectroscopy.

    PubMed

    Xiaobo, Zou; Jiewen, Zhao; Povey, Malcolm J W; Holmes, Mel; Hanpin, Mao

    2010-05-14

    Near-infrared (NIR) spectroscopy has increasingly been adopted as an analytical tool in various fields, such as the petrochemical, pharmaceutical, environmental, clinical, agricultural, food and biomedical sectors, during the past 15 years. A NIR spectrum of a sample is typically measured by modern scanning instruments at hundreds of equally spaced wavelengths. The large number of spectral variables in most data sets encountered in NIR spectral chemometrics often renders the prediction of a dependent variable unreliable. Recently, considerable effort has been directed towards developing and evaluating procedures that objectively identify variables which contribute useful information and/or eliminate variables containing mostly noise. This review focuses on variable selection methods in NIR spectroscopy. These include classical approaches, such as the manual (knowledge-based) approach and "univariate" and "sequential" selection methods; sophisticated methods such as the successive projections algorithm (SPA) and uninformative variable elimination (UVE); elaborate search-based strategies such as simulated annealing (SA), artificial neural networks (ANN) and genetic algorithms (GAs); and interval-based algorithms such as interval partial least squares (iPLS), windows PLS and iterative PLS. Wavelength selection with B-splines, Kalman filtering, Fisher's weights and Bayesian approaches is also mentioned. Finally, websites of some variable selection software and toolboxes for non-commercial use are given. Copyright 2010 Elsevier B.V. All rights reserved.

  3. Canonical Measure of Correlation (CMC) and Canonical Measure of Distance (CMD) between sets of data. Part 3. Variable selection in classification.

    PubMed

    Ballabio, Davide; Consonni, Viviana; Mauri, Andrea; Todeschini, Roberto

    2010-01-11

    In multivariate regression and classification, variable selection is an important procedure used to select an optimal subset of variables with the aim of producing more parsimonious and possibly more predictive models. Variable selection is often necessary when dealing with methodologies that produce thousands of variables, such as Quantitative Structure-Activity Relationships (QSARs) and highly dimensional analytical procedures. In this paper a novel method for variable selection for classification purposes is introduced. This method exploits the recently proposed Canonical Measure of Correlation between two sets of variables (CMC index). The CMC index is in this case calculated for two specific sets of variables, the former comprising the independent variables and the latter the unfolded class matrix. The CMC values, calculated by considering one variable at a time, can be sorted to give a ranking of the variables on the basis of their class discrimination capabilities. Alternatively, the CMC index can be calculated for all possible combinations of variables and the variable subset with the maximal CMC selected, but this procedure is computationally more demanding and the classification performance of the selected subset is not always the best one. The effectiveness of the CMC index in selecting variables with discriminative ability was compared with that of other well-known strategies for variable selection, such as Wilks' Lambda, the VIP index based on Partial Least Squares-Discriminant Analysis, and the selection provided by classification trees. A variable forward selection based on the CMC index was finally used in conjunction with Linear Discriminant Analysis. This approach was tested on several chemical data sets, with encouraging results.
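
For a single variable against a one-hot (unfolded) class matrix, canonical correlation reduces to the correlation ratio, i.e. between-class over total variance, which gives a simple way to reproduce the one-variable-at-a-time ranking described above. A numpy sketch under that reduction, not the authors' code:

```python
import numpy as np

def class_discrimination_scores(X, labels):
    """Score each variable by its correlation ratio (between-class variance /
    total variance) against the class labels; higher = more discriminative."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    scores = []
    for j in range(X.shape[1]):
        x = X[:, j]
        total = ((x - x.mean()) ** 2).sum()
        between = sum((labels == c).sum() * (x[labels == c].mean() - x.mean()) ** 2
                      for c in np.unique(labels))
        scores.append(between / total)
    return np.array(scores)
```

Sorting these scores gives the per-variable ranking; the subset-search variant mentioned in the abstract would instead evaluate groups of columns jointly.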

  4. Variable Selection through Correlation Sifting

    NASA Astrophysics Data System (ADS)

    Huang, Jim C.; Jojic, Nebojsa

    Many applications of computational biology require a variable selection procedure to sift through a large number of input variables and select some smaller number that influence a target variable of interest. For example, in virology, only some small number of viral protein fragments influence the nature of the immune response during viral infection. Due to the large number of variables to be considered, a brute-force search for the subset of variables is in general intractable. To approximate this, methods based on ℓ1-regularized linear regression have been proposed and have been found to be particularly successful. It is well understood however that such methods fail to choose the correct subset of variables if these are highly correlated with other "decoy" variables. We present a method for sifting through sets of highly correlated variables which leads to higher accuracy in selecting the correct variables. The main innovation is a filtering step that reduces correlations among variables to be selected, making the ℓ1-regularization effective for datasets on which many methods for variable selection fail. The filtering step changes both the values of the predictor variables and output values by projections onto components obtained through a computationally-inexpensive principal components analysis. In this paper we demonstrate the usefulness of our method on synthetic datasets and on novel applications in virology. These include HIV viral load analysis based on patients' HIV sequences and immune types, as well as the analysis of seasonal variation in influenza death rates based on the regions of the influenza genome that undergo diversifying selection in the previous season.
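
The filtering step can be sketched with a plain PCA projection: remove the top principal components from both the predictors and the response before running the ℓ1 step. The component count k is a tuning assumption, and this is a sketch of the idea rather than the authors' code:

```python
import numpy as np

def sift(X, y, k=1):
    """Project the top-k principal components out of both X and y, reducing
    correlations among predictors before L1-regularized selection."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    X_f = Xc - (Xc @ Vt[:k].T) @ Vt[:k]   # remove top-k loadings from X
    T = U[:, :k]                          # orthonormal PC score vectors
    y_f = yc - T @ (T.T @ yc)             # remove the same components from y
    return X_f, y_f

# Columns sharing one dominant factor are highly correlated "decoys";
# sifting the first component makes them nearly uncorrelated.
rng = np.random.default_rng(0)
f = rng.normal(size=500)
X = f[:, None] + 0.1 * rng.normal(size=(500, 4))
y = X[:, 0] + rng.normal(size=500)
X_f, y_f = sift(X, y, k=1)
```

An ℓ1-regularized fit on `(X_f, y_f)` then has a much easier job distinguishing the true predictor from its correlated decoys.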

  5. A review of covariate selection for non-experimental comparative effectiveness research.

    PubMed

    Sauer, Brian C; Brookhart, M Alan; Roy, Jason; VanderWeele, Tyler

    2013-11-01

    This paper addresses strategies for selecting variables for adjustment in non-experimental comparative effectiveness research and uses causal graphs to illustrate the causal network that relates treatment to outcome. Variables in the causal network take on multiple structural forms. Adjustment for a common cause pathway between treatment and outcome can remove confounding, whereas adjustment for other structural types may increase bias. For this reason, variable selection would ideally be based on an understanding of the causal network; however, the true causal network is rarely known. Therefore, we describe more practical variable selection approaches based on background knowledge when the causal structure is only partially known. These approaches include adjustment for all observed pretreatment variables thought to have some connection to the outcome, all known risk factors for the outcome, and all direct causes of the treatment or the outcome. Empirical approaches, such as forward and backward selection and automatic high-dimensional proxy adjustment, are also discussed. As there is a continuum between knowing and not knowing the causal, structural relations of variables, we recommend addressing variable selection in a practical way that involves a combination of background knowledge and empirical selection and that uses high-dimensional approaches. This empirical approach can be used to select from a set of a priori variables based on the researcher's knowledge to be included in the final analysis or to identify additional variables for consideration. This more limited use of empirically derived variables may reduce confounding while simultaneously reducing the risk of including variables that may increase bias. Copyright © 2013 John Wiley & Sons, Ltd.

  6. A Review of Covariate Selection for Nonexperimental Comparative Effectiveness Research

    PubMed Central

    Sauer, Brian C.; Brookhart, Alan; Roy, Jason; Vanderweele, Tyler

    2014-01-01

    This paper addresses strategies for selecting variables for adjustment in non-experimental comparative effectiveness research (CER), and uses causal graphs to illustrate the causal network that relates treatment to outcome. Variables in the causal network take on multiple structural forms. Adjustment for a common cause pathway between treatment and outcome can remove confounding, while adjustment for other structural types may increase bias. For this reason variable selection would ideally be based on an understanding of the causal network; however, the true causal network is rarely known. Therefore, we describe more practical variable selection approaches based on background knowledge when the causal structure is only partially known. These approaches include adjustment for all observed pretreatment variables thought to have some connection to the outcome, all known risk factors for the outcome, and all direct causes of the treatment or the outcome. Empirical approaches, such as forward and backward selection and automatic high-dimensional proxy adjustment, are also discussed. As there is a continuum between knowing and not knowing the causal, structural relations of variables, we recommend addressing variable selection in a practical way that involves a combination of background knowledge and empirical selection and that uses high-dimensional approaches. This empirical approach can be used to select from a set of a priori variables based on the researcher's knowledge to be included in the final analysis or to identify additional variables for consideration. This more limited use of empirically derived variables may reduce confounding while simultaneously reducing the risk of including variables that may increase bias. PMID:24006330

  7. A GPU-Based Implementation of the Firefly Algorithm for Variable Selection in Multivariate Calibration Problems

    PubMed Central

    de Paula, Lauro C. M.; Soares, Anderson S.; de Lima, Telma W.; Delbem, Alexandre C. B.; Coelho, Clarimar J.; Filho, Arlindo R. G.

    2014-01-01

    Several variable selection algorithms in multivariate calibration can be accelerated using Graphics Processing Units (GPU). Among these algorithms, the Firefly Algorithm (FA) is a recently proposed metaheuristic that may be used for variable selection. This paper presents a GPU-based FA (FA-MLR) with a multiobjective formulation for variable selection in multivariate calibration problems and compares it with some traditional sequential algorithms in the literature. The advantage of the proposed implementation is demonstrated in an example involving a relatively large number of variables. The results showed that the FA-MLR, in comparison with the traditional algorithms, is a more suitable choice and a relevant contribution to the variable selection problem. Additionally, the results also demonstrated that the FA-MLR run on a GPU can be five times faster than its sequential implementation. PMID:25493625

  8. A GPU-Based Implementation of the Firefly Algorithm for Variable Selection in Multivariate Calibration Problems.

    PubMed

    de Paula, Lauro C M; Soares, Anderson S; de Lima, Telma W; Delbem, Alexandre C B; Coelho, Clarimar J; Filho, Arlindo R G

    2014-01-01

    Several variable selection algorithms in multivariate calibration can be accelerated using Graphics Processing Units (GPU). Among these algorithms, the Firefly Algorithm (FA) is a recently proposed metaheuristic that may be used for variable selection. This paper presents a GPU-based FA (FA-MLR) with a multiobjective formulation for variable selection in multivariate calibration problems and compares it with some traditional sequential algorithms in the literature. The advantage of the proposed implementation is demonstrated in an example involving a relatively large number of variables. The results showed that the FA-MLR, in comparison with the traditional algorithms, is a more suitable choice and a relevant contribution to the variable selection problem. Additionally, the results also demonstrated that the FA-MLR run on a GPU can be five times faster than its sequential implementation.

  9. [Application of characteristic NIR variables selection in portable detection of soluble solids content of apple by near infrared spectroscopy].

    PubMed

    Fan, Shu-Xiang; Huang, Wen-Qian; Li, Jiang-Bo; Guo, Zhi-Ming; Zhao, Chun-Jiang

    2014-10-01

    In order to detect the soluble solids content (SSC) of apple conveniently and rapidly, a ring fiber probe and a portable spectrometer were applied to obtain the spectra of apple. Different wavelength variable selection methods, including uninformative variable elimination (UVE), competitive adaptive reweighted sampling (CARS) and genetic algorithm (GA), were proposed to select effective wavelength variables of the NIR spectroscopy of the SSC in apple based on PLS. The back interval LS-SVM (BiLS-SVM) and GA were used to select effective wavelength variables based on LS-SVM. Selected wavelength variables and the full wavelength range were set as input variables of the PLS model and LS-SVM model, respectively. The results indicated that the PLS model built using GA-CARS on 50 characteristic variables, selected from a full spectrum of 1512 wavelengths, achieved the optimal performance. The correlation coefficient (Rp) and root mean square error of prediction (RMSEP) for the prediction set were 0.962 and 0.403 °Brix, respectively, for SSC. The proposed GA-CARS method could effectively simplify the portable detection model of SSC in apple based on near infrared spectroscopy and enhance the predictive precision. The study can provide a reference for the development of a portable apple soluble solids content spectrometer.
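
The competitive, iteratively shrinking retention of high-|coefficient| wavelengths that CARS performs can be caricatured as follows. Ridge coefficients stand in for PLS regression coefficients, and a fixed retention ratio replaces CARS's exponentially decreasing function and Monte Carlo sampling; both are simplifying assumptions.

```python
import numpy as np

def shrink_select(X, y, k, keep_frac=0.8, lam=1e-3):
    """Iteratively refit a ridge model and retain the keep_frac fraction of
    variables with the largest |coefficients| until only k remain."""
    keep = np.arange(X.shape[1])
    while len(keep) > k:
        Xk = X[:, keep]
        beta = np.linalg.solve(Xk.T @ Xk + lam * np.eye(len(keep)), Xk.T @ y)
        order = np.argsort(np.abs(beta))[::-1]    # strongest coefficients first
        keep = keep[order[:max(k, int(keep_frac * len(keep)))]]
    return np.sort(keep)
```

In the real algorithm the surviving wavelength subsets are scored by cross-validated RMSE and the best subset wins; here the loop simply runs down to a requested size.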

  10. Robust check loss-based variable selection of high-dimensional single-index varying-coefficient model

    NASA Astrophysics Data System (ADS)

    Song, Yunquan; Lin, Lu; Jian, Ling

    2016-07-01

    Single-index varying-coefficient model is an important mathematical modeling method to model nonlinear phenomena in science and engineering. In this paper, we develop a variable selection method for high-dimensional single-index varying-coefficient models using a shrinkage idea. The proposed procedure can simultaneously select significant nonparametric components and parametric components. Under defined regularity conditions, with appropriate selection of tuning parameters, the consistency of the variable selection procedure and the oracle property of the estimators are established. Moreover, due to the robustness of the check loss function to outliers in the finite samples, our proposed variable selection method is more robust than the ones based on the least squares criterion. Finally, the method is illustrated with numerical simulations.
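
The check (quantile) loss the authors build on penalizes residuals linearly rather than quadratically, which is what buys robustness to outliers. A minimal definition (τ = 0.5 recovers least absolute deviations):

```python
import numpy as np

def check_loss(u, tau=0.5):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0});
    linear in |u|, so single outliers cannot dominate the fit."""
    u = np.asarray(u, dtype=float)
    return np.where(u >= 0, tau * u, (tau - 1.0) * u)
```

An outlying residual of 10 contributes 5 to the median-regression objective, against 100 under squared loss, which is why shrinkage selection driven by this criterion is less distorted by contaminated samples.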

  11. A Time-Series Water Level Forecasting Model Based on Imputation and Variable Selection Method.

    PubMed

    Yang, Jun-He; Cheng, Ching-Hsue; Chan, Chia-Pan

    2017-01-01

    Reservoirs are important for households and impact the national economy. This paper proposes a time-series forecasting model based on estimating missing values followed by variable selection to forecast the reservoir's water level. This study collected data from the Taiwan Shimen Reservoir as well as daily atmospheric data from 2008 to 2015. The two datasets are concatenated into an integrated, date-ordered dataset as the research dataset. The proposed time-series forecasting model has three foci. First, this study uses five imputation methods to estimate missing values rather than deleting the affected records. Second, we identify the key variables via factor analysis and then delete the unimportant variables sequentially via the variable selection method. Finally, the proposed model uses a Random Forest to build the forecasting model of the reservoir's water level, which is compared with the listed methods in terms of forecasting error. The experimental results indicate that the Random Forest forecasting model combined with variable selection has better forecasting performance than the listed models with full variables. In addition, the experiments show that the proposed variable selection can help the five forecasting methods used here improve their forecasting capability.
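
The imputation stage of such a pipeline can be as simple as column-mean filling. This sketch shows that single step only, as one baseline among the several imputation methods the paper compares; the downstream forecaster (a Random Forest in the paper) is out of scope here.

```python
import numpy as np

def impute_mean(X):
    """Replace each NaN with its column mean, a baseline imputation choice."""
    X = np.array(X, dtype=float)              # copy so the input is untouched
    mask = np.isnan(X)
    col_means = np.nanmean(X, axis=0)         # per-column mean over observed cells
    X[mask] = np.take(col_means, np.where(mask)[1])
    return X
```

A completed matrix like this can then be passed to factor analysis and variable selection without any special missing-data handling downstream.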

  12. GWASinlps: Nonlocal prior based iterative SNP selection tool for genome-wide association studies.

    PubMed

    Sanyal, Nilotpal; Lo, Min-Tzu; Kauppi, Karolina; Djurovic, Srdjan; Andreassen, Ole A; Johnson, Valen E; Chen, Chi-Hua

    2018-06-19

    Multiple marker analysis of genome-wide association study (GWAS) data has gained ample attention in recent years. However, because of the ultra high-dimensionality of GWAS data, such analysis is challenging. Frequently used penalized regression methods often lead to a large number of false positives, whereas Bayesian methods are computationally very expensive. Motivated to ameliorate these issues simultaneously, we consider the novel approach of using nonlocal priors in an iterative variable selection framework. We develop a variable selection method, named iterative nonlocal prior based selection for GWAS (GWASinlps), that combines, in an iterative variable selection framework, the computational efficiency of the screen-and-select approach based on some association learning and the parsimonious uncertainty quantification provided by the use of nonlocal priors. The hallmark of our method is the introduction of a 'structured screen-and-select' strategy, which considers hierarchical screening based not only on response-predictor associations but also on response-response associations, and concatenates variable selection within that hierarchy. Extensive simulation studies with SNPs having realistic linkage disequilibrium structures demonstrate the advantages of our computationally efficient method compared to several frequentist and Bayesian variable selection methods, in terms of true positive rate, false discovery rate, mean squared error, and effect size estimation error. Further, we provide an empirical power analysis useful for study design. Finally, a real GWAS data application was considered with human height as the phenotype. An R package implementing the GWASinlps method is available at https://cran.r-project.org/web/packages/GWASinlps/index.html. Supplementary data are available at Bioinformatics online.
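
The screen-and-select loop itself, without the nonlocal-prior machinery that is the paper's actual contribution, can be caricatured as greedy residual screening. Everything here is an illustrative stand-in, not the GWASinlps algorithm:

```python
import numpy as np

def screen_and_select(X, y, n_select=2):
    """Iteratively screen predictors by |correlation| with the current
    residual, keep the best one, refit, and repeat on the new residual."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    selected, resid = [], yc.copy()
    for _ in range(n_select):
        score = np.abs(Xc.T @ resid)
        score[selected] = -np.inf                # never re-select a predictor
        selected.append(int(np.argmax(score)))
        B = Xc[:, selected]
        beta, *_ = np.linalg.lstsq(B, yc, rcond=None)
        resid = yc - B @ beta                    # residual drives the next screen
    return selected
```

GWASinlps replaces the "keep the single best" step with a nonlocal-prior model search over the screened candidates, which is what controls false positives.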

  13. Variable selection for confounder control, flexible modeling and Collaborative Targeted Minimum Loss-based Estimation in causal inference

    PubMed Central

    Schnitzer, Mireille E.; Lok, Judith J.; Gruber, Susan

    2015-01-01

    This paper investigates the appropriateness of the integration of flexible propensity score modeling (nonparametric or machine learning approaches) in semiparametric models for the estimation of a causal quantity, such as the mean outcome under treatment. We begin with an overview of some of the issues involved in knowledge-based and statistical variable selection in causal inference and the potential pitfalls of automated selection based on the fit of the propensity score. Using a simple example, we directly show the consequences of adjusting for pure causes of the exposure when using inverse probability of treatment weighting (IPTW). Such variables are likely to be selected when using a naive approach to model selection for the propensity score. We describe how the method of Collaborative Targeted minimum loss-based estimation (C-TMLE; van der Laan and Gruber, 2010) capitalizes on the collaborative double robustness property of semiparametric efficient estimators to select covariates for the propensity score based on the error in the conditional outcome model. Finally, we compare several approaches to automated variable selection in low- and high-dimensional settings through a simulation study. From this simulation study, we conclude that using IPTW with flexible prediction for the propensity score can result in inferior estimation, while Targeted minimum loss-based estimation and C-TMLE may benefit from flexible prediction and remain robust to the presence of variables that are highly correlated with treatment. However, in our study, standard influence function-based methods for the variance underestimated the standard errors, resulting in poor coverage under certain data-generating scenarios. PMID:26226129

  14. Variable Selection for Confounder Control, Flexible Modeling and Collaborative Targeted Minimum Loss-Based Estimation in Causal Inference.

    PubMed

    Schnitzer, Mireille E; Lok, Judith J; Gruber, Susan

    2016-05-01

    This paper investigates the appropriateness of the integration of flexible propensity score modeling (nonparametric or machine learning approaches) in semiparametric models for the estimation of a causal quantity, such as the mean outcome under treatment. We begin with an overview of some of the issues involved in knowledge-based and statistical variable selection in causal inference and the potential pitfalls of automated selection based on the fit of the propensity score. Using a simple example, we directly show the consequences of adjusting for pure causes of the exposure when using inverse probability of treatment weighting (IPTW). Such variables are likely to be selected when using a naive approach to model selection for the propensity score. We describe how the method of Collaborative Targeted minimum loss-based estimation (C-TMLE; van der Laan and Gruber, 2010 [27]) capitalizes on the collaborative double robustness property of semiparametric efficient estimators to select covariates for the propensity score based on the error in the conditional outcome model. Finally, we compare several approaches to automated variable selection in low- and high-dimensional settings through a simulation study. From this simulation study, we conclude that using IPTW with flexible prediction for the propensity score can result in inferior estimation, while Targeted minimum loss-based estimation and C-TMLE may benefit from flexible prediction and remain robust to the presence of variables that are highly correlated with treatment. However, in our study, standard influence function-based methods for the variance underestimated the standard errors, resulting in poor coverage under certain data-generating scenarios.
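
The IPTW estimator discussed above, shown here with a known propensity score for illustration. In practice the propensity must be modeled, which is exactly where the variable-selection issues in this record arise; the data-generating process below is an assumption of this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
L = rng.normal(size=n)                      # confounder
p = 1.0 / (1.0 + np.exp(-L))                # true propensity P(A=1 | L)
A = rng.binomial(1, p)                      # treatment assignment
Y = 2.0 * A + L + rng.normal(size=n)        # outcome; true effect of A is 2

# Hajek-style IPTW contrast of weighted means under treatment and control
w1, w0 = A / p, (1 - A) / (1 - p)
ate = (w1 * Y).sum() / w1.sum() - (w0 * Y).sum() / w0.sum()
```

A naive difference in group means is biased upward here because L raises both the treatment probability and the outcome; the weighted contrast recovers the true effect of 2.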

  15. Variable Selection in the Presence of Missing Data: Imputation-based Methods.

    PubMed

    Zhao, Yize; Long, Qi

    2017-01-01

    Variable selection plays an essential role in regression analysis as it identifies important variables that are associated with outcomes and is known to improve the predictive accuracy of resulting models. Variable selection methods have been widely investigated for fully observed data. However, in the presence of missing data, methods for variable selection need to be carefully designed to account for missing data mechanisms and the statistical techniques used for handling missing data. Since imputation is arguably the most popular method for handling missing data due to its ease of use, statistical methods for variable selection that are combined with imputation are of particular interest. These methods, valid under the assumptions of missing at random (MAR) and missing completely at random (MCAR), largely fall into three general strategies. The first strategy applies existing variable selection methods to each imputed dataset and then combines the variable selection results across all imputed datasets. The second strategy applies existing variable selection methods to stacked imputed datasets. The third strategy combines resampling techniques such as the bootstrap with imputation. Despite recent advances, this area remains under-developed and offers fertile ground for further research.
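
The first strategy (select within each imputed dataset, then combine across imputations) might look like the sketch below. The per-dataset selector is a toy correlation threshold standing in for any real method, and the majority-vote rule is one common combination choice.

```python
import numpy as np

def select_once(X, y, thresh=0.3):
    """Toy per-dataset rule: keep variables with |corr(x_j, y)| > thresh."""
    return {j for j in range(X.shape[1])
            if abs(np.corrcoef(X[:, j], y)[0, 1]) > thresh}

def select_across_imputations(imputed_Xs, y, vote=0.5):
    """Keep a variable if it is selected in at least a `vote` fraction
    of the imputed datasets."""
    counts = {}
    for X in imputed_Xs:
        for j in select_once(X, y):
            counts[j] = counts.get(j, 0) + 1
    return sorted(j for j, c in counts.items() if c / len(imputed_Xs) >= vote)
```

Voting across imputations damps the noise that any single imputation injects, which is the motivation for this strategy over selecting on one completed dataset.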

  16. Selection of Variables in Cluster Analysis: An Empirical Comparison of Eight Procedures

    ERIC Educational Resources Information Center

    Steinley, Douglas; Brusco, Michael J.

    2008-01-01

    Eight different variable selection techniques for model-based and non-model-based clustering are evaluated across a wide range of cluster structures. It is shown that several methods have difficulties when non-informative variables (i.e., random noise) are included in the model. Furthermore, the distribution of the random noise greatly impacts the…

  17. Detection of Nitrogen Content in Rubber Leaves Using Near-Infrared (NIR) Spectroscopy with Correlation-Based Successive Projections Algorithm (SPA).

    PubMed

    Tang, Rongnian; Chen, Xupeng; Li, Chuang

    2018-05-01

    Near-infrared spectroscopy is an efficient, low-cost technology that has potential as an accurate method in detecting the nitrogen content of natural rubber leaves. Successive projections algorithm (SPA) is a widely used variable selection method for multivariate calibration, which uses projection operations to select a variable subset with minimum multi-collinearity. However, due to the fluctuation of correlation between variables, high collinearity may still exist in non-adjacent variables of subset obtained by basic SPA. Based on analysis to the correlation matrix of the spectra data, this paper proposed a correlation-based SPA (CB-SPA) to apply the successive projections algorithm in regions with consistent correlation. The result shows that CB-SPA can select variable subsets with more valuable variables and less multi-collinearity. Meanwhile, models established by the CB-SPA subset outperform basic SPA subsets in predicting nitrogen content in terms of both cross-validation and external prediction. Moreover, CB-SPA is assured to be more efficient, for the time cost in its selection procedure is one-twelfth that of the basic SPA.
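
The projection operation at SPA's core fits in a few lines of numpy. This is basic SPA only; the correlation-based regional variant (CB-SPA) proposed in the paper adds a segmentation step on top of it, which is not shown here.

```python
import numpy as np

def spa(X, k, start=0):
    """Basic successive projections algorithm: starting from one column,
    repeatedly project the remaining columns onto the orthogonal complement
    of the last selection and add the column with the largest residual norm."""
    Xp = np.array(X, dtype=float)
    selected = [start]
    for _ in range(k - 1):
        v = Xp[:, selected[-1]]
        Xp = Xp - np.outer(v, v @ Xp) / (v @ v)   # orthogonal projection step
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0                    # exclude already-chosen columns
        selected.append(int(np.argmax(norms)))
    return selected
```

Because a near-duplicate of an already-selected column has an almost-zero residual after projection, SPA skips it in favor of less collinear variables, which is exactly the minimum-multicollinearity behavior the abstract describes.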

  18. Input variable selection and calibration data selection for storm water quality regression models.

    PubMed

    Sun, Siao; Bertrand-Krajewski, Jean-Luc

    2013-01-01

    Storm water quality models are useful tools in storm water management. Interest has been growing in analyzing existing data to develop models for urban storm water quality evaluations. It is important to select appropriate model inputs when many candidate explanatory variables are available. Model calibration and verification are essential steps in any storm water quality modeling. This study investigates input variable selection and calibration data selection in storm water quality regression models. The two selection problems interact with each other, and a procedure is developed to address them in sequence. The procedure first selects model input variables using a cross-validation method. An appropriate number of variables is identified as model inputs to ensure that a model is neither overfitted nor underfitted. Based on the model input selection results, calibration data selection is studied. Uncertainty of model performance due to calibration data selection is investigated with a random selection method. An approach using the cluster method is applied in order to enhance model calibration practice, based on the principle of selecting representative data for calibration. The comparison between results from the cluster selection method and random selection shows that the former can significantly improve the performance of calibrated models. It is found that the information content in calibration data is important in addition to the size of calibration data.
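
Cluster-based selection of representative calibration data can be sketched with a small k-means: cluster the candidate events, then calibrate on the sample nearest each centroid. The deterministic farthest-point initialization and the nearest-to-centroid rule are simplifying assumptions of this sketch, not the study's procedure.

```python
import numpy as np

def kmeans_representatives(X, k, n_iter=20):
    """Cluster samples with a small k-means (farthest-point init) and return,
    for each cluster, the index of the sample nearest its centroid."""
    X = np.asarray(X, dtype=float)
    centers = X[[0]]
    for _ in range(k - 1):                      # farthest-point initialization
        d = ((X[:, None] - centers[None]) ** 2).sum(-1).min(1)
        centers = np.vstack([centers, X[int(d.argmax())]])
    for _ in range(n_iter):                     # Lloyd iterations
        d = ((X[:, None] - centers[None]) ** 2).sum(-1)
        labels = d.argmin(1)
        centers = np.vstack([X[labels == c].mean(0) for c in range(k)])
    d = ((X[:, None] - centers[None]) ** 2).sum(-1)
    return sorted(int(d[:, c].argmin()) for c in range(k))
```

Picking one sample per cluster spreads the calibration set over the data's structure, which is the "representative data" principle the abstract credits for outperforming random selection.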

  19. Diversified models for portfolio selection based on uncertain semivariance

    NASA Astrophysics Data System (ADS)

    Chen, Lin; Peng, Jin; Zhang, Bo; Rosyida, Isnaini

    2017-02-01

    Since financial markets are complex, the future security returns are sometimes represented mainly by experts' estimations due to a lack of historical data. This paper proposes a semivariance method for diversified portfolio selection, in which the security returns are given subject to experts' estimations and depicted as uncertain variables. In the paper, three properties of the semivariance of uncertain variables are verified. Based on the concept of semivariance of uncertain variables, two types of mean-semivariance diversified models for uncertain portfolio selection are proposed. Since the models are complex, a hybrid intelligent algorithm based on the 99-method and a genetic algorithm is designed to solve them. In this hybrid intelligent algorithm, the 99-method is applied to compute the expected value and semivariance of uncertain variables, and the genetic algorithm is employed to seek the best allocation plan for portfolio selection. Finally, several numerical examples are presented to illustrate the modelling idea and the effectiveness of the algorithm.
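
For historical (sample) returns, the downside-risk quantity underlying mean-semivariance models reads as follows. The paper works with uncertain variables and the 99-method rather than samples, so this is only the sample analogue:

```python
import numpy as np

def semivariance(returns):
    """Mean squared shortfall of returns below their mean (downside risk)."""
    r = np.asarray(returns, dtype=float)
    downside = np.minimum(r - r.mean(), 0.0)   # gains above the mean contribute 0
    return float((downside ** 2).mean())
```

Unlike variance, returns above the mean contribute nothing, so a mean-semivariance trade-off penalizes only downside dispersion, which is why it is preferred when return distributions are asymmetric.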

  20. Classification and quantitation of milk powder by near-infrared spectroscopy and mutual information-based variable selection and partial least squares

    NASA Astrophysics Data System (ADS)

    Chen, Hui; Tan, Chao; Lin, Zan; Wu, Tong

    2018-01-01

    Milk is among the most popular nutrient sources worldwide and is of great interest due to its beneficial medicinal properties. The feasibility of classifying milk powder samples with respect to their brands and determining protein concentration is investigated by NIR spectroscopy along with chemometrics. Two datasets were prepared for the experiments. One contains 179 samples of four brands for classification and the other contains 30 samples for quantitative analysis. Principal component analysis (PCA) was used for exploratory analysis. Based on an effective model-independent variable selection method, i.e., minimal-redundancy maximal-relevance (MRMR), only 18 variables were selected to construct a partial least-squares discriminant analysis (PLS-DA) model. On the test set, the PLS-DA model based on the selected variable set was compared with the full-spectrum PLS-DA model, both of which achieved 100% accuracy. In quantitative analysis, the partial least-squares regression (PLSR) model constructed from the selected subset of 260 variables significantly outperforms the full-spectrum model. The combination of NIR spectroscopy, MRMR and PLS-DA or PLSR appears to be a powerful tool for classifying different brands of milk and determining the protein content.
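
A greedy MRMR pass can be sketched as below, with absolute Pearson correlation standing in for the mutual-information terms (an assumption made here for brevity; MRMR proper uses mutual information):

```python
import numpy as np

def mrmr(X, y, k):
    """Greedily add the variable maximizing relevance to y minus mean
    redundancy with the already-selected set."""
    p = X.shape[1]
    rel = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(p)])
    red = np.abs(np.corrcoef(X, rowvar=False))   # pairwise |correlation| matrix
    selected = [int(np.argmax(rel))]
    while len(selected) < k:
        rest = [j for j in range(p) if j not in selected]
        scores = [rel[j] - red[j, selected].mean() for j in rest]
        selected.append(rest[int(np.argmax(scores))])
    return selected
```

The redundancy penalty is what keeps MRMR from picking two nearly identical wavelengths, so a small selected set can still span the informative spectral regions.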

  21. Advances in variable selection methods II: Effect of variable selection method on classification of hydrologically similar watersheds in three Mid-Atlantic ecoregions

    EPA Science Inventory

    Hydrological flow predictions in ungauged and sparsely gauged watersheds use regionalization or classification of hydrologically similar watersheds to develop empirical relationships between hydrologic, climatic, and watershed variables. The watershed classifications may be based...

  2. A Variable-Selection Heuristic for K-Means Clustering.

    ERIC Educational Resources Information Center

    Brusco, Michael J.; Cradit, J. Dennis

    2001-01-01

    Presents a variable selection heuristic for nonhierarchical (K-means) cluster analysis based on the adjusted Rand index for measuring cluster recovery. Subjected the heuristic to Monte Carlo testing across more than 2,200 datasets. Results indicate that the heuristic is extremely effective at eliminating masking variables. (SLD)

  3. Evaluation of variable selection methods for random forests and omics data sets.

    PubMed

    Degenhardt, Frauke; Seifert, Stephan; Szymczak, Silke

    2017-10-16

    Machine learning methods, and in particular random forests, are promising approaches for prediction based on high-dimensional omics data sets. They provide variable importance measures to rank predictors according to their predictive power. If building a prediction model is the main goal of a study, often a minimal set of variables with good prediction performance is selected. However, if the objective is the identification of involved variables to find active networks and pathways, approaches that aim to select all relevant variables should be preferred. We evaluated several variable selection procedures based on simulated data as well as publicly available experimental methylation and gene expression data. Our comparison included the Boruta algorithm, the Vita method, recurrent relative variable importance, a permutation approach and its parametric variant (Altmann) as well as recursive feature elimination (RFE). In our simulation studies, Boruta was the most powerful approach, followed closely by the Vita method. Both approaches demonstrated similar stability in variable selection, while Vita was the most robust approach under a pure null model without any predictor variables related to the outcome. In the analysis of the different experimental data sets, Vita demonstrated slightly better stability in variable selection and was less computationally intensive than Boruta. In conclusion, we recommend the Boruta and Vita approaches for the analysis of high-dimensional data sets. Vita is considerably faster than Boruta and thus more suitable for large data sets, but only Boruta can also be applied in low-dimensional settings. © The Author 2017. Published by Oxford University Press.
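The shadow-feature idea underlying Boruta can be sketched in a single pass (the real Boruta algorithm iterates this with statistical testing over many forests): permuted copies of the features serve as a null baseline, and only real features whose importance beats the best shadow are kept.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)

# Shadow features: each column permuted independently, destroying any
# relation to y while preserving each feature's marginal distribution.
X_shadow = rng.permuted(X, axis=0)
X_aug = np.hstack([X, X_shadow])

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_aug, y)
imp = rf.feature_importances_
real, shadow = imp[:8], imp[8:]

# Keep real features more important than the most important shadow.
keep = np.where(real > shadow.max())[0]
print(keep)
```

Vita takes a different route (cross-validated permutation importance with an empirical null), but the "beat the noise baseline" principle is the same.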

  4. Variable Selection for Regression Models of Percentile Flows

    NASA Astrophysics Data System (ADS)

    Fouad, G.

    2017-12-01

    Percentile flows describe the flow magnitude equaled or exceeded for a given percent of time, and are widely used in water resource management. However, these statistics are normally unavailable since most basins are ungauged. Percentile flows of ungauged basins are often predicted using regression models based on readily observable basin characteristics, such as mean elevation. The number of these independent variables is too large to evaluate all possible models. A subset of models is typically evaluated using automatic procedures, like stepwise regression. This ignores a large variety of methods from the field of feature (variable) selection and physical understanding of percentile flows. A study of 918 basins in the United States was conducted to compare an automatic regression procedure to the following variable selection methods: (1) principal component analysis, (2) correlation analysis, (3) random forests, (4) genetic programming, (5) Bayesian networks, and (6) physical understanding. The automatic regression procedure only performed better than principal component analysis. Poor performance of the regression procedure was due to a commonly used filter for multicollinearity, which rejected the strongest models because they had cross-correlated independent variables. Multicollinearity did not decrease model performance in validation because of a representative set of calibration basins. Variable selection methods based strictly on predictive power (numbers 2-5 from above) performed similarly, likely indicating a limit to the predictive power of the variables. Similar performance was also reached using variables selected based on physical understanding, a finding that substantiates recent calls to emphasize physical understanding in modeling for predictions in ungauged basins. The strongest variables highlighted the importance of geology and land cover, whereas widely used topographic variables were the weakest predictors. 
Variables suffered from a high degree of multicollinearity, possibly illustrating the co-evolution of climatic and physiographic conditions. Given the ineffectiveness of many variables used here, future work should develop new variables that target specific processes associated with percentile flows.

  5. Model selection bias and Freedman's paradox

    USGS Publications Warehouse

    Lukacs, P.M.; Burnham, K.P.; Anderson, D.R.

    2010-01-01

    In situations where limited knowledge of a system exists and the ratio of data points to variables is small, variable selection methods can often be misleading. Freedman (Am Stat 37:152-155, 1983) demonstrated how common it is to select completely unrelated variables as highly "significant" when the number of data points is similar in magnitude to the number of variables. A new type of model averaging estimator based on model selection with Akaike's AIC is used with linear regression to investigate the problems of likely inclusion of spurious effects and model selection bias, the bias introduced while using the data to select a single seemingly "best" model from an (often large) set of models employing many predictor variables. The new model averaging estimator helps reduce these problems and provides confidence interval coverage at the nominal level, while traditional stepwise selection has poor inferential properties. © The Institute of Statistical Mathematics, Tokyo 2009.
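AIC-weighted model averaging over all predictor subsets can be sketched as follows. This is a generic illustration under Gaussian linear models, not the authors' exact estimator: each subset gets an Akaike weight proportional to exp(-ΔAIC/2), and coefficients are averaged across subsets (zero where a variable is excluded).

```python
import itertools
import numpy as np

def fit_ols_aic(X, y):
    """OLS fit returning (coefficients incl. intercept, AIC) under Gaussian errors."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    sigma2 = resid @ resid / n
    k = Xd.shape[1] + 1                       # coefficients + error variance
    return beta, n * np.log(sigma2) + 2 * k   # constants dropped

def aic_model_average(X, y):
    """Average coefficients over all subsets, weighted by Akaike weights."""
    p = X.shape[1]
    coefs, aics = [], []
    for subset in itertools.chain.from_iterable(
            itertools.combinations(range(p), r) for r in range(p + 1)):
        beta, aic = fit_ols_aic(X[:, list(subset)], y)
        full = np.zeros(p + 1)                # intercept + one slot per variable
        full[0] = beta[0]
        for pos, j in enumerate(subset):
            full[j + 1] = beta[pos + 1]
        coefs.append(full)
        aics.append(aic)
    aics = np.array(aics)
    w = np.exp(-0.5 * (aics - aics.min()))    # Akaike weights
    w /= w.sum()
    return (w[:, None] * np.array(coefs)).sum(axis=0)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=100)   # only x0 matters
est = aic_model_average(X, y)
print(est)
```

Because spurious variables receive weight only from the (poorly scoring) models that include them, their averaged coefficients are shrunk toward zero rather than reported as "significant".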

  6. An efficient swarm intelligence approach to feature selection based on invasive weed optimization: Application to multivariate calibration and classification using spectroscopic data

    NASA Astrophysics Data System (ADS)

    Sheykhizadeh, Saheleh; Naseri, Abdolhossein

    2018-04-01

    Variable selection plays a key role in classification and multivariate calibration. Variable selection methods aim to choose, from a large pool of available predictors, a set of variables relevant to estimating analyte concentrations or to achieving better classification results. Many variable selection techniques have been introduced, among which those based on swarm intelligence optimization have attracted particular attention over the last few decades since they are mainly inspired by nature. In this work, a simple new variable selection algorithm is proposed based on the invasive weed optimization (IWO) concept. IWO is a bio-inspired metaheuristic mimicking the ecological behaviour of weeds in colonizing and finding an appropriate place for growth and reproduction; it has been shown to be very adaptive and robust to environmental changes. This paper reports the first application of IWO, as a very simple and powerful method, to variable selection, using different experimental datasets including FTIR and NIR data to undertake classification and multivariate calibration tasks. Accordingly, invasive weed optimization-linear discriminant analysis (IWO-LDA) and invasive weed optimization-partial least squares (IWO-PLS) are introduced for multivariate classification and calibration, respectively.

  7. An efficient swarm intelligence approach to feature selection based on invasive weed optimization: Application to multivariate calibration and classification using spectroscopic data.

    PubMed

    Sheykhizadeh, Saheleh; Naseri, Abdolhossein

    2018-04-05

    Variable selection plays a key role in classification and multivariate calibration. Variable selection methods aim to choose, from a large pool of available predictors, a set of variables relevant to estimating analyte concentrations or to achieving better classification results. Many variable selection techniques have been introduced, among which those based on swarm intelligence optimization have attracted particular attention over the last few decades since they are mainly inspired by nature. In this work, a simple new variable selection algorithm is proposed based on the invasive weed optimization (IWO) concept. IWO is a bio-inspired metaheuristic mimicking the ecological behaviour of weeds in colonizing and finding an appropriate place for growth and reproduction; it has been shown to be very adaptive and robust to environmental changes. This paper reports the first application of IWO, as a very simple and powerful method, to variable selection, using different experimental datasets including FTIR and NIR data to undertake classification and multivariate calibration tasks. Accordingly, invasive weed optimization-linear discriminant analysis (IWO-LDA) and invasive weed optimization-partial least squares (IWO-PLS) are introduced for multivariate classification and calibration, respectively. Copyright © 2018 Elsevier B.V. All rights reserved.
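A minimal IWO loop for continuous minimization on a toy objective might look like the following. This is a sketch of the general metaheuristic only, not the authors' IWO-LDA/IWO-PLS wrappers; all parameter values are illustrative. Fitter weeds spread more seeds, the dispersal radius shrinks over iterations, and competitive exclusion caps the population.

```python
import numpy as np

def iwo_minimize(f, dim, n_iter=200, pop=10, max_pop=30,
                 seeds=(1, 5), sigma=(0.01, 1.0), bounds=(-5.0, 5.0), seed=0):
    """Minimal invasive weed optimization for minimizing f over a box."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    weeds = rng.uniform(lo, hi, size=(pop, dim))
    for t in range(n_iter):
        fit = np.array([f(w) for w in weeds])
        order = np.argsort(fit)                      # best first
        weeds = weeds[order]
        s_min, s_max = seeds
        # Dispersal radius decays nonlinearly toward its floor.
        sig = sigma[1] * (1.0 - t / n_iter) ** 2 + sigma[0]
        rank = np.linspace(1.0, 0.0, len(weeds))     # best weed -> most seeds
        offspring = []
        for w, r in zip(weeds, rank):
            n_seeds = int(round(s_min + r * (s_max - s_min)))
            for _ in range(n_seeds):
                offspring.append(np.clip(w + rng.normal(0.0, sig, dim), lo, hi))
        weeds = np.vstack([weeds, offspring])
        fit = np.array([f(w) for w in weeds])
        weeds = weeds[np.argsort(fit)][:max_pop]     # competitive exclusion
    return weeds[0]

best = iwo_minimize(lambda x: float(np.sum(x ** 2)), dim=3)
print(best)
```

For variable selection, the continuous positions would be mapped to binary inclusion masks and f would score a cross-validated LDA or PLS model on the selected wavelengths.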

  8. Firmness prediction in Prunus persica 'Calrico' peaches by visible/short-wave near infrared spectroscopy and acoustic measurements using optimised linear and non-linear chemometric models.

    PubMed

    Lafuente, Victoria; Herrera, Luis J; Pérez, María del Mar; Val, Jesús; Negueruela, Ignacio

    2015-08-15

    In this work, near infrared (NIR) spectroscopy and an acoustic measure (AWETA), two non-destructive methods, were applied to Prunus persica fruit 'Calrico' (n = 260) to predict Magness-Taylor (MT) firmness. Separate and combined use of these measures was evaluated and compared using partial least squares (PLS) and least squares support vector machine (LS-SVM) regression methods. Also, a mutual-information-based variable selection method, seeking the most significant variables for optimal accuracy of the regression models, was applied to a joint set of variables (NIR wavelengths and the AWETA measure). The newly proposed combined NIR-AWETA model gave good values of the determination coefficient (R²) for the PLS and LS-SVM methods (0.77 and 0.78, respectively), improving the reliability of MT firmness prediction in comparison with separate NIR and AWETA predictions. The three variables selected by the variable selection method (the AWETA measure plus NIR wavelengths 675 and 697 nm) achieved R² values of 0.76 and 0.77 for PLS and LS-SVM, respectively. These results indicate that the proposed mutual-information-based variable selection algorithm is a powerful tool for selecting the most relevant variables. © 2014 Society of Chemical Industry.

  9. Predictors of Start of Different Antidepressants in Patient Charts among Patients with Depression

    PubMed Central

    Kim, Hyungjin Myra; Zivin, Kara; Choe, Hae Mi; Stano, Clare M.; Ganoczy, Dara; Walters, Heather; Valenstein, Marcia

    2016-01-01

    Background In usual psychiatric care, antidepressant treatments are selected based on physician and patient preferences rather than being randomly allocated, resulting in spurious associations between these treatments and outcomes in observational studies. Objectives To identify factors recorded in electronic medical chart progress notes that predict antidepressant selection among patients who had received a depression diagnosis. Methods This retrospective study sample consisted of 556 randomly selected Veterans Health Administration (VHA) patients diagnosed with depression from April 1, 1999 to September 30, 2004, stratified by antidepressant agent, geographic region, gender, and year of depression cohort entry. Predictors were obtained from administrative data, and additional variables were abstracted from electronic medical chart notes in the year prior to the start of the antidepressant in five categories: clinical symptoms and diagnoses, substance use, life stressors, behavioral/ideation measures (e.g., suicide attempts), and treatments received. Multinomial logistic regression analysis was used to assess the predictors associated with different antidepressant prescribing, and adjusted relative risk ratios (RRR) are reported. Results Of the administrative data-based variables, gender, age, illicit drug abuse or dependence, and number of psychiatric medications in the prior year were significantly associated with antidepressant selection. After adjusting for administrative data-based variables, sleep problems (RRR = 2.47) or marital issues (RRR = 2.64) identified in the charts were significantly associated with prescribing mirtazapine rather than sertraline; however, no other chart-based variables showed a significant or large association.
Conclusion Some chart-based variables were predictive of antidepressant selection, but they were few in number and not highly predictive of antidepressant selection in patients treated for depression. PMID:25943003

  10. Comparison of Feature Selection Techniques in Machine Learning for Anatomical Brain MRI in Dementia.

    PubMed

    Tohka, Jussi; Moradi, Elaheh; Huttunen, Heikki

    2016-07-01

    We present a comparative split-half resampling analysis of various data-driven feature selection and classification methods for whole-brain voxel-based classification analysis of anatomical magnetic resonance images. We compared support vector machines (SVMs), with or without filter-based feature selection, several embedded feature selection methods and stability selection. While comparisons of the accuracy of various classification methods have been reported previously, the variability of the out-of-training-sample classification accuracy and of the set of selected features due to independent training and test sets has not previously been addressed in a brain imaging context. We studied two classification problems: 1) Alzheimer's disease (AD) vs. normal control (NC) and 2) mild cognitive impairment (MCI) vs. NC classification. In AD vs. NC classification, the variability in the test accuracy due to the subject sample did not differ between methods and exceeded the variability due to different classifiers. In MCI vs. NC classification, particularly with a large training set, embedded feature selection methods outperformed SVM-based ones, with the difference in test accuracy exceeding the test accuracy variability due to the subject sample. The filter and embedded methods produced divergent feature patterns for MCI vs. NC classification, which suggests the utility of embedded feature selection for this problem given its good generalization performance. The stability of the feature sets was strongly correlated with the number of features selected, weakly correlated with the stability of classification accuracy, and uncorrelated with the average classification accuracy.

  11. Quantifying Variability of Avian Colours: Are Signalling Traits More Variable?

    PubMed Central

    Delhey, Kaspar; Peters, Anne

    2008-01-01

    Background Increased variability in sexually selected ornaments, a key assumption of evolutionary theory, is thought to be maintained through condition-dependence. Condition-dependent handicap models of sexual selection predict that (a) sexually selected traits show amplified variability compared to equivalent non-sexually selected traits, and since males are usually the sexually selected sex, that (b) males are more variable than females, and (c) sexually dimorphic traits more variable than monomorphic ones. So far these predictions have only been tested for metric traits. Surprisingly, they have not been examined for bright coloration, one of the most prominent sexual traits. This omission stems from computational difficulties: different types of colours are quantified on different scales precluding the use of coefficients of variation. Methodology/Principal Findings Based on physiological models of avian colour vision we develop an index to quantify the degree of discriminable colour variation as it can be perceived by conspecifics. A comparison of variability in ornamental and non-ornamental colours in six bird species confirmed (a) that those coloured patches that are sexually selected or act as indicators of quality show increased chromatic variability. However, we found no support for (b) that males generally show higher levels of variability than females, or (c) that sexual dichromatism per se is associated with increased variability. Conclusions/Significance We show that it is currently possible to realistically estimate variability of animal colours as perceived by them, something difficult to achieve with other traits. 
Increased variability of the known sexually selected or quality-indicating colours in the studied species supports the predictions derived from sexual selection theory, but the lack of increased overall variability in males or in dimorphic colours in general indicates that sexual differences might not always be shaped by similar selective forces. PMID:18301766

  12. Identification of Genes Involved in Breast Cancer Metastasis by Integrating Protein-Protein Interaction Information with Expression Data.

    PubMed

    Tian, Xin; Xin, Mingyuan; Luo, Jian; Liu, Mingyao; Jiang, Zhenran

    2017-02-01

    The selection of relevant genes for breast cancer metastasis is critical for the treatment and prognosis of cancer patients. Although much effort has been devoted to gene selection procedures using different statistical analysis methods or computational techniques, the interpretation of the variables in the resulting survival models has been limited so far. This article proposes a new Random Forest (RF)-based algorithm to identify important variables highly related to breast cancer metastasis, based on the importance scores of two variable selection algorithms: the mean decrease Gini (MDG) criterion of Random Forest and the GeneRank algorithm with protein-protein interaction (PPI) information. The new gene selection algorithm is called PPIRF. The improved prediction accuracy fully illustrates the reliability and high interpretability of the gene list selected by the PPIRF approach.

  13. An Ensemble Successive Project Algorithm for Liquor Detection Using Near Infrared Sensor.

    PubMed

    Qu, Fangfang; Ren, Dong; Wang, Jihua; Zhang, Zhong; Lu, Na; Meng, Lei

    2016-01-11

    Spectral analysis based on near infrared (NIR) sensors is a powerful tool for complex information processing and high-precision recognition, and it has been widely applied to quality analysis and online inspection of agricultural products. This paper proposes a new method to address the instability of the successive projections algorithm (SPA) with small sample sizes, as well as the weak association between the selected variables and the analyte. The proposed method is an evaluated bootstrap ensemble SPA method (EBSPA) based on a variable evaluation index (EI) for variable selection, and it is applied to the quantitative prediction of alcohol concentration in liquor using a NIR sensor. In the experiment, the proposed EBSPA is combined with three kinds of modeling methods to test their performance. In addition, EBSPA combined with partial least squares is compared with other state-of-the-art variable selection methods. The results show that the proposed method overcomes the defects of SPA and has the best generalization performance and stability. Furthermore, the physical meaning of the variables selected from the near infrared sensor data is clear, which can effectively reduce the number of variables and improve prediction accuracy.
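The core of the classical SPA (the baseline the paper improves on, not EBSPA itself) can be sketched as a greedy orthogonal projection: starting from one wavelength, each step deflates the spectral matrix by the last selection and picks the column with the largest remaining norm, which minimizes collinearity among the chosen variables.

```python
import numpy as np

def spa(X, k, start=0):
    """Successive projections algorithm: greedily pick k columns of X with
    the largest norm after projecting out the already-selected columns."""
    Xp = X.astype(float).copy()
    selected = [start]
    for _ in range(k - 1):
        v = Xp[:, selected[-1]].copy()
        v /= np.linalg.norm(v)
        # Deflate every column by its component along the last selection.
        Xp -= np.outer(v, v @ Xp)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0                # exclude already-chosen columns
        selected.append(int(np.argmax(norms)))
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
X[:, 3] = X[:, 0]                             # a perfectly collinear column
sel = spa(X, 3, start=0)
print(sel)
```

Because column 3 duplicates column 0, its projected norm collapses to zero after the first deflation, so SPA never selects it; EBSPA wraps this step in bootstrap resampling and an evaluation index.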

  14. Integrating biological knowledge into variable selection: an empirical Bayes approach with an application in cancer biology

    PubMed Central

    2012-01-01

    Background An important question in the analysis of biochemical data is that of identifying subsets of molecular variables that may jointly influence a biological response. Statistical variable selection methods have been widely used for this purpose. In many settings, it may be important to incorporate ancillary biological information concerning the variables of interest. Pathway and network maps are one example of a source of such information. However, although ancillary information is increasingly available, it is not always clear how it should be used nor how it should be weighted in relation to primary data. Results We put forward an approach in which biological knowledge is incorporated using informative prior distributions over variable subsets, with prior information selected and weighted in an automated, objective manner using an empirical Bayes formulation. We employ continuous, linear models with interaction terms and exploit biochemically-motivated sparsity constraints to permit exact inference. We show an example of priors for pathway- and network-based information and illustrate our proposed method on both synthetic response data and by an application to cancer drug response data. Comparisons are also made to alternative Bayesian and frequentist penalised-likelihood methods for incorporating network-based information. Conclusions The empirical Bayes method proposed here can aid prior elicitation for Bayesian variable selection studies and help to guard against mis-specification of priors. Empirical Bayes, together with the proposed pathway-based priors, results in an approach with a competitive variable selection performance. In addition, the overall procedure is fast, deterministic, and has very few user-set parameters, yet is capable of capturing interplay between molecular players. The approach presented is general and readily applicable in any setting with multiple sources of biological prior knowledge. PMID:22578440

  15. Optimization and design of ibuprofen-loaded nanostructured lipid carriers using a hybrid-design approach for ocular drug delivery

    NASA Astrophysics Data System (ADS)

    Rathod, Vishal

    The objective of the present project was to develop Ibuprofen-loaded Nanostructured Lipid Carriers (IBU-NLCs) for topical ocular delivery based on substantial pre-formulation screening of the components and understanding the interplay between the formulation and process variables. The BCS Class II drug ibuprofen was selected as the model drug for the current study. IBU-NLCs were prepared by a melt emulsification and ultrasonication technique. Extensive pre-formulation studies were performed to screen the lipid components (solid and liquid) based on the drug's solubility and affinity as well as component compatibility. The results from DSC & XRD assisted in selecting the most suitable ratio to be utilized in future studies. Dynasan® 114 was selected as the solid lipid and Miglyol® 840 as the liquid lipid based on preliminary lipid screening. The ratio of 6:4 was predicted to be the best based on its crystallinity index and thermal events. As many variables are involved in further optimization of the formulation, a single design approach is not always adequate. A hybrid-design approach was applied by employing the Plackett-Burman design (PBD) for preliminary screening of 7 critical variables, followed by a Box-Behnken design (BBD), a sub-type of response surface methodology (RSM) design, using 2 relatively significant variables from the former design and incorporating the surfactant/co-surfactant ratio as the third variable. Comparatively, Kolliphor® HS15 demonstrated lower mean particle size (PS) & polydispersity index (PDI) and Kolliphor® P188 resulted in zeta potential (ZP) < -20 mV during the surfactant screening & stability studies. Hence, the surfactant/co-surfactant ratio was employed as the third variable to understand its synergistic effect on the response variables. We selected PS, PDI, and ZP as critical response variables in the PBD since they significantly influence the stability & performance of NLCs.
Formulations prepared using the BBD were further characterized and evaluated with respect to PS, PDI, ZP and entrapment efficiency (EE) to identify the multi-factor interactions between the selected formulation variables. In vitro release studies were performed using a Spectra/Por dialysis membrane on a Franz diffusion cell with phosphate-buffered saline (pH 7.4) as the medium. Samples for assay, EE, loading capacity (LC), solubility studies & in vitro release were filtered using an Amicon 50K filter and analyzed via a UPLC system (Waters) at a detection wavelength of 220 nm. Significant variables were selected through the PBD, and the third variable was incorporated based on surfactant screening & stability studies for the next design. The assay of the BBD-based formulations was found to be within 95-104% of the theoretically calculated values. Further studies investigated PS, PDI, ZP & EE. PS was found to be in the range of 103-194 nm, with PDI ranging from 0.118 to 0.265. The ZP and EE were observed to be in the ranges of -22.2 to -11 mV and 90 to 98.7%, respectively. Drug release of 30% was observed from the optimized formulation in the first 6 h of in vitro studies, and the formulation showed a sustained release of ibuprofen thereafter over several hours. These values also confirm that the production method, and all other selected variables, effectively promoted the incorporation of ibuprofen in the NLCs. A Quality by Design (QbD) approach was successfully implemented in developing a robust ophthalmic formulation with superior physicochemical and morphometric properties. NLCs as a nanocarrier demonstrated a promising perspective for topical delivery of poorly water-soluble drugs.

  16. A decision tool for selecting trench cap designs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Paige, G.B.; Stone, J.J.; Lane, L.J.

    1995-12-31

    A computer-based prototype decision support system (PDSS) is being developed to assist the risk manager in selecting an appropriate trench cap design for waste disposal sites. The selection of the 'best' design among feasible alternatives requires consideration of multiple and often conflicting objectives. The methodology used in the selection process consists of: selecting and parameterizing decision variables using data, simulation models, or expert opinion; selecting feasible trench cap design alternatives; and ordering the decision variables and ranking the design alternatives. The decision model is based on multi-objective decision theory and uses a unique approach to order the decision variables and rank the design alternatives. Trench cap designs are evaluated based on federal regulations, hydrologic performance, cover stability and cost. Four trench cap designs, which were monitored for a four-year period at Hill Air Force Base in Utah, are used to demonstrate the application of the PDSS and evaluate the results of the decision model. The results of the PDSS, using both data and simulations, illustrate the relative advantages of each of the cap designs and which cap is the 'best' alternative for a given set of criteria and a particular importance order of those decision criteria.

  17. Using variable rate models to identify genes under selection in sequence pairs: their validity and limitations for EST sequences.

    PubMed

    Church, Sheri A; Livingstone, Kevin; Lai, Zhao; Kozik, Alexander; Knapp, Steven J; Michelmore, Richard W; Rieseberg, Loren H

    2007-02-01

    Using likelihood-based variable selection models, we determined whether positive selection was acting on 523 EST sequence pairs from two lineages of sunflower and lettuce. Variable rate models are generally not used for comparisons of sequence pairs due to the limited information and the inaccuracy of estimates of specific substitution rates. However, previous studies have shown that the likelihood ratio test (LRT) is reliable for detecting positive selection, even with low numbers of sequences. These analyses identified 56 genes that show a signature of selection, of which 75% were not identified by simpler models that average selection across codons. Subsequent mapping studies in sunflower showed that four of the five positively selected genes identified by these methods mapped to domestication QTLs. We discuss the validity and limitations of using variable rate models for comparisons of sequence pairs, as well as the limitations of using ESTs for the identification of positively selected genes.
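The LRT itself reduces to a simple chi-square test on twice the log-likelihood difference between nested models. The log-likelihood values and degrees of freedom below are purely illustrative, not taken from the paper:

```python
from scipy.stats import chi2

def lrt_pvalue(loglik_null, loglik_alt, df=1):
    """Likelihood ratio test for nested models:
    2 * (lnL_alt - lnL_null) is asymptotically chi-square with df d.o.f."""
    stat = 2.0 * (loglik_alt - loglik_null)
    return stat, chi2.sf(stat, df)

# Hypothetical fits of a null codon model (no positive selection) and an
# alternative model allowing a class of sites with dN/dS > 1 (df = 2):
stat, p = lrt_pvalue(-2415.3, -2409.8, df=2)
print(stat, p)
```

A small p-value rejects the null model, i.e. it signals a class of codons evolving under positive selection.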

  18. A novel variable selection approach that iteratively optimizes variable space using weighted binary matrix sampling.

    PubMed

    Deng, Bai-chuan; Yun, Yong-huan; Liang, Yi-zeng; Yi, Lun-zhao

    2014-10-07

    In this study, a new optimization algorithm called the Variable Iterative Space Shrinkage Approach (VISSA) that is based on the idea of model population analysis (MPA) is proposed for variable selection. Unlike most of the existing optimization methods for variable selection, VISSA statistically evaluates the performance of variable space in each step of optimization. Weighted binary matrix sampling (WBMS) is proposed to generate sub-models that span the variable subspace. Two rules are highlighted during the optimization procedure. First, the variable space shrinks in each step. Second, the new variable space outperforms the previous one. The second rule, which is rarely satisfied in most of the existing methods, is the core of the VISSA strategy. Compared with some promising variable selection methods such as competitive adaptive reweighted sampling (CARS), Monte Carlo uninformative variable elimination (MCUVE) and iteratively retaining informative variables (IRIV), VISSA showed better prediction ability for the calibration of NIR data. In addition, VISSA is user-friendly; only a few insensitive parameters are needed, and the program terminates automatically without any additional conditions. The Matlab codes for implementing VISSA are freely available on the website: https://sourceforge.net/projects/multivariateanalysis/files/VISSA/.
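The weighted binary matrix sampling step at the heart of VISSA can be sketched as follows. This is a simplified illustration (weights and dimensions are invented); the full VISSA loop re-estimates the weights from sub-model performance and shrinks the variable space each iteration, which is not reproduced here.

```python
import numpy as np

def wbms(weights, n_submodels, rng=None):
    """Weighted binary matrix sampling: each sub-model includes variable j
    independently with probability weights[j]. Returns a boolean matrix of
    shape (n_submodels, n_variables), one row per sub-model."""
    rng = rng or np.random.default_rng(0)
    w = np.asarray(weights, dtype=float)
    return rng.random((n_submodels, w.size)) < w

# Illustrative weights: variable 0 nearly always included, variable 2 rarely.
M = wbms([0.9, 0.5, 0.1], n_submodels=1000)
print(M.mean(axis=0))   # column means approximate the sampling weights
```

Each row of M defines a variable subset on which a sub-model (e.g. a PLS model) is built; averaging the performance of sub-models that include variable j versus those that exclude it drives the weight update.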

  19. Novel harmonic regularization approach for variable selection in Cox's proportional hazards model.

    PubMed

    Chu, Ge-Jin; Liang, Yong; Wang, Jia-Xuan

    2014-01-01

    Variable selection is an important issue in regression, and a number of variable selection methods have been proposed involving nonconvex penalty functions. In this paper, we investigate a novel harmonic regularization method, which can approximate nonconvex Lq (1/2 < q < 1) regularizations, to select key risk factors in the Cox proportional hazards model using microarray gene expression data. The harmonic regularization method can be efficiently solved using our proposed direct path seeking approach, which can produce solutions that closely approximate those for the convex loss function and the nonconvex regularization. Simulation results based on artificial datasets and four real microarray gene expression datasets, including diffuse large B-cell lymphoma (DLBCL), lung cancer, and AML datasets, show that the harmonic regularization method can be more accurate for variable selection than existing Lasso series methods.

  20. Variable selection in discrete survival models including heterogeneity.

    PubMed

    Groll, Andreas; Tutz, Gerhard

    2017-04-01

    Several variable selection procedures are available for continuous time-to-event data. However, if time is measured discretely, and therefore many ties occur, models for continuous time are inadequate. We propose penalized likelihood methods that perform efficient variable selection in discrete survival modeling with explicit modeling of the heterogeneity in the population. The method is based on a combination of ridge and lasso type penalties that are tailored to the case of discrete survival. The performance is studied in simulation studies and in an application to the birth of the first child.
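A discrete-time survival model can be fit as a penalized binary regression on the "person-period" expansion: one row per subject per interval survived, with the event indicator as the response and one dummy per interval as the baseline hazard. The sketch below uses scikit-learn's elastic net penalty as a generic stand-in for the paper's tailored ridge-plus-lasso penalty (it does not model frailty/heterogeneity), on simulated data where only the first covariate affects the hazard.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p, T = 200, 5, 6
X = rng.normal(size=(n, p))
hazard = 1.0 / (1.0 + np.exp(-(-2.0 + 1.5 * X[:, 0])))   # only x0 matters

# Person-period expansion: a row for every interval a subject enters.
rows, events = [], []
for i in range(n):
    for t in range(T):
        died = rng.random() < hazard[i]
        rows.append(np.concatenate([np.eye(T)[t], X[i]]))  # interval dummies + covariates
        events.append(int(died))
        if died:
            break                                          # censor after the event

model = LogisticRegression(penalty="elasticnet", solver="saga",
                           l1_ratio=0.5, C=0.5,
                           fit_intercept=False, max_iter=5000)
model.fit(np.array(rows), np.array(events))
coef = model.coef_[0][T:]      # covariate effects, after the interval dummies
print(coef)
```

The lasso component zeroes out (or strongly shrinks) the irrelevant covariates while the ridge component stabilizes the fit, mirroring the selection behaviour the abstract describes.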

  1. Automatic variable selection method and a comparison for quantitative analysis in laser-induced breakdown spectroscopy

    NASA Astrophysics Data System (ADS)

    Duan, Fajie; Fu, Xiao; Jiang, Jiajia; Huang, Tingting; Ma, Ling; Zhang, Cong

    2018-05-01

    In this work, an automatic variable selection method for quantitative analysis of soil samples using laser-induced breakdown spectroscopy (LIBS) is proposed, which is based on full spectrum correction (FSC) and modified iterative predictor weighting-partial least squares (mIPW-PLS). The method features automatic selection without manual intervention. To illustrate the feasibility and effectiveness of the method, a comparison with the genetic algorithm (GA) and successive projections algorithm (SPA) for the detection of different elements (copper, barium and chromium) in soil was implemented. The experimental results showed that all three methods could accomplish variable selection effectively, among which FSC-mIPW-PLS required significantly shorter computation time (approximately 12 s for 40,000 initial variables) than the others. Moreover, improved quantification models were obtained with the variable selection approaches. The root mean square errors of prediction (RMSEP) of models utilizing the new method were 27.47 (copper), 37.15 (barium) and 39.70 (chromium) mg/kg, showing prediction performance comparable to GA and SPA.

  2. Integrative Bayesian variable selection with gene-based informative priors for genome-wide association studies.

    PubMed

    Zhang, Xiaoshuai; Xue, Fuzhong; Liu, Hong; Zhu, Dianwen; Peng, Bin; Wiemels, Joseph L; Yang, Xiaowei

    2014-12-10

    Genome-wide association studies (GWAS) are typically designed to identify phenotype-associated single nucleotide polymorphisms (SNPs) individually using univariate analysis methods. Though providing valuable insights into the genetic risks of common diseases, the genetic variants identified by GWAS generally account for only a small proportion of the total heritability of complex diseases. To address this "missing heritability" problem, we implemented a strategy called integrative Bayesian Variable Selection (iBVS), which is based on a hierarchical model that incorporates an informative prior by treating the gene interrelationship as a network. It was applied here to both simulated and real data sets. Simulation studies indicated that the iBVS method was advantageous in its performance, with the highest AUC in both variable selection and outcome prediction, when compared to Stepwise and LASSO based strategies. In an analysis of a leprosy case-control study, iBVS selected 94 SNPs as predictors, while LASSO selected 100 SNPs. Stepwise regression yielded a more parsimonious model with only 3 SNPs. The prediction results demonstrated that the iBVS method had performance comparable to that of LASSO, but better than Stepwise strategies. The proposed iBVS strategy is a novel and valid method for genome-wide association studies, with the additional advantage that it produces more interpretable posterior probabilities for each variable, unlike LASSO and other penalized regression methods.
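
    The LASSO comparator mentioned above can be sketched in a few lines on synthetic 0/1/2-coded SNPs. All sizes, effect sizes and the regularization strength below are assumptions, and unlike iBVS this yields a sparse point estimate rather than posterior inclusion probabilities.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p = 100, 50
X = rng.integers(0, 3, size=(n, p)).astype(float)   # additively coded SNPs: 0/1/2
beta = np.zeros(p)
beta[:5] = [0.8, -0.7, 0.6, 0.5, -0.9]              # 5 causal SNPs, rest are noise
y = X @ beta + rng.normal(scale=0.5, size=n)

lasso = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(lasso.coef_ != 0)          # SNPs with nonzero coefficients
```

    A Bayesian method in the iBVS family would instead report, for each SNP, the posterior probability of inclusion, which is what makes its output more directly interpretable.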

  3. A fast chaos-based image encryption scheme with a dynamic state variables selection mechanism

    NASA Astrophysics Data System (ADS)

    Chen, Jun-xin; Zhu, Zhi-liang; Fu, Chong; Yu, Hai; Zhang, Li-bo

    2015-03-01

    In recent years, a variety of chaos-based image cryptosystems have been investigated to meet the increasing demand for real-time secure image transmission. Most of them are based on permutation-diffusion architecture, in which permutation and diffusion are two independent procedures with fixed control parameters. This property results in two flaws. (1) At least two chaotic state variables are required for encrypting one plain pixel, in permutation and diffusion stages respectively. Chaotic state variables produced with high computation complexity are not sufficiently used. (2) The key stream solely depends on the secret key, and hence the cryptosystem is vulnerable against known/chosen-plaintext attacks. In this paper, a fast chaos-based image encryption scheme with a dynamic state variables selection mechanism is proposed to enhance the security and promote the efficiency of chaos-based image cryptosystems. Experimental simulations and extensive cryptanalysis have been carried out and the results prove the superior security and high efficiency of the scheme.
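
    A minimal sketch of the ingredients the abstract refers to, assuming a logistic-map keystream and a plaintext-dependent initial state (the actual cipher, its parameters and its dynamic state-variable selection rule are in the paper; here diffusion is a bare XOR, and a real scheme would also transmit what the receiver needs to invert the plaintext-dependent perturbation):

```python
import numpy as np

def logistic_keystream(x0, r, length):
    """Keystream bytes from iterating the logistic map x <- r*x*(1-x)."""
    x = x0
    out = np.empty(length, dtype=np.uint8)
    for i in range(length):
        x = r * x * (1.0 - x)
        out[i] = int(x * 256) % 256
    return out

def encrypt(pixels, key_x0=0.3456, r=3.99):
    # Perturb the initial state with the plaintext sum so the keystream is
    # image-dependent -- the kind of defence against known/chosen-plaintext
    # attacks the paper motivates (the real scheme's rule differs).
    x0 = (key_x0 + pixels.sum() / (255.0 * pixels.size + 1.0)) % 1.0
    ks = logistic_keystream(x0 or 0.1, r, pixels.size)
    return pixels ^ ks                    # XOR diffusion only, for brevity

img = np.arange(16, dtype=np.uint8)       # toy "image"
cipher = encrypt(img)
```

    The abstract's efficiency point is that each chaotic state value is expensive to produce, so a scheme that reuses state variables across the permutation and diffusion stages wastes less of that computation.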

  4. Novel Harmonic Regularization Approach for Variable Selection in Cox's Proportional Hazards Model

    PubMed Central

    Chu, Ge-Jin; Liang, Yong; Wang, Jia-Xuan

    2014-01-01

    Variable selection is an important issue in regression and a number of variable selection methods have been proposed involving nonconvex penalty functions. In this paper, we investigate a novel harmonic regularization method, which can approximate nonconvex Lq  (1/2 < q < 1) regularizations, to select key risk factors in the Cox's proportional hazards model using microarray gene expression data. The harmonic regularization method can be efficiently solved using our proposed direct path seeking approach, which can produce solutions that closely approximate those for the convex loss function and the nonconvex regularization. Simulation results based on the artificial datasets and four real microarray gene expression datasets, such as real diffuse large B-cell lymphoma (DCBCL), the lung cancer, and the AML datasets, show that the harmonic regularization method can be more accurate for variable selection than existing Lasso series methods. PMID:25506389

  5. Examining the Moderating Effect of Disability Status on the Relationship between Trauma Symptomatology and Select Career Variables

    ERIC Educational Resources Information Center

    Strauser, David R.; Lustig, Daniel C.; Uruk, Aye Ciftci

    2006-01-01

    In the current study, the authors examined whether the influence of trauma symptomatology on select career variables differs based on disability status. A total of 131 college students and 81 individuals with disabilities completed the "Career Thoughts Inventory," "My Vocational Situation," "Developmental Work Personality…

  6. Characterizing the Optical Variability of Bright Blazars: Variability-Based Selection of Fermi Active Galactic Nuclei

    DTIC Science & Technology

    2012-11-20

    10′. We do not apply cosmological redshift corrections here for blazar selection. Similar to the conclusions drawn from Figure 4, there is clear...effects. For example, the observed blazar characteristic damping timescale τblz,obs (after correcting for cosmological redshift) should be shortened in

  7. Rank-based estimation in the ℓ1-regularized partly linear model for censored outcomes with application to integrated analyses of clinical predictors and gene expression data.

    PubMed

    Johnson, Brent A

    2009-10-01

    We consider estimation and variable selection in the partial linear model for censored data. The partial linear model for censored data is a direct extension of the accelerated failure time model, the latter of which is a very important alternative to the proportional hazards model. We extend rank-based lasso-type estimators to a model that may contain nonlinear effects. Variable selection in such a partial linear model has direct application to high-dimensional survival analyses that attempt to adjust for clinical predictors. In the microarray setting, previous methods can adjust for other clinical predictors by assuming that clinical and gene expression data enter the model linearly in the same fashion. Here, we select important variables after adjusting for prognostic clinical variables, but the clinical effects are assumed nonlinear. Our estimator is based on stratification and can be extended naturally to account for multiple nonlinear effects. We illustrate the utility of our method through simulation studies and application to the Wisconsin prognostic breast cancer data set.

  8. VARIABLE SELECTION FOR REGRESSION MODELS WITH MISSING DATA

    PubMed Central

    Garcia, Ramon I.; Ibrahim, Joseph G.; Zhu, Hongtu

    2009-01-01

    We consider the variable selection problem for a class of statistical models with missing data, including missing covariate and/or response data. We investigate the smoothly clipped absolute deviation penalty (SCAD) and adaptive LASSO and propose a unified model selection and estimation procedure for use in the presence of missing data. We develop a computationally attractive algorithm for simultaneously optimizing the penalized likelihood function and estimating the penalty parameters. Particularly, we propose to use a model selection criterion, called the ICQ statistic, for selecting the penalty parameters. We show that the variable selection procedure based on ICQ automatically and consistently selects the important covariates and leads to efficient estimates with oracle properties. The methodology is very general and can be applied to numerous situations involving missing data, from covariates missing at random in arbitrary regression models to nonignorably missing longitudinal responses and/or covariates. Simulations are given to demonstrate the methodology and examine the finite sample performance of the variable selection procedures. Melanoma data from a cancer clinical trial is presented to illustrate the proposed methodology. PMID:20336190
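
    In the complete-data case, the adaptive LASSO mentioned above reduces to an ordinary lasso on rescaled columns. The sketch below shows that standard reweighting trick only; it omits the paper's missing-data machinery and ICQ-based penalty tuning entirely, and the data and tuning constants are assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(2)
n, p = 120, 20
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 1.0 * X[:, 2] + rng.normal(scale=0.5, size=n)

# Step 1: an initial ridge fit supplies per-variable penalty weights.
init = Ridge(alpha=1.0).fit(X, y).coef_
w = 1.0 / (np.abs(init) + 1e-8)        # large weight = heavily penalized

# Step 2: the weighted lasso is an ordinary lasso on rescaled columns.
fit = Lasso(alpha=0.05).fit(X / w, y)
coef = fit.coef_ / w                   # map back to the original scale
selected = np.flatnonzero(coef != 0)
```

    The reweighting is what gives the adaptive LASSO its oracle behaviour: variables with large initial estimates are barely penalized, while apparent noise variables face an effectively huge penalty.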

  9. Dynamic Educational e-Content Selection Using Multiple Criteria in Web-Based Personalized Learning Environments.

    ERIC Educational Resources Information Center

    Manouselis, Nikos; Sampson, Demetrios

    This paper focuses on the way a multi-criteria decision making methodology is applied in the case of agent-based selection of offered learning objects. The problem of selection is modeled as a decision making one, with the decision variables being the learner model and the learning objects' educational description. In this way, selection of…

  10. Punishment induced behavioural and neurophysiological variability reveals dopamine-dependent selection of kinematic movement parameters

    PubMed Central

    Galea, Joseph M.; Ruge, Diane; Buijink, Arthur; Bestmann, Sven; Rothwell, John C.

    2013-01-01

    Action selection describes the high-level process which selects between competing movements. In animals, behavioural variability is critical for the motor exploration required to select the action which optimizes reward and minimizes cost/punishment, and is guided by dopamine (DA). The aim of this study was to test in humans whether low-level movement parameters are affected by punishment and reward in ways similar to high-level action selection. Moreover, we addressed the proposed dependence of behavioural and neurophysiological variability on DA, and whether this may underpin the exploration of kinematic parameters. Participants performed an out-and-back index finger movement and were instructed that monetary reward and punishment were based on its maximal acceleration (MA). In fact, the feedback was not contingent on the participant’s behaviour but pre-determined. Blocks highly-biased towards punishment were associated with increased MA variability relative to blocks with either reward or without feedback. This increase in behavioural variability was positively correlated with neurophysiological variability, as measured by changes in cortico-spinal excitability with transcranial magnetic stimulation over the primary motor cortex. Following the administration of a DA-antagonist, the variability associated with punishment diminished and the correlation between behavioural and neurophysiological variability no longer existed. Similar changes in variability were not observed when participants executed a pre-determined MA, nor did DA influence resting neurophysiological variability. Thus, under conditions of punishment, DA-dependent processes influence the selection of low-level movement parameters. We propose that the enhanced behavioural variability reflects the exploration of kinematic parameters for less punishing, or conversely more rewarding, outcomes. PMID:23447607

  11. Manipulating measurement scales in medical statistical analysis and data mining: A review of methodologies

    PubMed Central

    Marateb, Hamid Reza; Mansourian, Marjan; Adibi, Peyman; Farina, Dario

    2014-01-01

    Background: selecting the correct statistical test and data mining method depends highly on the measurement scale of the data, the type of variables, and the purpose of the analysis. Different measurement scales are studied in detail, and statistical comparison, modeling, and data mining methods are examined using several medical examples. We present two clustering examples on ordinal variables, a more challenging variable type in analysis, using the Wisconsin Breast Cancer Data (WBCD). Ordinal-to-interval scale conversion example: a breast cancer database of nine 10-level ordinal variables for 683 patients was analyzed by two ordinal-scale clustering methods. The performance of the clustering methods was assessed by comparison with the gold standard groups of malignant and benign cases that had been identified by clinical tests. Results: the sensitivity and accuracy of the two clustering methods were 98% and 96%, respectively. Their specificity was comparable. Conclusion: using a clustering algorithm appropriate to the measurement scale of the variables in a study ensures high performance. Moreover, descriptive and inferential statistics, as well as the modeling approach, must be selected based on the scale of the variables. PMID:24672565

  12. Protein construct storage: Bayesian variable selection and prediction with mixtures.

    PubMed

    Clyde, M A; Parmigiani, G

    1998-07-01

    Determining optimal conditions for protein storage while maintaining a high level of protein activity is an important question in pharmaceutical research. A designed experiment based on a space-filling design was conducted to understand the effects of factors affecting protein storage and to establish optimal storage conditions. Different model-selection strategies to identify important factors may lead to very different answers about optimal conditions. Uncertainty about which factors are important, or model uncertainty, can be a critical issue in decision-making. We use Bayesian variable selection methods for linear models to identify important variables in the protein storage data, while accounting for model uncertainty. We also use the Bayesian framework to build predictions based on a large family of models, rather than an individual model, and to evaluate the probability that certain candidate storage conditions are optimal.
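
    For a handful of candidate variables, the Bayesian accounting for model uncertainty described above can be sketched by enumerating all 2^p submodels, weighting each by a BIC approximation to its marginal likelihood, and averaging. The design, priors and decision analysis of the actual study are not reproduced; the data and the no-intercept simplification are assumptions.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(5)
n, p = 100, 5
X = rng.normal(size=(n, p))
y = 1.5 * X[:, 0] - 1.0 * X[:, 3] + rng.normal(scale=0.5, size=n)

def bic(cols):
    """BIC of an OLS submodel on the given columns (no intercept, for brevity)."""
    if cols:
        Xs = X[:, list(cols)]
        coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        rss = np.sum((y - Xs @ coef) ** 2)
    else:
        rss = np.sum(y ** 2)
    return n * np.log(rss / n) + len(cols) * np.log(n)

# Enumerate all 2^p submodels; exp(-BIC/2) approximates the marginal likelihood.
models = [c for r in range(p + 1) for c in combinations(range(p), r)]
scores = np.array([bic(m) for m in models])
post = np.exp(-0.5 * (scores - scores.min()))
post /= post.sum()                      # posterior model probabilities (flat prior)

# Model-averaged posterior inclusion probability of each variable.
incl = np.array([post[[j in m for m in models]].sum() for j in range(p)])
```

    Predictions averaged over `post` use the whole model family rather than a single selected model, which is the point the abstract makes about decision-making under model uncertainty.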

  13. Order Selection for General Expression of Nonlinear Autoregressive Model Based on Multivariate Stepwise Regression

    NASA Astrophysics Data System (ADS)

    Shi, Jinfei; Zhu, Songqing; Chen, Ruwen

    2017-12-01

    An order selection method based on multiple stepwise regressions is proposed for the General Expression of the Nonlinear Autoregressive (GNAR) model, which converts the model order problem into variable selection for a multiple linear regression equation. The partial autocorrelation function is adopted to define the linear terms in the GNAR model. These are set as the initial model, and the nonlinear terms are then introduced gradually. Statistics are chosen to assess the contribution of both the newly introduced and the existing variables to the model, and these determine which variables are retained or eliminated. The optimal model is then obtained through measurement of goodness of fit or significance testing. Simulation and classic time-series experiments show that the proposed method is simple, reliable and applicable to practical engineering.

  14. Talent identification and selection in elite youth football: An Australian context.

    PubMed

    O'Connor, Donna; Larkin, Paul; Mark Williams, A

    2016-10-01

    We identified the perceptual-cognitive skills and player history variables that differentiate players selected or not selected into an elite youth football (i.e. soccer) programme in Australia. A sample of elite youth male football players (n = 127) completed an adapted participation history questionnaire and video-based assessments of perceptual-cognitive skills. Following data collection, 22 of these players were offered a full-time scholarship for enrolment at an elite player residential programme. Participants selected for the scholarship programme recorded superior performance on the combined perceptual-cognitive skills tests compared to the non-selected group. There were no significant between group differences on the player history variables. Stepwise discriminant function analysis identified four predictor variables that resulted in the best categorization of selected and non-selected players (i.e. recent match-play performance, region, number of other sports participated, combined perceptual-cognitive performance). The effectiveness of the discriminant function is reflected by 93.7% of players being correctly classified, with the four variables accounting for 57.6% of the variance. Our discriminating model for selection may provide a greater understanding of the factors that influence elite youth talent selection and identification.

  15. Sparse representation based biomarker selection for schizophrenia with integrated analysis of fMRI and SNPs.

    PubMed

    Cao, Hongbao; Duan, Junbo; Lin, Dongdong; Shugart, Yin Yao; Calhoun, Vince; Wang, Yu-Ping

    2014-11-15

    Integrative analysis of multiple data types can take advantage of their complementary information and therefore may provide higher power to identify potential biomarkers that would be missed using individual data analysis. Due to different natures of diverse data modality, data integration is challenging. Here we address the data integration problem by developing a generalized sparse model (GSM) using weighting factors to integrate multi-modality data for biomarker selection. As an example, we applied the GSM model to a joint analysis of two types of schizophrenia data sets: 759,075 SNPs and 153,594 functional magnetic resonance imaging (fMRI) voxels in 208 subjects (92 cases/116 controls). To solve this small-sample-large-variable problem, we developed a novel sparse representation based variable selection (SRVS) algorithm, with the primary aim to identify biomarkers associated with schizophrenia. To validate the effectiveness of the selected variables, we performed multivariate classification followed by a ten-fold cross validation. We compared our proposed SRVS algorithm with an earlier sparse model based variable selection algorithm for integrated analysis. In addition, we compared with the traditional statistics method for uni-variant data analysis (Chi-squared test for SNP data and ANOVA for fMRI data). Results showed that our proposed SRVS method can identify novel biomarkers that show stronger capability in distinguishing schizophrenia patients from healthy controls. Moreover, better classification ratios were achieved using biomarkers from both types of data, suggesting the importance of integrative analysis. Copyright © 2014 Elsevier Inc. All rights reserved.
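
    The SRVS algorithm itself is specified in the paper; as a hedged stand-in, the sketch below uses a related idea, stability-selection-style frequency of ℓ1 selection over random subsamples, to rank variables in a small-sample-large-variable setting. The data, subsample scheme and thresholds are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(7)
n, p = 100, 200                          # more variables than samples
X = rng.normal(size=(n, p))
y = X[:, 0] - X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.3, size=n)

reps = 50
freq = np.zeros(p)
for _ in range(reps):                    # random half-subsamples of the subjects
    idx = rng.choice(n, n // 2, replace=False)
    fit = Lasso(alpha=0.1, max_iter=5000).fit(X[idx], y[idx])
    freq += fit.coef_ != 0
freq /= reps
biomarkers = np.flatnonzero(freq > 0.8)  # variables selected in >80% of runs
```

    In the integrative setting of the paper, the SNP and fMRI blocks would enter a joint sparse model with per-modality weighting factors rather than a single design matrix as here.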

  16. Construction of nested maximin designs based on successive local enumeration and modified novel global harmony search algorithm

    NASA Astrophysics Data System (ADS)

    Yi, Jin; Li, Xinyu; Xiao, Mi; Xu, Junnan; Zhang, Lin

    2017-01-01

    Engineering design often involves different types of simulation, which results in expensive computational costs. Variable-fidelity approximation-based design optimization approaches can realize effective simulation and efficient optimization of the design space using approximation models with different levels of fidelity, and they have been widely used in different fields. The selection of sample points for variable-fidelity approximation models, called nested designs, is essential to their construction. In this article a novel nested maximin Latin hypercube design is constructed based on successive local enumeration and a modified novel global harmony search algorithm. In the proposed nested designs, successive local enumeration is employed to select sample points for the low-fidelity model, whereas the modified novel global harmony search algorithm is employed to select sample points for the high-fidelity model. A comparative study with multiple criteria and an engineering application are employed to verify the efficiency of the proposed nested designs approach.

  17. Fine-scale habitat modeling of a top marine predator: do prey data improve predictive capacity?

    PubMed

    Torres, Leigh G; Read, Andrew J; Halpin, Patrick

    2008-10-01

    Predators and prey assort themselves relative to each other, the availability of resources and refuges, and the temporal and spatial scale of their interaction. Predictive models of predator distributions often rely on these relationships by incorporating data on environmental variability and prey availability to determine predator habitat selection patterns. This approach to predictive modeling holds true in marine systems where observations of predators are logistically difficult, emphasizing the need for accurate models. In this paper, we ask whether including prey distribution data in fine-scale predictive models of bottlenose dolphin (Tursiops truncatus) habitat selection in Florida Bay, Florida, U.S.A., improves predictive capacity. Environmental characteristics are often used as predictor variables in habitat models of top marine predators with the assumption that they act as proxies of prey distribution. We examine the validity of this assumption by comparing the response of dolphin distribution and fish catch rates to the same environmental variables. Next, the predictive capacities of four models, with and without prey distribution data, are tested to determine whether dolphin habitat selection can be predicted without recourse to describing the distribution of their prey. The final analysis determines the accuracy of predictive maps of dolphin distribution produced by modeling areas of high fish catch based on significant environmental characteristics. We use spatial analysis and independent data sets to train and test the models. Our results indicate that, due to high habitat heterogeneity and the spatial variability of prey patches, fine-scale models of dolphin habitat selection in coastal habitats will be more successful if environmental variables are used as predictor variables of predator distributions rather than relying on prey data as explanatory variables. 
However, predictive modeling of prey distribution as the response variable based on environmental variability did produce high predictive performance of dolphin habitat selection, particularly foraging habitat.

  18. Variable Cycle Engine Technology Program Planning and Definition Study

    NASA Technical Reports Server (NTRS)

    Westmoreland, J. S.; Stern, A. M.

    1978-01-01

    The variable stream control engine, VSCE-502B, was selected as the base engine, with the inverted flow engine concept selected as a backup. Critical component technologies were identified, and technology programs were formulated. Several engine configurations were defined on a preliminary basis to serve as demonstration vehicles for the various technologies. The different configurations present compromises in cost, technical risk, and technology return. Plans for possible variable cycle engine technology programs were formulated by synthesizing the technology requirements with the different demonstrator configurations.

  19. SVM-RFE based feature selection and Taguchi parameters optimization for multiclass SVM classifier.

    PubMed

    Huang, Mei-Ling; Hung, Yung-Hsiang; Lee, W M; Li, R K; Jiang, Bo-Ru

    2014-01-01

    Recently, the support vector machine (SVM) has shown excellent performance in classification and prediction and is widely used in disease diagnosis and medical assistance. However, the SVM functions well only on two-group classification problems. This study applies feature selection via SVM recursive feature elimination (SVM-RFE) to investigate the classification accuracy of multiclass problems for the Dermatology and Zoo databases. The Dermatology dataset contains 33 feature variables, 1 class variable, and 366 testing instances, and the Zoo dataset contains 16 feature variables, 1 class variable, and 101 testing instances. The feature variables in the two datasets were sorted in descending order by explanatory power, and different feature sets were selected by SVM-RFE to explore classification accuracy. Meanwhile, the Taguchi method was combined with the SVM classifier to optimize the parameters C and γ and increase classification accuracy for multiclass classification. The experimental results show that classification accuracy can exceed 95% after SVM-RFE feature selection and Taguchi parameter optimization for the Dermatology and Zoo databases.
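
    SVM-RFE is available off the shelf in scikit-learn; the sketch below runs it with a linear-kernel SVC (RFE ranks by the model's coef_) on synthetic data, not the Dermatology/Zoo databases, and leaves out the Taguchi tuning of C and γ.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

rng = np.random.default_rng(3)
n, p = 200, 12
X = rng.normal(size=(n, p))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # class depends on features 0 and 1 only

svm = SVC(kernel="linear", C=1.0)         # RFE needs a linear kernel for coef_
rfe = RFE(svm, n_features_to_select=4, step=1).fit(X, y)
selected = np.flatnonzero(rfe.support_)   # surviving feature indices
```

    Each elimination round drops the feature with the smallest weight magnitude, so the informative features survive to the final subset.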

  20. SVM-RFE Based Feature Selection and Taguchi Parameters Optimization for Multiclass SVM Classifier

    PubMed Central

    Huang, Mei-Ling; Hung, Yung-Hsiang; Lee, W. M.; Li, R. K.; Jiang, Bo-Ru

    2014-01-01

    Recently, the support vector machine (SVM) has shown excellent performance in classification and prediction and is widely used in disease diagnosis and medical assistance. However, the SVM functions well only on two-group classification problems. This study applies feature selection via SVM recursive feature elimination (SVM-RFE) to investigate the classification accuracy of multiclass problems for the Dermatology and Zoo databases. The Dermatology dataset contains 33 feature variables, 1 class variable, and 366 testing instances, and the Zoo dataset contains 16 feature variables, 1 class variable, and 101 testing instances. The feature variables in the two datasets were sorted in descending order by explanatory power, and different feature sets were selected by SVM-RFE to explore classification accuracy. Meanwhile, the Taguchi method was combined with the SVM classifier to optimize the parameters C and γ and increase classification accuracy for multiclass classification. The experimental results show that classification accuracy can exceed 95% after SVM-RFE feature selection and Taguchi parameter optimization for the Dermatology and Zoo databases. PMID:25295306

  1. Multivariate fault isolation of batch processes via variable selection in partial least squares discriminant analysis.

    PubMed

    Yan, Zhengbing; Kuang, Te-Hui; Yao, Yuan

    2017-09-01

    In recent years, multivariate statistical monitoring of batch processes has become a popular research topic, wherein multivariate fault isolation is an important step aiming at the identification of the faulty variables contributing most to the detected process abnormality. Although contribution plots have been commonly used in statistical fault isolation, such methods suffer from the smearing effect between correlated variables. In particular, in batch process monitoring, the high autocorrelations and cross-correlations that exist in variable trajectories make the smearing effect unavoidable. To address such a problem, a variable selection-based fault isolation method is proposed in this research, which transforms the fault isolation problem into a variable selection problem in partial least squares discriminant analysis and solves it by calculating a sparse partial least squares model. As different from the traditional methods, the proposed method emphasizes the relative importance of each process variable. Such information may help process engineers in conducting root-cause diagnosis. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.

  2. Purposeful Variable Selection and Stratification to Impute Missing FAST Data in Trauma Research

    PubMed Central

    Fuchs, Paul A.; del Junco, Deborah J.; Fox, Erin E.; Holcomb, John B.; Rahbar, Mohammad H.; Wade, Charles A.; Alarcon, Louis H.; Brasel, Karen J.; Bulger, Eileen M.; Cohen, Mitchell J.; Myers, John G.; Muskat, Peter; Phelan, Herb A.; Schreiber, Martin A.; Cotton, Bryan A.

    2013-01-01

    Background The Focused Assessment with Sonography for Trauma (FAST) exam is an important variable in many retrospective trauma studies. The purpose of this study was to devise an imputation method to overcome missing data for the FAST exam. Due to variability in patients’ injuries and trauma care, these data are unlikely to be missing completely at random (MCAR), raising concern for validity when analyses exclude patients with missing values. Methods Imputation was conducted under a less restrictive, more plausible missing at random (MAR) assumption. Patients with missing FAST exams had available data on alternate, clinically relevant elements that were strongly associated with FAST results in complete cases, especially when considered jointly. Subjects with missing data (32.7%) were divided into eight mutually exclusive groups based on selected variables that both described the injury and were associated with missing FAST values. Additional variables were selected within each group to classify missing FAST values as positive or negative, and correct FAST exam classification based on these variables was determined for patients with non-missing FAST values. Results Severe head/neck injury (odds ratio, OR=2.04), severe extremity injury (OR=4.03), severe abdominal injury (OR=1.94), no injury (OR=1.94), other abdominal injury (OR=0.47), other head/neck injury (OR=0.57) and other extremity injury (OR=0.45) groups had significant ORs for missing data; the other group odds ratio was not significant (OR=0.84). All 407 missing FAST values were imputed, with 109 classified as positive. Correct classification of non-missing FAST results using the alternate variables was 87.2%. Conclusions Purposeful imputation for missing FAST exams based on interactions among selected variables assessed by simple stratification may be a useful adjunct to sensitivity analysis in the evaluation of imputation strategies under different missing data mechanisms. 
This approach has the potential for widespread application in clinical and translational research and validation is warranted. Level of Evidence Level II Prognostic or Epidemiological PMID:23778515

  3. Advanced supersonic propulsion system technology study, phase 2

    NASA Technical Reports Server (NTRS)

    Allan, R. D.

    1975-01-01

    Variable cycle engines were identified, based on the mixed-flow low-bypass-ratio augmented turbofan cycle, which has shown excellent range capability in the AST airplane. The best mixed-flow augmented turbofan engine was selected based on range in the AST Baseline Airplane. Selected variable cycle engine features were added to this best conventional baseline engine, and the Dual-Cycle VCE and Double-Bypass VCE were defined. The conventional mixed-flow turbofan and the Double-Bypass VCE were on the subjects of engine preliminary design studies to determine mechanical feasibility, confirm weight and dimensional estimates, and identify the necessary technology considered not yet available. Critical engine components were studied and incorporated into the variable cycle engine design.

  4. Harmonize input selection for sediment transport prediction

    NASA Astrophysics Data System (ADS)

    Afan, Haitham Abdulmohsin; Keshtegar, Behrooz; Mohtar, Wan Hanna Melini Wan; El-Shafie, Ahmed

    2017-09-01

    In this paper, three modeling approaches, a Neural Network (NN), the Response Surface Method (RSM) and a response surface method based on the Global Harmony Search (GHS), are applied to predict the daily suspended sediment load time series. Generally, the input variables for forecasting the suspended sediment load are selected manually, based on the maximum correlations of the input variables, in the NN and RSM modeling approaches. The RSM is improved here to select the input variables using the error terms of the training data via the GHS, giving the response surface method with global harmony search (RSM-GHS) modeling method. A second-order polynomial function with cross terms is applied to calibrate the time series of suspended sediment load with three, four and five input variables in the proposed RSM-GHS. The linear, square and cross terms of twenty input variables of antecedent values of suspended sediment load and water discharge are investigated to achieve the best predictions of the RSM based on the GHS method. The performances of the NN, RSM and proposed RSM-GHS, in terms of both accuracy and simplicity, are compared through several predictive and error statistics. The results illustrate that the proposed RSM-GHS is as uncomplicated as the RSM but performs better, with fewer errors and better correlation (R = 0.95, MAE = 18.09 (ton/day), RMSE = 25.16 (ton/day)) compared to the ANN (R = 0.91, MAE = 20.17 (ton/day), RMSE = 33.09 (ton/day)) and RSM (R = 0.91, MAE = 20.06 (ton/day), RMSE = 31.92 (ton/day)) for all types of input variables.
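
    The second-order response surface with linear, square and cross terms described above amounts to least squares on a polynomial basis of antecedent values. Below is a minimal sketch with two lags and synthetic data; the GHS-based input selection is not reproduced, and the series and its coefficients are invented.

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic daily series with lagged, mildly nonlinear dynamics
# (a stand-in for suspended sediment load).
s = np.zeros(300)
for t in range(2, 300):
    s[t] = (0.6 * s[t - 1] - 0.2 * s[t - 2]
            + 0.05 * s[t - 1] * s[t - 2] + rng.normal(scale=0.1))

X = np.column_stack([s[1:-1], s[:-2]])    # antecedent values s[t-1], s[t-2]
y = s[2:]

def rsm_features(X):
    """Second-order polynomial basis: constant, linear, square and cross terms."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1 ** 2, x2 ** 2, x1 * x2])

F = rsm_features(X)
coef, *_ = np.linalg.lstsq(F, y, rcond=None)
rmse = float(np.sqrt(np.mean((F @ coef - y) ** 2)))
```

    In the RSM-GHS of the paper, the harmony search chooses which antecedent inputs enter this basis by minimizing the training error terms instead of relying on maximum-correlation screening.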

  5. Humidity: A review and primer on atmospheric moisture and human health.

    PubMed

    Davis, Robert E; McGregor, Glenn R; Enfield, Kyle B

    2016-01-01

    Research examining associations between weather and human health frequently includes the effects of atmospheric humidity. A large number of humidity variables have been developed for numerous purposes, but little guidance is available to health researchers regarding appropriate variable selection. We examine a suite of commonly used humidity variables and summarize both the medical and biometeorological literature on associations between humidity and human health. As an example of the importance of humidity variable selection, we correlate numerous hourly humidity variables to daily respiratory syncytial virus isolates in Singapore from 1992 to 1994. Most water-vapor mass based variables (specific humidity, absolute humidity, mixing ratio, dewpoint temperature, vapor pressure) exhibit comparable correlations. Variables that include a thermal component (relative humidity, dewpoint depression, saturation vapor pressure) exhibit strong diurnality and seasonality. Humidity variable selection must be dictated by the underlying research question. Despite being the most commonly used humidity variable, relative humidity should be used sparingly and avoided in cases when the proximity to saturation is not medically relevant. Care must be taken in averaging certain humidity variables daily or seasonally to avoid statistical biasing associated with variables that are inherently diurnal through their relationship to temperature. Copyright © 2015 Elsevier Inc. All rights reserved.
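
    As a hedged illustration of how the water-vapor mass based variables relate to each other, the following sketch computes vapor pressure, dewpoint and specific humidity from temperature and relative humidity using one common Magnus approximation (the constants 17.62 and 243.12 are one published convention, not values taken from this review):

```python
import numpy as np

def saturation_vapor_pressure(T_c):
    """Saturation vapor pressure in hPa (Magnus approximation), T in deg C."""
    return 6.112 * np.exp(17.62 * T_c / (243.12 + T_c))

def vapor_pressure(T_c, rh):
    """Actual vapor pressure (hPa) from temperature and relative humidity (%)."""
    return rh / 100.0 * saturation_vapor_pressure(T_c)

def dewpoint(T_c, rh):
    """Dewpoint temperature (deg C): invert the Magnus formula."""
    e = vapor_pressure(T_c, rh)
    x = np.log(e / 6.112)
    return 243.12 * x / (17.62 - x)

def specific_humidity(T_c, rh, p_hpa=1013.25):
    """Specific humidity (g/kg) from vapor pressure and station pressure."""
    e = vapor_pressure(T_c, rh)
    return 1000.0 * 0.622 * e / (p_hpa - 0.378 * e)
```

    Note how dewpoint and specific humidity depend on the water-vapor content itself, while relative humidity mixes in a thermal component; this is the distinction the review draws when advising against defaulting to relative humidity.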

  6. Characterizing the Optical Variability of Bright Blazars: Variability-based Selection of Fermi Active Galactic Nuclei

    NASA Astrophysics Data System (ADS)

    Ruan, John J.; Anderson, Scott F.; MacLeod, Chelsea L.; Becker, Andrew C.; Burnett, T. H.; Davenport, James R. A.; Ivezić, Željko; Kochanek, Christopher S.; Plotkin, Richard M.; Sesar, Branimir; Stuart, J. Scott

    2012-11-01

    We investigate the use of optical photometric variability to select and identify blazars in large-scale time-domain surveys, in part to aid in the identification of blazar counterparts to the ~30% of γ-ray sources in the Fermi 2FGL catalog still lacking reliable associations. Using data from the optical LINEAR asteroid survey, we characterize the optical variability of blazars by fitting a damped random walk model to individual light curves with two main model parameters, the characteristic timescale of variability τ and the driving amplitude on short timescales σ̂. Imposing cuts on minimum τ and σ̂ allows for blazar selection with high efficiency E and completeness C. To test the efficacy of this approach, we apply this method to optically variable LINEAR objects that fall within the several-arcminute error ellipses of γ-ray sources in the Fermi 2FGL catalog. Despite the extreme stellar contamination at the shallow depth of the LINEAR survey, we are able to recover previously associated optical counterparts to Fermi active galactic nuclei with E ≥ 88% and C = 88% in Fermi 95% confidence error ellipses having semimajor axis r < 8'. We find that the suggested radio counterpart to Fermi source 2FGL J1649.6+5238 has optical variability consistent with other γ-ray blazars and is likely to be the γ-ray source. Our results suggest that the variability of the non-thermal jet emission in blazars is stochastic in nature, with unique variability properties due to the effects of relativistic beaming. After correcting for beaming, we estimate that the characteristic timescale of blazar variability is ~3 years in the rest frame of the jet, in contrast with the ~320 day disk flux timescale observed in quasars. The variability-based selection method presented will be useful for blazar identification in time-domain optical surveys and is also a probe of jet physics.
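
    A damped random walk with characteristic timescale τ and short-timescale driving amplitude σ̂ can be simulated as a discretized Ornstein-Uhlenbeck (AR(1)) process whose asymptotic variance is τσ̂²/2. The sketch below is a generic illustration of that model, not the authors' fitting code, and the parameter values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_drw(n, dt, tau, sigma_hat, mean=0.0):
    """Damped random walk (AR(1)) light curve with timescale tau and
    short-timescale driving amplitude sigma_hat."""
    a = np.exp(-dt / tau)
    var_inf = tau * sigma_hat ** 2 / 2.0        # asymptotic variance
    eps = rng.normal(0.0, np.sqrt(var_inf * (1.0 - a * a)), n)
    x = np.empty(n)
    x[0] = mean + rng.normal(0.0, np.sqrt(var_inf))
    for i in range(1, n):
        x[i] = mean + a * (x[i - 1] - mean) + eps[i]
    return x

# tau = 300 d, sigma_hat = 0.1 mag d^-1/2 -> variance tau*sigma_hat^2/2
lc = simulate_drw(n=60000, dt=1.0, tau=300.0, sigma_hat=0.1)
sample_var = lc.var()
expected_var = 300.0 * 0.1 ** 2 / 2.0
```

    A variability-based cut would then keep objects whose fitted τ and σ̂ both exceed chosen thresholds, which is the selection step the abstract describes.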

  7. Variable selection based cotton bollworm odor spectroscopic detection

    NASA Astrophysics Data System (ADS)

    Lü, Chengxu; Gai, Shasha; Luo, Min; Zhao, Bo

    2016-10-01

    Aiming at rapid automatic pest detection for efficient, targeted pesticide application, and at overcoming the problem that reflectance spectral signals are masked and attenuated by the solid plant, the feasibility of near infrared spectroscopy (NIRS) detection of cotton bollworm odor is studied. Three cotton bollworm odor samples and 3 blank air samples were prepared. Different concentrations of cotton bollworm odor were prepared by mixing the above gas samples, resulting in a calibration group of 62 samples and a validation group of 31 samples. The spectral collection system comprises a light source, optical fiber, sample chamber and spectrometer. Spectra were pretreated by baseline correction, modeled with partial least squares (PLS), and optimized by a genetic algorithm (GA) and competitive adaptive reweighted sampling (CARS). Only minor count differences are found among the spectra of different cotton bollworm odor concentrations. A PLS model on all variables was built, yielding an RMSEV of 14 and an RV2 of 0.89; its theoretical basis is that insects volatilize specific odors, including pheromones and allelochemicals, which are used for intra-specific and inter-specific communication and can be detected by NIR spectroscopy. 28 sensitive variables were selected by GA, giving a model performance of RMSEV of 14 and RV2 of 0.90. Comparably, 8 sensitive variables were selected by CARS, giving RMSEV of 13 and RV2 of 0.92. The CARS model employs only 1.5% of the variables while presenting a smaller error than the all-variable model. The odor-gas-based NIR technique shows potential for cotton bollworm detection.

  8. Sympatric speciation by sexual selection alone is unlikely.

    PubMed

    Arnegard, Matthew E; Kondrashov, Alexey S

    2004-02-01

    According to Darwin, sympatric speciation is driven by disruptive, frequency-dependent natural selection caused by competition for diverse resources. Recently, several authors have argued that disruptive sexual selection can also cause sympatric speciation. Here, we use hypergeometric phenotypic and individual-based genotypic models to explore sympatric speciation by sexual selection under a broad range of conditions. If variabilities of preference and display traits are each caused by more than one or two polymorphic loci, sympatric speciation requires rather strong sexual selection when females exert preferences for extreme male phenotypes. Under this kind of mate choice, speciation can occur only if initial distributions of preference and display are close to symmetric. Otherwise, the population rapidly loses variability. Thus, unless allele replacements at very few loci are enough for reproductive isolation, female preferences for extreme male displays are unlikely to drive sympatric speciation. By contrast, similarity-based female preferences that do not cause sexual selection are less destabilizing to the maintenance of genetic variability and may result in sympatric speciation across a broader range of initial conditions. Certain groups of African cichlids have served as the exclusive motivation for the hypothesis of sympatric speciation by sexual selection. Mate choice in these fishes appears to be driven by female preferences for extreme male phenotypes rather than similarity-based preferences, and the evolution of premating reproductive isolation commonly involves at least several genes. Therefore, differences in female preferences and male display in cichlids and other species of sympatric origin are more likely to have evolved as isolating mechanisms under disruptive natural selection.

  9. An imbalance fault detection method based on data normalization and EMD for marine current turbines.

    PubMed

    Zhang, Milu; Wang, Tianzhen; Tang, Tianhao; Benbouzid, Mohamed; Diallo, Demba

    2017-05-01

    This paper proposes an imbalance fault detection method based on data normalization and Empirical Mode Decomposition (EMD) for variable speed direct-drive Marine Current Turbine (MCT) systems. The method is based on the MCT stator current under wave and turbulence conditions. Its goal is to extract the blade imbalance fault feature, which is concealed by the supply frequency and environmental noise. First, a Generalized Likelihood Ratio Test (GLRT) detector is developed and the monitoring variable is selected by analyzing the relationships between the variables. Then, the selected monitoring variable is converted into a time series through data normalization, which turns the imbalance fault characteristic frequency into a constant. Finally, the monitoring variable is filtered by EMD to eliminate the effect of turbulence. Experiments comparing different fault severities and turbulence intensities show that the proposed method is robust against turbulence. In comparison with other methods, the experimental results indicate the feasibility and efficacy of the proposed method. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.

  10. A method for simplifying the analysis of traffic accidents injury severity on two-lane highways using Bayesian networks.

    PubMed

    Mujalli, Randa Oqab; de Oña, Juan

    2011-10-01

    This study describes a method for reducing the number of variables frequently considered in modeling the severity of traffic accidents. The method's efficiency is assessed by constructing Bayesian networks (BN). It is based on a two-stage selection process. Several variable selection algorithms commonly used in data mining are applied in order to select subsets of variables. BNs are built using the selected subsets and their performance is compared with the original BN (with all the variables) using five indicators. The BNs that improve the indicators' values are further analyzed to identify the most significant variables (accident type, age, atmospheric factors, gender, lighting, number of injured, and occupant involved). A new BN is built using these variables, and the results of the indicators show, in most cases, a statistically significant improvement with respect to the original BN. It is thus possible to reduce the number of variables used to model traffic accident injury severity through BNs without reducing the performance of the model. The study provides safety analysts with a methodology that can be used to minimize the number of variables needed to determine the injury severity of traffic accidents efficiently, without reducing the performance of the model. Copyright © 2011 Elsevier Ltd. All rights reserved.
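
    A minimal sketch of the two-stage idea, filter-select a variable subset and then rebuild and re-score the model, is shown below, using naive Bayes (the simplest Bayesian network) as a stand-in classifier and a univariate F-test as the data-mining selection algorithm; the data are synthetic, not the accident data set:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Stand-in data: 20 candidate "accident" variables, some informative.
X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           n_redundant=5, random_state=0)

clf = GaussianNB()  # naive Bayes as a trivial Bayesian-network stand-in

# Baseline: model with all variables
acc_all = cross_val_score(clf, X, y, cv=5).mean()

# Stage 1: data-mining-style filter selection of a variable subset
X_sel = SelectKBest(f_classif, k=8).fit_transform(X, y)

# Stage 2: rebuild the model on the subset and compare performance
acc_sel = cross_val_score(clf, X_sel, y, cv=5).mean()
```

    The paper's actual pipeline uses full Bayesian networks and five performance indicators rather than a single cross-validated accuracy; the subset-versus-full comparison is the shared skeleton.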

  11. Molecular Classification Substitutes for the Prognostic Variables Stage, Age, and MYCN Status in Neuroblastoma Risk Assessment.

    PubMed

    Rosswog, Carolina; Schmidt, Rene; Oberthuer, André; Juraeva, Dilafruz; Brors, Benedikt; Engesser, Anne; Kahlert, Yvonne; Volland, Ruth; Bartenhagen, Christoph; Simon, Thorsten; Berthold, Frank; Hero, Barbara; Faldum, Andreas; Fischer, Matthias

    2017-12-01

    Current risk stratification systems for neuroblastoma patients consider clinical, histopathological, and genetic variables, and additional prognostic markers have been proposed in recent years. We here sought to select highly informative covariates in a multistep strategy based on consecutive Cox regression models, resulting in a risk score that integrates hazard ratios of prognostic variables. A cohort of 695 neuroblastoma patients was divided into a discovery set (n=75) for multigene predictor generation, a training set (n=411) for risk score development, and a validation set (n=209). Relevant prognostic variables were identified by stepwise multivariable L1-penalized least absolute shrinkage and selection operator (LASSO) Cox regression, followed by backward selection in multivariable Cox regression, and then integrated into a novel risk score. The variables stage, age, MYCN status, and two multigene predictors, NB-th24 and NB-th44, were selected as independent prognostic markers by LASSO Cox regression analysis. Following backward selection, only the multigene predictors were retained in the final model. Integration of these classifiers in a risk scoring system distinguished three patient subgroups that differed substantially in their outcome. The scoring system discriminated patients with diverging outcome in the validation cohort (5-year event-free survival, 84.9±3.4 vs 63.6±14.5 vs 31.0±5.4; P<.001), and its prognostic value was validated by multivariable analysis. We here propose a translational strategy for developing risk assessment systems based on hazard ratios of relevant prognostic variables. Our final neuroblastoma risk score comprised two multigene predictors only, supporting the notion that molecular properties of the tumor cells strongly impact clinical courses of neuroblastoma patients. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  12. System Complexity Reduction via Feature Selection

    ERIC Educational Resources Information Center

    Deng, Houtao

    2011-01-01

    This dissertation transforms a set of system complexity reduction problems to feature selection problems. Three systems are considered: classification based on association rules, network structure learning, and time series classification. Furthermore, two variable importance measures are proposed to reduce the feature selection bias in tree…

  13. Variable selection for marginal longitudinal generalized linear models.

    PubMed

    Cantoni, Eva; Flemming, Joanna Mills; Ronchetti, Elvezio

    2005-06-01

    Variable selection is an essential part of any statistical analysis and yet has been somewhat neglected in the context of longitudinal data analysis. In this article, we propose a generalized version of Mallows's C(p) (GC(p)) suitable for use with both parametric and nonparametric models. GC(p) provides an estimate of a measure of a model's adequacy for prediction. We examine its performance with popular marginal longitudinal models (fitted using GEE) and contrast results with what is typically done in practice: variable selection based on Wald-type or score-type tests. An application to real data further demonstrates the merits of our approach while at the same time emphasizing some important robust features inherent to GC(p).
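
    The classical (non-generalized) Mallows's C_p that GC(p) extends scores a size-k subset as RSS_k/σ̂² − n + 2k, with σ̂² taken from the full model. It can be computed for every candidate subset of a small linear model as follows; this is a generic illustration, not the GEE-based GC(p) of the paper:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)

n, p = 100, 6
X = rng.standard_normal((n, p))
# Only the first three predictors matter in the true model.
y = X[:, 0] + 0.5 * X[:, 1] - 0.7 * X[:, 2] + rng.standard_normal(n)

def rss(X_sub, y):
    """Residual sum of squares of the least-squares fit."""
    beta, *_ = np.linalg.lstsq(X_sub, y, rcond=None)
    resid = y - X_sub @ beta
    return float(resid @ resid)

# sigma^2 estimated from the full model
sigma2 = rss(X, y) / (n - p)

def mallows_cp(cols):
    """C_p = RSS_k / sigma^2 - n + 2k for the subset `cols`."""
    k = len(cols)
    return rss(X[:, cols], y) / sigma2 - n + 2 * k

# Exhaustive search over all non-empty subsets
best_cp, best_cols = min((mallows_cp(c), c)
                         for size in range(1, p + 1)
                         for c in combinations(range(p), size))
```

    For the full model C_p equals p exactly, so any selected subset scores at most p; subsets near C_p ≈ k are considered adequate.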

  14. [Measurement of Water COD Based on UV-Vis Spectroscopy Technology].

    PubMed

    Wang, Xiao-ming; Zhang, Hai-liang; Luo, Wei; Liu, Xue-mei

    2016-01-01

    Ultraviolet/visible (UV/Vis) spectroscopy was used to measure water COD. A total of 135 water samples were collected from Zhejiang province. Raw spectra and 3 different pretreatment methods (Multiplicative Scatter Correction (MSC), Standard Normal Variate (SNV) and 1st derivative) were compared to determine the optimal pretreatment method for analysis. Spectral variable selection is an important strategy in spectral modeling, because it tends toward a parsimonious data representation and can lead to multivariate models with better performance. In order to simplify the calibration models, the preprocessed spectra were then used to select sensitive wavelengths by competitive adaptive reweighted sampling (CARS), random frog and successive Genetic Algorithm (GA) methods. Different numbers of sensitive wavelengths were selected by the different variable selection methods with the SNV preprocessing method. Partial least squares (PLS) was used to build models with the full spectra, and an Extreme Learning Machine (ELM) was applied to build models with the selected wavelength variables. The overall results showed that the ELM model performed better than the PLS model, and the ELM model with the wavelengths selected by CARS obtained the best results, with a determination coefficient (R2) of 0.82, RMSEP of 14.48 and RPD of 2.34 for the prediction set. The results indicated that it is feasible to use UV/Vis with characteristic wavelengths obtained by the CARS variable selection method, combined with ELM calibration, for the rapid and accurate determination of COD in aquaculture water. Moreover, this study lays the foundation for further implementation of online analysis of aquaculture water and rapid determination of other water quality parameters.
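
    An Extreme Learning Machine of the kind used here is simply a random nonlinear hidden layer followed by a ridge-regularized linear readout. The following is a minimal generic sketch on synthetic data standing in for COD values and CARS-selected wavelengths; the hidden-layer size and regularization are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(4)

def elm_fit(X, y, n_hidden=50, reg=1e-3):
    """Extreme Learning Machine: random fixed hidden layer,
    ridge-regularized least-squares readout."""
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ y)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy stand-in: 8 "selected wavelengths" predicting a smooth target.
X = rng.uniform(-1, 1, (300, 8))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.01 * rng.standard_normal(300)

W, b, beta = elm_fit(X, y)
pred = elm_predict(X, W, b, beta)
r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
```

    Because only the readout is trained, fitting reduces to one linear solve, which is why ELM is attractive for fast online water-quality calibration.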

  15. Variables Affecting Student Motivation Based on Academic Publications

    ERIC Educational Resources Information Center

    Yilmaz, Ercan; Sahin, Mehmet; Turgut, Mehmet

    2017-01-01

    In this study, the variables having impact on the student motivation have been analyzed based on the articles, conference papers, master's theses and doctoral dissertations published in the years 2000-2017. A total of 165 research papers were selected for the research material and the data were collected through qualitative research techniques…

  16. Impact of strong selection for the PrP major gene on genetic variability of four French sheep breeds (Open Access publication)

    PubMed Central

    Palhiere, Isabelle; Brochard, Mickaël; Moazami-Goudarzi, Katayoun; Laloë, Denis; Amigues, Yves; Bed'hom, Bertrand; Neuts, Étienne; Leymarie, Cyril; Pantano, Thais; Cribiu, Edmond Paul; Bibé, Bernard; Verrier, Étienne

    2008-01-01

    Effective selection on the PrP gene has been implemented since October 2001 in all French sheep breeds. After four years, the ARR "resistant" allele frequency increased by about 35% in young males. The aim of this study was to evaluate the impact of this strong selection on genetic variability. It is focussed on four French sheep breeds and based on the comparison of two groups of 94 animals within each breed: the first group of animals was born before the selection began, and the second, 3–4 years later. Genetic variability was assessed using genealogical and molecular data (29 microsatellite markers). The expected loss of genetic variability on the PrP gene was confirmed. Moreover, among the five markers located in the PrP region, only the three closest ones were affected. The evolution of the number of alleles, heterozygote deficiency within population, expected heterozygosity and the Reynolds distances agreed with the criteria from pedigree and pointed out that neutral genetic variability was not much affected. This trend depended on breed, i.e. on their initial states (population size, PrP frequencies) and on the selection strategies for improving scrapie resistance while carrying out selection for production traits. PMID:18990357

  17. Selecting an Informative/Discriminating Multivariate Response for Inverse Prediction

    DOE PAGES

    Thomas, Edward V.; Lewis, John R.; Anderson-Cook, Christine M.; ...

    2017-11-21

    Inverse prediction is important in a wide variety of scientific and engineering contexts. One might use inverse prediction to predict fundamental properties/characteristics of an object using measurements obtained from it. This can be accomplished by “inverting” parameterized forward models that relate the measurements (responses) to the properties/characteristics of interest. Sometimes forward models are science based; but often, forward models are empirically based, using the results of experimentation. For empirically-based forward models, it is important that the experiments provide a sound basis to develop accurate forward models in terms of the properties/characteristics (factors). While nature dictates the causal relationship between factors and responses, experimenters can influence control of the type, accuracy, and precision of forward models that can be constructed via selection of factors, factor levels, and the set of trials that are performed. Whether the forward models are based on science, experiments or both, researchers can influence the ability to perform inverse prediction by selecting informative response variables. By using an errors-in-variables framework for inverse prediction, this paper shows via simple analysis and examples how the capability of a multivariate response (with respect to being informative and discriminating) can vary depending on how well the various responses complement one another over the range of the factor-space of interest. Insights derived from this analysis could be useful for selecting a set of response variables among candidates in cases where the number of response variables that can be acquired is limited by difficulty, expense, and/or availability of material.

  18. Selecting an Informative/Discriminating Multivariate Response for Inverse Prediction

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Thomas, Edward V.; Lewis, John R.; Anderson-Cook, Christine M.

    Inverse prediction is important in a wide variety of scientific and engineering contexts. One might use inverse prediction to predict fundamental properties/characteristics of an object using measurements obtained from it. This can be accomplished by “inverting” parameterized forward models that relate the measurements (responses) to the properties/characteristics of interest. Sometimes forward models are science based; but often, forward models are empirically based, using the results of experimentation. For empirically-based forward models, it is important that the experiments provide a sound basis to develop accurate forward models in terms of the properties/characteristics (factors). While nature dictates the causal relationship between factors and responses, experimenters can influence control of the type, accuracy, and precision of forward models that can be constructed via selection of factors, factor levels, and the set of trials that are performed. Whether the forward models are based on science, experiments or both, researchers can influence the ability to perform inverse prediction by selecting informative response variables. By using an errors-in-variables framework for inverse prediction, this paper shows via simple analysis and examples how the capability of a multivariate response (with respect to being informative and discriminating) can vary depending on how well the various responses complement one another over the range of the factor-space of interest. Insights derived from this analysis could be useful for selecting a set of response variables among candidates in cases where the number of response variables that can be acquired is limited by difficulty, expense, and/or availability of material.

  19. A Permutation Approach for Selecting the Penalty Parameter in Penalized Model Selection

    PubMed Central

    Sabourin, Jeremy A; Valdar, William; Nobel, Andrew B

    2015-01-01

    Summary We describe a simple, computationally efficient, permutation-based procedure for selecting the penalty parameter in LASSO penalized regression. The procedure, permutation selection, is intended for applications where variable selection is the primary focus, and can be applied in a variety of structural settings, including that of generalized linear models. We briefly discuss connections between permutation selection and existing theory for the LASSO. In addition, we present a simulation study and an analysis of real biomedical data sets in which permutation selection is compared with selection based on the following: cross-validation (CV), the Bayesian information criterion (BIC), Scaled Sparse Linear Regression, and a selection method based on recently developed testing procedures for the LASSO. PMID:26243050
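
    One way to sketch the permutation idea: for each permuted response, compute the smallest LASSO penalty at which no variable enters the model, then take a quantile of those values as the selected penalty. The details below (the quantile level, the synthetic data) are illustrative assumptions, not the authors' exact procedure:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(5)

n, p = 200, 50
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.standard_normal(n)

def lambda_zero(X, y):
    """Smallest penalty at which the lasso (sklearn parametrization,
    objective (1/2n)||y - Xb||^2 + alpha||b||_1) selects no variables."""
    return np.max(np.abs(X.T @ (y - y.mean()))) / len(y)

# Permutation selection sketch: a quantile of lambda_zero over permuted
# responses, i.e. a penalty that usually keeps noise-only data out.
lams = [lambda_zero(X, rng.permutation(y)) for _ in range(100)]
lam = float(np.quantile(lams, 0.75))

fit = Lasso(alpha=lam).fit(X, y)
selected = np.flatnonzero(fit.coef_)   # indices of retained variables
```

    Strong true predictors survive this penalty while variables correlated with the response only by chance are excluded, which is the selection-focused behavior the abstract targets.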

  20. Evaluation of redundancy analysis to identify signatures of local adaptation.

    PubMed

    Capblancq, Thibaut; Luu, Keurcien; Blum, Michael G B; Bazin, Eric

    2018-05-26

    Ordination is a common tool in ecology that aims at representing complex biological information in a reduced space. In landscape genetics, ordination methods such as principal component analysis (PCA) have been used to detect adaptive variation based on genomic data. Taking advantage of environmental data in addition to genotype data, redundancy analysis (RDA) is another ordination approach that is useful to detect adaptive variation. This paper aims at proposing a test statistic based on RDA to search for loci under selection. We compare redundancy analysis to pcadapt, which is a nonconstrained ordination method, and to a latent factor mixed model (LFMM), which is a univariate genotype-environment association method. Individual-based simulations identify evolutionary scenarios where RDA genome scans have a greater statistical power than genome scans based on PCA. By constraining the analysis with environmental variables, RDA performs better than PCA in identifying adaptive variation when selection gradients are weakly correlated with population structure. Additionally, we show that if RDA and LFMM have a similar power to identify genetic markers associated with environmental variables, the RDA-based procedure has the advantage to identify the main selective gradients as a combination of environmental variables. To give a concrete illustration of RDA in population genomics, we apply this method to the detection of outliers and selective gradients on an SNP data set of Populus trichocarpa (Geraldes et al., 2013). The RDA-based approach identifies the main selective gradient contrasting southern and coastal populations to northern and continental populations in the northwestern American coast. This article is protected by copyright. All rights reserved.
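
    A bare-bones RDA genome scan can be written as a regression of the genotype matrix on the environmental variables followed by a PCA of the fitted values; loci with extreme loadings on the leading constrained axes are flagged as outliers. The sketch below uses simulated genotypes with ten loci responding to one environmental gradient, and illustrates the mechanics only, not the paper's test statistic:

```python
import numpy as np

rng = np.random.default_rng(7)

n, n_loci = 200, 300
env = rng.standard_normal((n, 2))              # two environmental gradients
G = rng.binomial(2, 0.3, (n, n_loci)).astype(float)
# Plant a signal: the first 10 loci respond to the first gradient.
G[:, :10] += 0.5 * env[:, [0]]

# (1) Regress every locus on the environment (with intercept).
E = np.column_stack([np.ones(n), env])
B, *_ = np.linalg.lstsq(E, G, rcond=None)
fitted = E @ B
fitted -= fitted.mean(axis=0)

# (2) PCA (via SVD) of the fitted values gives the constrained RDA axes.
U, s, Vt = np.linalg.svd(fitted, full_matrices=False)

# (3) Score each locus by its loadings on the leading constrained axes.
loadings = Vt[:2].T * s[:2]
score = np.sum(loadings ** 2, axis=1)
outliers = np.argsort(score)[-10:]             # top-10 candidate loci
```

    In the paper, outlier calling is done with a formal test statistic on such loadings rather than a fixed top-k cut; the constrained-ordination core is the same.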

  1. Variable selection under multiple imputation using the bootstrap in a prognostic study

    PubMed Central

    Heymans, Martijn W; van Buuren, Stef; Knol, Dirk L; van Mechelen, Willem; de Vet, Henrica CW

    2007-01-01

    Background Missing data is a challenging problem in many prognostic studies. Multiple imputation (MI) accounts for imputation uncertainty and allows for adequate statistical testing. We developed and tested a methodology combining MI with bootstrapping techniques for studying prognostic variable selection. Method In our prospective cohort study we merged data from three different randomized controlled trials (RCTs) to assess prognostic variables for chronicity of low back pain. Among the outcome and prognostic variables, data were missing in the range of 0 to 48.1%. We used four methods to investigate the influence of sampling and imputation variation, respectively: MI only, bootstrap only, and two methods that combine MI and bootstrapping. Variables were selected based on the inclusion frequency of each prognostic variable, i.e. the proportion of times that the variable appeared in the model. The discriminative and calibrative abilities of prognostic models developed by the four methods were assessed at different inclusion levels. Results We found that the effect of imputation variation on the inclusion frequency was larger than the effect of sampling variation. When MI and bootstrapping were combined at the range of 0% (full model) to 90% of variable selection, bootstrap corrected c-index values of 0.70 to 0.71 and slope values of 0.64 to 0.86 were found. Conclusion We recommend accounting for both imputation and sampling variation in data sets with missing values. The new procedure of combining MI with bootstrapping for variable selection results in multivariable prognostic models with good performance and is therefore attractive to apply to data sets with missing values. PMID:17629912
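
    The inclusion-frequency idea, refit a selection procedure on bootstrap resamples and keep variables that enter the model in at least a chosen fraction of them, can be sketched as follows. LASSO stands in for the study's selection method, and the data and the 50% threshold are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(6)

n, p = 150, 10
X = rng.standard_normal((n, p))
y = 1.5 * X[:, 0] + 1.0 * X[:, 1] + rng.standard_normal(n)

# Bootstrap inclusion frequency: how often each candidate prognostic
# variable enters the model across resampled data sets.
B = 30
counts = np.zeros(p)
for _ in range(B):
    idx = rng.integers(0, n, n)                       # bootstrap resample
    fit = LassoCV(cv=5, random_state=0).fit(X[idx], y[idx])
    counts += fit.coef_ != 0

inclusion_freq = counts / B
# Keep variables above e.g. a 50% inclusion threshold.
kept = np.flatnonzero(inclusion_freq >= 0.5)
```

    In the study this loop is additionally nested over multiply imputed data sets, so that both sampling and imputation variation feed into the inclusion frequencies.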

  2. Collective feature selection to identify crucial epistatic variants.

    PubMed

    Verma, Shefali S; Lucas, Anastasia; Zhang, Xinyuan; Veturi, Yogasudha; Dudek, Scott; Li, Binglan; Li, Ruowang; Urbanowicz, Ryan; Moore, Jason H; Kim, Dokyoon; Ritchie, Marylyn D

    2018-01-01

    Machine learning methods have gained popularity and practicality in identifying linear and non-linear effects of variants associated with complex diseases/traits. Detection of epistatic interactions still remains a challenge due to the large number of features and relatively small sample size as input, leading to the so-called "short fat data" problem. The efficiency of machine learning methods can be increased by limiting the number of input features, so it is very important to perform variable selection before searching for epistasis. Many methods have been evaluated and proposed to perform feature selection, but no single method works best in all scenarios. We demonstrate this by conducting two separate simulation analyses to evaluate the proposed collective feature selection approach. Through our simulation study we propose a collective feature selection approach to select features that are in the "union" of the best performing methods. We explored various parametric, non-parametric, and data mining approaches to perform feature selection. We choose our top performing methods to select the union of the resulting variables, based on a user-defined percentage of variants selected from each method, to take to downstream analysis. Our simulation analysis shows that non-parametric data mining approaches, such as MDR, may work best under one simulation criterion for the high effect size (penetrance) datasets, while non-parametric methods designed for feature selection, such as Ranger and Gradient boosting, work best under other simulation criteria. Thus, using a collective approach proves to be more beneficial for selecting variables with epistatic effects, also in low effect size datasets and different genetic architectures. Following this, we applied our proposed collective feature selection approach to select the top 1% of variables to identify potential interacting variables associated with Body Mass Index (BMI) in ~ 44,000 samples obtained from Geisinger's MyCode Community Health Initiative (on behalf of DiscovEHR collaboration). In this study, we were able to show that selecting variables using a collective feature selection approach could help in selecting true positive epistatic variables more frequently than applying any single method for feature selection via simulation studies. We were able to demonstrate the effectiveness of collective feature selection along with a comparison of many methods in our simulation analysis. We also applied our method to identify non-linear networks associated with obesity.
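
    A minimal sketch of collective feature selection, taking the union of the top-ranked variables from several parametric and non-parametric selectors, is shown below on synthetic data; the three selectors and the top-10% cutoff are illustrative choices, not the exact set evaluated in the paper:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import f_classif
from sklearn.linear_model import LogisticRegression

# Toy stand-in for genotype data: 100 features, first 6 informative
# (shuffle=False keeps the informative columns at indices 0-5).
X, y = make_classification(n_samples=400, n_features=100, n_informative=6,
                           n_redundant=0, shuffle=False, random_state=0)

k = 10  # take the top 10% of features from each ranking method

# Method 1: univariate F-test (parametric)
f_scores, _ = f_classif(X, y)
top_f = set(np.argsort(f_scores)[-k:])

# Method 2: random forest impurity importance (non-parametric)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
top_rf = set(np.argsort(rf.feature_importances_)[-k:])

# Method 3: L1-penalized logistic regression (embedded selection)
lr = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
top_l1 = set(np.flatnonzero(lr.coef_[0] != 0))

# Collective selection: union of the per-method top sets.
collective = top_f | top_rf | top_l1
```

    The union trades a few extra false positives for a better chance that variables with weak marginal but real epistatic effects survive into the downstream interaction search.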

  3. Selectivity in reversed-phase separations: general influence of solvent type and mobile phase pH.

    PubMed

    Neue, Uwe D; Méndez, Alberto

    2007-05-01

    The influence of the mobile phase on retention is studied in this paper for a group of over 70 compounds with a broad range of multiple functional groups. We varied the pH of the mobile phase (pH 3, 7, and 10) and the organic modifier (methanol, acetonitrile (ACN), and tetrahydrofuran (THF)), using 15 different stationary phases. In this paper, we describe the overall retention and selectivity changes observed with these variables. We focus on the primary effects of solvent choice and pH. For example, transfer rules for solvent composition resulting in equivalent retention depend on the packing as well as on the type of analyte. Based on the retention patterns, one can calculate selectivity difference values for different variables. The selectivity difference is a measure of the importance of the different variables involved in method development. Selectivity changes specific to the type of analyte are described. The largest selectivity differences are obtained with pH changes.

  4. Variable selection in semiparametric cure models based on penalized likelihood, with application to breast cancer clinical trials.

    PubMed

    Liu, Xiang; Peng, Yingwei; Tu, Dongsheng; Liang, Hua

    2012-10-30

    Survival data with a sizable cure fraction are commonly encountered in cancer research. The semiparametric proportional hazards cure model has been recently used to analyze such data. As seen in the analysis of data from a breast cancer study, a variable selection approach is needed to identify important factors in predicting the cure status and risk of breast cancer recurrence. However, no specific variable selection method for the cure model is available. In this paper, we present a variable selection approach with penalized likelihood for the cure model. The estimation can be implemented easily by combining the computational methods for penalized logistic regression and the penalized Cox proportional hazards models with the expectation-maximization algorithm. We illustrate the proposed approach on data from a breast cancer study. We conducted Monte Carlo simulations to evaluate the performance of the proposed method. We used and compared different penalty functions in the simulation studies. Copyright © 2012 John Wiley & Sons, Ltd.

  5. Constructing Proxy Variables to Measure Adult Learners' Time Management Strategies in LMS

    ERIC Educational Resources Information Center

    Jo, Il-Hyun; Kim, Dongho; Yoon, Meehyun

    2015-01-01

    This study describes the process of constructing proxy variables from recorded log data within a Learning Management System (LMS), which represents adult learners' time management strategies in an online course. Based on previous research, three variables of total login time, login frequency, and regularity of login interval were selected as…

  6. Function-Based Approach to Designing an Instructional Environment

    ERIC Educational Resources Information Center

    Park, Kristy; Pinkelman, Sarah

    2017-01-01

    Teachers are faced with the challenge of selecting interventions that are most likely to be effective and best matched to the function of problem behavior. This article will define aspects of the instructional environment and describe a decision-making logic to select environmental variables. A summary of commonly used function-based interventions…

  7. Annual variability of PAH concentrations in the Potomac River watershed

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Maher, I.L.; Foster, G.D.

    1995-12-31

    Dynamics of organic contaminant transport in a large river system are influenced by annual variability in organic contaminant concentrations. Surface runoff and groundwater input control the flow of river waters; they are also the two major inputs of contaminants to river waters. The annual variability of contaminant concentrations in rivers may or may not follow the flow changes of river waters. The purpose of this research is to define the annual variability in concentrations of polycyclic aromatic hydrocarbons (PAH) in a riverine environment. To accomplish this, samples of Potomac River water were collected monthly or bimonthly from March 1992 to March 1995 downstream of the Chesapeake Bay fall line (Chain Bridge) during base flow and main storm flow hydrologic conditions. Concentrations of selected PAHs were measured in the dissolved and particulate phases via GC/MS. The annual variability of PAH concentrations will be studied through seasonal and annual comparisons of PAH concentrations and through analysis of their dependency on river discharge and rainfall. For selected PAHs, monthly and annual loadings will be estimated based on their measured concentrations and average daily river discharge. The monthly loadings of selected PAHs will be compared by season and year.

  8. Mate choice theory and the mode of selection in sexual populations.

    PubMed

    Carson, Hampton L

    2003-05-27

    Indirect new data imply that mate and/or gamete choice are major selective forces driving genetic change in sexual populations. The system dictates nonrandom mating, an evolutionary process requiring both revised genetic theory and new data on heritability of characters underlying Darwinian fitness. Successfully reproducing individuals represent rare selections from among vigorous, competing survivors of preadult natural selection. Nonrandom mating has correlated demographic effects: reduced effective population size, inbreeding, low gene flow, and emphasis on deme structure. Characters involved in choice behavior at reproduction appear based on quantitative trait loci. This variability serves selection for fitness within the population, having only an incidental relationship to the origin of genetically based reproductive isolation between populations. The claim that extensive hybridization experiments with Drosophila indicate that selection favors a gradual progression of "isolating mechanisms" is flawed, because intra-group random mating is assumed. Over deep time, local sexual populations are strong, independent genetic systems that use rich fields of variable polygenic components of fitness. The sexual reproduction system thus particularizes, in small subspecific populations, the genetic basis of the grand adaptive sweep of selective evolutionary change, much as Darwin proposed.

  9. C-fuzzy variable-branch decision tree with storage and classification error rate constraints

    NASA Astrophysics Data System (ADS)

    Yang, Shiueng-Bien

    2009-10-01

    The C-fuzzy decision tree (CFDT), which is based on the fuzzy C-means algorithm, has recently been proposed. The CFDT is grown by selecting the nodes to be split according to its classification error rate. However, the CFDT design does not consider the classification time taken to classify the input vector. Thus, the CFDT can be improved. We propose a new C-fuzzy variable-branch decision tree (CFVBDT) with storage and classification error rate constraints. The design of the CFVBDT consists of two phases: growing and pruning. The CFVBDT is grown by selecting the nodes to be split according to the classification error rate and the classification time in the decision tree. Additionally, the pruning method selects the nodes to prune based on the storage requirement and the classification time of the CFVBDT. Furthermore, the number of branches of each internal node is variable in the CFVBDT. Experimental results indicate that the proposed CFVBDT outperforms the CFDT and other methods.

  10. Anthropometry

    NASA Technical Reports Server (NTRS)

    Mcconville, J. T.; Laubach, L. L.

    1978-01-01

    Data on body-size measurement are presented to aid in spacecraft design. Tabulated dimensional anthropometric data on 59 variables for 12 selected populations are given. The variables chosen were those judged most relevant to the manned space program. A glossary of anatomical and anthropometric terms is included. Selected body dimensions of males and females from the potential astronaut population projected to the 1980-1990 time frame are given. Illustrations of drawing-board manikins based on those anticipated body sizes are included.

  11. On the use of variability time-scales as an early classifier of radio transients and variables

    NASA Astrophysics Data System (ADS)

    Pietka, M.; Staley, T. D.; Pretorius, M. L.; Fender, R. P.

    2017-11-01

    We have shown previously that a broad correlation between the peak radio luminosity and the variability time-scales, approximately L ∝ τ^5, exists for variable synchrotron emitting sources and that different classes of astrophysical sources occupy different regions of luminosity and time-scale space. Based on those results, we investigate whether the most basic information available for a newly discovered radio variable or transient - their rise and/or decline rate - can be used to set initial constraints on the class of events from which they originate. We have analysed a sample of ≈800 synchrotron flares, selected from light curves of ≈90 sources observed at 5-8 GHz, representing a wide range of astrophysical phenomena, from flare stars to supermassive black holes. Selection of outbursts from the noisy radio light curves has been done automatically in order to ensure reproducibility of results. The distribution of rise/decline rates for the selected flares is modelled as a Gaussian probability distribution for each class of object, and further convolved with estimated areal density of that class in order to correct for the strong bias in our sample. We show in this way that comparing the measured variability time-scale of a radio transient/variable of unknown origin can provide an early, albeit approximate, classification of the object, and could form part of a suite of measurements used to provide early categorization of such events. Finally, we also discuss the effect scintillating sources will have on our ability to classify events based on their variability time-scales.
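    The classification step described above (a per-class Gaussian over variability rates, weighted by an areal-density prior) reduces to a simple Bayes rule. The sketch below is illustrative only: the class names, means, widths, and priors are hypothetical placeholders, not values from the paper.

```python
import math

# Each class: (mean of log10 rise rate, sigma, areal-density prior).
# All numbers are made up for illustration.
classes = {
    "flare star": (1.5, 0.6, 0.70),
    "X-ray binary": (0.2, 0.5, 0.25),
    "AGN": (-1.0, 0.8, 0.05),
}

def posteriors(log_rate):
    """Posterior class probabilities for one measured log10 rise rate."""
    # Unnormalized posterior = prior x Gaussian likelihood.
    like = {
        name: prior * math.exp(-0.5 * ((log_rate - mu) / s) ** 2) / s
        for name, (mu, s, prior) in classes.items()
    }
    total = sum(like.values())
    return {name: v / total for name, v in like.items()}

p = posteriors(1.4)            # a fast-rising transient
print(max(p, key=p.get))       # most probable class
```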

  12. Selection of relevant input variables in storm water quality modeling by multiobjective evolutionary polynomial regression paradigm

    NASA Astrophysics Data System (ADS)

    Creaco, E.; Berardi, L.; Sun, Siao; Giustolisi, O.; Savic, D.

    2016-04-01

    The growing availability of field data, from information and communication technologies (ICTs) in "smart" urban infrastructures, allows data modeling to understand complex phenomena and to support management decisions. Among the analyzed phenomena, those related to storm water quality modeling have recently been gaining interest in the scientific literature. Nonetheless, the large amount of available data poses the problem of selecting relevant variables to describe a phenomenon and enable robust data modeling. This paper presents a procedure for the selection of relevant input variables using the multiobjective evolutionary polynomial regression (EPR-MOGA) paradigm. The procedure is based on scrutinizing the explanatory variables that appear inside the set of EPR-MOGA symbolic model expressions of increasing complexity and goodness of fit to target output. The strategy also enables the selection to be validated by engineering judgement. In such context, the multiple case study extension of EPR-MOGA, called MCS-EPR-MOGA, is adopted. The application of the proposed procedure to modeling storm water quality parameters in two French catchments shows that it was able to significantly reduce the number of explanatory variables for successive analyses. Finally, the EPR-MOGA models obtained after the input selection are compared with those obtained by using the same technique without benefitting from input selection and with those obtained in previous works where other data-modeling techniques were used on the same data. The comparison highlights the effectiveness of both EPR-MOGA and the input selection procedure.

  13. The importance of scale-dependent ravine characteristics on breeding-site selection by the Burrowing Parrot, Cyanoliseus patagonus

    PubMed Central

    Rios, Rodrigo S.; Vargas-Rodriguez, Renzo; Novoa-Jerez, Jose-Enrique; Squeo, Francisco A.

    2017-01-01

    In birds, the environmental variables and intrinsic characteristics of the nest have important fitness consequences through its influence on the selection of nesting sites. However, the extent to which these variables interact with variables that operate at the landscape scale, and whether there is a hierarchy among the different scales that influences nest-site selection, is unknown. This interaction could be crucial in burrowing birds, which depend heavily on the availability of suitable nesting locations. One representative of this group is the burrowing parrot, Cyanoliseus patagonus that breeds on specific ravines and forms large breeding colonies. At a particular site, breeding aggregations require the concentration of adequate environmental elements for cavity nesting, which are provided by within ravine characteristics. Therefore, intrinsic ravine characteristics should be more important in determining nest site selection compared to landscape level characteristics. Here, we assess this hypothesis by comparing the importance of ravine characteristics operating at different scales on nest-site selection and their interrelation with reproductive success. We quantified 12 characteristics of 105 ravines in their reproductive habitat. For each ravine we quantified morphological variables, distance to resources and disturbance as well as nest number and egg production in order to compare selected and non-selected ravines and determine the interrelationship among variables in explaining ravine differences. In addition, the number of nests and egg production for each reproductive ravine was related to ravine characteristics to assess their relation to reproductive success. We found significant differences between non-reproductive and reproductive ravines in both intrinsic and extrinsic characteristics. 
The multidimensional environmental gradient of variation between ravines, however, shows that differences are mainly related to intrinsic morphological characteristics followed by extrinsic variables associated with human disturbance. Likewise, within reproductive ravines, intrinsic characteristics are more strongly related to the number of nests. The probability of producing eggs, however, was related only to distance to roads and human settlements. Patterns suggest that C. patagonus mainly selects nesting sites based on intrinsic morphological characteristics of ravines. Scale differences in the importance of ravine characteristics could be a consequence of the particular orography of the breeding habitat. The arrangement of resources is associated with the location of the gullies rather than with individual ravines, determining the spatial availability and disposition of resources and disturbances. Thus, nest selection is influenced by intrinsic characteristics that maximize the fitness of individuals. Scaling in nest-selection is discussed under an optimality approach that partitions patch selection based on foraging theory. PMID:28462019

  14. Variable selection for distribution-free models for longitudinal zero-inflated count responses.

    PubMed

    Chen, Tian; Wu, Pan; Tang, Wan; Zhang, Hui; Feng, Changyong; Kowalski, Jeanne; Tu, Xin M

    2016-07-20

    Zero-inflated count outcomes arise quite often in research and practice. Parametric models such as the zero-inflated Poisson and zero-inflated negative binomial are widely used to model such responses. Like most parametric models, they are quite sensitive to departures from assumed distributions. Recently, new approaches have been proposed to provide distribution-free, or semi-parametric, alternatives. These methods extend the generalized estimating equations to provide robust inference for population mixtures defined by zero-inflated count outcomes. In this paper, we propose methods to extend smoothly clipped absolute deviation (SCAD)-based variable selection methods to these new models. Variable selection has been gaining popularity in modern clinical research studies, as determining differential treatment effects of interventions for different subgroups has become the norm, rather than the exception, in the era of patient-centered outcomes research. Such moderation analysis in general creates many explanatory variables in regression analysis, and the advantages of SCAD-based methods over their traditional counterparts render them a great choice for addressing these important and timely issues in clinical research. We illustrate the proposed approach with both simulated and real study data. Copyright © 2016 John Wiley & Sons, Ltd.
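    For reference, the SCAD penalty of Fan and Li, which these selection methods build on, is usually specified through its derivative (a standard definition, with a = 3.7 the conventional choice):

```latex
p'_{\lambda}(\theta) = \lambda \left\{ I(\theta \le \lambda)
+ \frac{(a\lambda - \theta)_{+}}{(a - 1)\lambda}\, I(\theta > \lambda) \right\},
\qquad \theta \ge 0,\; a > 2 .
```

    Near zero the penalty behaves like the lasso, shrinking small coefficients to exactly zero, while for large coefficients the derivative vanishes, so large effects are left essentially unpenalized and estimation bias is reduced.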

  15. Discovering Cepheid and RR Lyrae Stars: Pan-STARRS Science Archive @ STScI and Robotically Controlled Telescopes

    NASA Astrophysics Data System (ADS)

    Johnson, Elizabeth; Strolger, Louis-Gregory; Engle, Scott G.; Anderson, Richard I.; Rest, Armin; Calamida, Annalisa; Dosovitz Fox, Ori; Laney, David

    2017-01-01

    Cepheid and RR Lyrae stars are an integral part of the cosmic distance ladder and are also useful for studying galactic structure and stellar ages. This project aims to greatly expand the number of known periodic variables in our galaxy by identifying candidates in the PanSTARRS-1 3pi catalog, and carrying out systematically targeted characterization with robotically controlled telescopes. Candidate targets are selected from available detection tables based on color and variability indices and are then fully vetted using robotic telescopes: the RCT 1.3 meter (Kitt Peak National Observatory) and RATIR 1.5 meter (Mexico). Here we present work to develop a full, semi-automated prescription for candidate selection, targeted follow-up photometry, cataloging, and classification, which allows the review of approximately 25 variable candidates every two weeks. We make comparisons of our sample selection and purity from a similar study based on Pan-STARRS data (Hernitschek et al. 2016), as well as candidates identified in Gaia DR1. The goal, through continued observation and analysis, is to identify at least 10,000 new variables, hundreds of which will be new Cepheid and RR Lyrae stars.

  16. Do insurers respond to risk adjustment? A long-term, nationwide analysis from Switzerland.

    PubMed

    von Wyl, Viktor; Beck, Konstantin

    2016-03-01

    Community rating in social health insurance calls for risk adjustment in order to eliminate incentives for risk selection. Swiss risk adjustment is known to be insufficient, and substantial risk selection incentives remain. This study develops five indicators to monitor residual risk selection. Three indicators target activities of conglomerates of insurers (with the same ownership), which steer enrollees into specific carriers based on applicants' risk profiles. As a proxy for their market power, those indicators estimate the amount of premium-, health care cost-, and risk-adjustment transfer variability that is attributable to conglomerates. Two additional indicators, derived from linear regression, describe the amount of residual cost differences between insurers that are not covered by risk adjustment. All indicators measuring conglomerate-based risk selection activities showed increases between 1996 and 2009, paralleling the establishment of new conglomerates. At their maxima in 2009, the indicator values imply that 56% of the net risk adjustment volume, 34% of premium variability, and 51% cost variability in the market were attributable to conglomerates. From 2010 onwards, all indicators decreased, coinciding with a pre-announced risk adjustment reform implemented in 2012. Likewise, the regression-based indicators suggest that the volume and variance of residual cost differences between insurers that are not equaled out by risk adjustment have decreased markedly since 2009 as a result of the latest reform. Our analysis demonstrates that risk-selection, especially by conglomerates, is a real phenomenon in Switzerland. However, insurers seem to have reduced risk selection activities to optimize their losses and gains from the latest risk adjustment reform.

  17. Genetic-evolution-based optimization methods for engineering design

    NASA Technical Reports Server (NTRS)

    Rao, S. S.; Pan, T. S.; Dhingra, A. K.; Venkayya, V. B.; Kumar, V.

    1990-01-01

    This paper presents the applicability of a biological model, based on genetic evolution, for engineering design optimization. Algorithms embodying the ideas of reproduction, crossover, and mutation are developed and applied to solve different types of structural optimization problems. Both continuous and discrete variable optimization problems are solved. A two-bay truss for maximum fundamental frequency is considered to demonstrate the continuous variable case. The selection of locations of actuators in an actively controlled structure, for minimum energy dissipation, is considered to illustrate the discrete variable case.

  18. ESTIMATING THE LIKELIHOOD OF OCCURRENCE OF SELECTED PESTICIDES AND NUTRIENTS EXCEEDING SPECIFIC CONCENTRATIONS IN COASTAL PLAIN STREAMS BASED ON LANDSCAPE CHARACTERISTICS

    EPA Science Inventory

    The occurrence of selected pesticides and nutrient compounds in nontidal headwater streams of the Mid-Atlantic Coastal Plain (North Carolina through New Jersey) during winter and spring base flow is related to land use, soils, and other geographic variables that reflect sources a...

  19. Individual treatment selection for patients with posttraumatic stress disorder.

    PubMed

    Deisenhofer, Anne-Katharina; Delgadillo, Jaime; Rubel, Julian A; Böhnke, Jan R; Zimmermann, Dirk; Schwartz, Brian; Lutz, Wolfgang

    2018-04-16

    Trauma-focused cognitive behavioral therapy (Tf-CBT) and eye movement desensitization and reprocessing (EMDR) are two highly effective treatment options for posttraumatic stress disorder (PTSD). Yet, on an individual level, PTSD patients vary substantially in treatment response. The aim of the paper is to test the application of a treatment selection method based on a personalized advantage index (PAI). The study used clinical data for patients accessing treatment for PTSD in a primary care mental health service in the north of England. PTSD patients received either EMDR (N = 75) or Tf-CBT (N = 242). The Patient Health Questionnaire (PHQ-9) was used as an outcome measure for depressive symptoms associated with PTSD. Variables predicting differential treatment response were identified using an automated variable selection approach (genetic algorithm) and afterwards included in regression models, allowing the calculation of each patient's PAI. Age, employment status, gender, and functional impairment were identified as relevant variables for Tf-CBT. For EMDR, baseline depressive symptoms as well as prescribed antidepressant medication were selected as predictor variables. Fifty-six percent of the patients (n = 125) had a PAI equal to or greater than one standard deviation. Of those patients, 62 (50%) did not receive their model-predicted treatment and could have benefited from a treatment assignment based on the PAI. Using a PAI-based algorithm has the potential to improve clinical decision making and to enhance individual patient outcomes, although further replication is necessary before such an approach can be implemented in prospective studies. © 2018 Wiley Periodicals, Inc.
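    The PAI logic itself is simple to sketch: fit an outcome model within each treatment arm, predict both counterfactual outcomes for every patient, and take the difference. The sketch below uses synthetic data and ordinary least squares; all variable names and numbers are hypothetical stand-ins, not the study's models or predictors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: two baseline predictors per patient, outcome is a
# post-treatment symptom score (lower is better).
n = 200
X = rng.normal(size=(n, 2))
treat = rng.integers(0, 2, size=n)   # 0 = Tf-CBT arm, 1 = EMDR arm
y = 10 + X @ np.array([1.5, -0.5]) + treat * (2.0 * X[:, 0]) + rng.normal(size=n)

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

# Fit a separate outcome model within each treatment arm.
beta_cbt = fit_ols(X[treat == 0], y[treat == 0])
beta_emdr = fit_ols(X[treat == 1], y[treat == 1])

# PAI = predicted outcome under one treatment minus the other;
# with a "lower is better" score, a negative PAI favors EMDR here.
pai = predict(beta_emdr, X) - predict(beta_cbt, X)

# Patients whose |PAI| reaches one standard deviation have a
# meaningful predicted advantage for one of the two treatments.
strong = np.abs(pai) >= pai.std()
print(strong.mean())
```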

  20. Analysis of Factors Influencing Energy Consumption at an Air Force Base.

    DTIC Science & Technology

    1995-12-01

    include them in energy consumption projections. Table 2-3: Selected Independent Variables (Morill, 1985); Dependent Variable: Energy Conservation... most appropriate method for forecasting energy consumption (Weck, 1981; Tinsley, 1981; and Morill, 1985). This section will present a brief…

  1. A fully synthetic human Fab antibody library based on fixed VH/VL framework pairings with favorable biophysical properties

    PubMed Central

    Tiller, Thomas; Schuster, Ingrid; Deppe, Dorothée; Siegers, Katja; Strohner, Ralf; Herrmann, Tanja; Berenguer, Marion; Poujol, Dominique; Stehle, Jennifer; Stark, Yvonne; Heßling, Martin; Daubert, Daniela; Felderer, Karin; Kaden, Stefan; Kölln, Johanna; Enzelberger, Markus; Urlinger, Stefanie

    2013-01-01

    This report describes the design, generation and testing of Ylanthia, a fully synthetic human Fab antibody library with 1.3E+11 clones. Ylanthia comprises 36 fixed immunoglobulin (Ig) variable heavy (VH)/variable light (VL) chain pairs, which cover a broad range of canonical complementarity-determining region (CDR) structures. The variable Ig heavy and Ig light (VH/VL) chain pairs were selected for biophysical characteristics favorable to manufacturing and development. The selection process included multiple parameters, e.g., assessment of protein expression yield, thermal stability and aggregation propensity in fragment antigen binding (Fab) and IgG1 formats, and relative Fab display rate on phage. The framework regions are fixed and the diversified CDRs were designed based on a systematic analysis of a large set of rearranged human antibody sequences. Care was taken to minimize the occurrence of potential posttranslational modification sites within the CDRs. Phage selection was performed against various antigens and unique antibodies with excellent biophysical properties were isolated. Our results confirm that quality can be built into an antibody library by prudent selection of unmodified, fully human VH/VL pairs as scaffolds. PMID:23571156

  2. Selected correlates of white nursing students' attitudes toward black American patients.

    PubMed

    Morgan, B S

    1983-01-01

    Multivariate analyses were used to examine the relationships between white nursing students' attitudes toward black American patients and variables selected within a theoretical framework of prejudice which included socialization factors and personality-based factors. The variables selected were: authoritarianism and self-esteem (personality-based factors), parents' attitudes toward black Americans, peer attitudes toward black Americans, interracial contact and socioeconomic status (socialization factors). The study also examined the differences in the relationship among white nursing students enrolled in baccalaureate degree, associate degree and diploma nursing programs. Data were collected from 201 senior nursing students enrolled in the three types of nursing programs in Rhode Island during the late fall and winter of 1979-1980. Although baccalaureate degree, associate degree and diploma students were similar in terms of peer attitudes toward black Americans, fathers' attitudes toward black Americans, self-esteem and attitudes toward black American patients, they were significantly different in terms of age, socioeconomic status, mothers' attitudes toward black Americans, interracial contact and authoritarianism. The major findings of this study indicate that the socialization explanation of prejudice is more significant than the personality-based explanation. The variables socioeconomic status, interracial contact and peer attitudes toward black Americans (all socialization variables) accounted for 22.0% of the total variance in attitudes toward black American patients for the total sample of nursing students. However, this relationship was not generalizable across the three different types of nursing programs.

  3. A Model for Investigating Predictive Validity at Highly Selective Institutions.

    ERIC Educational Resources Information Center

    Gross, Alan L.; And Others

    A statistical model for investigating predictive validity at highly selective institutions is described. When the selection ratio is small, one must typically deal with a data set containing relatively large amounts of missing data on both criterion and predictor variables. Standard statistical approaches are based on the strong assumption that…

  4. A review of selection-based tests of abiotic surrogates for species representation.

    PubMed

    Beier, Paul; Sutcliffe, Patricia; Hjort, Jan; Faith, Daniel P; Pressey, Robert L; Albuquerque, Fabio

    2015-06-01

    Because conservation planners typically lack data on where species occur, environmental surrogates (including geophysical settings and climate types) have been used to prioritize sites within a planning area. We reviewed 622 evaluations of the effectiveness of abiotic surrogates in representing species in 19 study areas. Sites selected using abiotic surrogates represented more species than an equal number of randomly selected sites in 43% of tests (55% for plants) and on average improved on random selection of sites by about 8% (21% for plants). Environmental diversity (ED) (42% median improvement on random selection) and biotically informed clusters showed promising results and merit additional testing. We suggest 4 ways to improve performance of abiotic surrogates. First, analysts should consider a broad spectrum of candidate variables to define surrogates, including rarely used variables related to geographic separation, distance from coast, hydrology, and within-site abiotic diversity. Second, abiotic surrogates should be defined at fine thematic resolution. Third, sites (the landscape units prioritized within a planning area) should be small enough to ensure that surrogates reflect species' environments and to produce prioritizations that match the spatial resolution of conservation decisions. Fourth, if species inventories are available for some planning units, planners should define surrogates based on the abiotic variables that most influence species turnover in the planning area. Although species inventories increase the cost of using abiotic surrogates, a modest number of inventories could provide the data needed to select variables and evaluate surrogates. Additional tests of nonclimate abiotic surrogates are needed to evaluate the utility of conserving nature's stage as a strategy for conservation planning in the face of climate change. © 2015 Society for Conservation Biology.

  5. [Winter wheat yield gap between field blocks based on comparative performance analysis].

    PubMed

    Chen, Jian; Wang, Zhong-Yi; Li, Liang-Tao; Zhang, Ke-Feng; Yu, Zhen-Rong

    2008-09-01

    Based on a two-year household survey data set, the yield gap of winter wheat in Quzhou County of Hebei Province, China in 2003-2004 was studied through comparative performance analysis (CPA). The results showed that there was a large yield gap (from 4.2 to 7.9 t·hm⁻²) between field blocks, with a variation coefficient of 0.14. Through stepwise forward linear multiple regression, it was found that the yield model with 8 selected variables could explain 63% of the variability of winter wheat yield. Among the variables selected, soil salinity, soil fertility, and irrigation water quality were the most important limiting factors, accounting for 52% of the total yield gap. Crop variety was another important limiting factor, accounting for 14%; while planting date, fertilizer type, disease and pest, and water stress accounted for 7%, 14%, 10%, and 3%, respectively. Therefore, besides soil and climate conditions, management practices accounted for the majority of yield variability in Quzhou County, suggesting that the yield gap could be reduced significantly through optimum field management.
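    A minimal sketch of stepwise forward selection of the kind used here, on synthetic data: each step adds the candidate variable that most improves R², stopping when the gain is negligible. The variable count, coefficients, and stopping threshold below are assumptions for illustration, not the study's values.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic field-block data: 6 candidate variables, of which only
# the first three actually drive yield.
n = 120
X = rng.normal(size=(n, 6))
yield_t = 6.0 + 0.9 * X[:, 0] + 0.6 * X[:, 1] + 0.4 * X[:, 2] + rng.normal(0, 0.3, n)

def r2(Xs, y):
    """R-squared of an OLS fit with intercept on the given columns."""
    A = np.column_stack([np.ones(len(y)), Xs])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1 - resid.var() / y.var()

selected, remaining, best_r2 = [], list(range(X.shape[1])), 0.0
while remaining:
    # Score each remaining candidate by the R² it would achieve.
    scores = {j: r2(X[:, selected + [j]], yield_t) for j in remaining}
    j_best = max(scores, key=scores.get)
    if scores[j_best] - best_r2 < 0.05:   # stop on negligible gain
        break
    selected.append(j_best)
    remaining.remove(j_best)
    best_r2 = scores[j_best]

print(selected, round(best_r2, 2))
```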

  6. Classification of debtor credit status and determination amount of credit risk by using linier discriminant function

    NASA Astrophysics Data System (ADS)

    Aidi, Muhammad Nur; Sari, Resty Indah

    2012-05-01

    A credit decision made by a bank or another creditor always carries a risk, known as credit risk. Credit risk is an investor's risk of loss arising from a borrower who does not make payments as promised. Substantial credit risk can lead to losses for both the bank and the debtor. Minimizing this problem requires further study to identify potential new customers before a credit decision is made. Debtors can be identified using various analytical approaches, one of which is discriminant analysis. In this study, discriminant analysis is used to classify debtors as having good or bad credit. Before the discriminant function is built, explanatory variables must be selected; the purpose of this selection is to choose the variables that discriminate between the groups maximally. Variable selection in this study uses different tests: a chi-square test of proportions for categorical variables and stepwise discriminant analysis for numeric variables. The result of this study is two discriminant functions that can identify new debtors. The selected variables that maximally discriminate between the two groups of debtors are status of existing checking account, credit history, credit amount, installment rate as a percentage of disposable income, sex, age in years, other installment plans, and number of people liable to provide maintenance. The classification achieves a reasonably good accuracy rate of 74.70%. Debtor classification using discriminant analysis carries a fairly small risk level, ranging between 14.992% and 17.608%. Based on this credit risk rate, discriminant analysis can be used effectively for the classification of credit status.
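    Fisher's linear discriminant, the core of the analysis described, can be sketched in a few lines. The debtor features and group separation below are synthetic stand-ins for the study's variables, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical standardized debtor features (e.g. credit amount, age,
# installment rate); labels: 0 = good credit, 1 = bad credit.
n = 300
good = rng.normal(loc=[-0.5, -0.3, -0.4], size=(n, 3))
bad = rng.normal(loc=[0.5, 0.3, 0.4], size=(n, 3))
X = np.vstack([good, bad])
y = np.array([0] * n + [1] * n)

# Fisher's linear discriminant: w = Sw^{-1} (m1 - m0),
# where Sw is the pooled within-class scatter matrix.
m0, m1 = good.mean(axis=0), bad.mean(axis=0)
Sw = np.cov(good, rowvar=False) + np.cov(bad, rowvar=False)
w = np.linalg.solve(Sw, m1 - m0)

# Classify by projecting onto w and thresholding at the midpoint
# of the two projected class means.
threshold = ((good @ w).mean() + (bad @ w).mean()) / 2
pred = (X @ w > threshold).astype(int)
accuracy = (pred == y).mean()
print(round(accuracy, 3))
```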

  7. Variability aware compact model characterization for statistical circuit design optimization

    NASA Astrophysics Data System (ADS)

    Qiao, Ying; Qian, Kun; Spanos, Costas J.

    2012-03-01

    Variability modeling at the compact transistor model level can enable statistically optimized designs in view of limitations imposed by the fabrication technology. In this work we propose an efficient variability-aware compact model characterization methodology based on the linear propagation of variance. Hierarchical spatial variability patterns of selected compact model parameters are directly calculated from transistor array test structures. This methodology has been implemented and tested using transistor I-V measurements and the EKV-EPFL compact model. Calculation results compare well to full-wafer direct model parameter extractions. Further studies are done on the proper selection of both compact model parameters and electrical measurement metrics used in the method.
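    Linear propagation of variance, the core of this method, takes the form Var(y) ≈ J C Jᵀ for an output y = f(p), parameter covariance C, and Jacobian J evaluated at the nominal parameters. A minimal sketch with a hypothetical square-law transistor model (not the EKV model; all numbers are illustrative) and a Monte Carlo cross-check:

```python
import numpy as np

# Hypothetical square-law model: I = K * (Vgs - Vth)^2,
# with variability in K and Vth.
K, Vth, Vgs = 2e-4, 0.4, 1.0
C = np.diag([(0.05 * K) ** 2, 0.01 ** 2])   # var(K), var(Vth)

# Jacobian of I with respect to (K, Vth) at the nominal point.
J = np.array([(Vgs - Vth) ** 2, -2 * K * (Vgs - Vth)])

var_I = J @ C @ J       # linear propagation of variance
sigma_I = np.sqrt(var_I)

# Monte Carlo over the same parameter distribution for comparison.
rng = np.random.default_rng(2)
Ks = K + rng.normal(0, 0.05 * K, 100_000)
Vths = Vth + rng.normal(0, 0.01, 100_000)
mc_sigma = (Ks * (Vgs - Vths) ** 2).std()
print(sigma_I, mc_sigma)
```

    Because the model is only mildly nonlinear over these parameter spreads, the linearized sigma agrees closely with the Monte Carlo estimate, which is what makes the approach cheap enough for full-wafer characterization.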

  8. Landscape effects on mallard habitat selection at multiple spatial scales during the non-breeding period

    USGS Publications Warehouse

    Beatty, William S.; Webb, Elisabeth B.; Kesler, Dylan C.; Raedeke, Andrew H.; Naylor, Luke W.; Humburg, Dale D.

    2014-01-01

    Previous studies that evaluated effects of landscape-scale habitat heterogeneity on migratory waterbird distributions were spatially limited and temporally restricted to one major life-history phase. However, effects of landscape-scale habitat heterogeneity on long-distance migratory waterbirds can be studied across the annual cycle using new technologies, including global positioning system satellite transmitters. We used Bayesian discrete choice models to examine the influence of local habitats and landscape composition on habitat selection by a generalist dabbling duck, the mallard (Anas platyrhynchos), in the midcontinent of North America during the non-breeding period. Using a previously published empirical movement metric, we separated the non-breeding period into three seasons, including autumn migration, winter, and spring migration. We defined spatial scales based on movement patterns such that movements >0.25 and <30.00 km were classified as local scale and movements >30.00 km were classified as relocation scale. Habitat selection at the local scale was generally influenced by local and landscape-level variables across all seasons. Variables in top models at the local scale included proximities to cropland, emergent wetland, open water, and woody wetland. Similarly, variables associated with area of cropland, emergent wetland, open water, and woody wetland were also included at the local scale. At the relocation scale, mallards selected resource units based on more generalized variables, including proximity to wetlands and total wetland area. Our results emphasize the role of landscape composition in waterbird habitat selection and provide further support for local wetland landscapes to be considered functional units of waterbird conservation and management.

  9. [Research on fast detecting tomato seedlings nitrogen content based on NIR characteristic spectrum selection].

    PubMed

    Wu, Jing-zhu; Wang, Feng-zhu; Wang, Li-li; Zhang, Xiao-chao; Mao, Wen-hua

    2015-01-01

    In order to improve the accuracy and robustness of detecting tomato seedling nitrogen content based on near-infrared (NIR) spectroscopy, four characteristic spectrum selection methods were studied in the present paper, i.e. competitive adaptive reweighted sampling (CARS), Monte Carlo uninformative variables elimination (MCUVE), backward interval partial least squares (BiPLS) and synergy interval partial least squares (SiPLS). In total, 60 tomato seedlings were cultivated at 10 different nitrogen-treatment levels (urea concentration from 0 to 120 mg·L-1), with 6 samples at each level, covering varying degrees of excess nitrogen, moderate nitrogen, nitrogen deficiency and no-nitrogen status. Leaves from each sample were collected and scanned by near-infrared spectroscopy from 12 500 to 3 600 cm-1. Quantitative models based on the above four methods were established. According to the experimental results, the calibration models based on the CARS and MCUVE selection methods show better calibration performance than those based on BiPLS and SiPLS, but their prediction ability is much lower than that of the latter. Among them, the model built by BiPLS has the best prediction performance: the correlation coefficient (r), root mean square error of prediction (RMSEP) and ratio of performance to standard deviation (RPD) are 0.9527, 0.1183 and 3.291, respectively. Therefore, NIR technology combined with characteristic spectrum selection methods can improve model performance, but the selection methods are not universal. Models built on single-wavelength variable selection are more sensitive and thus better suited to uniform samples, whereas models built on wavelength-interval selection have stronger anti-interference ability and are better suited to heterogeneous samples with poor reproducibility. Therefore, characteristic spectrum selection improves model building only when combined with consideration of the sample state and the model indexes.

  10. Improved Sparse Multi-Class SVM and Its Application for Gene Selection in Cancer Classification

    PubMed Central

    Huang, Lingkang; Zhang, Hao Helen; Zeng, Zhao-Bang; Bushel, Pierre R.

    2013-01-01

    Background Microarray techniques provide promising tools for cancer diagnosis using gene expression profiles. However, molecular diagnosis based on high-throughput platforms presents great challenges due to the overwhelming number of variables versus the small sample size and the complex nature of multi-type tumors. Support vector machines (SVMs) have shown superior performance in cancer classification due to their ability to handle high dimensional low sample size data. The multi-class SVM algorithm of Crammer and Singer provides a natural framework for multi-class learning. Despite its effective performance, the procedure utilizes all variables without selection. In this paper, we propose to improve the procedure by imposing shrinkage penalties in learning to enforce solution sparsity. Results The original multi-class SVM of Crammer and Singer is effective for multi-class classification but does not conduct variable selection. We improved the method by introducing soft-thresholding type penalties to incorporate variable selection into multi-class classification for high dimensional data. The new methods were applied to simulated data and two cancer gene expression data sets. The results demonstrate that the new methods can select a small number of genes for building accurate multi-class classification rules. Furthermore, the important genes selected by the methods overlap significantly, suggesting general agreement among different variable selection schemes. Conclusions High accuracy and sparsity make the new methods attractive for cancer diagnostics with gene expression data and defining targets of therapeutic intervention. Availability: The source MATLAB code is available from http://math.arizona.edu/~hzhang/software.html. PMID:23966761
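    The soft-thresholding penalty mentioned above is the proximal operator of the L1 norm: it shrinks every coefficient toward zero and sets small ones exactly to zero, which is what produces gene selection. A minimal, generic sketch (not the authors' MATLAB implementation):

```python
import numpy as np

def soft_threshold(w, lam):
    """S(w, lam) = sign(w) * max(|w| - lam, 0): the proximal map of the
    L1 penalty, which zeroes out weakly informative coefficients."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

# Hypothetical gene coefficients from one class of a multi-class SVM
w = np.array([0.8, -0.05, 0.3, -0.6, 0.02])
sparse_w = soft_threshold(w, 0.1)
n_selected = int(np.count_nonzero(sparse_w))  # genes kept by the penalty
```

    In the paper's setting this shrinkage is applied inside the multi-class SVM optimization rather than as a one-shot post-processing step.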

  11. Distributed Space Mission Design for Earth Observation Using Model-Based Performance Evaluation

    NASA Technical Reports Server (NTRS)

    Nag, Sreeja; LeMoigne-Stewart, Jacqueline; Cervantes, Ben; DeWeck, Oliver

    2015-01-01

    Distributed Space Missions (DSMs) are gaining momentum in their application to earth observation missions owing to their unique ability to increase observation sampling in multiple dimensions. DSM design is a complex problem with many design variables, multiple objectives determining performance and cost, and emergent, often unexpected, behaviors. There are very few open-access tools available to explore the tradespace of variables, minimize cost and maximize performance for pre-defined science goals, and thereby select the optimal design. This paper presents a software tool that can generate multiple DSM architectures based on pre-defined design variable ranges and size those architectures in terms of predefined science and cost metrics. The tool will help a user select Pareto-optimal DSM designs based on design of experiments techniques. The tool will be applied to some earth observation examples to demonstrate its applicability in making some key decisions between different performance metrics and cost metrics early in the design lifecycle.
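    The Pareto-optimal selection step described above amounts to filtering out dominated designs: a design is dominated if some other design is at least as cheap and at least as performant, and strictly better in one of the two. A generic sketch of that filter (not the tool's actual code; the cost/performance numbers are made up):

```python
def pareto_front(cost, perf):
    """Indices of non-dominated designs when minimizing cost and
    maximizing performance."""
    idx = []
    for i in range(len(cost)):
        dominated = any(
            cost[j] <= cost[i] and perf[j] >= perf[i]
            and (cost[j] < cost[i] or perf[j] > perf[i])
            for j in range(len(cost)))
        if not dominated:
            idx.append(i)
    return idx

# Five hypothetical DSM architectures: (cost, science performance)
cost = [3.0, 1.0, 2.0, 4.0, 2.0]
perf = [9.0, 4.0, 7.0, 8.5, 6.0]
front = pareto_front(cost, perf)
```

    Designs on the front represent the genuine trade-offs a user must weigh; everything else can be discarded before detailed sizing.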

  12. Personality, Demographics, and Acculturation in North American Refugees.

    ERIC Educational Resources Information Center

    Smither, Robert; Rodriquez-Giegling, Marta

    This study predicts willingness of refugees to acculturate to North American society based on selected demographic and psychological variables. The hypothesis is that most previous research on refugee adaptation has overemphasized sociological variables such as age, time in the country, and level of education and underemphasized psychological…

  13. Development of variable LRFD φ factors for deep foundation design due to site variability.

    DOT National Transportation Integrated Search

    2012-04-01

    The current design guidelines of Load and Resistance Factor Design (LRFD) specify constant values for deep foundation design, based on the analytical method selected and the degree of redundancy of the pier. However, investigation of multiple sites in ...

  14. The nature of instructional effects in color constancy.

    PubMed

    Radonjić, Ana; Brainard, David H

    2016-06-01

    The instructions subjects receive can have a large effect on experimentally measured color constancy, but the nature of these effects and how their existence should inform our understanding of color perception remains unclear. We used a factorial design to measure how instructional effects on constancy vary with experimental task and stimulus set. In each of 2 experiments, we employed both a classic adjustment-based asymmetric matching task and a novel color selection task. Four groups of naive subjects were instructed to make adjustments/selections based on (a) color (neutral instructions); (b) the light reaching the eye (physical spectrum instructions); (c) the actual surface reflectance of an object (objective reflectance instructions); or (d) the apparent surface reflectance of an object (apparent reflectance instructions). Across the 2 experiments we varied the naturalness of the stimuli. We find clear interactions between instructions, task, and stimuli. With simplified stimuli (Experiment 1), instructional effects were large and the data revealed 2 instruction-dependent patterns. In one (neutral and physical spectrum instructions), constancy was low, intersubject variability was also low, and adjustment-based and selection-based constancy were in agreement. In the other (reflectance instructions), constancy was high, intersubject variability was large, adjustment-based constancy deviated from selection-based constancy, and for some subjects selection-based constancy increased across sessions. Similar patterns held for naturalistic stimuli (Experiment 2), although instructional effects were smaller. We interpret these 2 patterns as signatures of distinct task strategies: one is perceptual, with judgments based primarily on the perceptual representation of color; the other involves explicit instruction-driven reasoning. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  15. Non-destructive technique for determining the viability of soybean (Glycine max) seeds using FT-NIR spectroscopy.

    PubMed

    Kusumaningrum, Dewi; Lee, Hoonsoo; Lohumi, Santosh; Mo, Changyeun; Kim, Moon S; Cho, Byoung-Kwan

    2018-03-01

    The viability of seeds is important for determining their quality. A high-quality seed is one with a high capability of germination, which is necessary to ensure high productivity. Hence, developing technology for the detection of seed viability is a high priority in agriculture. Fourier transform near-infrared (FT-NIR) spectroscopy is one of the most popular techniques among vibrational spectroscopies. This study aims to use FT-NIR spectroscopy to determine the viability of soybean seeds. Viable seeds and artificially aged (non-viable) seeds were used in this research. The FT-NIR spectra of soybean seeds were collected and analysed using a partial least-squares discriminant analysis (PLS-DA) to classify viable and non-viable soybean seeds. Moreover, the variable importance in projection (VIP) method for variable selection combined with the PLS-DA was employed. The most effective wavelengths were selected by the VIP method, which selected 146 optimal variables from the full set of 1557 variables. The results demonstrated that the FT-NIR spectral analysis with the PLS-DA method, using either all variables or the selected variables, showed good performance based on the high value of prediction accuracy for soybean viability, with an accuracy close to 100%. Hence, FT-NIR techniques with a chemometric analysis have the potential for rapidly measuring soybean seed viability. © 2017 Society of Chemical Industry.
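    The VIP selection used with PLS-DA can be illustrated with a one-component PLS fit, where the usual VIP formula collapses to √p·|w_j| for normalized weight vector w (so the VIP scores always satisfy Σ VIP² = p). This is a hedged sketch, not the authors' chemometrics code, and the paper's model uses multiple latent variables:

```python
import numpy as np

def vip_scores(X, y):
    """VIP scores from a one-component NIPALS-style PLS fit."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    w = X.T @ y
    w = w / np.linalg.norm(w)  # PLS weight vector, ||w|| = 1
    p = X.shape[1]
    # With one component, VIP_j = sqrt(p) * |w_j|, so sum(VIP^2) == p.
    return np.sqrt(p) * np.abs(w)

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 6))                          # mock spectral variables
y = 2 * X[:, 0] - 2 * X[:, 3] + 0.1 * rng.normal(size=40)
vip = vip_scores(X, y)
selected = np.where(vip > 1.0)[0]  # common "VIP > 1" selection rule
```

    The "VIP > 1" cut-off is the conventional default; the 146-of-1557 selection in the study corresponds to this kind of thresholding on the multi-component scores.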

  16. Selection of specific protein binders for pre-defined targets from an optimized library of artificial helicoidal repeat proteins (alphaRep).

    PubMed

    Guellouz, Asma; Valerio-Lepiniec, Marie; Urvoas, Agathe; Chevrel, Anne; Graille, Marc; Fourati-Kammoun, Zaineb; Desmadril, Michel; van Tilbeurgh, Herman; Minard, Philippe

    2013-01-01

    We previously designed a new family of artificial proteins named αRep based on a subgroup of thermostable helicoidal HEAT-like repeats. We have now assembled a large optimized αRep library. In this library, the side chains at each variable position are not fully randomized but instead encoded by a distribution of codons based on the natural frequency of side chains of the natural repeats family. The library construction is based on a polymerization of micro-genes and therefore results in a distribution of proteins with a variable number of repeats. We improved the library construction process using a "filtration" procedure to retain only fully coding modules that were recombined to recreate sequence diversity. The final library named Lib2.1 contains 1.7×10⁹ independent clones. Here, we used phage display to select, from the previously described library or from the new library, new specific αRep proteins binding to four different non-related predefined protein targets. Specific binders were selected in each case. The results show that binders with various sizes are selected, including relatively long sequences with up to 7 repeats. ITC-measured affinities vary, with Kd values ranging from micromolar to nanomolar ranges. The formation of complexes is associated with a significant thermal stabilization of the bound target protein. The crystal structures of two complexes between αRep and their cognate targets were solved and show that the new interfaces are established by the variable surfaces of the repeated modules, as well as by the variable N-cap residues. These results suggest that the αRep library is a new and versatile source of tight and specific binding proteins with favorable biophysical properties.

  17. Selection of Specific Protein Binders for Pre-Defined Targets from an Optimized Library of Artificial Helicoidal Repeat Proteins (alphaRep)

    PubMed Central

    Chevrel, Anne; Graille, Marc; Fourati-Kammoun, Zaineb; Desmadril, Michel; van Tilbeurgh, Herman; Minard, Philippe

    2013-01-01

    We previously designed a new family of artificial proteins named αRep based on a subgroup of thermostable helicoidal HEAT-like repeats. We have now assembled a large optimized αRep library. In this library, the side chains at each variable position are not fully randomized but instead encoded by a distribution of codons based on the natural frequency of side chains of the natural repeats family. The library construction is based on a polymerization of micro-genes and therefore results in a distribution of proteins with a variable number of repeats. We improved the library construction process using a “filtration” procedure to retain only fully coding modules that were recombined to recreate sequence diversity. The final library named Lib2.1 contains 1.7×10⁹ independent clones. Here, we used phage display to select, from the previously described library or from the new library, new specific αRep proteins binding to four different non-related predefined protein targets. Specific binders were selected in each case. The results show that binders with various sizes are selected, including relatively long sequences with up to 7 repeats. ITC-measured affinities vary, with Kd values ranging from micromolar to nanomolar ranges. The formation of complexes is associated with a significant thermal stabilization of the bound target protein. The crystal structures of two complexes between αRep and their cognate targets were solved and show that the new interfaces are established by the variable surfaces of the repeated modules, as well as by the variable N-cap residues. These results suggest that the αRep library is a new and versatile source of tight and specific binding proteins with favorable biophysical properties. PMID:24014183

  18. Automated retrieval of forest structure variables based on multi-scale texture analysis of VHR satellite imagery

    NASA Astrophysics Data System (ADS)

    Beguet, Benoit; Guyon, Dominique; Boukir, Samia; Chehata, Nesrine

    2014-10-01

    The main goal of this study is to design a method to describe the structure of forest stands from Very High Resolution satellite imagery, relying on some typical variables such as crown diameter, tree height, trunk diameter, tree density and tree spacing. The emphasis is placed on automating the identification of the most relevant image features for the forest structure retrieval task, exploiting both spectral and spatial information. Our approach is based on linear regressions between the forest structure variables to be estimated and various spectral and Haralick's texture features. The main drawback of this well-known texture representation is its underlying parameters, which are extremely difficult to set due to the spatial complexity of the forest structure. To tackle this major issue, an automated feature selection process is proposed, based on statistical modeling and exploring a wide range of parameter values. It provides texture measures of diverse spatial parameters, hence implicitly inducing a multi-scale texture analysis. A new feature selection technique, which we call Random PRiF, is proposed. It relies on random sampling in feature space and carefully addresses the multicollinearity issue in multiple linear regression while ensuring accurate prediction of forest variables. Our automated forest variable estimation scheme was tested on Quickbird and Pléiades panchromatic and multispectral images, acquired at different periods on the maritime pine stands of two sites in South-Western France. It outperforms two well-established variable subset selection techniques. It has been successfully applied to identify the best texture features in modeling the five considered forest structure variables. 
The RMSE of all predicted forest variables is improved by combining multispectral and panchromatic texture features, with various parameterizations, highlighting the potential of a multi-resolution approach for retrieving forest structure variables from VHR satellite images. Thus an average prediction error of ˜ 1.1 m is expected on crown diameter, ˜ 0.9 m on tree spacing, ˜ 3 m on height and ˜ 0.06 m on diameter at breast height.
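    The combination of random sampling in feature space with a multicollinearity guard can be sketched as follows. This is a hypothetical reconstruction of the general idea only: the actual Random PRiF scoring, sampling distribution and stopping rules differ, and the condition-number guard here stands in for whatever collinearity diagnostic the authors use.

```python
import numpy as np

def random_subset_selection(X, y, k=2, n_iter=200, max_cond=30.0, seed=0):
    """Draw random feature subsets, discard subsets whose design matrix
    is ill-conditioned (multicollinearity guard), and keep the subset
    with the lowest residual sum of squares."""
    rng = np.random.default_rng(seed)
    best, best_err = None, np.inf
    for _ in range(n_iter):
        idx = rng.choice(X.shape[1], size=k, replace=False)
        Xs = np.column_stack([np.ones(len(X)), X[:, idx]])
        if np.linalg.cond(Xs) > max_cond:  # reject collinear subsets
            continue
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        err = np.sum((y - Xs @ beta) ** 2)
        if err < best_err:
            best, best_err = sorted(idx.tolist()), err
    return best

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 8))                      # mock texture features
X[:, 5] = X[:, 1] + 1e-6 * rng.normal(size=60)    # near-duplicate feature
y = 3 * X[:, 1] - X[:, 4] + 0.05 * rng.normal(size=60)
best = random_subset_selection(X, y, k=2)
```

    Note the guard prevents the near-duplicate pair {1, 5} from ever being kept together, which is the multicollinearity failure mode the abstract refers to.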

  19. Comparison of climate envelope models developed using expert-selected variables versus statistical selection

    USGS Publications Warehouse

    Brandt, Laura A.; Benscoter, Allison; Harvey, Rebecca G.; Speroterra, Carolina; Bucklin, David N.; Romañach, Stephanie; Watling, James I.; Mazzotti, Frank J.

    2017-01-01

    Climate envelope models are widely used to describe potential future distribution of species under different climate change scenarios. It is broadly recognized that there are both strengths and limitations to using climate envelope models and that outcomes are sensitive to initial assumptions, inputs, and modeling methods. Selection of predictor variables, a central step in modeling, is one of the areas where different techniques can yield varying results. Selection of climate variables to use as predictors is often done using statistical approaches that develop correlations between occurrences and climate data. These approaches have received criticism in that they rely on the statistical properties of the data rather than directly incorporating biological information about species responses to temperature and precipitation. We evaluated and compared models and prediction maps for 15 threatened or endangered species in Florida based on two variable selection techniques: expert opinion and a statistical method. We compared model performance between these two approaches for contemporary predictions, and the spatial correlation, spatial overlap and area predicted for contemporary and future climate predictions. In general, experts identified more variables as being important than the statistical method and there was low overlap in the variable sets (<40%) between the two methods. Despite these differences in variable sets (expert versus statistical), models had high performance metrics (>0.9 for area under the curve (AUC) and >0.7 for true skill statistic (TSS)). Spatial overlap, which compares the spatial configuration between maps constructed using the different variable selection techniques, was only moderate overall (about 60%), with a great deal of variability across species. Difference in spatial overlap was even greater under future climate projections, indicating additional divergence of model outputs from different variable selection techniques. 
Our work is in agreement with other studies which have found that for broad-scale species distribution modeling, using statistical methods of variable selection is a useful first step, especially when there is a need to model a large number of species or expert knowledge of the species is limited. Expert input can then be used to refine models that seem unrealistic or for species that experts believe are particularly sensitive to change. It also emphasizes the importance of using multiple models to reduce uncertainty and improve map outputs for conservation planning. Where outputs overlap or show the same direction of change there is greater certainty in the predictions. Areas of disagreement can be used for learning by asking why the models do not agree, and may highlight areas where additional on-the-ground data collection could improve the models.
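    The spatial overlap comparison described above can be reduced to a simple cell-wise agreement rate between two binary suitability maps. This is a simple stand-in for the study's metric (which may weight or threshold differently); the two small example maps are made up:

```python
import numpy as np

def spatial_overlap(map_a, map_b):
    """Proportion of grid cells on which two binary suitability maps
    agree (both suitable or both unsuitable)."""
    a = np.asarray(map_a, dtype=bool)
    b = np.asarray(map_b, dtype=bool)
    return float((a == b).mean())

expert_map = np.array([[1, 1, 0],
                       [0, 1, 0]])   # hypothetical expert-variable map
stat_map = np.array([[1, 0, 0],
                     [0, 1, 1]])     # hypothetical statistical-variable map
overlap = spatial_overlap(expert_map, stat_map)  # 4 of 6 cells agree
```

    Cells where the two maps disagree are exactly the "areas of disagreement" the abstract suggests using to target further data collection.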

  20. Comparing Selections of Environmental Variables for Ecological Studies: A Focus on Terrain Attributes.

    PubMed

    Lecours, Vincent; Brown, Craig J; Devillers, Rodolphe; Lucieer, Vanessa L; Edinger, Evan N

    2016-01-01

    Selecting appropriate environmental variables is a key step in ecology. Terrain attributes (e.g. slope, rugosity) are routinely used as abiotic surrogates of species distribution and to produce habitat maps that can be used in decision-making for conservation or management. Selecting appropriate terrain attributes for ecological studies may be a challenging process that can lead users to select a subjective, potentially sub-optimal combination of attributes for their applications. The objective of this paper is to assess the impacts of subjectively selecting terrain attributes for ecological applications by comparing the performance of different combinations of terrain attributes in the production of habitat maps and species distribution models. Seven different selections of terrain attributes, alone or in combination with other environmental variables, were used to map benthic habitats of German Bank (off Nova Scotia, Canada). In total, 29 maps of potential habitats were produced based on unsupervised classifications of biophysical characteristics of German Bank, and 29 species distribution models of sea scallops were generated using MaxEnt. The performances of the 58 maps were quantified and compared to evaluate the effectiveness of the various combinations of environmental variables. One of the combinations of terrain attributes, recommended in a related study and comprising a measure of relative position, slope, two measures of orientation, topographic mean and a measure of rugosity, yielded better results than the other selections for both methodologies, confirming that together these attributes best describe terrain properties. Important differences in performance (up to 47% in accuracy measurement) and spatial outputs (up to 58% in spatial distribution of habitats) highlighted the importance of carefully selecting variables for ecological applications. 
This paper demonstrates that making a subjective choice of variables may reduce map accuracy and produce maps that do not adequately represent habitats and species distributions, thus having important implications when these maps are used for decision-making.

  1. Improved Variable Selection Algorithm Using a LASSO-Type Penalty, with an Application to Assessing Hepatitis B Infection Relevant Factors in Community Residents

    PubMed Central

    Guo, Pi; Zeng, Fangfang; Hu, Xiaomin; Zhang, Dingmei; Zhu, Shuming; Deng, Yu; Hao, Yuantao

    2015-01-01

    Objectives In epidemiological studies, it is important to identify independent associations between collective exposures and a health outcome. The current stepwise selection technique ignores stochastic errors and suffers from a lack of stability. The alternative LASSO-penalized regression model can be applied to detect significant predictors from a pool of candidate variables. However, this technique is prone to false positives and tends to create excessive biases. It remains challenging to develop robust variable selection methods and enhance predictability. Material and methods Two improved algorithms, denoted the two-stage hybrid and bootstrap ranking procedures, both using a LASSO-type penalty, were developed for epidemiological association analysis. The performance of the proposed procedures and other methods, including conventional LASSO, Bolasso, stepwise and stability selection models, was evaluated using intensive simulation. In addition, methods were compared using an empirical analysis based on large-scale survey data of hepatitis B infection-relevant factors among Guangdong residents. Results The proposed procedures produced comparable or less biased selection results when compared to conventional variable selection models. Overall, the two newly proposed procedures were stable across the various simulation scenarios, demonstrating higher power and a lower false positive rate in variable selection than the compared methods. In empirical analysis, the proposed procedures, yielding a sparse set of hepatitis B infection-relevant factors, gave the best predictive performance and were able to select a more stringent set of factors. The individual history of hepatitis B vaccination and the family and individual history of hepatitis B infection were associated with hepatitis B infection in the studied residents according to the proposed procedures. 
Conclusions The newly proposed procedures improve the identification of significant variables and enable us to derive a new insight into epidemiological association analysis. PMID:26214802
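    The bootstrap ranking idea above can be sketched as: refit a LASSO on many bootstrap resamples and rank variables by how often they are selected. The coordinate-descent LASSO below is a plain textbook implementation, and the penalty level, resample count and data are illustrative assumptions, not the paper's tuned procedure:

```python
import numpy as np

def lasso_cd(X, y, lam, n_sweeps=200):
    """Plain coordinate-descent LASSO (columns assumed roughly unit-scale)."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_sweeps):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]   # partial residual
            rho = X[:, j] @ r / n
            z = (X[:, j] @ X[:, j]) / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / z
    return beta

def bootstrap_selection_freq(X, y, lam=0.1, B=30, seed=0):
    """Fraction of bootstrap resamples in which each variable is selected."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(X.shape[1])
    for _ in range(B):
        idx = rng.integers(0, len(X), len(X))      # bootstrap resample
        counts += lasso_cd(X[idx], y[idx], lam) != 0
    return counts / B

rng = np.random.default_rng(3)
X = rng.normal(size=(80, 6))
y = 2 * X[:, 0] - 1.5 * X[:, 2] + 0.1 * rng.normal(size=80)
freq = bootstrap_selection_freq(X, y)
```

    Variables with selection frequency near 1 across resamples are the stable associations; thresholding this frequency is what makes the procedure more robust than a single LASSO fit.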

  2. Variable selection based on clustering analysis for improvement of polyphenols prediction in green tea using synchronous fluorescence spectra

    NASA Astrophysics Data System (ADS)

    Shan, Jiajia; Wang, Xue; Zhou, Hao; Han, Shuqing; Riza, Dimas Firmanda Al; Kondo, Naoshi

    2018-04-01

    Synchronous fluorescence spectra, combined with multivariate analysis, were used to predict flavonoid content in green tea rapidly and nondestructively. This paper presents a new and efficient spectral interval selection method called clustering-based partial least squares (CL-PLS), which selects informative wavelengths by combining a clustering concept with partial least squares (PLS) to improve model performance on synchronous fluorescence spectra. The fluorescence spectra of tea samples were obtained, k-means and Kohonen self-organizing map clustering algorithms were applied to cluster the full spectra into several clusters, and a sub-PLS regression model was developed on each cluster. Finally, CL-PLS models consisting of gradually selected clusters were built. The correlation coefficient (R) was used to evaluate the prediction performance of the PLS models. In addition, variable influence on projection partial least squares (VIP-PLS), selectivity ratio partial least squares (SR-PLS), interval partial least squares (iPLS) models and the full-spectrum PLS model were investigated and the results were compared. The results showed that CL-PLS gave the best result for flavonoid prediction using synchronous fluorescence spectra.

  3. Variable selection based on clustering analysis for improvement of polyphenols prediction in green tea using synchronous fluorescence spectra.

    PubMed

    Shan, Jiajia; Wang, Xue; Zhou, Hao; Han, Shuqing; Riza, Dimas Firmanda Al; Kondo, Naoshi

    2018-03-13

    Synchronous fluorescence spectra, combined with multivariate analysis, were used to predict flavonoid content in green tea rapidly and nondestructively. This paper presents a new and efficient spectral interval selection method called clustering-based partial least squares (CL-PLS), which selects informative wavelengths by combining a clustering concept with partial least squares (PLS) to improve model performance on synchronous fluorescence spectra. The fluorescence spectra of tea samples were obtained, k-means and Kohonen self-organizing map clustering algorithms were applied to cluster the full spectra into several clusters, and a sub-PLS regression model was developed on each cluster. Finally, CL-PLS models consisting of gradually selected clusters were built. The correlation coefficient (R) was used to evaluate the prediction performance of the PLS models. In addition, variable influence on projection partial least squares (VIP-PLS), selectivity ratio partial least squares (SR-PLS), interval partial least squares (iPLS) models and the full-spectrum PLS model were investigated and the results were compared. The results showed that CL-PLS gave the best result for flavonoid prediction using synchronous fluorescence spectra.
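    The CL-PLS idea of clustering wavelengths and fitting a sub-PLS model per cluster can be sketched on synthetic "spectra". Everything below is an assumption-laden toy: a tiny k-means with deterministic spread-out initial centers stands in for the paper's k-means/SOM step, and a one-component PLS correlation stands in for the full sub-PLS models:

```python
import numpy as np

def cluster_wavelengths(V, k=3, n_iter=30):
    """Tiny k-means over wavelength profiles (one row of V per wavelength),
    initialized with evenly spread rows for reproducibility."""
    centers = V[np.linspace(0, len(V) - 1, k).astype(int)].copy()
    labels = np.zeros(len(V), dtype=int)
    for _ in range(n_iter):
        d = ((V[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = V[labels == c].mean(axis=0)
    return labels

def pls1_corr(Xc, y):
    """|correlation| between y and the first PLS latent variable of Xc."""
    Xc = Xc - Xc.mean(axis=0)
    w = Xc.T @ (y - y.mean())
    w = w / np.linalg.norm(w)
    t = Xc @ w
    return abs(np.corrcoef(t, y)[0, 1])

# Synthetic "spectra": three correlated wavelength bands; y depends on band 0.
rng = np.random.default_rng(4)
n = 50
f = rng.normal(size=(3, n))                       # latent band factors
X = np.hstack([f[i][:, None] + 0.2 * rng.normal(size=(n, 10)) for i in range(3)])
y = f[0] + 0.1 * rng.normal(size=n)

labels = cluster_wavelengths(X.T, k=3)
scores = [pls1_corr(X[:, labels == c], y) for c in range(3)]
best_cluster = int(np.argmax(scores))
```

    In the paper, clusters are then added greedily ("gradually selected") into the final CL-PLS model; here the single best-correlated cluster already recovers the informative band.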

  4. Knowledge Driven Variable Selection (KDVS) – a new approach to enrichment analysis of gene signatures obtained from high–throughput data

    PubMed Central

    2013-01-01

    Background High–throughput (HT) technologies provide a huge amount of gene expression data that can be used to identify biomarkers useful in clinical practice. The most frequently used approaches first select a set of genes (i.e. a gene signature) able to characterize differences between two or more phenotypical conditions, and then provide a functional assessment of the selected genes with an a posteriori enrichment analysis based on biological knowledge. However, this approach comes with some drawbacks. First, the gene selection procedure often requires tunable parameters that affect the outcome, typically producing many false hits. Second, a posteriori enrichment analysis is based on a mapping between biological concepts and gene expression measurements, which is hard to compute because of constant changes in biological knowledge and genome analysis. Third, such mapping is typically used in the assessment of the coverage of the gene signature by biological concepts, which is either score–based or requires tunable parameters as well, limiting its power. Results We present Knowledge Driven Variable Selection (KDVS), a framework that uses a priori biological knowledge in HT data analysis. The expression data matrix is transformed, according to prior knowledge, into smaller matrices, easier to analyze and to interpret from both computational and biological viewpoints. Therefore KDVS, unlike most approaches, does not exclude a priori any function or process potentially relevant for the biological question under investigation. Differently from the standard approach where gene selection and functional assessment are applied independently, KDVS embeds these two steps into a unified statistical framework, decreasing the variability derived from the threshold–dependent selection, the mapping to the biological concepts, and the signature coverage. We present three case studies to assess the usefulness of the method. 
Conclusions We showed that KDVS not only enables the selection of known biological functionalities with accuracy, but also identification of new ones. An efficient implementation of KDVS was devised to obtain results in a fast and robust way. Computing time is drastically reduced by the effective use of distributed resources. Finally, integrated visualization techniques immediately increase the interpretability of results. Overall, KDVS approach can be considered as a viable alternative to enrichment–based approaches. PMID:23302187
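The core transformation the abstract describes, splitting the expression matrix into per-concept submatrices according to a priori knowledge, can be sketched in a few lines. The gene and concept names below are hypothetical illustrations, not KDVS's actual API or data:

```python
# Sketch of the KDVS-style decomposition: the expression matrix is split,
# according to prior knowledge (e.g. GO terms), into smaller per-concept
# submatrices that can be analysed independently.
# Gene/term names are illustrative assumptions, not taken from KDVS.

expression = {            # gene -> expression values across samples
    "TP53":  [2.1, 3.4, 1.9],
    "BRCA1": [0.5, 0.7, 0.6],
    "EGFR":  [1.2, 1.1, 1.4],
}
prior_knowledge = {       # biological concept -> member genes
    "apoptosis":  ["TP53", "EGFR"],
    "DNA_repair": ["TP53", "BRCA1"],
}

def decompose(expression, prior_knowledge):
    """Build one small matrix (list of rows) per biological concept."""
    submatrices = {}
    for concept, genes in prior_knowledge.items():
        rows = [expression[g] for g in genes if g in expression]
        submatrices[concept] = rows
    return submatrices

subs = decompose(expression, prior_knowledge)
print(sorted(subs))             # ['DNA_repair', 'apoptosis']
print(len(subs["apoptosis"]))   # 2 genes in that submatrix
```

Because a gene may belong to several concepts, the submatrices overlap, which is exactly how no function or process is excluded a priori.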

  5. Selecting Optimal Random Forest Predictive Models: A Case Study on Predicting the Spatial Distribution of Seabed Hardness

    PubMed Central

    Li, Jin; Tran, Maggie; Siwabessy, Justy

    2016-01-01

Spatially continuous predictions of seabed hardness are important baseline environmental information for sustainable management of Australia’s marine jurisdiction. Seabed hardness is often inferred from multibeam backscatter data with unknown accuracy, and can be inferred from underwater video footage at limited locations. In this study, we classified the seabed into four classes based on two new seabed hardness classification schemes (i.e., hard90 and hard70). We developed optimal predictive models to predict seabed hardness using random forest (RF) based on the point data of hardness classes and spatially continuous multibeam data. Five feature selection (FS) methods, namely variable importance (VI), averaged variable importance (AVI), knowledge-informed AVI (KIAVI), Boruta and regularized RF (RRF), were tested based on predictive accuracy. Effects of highly correlated, important and unimportant predictors on the accuracy of RF predictive models were examined. Finally, spatial predictions generated using the most accurate models were visually examined and analysed. This study confirmed that: 1) hard90 and hard70 are effective seabed hardness classification schemes; 2) seabed hardness of four classes can be predicted with a high degree of accuracy; 3) the typical approach used to pre-select predictive variables by excluding highly correlated variables needs to be re-examined; 4) the identification of the important and unimportant predictors provides useful guidelines for further improving predictive models; 5) FS methods select the most accurate predictive model(s) instead of the most parsimonious ones, and AVI and Boruta are recommended for future studies; and 6) RF is an effective modelling method with high predictive accuracy for multi-level categorical data and can be applied to ‘small p and large n’ problems in environmental sciences.
Additionally, automated computational programs for AVI need to be developed to increase its computational efficiency and caution should be taken when applying filter FS methods in selecting predictive models. PMID:26890307
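The averaged variable importance (AVI) idea, recomputing importance over several repetitions and averaging before dropping the weakest predictor, can be sketched with permutation importance. A toy pre-fitted linear scorer stands in for the random forest used in the study; the data and model are illustrative assumptions:

```python
import random
import statistics

# Sketch of averaged variable importance (AVI): permutation importance is
# recomputed over several repetitions and averaged, so that the least
# important predictor can be dropped with less sampling noise. A toy
# "fitted" linear model stands in for the random forest used in the paper.

random.seed(0)
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]   # informative predictor
x2 = [random.gauss(0, 1) for _ in range(n)]   # pure noise predictor
y = [3 * a + random.gauss(0, 0.1) for a in x1]

def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def model(a_col, b_col):
    # Stand-in for a trained model: y_hat = 3*x1, ignoring x2.
    return [3 * a + 0 * b for a, b in zip(a_col, b_col)]

base_error = mse(model(x1, x2), y)

def averaged_importance(col, other, permute_first, repeats=10):
    """Mean increase in error over `repeats` permutations of one column."""
    scores = []
    for _ in range(repeats):
        shuffled = col[:]
        random.shuffle(shuffled)
        pred = model(shuffled, other) if permute_first else model(other, shuffled)
        scores.append(mse(pred, y) - base_error)
    return statistics.mean(scores)

avi_x1 = averaged_importance(x1, x2, permute_first=True)
avi_x2 = averaged_importance(x2, x1, permute_first=False)
print(avi_x1 > avi_x2)   # the informative predictor ranks higher
```

Iterating this, dropping the lowest-AVI predictor and refitting, gives the backward-elimination loop that the authors note still needs automation.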

  7. Influence of olfactory and visual cover on nest site selection and nest success for grassland-nesting birds.

    PubMed

    Fogarty, Dillon T; Elmore, R Dwayne; Fuhlendorf, Samuel D; Loss, Scott R

    2017-08-01

    Habitat selection by animals is influenced by and mitigates the effects of predation and environmental extremes. For birds, nest site selection is crucial to offspring production because nests are exposed to extreme weather and predation pressure. Predators that forage using olfaction often dominate nest predator communities; therefore, factors that influence olfactory detection (e.g., airflow and weather variables, including turbulence and moisture) should influence nest site selection and survival. However, few studies have assessed the importance of olfactory cover for habitat selection and survival. We assessed whether ground-nesting birds select nest sites based on visual and/or olfactory cover. Additionally, we assessed the importance of visual cover and airflow and weather variables associated with olfactory cover in influencing nest survival. In managed grasslands in Oklahoma, USA, we monitored nests of Northern Bobwhite ( Colinus virginianus ), Eastern Meadowlark ( Sturnella magna ), and Grasshopper Sparrow ( Ammodramus savannarum ) during 2015 and 2016. To assess nest site selection, we compared cover variables between nests and random points. To assess factors influencing nest survival, we used visual cover and olfactory-related measurements (i.e., airflow and weather variables) to model daily nest survival. For nest site selection, nest sites had greater overhead visual cover than random points, but no other significant differences were found. Weather variables hypothesized to influence olfactory detection, specifically precipitation and relative humidity, were the best predictors of and were positively related to daily nest survival. Selection for overhead cover likely contributed to mitigation of thermal extremes and possibly reduced detectability of nests. 
For daily nest survival, we hypothesize that major nest predators focused on prey other than the monitored species' nests during high-moisture conditions, thus increasing nest survival on those days. Our study highlights how mechanistic approaches to studying cover inform which dimensions are perceived and selected by animals and which dimensions confer fitness-related benefits.

  8. [Study on Application of NIR Spectral Information Screening in Identification of Maca Origin].

    PubMed

    Wang, Yuan-zhong; Zhao, Yan-li; Zhang, Ji; Jin, Hang

    2016-02-01

Medicinal and edible plant Maca is rich in various nutrients and has great medicinal value. Based on near-infrared diffuse reflectance spectra, 139 Maca samples collected from Peru and Yunnan were used to identify their geographical origins. Multiplicative signal correction (MSC) coupled with second derivative (SD) and Norris derivative filtering (ND) was employed in spectral pretreatment. The spectral range (7,500-4,061 cm⁻¹) was chosen by spectral standard deviation. Combined with principal component analysis-Mahalanobis distance (PCA-MD), the appropriate number of principal components was selected as 5. Based on the spectral range and the number of principal components selected, two abnormal samples were eliminated by a modular group iterative singular sample diagnosis method. Then, four methods were used to filter spectral variable information: competitive adaptive reweighted sampling (CARS), Monte Carlo uninformative variable elimination (MC-UVE), genetic algorithm (GA) and subwindow permutation analysis (SPA). The filtered spectral variable information was evaluated by model population analysis (MPA). The results showed that RMSECV(SPA) > RMSECV(CARS) > RMSECV(MC-UVE) > RMSECV(GA), at 2.14, 2.05, 2.02, and 1.98, with 250, 240, 250 and 70 spectral variables, respectively. According to the spectral variables filtered, partial least squares discriminant analysis (PLS-DA) was used to build the model, with a random selection of 97 samples as the training set and the other 40 samples as the validation set. The results showed that, for R²: GA > MC-UVE > CARS > SPA; for RMSEC and RMSEP: GA < MC-UVE < CARS

  9. Generating a Simulated Fluid Flow over a Surface Using Anisotropic Diffusion

    NASA Technical Reports Server (NTRS)

    Rodriguez, David L. (Inventor); Sturdza, Peter (Inventor)

    2016-01-01

    A fluid-flow simulation over a computer-generated surface is generated using a diffusion technique. The surface is comprised of a surface mesh of polygons. A boundary-layer fluid property is obtained for a subset of the polygons of the surface mesh. A gradient vector is determined for a selected polygon, the selected polygon belonging to the surface mesh but not one of the subset of polygons. A maximum and minimum diffusion rate is determined along directions determined using the gradient vector corresponding to the selected polygon. A diffusion-path vector is defined between a point in the selected polygon and a neighboring point in a neighboring polygon. An updated fluid property is determined for the selected polygon using a variable diffusion rate, the variable diffusion rate based on the minimum diffusion rate, maximum diffusion rate, and the gradient vector.
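The variable-rate update the abstract describes can be sketched for a single polygon pair. The specific interpolation rule used here (maximum rate when the diffusion path runs across the gradient, minimum rate along it) is an illustrative assumption, not the patent's actual formula:

```python
import math

# Sketch of a variable diffusion rate: the rate along the diffusion-path
# vector between a polygon and its neighbour is interpolated between the
# minimum and maximum rates according to the path's alignment with the
# local gradient. The interpolation rule is an assumption for illustration.

def variable_rate(grad, path, d_min, d_max):
    gx, gy = grad
    px, py = path
    g_norm = math.hypot(gx, gy)
    p_norm = math.hypot(px, py)
    if g_norm == 0 or p_norm == 0:
        return d_max
    # |cos| of the angle between path and gradient, in [0, 1]
    align = abs(gx * px + gy * py) / (g_norm * p_norm)
    return d_max - (d_max - d_min) * align

def diffuse(value, neighbour_value, grad, path, d_min, d_max, dt=0.1):
    """One explicit update of the boundary-layer property at a polygon."""
    rate = variable_rate(grad, path, d_min, d_max)
    return value + dt * rate * (neighbour_value - value)

# A path perpendicular to the gradient uses the maximum rate (fast smoothing);
# a path along the gradient uses the minimum rate (the sharp feature survives).
print(diffuse(1.0, 2.0, grad=(1, 0), path=(0, 1), d_min=0.1, d_max=1.0))
print(diffuse(1.0, 2.0, grad=(1, 0), path=(1, 0), d_min=0.1, d_max=1.0))
```

Making the rate directional is what distinguishes this anisotropic scheme from plain Laplacian smoothing, which would blur boundary-layer gradients equally in all directions.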

  10. Identifying public water facilities with low spatial variability of disinfection by-products for epidemiological investigations

    PubMed Central

    Hinckley, A; Bachand, A; Nuckols, J; Reif, J

    2005-01-01

    Background and Aims: Epidemiological studies of disinfection by-products (DBPs) and reproductive outcomes have been hampered by misclassification of exposure. In most epidemiological studies conducted to date, all persons living within the boundaries of a water distribution system have been assigned a common exposure value based on facility-wide averages of trihalomethane (THM) concentrations. Since THMs do not develop uniformly throughout a distribution system, assignment of facility-wide averages may be inappropriate. One approach to mitigate this potential for misclassification is to select communities for epidemiological investigations that are served by distribution systems with consistently low spatial variability of THMs. Methods and Results: A feasibility study was conducted to develop methods for community selection using the Information Collection Rule (ICR) database, assembled by the US Environmental Protection Agency. The ICR database contains quarterly DBP concentrations collected between 1997 and 1998 from the distribution systems of 198 public water facilities with minimum service populations of 100 000 persons. Facilities with low spatial variation of THMs were identified using two methods; 33 facilities were found with low spatial variability based on one or both methods. Because brominated THMs may be important predictors of risk for adverse reproductive outcomes, sites were categorised into three exposure profiles according to proportion of brominated THM species and average TTHM concentration. The correlation between THMs and haloacetic acids (HAAs) in these facilities was evaluated to see whether selection by total trihalomethanes (TTHMs) corresponds to low spatial variability for HAAs. TTHMs were only moderately correlated with HAAs (r = 0.623). 
Conclusions: Results provide a simple method for a priori selection of sites with low spatial variability from state or national public water facility datasets as a means to reduce exposure misclassification in epidemiological studies of DBPs. PMID:15961627
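One simple way to operationalise "low spatial variability" for facility screening is the coefficient of variation of THM concentrations across a facility's sampling sites. The facility names and the 0.2 cutoff below are illustrative assumptions, not the paper's criteria:

```python
import statistics

# Sketch of a priori facility screening: compute the coefficient of
# variation (CV) of TTHM concentrations across each facility's distribution
# sites and keep facilities below a chosen threshold. Facility labels and
# the 0.2 cutoff are illustrative assumptions.

facility_tthm = {   # facility -> TTHM (ug/L) at several distribution sites
    "A": [40.0, 41.5, 39.0, 40.5],   # spatially uniform
    "B": [10.0, 55.0, 30.0, 80.0],   # highly variable
}

def coefficient_of_variation(values):
    return statistics.stdev(values) / statistics.mean(values)

low_variability = [f for f, v in facility_tthm.items()
                   if coefficient_of_variation(v) < 0.2]
print(low_variability)   # ['A']
```

Assigning a facility-wide average exposure is least misleading for facilities like "A", which is exactly why such sites are preferred for epidemiological study.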

  11. Relationships between habitat quality and measured condition variables in Gulf of Mexico mangroves

    EPA Science Inventory

    Abstract Ecosystem condition assessments were conducted for 12 mangrove sites in the northern Gulf of Mexico. Nine sites were selected randomly; three were selected a priori based on best professional judgment to represent a poor, intermediate and good environmental condition. D...

  12. An Update on Statistical Boosting in Biomedicine.

    PubMed

    Mayr, Andreas; Hofner, Benjamin; Waldmann, Elisabeth; Hepp, Tobias; Meyer, Sebastian; Gefeller, Olaf

    2017-01-01

    Statistical boosting algorithms have triggered a lot of research during the last decade. They combine a powerful machine learning approach with classical statistical modelling, offering various practical advantages like automated variable selection and implicit regularization of effect estimates. They are extremely flexible, as the underlying base-learners (regression functions defining the type of effect for the explanatory variables) can be combined with any kind of loss function (target function to be optimized, defining the type of regression setting). In this review article, we highlight the most recent methodological developments on statistical boosting regarding variable selection, functional regression, and advanced time-to-event modelling. Additionally, we provide a short overview on relevant applications of statistical boosting in biomedicine.
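The "automated variable selection and implicit regularization" of statistical boosting comes from its componentwise structure: each iteration updates only the single best-fitting base-learner. A minimal sketch with simple least-squares base-learners, on simulated data:

```python
import random

# Minimal sketch of componentwise L2-boosting: at every iteration only the
# one base-learner (here: a one-variable least-squares fit) that best fits
# the current residuals is updated by a small step nu. Predictors that are
# never selected keep a zero coefficient -- automated variable selection.

random.seed(1)
n = 300
X = [[random.gauss(0, 1) for _ in range(4)] for _ in range(n)]
y = [2 * row[0] - 1 * row[2] + random.gauss(0, 0.1) for row in X]

def boost(X, y, steps=200, nu=0.1):
    p = len(X[0])
    coefs = [0.0] * p
    resid = y[:]
    for _ in range(steps):
        best_j, best_b, best_loss = 0, 0.0, float("inf")
        for j in range(p):                       # try each base-learner
            xj = [row[j] for row in X]
            b = sum(a * r for a, r in zip(xj, resid)) / sum(a * a for a in xj)
            loss = sum((r - b * a) ** 2 for a, r in zip(xj, resid))
            if loss < best_loss:
                best_j, best_b, best_loss = j, b, loss
        coefs[best_j] += nu * best_b             # shrunken update of winner
        xj = [row[best_j] for row in X]
        resid = [r - nu * best_b * a for a, r in zip(xj, resid)]
    return coefs

coefs = boost(X, y)
print([round(c, 1) for c in coefs])   # close to [2, 0, -1, 0]
```

Stopping early (fewer steps) strengthens the implicit regularization; running longer lets the fit approach the unpenalized least-squares solution.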

  13. Role of environmental variability in the evolution of life history strategies.

    PubMed

    Hastings, A; Caswell, H

    1979-09-01

    We reexamine the role of environmental variability in the evolution of life history strategies. We show that normally distributed deviations in the quality of the environment should lead to normally distributed deviations in the logarithm of year-to-year survival probabilities, which leads to interesting consequences for the evolution of annual and perennial strategies and reproductive effort. We also examine the effects of using differing criteria to determine the outcome of selection. Some predictions of previous theory are reversed, allowing distinctions between r and K theory and a theory based on variability. However, these distinctions require information about both the environment and the selection process not required by current theory.
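The paper's premise, normally distributed deviations in environmental quality producing normally distributed deviations in log survival, means year-to-year survival is lognormal. A quick simulation shows the standard consequence that long-run (geometric-mean) fitness falls below the arithmetic mean under variability; the parameter values are illustrative:

```python
import math
import random

# Sketch of the premise above: normal deviations in environmental quality
# give normal deviations in log(survival), i.e. lognormal year-to-year
# survival. In a variable environment, long-run growth tracks the geometric
# mean, which lies below the arithmetic mean whenever variance is non-zero.
# The mu/sigma values are illustrative assumptions.

random.seed(2)
years = 10_000
log_survival = [random.gauss(-1.0, 0.5) for _ in range(years)]  # mu, sigma
survival = [math.exp(s) for s in log_survival]

arith_mean = sum(survival) / years
geom_mean = math.exp(sum(log_survival) / years)

# For a lognormal, E[S] = exp(mu + sigma^2/2) > exp(mu) ~= geometric mean.
print(arith_mean > geom_mean)   # True
```

This gap between the two means is why the criterion used to score selection outcomes matters: strategies ranked by expected (arithmetic) fitness can be reversed when ranked by long-run growth.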

  14. A novel porous framework as variable chemo-sensor: from response of specific carcinogenic alkyl-aromatic to selective detection of explosive nitro-aromatics.

    PubMed

    Chen, Qihui

    2018-06-07

Selectively probing one molecule from a class of similar molecules is highly challenging due to their similar chemical and physical properties. Here, a novel metal-organic framework, FJI-H15, with flexible porous cages has been designed and synthesized, which can specifically recognize ethyl-benzene with ultrahigh enhancement efficiency from a series of alkyl-aromatics, in which an unusual size-dependent interaction has been found and proved. It can also selectively detect phenolic nitroaromatics among a series of nitro-aromatics based on energy transfer and electrostatic interaction. Such unusual specificity and variable mechanisms responding to different types of molecules have not been reported, and will provide a new strategy for developing more effective MOF-based chemo-sensors for probing small structural differences in molecules. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  15. Selection of Optimal Auxiliary Soil Nutrient Variables for Cokriging Interpolation

    PubMed Central

    Song, Genxin; Zhang, Jing; Wang, Ke

    2014-01-01

    In order to explore the selection of the best auxiliary variables (BAVs) when using the Cokriging method for soil attribute interpolation, this paper investigated the selection of BAVs from terrain parameters, soil trace elements, and soil nutrient attributes when applying Cokriging interpolation to soil nutrients (organic matter, total N, available P, and available K). In total, 670 soil samples were collected in Fuyang, and the nutrient and trace element attributes of the soil samples were determined. Based on the spatial autocorrelation of soil attributes, the Digital Elevation Model (DEM) data for Fuyang was combined to explore the coordinate relationship among terrain parameters, trace elements, and soil nutrient attributes. Variables with a high correlation to soil nutrient attributes were selected as BAVs for Cokriging interpolation of soil nutrients, and variables with poor correlation were selected as poor auxiliary variables (PAVs). The results of Cokriging interpolations using BAVs and PAVs were then compared. The results indicated that Cokriging interpolation with BAVs yielded more accurate results than Cokriging interpolation with PAVs (the mean absolute error of BAV interpolation results for organic matter, total N, available P, and available K were 0.020, 0.002, 7.616, and 12.4702, respectively, and the mean absolute error of PAV interpolation results were 0.052, 0.037, 15.619, and 0.037, respectively). The results indicated that Cokriging interpolation with BAVs can significantly improve the accuracy of Cokriging interpolation for soil nutrient attributes. This study provides meaningful guidance and reference for the selection of auxiliary parameters for the application of Cokriging interpolation to soil nutrient attributes. PMID:24927129
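The BAV screening step, ranking candidate auxiliary variables by their correlation with the target nutrient, can be sketched directly. The variable names and toy values below are illustrative, not the Fuyang data:

```python
import math

# Sketch of BAV selection for Cokriging: rank candidate auxiliary variables
# by absolute Pearson correlation with the target soil nutrient and keep
# the strongest. Variable names and values are illustrative assumptions.

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

organic_matter = [2.0, 2.5, 3.1, 3.8, 4.0]          # target at 5 sample sites
candidates = {
    "total_N":   [0.11, 0.14, 0.17, 0.21, 0.22],    # tracks the target
    "elevation": [120, 95, 140, 88, 131],           # weak relation
}

ranked = sorted(candidates,
                key=lambda k: -abs(pearson(candidates[k], organic_matter)))
print(ranked[0])   # 'total_N'
```

The top-ranked variables become BAVs for the Cokriging system; the bottom of the ranking corresponds to the PAVs that degraded interpolation accuracy in the study.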

  16. Total sulfur determination in residues of crude oil distillation using FT-IR/ATR and variable selection methods

    NASA Astrophysics Data System (ADS)

    Müller, Aline Lima Hermes; Picoloto, Rochele Sogari; Mello, Paola de Azevedo; Ferrão, Marco Flores; dos Santos, Maria de Fátima Pereira; Guimarães, Regina Célia Lourenço; Müller, Edson Irineu; Flores, Erico Marlon Moraes

    2012-04-01

Total sulfur concentration was determined in atmospheric residue (AR) and vacuum residue (VR) samples obtained from the petroleum distillation process by Fourier transform infrared spectroscopy with attenuated total reflectance (FT-IR/ATR) in association with chemometric methods. The calibration and prediction sets consisted of 40 and 20 samples, respectively. Calibration models were developed using two variable selection methods: interval partial least squares (iPLS) and synergy interval partial least squares (siPLS). Different treatments and pre-processing steps were also evaluated for the development of models. Pre-treatment based on multiplicative scatter correction (MSC) with mean-centered data was selected for model construction. The use of siPLS as the variable selection method provided a model with root mean square error of prediction (RMSEP) values significantly better than those obtained by the PLS model using all variables. The best model was obtained using the siPLS algorithm with the spectra divided into 20 intervals and combinations of 3 intervals (911-824, 823-736 and 737-650 cm⁻¹). This model produced an RMSECV of 400 mg kg⁻¹ S and an RMSEP of 420 mg kg⁻¹ S, with a correlation coefficient of 0.990.
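The interval-selection idea behind iPLS can be sketched on a toy spectrum: split the wavenumber axis into intervals, fit a model on each interval alone, and keep the interval with the lowest error. For brevity, a one-variable least-squares fit on the interval mean stands in for the PLS submodels used in the paper:

```python
# Sketch of iPLS-style interval selection: fit a model per spectral interval
# and keep the interval that predicts the reference values best. A simple
# one-variable regression on the interval mean absorbance stands in for PLS
# (an assumption for brevity); spectra and sulfur values are illustrative.

spectra = [  # toy absorbance spectra, 8 "wavenumbers" per sample
    [0.1, 0.2, 0.9, 1.0, 0.3, 0.2, 0.1, 0.2],
    [0.1, 0.2, 1.8, 2.0, 0.3, 0.2, 0.1, 0.2],
    [0.1, 0.2, 2.7, 3.0, 0.3, 0.2, 0.1, 0.2],
]
sulfur = [100.0, 200.0, 300.0]   # reference values (e.g. mg/kg S)

def interval_error(lo, hi):
    x = [sum(s[lo:hi]) / (hi - lo) for s in spectra]   # interval mean
    n = len(x)
    mx, my = sum(x) / n, sum(sulfur) / n
    denom = sum((a - mx) ** 2 for a in x)
    if denom == 0:
        return float("inf")                            # flat interval: no signal
    b = sum((a - mx) * (t - my) for a, t in zip(x, sulfur)) / denom
    pred = [my + b * (a - mx) for a in x]
    return sum((p - t) ** 2 for p, t in zip(pred, sulfur))

intervals = [(0, 2), (2, 4), (4, 6), (6, 8)]
best = min(intervals, key=lambda iv: interval_error(*iv))
print(best)   # (2, 4): the informative spectral region
```

siPLS extends this by scoring combinations of intervals (here, three of twenty), trading a larger search space for models that pool complementary regions.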

  17. Data re-arranging techniques leading to proper variable selections in high energy physics

    NASA Astrophysics Data System (ADS)

    Kůs, Václav; Bouř, Petr

    2017-12-01

We introduce a new data-based approach to homogeneity testing and variable selection carried out in high energy physics experiments, where one of the basic tasks is to test the homogeneity of weighted samples, mainly Monte Carlo simulations (weighted) and real data measurements (unweighted). This technique is called ’data re-arranging’ and it enables variable selection performed by means of classical statistical homogeneity tests such as the Kolmogorov-Smirnov, Anderson-Darling, or Pearson’s chi-square divergence test. P-values of our variants of the homogeneity tests are investigated, and empirical verification on 46-dimensional high energy particle physics data sets is accomplished under the newly proposed (equiprobable) quantile binning. In particular, the procedure of homogeneity testing is applied to re-arranged Monte Carlo samples and real DATA sets measured at the particle accelerator Tevatron in Fermilab at the DØ experiment, originating from top-antitop quark pair production in two decay channels (electron, muon) with 2, 3, or 4+ jets detected. Finally, the variable selections in the electron and muon channels induced by the re-arranging procedure for homogeneity testing are provided for the Tevatron top-antitop quark data sets.
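The combination of equiprobable quantile binning with a Pearson chi-square comparison of a weighted and an unweighted sample can be sketched as follows; the simulated values, weights, and bin count are illustrative assumptions:

```python
import random

# Sketch of equiprobable quantile binning for weighted-vs-unweighted
# homogeneity testing: bin edges are taken from pooled-sample quantiles,
# weighted Monte Carlo counts are compared with data counts via Pearson's
# chi-square statistic. Samples and weights here are illustrative.

random.seed(3)
mc = [random.gauss(0, 1) for _ in range(2000)]   # Monte Carlo (weighted)
weights = [0.5] * 2000                           # MC event weights
data = [random.gauss(0, 1) for _ in range(1000)] # real data (unweighted)

def quantile_edges(values, k):
    """Edges giving k bins with (roughly) equal pooled probability."""
    s = sorted(values)
    return [s[int(len(s) * i / k)] for i in range(1, k)]

def binned(values, edges, w=None):
    counts = [0.0] * (len(edges) + 1)
    w = w or [1.0] * len(values)
    for v, wi in zip(values, w):
        counts[sum(v > e for e in edges)] += wi
    return counts

edges = quantile_edges(mc + data, 10)            # equiprobable binning
obs = binned(data, edges)
exp = binned(mc, edges, weights)
scale = sum(obs) / sum(exp)                      # normalise MC to data yield
chi2 = sum((o - scale * e) ** 2 / (scale * e) for o, e in zip(obs, exp))
print(round(chi2, 2))   # small for homogeneous samples
```

Equiprobable binning avoids the near-empty tail bins of fixed-width binning, which would otherwise inflate the chi-square statistic; the same per-variable statistic can then rank variables for selection.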

  18. Linear data mining the Wichita clinical matrix suggests sleep and allostatic load involvement in chronic fatigue syndrome.

    PubMed

    Gurbaxani, Brian M; Jones, James F; Goertzel, Benjamin N; Maloney, Elizabeth M

    2006-04-01

To provide a mathematical introduction to the Wichita (KS, USA) clinical dataset, which is all of the nongenetic data (no microarray or single nucleotide polymorphism data) from the 2-day clinical evaluation, and to show the preliminary findings, and limitations, of popular matrix algebra-based data mining techniques. An initial matrix of 440 variables by 227 human subjects was reduced to 183 variables by 164 subjects. Variables were excluded that strongly correlated with chronic fatigue syndrome (CFS) case classification by design (for example, the multidimensional fatigue inventory [MFI] data), that were otherwise self-reporting in nature and also tended to correlate strongly with CFS classification, or that were sparse or nonvarying between case and control. Subjects were excluded if they did not clearly fall into well-defined CFS classifications, had comorbid depression with melancholic features, or met other medical or psychiatric exclusions. The popular data mining techniques principal component analysis (PCA) and linear discriminant analysis (LDA) were used to determine how well the data separated into groups. Two different feature selection methods helped identify the most discriminating parameters. Although purely biological features (variables) were found to separate CFS cases from controls, including many allostatic load and sleep-related variables, most parameters were not statistically significant individually. However, biological correlates of CFS, such as heart rate and heart rate variability, require further investigation. Feature selection of a limited number of variables from the purely biological dataset produced better separation between groups than a PCA of the entire dataset. Feature selection highlighted the importance of many of the allostatic load variables studied in more detail by Maloney and colleagues in this issue [1], as well as some sleep-related variables.
Nonetheless, matrix linear algebra-based data mining approaches appeared to be of limited utility when compared with more sophisticated nonlinear analyses on richer data types, such as those found in Maloney and colleagues [1] and Goertzel and colleagues [2] in this issue.
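A simple filter of the kind used to find "the most discriminating parameters" is a Fisher-type score per variable: squared difference of class means over the sum of within-class variances. The two groups and variable names below are illustrative stand-ins for the CFS cases and controls:

```python
import statistics

# Sketch of a Fisher-score filter for ranking discriminating variables:
# score = (mean_A - mean_B)^2 / (var_A + var_B), computed per variable.
# The groups and variable names are illustrative stand-ins, not the
# Wichita clinical data.

cases    = {"heart_rate": [78, 82, 80, 85], "age": [45, 50, 38, 42]}
controls = {"heart_rate": [66, 64, 70, 68], "age": [44, 51, 39, 41]}

def fisher_score(a, b):
    num = (statistics.mean(a) - statistics.mean(b)) ** 2
    den = statistics.variance(a) + statistics.variance(b)
    return num / den

scores = {v: fisher_score(cases[v], controls[v]) for v in cases}
best = max(scores, key=scores.get)
print(best)   # 'heart_rate'
```

Running PCA or LDA on only the top-scoring variables, rather than the full matrix, mirrors the finding above that a selected biological subset separated the groups better than a PCA of everything.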

  19. Hybrid Model Based on Genetic Algorithms and SVM Applied to Variable Selection within Fruit Juice Classification

    PubMed Central

    Fernandez-Lozano, C.; Canto, C.; Gestal, M.; Andrade-Garda, J. M.; Rabuñal, J. R.; Dorado, J.; Pazos, A.

    2013-01-01

Given the background of the use of Neural Networks in apple juice classification problems, this paper aims at implementing a newly developed method in the field of machine learning: Support Vector Machines (SVM). Therefore, a hybrid model that combines genetic algorithms and support vector machines is proposed in such a way that, by using an SVM as the fitness function of the Genetic Algorithm (GA), the most representative variables for a specific classification problem can be selected. PMID:24453933
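The GA-wrapper idea can be sketched compactly: chromosomes are variable-inclusion bitmasks and the fitness is a classifier's accuracy on the selected subset. To stay self-contained, a nearest-centroid classifier stands in for the SVM fitness function used in the paper; the data and GA settings are illustrative assumptions:

```python
import random

# Sketch of GA-driven variable selection: each chromosome is a bitmask over
# candidate variables; fitness is a classifier's training accuracy on the
# masked variables. A nearest-centroid rule stands in for the paper's SVM.

random.seed(4)
P = 6                                   # candidate variables; only 0 and 1 carry signal

def sample(label):
    base = [label * 2.0, -label * 2.0] + [0.0] * (P - 2)
    return [b + random.gauss(0, 0.5) for b in base], label

train = [sample(i % 2) for i in range(60)]

def accuracy(mask):
    """Fitness: nearest-centroid training accuracy on the masked variables."""
    cols = [j for j in range(P) if mask[j]]
    if not cols:
        return 0.0
    cent = {}
    for lab in (0, 1):
        rows = [x for x, l in train if l == lab]
        cent[lab] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    hits = 0
    for x, true_lab in train:
        d = {lab: sum((x[j] - c) ** 2 for j, c in zip(cols, cent[lab]))
             for lab in (0, 1)}
        hits += min(d, key=d.get) == true_lab
    return hits / len(train)

pop = [[random.randint(0, 1) for _ in range(P)] for _ in range(20)]
for _ in range(15):                     # generations: elitism, crossover, mutation
    pop.sort(key=accuracy, reverse=True)
    parents = pop[:10]
    children = []
    for _ in range(10):
        a, b = random.sample(parents, 2)
        cut = random.randrange(1, P)            # one-point crossover
        child = a[:cut] + b[cut:]
        if random.random() < 0.2:               # bit-flip mutation
            child[random.randrange(P)] ^= 1
        children.append(child)
    pop = parents + children

best = max(pop, key=accuracy)
print(accuracy(best) > 0.85)   # the evolved subset separates the classes well
```

Swapping `accuracy` for cross-validated SVM accuracy recovers the hybrid model the abstract describes, at the cost of a much more expensive fitness evaluation.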

  20. Choosing and Leaving Science in Highly Selective Institutions.

    ERIC Educational Resources Information Center

    Strenta, A. Christopher; And Others

    1994-01-01

    A study investigated causes of initial interest in and attrition from natural sciences and engineering among 5,320 students entering 4 highly selective institutions in 1988, with attention to probable causes of disproportionate attrition of women. Reasons for high attrition were based on cognitive variables or the perceived "chilly"…

  1. A critique of assumptions about selecting chemical-resistant gloves: a case for workplace evaluation of glove efficacy.

    PubMed

    Klingner, Thomas D; Boeniger, Mark F

    2002-05-01

    Wearing chemical-resistant gloves and clothing is the primary method used to prevent skin exposure to toxic chemicals in the workplace. The process for selecting gloves is usually based on manufacturers' laboratory-generated chemical permeation data. However, such data may not reflect conditions in the workplace where many variables are encountered (e.g., elevated temperature, flexing, pressure, and product variation between suppliers). Thus, the reliance on this selection process is questionable. Variables that may influence the performance of chemical-resistant gloves are identified and discussed. Passive dermal monitoring is recommended to evaluate glove performance under actual-use conditions and can bridge the gap between laboratory data and real-world performance.

  2. Data Mine and Forget It?: A Cautionary Tale

    NASA Technical Reports Server (NTRS)

    Tada, Yuri; Kraft, Norbert Otto; Orasanu, Judith M.

    2011-01-01

With the development of new technologies, data mining has become increasingly popular. However, caution should be exercised in choosing the variables to include in data mining. A series of regression trees was created to demonstrate how the predictors the program selects as significant change with the nature of the variables included.

  3. Structural Relationships between Variables of Elementary School Students' Intention of Accepting Digital Textbooks

    ERIC Educational Resources Information Center

    Joo, Young Ju; Joung, Sunyoung; Choi, Se-Bin; Lim, Eugene; Go, Kyung Yi

    2014-01-01

The purpose of this study is to explore variables affecting the acceptance of digital textbooks in the elementary school environment and to provide basic information and resources to increase the intention of acceptance. Based on the above research purposes, surveys were conducted using Google Docs targeting randomly selected elementary school…

  4. Developing the formula for state subsidies for health care in Finland.

    PubMed

    Häkkinen, Unto; Järvelin, Jutta

    2004-01-01

    The aim was to generate a research-based proposal for a new subsidy formula for municipal healthcare services in Finland. Small-area data on potential need variables, supply of and access to services, and age-, sex- and case-mix-standardised service utilisation per capita were used. Utilisation was regressed in order to identify need variables and the cost weights for the selected need variables were subsequently derived using various multilevel models and structural equation methods. The variables selected for the subsidy formula were as follows: age- and sex-standardised mortality (age under 65 years) and income for outpatient primary health services; age- and sex-standardised mortality (all ages) and index of overcrowded housing for elderly care and long-term inpatient care; index of disability pensions for those aged 15-55 years and migration for specialised non-psychiatric care; and index of living alone and income for psychiatric care. Decisions on the amount of state subsidies can be divided into three stages, of which the first two are mainly political and the third is based on the results of this study.

  5. JCDSA: a joint covariate detection tool for survival analysis on tumor expression profiles.

    PubMed

    Wu, Yiming; Liu, Yanan; Wang, Yueming; Shi, Yan; Zhao, Xudong

    2018-05-29

Survival analysis on tumor expression profiles has always been a key issue for subsequent biological experimental validation. It is crucial how to select features which closely correspond to survival time, and how to select features which best discriminate between low-risk and high-risk groups of patients. Common features derived from the two aspects may provide variable candidates for prognosis of cancer. Based on the proposed two-step feature selection strategy, we develop a joint covariate detection tool for survival analysis on tumor expression profiles. Significant features, which are not only consistent with survival time but also associated with the categories of patients with different survival risks, are chosen. Using the miRNA expression data (Level 3) of 548 patients with glioblastoma multiforme (GBM) as an example, miRNA candidates for prognosis of cancer are selected. The reliability of the selected miRNAs using this tool is demonstrated by 100 simulations. Furthermore, it is discovered that significant covariates are not directly composed of individually significant variables. Joint covariate detection provides a viewpoint for selecting variables which are not individually but jointly significant. Besides, it helps to select features which are not only consistent with survival time but also associated with prognosis risk. The software is available at http://bio-nefu.com/resource/jcdsa.
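The two-step idea, keeping features that both track survival time and separate risk groups, can be sketched with an intersection of two simple filters. The miRNA names, toy values, and thresholds are illustrative assumptions, not the tool's actual statistics:

```python
import statistics

# Sketch of a two-step feature filter: keep features that (1) correlate
# with survival time and (2) differ between low-risk and high-risk groups;
# the intersection gives joint candidates. Names, values and the two
# thresholds are illustrative assumptions.

survival = [10, 25, 40, 55, 70, 85]              # months
risk = ["high", "high", "high", "low", "low", "low"]
features = {
    "miR_A": [5.1, 4.0, 3.2, 2.1, 1.5, 0.9],     # tracks survival, differs by risk
    "miR_B": [2.0, 2.1, 1.9, 2.0, 2.1, 2.0],     # uninformative
}

def pearson(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def group_gap(vals):
    hi = [v for v, r in zip(vals, risk) if r == "high"]
    lo = [v for v, r in zip(vals, risk) if r == "low"]
    return abs(statistics.mean(hi) - statistics.mean(lo))

selected = [f for f, v in features.items()
            if abs(pearson(v, survival)) > 0.8 and group_gap(v) > 1.0]
print(selected)   # ['miR_A']
```

Note the abstract's stronger point: a jointly significant *set* of covariates need not be a union of individually significant ones, which is what the full tool tests beyond this per-feature sketch.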

  6. Natural image classification driven by human brain activity

    NASA Astrophysics Data System (ADS)

    Zhang, Dai; Peng, Hanyang; Wang, Jinqiao; Tang, Ming; Xue, Rong; Zuo, Zhentao

    2016-03-01

Natural image classification has been a hot topic in the computer vision and pattern recognition research fields. Since the performance of an image classification system can be improved by feature selection, many image feature selection methods have been developed. However, existing supervised feature selection methods are typically driven by class label information that is identical for different samples from the same class, ignoring within-class image variability and therefore degrading feature selection performance. In this study, we propose a novel feature selection method driven by human brain activity signals collected using fMRI while human subjects viewed natural images of different categories. The fMRI signals associated with subjects viewing different images encode the human perception of natural images, and therefore may capture image variability within and across categories. We then select image features with the guidance of fMRI signals from brain regions that respond actively to image viewing. In particular, bag-of-words features based on the GIST descriptor are extracted from natural images for classification, and a sparse-regression-based feature selection method is adapted to select the image features that best predict the fMRI signals. Finally, a classification model is built on the selected image features to classify images without fMRI signals. Validation experiments classifying images from four categories for two subjects demonstrated that our method achieves much better classification performance than classifiers built on image features selected by traditional feature selection methods.
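The selection principle, keep the image features that best predict the brain signal, can be sketched with the first step of a greedy search: score each feature by the residual error of a one-variable least-squares fit to the fMRI response. This stands in for the sparse-regression selection used in the study; the feature names and simulated signal are illustrative:

```python
import random

# Sketch of brain-guided feature scoring: rank image features by how well a
# one-variable least-squares fit predicts the fMRI response (the first step
# of a greedy forward selection). A stand-in for the paper's sparse
# regression; feature names and the simulated signal are illustrative.

random.seed(5)
n = 200
features = {f"feat{j}": [random.gauss(0, 1) for _ in range(n)]
            for j in range(5)}
fmri = [2.0 * a + random.gauss(0, 0.3) for a in features["feat2"]]

def residual_error(x, y):
    mx, my = sum(x) / n, sum(y) / n
    b = sum((a - mx) * (t - my) for a, t in zip(x, y)) / \
        sum((a - mx) ** 2 for a in x)
    return sum((t - my - b * (a - mx)) ** 2 for a, t in zip(x, y))

best = min(features, key=lambda f: residual_error(features[f], fmri))
print(best)   # 'feat2' predicts the fMRI signal best
```

The selected features are then handed to an ordinary image classifier, so at test time no fMRI recording is needed, matching the pipeline in the abstract.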

  7. Estimating stand structure using discrete-return lidar: an example from low density, fire prone ponderosa pine forests

    USGS Publications Warehouse

    Hall, S. A.; Burke, I.C.; Box, D. O.; Kaufmann, M. R.; Stoker, Jason M.

    2005-01-01

    The ponderosa pine forests of the Colorado Front Range, USA, have historically been subjected to wildfires. Recent large burns have increased public interest in fire behavior and effects, and scientific interest in the carbon consequences of wildfires. Remote sensing techniques can provide spatially explicit estimates of stand structural characteristics. Some of these characteristics can be used as inputs to fire behavior models, increasing our understanding of the effect of fuels on fire behavior. Others provide estimates of carbon stocks, allowing us to quantify the carbon consequences of fire. Our objective was to use discrete-return lidar to estimate such variables, including stand height, total aboveground biomass, foliage biomass, basal area, tree density, canopy base height and canopy bulk density. We developed 39 metrics from the lidar data, and used them in limited combinations in regression models, which we fit to field estimates of the stand structural variables. We used an information–theoretic approach to select the best model for each variable, and to select the subset of lidar metrics with most predictive potential. Observed versus predicted values of stand structure variables were highly correlated, with r2 ranging from 57% to 87%. The most parsimonious linear models for the biomass structure variables, based on a restricted dataset, explained between 35% and 58% of the observed variability. Our results provide us with useful estimates of stand height, total aboveground biomass, foliage biomass and basal area. There is promise for using this sensor to estimate tree density, canopy base height and canopy bulk density, though more research is needed to generate robust relationships. We selected 14 lidar metrics that showed the most potential as predictors of stand structure. 
We suggest that the focus of future lidar studies should broaden to include low density forests, particularly systems where the vertical structure of the canopy is important, such as fire prone forests.

  8. Urban Heat Wave Vulnerability Analysis Considering Climate Change

    NASA Astrophysics Data System (ADS)

    JE, M.; KIM, H.; Jung, S.

    2017-12-01

Much attention has been paid to the thermal environment of Seoul, South Korea, since 2016, when the city experienced its worst heatwave in 22 years. To cope with heatwave-related damage, vulnerable regions must be singled out in advance so that selective measures can be provided. This study aims to analyze and categorize the regions of Seoul that are vulnerable to poor thermal environments, and to discuss the contributing and risk factors for each type. To do this, the study conducted the following processes. First, based on a review of the literature, indices that can evaluate thermally vulnerable regions were collated. The indices were divided into a climate exposure index related to temperature, a sensitivity index including demographic, social, and economic indicators, and an adaptation index related to the urban environment and the status of climate adaptation policy. Second, significant variables for evaluating thermally vulnerable regions were derived from the indices summarized above: the study analyzed the relationship between the number of heat-related patients in Seoul and candidate explanatory variables using multivariate statistical analysis. Third, the importance of each variable was quantified by integrating the statistical results with the analytic hierarchy process (AHP) method. Fourth, the distribution of data for each index was identified based on the selected variables, and the indices were normalized and overlaid. Fifth, for the climate exposure index, the evaluation followed the current vulnerability evaluation method, using the future temperature of Seoul predicted by the representative concentration pathways (RCPs) climate change scenarios as the evaluation variable. The results of this study can be used as foundational data for establishing countermeasures against heatwaves in Seoul. Although heatwave occurrences themselves cannot be completely controlled, the environment can be improved to alleviate and respond to heatwaves. In particular, if vulnerable regions can be identified and managed in advance, the results are expected to serve as a basis for policy in local communities.

  9. Variable selection based near infrared spectroscopy quantitative and qualitative analysis on wheat wet gluten

    NASA Astrophysics Data System (ADS)

    Lü, Chengxu; Jiang, Xunpeng; Zhou, Xingfan; Zhang, Yinqiao; Zhang, Naiqian; Wei, Chongfeng; Mao, Wenhua

    2017-10-01

Wet gluten is a useful quality indicator for wheat, and short-wave near infrared spectroscopy (NIRS) is a high-performance technique with the advantages of being economical, rapid and nondestructive. To study the feasibility of short-wave NIRS for analyzing wet gluten directly from wheat seed, 54 representative wheat seed samples were collected and scanned with a spectrometer. Eight spectral pretreatment methods and a genetic algorithm (GA) variable selection method were used to optimize the analysis. Both quantitative and qualitative models of wet gluten were built, by partial least squares regression and discriminant analysis respectively. For quantitative analysis, normalization was the best pretreatment method; 17 wet-gluten-sensitive variables were selected by GA, and the GA model outperformed the all-variable model, with R2V = 0.88 and RMSEV = 1.47. For qualitative analysis, automatic weighted least squares baseline correction was the best pretreatment method, and the all-variable models outperformed the GA models. The correct classification rates for the three wet gluten content classes (<24%, 24-30%, and >30%) were 95.45%, 84.52%, and 90.00%, respectively. The short-wave NIRS technique shows potential for both quantitative and qualitative analysis of wet gluten in wheat seed.
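GA wavelength selection of the kind used here can be sketched as follows. This is not the paper's implementation: ordinary least squares stands in for the PLS model, the "spectra" are random, and the GA settings (population size, mutation rate, split) are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "spectra": 54 samples x 100 wavelengths, 10 of them informative.
n_samples, n_wl = 54, 100
X = rng.normal(size=(n_samples, n_wl))
informative = rng.choice(n_wl, 10, replace=False)
y = X[:, informative].sum(axis=1) + 0.1 * rng.normal(size=n_samples)

def fitness(mask):
    """Validation R^2 of a least-squares model on the selected wavelengths."""
    if mask.sum() == 0:
        return -np.inf
    Xs = X[:, mask.astype(bool)]
    tr, va = slice(0, 40), slice(40, None)
    coef, *_ = np.linalg.lstsq(Xs[tr], y[tr], rcond=None)
    resid = y[va] - Xs[va] @ coef
    return 1.0 - resid.var() / y[va].var()

def tournament(pop, scores):
    i, j = rng.integers(len(pop), size=2)
    return pop[i] if scores[i] > scores[j] else pop[j]

# GA over binary wavelength masks: elitism, tournament selection,
# uniform crossover, bit-flip mutation.
pop = (rng.random((30, n_wl)) < 0.2).astype(int)
init_best = max(fitness(m) for m in pop)
for _ in range(40):
    scores = np.array([fitness(m) for m in pop])
    children = [pop[scores.argmax()].copy()]          # elitism keeps the best mask
    while len(children) < len(pop):
        pa, pb = tournament(pop, scores), tournament(pop, scores)
        child = np.where(rng.random(n_wl) < 0.5, pa, pb)   # uniform crossover
        flip = rng.random(n_wl) < 0.01                     # bit-flip mutation
        children.append(np.where(flip, 1 - child, child))
    pop = np.array(children)

best = max(pop, key=fitness)
```

Because the elite mask survives every generation, the best validation R^2 can only improve over the run.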

  10. Efficient Variable Selection Method for Exposure Variables on Binary Data

    NASA Astrophysics Data System (ADS)

    Ohno, Manabu; Tarumi, Tomoyuki

In this paper, we propose a new variable selection method for "robust" exposure variables, where "robust" means that the same variable is selected from both the original data and perturbed data. There are few studies of effective methods for this selection problem. Selecting exposure variables is almost the same problem as extracting correlation rules without the robustness requirement. [Brin 97] suggested that correlation rules can be extracted efficiently on binary data using the chi-squared statistic of a contingency table, exploiting a monotone property. The chi-squared value itself, however, is not monotone, so the method tends to judge a variable set as dependent as its dimension increases, even when the set is completely independent; it is therefore unusable for selecting robust exposure variables. To select robust independent variables, we assume an anti-monotone property for independence and apply the apriori algorithm, one of the algorithms for finding association rules in market basket data, which exploits the anti-monotone property of the support measure defined for association rules. Independence does not strictly satisfy the anti-monotone property on the AIC of the independence probability model, but the tendency to do so is strong; variables selected under an assumed anti-monotone property on the AIC are therefore robust. Our method judges whether a given variable is an exposure variable for an independent variable by comparing AIC values. Numerical experiments show that our method selects robust exposure variables efficiently and precisely.
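The AIC comparison at the heart of this judgement can be sketched for a single pair of binary variables. This is a minimal illustration, not the paper's algorithm; the parameter counts assume a saturated joint model (3 free parameters for a 2x2 table) versus an independence model (2).

```python
import numpy as np

def aic_pair(x, y):
    """Return (AIC of the independence model, AIC of the joint model)
    for two binary variables; a lower joint AIC flags dependence."""
    n = len(x)
    counts = np.zeros((2, 2))
    for a, b in zip(x, y):
        counts[a, b] += 1
    p_joint = counts / n
    p_ind = np.outer(counts.sum(axis=1), counts.sum(axis=0)) / n ** 2
    nz = counts > 0                       # 0 * log(0) cells contribute nothing
    ll_joint = np.sum(counts[nz] * np.log(p_joint[nz]))
    ll_ind = np.sum(counts[nz] * np.log(p_ind[nz]))
    # Independence model: 2 free parameters; saturated joint model: 3.
    return -2 * ll_ind + 2 * 2, -2 * ll_joint + 2 * 3

# Perfectly dependent pair: the joint model wins despite its extra parameter.
aic_i, aic_j = aic_pair([0, 0, 1, 1], [0, 0, 1, 1])
# Perfectly independent pair: the cheaper independence model wins.
aic_i2, aic_j2 = aic_pair([0, 0, 1, 1], [0, 1, 0, 1])
```

In an apriori-style search, the method keeps a variable set only while such AIC comparisons favour independence, pruning supersets under the assumed anti-monotone property.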

  11. Generating a Simulated Fluid Flow Over an Aircraft Surface Using Anisotropic Diffusion

    NASA Technical Reports Server (NTRS)

    Rodriguez, David L. (Inventor); Sturdza, Peter (Inventor)

    2013-01-01

A fluid-flow simulation over a computer-generated aircraft surface is generated using a diffusion technique. The surface is represented by a mesh of polygons. A boundary-layer fluid property is obtained for a subset of the polygons of the surface mesh. A pressure-gradient vector is determined for a selected polygon that belongs to the surface mesh but not to the subset. Maximum and minimum diffusion rates are determined along directions derived from the pressure-gradient vector of the selected polygon. A diffusion-path vector is defined between a point in the selected polygon and a neighboring point in a neighboring polygon. An updated fluid property is then determined for the selected polygon using a variable diffusion rate based on the minimum diffusion rate, the maximum diffusion rate, and the angular difference between the diffusion-path vector and the pressure-gradient vector.

  12. Polymorphism and selection in the major histocompatibility complex DRA and DQA genes in the family Equidae.

    PubMed

    Janova, Eva; Matiasovic, Jan; Vahala, Jiri; Vodicka, Roman; Van Dyk, Enette; Horin, Petr

    2009-07-01

    The major histocompatibility complex genes coding for antigen binding and presenting molecules are the most polymorphic genes in the vertebrate genome. We studied the DRA and DQA gene polymorphism of the family Equidae. In addition to 11 previously reported DRA and 24 DQA alleles, six new DRA sequences and 13 new DQA alleles were identified in the genus Equus. Phylogenetic analysis of both DRA and DQA sequences provided evidence for trans-species polymorphism in the family Equidae. The phylogenetic trees differed from species relationships defined by standard taxonomy of Equidae and from trees based on mitochondrial or neutral gene sequence data. Analysis of selection showed differences between the less variable DRA and more variable DQA genes. DRA alleles were more often shared by more species. The DQA sequences analysed showed strong amongst-species positive selection; the selected amino acid positions mostly corresponded to selected positions in rodent and human DQA genes.

  13. Improvement of SET variability in TaOx-based resistive RAM devices

    NASA Astrophysics Data System (ADS)

    Schönhals, Alexander; Waser, Rainer; Wouters, Dirk J.

    2017-11-01

Improvement, or at least control, of variability is one of the key challenges for redox-based resistive switching memory technology. In this paper, we investigate the impact of a serial resistor, acting as a voltage divider, on the SET variability of Pt/Ta2O5/Ta/Pt nano-crossbar devices. A partial RESET in a competing complementary switching (CS) mode is identified as a possible failure mechanism of the bipolar switching SET in our devices. Owing to the voltage-divider effect, the serial resistance value affects the switching voltages of the two modes unequally, which allows selective suppression of the CS mode. The impact of the voltage divider on SET variability is demonstrated: a combination of an appropriate write voltage and serial resistance yields a significant improvement of the SET variability.

  14. PCA-LBG-based algorithms for VQ codebook generation

    NASA Astrophysics Data System (ADS)

    Tsai, Jinn-Tsong; Yang, Po-Yuan

    2015-04-01

Vector quantisation (VQ) codebooks are generated by combining principal component analysis (PCA) with the Linde-Buzo-Gray (LBG) algorithm. All training vectors are grouped according to their projections onto the principal components. The PCA-LBG-based algorithms include (1) PCA-LBG-Median, which selects the median vector of each group; (2) PCA-LBG-Centroid, which adopts the centroid vector of each group; and (3) PCA-LBG-Random, which randomly selects a vector from each group. The LBG algorithm then refines the codebook starting from the initial vectors supplied by the PCA step. PCA performs an orthogonal transformation that converts a set of potentially correlated variables into a set of linearly uncorrelated variables. Because the orthogonal transformation efficiently distinguishes test image vectors, the proposed PCA-LBG-based algorithms are expected to outperform conventional algorithms in designing VQ codebooks. The experimental results confirm that the proposed PCA-LBG-based algorithms obtain better results than existing methods reported in the literature.
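A minimal numpy sketch of the PCA-LBG-Median variant follows. The training vectors are random stand-ins, only the first principal component is used for grouping, and LBG is implemented as plain Lloyd iterations; all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 8))            # training vectors
K = 16                                   # codebook size

# PCA step: project the centred vectors onto the first principal component.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
proj = Xc @ Vt[0]

# Group by projected value; the median vector of each group forms the
# initial codebook (the PCA-LBG-Median variant).
groups = np.array_split(np.argsort(proj), K)
codebook = np.array([np.median(X[g], axis=0) for g in groups])

def distortion(codebook):
    d = ((X[:, None, :] - codebook[None]) ** 2).sum(axis=-1)
    return d, d.min(axis=1).mean()

d, init_dist = distortion(codebook)

# LBG refinement (Lloyd iterations): assign each vector to its nearest
# codeword, then move each codeword to the centroid of its cell.
for _ in range(20):
    assign = d.argmin(axis=1)
    for k in range(K):
        if (assign == k).any():
            codebook[k] = X[assign == k].mean(axis=0)
    d, final_dist = distortion(codebook)
```

Each Lloyd step can only decrease the mean distortion, so the PCA-seeded codebook is monotonically refined.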

  15. Spectroscopic follow-up of variability-selected active galactic nuclei in the Chandra Deep Field South

    NASA Astrophysics Data System (ADS)

    Boutsia, K.; Leibundgut, B.; Trevese, D.; Vagnetti, F.

    2009-04-01

Context: Supermassive black holes with masses of 10^5-10^9 M⊙ are believed to inhabit most, if not all, nuclear regions of galaxies, and both observational evidence and theoretical models suggest a scenario in which galaxy and black-hole evolution are tightly related. Luminous AGNs are usually selected by their non-stellar colours or their X-ray emission. Colour selection cannot be used to select low-luminosity AGNs, since their emission is dominated by the host galaxy, and objects with a low X-ray to optical ratio escape even the deepest X-ray surveys performed so far. In a previous study we presented a sample of candidates selected through optical variability in the Chandra Deep Field South, where repeated optical observations were performed in the framework of the STRESS supernova survey. Aims: The analysis is devoted to breaking the sample down into AGNs, starburst galaxies, and low-ionisation narrow-emission-line objects, providing new information about the possible dependence of the emission mechanisms on nuclear luminosity and black-hole mass, and eventually studying the evolution in cosmic time of the different populations. Methods: We obtained new optical spectroscopy for a sample of variability-selected candidates with the ESO NTT telescope. We analysed the new spectra together with those existing in the literature, and studied the distribution of the objects in U-B and B-V colours, optical and X-ray luminosity, and variability amplitude. Results: A large fraction (17/27) of the observed candidates are broad-line luminous AGNs, confirming the efficiency of variability in detecting quasars. We detect: i) extended objects that would have escaped colour selection, and ii) objects with a very low X-ray to optical ratio, in a few cases without any X-ray detection at all. Several objects turned out to be narrow-emission-line galaxies in which variability indicates nuclear activity, while in others no emission lines were detected. Some of these galaxies have variability and X-ray to optical ratios close to those of active galactic nuclei, while others have much lower values; this can be explained by the dilution of the nuclear light by the host galaxy. Conclusions: Our results demonstrate the effectiveness of supernova search programmes in detecting large samples of low-luminosity AGNs. A sizable fraction of the AGNs in our variability sample had escaped X-ray detection (5/47) and/or colour selection (9/48). Spectroscopic follow-up to fainter flux limits is strongly encouraged. Based on observations collected at the European Southern Observatory, Chile, 080.B-0187(A).

  16. [A meta-analysis of the variables related to depression in Korean patients with a stroke].

    PubMed

    Park, Eun Young; Shin, In Soo; Kim, Jung Hee

    2012-08-01

The purpose of this study was to use meta-analysis to evaluate the variables related to depression in patients who have had a stroke. The materials were 16 variables obtained from 26 studies spanning 10 years, selected from doctoral dissertations, master's theses and published articles. The variables were categorized into six groups: general characteristics of the patients, disease characteristics, psychological state, physical function, basic needs, and social variables. In addition, they were classified into six defensive and three risk variable groups according to whether they had a negative or positive effect on depression. Among the defensive variables, quality of life (ES = -.79) and acceptance of disability (ES = -.64) were highly correlated with depression. Among the risk variables, anxiety (ES = .66) and stress (ES = .53) showed large correlation effect sizes. These findings show that defensive and risk variables are related to depression in stroke patients, and that psychological interventions and improvement of physical function should be effective in decreasing depression in this population.

  17. Development of design guidelines for proper selection of graded aggregate base in Maryland state highways : [research summary].

    DOT National Transportation Integrated Search

    2015-01-01

    Millions of tons of graded aggregate base (GAB) materials are used in construction of : highway base layers in Maryland due to their satisfactory mechanical properties. The : fines content of a GAB material is highly variable and is often related to ...

  18. A soft computing based approach using modified selection strategy for feature reduction of medical systems.

    PubMed

    Zuhtuogullari, Kursat; Allahverdi, Novruz; Arikan, Nihat

    2013-01-01

Systems with high-dimensional input spaces require long processing times and large memory. Most attribute selection algorithms suffer from limits on input dimension and from information storage problems. These problems are eliminated by the feature reduction software developed here, which uses a new modified selection mechanism that adds solution candidates from the middle region. The hybrid software is constructed to reduce the input attributes of systems with a large number of input variables; it also supports the roulette wheel selection mechanism, and linear order crossover is used as the recombination operator. In genetic algorithm based soft computing methods, getting locked into local solutions is a further problem that the developed software eliminates. Faster and more effective results are obtained in the test procedures. The twelve input variables of the urological system were reduced to reducts (reduced sets of input attributes) with seven, six, and five elements. The results show that the developed software with modified selection has advantages in memory allocation, execution time, classification accuracy, sensitivity, and specificity when compared with other reduction algorithms on the urological test data.

  19. A Soft Computing Based Approach Using Modified Selection Strategy for Feature Reduction of Medical Systems

    PubMed Central

    Zuhtuogullari, Kursat; Allahverdi, Novruz; Arikan, Nihat

    2013-01-01

Systems with high-dimensional input spaces require long processing times and large memory. Most attribute selection algorithms suffer from limits on input dimension and from information storage problems. These problems are eliminated by the feature reduction software developed here, which uses a new modified selection mechanism that adds solution candidates from the middle region. The hybrid software is constructed to reduce the input attributes of systems with a large number of input variables; it also supports the roulette wheel selection mechanism, and linear order crossover is used as the recombination operator. In genetic algorithm based soft computing methods, getting locked into local solutions is a further problem that the developed software eliminates. Faster and more effective results are obtained in the test procedures. The twelve input variables of the urological system were reduced to reducts (reduced sets of input attributes) with seven, six, and five elements. The results show that the developed software with modified selection has advantages in memory allocation, execution time, classification accuracy, sensitivity, and specificity when compared with other reduction algorithms on the urological test data. PMID:23573172

  20. Selection of single chain variable fragments (scFv) against Xylella fastidiosa subsp. pauca by phage display

    USDA-ARS?s Scientific Manuscript database

    Xylella fastidiosa is a gram-negative member of the gamma proteobacteria. Xylella fastidiosa subsp pauca causes citrus variegated chlorosis in Brazil and enjoys ‘select agent’ status in the United States. Antibody based detection assays are commercially available for Xylella fastidiosa, and are ef...

  1. Treatment Selection in Depression.

    PubMed

    Cohen, Zachary D; DeRubeis, Robert J

    2018-05-07

    Mental health researchers and clinicians have long sought answers to the question "What works for whom?" The goal of precision medicine is to provide evidence-based answers to this question. Treatment selection in depression aims to help each individual receive the treatment, among the available options, that is most likely to lead to a positive outcome for them. Although patient variables that are predictive of response to treatment have been identified, this knowledge has not yet translated into real-world treatment recommendations. The Personalized Advantage Index (PAI) and related approaches combine information obtained prior to the initiation of treatment into multivariable prediction models that can generate individualized predictions to help clinicians and patients select the right treatment. With increasing availability of advanced statistical modeling approaches, as well as novel predictive variables and big data, treatment selection models promise to contribute to improved outcomes in depression.
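The Personalized Advantage Index idea can be sketched on synthetic data. This is an illustration of the general approach rather than the PAI method itself: per-arm ordinary least squares models stand in for whatever multivariable prediction models a real study would fit, and the data-generating process is invented.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data: baseline covariates, two treatments (0/1), and an outcome
# where lower is better (e.g. a post-treatment symptom score).
n = 400
cov = rng.normal(size=(n, 3))
treat = rng.integers(0, 2, size=n)
# Treatment 1 helps patients with a high first covariate, hurts the rest.
y = (cov @ np.array([0.5, -0.3, 0.2])
     + treat * (1.0 - 1.5 * cov[:, 0]) + 0.3 * rng.normal(size=n))

def fit_ols(X, y):
    Xb = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return coef

def predict(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

# One outcome model per treatment arm, fitted on that arm's patients only.
m0 = fit_ols(cov[treat == 0], y[treat == 0])
m1 = fit_ols(cov[treat == 1], y[treat == 1])

# PAI: difference between the predicted outcomes under the two treatments;
# its sign says which treatment is expected to work better for this patient.
pai = predict(m1, cov) - predict(m0, cov)
recommend = np.where(pai < 0, 1, 0)      # lower predicted score is better
```

The per-patient prediction, rather than the average treatment effect, is what lets the index recommend different treatments to different patients.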

  2. The Cramér-Rao Bounds and Sensor Selection for Nonlinear Systems with Uncertain Observations.

    PubMed

    Wang, Zhiguo; Shen, Xiaojing; Wang, Ping; Zhu, Yunmin

    2018-04-05

This paper considers the posterior Cramér-Rao bound and sensor selection problems for multi-sensor nonlinear systems with uncertain observations. In order to effectively overcome the difficulties caused by uncertainty, we investigate two methods to derive the posterior Cramér-Rao bound. The first method is based on the recursive formula of the Cramér-Rao bound and the Gaussian mixture model. It requires, however, the computation of a complex integral over the joint probability density function of the sensor measurements and the target state, so its computational burden is relatively high, especially in large sensor networks. Inspired by the expectation maximization algorithm, the second method introduces 0-1 latent variables to deal with the Gaussian mixture model. Since the regularity condition of the posterior Cramér-Rao bound is not satisfied for the discrete uncertain system, we use continuous variables to approximate the discrete latent variables. A new Cramér-Rao bound can then be obtained as the limit of the Cramér-Rao bound of the continuous system. This avoids the complex integral and reduces the computational burden. Based on the new posterior Cramér-Rao bound, the optimal solution of the sensor selection problem can be derived analytically, so the approach can handle sensor selection in large-scale sensor networks. Two typical numerical examples verify the effectiveness of the proposed methods.

  3. The education of attention as explanation of variability of practice effects: learning the final approach phase in a flight simulator.

    PubMed

    Huet, Michaël; Jacobs, David M; Camachon, Cyril; Missenard, Olivier; Gray, Rob; Montagne, Gilles

    2011-12-01

    The present study reports two experiments in which a total of 20 participants without prior flight experience practiced the final approach phase in a fixed-base simulator. All participants received self-controlled concurrent feedback during 180 practice trials. Experiment 1 shows that participants learn more quickly under variable practice conditions than under constant practice conditions. This finding is attributed to the education of attention to the more useful informational variables: Variability of practice reduces the usefulness of initially used informational variables, which leads to a quicker change in variable use, and hence to a larger improvement in performance. In the practice phase of Experiment 2 variability was selectively applied to some experimental factors but not to others. Participants tended to converge toward the variables that were useful in the specific conditions that they encountered during practice. This indicates that an explanation for variability of practice effects in terms of the education of attention is a useful alternative to traditional explanations based on the notion of the generalized motor program and to explanations based on the notions of noise and local minima.

  4. Genetic signatures of natural selection in a model invasive ascidian

    NASA Astrophysics Data System (ADS)

    Lin, Yaping; Chen, Yiyong; Yi, Changho; Fong, Jonathan J.; Kim, Won; Rius, Marc; Zhan, Aibin

    2017-03-01

Invasive species represent promising models to study species' responses to rapidly changing environments. Although local adaptation frequently occurs during contemporary range expansion, the associated genetic signatures at both population and genomic levels remain largely unknown. Here, we use genome-wide gene-associated microsatellites to investigate genetic signatures of natural selection in a model invasive ascidian, Ciona robusta. Population genetic analyses of 150 individuals sampled in Korea, New Zealand, South Africa and Spain showed significant genetic differentiation among populations. Based on outlier tests, we found a high incidence of signatures of directional selection at 19 loci. Hitchhiking mapping analyses identified 12 directional selective sweep regions, and all selective sweep windows on chromosomes were narrow (~8.9 kb). Further analyses identified 132 candidate genes under selection. When we compared our genetic data with six crucial environmental variables, 16 putatively selected loci showed significant correlation with these environmental variables. This suggests that the local environmental conditions have left significant signatures of selection at both population and genomic levels. Finally, we identified "plastic" genomic regions and genes that are promising targets for investigating evolutionary responses to rapid environmental change in C. robusta.

  5. The Effect of Latent Binary Variables on the Uncertainty of the Prediction of a Dichotomous Outcome Using Logistic Regression Based Propensity Score Matching.

    PubMed

    Szekér, Szabolcs; Vathy-Fogarassy, Ágnes

    2018-01-01

Logistic regression based propensity score matching is a widely used method in case-control studies to select the individuals of the control group. This method creates a suitable control group if all factors affecting the output variable are known. However, if relevant latent variables exist as well, and are not taken into account during the calculations, the quality of the control group is uncertain. In this paper, we present a statistics-based study in which we try to determine the relationship between the accuracy of the logistic regression model and the uncertainty of the dependent variable of the control group defined by propensity score matching. Our analyses show that there is a linear correlation between the fit of the logistic regression model and the uncertainty of the output variable. In certain cases, a latent binary explanatory variable can result in a relative error of up to 70% in the prediction of the outcome variable. The observed phenomenon calls analysts' attention to an important point that must be taken into account when drawing conclusions.
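The selection mechanism the study builds on can be sketched compactly: fit a logistic regression for treatment assignment, then match each treated individual to the nearest still-unused control on the propensity score. The data, the plain gradient-descent fit and the greedy 1:1 matching are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic observational data: the covariates drive treatment assignment.
n = 300
X = rng.normal(size=(n, 2))
p_true = 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1])))
treated = (rng.random(n) < p_true).astype(float)

# Logistic regression by gradient descent -> estimated propensity scores.
Xb = np.column_stack([np.ones(n), X])
w = np.zeros(3)
for _ in range(2000):
    ps = 1 / (1 + np.exp(-Xb @ w))
    w -= 0.5 * Xb.T @ (ps - treated) / n
ps = 1 / (1 + np.exp(-Xb @ w))

# Greedy 1:1 nearest-neighbour matching on the propensity score: each
# treated unit takes the closest control not yet used.
available = set(np.flatnonzero(treated == 0).tolist())
matches = {}
for t in np.flatnonzero(treated == 1):
    if not available:
        break
    pool = np.fromiter(available, dtype=int)
    j = pool[np.abs(ps[pool] - ps[t]).argmin()]
    matches[int(t)] = int(j)
    available.remove(int(j))
```

A latent variable that influences the outcome but is absent from X leaves the matched controls unbalanced on it, which is exactly the uncertainty the paper quantifies.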

  6. Selection of specific interactors from phage display library based on sea lamprey variable lymphocyte receptor sequences.

    PubMed

    Wezner-Ptasinska, Magdalena; Otlewski, Jacek

    2015-12-01

    Variable lymphocyte receptors (VLRs) are non-immunoglobulin components of adaptive immunity in jawless vertebrates. These proteins composed of leucine-rich repeat modules offer some advantages over antibodies in target binding and therefore are attractive candidates for biotechnological applications. In this paper we report the design and characterization of a phage display library based on a previously proposed dVLR scaffold containing six LRR modules [Wezner-Ptasinska et al., 2011]. Our library was designed based on a consensus approach in which the randomization scheme reflects the frequencies of amino acids naturally occurring in respective positions responsible for antigen recognition. We demonstrate general applicability of the scaffold by selecting dVLRs specific for lysozyme and S100A7 protein with KD values in the micromolar range. The dVLR library could be used as a convenient alternative to antibodies for effective isolation of high affinity binders.

  7. Strategies for minimizing sample size for use in airborne LiDAR-based forest inventory

    USGS Publications Warehouse

    Junttila, Virpi; Finley, Andrew O.; Bradford, John B.; Kauranne, Tuomo

    2013-01-01

    Recently airborne Light Detection And Ranging (LiDAR) has emerged as a highly accurate remote sensing modality to be used in operational scale forest inventories. Inventories conducted with the help of LiDAR are most often model-based, i.e. they use variables derived from LiDAR point clouds as the predictive variables that are to be calibrated using field plots. The measurement of the necessary field plots is a time-consuming and statistically sensitive process. Because of this, current practice often presumes hundreds of plots to be collected. But since these plots are only used to calibrate regression models, it should be possible to minimize the number of plots needed by carefully selecting the plots to be measured. In the current study, we compare several systematic and random methods for calibration plot selection, with the specific aim that they be used in LiDAR based regression models for forest parameters, especially above-ground biomass. The primary criteria compared are based on both spatial representativity as well as on their coverage of the variability of the forest features measured. In the former case, it is important also to take into account spatial auto-correlation between the plots. The results indicate that choosing the plots in a way that ensures ample coverage of both spatial and feature space variability improves the performance of the corresponding models, and that adequate coverage of the variability in the feature space is the most important condition that should be met by the set of plots collected.
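One simple way to get the feature-space coverage the authors recommend is to cluster the candidate plots in LiDAR-metric space and field-visit the plot nearest each cluster centre. This is a sketch under invented data, not one of the paper's compared selection methods; the metric names in the comment are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

# LiDAR metrics for 500 candidate plot locations (e.g. height percentiles,
# canopy-density bins); the values here are random stand-ins.
metrics = rng.normal(size=(500, 5))
k = 25                                   # field budget: plots we can measure

# Plain k-means over the metric space.
centroids = metrics[rng.choice(len(metrics), size=k, replace=False)].copy()
for _ in range(50):
    d = ((metrics[:, None, :] - centroids[None]) ** 2).sum(axis=-1)
    assign = d.argmin(axis=1)
    for j in range(k):
        if (assign == j).any():
            centroids[j] = metrics[assign == j].mean(axis=0)

# Calibration plots: the candidate nearest each centroid, so the measured
# sample spans the variability of the feature space.
chosen = np.unique(d.argmin(axis=0))
```

Spatial representativity could be folded in the same way by appending (suitably scaled) plot coordinates to the metric vectors before clustering.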

  8. Multivariate Bayesian variable selection exploiting dependence structure among outcomes: Application to air pollution effects on DNA methylation.

    PubMed

    Lee, Kyu Ha; Tadesse, Mahlet G; Baccarelli, Andrea A; Schwartz, Joel; Coull, Brent A

    2017-03-01

    The analysis of multiple outcomes is becoming increasingly common in modern biomedical studies. It is well-known that joint statistical models for multiple outcomes are more flexible and more powerful than fitting a separate model for each outcome; they yield more powerful tests of exposure or treatment effects by taking into account the dependence among outcomes and pooling evidence across outcomes. It is, however, unlikely that all outcomes are related to the same subset of covariates. Therefore, there is interest in identifying exposures or treatments associated with particular outcomes, which we term outcome-specific variable selection. In this work, we propose a variable selection approach for multivariate normal responses that incorporates not only information on the mean model, but also information on the variance-covariance structure of the outcomes. The approach effectively leverages evidence from all correlated outcomes to estimate the effect of a particular covariate on a given outcome. To implement this strategy, we develop a Bayesian method that builds a multivariate prior for the variable selection indicators based on the variance-covariance of the outcomes. We show via simulation that the proposed variable selection strategy can boost power to detect subtle effects without increasing the probability of false discoveries. We apply the approach to the Normative Aging Study (NAS) epigenetic data and identify a subset of five genes in the asthma pathway for which gene-specific DNA methylations are associated with exposures to either black carbon, a marker of traffic pollution, or sulfate, a marker of particles generated by power plants. © 2016, The International Biometric Society.

  9. Volume-based response evaluation with consensual lesion selection: a pilot study by using cloud solutions and comparison to RECIST 1.1.

    PubMed

    Oubel, Estanislao; Bonnard, Eric; Sueoka-Aragane, Naoko; Kobayashi, Naomi; Charbonnier, Colette; Yamamichi, Junta; Mizobe, Hideaki; Kimura, Shinya

    2015-02-01

    Lesion volume is considered a promising alternative to Response Evaluation Criteria in Solid Tumors (RECIST) for making tumor measurements more accurate and consistent, which would enable an earlier detection of temporal changes. In this article, we report the results of a pilot study evaluating the effects of consensual lesion selection on volume-based response (VBR) assessments. Eleven patients with lung computed tomography scans acquired at three time points were selected from the Reference Image Database to Evaluate Response to therapy in lung cancer (RIDER) and proprietary databases. Images were analyzed according to RECIST 1.1 and VBR criteria by three readers working in different geographic locations. Cloud solutions were used to connect readers and carry out a consensus process on the selection of lesions used for computing response. Because there are no currently accepted thresholds for computing VBR, we applied a set of thresholds based on measurement variability (-35% and +55%). The benefit of this consensus was measured in terms of multiobserver agreement by using Fleiss kappa (κfleiss) and corresponding standard errors (SE). VBR after consensual selection of target lesions yielded κfleiss = 0.85 (SE = 0.091), which increases up to 0.95 (SE = 0.092) if an extra consensus on new lesions is added. As a reference, the agreement when applying RECIST without consensus was κfleiss = 0.72 (SE = 0.088). These differences were found to be statistically significant according to a z-test. An agreement on the selection of lesions reduces the inter-reader variability when computing VBR. Cloud solutions proved to be an interesting and feasible strategy for standardizing response evaluations, reducing variability, and increasing consistency of results in multicenter clinical trials. Copyright © 2015 AUR. Published by Elsevier Inc. All rights reserved.
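    The agreement statistic used here, Fleiss' kappa, can be computed directly from a subjects-by-categories count matrix. A minimal sketch with hypothetical reader counts (three readers, three response categories), not the study's data:

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss' kappa for a subjects x categories matrix, where
    counts[i, j] is the number of readers assigning subject i to
    category j (every subject rated by the same n readers)."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum(axis=1)[0]                     # readers per subject
    p_j = counts.sum(axis=0) / counts.sum()       # category proportions
    P_i = (np.square(counts).sum(axis=1) - n) / (n * (n - 1))
    P_bar, P_e = P_i.mean(), np.square(p_j).sum()
    return (P_bar - P_e) / (1 - P_e)

# three readers classifying 4 lesions as response / stable / progression
ratings = np.array([[3, 0, 0],
                    [0, 3, 0],
                    [2, 1, 0],
                    [0, 0, 3]])
kappa = fleiss_kappa(ratings)
```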

  10. Variability Selected Low-Luminosity Active Galactic Nuclei in the 4 Ms Chandra Deep Field-South

    NASA Technical Reports Server (NTRS)

    Young, M.; Brandt, W. N.; Xue, Y. Q.; Paolillo, M.; Alexander, D. M.; Bauer, F. E.; Lehmer, B. D.; Luo, B.; Shemmer, O.; Schneider, D. P.

    2012-01-01

    The 4 Ms Chandra Deep Field-South (CDF-S) and other deep X-ray surveys have been highly effective at selecting active galactic nuclei (AGN). However, cosmologically distant low-luminosity AGN (LLAGN) have remained a challenge to identify due to significant contribution from the host galaxy. We identify long-term X-ray variability (on timescales of months to years, observed frame) in 20 of 92 CDF-S galaxies spanning redshifts of approximately 0.08-1.02 that do not meet other AGN selection criteria. We show that the observed variability cannot be explained by X-ray binary populations or ultraluminous X-ray sources, so the variability is most likely caused by accretion onto a supermassive black hole. The variable galaxies are not heavily obscured in general, with a stacked effective power-law photon index of Gamma(sub stack) approximately equal to 1.93 +/- 0.13, and are therefore likely LLAGN. The LLAGN tend to lie a factor of approximately 6-89 below the extrapolated linear variability-luminosity relation measured for luminous AGN. This may be explained by their lower accretion rates. Variability-independent black-hole mass and accretion-rate estimates for the variable galaxies show that they sample a significantly different black-hole mass/accretion-rate space, with masses a factor of 2.4 lower and accretion rates a factor of 22.5 lower than variable luminous AGNs at the same redshift. We find that an empirical model based on a universal broken power-law power spectral density function, where the break frequency depends on SMBH mass and accretion rate, roughly reproduces the shape, but not the normalization, of the variability-luminosity trends measured for the variable galaxies and more luminous AGNs.

  11. Short- and long-term variability of radon progeny concentration in dwellings in the Czech Republic.

    PubMed

    Slezáková, M; Navrátilová Rovenská, K; Tomásek, L; Holecek, J

    2013-03-01

    In this paper, repeated measurements of radon progeny concentration in dwellings in the Czech Republic are described. Two distinct data sets are available: one based on present measurements in 170 selected dwellings in the Central Bohemian Pluton with a primary measurement carried out in the 1990s and the other based on 1920 annual measurements in 960 single-family houses in the Czech Republic in 1992 and repeatedly in 1993. The analysis of variance model with random effects is applied to data to evaluate the variability of measurements. The calculated variability attributable to repeated measurements is compared with results from other countries. In epidemiological studies, ignoring the variability of measurements may lead to biased estimates of risk of lung cancer.

  12. Influence of variable selection on partial least squares discriminant analysis models for explosive residue classification

    NASA Astrophysics Data System (ADS)

    De Lucia, Frank C., Jr.; Gottfried, Jennifer L.

    2011-02-01

    Using a series of thirteen organic materials that includes novel high-nitrogen energetic materials, conventional organic military explosives, and benign organic materials, we have demonstrated the importance of variable selection for maximizing residue discrimination with partial least squares discriminant analysis (PLS-DA). We built several PLS-DA models using different variable sets based on laser induced breakdown spectroscopy (LIBS) spectra of the organic residues on an aluminum substrate under an argon atmosphere. The model classification results for each sample are presented and the influence of the variables on these results is discussed. We found that using the whole spectra as the data input for the PLS-DA model gave the best results. However, variables due to the surrounding atmosphere and the substrate contribute to discrimination when the whole spectra are used, indicating this may not be the most robust model. Further iterative testing with additional validation data sets is necessary to determine the most robust model.

  13. Total sulfur determination in residues of crude oil distillation using FT-IR/ATR and variable selection methods.

    PubMed

    Müller, Aline Lima Hermes; Picoloto, Rochele Sogari; de Azevedo Mello, Paola; Ferrão, Marco Flores; de Fátima Pereira dos Santos, Maria; Guimarães, Regina Célia Lourenço; Müller, Edson Irineu; Flores, Erico Marlon Moraes

    2012-04-01

    Total sulfur concentration was determined in atmospheric residue (AR) and vacuum residue (VR) samples obtained from the petroleum distillation process by Fourier transform infrared spectroscopy with attenuated total reflectance (FT-IR/ATR) in association with chemometric methods. The calibration and prediction sets consisted of 40 and 20 samples, respectively. Calibration models were developed using two variable selection methods: interval partial least squares (iPLS) and synergy interval partial least squares (siPLS). Different treatments and pre-processing steps were also evaluated for the development of the models. Pre-treatment based on multiplicative scatter correction (MSC) with mean-centered data was selected for model construction. The use of siPLS as the variable selection method provided a model with root mean square error of prediction (RMSEP) values significantly better than those obtained by a PLS model using all variables. The best model was obtained using the siPLS algorithm with the spectra divided into 20 intervals and combinations of 3 intervals (911-824, 823-736 and 737-650 cm(-1)). This model produced an RMSECV of 400 mg kg(-1) S and an RMSEP of 420 mg kg(-1) S, with a correlation coefficient of 0.990. Copyright © 2011 Elsevier B.V. All rights reserved.

  14. A Bayesian random effects discrete-choice model for resource selection: Population-level selection inference

    USGS Publications Warehouse

    Thomas, D.L.; Johnson, D.; Griffith, B.

    2006-01-01

    We present a Bayesian random-effects model to assess resource selection, modeling the probability of use of land units characterized by discrete and continuous measures. This model provides simultaneous estimation of both individual- and population-level selection. The deviance information criterion (DIC), a Bayesian alternative to AIC that is sample-size specific, is used for model selection. Aerial radiolocation data from 76 adult female caribou (Rangifer tarandus) and calf pairs during 1 year on an Arctic coastal plain calving ground were used to illustrate the models and assess population-level selection of landscape attributes, as well as individual heterogeneity of selection. Landscape attributes included elevation, NDVI (a measure of forage greenness), and land cover-type classification. Results from the first of a 2-stage model-selection procedure indicated that there is substantial heterogeneity among cow-calf pairs with respect to selection of the landscape attributes. In the second stage, selection of models with heterogeneity included indicated that, at the population level, NDVI and land cover class were significant attributes for selection of different landscapes by pairs on the calving ground. Population-level selection coefficients indicate that the pairs generally select landscapes with higher levels of NDVI, but the relationship is quadratic. The highest rate of selection occurs at values of NDVI less than the maximum observed. Results for land cover-class selection coefficients indicate that wet sedge, moist sedge, herbaceous tussock tundra, and shrub tussock tundra are selected at approximately the same rate, while alpine and sparsely vegetated landscapes are selected at a lower rate. Furthermore, the variability in selection by individual caribou for moist sedge and sparsely vegetated landscapes is large relative to the variability in selection of other land cover types. 
The example analysis illustrates that, while sometimes computationally intense, a Bayesian hierarchical discrete-choice model for resource selection can provide managers with 2 components of population-level inference: average population selection and variability of selection. Both components are necessary to make sound management decisions based on animal selection.

  15. Identification of solid state fermentation degree with FT-NIR spectroscopy: Comparison of wavelength variable selection methods of CARS and SCARS.

    PubMed

    Jiang, Hui; Zhang, Hang; Chen, Quansheng; Mei, Congli; Liu, Guohai

    2015-01-01

    The use of wavelength variable selection before partial least squares discriminant analysis (PLS-DA) for qualitative identification of solid state fermentation degree by the FT-NIR spectroscopy technique was investigated in this study. Two wavelength variable selection methods, competitive adaptive reweighted sampling (CARS) and stability competitive adaptive reweighted sampling (SCARS), were employed to select the important wavelengths. PLS-DA was applied to calibrate the identification model using the wavelength variables selected by CARS and SCARS. Experimental results showed that the numbers of wavelength variables selected by CARS and SCARS were 58 and 47, respectively, from the 1557 original wavelength variables. Compared with the results of full-spectrum PLS-DA, both wavelength variable selection methods enhanced the performance of the identification models. Meanwhile, compared with the CARS-PLS-DA model, the SCARS-PLS-DA model achieved better results, with an identification rate of 91.43% in the validation process. The overall results demonstrate that a PLS-DA model constructed using wavelength variables selected by a proper wavelength variable selection method can identify solid state fermentation degree more accurately. Copyright © 2015 Elsevier B.V. All rights reserved.

  17. Incorporating abundance information and guiding variable selection for climate-based ensemble forecasting of species' distributional shifts.

    PubMed

    Tanner, Evan P; Papeş, Monica; Elmore, R Dwayne; Fuhlendorf, Samuel D; Davis, Craig A

    2017-01-01

    Ecological niche models (ENMs) have increasingly been used to estimate the potential effects of climate change on species' distributions worldwide. Recently, predictions of species abundance have also been obtained with such models, though knowledge about the climatic variables affecting species abundance is often lacking. To address this, we used a well-studied guild (temperate North American quail) and the Maxent modeling algorithm to compare model performance of three variable selection approaches: correlation/variable contribution (CVC), biological (i.e., variables known to affect species abundance), and random. We then applied the best approach to forecast potential distributions, under future climatic conditions, and analyze future potential distributions in light of available abundance data and presence-only occurrence data. To estimate species' distributional shifts we generated ensemble forecasts using four global circulation models, four representative concentration pathways, and two time periods (2050 and 2070). Furthermore, we present distributional shifts where 75%, 90%, and 100% of our ensemble models agreed. The CVC variable selection approach outperformed our biological approach for four of the six species. Model projections indicated species-specific effects of climate change on future distributions of temperate North American quail. The Gambel's quail (Callipepla gambelii) was the only species predicted to gain area in climatic suitability across all three scenarios of ensemble model agreement. Conversely, the scaled quail (Callipepla squamata) was the only species predicted to lose area in climatic suitability across all three scenarios of ensemble model agreement. Our models projected future loss of areas for the northern bobwhite (Colinus virginianus) and scaled quail in portions of their distributions which are currently areas of high abundance. 
Climatic variables that influence local abundance may not always scale up to influence species' distributions. Special attention should be given to selecting variables for ENMs, and tests of model performance should be used to validate the choice of variables.
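    A correlation-based screen of the kind used in the CVC approach can be sketched as: for each pair of highly correlated predictors, drop the one with the lower contribution score. The threshold and contribution values below are hypothetical:

```python
import numpy as np

def prune_correlated(X, contribution, threshold=0.7):
    """Sketch of a correlation/variable-contribution (CVC) screen:
    for every pair of predictors with |r| above the threshold, drop
    the one with the lower model contribution score."""
    r = np.corrcoef(X, rowvar=False)
    keep = set(range(X.shape[1]))
    for i in range(X.shape[1]):
        for j in range(i + 1, X.shape[1]):
            # skip pairs where one member was already dropped
            if i in keep and j in keep and abs(r[i, j]) > threshold:
                keep.discard(i if contribution[i] < contribution[j] else j)
    return sorted(keep)

rng = np.random.default_rng(0)
temp = rng.normal(size=500)
X = np.column_stack([temp,
                     temp + 0.01 * rng.normal(size=500),  # near-duplicate
                     rng.normal(size=500)])
kept = prune_correlated(X, contribution=[0.5, 0.2, 0.3])
```

    The near-duplicate predictor with the lower contribution is removed, leaving the other two.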

  18. Advances in variable selection methods I: Causal selection methods versus stepwise regression and principal component analysis on data of known and unknown functional relationships

    EPA Science Inventory

    Hydrological predictions at a watershed scale are commonly based on extrapolation and upscaling of hydrological behavior at plot and hillslope scales. Yet, dominant hydrological drivers at a hillslope may not be as dominant at the watershed scale because of the heterogeneity of w...

  19. Lower life satisfaction, active coping and cardiovascular disease risk factors in older African Americans: outcomes of a longitudinal church-based intervention.

    PubMed

    Mendez, Yesenia P; Ralston, Penny A; Wickrama, Kandauda K A S; Bae, Dayoung; Young-Clark, Iris; Ilich, Jasminka Z

    2018-06-01

    This study examined lower life satisfaction, active coping and cardiovascular disease risk factors (diastolic and systolic blood pressure, body mass index, and circumferences) in older African Americans over the phases of an 18-month church-based intervention, using a quasi-experimental design. Participants (n = 89) were 45 years of age and older from six churches (three treatment, three comparison) in North Florida. Lower life satisfaction had a persistent unfavorable effect on weight variables. Active coping showed a direct beneficial effect on selected weight variables. However, active coping was adversely associated with blood pressure, and did not moderate the association between lower life satisfaction and cardiovascular risk factors. The intervention had a beneficial moderating influence on the association between lower life satisfaction and weight variables and on the association between active coping and these variables. Yet, this pattern did not hold for the association between active coping and blood pressure. The relationship of lower life satisfaction and selected cardiovascular risk factors and the positive effect of active coping were established, but findings regarding blood pressure suggest further study is needed.

  20. Analyses of the most influential factors for vibration monitoring of planetary power transmissions in pellet mills by adaptive neuro-fuzzy technique

    NASA Astrophysics Data System (ADS)

    Milovančević, Miloš; Nikolić, Vlastimir; Anđelković, Boban

    2017-01-01

    Vibration-based structural health monitoring is widely recognized as an attractive strategy for early damage detection in civil structures. Vibration monitoring and prediction are important for any system, since they can help avert many unpredictable behaviors of the system. If vibration monitoring is properly managed, it can ensure economic and safe operations. Potential for further improvement of vibration monitoring lies in the improvement of current control strategies. One option is the introduction of model predictive control. Multistep-ahead predictive models of vibration are a starting point for creating a successful model predictive strategy. For the purpose of this article, predictive models are created for vibration monitoring of planetary power transmissions in pellet mills. The models were developed using a novel method based on ANFIS (adaptive neuro-fuzzy inference system). The aim of this study is to investigate the potential of ANFIS for selecting the most relevant variables for predictive models of vibration monitoring of pellet mill power transmissions. The vibration data are collected by PIC (Programmable Interface Controller) microcontrollers. The goal of the predictive vibration monitoring of planetary power transmissions in pellet mills is to indicate deterioration in the vibration of the power transmissions before the actual failure occurs. The ANFIS process for variable selection was implemented in order to detect the predominant variables affecting the prediction of vibration monitoring. It was also used to select the minimal input subset of variables from the initial set of input variables - current and lagged variables (up to 11 steps) of vibration. The obtained results could be used for simplification of predictive methods so as to avoid multiple input variables. Models with fewer inputs were preferable because of overfitting between training and testing data. 
While the obtained results are promising, further work is required in order to get results that could be directly applied in practice.

  1. Influence of BMI and dietary restraint on self-selected portions of prepared meals in US women.

    PubMed

    Labbe, David; Rytz, Andréas; Brunstrom, Jeffrey M; Forde, Ciarán G; Martin, Nathalie

    2017-04-01

    The rise of obesity prevalence has been attributed in part to an increase in food and beverage portion sizes selected and consumed among overweight and obese consumers. Nevertheless, evidence from observations of adults is mixed and contradictory findings might reflect the use of small or unrepresentative samples. The objective of this study was i) to determine the extent to which BMI and dietary restraint predict self-selected portion sizes for a range of commercially available prepared savoury meals and ii) to consider the importance of these variables relative to two previously established predictors of portion selection, expected satiation and expected liking. A representative sample of female consumers (N = 300, range 18-55 years) evaluated 15 frozen savoury prepared meals. For each meal, participants rated their expected satiation and expected liking, and selected their ideal portion using a previously validated computer-based task. Dietary restraint was quantified using the Dutch Eating Behaviour Questionnaire (DEBQ-R). Hierarchical multiple regression was performed on self-selected portions with age, hunger level, and meal familiarity entered as control variables in the first step of the model, expected satiation and expected liking as predictor variables in the second step, and DEBQ-R and BMI as exploratory predictor variables in the third step. The second and third steps significantly explained variance in portion size selection (18% and 4%, respectively). Larger portion selections were significantly associated with lower dietary restraint and with lower expected satiation. There was a positive relationship between BMI and portion size selection (p = 0.06) and between expected liking and portion size selection (p = 0.06). Our discussion considers future research directions, the limited variance explained by our model, and the potential for portion size underreporting by overweight participants. Copyright © 2016 Nestec S.A. Published by Elsevier Ltd. All rights reserved.
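    The stepwise design above can be sketched with ordinary least squares: enter the control variables first and read off the increment in explained variance at each subsequent step. The variable names, effect sizes and data below are hypothetical stand-ins for the study's measures:

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least squares fit with intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()

rng = np.random.default_rng(7)
n = 300
age, hunger, familiarity = rng.normal(size=(3, n))   # step-1 controls
exp_satiation, restraint = rng.normal(size=(2, n))   # step-2/3 predictors
# simulated portion size driven mainly by the later-step predictors
portion = (-0.5 * exp_satiation - 0.3 * restraint
           + 0.1 * age + rng.normal(size=n))

step1 = np.column_stack([age, hunger, familiarity])
step2 = np.column_stack([step1, exp_satiation])
step3 = np.column_stack([step2, restraint])
r2 = [r_squared(s, portion) for s in (step1, step2, step3)]
increments = np.diff([0.0] + r2)   # variance explained added per step
```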

  2. Selection and characterization of naturally occurring single-domain (IgNAR) antibody fragments from immunized sharks by phage display.

    PubMed

    Dooley, Helen; Flajnik, Martin F; Porter, Andrew J

    2003-09-01

    The novel immunoglobulin isotype novel antigen receptor (IgNAR) is found in cartilaginous fish and is composed of a heavy-chain homodimer that does not associate with light chains. The variable regions of IgNAR function as independent domains similar to those found in the heavy-chain immunoglobulins of Camelids. Here, we describe the successful cloning and generation of a phage-displayed, single-domain library based upon the variable domain of IgNAR. Selection of such a library generated from nurse sharks (Ginglymostoma cirratum) immunized with the model antigen hen egg-white lysozyme (HEL) enabled the successful isolation of intact antigen-specific binders matured in vivo. The selected variable domains were shown to be functionally expressed in Escherichia coli, extremely stable, and bind to antigen specifically with an affinity in the nanomolar range. This approach can therefore be considered as an alternative route for the isolation of minimal antigen-binding fragments with favorable characteristics.

  3. Reduced Lung Cancer Mortality With Lower Atmospheric Pressure.

    PubMed

    Merrill, Ray M; Frutos, Aaron

    2018-01-01

    Research has shown that higher altitude is associated with lower risk of lung cancer and improved survival among patients. The current study assessed the influence of county-level atmospheric pressure (a measure reflecting both altitude and temperature) on age-adjusted lung cancer mortality rates in the contiguous United States, with 2 forms of spatial regression. Ordinary least squares regression and geographically weighted regression models were used to evaluate the impact of climate and other selected variables on lung cancer mortality, based on 2974 counties. Atmospheric pressure was significantly positively associated with lung cancer mortality, after controlling for sunlight, precipitation, PM2.5 (µg/m³), current smoker, and other selected variables. Positive county-level β coefficient estimates (P < .05) for atmospheric pressure were observed throughout the United States, higher in the eastern half of the country. The spatial regression models showed that atmospheric pressure is positively associated with age-adjusted lung cancer mortality rates, after controlling for other selected variables.

  4. [Effect of stock abundance and environmental factors on the recruitment success of small yellow croaker in the East China Sea].

    PubMed

    Liu, Zun-lei; Yuan, Xing-wei; Yang, Lin-lin; Yan, Li-ping; Zhang, Hui; Cheng, Jia-hua

    2015-02-01

    Multiple hypotheses are available to explain recruitment rate. Model selection methods can be used to identify the best model that supports a particular hypothesis. However, using a single model for estimating recruitment success is often inadequate for an overexploited population because of high model uncertainty. In this study, stock-recruitment data of small yellow croaker in the East China Sea, collected from fishery-dependent and independent surveys between 1992 and 2012, were used to examine density-dependent effects on recruitment success. Model selection methods based on frequentist (AIC, maximum adjusted R2 and P-values) and Bayesian (Bayesian model averaging, BMA) approaches were applied to identify the relationship between recruitment and environmental conditions. Interannual variability of the East China Sea environment was indicated by sea surface temperature (SST), meridional wind stress (MWS), zonal wind stress (ZWS), sea surface pressure (SPP) and runoff of the Changjiang River (RCR). Mean absolute error, mean squared predictive error and continuous ranked probability score were calculated to evaluate the predictive performance of recruitment success. The results showed that model structures were not consistent across the three model selection methods: the predictive variables were spawning abundance and MWS under AIC, spawning abundance alone under P-values, and spawning abundance, MWS and RCR under maximum adjusted R2. Recruitment success decreased linearly with stock abundance (P < 0.01), suggesting that the overcompensation effect in recruitment success might be due to cannibalism or food competition. Meridional wind intensity showed a marginally significant positive effect on recruitment success (P = 0.06), while runoff of the Changjiang River showed a marginally negative effect (P = 0.07). 
Based on mean absolute error and continuous ranked probability score, predictive error associated with models obtained from BMA was the smallest amongst different approaches, while that from models selected based on the P-value of the independent variables was the highest. However, mean squared predictive error from models selected based on the maximum adjusted R2 was highest. We found that BMA method could improve the prediction of recruitment success, derive more accurate prediction interval and quantitatively evaluate model uncertainty.
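    One of the scoring rules used above, the continuous ranked probability score (CRPS), has a standard closed form when the predictive distribution is Gaussian; a minimal sketch of that formula (the study's predictive distributions are not necessarily Gaussian):

```python
import math

def crps_gaussian(mu, sigma, obs):
    """Closed-form CRPS for a normal predictive distribution
    N(mu, sigma^2) evaluated at a single observation; lower is better."""
    z = (obs - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))

score = crps_gaussian(mu=0.0, sigma=1.0, obs=0.0)
```

    As the predictive spread shrinks toward zero, the CRPS approaches the absolute error, so it generalizes the mean absolute error to probabilistic forecasts.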

  5. Economic evaluation of genomic selection in small ruminants: a sheep meat breeding program.

    PubMed

    Shumbusho, F; Raoul, J; Astruc, J M; Palhiere, I; Lemarié, S; Fugeray-Scarbel, A; Elsen, J M

    2016-06-01

    Recent genomic evaluation studies using real data and predicting genetic gain by modeling breeding programs have reported moderate expected benefits from the replacement of classic selection schemes by genomic selection (GS) in small ruminants. The objectives of this study were to compare the cost, monetary genetic gain and economic efficiency of classic selection and GS schemes in the meat sheep industry. Deterministic methods were used to model selection based on multi-trait indices from a sheep meat breeding program. Decisional variables related to male selection candidates and progeny testing were optimized to maximize the annual monetary genetic gain (AMGG), that is, a weighted sum of meat and maternal traits annual genetic gains. For GS, a reference population of 2000 individuals was assumed and genomic information was available for evaluation of male candidates only. In the classic selection scheme, males' breeding values were estimated from own and offspring phenotypes. In GS, different scenarios were considered, differing by the information used to select males (genomic only, genomic+own performance, genomic+offspring phenotypes). The results showed that all GS scenarios were associated with higher total variable costs than classic selection (if the cost of genotyping was 123 euros/animal). In terms of AMGG and economic returns, GS scenarios were found to be superior to classic selection only if genomic information was combined with their own meat phenotypes (GS-Pheno) or with their progeny test information. The predicted economic efficiency, defined as returns (proportional to number of expressions of AMGG in the nucleus and commercial flocks) minus total variable costs, showed that the best GS scenario (GS-Pheno) was up to 15% more efficient than classic selection. For all selection scenarios, optimization increased the overall AMGG, returns and economic efficiency. 
As a conclusion, our study shows that some forms of GS strategies are more advantageous than classic selection, provided that GS is already initiated (i.e. the initial reference population is available). Optimizing decisional variables of the classic selection scheme could be of greater benefit than including genomic information in optimized designs.

  6. Towards an automatic statistical model for seasonal precipitation prediction and its application to Central and South Asian headwater catchments

    NASA Astrophysics Data System (ADS)

    Gerlitz, Lars; Gafurov, Abror; Apel, Heiko; Unger-Sayesteh, Katy; Vorogushyn, Sergiy; Merz, Bruno

    2016-04-01

    Statistical climate forecast applications typically utilize a small set of large scale SST or climate indices, such as ENSO, PDO or AMO as predictor variables. If the predictive skill of these large scale modes is insufficient, specific predictor variables such as customized SST patterns are frequently included. Hence statistically based climate forecast models are either based on a fixed number of climate indices (and thus might not consider important predictor variables) or are highly site specific and barely transferable to other regions. With the aim of developing an operational seasonal forecast model, which is easily transferable to any region in the world, we present a generic data mining approach which automatically selects potential predictors from gridded SST observations and reanalysis derived large scale atmospheric circulation patterns and generates robust statistical relationships with posterior precipitation anomalies for user selected target regions. Potential predictor variables are derived by means of a cellwise correlation analysis of precipitation anomalies with gridded global climate variables under consideration of varying lead times. Significantly correlated grid cells are subsequently aggregated to predictor regions by means of a variability based cluster analysis. Finally for every month and lead time, an individual random forest based forecast model is automatically calibrated and evaluated by means of the preliminary generated predictor variables. The model is exemplarily applied and evaluated for selected headwater catchments in Central and South Asia. Particularly the for winter and spring precipitation (which is associated with westerly disturbances in the entire target domain) the model shows solid results with correlation coefficients up to 0.7, although the variability of precipitation rates is highly underestimated. 
Likewise, for the monsoonal precipitation amounts in the South Asian target areas, a certain model skill could be detected. The skill of the model for the dry summer season in Central Asia and the transition seasons over South Asia is found to be low. A sensitivity analysis based on well-known climate indices reveals the major large-scale mechanisms controlling the seasonal precipitation climate of each target area. For the Central Asian target areas, both the El Nino-Southern Oscillation and the North Atlantic Oscillation are identified as important controls on precipitation totals during the moist spring season. Drought conditions are found to be triggered by a warm ENSO phase in combination with a positive phase of the NAO. For the monsoonal summer precipitation amounts over South Asia, the model suggests a distinct negative response to El Nino events.
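The selection chain described above (cellwise correlation screening of gridded predictors feeding a random forest) can be sketched as follows. This is a minimal illustration with synthetic data, an assumed 1% significance threshold, and the cluster-aggregation step omitted; it is not the authors' implementation:

```python
# Screen SST grid cells by correlation with precipitation, then fit a
# random forest on the surviving cells (illustrative synthetic data).
import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_months, n_cells = 240, 50                # 20 years x 50 synthetic grid cells
sst = rng.normal(size=(n_months, n_cells))
precip = 0.8 * sst[:, 3] - 0.5 * sst[:, 17] + rng.normal(scale=0.5, size=n_months)

# Cellwise correlation screen at an assumed 1% significance level
selected = [j for j in range(n_cells)
            if pearsonr(sst[:, j], precip)[1] < 0.01]

rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(sst[:, selected], precip)
score = rf.score(sst[:, selected], precip)  # in-sample fit of the forest
```

In the paper's pipeline the significantly correlated cells would first be aggregated into coherent predictor regions before model calibration.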

  7. Random forest feature selection approach for image segmentation

    NASA Astrophysics Data System (ADS)

    Lefkovits, László; Lefkovits, Szidónia; Emerich, Simina; Vaida, Mircea Florin

    2017-03-01

In the field of image segmentation, discriminative models have shown promising performance. Generally, every such model begins with the extraction of numerous features from annotated images. Most authors create their discriminative model by using many features without applying any selection criteria. A more reliable model can be built with a framework that selects the variables that are important from the point of view of the classification and eliminates the unimportant ones. In this article we present a framework for feature selection and data dimensionality reduction. The methodology is built around the random forest (RF) algorithm and its variable importance evaluation. In order to deal with datasets so large as to be practically unmanageable, we propose an algorithm based on RF that reduces the dimension of the database by eliminating irrelevant features. Furthermore, this framework is applied to optimize our discriminative model for brain tumor segmentation.
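The importance-based elimination idea can be illustrated with scikit-learn's random forest; the synthetic dataset and the top-10 cutoff below are arbitrary stand-ins for the framework described in the abstract:

```python
# Rank features by random-forest importance and keep only the strongest ones.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=400, n_features=30, n_informative=5,
                           n_redundant=2, random_state=0)
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)

order = np.argsort(rf.feature_importances_)[::-1]   # most important first
keep = order[:10]                                   # retain the top 10 features
X_reduced = X[:, keep]                              # reduced-dimension dataset
```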

  8. Recurrent personality dimensions in inclusive lexical studies: indications for a big six structure.

    PubMed

    Saucier, Gerard

    2009-10-01

Previous evidence for both the Big Five and the alternative six-factor model has been drawn from lexical studies with relatively narrow selections of attributes. This study examined factors from previous lexical studies using a wider selection of attributes in 7 languages (Chinese, English, Filipino, Greek, Hebrew, Spanish, and Turkish) and found 6 recurrent factors, each with common conceptual content across most of the studies. The previous narrow-selection-based six-factor model outperformed the Big Five in capturing the content of the 6 recurrent wideband factors. Adjective markers of the 6 recurrent wideband factors showed substantial incremental prediction of important criterion variables over and above the Big Five. Correspondence between the wideband 6 and narrowband 6 factors indicates that they are variants of a "Big Six" model that is more general across variable-selection procedures and may be more general across languages and populations.

  9. Do birds of a feather flock together? The variable bases for African American, Asian American, and European American adolescents' selection of similar friends.

    PubMed

    Hamm, J V

    2000-03-01

    Variability in adolescent-friend similarity is documented in a diverse sample of African American, Asian American, and European American adolescents. Similarity was greatest for substance use, modest for academic orientations, and low for ethnic identity. Compared with Asian American and European American adolescents, African American adolescents chose friends who were less similar with respect to academic orientation or substance use but more similar with respect to ethnic identity. For all three ethnic groups, personal endorsement of the dimension in question and selection of cross-ethnic-group friends heightened similarity. Similarity was a relative rather than an absolute selection criterion: Adolescents did not choose friends with identical orientations. These findings call for a comprehensive theory of friendship selection sensitive to diversity in adolescents' experiences. Implications for peer influence and self-development are discussed.

  10. Limitations to mapping habitat-use areas in changing landscapes using the Mahalanobis distance statistic

    USGS Publications Warehouse

    Knick, Steven T.; Rotenberry, J.T.

    1998-01-01

We tested the potential of a GIS mapping technique, using a resource selection model developed for black-tailed jackrabbits (Lepus californicus) and based on the Mahalanobis distance statistic, to track changes in shrubsteppe habitats in southwestern Idaho. If successful, the technique could be used to predict animal use areas, or those undergoing change, in different regions from the same selection function and variables without additional sampling. We determined the multivariate mean vector of 7 GIS variables that described habitats used by jackrabbits. We then ranked the similarity of all cells in the GIS coverage by their Mahalanobis distance to the mean habitat vector. The resulting map accurately depicted areas where we sighted jackrabbits on verification surveys. We then simulated an increase in shrublands (which are important habitats). Contrary to expectation, the new configurations were classified as lower similarity relative to the original mean habitat vector. Because the selection function is based on a unimodal mean, any deviation, even if biologically positive, creates larger Mahalanobis distances and lower similarity values. We recommend the Mahalanobis distance technique for mapping animal use areas when animals are distributed optimally, the landscape is well sampled to determine the mean habitat vector, and the distributions of the habitat variables do not change.
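A minimal sketch of the mapping technique, with synthetic numbers standing in for the 7 GIS habitat variables: every cell is ranked by its Mahalanobis distance to the mean habitat vector, so the smallest distance means the highest similarity.

```python
# Rank grid cells by squared Mahalanobis distance to the mean habitat vector.
import numpy as np

rng = np.random.default_rng(1)
used = rng.normal(size=(200, 7))        # habitat variables at used locations
mu = used.mean(axis=0)                  # mean habitat vector
cov_inv = np.linalg.inv(np.cov(used, rowvar=False))

def mahalanobis_sq(x):
    d = x - mu
    return float(d @ cov_inv @ d)

cells = rng.normal(size=(1000, 7))      # all cells in the GIS coverage
d2 = np.array([mahalanobis_sq(c) for c in cells])
similarity_rank = np.argsort(d2)        # smallest distance = most similar
```

The unimodal-mean caveat in the abstract is visible here: any shift away from `mu`, in either direction, increases `d2` and lowers the mapped similarity.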

  11. Predicting Conflict Management Based on Organizational Commitment and Selected Demographic Variables

    ERIC Educational Resources Information Center

    Balay, Refik

    2007-01-01

    The purpose of this study is to investigate the relationship between different levels of organizational commitment (compliance, identification, internalization) of teachers and their different conflict management strategies (compromising, problem solving, forcing, yielding, avoiding). Based on a questionnaire survey of 418 teachers, this study…

  12. Proxies for soil organic carbon derived from remote sensing

    NASA Astrophysics Data System (ADS)

    Rasel, S. M. M.; Groen, T. A.; Hussin, Y. A.; Diti, I. J.

    2017-07-01

Carbon storage in soils is of interest because soils contain more carbon than vegetation does. Estimating soil carbon through remote-sensing-based techniques can be a cost-effective approach, but is limited by the available methods. This study aims to develop a model based on remotely sensed variables (elevation, forest type and above-ground biomass) to estimate soil carbon stocks. Field observations on soil organic carbon, species composition, and above-ground biomass were recorded in the subtropical forest of Chitwan, Nepal. These variables were also estimated using LiDAR data and a WorldView 2 image. Above-ground biomass was estimated from the LiDAR image using a novel approach in which the image was segmented to identify individual trees, and for these trees estimates of DBH and height were made. Based on the AIC (Akaike Information Criterion), a regression model with above-ground biomass derived from LiDAR data and forest type derived from WorldView 2 imagery was selected to estimate soil organic carbon (SOC) stocks. The selected model had a coefficient of determination (R2) of 0.69. This shows the scope for estimating SOC with remote-sensing-derived variables in subtropical forests.
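The AIC-based choice among candidate regression models can be sketched as follows; the variables are synthetic stand-ins (not the Chitwan data), with SOC driven here only by a biomass proxy:

```python
# Compare candidate OLS models for SOC by AIC; lower AIC wins.
import numpy as np

def aic_ols(X, y):
    # AIC for Gaussian OLS: n*log(RSS/n) + 2k, k = parameters incl. intercept
    n = len(y)
    X1 = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    rss = np.sum((y - X1 @ beta) ** 2)
    return n * np.log(rss / n) + 2 * X1.shape[1]

rng = np.random.default_rng(2)
biomass = rng.normal(size=300)
elevation = rng.normal(size=300)
soc = 2.0 * biomass + rng.normal(scale=0.5, size=300)  # biomass-driven SOC

aic_biomass = aic_ols(biomass[:, None], soc)
aic_both = aic_ols(np.column_stack([biomass, elevation]), soc)
```

With this setup the biomass-only model should beat the elevation-only model on AIC, mirroring how the study retained only the informative predictors.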

  13. Regional regression equations for estimation of natural streamflow statistics in Colorado

    USGS Publications Warehouse

    Capesius, Joseph P.; Stephens, Verlin C.

    2009-01-01

    The U.S. Geological Survey (USGS), in cooperation with the Colorado Water Conservation Board and the Colorado Department of Transportation, developed regional regression equations for estimation of various streamflow statistics that are representative of natural streamflow conditions at ungaged sites in Colorado. The equations define the statistical relations between streamflow statistics (response variables) and basin and climatic characteristics (predictor variables). The equations were developed using generalized least-squares and weighted least-squares multilinear regression reliant on logarithmic variable transformation. Streamflow statistics were derived from at least 10 years of streamflow data through about 2007 from selected USGS streamflow-gaging stations in the study area that are representative of natural-flow conditions. Basin and climatic characteristics used for equation development are drainage area, mean watershed elevation, mean watershed slope, percentage of drainage area above 7,500 feet of elevation, mean annual precipitation, and 6-hour, 100-year precipitation. For each of five hydrologic regions in Colorado, peak-streamflow equations that are based on peak-streamflow data from selected stations are presented for the 2-, 5-, 10-, 25-, 50-, 100-, 200-, and 500-year instantaneous-peak streamflows. For four of the five hydrologic regions, equations based on daily-mean streamflow data from selected stations are presented for 7-day minimum 2-, 10-, and 50-year streamflows and for 7-day maximum 2-, 10-, and 50-year streamflows. Other equations presented for the same four hydrologic regions include those for estimation of annual- and monthly-mean streamflow and streamflow-duration statistics for exceedances of 10, 25, 50, 75, and 90 percent. 
All equations are reported along with salient diagnostic statistics, the ranges of basin and climatic characteristics on which each equation is based, and commentary on potential bias, identified from interpretation of residual plots, that is not otherwise removed by log-transformation of the equation variables. The predictor-variable ranges can be used to assess equation applicability for ungaged sites in Colorado.
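The log-transformed regression form used for such equations can be sketched like this; the basin characteristics, exponents, and noise level below are illustrative assumptions, not the published Colorado equations:

```python
# Fit log10(Q100) = b0 + b1*log10(area) + b2*log10(precip) by least squares,
# mimicking the power-law form of regional streamflow regression equations.
import numpy as np

rng = np.random.default_rng(3)
area = rng.uniform(10, 1000, size=80)      # drainage area (illustrative units)
precip = rng.uniform(10, 40, size=80)      # mean annual precipitation
q100 = 5.0 * area**0.7 * precip**1.2 * rng.lognormal(sigma=0.1, size=80)

X = np.column_stack([np.ones(80), np.log10(area), np.log10(precip)])
b, *_ = np.linalg.lstsq(X, np.log10(q100), rcond=None)

def predict(a, p):
    # Back-transform from log space to a streamflow estimate
    return 10 ** (b[0] + b[1] * np.log10(a) + b[2] * np.log10(p))
```

The back-transformation step is where the residual-plot bias check mentioned above matters: retransformed estimates can be biased if log-space residuals are not well behaved.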

  14. Automatic design of basin-specific drought indexes for highly regulated water systems

    NASA Astrophysics Data System (ADS)

    Zaniolo, Marta; Giuliani, Matteo; Castelletti, Andrea Francesco; Pulido-Velazquez, Manuel

    2018-04-01

    Socio-economic costs of drought are progressively increasing worldwide due to ongoing alterations of hydro-meteorological regimes induced by climate change. Although drought management is largely studied in the literature, traditional drought indexes often fail at detecting critical events in highly regulated systems, where natural water availability is conditioned by the operation of water infrastructures such as dams, diversions, and pumping wells. Here, ad hoc index formulations are usually adopted based on empirical combinations of several, supposed-to-be significant, hydro-meteorological variables. These customized formulations, however, while effective in the design basin, can hardly be generalized and transferred to different contexts. In this study, we contribute FRIDA (FRamework for Index-based Drought Analysis), a novel framework for the automatic design of basin-customized drought indexes. In contrast to ad hoc empirical approaches, FRIDA is fully automated, generalizable, and portable across different basins. FRIDA builds an index representing a surrogate of the drought conditions of the basin, computed by combining all the relevant available information about the water circulating in the system, identified by means of a feature extraction algorithm. We used the Wrapper for Quasi-Equally Informative Subset Selection (W-QEISS), which features a multi-objective evolutionary algorithm to find Pareto-efficient subsets of variables by maximizing the wrapper accuracy, minimizing the number of selected variables, and optimizing relevance and redundancy of the subset. The preferred variable subset is selected among the efficient solutions and used to formulate the final index according to alternative model structures. 
We apply FRIDA to the case study of the Jucar river basin (Spain), a drought-prone and highly regulated Mediterranean water resource system, where an advanced drought management plan relying on the formulation of an ad hoc state index is used for triggering drought management measures. The state index was constructed empirically with a trial-and-error process begun in the 1980s and finalized in 2007, guided by the experts from the Confederación Hidrográfica del Júcar (CHJ). Our results show that the automated variable selection outcomes align with CHJ's 25-year-long empirical refinement. In addition, the resultant FRIDA index outperforms the official State Index in terms of accuracy in reproducing the target variable and cardinality of the selected inputs set.
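The wrapper idea behind W-QEISS can be sketched in a greatly simplified form. The greedy forward search below is my own stand-in for the multi-objective evolutionary algorithm (which additionally trades off subset size, relevance, and redundancy), on synthetic data:

```python
# Greedy forward wrapper: grow a variable subset while cross-validated
# accuracy keeps improving (simplified stand-in for W-QEISS).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=12, n_informative=4,
                           random_state=0)

def cv_acc(cols):
    # Wrapper criterion: accuracy of a model trained on the candidate subset
    model = LogisticRegression(max_iter=1000)
    return cross_val_score(model, X[:, cols], y, cv=5).mean()

subset, best = [], 0.0
while True:
    gains = [(cv_acc(subset + [j]), j) for j in range(12) if j not in subset]
    acc, j = max(gains)
    if acc <= best:            # stop when no variable improves accuracy
        break
    subset, best = subset + [j], acc
```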

  15. Resolving the Conflict Between Associative Overdominance and Background Selection

    PubMed Central

    Zhao, Lei; Charlesworth, Brian

    2016-01-01

    In small populations, genetic linkage between a polymorphic neutral locus and loci subject to selection, either against partially recessive mutations or in favor of heterozygotes, may result in an apparent selective advantage to heterozygotes at the neutral locus (associative overdominance) and a retardation of the rate of loss of variability by genetic drift at this locus. In large populations, selection against deleterious mutations has previously been shown to reduce variability at linked neutral loci (background selection). We describe analytical, numerical, and simulation studies that shed light on the conditions under which retardation vs. acceleration of loss of variability occurs at a neutral locus linked to a locus under selection. We consider a finite, randomly mating population initiated from an infinite population in equilibrium at a locus under selection. With mutation and selection, retardation occurs only when S, the product of twice the effective population size and the selection coefficient, is of order 1. With S >> 1, background selection always causes an acceleration of loss of variability. Apparent heterozygote advantage at the neutral locus is, however, always observed when mutations are partially recessive, even if there is an accelerated rate of loss of variability. With heterozygote advantage at the selected locus, loss of variability is nearly always retarded. The results shed light on experiments on the loss of variability at marker loci in laboratory populations and on the results of computer simulations of the effects of multiple selected loci on neutral variability. PMID:27182952
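The neutral baseline against which retardation or acceleration of loss is measured can be sketched with a single-locus Wright-Fisher simulation (my own minimal illustration; linkage and selection are deliberately omitted): under pure drift, expected heterozygosity decays by a factor (1 - 1/(2N)) per generation.

```python
# Wright-Fisher drift at a neutral biallelic locus: heterozygosity decay.
import numpy as np

rng = np.random.default_rng(5)
N, gens, reps = 50, 100, 500
p = np.full(reps, 0.5)                   # initial allele frequency, all replicates
for _ in range(gens):
    p = rng.binomial(2 * N, p) / (2 * N)  # binomial resampling each generation

h_observed = np.mean(2 * p * (1 - p))            # mean heterozygosity across reps
h_expected = 0.5 * (1 - 1 / (2 * N)) ** gens     # neutral theory prediction
```

Retardation (e.g., from associative overdominance) would show up as `h_observed` above this neutral expectation; acceleration (background selection with S >> 1) as below it.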

  16. Vis-NIR spectrometric determination of Brix and sucrose in sugar production samples using kernel partial least squares with interval selection based on the successive projections algorithm.

    PubMed

    de Almeida, Valber Elias; de Araújo Gomes, Adriano; de Sousa Fernandes, David Douglas; Goicoechea, Héctor Casimiro; Galvão, Roberto Kawakami Harrop; Araújo, Mario Cesar Ugulino

    2018-05-01

    This paper proposes a new variable selection method for nonlinear multivariate calibration, combining the Successive Projections Algorithm for interval selection (iSPA) with the Kernel Partial Least Squares (Kernel-PLS) modelling technique. The proposed iSPA-Kernel-PLS algorithm is employed in a case study involving a Vis-NIR spectrometric dataset with complex nonlinear features. The analytical problem consists of determining Brix and sucrose content in samples from a sugar production system, on the basis of transflectance spectra. As compared to full-spectrum Kernel-PLS, the iSPA-Kernel-PLS models involve a smaller number of variables and display statistically significant superiority in terms of accuracy and/or bias in the predictions. Published by Elsevier B.V.

  17. The effect of artificial selection on phenotypic plasticity in maize.

    PubMed

    Gage, Joseph L; Jarquin, Diego; Romay, Cinta; Lorenz, Aaron; Buckler, Edward S; Kaeppler, Shawn; Alkhalifah, Naser; Bohn, Martin; Campbell, Darwin A; Edwards, Jode; Ertl, David; Flint-Garcia, Sherry; Gardiner, Jack; Good, Byron; Hirsch, Candice N; Holland, Jim; Hooker, David C; Knoll, Joseph; Kolkman, Judith; Kruger, Greg; Lauter, Nick; Lawrence-Dill, Carolyn J; Lee, Elizabeth; Lynch, Jonathan; Murray, Seth C; Nelson, Rebecca; Petzoldt, Jane; Rocheford, Torbert; Schnable, James; Schnable, Patrick S; Scully, Brian; Smith, Margaret; Springer, Nathan M; Srinivasan, Srikant; Walton, Renee; Weldekidan, Teclemariam; Wisser, Randall J; Xu, Wenwei; Yu, Jianming; de Leon, Natalia

    2017-11-07

    Remarkable productivity has been achieved in crop species through artificial selection and adaptation to modern agronomic practices. Whether intensive selection has changed the ability of improved cultivars to maintain high productivity across variable environments is unknown. Understanding the genetic control of phenotypic plasticity and genotype by environment (G × E) interaction will enhance crop performance predictions across diverse environments. Here we use data generated from the Genomes to Fields (G2F) Maize G × E project to assess the effect of selection on G × E variation and characterize polymorphisms associated with plasticity. Genomic regions putatively selected during modern temperate maize breeding explain less variability for yield G × E than unselected regions, indicating that improvement by breeding may have reduced G × E of modern temperate cultivars. Trends in genomic position of variants associated with stability reveal fewer genic associations and enrichment of variants 0-5000 base pairs upstream of genes, hypothetically due to control of plasticity by short-range regulatory elements.

  18. ARCAS (ACACIA Regional Climate-data Access System) -- a Web Access System for Climate Model Data Access, Visualization and Comparison

    NASA Astrophysics Data System (ADS)

    Hakkarinen, C.; Brown, D.; Callahan, J.; hankin, S.; de Koningh, M.; Middleton-Link, D.; Wigley, T.

    2001-05-01

A Web-based access system to climate model output data sets for intercomparison and analysis has been produced, using the NOAA-PMEL developed Live Access Server software as host server and Ferret as the data serving and visualization engine. Called ARCAS ("ACACIA Regional Climate-data Access System"), and publicly accessible at http://dataserver.ucar.edu/arcas, the site currently serves climate model outputs from runs of the NCAR Climate System Model for the 21st century, for Business as Usual and Stabilization of Greenhouse Gas Emission scenarios. Users can select, download, and graphically display single variables or comparisons of two variables from either or both of the CSM model runs, averaged for monthly, seasonal, or annual time resolutions. The time length of the averaging period, and the geographical domain for download and display, are fully selectable by the user. A variety of arithmetic operations on the data variables can be computed "on-the-fly", as defined by the user. Expansions of the user-selectable options for defining analysis options, and for accessing other DODS-compatible ("Distributed Oceanographic Data System"-compatible) data sets residing at locations other than the NCAR hardware server on which ARCAS operates, are planned for this year. These expansions are designed to allow users quick and easy-to-operate web-based access to the largest possible selection of climate model output data sets available throughout the world.

  19. Optimal timing in biological processes

    USGS Publications Warehouse

    Williams, B.K.; Nichols, J.D.

    1984-01-01

A general approach for obtaining solutions to a class of biological optimization problems is provided. The general problem is one of determining the appropriate time to take some action, when the action can be taken only once during some finite time frame. The approach can also be extended to cover a number of other problems involving animal choice (e.g., mate selection, habitat selection). Returns (assumed to index fitness) are treated as random variables with time-specific distributions, and can be either observable or unobservable at the time action is taken. In the case of unobservable returns, the organism is assumed to base decisions on some ancillary variable that is associated with returns. Optimal policies are derived for both situations and their properties are discussed. Various extensions are also considered, including objective functions based on functions of returns other than the mean; nonmonotonic relationships between the observable variable and returns; possible death of the organism before action is taken; and discounting of future returns. A general feature of the optimal solutions for many of these problems is that an organism should be very selective (i.e., should act only when returns or expected returns are relatively high) at the beginning of the time frame and should become less and less selective as time progresses. An example of the application of optimal timing to a problem involving the timing of bird migration is discussed, and a number of other examples for which the approach is applicable are described.
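The declining-selectivity result for observable returns can be illustrated with backward induction on a toy problem (i.i.d. Uniform(0,1) returns, ten opportunities; illustrative assumptions, not the paper's general model): act whenever the current return beats the expected value of continuing.

```python
# Backward induction for an optimal-stopping toy: the acceptance threshold
# at each time step equals the value of continuing to wait.
import numpy as np

T = 10                                  # decision opportunities
returns = np.linspace(0, 1, 101)        # discretized Uniform(0,1) returns

cont = 0.0                              # value of continuing after the deadline
thresholds = []
for t in range(T, 0, -1):
    thresholds.append(cont)             # act iff return > value of waiting
    cont = np.mean(np.maximum(returns, cont))   # Bellman backup one step earlier

thresholds.reverse()                    # thresholds[0] is the first period
```

As the abstract predicts, the threshold is highest at the start of the time frame and falls to zero at the last opportunity.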

  20. Effect of breakfast on selected serum and cardiovascular variables

    NASA Technical Reports Server (NTRS)

    Frey, Mary A. B.; Merz, Marion P.; Hoffler, G. W.

    1992-01-01

In view of the objections of many subjects to overnight fasting before their blood is drawn for analysis, the effect of eating breakfast on subsequent analyses of selected blood constituents and on cardiovascular variables was investigated in 47 men and 34 women, each tested on two occasions one week apart: once fasting and once after breakfast. Results suggest that subjects need not fast overnight before blood is drawn for determination of HDL-C levels, hemoglobin, hematocrit, total cholesterol, or phosphorus. However, based on other studies, it is suggested that the breakfast should not have a high fat content.

  1. Cost-effectiveness of different strategies for selecting and treating individuals at increased risk of osteoporosis or osteopenia: a systematic review.

    PubMed

    Müller, Dirk; Pulm, Jannis; Gandjour, Afschin

    2012-01-01

    To compare cost-effectiveness modeling analyses of strategies to prevent osteoporotic and osteopenic fractures either based on fixed thresholds using bone mineral density or based on variable thresholds including bone mineral density and clinical risk factors. A systematic review was performed by using the MEDLINE database and reference lists from previous reviews. On the basis of predefined inclusion/exclusion criteria, we identified relevant studies published since January 2006. Articles included for the review were assessed for their methodological quality and results. The literature search resulted in 24 analyses, 14 of them using a fixed-threshold approach and 10 using a variable-threshold approach. On average, 70% of the criteria for methodological quality were fulfilled, but almost half of the analyses did not include medication adherence in the base case. The results of variable-threshold strategies were more homogeneous and showed more favorable incremental cost-effectiveness ratios compared with those based on a fixed threshold with bone mineral density. For analyses with fixed thresholds, incremental cost-effectiveness ratios varied from €80,000 per quality-adjusted life-year in women aged 55 years to cost saving in women aged 80 years. For analyses with variable thresholds, the range was €47,000 to cost savings. Risk assessment using variable thresholds appears to be more cost-effective than selecting high-risk individuals by fixed thresholds. Although the overall quality of the studies was fairly good, future economic analyses should further improve their methods, particularly in terms of including more fracture types, incorporating medication adherence, and including or discussing unrelated costs during added life-years. Copyright © 2012 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.

  2. Novel high-resolution computed tomography-based radiomic classifier for screen-identified pulmonary nodules in the National Lung Screening Trial.

    PubMed

    Peikert, Tobias; Duan, Fenghai; Rajagopalan, Srinivasan; Karwoski, Ronald A; Clay, Ryan; Robb, Richard A; Qin, Ziling; Sicks, JoRean; Bartholmai, Brian J; Maldonado, Fabien

    2018-01-01

Optimization of the clinical management of screen-detected lung nodules is needed to avoid unnecessary diagnostic interventions. Herein we demonstrate the potential value of a novel radiomics-based approach for the classification of screen-detected indeterminate nodules. Independent quantitative variables assessing various radiologic nodule features, such as sphericity, flatness, elongation, spiculation, lobulation and curvature, were developed from the NLST dataset using 726 indeterminate nodules (all ≥ 7 mm; benign, n = 318 and malignant, n = 408). Multivariate analysis was performed using the least absolute shrinkage and selection operator (LASSO) method for variable selection and regularization in order to enhance the prediction accuracy and interpretability of the multivariate model. The bootstrapping method was then applied for internal validation, and the optimism-corrected AUC was reported for the final model. Eight of the originally considered 57 quantitative radiologic features were selected by LASSO multivariate modeling. These 8 features include variables capturing Location: vertical location (Offset carina centroid z), Size: volume estimate (Minimum enclosing brick), Shape: flatness, Density: texture analysis (Score Indicative of Lesion/Lung Aggression/Abnormality (SILA) texture), and surface characteristics: surface complexity (Maximum shape index and Average shape index), and estimates of surface curvature (Average positive mean curvature and Minimum mean curvature), all with P<0.01. The optimism-corrected AUC for these 8 features is 0.939. Our novel radiomic LDCT-based approach for indeterminate screen-detected nodule characterization appears extremely promising; however, independent external validation is needed.
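The L1-penalized selection step can be sketched with scikit-learn; the 57 synthetic features below are stand-ins for the radiomic variables (this is not the NLST data or the published model, and the penalty strength `C=0.1` is an arbitrary assumption):

```python
# LASSO-style selection: an L1 penalty shrinks uninformative coefficients
# exactly to zero, leaving a sparse feature subset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=57, n_informative=8,
                           random_state=0)
X = StandardScaler().fit_transform(X)   # standardize before penalizing

lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
selected = np.flatnonzero(lasso.coef_[0])           # surviving features
auc = roc_auc_score(y, lasso.decision_function(X))  # in-sample AUC
```

In practice the penalty strength would be tuned by cross-validation and the AUC optimism-corrected by bootstrapping, as in the study.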

  3. Improving permafrost distribution modelling using feature selection algorithms

    NASA Astrophysics Data System (ADS)

    Deluigi, Nicola; Lambiel, Christophe; Kanevski, Mikhail

    2016-04-01

The availability of an increasing number of spatial data on the occurrence of mountain permafrost allows the employment of machine learning (ML) classification algorithms for modelling the distribution of the phenomenon. One of the major problems when dealing with high-dimensional datasets is the number of input features (variables) involved. Applying ML classification algorithms to this large number of variables leads to the risk of overfitting, with the consequence of poor generalization/prediction. For this reason, applying feature selection (FS) techniques helps simplify the number of factors required and improves the knowledge of the adopted features and their relation to the studied phenomenon. Moreover, removing irrelevant or redundant variables from the dataset effectively improves the quality of the ML prediction. This research deals with a comparative analysis of permafrost distribution models supported by FS variable importance assessment. The input dataset (dimension = 20-25, 10 m spatial resolution) was constructed using landcover maps, climate data and DEM-derived variables (altitude, aspect, slope, terrain curvature, solar radiation, etc.). It was completed with permafrost evidence (geophysical and thermal data and rock glacier inventories) that serves as permafrost training data. The FS algorithms used indicated which variables appeared less statistically important for permafrost presence/absence. Three different algorithms were compared: Information Gain (IG), Correlation-based Feature Selection (CFS) and Random Forest (RF). IG is a filter technique that evaluates the worth of a predictor by measuring the information gain with respect to permafrost presence/absence. CFS, by contrast, evaluates the worth of a subset of predictors by considering the individual predictive ability of each variable along with the degree of redundancy between the variables. 
Finally, RF is a ML algorithm that performs FS as part of its overall operation. It operates by constructing a large collection of decorrelated classification trees, and then predicts the permafrost occurrence through a majority vote. With the so-called out-of-bag (OOB) error estimate, the classification of permafrost data can be validated and the contribution of each predictor assessed. The performance of the compared permafrost distribution models (computed on independent testing sets) increased when the FS algorithms were applied to the original dataset and irrelevant or redundant variables were removed. As a consequence, the process provided faster and more cost-effective predictors and a better understanding of the underlying structures residing in the permafrost data. Our work demonstrates the usefulness of a feature selection step prior to applying a machine learning algorithm. In fact, permafrost predictors could be ranked not only by their heuristic and subjective importance (expert knowledge), but also by their statistical relevance in relation to the permafrost distribution.
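Two of the scored approaches above (information gain, approximated here by mutual information, and random-forest importance) can be compared on synthetic data; the dataset is an illustrative stand-in for the permafrost predictors:

```python
# Rank features by mutual information (an information-gain analogue) and by
# random-forest importance, then compare the two top-4 rankings.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=500, n_features=20, n_informative=4,
                           n_redundant=0, shuffle=False, random_state=0)
# With shuffle=False the 4 informative features occupy columns 0-3

ig = mutual_info_classif(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)

top_ig = set(np.argsort(ig)[::-1][:4])                      # IG's top 4
top_rf = set(np.argsort(rf.feature_importances_)[::-1][:4])  # RF's top 4
overlap = top_ig & top_rf                                   # agreement between them
```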

  4. Focus of attention in an activity-based scheduler

    NASA Technical Reports Server (NTRS)

    Sadeh, Norman; Fox, Mark S.

    1989-01-01

Earlier research in job shop scheduling has demonstrated the advantages of opportunistically combining order-based and resource-based scheduling techniques. An even more flexible approach is investigated in which each activity is considered a decision point by itself. Heuristics to opportunistically select the next decision point on which to focus attention (i.e., variable ordering heuristics) and the next decision to be tried at that point (i.e., value ordering heuristics) are described that probabilistically account for both activity precedence and resource requirement interactions. Preliminary experimental results indicate that the variable ordering heuristic greatly increases search efficiency. While least constraining value ordering heuristics have been advocated in the literature, the experimental results suggest that other value ordering heuristics combined with our variable ordering heuristic can produce much better schedules without significantly increasing search.
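A toy illustration of a variable ordering heuristic in backtracking search (a generic most-constrained-first CSP sketch of my own, not the paper's probabilistic scheduler): the activity with the fewest remaining options is scheduled first.

```python
# Backtracking with a most-constrained-variable ordering heuristic:
# always branch on the unassigned variable with the smallest domain.
def backtrack(domains, assignment, conflicts):
    if len(assignment) == len(domains):
        return assignment                       # every activity scheduled
    # Variable ordering: pick the variable with the fewest options left
    var = min((v for v in domains if v not in assignment),
              key=lambda v: len(domains[v]))
    for val in domains[var]:                    # value ordering: given order
        if all(not conflicts(var, val, w, u) for w, u in assignment.items()):
            result = backtrack(domains, {**assignment, var: val}, conflicts)
            if result is not None:
                return result
    return None                                 # dead end: backtrack

# Toy constraint: two activities cannot share the same resource slot
no_clash = lambda v1, s1, v2, s2: s1 == s2
plan = backtrack({"A": [1, 2], "B": [1], "C": [2, 3]}, {}, no_clash)
```

Branching on "B" first (its domain has one slot) prunes the search before "A" or "C" can take the slot "B" needs.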

  5. Hybrid robust model based on an improved functional link neural network integrating with partial least square (IFLNN-PLS) and its application to predicting key process variables.

    PubMed

    He, Yan-Lin; Xu, Yuan; Geng, Zhi-Qiang; Zhu, Qun-Xiong

    2016-03-01

In this paper, a hybrid robust model based on an improved functional link neural network integrating with partial least square (IFLNN-PLS) is proposed. Firstly, an improved functional link neural network with small norm of expanded weights and high input-output correlation (SNEWHIOC-FLNN) was proposed for enhancing the generalization performance of FLNN. Unlike the traditional FLNN, the expanded variables of the original inputs are not directly used as the inputs in the proposed SNEWHIOC-FLNN model. The original inputs are attached to some small norm of expanded weights. As a result, the correlation coefficient between some of the expanded variables and the outputs is enhanced. The larger the correlation coefficient is, the more relevant the expanded variables tend to be. In the end, the expanded variables with larger correlation coefficients are selected as the inputs to improve the performance of the traditional FLNN. In order to test the proposed SNEWHIOC-FLNN model, three UCI (University of California, Irvine) regression datasets named Housing, Concrete Compressive Strength (CCS), and Yacht Hydrodynamics (YHD) were selected. Then a hybrid model based on the improved FLNN integrating with partial least square (IFLNN-PLS) was built. In the IFLNN-PLS model, the connection weights are calculated using the partial least square method rather than the error back propagation algorithm. Lastly, IFLNN-PLS was developed as an intelligent measurement model for accurately predicting the key variables in the Purified Terephthalic Acid (PTA) process and the High Density Polyethylene (HDPE) process. Simulation results illustrated that the IFLNN-PLS could significantly improve the prediction performance. Copyright © 2015 ISA. Published by Elsevier Ltd. All rights reserved.
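The core functional-link idea (expand the inputs nonlinearly, then keep the expanded variables most correlated with the output) can be sketched generically; the expansion terms and the top-3 cutoff are my own illustrative choices, not the paper's IFLNN-PLS construction:

```python
# Functional-link expansion followed by correlation-based selection of the
# expanded variables (synthetic data).
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=(200, 3))
y = x[:, 0] ** 2 + 0.3 * x[:, 1] + rng.normal(scale=0.1, size=200)

# Expansion: originals (cols 0-2), squares (cols 3-5), pairwise products (6-8)
expanded = np.column_stack([x, x**2,
                            x[:, 0] * x[:, 1],
                            x[:, 0] * x[:, 2],
                            x[:, 1] * x[:, 2]])

# Keep the expanded variables with the largest |correlation| with the output
corr = np.array([abs(np.corrcoef(expanded[:, j], y)[0, 1])
                 for j in range(expanded.shape[1])])
keep = np.argsort(corr)[::-1][:3]
```

Here the squared term of the first input (column 3) dominates the correlation ranking, mirroring how the SNEWHIOC idea promotes expanded variables that correlate strongly with the outputs.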

  6. Predictability of Seasonal Rainfall over the Greater Horn of Africa

    NASA Astrophysics Data System (ADS)

    Ngaina, J. N.

    2016-12-01

The El Nino-Southern Oscillation (ENSO) is a primary mode of climate variability in the Greater Horn of Africa (GHA). The expected impacts of climate variability and change on water, agriculture, and food resources in GHA underscore the importance of reliable and accurate seasonal climate predictions. The study evaluated different model selection criteria which included the Coefficient of determination (R2), Akaike's Information Criterion (AIC), Bayesian Information Criterion (BIC), and the Fisher information approximation (FIA). A forecast scheme based on the optimal model was developed to predict the October-November-December (OND) and March-April-May (MAM) rainfall. The predictability of GHA rainfall based on ENSO was quantified based on composite analysis, correlations and contingency tables. A test for field significance considering the properties of finiteness and interdependence of the spatial grid was applied to avoid correlations arising by chance. The study identified FIA as the optimal model selection criterion; the complex model selection criteria (FIA, followed by BIC) performed better than the simpler approaches (R2 and AIC). Notably, operational seasonal rainfall predictions over the GHA make use of simple model selection procedures, e.g. R2. Rainfall is modestly predictable based on ENSO during the OND and MAM seasons. El Nino typically leads to wetter conditions during OND and drier conditions during MAM. The correlations of ENSO indices with rainfall are statistically significant for the OND and MAM seasons. Analysis based on contingency tables shows higher predictability of OND rainfall, with ENSO indices derived from the Pacific and Indian Ocean sea surfaces showing significant improvement during the OND season. The predictability based on ENSO for OND rainfall is robust on a decadal scale compared to MAM. An ENSO-based scheme based on an optimal model selection criterion can thus provide skillful rainfall predictions over GHA. The study concludes that the negative phase of ENSO (La Niña) leads to dry conditions, while the positive phase (El Niño) brings enhanced wet conditions.
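    The contingency-table measure of predictability used above can be illustrated with a toy hit-rate calculation (entirely synthetic ENSO-index and seasonal-rainfall values; the 0.6 coupling strength is an arbitrary assumption, not a figure from the study):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic OND seasons: a Nino3.4-like index and rainfall that tends to be
# wetter in warm (El Nino) years, echoing the reported OND relationship.
n = 40
nino34 = rng.normal(size=n)
rain = 0.6 * nino34 + rng.normal(scale=0.8, size=n)

# 2x2 contingency table: warm/cold ENSO phase versus wet/dry season.
warm = nino34 > 0
wet = rain > np.median(rain)
table = np.array([[np.sum(warm & wet),  np.sum(warm & ~wet)],
                  [np.sum(~warm & wet), np.sum(~warm & ~wet)]])

# Hit rate: fraction of seasons where the ENSO phase matched the rainfall
# category; values well above 0.5 indicate usable categorical skill.
hit_rate = (table[0, 0] + table[1, 1]) / n
print(table)
print(round(hit_rate, 2))
```

    Operational assessments add significance testing and more rainfall categories, but the diagonal-versus-off-diagonal logic is the same.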

  7. The Geometry of Selected U.S. Tidal Inlets.

    DTIC Science & Technology

    1980-05-01

[Abstract recovered only in fragments from OCR; tabulated coefficients are garbled.] A fragment notes cases assigned to the wrong cluster. Table 6 reports discriminant analysis results for three variables (DCC, EM2, and EM3), including coefficients for discriminant functions based on the three variables. Table 9 reports discriminant analysis results for six variables (DMX, DMA, …); the remaining tabulated values are not recoverable.

  8. The Effects of Age, Years of Experience, and Type of Experience in the Teacher Selection Process

    ERIC Educational Resources Information Center

    Place, A. William; Vail, David S.

    2013-01-01

    Paper screening in the pre-selection process of hiring teachers has been an established line of research starting with Young and Allison (1982). Administrators were asked to rate hypothetical candidates based on the information provided by the researcher. The dependent variable in several of these studies (e.g. Young & Fox, 2002; Young & Schmidt,…

  9. Sexual differences in telomere selection in the wild.

    PubMed

    Olsson, Mats; Pauliny, Angela; Wapstra, Erik; Uller, Tobias; Schwartz, Tonia; Miller, Emily; Blomqvist, Donald

    2011-05-01

Telomere length is restored primarily through the action of the reverse transcriptase telomerase, which may contribute to a prolonged lifespan in some but not all species and may result in longer telomeres in one sex than the other. To what extent this is an effect of proximate mechanisms (e.g. higher stress in males, higher oestradiol/oestrogen levels in females), or is an evolved adaptation (stronger selection for telomere length in one sex), usually remains unknown. Sand lizard (Lacerta agilis) females have longer telomeres than males and better maintain telomere length through life than males do. We also show that telomere length more strongly contributes to life span and lifetime reproductive success in females than males and that telomere length is under sexually diversifying selection in the wild. Furthermore, we performed a selection analysis with number of recruited offspring into the adult population as a response variable and telomere length, life span and body size as predictor variables. This showed significant differences in selection pressures between the sexes, with strong ongoing selection in females; these three predictors explained 63% of the variation in recruitment. Thus, the sexually dimorphic telomere dynamics with longer telomeres in females is a result of past and ongoing selection in sand lizards. Finally, we compared the results from our selection analyses based on Telometric-derived data to the results based on data generated by the software ImageJ. ImageJ resulted in shorter average telomere length, but this difference had virtually no qualitative effect on the patterns of ongoing selection. © 2011 Blackwell Publishing Ltd.

  10. What variables influence the ability of an AFO to improve function and when are they indicated?

    PubMed

    Malas, Bryan S

    2011-05-01

Children with spina bifida often present with functional deficits of the lower limb associated with neurosegmental lesion levels and require orthotic management. The most commonly used orthosis for children with spina bifida is the ankle-foot orthosis (AFO). The AFO improves ambulation and reduces energy cost while walking. Despite the apparent benefits of using an AFO, limited evidence documents the influence of factors predicting the ability of an AFO to improve function and when they are indicated. These variables include AFO design, footwear, AFO-footwear combination, and data acquisition. When these variables are not adequately considered in clinical decision-making, there is a risk that the AFO will be abandoned prematurely or that the patient's stability, function, and safety will be compromised. The purposes of this study are to (1) describe the functional deficits based on lesion levels; (2) identify and describe variables that influence the ability of an AFO to control deformities; and (3) describe what variables are indicated for the AFO to control knee flexion during stance, hyperpronation, and valgus stress at the knee. A selective literature review was undertaken searching MEDLINE and Cochrane databases using terms related to "orthosis" and "spina bifida." Based on previous studies and gait analysis data, suggestions can be made regarding material selection/geometric configuration, sagittal alignment, footplate length, and trim lines of an AFO for reducing knee flexion, hyperpronation, and valgus stress at the knee. Further research is required to determine what variables allow an AFO to improve function.

  11. Genetic signatures of natural selection in a model invasive ascidian

    PubMed Central

    Lin, Yaping; Chen, Yiyong; Yi, Changho; Fong, Jonathan J.; Kim, Won; Rius, Marc; Zhan, Aibin

    2017-01-01

Invasive species represent promising models to study species’ responses to rapidly changing environments. Although local adaptation frequently occurs during contemporary range expansion, the associated genetic signatures at both population and genomic levels remain largely unknown. Here, we use genome-wide gene-associated microsatellites to investigate genetic signatures of natural selection in a model invasive ascidian, Ciona robusta. Population genetic analyses of 150 individuals sampled in Korea, New Zealand, South Africa and Spain showed significant genetic differentiation among populations. Based on outlier tests, we found a high incidence of signatures of directional selection at 19 loci. Hitchhiking mapping analyses identified 12 directional selective sweep regions, and all selective sweep windows on chromosomes were narrow (~8.9 kb). Further analyses identified 132 candidate genes under selection. When we compared our genetic data and six crucial environmental variables, 16 putatively selected loci showed significant correlation with these environmental variables. This suggests that the local environmental conditions have left significant signatures of selection at both population and genomic levels. Finally, we identified “plastic” genomic regions and genes that are promising regions to investigate evolutionary responses to rapid environmental change in C. robusta. PMID:28266616

  12. Predicting cognitive function from clinical measures of physical function and health status in older adults.

    PubMed

    Bolandzadeh, Niousha; Kording, Konrad; Salowitz, Nicole; Davis, Jennifer C; Hsu, Liang; Chan, Alison; Sharma, Devika; Blohm, Gunnar; Liu-Ambrose, Teresa

    2015-01-01

Current research suggests that the neuropathology of dementia-including brain changes leading to memory impairment and cognitive decline-is evident years before the onset of this disease. Older adults with cognitive decline have reduced functional independence and quality of life, and are at greater risk for developing dementia. Therefore, identifying biomarkers that can be easily assessed within the clinical setting and predict cognitive decline is important. Early recognition of cognitive decline could promote timely implementation of preventive strategies. We included 89 community-dwelling adults aged 70 years and older in our study, and collected 32 measures of physical function, health status and cognitive function at baseline. We utilized an L1-L2 regularized regression model (elastic net) to identify which of the 32 baseline measures were strongly predictive of cognitive function after one year. We built three linear regression models: 1) based on baseline cognitive function, 2) based on variables consistently selected in every cross-validation loop, and 3) a full model based on all 32 variables. Each of these models was carefully tested with nested cross-validation. Our model with the six variables consistently selected in every cross-validation loop had a mean squared prediction error of 7.47. This number was smaller than that of the full model (115.33) and the model with baseline cognitive function (7.98). Our model explained 47% of the variance in cognitive function after one year. We built a parsimonious model based on a selected set of six physical function and health status measures strongly predictive of cognitive function after one year. In addition to reducing model complexity without a significant loss of accuracy, the model with the top variables improved the mean prediction error and R-squared. These six physical function and health status measures can be easily implemented in a clinical setting.
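    The elastic-net selection step can be sketched with scikit-learn (synthetic data mimicking the 89-subject, 32-measure design; the coefficient values and noise level are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(0)

# Synthetic stand-in: 89 "participants", 32 baseline measures, of which
# only 6 truly drive the follow-up cognitive score (mirroring the design).
n, p, k = 89, 32, 6
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:k] = [2.0, -1.5, 1.2, 1.0, -0.8, 0.7]
y = X @ beta + rng.normal(scale=1.0, size=n)

# L1-L2 regularized regression (elastic net) with internal cross-validation
# over the penalty strength, as in the paper's variable-selection step.
model = ElasticNetCV(l1_ratio=0.5, cv=5, random_state=0).fit(X, y)
selected = np.flatnonzero(model.coef_ != 0)
print(len(selected))
```

    Variables with nonzero coefficients across cross-validation loops would then feed the parsimonious linear model; the study additionally wraps this whole procedure in nested cross-validation to avoid optimistic error estimates.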

  13. Multivariate Analysis, Retrieval, and Storage System (MARS). Volume 6: MARS System - A Sample Problem (Gross Weight of Subsonic Transports)

    NASA Technical Reports Server (NTRS)

    Hague, D. S.; Woodbury, N. W.

    1975-01-01

The MARS system is a tool for rapid prediction of aircraft or engine characteristics based on correlation-regression analysis of past designs stored in the data bases. An example of output obtained from the MARS system is given, involving derivation of an expression for the gross weight of subsonic transport aircraft in terms of nine independent variables. The need is illustrated for careful selection of correlation variables and for continual review of the resulting estimation equations. For Vol. 1, see N76-10089.

  14. Future of the Pacific: Inspiring the Next Generation of Scientists and Engineers Through Place-Based Problem-Solving Using Innovative STEM Curriculum and Technology Tools

    DTIC Science & Technology

    2016-03-30

lesson 8.4, "Wind Turbine Design Inquiry." The goal of her project was to combine art and science in project-based learning. Although part of an...challenged to design, test, and redesign wind turbine blades, defining variables and measuring performance. Their goal was to optimize performance through...hydroelectric. In each model there is more than one variable. For example, the wind farm activity enables the user to select the number of turbines

  15. Reducing Dropout in Treatment for Depression: Translating Dropout Predictors Into Individualized Treatment Recommendations.

    PubMed

    Zilcha-Mano, Sigal; Keefe, John R; Chui, Harold; Rubin, Avinadav; Barrett, Marna S; Barber, Jacques P

    2016-12-01

    Premature discontinuation of therapy is a widespread problem that hampers the delivery of mental health treatment. A high degree of variability has been found among rates of premature treatment discontinuation, suggesting that rates may differ depending on potential moderators. In the current study, our aim was to identify demographic and interpersonal variables that moderate the association between treatment assignment and dropout. Data from a randomized controlled trial conducted from November 2001 through June 2007 (N = 156) comparing supportive-expressive therapy, antidepressant medication, and placebo for the treatment of depression (based on DSM-IV criteria) were used. Twenty prerandomization variables were chosen based on previous literature. These variables were subjected to exploratory bootstrapped variable selection and included in the logistic regression models if they passed variable selection. Three variables were found to moderate the association between treatment assignment and dropout: age, pretreatment therapeutic alliance expectations, and the presence of vindictive tendencies in interpersonal relationships. When patients were divided into those randomly assigned to their optimal treatment and those assigned to their least optimal treatment, dropout rates in the optimal treatment group (24.4%) were significantly lower than those in the least optimal treatment group (47.4%; P = .03). Present findings suggest that a patient's age and pretreatment interpersonal characteristics predict the association between common depression treatments and dropout rate. If validated by further studies, these characteristics can assist in reducing dropout through targeted treatment assignment. Secondary analysis of data from ClinicalTrials.gov identifier: NCT00043550. © Copyright 2016 Physicians Postgraduate Press, Inc.
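    The bootstrapped variable-selection step can be sketched as follows (a hypothetical L1-penalized logistic regression refit over bootstrap resamples; the simulated effect sizes, the C=0.5 penalty, and the 80% stability threshold are all assumptions, since the abstract does not specify the selection routine):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for the trial: 156 patients, 20 baseline variables,
# 3 of which (think age, alliance expectations, vindictiveness) moderate dropout.
n, p = 156, 20
X = rng.normal(size=(n, p))
logit = 1.5 * X[:, 0] - 1.2 * X[:, 1] + 1.0 * X[:, 2]
dropout = rng.uniform(size=n) < 1 / (1 + np.exp(-logit))

# Bootstrapped L1 variable selection: refit on resamples and keep only the
# variables selected in a large fraction of replicates.
B, counts = 200, np.zeros(p)
for _ in range(B):
    idx = rng.integers(0, n, n)
    clf = LogisticRegression(penalty="l1", C=0.5, solver="liblinear")
    clf.fit(X[idx], dropout[idx])
    counts += (clf.coef_[0] != 0)

stable = np.flatnonzero(counts / B >= 0.8)
print(stable)
```

    Variables passing the stability screen would then enter the final logistic model of treatment-by-moderator effects.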

  16. A Study of Quasar Selection in the Supernova Fields of the Dark Energy Survey

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tie, S. S.; Martini, P.; Mudd, D.

In this paper, we present a study of quasar selection using the supernova fields of the Dark Energy Survey (DES). We used a quasar catalog from an overlapping portion of the SDSS Stripe 82 region to quantify the completeness and efficiency of selection methods involving color, probabilistic modeling, variability, and combinations of color/probabilistic modeling with variability. In all cases, we considered only objects that appear as point sources in the DES images. We examine color selection methods based on the Wide-field Infrared Survey Explorer (WISE) mid-IR W1-W2 color, a mixture of WISE and DES colors (g - i and i - W1), and a mixture of Vista Hemisphere Survey and DES colors (g - i and i - K). For probabilistic quasar selection, we used XDQSO, an algorithm that employs an empirical multi-wavelength flux model of quasars to assign quasar probabilities. Our variability selection uses the multi-band χ²-probability that sources are constant in the DES Year 1 griz-band light curves. The completeness and efficiency are calculated relative to an underlying sample of point sources that are detected in the required selection bands and pass our data quality and photometric error cuts. We conduct our analyses at two magnitude limits, i < 19.8 mag and i < 22 mag. For the subset of sources with W1 and W2 detections, the W1-W2 color or XDQSOz method combined with variability gives the highest completenesses of >85% for both i-band magnitude limits and efficiencies of >80% to the bright limit and >60% to the faint limit; however, the giW1 and giW1+variability methods give the highest quasar surface densities. The XDQSOz method and combinations of W1W2/giW1/XDQSOz with variability are among the better selection methods when both high completeness and high efficiency are desired. We also present the OzDES Quasar Catalog of 1263 spectroscopically confirmed quasars from three years of OzDES observation in the 30 deg² of the DES supernova fields. Finally, the catalog includes quasars with redshifts up to z ~ 4 and brighter than i = 22 mag, although the catalog is not complete up to this magnitude limit.
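    The χ²-probability variability statistic mentioned above amounts to testing each light curve against a constant (weighted-mean) model. A minimal single-band sketch, with invented magnitudes and uniform 2% errors (the DES analysis combines this over the griz bands):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)

def constant_source_prob(mags, errs):
    """Chi-square probability that a light curve is constant: the chi-square
    of the points about their inverse-variance-weighted mean."""
    w = 1.0 / errs ** 2
    mean = np.sum(w * mags) / np.sum(w)
    stat = np.sum(((mags - mean) / errs) ** 2)
    return chi2.sf(stat, df=len(mags) - 1)

# Two toy light curves: a constant star and a low-amplitude variable source.
n, err = 50, 0.02
errs = np.full(n, err)
star = 19.0 + rng.normal(scale=err, size=n)
quasar = 19.0 + 0.15 * np.sin(np.linspace(0, 6, n)) + rng.normal(scale=err, size=n)

p_star = constant_source_prob(star, errs)
p_quasar = constant_source_prob(quasar, errs)
print(p_star, p_quasar)
```

    Sources whose probability of being constant falls below a chosen threshold are flagged as variable; quasars' stochastic variability makes this an effective complement to color cuts.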

  17. A Study of Quasar Selection in the Supernova Fields of the Dark Energy Survey

    DOE PAGES

    Tie, S. S.; Martini, P.; Mudd, D.; ...

    2017-02-15

In this paper, we present a study of quasar selection using the supernova fields of the Dark Energy Survey (DES). We used a quasar catalog from an overlapping portion of the SDSS Stripe 82 region to quantify the completeness and efficiency of selection methods involving color, probabilistic modeling, variability, and combinations of color/probabilistic modeling with variability. In all cases, we considered only objects that appear as point sources in the DES images. We examine color selection methods based on the Wide-field Infrared Survey Explorer (WISE) mid-IR W1-W2 color, a mixture of WISE and DES colors (g - i and i - W1), and a mixture of Vista Hemisphere Survey and DES colors (g - i and i - K). For probabilistic quasar selection, we used XDQSO, an algorithm that employs an empirical multi-wavelength flux model of quasars to assign quasar probabilities. Our variability selection uses the multi-band χ²-probability that sources are constant in the DES Year 1 griz-band light curves. The completeness and efficiency are calculated relative to an underlying sample of point sources that are detected in the required selection bands and pass our data quality and photometric error cuts. We conduct our analyses at two magnitude limits, i < 19.8 mag and i < 22 mag. For the subset of sources with W1 and W2 detections, the W1-W2 color or XDQSOz method combined with variability gives the highest completenesses of >85% for both i-band magnitude limits and efficiencies of >80% to the bright limit and >60% to the faint limit; however, the giW1 and giW1+variability methods give the highest quasar surface densities. The XDQSOz method and combinations of W1W2/giW1/XDQSOz with variability are among the better selection methods when both high completeness and high efficiency are desired. We also present the OzDES Quasar Catalog of 1263 spectroscopically confirmed quasars from three years of OzDES observation in the 30 deg² of the DES supernova fields. Finally, the catalog includes quasars with redshifts up to z ~ 4 and brighter than i = 22 mag, although the catalog is not complete up to this magnitude limit.

  18. Selecting minimum dataset soil variables using PLSR as a regressive multivariate method

    NASA Astrophysics Data System (ADS)

    Stellacci, Anna Maria; Armenise, Elena; Castellini, Mirko; Rossi, Roberta; Vitti, Carolina; Leogrande, Rita; De Benedetto, Daniela; Ferrara, Rossana M.; Vivaldi, Gaetano A.

    2017-04-01

    Long-term field experiments and science-based tools that characterize soil status (namely the soil quality indices, SQIs) assume a strategic role in assessing the effect of agronomic techniques and thus in improving soil management especially in marginal environments. Selecting key soil variables able to best represent soil status is a critical step for the calculation of SQIs. Current studies show the effectiveness of statistical methods for variable selection to extract relevant information deriving from multivariate datasets. Principal component analysis (PCA) has been mainly used, however supervised multivariate methods and regressive techniques are progressively being evaluated (Armenise et al., 2013; de Paul Obade et al., 2016; Pulido Moncada et al., 2014). The present study explores the effectiveness of partial least square regression (PLSR) in selecting critical soil variables, using a dataset comparing conventional tillage and sod-seeding on durum wheat. The results were compared to those obtained using PCA and stepwise discriminant analysis (SDA). The soil data derived from a long-term field experiment in Southern Italy. On samples collected in April 2015, the following set of variables was quantified: (i) chemical: total organic carbon and nitrogen (TOC and TN), alkali-extractable C (TEC and humic substances - HA-FA), water extractable N and organic C (WEN and WEOC), Olsen extractable P, exchangeable cations, pH and EC; (ii) physical: texture, dry bulk density (BD), macroporosity (Pmac), air capacity (AC), and relative field capacity (RFC); (iii) biological: carbon of the microbial biomass quantified with the fumigation-extraction method. PCA and SDA were previously applied to the multivariate dataset (Stellacci et al., 2016). PLSR was carried out on mean centered and variance scaled data of predictors (soil variables) and response (wheat yield) variables using the PLS procedure of SAS/STAT. 
In addition, variable importance in projection (VIP) statistics were used to quantitatively assess the predictors most relevant for response variable estimation and then for variable selection (Andersen and Bro, 2010). PCA and SDA returned TOC and RFC as influential variables, both on the sets of chemical and physical data analyzed separately and on the whole dataset (Stellacci et al., 2016). Highly weighted variables in PCA were also TEC, followed by K, and AC, followed by Pmac and BD, in the first PC (41.2% of total variance); Olsen P and HA-FA in the second PC (12.6%); and Ca in the third component (10.6%). Variables enabling maximum discrimination among treatments for SDA were WEOC, on the whole dataset, and humic substances, followed by Olsen P, EC and clay, in the separate data analyses. The highest PLS-VIP statistics were recorded for Olsen P and Pmac, followed by TOC, TEC, pH and Mg for the chemical variables, and clay, RFC and AC for the physical variables. Results show that different methods may provide different rankings of the selected variables, and that the presence of a response variable, in regressive techniques, may affect variable selection. Further investigation with different response variables and with multi-year datasets would help to better define the advantages and limits of single or combined approaches. Acknowledgment: the work was supported by the projects "BIOTILLAGE: innovative approaches for improving the environmental and productive performance of no-tillage cereal systems", financed by PSR-Basilicata 2007-2013, and "DESERT: Low-cost water desalination and sensor technology compact module", financed by ERANET-WATERWORKS 2014. References: Andersen C.M. and Bro R., 2010. Variable selection in regression - a tutorial. Journal of Chemometrics, 24: 728-737. Armenise et al., 2013. Developing a soil quality index to compare soil fitness for agricultural use under different managements in the Mediterranean environment. Soil and Tillage Research, 130: 91-98. de Paul Obade et al., 2016. A standardized soil quality index for diverse field conditions. Science of the Total Environment, 541: 424-434. Pulido Moncada et al., 2014. Data-driven analysis of soil quality indicators using limited data. Geoderma, 235: 271-278. Stellacci et al., 2016. Comparison of different multivariate methods to select key soil variables for soil quality indices computation. XLV Congress of the Italian Society of Agronomy (SIA), Sassari, 20-22 September 2016.

  19. VARIABLE SELECTION IN NONPARAMETRIC ADDITIVE MODELS

    PubMed Central

    Huang, Jian; Horowitz, Joel L.; Wei, Fengrong

    2010-01-01

    We consider a nonparametric additive model of a conditional mean function in which the number of variables and additive components may be larger than the sample size but the number of nonzero additive components is “small” relative to the sample size. The statistical problem is to determine which additive components are nonzero. The additive components are approximated by truncated series expansions with B-spline bases. With this approximation, the problem of component selection becomes that of selecting the groups of coefficients in the expansion. We apply the adaptive group Lasso to select nonzero components, using the group Lasso to obtain an initial estimator and reduce the dimension of the problem. We give conditions under which the group Lasso selects a model whose number of components is comparable with the underlying model, and the adaptive group Lasso selects the nonzero components correctly with probability approaching one as the sample size increases and achieves the optimal rate of convergence. The results of Monte Carlo experiments show that the adaptive group Lasso procedure works well with samples of moderate size. A data example is used to illustrate the application of the proposed method. PMID:21127739
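    Group-penalized component selection of the kind analyzed above can be sketched with a plain proximal-gradient group-lasso solver (low-order polynomial bases stand in for the paper's B-splines, and the penalty level is an arbitrary choice rather than the adaptive two-stage scheme):

```python
import numpy as np

rng = np.random.default_rng(0)

# Additive-model sketch: 10 candidate variables, but only x0 and x1
# actually enter the true regression function.
n, p, k = 200, 10, 4                   # k = basis functions per variable
X = rng.uniform(-1, 1, size=(n, p))
y = np.sin(np.pi * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=n)
y = y - y.mean()

# Polynomial basis per variable (B-splines in the paper); each variable
# owns one contiguous group of k standardized columns.
Z = np.concatenate([np.stack([X[:, j] ** d for d in range(1, k + 1)], axis=1)
                    for j in range(p)], axis=1)
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)

# Group lasso via proximal gradient descent with block soft-thresholding:
# a whole group is zeroed out when its coefficient norm is small.
lam = 0.15
step = n / np.linalg.norm(Z, 2) ** 2   # 1 / Lipschitz constant of the gradient
beta = np.zeros(p * k)
for _ in range(3000):
    b = beta - step * Z.T @ (Z @ beta - y) / n
    for j in range(p):
        g = b[j * k:(j + 1) * k]
        nrm = np.linalg.norm(g)
        if nrm > 0:
            b[j * k:(j + 1) * k] = max(0.0, 1.0 - step * lam / nrm) * g
    beta = b

selected = [j for j in range(p)
            if np.linalg.norm(beta[j * k:(j + 1) * k]) > 1e-8]
print(selected)
```

    The adaptive step in the paper would rerun this with group-specific penalty weights derived from the first-stage estimate, which is what yields consistent selection of the nonzero components.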

  20. Rapid isolation of IgNAR variable single-domain antibody fragments from a shark synthetic library.

    PubMed

    Shao, Cui-Ying; Secombes, Chris J; Porter, Andrew J

    2007-01-01

The immunoglobulin isotype IgNAR (Novel Antigen Receptor) was discovered in the serum of the nurse shark (Ginglymostoma cirratum) and wobbegong shark (Orectolobus maculatus) as a homodimer of two protein chains, each composed of a single variable (V) domain and five constant domains. The IgNAR variable domain contains an intact antigen-binding site and functions as an independent domain able to bind antigen with both high specificity and affinity. Here we describe the successful construction of a synthetic phage-displayed library based upon a single anti-lysozyme clone HEL-5A7 scaffold, which was previously selected from an immune IgNAR variable domain library. The complementarity-determining region 3 (CDR3) loop of this clone was varied in both length and composition, and the derived library was used to pan against two model proteins, lysozyme and leptin. A single anti-lysozyme clone (Ly-X20) and anti-leptin clone (Lep-12E1) were selected for further study. Both clones were shown to be functionally expressed in Escherichia coli, extremely thermostable, and able to bind their corresponding antigens specifically. The results here demonstrate that a synthetic IgNAR variable domain library based on a single framework scaffold can be used as a route to generate antigen binders quickly, easily and without the need for immunization.

  1. Application of the Trend Filtering Algorithm for Photometric Time Series Data

    NASA Astrophysics Data System (ADS)

    Gopalan, Giri; Plavchan, Peter; van Eyken, Julian; Ciardi, David; von Braun, Kaspar; Kane, Stephen R.

    2016-08-01

Detecting transient light curves (e.g., transiting planets) requires high-precision data, and thus it is important to effectively filter systematic trends affecting ground-based wide-field surveys. We apply an implementation of the Trend Filtering Algorithm (TFA) to the 2MASS calibration catalog and selected Palomar Transient Factory (PTF) photometric time series data. TFA is successful at reducing the overall dispersion of light curves; however, it may over-filter intrinsic variables and increase “instantaneous” dispersion when a template set is not judiciously chosen. In an attempt to rectify these issues we modify the original TFA from the literature by incorporating measurement uncertainties in its computation, including ancillary data correlated with noise, and algorithmically selecting a template set using clustering algorithms as suggested by various authors. This approach may be particularly useful for appropriately accounting for variable photometric precision surveys and/or combined data sets. In summary, our contributions are to provide a MATLAB software implementation of TFA and a number of modifications tested on synthetics and real data, summarize the performance of TFA and various modifications on real ground-based data sets (2MASS and PTF), and assess the efficacy of TFA and modifications using synthetic light curve tests consisting of transiting and sinusoidal variables. While the transiting variables test indicates that these modifications confer no advantage to transit detection, the sinusoidal variables test indicates potential improvements in detection accuracy.
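    The core TFA step, fitting each target light curve as a linear combination of template light curves and subtracting the fit, can be sketched in a few lines (synthetic trend, templates and intrinsic signal; real TFA adds the template-selection and uncertainty-weighting refinements described above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy wide-field survey: one shared systematic trend contaminates all stars.
n_epochs, n_templates = 300, 20
trend = np.cumsum(rng.normal(scale=0.02, size=n_epochs))
templates = trend + rng.normal(scale=0.01, size=(n_templates, n_epochs))
signal = 0.05 * np.sin(np.linspace(0, 20, n_epochs))   # intrinsic variable
target = trend + signal + rng.normal(scale=0.01, size=n_epochs)

# TFA core step: model the target light curve as a linear combination of
# template light curves, then subtract the fitted systematics.
A = templates.T                                   # (epochs, templates)
coef, *_ = np.linalg.lstsq(A, target, rcond=None)
filtered = target - A @ coef

print(round(target.std(), 3), round(filtered.std(), 3))
```

    The over-filtering risk discussed above is visible in this setup: with far more templates, or templates correlated with the intrinsic signal, the least-squares fit would start absorbing the sinusoid itself.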

  2. Multivariate calibration on NIR data: development of a model for the rapid evaluation of ethanol content in bakery products.

    PubMed

    Bello, Alessandra; Bianchi, Federica; Careri, Maria; Giannetto, Marco; Mori, Giovanni; Musci, Marilena

    2007-11-05

A new NIR method based on multivariate calibration for the determination of ethanol in industrially packed wholemeal bread was developed and validated. GC-FID was used as the reference method to determine the actual ethanol concentration of different samples of wholemeal bread with known amounts of added ethanol, ranging from 0 to 3.5% (w/w). Stepwise discriminant analysis was carried out on the NIR dataset in order to reduce the number of original variables by selecting those able to discriminate between samples of different ethanol concentrations. A multivariate calibration model was then obtained from the selected variables by multiple linear regression. The prediction power of the linear model was optimized by a new "leave one out" method, which further reduced the number of original variables.
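    The select-then-calibrate workflow can be sketched with scikit-learn (forward sequential feature selection stands in for the paper's stepwise discriminant step, the spectra are synthetic, and the two "informative" wavelengths are invented):

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(0)

# Toy NIR-like data: 40 samples, 50 wavelengths, ethanol 0-3.5% (w/w)
# encoded in two informative bands (illustrative stand-in only).
n, p = 40, 50
ethanol = rng.uniform(0, 3.5, n)
spectra = rng.normal(scale=0.05, size=(n, p))
spectra[:, 10] += 0.8 * ethanol
spectra[:, 25] += 0.5 * ethanol

# Reduce the wavelengths to a small informative subset, then calibrate a
# multiple linear regression and check it by leave-one-out prediction.
sel = SequentialFeatureSelector(LinearRegression(), n_features_to_select=3,
                                direction="forward", cv=5).fit(spectra, ethanol)
Xs = sel.transform(spectra)
pred = cross_val_predict(LinearRegression(), Xs, ethanol, cv=LeaveOneOut())
rmsep = np.sqrt(np.mean((pred - ethanol) ** 2))
print(sorted(np.flatnonzero(sel.get_support())), round(rmsep, 3))
```

    With hundreds of collinear wavelengths, calibrating on a selected subset like this is what keeps a plain MLR model stable; the leave-one-out error plays the role of the paper's prediction-power criterion.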

  3. Robust nonlinear variable selective control for networked systems

    NASA Astrophysics Data System (ADS)

    Rahmani, Behrooz

    2016-10-01

This paper is concerned with the networked control of a class of uncertain nonlinear systems. To this end, Takagi-Sugeno (T-S) fuzzy modelling is used to extend the previously proposed variable selective control (VSC) methodology to nonlinear systems. This extension is based upon the decomposition of the nonlinear system into a set of fuzzy-blended locally linearised subsystems and the further application of the VSC methodology to each subsystem. To increase the applicability of the T-S approach for uncertain nonlinear networked control systems, this study considers asynchronous premise variables in the plant and the controller, and then introduces a robust stability analysis and control synthesis. The resulting optimal switching-fuzzy controller provides a minimum guaranteed cost on an H2 performance index. Simulation studies on three nonlinear benchmark problems demonstrate the effectiveness of the proposed method.

  4. A predictive model for diagnosing bipolar disorder based on the clinical characteristics of major depressive episodes in Chinese population.

    PubMed

    Gan, Zhaoyu; Diao, Feici; Wei, Qinling; Wu, Xiaoli; Cheng, Minfeng; Guan, Nianhong; Zhang, Ming; Zhang, Jinbei

    2011-11-01

A correct and timely diagnosis of bipolar depression remains a major challenge for clinicians. This study aimed to develop a clinical-characteristic-based model to predict the diagnosis of bipolar disorder among patients with current major depressive episodes. A prospective study was carried out on 344 patients with current major depressive episodes, with 268 completing 1-year follow-up. Data were collected through structured interviews. Univariate binary logistic regression was conducted to select potential predictive variables among 19 initial variables, and then multivariate binary logistic regression was performed to analyze the combination of risk factors and build a predictive model. A receiver operating characteristic (ROC) curve was plotted. Of the 19 initial variables, 13 were preliminarily selected, and forward stepwise selection then produced a final model consisting of 6 variables: age at first onset, maximum duration of depressive episodes, somatalgia, hypersomnia, diurnal variation of mood, and irritability. The correct prediction rate of this model was 78% (95%CI: 75%-86%) and the area under the ROC curve was 0.85 (95%CI: 0.80-0.90). The cut-off point for age at first onset was 28.5 years old, while the cut-off point for maximum duration of depressive episode was 7.5 months. The limitations of this study include the small sample size, relatively short follow-up period and lack of treatment information. Our predictive model, based on six clinical characteristics of major depressive episodes, proved robust and can help differentiate bipolar depression from unipolar depression. Copyright © 2011 Elsevier B.V. All rights reserved.
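    A predictive model of this general shape, logistic regression scored by a ROC curve, can be sketched as follows (all cohort numbers and effect sizes are invented; only the 28.5-year and 7.5-month cutpoints echo the abstract, and Youden's J is a common but here assumed way of choosing the classification cutoff):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(0)

# Synthetic cohort: age at first onset and maximum episode duration shift
# the odds of bipolar vs unipolar depression (illustrative numbers only).
n = 268
age_onset = rng.normal(30, 8, n)
max_duration = rng.normal(8, 3, n)
logit = -0.15 * (age_onset - 28.5) - 0.3 * (max_duration - 7.5)
bipolar = rng.uniform(size=n) < 1 / (1 + np.exp(-logit))

X = np.column_stack([age_onset, max_duration])
clf = LogisticRegression().fit(X, bipolar)
prob = clf.predict_proba(X)[:, 1]
auc = roc_auc_score(bipolar, prob)

# Youden's J picks the probability cutoff maximizing sensitivity+specificity-1.
fpr, tpr, thr = roc_curve(bipolar, prob)
cutoff = thr[np.argmax(tpr - fpr)]
print(round(auc, 2), round(cutoff, 2))
```

    The full model would add the four remaining clinical characteristics as predictors and validate the AUC and prediction rate on held-out or follow-up data rather than in-sample.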

  5. Evaluation of a Teleform-based data collection system: a multi-center obesity research case study.

    PubMed

    Jenkins, Todd M; Wilson Boyce, Tawny; Akers, Rachel; Andringa, Jennifer; Liu, Yanhong; Miller, Rosemary; Powers, Carolyn; Ralph Buncher, C

    2014-06-01

    Utilizing electronic data capture (EDC) systems in data collection and management allows automated validation programs to preemptively identify and correct data errors. For our multi-center, prospective study we chose to use TeleForm, a paper-based data capture software package that uses recognition technology to create case report forms (CRFs) with functionality similar to EDC, including custom scripts to identify entry errors. We quantified the accuracy of the optimized system through a data audit of CRFs and the study database, examining selected critical variables for all subjects in the study, as well as an audit of all variables for 25 randomly selected subjects. Overall, we found 6.7 errors per 10,000 fields, with similar estimates for critical (6.9/10,000) and non-critical (6.5/10,000) variables; these values fall below the acceptable quality threshold of 50 errors per 10,000 established by the Society for Clinical Data Management. However, error rates varied widely by type of data field, with the highest rate observed for open text fields. Copyright © 2014 Elsevier Ltd. All rights reserved.

  6. TRANPLAN and GIS support for agencies in Alabama

    DOT National Transportation Integrated Search

    2001-08-06

    Travel demand models are computerized programs intended to forecast future roadway traffic volumes for a community based on selected socioeconomic variables and travel behavior algorithms. Software to operate these travel demand models is currently a...

  7. Multi-atlas segmentation of subcortical brain structures via the AutoSeg software pipeline

    PubMed Central

    Wang, Jiahui; Vachet, Clement; Rumple, Ashley; Gouttard, Sylvain; Ouziel, Clémentine; Perrot, Emilie; Du, Guangwei; Huang, Xuemei; Gerig, Guido; Styner, Martin

    2014-01-01

    Automated segmentation and labeling of individual brain anatomical regions in MRI is challenging due to individual structural variability. Although atlas-based segmentation has shown its potential for both tissue and structure segmentation, a single atlas image is often inappropriate to represent the full population of datasets processed in a given neuroimaging study, due to inherent natural variability as well as disease-related changes in MR appearance. As an alternative to single-atlas segmentation, the use of multiple atlases alongside label fusion techniques has been introduced, using a set of individual “atlases” that encompasses the expected variability in the studied population. In our study, we proposed a multi-atlas segmentation scheme with a novel graph-based atlas selection technique. We first paired and co-registered all atlases and the subject MR scans. A directed graph with edge weights based on intensity and shape similarity between all MR scans is then computed. The set of neighboring templates is selected via clustering of the graph. Finally, weighted majority voting is employed to create the final segmentation over the selected atlases. This multi-atlas segmentation scheme is used to extend a single-atlas-based segmentation toolkit entitled AutoSeg, an open-source, extensible C++-based software pipeline employing BatchMake for its pipeline scripting, developed at the Neuro Image Research and Analysis Laboratories of the University of North Carolina at Chapel Hill. AutoSeg performs N4 intensity inhomogeneity correction, rigid registration to a common template space, automated skull-stripping based on brain tissue classification, and the multi-atlas segmentation. The multi-atlas-based AutoSeg has been evaluated on subcortical structure segmentation with a testing dataset of 20 adult brain MRI scans and 15 atlas MRI scans. AutoSeg achieved a mean Dice coefficient of 81.73% for the subcortical structures.
PMID:24567717
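
The label fusion step described above reduces, per voxel, to picking the label with the greatest total atlas weight. A minimal sketch (the labels and similarity weights are hypothetical, not AutoSeg's actual data structures):

```python
# Weighted majority voting: each registered atlas proposes a label for a voxel,
# weighted by its similarity to the subject scan; the heaviest label wins.
from collections import defaultdict

def weighted_vote(labels, weights):
    """Return the label with the largest total atlas weight for one voxel."""
    tally = defaultdict(float)
    for lab, w in zip(labels, weights):
        tally[lab] += w
    return max(tally, key=tally.get)

# Hypothetical example: two of three atlases say "putamen", with lower weights.
print(weighted_vote(["putamen", "caudate", "putamen"], [0.5, 0.7, 0.3]))  # → putamen
```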

  8. Applying an intelligent model and sensitivity analysis to inspect mass transfer kinetics, shrinkage and crust color changes of deep-fat fried ostrich meat cubes.

    PubMed

    Amiryousefi, Mohammad Reza; Mohebbi, Mohebbat; Khodaiyan, Faramarz

    2014-01-01

    The objectives of this study were to use image analysis and an artificial neural network (ANN) to predict mass transfer kinetics as well as color changes and shrinkage of deep-fat fried ostrich meat cubes. Two generalized feedforward networks were developed separately, using the operating conditions as inputs. Results based on the high correlation coefficients between the experimental and predicted values showed proper fitting. Sensitivity analysis of the selected ANNs showed that, among the input variables, moisture content (MC) and fat content (FC) were most sensitive to frying temperature. Similarly, for the second ANN architecture, microwave power density was the most influential variable, having the maximum effect on both shrinkage percentage and color changes. Copyright © 2013 Elsevier Ltd. All rights reserved.

  9. Hierarchical analysis of cardiovascular risk factors in relation to the development of acute coronary syndromes, in different parts of Greece: the CARDIO2000 study.

    PubMed

    Panagiotakos, Demosthenes B; Pitsavos, Christos; Chrysohoou, Christine; Stefanadis, Christodoulos

    2008-01-01

    During 2000 to 2002, 700 male patients (59 +/- 10 years) and 148 female patients (65 +/- 9 years) with a first event of an ACS were randomly selected from cardiology clinics across Greek regions. Afterwards, 1078 population-based, age-matched and sex-matched controls were randomly selected from the same hospitals. The ratio of men to women in the case series was about 4:1, in both southern and northern Greek areas. Hierarchical classification analysis showed that for northern Greek areas family history of coronary heart disease, hypercholesterolemia, hypertension and diabetes (explained variability 35%), and, less significantly, dietary habits, smoking, body mass index and physical activity status (explained variability 4%) were associated with the development of ACS, whereas for southern Greek areas hypercholesterolemia, family history of coronary heart disease, diabetes, smoking, hypertension, dietary habits and physical activity (explained variability 34%), and, less significantly, body mass index (explained variability <1%), were associated with the development of the disease.

  10. The impact of extracurricular activities participation on youth delinquent behaviors: An instrumental variables approach.

    PubMed

    Han, Sehee; Lee, Jonathan; Park, Kyung-Gook

    2017-07-01

    The purpose of this study was to examine the association between extracurricular activities (EA) participation and youth delinquency while tackling an endogeneity problem of EA participation. Using survey data of 12th graders in South Korea (n = 1943), this study employed an instrumental variables approach to address the self-selection problem of EA participation as the data for this study was based on an observational study design. We found a positive association between EA participation and youth delinquency based on conventional regression analysis. By contrast, we found a negative association between EA participation and youth delinquency based on an instrumental variables approach. These results indicate that caution should be exercised when we interpret the effect of EA participation on youth delinquency based on observational study designs. Copyright © 2017 The Foundation for Professionals in Services for Adolescents. Published by Elsevier Ltd. All rights reserved.
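
The instrumental-variables logic above can be sketched with two-stage least squares (2SLS). This is a hedged illustration on synthetic continuous data, not the study's survey data or specification; the unobserved variable u plays the role of self-selection:

```python
import numpy as np

def two_sls(y, x, z):
    """Two-stage least squares for one endogenous regressor x with one
    instrument z (intercepts included in both stages)."""
    Z = np.column_stack([np.ones_like(z), z])
    g, *_ = np.linalg.lstsq(Z, x, rcond=None)    # stage 1: x on instrument
    Xh = np.column_stack([np.ones_like(z), Z @ g])
    b, *_ = np.linalg.lstsq(Xh, y, rcond=None)   # stage 2: y on fitted x
    return b[1]                                  # causal slope estimate

rng = np.random.default_rng(0)
n = 200
u = rng.normal(size=n)        # unobserved confounder (self-selection)
z = rng.normal(size=n)        # instrument: shifts x, no direct effect on y
x = z + u                     # "participation" driven by instrument + confounder
y = 2.0 * x + u               # true causal effect of x on y is 2.0

naive = np.polyfit(x, y, 1)[0]  # confounded OLS slope, biased upward
iv = two_sls(y, x, z)           # close to the true 2.0
```

The naive slope exceeds the 2SLS estimate, mirroring the sign change the study reports between conventional regression and the IV analysis.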

  11. Advanced colorectal neoplasia risk stratification by penalized logistic regression.

    PubMed

    Lin, Yunzhi; Yu, Menggang; Wang, Sijian; Chappell, Richard; Imperiale, Thomas F

    2016-08-01

    Colorectal cancer is the second leading cause of death from cancer in the United States. To facilitate the efficiency of colorectal cancer screening, there is a need to stratify risk for colorectal cancer among the 90% of US residents who are considered "average risk." In this article, we investigate such risk stratification rules for advanced colorectal neoplasia (colorectal cancer and advanced, precancerous polyps). We use a recently completed large cohort study of subjects who underwent a first screening colonoscopy. Logistic regression models have been used in the literature to estimate the risk of advanced colorectal neoplasia based on quantifiable risk factors. However, logistic regression may be prone to overfitting and instability in variable selection. Since most of the risk factors in our study have several categories, it was tempting to collapse these categories into fewer risk groups. We propose a penalized logistic regression method that automatically and simultaneously selects variables, groups categories, and estimates their coefficients by penalizing the [Formula: see text]-norm of both the coefficients and their differences. Hence, it encourages sparsity in the categories, i.e. grouping of the categories, and sparsity in the variables, i.e. variable selection. We apply the penalized logistic regression method to our data. The important variables are selected, with close categories simultaneously grouped, by penalized regression models with and without the interaction terms. The models are validated with 10-fold cross-validation. The receiver operating characteristic curves of the penalized regression models dominate the receiver operating characteristic curve of naive logistic regressions, indicating a superior discriminative performance. © The Author(s) 2013.
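
The penalty described above couples a norm on the coefficients with a norm on their differences (a fused-lasso-style penalty). As a hedged sketch of the simpler ingredient only, here is plain L1-penalized logistic regression fitted by proximal gradient descent (ISTA) on tiny hypothetical data; the difference penalty and category grouping of the paper are omitted:

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def logistic_lasso(X, y, lam=0.1, lr=0.1, iters=500):
    """L1-penalised logistic regression via proximal gradient descent (ISTA)."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(iters):
        p_hat = 1.0 / (1.0 + np.exp(-(X @ beta)))          # predicted probabilities
        grad = X.T @ (p_hat - y) / n                       # gradient of logistic loss
        beta = soft_threshold(beta - lr * grad, lr * lam)  # prox step: shrink
    return beta

# Hypothetical toy data: column 0 separates the classes, column 1 is pure noise
# (orthogonal to the outcome), so the L1 penalty keeps its coefficient at zero.
X = np.array([[ 1.,  1.],
              [ 2., -1.],
              [-1.,  1.],
              [-2., -1.]])
y = np.array([1., 1., 0., 0.])
beta = logistic_lasso(X, y)     # beta[0] > 0, beta[1] stays exactly 0
```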

  12. Interdisciplinary research in global biogeochemical cycling Nitrous oxide in terrestrial ecosystems

    NASA Technical Reports Server (NTRS)

    Norman, S. D.; Peterson, D. L.

    1984-01-01

    NASA has begun an interdisciplinary research program to investigate various aspects of Global Biology and Global Habitability. An important element selected for the study of global phenomena is related to biogeochemical cycling. The studies involve a collaboration with recognized scientists in the areas of plant physiology, microbiology, nutrient cycling theory, and related areas. Selected subjects of study include nitrogen cycling dynamics in terrestrial ecosystems with special attention to biosphere/atmosphere interactions, and an identification of sensitive response variables which can be used in ecosystem models based on parameters derived from remotely sensed variables. A description is provided of the progress and findings over the past two years. Attention is given to the characteristics of nitrous oxide emissions, the approach followed in the investigations, the selection of study sites, radiometric measurements, and research in Sequoia.

  13. Cider fermentation process monitoring by Vis-NIR sensor system and chemometrics.

    PubMed

    Villar, Alberto; Vadillo, Julen; Santos, Jose I; Gorritxategi, Eneko; Mabe, Jon; Arnaiz, Aitor; Fernández, Luis A

    2017-04-15

    Optimization of a multivariate calibration process has been undertaken for a Visible-Near Infrared (400-1100 nm) sensor system applied to the monitoring of the fermentation process of cider produced in the Basque Country (Spain). The main parameters monitored included alcoholic proof, l-lactic acid, glucose+fructose and acetic acid content. The multivariate calibration was carried out using a combination of different variable selection techniques, and the most suitable pre-processing strategies were selected based on the characteristics of the spectra obtained by the sensor system. The variable selection techniques studied in this work include the Martens uncertainty test, interval partial least squares regression (iPLS) and a genetic algorithm (GA). This procedure arises from the need to improve the prediction ability of the calibration models for cider monitoring. Copyright © 2016 Elsevier Ltd. All rights reserved.
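
Interval-based selection such as iPLS fits a local model on each contiguous block of wavelengths and compares prediction errors. A hedged, minimal sketch with hypothetical spectra, using ordinary least squares in place of the PLS inner model to keep it short:

```python
import numpy as np

def interval_rmse(X, y, n_intervals):
    """Split the spectral variables into contiguous intervals and report the
    in-sample RMSE of a least-squares model fitted on each interval alone.
    (OLS stands in for the PLS inner model in this sketch.)"""
    out = []
    for cols in np.array_split(np.arange(X.shape[1]), n_intervals):
        Xi = X[:, cols]
        coef, *_ = np.linalg.lstsq(Xi, y, rcond=None)
        out.append(float(np.sqrt(np.mean((Xi @ coef - y) ** 2))))
    return out

# Hypothetical spectra: 8 samples x 6 variables; only the last two carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 6))
y = X[:, 4] + 0.5 * X[:, 5]
rmses = interval_rmse(X, y, 3)
best = int(np.argmin(rmses))    # interval 2 (variables 4-5) fits essentially exactly
```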

  14. Application-Dedicated Selection of Filters (ADSF) using covariance maximization and orthogonal projection.

    PubMed

    Hadoux, Xavier; Kumar, Dinesh Kant; Sarossy, Marc G; Roger, Jean-Michel; Gorretta, Nathalie

    2016-05-19

    Visible and near-infrared (Vis-NIR) spectra are generated by the combination of numerous low-resolution features. Spectral variables are thus highly correlated, which can cause problems for selecting the most appropriate ones for a given application. Decomposition bases such as Fourier or wavelet generally help to highlight important spectral features, but are by nature constrained to have both positive and negative components. Thus, in addition to complicating the interpretability of the selected features, this impedes their use in application-dedicated sensors. In this paper we propose a new method for feature selection: Application-Dedicated Selection of Filters (ADSF). This method relaxes the shape constraint by enabling the selection of any type of user-defined custom features. By considering only relevant features, based on the underlying nature of the data, high regularization of the final model can be obtained, even in the small-sample-size context often encountered in spectroscopic applications. For larger-scale deployment of application-dedicated sensors, these predefined feature constraints can lead to application-specific optical filters, e.g., lowpass, highpass, bandpass or bandstop filters with positive-only coefficients. In a similar fashion to Partial Least Squares, ADSF successively selects features using covariance maximization and deflates their influence using orthogonal projection, in order to optimally tune the selection to the data with limited redundancy. ADSF is well suited to spectroscopic data as it can deal with large numbers of highly correlated variables in supervised learning, even with many correlated responses. Copyright © 2016 Elsevier B.V. All rights reserved.
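
The selection-then-deflation loop is PLS-like: pick the candidate filter whose response has the largest covariance with the response variable, then project that response out of the data. A hedged sketch with hypothetical, pre-centred data and two indicator-style "band" filters (not the paper's actual filter dictionary):

```python
import numpy as np

def adsf_select(X, y, filters, k):
    """Greedily select k filters whose responses maximise |covariance| with y,
    deflating the data by orthogonal projection after each pick (PLS-style)."""
    Xd = X - X.mean(axis=0)
    yc = y - y.mean()
    chosen = []
    for _ in range(k):
        scores = [abs((Xd @ f) @ yc) for f in filters]
        j = int(np.argmax(scores))
        chosen.append(j)
        t = Xd @ filters[j]                          # selected filter response
        Xd = Xd - np.outer(t, (t @ Xd) / (t @ t))    # deflate: remove t's span
    return chosen

# Hypothetical pre-centred "spectra" (6 samples x 4 variables) and two
# positive-coefficient band filters over variables {0,1} and {2,3}.
X = np.array([[ 1.,  0.,  0.2, -0.1],
              [ 2.,  1., -0.3,  0.2],
              [-1., -1.,  0.1,  0.4],
              [-2.,  0., -0.2, -0.3],
              [ 0.,  1.,  0.3,  0.1],
              [ 0., -1., -0.1, -0.3]])
y = X[:, 0] + X[:, 1] + 0.5 * X[:, 2]
filters = [np.array([1., 1., 0., 0.]), np.array([0., 0., 1., 1.])]
sel = adsf_select(X, y, filters, 2)   # the band over variables 0-1 is picked first
```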

  15. The impact of confounder selection in propensity scores when applied to prospective cohort studies in pregnancy.

    PubMed

    Xu, Ronghui; Hou, Jue; Chambers, Christina D

    2018-06-01

    Our work was motivated by small cohort studies on the risk of birth defects in infants born to pregnant women exposed to medications. We controlled for confounding using propensity scores (PS). The extremely rare events setting renders matching or stratification infeasible. In addition, the PS itself may be formed via different approaches to selecting confounders from a relatively long list of potential confounders. We carried out simulation experiments to compare different combinations of approaches: IPW or regression adjustment, with 1) inclusion of all potential confounders without selection, 2) selection based on the univariate association between the candidate variable and the outcome, or 3) selection based on change in effects (CIE). The simulation showed that IPW without selection leads to extremely large variances in the estimated odds ratio, which helps to explain the empirical data analysis results that we had observed. Copyright © 2018 Elsevier Inc. All rights reserved.
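
The IPW estimator compared above weights each subject by the inverse of its fitted propensity score; scores near 0 or 1 produce the large weights, and hence the large variances, that the simulation flags. A minimal numeric sketch with hypothetical exposures, outcomes and fitted scores:

```python
import numpy as np

# Hypothetical cohort: exposure A, binary outcome Y, and fitted propensity
# scores e = P(A=1 | confounders) for six subjects.
A = np.array([1, 1, 1, 0, 0, 0])
Y = np.array([1, 0, 1, 0, 0, 1])
e = np.array([0.8, 0.6, 0.5, 0.4, 0.3, 0.5])

# Stabilised (Hajek-style) IPW means of Y under exposure and non-exposure.
mu1 = np.sum(A * Y / e) / np.sum(A / e)
mu0 = np.sum((1 - A) * Y / (1 - e)) / np.sum((1 - A) / (1 - e))
risk_diff = mu1 - mu0   # adjusted risk difference
```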

  16. Awareness of the Faculty Members at Al-Balqa' Applied University to the Concept of Time Management and Its Relation to Some Variables

    ERIC Educational Resources Information Center

    Sabha, Raed Adel; Al-Assaf, Jamal Abdel-Fattah

    2012-01-01

    The study aims to investigate the extent of time-management awareness among the faculty members of Al-Balqa' Applied University, and its relation to some variables. The study was conducted on 150 randomly selected teachers. To achieve the study goals, an appropriate instrument was built based on the educational literature and…

  17. A randomized evaluation of a computer-based physician's workstation: design considerations and baseline results.

    PubMed Central

    Rotman, B. L.; Sullivan, A. N.; McDonald, T.; DeSmedt, P.; Goodnature, D.; Higgins, M.; Suermondt, H. J.; Young, C. Y.; Owens, D. K.

    1995-01-01

    We are performing a randomized, controlled trial of a Physician's Workstation (PWS), an ambulatory care information system, developed for use in the General Medical Clinic (GMC) of the Palo Alto VA. Goals for the project include selecting appropriate outcome variables and developing a statistically powerful experimental design with a limited number of subjects. As PWS provides real-time drug-ordering advice, we retrospectively examined drug costs and drug-drug interactions in order to select outcome variables sensitive to our short-term intervention as well as to estimate the statistical efficiency of alternative design possibilities. Drug cost data revealed the mean daily cost per physician per patient was 99.3 cents +/- 13.4 cents, with a range from 0.77 cent to 1.37 cents. The rate of major interactions per prescription for each physician was 2.9% +/- 1%, with a range from 1.5% to 4.8%. Based on these baseline analyses, we selected a two-period parallel design for the evaluation, which maximized statistical power while minimizing sources of bias. PMID:8563376

  18. Utilizing multiple state variables to improve the dynamic range of analog switching in a memristor

    NASA Astrophysics Data System (ADS)

    Jeong, YeonJoo; Kim, Sungho; Lu, Wei D.

    2015-10-01

    Memristors and memristive systems have been extensively studied for data storage and computing applications such as neuromorphic systems. To act as synapses in neuromorphic systems, the memristor needs to exhibit analog resistive switching (RS) behavior with incremental conductance change. In this study, we show that the dynamic range of the analog RS behavior can be significantly enhanced in a tantalum-oxide-based memristor. By controlling different state variables enabled by different physical effects during the RS process, the gradual filament expansion stage can be selectively enhanced without strongly affecting the abrupt filament length growth stage. Detailed physics-based modeling further verified the observed experimental effects and revealed the roles of oxygen vacancy drift and diffusion processes, and how the diffusion process can be selectively enhanced during the filament expansion stage. These findings lead to more desirable and reliable memristor behaviors for analog computing applications. Additionally, the ability to selectively control different internal physical processes demonstrated in the current study provides guidance for continued device optimization of memristor devices in general.

  19. ASTRAL-R score predicts non-recanalisation after intravenous thrombolysis in acute ischaemic stroke.

    PubMed

    Vanacker, Peter; Heldner, Mirjam R; Seiffge, David; Mueller, Hubertus; Eskandari, Ashraf; Traenka, Christopher; Ntaios, George; Mosimann, Pascal J; Sztajzel, Roman; Mendes Pereira, Vitor; Cras, Patrick; Engelter, Stefan; Lyrer, Philippe; Fischer, Urs; Lambrou, Dimitris; Arnold, Marcel; Michel, Patrik

    2015-05-01

    Intravenous thrombolysis (IVT) as a treatment for acute ischaemic stroke may be insufficient to achieve recanalisation in certain patients. Predicting the probability of non-recanalisation after IVT may have the potential to influence patient selection for more aggressive management strategies. We aimed at deriving and internally validating a predictive score for post-thrombolytic non-recanalisation, using clinical and radiological variables. In thrombolysis registries from four Swiss academic stroke centres (Lausanne, Bern, Basel and Geneva), patients were selected with large arterial occlusion on acute imaging and with repeated arterial assessment at 24 hours. Based on a logistic regression analysis, an integer-based score for each covariate of the fitted multivariate model was generated. Performance of the integer-based predictive model was assessed by bootstrapping the available data and cross-validation (delete-d method). In 599 thrombolysed strokes, five variables were identified as independent predictors of absence of recanalisation: Acute glucose > 7 mmol/l (A), significant extracranial vessel STenosis (ST), decreased Range of visual fields (R), large Arterial occlusion (A) and decreased Level of consciousness (L). All variables were weighted 1, except (L), which obtained 2 points based on the β-coefficients on the logistic scale. ASTRAL-R scores of 0, 3 and 6 corresponded to non-recanalisation probabilities of 18, 44 and 74%, respectively. Predictive ability showed an AUC of 0.66 (95% CI, 0.61-0.70) when using the bootstrap and 0.66 (0.63-0.68) when using delete-d cross-validation. In conclusion, the 5-item ASTRAL-R score moderately predicts non-recanalisation at 24 hours in thrombolysed ischaemic strokes. If its performance can be confirmed by external validation and its clinical usefulness can be proven, the score may influence patient selection for more aggressive revascularisation strategies in routine clinical practice.
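
Integer scores like the one above are typically built by scaling each logistic β-coefficient by the smallest effect and rounding, so the weakest predictor is worth 1 point. A hedged sketch with placeholder β values (these are illustrative, not the published ASTRAL-R weights):

```python
# Turning fitted logistic-regression coefficients into an integer point score:
# each beta is divided by the smallest beta and rounded. The beta values are
# hypothetical placeholders, not the published ASTRAL-R coefficients.
betas = {
    "glucose_gt_7":      0.55,   # A: acute glucose > 7 mmol/l
    "ec_stenosis":       0.61,   # ST: significant extracranial stenosis
    "visual_fields":     0.50,   # R: decreased range of visual fields
    "large_occlusion":   0.58,   # A: large arterial occlusion
    "low_consciousness": 1.10,   # L: decreased level of consciousness
}
unit = min(betas.values())                            # smallest effect = 1 point
points = {k: round(v / unit) for k, v in betas.items()}

# A hypothetical patient with high glucose and decreased consciousness:
score = sum(points[f] for f in ["glucose_gt_7", "low_consciousness"])  # → 3
```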

  20. Attitudes toward Task-Based Language Learning: A Study of College Korean Language Learners

    ERIC Educational Resources Information Center

    Pyun, Danielle Ooyoung

    2013-01-01

    This study explores second/foreign language (L2) learners' attitudes toward task-based language learning (TBLL) and how these attitudes relate to selected learner variables, namely anxiety, integrated motivation, instrumental motivation, and self-efficacy. Ninety-one college students of Korean as a foreign language, who received task-based…

  1. Cloud cover estimation: Use of GOES imagery in development of cloud cover data base for insolation assessment

    NASA Technical Reports Server (NTRS)

    Huning, J. R.; Logan, T. L.; Smith, J. H.

    1982-01-01

    The potential of using digital satellite data to establish a cloud cover data base for the United States, one that would provide detailed information on the temporal and spatial variability of cloud development, is studied. Key elements include: (1) interfacing GOES data from the University of Wisconsin Meteorological Data Facility with the Jet Propulsion Laboratory's VICAR image processing system and IBIS geographic information system; (2) creation of a registered multitemporal GOES data base; (3) development of a simple normalization model to compensate for sun angle; (4) creation of a variable-size georeference grid that provides detailed cloud information in selected areas and summarized information in other areas; and (5) development of a cloud/shadow model which details the percentage of each grid cell that is cloud and shadow covered, and the percentage of cloud or shadow opacity. In addition, model calculations of insolation were compared with measured values at selected test sites, and preliminary requirements were developed for a large-scale data base of cloud cover statistics.

  2. An image-based search for pulsars among Fermi unassociated LAT sources

    NASA Astrophysics Data System (ADS)

    Frail, D. A.; Ray, P. S.; Mooley, K. P.; Hancock, P.; Burnett, T. H.; Jagannathan, P.; Ferrara, E. C.; Intema, H. T.; de Gasperin, F.; Demorest, P. B.; Stovall, K.; McKinnon, M. M.

    2018-03-01

    We describe an image-based method that uses two radio criteria, compactness and spectral index, to identify promising pulsar candidates among Fermi Large Area Telescope (LAT) unassociated sources. These criteria are applied to those radio sources from the Giant Metrewave Radio Telescope all-sky survey at 150 MHz (TGSS ADR1) found within the error ellipses of unassociated sources from the 3FGL catalogue and a preliminary source list based on 7 yr of LAT data. After follow-up interferometric observations to identify extended or variable sources, a list of 16 compact, steep-spectrum candidates is generated. An ongoing search for pulsations in these candidates, in gamma rays and radio, has found six millisecond pulsars and one normal pulsar. A comparison of this method with existing selection criteria based on gamma-ray spectral and variability properties suggests that the pulsar discovery space using Fermi may be larger than previously thought. Radio imaging is a hitherto underutilized source selection method that can be used, as with other multiwavelength techniques, in the search for Fermi pulsars.

  3. An improved partial least-squares regression method for Raman spectroscopy

    NASA Astrophysics Data System (ADS)

    Momenpour Tehran Monfared, Ali; Anis, Hanan

    2017-10-01

    It is known that the performance of partial least-squares (PLS) regression analysis can be improved using the backward variable selection method (BVSPLS). In this paper, we further improve BVSPLS through a novel selection mechanism. The proposed method sorts the weighted regression coefficients, and the importance of each variable in the sorted list is then evaluated using the root mean square error of prediction (RMSEP) criterion at each iteration step. Our improved BVSPLS (IBVSPLS) method has been applied to leukemia and heparin data sets and led to an improvement in the limit of detection of Raman biosensing ranging from 10% to 43% compared to PLS. Our IBVSPLS was also compared to the jack-knifing (simpler) and genetic algorithm (more complex) methods. Our method was consistently better than the jack-knifing method and showed similar or better performance compared to the genetic algorithm.
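
Backward variable selection driven by a prediction-error criterion can be sketched as follows. This is a hedged illustration on hypothetical data: ordinary least squares stands in for the PLS inner model, and the simple drop-while-it-helps loop is not the paper's coefficient-sorting mechanism:

```python
import numpy as np

def rmsep(Xtr, ytr, Xval, yval, cols):
    """Root mean square error of prediction on a validation set for a model
    using only the listed variable columns (OLS stands in for PLS here)."""
    coef, *_ = np.linalg.lstsq(Xtr[:, cols], ytr, rcond=None)
    return float(np.sqrt(np.mean((Xval[:, cols] @ coef - yval) ** 2)))

def backward_select(Xtr, ytr, Xval, yval):
    """Backward variable selection: drop one variable at a time as long as
    doing so lowers the RMSEP, in the spirit of BVSPLS."""
    cols = list(range(Xtr.shape[1]))
    best = rmsep(Xtr, ytr, Xval, yval, cols)
    improved = True
    while improved and len(cols) > 1:
        improved = False
        for c in list(cols):
            trial = [k for k in cols if k != c]
            err = rmsep(Xtr, ytr, Xval, yval, trial)
            if err < best:
                best, cols, improved = err, trial, True
                break
    return cols, best

# Hypothetical data: 3 candidate variables, only variable 0 is informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = X[:, 0] + 0.1 * rng.normal(size=40)
cols, err = backward_select(X[:20], y[:20], X[20:], y[20:])
```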

  4. A Bayesian hierarchical model with spatial variable selection: the effect of weather on insurance claims

    PubMed Central

    Scheel, Ida; Ferkingstad, Egil; Frigessi, Arnoldo; Haug, Ola; Hinnerichsen, Mikkel; Meze-Hausken, Elisabeth

    2013-01-01

    Climate change will affect the insurance industry. We develop a Bayesian hierarchical statistical approach to explain and predict insurance losses due to weather events at a local geographic scale. The number of weather-related insurance claims is modelled by combining generalized linear models with spatially smoothed variable selection. Using Gibbs sampling and reversible jump Markov chain Monte Carlo methods, this model is fitted on daily weather and insurance data from each of the 319 municipalities which constitute southern and central Norway for the period 1997–2006. Precise out-of-sample predictions validate the model. Our results show interesting regional patterns in the effect of different weather covariates. In addition to being useful for insurance pricing, our model can be used for short-term predictions based on weather forecasts and for long-term predictions based on downscaled climate models. PMID:23396890

  5. Sensitivity study of Space Station Freedom operations cost and selected user resources

    NASA Technical Reports Server (NTRS)

    Accola, Anne; Fincannon, H. J.; Williams, Gregory J.; Meier, R. Timothy

    1990-01-01

    The results of sensitivity studies performed to estimate probable ranges for four key Space Station parameters using the Space Station Freedom's Model for Estimating Space Station Operations Cost (MESSOC) are discussed. The variables examined are grouped into five main categories: logistics, crew, design, space transportation system, and training. The modification of these variables implies programmatic decisions in areas such as orbital replacement unit (ORU) design, investment in repair capabilities, and crew operations policies. The model utilizes a wide range of algorithms and an extensive trial logistics data base to represent Space Station operations. The trial logistics data base consists largely of a collection of the ORUs that comprise the mature station, and their characteristics based on current engineering understanding of the Space Station. A nondimensional approach is used to examine the relative importance of variables on parameters.

  6. Selecting predictors for discriminant analysis of species performance: an example from an amphibious softwater plant.

    PubMed

    Vanderhaeghe, F; Smolders, A J P; Roelofs, J G M; Hoffmann, M

    2012-03-01

    Selecting an appropriate variable subset in linear multivariate methods is an important methodological issue for ecologists. Interest often exists in obtaining general predictive capacity or in finding causal inferences from predictor variables. Because of a lack of solid knowledge about a studied phenomenon, scientists explore predictor variables in order to find the most meaningful, i.e. discriminating, ones. As an example, we modelled the response of the amphibious softwater plant Eleocharis multicaulis using canonical discriminant function analysis. We asked how variables can be selected through a comparison of several methods: univariate Pearson chi-square screening, principal components analysis (PCA) and step-wise analysis, as well as combinations of some of these methods. We expected PCA to perform best. The selected methods were evaluated through the fit and stability of the resulting discriminant functions and through correlations between these functions and the predictor variables. The chi-square subset, at P < 0.05, followed by a step-wise sub-selection, gave the best results. In contrast to expectations, PCA performed poorly, as did step-wise analysis. The different chi-square subset methods all yielded ecologically meaningful variables, while probable noise variables were also selected by PCA and step-wise analysis. We advise against the simple use of PCA or step-wise discriminant analysis to obtain an ecologically meaningful variable subset; the former because it does not take the response variable into account, the latter because noise variables are likely to be selected. We suggest that univariate screening techniques are a worthwhile alternative for variable selection in ecology. © 2011 German Botanical Society and The Royal Botanical Society of the Netherlands.
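
The univariate chi-square screening recommended above reduces, for one categorical predictor and a binary presence/absence response, to a 2x2 contingency test. A minimal sketch with hypothetical counts (not the study's field data):

```python
# Univariate screening of one categorical predictor against a binary
# presence/absence response via the 2x2 Pearson chi-square statistic.

def chi_square_2x2(a, b, c, d):
    """Chi-square for the table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical counts: species present/absent in wet vs dry plots.
stat = chi_square_2x2(30, 10, 5, 25)   # present-wet, present-dry, absent-wet, absent-dry
keep = stat > 3.841                    # retain the predictor if P < 0.05 (1 d.f.)
```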

  7. A site specific model and analysis of the neutral somatic mutation rate in whole-genome cancer data.

    PubMed

    Bertl, Johanna; Guo, Qianyun; Juul, Malene; Besenbacher, Søren; Nielsen, Morten Muhlig; Hornshøj, Henrik; Pedersen, Jakob Skou; Hobolth, Asger

    2018-04-19

    Detailed modelling of the neutral mutational process in cancer cells is crucial for identifying driver mutations and understanding the mutational mechanisms that act during cancer development. The neutral mutational process is very complex: whole-genome analyses have revealed that the mutation rate differs between cancer types, between patients and along the genome depending on the genetic and epigenetic context. Therefore, methods that predict the number of different types of mutations in regions or specific genomic elements must consider local genomic explanatory variables. A major drawback of most methods is the need to average the explanatory variables across the entire region or genomic element. This procedure is particularly problematic if the explanatory variable varies dramatically in the element under consideration. To take into account the fine scale of the explanatory variables, we model the probabilities of different types of mutations for each position in the genome by multinomial logistic regression. We analyse 505 cancer genomes from 14 different cancer types and compare the performance in predicting mutation rate for both region-based and site-specific models. We show that for 1000 randomly selected genomic positions, the site-specific model predicts the mutation rate much better than region-based models. We use a forward selection procedure to identify the most important explanatory variables. The procedure identifies site-specific conservation (phyloP), replication timing, and expression level as the best predictors for the mutation rate. Finally, our model confirms and quantifies certain well-known mutational signatures. We find that our site-specific multinomial regression model outperforms the region-based models. The possibility of including genomic variables on different scales and patient-specific variables makes it a versatile framework for studying different mutational mechanisms.
Our model can serve as the neutral null model for the mutational process; regions that deviate from the null model are candidates for elements that drive cancer development.
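    The core idea above (per-site multinomial regression of mutation-type probabilities on local genomic covariates) can be sketched in a few lines. This is not the authors' code: the data below are synthetic, and the covariate names (phyloP conservation, replication timing, expression) are simply borrowed from the abstract.

```python
# Minimal sketch of a site-specific multinomial model: each genomic
# position gets softmax probabilities over mutation classes, predicted
# from local explanatory variables. All data here are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_sites = 5000
X = np.column_stack([
    rng.normal(size=n_sites),   # stand-in for phyloP conservation
    rng.uniform(size=n_sites),  # stand-in for replication timing
    rng.normal(size=n_sites),   # stand-in for expression level
])
# Synthetic outcome: 0 = no mutation, 1 = C>T, 2 = other substitution.
logits = np.column_stack([np.zeros(n_sites),
                          X @ [0.8, -0.5, 0.3],
                          X @ [-0.2, 0.6, 0.1]])
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
y = np.array([rng.choice(3, p=p) for p in probs])

model = LogisticRegression(max_iter=1000)  # multinomial (softmax) fit
model.fit(X, y)
site_probs = model.predict_proba(X[:5])    # per-site class probabilities
print(site_probs.shape)
```

    Each row of `site_probs` sums to 1, giving the predicted probability of each mutation type at that position; a region's expected mutation count is then a sum of such site-level probabilities rather than a function of region-averaged covariates.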

  8. Improve projections of changes in southern African summer rainfall through comprehensive multi-timescale empirical statistical downscaling

    NASA Astrophysics Data System (ADS)

    Dieppois, B.; Pohl, B.; Eden, J.; Crétat, J.; Rouault, M.; Keenlyside, N.; New, M. G.

    2017-12-01

    The water management community has hitherto neglected or underestimated many of the uncertainties in climate impact scenarios, in particular, uncertainties associated with decadal climate variability. Uncertainty in the state-of-the-art global climate models (GCMs) is time-scale-dependent, e.g. stronger at decadal than at interannual timescales, in response to the different parameterizations and to internal climate variability. In addition, non-stationarity in statistical downscaling is widely recognized as a key problem, in which the time-scale dependency of predictors plays an important role. As with global climate modelling, therefore, the selection of downscaling methods must proceed with caution to avoid unintended consequences of over-correcting the noise in GCMs (e.g. interpreting internal climate variability as a model bias). GCM outputs from the Coupled Model Intercomparison Project 5 (CMIP5) have therefore first been selected based on their ability to reproduce southern African summer rainfall variability and its teleconnections with Pacific sea-surface temperature across the dominant timescales. In observations, southern African summer rainfall has recently been shown to exhibit significant periodicities at interannual (2-8 years), quasi-decadal (8-13 years) and inter-decadal (15-28 years) timescales, which can be interpreted as the signature of ENSO, the IPO, and the PDO over the region. Most CMIP5 GCMs underestimate southern African summer rainfall variability and its teleconnections with Pacific SSTs at these three timescales. In addition, according to a more in-depth analysis of historical and pi-control runs, this bias might result from internal climate variability in some of the CMIP5 GCMs, suggesting potential for bias-corrected, prediction-based empirical statistical downscaling. 
A multi-timescale regression-based downscaling procedure, which determines the predictors across the different timescales, has thus been used to simulate southern African summer rainfall. This multi-timescale procedure shows much better skill in simulating decadal timescales of variability compared with commonly used statistical downscaling approaches.

  9. Variable Neighborhood Search Heuristics for Selecting a Subset of Variables in Principal Component Analysis

    ERIC Educational Resources Information Center

    Brusco, Michael J.; Singh, Renu; Steinley, Douglas

    2009-01-01

    The selection of a subset of variables from a pool of candidates is an important problem in several areas of multivariate statistics. Within the context of principal component analysis (PCA), a number of authors have argued that subset selection is crucial for identifying those variables that are required for correct interpretation of the…

  10. Social inequality, lifestyles and health - a non-linear canonical correlation analysis based on the approach of Pierre Bourdieu.

    PubMed

    Grosse Frie, Kirstin; Janssen, Christian

    2009-01-01

    Based on the theoretical and empirical approach of Pierre Bourdieu, a multivariate non-linear method is introduced as an alternative way to analyse the complex relationships between social determinants and health. The analysis is based on face-to-face interviews with 695 randomly selected respondents aged 30 to 59. Variables regarding socio-economic status, life circumstances, lifestyles, health-related behaviour and health were chosen for the analysis. In order to determine whether the respondents can be differentiated and described based on these variables, a non-linear canonical correlation analysis (OVERALS) was performed. The results can be described along three dimensions; the eigenvalues add up to a fit of 1.444, which can be interpreted as approximately 50% of explained variance. The three-dimensional space illustrates correspondences between variables and provides a framework for interpretation based on latent dimensions, which can be described by age, education, income and gender. Using non-linear canonical correlation analysis, health characteristics can be analysed in conjunction with socio-economic conditions and lifestyles. Based on Bourdieu's theoretical approach, the complex correlations between these variables can be more substantially interpreted and presented.

  11. wayGoo recommender system: personalized recommendations for events scheduling, based on static and real-time information

    NASA Astrophysics Data System (ADS)

    Thanos, Konstantinos-Georgios; Thomopoulos, Stelios C. A.

    2016-05-01

    wayGoo is a fully functional application whose main functionalities include content geolocation, event scheduling, and indoor navigation. However, significant information about events does not reach users' attention, either because of the volume of this information or because some of it comes from real-time data sources. The purpose of this work is to facilitate event management operations by prioritizing the presented events based on users' interests, using both static and real-time data. Through the wayGoo interface, users select conceptual topics that interest them. These topics constitute a browsing behavior vector which is used for learning users' interests implicitly, without being intrusive. The system then estimates user preferences and returns an event list sorted from the most preferred to the least. User preferences are modeled via a Naïve Bayesian network which consists of: a) the `decision' random variable, corresponding to users' decision on attending an event; b) the `distance' random variable, modeled by a linear regression that estimates the probability that the distance between a user and each event destination is not discouraging; c) the `seat availability' random variable, modeled by a linear regression that estimates the probability that the seat availability is encouraging; and d) the `relevance' random variable, modeled by clustering-based collaborative filtering, which determines the relevance of each event to users' interests. Finally, experimental results show that the proposed system contributes essentially to assisting users in browsing and selecting events to attend.

  12. Sharpening method of satellite thermal image based on the geographical statistical model

    NASA Astrophysics Data System (ADS)

    Qi, Pengcheng; Hu, Shixiong; Zhang, Haijun; Guo, Guangmeng

    2016-04-01

    To improve the effectiveness of thermal sharpening in mountainous regions, while paying closer attention to the laws of land surface energy balance, a thermal sharpening method based on a geographical statistical model (GSM) is proposed. Explanatory variables were selected from the processes of the land surface energy budget and thermal infrared electromagnetic radiation transmission; then high spatial resolution (57 m) raster layers were generated for these variables through spatial simulation or by using other raster data as proxies. Based on this, the locally adapted statistical relationship between brightness temperature (BT) and the explanatory variables, i.e., the GSM, was built at 1026-m resolution using the method of multivariate adaptive regression splines. Finally, the GSM was applied to the high-resolution (57-m) explanatory variables; thus, the high-resolution (57-m) BT image was obtained. This method produced a sharpening result with low error and good visual effect. The method can avoid the blind choice of explanatory variables and remove the dependence on synchronous imagery at visible and near-infrared bands. The influences of the explanatory variable combination, the sampling method, and the residual error correction on sharpening results were analyzed in detail, and their influence mechanisms are reported herein.
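    The general sharpening workflow described above (fit the BT-covariate relationship at coarse resolution, apply it to the fine-resolution covariates, then add back the coarse-scale residual) can be sketched as follows. This is a hedged illustration on synthetic grids, not the paper's implementation; scikit-learn has no MARS, so a gradient-boosting regressor stands in for multivariate adaptive regression splines, and the resolution ratio is reduced from the paper's 1026 m / 57 m for brevity.

```python
# Sketch of statistical thermal sharpening on synthetic data:
# coarse-scale model -> fine-scale prediction -> residual correction.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)

# Synthetic fine-grid explanatory variables (e.g. terrain/albedo proxies).
fine = rng.normal(size=(18, 18, 2))
bt_fine_true = 290 + 5 * fine[..., 0] - 3 * fine[..., 1]

def aggregate(a, f):
    """Block-average a fine grid down to coarse resolution by factor f."""
    h, w = a.shape[0] // f, a.shape[1] // f
    return a[:h * f, :w * f].reshape(h, f, w, f).mean(axis=(1, 3))

f = 3  # toy resolution ratio (the paper's would be 1026 m / 57 m = 18)
coarse_X = np.stack([aggregate(fine[..., k], f) for k in range(2)], axis=-1)
coarse_bt = aggregate(bt_fine_true, f)  # "observed" coarse BT

# GSM fitted at coarse resolution (MARS stand-in).
model = GradientBoostingRegressor(random_state=0)
model.fit(coarse_X.reshape(-1, 2), coarse_bt.ravel())

# Apply the coarse-scale relationship to the fine-scale variables.
bt_sharpened = model.predict(fine.reshape(-1, 2)).reshape(18, 18)

# Residual correction: redistribute the coarse-scale model residual so
# the sharpened field stays consistent with the observed coarse BT.
resid = coarse_bt - model.predict(coarse_X.reshape(-1, 2)).reshape(coarse_bt.shape)
bt_sharpened += np.kron(resid, np.ones((f, f)))
```

    The residual-correction step is what the abstract evaluates explicitly: without it, the sharpened image can drift away from the observed coarse brightness temperatures.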

  13. Analysis of host response to bacterial infection using error model based gene expression microarray experiments

    PubMed Central

    Stekel, Dov J.; Sarti, Donatella; Trevino, Victor; Zhang, Lihong; Salmon, Mike; Buckley, Chris D.; Stevens, Mark; Pallen, Mark J.; Penn, Charles; Falciani, Francesco

    2005-01-01

    A key step in the analysis of microarray data is the selection of genes that are differentially expressed. Ideally, such experiments should be properly replicated in order to infer both technical and biological variability, and the data should be subjected to rigorous hypothesis tests to identify the differentially expressed genes. However, in microarray experiments involving the analysis of very large numbers of biological samples, replication is not always practical. Therefore, there is a need for a method to select differentially expressed genes in a rational way from insufficiently replicated data. In this paper, we describe a simple method that uses bootstrapping to generate an error model from a replicated pilot study that can be used to identify differentially expressed genes in subsequent large-scale studies on the same platform, but in which there may be no replicated arrays. The method builds a stratified error model that includes array-to-array variability, feature-to-feature variability and the dependence of error on signal intensity. We apply this model to the characterization of the host response in a model of bacterial infection of human intestinal epithelial cells. We demonstrate the effectiveness of error model based microarray experiments and propose this as a general strategy for a microarray-based screening of large collections of biological samples. PMID:15800204
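    The stratified bootstrap error model described above can be sketched as follows. This is an illustrative reconstruction on synthetic data, not the authors' pipeline: a small replicated "pilot" gives a null distribution of log-ratios per intensity stratum, and genes in a later unreplicated array are flagged when they fall outside the stratum-specific null quantiles.

```python
# Sketch of an intensity-stratified bootstrap error model for selecting
# differentially expressed genes from unreplicated arrays. Synthetic data.
import numpy as np

rng = np.random.default_rng(2)
n_genes = 2000
intensity = rng.uniform(6, 14, size=n_genes)        # log2 mean signal
noise_sd = 1.0 / np.sqrt(intensity - 5)             # error shrinks with signal
pilot_ratios = rng.normal(0, noise_sd, size=(4, n_genes))  # 4 replicate arrays

# Stratify genes by intensity; bootstrap a null distribution per stratum.
n_strata, n_boot = 5, 2000
edges = np.quantile(intensity, np.linspace(0, 1, n_strata + 1)[1:-1])
strata = np.digitize(intensity, edges)
cutoffs = {}
for s in range(n_strata):
    null = pilot_ratios[:, strata == s].ravel()
    boot = rng.choice(null, size=n_boot)
    cutoffs[s] = np.quantile(boot, [0.005, 0.995])  # two-sided 1% cutoffs

# Apply the error model to a new, unreplicated array.
new_ratios = rng.normal(0, noise_sd)
new_ratios[:10] += 4.0                              # spiked-in true changes
flagged = np.array([not (cutoffs[s][0] <= r <= cutoffs[s][1])
                    for r, s in zip(new_ratios, strata)])
```

    The stratification captures the signal-dependence of the error that the abstract mentions: low-intensity genes get wider null cutoffs than high-intensity ones, instead of a single global threshold.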

  14. An investigation of dynamic-analysis methods for variable-geometry structures

    NASA Technical Reports Server (NTRS)

    Austin, F.

    1980-01-01

    Selected space structure configurations were reviewed in order to define dynamic analysis problems associated with variable geometry. The dynamics of a beam being constructed from a flexible base and the relocation of the completed beam by rotating the remote manipulator system about the shoulder joint were selected. Equations of motion were formulated in physical coordinates for both of these problems, and FORTRAN programs were developed to generate solutions by numerically integrating the equations. These solutions served as a standard of comparison to gauge the accuracy of approximate solution techniques that were developed and studied. Good control was achieved in both problems. Unstable control system coupling with the system flexibility did not occur. An approximate method was developed for each problem to enable the analyst to investigate variable geometry effects during a short time span using standard fixed geometry programs such as NASTRAN. The average angle and average length techniques are discussed.

  15. Classification of 'Chemlali' accessions according to the geographical area using chemometric methods of phenolic profiles analysed by HPLC-ESI-TOF-MS.

    PubMed

    Taamalli, Amani; Arráez Román, David; Zarrouk, Mokhtar; Segura-Carretero, Antonio; Fernández-Gutiérrez, Alberto

    2012-05-01

    The present work describes a classification method for Tunisian 'Chemlali' olive oils based on their phenolic composition and geographical area. For this purpose, the data obtained by HPLC-ESI-TOF-MS from 13 samples of extra virgin olive oils, obtained from different production areas throughout the country, were used, focusing on 23 phenolic compounds detected. The quantitative results showed a significant variability among the analysed oil samples. A factor analysis using principal components was applied to the data in order to reduce the number of factors which explain the variability of the selected compounds. The data matrix constructed was subjected to a canonical discriminant analysis (CDA) in order to classify the oil samples. These results showed that 100% of cross-validated original group cases were correctly classified, which proves the usefulness of the selected variables. Copyright © 2011 Elsevier Ltd. All rights reserved.
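    The chemometric pipeline above (dimension reduction on the phenolic profile followed by cross-validated discriminant classification of origin) can be sketched as below. This is a hedged stand-in, not the study's analysis: the data are synthetic, the sample layout (13 oils x 23 compounds, 3 areas) merely mirrors the abstract, and scikit-learn's linear discriminant analysis substitutes for the CDA step.

```python
# Sketch: PCA dimension reduction + discriminant classification of
# geographical origin under leave-one-out cross-validation. Synthetic data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(3)
# 13 oil samples x 23 phenolic compounds, from 3 production areas.
areas = np.repeat([0, 1, 2], [5, 4, 4])
X = rng.normal(size=(13, 23)) + 3.0 * areas[:, None] * rng.uniform(size=23)

pipe = make_pipeline(PCA(n_components=4), LinearDiscriminantAnalysis())
scores = cross_val_score(pipe, X, areas, cv=LeaveOneOut())
print(f"leave-one-out accuracy: {scores.mean():.2f}")
```

    Leave-one-out cross-validation matches the abstract's "cross-validated original group cases" check: each oil is classified by a model trained on the other twelve, guarding against the overfitting that plagues 23-variable models of 13 samples.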

  16. Enhanced CAH dechlorination in a low permeability, variably-saturated medium

    USGS Publications Warehouse

    Martin, J.P.; Sorenson, K.S.; Peterson, L.N.; Brennan, R.A.; Werth, C.J.; Sanford, R.A.; Bures, G.H.; Taylor, C.J.; ,

    2002-01-01

    An innovative pilot-scale field test was performed to enhance the anaerobic reductive dechlorination (ARD) of chlorinated aliphatic hydrocarbons (CAHs) in a low permeability, variably-saturated formation. The selected technology combines the use of a hydraulic fracturing (fracking) technique with enhanced bioremediation through the creation of highly-permeable sand- and electron donor-filled fractures in the low permeability matrix. Chitin was selected as the electron donor because of its unique properties as a polymeric organic material and based on the results of lab studies that indicated its ability to support ARD. The distribution and impact of chitin- and sand-filled fractures to the system was evaluated using hydrologic, geophysical, and geochemical parameters. The results indicate that, where distributed, chitin favorably impacted redox conditions and supported enhanced ARD of CAHs. These results indicate that this technology may be a viable and cost-effective approach for remediation of low-permeability, variably saturated systems.

  17. Sparse Zero-Sum Games as Stable Functional Feature Selection

    PubMed Central

    Sokolovska, Nataliya; Teytaud, Olivier; Rizkalla, Salwa; Clément, Karine; Zucker, Jean-Daniel

    2015-01-01

    In large-scale systems biology applications, features are structured in hidden functional categories whose predictive power is identical. Feature selection, therefore, can lead not only to a problem with a reduced dimensionality, but also reveal some knowledge on functional classes of variables. In this contribution, we propose a framework based on a sparse zero-sum game which performs a stable functional feature selection. In particular, the approach is based on feature subsets ranking by a thresholding stochastic bandit. We provide a theoretical analysis of the introduced algorithm. We illustrate by experiments on both synthetic and real complex data that the proposed method is competitive from the predictive and stability viewpoints. PMID:26325268

  18. A discriminant function model for admission at undergraduate university level

    NASA Astrophysics Data System (ADS)

    Ali, Hamdi F.; Charbaji, Abdulrazzak; Hajj, Nada Kassim

    1992-09-01

    The study is aimed at predicting objective criteria based on a statistically tested model for admitting undergraduate students to Beirut University College. The University is faced with a dual problem of having to select only a fraction of an increasing number of applicants, and of trying to minimize the number of students placed on academic probation (currently 36 percent of new admissions). Out of 659 new students, a sample of 272 students (45 percent) were selected; these were all the students on the Dean's list and on academic probation. With academic performance as the dependent variable, the model included ten independent variables and their interactions. These variables included the type of high school, the language of instruction in high school, recommendations, sex, academic average in high school, score on the English Entrance Examination, the major in high school, and whether the major was originally applied for by the student. Discriminant analysis was used to evaluate the relative weight of the independent variables, and from the analysis three equations were developed, one for each academic division in the College. The predictive power of these equations was tested by using them to classify students not in the selected sample into successful and unsuccessful ones. Applicability of the model to other institutions of higher learning is discussed.

  19. Exploratory Spectroscopy of Magnetic Cataclysmic Variables Candidates and Other Variable Objects

    NASA Astrophysics Data System (ADS)

    Oliveira, A. S.; Rodrigues, C. V.; Cieslinski, D.; Jablonski, F. J.; Silva, K. M. G.; Almeida, L. A.; Rodríguez-Ardila, A.; Palhares, M. S.

    2017-04-01

    The increasing number of synoptic surveys made by small robotic telescopes, such as the photometric Catalina Real-Time Transient Survey (CRTS), provides a unique opportunity to discover variable sources and improves the statistical samples of such classes of objects. Our goal is the discovery of magnetic Cataclysmic Variables (mCVs). These are rare objects that probe interesting accretion scenarios controlled by the white-dwarf magnetic field. In particular, improved statistics of mCVs would help to address open questions on their formation and evolution. We performed an optical spectroscopy survey to search for signatures of magnetic accretion in 45 variable objects selected mostly from the CRTS. In this sample, we found 32 CVs, 22 being mCV candidates, 13 of which were previously unreported as such. If the proposed classifications are confirmed, it would represent an increase of 4% in the number of known polars and 12% in the number of known IPs. A fraction of our initial sample was classified as extragalactic sources or other types of variable stars by the inspection of the identification spectra. Despite the inherent complexity in identifying a source as an mCV, variability-based selection, followed by spectroscopic snapshot observations, has proved to be an efficient strategy for their discoveries, being a relatively inexpensive approach in terms of telescope time. Based on observations obtained at the Observatório do Pico dos Dias/LNA, and at the Southern Astrophysical Research (SOAR) telescope, which is a joint project of the Ministério da Ciência, Tecnologia, e Inovação (MCTI) da República Federativa do Brasil, the U.S. National Optical Astronomy Observatory (NOAO), the University of North Carolina at Chapel Hill (UNC), and Michigan State University (MSU).

  20. Assessment of economic status in trauma registries: A new algorithm for generating population-specific clustering-based models of economic status for time-constrained low-resource settings.

    PubMed

    Eyler, Lauren; Hubbard, Alan; Juillard, Catherine

    2016-10-01

    Low and middle-income countries (LMICs) and the world's poor bear a disproportionate share of the global burden of injury. Data regarding disparities in injury are vital to inform injury prevention and trauma systems strengthening interventions targeted towards vulnerable populations, but are limited in LMICs. We aim to facilitate injury disparities research by generating a standardized methodology for assessing economic status in resource-limited country trauma registries where complex metrics such as income, expenditures, and wealth index are infeasible to assess. To address this need, we developed a cluster analysis-based algorithm for generating simple population-specific metrics of economic status using nationally representative Demographic and Health Surveys (DHS) household assets data. For a limited number of variables, g, our algorithm performs weighted k-medoids clustering of the population using all combinations of g asset variables and selects the combination of variables and number of clusters that maximize average silhouette width (ASW). In simulated datasets containing both randomly distributed variables and "true" population clusters defined by correlated categorical variables, the algorithm selected the correct variable combination and appropriate cluster numbers unless variable correlation was very weak. When used with 2011 Cameroonian DHS data, our algorithm identified twenty economic clusters with ASW 0.80, indicating well-defined population clusters. This economic model for assessing health disparities will be used in the new Cameroonian six-hospital centralized trauma registry. By describing our standardized methodology and algorithm for generating economic clustering models, we aim to facilitate measurement of health disparities in other trauma registries in resource-limited countries. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
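    The selection loop the abstract describes (for a fixed number of variables g, try every variable combination and cluster count, and keep the pair maximizing average silhouette width) can be sketched as follows. This is an illustration on synthetic continuous data, not the published algorithm: the paper uses weighted k-medoids on DHS asset data, and since scikit-learn has no k-medoids, KMeans stands in here.

```python
# Sketch of ASW-maximizing joint selection of variables and cluster count.
# KMeans is a stand-in for the paper's weighted k-medoids; data synthetic.
from itertools import combinations
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(4)
n = 300
group = rng.integers(0, 2, size=n)  # two "true" population clusters
X_all = np.column_stack([
    group * 6.0 + rng.normal(size=n),  # informative variable 0
    group * 6.0 + rng.normal(size=n),  # informative variable 1
    rng.normal(size=n),                # uninformative variable
    rng.normal(size=n),                # uninformative variable
])

g = 2  # number of variables to select
best = (-1.0, None, None)
for cols in combinations(range(X_all.shape[1]), g):
    X = X_all[:, list(cols)]
    for k in range(2, 6):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        asw = silhouette_score(X, labels)  # average silhouette width
        if asw > best[0]:
            best = (asw, cols, k)
asw, cols, k = best
print(f"selected variables {cols} with k={k} clusters (ASW={asw:.2f})")
```

    As in the abstract's simulations, the correlated informative variables and the correct cluster count win because they yield the most compact, well-separated clusters; combinations of random variables score lower ASW.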

  1. Selection Practices of Group Leaders: A National Survey.

    ERIC Educational Resources Information Center

    Riva, Maria T.; Lippert, Laurel; Tackett, M. Jan

    2000-01-01

    Study surveys the selection practices of group leaders. Explores methods of selection, variables used to make selection decisions, and the types of selection errors that leaders have experienced. Results suggest that group leaders use clinical judgment to make selection decisions and endorse using some specific variables in selection. (Contains 22…

  2. Genetic Variation Among Open-Pollinated Progeny of Eastern Cottonwood

    Treesearch

    R. E. Farmer

    1970-01-01

    Improvement programs in eastern cottonwood (Populus deltoides Bartr.) are most frequently designed to produce genetically superior clones for direct commercial use. This paper describes a progeny test to assess genetic variability on which selection might be based.

  3. Electromagnetic fields from mobile phone base station - variability analysis.

    PubMed

    Bienkowski, Pawel; Zubrzak, Bartlomiej

    2015-09-01

    The article describes the character of the electromagnetic field (EMF) in mobile phone base station (BS) surroundings and its variability in time, with an emphasis on the measurement difficulties related to its pulsed and multi-frequency nature. The work also presents long-term monitoring measurements performed recently at different locations in Poland: a small city with dispersed building development and a major Polish city with a dense urban area. The authors tried to determine trends in the changing EMF spectrum by analyzing daily changes in measured EMF levels at those locations. The research was performed using selective electromagnetic meters as well as an EMF meter with spectrum analysis.

  4. Multistage variable probability forest volume inventory. [the Defiance Unit of the Navajo Nation

    NASA Technical Reports Server (NTRS)

    Anderson, J. E. (Principal Investigator)

    1979-01-01

    An inventory scheme based on the use of computer-processed LANDSAT MSS data was developed. Output from the inventory scheme provides an estimate of the standing net saw timber volume of a major timber species on a selected forested area of the Navajo Nation. Such estimates are based on the values of parameters currently used for scaled sawlog conversion to mill output. The multistage variable probability sampling appears capable of producing estimates which compare favorably with those produced using conventional techniques. In addition, the reduction in time, manpower, and overall costs lends it to numerous applications.

  5. Improving data analysis in herpetology: Using Akaike's information criterion (AIC) to assess the strength of biological hypotheses

    USGS Publications Warehouse

    Mazerolle, M.J.

    2006-01-01

    In ecology, researchers frequently use observational studies to explain a given pattern, such as the number of individuals in a habitat patch, with a large number of explanatory (i.e., independent) variables. To elucidate such relationships, ecologists have long relied on hypothesis testing to include or exclude variables in regression models, although the conclusions often depend on the approach used (e.g., forward, backward, stepwise selection). Though better tools surfaced in the mid-1970s, they are still underutilized in certain fields, particularly in herpetology. This is the case of the Akaike information criterion (AIC), which is remarkably superior for model selection (i.e., variable selection) to hypothesis-based approaches. It is simple to compute and easy to understand, but more importantly, for a given data set, it provides a measure of the strength of evidence for each model that represents a plausible biological hypothesis relative to the entire set of models considered. Using this approach, one can then compute a weighted average of the estimate and standard error for any given variable of interest across all the models considered. This procedure, termed model-averaging or multimodel inference, yields precise and robust estimates. In this paper, I illustrate the use of the AIC in model selection and inference, as well as the interpretation of results analysed in this framework, with two real herpetological data sets. The AIC and measures derived from it should be routinely adopted by herpetologists. © Koninklijke Brill NV 2006.
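    The AIC workflow the abstract advocates (score every candidate model, convert AIC differences to Akaike weights, then model-average a coefficient of interest) can be sketched as below. The data are synthetic and the models are plain OLS fits; the small-sample correction AICc is used, as is common for ecological sample sizes.

```python
# Sketch of AIC-based model selection and multimodel inference:
# fit all subsets, compute AICc and Akaike weights, model-average.
import itertools
import numpy as np

rng = np.random.default_rng(5)
n = 60
X = rng.normal(size=(n, 3))                    # candidate covariates
y = 2.0 + 1.5 * X[:, 0] + rng.normal(size=n)   # only variable 0 matters

def fit_ols(cols):
    """OLS with an intercept plus the selected columns; returns (beta, AICc)."""
    A = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    k = A.shape[1] + 1                          # parameters incl. error variance
    aic = n * np.log(rss / n) + 2 * k
    return beta, aic + 2 * k * (k + 1) / (n - k - 1)  # small-sample AICc

models = [cols for r in range(4) for cols in itertools.combinations(range(3), r)]
fits = {cols: fit_ols(cols) for cols in models}
delta = np.array([fits[c][1] for c in models])
delta -= delta.min()                            # AICc differences
weights = np.exp(-0.5 * delta) / np.exp(-0.5 * delta).sum()  # Akaike weights

# Model-averaged coefficient for variable 0 (0 where it is absent).
avg = sum(w * (fits[c][0][c.index(0) + 1] if 0 in c else 0.0)
          for w, c in zip(weights, models))
print(f"model-averaged coefficient for x0: {avg:.2f}")
```

    The weights quantify the "strength of evidence" for each model mentioned above; averaging across all models, rather than committing to a single stepwise winner, is what yields the robust estimates the paper recommends.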

  6. [Correlation coefficient-based classification method of hydrological dependence variability: With auto-regression model as example].

    PubMed

    Zhao, Yu Xi; Xie, Ping; Sang, Yan Fang; Wu, Zi Yi

    2018-04-01

    Hydrological process evaluation is temporally dependent. Hydrological time series that include dependence components do not meet the data consistency assumption for hydrological computation. Both of these factors cause great difficulty for water research. Given the existence of hydrological dependence variability, we proposed a correlation coefficient-based method for significance evaluation of hydrological dependence based on an auto-regression model. By calculating the correlation coefficient between the original series and its dependence component and selecting reasonable thresholds of the correlation coefficient, this method divides the significance degree of dependence into no variability, weak variability, mid variability, strong variability, and drastic variability. By deducing the relationship between the correlation coefficient and the auto-correlation coefficients of each order of the series, we found that the correlation coefficient is mainly determined by the magnitude of the auto-correlation coefficients from the first order to the p-th order, which clarifies the theoretical basis of the method. With the first-order and second-order auto-regression models as examples, the reasonability of the deduced formula was verified through Monte-Carlo experiments classifying the relationship between the correlation coefficient and the auto-correlation coefficient. The method was used to analyze three observed hydrological time series. The results indicated the coexistence of stochastic and dependence characteristics in hydrological processes.
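    With an AR(1) model, the classification idea above reduces to a few lines: estimate the autoregression coefficient, correlate the series with its fitted dependence component, and grade the result against thresholds. The sketch below is illustrative only; the thresholds are invented for the example and are not the paper's values.

```python
# Sketch: grade dependence variability of a series by the correlation
# between the series and its fitted AR(1) dependence component.
import numpy as np

rng = np.random.default_rng(6)

def simulate_ar1(phi, n=500):
    """Generate an AR(1) series x_t = phi * x_{t-1} + e_t."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

def dependence_grade(x):
    """Return (grade, r) using illustrative correlation thresholds."""
    x0, x1 = x[:-1] - x.mean(), x[1:] - x.mean()
    phi = float((x0 * x1).sum() / (x0 ** 2).sum())  # lag-1 autocorrelation
    dep = phi * x[:-1]                 # fitted dependence component
    r = float(np.corrcoef(x[1:], dep)[0, 1])
    bins = [0.2, 0.4, 0.6, 0.8]        # illustrative, not the paper's values
    labels = ["no", "weak", "mid", "strong", "drastic"]
    return labels[int(np.searchsorted(bins, abs(r)))], r

for phi in (0.0, 0.5, 0.9):
    grade, r = dependence_grade(simulate_ar1(phi))
    print(f"phi={phi}: r={r:+.2f} -> {grade} variability")
```

    For an AR(1) process the correlation between the series and its dependence component equals the lag-1 autocorrelation in expectation, which is the relationship the paper derives more generally up to order p.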

  7. Free-jet feasibility study of a thermal acoustic shield concept for AST/VCE application - dual stream nozzles. Comprehensive data report. Volume 2: Laser velocimeter and suppressor base pressure data

    NASA Technical Reports Server (NTRS)

    Janardan, B. A.; Brausch, J. F.; Price, A. O.

    1984-01-01

    Acoustic and diagnostic data that were obtained to determine the influence of selected geometric and aerodynamic flow variables of coannular nozzles with thermal acoustic shields are summarized in this comprehensive data report. A total of 136 static and simulated flight acoustic test points were conducted with 9 scale-model nozzles. Aerodynamic laser velocimeter measurements were made for four selected plumes. In addition, static pressure data in the chute base region of the suppressor configurations were obtained to assess the influence of the shield stream on the suppressor base drag.

  8. Security of a discretely signaled continuous variable quantum key distribution protocol for high rate systems.

    PubMed

    Zhang, Zheshen; Voss, Paul L

    2009-07-06

    We propose a continuous variable based quantum key distribution protocol that makes use of discretely signaled coherent light and reverse error reconciliation. We present a rigorous security proof against collective attacks with realistic lossy, noisy quantum channels, imperfect detector efficiency, and detector electronic noise. This protocol is promising for convenient, high-speed operation at link distances up to 50 km with the use of post-selection.

  9. What Makes a Difference during the Last Two Years of High School: An Overview of Studies Based on High School and Beyond Data.

    ERIC Educational Resources Information Center

    Marsh, Herbert W.

    Variables that influence growth and change in educational outcomes in the last 2 years of high school were studied using data from the High School and Beyond (HSB) study. The HSB study provided a database of thousands of variables for about 30 students from each of 1,000 randomly selected high schools in the United States in their sophomore and…

  10. Unbiased split variable selection for random survival forests using maximally selected rank statistics.

    PubMed

    Wright, Marvin N; Dankowski, Theresa; Ziegler, Andreas

    2017-04-15

    The most popular approach for analyzing survival data is the Cox regression model. The Cox model may, however, be misspecified, and its proportionality assumption may not always be fulfilled. An alternative approach for survival prediction is random forests for survival outcomes. The standard split criterion for random survival forests is the log-rank test statistic, which favors splitting variables with many possible split points. Conditional inference forests avoid this split variable selection bias. However, linear rank statistics are utilized by default in conditional inference forests to select the optimal splitting variable, which cannot detect non-linear effects in the independent variables. An alternative is to use maximally selected rank statistics for the split point selection. As in conditional inference forests, splitting variables are compared on the p-value scale. However, instead of the conditional Monte-Carlo approach used in conditional inference forests, p-value approximations are employed. We describe several p-value approximations and the implementation of the proposed random forest approach. A simulation study demonstrates that unbiased split variable selection is possible. However, there is a trade-off between unbiased split variable selection and runtime. In benchmark studies of prediction performance on simulated and real datasets, the new method performs better than random survival forests if informative dichotomous variables are combined with uninformative variables with more categories and better than conditional inference forests if non-linear covariate effects are included. In a runtime comparison, the method proves to be computationally faster than both alternatives, if a simple p-value approximation is used. Copyright © 2017 John Wiley & Sons, Ltd.
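    The split criterion discussed above can be illustrated in miniature: for one candidate splitting variable, evaluate a standardized rank statistic at every admissible split point and take the maximum. The sketch below simplifies deliberately: a Wilcoxon-type rank sum on an uncensored synthetic outcome stands in for the log-rank statistics and p-value approximations a real survival forest would use.

```python
# Sketch of a maximally selected rank statistic for split point selection.
# Uncensored outcome and rank-sum statistic as a simplified stand-in.
import numpy as np

rng = np.random.default_rng(7)
n = 200
x = rng.normal(size=n)                       # candidate splitting variable
y = (x > 0.5) * 2.0 + rng.normal(size=n)     # outcome with a step at x = 0.5

ranks = np.argsort(np.argsort(y)) + 1.0      # ranks of the outcome
best_stat, best_cut = -np.inf, None
for cut in np.unique(x)[5:-5]:               # skip extreme split points
    left = x <= cut
    m = left.sum()
    # Standardized rank-sum statistic for the left-hand group.
    mean = m * (n + 1) / 2
    var = m * (n - m) * (n + 1) / 12
    stat = abs(ranks[left].sum() - mean) / np.sqrt(var)
    if stat > best_stat:
        best_stat, best_cut = stat, cut
print(f"maximally selected statistic {best_stat:.1f} at cut {best_cut:.2f}")
```

    Because the statistic is standardized at every candidate cut, variables with many split points gain no automatic advantage, which is the bias-avoidance property the paper builds on; the maximum's null distribution then requires the p-value approximations the authors compare.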

  11. Influence of precipitation and crop germination on resource selection by mule deer (Odocoileus hemionus) in southwest Colorado

    USGS Publications Warehouse

    Carrollo, Emily M.; Johnson, Heather E.; Fischer, Justin W.; Hammond, Matthew; Dorsey, Patricia D.; Anderson, Charles; Vercauteren, Kurt C.; Walter, W. David

    2017-01-01

    Mule deer (Odocoileus hemionus) populations in the western United States provide many benefits to local economies but can also cause considerable damage to agriculture, particularly damage to lucrative crops. Limited information exists to understand resource selection of mule deer in response to annual variation in crop rotation and climatic conditions. We tested the hypothesis that mule deer select certain crops, and in particular sunflower, based on annual climatic variability. Our objective was to use movements, estimates of home range, and resource selection analysis to identify resources selected by mule deer. We used annually-derived crop-specific datasets along with Global Positioning System collars to monitor 14 mule deer in an agricultural area near public lands in southwestern Colorado, USA. We estimated home ranges for two winter seasons that ranged between 7.68 and 9.88 km2, and for two summer seasons that ranged between 5.51 and 6.24 km2. Mule deer selected areas closer to forest and alfalfa for most periods during 2012, but selected areas closer to sunflower in a majority of periods during 2013. Considerable annual variation in climate patterns and precipitation levels appeared to influence selection by mule deer because of variability in crop rotation and success of germination of specific crops.

  12. Influence of Precipitation and Crop Germination on Resource Selection by Mule Deer (Odocoileus hemionus) in Southwest Colorado.

    PubMed

    Carrollo, Emily M; Johnson, Heather E; Fischer, Justin W; Hammond, Matthew; Dorsey, Patricia D; Anderson, Charles; Vercauteren, Kurt C; Walter, W David

    2017-11-09

    Mule deer (Odocoileus hemionus) populations in the western United States provide many benefits to local economies but can also cause considerable damage to agriculture, particularly damage to lucrative crops. Limited information exists to understand resource selection of mule deer in response to annual variation in crop rotation and climatic conditions. We tested the hypothesis that mule deer select certain crops, and in particular sunflower, based on annual climatic variability. Our objective was to use movements, estimates of home range, and resource selection analysis to identify resources selected by mule deer. We used annually-derived crop-specific datasets along with Global Positioning System collars to monitor 14 mule deer in an agricultural area near public lands in southwestern Colorado, USA. We estimated home ranges for two winter seasons that ranged between 7.68 and 9.88 km2, and for two summer seasons that ranged between 5.51 and 6.24 km2. Mule deer selected areas closer to forest and alfalfa for most periods during 2012, but selected areas closer to sunflower in a majority of periods during 2013. Considerable annual variation in climate patterns and precipitation levels appeared to influence selection by mule deer because of variability in crop rotation and success of germination of specific crops.

  13. Discovering new variable stars at Key Stage 3

    NASA Astrophysics Data System (ADS)

    Chubb, Katy; Hood, Rosie; Wilson, Thomas; Holdship, Jonathan; Hutton, Sarah

    2017-05-01

    Details of the London pilot of the ‘Discovery Project’ are presented, in which university-based astronomers were given the chance to pass on real, applied astronomical knowledge to a group of selected secondary school pupils. The project was aimed at students in Key Stage 3 of their education, giving them the opportunity to be involved in real astronomical research at an early stage, to become the official discoverer of a new variable star, and to be listed in the International Variable Star Index database (The International Variable Star Index, Version 1.1, American Association of Variable Star Observers (AAVSO), 2016, http://aavso.org/vsx), all while learning and practising research-level skills. Future plans are discussed.

  14. The search for loci under selection: trends, biases and progress.

    PubMed

    Ahrens, Collin W; Rymer, Paul D; Stow, Adam; Bragg, Jason; Dillon, Shannon; Umbers, Kate D L; Dudaniec, Rachael Y

    2018-03-01

    Detecting genetic variants under selection using FST outlier analysis (OA) and environmental association analyses (EAAs) are popular approaches that provide insight into the genetic basis of local adaptation. Despite the frequent use of OA and EAA approaches and their increasing attractiveness for detecting signatures of selection, their application to field-based empirical data has not been synthesized. Here, we review 66 empirical studies that use Single Nucleotide Polymorphisms (SNPs) in OA and EAA. We report trends and biases across biological systems, sequencing methods, approaches, parameters, environmental variables and their influence on detecting signatures of selection. We found striking variability in both the use and reporting of environmental data and statistical parameters. For example, linkage disequilibrium among SNPs and numbers of unique SNP associations identified with EAA were rarely reported. The proportion of putatively adaptive SNPs detected varied widely among studies, and decreased with the number of SNPs analysed. We found that genomic sampling effort had a greater impact than biological sampling effort on the proportion of identified SNPs under selection. OA identified a higher proportion of outliers when more individuals were sampled, but this was not the case for EAA. To facilitate repeatability, interpretation and synthesis of studies detecting selection, we recommend that future studies consistently report geographical coordinates, environmental data, model parameters, linkage disequilibrium, and measures of genetic structure. Identifying standards for how OA and EAA studies are designed and reported will aid future transparency and comparability of SNP-based selection studies and help to progress landscape and evolutionary genomics. © 2018 John Wiley & Sons Ltd.

  15. Model selection for semiparametric marginal mean regression accounting for within-cluster subsampling variability and informative cluster size.

    PubMed

    Shen, Chung-Wei; Chen, Yi-Hau

    2018-03-13

    We propose a model selection criterion for semiparametric marginal mean regression based on generalized estimating equations. The work is motivated by a longitudinal study on the physical frailty outcome in the elderly, where the cluster size, that is, the number of the observed outcomes in each subject, is "informative" in the sense that it is related to the frailty outcome itself. The new proposal, called Resampling Cluster Information Criterion (RCIC), is based on the resampling idea utilized in the within-cluster resampling method (Hoffman, Sen, and Weinberg, 2001, Biometrika 88, 1121-1134) and accommodates informative cluster size. The implementation of RCIC, however, is free of performing actual resampling of the data and hence is computationally convenient. Compared with the existing model selection methods for marginal mean regression, the RCIC method incorporates an additional component accounting for variability of the model over within-cluster subsampling, and leads to remarkable improvements in selecting the correct model, regardless of whether the cluster size is informative or not. Applying the RCIC method to the longitudinal frailty study, we identify being female, old age, low income and life satisfaction, and chronic health conditions as significant risk factors for physical frailty in the elderly. © 2018, The International Biometric Society.

  16. On the relationship between ecosystem-scale hyperspectral reflectance and CO2 exchange in European mountain grasslands

    NASA Astrophysics Data System (ADS)

    Balzarolo, M.; Vescovo, L.; Hammerle, A.; Gianelle, D.; Papale, D.; Tomelleri, E.; Wohlfahrt, G.

    2015-05-01

    In this paper we explore the skill of hyperspectral reflectance measurements and vegetation indices (VIs) derived from these in estimating carbon dioxide (CO2) fluxes of grasslands. Hyperspectral reflectance data, CO2 fluxes and biophysical parameters were measured at three grassland sites located in European mountain regions using standardized protocols. The relationships between CO2 fluxes, ecophysiological variables, traditional VIs and VIs derived using all two-band combinations of wavelengths available from the whole hyperspectral data space were analysed. We found that VIs derived from hyperspectral data generally explained a large fraction of the variability in the investigated dependent variables but differed in their ability to estimate midday and daily average CO2 fluxes and various derived ecophysiological parameters. Relationships between VIs and CO2 fluxes and ecophysiological parameters were site-specific, likely due to differences in soils, vegetation parameters and environmental conditions. Chlorophyll and water-content-related VIs explained the largest fraction of variability in most of the dependent variables. Band selection based on a combination of a genetic algorithm with random forests (GA-rF) confirmed that it is difficult to select a universal band region suitable across the investigated ecosystems. Our findings have major implications for upscaling terrestrial CO2 fluxes to larger regions and for remote- and proximal-sensing sampling and analysis strategies and call for more cross-site synthesis studies linking ground-based spectral reflectance with ecosystem-scale CO2 fluxes.
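The exhaustive search over "all two-band combinations of wavelengths" described above can be sketched as a brute-force loop over band pairs. The normalized-difference index form and the correlation criterion below are generic assumptions for illustration; they are not the study's exact index family or its GA-rF band-selection procedure.

```python
import numpy as np

def best_two_band_index(refl, target):
    """Search all two-band normalized-difference indices
    ND(i, j) = (b_i - b_j) / (b_i + b_j) for the pair whose index
    correlates best (in absolute value) with a target flux.
    refl: reflectance array of shape (n_samples, n_bands)."""
    n_bands = refl.shape[1]
    best_pair, best_r = None, 0.0
    for i in range(n_bands):
        for j in range(i + 1, n_bands):
            nd = (refl[:, i] - refl[:, j]) / (refl[:, i] + refl[:, j])
            r = np.corrcoef(nd, target)[0, 1]
            if abs(r) > abs(best_r):
                best_pair, best_r = (i, j), r
    return best_pair, best_r
```

For hundreds of hyperspectral bands this loop is quadratic in the number of bands, which is one reason the study also uses a genetic algorithm with random forests to guide band selection.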

  17. Parameters selection in gene selection using Gaussian kernel support vector machines by genetic algorithm.

    PubMed

    Mao, Yong; Zhou, Xiao-Bo; Pi, Dao-Ying; Sun, You-Xian; Wong, Stephen T C

    2005-10-01

    In microarray-based cancer classification, gene selection is an important issue owing to the large number of variables and small number of samples as well as its non-linearity. It is difficult to get satisfying results by using conventional linear statistical methods. Recursive feature elimination based on support vector machine (SVM RFE) is an effective algorithm for gene selection and cancer classification, which are integrated into a consistent framework. In this paper, we propose a new method for selecting the parameters of this algorithm implemented with Gaussian-kernel SVMs: rather than following the common practice of choosing the apparently best parameters, we use a genetic algorithm to search for an optimal parameter pair. Fast implementation issues for this method are also discussed for pragmatic reasons. The proposed method was tested on two representative hereditary breast cancer and acute leukaemia datasets. The experimental results indicate that the proposed method performs well in selecting genes and achieves high classification accuracies with these genes.
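A genetic search over the two Gaussian-kernel parameters (C, γ) can be sketched as below. The fitness function here is a toy surrogate with a known optimum standing in for cross-validated SVM accuracy on the gene-expression data, and the operators (truncation selection, mean crossover, Gaussian mutation) are generic choices rather than the authors' specific GA.

```python
import numpy as np

rng = np.random.default_rng(42)

def fitness(log_c, log_gamma):
    # Toy surrogate for cross-validated SVM accuracy, peaking at
    # (log10 C, log10 gamma) = (2, -3). In practice this would train
    # and score a Gaussian-kernel SVM at these parameter values.
    return np.exp(-((log_c - 2.0) ** 2 + (log_gamma + 3.0) ** 2) / 8.0)

def genetic_search(pop_size=20, generations=30):
    # individuals are (log10 C, log10 gamma) pairs
    pop = rng.uniform(-5, 5, size=(pop_size, 2))
    for _ in range(generations):
        scores = np.array([fitness(c, g) for c, g in pop])
        # truncation selection: keep the better half (elitist)
        keep = pop[np.argsort(scores)[::-1][: pop_size // 2]]
        n_child = pop_size - len(keep)
        # crossover: average random parent pairs; mutation: Gaussian jitter
        parents = keep[rng.integers(0, len(keep), size=(n_child, 2))]
        children = parents.mean(axis=1) + rng.normal(scale=0.3, size=(n_child, 2))
        pop = np.vstack([keep, children])
    scores = np.array([fitness(c, g) for c, g in pop])
    return pop[scores.argmax()]

best = genetic_search()
```

Searching on a log scale is the usual choice because useful values of C and γ span several orders of magnitude.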

  18. CORRELATION PURSUIT: FORWARD STEPWISE VARIABLE SELECTION FOR INDEX MODELS

    PubMed Central

    Zhong, Wenxuan; Zhang, Tingting; Zhu, Yu; Liu, Jun S.

    2012-01-01

    In this article, a stepwise procedure, correlation pursuit (COP), is developed for variable selection under the sufficient dimension reduction framework, in which the response variable Y is influenced by the predictors X1, X2, …, Xp through an unknown function of a few linear combinations of them. Unlike linear stepwise regression, COP does not impose a special form of relationship (such as linear) between the response variable and the predictor variables. The COP procedure selects variables that attain the maximum correlation between the transformed response and the linear combination of the variables. Various asymptotic properties of the COP procedure are established, and in particular, its variable selection performance under a diverging number of predictors and sample size is investigated. The excellent empirical performance of the COP procedure in comparison with existing methods is demonstrated by both extensive simulation studies and a real example in functional genomics. PMID:23243388
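The forward stepwise idea behind COP can be sketched with a greedy loop that, at each step, adds the predictor that most increases the squared correlation between the response and its best linear combination of the selected predictors (i.e., the OLS R²). This is a simplified stand-in: COP itself works with a transformed response obtained through sufficient dimension reduction, not the raw response used here.

```python
import numpy as np

def forward_stepwise(X, y, n_select):
    """Greedy forward selection maximizing the squared multiple
    correlation (R^2) between y and the selected predictors."""
    selected = []
    remaining = list(range(X.shape[1]))

    def r2(cols):
        # R^2 of the OLS fit of y on an intercept plus X[:, cols]
        A = np.column_stack([np.ones(len(y)), X[:, cols]])
        resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
        return 1.0 - resid.var() / y.var()

    for _ in range(n_select):
        best = max(remaining, key=lambda j: r2(selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Because the criterion is correlation-based rather than tied to a linear model of y itself, the same skeleton accommodates transformed responses, which is the point of the COP construction.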

  19. Framework for making better predictions by directly estimating variables' predictivity.

    PubMed

    Lo, Adeline; Chernoff, Herman; Zheng, Tian; Lo, Shaw-Hwa

    2016-12-13

    We propose approaching prediction from a framework grounded in the theoretically correct prediction rate of a variable set as a parameter of interest. This framework allows us to define a measure of predictivity that enables assessing variable sets for, preferably high, predictivity. We first define the prediction rate for a variable set and consider, and ultimately reject, the naive estimator, a statistic based on the observed sample data, due to its inflated bias for moderate sample size and its sensitivity to noisy useless variables. We demonstrate that the I-score of the partition retention (PR) method of variable selection yields a relatively unbiased estimate of a parameter that is not sensitive to noisy variables and is a lower bound to the parameter of interest. Thus, the PR method using the I-score provides an effective approach to selecting highly predictive variables. We offer simulations and an application of the I-score on real data to demonstrate the statistic's predictive performance on sample data. We conjecture that using the partition retention method and the I-score can aid in finding variable sets with promising prediction rates; however, further research in the avenue of sample-based measures of predictivity is much desired.

  20. Climate Change Impacts at Department of Defense

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kotamarthi, Rao; Wang, Jiali; Zoebel, Zach

    This project is aimed at providing the U.S. Department of Defense (DoD) with a comprehensive analysis of the uncertainty associated with generating climate projections at the regional scale that can be used by stakeholders and decision makers to quantify and plan for the impacts of future climate change at specific locations. The merits and limitations of commonly used downscaling models, ranging from simple to complex, are compared, and their appropriateness for application at installation scales is evaluated. Downscaled climate projections are generated at selected DoD installations using dynamic and statistical methods, with an emphasis on generating probability distributions of climate variables and their associated uncertainties. The selection of sites, variables, and parameters for downscaling was based on a comprehensive understanding of the current and projected roles that weather and climate play in operating, maintaining, and planning DoD facilities and installations.

  1. Explaining the Positive Relationship between Fourth-Grade Children’s Body Mass Index and Energy Intake at School-Provided Meals (Breakfast and Lunch)

    PubMed Central

    Baxter, Suzanne Domel; Royer, Julie A.; Hitchcock, David B.

    2013-01-01

    BACKGROUND A positive relationship exists between children’s body mass index (BMI) and energy intake at school-provided meals. To help explain this relationship, we investigated 7 outcome variables concerning aspects of school-provided meals—energy content of items selected, number of meal components selected, number of meal components eaten, amounts eaten of standardized school-meal portions, energy intake from flavored milk, energy intake received in trades, and energy content given in trades. METHODS We observed children in grade 4 (N=465) eating school-provided breakfast and lunch on one to 4 days per child. We measured children’s weight and height. For daily values at school meals, a generalized linear model was fit with BMI (dependent variable) and the 7 outcome variables, sex, and age (independent variables). RESULTS BMI was positively related to amounts eaten of standardized school-meal portions (p < .0001) and increased 8.45 kg/m2 per serving, controlling for other variables in the model. BMI was positively related to energy intake from flavored milk (p = .0041) and increased 0.347 kg/m2 for every 100-kcal consumed. BMI was negatively related to energy intake received in trades (p = .0003) and decreased 0.468 kg/m2 for every 100-kcal received. BMI was not significantly related to 4 outcome variables. CONCLUSIONS Knowing that relationships between BMI and actual consumption, not selection, at school-provided meals explained the (previously found) positive relationship between BMI and energy intake at school-provided meals is helpful for school-based obesity interventions. PMID:23517000

  2. Sole: Online Analysis of Southern FIA Data

    Treesearch

    Michael P. Spinney; Paul C. Van Deusen; Francis A. Roesch

    2006-01-01

    The Southern On Line Estimator (SOLE) is a flexible modular software program for analyzing U.S. Department of Agriculture Forest Service Forest Inventory and Analysis data. SOLE produces statistical tables, figures, maps, and portable document format reports based on user selected area and variables. SOLE's Java-based graphical user interface is easy to use, and its R-...

  3. Group Variable Selection Via Convex Log-Exp-Sum Penalty with Application to a Breast Cancer Survivor Study

    PubMed Central

    Geng, Zhigeng; Wang, Sijian; Yu, Menggang; Monahan, Patrick O.; Champion, Victoria; Wahba, Grace

    2017-01-01

    Summary In many scientific and engineering applications, covariates are naturally grouped. When the group structures are available among covariates, people are usually interested in identifying both important groups and important variables within the selected groups. Among existing successful group variable selection methods, some methods fail to conduct the within group selection. Some methods are able to conduct both group and within group selection, but the corresponding objective functions are non-convex. Such a non-convexity may require extra numerical effort. In this article, we propose a novel Log-Exp-Sum (LES) penalty for group variable selection. The LES penalty is strictly convex. It can identify important groups as well as select important variables within the group. We develop an efficient group-level coordinate descent algorithm to fit the model. We also derive non-asymptotic error bounds and asymptotic group selection consistency for our method in the high-dimensional setting where the number of covariates can be much larger than the sample size. Numerical results demonstrate the good performance of our method in both variable selection and prediction. We applied the proposed method to an American Cancer Society breast cancer survivor dataset. The findings are clinically meaningful and may help design intervention programs to improve the quality of life for breast cancer survivors. PMID:25257196
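A minimal sketch of a log-exp-sum group penalty is given below, assuming the form λ Σ_g log Σ_{j∈g} exp(α|β_j|); the exact scaling and group weights in the paper may differ. The log-sum-exp acts as a soft maximum over each group's absolute coefficients, which is what allows the penalty to zero out individual members of a selected group while remaining convex.

```python
import numpy as np

def les_penalty(beta, groups, lam=1.0, alpha=1.0):
    """Log-Exp-Sum group penalty (assumed form, for illustration):
    lam * sum over groups g of log( sum_{j in g} exp(alpha * |beta_j|) ).
    Convex in beta, since log-sum-exp is convex and nondecreasing in
    each argument and |beta_j| is convex."""
    beta = np.asarray(beta, dtype=float)
    return lam * sum(
        np.log(np.sum(np.exp(alpha * np.abs(beta[list(g)])))) for g in groups
    )
```

At β = 0 the penalty equals λ Σ_g log|g|, a constant offset, so only deviations from zero are effectively penalized.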

  4. Nest-site selection analysis of hooded crane (Grus monacha) in Northeastern China based on a multivariate ensemble model.

    PubMed

    Jiao, Shengwu; Guo, Yumin; Huettmann, Falk; Lei, Guangchun

    2014-07-01

    Avian nest-site selection is an important research and management subject. The hooded crane (Grus monacha) is a vulnerable (VU) species according to the IUCN Red List. Here, we present the first long-term Chinese legacy nest data for this species (1993-2010) with publicly available metadata. Further, we provide the first study that reports findings on multivariate nest habitat preference using such long-term field data for this species. Our work was carried out in Northeastern China, where we found and measured 24 nests and 81 randomly selected control plots and their environmental parameters in a vast landscape. We used machine learning (stochastic boosted regression trees) to quantify nest selection. Our analysis further included varclust (R Hmisc) and TreeNet to address statistical correlations and two-way interactions. We found that from an initial list of 14 measured field variables, water area (+), water depth (+) and shrub coverage (-) were the main explanatory variables that contributed to hooded crane nest-site selection. Agricultural sites played a smaller role in the selection of these nests. Our results are important for the conservation management of cranes all over East Asia and constitute a defensible and quantitative basis for predictive models.

  5. Forming a single layer of a composite powder based on the Ti-Nb system via selective laser melting (SLM)

    NASA Astrophysics Data System (ADS)

    Saprykin, A. A.; Sharkeev, Yu P.; Ibragimov, E. A.; Babakova, E. V.; Dudikhin, D. V.

    2016-07-01

    Alloys based on the titanium-niobium system are widely used in implant production, primarily because of their low elastic modulus and bio-inert properties. These alloys are especially important for tooth replacement and orthopedic surgery. At present, alloys based on the titanium-niobium system are produced mainly by conventional metallurgical methods, and the subsequent subtractive machining of an end product generates substantial waste and therefore increases its cost. An alternative to these processes is additive manufacturing. Selective laser melting is a technology that makes it possible to synthesize products from metal powders and their blends: a laser melts a layer of powdered material, the sintered layer is then covered with the next layer of powder, and so on. Complex products and working prototypes are made on the basis of this technology. The authors of this paper address the application of selective laser melting to synthesizing a binary alloy from a composite powder based on the titanium-niobium system. A set of 10x10 mm samples was made under various process conditions on an experimental selective laser synthesis machine «VARISKAF-100MB». The machine provides adjustment of the following process variables: laser emission power, scanning rate and pitch, temperature of powder pre-heating, thickness of the layer to be sprinkled, and diameter of laser spot focusing. All samples were made in a shielding argon atmosphere after preliminary evacuation. The porosity and thickness of the sintered layer are shown as functions of laser emission power at various scanning rates. It is revealed that scanning rate and laser emission power are the adjustable process variables having the greatest effect on formation of the sintered layer.

  6. Variability-selected active galactic nuclei in the VST-SUDARE/VOICE survey of the COSMOS field

    NASA Astrophysics Data System (ADS)

    De Cicco, D.; Paolillo, M.; Covone, G.; Falocco, S.; Longo, G.; Grado, A.; Limatola, L.; Botticella, M. T.; Pignata, G.; Cappellaro, E.; Vaccari, M.; Trevese, D.; Vagnetti, F.; Salvato, M.; Radovich, M.; Brandt, W. N.; Capaccioli, M.; Napolitano, N. R.; Schipani, P.

    2015-02-01

    Context. Active galaxies are characterized by variability at every wavelength, with timescales from hours to years depending on the observing window. Optical variability has proven to be an effective way of detecting AGNs in imaging surveys lasting from weeks to years. Aims: In the present work we test the use of optical variability as a tool to identify active galactic nuclei in the VST multiepoch survey of the COSMOS field, originally tailored to detect supernova events. Methods: We make use of the multiwavelength data provided by other COSMOS surveys to discuss the reliability of the method and the nature of our AGN candidates. Results: The selection on the basis of optical variability returns a sample of 83 AGN candidates; based on a number of diagnostics, we conclude that 67 of them are confirmed AGNs (81% purity), 12 are classified as supernovae, while the nature of the remaining 4 is unknown. For the subsample of AGNs with some spectroscopic classification, we find that Type 1 are prevalent (89%) compared to Type 2 AGNs (11%). Overall, our approach is able to retrieve on average 15% of all AGNs in the field identified by means of spectroscopic or X-ray classification, with a strong dependence on the source apparent magnitude (completeness ranging from 26% to 5%). In particular, the completeness for Type 1 AGNs is 25%, while it drops to 6% for Type 2 AGNs. The rest of the X-ray selected AGN population presents on average a larger rms variability than the bulk of non-variable sources, indicating that variability detection for at least some of these objects is prevented only by the photometric accuracy of the data. The low completeness is in part due to the short observing span: we show that increasing the temporal baseline results in larger samples as expected for sources with a red-noise power spectrum. Our results allow us to assess the usefulness of this AGN selection technique in view of future wide-field surveys.
Observations were provided by the ESO programs 088.D-0370 and 088.D-4013 (PI G. Pignata).Table 3 is available in electronic form at http://www.aanda.org
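The core of a variability selection like the one above is flagging sources whose light-curve rms exceeds what photometric noise alone would produce. The ratio-based cut below is an illustrative simplification, not the survey's actual selection criterion.

```python
import numpy as np

def variability_flags(mags, mag_errs, threshold=3.0):
    """Flag light curves whose observed rms exceeds `threshold` times
    the rms expected from the photometric errors alone (illustrative
    criterion). mags, mag_errs: arrays of shape (n_sources, n_epochs)."""
    observed = mags.std(axis=1, ddof=1)
    expected = np.sqrt((mag_errs ** 2).mean(axis=1))
    return observed > threshold * expected
```

Such a cut makes the completeness dependence on apparent magnitude explicit: faint sources have larger photometric errors, so only strong intrinsic variability clears the threshold.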

  7. Development of a computer-based clinical decision support tool for selecting appropriate rehabilitation interventions for injured workers.

    PubMed

    Gross, Douglas P; Zhang, Jing; Steenstra, Ivan; Barnsley, Susan; Haws, Calvin; Amell, Tyler; McIntosh, Greg; Cooper, Juliette; Zaiane, Osmar

    2013-12-01

    To develop a classification algorithm and accompanying computer-based clinical decision support tool to help categorize injured workers toward optimal rehabilitation interventions based on unique worker characteristics. Population-based historical cohort design. Data were extracted from a Canadian provincial workers' compensation database on all claimants undergoing work assessment between December 2009 and January 2011. Data were available on: (1) numerous personal, clinical, occupational, and social variables; (2) type of rehabilitation undertaken; and (3) outcomes following rehabilitation (receiving time loss benefits or undergoing repeat programs). Machine learning, concerned with the design of algorithms to discriminate between classes based on empirical data, was the foundation of our approach to build a classification system with multiple independent and dependent variables. The population included 8,611 unique claimants. Subjects were predominantly employed (85 %) males (64 %) with diagnoses of sprain/strain (44 %). Baseline clinician classification accuracy was high (ROC = 0.86) for selecting programs that lead to successful return-to-work. Classification performance for machine learning techniques outperformed the clinician baseline classification (ROC = 0.94). The final classifiers were multifactorial and included the variables: injury duration, occupation, job attachment status, work status, modified work availability, pain intensity rating, self-rated occupational disability, and 9 items from the SF-36 Health Survey. The use of machine learning classification techniques appears to have resulted in classification performance better than clinician decision-making. The final algorithm has been integrated into a computer-based clinical decision support tool that requires additional validation in a clinical sample.
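The ROC figures quoted above (0.86 for clinicians vs 0.94 for the machine learning classifiers) are areas under the ROC curve. A minimal rank-based computation of this quantity (the Mann-Whitney formulation) is sketched below; it is a generic illustration, not the study's evaluation code.

```python
import numpy as np

def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney formulation: the
    probability that a randomly chosen positive case outranks a randomly
    chosen negative case, with ties counted as one half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which is why the jump from 0.86 to 0.94 represents a substantial gain.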

  8. Variable Selection for Road Segmentation in Aerial Images

    NASA Astrophysics Data System (ADS)

    Warnke, S.; Bulatov, D.

    2017-05-01

    For extraction of road pixels from combined image and elevation data, Wegner et al. (2015) proposed classification of superpixels into road and non-road, after which a refinement of the classification results using minimum cost paths and non-local optimization methods took place. We believed that the variable set used for classification was to a certain extent suboptimal, because many variables were redundant while several features known to be useful in Photogrammetry and Remote Sensing were missing. This motivated us to implement a variable selection approach which builds a model for classification using portions of training data and subsets of features, evaluates this model, updates the feature set, and terminates when a stopping criterion is satisfied. The choice of classifier is flexible; however, we tested the approach with Logistic Regression and Random Forests, and tailored the evaluation module to the chosen classifier. To guarantee a fair comparison, we kept the segment-based approach and most of the variables from the related work, but extended them with additional, mostly higher-level features. Applying these superior features, removing the redundant ones, and using more accurately acquired 3D data allowed us to keep the misclassification error stable, or even to reduce it, on a challenging dataset.
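The build-evaluate-update-terminate loop described above can be sketched as a wrapper around any classifier. Here a nearest-centroid model stands in for the Logistic Regression and Random Forest classifiers actually tested, and backward elimination with an accuracy tolerance is one simple instance of such a scheme, not the authors' exact procedure.

```python
import numpy as np

def centroid_accuracy(Xtr, ytr, Xte, yte, cols):
    # nearest-centroid classifier: a lightweight stand-in for the
    # Logistic Regression / Random Forest models used in the study
    c0 = Xtr[ytr == 0][:, cols].mean(axis=0)
    c1 = Xtr[ytr == 1][:, cols].mean(axis=0)
    d0 = ((Xte[:, cols] - c0) ** 2).sum(axis=1)
    d1 = ((Xte[:, cols] - c1) ** 2).sum(axis=1)
    return float(((d1 < d0) == yte.astype(bool)).mean())

def backward_select(Xtr, ytr, Xte, yte, tol=0.02):
    """Drop one variable per iteration as long as held-out accuracy does
    not fall by more than `tol`; terminate when every removal hurts."""
    cols = list(range(Xtr.shape[1]))
    acc = centroid_accuracy(Xtr, ytr, Xte, yte, cols)
    while len(cols) > 1:
        trials = [(centroid_accuracy(Xtr, ytr, Xte, yte,
                                     [c for c in cols if c != j]), j)
                  for j in cols]
        best_acc, drop = max(trials)
        if best_acc >= acc - tol:   # stopping criterion
            cols.remove(drop)
            acc = best_acc
        else:
            break
    return cols, acc
```

Because the evaluation module only needs an accuracy score, swapping the centroid classifier for any other model leaves the selection loop unchanged, which mirrors the flexibility claimed in the abstract.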

  9. Path Finding on High-Dimensional Free Energy Landscapes

    NASA Astrophysics Data System (ADS)

    Díaz Leines, Grisell; Ensing, Bernd

    2012-07-01

    We present a method for determining the average transition path and the free energy along this path in the space of selected collective variables. The formalism is based upon a history-dependent bias along a flexible path variable within the metadynamics framework but with a trivial scaling of the cost with the number of collective variables. Controlling the sampling of the orthogonal modes recovers the average path and the minimum free energy path as the limiting cases. The method is applied to resolve the path and the free energy of a conformational transition in alanine dipeptide.

  10. Comparison between splines and fractional polynomials for multivariable model building with continuous covariates: a simulation study with continuous response.

    PubMed

    Binder, Harald; Sauerbrei, Willi; Royston, Patrick

    2013-06-15

    In observational studies, many continuous or categorical covariates may be related to an outcome. Various spline-based procedures or the multivariable fractional polynomial (MFP) procedure can be used to identify important variables and functional forms for continuous covariates. This is the main aim of an explanatory model, as opposed to a model only for prediction. The type of analysis often guides the complexity of the final model. Spline-based procedures and MFP have tuning parameters for choosing the required complexity. To compare model selection approaches, we perform a simulation study in the linear regression context based on a data structure intended to reflect realistic biomedical data. We vary the sample size, variance explained and complexity parameters for model selection. We consider 15 variables. A sample size of 200 (1000) and R² = 0.2 (0.8) is the scenario with the smallest (largest) amount of information. For assessing performance, we consider prediction error, correct and incorrect inclusion of covariates, qualitative measures for judging selected functional forms and further novel criteria. From limited information, a suitable explanatory model cannot be obtained. Prediction performance from all types of models is similar. With a medium amount of information, MFP performs better than splines on several criteria. MFP better recovers simpler functions, whereas splines better recover more complex functions. For a large amount of information and no local structure, MFP and the spline procedures often select similar explanatory models. Copyright © 2012 John Wiley & Sons, Ltd.
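The fractional polynomial procedures compared above build on first-degree (FP1) transforms drawn from the standard power set {-2, -1, -0.5, 0, 0.5, 1, 2, 3}, with 0 conventionally meaning log x. A minimal least-squares sketch of selecting the FP1 power for a single positive covariate follows; the full MFP procedure additionally uses a closed testing sequence against the linear and null models, which this sketch omits.

```python
import numpy as np

# standard FP1 power set; power 0 denotes the log transform
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)

def fp1_transform(x, p):
    return np.log(x) if p == 0 else x ** p

def best_fp1(x, y):
    """Select the first-degree fractional polynomial power for a
    positive covariate x by least-squares residual sum of squares."""
    best_p, best_rss = None, np.inf
    for p in FP_POWERS:
        A = np.column_stack([np.ones_like(x), fp1_transform(x, p)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        rss = float(((y - A @ coef) ** 2).sum())
        if rss < best_rss:
            best_p, best_rss = p, rss
    return best_p
```

Second-degree (FP2) models extend this by searching over pairs of powers, which is how MFP recovers a wider range of simple functional forms.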

  11. Uniting statistical and individual-based approaches for animal movement modelling.

    PubMed

    Latombe, Guillaume; Parrott, Lael; Basille, Mathieu; Fortin, Daniel

    2014-01-01

    The dynamic nature of their internal states and the environment directly shape animals' spatial behaviours and give rise to emergent properties at broader scales in natural systems. However, integrating these dynamic features into habitat selection studies remains challenging, due to the practical impossibility of accessing internal states through field work and the inability of current statistical models to produce dynamic outputs. To address these issues, we developed a robust method, which combines statistical and individual-based modelling. Using a statistical technique for forward modelling of the IBM has the advantage of being faster for parameterization than a pure inverse modelling technique and allows for robust selection of parameters. Using GPS locations from caribou monitored in Québec, caribou movements were modelled based on generative mechanisms accounting for dynamic variables at a low level of emergence. These variables were accessed by replicating real individuals' movements in parallel sub-models, and movement parameters were then empirically parameterized using Step Selection Functions. The final IBM model was validated using both k-fold cross-validation and emergent patterns validation and was tested for two different scenarios, with varying hardwood encroachment. Our results highlighted a functional response in habitat selection, which suggests that our method was able to capture the complexity of the natural system, and adequately provided projections on future possible states of the system in response to different management plans. This is especially relevant for testing the long-term impact of scenarios corresponding to environmental configurations that have yet to be observed in real systems.

  12. Uniting Statistical and Individual-Based Approaches for Animal Movement Modelling

    PubMed Central

    Latombe, Guillaume; Parrott, Lael; Basille, Mathieu; Fortin, Daniel

    2014-01-01

    The dynamic nature of animals' internal states and environment directly shapes their spatial behaviours and gives rise to emergent properties at broader scales in natural systems. However, integrating these dynamic features into habitat selection studies remains challenging, because field work to access internal states is often impractical and current statistical models cannot produce dynamic outputs. To address these issues, we developed a robust method that combines statistical and individual-based modelling (IBM). Using a statistical technique for forward modelling of the IBM has the advantage of being faster to parameterize than a pure inverse modelling technique and allows for robust selection of parameters. Using GPS locations from caribou monitored in Québec, caribou movements were modelled based on generative mechanisms accounting for dynamic variables at a low level of emergence. These variables were accessed by replicating real individuals' movements in parallel sub-models, and movement parameters were then empirically parameterized using Step Selection Functions. The final IBM was validated using both k-fold cross-validation and emergent-pattern validation, and was tested for two different scenarios with varying hardwood encroachment. Our results highlighted a functional response in habitat selection, which suggests that our method captured the complexity of the natural system and adequately projected possible future states of the system in response to different management plans. This is especially relevant for testing the long-term impact of scenarios corresponding to environmental configurations that have yet to be observed in real systems. PMID:24979047
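    The Step Selection Function parameterization mentioned above can be sketched as a conditional-logistic fit: each observed step is compared against a set of candidate steps, and coefficients are chosen to maximise the conditional likelihood of the steps actually taken. Everything below (covariates, coefficient values, candidate counts) is hypothetical and stands in for the paper's caribou data.

```python
import numpy as np

def fit_ssf(X, lr=0.2, steps=800):
    """Conditional-logistic SSF fit by gradient ascent.
    X has shape (n_strata, n_candidates, n_covariates); row 0 of each
    stratum is the observed step, the rest are random candidate steps."""
    beta = np.zeros(X.shape[2])
    for _ in range(steps):
        w = np.exp(X @ beta)
        w /= w.sum(axis=1, keepdims=True)                 # softmax over candidate steps
        grad = (X[:, 0] - np.einsum('sc,scp->sp', w, X)).mean(axis=0)
        beta += lr * grad                                 # observed minus expected covariates
    return beta

rng = np.random.default_rng(12)
beta_true = np.array([1.0, -0.5])       # e.g. preference for conifer cover, avoidance of roads
X = rng.normal(size=(300, 10, 2))       # 300 observed steps, 10 candidate steps each
for s in range(300):
    w = np.exp(X[s] @ beta_true)
    chosen = rng.choice(10, p=w / w.sum())
    X[s, [0, chosen]] = X[s, [chosen, 0]]   # put the step actually taken in row 0

beta_hat = fit_ssf(X)
```

    With 300 strata the fitted coefficients land close to the generating values, recovering both the sign of the preference and of the avoidance term.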

  13. Evidence-based selection process to the Master of Public Health program at Medical University.

    PubMed

    Panczyk, Mariusz; Juszczyk, Grzegorz; Zarzeka, Aleksander; Samoliński, Łukasz; Belowska, Jarosława; Cieślak, Ilona; Gotlib, Joanna

    2017-09-11

    Evaluation of the predictive validity of selected sociodemographic factors and admission criteria for Master's studies in Public Health at the Faculty of Health Sciences, Medical University of Warsaw (MUW). For the evaluation, recruitment data and learning results of students enrolled between 2008 and 2012 were used (N = 605, average age 22.9 ± 3.01). The predictive analysis was performed using the multiple linear regression method. Twelve predictors were selected for the proposed regression model, including sex, age, professional degree (BA), the Bachelor's studies grade point average (GPA), and the total score of the preliminary examination broken down into five thematic areas. Depending on the tested model, one of two dependent variables was used: first-year GPA or cumulative GPA in the Master's program. The regression model based on the Master's program GPA was better matched to the data than the model based on the first-year GPA (adjusted R² 0.413 versus 0.476, respectively). The Bachelor's studies GPA and each of the five subtests comprising the entrance examination were significant predictors of success achieved by a student both after the first year and at the end of the course of studies. Admission criteria based on the total MCQ examination score and the Bachelor's studies GPA can be successfully used to select candidates for Master's degree studies in Public Health. The high predictive validity of the recruitment system confirms the validity of the adopted admission policy at MUW.
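    As an illustration of the regression machinery used here (multiple linear regression scored by adjusted R²), a self-contained sketch on synthetic data follows. The cohort size matches the abstract, but the predictors and coefficients are invented, not the study's data.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept column."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return beta, Xd

def r2_scores(y, y_hat, n_predictors):
    """Plain and adjusted R-squared for a fitted regression."""
    n = len(y)
    ss_res = float(np.sum((y - y_hat) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    return r2, adj

rng = np.random.default_rng(0)
n, p = 605, 7                       # cohort size from the abstract; 7 invented predictors
X = rng.normal(size=(n, p))         # stand-ins for standardised admission variables
y = X @ rng.normal(size=p) + rng.normal(size=n)   # stand-in for Master's-programme GPA

beta, Xd = fit_ols(X, y)
r2, adj = r2_scores(y, Xd @ beta, p)
```

    Adjusted R² penalises each added predictor, so it is always below the raw R² and is the fairer score when comparing models with different numbers of admission variables.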

  14. Roosting habitat use and selection by northern spotted owls during natal dispersal

    USGS Publications Warehouse

    Sovern, Stan G.; Forsman, Eric D.; Dugger, Catherine M.; Taylor, Margaret

    2015-01-01

    We studied habitat selection by northern spotted owls (Strix occidentalis caurina) during natal dispersal in Washington State, USA, at both the roost site and landscape scales. We used logistic regression to obtain parameters for an exponential resource selection function based on vegetation attributes in roost and random plots in 76 forest stands that were used for roosting. We used a similar analysis to evaluate selection of landscape habitat attributes based on 301 radio-telemetry relocations and random points within our study area. We found no evidence of within-stand selection for any of the variables examined, but 78% of roosts were in stands with at least some large (>50 cm dbh) trees. At the landscape scale, owls selected for stands with high canopy cover (>70%). Dispersing owls selected vegetation types that were more similar to habitat selected by adult owls than habitat that would result from following guidelines previously proposed to maintain dispersal habitat. Our analysis indicates that juvenile owls select stands for roosting that have greater canopy cover than is recommended in current agency guidelines.
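    The exponential resource selection function (RSF) workflow described above — logistic regression contrasting used plots with random plots, then exponentiating the coefficients — can be sketched as follows. The habitat covariates and coefficient values are hypothetical, not the owl data.

```python
import numpy as np

def fit_logistic(X, y, lr=0.5, steps=2000):
    """Plain gradient-ascent logistic regression (used plot = 1, random plot = 0)."""
    Xd = np.column_stack([np.ones(len(X)), X])
    b = np.zeros(Xd.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xd @ b))
        b += lr * Xd.T @ (y - p) / len(y)
    return b

rng = np.random.default_rng(1)
n = 400
X = rng.normal(size=(n, 2))                    # e.g. standardised canopy cover, large-tree density
logit = 1.2 * X[:, 0]                          # simulated selection acts on canopy cover only
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

b = fit_logistic(X, y)

def rsf(x):
    """Exponential RSF: drop the intercept and exponentiate the linear predictor."""
    return np.exp(b[1:] @ x)
```

    The RSF value is a relative, not absolute, probability of use: plots with higher canopy cover score higher, matching the direction of selection built into the simulation.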

  15. The Impact of Variability of Selected Geological and Mining Parameters on the Value and Risks of Projects in the Hard Coal Mining Industry

    NASA Astrophysics Data System (ADS)

    Kopacz, Michał

    2017-09-01

    The paper attempts to assess the impact of the variability of selected geological (deposit) parameters on the value and risks of projects in the hard coal mining industry. The study was based on simulated discounted cash flow analysis, and the results were verified for three existing bituminous coal seams. The Monte Carlo simulation was based on the nonparametric bootstrap method, while correlations between individual deposit parameters were replicated with the use of an empirical copula. The calculations take into account the uncertainty about the parameters of the empirical distributions of the deposit variables. The Net Present Value (NPV) and the Internal Rate of Return (IRR) were selected as the main measures of value and risk, respectively. The impact of the volatility and correlation of deposit parameters was analyzed in two aspects: by identifying the overall effect of the correlated variability of the parameters, and the individual impact of the correlation on the NPV and IRR. For this purpose a differential approach was used, which allows the possible errors in calculating these measures to be quantified. Based on the study it can be concluded that the mean value of the overall effect of the variability does not exceed 11.8% of NPV and 2.4 percentage points of IRR. Neglecting the correlations results in overestimating the NPV and the IRR by up to 4.4% and 0.4 percentage points, respectively. It should be noted, however, that the differences in NPV and IRR values can vary significantly, and their interpretation depends on the likelihood of implementation. Generalizing the obtained results based on the average values, the maximum value of the risk premium under the given calculation conditions of the "X" deposit, and for correspondingly large datasets (greater than 2500), should not be higher than 2.4 percentage points. The impact of the analyzed geological parameters on the NPV and IRR depends primarily on their co-existence, which can be measured by the strength of correlation. In the analyzed case, the correlations limit the range of variation of the geological parameters and economic results (the empirical copula reduces the NPV and IRR in the probabilistic approach). However, this is due to the adjustment of the calculation to conditions similar to those prevailing in the deposit.
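    A minimal sketch of the simulation idea — nonparametric bootstrap of correlated deposit parameters feeding a discounted-cash-flow NPV — is shown below. All figures (prices, costs, outputs, the covariance) are invented for illustration; resampling whole rows is a simple way to preserve the parameter correlation, standing in for the paper's empirical copula.

```python
import numpy as np

def npv(cash_flows, rate):
    """Discount a cash-flow vector (year 0 first) at a constant rate."""
    t = np.arange(len(cash_flows))
    return float(np.sum(cash_flows / (1.0 + rate) ** t))

rng = np.random.default_rng(2)
# hypothetical sampled deposit data: (price multiplier, annual output in kt), mildly correlated
base = rng.multivariate_normal([1.0, 500.0], [[0.01, 0.3], [0.3, 400.0]], size=300)

n_sims, years, rate = 2000, 10, 0.08
npvs = np.empty(n_sims)
for i in range(n_sims):
    # nonparametric bootstrap: resample whole rows so the price-output correlation is preserved
    rows = base[rng.integers(0, len(base), size=len(base))]
    price_mult, output_kt = rows.mean(axis=0)
    annual_cash = 60.0 * price_mult * output_kt * 1e3 - 15e6   # assumed price/tonne and opex
    npvs[i] = npv(np.concatenate([[-80e6], np.full(years, annual_cash)]), rate)
```

    The resulting NPV distribution, rather than a single point estimate, is what supports the risk-premium reasoning in the abstract; an IRR distribution would be built the same way with a root-finder and is omitted for brevity.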

  16. Operationalizing hippocampal volume as an enrichment biomarker for amnestic MCI trials: effect of algorithm, test-retest variability and cut-point on trial cost, duration and sample size

    PubMed Central

    Yu, P.; Sun, J.; Wolz, R.; Stephenson, D.; Brewer, J.; Fox, N.C.; Cole, P.E.; Jack, C.R.; Hill, D.L.G.; Schwarz, A.J.

    2014-01-01

    Objective: To evaluate the effect of computational algorithm, measurement variability and cut-point on hippocampal volume (HCV)-based patient selection for clinical trials in mild cognitive impairment (MCI). Methods: We used normal control and amnestic MCI subjects from ADNI-1 as normative reference and screening cohorts. We evaluated the enrichment performance of four widely used hippocampal segmentation algorithms (FreeSurfer, HMAPS, LEAP and NeuroQuant) in terms of two-year changes in MMSE, ADAS-Cog and CDR-SB. We modeled the effect of algorithm, test-retest variability and cut-point on sample size, screen fail rates and trial cost and duration. Results: HCV-based patient selection yielded not only reduced sample sizes (by ~40–60%) but also lower trial costs (by ~30–40%) across a wide range of cut-points. Overall, the dependence on the cut-point value was similar for the three clinical instruments considered. Conclusion: These results provide a guide to the choice of HCV cut-point for aMCI clinical trials, allowing an informed trade-off between statistical and practical considerations. PMID:24211008
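    The trade-off being modelled — enrichment shrinks the required sample size but inflates the screen-fail rate — can be sketched with a standard two-arm sample-size formula. All numbers below (effect sizes, costs, pass rate) are invented for illustration, not taken from the study.

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(delta, sd, alpha=0.05, power=0.8):
    """Normal-approximation sample size per arm for a two-arm comparison of means."""
    z = NormalDist().inv_cdf
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) * sd / delta) ** 2)

def trial_cost(n_arm, n_screened, screen_cost=1_000.0, per_patient=20_000.0):
    """Total cost: screen everyone who presents, then run 2*n_arm randomised patients."""
    return screen_cost * n_screened + per_patient * 2 * n_arm

n_all = n_per_arm(delta=1.0, sd=6.0)      # unenriched aMCI population: small 2-year decline
n_enr = n_per_arm(delta=1.6, sd=6.0)      # HCV-enriched: faster decliners, larger effect

pass_rate = 0.6                           # assumed fraction of screenees below the HCV cut-point
cost_all = trial_cost(n_all, 2 * n_all)
cost_enr = trial_cost(n_enr, 2 * n_enr / pass_rate)
```

    Even though every enrolled patient in the enriched design costs extra screening of those who fail the cut-point, the smaller randomised cohort dominates the total under these assumptions, mirroring the abstract's cost findings.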

  17. Measurement error in epidemiologic studies of air pollution based on land-use regression models.

    PubMed

    Basagaña, Xavier; Aguilera, Inmaculada; Rivera, Marcela; Agis, David; Foraster, Maria; Marrugat, Jaume; Elosua, Roberto; Künzli, Nino

    2013-10-15

    Land-use regression (LUR) models are increasingly used to estimate air pollution exposure in epidemiologic studies. These models use air pollution measurements taken at a small set of locations and modeling based on geographical covariates for which data are available at all study participant locations. The process of LUR model development commonly includes a variable selection procedure. When LUR model predictions are used as explanatory variables in a model for a health outcome, measurement error can lead to bias of the regression coefficients and to inflation of their variance. In previous studies dealing with spatial predictions of air pollution, bias was shown to be small while most of the effect of measurement error was on the variance. In this study, we show that in realistic cases where LUR models are applied to health data, bias in health-effect estimates can be substantial. This bias depends on the number of air pollution measurement sites, the number of available predictors for model selection, and the amount of explainable variability in the true exposure. These results should be taken into account when interpreting health effects from studies that used LUR models.
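    The central point — substituting an LUR prediction for the true exposure can bias the health-effect coefficient, not just widen its confidence interval — can be demonstrated with a small simulation. All numbers are invented; the prediction is built to capture only part of the explainable exposure variability, plus error.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
exposure = rng.normal(10.0, 2.0, size=n)            # true long-term exposure (arbitrary units)
# LUR-style prediction: partial capture of the true signal plus independent model error
predicted = 10.0 + 0.6 * (exposure - 10.0) + rng.normal(0.0, 2.0, size=n)

beta_true = 0.5
health = beta_true * exposure + rng.normal(0.0, 1.0, size=n)

def slope(x, y):
    """Simple-regression coefficient of y on x."""
    return float(np.cov(x, y)[0, 1] / np.var(x, ddof=1))

b_true = slope(exposure, health)     # what we would estimate with the true exposure
b_lur = slope(predicted, health)     # what we estimate with the LUR prediction
```

    The coefficient estimated from the LUR prediction is attenuated well below the true value of 0.5, illustrating how classical-type error in the exposure surrogate biases the health-effect estimate toward the null.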

  18. The Use of Variable Q1 Isolation Windows Improves Selectivity in LC-SWATH-MS Acquisition.

    PubMed

    Zhang, Ying; Bilbao, Aivett; Bruderer, Tobias; Luban, Jeremy; Strambio-De-Castillia, Caterina; Lisacek, Frédérique; Hopfgartner, Gérard; Varesio, Emmanuel

    2015-10-02

    As tryptic peptides and metabolites are not equally distributed along the mass range, the probability of cross fragment ion interference is higher in certain windows when fixed Q1 SWATH windows are applied. We evaluated the benefits of utilizing variable Q1 SWATH windows with regard to selectivity improvement. Variable windows based on equalizing the distribution of either the precursor ion population (PIP) or the total ion current (TIC) within each window were generated by an in-house software tool, swathTUNER. These two variable Q1 SWATH window strategies outperformed, with respect to quantification and identification, the basic approach using a fixed window width (FIX) for proteomic profiling of human monocyte-derived dendritic cells (MDDCs). Thus, 13.8 and 8.4% additional peptide precursors, which resulted in 13.1 and 10.0% more proteins, were confidently identified by SWATH using the PIP and TIC strategies, respectively, in the MDDC proteomic sample. On the basis of the spectral library purity score, some improvement afforded by variable Q1 windows was also observed, albeit to a lesser extent, in the metabolomic profiling of human urine. We show that the novel concept of "scheduled SWATH" proposed here, which incorporates (i) variable isolation windows and (ii) precursor retention time segmentation, further improves both peptide and metabolite identifications.
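    The PIP-equalising idea reduces to placing window edges at quantiles of the precursor m/z distribution, so that each window holds roughly the same number of precursor ions. The sketch below uses an invented bimodal m/z distribution; it illustrates the strategy only, not swathTUNER's actual implementation.

```python
import numpy as np

def variable_windows(mz_values, n_windows, mz_min=400.0, mz_max=1200.0):
    """Window edges that put an equal share of the precursor-ion population in each window."""
    inside = np.clip(mz_values, mz_min, mz_max)
    edges = np.quantile(inside, np.linspace(0.0, 1.0, n_windows + 1))
    edges[0], edges[-1] = mz_min, mz_max       # pin the outer edges to the scan range
    return edges

rng = np.random.default_rng(4)
# hypothetical precursor m/z distribution: tryptic peptides cluster at lower m/z
mz = np.concatenate([rng.normal(550.0, 60.0, 7000), rng.normal(850.0, 120.0, 3000)])

edges = variable_windows(mz, 32)
widths = np.diff(edges)                        # narrow where precursors are dense
```

    Narrow windows in the crowded region and wide ones in the sparse tails are exactly what lowers the chance of co-isolating interfering precursors relative to fixed-width windows.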

  19. Assessing the accuracy and stability of variable selection methods for random forest modeling in ecology.

    PubMed

    Fox, Eric W; Hill, Ryan A; Leibowitz, Scott G; Olsen, Anthony R; Thornbrugh, Darren J; Weber, Marc H

    2017-07-01

    Random forest (RF) modeling has emerged as an important statistical learning method in ecology due to its exceptional predictive performance. However, for large and complex ecological data sets, there is limited guidance on variable selection methods for RF modeling. Typically, either a preselected set of predictor variables is used or stepwise procedures are employed which iteratively remove variables according to their importance measures. This paper investigates the application of variable selection methods to RF models for predicting probable biological stream condition. Our motivating data set consists of the good/poor condition of n = 1365 stream survey sites from the 2008/2009 National Rivers and Stream Assessment, and a large set (p = 212) of landscape features from the StreamCat data set as potential predictors. We compare two types of RF models: a full variable set model with all 212 predictors and a reduced variable set model selected using a backward elimination approach. We assess model accuracy using RF's internal out-of-bag estimate, and a cross-validation procedure with validation folds external to the variable selection process. We also assess the stability of the spatial predictions generated by the RF models to changes in the number of predictors and argue that model selection needs to consider both accuracy and stability. The results suggest that RF modeling is robust to the inclusion of many variables of moderate to low importance. We found no substantial improvement in cross-validated accuracy as a result of variable reduction. Moreover, the backward elimination procedure tended to select too few variables and exhibited numerous issues such as upwardly biased out-of-bag accuracy estimates and instabilities in the spatial predictions. We use simulations to further support and generalize results from the analysis of real data. A main purpose of this work is to elucidate issues of model selection bias and instability to ecologists interested in using RF to develop predictive models with large environmental data sets.
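    The selection-bias issue the paper warns about — estimating accuracy with validation folds that already informed variable selection — can be reproduced with a deliberately simple stand-in classifier on pure-noise data. Everything here is synthetic; the nearest-class-mean classifier merely stands in for the random forest.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, n_keep = 200, 500, 5
X = rng.normal(size=(n, p))             # pure-noise stand-ins for landscape predictors
y = rng.integers(0, 2, size=n)          # random good/poor condition labels

def top_features(X, y, k):
    """Indices of the k predictors most correlated (in absolute value) with the label."""
    corr = np.abs(np.corrcoef(X.T, y)[-1, :-1])
    return np.argsort(-corr)[:k]

def nearest_mean_acc(Xtr, ytr, Xte, yte):
    """Toy classifier standing in for the random forest: nearest class mean (L1)."""
    m0, m1 = Xtr[ytr == 0].mean(axis=0), Xtr[ytr == 1].mean(axis=0)
    pred = (np.abs(Xte - m1).sum(axis=1) < np.abs(Xte - m0).sum(axis=1)).astype(int)
    return float((pred == yte).mean())

folds = np.array_split(rng.permutation(n), 5)
leaky = top_features(X, y, n_keep)      # selected once on ALL data: the test folds leak in
acc_leaky, acc_honest = [], []
for te in folds:
    tr = np.setdiff1d(np.arange(n), te)
    sel_in = top_features(X[tr], y[tr], n_keep)   # honest: selection inside each training fold
    acc_leaky.append(nearest_mean_acc(X[tr][:, leaky], y[tr], X[te][:, leaky], y[te]))
    acc_honest.append(nearest_mean_acc(X[tr][:, sel_in], y[tr], X[te][:, sel_in], y[te]))
```

    On data with no signal at all, the leaky protocol reports optimistic accuracy while the honest protocol hovers near chance, which is why the paper insists the validation folds stay external to the variable selection process.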

  20. A synthesis of the theories and concepts of early human evolution.

    PubMed

    Maslin, Mark A; Shultz, Susanne; Trauth, Martin H

    2015-03-05

    Current evidence suggests that many of the major events in hominin evolution occurred in East Africa. Hence, over the past two decades, there has been intensive work undertaken to understand African palaeoclimate and tectonics in order to put together a coherent picture of how the environment of Africa has varied over the past 10 Myr. A new consensus is emerging that suggests the unusual geology and climate of East Africa created a complex, environmentally very variable setting. This new understanding of East African climate has led to the pulsed climate variability hypothesis that suggests the long-term drying trend in East Africa was punctuated by episodes of short alternating periods of extreme humidity and aridity which may have driven hominin speciation, encephalization and dispersals out of Africa. This hypothesis is unique as it provides a conceptual framework within which other evolutionary theories can be examined: first, at macro-scale comparing phylogenetic gradualism and punctuated equilibrium; second, at a more focused level of human evolution comparing allopatric speciation, aridity hypothesis, turnover pulse hypothesis, variability selection hypothesis, Red Queen hypothesis and sympatric speciation based on sexual selection. It is proposed that each one of these mechanisms may have been acting on hominins during these short periods of climate variability, which then produce a range of different traits that led to the emergence of new species. In the case of Homo erectus (sensu lato), it is not just brain size that changes but life history (shortened inter-birth intervals, delayed development), body size and dimorphism, shoulder morphology to allow thrown projectiles, adaptation to long-distance running, ecological flexibility and social behaviour. The future of evolutionary research should be to create evidence-based meta-narratives, which encompass multiple mechanisms that select for different traits leading ultimately to speciation.

  1. A synthesis of the theories and concepts of early human evolution

    PubMed Central

    Maslin, Mark A.; Shultz, Susanne; Trauth, Martin H.

    2015-01-01

    Current evidence suggests that many of the major events in hominin evolution occurred in East Africa. Hence, over the past two decades, there has been intensive work undertaken to understand African palaeoclimate and tectonics in order to put together a coherent picture of how the environment of Africa has varied over the past 10 Myr. A new consensus is emerging that suggests the unusual geology and climate of East Africa created a complex, environmentally very variable setting. This new understanding of East African climate has led to the pulsed climate variability hypothesis that suggests the long-term drying trend in East Africa was punctuated by episodes of short alternating periods of extreme humidity and aridity which may have driven hominin speciation, encephalization and dispersals out of Africa. This hypothesis is unique as it provides a conceptual framework within which other evolutionary theories can be examined: first, at macro-scale comparing phylogenetic gradualism and punctuated equilibrium; second, at a more focused level of human evolution comparing allopatric speciation, aridity hypothesis, turnover pulse hypothesis, variability selection hypothesis, Red Queen hypothesis and sympatric speciation based on sexual selection. It is proposed that each one of these mechanisms may have been acting on hominins during these short periods of climate variability, which then produce a range of different traits that led to the emergence of new species. In the case of Homo erectus (sensu lato), it is not just brain size that changes but life history (shortened inter-birth intervals, delayed development), body size and dimorphism, shoulder morphology to allow thrown projectiles, adaptation to long-distance running, ecological flexibility and social behaviour. The future of evolutionary research should be to create evidence-based meta-narratives, which encompass multiple mechanisms that select for different traits leading ultimately to speciation. PMID:25602068

  2. Spatial analysis and land use regression of VOCs and NO2 from school-based urban air monitoring in Detroit/Dearborn, USA.

    PubMed

    Mukerjee, Shaibal; Smith, Luther A; Johnson, Mary M; Neas, Lucas M; Stallings, Casson A

    2009-08-01

    Passive ambient air sampling for nitrogen dioxide (NO2) and volatile organic compounds (VOCs) was conducted at 25 school and two compliance sites in Detroit and Dearborn, Michigan, USA during the summer of 2005. Geographic Information System (GIS) variables were calculated at each of 116 schools. The 25 selected schools were monitored to assess and model intra-urban gradients of air pollutants and to evaluate the impact of traffic and urban emissions on pollutant levels. Schools were chosen to be statistically representative of urban land use variables such as distance to major roadways, traffic intensity around the schools, distance to nearest point sources, population density, and distance to the nearest border crossing. Two approaches were used to investigate spatial variability. First, Kruskal-Wallis analyses and pairwise comparisons on data from the schools examined coarse spatial differences based on city section and distance from heavily trafficked roads. Second, spatial variation on a finer scale and as a response to multiple factors was evaluated through land use regression (LUR) models via multiple linear regression. For weeklong exposures, VOCs did not exhibit spatial variability by city section or distance from major roads; NO2 was significantly elevated in a section dominated by traffic and industrial influence versus a residential section. Somewhat in contrast to the coarse spatial analyses, LUR results revealed spatial gradients in NO2 and selected VOCs across the area. The process used to select spatially representative sites for air sampling and the results of coarse and fine spatial variability of air pollutants provide insights that may guide future air quality studies in assessing intra-urban gradients.
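    The coarse-scale comparison by city section uses the Kruskal-Wallis rank test; a numpy-only sketch follows, with invented weeklong NO2 averages. The closed-form p-value below relies on the χ² tail for exactly 2 degrees of freedom (three groups), where sf(x) = e^(−x/2); for other group counts a general χ² survival function would be needed.

```python
import numpy as np
from math import exp

def kruskal_wallis(*groups):
    """Kruskal-Wallis H for k groups; p-value valid here because df = k - 1 = 2."""
    all_vals = np.concatenate(groups)
    ranks = np.argsort(np.argsort(all_vals)) + 1.0   # 1-based ranks; continuous data, no ties
    N = len(all_vals)
    h, start = 0.0, 0
    for g in groups:
        r = ranks[start:start + len(g)]
        h += len(g) * (r.mean() - (N + 1) / 2.0) ** 2
        start += len(g)
    h *= 12.0 / (N * (N + 1))
    return h, exp(-h / 2.0)                          # chi-square sf with 2 df

rng = np.random.default_rng(10)
# hypothetical weeklong NO2 averages (ppb) grouped by city section
residential = rng.normal(12.0, 2.0, 30)
industrial = rng.normal(18.0, 2.0, 30)
downtown = rng.normal(14.0, 2.0, 30)

H, p_value = kruskal_wallis(residential, industrial, downtown)
```

    With the industrial section shifted several ppb above the residential one, the test flags a clear section effect, which is the kind of coarse contrast reported for NO2 in the abstract.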

  3. Selecting Populations for Non-Analogous Climate Conditions Using Universal Response Functions: The Case of Douglas-Fir in Central Europe.

    PubMed

    Chakraborty, Debojyoti; Wang, Tongli; Andre, Konrad; Konnert, Monika; Lexer, Manfred J; Matulla, Christoph; Schueler, Silvio

    2015-01-01

    Identifying populations within tree species potentially adapted to future climatic conditions is an important requirement for reforestation and assisted migration programmes. Such populations can be identified either by empirical response functions based on correlations of quantitative traits with climate variables or by climate envelope models that compare the climate of seed sources and potential growing areas. In the present study, we analyzed the intraspecific variation in the climate growth response of Douglas-fir planted within the non-analogous climate conditions of Central and continental Europe. With data from 50 common garden trials, we developed Universal Response Functions (URFs) for tree height and mean basal area and compared the growth performance of the selected best-performing populations with that of populations identified through a climate envelope approach. Climate variables of the trial location were found to be stronger predictors of growth performance than climate variables of the population origin. Although the precipitation regime of the population sources varied strongly, none of the precipitation-related climate variables of population origin was found to be significant within the models. Overall, the URFs explained more than 88% of the variation in growth performance. Populations identified by the URF models originate from the western Cascades and coastal areas of Washington and Oregon and show significantly higher growth performance than populations identified by the climate envelope approach under both current and climate change scenarios. The URFs predict decreasing growth performance at low and middle elevations of the case study area, but increasing growth performance on high elevation sites. Our analysis suggests that population recommendations based on empirical approaches should be preferred, and population selections by climate envelope models that do not consider climatic constraints on growth performance should be carefully appraised before transferring populations to planting locations with novel or dissimilar climates.
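    The core comparison — trial-site climate versus seed-origin climate as predictors of growth — can be illustrated with a toy response-function fit. All values are invented: growth is simulated to depend strongly on site climate and only weakly (via a transfer-distance term) on origin climate, which is the pattern the URFs found.

```python
import numpy as np

def r2_ols(X, y):
    """R-squared of an ordinary least squares fit with intercept."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    return float(1.0 - resid.var() / y.var())

rng = np.random.default_rng(11)
n = 500                                   # hypothetical trial plantings
site_temp = rng.normal(8.0, 2.0, n)       # climate at the trial location
origin_temp = rng.normal(9.0, 2.0, n)     # climate at the seed source
# growth responds mainly to site climate, with a quadratic transfer-distance penalty
height = 20.0 + 1.5 * site_temp - 0.2 * (site_temp - origin_temp) ** 2 \
    + rng.normal(0.0, 2.0, n)

r2_site = r2_ols(site_temp[:, None], height)
r2_origin = r2_ols(origin_temp[:, None], height)
```

    Site climate alone explains far more of the growth variance than origin climate alone, mirroring the abstract's finding that trial-location variables are the stronger predictors.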

  4. Assessing the accuracy and stability of variable selection ...

    EPA Pesticide Factsheets

    Random forest (RF) modeling has emerged as an important statistical learning method in ecology due to its exceptional predictive performance. However, for large and complex ecological datasets there is limited guidance on variable selection methods for RF modeling. Typically, either a preselected set of predictor variables is used, or stepwise procedures are employed which iteratively add/remove variables according to their importance measures. This paper investigates the application of variable selection methods to RF models for predicting probable biological stream condition. Our motivating dataset consists of the good/poor condition of n=1365 stream survey sites from the 2008/2009 National Rivers and Stream Assessment, and a large set (p=212) of landscape features from the StreamCat dataset. Two types of RF models are compared: a full variable set model with all 212 predictors, and a reduced variable set model selected using a backwards elimination approach. We assess model accuracy using RF's internal out-of-bag estimate, and a cross-validation procedure with validation folds external to the variable selection process. We also assess the stability of the spatial predictions generated by the RF models to changes in the number of predictors, and argue that model selection needs to consider both accuracy and stability. The results suggest that RF modeling is robust to the inclusion of many variables of moderate to low importance. We found no substantial improvement in cross-validated accuracy as a result of variable reduction.

  5. Seasonal Habitat Use by Greater Sage-Grouse (Centrocercus urophasianus) on a Landscape with Low Density Oil and Gas Development.

    PubMed

    Rice, Mindy B; Rossi, Liza G; Apa, Anthony D

    2016-01-01

    Fragmentation of the sagebrush (Artemisia spp.) ecosystem has led to concern about a variety of sagebrush obligates including the greater sage-grouse (Centrocercus urophasianus). Given the increase of energy development within greater sage-grouse habitats, mapping seasonal habitats in pre-development populations is critical. The North Park population in Colorado is one of the largest and most stable in the state and provides a unique case study for investigating resource selection at a relatively low level of energy development compared to other populations both within and outside the state. We used locations from 117 radio-marked female greater sage-grouse in North Park, Colorado to develop seasonal resource selection models. We then added energy development variables to the base models at both a landscape and local scale to determine if energy variables improved the fit of the seasonal models. The base models for breeding and winter resource selection predicted greater use in large expanses of sagebrush whereas the base summer model predicted greater use along the edge of riparian areas. Energy development variables did not improve the winter or the summer models at either scale of analysis, but distance to oil/gas roads slightly improved model fit at both scales in the breeding season, albeit in opposite ways. At the landscape scale, greater sage-grouse were closer to oil/gas roads whereas they were further from oil/gas roads at the local scale during the breeding season. Although we found limited effects from low level energy development in the breeding season, the scale of analysis can influence the interpretation of effects. The lack of strong effects from energy development may be indicative that energy development at current levels are not impacting greater sage-grouse in North Park. Our baseline seasonal resource selection maps can be used for conservation to help identify ways of minimizing the effects of energy development.

  6. Prediction of Baseflow Index of Catchments using Machine Learning Algorithms

    NASA Astrophysics Data System (ADS)

    Yadav, B.; Hatfield, K.

    2017-12-01

    We present the results of eight machine learning techniques for predicting the baseflow index (BFI) of ungauged basins using surrogates of catchment-scale climate and physiographic data. The tested algorithms include ordinary least squares, ridge regression, least absolute shrinkage and selection operator (lasso), elastic net, support vector machine, gradient boosted regression trees, random forests, and extremely randomized trees. Our work seeks to identify the dominant controls of BFI that can be readily obtained from ancillary geospatial databases and remote sensing measurements, such that the developed techniques can be extended to ungauged catchments. More than 800 gauged catchments spanning the continental United States were selected to develop the general methodology. The BFI calculation was based on the baseflow separated from the daily streamflow hydrograph using the HYSEP filter. The surrogate catchment attributes were compiled from multiple sources, including digital elevation models, soil, land use and climate data, and other publicly available ancillary and geospatial data. 80% of the catchments were used to train the ML algorithms, and the remaining 20% were used as an independent test set to measure the generalization performance of the fitted models. A k-fold cross-validation using exhaustive grid search was used to tune the hyperparameters of each model. Initial model development was based on 19 independent variables, but after variable selection and feature ranking we generated revised sparse models of BFI prediction based on only six catchment attributes. These key predictive variables, selected after careful evaluation of the bias-variance tradeoff, include average catchment elevation, slope, fraction of sand, permeability, temperature, and precipitation. The most promising algorithms, exceeding an accuracy score (r-square) of 0.7 on test data, include support vector machine, gradient boosted regression trees, random forests, and extremely randomized trees. Considering both the accuracy and the computational complexity of these algorithms, we identify the extremely randomized trees as the best performing algorithm for BFI prediction in ungauged basins.
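    The k-fold cross-validation with exhaustive grid search described above can be sketched with one of the listed algorithms, ridge regression, since it has a closed form. The six "catchment attributes" below are random stand-ins, not the study's data; the pattern (fit on k−1 folds, score on the held-out fold, pick the grid point with the lowest mean error) is the same for any of the eight models.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge coefficients on centred data: (X'X + alpha*I)^-1 X'y."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    return np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)

def cv_mse(X, y, alpha, k=5, seed=0):
    """Mean held-out MSE over k folds for one grid point."""
    idx = np.random.default_rng(seed).permutation(len(y))
    errs = []
    for te in np.array_split(idx, k):
        tr = np.setdiff1d(idx, te)
        b = ridge_fit(X[tr], y[tr], alpha)
        pred = (X[te] - X[tr].mean(axis=0)) @ b + y[tr].mean()
        errs.append(float(np.mean((y[te] - pred) ** 2)))
    return float(np.mean(errs))

rng = np.random.default_rng(6)
# six hypothetical catchment attributes: elevation, slope, sand fraction, permeability, temp, precip
n, p = 800, 6
X = rng.normal(size=(n, p))
y = X @ np.array([0.5, 0.3, 0.2, 0.1, -0.4, 0.6]) + rng.normal(scale=0.5, size=n)

grid = [0.01, 0.1, 1.0, 10.0, 100.0]
best_alpha = min(grid, key=lambda a: cv_mse(X, y, a))   # exhaustive grid search
```

    Because the folds are fixed by the seed, the search is reproducible; in practice the same loop simply iterates over each algorithm's hyperparameter grid.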

  7. Firefly algorithm versus genetic algorithm as powerful variable selection tools and their effect on different multivariate calibration models in spectroscopy: A comparative study

    NASA Astrophysics Data System (ADS)

    Attia, Khalid A. M.; Nassar, Mohammed W. I.; El-Zeiny, Mohamed B.; Serag, Ahmed

    2017-01-01

    For the first time, a new variable selection method based on swarm intelligence, namely the firefly algorithm, is coupled with three different multivariate calibration models, namely concentration residual augmented classical least squares, artificial neural networks and support vector regression, on UV spectral data. A comparative study between the firefly algorithm and the well-known genetic algorithm was carried out. The comparison revealed the superiority of this new powerful algorithm over the genetic algorithm. Moreover, different statistical tests were performed and no significant differences were found between the models regarding their predictive abilities. This ensures that simpler and faster models can be obtained without any deterioration in calibration quality.

  8. New variable selection methods for zero-inflated count data with applications to the substance abuse field

    PubMed Central

    Buu, Anne; Johnson, Norman J.; Li, Runze; Tan, Xianming

    2011-01-01

    Zero-inflated count data are very common in health surveys. This study develops new variable selection methods for the zero-inflated Poisson regression model. Our simulations demonstrate the negative consequences which arise from the ignorance of zero-inflation. Among the competing methods, the one-step SCAD method is recommended because it has the highest specificity, sensitivity, exact fit, and lowest estimation error. The design of the simulations is based on the special features of two large national databases commonly used in the alcoholism and substance abuse field so that our findings can be easily generalized to the real settings. Applications of the methodology are demonstrated by empirical analyses on the data from a well-known alcohol study. PMID:21563207
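    The zero-inflated Poisson model at the heart of this work mixes a point mass at zero (probability π) with a Poisson(λ) count: P(0) = π + (1−π)e^(−λ) and P(k) = (1−π)·Pois(k; λ) for k > 0. The sketch below fits (λ, π) by a naive log-likelihood grid search on simulated drinking-style counts; this toy search stands in for the paper's penalised (SCAD) estimators, and all parameter values are invented.

```python
import numpy as np

def zip_loglik(y, lam, pi):
    """Log-likelihood of the zero-inflated Poisson for integer counts y."""
    kmax = int(y.max())
    log_fact = np.concatenate([[0.0], np.cumsum(np.log(np.arange(1, kmax + 1)))])
    log_pois = -lam + y * np.log(lam) - log_fact[y]       # log Pois(k; lam)
    ll_pos = np.log(1.0 - pi) + log_pois                  # k > 0 branch
    ll_zero = np.log(pi + (1.0 - pi) * np.exp(-lam))      # inflated-zero branch
    return float(np.where(y == 0, ll_zero, ll_pos).sum())

rng = np.random.default_rng(7)
n, lam_true, pi_true = 2000, 3.0, 0.4     # e.g. drinks per week with 40% structural zeros
y = np.where(rng.random(n) < pi_true, 0, rng.poisson(lam_true, n))

grid = [(l, q) for l in np.linspace(2.0, 4.0, 21) for q in np.linspace(0.2, 0.6, 21)]
lam_hat, pi_hat = max(grid, key=lambda g: zip_loglik(y, *g))
```

    A plain Poisson fit to the same data would badly misestimate λ because of the excess zeros, which is exactly the "ignorance of zero-inflation" consequence the simulations in the paper demonstrate.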

  9. THAT INSTRUMENT IS LOUSY! IN SEARCH OF AGREEMENT WHEN USING INSTRUMENTAL VARIABLES ESTIMATION IN SUBSTANCE USE RESEARCH

    PubMed Central

    Popovici, Ioana

    2009-01-01

    The primary statistical challenge that must be addressed when using cross-sectional data to estimate the consequences of consuming addictive substances is the likely endogeneity of substance use. While economists are in agreement on the need to consider potential endogeneity bias and the value of instrumental variables estimation, the selection of credible instruments is a topic of heated debate in the field. Rather than attempt to resolve this debate, our paper highlights the diversity of judgments about what constitutes appropriate instruments for substance use based on a comprehensive review of the economics literature since 1990. We then offer recommendations related to the selection of reliable instruments in future studies. PMID:20029936
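    The endogeneity problem and the instrumental-variables remedy can be shown in a few lines of two-stage least squares. The instrument, confounder and effect sizes below are all invented (e.g. a tax-like instrument that shifts use but not the outcome directly); the point is only that OLS is biased by the unobserved confounder while IV recovers the causal effect.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 5000
z = rng.normal(size=n)                        # hypothetical instrument, e.g. a local tax rate
u = rng.normal(size=n)                        # unobserved confounder
use = 0.8 * z + u + rng.normal(size=n)        # substance use: endogenous (depends on u)
outcome = -0.5 * use + 2.0 * u + rng.normal(size=n)   # true causal effect is -0.5

def slope(x, y):
    """Simple-regression coefficient of y on x."""
    return float(np.cov(x, y)[0, 1] / np.var(x, ddof=1))

b_ols = slope(use, outcome)                   # confounded: biased away from -0.5
# 2SLS: first stage predicts use from the instrument, second stage uses the prediction
use_hat = slope(z, use) * (z - z.mean()) + use.mean()
b_iv = slope(use_hat, outcome)                # close to the true -0.5
```

    The contrast between the two estimates is the whole argument for IV estimation; whether a given instrument (like the tax rate above) is credible is exactly the debate this review surveys.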

  10. tICA-Metadynamics: Accelerating Metadynamics by Using Kinetically Selected Collective Variables.

    PubMed

    M Sultan, Mohammad; Pande, Vijay S

    2017-06-13

    Metadynamics is a powerful enhanced-sampling molecular dynamics method that accelerates simulations by adding history-dependent multidimensional Gaussians along selected collective variables (CVs). In practice, choosing a small number of slow CVs remains challenging due to the inherent high dimensionality of biophysical systems. Here we show that time-structure based independent component analysis (tICA), a recent advance in the Markov state model literature, can be used to identify a set of variationally optimal slow coordinates for use as CVs for metadynamics. We show that linear and nonlinear tICA-metadynamics can complement existing MD studies by explicitly sampling the system's slowest modes, and can even drive transitions along those modes when no such transitions are observed in unbiased simulations.
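
The core tICA computation is a generalized eigenproblem on instantaneous and time-lagged covariance matrices; the leading eigenvector is the candidate slow CV. A minimal numpy/scipy sketch on a toy trajectory (illustrative, not the authors' implementation):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
# Toy trajectory: one slow AR(1) coordinate plus fast noise coordinates
# (an illustrative stand-in for MD features)
T, lag = 5000, 10
slow = np.zeros(T)
for t in range(1, T):
    slow[t] = 0.99 * slow[t - 1] + rng.normal()
X = np.column_stack([slow, rng.normal(size=(T, 3))])
X -= X.mean(axis=0)

# Instantaneous and time-lagged covariance matrices
C0 = X.T @ X / T
Ct = X[:-lag].T @ X[lag:] / (T - lag)
Ct = 0.5 * (Ct + Ct.T)  # symmetrize

# Generalized eigenproblem Ct v = w C0 v; the largest eigenvalue is the slowest mode
w, V = eigh(Ct, C0)
slowest_cv = V[:, -1]  # candidate collective variable for metadynamics
print(w[-1])
```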

  11. Distribution and relative abundance of humpback whales in relation to environmental variables in coastal British Columbia and adjacent waters

    NASA Astrophysics Data System (ADS)

    Dalla Rosa, Luciano; Ford, John K. B.; Trites, Andrew W.

    2012-03-01

    Humpback whales are common in feeding areas off British Columbia (BC) from spring to fall and are widely distributed along the coast. Climate change and the increasing population size of North Pacific humpback whales may lead to increased anthropogenic impacts and require a better understanding of species-habitat relationships. We investigated the distribution and relative abundance of humpback whales in relation to environmental variables and processes in BC waters using GIS and generalized additive models (GAMs). Six non-systematic cetacean surveys were conducted between 2004 and 2006. Whale encounter rates and environmental variables (oceanographic and remote-sensing data) were recorded along transects divided into 4-km segments. A combined 3-year model and individual-year models (two surveys each) were fitted with the mgcv R package. Model selection was based primarily on GCV scores. The explained deviance of our models ranged from 39% for the 3-year model to 76% for the 2004 model. Humpback whales were strongly associated with latitude and bathymetric features, including depth, slope and distance to the 100-m isobath. Distance to sea-surface-temperature fronts and salinity (climatology) were also consistently selected by the models. The shapes of the smooth functions estimated for variables based on chlorophyll concentration or net primary productivity with different temporal resolutions and time lags were not consistent, even though higher numbers of whales seemed to be associated with higher primary productivity in some models. These and other selected explanatory variables may reflect areas of higher biological productivity that favor top predators. Our study confirms the presence of at least three important regions for humpback whales along the BC coast: south Dixon Entrance, middle and southwestern Hecate Strait, and the area between La Perouse Bank and the southern edge of Juan de Fuca Canyon.

  12. Plasmodium relictum infection and MHC diversity in the house sparrow (Passer domesticus)

    PubMed Central

    Loiseau, Claire; Zoorob, Rima; Robert, Alexandre; Chastel, Olivier; Julliard, Romain; Sorci, Gabriele

    2011-01-01

    Antagonistic coevolution between hosts and parasites has been proposed as a mechanism maintaining genetic diversity in both host and parasite populations. In particular, the high level of genetic diversity usually observed at the major histocompatibility complex (MHC) is generally thought to be maintained by parasite-driven selection. Among the possible ways through which parasites can maintain MHC diversity, diversifying selection has received relatively less attention. This hypothesis is based on the idea that parasites exert spatially variable selection pressures because of heterogeneity in parasite genetic structure, abundance or virulence. Variable selection pressures should select for different host allelic lineages resulting in population-specific associations between MHC alleles and risk of infection. In this study, we took advantage of a large survey of avian malaria in 13 populations of the house sparrow (Passer domesticus) to test this hypothesis. We found that (i) several MHC alleles were either associated with increased or decreased risk to be infected with Plasmodium relictum, (ii) the effects were population specific, and (iii) some alleles had antagonistic effects across populations. Overall, these results support the hypothesis that diversifying selection in space can maintain MHC variation and suggest a pattern of local adaptation where MHC alleles are selected at the local host population level. PMID:20943698

  13. Exploratory Spectroscopy of Magnetic Cataclysmic Variables Candidates and Other Variable Objects

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Oliveira, A. S.; Palhares, M. S.; Rodrigues, C. V.

    2017-04-01

    The increasing number of synoptic surveys made by small robotic telescopes, such as the photometric Catalina Real-Time Transient Survey (CRTS), provides a unique opportunity to discover variable sources and improve the statistical samples of such classes of objects. Our goal is the discovery of magnetic Cataclysmic Variables (mCVs). These are rare objects that probe interesting accretion scenarios controlled by the white-dwarf magnetic field. In particular, improved statistics of mCVs would help to address open questions on their formation and evolution. We performed an optical spectroscopy survey to search for signatures of magnetic accretion in 45 variable objects selected mostly from the CRTS. In this sample, we found 32 CVs, 22 being mCV candidates, 13 of which were previously unreported as such. If the proposed classifications are confirmed, they would represent an increase of 4% in the number of known polars and 12% in the number of known IPs. A fraction of our initial sample was classified as extragalactic sources or other types of variable stars by inspection of the identification spectra. Despite the inherent complexity in identifying a source as an mCV, variability-based selection, followed by spectroscopic snapshot observations, has proved to be an efficient strategy for their discovery, being a relatively inexpensive approach in terms of telescope time.

  14. A New Variable Weighting and Selection Procedure for K-Means Cluster Analysis

    ERIC Educational Resources Information Center

    Steinley, Douglas; Brusco, Michael J.

    2008-01-01

    A variance-to-range ratio variable weighting procedure is proposed. We show how this weighting method is theoretically grounded in the inherent variability found in data exhibiting cluster structure. In addition, a variable selection procedure is proposed to operate in conjunction with the variable weighting technique. The performances of these…
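
A minimal numpy sketch of the idea (a simplified stand-in for the paper's procedure, with illustrative data): the variance-to-range ratio is scale-invariant, so it compares variables measured on very different scales fairly, and it is larger for variables that exhibit cluster structure.

```python
import numpy as np

rng = np.random.default_rng(2)
# Two variables with identical two-cluster structure but very different scales
a = np.concatenate([rng.normal(0, 1, 50), rng.normal(10, 1, 50)])
b = a * 100  # same structure, 100x larger scale
X = np.column_stack([a, b])

# Range standardization puts variables on a common footing before k-means
var_range = X.max(axis=0) - X.min(axis=0)
Xw = X / var_range

# Variance-to-range ratio per variable: scale-free, larger under cluster structure
weights = X.var(axis=0) / var_range ** 2
print(weights)
```

Because the ratio is invariant to rescaling, both columns receive the same weight despite the factor-of-100 difference in scale.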

  15. A survey of variable selection methods in two Chinese epidemiology journals

    PubMed Central

    2010-01-01

    Background Although much has been written on developing better procedures for variable selection, there is little research on how it is practiced in actual studies. This review surveys the variable selection methods reported in two high-ranking Chinese epidemiology journals. Methods Articles published in 2004, 2006, and 2008 in the Chinese Journal of Epidemiology and the Chinese Journal of Preventive Medicine were reviewed. Five categories of methods were identified whereby variables were selected using: A - bivariate analyses; B - multivariable analysis; e.g. stepwise or individual significance testing of model coefficients; C - first bivariate analyses, followed by multivariable analysis; D - bivariate analyses or multivariable analysis; and E - other criteria like prior knowledge or personal judgment. Results Among the 287 articles that reported using variable selection methods, 6%, 26%, 30%, 21%, and 17% were in categories A through E, respectively. One hundred sixty-three studies selected variables using bivariate analyses, 80% (130/163) via multiple significance testing at the 5% alpha-level. Of the 219 multivariable analyses, 97 (44%) used stepwise procedures, 89 (41%) tested individual regression coefficients, but 33 (15%) did not mention how variables were selected. Sixty percent (58/97) of the stepwise routines also did not specify the algorithm and/or significance levels. Conclusions The variable selection methods reported in the two journals were limited in variety, and details were often missing. Many studies still relied on problematic techniques like stepwise procedures and/or multiple testing of bivariate associations at the 0.05 alpha-level. These deficiencies should be rectified to safeguard the scientific validity of articles published in Chinese epidemiology journals. PMID:20920252

  16. Methodological development for selection of significant predictors explaining fatal road accidents.

    PubMed

    Dadashova, Bahar; Arenas-Ramírez, Blanca; Mira-McWilliams, José; Aparicio-Izquierdo, Francisco

    2016-05-01

    Identification of the most relevant factors explaining road accident occurrence is an important issue in road safety research, particularly for future decision-making in transport policy. However, model selection for this purpose is still an ongoing research problem. In this paper we propose a methodological development for model selection that addresses both explanatory variable selection and adequate model selection. A variable selection procedure, the TIM (two-input model) method, is carried out by combining neural network design and statistical approaches. The error structure of the fitted model is assumed to follow an autoregressive process. All models are estimated using the Markov chain Monte Carlo method, where the model parameters are assigned non-informative prior distributions. The final model is built using the results of the variable selection. The proposed methodology is applied to the number of fatal accidents in Spain during 2000-2011. This indicator experienced the largest reduction internationally during those years, making it an interesting time series from a road safety policy perspective. Hence the identification of the variables that have affected this reduction is of particular interest for future decision making. The results of the variable selection process show that the selected variables are main subjects of road safety policy measures. Published by Elsevier Ltd.

  17. Register-based predictors of violations of animal welfare legislation in dairy herds.

    PubMed

    Otten, N D; Nielsen, L R; Thomsen, P T; Houe, H

    2014-12-01

    The assessment of animal welfare can include resource-based or animal-based measures. Official animal welfare inspections in Denmark primarily control compliance with animal welfare legislation based on resource measures (e.g. housing system) and usually do not regard animal response parameters (e.g. clinical and behavioural observations). Herds selected for welfare inspections are sampled by a risk-based strategy based on existing register data. The aim of the present study was to evaluate register data variables as predictors of dairy herds with violations of the animal welfare legislation (VoAWL) defined as occurrence of at least one of the two most frequently violated measures found at recent inspections in Denmark, namely (a) presence of injured animals not separated from the rest of the group and/or (b) animals in a condition warranting euthanasia still being present in the herd. A total of 25 variables were extracted from the Danish Cattle Database and assessed as predictors using a multivariable logistic analysis of a data set including 73 Danish dairy herds, which all had more than 100 cows and cubicle loose-housing systems. Univariable screening was used to identify variables associated with VoAWL at a P-value<0.2 for the inclusion in a multivariable logistic regression analysis. Backward selection procedures identified the following variables for the final model predictive of VoAWL: increasing standard deviation of milk yield for first lactation cows, high bulk tank somatic cell count (⩾250 000 cells/ml) and suspiciously low number of recorded veterinary treatments (⩽25 treatments/100 cow years). The identified predictors may be explained by underlying management factors leading to impaired animal welfare in the herd, such as poor hygiene, feeding and management of dry or calving cows and sick animals. However, further investigations are required for causal inferences to be established.

  18. CLINICAL APPLICATIONS OF CRYOTHERAPY AMONG SPORTS PHYSICAL THERAPISTS.

    PubMed

    Hawkins, Shawn W; Hawkins, Jeremy R

    2016-02-01

    Therapeutic modalities (TM) are used by sports physical therapists (SPT), but how they are used is unknown. To identify current clinical use patterns for cryotherapy among SPT. Cross-sectional survey. All members (7283) of the Sports Physical Therapy Section of the APTA were recruited. A scenario-based survey using pre-participation management of an acute or sub-acute ankle sprain was developed. A Select Survey link was distributed via email to participants. Respondents selected a treatment approach based upon options provided, and follow-up questions were asked. The survey was available for two weeks, with a follow-up email sent after one week. Question answers were the main outcome measures. Reliability: Cronbach's alpha > 0.9. The SPT response rate was 6.9% (503); responses came from 48 states. Survey results indicated great variability in respondents' approaches to the treatment of an acute or sub-acute ankle sprain. SPT applied cryotherapy with great variability and not always in accordance with the limited research on the TM. Continuing education, application of current research, and additional outcomes-based research need to remain a focus for clinicians. Level of evidence: 3.

  19. A comparison of recharge rates in aquifers of the United States based on groundwater-age data

    USGS Publications Warehouse

    McMahon, P.B.; Plummer, Niel; Böhlke, J.K.; Shapiro, S.D.; Hinkle, S.R.

    2011-01-01

    An overview is presented of existing groundwater-age data and their implications for assessing rates and timescales of recharge in selected unconfined aquifer systems of the United States. Apparent age distributions in aquifers determined from chlorofluorocarbon, sulfur hexafluoride, tritium/helium-3, and radiocarbon measurements from 565 wells in 45 networks were used to calculate groundwater recharge rates. Timescales of recharge were defined by 1,873 distributed tritium measurements and 102 radiocarbon measurements from 27 well networks. Recharge rates ranged from < 10 to 1,200 mm/yr in selected aquifers on the basis of measured vertical age distributions and assuming exponential age gradients. On a regional basis, recharge rates based on tracers of young groundwater exhibited a significant inverse correlation with mean annual air temperature and a significant positive correlation with mean annual precipitation. Comparison of recharge derived from groundwater ages with recharge derived from stream base-flow evaluation showed similar overall patterns but substantial local differences. Results from this compilation demonstrate that age-based recharge estimates can provide useful insights into spatial and temporal variability in recharge at a national scale and factors controlling that variability. Local age-based recharge estimates provide empirical data and process information that are needed for testing and improving more spatially complete model-based methods.
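
Under the exponential-age-gradient idealization mentioned above, a single tracer age at a known depth below the water table yields a recharge estimate directly. A hedged back-of-envelope sketch (standard vertical-flow aquifer idealization; porosity, thickness, depth and age are all illustrative values, not data from the compilation):

```python
import math

# Exponential age model for an unconfined aquifer of saturated thickness H (m)
# and porosity theta: age at depth z is t(z) = (theta * H / R) * ln(H / (H - z)),
# so a measured tracer age inverts to a recharge rate R.
theta, H = 0.30, 50.0        # porosity and saturated thickness (illustrative)
z, age = 20.0, 25.0          # sample depth (m) and apparent tracer age (yr), e.g. CFC or 3H/3He

R = theta * H / age * math.log(H / (H - z))   # recharge rate in m/yr
print(round(R * 1000), "mm/yr")
```

The result falls inside the < 10 to 1,200 mm/yr range reported across the compiled networks.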

  20. Improving Cluster Analysis with Automatic Variable Selection Based on Trees

    DTIC Science & Technology

    2014-12-01

    regression trees; DAISY, DISsimilAritY; PAM, partitioning around medoids; PMA, penalized multivariate analysis; SPC, sparse principal components; UPGMA, unweighted pair-group average method. This method (UPGMA) measures dissimilarities between all objects in two clusters and takes the average value.

  1. An Interactive Tool For Semi-automated Statistical Prediction Using Earth Observations and Models

    NASA Astrophysics Data System (ADS)

    Zaitchik, B. F.; Berhane, F.; Tadesse, T.

    2015-12-01

    We developed a semi-automated statistical prediction tool applicable to concurrent analysis or seasonal prediction of any time-series variable in any geographic location. The tool was developed using Shiny, JavaScript, HTML and CSS. A user can extract a predictand by drawing a polygon over a region of interest on the provided user interface (a global map). The user can select Climatic Research Unit (CRU) precipitation or Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) as the predictand, or upload their own predictand time series. Predictors can be extracted from sea surface temperature, sea level pressure, winds at different pressure levels, air temperature at various pressure levels, and geopotential height at different pressure levels. By default, reanalysis fields are applied as predictors, but the user can also upload their own predictors, including a wide range of compatible satellite-derived datasets. The package generates correlations between the selected variables and the predictand; the user also has the option to generate composites of the variables based on the predictand. Next, the user can extract predictors by drawing polygons over the regions that show strong correlations (or composites). Then, the user can select some or all of the statistical prediction models provided. Provided models include linear regression models (GLM, SGLM), tree-based models (bagging, random forest, boosting), artificial neural networks, and other non-linear models such as the generalized additive model (GAM) and multivariate adaptive regression splines (MARS). Finally, the user can download the analysis steps they used (the region they selected, the time period they specified, the predictand and predictors they chose, and the preprocessing options they used) and the model results in PDF or HTML format. Key words: Semi-automated prediction, Shiny, R, GLM, ANN, RF, GAM, MARS
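
The correlation-map step the tool automates can be sketched in a few lines of numpy. Toy data; the field dimensions and the embedded teleconnection region are assumptions for illustration, not the tool's datasets:

```python
import numpy as np

rng = np.random.default_rng(4)
years = 30
predictand = rng.normal(size=years)              # e.g. a seasonal rainfall index
sst = rng.normal(size=(years, 10, 12))           # toy SST field (year, lat, lon)
sst[:, 2:5, 3:6] += predictand[:, None, None]    # embed a teleconnection in one region

# Gridpoint correlation map of the field with the predictand, as used for screening
anom = sst - sst.mean(axis=0)
p = predictand - predictand.mean()
corr = (anom * p[:, None, None]).sum(axis=0) / (
    np.sqrt((anom ** 2).sum(axis=0)) * np.sqrt((p ** 2).sum())
)

# The analyst then draws a polygon over the high-correlation region and
# averages it to form a predictor time series
predictor = sst[:, 2:5, 3:6].mean(axis=(1, 2))
print(corr[3, 4], np.corrcoef(predictor, predictand)[0, 1])
```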

  2. Development of a Robust Identifier for NPPs Transients Combining ARIMA Model and EBP Algorithm

    NASA Astrophysics Data System (ADS)

    Moshkbar-Bakhshayesh, Khalil; Ghofrani, Mohammad B.

    2014-08-01

    This study introduces a novel identification method for recognition of nuclear power plant (NPP) transients by combining the autoregressive integrated moving-average (ARIMA) model and a neural network with the error backpropagation (EBP) learning algorithm. The proposed method consists of three steps. First, an EBP-based identifier is adopted to distinguish the plant's normal states from faulty ones. In the second step, ARIMA models use the integration (I) step to convert non-stationary data of the selected variables into stationary data. Subsequently, ARIMA processes, including autoregressive (AR), moving-average (MA), or autoregressive moving-average (ARMA) models, are used to forecast time series of the selected plant variables. In the third step, to identify the type of transient, the forecasted time series are fed to the modular identifier, which has been developed using the latest advances in the EBP learning algorithm. Bushehr nuclear power plant (BNPP) transients are probed to analyze the ability of the proposed identifier. Recognition of a transient is based on the similarity of its statistical properties to the reference one, rather than on the values of the input patterns. Greater robustness against noisy data and an improved balance between memorization and generalization are salient advantages of the proposed identifier. Reduction of false identifications, sole dependency of identification on the sign of each output signal, selection of plant variables for transient training independently of each other, and extendibility to identification of more transients without unfavorable effects are other merits of the proposed identifier.

  3. Stochastic model search with binary outcomes for genome-wide association studies.

    PubMed

    Russu, Alberto; Malovini, Alberto; Puca, Annibale A; Bellazzi, Riccardo

    2012-06-01

    The spread of case-control genome-wide association studies (GWASs) has stimulated the development of new variable selection methods and predictive models. We introduce a novel Bayesian model search algorithm, Binary Outcome Stochastic Search (BOSS), which addresses the model selection problem when the number of predictors far exceeds the number of binary responses. Our method is based on a latent variable model that links the observed outcomes to the underlying genetic variables. A Markov Chain Monte Carlo approach is used for model search and to evaluate the posterior probability of each predictor. BOSS is compared with three established methods (stepwise regression, logistic lasso, and elastic net) in a simulated benchmark. Two real case studies are also investigated: a GWAS on the genetic bases of longevity, and the type 2 diabetes study from the Wellcome Trust Case Control Consortium. Simulations show that BOSS achieves higher precisions than the reference methods while preserving good recall rates. In both experimental studies, BOSS successfully detects genetic polymorphisms previously reported to be associated with the analyzed phenotypes. BOSS outperforms the other methods in terms of F-measure on simulated data. In the two real studies, BOSS successfully detects biologically relevant features, some of which are missed by univariate analysis and the three reference techniques. The proposed algorithm is an advance in the methodology for model selection with a large number of features. Our simulated and experimental results showed that BOSS proves effective in detecting relevant markers while providing a parsimonious model.

  4. Variability in size-selective mortality obscures the importance of larval traits to recruitment success in a temperate marine fish.

    PubMed

    Murphy, Hannah M; Warren-Myers, Fletcher W; Jenkins, Gregory P; Hamer, Paul A; Swearer, Stephen E

    2014-08-01

    In fishes, the growth-mortality hypothesis has received broad acceptance as a driver of recruitment variability. Recruitment is likely to be lower in years when the risk of starvation and predation in the larval stage is greater, leading to higher mortality. Juvenile snapper, Pagrus auratus (Sparidae), experience high recruitment variation in Port Phillip Bay, Australia. Using a 5-year (2005, 2007, 2008, 2010, 2011) data set of larval and juvenile snapper abundances and their daily growth histories, based on otolith microstructure, we found selective mortality acted on larval size at 5 days post-hatch in 4 low and average recruitment years. The highest recruitment year (2005) was characterised by no size-selective mortality. Larval growth of the initial larval population was related to recruitment, but larval growth of the juveniles was not. Selective mortality may have obscured the relationship between larval traits of the juveniles and recruitment as fast-growing and large larvae preferentially survived in lower recruitment years and fast growth was ubiquitous in high recruitment years. An index of daily mortality within and among 3 years (2007, 2008, 2010), where zooplankton were concurrently sampled with ichthyoplankton, was related to per capita availability of preferred larval prey, providing support for the match-mismatch hypothesis. In 2010, periods of low daily mortality resulted in no selective mortality. Thus both intra- and inter-annual variability in the magnitude and occurrence of selective mortality in species with complex life cycles can obscure relationships between larval traits and population replenishment, leading to underestimation of their importance in recruitment studies.

  5. Wetland selection by breeding and foraging black terns in the Prairie Pothole Region of the United States

    USGS Publications Warehouse

    Steen, Valerie A.; Powell, Abby N.

    2012-01-01

    We examined wetland selection by the Black Tern (Chlidonias niger), a species that breeds primarily in the prairie pothole region, has experienced population declines, and is difficult to manage because of low site fidelity. To characterize its selection of wetlands in this region, we surveyed 589 wetlands throughout North and South Dakota. We documented breeding at 5% and foraging at 17% of wetlands. We created predictive habitat models with a machine-learning algorithm, Random Forests, to explore the relative role of local wetland characteristics and those of the surrounding landscape and to evaluate which characteristics were important to predicting breeding versus foraging. We also examined area-dependent wetland selection while addressing the passive sampling bias by replacing occurrence of terns in the models with an index of density. Local wetland variables were more important than landscape variables in predictions of occurrence of breeding and foraging. Wetland size was more important to prediction of foraging than of breeding locations, while floating matted vegetation was more important to prediction of breeding than of foraging locations. The amount of seasonal wetland in the landscape was the only landscape variable important to prediction of both foraging and breeding. Models based on a density index indicated that wetland selection by foraging terns may be more area dependent than that by breeding terns. Our study provides some of the first evidence for differential breeding and foraging wetland selection by Black Terns and for a more limited role of landscape effects and area sensitivity than has been previously shown.
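
The variable-importance comparison at the heart of this analysis can be sketched with scikit-learn's Random Forests. Toy wetland data with assumed drivers (the variable names and effects are illustrative, not the study's dataset):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(6)
n = 500
# Toy data: occupancy driven by local wetland characteristics, not the landscape variable
size = rng.exponential(2.0, n)          # local: wetland area
veg = rng.random(n)                     # local: floating matted vegetation cover
landscape = rng.normal(size=n)          # landscape: unrelated in this toy example
occupied = (size + 3 * veg + rng.normal(0, 0.5, n) > 3.5).astype(int)

X = np.column_stack([size, veg, landscape])
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, occupied)
importances = rf.feature_importances_   # relative importance of local vs landscape variables
print(importances)
```

Ranking `feature_importances_` this way is how one would conclude, as the study did, that local wetland variables outrank landscape variables.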

  6. Framework for making better predictions by directly estimating variables’ predictivity

    PubMed Central

    Chernoff, Herman; Lo, Shaw-Hwa

    2016-01-01

    We propose approaching prediction from a framework grounded in the theoretical correct prediction rate of a variable set as a parameter of interest. This framework allows us to define a measure of predictivity that enables assessing variable sets for, preferably high, predictivity. We first define the prediction rate for a variable set and consider, and ultimately reject, the naive estimator, a statistic based on the observed sample data, due to its inflated bias for moderate sample sizes and its sensitivity to noisy, useless variables. We demonstrate that the I-score of the partition retention (PR) method of variable selection yields a relatively unbiased estimate of a parameter that is not sensitive to noisy variables and is a lower bound on the parameter of interest. Thus, the PR method using the I-score provides an effective approach to selecting highly predictive variables. We offer simulations and an application of the I-score on real data to demonstrate the statistic's predictive performance on sample data. We conjecture that using partition retention and the I-score can aid in finding variable sets with promising prediction rates; however, further research on sample-based measures of predictivity is much desired. PMID:27911830
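
A hedged numpy sketch of the partition-retention idea (one published form of the I-score; this is an illustration on toy data, not the authors' reference code): partition the samples by the joint cell of a discrete variable set and aggregate squared deviations of cell means from the overall mean, weighted by squared cell sizes. Jointly predictive variable sets score high even when each variable is marginally useless.

```python
import numpy as np

def i_score(y, X_discrete):
    """I-score of a discrete variable set: partition samples by the joint
    cell of X and aggregate squared deviations of cell means from the
    overall mean, weighted by squared cell counts."""
    y = np.asarray(y, dtype=float)
    cells = [tuple(row) for row in np.asarray(X_discrete)]
    ybar, n = y.mean(), len(y)
    score = 0.0
    for cell in set(cells):
        idx = [i for i, c in enumerate(cells) if c == cell]
        nj = len(idx)
        score += nj ** 2 * (y[idx].mean() - ybar) ** 2
    return score / (n * y.var())

rng = np.random.default_rng(7)
n = 1000
x1, x2, noise = rng.integers(0, 2, n), rng.integers(0, 2, n), rng.integers(0, 2, n)
y = (x1 ^ x2).astype(float)  # XOR outcome: jointly predictive, marginally useless

informative = i_score(y, np.column_stack([x1, x2]))
useless = i_score(y, noise[:, None])
print(informative, useless)
```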

  7. An analysis of the relationship of seven selected variables to State Board Test Pool Examination performance of the University of Tennessee, Knoxville, College of Nursing.

    PubMed

    Sharp, T G

    1984-02-01

    The study was designed to determine whether any one of seven selected variables, or a combination of them, is predictive of performance on the State Board Test Pool Examination (SBTPE). The selected variables studied were: high school grade point average (HSGPA); The University of Tennessee, Knoxville, College of Nursing grade point average (GPA); and American College Test Assessment (ACT) standard scores (English, ENG; mathematics, MA; social studies, SS; natural sciences, NSC; composite, COMP). Data were from graduates of the baccalaureate program of The University of Tennessee, Knoxville, College of Nursing from 1974 through 1979. The sample of 322 was selected from a total population of 572. A Statistical Analysis System (SAS) program was designed to analyze the predictive relationship of each of the seven selected variables to SBTPE performance (pass or fail); a stepwise discriminant analysis was designed to determine the predictive relationship of the strongest combination of the independent variables to overall SBTPE performance (pass or fail); and a stepwise multiple regression analysis was designed to determine the strongest predictive combination of selected variables for each of the five subexams of the SBTPE. Each of the selected variables was found to be predictive of SBTPE performance (pass or fail). The strongest combination for predicting SBTPE performance was found to be GPA, MA, and NSC.

  8. Space-Time Joint Interference Cancellation Using Fuzzy-Inference-Based Adaptive Filtering Techniques in Frequency-Selective Multipath Channels

    NASA Astrophysics Data System (ADS)

    Hu, Chia-Chang; Lin, Hsuan-Yu; Chen, Yu-Fan; Wen, Jyh-Horng

    2006-12-01

    An adaptive minimum mean-square error (MMSE) array receiver based on the fuzzy-logic recursive least-squares (RLS) algorithm is developed for asynchronous DS-CDMA interference suppression in the presence of frequency-selective multipath fading. This receiver employs a fuzzy-logic control mechanism to perform the nonlinear mapping of the squared error and the squared error variation, denoted by (e^2, Δe^2), into a forgetting factor λ. For real-time applicability, a computationally efficient version of the proposed receiver is derived based on the least-mean-square (LMS) algorithm using a fuzzy-inference-controlled step size μ. This receiver is capable of providing both fast convergence/tracking capability and small steady-state misadjustment compared with conventional LMS- and RLS-based MMSE DS-CDMA receivers. Simulations show that the fuzzy-logic LMS and RLS algorithms outperform, respectively, other variable step-size LMS (VSS-LMS) and variable forgetting factor RLS (VFF-RLS) algorithms by at least 3 dB and 1.5 dB in bit-error-rate (BER) performance for multipath fading channels.

  9. [Cephalometric analysis in cases with Class III malocclusions].

    PubMed

    Rak, D

    1989-01-01

    Various orthodontic class III anomalies, classified into several experimental groups, and eugnathic occlusions serving as controls were studied by roentgencephalometry. The objective of the study was to detect possible distinctions in the quantitative values of the chosen variables and to select the variables that most significantly discriminate the group of class III orthodontic anomalies. Attempts were also made to ascertain whether there were sex-related differences. The teleroentgenograms of 269 examinees, aged 10-18 years, of both sexes were analyzed. The experimental group consisted of 89 examinees with class III orthodontic anomalies. The control group consisted of 180 examinees with eugnathic occlusion. Latero-lateral skull roentgenograms were taken observing the rules of roentgencephalometry. Using acetate paper, drawings of the profile teleroentgenograms were elaborated and the reference points and lines were entered. A total of 38 variables were analyzed: 10 linear, 19 angular, and 8 obtained by mathematical calculation; the age variable was also analyzed. An electronic computer was used for the statistical analyses. The results are presented in tables and graphs.
    The results obtained showed that, compared to the findings in the control group, the subjects in the experimental group displayed significant changes in the following craniofacial characteristics: a negative difference in the position of the apical bases of the jaws, manifest concavity of the osseous profile and diminished convexity of the soft-tissue profile, retroinclination of the lower incisors, mandibular prognathism, an increased mandibular angle, and increased mandibular proportions relative to the maxilla and the anterior cranial base. With regard to the sex of the examinees, only four linear variables of significantly discriminating character were selected, so it can be concluded that there were no significant sex differences among the morphological characteristics of the viscerocranium.

  10. Estimation of selection intensity under overdominance by Bayesian methods.

    PubMed

    Buzbas, Erkan Ozge; Joyce, Paul; Abdo, Zaid

    2009-01-01

    A balanced pattern in the allele frequencies of polymorphic loci is a potential sign of selection, particularly of overdominance. Although this type of selection is of some interest in population genetics, there exist no likelihood-based approaches specifically tailored to make inference on selection intensity. To fill this gap, we present Bayesian methods to estimate selection intensity under k-allele models with overdominance. Our model allows for an arbitrary number of loci and alleles within a locus. The neutral and selected variability within each locus are modeled with corresponding k-allele models. To estimate the posterior distribution of the mean selection intensity in a multilocus region, a hierarchical setup between loci is used. The methods are demonstrated with data at the Human Leukocyte Antigen loci from world-wide populations.

  11. A non-linear data mining parameter selection algorithm for continuous variables

    PubMed Central

    Razavi, Marianne; Brady, Sean

    2017-01-01

    In this article, we propose a new data mining algorithm by which one can both capture the non-linearity in data and find the best subset model. To produce an enhanced subset of the original variables, a preferred selection method should have the potential of adding a supplementary level of regression analysis that captures complex relationships in the data via mathematical transformation of the predictors and exploration of synergistic effects of combined variables. The method that we present here has the potential to produce an optimal subset of variables, rendering the overall process of model selection more efficient. The algorithm introduces interpretable parameters by transforming the original inputs and also provides a faithful fit to the data. The core objective of this paper is to introduce a new estimation technique for the classical least squares regression framework. This new automatic variable transformation and model selection method can offer an optimal and stable model that minimizes the mean square error and variability, while combining all possible subset selection methodology with the inclusion of variable transformations and interactions. Moreover, this method controls multicollinearity, leading to an optimal set of explanatory variables. PMID:29131829
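
    The transform-then-select idea described in this record can be sketched as an exhaustive search over transformed predictors scored by held-out error. The candidate transform set and the synthetic data below are illustrative assumptions, not the authors' algorithm:

```python
# Sketch of transform-then-select regression: expand each predictor with a few
# candidate non-linear transforms, then pick the subset minimising held-out MSE.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.uniform(0.1, 2.0, n), rng.uniform(0.1, 2.0, n)
y = 1.5 * np.log(x1) + 0.8 * x2**2 + rng.normal(0, 0.05, n)

# Candidate features: originals, logs, squares, and an interaction term.
features = {
    "x1": x1, "x2": x2,
    "log(x1)": np.log(x1), "log(x2)": np.log(x2),
    "x1^2": x1**2, "x2^2": x2**2,
    "x1*x2": x1 * x2,
}

def holdout_mse(cols):
    """MSE of an OLS fit, trained on the first half and scored on the second."""
    X = np.column_stack([features[c] for c in cols] + [np.ones(n)])
    tr, te = slice(0, n // 2), slice(n // 2, n)
    beta, *_ = np.linalg.lstsq(X[tr], y[tr], rcond=None)
    return float(np.mean((X[te] @ beta - y[te]) ** 2))

best = min(
    (combo for k in (1, 2, 3) for combo in itertools.combinations(features, k)),
    key=holdout_mse,
)
print(best)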

  12. Datalist: A Value Added Service to Enable Easy Data Selection

    NASA Technical Reports Server (NTRS)

    Li, Angela; Hegde, Mahabaleshwa; Bryant, Keith; Seiler, Edward; Shie, Chung-Lin; Teng, William; Liu, Zhong; Hearty, Thomas; Shen, Suhung; Kempler, Steven

    2016-01-01

    Imagine a user wanting to study hurricane events. This could involve searching and downloading multiple data variables from multiple data sets. The currently available services from the Goddard Earth Sciences Data and Information Services Center (GES DISC) only allow the user to select one data set at a time. The GES DISC started a Data List initiative, in order to enable users to easily select multiple data variables. A Data List is a collection of predefined or user-defined data variables from one or more archived data sets. Target users of Data Lists include science teams, individual science researchers, application users, and educational users. Data Lists are more than just data. Data Lists effectively provide users with a sophisticated integrated data and services package, including metadata, citation, documentation, visualization, and data-specific services, all available from one-stop shopping. Data Lists are created based on the software architecture of the GES DISC Unified User Interface (UUI). The Data List service is completely data-driven, and a Data List is treated just as any other data set. The predefined Data Lists, created by the experienced GES DISC science support team, should save a significant amount of time that users would otherwise have to spend.

  13. Investigating the Effect of Recruitment Variability on Length-Based Recruitment Indices for Antarctic Krill Using an Individual-Based Population Dynamics Model

    PubMed Central

    Thanassekos, Stéphane; Cox, Martin J.; Reid, Keith

    2014-01-01

    Antarctic krill (Euphausia superba; herein krill) is monitored as part of an on-going fisheries observer program that collects length-frequency data. A krill feedback management programme is currently being developed, and as part of this development, the utility of data-derived indices describing population level processes is being assessed. To date, however, little work has been carried out on the selection of optimum recruitment indices and it has not been possible to assess the performance of length-based recruitment indices across a range of recruitment variability. Neither has there been an assessment of uncertainty in the relationship between an index and the actual level of recruitment. Thus, until now, it has not been possible to take into account recruitment index uncertainty in krill stock management or when investigating relationships between recruitment and environmental drivers. Using length-frequency samples from a simulated population – where recruitment is known – the performance of six potential length-based recruitment indices is assessed, by exploring the index-to-recruitment relationship under increasing levels of recruitment variability (from ±10% to ±100% around a mean annual recruitment). The annual minimum of the proportion of individuals smaller than 40 mm (F40min, %) was selected because it had the most robust index-to-recruitment relationship across differing levels of recruitment variability. The relationship was curvilinear and best described by a power law. Model uncertainty was described using the 95% prediction intervals, which were used to calculate coverage probabilities and assess model performance. Despite being the optimum recruitment index, the performance of F40min degraded under high (>50%) recruitment variability. Due to the persistence of cohorts in the population over several years, the inclusion of F40min values from preceding years in the relationship used to estimate recruitment in a given year improved its accuracy (mean bias reduction of 8.3% when including three F40min values under a recruitment variability of 60%). PMID:25470296
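
    The power-law form of the index-to-recruitment relationship reported above can be fitted by ordinary least squares after a log transform. The data here are synthetic stand-ins, not the krill simulation output:

```python
# Fitting a power law y = a * x**b by linear least squares in log space.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(1.0, 50.0, 100)                   # recruitment-index values
y = 2.0 * x**0.7 * rng.lognormal(0.0, 0.05, 100)  # true a = 2.0, b = 0.7, plus noise

# log y = log a + b log x  ->  ordinary least squares on the transformed data
A = np.column_stack([np.ones_like(x), np.log(x)])
(log_a, b), *_ = np.linalg.lstsq(A, np.log(y), rcond=None)
a = float(np.exp(log_a))
print(f"a = {a:.2f}, b = {b:.2f}")
```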

  14. Strategies for soil-based precision agriculture in cotton

    NASA Astrophysics Data System (ADS)

    Neely, Haly L.; Morgan, Cristine L. S.; Stanislav, Scott; Rouze, Gregory; Shi, Yeyin; Thomasson, J. Alex; Valasek, John; Olsenholler, Jeff

    2016-05-01

    The goal of precision agriculture is to increase crop yield while maximizing the use efficiency of farm resources. In this application, UAV-based systems are presenting agricultural researchers with an opportunity to study crop response to environmental and management factors in real-time without disturbing the crop. The spatial variability of soil properties, which drive crop yield and quality, cannot be changed, and thus keen agronomic choices made with soil variability in mind have the potential to increase profits. Additionally, measuring crop stress over time and in response to management and environmental conditions may enable agronomists and plant breeders to make more informed decisions about variety selection than the traditional end-of-season yield and quality measurements. In a previous study, seed-cotton yield was measured over 4 years and compared with soil variability as mapped by a proximal soil sensor. It was found that soil properties had a significant effect on seed-cotton yield and the effect was not consistent across years due to different precipitation conditions. However, when seed-cotton yield was compared to the normalized difference vegetation index (NDVI), as measured using a multispectral camera from a UAV, predictions improved. Further improvement was seen when soil-only pixels were removed from the analysis. On-going studies are using UAV-based data to uncover the thresholds for stress and yield potential. Long-term goals of this research include detecting stress before yield is reduced and selecting better adapted varieties.

  15. Photoswitchable carbohydrate-based fluorosurfactants as tuneable ice recrystallization inhibitors.

    PubMed

    Adam, Madeleine K; Hu, Yingxue; Poisson, Jessica S; Pottage, Matthew J; Ben, Robert N; Wilkinson, Brendan L

    2017-02-01

    Cryopreservation is an important technique employed for the storage and preservation of biological tissues and cells. The limited effectiveness and significant toxicity of conventionally-used cryoprotectants, such as DMSO, have prompted efforts toward the rational design of less toxic alternatives, including carbohydrate-based surfactants. In this paper, we report the modular synthesis and ice recrystallization inhibition (IRI) activity of a library of variably substituted, carbohydrate-based fluorosurfactants. Carbohydrate-based fluorosurfactants possessed a variable mono- or disaccharide head group appended to a hydrophobic fluoroalkyl-substituted azobenzene tail group. Light-addressable fluorosurfactants displayed weak-to-moderate IRI activity that could be tuned through selection of carbohydrate head group, position of the trifluoroalkyl group on the azobenzene ring, and isomeric state of the azobenzene tail fragment. Copyright © 2016 Elsevier Ltd. All rights reserved.

  16. High degree of genetic differentiation in marine three-spined sticklebacks (Gasterosteus aculeatus).

    PubMed

    Defaveri, Jacquelin; Shikano, Takahito; Shimada, Yukinori; Merilä, Juha

    2013-09-01

    Populations of widespread marine organisms are typically characterized by a low degree of genetic differentiation in neutral genetic markers, but much less is known about differentiation in genes whose functional roles are associated with specific selection regimes. To uncover possible adaptive population divergence and heterogeneous genomic differentiation in marine three-spined sticklebacks (Gasterosteus aculeatus), we used a candidate gene-based genome-scan approach to analyse variability in 138 microsatellite loci located within/close to (<6 kb) functionally important genes in samples collected from ten geographic locations. The degree of genetic differentiation in markers classified as neutral or under balancing selection-as determined with several outlier detection methods-was low (FST = 0.033 or 0.011, respectively), whereas average FST for directionally selected markers was significantly higher (FST = 0.097). Clustering analyses provided support for genomic and geographic heterogeneity in selection: six genetic clusters were identified based on allele frequency differences in the directionally selected loci, whereas four were identified with the neutral loci. Allelic variation in several loci exhibited significant associations with environmental variables, supporting the conjecture that temperature and salinity, but not optic conditions, are important drivers of adaptive divergence among populations. In general, these results suggest that in spite of the high degree of physical connectivity and gene flow as inferred from neutral marker genes, marine stickleback populations are strongly genetically structured in loci associated with functionally relevant genes. © 2013 John Wiley & Sons Ltd.

  17. Mendelian randomization with fine-mapped genetic data: Choosing from large numbers of correlated instrumental variables.

    PubMed

    Burgess, Stephen; Zuber, Verena; Valdes-Marquez, Elsa; Sun, Benjamin B; Hopewell, Jemma C

    2017-12-01

    Mendelian randomization uses genetic variants to make causal inferences about the effect of a risk factor on an outcome. With fine-mapped genetic data, there may be hundreds of genetic variants in a single gene region any of which could be used to assess this causal relationship. However, using too many genetic variants in the analysis can lead to spurious estimates and inflated Type 1 error rates. But if only a few genetic variants are used, then the majority of the data is ignored and estimates are highly sensitive to the particular choice of variants. We propose an approach based on summarized data only (genetic association and correlation estimates) that uses principal components analysis to form instruments. This approach has desirable theoretical properties: it takes the totality of data into account and does not suffer from numerical instabilities. It also has good properties in simulation studies: it is not particularly sensitive to varying the genetic variants included in the analysis or the genetic correlation matrix, and it does not have greatly inflated Type 1 error rates. Overall, the method gives estimates that are less precise than those from variable selection approaches (such as using a conditional analysis or pruning approach to select variants), but are more robust to seemingly arbitrary choices in the variable selection step. Methods are illustrated by an example using genetic associations with testosterone for 320 genetic variants to assess the effect of sex hormone related pathways on coronary artery disease risk, in which variable selection approaches give inconsistent inferences. © 2017 The Authors Genetic Epidemiology Published by Wiley Periodicals, Inc.
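
    The principal-components construction of instruments from summarized data can be sketched as follows; the correlation structure here is a synthetic LD pattern, and the precise weighting used by the authors may differ:

```python
# Dimension reduction for correlated instruments: principal components of the
# variant correlation matrix, with per-variant association estimates projected
# onto the retained components.
import numpy as np

rng = np.random.default_rng(2)
p = 50                                            # variants in one gene region
idx = np.arange(p)
rho = 0.9 ** np.abs(idx[:, None] - idx[None, :])  # synthetic LD: AR(1)-style decay
beta = rng.multivariate_normal(np.zeros(p), rho)  # correlated association estimates

vals, vecs = np.linalg.eigh(rho)                  # eigh: rho is symmetric
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]
k = int(np.searchsorted(np.cumsum(vals) / vals.sum(), 0.99)) + 1  # 99% of variance

scores = vecs[:, :k].T @ beta                     # k near-uncorrelated instruments
print(f"{k} components retained out of {p} variants")
```

Because the components diagonalize the correlation matrix, the resulting instruments avoid the numerical instabilities that arise when hundreds of highly correlated variants are used directly.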

  18. [Study on correction of data bias caused by different missing mechanisms in survey of medical expenditure among students enrolling in Urban Resident Basic Medical Insurance].

    PubMed

    Zhang, Haixia; Zhao, Junkang; Gu, Caijiao; Cui, Yan; Rong, Huiying; Meng, Fanlong; Wang, Tong

    2015-05-01

    A study of medical expenditure and its influencing factors among students enrolling in Urban Resident Basic Medical Insurance (URBMI) in Taiyuan indicated that non-response bias and selection bias coexist in the dependent variable of the survey data. Unlike previous studies that focused on only one missing mechanism, this study suggests a two-stage method that deals with both missing mechanisms simultaneously by combining multiple imputation with a sample selection model. A total of 1 190 questionnaires were returned by the students (or their parents) selected in child care settings, schools and universities in Taiyuan by stratified cluster random sampling in 2012. In the returned questionnaires, 2.52% of the dependent-variable values were not missing at random (NMAR) and 7.14% were missing at random (MAR). First, multiple imputation was conducted for the MAR values by using the completed data; then a sample selection model was used to correct the NMAR values in the multiple imputation, and a multi-factor analysis model was established. Based on 1 000 resamplings, the best scheme for filling the random missing values at this missing proportion was the predictive mean matching (PMM) method. With this optimal scheme, a two-stage survey was conducted. Finally, it was found that the influencing factors on annual medical expenditure among the students enrolling in URBMI in Taiyuan included population group, annual household gross income, affordability of medical insurance expenditure, chronic disease, seeking medical care in hospital, seeking medical care in community health center or private clinic, hospitalization, hospitalization canceled due to certain reasons, self-medication and acceptable proportion of self-paid medical expenditure. The two-stage method combining multiple imputation with a sample selection model can deal effectively with non-response bias and selection bias in the dependent variable of survey data.
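
    Predictive mean matching, the imputation scheme found optimal above, can be sketched as follows. Full PMM draws a donor at random from several nearest neighbours (and redraws the regression coefficients); this simplified version takes the single nearest donor, and the data are synthetic:

```python
# Predictive mean matching (PMM), simplified to a single nearest donor.
# Synthetic data; the study additionally combined PMM with a sample
# selection model to handle the not-missing-at-random cases.
import numpy as np

rng = np.random.default_rng(8)
n = 120
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(0, 0.5, n)
miss = rng.random(n) < 0.15                     # ~15% missing at random
y_obs = np.where(miss, np.nan, y)

# Fit the imputation model on observed cases, predict a mean for every case.
A = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(A[~miss], y_obs[~miss], rcond=None)
mu = A @ beta

# Each missing case borrows the observed value with the closest predicted mean.
obs_idx = np.flatnonzero(~miss)
filled = y_obs.copy()
for i in np.flatnonzero(miss):
    donor = obs_idx[np.argmin(np.abs(mu[obs_idx] - mu[i]))]
    filled[i] = y_obs[donor]
print("imputed", int(miss.sum()), "missing values")
```

Because imputed values are always real observed donations rather than model predictions, PMM preserves the empirical distribution of the variable being filled.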

  19. Epithelial–mesenchymal transition biomarkers and support vector machine guided model in preoperatively predicting regional lymph node metastasis for rectal cancer

    PubMed Central

    Fan, X-J; Wan, X-B; Huang, Y; Cai, H-M; Fu, X-H; Yang, Z-L; Chen, D-K; Song, S-X; Wu, P-H; Liu, Q; Wang, L; Wang, J-P

    2012-01-01

    Background: Current imaging modalities are inadequate in preoperatively predicting regional lymph node metastasis (RLNM) status in rectal cancer (RC). Here, we designed a support vector machine (SVM) model to address this issue by integrating epithelial–mesenchymal-transition (EMT)-related biomarkers along with clinicopathological variables. Methods: Using tissue microarrays and immunohistochemistry, the expression of EMT-related biomarkers was measured in 193 RC patients. Of these, 74 patients were assigned to the training set to select the robust variables for designing the SVM model. The predictive value of the SVM model was validated in the testing set (119 patients). Results: In the training set, eight variables, including six EMT-related biomarkers and two clinicopathological variables, were selected to devise the SVM model. In the testing set, we identified 63 patients at high risk of RLNM and 56 patients at low risk. The sensitivity, specificity and overall accuracy of the SVM in predicting RLNM were 68.3%, 81.1% and 72.3%, respectively. Importantly, multivariate logistic regression analysis showed that the SVM model was indeed an independent predictor of RLNM status (odds ratio, 11.536; 95% confidence interval, 4.113–32.361; P<0.0001). Conclusion: Our SVM-based model displayed moderately strong predictive power in defining RLNM status in RC patients, providing an important approach to selecting the RLNM high-risk subgroup for neoadjuvant chemoradiotherapy. PMID:22538975
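
    The reported sensitivity, specificity and overall accuracy follow directly from the binary confusion matrix; a minimal helper (the counts below are illustrative, not the study's data):

```python
# Sensitivity, specificity and overall accuracy from a binary confusion matrix.
def diagnostics(tp, fn, tn, fp):
    sensitivity = tp / (tp + fn)                 # true-positive rate
    specificity = tn / (tn + fp)                 # true-negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Illustrative counts only, not the study's patient data.
sens, spec, acc = diagnostics(tp=41, fn=19, tn=43, fp=10)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```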

  20. Various Approaches for Targeting Quasar Candidates

    NASA Astrophysics Data System (ADS)

    Zhang, Y.; Zhao, Y.

    2015-09-01

    With the establishment and development of space-based and ground-based observational facilities, improving the scientific output of high-cost facilities remains a pressing issue for astronomers. The discovery of new and rare quasars attracts much attention, and many different methods of selecting quasar candidates have been developed. Among them, some are based on color cuts, some on multiwavelength data, some rely on the variability of quasars, some are based on data mining, and some depend on ensemble methods.

  1. Comparison of two methods for estimating base flow in selected reaches of the South Platte River, Colorado

    USGS Publications Warehouse

    Capesius, Joseph P.; Arnold, L. Rick

    2012-01-01

    The Mass Balance results were so variable over time that they appeared inconsistent with the concept of groundwater flow as gradual and slow. The large degree of variability in the day-to-day and month-to-month Mass Balance results is likely the result of many factors. These factors could include ungaged stream inflows or outflows, short-term streamflow losses to and gains from temporary bank storage, and any lag in streamflow accounting owing to the lag time of flow within a reach. The Pilot Point time-series results were much less variable than the Mass Balance results, and extreme values were effectively constrained. Less day-to-day variability, smaller-magnitude extreme values, and smoother transitions in base-flow estimates provided by the Pilot Point method are more consistent with a conceptual model of groundwater flow as gradual and slow. The Pilot Point method thus provided a better fit to the conceptual model of groundwater flow and appeared to provide reasonable estimates of base flow.

  2. Variable-intercept panel model for deformation zoning of a super-high arch dam.

    PubMed

    Shi, Zhongwen; Gu, Chongshi; Qin, Dong

    2016-01-01

    This study determines dam deformation similarity indexes based on an analysis of deformation zoning features and panel data clustering theory, with comprehensive consideration to the actual deformation law of super-high arch dams and the spatial-temporal features of dam deformation. Measurement methods of these indexes are studied. Based on the established deformation similarity criteria, the principle used to determine the number of dam deformation zones is constructed through entropy weight method. This study proposes the deformation zoning method for super-high arch dams and the implementation steps, analyzes the effect of special influencing factors of different dam zones on the deformation, introduces dummy variables that represent the special effect of dam deformation, and establishes a variable-intercept panel model for deformation zoning of super-high arch dams. Based on different patterns of the special effect in the variable-intercept panel model, two panel analysis models were established to monitor fixed and random effects of dam deformation. Hausman test method of model selection and model effectiveness assessment method are discussed. Finally, the effectiveness of established models is verified through a case study.
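
    The entropy weight method used above to weight the similarity indexes can be sketched as follows; the indicator matrix is illustrative, not the dam data:

```python
# Entropy weight method: indicators whose values vary more across zones carry
# lower entropy and therefore receive higher weight. The matrix is illustrative.
import numpy as np

# rows: monitoring zones, columns: deformation-similarity indicators
data = np.array([
    [0.62, 0.85, 0.30],
    [0.55, 0.80, 0.45],
    [0.90, 0.20, 0.60],
    [0.70, 0.60, 0.45],
])

P = data / data.sum(axis=0)                     # normalise each indicator column
k = 1.0 / np.log(len(data))
entropy = -k * np.sum(P * np.log(P), axis=0)    # column entropies in [0, 1]
weights = (1 - entropy) / np.sum(1 - entropy)   # low entropy -> high weight
print(weights.round(3))
```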

  3. A general procedure to generate models for urban environmental-noise pollution using feature selection and machine learning methods.

    PubMed

    Torija, Antonio J; Ruiz, Diego P

    2015-02-01

    The prediction of environmental noise in urban environments requires the solution of a complex and non-linear problem, since there are complex relationships among the multitude of variables involved in the characterization and modelling of environmental noise and environmental-noise magnitudes. Moreover, the inclusion of the great spatial heterogeneity characteristic of urban environments seems to be essential in order to achieve an accurate environmental-noise prediction in cities. This problem is addressed in this paper, where a procedure based on feature-selection techniques and machine-learning regression methods is proposed and applied to this environmental problem. Three machine-learning regression methods, which are considered very robust in solving non-linear problems, are used to estimate the energy-equivalent sound-pressure level descriptor (LAeq). These three methods are: (i) multilayer perceptron (MLP), (ii) sequential minimal optimisation (SMO), and (iii) Gaussian processes for regression (GPR). In addition, because of the high number of input variables involved in environmental-noise modelling and estimation in urban environments, which make LAeq prediction models quite complex and costly in terms of time and resources for application to real situations, three different techniques are used to approach feature selection or data reduction. The feature-selection techniques used are: (i) correlation-based feature-subset selection (CFS), (ii) wrapper for feature-subset selection (WFS), and the data reduction technique is principal-component analysis (PCA). The subsequent analysis leads to a proposal of different schemes, depending on the needs regarding data collection and accuracy. The use of WFS as the feature-selection technique with the implementation of SMO or GPR as regression algorithm provides the best LAeq estimation (R(2)=0.94 and mean absolute error (MAE)=1.14-1.16 dB(A)). Copyright © 2014 Elsevier B.V. All rights reserved.
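
    A wrapper feature-selection scheme such as WFS can be sketched as greedy forward selection scored by held-out error; plain least squares stands in here for SMO/GPR, and the data are synthetic:

```python
# Greedy forward feature selection "wrapped" around a regression model.
# Plain least squares stands in for SMO/GPR; the data are synthetic.
import numpy as np

rng = np.random.default_rng(3)
n, p = 300, 10
X = rng.normal(size=(n, p))
y = 3 * X[:, 0] - 2 * X[:, 4] + rng.normal(0, 0.5, n)  # only columns 0 and 4 matter

def holdout_mse(cols):
    Z = np.column_stack([X[:, cols], np.ones(n)])
    tr, te = slice(0, n // 2), slice(n // 2, n)
    beta, *_ = np.linalg.lstsq(Z[tr], y[tr], rcond=None)
    return float(np.mean((Z[te] @ beta - y[te]) ** 2))

selected, best_err = [], float("inf")
while len(selected) < p:
    err, j = min((holdout_mse(selected + [j]), j)
                 for j in range(p) if j not in selected)
    if err >= best_err:                 # stop once adding a feature stops helping
        break
    selected.append(j)
    best_err = err
print("selected columns:", selected)
```

Because the score is the regression model's own held-out error, the wrapper tailors the feature subset to the estimator it wraps, at the cost of refitting the model once per candidate feature.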

  4. Comparisons of Selected Student and Teacher Variables in All-Girls and Coeducational Physical Education Environments

    ERIC Educational Resources Information Center

    Derry, Julie A.; Phillips, D. Allen

    2004-01-01

    The purpose of this study was to investigate selected student and teacher variables and compare the differences between these variables for female students and female teachers in coeducation and single-sex physical education classes. Eighteen female teachers and intact classes were selected; 9 teachers from coeducation and 9 teachers from…

  5. Using dynamic population simulations to extend resource selection analyses and prioritize habitats for conservation

    USGS Publications Warehouse

    Heinrichs, Julie; Aldridge, Cameron L.; O'Donnell, Michael; Schumaker, Nathan

    2017-01-01

    Prioritizing habitats for conservation is a challenging task, particularly for species with fluctuating populations and seasonally dynamic habitat needs. Although the use of resource selection models to identify and prioritize habitat for conservation is increasingly common, their ability to characterize important long-term habitats for dynamic populations is variable. To examine how habitats might be prioritized differently if resource selection were directly and dynamically linked with population fluctuations and movement limitations among seasonal habitats, we constructed a spatially explicit individual-based model for a dramatically fluctuating population requiring temporally varying resources. Using greater sage-grouse (Centrocercus urophasianus) in Wyoming as a case study, we used resource selection function (RSF) maps to guide seasonal movement and habitat selection, but emergent population dynamics and simulated movement limitations modified long-term habitat occupancy. We compared priority habitats in RSF maps to long-term simulated habitat use. We examined the circumstances under which the explicit consideration of movement limitations, in combination with population fluctuations and trends, is likely to alter predictions of important habitats. In doing so, we assessed the future occupancy of protected areas under alternative population and habitat conditions. Habitat prioritizations based on resource selection models alone predicted high use in isolated parcels of habitat and in areas with low connectivity among seasonal habitats. In contrast, results based on more biologically informed simulations emphasized central and connected areas near high-density populations, areas sometimes predicted to have low selection value. Dynamic models of habitat use can provide additional biological realism that can extend, and in some cases contradict, habitat use predictions generated from short-term or static resource selection analyses.
The explicit inclusion of population dynamics and movement propensities via spatial simulation modeling frameworks may provide an informative means of predicting long-term habitat use, particularly for fluctuating populations with complex seasonal habitat needs. Importantly, our results indicate the possible need to consider habitat selection models as a starting point rather than the common end point for refining and prioritizing habitats for protection for cyclic and highly variable populations.

  6. Petroleomics by electrospray ionization FT-ICR mass spectrometry coupled to partial least squares with variable selection methods: prediction of the total acid number of crude oils.

    PubMed

    Terra, Luciana A; Filgueiras, Paulo R; Tose, Lílian V; Romão, Wanderson; de Souza, Douglas D; de Castro, Eustáquio V R; de Oliveira, Mirela S L; Dias, Júlio C M; Poppi, Ronei J

    2014-10-07

    Negative-ion mode electrospray ionization, ESI(-), with Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS) was coupled to Partial Least Squares (PLS) regression and variable selection methods to estimate the total acid number (TAN) of Brazilian crude oil samples. Generally, ESI(-)-FT-ICR mass spectra present a resolving power of ca. 500,000 and a mass accuracy better than 1 ppm, producing a data matrix containing over 5700 variables per sample. These variables correspond to heteroatom-containing species detected as deprotonated molecules, [M - H](-) ions, which are identified primarily as naphthenic acids, phenols and carbazole analog species. The TAN values for all samples ranged from 0.06 to 3.61 mg of KOH g(-1). To facilitate the spectral interpretation, three methods of variable selection were studied: variable importance in the projection (VIP), interval partial least squares (iPLS) and elimination of uninformative variables (UVE). The UVE method proved the most appropriate for selecting important variables, reducing the number of variables to 183 and producing a root mean square error of prediction of 0.32 mg of KOH g(-1). By reducing the size of the data, it was possible to relate the selected variables to their corresponding molecular formulas, thus identifying the main chemical species responsible for the TAN values.
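
    The UVE criterion, which keeps only variables whose regression coefficients are more reliable than those of appended random variables, can be sketched as follows. Real UVE works with PLS coefficients; jackknifed univariate slopes are used here purely to keep the illustration short, and the data are synthetic:

```python
# Uninformative variable elimination (UVE), sketched: append random "noise"
# columns, jackknife a coefficient for every column, and keep only real
# variables whose reliability |mean/std| beats the best noise column.
import numpy as np

rng = np.random.default_rng(4)
n, p = 80, 6
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] + 2.0 * X[:, 2] + rng.normal(0, 0.3, n)
Xa = np.hstack([X, rng.normal(size=(n, p))])     # columns p..2p-1 are pure noise

coefs = np.empty((n, 2 * p))
for i in range(n):                               # leave-one-out jackknife
    m = np.ones(n, bool)
    m[i] = False
    Xi, yi = Xa[m], y[m]
    coefs[i] = (Xi * yi[:, None]).sum(0) / (Xi**2).sum(0)  # univariate slopes

reliability = np.abs(coefs.mean(0) / coefs.std(0))
cutoff = reliability[p:].max()                   # best that any noise column does
keep = np.flatnonzero(reliability[:p] > cutoff)
print("kept real variables:", keep)
```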

  7. Active Traffic Management: Comprehension, Legibility, Distance, and Motorist Behavior In Response to Selected Variable Speed Limit and Lane Control Signing

    DOT National Transportation Integrated Search

    2016-06-01

    Active traffic management (ATM) incorporates a collection of strategies allowing the dynamic management of recurrent and nonrecurrent congestion based on prevailing traffic conditions. These strategies help to increase peak capacity, smooth traffic f...

  8. A Selective Overview of Variable Selection in High Dimensional Feature Space

    PubMed Central

    Fan, Jianqing

    2010-01-01

    High dimensional statistical problems arise from diverse fields of scientific research and technological development. Variable selection plays a pivotal role in contemporary statistical learning and scientific discoveries. The traditional idea of best subset selection methods, which can be regarded as a specific form of penalized likelihood, is computationally too expensive for many modern statistical applications. Other forms of penalized likelihood methods have been successfully developed over the last decade to cope with high dimensionality. They have been widely applied for simultaneously selecting important variables and estimating their effects in high dimensional statistical inference. In this article, we present a brief account of the recent developments of theory, methods, and implementations for high dimensional variable selection. Questions of what limits of dimensionality such methods can handle, what role penalty functions play, and what statistical properties they possess are rapidly driving advances in the field. The properties of non-concave penalized likelihood and its roles in high dimensional statistical modeling are emphasized. We also review some recent advances in ultra-high dimensional variable selection, with emphasis on independence screening and two-scale methods. PMID:21572976
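
    The penalized-likelihood idea can be illustrated with the L1 penalty (the lasso), solved by cyclic coordinate descent; the p >> n data below are synthetic:

```python
# Minimal lasso (L1-penalised least squares) via cyclic coordinate descent,
# the computational core of penalised-likelihood variable selection.
# Synthetic p >> n data: only a few true signals should survive.
import numpy as np

rng = np.random.default_rng(5)
n, p = 60, 200
X = rng.normal(size=(n, p))
X /= np.linalg.norm(X, axis=0)                  # unit-norm columns simplify updates
beta_true = np.zeros(p)
beta_true[[3, 17, 42]] = [4.0, -3.0, 5.0]
y = X @ beta_true + rng.normal(0, 0.1, n)

lam, beta = 0.5, np.zeros(p)
r = y.copy()                                    # residual, maintained incrementally
for _ in range(200):                            # cyclic coordinate-descent sweeps
    for j in range(p):
        rho = X[:, j] @ r + beta[j]             # valid because columns are unit-norm
        new = np.sign(rho) * max(abs(rho) - lam, 0.0)   # soft-thresholding
        r += X[:, j] * (beta[j] - new)
        beta[j] = new

support = np.flatnonzero(beta)
print("non-zero coefficients at:", support)
```

The soft-threshold step is what performs selection: coefficients whose marginal correlation with the residual falls below the penalty are set exactly to zero.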

  9. Escherichia coli bacteria density in relation to turbidity, streamflow characteristics, and season in the Chattahoochee River near Atlanta, Georgia, October 2000 through September 2008—Description, statistical analysis, and predictive modeling

    USGS Publications Warehouse

    Lawrence, Stephen J.

    2012-01-01

    Regression analyses show that E. coli density in samples was strongly related to turbidity, streamflow characteristics, and season at both sites. The regression equation chosen for the Norcross data showed that 78 percent of the variability in E. coli density (in log base 10 units) was explained by the variability in turbidity values (in log base 10 units), streamflow event (dry-weather flow or stormflow), season (cool or warm), and an interaction term that is the cross product of streamflow event and turbidity. The regression equation chosen for the Atlanta data showed that 76 percent of the variability in E. coli density (in log base 10 units) was explained by the variability in turbidity values (in log base 10 units), water temperature, streamflow event, and an interaction term that is the cross product of streamflow event and turbidity. Residual analysis and model confirmation using new data indicated the regression equations selected at both sites predicted E. coli density within the 90 percent prediction intervals of the equations and could be used to predict E. coli density in real time at both sites.
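
    The functional form of the cited equations, log10 density regressed on log10 turbidity, event and season indicators, and an event-by-turbidity interaction, can be fitted generically as below; the data are synthetic, not the Chattahoochee records:

```python
# OLS fit of log10(E. coli density) on log10(turbidity), a stormflow indicator,
# a warm-season indicator, and the stormflow-by-turbidity interaction.
# All data and coefficients are synthetic stand-ins.
import numpy as np

rng = np.random.default_rng(6)
n = 250
log_turb = rng.uniform(0.3, 2.5, n)             # log10 turbidity
storm = rng.integers(0, 2, n).astype(float)     # 1 = stormflow, 0 = dry weather
warm = rng.integers(0, 2, n).astype(float)      # 1 = warm season
log_ecoli = (0.8 + 1.1 * log_turb + 0.6 * storm + 0.3 * warm
             + 0.4 * storm * log_turb + rng.normal(0, 0.2, n))

A = np.column_stack([np.ones(n), log_turb, storm, warm, storm * log_turb])
coef, *_ = np.linalg.lstsq(A, log_ecoli, rcond=None)

pred = A @ coef
r2 = 1 - np.sum((log_ecoli - pred) ** 2) / np.sum((log_ecoli - log_ecoli.mean()) ** 2)
print(f"R^2 = {r2:.2f}")
```

The interaction column lets the turbidity slope differ between dry-weather flow and stormflow, which is how the cited equations allow the two regimes to behave differently.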

  10. Single-step fabrication of thin-film linear variable bandpass filters based on metal-insulator-metal geometry.

    PubMed

    Williams, Calum; Rughoobur, Girish; Flewitt, Andrew J; Wilkinson, Timothy D

    2016-11-10

    A single-step fabrication method is presented for ultra-thin, linearly variable optical bandpass filters (LVBFs) based on a metal-insulator-metal arrangement using modified evaporation deposition techniques. This alternate process methodology offers reduced complexity and cost in comparison to conventional techniques for fabricating LVBFs. We are able to achieve linear variation of insulator thickness across a sample, by adjusting the geometrical parameters of a typical physical vapor deposition process. We demonstrate LVBFs with spectral selectivity from 400 to 850 nm based on Ag (25 nm) and MgF2 (75-250 nm). Maximum spectral transmittance is measured at ∼70% with a Q-factor of ∼20.

  11. Model selection with multiple regression on distance matrices leads to incorrect inferences.

    PubMed

    Franckowiak, Ryan P; Panasci, Michael; Jarvis, Karl J; Acuña-Rodriguez, Ian S; Landguth, Erin L; Fortin, Marie-Josée; Wagner, Helene H

    2017-01-01

    In landscape genetics, model selection procedures based on Information Theoretic and Bayesian principles have been used with multiple regression on distance matrices (MRM) to test the relationship between multiple vectors of pairwise genetic, geographic, and environmental distance. Using Monte Carlo simulations, we examined the ability of model selection criteria based on Akaike's information criterion (AIC), its small-sample correction (AICc), and the Bayesian information criterion (BIC) to reliably rank candidate models when applied with MRM while varying the sample size. The results showed a serious problem: all three criteria exhibit a systematic bias toward selecting unnecessarily complex models containing spurious random variables and erroneously suggest a high level of support for the incorrectly ranked best model. These problems effectively increased with increasing sample size. The failure of AIC, AICc, and BIC was likely driven by the inflated sample size and different sum-of-squares partitioned by MRM, and the resulting effect on delta values. Based on these findings, we strongly discourage the continued application of AIC, AICc, and BIC for model selection with MRM.
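For reference, a sketch of how such a comparison plays out numerically. The information-criterion formulas are the standard Gaussian least-squares forms (an assumption; the study's exact implementation may differ), and the candidate models, RSS values, and parameter counts are invented. The point illustrated is the inflated-n effect noted in the abstract: MRM treats all m(m-1)/2 pairwise distances as observations.

```python
import math

def aic(n, rss, k):
    """Gaussian least-squares form: n*ln(RSS/n) + 2k."""
    return n * math.log(rss / n) + 2 * k

def aicc(n, rss, k):
    """Small-sample correction: AIC + 2k(k+1)/(n-k-1)."""
    return aic(n, rss, k) + 2 * k * (k + 1) / (n - k - 1)

def bic(n, rss, k):
    return n * math.log(rss / n) + math.log(n) * k

# MRM regresses one pairwise distance matrix on others, so with m sampled
# individuals the criteria see m*(m-1)/2 "observations" -- far more than m.
m = 30
n_pairs = m * (m - 1) // 2          # 435 pairwise distances

# Hypothetical candidate models: (residual sum of squares, no. of parameters).
candidates = {
    "geographic": (120.0, 2),
    "geographic+environment": (118.5, 3),
    "geographic+environment+spurious": (118.4, 4),
}
scores = {name: aicc(n_pairs, rss, k) for name, (rss, k) in candidates.items()}
best = min(scores, key=scores.get)
deltas = {name: s - scores[best] for name, s in scores.items()}
print(best, {k: round(v, 2) for k, v in deltas.items()})
```

Note how the tiny RSS improvement of the spurious model (118.5 to 118.4) nearly offsets the extra-parameter penalty once n is inflated to 435; with still larger n it would win outright.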

  12. Explaining the positive relationship between fourth-grade children's body mass index and energy intake at school-provided meals (breakfast and lunch).

    PubMed

    Guinn, Caroline H; Baxter, Suzanne D; Royer, Julie A; Hitchcock, David B

    2013-05-01

    A 2010 publication showed a positive relationship between children's body mass index (BMI) and energy intake at school-provided meals (as assessed by direct meal observations). To help explain that relationship, we investigated 7 outcome variables concerning aspects of school-provided meals: energy content of items selected, number of meal components selected, number of meal components eaten, amounts eaten of standardized school-meal portions, energy intake from flavored milk, energy intake received in trades, and energy content given in trades. Fourth-grade children (N = 465) from Columbia, SC, were observed eating school-provided breakfast and lunch on 1 to 4 days per child. Researchers measured children's weight and height. For daily values at school meals, a generalized linear model was fit with BMI (dependent variable) and the 7 outcome variables, sex, and age (independent variables). BMI was positively related to amounts eaten of standardized school-meal portions (p < .0001) and increased 8.45 kg/m² per serving, controlling for other variables in the model. BMI was positively related to energy intake from flavored milk (p = .0041) and increased 0.347 kg/m² for every 100 kcal consumed. BMI was negatively related to energy intake received in trades (p = .0003) and decreased 0.468 kg/m² for every 100 kcal received. BMI was not significantly related to 4 outcome variables. Knowing that relationships between BMI and actual consumption, not selection, at school-provided meals explained the (previously found) positive relationship between BMI and energy intake at school-provided meals is helpful for school-based obesity interventions. © 2013, American School Health Association.

  13. BATEMANATER: a computer program to estimate and bootstrap mating system variables based on Bateman's principles.

    PubMed

    Jones, Adam G

    2015-11-01

    Bateman's principles continue to play a major role in the characterization of genetic mating systems in natural populations. The modern manifestations of Bateman's ideas include the opportunity for sexual selection (i.e. I_s, the variance in relative mating success), the opportunity for selection (i.e. I, the variance in relative reproductive success) and the Bateman gradient (i.e. β_ss, the slope of the least-squares regression of reproductive success on mating success). These variables serve as the foundation for one convenient approach for the quantification of mating systems. However, their estimation presents at least two challenges, which I address here with a new Windows-based computer software package called BATEMANATER. The first challenge is that confidence intervals for these variables are not easy to calculate. BATEMANATER solves this problem using a bootstrapping approach. The second, more serious, problem is that direct estimates of mating system variables from open populations will typically be biased if some potential progeny or adults are missing from the analysed sample. BATEMANATER addresses this problem using a maximum-likelihood approach to estimate mating system variables from incompletely sampled breeding populations. The current version of BATEMANATER addresses the problem for systems in which progeny can be collected in groups of half- or full-siblings, as would occur when eggs are laid in discrete masses or offspring occur in pregnant females. BATEMANATER has a user-friendly graphical interface and thus represents a new, convenient tool for the characterization and comparison of genetic mating systems. © 2015 John Wiley & Sons Ltd.
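The three Bateman variables are straightforward to compute from mate and offspring counts. The sketch below is not BATEMANATER (and omits its maximum-likelihood correction for incomplete sampling); it only illustrates the definitions plus percentile-bootstrap confidence intervals, on invented toy data.

```python
import numpy as np

def bateman_metrics(mates, offspring):
    """I_s (opportunity for sexual selection), I (opportunity for selection)
    and beta_ss (Bateman gradient), per the definitions in the abstract."""
    ms = np.asarray(mates, float) / np.mean(mates)          # relative mating success
    rs = np.asarray(offspring, float) / np.mean(offspring)  # relative reproductive success
    beta_ss = np.polyfit(ms, rs, 1)[0]   # least-squares slope of RS on MS
    return ms.var(), rs.var(), beta_ss

def bootstrap_ci(mates, offspring, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence intervals for (I_s, I, beta_ss)."""
    rng = np.random.default_rng(seed)
    m, o = np.asarray(mates), np.asarray(offspring)
    stats = np.array([bateman_metrics(m[s], o[s])
                      for s in rng.integers(0, len(m), (n_boot, len(m)))])
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2], axis=0)

# Invented counts for 16 individuals (toy data, not from any real population).
mates = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4]
offspring = [3, 2, 4, 3, 6, 5, 7, 6, 9, 8, 10, 9, 12, 11, 13, 12]
I_s, I, beta_ss = bateman_metrics(mates, offspring)
lo, hi = bootstrap_ci(mates, offspring)
print(f"I_s={I_s:.3f}  I={I:.3f}  beta_ss={beta_ss:.3f}")
```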

  14. Test Population Selection from Weibull-Based, Monte Carlo Simulations of Fatigue Life

    NASA Technical Reports Server (NTRS)

    Vlcek, Brian L.; Zaretsky, Erwin V.; Hendricks, Robert C.

    2008-01-01

    Fatigue life is probabilistic and not deterministic. Experimentally establishing the fatigue life of materials, components, and systems is both time consuming and costly. As a result, conclusions regarding fatigue life are often inferred from a statistically insufficient number of physical tests. A proposed methodology for comparing life results as a function of variability due to Weibull parameters, variability between successive trials, and variability due to size of the experimental population is presented. Using Monte Carlo simulation of randomly selected lives from a large Weibull distribution, the variation in the L10 fatigue life of aluminum alloy AL6061 rotating rod fatigue tests was determined as a function of population size. These results were compared to the L10 fatigue lives of small (10 each) populations from AL2024, AL7075 and AL6061. For aluminum alloy AL6061, a simple algebraic relationship was established for the upper and lower L10 fatigue life limits as a function of the number of specimens failed. For most engineering applications where less than 30 percent variability can be tolerated in the maximum and minimum values, at least 30 to 35 test samples are necessary. The variability of test results based on small sample sizes can be greater than actual differences, if any, that exist between materials and can result in erroneous conclusions. The fatigue life of AL2024 is statistically longer than that of AL6061 and AL7075. However, there is no statistical difference between the fatigue lives of AL6061 and AL7075 even though AL7075 had a fatigue life 30 percent greater than AL6061.
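The Monte Carlo procedure described, drawing test populations of varying size from a Weibull parent distribution and tracking the spread of the resulting L10 estimates, can be sketched as follows; the Weibull shape and scale are assumed for illustration, not the report's fitted values.

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed Weibull parameters (illustrative, not the report's values):
shape, scale = 2.0, 1.0e6          # Weibull slope; characteristic life in cycles

def l10(lives):
    """L10 life: the life by which 10 percent of the population has failed."""
    return float(np.quantile(lives, 0.10))

parent = scale * rng.weibull(shape, 100_000)   # large Weibull "population"

# Spread of L10 estimates as a function of test-population size.
spreads = {}
for n in (10, 35, 100):
    estimates = [l10(rng.choice(parent, n)) for _ in range(1000)]
    lo, hi = np.quantile(estimates, [0.05, 0.95])
    spreads[n] = hi - lo
    print(f"n={n:3d}: 90% of L10 estimates in [{lo:,.0f}, {hi:,.0f}] cycles")
```

The shrinking interval width with growing n mirrors the abstract's point that small test populations yield L10 scatter that can swamp real material differences.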

  15. Test Population Selection from Weibull-Based, Monte Carlo Simulations of Fatigue Life

    NASA Technical Reports Server (NTRS)

    Vlcek, Brian L.; Zaretsky, Erwin V.; Hendricks, Robert C.

    2012-01-01

    Fatigue life is probabilistic and not deterministic. Experimentally establishing the fatigue life of materials, components, and systems is both time consuming and costly. As a result, conclusions regarding fatigue life are often inferred from a statistically insufficient number of physical tests. A proposed methodology for comparing life results as a function of variability due to Weibull parameters, variability between successive trials, and variability due to size of the experimental population is presented. Using Monte Carlo simulation of randomly selected lives from a large Weibull distribution, the variation in the L10 fatigue life of aluminum alloy AL6061 rotating rod fatigue tests was determined as a function of population size. These results were compared to the L10 fatigue lives of small (10 each) populations from AL2024, AL7075 and AL6061. For aluminum alloy AL6061, a simple algebraic relationship was established for the upper and lower L10 fatigue life limits as a function of the number of specimens failed. For most engineering applications where less than 30 percent variability can be tolerated in the maximum and minimum values, at least 30 to 35 test samples are necessary. The variability of test results based on small sample sizes can be greater than actual differences, if any, that exist between materials and can result in erroneous conclusions. The fatigue life of AL2024 is statistically longer than that of AL6061 and AL7075. However, there is no statistical difference between the fatigue lives of AL6061 and AL7075 even though AL7075 had a fatigue life 30 percent greater than AL6061.

  16. Accelerating rejection-based simulation of biochemical reactions with bounded acceptance probability

    NASA Astrophysics Data System (ADS)

    Thanh, Vo Hong; Priami, Corrado; Zunino, Roberto

    2016-06-01

    Stochastic simulation of large biochemical reaction networks is often computationally expensive due to disparate reaction rates and the high variability of chemical species populations. One approach to accelerating the simulation is to allow multiple reaction firings before performing an update, on the assumption that reaction propensities change by a negligible amount during a time interval. Species with small populations that participate in fast reaction firings significantly affect both the performance and accuracy of this approach, and the problem worsens when these small-population species are involved in a large number of reactions. We present in this paper a new approximate algorithm to cope with this problem. It is based on bounding the acceptance probability of a reaction selected by the exact rejection-based simulation algorithm, which employs propensity bounds and a rejection mechanism to select the next reaction firings. A reaction is guaranteed to be accepted with a probability greater than a predefined threshold, and the selection becomes exact if this threshold is set to one. Our new algorithm reduces the computational cost of selecting the next reaction firing and of updating reaction propensities.
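The rejection mechanism underlying the exact algorithm can be illustrated with a small sketch. The network, bounds, and propensities are toy values, and the paper's bounded-acceptance refinement is only noted in a comment:

```python
import random

def select_reaction(bounds, propensity, rng=random):
    """Rejection-based selection: draw a candidate with probability
    proportional to its cached propensity *bound*, then accept it with
    probability a_j / bound_j, evaluating the exact propensity only for
    the candidate. This reproduces the exact SSA firing distribution; the
    paper's approximation instead guarantees acceptance above a preset
    probability to save further work."""
    total = sum(bounds)
    while True:
        r = rng.random() * total
        j, acc = 0, bounds[0]
        while r > acc:                      # linear search over candidates
            j += 1
            acc += bounds[j]
        if rng.random() * bounds[j] <= propensity(j):
            return j

# Toy network: exact propensities fluctuate somewhere below their bounds.
bounds = [2.0, 1.0, 4.0]
exact = [1.5, 0.9, 3.0]
random.seed(1)
trials = 20_000
counts = [0, 0, 0]
for _ in range(trials):
    counts[select_reaction(bounds, lambda j: exact[j])] += 1
freqs = [c / trials for c in counts]
# Firing frequencies should track the exact propensities, not the bounds.
expected = [a / sum(exact) for a in exact]
```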

  17. Accelerating rejection-based simulation of biochemical reactions with bounded acceptance probability

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Thanh, Vo Hong, E-mail: vo@cosbi.eu; Priami, Corrado, E-mail: priami@cosbi.eu; Department of Mathematics, University of Trento, Trento

    Stochastic simulation of large biochemical reaction networks is often computationally expensive due to disparate reaction rates and the high variability of chemical species populations. One approach to accelerating the simulation is to allow multiple reaction firings before performing an update, on the assumption that reaction propensities change by a negligible amount during a time interval. Species with small populations that participate in fast reaction firings significantly affect both the performance and accuracy of this approach, and the problem worsens when these small-population species are involved in a large number of reactions. We present in this paper a new approximate algorithm to cope with this problem. It is based on bounding the acceptance probability of a reaction selected by the exact rejection-based simulation algorithm, which employs propensity bounds and a rejection mechanism to select the next reaction firings. A reaction is guaranteed to be accepted with a probability greater than a predefined threshold, and the selection becomes exact if this threshold is set to one. Our new algorithm reduces the computational cost of selecting the next reaction firing and of updating reaction propensities.

  18. A 24 km fiber-based discretely signaled continuous variable quantum key distribution system.

    PubMed

    Dinh Xuan, Quyen; Zhang, Zheshen; Voss, Paul L

    2009-12-21

    We report a continuous variable key distribution system that achieves a final secure key rate of 3.45 kilobits/s over a distance of 24.2 km of optical fiber. The protocol uses discrete signaling and post-selection to improve reconciliation speed and quantifies security by means of quantum state tomography. Polarization multiplexing and a frequency translation scheme permit transmission of a continuous wave local oscillator and suppression of noise from guided acoustic wave Brillouin scattering by more than 27 dB.

  19. Simulation of streamflows and basin-wide hydrologic variables over several climate-change scenarios, Methow River basin, Washington

    USGS Publications Warehouse

    Voss, Frank D.; Mastin, Mark C.

    2012-01-01

    A database was developed to automate model execution and to provide users with Internet access to voluminous data products ranging from summary figures to model output timeseries. Database-enabled Internet tools were developed to allow users to create interactive graphs of output results based on their analysis needs. For example, users were able to create graphs by selecting time intervals, greenhouse gas emission scenarios, general circulation models, and specific hydrologic variables.

  20. Identification of spectral regions for the quantification of red wine tannins with Fourier transform mid-infrared spectroscopy.

    PubMed

    Jensen, Jacob S; Egebo, Max; Meyer, Anne S

    2008-05-28

    Accomplishment of fast tannin measurements is receiving increased interest as tannins are important for the mouthfeel and color properties of red wines. Fourier transform mid-infrared spectroscopy allows fast measurement of different wine components, but quantification of tannins is difficult due to interferences from spectral responses of other wine components. Four different variable selection tools were investigated for the identification of the most important spectral regions which would allow quantification of tannins from the spectra using partial least-squares regression. The study included the development of a new variable selection tool, iterative backward elimination of changeable size intervals PLS. The spectral regions identified by the different variable selection methods were not identical, but all included two regions (1485-1425 and 1060-995 cm⁻¹), which therefore were concluded to be particularly important for tannin quantification. The spectral regions identified from the variable selection methods were used to develop calibration models. All four variable selection methods identified regions that allowed an improved quantitative prediction of tannins (RMSEP = 69-79 mg of CE/L; r = 0.93-0.94) as compared to a calibration model developed using all variables (RMSEP = 115 mg of CE/L; r = 0.87). Only minor differences in the performance of the variable selection methods were observed.
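A minimal sketch of interval-based wavelength selection in the spirit of iPLS. Cross-validated ordinary least squares on each window stands in for PLS (an intentional simplification), and the synthetic "spectra" are invented, with the informative variables placed at indices 40-60:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 120, 200                     # 120 "wines", 200 spectral variables
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[40:60] = 0.5                   # only this window carries "tannin" signal
y = X @ beta + rng.normal(0, 0.5, n)

def rmsecv(Z, y, folds=5):
    """k-fold cross-validated RMSE of an ordinary least-squares fit."""
    idx = np.arange(len(y))
    mse = []
    for f in range(folds):
        test = idx % folds == f
        coef, *_ = np.linalg.lstsq(Z[~test], y[~test], rcond=None)
        mse.append(np.mean((y[test] - Z[test] @ coef) ** 2))
    return float(np.sqrt(np.mean(mse)))

# Equal-width spectral windows; keep the one with the lowest RMSECV.
intervals = [(lo, lo + 20) for lo in range(0, p, 20)]
scores = {iv: rmsecv(np.column_stack([np.ones(n), X[:, iv[0]:iv[1]]]), y)
          for iv in intervals}
best = min(scores, key=scores.get)
print("selected interval:", best)
```

A full iPLS implementation would fit a PLS model per window and also consider window unions and variable counts, but the select-by-interval-CV logic is the same.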

  1. Rejecting salient distractors: Generalization from experience.

    PubMed

    Vatterott, Daniel B; Mozer, Michael C; Vecera, Shaun P

    2018-02-01

    Distraction impairs performance of many important, everyday tasks. Attentional control limits distraction by preferentially selecting important items for limited-capacity cognitive operations. Research in attentional control has typically investigated the degree to which selection of items is stimulus-driven versus goal-driven. Recent work finds that when observers initially learn a task, the selection is based on stimulus-driven factors, but through experience, goal-driven factors have an increasing influence. The modulation of selection by goals has been studied within the paradigm of learned distractor rejection, in which experience over a sequence of trials enables individuals eventually to ignore a perceptually salient distractor. The experiments presented examine whether observers can generalize learned distractor rejection to novel distractors. Observers searched for a target and ignored a salient color-singleton distractor that appeared in half of the trials. In Experiment 1, observers who learned distractor rejection in a variable environment rejected a novel distractor more effectively than observers who learned distractor rejection in a less variable, homogeneous environment, demonstrating that variable, heterogeneous stimulus environments encourage generalizable learned distractor rejection. Experiments 2 and 3 investigated the time course of learned distractor rejection across the experiment and found that after experiencing four color-singleton distractors in different blocks, observers could effectively reject subsequent novel color-singleton distractors. These results suggest that the optimization of attentional control to the task environment can be interpreted as a form of learning, demonstrating experience's critical role in attentional control.

  2. Relating Solar Resource Variability to Cloud Type

    NASA Astrophysics Data System (ADS)

    Hinkelman, L. M.; Sengupta, M.

    2012-12-01

    Power production from renewable energy (RE) resources is rapidly increasing. Generation of renewable energy is quite variable since the solar and wind resources that form the inputs are, themselves, inherently variable. There is thus a need to understand the impact of renewable generation on the transmission grid. Such studies require estimates of high temporal and spatial resolution power output under various scenarios, which can be created from corresponding solar resource data. Satellite-based solar resource estimates are the best source of long-term solar irradiance data for the typically large areas covered by transmission studies. As satellite-based resource datasets are generally available at lower temporal and spatial resolution than required, there is, in turn, a need to downscale these resource data. Downscaling in both space and time requires information about solar irradiance variability, which is primarily a function of cloud types and properties. In this study, we analyze the relationship between solar resource variability and satellite-based cloud properties. One-minute resolution surface irradiance data were obtained from a number of stations operated by the National Oceanic and Atmospheric Administration (NOAA) under the Surface Radiation (SURFRAD) and Integrated Surface Irradiance Study (ISIS) networks as well as from NREL's Solar Radiation Research Laboratory (SRRL) in Golden, Colorado. Individual sites were selected so that a range of meteorological conditions would be represented. Cloud information at a nominal 4 km resolution and half hour intervals was derived from NOAA's Geostationary Operation Environmental Satellite (GOES) series of satellites. Cloud class information from the GOES data set was then used to select and composite irradiance data from the measurement sites. 
The irradiance variability for each cloud classification was characterized using general statistics of the fluxes themselves and their variability in time, as represented by ramps computed for time scales from 10 s to 0.5 hr. The statistical relationships derived using this method will be presented, comparing and contrasting the statistics computed for the different cloud types. The implications for downscaling irradiances from satellites or forecast models will also be discussed.

  3. Clustering and variable selection in the presence of mixed variable types and missing data.

    PubMed

    Storlie, C B; Myers, S M; Katusic, S K; Weaver, A L; Voigt, R G; Croarkin, P E; Stoeckel, R E; Port, J D

    2018-05-17

    We consider the problem of model-based clustering in the presence of many correlated, mixed continuous, and discrete variables, some of which may have missing values. Discrete variables are treated with a latent continuous variable approach, and the Dirichlet process is used to construct a mixture model with an unknown number of components. Variable selection is also performed to identify the variables that are most influential for determining cluster membership. The work is motivated by the need to cluster patients thought to potentially have autism spectrum disorder on the basis of many cognitive and/or behavioral test scores. There are a modest number of patients (486) in the data set along with many (55) test score variables (many of which are discrete valued and/or missing). The goal of the work is to (1) cluster these patients into similar groups to help identify those with similar clinical presentation and (2) identify a sparse subset of tests that inform the clusters in order to eliminate unnecessary testing. The proposed approach compares very favorably with other methods via simulation of problems of this type. The results of the autism spectrum disorder analysis suggested 3 clusters to be most likely, while only 4 test scores had high (>0.5) posterior probability of being informative. This will result in much more efficient and informative testing. The need to cluster observations on the basis of many correlated, continuous/discrete variables with missing values is a common problem in the health sciences as well as in many other disciplines. Copyright © 2018 John Wiley & Sons, Ltd.

  4. No difference in variability of unique hue selections and binary hue selections.

    PubMed

    Bosten, J M; Lawrance-Owen, A J

    2014-04-01

    If unique hues have special status in phenomenological experience as perceptually pure, it seems reasonable to assume that they are represented more precisely by the visual system than are other colors. Following the method of Malkoc et al. (J. Opt. Soc. Am. A22, 2154 [2005]), we gathered unique and binary hue selections from 50 subjects. For these subjects we repeated the measurements in two separate sessions, allowing us to measure test-retest reliabilities (0.52≤ρ≤0.78; p≪0.01). We quantified the within-individual variability for selections of each hue. Adjusting for the differences in variability intrinsic to different regions of chromaticity space, we compared the within-individual variability for unique hues to that for binary hues. Surprisingly, we found that selections of unique hues did not show consistently lower variability than selections of binary hues. We repeated hue measurements in a single session for an independent sample of 58 subjects, using a different relative scaling of the cardinal axes of MacLeod-Boynton chromaticity space. Again, we found no consistent difference in adjusted within-individual variability for selections of unique and binary hues. Our finding does not depend on the particular scaling chosen for the Y axis of MacLeod-Boynton chromaticity space.

  5. Variable screening via quantile partial correlation

    PubMed Central

    Ma, Shujie; Tsai, Chih-Ling

    2016-01-01

    In quantile linear regression with ultra-high dimensional data, we propose an algorithm for screening all candidate variables and subsequently selecting relevant predictors. Specifically, we first employ quantile partial correlation for screening, and then we apply the extended Bayesian information criterion (EBIC) for best subset selection. Our proposed method can successfully select predictors when the variables are highly correlated, and it can also identify variables that make a contribution to the conditional quantiles but are marginally uncorrelated or weakly correlated with the response. Theoretical results show that the proposed algorithm can yield the sure screening set. By controlling the false selection rate, model selection consistency can be achieved theoretically. In practice, we propose using EBIC for best subset selection so that the resulting model is screening consistent. Simulation studies demonstrate that the proposed algorithm performs well, and an empirical example is presented. PMID:28943683
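A simplified sketch of the screening step, using a marginal quantile-correlation statistic (a simplified, non-partial stand-in for the paper's quantile partial correlation) on an invented heteroscedastic model:

```python
import numpy as np

def quantile_correlation(y, x, tau):
    """Marginal quantile correlation: covariance of tau - I(y < Q_tau(y))
    with x, normalized so the statistic is scale-free in x."""
    psi = tau - (y < np.quantile(y, tau))          # tau minus indicator
    return np.cov(psi, x)[0, 1] / np.sqrt((tau - tau**2) * np.var(x))

rng = np.random.default_rng(0)
n, p = 500, 50
X = rng.normal(size=(n, p))
# Hypothetical model: x0 shifts the whole distribution of y, while x1 only
# widens the noise, so x1 matters for upper quantiles but barely for the mean.
y = 1.5 * X[:, 0] + np.exp(0.8 * X[:, 1]) * rng.normal(size=n)

scores = np.array([abs(quantile_correlation(y, X[:, j], tau=0.75))
                   for j in range(p)])
screened = np.argsort(scores)[::-1][:10]           # keep top candidates
print("top-ranked variables:", screened[:3])
```

In the paper's full procedure, screening uses the *partial* version of this statistic and the screened set is then pruned by EBIC-based best subset selection.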

  6. Plasticity models of material variability based on uncertainty quantification techniques

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jones, Reese E.; Rizzi, Francesco; Boyce, Brad

    The advent of fabrication techniques like additive manufacturing has focused attention on the considerable variability of material response due to defects and other micro-structural aspects. This variability motivates the development of an enhanced design methodology that incorporates inherent material variability to provide robust predictions of performance. In this work, we develop plasticity models capable of representing the distribution of mechanical responses observed in experiments using traditional plasticity models of the mean response and recently developed uncertainty quantification (UQ) techniques. Lastly, we demonstrate that the new method provides predictive realizations that are superior to more traditional ones, and show how these UQ techniques can be used in model selection and in assessing the quality of calibrated physical parameters.

  7. On the Accretion Rates of SW Sextantis Nova-like Variables

    NASA Astrophysics Data System (ADS)

    Ballouz, Ronald-Louis; Sion, Edward M.

    2009-06-01

    We present accretion rates for selected samples of nova-like variables having IUE archival spectra and distances uniformly determined using an infrared method by Knigge. A comparison with accretion rates derived independently with a multiparametric optimization modeling approach by Puebla et al. is carried out. The accretion rates of SW Sextantis nova-like systems are compared with the accretion rates of non-SW Sextantis systems in the Puebla et al. sample and in our sample, which was selected in the orbital period range of three to four and a half hours, with all systems having distances using the method of Knigge. Based upon the two independent modeling approaches, we find no significant difference between the accretion rates of SW Sextantis systems and non-SW Sextantis nova-like systems insofar as optically thick disk models are appropriate. We find little evidence to suggest that the SW Sex stars have higher accretion rates than other nova-like cataclysmic variables (CVs) above the period gap within the same range of orbital periods.

  8. Crop weather models of corn and soybeans for Agrophysical Units (APU's) in Iowa using monthly meteorological predictors

    NASA Technical Reports Server (NTRS)

    Leduc, S. (Principal Investigator)

    1982-01-01

    Models based on multiple regression were developed to estimate corn and soybean yield from weather data for agrophysical units (APU) in Iowa. The predictor variables are derived from monthly average temperature and monthly total precipitation data at meteorological stations in the cooperative network. The models are similar in form to the previous models developed for crop reporting districts (CRD). The trends and derived variables were the same, and the approach used to select the significant predictors was similar to that used in developing the CRD models. The APU's were selected to be more homogeneous with respect to crop production than the CRDs. The APU models are quite similar to the CRD models, with similar explained variation and numbers of predictor variables. The APU models are to be independently evaluated and compared to the previously evaluated CRD models. That comparison should indicate the preferred model area for this application, i.e., APU or CRD.

  9. Dental occlusion and temporomandibular disorders.

    PubMed

    Stone, J Caitlin; Hannah, Andrew; Nagar, Nathan

    2017-10-27

    Data sources Medline, Scopus and Google Scholar. Study selection Two reviewers selected studies independently. English-language clinical studies assessing the association between temporomandibular disorders (TMD) and features of dental occlusion were considered. Data extraction and synthesis Study quality was assessed based on the Newcastle-Ottawa Scale (NOS) and a narrative synthesis was presented. Results In all, 25 studies (17 case-control, eight comparative) were included. Overall, there was high variability between occlusal features and TMD diagnosis. Findings were consistent with a lack of clinically relevant association between TMD and dental occlusion. Only two occlusal features were associated with TMD in the majority (≥50%) of single-variable analyses in patient populations, and only mediotrusive interferences were associated with TMD in the majority of multiple-variable analyses. Conclusions The findings support the absence of a disease-specific association; there is no ground to hypothesise a major role for dental occlusion in the pathophysiology of TMDs. Dental clinicians are thus encouraged to move forward and abandon the old-fashioned gnathological paradigm.

  10. Integrated control-structure design

    NASA Technical Reports Server (NTRS)

    Hunziker, K. Scott; Kraft, Raymond H.; Bossi, Joseph A.

    1991-01-01

    A new approach for the design and control of flexible space structures is described. The approach integrates the structure and controller design processes thereby providing extra opportunities for avoiding some of the disastrous effects of control-structures interaction and for discovering new, unexpected avenues of future structural design. A control formulation based on Boyd's implementation of Youla parameterization is employed. Control design parameters are coupled with structural design variables to produce a set of integrated-design variables which are selected through optimization-based methodology. A performance index reflecting spacecraft mission goals and constraints is formulated and optimized with respect to the integrated design variables. Initial studies have been concerned with achieving mission requirements with a lighter, more flexible space structure. Details of the formulation of the integrated-design approach are presented and results are given from a study involving the integrated redesign of a flexible geostationary platform.

  11. Selection of key ambient particulate variables for epidemiological studies - applying cluster and heatmap analyses as tools for data reduction.

    PubMed

    Gu, Jianwei; Pitz, Mike; Breitner, Susanne; Birmili, Wolfram; von Klot, Stephanie; Schneider, Alexandra; Soentgen, Jens; Reller, Armin; Peters, Annette; Cyrys, Josef

    2012-10-01

    The success of epidemiological studies depends on the use of appropriate exposure variables. The purpose of this study is to extract a relatively small selection of variables characterizing ambient particulate matter from a large measurement data set. The original data set comprised a total of 96 particulate matter variables that have been continuously measured since 2004 at an urban background aerosol monitoring site in the city of Augsburg, Germany. Many of the original variables were derived from measured particle size distribution (PSD) across the particle diameter range 3 nm to 10 μm, including size-segregated particle number concentration, particle length concentration, particle surface concentration and particle mass concentration. The data set was complemented by integral aerosol variables. These variables were measured by independent instruments, including black carbon, sulfate, particle active surface concentration and particle length concentration. It is obvious that such a large number of measured variables cannot be used in health effect analyses simultaneously. The aim of this study is a pre-screening and a selection of the key variables that will be used as input in forthcoming epidemiological studies. In this study, we present two methods of parameter selection and apply them to data from a two-year period from 2007 to 2008. We used the agglomerative hierarchical cluster method to find groups of similar variables. In total, we selected 15 key variables from 9 clusters which are recommended for epidemiological analyses. We also applied a two-dimensional visualization technique called "heatmap" analysis to the Spearman correlation matrix. 12 key variables were selected using this method. Moreover, the positive matrix factorization (PMF) method was applied to the PSD data to characterize the possible particle sources. Correlations between the variables and PMF factors were used to interpret the meaning of the cluster and the heatmap analyses. 
Copyright © 2012 Elsevier B.V. All rights reserved.
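
    The variable-grouping step described above can be sketched in a few lines. This is a minimal illustration on synthetic data (the Augsburg measurements are not public): variables are clustered by average-linkage agglomerative clustering on a 1 − |Spearman ρ| distance, and one representative "key variable" is kept per cluster; the cut height of 0.5 is an illustrative assumption.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
# Synthetic stand-in for the aerosol data: 500 observations of 12 variables,
# constructed so that groups of 4 variables share a common latent factor.
base = rng.normal(size=(500, 3))
X = np.column_stack([base[:, i // 4] + 0.3 * rng.normal(size=500) for i in range(12)])

# Spearman correlation matrix; distance = 1 - |rho| so strongly (anti)correlated
# variables cluster together (this is also the matrix one would draw as a heatmap).
rho, _ = spearmanr(X)
dist = 1.0 - np.abs(rho)

# Agglomerative (average-linkage) clustering on the condensed distance matrix.
condensed = dist[np.triu_indices(12, k=1)]
labels = fcluster(linkage(condensed, method="average"), t=0.5, criterion="distance")

# One key variable per cluster: the member most correlated with its cluster mates.
selected = []
for c in np.unique(labels):
    members = np.where(labels == c)[0]
    scores = np.abs(rho[np.ix_(members, members)]).mean(axis=1)
    selected.append(members[np.argmax(scores)])
print(sorted(selected))
```

    On these synthetic data the three underlying groups are recovered and one variable is kept from each; on real aerosol data the cut height and the choice of representative would be tuned by inspecting the dendrogram and the correlation heatmap.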

  12. Reward speeds up and increases consistency of visual selective attention: a lifespan comparison.

    PubMed

    Störmer, Viola; Eppinger, Ben; Li, Shu-Chen

    2014-06-01

    Children and older adults often show less favorable reward-based learning and decision making, relative to younger adults. It is unknown, however, whether reward-based processes that influence relatively early perceptual and attentional processes show similar lifespan differences. In this study, we investigated whether stimulus-reward associations affect selective visual attention differently across the human lifespan. Children, adolescents, younger adults, and older adults performed a visual search task in which the target colors were associated with either high or low monetary rewards. We discovered that high reward value speeded up response times across all four age groups, indicating that reward modulates attentional selection across the lifespan. This speed-up in response time was largest in younger adults, relative to the other three age groups. Furthermore, only younger adults benefited from high reward value in increasing response consistency (i.e., reduction of trial-by-trial reaction time variability). Our findings suggest that reward-based modulations of relatively early and implicit perceptual and attentional processes are operative across the lifespan, and the effects appear to be greater in adulthood. The age-specific effect of reward on reducing intraindividual response variability in younger adults likely reflects mechanisms underlying the development and aging of reward processing, such as lifespan age differences in the efficacy of dopaminergic modulation. Overall, the present results indicate that reward shapes visual perception across different age groups by biasing attention to motivationally salient events.

  13. VARIABILITY AND CHARACTER ASSOCIATION IN ROSE COLOURED LEADWORT (PLUMBAGO ROSEA Linn.)

    PubMed Central

    Kurian, Alice; Anitha, C.A.; Nybe, E.V.

    2001-01-01

    Forty-five Plumbago rosea accessions collected from different parts of Kerala state were evaluated for variability in morphological and yield-related characters and plumbagin content. Highly significant variation was evident for all the characters studied except leaf size, indicating wide variability in the accessions. Accessions PR 25 and PR 31 appear to be promising with respect to root yield and high plumbagin content. Character association revealed significant and positive correlation of all the characters except leaf size with yield. Hence, selection of high-yielding types could easily be done based on visual characters expressing more vegetative growth but with reduced leaf size. PMID:22557037

  14. Predicting the graft survival for heart-lung transplantation patients: an integrated data mining methodology.

    PubMed

    Oztekin, Asil; Delen, Dursun; Kong, Zhenyu James

    2009-12-01

    Predicting the survival of heart-lung transplant patients has the potential to play a critical role in understanding and improving the matching procedure between the recipient and graft. Although voluminous data related to the transplantation procedures is being collected and stored, only a small subset of the predictive factors has been used in modeling heart-lung transplantation outcomes. Previous studies have mainly focused on applying statistical techniques to a small set of factors selected by domain experts in order to reveal the simple linear relationships between the factors and survival. The collection of methods known as 'data mining' offers significant advantages over conventional statistical techniques in dealing with the latter's limitations, such as the normality assumption of observations, independence of observations from each other, and linearity of the relationship between the observations and the output measure(s). There are statistical methods that overcome these limitations, yet they are computationally more expensive and do not provide the fast and flexible solutions that data mining techniques do on large datasets. The main objective of this study is to improve the prediction of outcomes following combined heart-lung transplantation by proposing an integrated data-mining methodology. A large and feature-rich dataset (16,604 cases with 283 variables) is used to (1) develop machine learning based predictive models and (2) extract the most important predictive factors. Then, using three different variable selection methods, namely (i) machine-learning-driven variables (using decision trees, neural networks and logistic regression), (ii) literature-review-based expert-defined variables, and (iii) common-sense-based interaction variables, a consolidated set of factors is generated and used to develop Cox regression models for heart-lung graft survival.
    The predictive models' performance in terms of 10-fold cross-validation accuracy rates for two multi-imputed datasets ranged from 79% to 86% for neural networks, from 78% to 86% for logistic regression, and from 71% to 79% for decision trees. The results indicate that the proposed integrated data mining methodology using Cox hazard models better predicted graft survival with different variables than the conventional approaches commonly used in the literature. This result is validated by comparing the Gains charts for our proposed methodology and the literature-review-based Cox results, and by comparing the Akaike information criterion (AIC) values obtained from each. The data mining-based methodology proposed in this study reveals that there are undiscovered relationships (i.e. interactions of the existing variables) among the survival-related variables, which helps better predict the survival of heart-lung transplants. It also brings a different set of variables into the scene to be evaluated by domain experts and considered prior to organ transplantation.
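
    The model-development step (several classifier families compared by 10-fold cross-validation) can be sketched as follows. The transplant registry data used in the study are not public, so this illustration uses a synthetic binary-outcome dataset; the model settings are arbitrary illustrative choices, not those of the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a transplant outcome dataset:
# 2000 cases, 20 predictors, binary graft-survival label.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           random_state=1)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(max_depth=5, random_state=1),
    "neural network": MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                                    random_state=1),
}
# 10-fold cross-validated accuracy, mirroring the evaluation in the abstract.
cv_acc = {name: cross_val_score(m, X, y, cv=10).mean() for name, m in models.items()}
for name, acc in sorted(cv_acc.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.3f}")
```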

  15. The effect of friend selection on social influences in obesity.

    PubMed

    Trogdon, Justin G; Allaire, Benjamin T

    2014-12-01

    We present an agent-based model of weight choice and peer selection that simulates the effect of peer selection on social multipliers for weight loss interventions. The model generates social clustering around weight through two mechanisms: a causal link from others' weight to an individual's weight and the propensity to select peers based on weight. We simulated weight loss interventions and tried to identify intervention targets that maximized the spillover of weight loss from intervention participants to nonparticipants. Social multipliers increase with the number of intervention participants' friends. For example, when friend selection was based on a variable exogenous to weight, the weight lost among non-participants increased by 23% (14.3 lb vs. 11.6 lb) when targeting the most popular obese agents. Holding constant the number of participants' friends, multipliers increase with increased weight clustering due to selection, up to a point. For example, among the most popular obese agents, social multipliers when matching on a characteristic correlated with weight (1.189) were higher than when matching on the exogenous characteristic (1.168) and when matching on weight (1.180). Increased weight clustering also implies more obese "friends of friends" of participants, who reduce social multipliers. Copyright © 2014 Elsevier B.V. All rights reserved.

  16. Recursive feature selection with significant variables of support vectors.

    PubMed

    Tsai, Chen-An; Huang, Chien-Hsun; Chang, Ching-Wei; Chen, Chun-Houh

    2012-01-01

    The development of DNA microarrays enables researchers to screen thousands of genes simultaneously and helps determine high- and low-expression level genes in normal and disease tissues. Selecting relevant genes for cancer classification is an important issue. Most gene selection methods use univariate ranking criteria and arbitrarily choose a threshold for selecting genes. However, the parameter setting may not be compatible with the selected classification algorithm. In this paper, we propose a new gene selection method (SVM-t) based on the use of t-statistics embedded in a support vector machine. We compared its performance to two similar SVM-based methods: SVM recursive feature elimination (SVMRFE) and recursive support vector machine (RSVM). The three methods were compared based on extensive simulation experiments and analyses of two published microarray datasets. In the simulation experiments, we found that the proposed method is more robust in selecting informative genes than SVMRFE and RSVM and capable of attaining good classification performance when the variations of informative and noninformative genes are different. In the analysis of two microarray datasets, the proposed method yields better performance in identifying fewer genes with good prediction accuracy, compared to SVMRFE and RSVM.
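
    A simplified sketch of the idea behind SVM-t (the paper's exact formulation differs) is to rank features by a two-sample t-statistic computed on the support vectors of a linear SVM and to eliminate the weakest feature recursively. The dataset, stopping point and one-at-a-time elimination schedule below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Toy stand-in for a microarray dataset: 80 samples, 50 "genes", 5 informative.
X, y = make_classification(n_samples=80, n_features=50, n_informative=5,
                           n_redundant=0, shuffle=False, random_state=2)

active = np.arange(X.shape[1])                   # indices of genes still in play
while len(active) > 5:
    svm = SVC(kernel="linear").fit(X[:, active], y)
    sv = X[np.ix_(svm.support_, active)]         # support vectors, active genes only
    sv_y = y[svm.support_]
    # Two-sample t-statistic per gene, computed on the support vectors only.
    t, _ = ttest_ind(sv[sv_y == 0], sv[sv_y == 1], axis=0)
    # Recursively eliminate the least informative gene.
    active = np.delete(active, np.argmin(np.abs(t)))
print("selected genes:", sorted(active))
```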

  17. Parameter selection for and implementation of a web-based decision-support tool to predict extubation outcome in premature infants.

    PubMed

    Mueller, Martina; Wagner, Carol L; Annibale, David J; Knapp, Rebecca G; Hulsey, Thomas C; Almeida, Jonas S

    2006-03-01

    Approximately 30% of intubated preterm infants with respiratory distress syndrome (RDS) will fail attempted extubation, requiring reintubation and mechanical ventilation. Although ventilator technology and monitoring of premature infants have improved over time, optimal extubation remains challenging. Furthermore, extubation decisions for premature infants require complex informational processing, techniques implicitly learned through clinical practice. Computer-aided decision-support tools would benefit inexperienced clinicians, especially during peak neonatal intensive care unit (NICU) census. A five-step procedure was developed to identify predictive variables. Clinical expert (CE) thought processes comprised one model. Variables from that model were used to develop two mathematical models for the decision-support tool: an artificial neural network (ANN) and a multivariate logistic regression model (MLR). The ranking of the variables in the three models was compared using the Wilcoxon Signed Rank Test. The best performing model was used in a web-based decision-support tool with a user interface implemented in Hypertext Markup Language (HTML) and the mathematical model employing the ANN. CEs identified 51 potentially predictive variables for extubation decisions for an infant on mechanical ventilation. Comparisons of the three models showed a significant difference between the ANN and the CE (p = 0.0006). Of the original 51 potentially predictive variables, the 13 most predictive variables were used to develop an ANN as a web-based decision-tool. The ANN processes user-provided data and returns the prediction 0-1 score and a novelty index. The user then selects the most appropriate threshold for categorizing the prediction as a success or failure. 
    Furthermore, the novelty index, indicating the similarity of the test case to the training cases, allows the user to assess the confidence level of the prediction with regard to how much the new data differ from the data originally used for the development of the prediction tool. State-of-the-art machine-learning methods can be employed for the development of sophisticated tools to aid clinicians' decisions. We identified numerous variables considered relevant for extubation decisions for mechanically ventilated premature infants with RDS. We then developed a web-based decision-support tool for clinicians that can be made widely available and potentially improve patient care worldwide.
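
    One way to pair a neural-network prediction score with a novelty index of the kind described is to scale a new case's distance to its nearest training case by the typical nearest-neighbour distance inside the training set. The dataset, network size and scaling rule below are illustrative assumptions, not the tool's actual design.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import NearestNeighbors
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the 13 predictive variables: the real extubation
# dataset is not public, so we simulate 300 infants with 13 features.
X, y = make_classification(n_samples=300, n_features=13, n_informative=6,
                           random_state=3)
scaler = StandardScaler().fit(X)
Xs = scaler.transform(X)

net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=1000, random_state=3).fit(Xs, y)

# Novelty reference: the typical nearest-neighbour distance within the training
# set (column 0 of kneighbors() is the self-distance, so we take column 1).
nn = NearestNeighbors(n_neighbors=2).fit(Xs)
typical = np.median(nn.kneighbors(Xs)[0][:, 1])

def predict_with_novelty(x_new):
    xs = scaler.transform(x_new.reshape(1, -1))
    score = net.predict_proba(xs)[0, 1]                       # 0-1 prediction score
    novelty = nn.kneighbors(xs, n_neighbors=1)[0][0, 0] / typical
    return score, novelty

score, novelty = predict_with_novelty(X[0])                   # a case seen in training
print(f"score={score:.2f}, novelty={novelty:.2f}")
```

    A case identical to a training case gets novelty 0; values well above 1 flag cases unlike the training data, signalling that the prediction should be trusted less.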

  18. A Cautious Note on Auxiliary Variables That Can Increase Bias in Missing Data Problems.

    PubMed

    Thoemmes, Felix; Rose, Norman

    2014-01-01

    The treatment of missing data in the social sciences has changed tremendously during the last decade. Modern missing data techniques such as multiple imputation and full-information maximum likelihood are used much more frequently. These methods assume that data are missing at random. One very common approach to increasing the likelihood that the missing-at-random assumption holds is to include many covariates as so-called auxiliary variables. These variables are either included based on data considerations or in an inclusive fashion, that is, taking all available auxiliary variables. In this article, we point out that there are some instances in which auxiliary variables exhibit the surprising property of increasing bias in missing data problems. In a series of focused simulation studies, we highlight some situations in which this type of biasing behavior can occur. We briefly discuss possible ways to avoid selecting bias-inducing covariates as auxiliary variables.

  19. The Speed of Serial Attention Shifts in Visual Search: Evidence from the N2pc Component.

    PubMed

    Grubert, Anna; Eimer, Martin

    2016-02-01

    Finding target objects among distractors in a visual search display is often assumed to be based on sequential movements of attention between different objects. However, the speed of such serial attention shifts is still under dispute. We employed a search task that encouraged the successive allocation of attention to two target objects in the same search display and measured N2pc components to determine how fast attention moved between these objects. Each display contained one digit in a known color (fixed-color target) and another digit whose color changed unpredictably across trials (variable-color target) together with two gray distractor digits. Participants' task was to find the fixed-color digit and compare its numerical value with that of the variable-color digit. N2pc components to fixed-color targets preceded N2pc components to variable-color digits, demonstrating that these two targets were indeed selected in a fixed serial order. The N2pc to variable-color digits emerged approximately 60 msec after the N2pc to fixed-color digits, which shows that attention can be reallocated very rapidly between different target objects in the visual field. When search display durations were increased, thereby relaxing the temporal demands on serial selection, the two N2pc components to fixed-color and variable-color targets were elicited within 90 msec of each other. The results demonstrate that sequential shifts of attention between different target locations can operate very rapidly, at speeds that are in line with the assumptions of serial selection models of visual search.

  20. Bayesian Group Bridge for Bi-level Variable Selection.

    PubMed

    Mallick, Himel; Yi, Nengjun

    2017-06-01

    A Bayesian bi-level variable selection method (BAGB: Bayesian Analysis of Group Bridge) is developed for regularized regression and classification. This new development is motivated by grouped data, where generic variables can be divided into multiple groups, with variables in the same group being mechanistically related or statistically correlated. As an alternative to frequentist group variable selection methods, BAGB incorporates structural information among predictors through a group-wise shrinkage prior. Posterior computation proceeds via an efficient MCMC algorithm. In addition to the usual ease-of-interpretation of hierarchical linear models, the Bayesian formulation produces valid standard errors, a feature that is notably absent in the frequentist framework. Empirical evidence of the attractiveness of the method is illustrated by extensive Monte Carlo simulations and real data analysis. Finally, several extensions of this new approach are presented, providing a unified framework for bi-level variable selection in general models with flexible penalties.

  1. Prediction of thoracic injury severity in frontal impacts by selected anatomical morphomic variables through model-averaged logistic regression approach.

    PubMed

    Zhang, Peng; Parenteau, Chantal; Wang, Lu; Holcombe, Sven; Kohoyda-Inglis, Carla; Sullivan, June; Wang, Stewart

    2013-11-01

    This study resulted in a model-averaging methodology that predicts crash injury risk using vehicle, demographic, and morphomic variables and assesses the importance of individual predictors. The effectiveness of this methodology was illustrated through analysis of occupant chest injuries in frontal vehicle crashes. The crash data were obtained from the International Center for Automotive Medicine (ICAM) database for calendar years 1996 to 2012. Morphomic data are quantitative measurements of variations in three-dimensional human body anatomy, obtained from imaging records. In this study, morphomics were obtained from chest, abdomen, and spine CT using novel patented algorithms. A NASS-trained crash investigator with over thirty years of experience collected the in-depth crash data. There were 226 cases available with occupants involved in frontal crashes and morphomic measurements. Only cases with complete recorded data were retained for statistical analysis. Logistic regression models were fitted using all possible configurations of vehicle, demographic, and morphomic variables. The models were ranked by the Akaike Information Criterion (AIC). An averaged logistic regression model approach was used due to the limited sample size relative to the number of variables. This approach is helpful when addressing variable selection, building prediction models, and assessing the importance of individual variables. The final predictive results were developed using this approach, based on the top 100 models in the AIC ranking. Model-averaging minimized model uncertainty, decreased the overall prediction variance, and provided an approach to evaluating the importance of individual variables. There were 17 variables investigated: four vehicle, four demographic, and nine morphomic. More than 130,000 logistic models were investigated in total. The models were characterized into four scenarios to assess individual variable contribution to injury risk.
Scenario 1 used vehicle variables; Scenario 2, vehicle and demographic variables; Scenario 3, vehicle and morphomic variables; and Scenario 4 used all variables. AIC was used to rank the models and to address over-fitting. In each scenario, the results based on the top three models and the averages of the top 100 models were presented. The AIC and the area under the receiver operating characteristic curve (AUC) were reported in each model. The models were re-fitted after removing each variable one at a time. The increases of AIC and the decreases of AUC were then assessed to measure the contribution and importance of the individual variables in each model. The importance of the individual variables was also determined by their weighted frequencies of appearance in the top 100 selected models. Overall, the AUC was 0.58 in Scenario 1, 0.78 in Scenario 2, 0.76 in Scenario 3 and 0.82 in Scenario 4. The results showed that morphomic variables are as accurate at predicting injury risk as demographic variables. The results of this study emphasize the importance of including morphomic variables when assessing injury risk. The results also highlight the need for morphomic data in the development of human mathematical models when assessing restraint performance in frontal crashes, since morphomic variables are more "tangible" measurements compared to demographic variables such as age and gender. Copyright © 2013 Elsevier Ltd. All rights reserved.
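
    The model-averaging machinery above (fit all candidate subsets, rank by AIC, weight models by relative AIC) can be sketched on a small scale. The crash data are not available, so this uses a synthetic binary outcome and only 6 candidate predictors (63 subsets) in place of the study's 17 variables and >130,000 models; the AIC is computed from the in-sample log-likelihood of an effectively unpenalized logistic fit.

```python
import itertools
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy stand-in for the crash data: 226 cases, 6 candidate predictors.
X, y = make_classification(n_samples=226, n_features=6, n_informative=3,
                           random_state=4)

def fit_aic(cols):
    m = LogisticRegression(C=1e6, max_iter=1000).fit(X[:, cols], y)
    p = np.clip(m.predict_proba(X[:, cols])[:, 1], 1e-12, 1 - 1e-12)
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    k = len(cols) + 1                          # coefficients + intercept
    return 2 * k - 2 * loglik

# All non-empty subsets of the 6 candidate predictors.
subsets = [c for r in range(1, 7) for c in itertools.combinations(range(6), r)]
aics = np.array([fit_aic(list(c)) for c in subsets])

# Akaike weights: each model's relative support given the data.
delta = aics - aics.min()
w = np.exp(-delta / 2); w /= w.sum()

# Model-averaged importance of each variable: total weight of models containing it.
importance = [sum(wi for wi, c in zip(w, subsets) if j in c) for j in range(6)]
print(np.round(importance, 3))
```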

  2. Continuous-time discrete-space models for animal movement

    USGS Publications Warehouse

    Hanks, Ephraim M.; Hooten, Mevin B.; Alldredge, Mat W.

    2015-01-01

    The processes influencing animal movement and resource selection are complex and varied. Past efforts to model behavioral changes over time used Bayesian statistical models with variable parameter space, such as reversible-jump Markov chain Monte Carlo approaches, which are computationally demanding and inaccessible to many practitioners. We present a continuous-time discrete-space (CTDS) model of animal movement that can be fit using standard generalized linear modeling (GLM) methods. This CTDS approach allows for the joint modeling of location-based as well as directional drivers of movement. Changing behavior over time is modeled using a varying-coefficient framework which maintains the computational simplicity of a GLM approach, and variable selection is accomplished using a group lasso penalty. We apply our approach to a study of two mountain lions (Puma concolor) in Colorado, USA.

  3. Firefly algorithm versus genetic algorithm as powerful variable selection tools and their effect on different multivariate calibration models in spectroscopy: A comparative study.

    PubMed

    Attia, Khalid A M; Nassar, Mohammed W I; El-Zeiny, Mohamed B; Serag, Ahmed

    2017-01-05

    For the first time, a new variable selection method based on swarm intelligence, namely the firefly algorithm, is coupled with three different multivariate calibration models, namely concentration residual augmented classical least squares, artificial neural network and support vector regression, applied to UV spectral data. A comparative study between the firefly algorithm and the well-known genetic algorithm was carried out. The comparison revealed the superiority of the new algorithm over the genetic algorithm. Moreover, different statistical tests were performed and no significant differences were found among the models regarding their predictabilities. This ensures that simpler and faster models were obtained without any deterioration of the quality of the calibration. Copyright © 2016 Elsevier B.V. All rights reserved.
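
    A minimal sketch of a binary firefly algorithm for variable selection follows. Assumed details (not the paper's own settings): Hamming distance as the inter-firefly distance, cross-validated R² of an ordinary least-squares model as the brightness, and small arbitrary attractiveness and randomization constants.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
# Toy stand-in for UV spectral data: 100 samples, 20 "wavelengths", 4 informative.
X, y = make_regression(n_samples=100, n_features=20, n_informative=4,
                       noise=5.0, random_state=5)

def fitness(mask):
    """Brightness of a firefly: cross-validated R2 of the selected variables."""
    if mask.sum() == 0:
        return -np.inf
    return cross_val_score(LinearRegression(), X[:, mask], y,
                           cv=3, scoring="r2").mean()

# Each firefly is a variable-inclusion mask; dimmer fireflies move toward
# brighter ones, with attractiveness decaying with Hamming distance and random
# bit flips playing the role of the randomization term.
pop = rng.random((8, 20)) < 0.5
for _ in range(15):
    bright = np.array([fitness(m) for m in pop])
    for i in range(len(pop)):
        for j in range(len(pop)):
            if bright[j] > bright[i]:
                r = np.sum(pop[i] != pop[j])        # Hamming distance
                beta = 0.9 * np.exp(-0.1 * r)       # attractiveness
                copy = rng.random(20) < beta        # move i toward j
                pop[i, copy] = pop[j, copy]
        flip = rng.random(20) < 0.05                # randomization term
        pop[i, flip] = ~pop[i, flip]
best = pop[np.argmax([fitness(m) for m in pop])]
print("selected variables:", np.flatnonzero(best))
```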

  4. A comparison of regression methods for model selection in individual-based landscape genetic analysis.

    PubMed

    Shirk, Andrew J; Landguth, Erin L; Cushman, Samuel A

    2018-01-01

    Anthropogenic migration barriers fragment many populations and limit the ability of species to respond to climate-induced biome shifts. Conservation actions designed to conserve habitat connectivity and mitigate barriers are needed to unite fragmented populations into larger, more viable metapopulations, and to allow species to track their climate envelope over time. Landscape genetic analysis provides an empirical means to infer landscape factors influencing gene flow and thereby inform such conservation actions. However, there are currently many methods available for model selection in landscape genetics, and considerable uncertainty as to which provide the greatest accuracy in identifying the true landscape model influencing gene flow among competing alternative hypotheses. In this study, we used population genetic simulations to evaluate the performance of seven regression-based model selection methods on a broad array of landscapes that varied by the number and type of variables contributing to resistance, the magnitude and cohesion of resistance, as well as the functional relationship between variables and resistance. We also assessed the effect of transformations designed to linearize the relationship between genetic and landscape distances. We found that linear mixed effects models had the highest accuracy in every way we evaluated model performance; however, other methods also performed well in many circumstances, particularly when landscape resistance was high and the correlation among competing hypotheses was limited. Our results provide guidance for which regression-based model selection methods provide the most accurate inferences in landscape genetic analysis and thereby best inform connectivity conservation actions. Published 2017. This article is a U.S. Government work and is in the public domain in the USA.

  5. Intelligent scanning: automated standard plane selection and biometric measurement of early gestational sac in routine ultrasound examination.

    PubMed

    Zhang, Ling; Chen, Siping; Chin, Chien Ting; Wang, Tianfu; Li, Shengli

    2012-08-01

    The aim of this work was to assist radiologists and decrease interobserver variability when using 2D ultrasonography (US) to locate the standardized plane of the early gestational sac (SPGS) and to perform gestational sac (GS) biometric measurements. In this paper, the authors report the design of the first automatic solution, called "intelligent scanning" (IS), for selecting the SPGS and performing biometric measurements using real-time 2D US. First, the GS is efficiently and precisely located in each ultrasound frame by exploiting a coarse-to-fine detection scheme based on the training of two cascade AdaBoost classifiers. Next, the SPGS are automatically selected by eliminating false positives. This is accomplished using local context information based on the relative position of anatomies in the image sequence. Finally, a database-guided multiscale normalized cuts algorithm is proposed to generate the initial contour of the GS, based on which the GS is automatically segmented for measurement by a modified snake model. This system was validated on 31 ultrasound videos involving 31 pregnant volunteers. The differences between system performance and radiologist performance with respect to SPGS selection and length and depth (diameter) measurements are 7.5% ± 5.0%, 5.5% ± 5.2%, and 6.5% ± 4.6%, respectively. Additional validations show that the IS precision is in the range of interobserver variability. Our system can display the SPGS along with biometric measurements in approximately three seconds after the video ends, when using a 1.9 GHz dual-core computer. IS of the GS from 2D real-time US is a practical, reproducible, and reliable approach.

  6. Environmental variability and acoustic signals: a multi-level approach in songbirds.

    PubMed

    Medina, Iliana; Francis, Clinton D

    2012-12-23

    Among songbirds, growing evidence suggests that acoustic adaptation of song traits occurs in response to habitat features. Despite extensive study, most research supporting acoustic adaptation has only considered acoustic traits averaged for species or populations, overlooking intraindividual variation of song traits, which may facilitate effective communication in heterogeneous and variable environments. Fewer studies have explicitly incorporated sexual selection, which, if strong, may favour variation across environments. Here, we evaluate the prevalence of acoustic adaptation among 44 species of songbirds by determining how environmental variability and sexual selection intensity are associated with song variability (intraindividual and intraspecific) and short-term song complexity. We show that variability in precipitation can explain short-term song complexity among taxonomically diverse songbirds, and that precipitation seasonality and the intensity of sexual selection are related to intraindividual song variation. Our results link song complexity to environmental variability, something previously found for mockingbirds (Family Mimidae). Perhaps more importantly, our results illustrate that individual variation in song traits may be shaped by both environmental variability and strength of sexual selection.

  7. Representativeness-based sampling network design for the State of Alaska

    Treesearch

    Forrest M. Hoffman; Jitendra Kumar; Richard T. Mills; William W. Hargrove

    2013-01-01

    Resource and logistical constraints limit the frequency and extent of environmental observations, particularly in the Arctic, necessitating the development of a systematic sampling strategy to maximize coverage and objectively represent environmental variability at desired scales. A quantitative methodology for stratifying sampling domains, informing site selection,...

  8. Standardization of SOPs to Evaluations: Impacts on Regulatory Decisions using Learning and Memory as Case Studies

    EPA Science Inventory

    In an era of global trade and regulatory cooperation, consistent and scientifically based interpretation of developmental neurotoxicity (DNT) studies is essential, particularly for non­ standard assays and variable endpoints. Because there is flexibility in the selection of ...

  9. Evaluating the performance of different predictor strategies in regression-based downscaling with a focus on glacierized mountain environments

    NASA Astrophysics Data System (ADS)

    Hofer, Marlis; Nemec, Johanna

    2016-04-01

    This study presents first steps towards verifying the hypothesis that uncertainty in global and regional glacier mass simulations can be reduced considerably by reducing the uncertainty in the high-resolution atmospheric input data. To this aim, we systematically explore the potential of different predictor strategies for improving the performance of regression-based downscaling approaches. The investigated local-scale target variables are precipitation, air temperature, wind speed, relative humidity and global radiation, all at a daily time scale. Observations of these target variables are assessed from three sites in geo-environmentally and climatologically very distinct settings, all within highly complex topography and in the close proximity to mountain glaciers: (1) the Vernagtbach station in the Northern European Alps (VERNAGT), (2) the Artesonraju measuring site in the tropical South American Andes (ARTESON), and (3) the Brewster measuring site in the Southern Alps of New Zealand (BREWSTER). As the large-scale predictors, ERA interim reanalysis data are used. In the applied downscaling model training and evaluation procedures, particular emphasis is put on appropriately accounting for the pitfalls of limited and/or patchy observation records that are usually the only (if at all) available data from the glacierized mountain sites. Generalized linear models and beta regression are investigated as alternatives to ordinary least squares regression for the non-Gaussian target variables. 
    By analyzing results for the three different sites, five predictands and different times of the year, we look for systematic improvements in the downscaling models' skill specifically obtained by (i) using predictor data at the optimum scale rather than the minimum scale of the reanalysis data, (ii) identifying the optimum predictor allocation in the vertical, and (iii) considering multiple (variable, level and/or grid point) predictor options combined with state-of-the-art empirical feature selection tools. First results show that, in particular for air temperature, downscaling models based on direct predictor selection show skill comparable to that of models based on multiple predictors. For all other target variables, however, multiple-predictor approaches can considerably outperform models based on single predictors. Including multiple variable types emerges as the most promising predictor option (in particular for wind speed at all sites), even if the same predictor set is used across the different cases.

  10. Predictors of Sun-Protective Practices among Iranian Female College Students: Application of Protection Motivation Theory.

    PubMed

    Dehbari, Samaneh Rooshanpour; Dehdari, Tahereh; Dehdari, Laleh; Mahmoudi, Maryam

    2015-01-01

    Given the importance of sun protection in the prevention of skin cancer, this study was designed to determine predictors of sun-protective practices among a sample of Iranian female college students based on protection motivation theory (PMT) variables. In this cross-sectional study, a total of 201 female college students at Iran University of Medical Sciences were selected. Demographic and PMT variables were assessed with a 67-item questionnaire. Multiple linear regression was used to identify demographic and PMT variables that were associated with sun-protective practices and intention. One percent of participants always wore a hat with a brim, 3.5% wore gloves, and 15.9% wore sunglasses while outdoors. Only 10.9% regularly had their skin checked by a doctor. Perceived rewards, response efficacy, fear, self-efficacy and marital status were the five variables that predicted 39% of the variance in participants' intention to perform sun-protective practices. Also, intention and response cost explained 31% of the variance in sun-protective practices. These predictive variables may be used to develop theory-based educational interventions to prevent skin cancer among college students.

  11. An Investigation of Factors That Influence the Hypothesis Generation Ability of Students in School-Based Agricultural Education Programs When Troubleshooting Small Gasoline Engines

    ERIC Educational Resources Information Center

    Blackburn, J. Joey; Robinson, J. Shane

    2017-01-01

    The purpose of this study was to determine if selected factors influenced the ability of students in school-based agricultural education programs to generate a correct hypothesis when troubleshooting small gasoline engines. Variables of interest included students' cognitive style, age, GPA, and content knowledge in small gasoline engines. Kirton's…

  12. Artificial neural network model for ozone concentration estimation and Monte Carlo analysis

    NASA Astrophysics Data System (ADS)

    Gao, Meng; Yin, Liting; Ning, Jicai

    2018-07-01

    Air pollution in the urban atmosphere directly affects public health; it is therefore essential to predict air pollutant concentrations. Air quality is a complex function of emissions, meteorology and topography, and artificial neural networks (ANNs) provide a sound framework for relating these variables. In this study, we investigated the feasibility of using an ANN model with meteorological parameters as input variables to predict the ozone concentration in the urban area of Jinan, a metropolis in Northern China. We first found that the architecture of the network of neurons had little effect on the predictive capability of the ANN model. A parsimonious ANN model with 6 routinely monitored meteorological parameters and one temporal covariate (the category of day, i.e. working day, legal holiday or regular weekend) as input variables was identified, where the 7 input variables were selected following a forward selection procedure. Compared with the benchmark ANN model with 9 meteorological and photochemical parameters as input variables, the predictive capability of the parsimonious ANN model was acceptable. Its predictive capability was also verified in terms of the warning success ratio during pollution episodes. Finally, uncertainty and sensitivity analyses were performed based on Monte Carlo simulations (MCS). It was concluded that the ANN could properly predict the ambient ozone level. Maximum temperature, atmospheric pressure, sunshine duration and maximum wind speed were identified as the predominant input variables significantly influencing the prediction of ambient ozone concentrations.
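    The forward selection procedure mentioned in the abstract can be sketched generically. The example below is a minimal illustration on hypothetical data, using an ordinary least-squares R² as a cheap stand-in for retraining the ANN at each step; all variable names and values are illustrative, not from the study:

```python
import numpy as np

def forward_select(X, y, n_keep, fit_score):
    """Greedy forward selection: repeatedly add the candidate column
    that most improves the score of a model fit on the chosen subset."""
    chosen, remaining = [], list(range(X.shape[1]))
    while remaining and len(chosen) < n_keep:
        scores = {j: fit_score(X[:, chosen + [j]], y) for j in remaining}
        best = max(scores, key=scores.get)
        chosen.append(best)
        remaining.remove(best)
    return chosen

def r2_ols(X, y):
    """Toy scoring function: R^2 of an OLS fit (stand-in for the ANN)."""
    Xb = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    resid = y - Xb @ coef
    return 1 - resid.var() / y.var()

# Hypothetical data: 2 informative columns (0 and 1) plus 4 noise columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 2 * X[:, 0] - 3 * X[:, 1] + 0.1 * rng.normal(size=200)
print(forward_select(X, y, 2, r2_ols))  # picks the informative columns
```

The same loop works with any scoring callable, so an ANN cross-validation score can be dropped in for `r2_ols` at the cost of more compute per step.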

  13. Applying causal mediation analysis to personality disorder research.

    PubMed

    Walters, Glenn D

    2018-01-01

    This article is designed to address fundamental issues in the application of causal mediation analysis to research on personality disorders. Causal mediation analysis is used to identify mechanisms of effect by testing variables as putative links between the independent and dependent variables. As such, it would appear to have relevance to personality disorder research. It is argued that proper implementation of causal mediation analysis requires that investigators take several factors into account. These factors are discussed under 5 headings: variable selection, model specification, significance evaluation, effect size estimation, and sensitivity testing. First, care must be taken when selecting the independent, dependent, mediator, and control variables for a mediation analysis. Some variables make better mediators than others and all variables should be based on reasonably reliable indicators. Second, the mediation model needs to be properly specified. This requires that the data for the analysis be prospectively or historically ordered and possess proper causal direction. Third, it is imperative that the significance of the identified pathways be established, preferably with a nonparametric bootstrap resampling approach. Fourth, effect size estimates should be computed or competing pathways compared. Finally, investigators employing the mediation method are advised to perform a sensitivity analysis. Additional topics covered in this article include parallel and serial multiple mediation designs, moderation, and the relationship between mediation and moderation. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
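    The nonparametric bootstrap test of an indirect pathway recommended in the abstract can be sketched with a product-of-coefficients estimator. The data and effect sizes below are purely hypothetical:

```python
import numpy as np

def indirect_effect(x, m, y):
    """a*b product of coefficients: a from M ~ X, b from Y ~ X + M."""
    a = np.polyfit(x, m, 1)[0]
    Xd = np.column_stack([x, m, np.ones_like(x)])
    b = np.linalg.lstsq(Xd, y, rcond=None)[0][1]
    return a * b

def bootstrap_ci(x, m, y, n_boot=2000, seed=0):
    """95% percentile bootstrap CI for the indirect effect."""
    rng = np.random.default_rng(seed)
    n = len(x)
    est = [indirect_effect(*(arr[idx] for arr in (x, m, y)))
           for idx in (rng.integers(0, n, n) for _ in range(n_boot))]
    return np.percentile(est, [2.5, 97.5])

# Hypothetical data with a true indirect path X -> M -> Y (0.5 * 0.7).
rng = np.random.default_rng(1)
x = rng.normal(size=300)
m = 0.5 * x + rng.normal(scale=0.5, size=300)
y = 0.7 * m + rng.normal(scale=0.5, size=300)
lo, hi = bootstrap_ci(x, m, y)
print(lo, hi)  # a CI excluding zero indicates significant mediation
```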

  14. Species distribution model transferability and model grain size - finer may not always be better.

    PubMed

    Manzoor, Syed Amir; Griffiths, Geoffrey; Lukac, Martin

    2018-05-08

    Species distribution models have been used to predict the distribution of invasive species for conservation planning. Understanding the spatial transferability of niche predictions is critical to promote species-habitat conservation and to forecast areas vulnerable to invasion. The grain size of predictor variables is an important factor affecting the accuracy and transferability of species distribution models. The choice of grain size often depends on the type of predictor variables used, and the selection of predictors sometimes relies on data availability. This study employed the MAXENT species distribution model to investigate the effect of grain size on model transferability for an invasive plant species. We modelled the distribution of Rhododendron ponticum in Wales, U.K. and tested model performance and transferability by varying grain size (50 m, 300 m, and 1 km). MAXENT-based models are sensitive to grain size and selection of variables. We found that over-reliance on the commonly used bioclimatic variables may lead to less accurate models, as it often compromises the finer grain size of biophysical variables which may be more important determinants of species distribution at small spatial scales. Model accuracy is likely to increase with decreasing grain size. However, successful model transferability may require optimization of model grain size.

  15. A New Integrated Weighted Model in SNOW-V10: Verification of Categorical Variables

    NASA Astrophysics Data System (ADS)

    Huang, Laura X.; Isaac, George A.; Sheng, Grant

    2014-01-01

    This paper presents the verification results for nowcasts of seven categorical variables from an integrated weighted model (INTW) and the underlying numerical weather prediction (NWP) models. Nowcasting, or short-range forecasting (0-6 h), over complex terrain with sufficient accuracy is highly desirable but a very challenging task. A weighting, evaluation, bias correction and integration system (WEBIS) for generating nowcasts by integrating NWP forecasts and high-frequency observations was used during the Vancouver 2010 Olympic and Paralympic Winter Games as part of the Science of Nowcasting Olympic Weather for Vancouver 2010 (SNOW-V10) project. Forecast data from the Canadian high-resolution deterministic NWP system with three nested grids (at 15-, 2.5- and 1-km horizontal grid-spacing) were selected as the background gridded data for generating the integrated nowcasts. Seven forecast variables (temperature, relative humidity, wind speed, wind gust, visibility, ceiling and precipitation rate) are treated as categorical variables for verifying the integrated weighted forecasts. By analyzing the verification of forecasts from INTW and the NWP models at 15 sites, the integrated weighted model was found to produce more accurate forecasts for the 7 selected forecast variables, regardless of location. This is based on the multi-categorical Heidke skill scores for the test period 12 February to 21 March 2010.
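    The multi-categorical Heidke skill score used for this verification can be computed directly from a forecast/observation contingency table: observed accuracy minus chance agreement, scaled by the maximum possible improvement over chance. The 3-category table below is hypothetical, not from the study:

```python
import numpy as np

def heidke_skill_score(table):
    """Multi-category Heidke skill score from a contingency table of
    counts (rows = forecast category, columns = observed category)."""
    p = np.asarray(table, dtype=float)
    p /= p.sum()
    accuracy = np.trace(p)                            # proportion correct
    expected = (p.sum(axis=1) * p.sum(axis=0)).sum()  # chance agreement
    return (accuracy - expected) / (1.0 - expected)

# Hypothetical 3-category contingency table (counts), e.g. for visibility.
table = [[30, 5, 2],
         [4, 25, 6],
         [1, 7, 20]]
print(round(heidke_skill_score(table), 3))  # → 0.623
```

HSS is 1 for a perfect forecast, 0 for a forecast no better than chance, and negative when worse than chance.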

  16. Effects and detection of raw material variability on the performance of near-infrared calibration models for pharmaceutical products.

    PubMed

    Igne, Benoit; Shi, Zhenqi; Drennen, James K; Anderson, Carl A

    2014-02-01

    The impact of raw material variability on the prediction ability of a near-infrared calibration model was studied. Calibrations, developed from a quaternary mixture design comprising theophylline anhydrous, lactose monohydrate, microcrystalline cellulose, and soluble starch, were challenged by intentional variation of raw material properties. A design with two theophylline physical forms, three lactose particle sizes, and two starch manufacturers was created to test model robustness. Further challenges to the models were accomplished through environmental conditions. Along with full-spectrum partial least squares (PLS) modeling, variable selection by dynamic backward PLS and genetic algorithms was utilized in an effort to mitigate the effects of raw material variability. In addition to evaluating models based on their prediction statistics, prediction residuals were analyzed by analyses of variance and model diagnostics (Hotelling's T(2) and Q residuals). Full-spectrum models were significantly affected by lactose particle size. Models developed by selecting variables gave lower prediction errors and proved to be a good approach to limit the effect of changing raw material characteristics. Hotelling's T(2) and Q residuals provided valuable information that was not detectable when studying only prediction trends. Diagnostic statistics were demonstrated to be critical in the appropriate interpretation of the prediction of quality parameters. © 2013 Wiley Periodicals, Inc. and the American Pharmacists Association.

  17. Probabilistic structural analysis methods for improving Space Shuttle engine reliability

    NASA Technical Reports Server (NTRS)

    Boyce, L.

    1989-01-01

    Probabilistic structural analysis methods are particularly useful in the design and analysis of critical structural components and systems that operate in very severe and uncertain environments. These methods have recently found application in space propulsion systems to improve the structural reliability of Space Shuttle Main Engine (SSME) components. A computer program, NESSUS, based on a deterministic finite-element program and a method of probabilistic analysis (fast probability integration) provides probabilistic structural analysis for selected SSME components. While computationally efficient, it considers both correlated and nonnormal random variables as well as an implicit functional relationship between independent and dependent variables. The program is used to determine the response of a nickel-based superalloy SSME turbopump blade. Results include blade tip displacement statistics due to the variability in blade thickness, modulus of elasticity, Poisson's ratio or density. Modulus of elasticity significantly contributed to blade tip variability while Poisson's ratio did not. Thus, a rational method for choosing parameters to be modeled as random is provided.

  18. Development of a new linearly variable edge filter (LVEF)-based compact slit-less mini-spectrometer

    NASA Astrophysics Data System (ADS)

    Mahmoud, Khaled; Park, Seongchong; Lee, Dong-Hoon

    2018-02-01

    This paper presents the development of a compact charge-coupled device (CCD) spectrometer. We describe the design, concept and characterization of a VNIR linear variable edge filter (LVEF)-based mini-spectrometer. The new instrument has been realized for operation in the 300 nm to 850 nm wavelength range. The instrument consists of a linear variable edge filter in front of a CCD array. Small size, light weight and low cost are achieved because the linearly variable filter requires no moving parts for wavelength selection, unlike the commercial spectrometers available in the market. We discuss the characteristics of the main components and the overall concept, together with its main advantages and reported limitations. Experimental characteristics of the LVEFs are described. The mathematical approach for obtaining the position-dependent slit function of the presented prototype spectrometer, and its numerical deconvolution for spectrum reconstruction, are described. The performance of our prototype instrument is demonstrated by measuring the spectrum of a reference light source.

  19. Optimal Solar PV Arrays Integration for Distributed Generation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Omitaomu, Olufemi A; Li, Xueping

    2012-01-01

    Solar photovoltaic (PV) systems hold great potential for distributed energy generation by installing PV panels on the rooftops of residential and commercial buildings. Yet challenges arise from the variability and non-dispatchability of PV systems, which affect the stability of the grid and the economics of the PV system. This paper investigates the integration of PV arrays for distributed generation applications by identifying a combination of buildings that will maximize solar energy output and minimize system variability. In particular, we propose mean-variance optimization models to choose suitable rooftops for PV integration, based on the Markowitz mean-variance portfolio selection model. We further introduce quantity and cardinality constraints, resulting in a mixed integer quadratic programming problem. Case studies based on real data are presented. An efficient frontier is obtained for sample data that allows decision makers to choose a desired solar energy generation level with a comfortable variability tolerance level. Sensitivity analysis is conducted to show the tradeoffs between solar PV energy generation potential and variability.
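    The mean-variance idea with a cardinality constraint can be sketched by brute-force enumeration, which stands in for the paper's mixed integer quadratic program on small instances. The rooftop output data below are hypothetical; two roofs are constructed to be negatively correlated so that pairing them cancels variability:

```python
import numpy as np
from itertools import combinations

def best_building_subset(output, k, risk_aversion=1.0):
    """Enumerate all k-building combinations (cardinality constraint) and
    pick the equally weighted one maximizing mean output minus
    risk_aversion times portfolio variance (Markowitz-style criterion)."""
    mu = output.mean(axis=0)
    cov = np.cov(output, rowvar=False)
    best, best_score = None, -np.inf
    for combo in combinations(range(output.shape[1]), k):
        w = np.zeros(output.shape[1])
        w[list(combo)] = 1.0 / k
        score = mu @ w - risk_aversion * (w @ cov @ w)
        if score > best_score:
            best, best_score = combo, score
    return best

# Hypothetical hourly PV output for 5 rooftops; roofs 3 and 4 are
# negatively correlated, so pairing them cancels output variability.
rng = np.random.default_rng(2)
base = rng.normal(size=200)
output = np.column_stack([
    5 + rng.normal(size=200), 5 + rng.normal(size=200),
    5 + rng.normal(size=200), 5 + base, 5 - base])
print(best_building_subset(output, 2))  # the negatively correlated pair
```

Sweeping `risk_aversion` traces out the efficient frontier the abstract refers to.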

  20. Gait and footwear in children and adolescents with Charcot-Marie-Tooth disease: A cross-sectional, case-controlled study.

    PubMed

    Kennedy, Rachel A; McGinley, Jennifer L; Paterson, Kade L; Ryan, Monique M; Carroll, Kate

    2018-05-01

    Children with Charcot-Marie-Tooth disease (CMT) report problems with gait and footwear. We evaluated differences in spatio-temporal gait variables and gait variability between children with CMT and typically developing (TD) children, and investigated the effect of footwear upon gait. A cross-sectional study of 30 children with CMT and 30 age- and gender-matched TD children aged 4-18 years. Gait was assessed at self-selected speed on an electronic walkway while barefoot and in two types of the child's own footwear; optimal (e.g., athletic-type runners) and suboptimal (e.g., flip-flops). Children with CMT walked more slowly (mean (SD) -13.81 (3.61) cm/s), with shorter steps (-6.28 (1.37) cm), wider base of support (+2.47 (0.66) cm; all p < 0.001) and greater base of support variability (0.48 (0.15) cm, p = 0.002) compared to TD children. Gait was faster in optimal footwear than suboptimal (-7.55 (1.31) cm/s) and barefoot (-7.42 (1.07) cm/s; both p < 0.001) in the combined group of children. Gait in suboptimal footwear was more variable compared to barefoot and optimal footwear. Greater base of support variability and reduced balance was moderately correlated for both groups (CMT and TD). Gait is slower with shorter, wider steps and greater base of support variability in children with CMT. Poor balance is associated with greater base of support gait variability. Suboptimal footwear negatively affects gait in all children (CMT and TD), which has clinical implications for children and adolescents with CMT who have weaker feet and ankles, and poor balance. Copyright © 2018 Elsevier B.V. All rights reserved.

  1. Fecundity selection on ornamental plumage colour differs between ages and sexes and varies over small spatial scales.

    PubMed

    Parker, T H; Wilkin, T A; Barr, I R; Sheldon, B C; Rowe, L; Griffith, S C

    2011-07-01

    Avian plumage colours are some of the most conspicuous sexual ornaments, and yet standardized selection gradients for plumage colour have rarely been quantified. We examined patterns of fecundity selection on plumage colour in blue tits (Cyanistes caeruleus L.). When not accounting for environmental heterogeneity, we detected relatively few cases of selection. We found significant disruptive selection on adult male crown colour and yearling female chest colour and marginally nonsignificant positive linear selection on adult female crown colour. We discovered no new significant selection gradients with canonical rotation of the matrix of nonlinear selection. Next, using a long-term data set, we identified territory-level environmental variables that predicted fecundity to determine whether these variables influenced patterns of plumage selection. The first of these variables, the density of oaks within 50 m of the nest, influenced selection gradients only for yearling males. The second variable, an inverse function of nesting density, interacted with a subset of plumage selection gradients for yearling males and adult females, although the strength and direction of selection did not vary predictably with population density across these analyses. Overall, fecundity selection on plumage colour in blue tits appeared rare and inconsistent among sexes and age classes. © 2011 The Authors. Journal of Evolutionary Biology © 2011 European Society For Evolutionary Biology.
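    Standardized selection gradients of the kind quantified here are conventionally estimated by regressing relative fitness on the standardized trait (the Lande-Arnold approach). A minimal sketch on hypothetical data, with fledgling count as the fecundity measure:

```python
import numpy as np

def selection_gradient(trait, fitness):
    """Lande-Arnold linear selection gradient: slope of relative fitness
    (w / mean(w)) regressed on the variance-standardized trait."""
    z = (trait - trait.mean()) / trait.std()
    w_rel = fitness / fitness.mean()
    return np.polyfit(z, w_rel, 1)[0]

# Hypothetical data: fledgling count weakly increasing with crown colour.
rng = np.random.default_rng(5)
colour = rng.normal(size=400)
fledged = rng.poisson(np.clip(5 + 0.6 * colour, 0.1, None))
print(round(selection_gradient(colour, fledged), 2))  # near 0.12
```

Quadratic (stabilizing/disruptive) gradients add a z² term to the same regression.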

  2. Analysis of Observational Studies in the Presence of Treatment Selection Bias: Effects of Invasive Cardiac Management on AMI Survival Using Propensity Score and Instrumental Variable Methods

    PubMed Central

    Stukel, Thérèse A.; Fisher, Elliott S; Wennberg, David E.; Alter, David A.; Gottlieb, Daniel J.; Vermeulen, Marian J.

    2007-01-01

    Context Comparisons of outcomes between patients treated and untreated in observational studies may be biased due to differences in patient prognosis between groups, often because of unobserved treatment selection biases. Objective To compare 4 analytic methods for removing the effects of selection bias in observational studies: multivariable model risk adjustment, propensity score risk adjustment, propensity-based matching, and instrumental variable analysis. Design, Setting, and Patients A national cohort of 122 124 patients who were elderly (aged 65–84 years), receiving Medicare, and hospitalized with acute myocardial infarction (AMI) in 1994–1995, and who were eligible for cardiac catheterization. Baseline chart reviews were taken from the Cooperative Cardiovascular Project and linked to Medicare health administrative data to provide a rich set of prognostic variables. Patients were followed up for 7 years through December 31, 2001, to assess the association between long-term survival and cardiac catheterization within 30 days of hospital admission. Main Outcome Measure Risk-adjusted relative mortality rate using each of the analytic methods. Results Patients who received cardiac catheterization (n=73 238) were younger and had lower AMI severity than those who did not. After adjustment for prognostic factors by using standard statistical risk-adjustment methods, cardiac catheterization was associated with a 50% relative decrease in mortality (for multivariable model risk adjustment: adjusted relative risk [RR], 0.51; 95% confidence interval [CI], 0.50–0.52; for propensity score risk adjustment: adjusted RR, 0.54; 95% CI, 0.53–0.55; and for propensity-based matching: adjusted RR, 0.54; 95% CI, 0.52–0.56). Using regional catheterization rate as an instrument, instrumental variable analysis showed a 16% relative decrease in mortality (adjusted RR, 0.84; 95% CI, 0.79–0.90). 
The survival benefits of routine invasive care from randomized clinical trials are between 8% and 21%. Conclusions Estimates of the observational association of cardiac catheterization with long-term AMI mortality are highly sensitive to analytic method. All standard risk-adjustment methods have the same limitations regarding removal of unmeasured treatment selection biases. Compared with standard modeling, instrumental variable analysis may produce less biased estimates of treatment effects, but is more suited to answering policy questions than specific clinical questions. PMID:17227979
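    The instrumental variable logic can be illustrated with a textbook two-stage least squares sketch on simulated confounded data. All values are hypothetical; in the study itself, the regional catheterization rate served as the instrument:

```python
import numpy as np

def two_stage_least_squares(z, x, y):
    """2SLS with a single instrument z for treatment x:
    stage 1 predicts x from z; stage 2 regresses y on predicted x."""
    z1 = np.column_stack([z, np.ones_like(z)])
    x_hat = z1 @ np.linalg.lstsq(z1, x, rcond=None)[0]
    x1 = np.column_stack([x_hat, np.ones_like(x_hat)])
    return np.linalg.lstsq(x1, y, rcond=None)[0][0]  # treatment effect

# Hypothetical data with unobserved confounding, so naive OLS is biased.
rng = np.random.default_rng(3)
n = 20000
u = rng.normal(size=n)                       # unobserved severity
z = rng.normal(size=n)                       # instrument (regional rate)
x = 0.8 * z + u + rng.normal(size=n)         # treatment selection
y = 0.5 * x + 2.0 * u + rng.normal(size=n)   # true effect = 0.5
naive = np.polyfit(x, y, 1)[0]
print(round(naive, 2), round(two_stage_least_squares(z, x, y), 2))
```

The naive slope absorbs the confounder and overstates the effect, while the 2SLS estimate recovers a value near the true 0.5, mirroring the gap between the standard risk-adjusted and instrumental variable estimates in the abstract.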

  3. The employment of Support Vector Machine to classify high and low performance archers based on bio-physiological variables

    NASA Astrophysics Data System (ADS)

    Taha, Zahari; Muazu Musa, Rabiu; Majeed, Anwar P. P. Abdul; Razali Abdullah, Mohamad; Amirul Abdullah, Muhammad; Hasnun Arif Hassan, Mohd; Khalil, Zubair

    2018-04-01

    The present study employs a machine learning algorithm, namely the support vector machine (SVM), to classify high and low potential archers from a collection of bio-physiological variables using different SVM kernels. 50 youth archers (mean age 17.0 ± .056 years) gathered from various archery programmes completed a one-end shooting score test. The bio-physiological variables, namely resting heart rate, resting respiratory rate, resting diastolic blood pressure, resting systolic blood pressure, as well as calorie intake, were measured prior to the shooting tests. k-means cluster analysis was applied to cluster the archers based on their scores on the variables assessed. SVM models, i.e. linear, quadratic and cubic kernel functions, were trained on the aforementioned variables. The k-means analysis clustered the archers into high potential archers (HPA) and low potential archers (LPA), respectively. It was demonstrated that the linear SVM exhibited good accuracy, with a classification accuracy of 94%, in comparison with the other tested models. The findings of this investigation can be valuable to coaches and sports managers in recognising high potential athletes from the selected bio-physiological variables examined.
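    The k-means step that splits athletes into two groups can be sketched with Lloyd's algorithm on standardized variables. The data, group sizes and variables below are hypothetical stand-ins for the study's measurements:

```python
import numpy as np

def kmeans_two_groups(scores, n_iter=50):
    """Lloyd's k-means with k=2 on standardized variables; returns a
    0/1 cluster label per athlete (e.g. low vs. high potential)."""
    X = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    centers = X[[0, -1]].copy()          # crude initialization
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for k in (0, 1):
            if (labels == k).any():
                centers[k] = X[labels == k].mean(axis=0)
    return labels

# Hypothetical data: 25 archers with higher resting heart/respiratory
# rates and 25 with lower, stacked into one (50, 2) array.
rng = np.random.default_rng(4)
low = rng.normal([75, 18], 2, size=(25, 2))
high = rng.normal([60, 12], 2, size=(25, 2))
labels = kmeans_two_groups(np.vstack([low, high]))
print(labels[:25].mean(), labels[25:].mean())  # clusters match groups
```

The resulting labels would then serve as the class targets for training the SVM kernels compared in the study.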

  4. Expected value analysis for integrated supplier selection and inventory control of multi-product inventory system with fuzzy cost

    NASA Astrophysics Data System (ADS)

    Sutrisno, Widowati, Tjahjana, R. Heru

    2017-12-01

    Future costs in many industrial problems are inherently uncertain, so a mathematical analysis of problems with uncertain costs is needed. In this article, we apply fuzzy expected value analysis to solve an integrated supplier selection and inventory control problem for a multi-product inventory system with uncertain costs, where the cost uncertainty is modeled by a fuzzy variable. We formulate the mathematical model of the problem as a fuzzy expected value-based quadratic optimization with a total-cost objective function and solve it using expected value-based fuzzy programming. In the numerical examples performed by the authors, the supplier selection problem was solved: the optimal supplier was selected for each time period, the optimal volume of each product to be purchased from each supplier in each time period was determined, and the product stock level was controlled as decided by the authors, i.e. it followed the given reference level.

  5. Genome-Wide Association Analysis of Adaptation Using Environmentally Predicted Traits.

    PubMed

    van Heerwaarden, Joost; van Zanten, Martijn; Kruijer, Willem

    2015-10-01

    Current methods for studying the genetic basis of adaptation evaluate genetic associations with ecologically relevant traits or single environmental variables, under the implicit assumption that natural selection imposes correlations between phenotypes, environments and genotypes. In practice, observed trait and environmental data are manifestations of unknown selective forces and are only indirectly associated with adaptive genetic variation. In theory, improved estimation of these forces could enable more powerful detection of loci under selection. Here we present an approach in which we approximate adaptive variation by modeling phenotypes as a function of the environment and using the predicted trait in multivariate and univariate genome-wide association analysis (GWAS). Based on computer simulations and published flowering time data from the model plant Arabidopsis thaliana, we find that environmentally predicted traits lead to higher recovery of functional loci in multivariate GWAS and are more strongly correlated to allele frequencies at adaptive loci than individual environmental variables. Our results provide an example of the use of environmental data to obtain independent and meaningful information on adaptive genetic variation.

  6. Forecasting of cyanobacterial density in Torrão reservoir using artificial neural networks.

    PubMed

    Torres, Rita; Pereira, Elisa; Vasconcelos, Vítor; Teles, Luís Oliva

    2011-06-01

    The ability of general regression neural networks (GRNN) to forecast the density of cyanobacteria in the Torrão reservoir (Tâmega river, Portugal) over a period of 15 days, based on three years of collected physical and chemical data, was assessed. Several models were developed, and 176 were selected based on their correlation values for the verification series. A time lag of 11 was used, equivalent to one sample (periods of 15 days in the summer and 30 days in the winter). Several combinations of the series were used. Input and output data collected from three depths of the reservoir were applied (surface, euphotic zone limit and bottom). The model with the highest average correlation value had correlation values of 0.991, 0.843 and 0.978 for the training, verification and test series. This model had the three series independent in time: first the test series, then the verification series and, finally, the training series. Only six input variables were considered significant to the performance of this model: ammonia, phosphates, dissolved oxygen, water temperature, pH and water evaporation, physical and chemical parameters referring to the three depths of the reservoir. These variables are common to the next four best models produced and, although these included other input variables, their performance was not better than that of the selected best model.

  7. Renal Function Descriptors in Neonates: Which Creatinine-Based Formula Best Describes Vancomycin Clearance?

    PubMed

    Bhongsatiern, Jiraganya; Stockmann, Chris; Yu, Tian; Constance, Jonathan E; Moorthy, Ganesh; Spigarelli, Michael G; Desai, Pankaj B; Sherwin, Catherine M T

    2016-05-01

    Growth and maturational changes have been identified as significant covariates in describing variability in clearance of renally excreted drugs such as vancomycin. Because of immaturity of clearance mechanisms, quantification of renal function in neonates is of importance. Several serum creatinine (SCr)-based renal function descriptors have been developed in adults and children, but none are selectively derived for neonates. This review summarizes development of the neonatal kidney and discusses assessment of the renal function regarding estimation of glomerular filtration rate using renal function descriptors. Furthermore, identification of the renal function descriptors that best describe the variability of vancomycin clearance was performed in a sample study of a septic neonatal cohort. Population pharmacokinetic models were developed applying a combination of age-weight, renal function descriptors, or SCr alone. In addition to age and weight, SCr or renal function descriptors significantly reduced variability of vancomycin clearance. The population pharmacokinetic models with Léger and modified Schwartz formulas were selected as the optimal final models, although the other renal function descriptors and SCr provided reasonably good fit to the data, suggesting further evaluation of the final models using external data sets and cross validation. The present study supports incorporation of renal function descriptors in the estimation of vancomycin clearance in neonates. © 2015, The American College of Clinical Pharmacology.

  8. Individualised training to address variability of radiologists' performance

    NASA Astrophysics Data System (ADS)

    Sun, Shanghua; Taylor, Paul; Wilkinson, Louise; Khoo, Lisanne

    2008-03-01

    Computer-based tools are increasingly used for training and the continuing professional development of radiologists. We propose an adaptive training system to support individualised learning in mammography, based on a set of real cases, which are annotated with educational content by experienced breast radiologists. The system has knowledge of the strengths and weaknesses of each radiologist's performance: each radiologist is assessed to compute a profile showing how they perform on different sets of cases, classified by type of abnormality, breast density, and perceptual difficulty. We also assess variability in cognitive aspects of image perception, classifying errors made by radiologists as errors of search, recognition or decision. This is a novel element in our approach. The profile is used to select cases to present to the radiologist. The intelligent and flexible presentation of these cases distinguishes our system from existing training tools. The training cases are organised and indexed by an ontology we have developed for breast radiologist training, which is consistent with the radiologists' profile. Hence, the training system is able to select appropriate cases to compose an individualised training path, addressing the variability of the radiologists' performance. The ontology, a substantial part of the system, has been evaluated on a large number of cases, and the training system is under implementation for further evaluation.

  9. Valorization of genetic variability for the qualitative improvement of autochthonous grape cultivars of Cirò's terroir through the self-fertilization.

    PubMed

    Meneghetti, Stefano; Gaiotti, Federica; Giust, Mirella; Belfiore, Nicola; Tomasi, Diego

    2015-03-01

    This study uses PCR-derived marker systems to investigate the genetic differences of 22 grapevine accessions obtained through a self-fertilization program using Gaglioppo and Magliocco dolce. The aim of the study was to improve some qualitative parameters, while preserving the adaptive characteristics of these two cultivars to the adverse environmental conditions of the Calabria region (southern Italy). These two Calabrian grapevines have been cultivated within a restricted area and have been placed under a strong anthropic pressure, which has limited their phenotypical variability, with no selection of better-performing biotypes. Therefore, to obtain accessions with improved qualitative traits, a program of genetic improvement based on the self-fertilization of the Gaglioppo and Magliocco dolce cultivars was performed in 1998, producing 3,122 accessions. Selection cycles were performed over 14 years. A first selection cycle (1998-2000), based on visual inspection of vegetative traits, selected 1,320 accessions, planted in an experimental vineyard in 2000. A second selection cycle (2000-2008), based on phenotypic traits, sanitary aspects, and chemical composition of the grapes, selected 42 accessions, planted in a new experimental vineyard in 2008. A final selection cycle (2008-2012) produced 22 accessions (virus free) with the best agronomic, sanitary, and qualitative aspects: two accessions obtained from Gaglioppo were selected for color characteristics (i.e., anthocyanin total content and stability); 20 genotypes obtained from Magliocco dolce had a better macro-composition of the grape (i.e., good sugar content with a balanced acidity). SSR analyses were performed to check the self-fertilization process. The study of genetic differences between accessions was performed by AFLPs, SAMPLs, and M-AFLPs. 
The application of the above-mentioned techniques allowed both the molecular discrimination of the 22 accessions and their grouping according to genetic similarity. The self-fertilization approach has enabled improvement in the quality of the grapes, while preserving the high degree of adaptation of these two native Calabrian cultivars to the environment of southern Italy.

  10. A fuel-based approach for emission factor development for highway paving construction equipment in China.

    PubMed

    Li, Zhen; Zhang, Kaishan; Pang, Kaili; Di, Baofeng

    2016-12-01

The objective of this paper is to develop and demonstrate a fuel-based approach to estimating emission factors for highway paving construction equipment in China with better accuracy. A highway construction site in Chengdu was selected for this study, with NO emissions being characterized and demonstrated. Four commonly used pieces of paving equipment, i.e., three rollers and one paver, were selected. A portable emission measurement system (PEMS) was developed and used for emission measurements of the selected equipment during real-world highway construction duties. Three duty modes were defined to characterize the NO emissions: idling, moving, and working. In order to develop a representative emission factor for this highway construction equipment, composite emission factors were estimated using modal emission rates and the corresponding modal durations in the process of typical construction duties. Depending on duty mode and equipment type, the NO emission rate ranged from 2.6-63.7 mg/s and 6.0-55.6 g/kg-fuel, with the fuel consumption ranging from 0.31-4.52 g/s correspondingly. The NO composite emission factor was estimated to be 9-41 mg/s, with the single-drum roller being the highest and the double-drum roller the lowest, and 6-30 g/kg-fuel, with the pneumatic tire roller being the highest and the double-drum roller the lowest. For the paver, both the time-based and fuel consumption-based NO composite emission rates, 56 mg/s and 30 g/kg-fuel respectively, are higher than those of all the rollers. In terms of time-based quantity, the working mode contributes more than the other modes, with idling contributing the least, for both emissions and fuel consumption. In contrast, the fuel-based emission rate appears to have less variability. Thus, in order to estimate emission factors for emission inventory development, the fuel-based emission factor may be selected for better accuracy.
The fuel-based composite emission factors will be less variable and more accurate than time-based emission factors. As a consequence, an emissions inventory developed using this approach will be more accurate and practical.
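The composite factors above are duty-cycle-weighted averages of the modal measurements. A minimal sketch of that arithmetic, using hypothetical modal rates and durations rather than the paper's measured values:

```python
# Composite NO emission factors from modal data (all numbers hypothetical).
modes = {
    # mode: (NO rate in mg/s, fuel rate in g/s, duration in s)
    "idling":  (5.0, 0.5, 600),
    "moving":  (20.0, 2.0, 900),
    "working": (40.0, 3.5, 1800),
}

total_no_mg = sum(no * dur for no, fuel, dur in modes.values())
total_fuel_g = sum(fuel * dur for no, fuel, dur in modes.values())
total_time_s = sum(dur for _, _, dur in modes.values())

time_based_mg_per_s = total_no_mg / total_time_s   # composite time-based factor
fuel_based_g_per_kg = total_no_mg / total_fuel_g   # mg/g is numerically g/kg-fuel
```

Because the fuel-based factor normalizes by fuel burned rather than elapsed time, it is less sensitive to how long the equipment spends idling, which is the argument for preferring it in inventory development.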

  11. Modulation Depth Estimation and Variable Selection in State-Space Models for Neural Interfaces

    PubMed Central

    Hochberg, Leigh R.; Donoghue, John P.; Brown, Emery N.

    2015-01-01

    Rapid developments in neural interface technology are making it possible to record increasingly large signal sets of neural activity. Various factors such as asymmetrical information distribution and across-channel redundancy may, however, limit the benefit of high-dimensional signal sets, and the increased computational complexity may not yield corresponding improvement in system performance. High-dimensional system models may also lead to overfitting and lack of generalizability. To address these issues, we present a generalized modulation depth measure using the state-space framework that quantifies the tuning of a neural signal channel to relevant behavioral covariates. For a dynamical system, we develop computationally efficient procedures for estimating modulation depth from multivariate data. We show that this measure can be used to rank neural signals and select an optimal channel subset for inclusion in the neural decoding algorithm. We present a scheme for choosing the optimal subset based on model order selection criteria. We apply this method to neuronal ensemble spike-rate decoding in neural interfaces, using our framework to relate motor cortical activity with intended movement kinematics. With offline analysis of intracortical motor imagery data obtained from individuals with tetraplegia using the BrainGate neural interface, we demonstrate that our variable selection scheme is useful for identifying and ranking the most information-rich neural signals. We demonstrate that our approach offers several orders of magnitude lower complexity but virtually identical decoding performance compared to greedy search and other selection schemes. Our statistical analysis shows that the modulation depth of human motor cortical single-unit signals is well characterized by the generalized Pareto distribution. Our variable selection scheme has wide applicability in problems involving multisensor signal modeling and estimation in biomedical engineering systems. 
PMID:25265627
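The paper's modulation depth is defined within a state-space model; purely as an illustration of the rank-and-select idea, the sketch below uses a simplified stand-in score (squared correlation with a behavioral covariate) on synthetic channels:

```python
import numpy as np

rng = np.random.default_rng(0)
T, C = 500, 8                          # time samples, signal channels
kinematics = rng.standard_normal(T)    # one behavioral covariate (e.g., velocity)
gains = np.array([2.0, 1.5, 1.0, 0, 0, 0, 0, 0])   # channels 0-2 are tuned
signals = np.outer(kinematics, gains) + rng.standard_normal((T, C))

# Squared correlation with the covariate as a simple tuning score
# (a stand-in for the paper's state-space modulation depth).
depth = np.array([np.corrcoef(signals[:, c], kinematics)[0, 1] ** 2
                  for c in range(C)])
ranked = np.argsort(depth)[::-1]       # channels ranked most- to least-tuned
top_k = ranked[:3]                     # candidate subset for the decoder
```

Ranking channels by a tuning score and keeping the top subset avoids the combinatorial cost of greedy subset search, which is the complexity advantage the abstract describes.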

  12. Effect of Methodological and Ecological Approaches on Heterogeneity of Nest-Site Selection of a Long-Lived Vulture

    PubMed Central

    Moreno-Opo, Rubén; Fernández-Olalla, Mariana; Margalida, Antoni; Arredondo, Ángel; Guil, Francisco

    2012-01-01

The application of science-based conservation measures requires that sampling methodologies in studies modelling similar ecological aspects produce comparable results, making their interpretation easier. We aimed to show how the choice of different methodological and ecological approaches can affect conclusions in nest-site selection studies across different Palearctic meta-populations of an indicator species. First, a multivariate analysis of the variables affecting nest-site selection in a breeding colony of cinereous vulture (Aegypius monachus) in central Spain was performed. Then, a meta-analysis was applied to establish how methodological and habitat-type factors determine differences and similarities in the results obtained by previous studies that have modelled the forest breeding habitat of the species. Our results revealed patterns in nesting-habitat modelling by the cinereous vulture throughout its whole range: steep and south-facing slopes, high cover of large trees and distance from human activities were generally selected. The ratio and situation of the studied plots (nests/random), the use of plots vs. polygons as sampling units, and the number of years in the data set determined the variability explained by the model. Moreover, a greater size of the breeding colony implied that ecological and geomorphological variables at the landscape level were more influential. Additionally, human activities affected colonies situated in Mediterranean forests to a greater extent. For the first time, a meta-analysis of the factors determining nest-site selection heterogeneity for a single species at a broad scale was achieved. It is essential to homogenize and coordinate experimental design in modelling the selection of species' ecological requirements, so that differences in results among studies are not due to methodological heterogeneity.
This would help optimize conservation and management practices for habitats and species in a global context. PMID:22413023

  13. Evolution of catalytic RNA in the laboratory

    NASA Technical Reports Server (NTRS)

    Joyce, Gerald F.

    1992-01-01

We are interested in the biochemistry of existing RNA enzymes and in the development of RNA enzymes with novel catalytic function. The focal point of our research program has been the design and operation of a laboratory system for the controlled evolution of catalytic RNA. This system serves as a working model of RNA-based life and can be used to explore the catalytic potential of RNA. Evolution requires the integration of three chemical processes: amplification, mutation, and selection. Amplification results in additional copies of the genetic material. Mutation operates at the level of genotype to introduce variability, this variability in turn being expressed as a range of phenotypes. Selection operates at the level of phenotype to reduce variability by excluding those individuals that do not conform to the prevailing fitness criteria. These three processes must be linked so that only the selected individuals are amplified, subject to mutational error, to produce a progeny distribution of mutant individuals. We devised techniques for the amplification, mutation, and selection of catalytic RNA, all of which can be performed rapidly in vitro within a single reaction vessel. We integrated these techniques in such a way that they can be performed iteratively and routinely. This allowed us to conduct evolution experiments in response to artificially imposed selection constraints. Our objective was to develop novel RNA enzymes by altering the selection constraints in a controlled manner. In this way we were able to expand the catalytic repertoire of RNA. Our long-range objective is to develop an RNA enzyme with RNA replicase activity. If such an enzyme had the ability to produce additional copies of itself, then RNA evolution would operate autonomously and the origin of life would have been realized in the laboratory.

  14. The relative age effect in soccer: a match-related perspective.

    PubMed

    Vaeyens, Roel; Philippaerts, Renaat M; Malina, Robert M

    2005-07-01

Asymmetries in the distributions of birth dates in senior professional and youth soccer players have been interpreted as evidence for systematic discrimination against individuals born shortly before the cut-off date in assigning youth to specific age groups. This concept is known as the "relative age effect". The results of a longitudinal study of birth date distributions of 2757 semi-professional and amateur senior soccer players in Belgium are presented. Records for competitive games were available in official statistics provided by the Royal Belgian Football Association. The chi-square statistic was used to examine differences between observed and expected birth date distributions. Regression analyses indicated a shift of bias when two different start dates were compared. Players born in the early part of the new age band (January to March) were over-represented compared with players born late in the new selection period (October to December). However, players with birthdays at the start of the old selection year (August) were still represented. In a retrospective analysis of 2138 players, variables indicative of match involvement, number of selections for matches, and time played were examined in relation to the relative age effect. The group of semi-professional and amateur senior soccer players born in the first quarter of the selected age band received more playing opportunities. Comparisons of birth date distributions (the traditional approach to the relative age effect) with match-related variables gave similar, though not entirely consistent, results. However, there were no differences in the mean number of selections or in playing minutes between players born at the start or the end of the selection year. Our findings suggest that match-based variables may provide a more reliable indication of the relative age effect in soccer.
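The chi-square test mentioned above compares observed birth-date counts against those expected under a uniform distribution. A sketch with hypothetical quarterly counts (illustrative only, not the study's actual distribution):

```python
# Hypothetical birth-quarter counts for a cohort of 2757 players (Q1..Q4
# of the selection year); an over-representation of early-quarter births.
observed = [900, 750, 620, 487]
expected = [sum(observed) / 4] * 4     # uniform birth-date distribution

# Chi-square goodness-of-fit statistic
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 5% critical value for 3 degrees of freedom is 7.815; a larger statistic
# rejects uniformity, consistent with a relative age effect.
biased = chi2 > 7.815
```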

  15. Resampling procedures to identify important SNPs using a consensus approach.

    PubMed

    Pardy, Christopher; Motyer, Allan; Wilson, Susan

    2011-11-29

    Our goal is to identify common single-nucleotide polymorphisms (SNPs) (minor allele frequency > 1%) that add predictive accuracy above that gained by knowledge of easily measured clinical variables. We take an algorithmic approach to predict each phenotypic variable using a combination of phenotypic and genotypic predictors. We perform our procedure on the first simulated replicate and then validate against the others. Our procedure performs well when predicting Q1 but is less successful for the other outcomes. We use resampling procedures where possible to guard against false positives and to improve generalizability. The approach is based on finding a consensus regarding important SNPs by applying random forests and the least absolute shrinkage and selection operator (LASSO) on multiple subsamples. Random forests are used first to discard unimportant predictors, narrowing our focus to roughly 100 important SNPs. A cross-validation LASSO is then used to further select variables. We combine these procedures to guarantee that cross-validation can be used to choose a shrinkage parameter for the LASSO. If the clinical variables were unavailable, this prefiltering step would be essential. We perform the SNP-based analyses simultaneously rather than one at a time to estimate SNP effects in the presence of other causal variants. We analyzed the first simulated replicate of Genetic Analysis Workshop 17 without knowledge of the true model. Post-conference knowledge of the simulation parameters allowed us to investigate the limitations of our approach. We found that many of the false positives we identified were substantially correlated with genuine causal SNPs.
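As an illustration of the consensus idea (counting how often a predictor survives selection across subsamples), the sketch below substitutes a simple univariate correlation screen for the random forest and LASSO steps used in the paper; the data, screen, and thresholds are all invented:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 50
X = rng.standard_normal((n, p))
y = 2 * X[:, 0] - 1.5 * X[:, 3] + rng.standard_normal(n)  # features 0, 3 causal

B, m = 30, 5                       # number of subsamples, features kept per run
counts = np.zeros(p, dtype=int)
for _ in range(B):
    idx = rng.choice(n, size=n // 2, replace=False)       # random subsample
    # Univariate screen as a stand-in for the RF + LASSO pipeline
    score = np.abs([np.corrcoef(X[idx, j], y[idx])[0, 1] for j in range(p)])
    counts[np.argsort(score)[-m:]] += 1                   # keep top-m features

# Consensus: features selected in at least 80% of subsamples
consensus = np.where(counts >= 0.8 * B)[0]
```

Spurious features rarely survive many independent subsamples, which is the guard against false positives the abstract describes.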

  16. Boosted structured additive regression for Escherichia coli fed-batch fermentation modeling.

    PubMed

    Melcher, Michael; Scharl, Theresa; Luchner, Markus; Striedner, Gerald; Leisch, Friedrich

    2017-02-01

The quality of biopharmaceuticals and patients' safety are of the highest priority, and there are tremendous efforts to replace empirical production process designs with knowledge-based approaches. The main challenge in this context is that real-time access to process variables related to product quality and quantity is severely limited. To date, comprehensive on- and offline monitoring platforms are used to generate process data sets that allow for the development of mechanistic and/or data-driven models for real-time prediction of these important quantities. The ultimate goal is to implement model-based feedback control loops that facilitate online control of product quality. In this contribution, we explore structured additive regression (STAR) models in combination with boosting as a variable selection tool for modeling the cell dry mass, product concentration, and optical density on the basis of online available process variables and two-dimensional fluorescence spectroscopic data. STAR models are powerful extensions of linear models allowing for the inclusion of smooth effects or interactions between predictors. Boosting constructs the final model in a stepwise manner and provides a variable importance measure via predictor selection frequencies. Our results show that the cell dry mass can be modeled with a relative error of about ±3%, the optical density with ±6%, the soluble protein with ±16%, and the insoluble product with an accuracy of ±12%. Biotechnol. Bioeng. 2017;114: 321-334. © 2016 Wiley Periodicals, Inc.
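Boosting's selection-frequency importance measure can be illustrated with componentwise L2 boosting on synthetic data: at each step the single predictor that best fits the current residuals is chosen, and the count of how often each predictor is chosen serves as its importance. This is a generic sketch, not the STAR model fitted in the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 300, 10
X = rng.standard_normal((n, p))
y = 3 * X[:, 1] + X[:, 4] + rng.standard_normal(n)   # predictors 1 and 4 matter

steps, nu = 100, 0.1                 # boosting iterations, step-size (shrinkage)
coef = np.zeros(p)
freq = np.zeros(p, dtype=int)        # how often each predictor is selected
resid = y.copy()
for _ in range(steps):
    fit = X.T @ resid / (X ** 2).sum(axis=0)   # univariate LS coefficient per predictor
    score = fit * (X.T @ resid)                # reduction in residual sum of squares
    j = int(np.argmax(score))                  # best-fitting predictor this step
    coef[j] += nu * fit[j]
    resid -= nu * fit[j] * X[:, j]
    freq[j] += 1

importance = freq / steps            # selection-frequency importance measure
```

Informative predictors are selected repeatedly while noise predictors are rarely chosen, which is how selection frequencies double as a variable selection tool.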

  17. An entropy-variables-based formulation of residual distribution schemes for non-equilibrium flows

    NASA Astrophysics Data System (ADS)

    Garicano-Mena, Jesús; Lani, Andrea; Degrez, Gérard

    2018-06-01

In this paper we present an extension of Residual Distribution techniques for the simulation of compressible flows in non-equilibrium conditions. The latter are modeled by means of a state-of-the-art multi-species and two-temperature model. An entropy-based variable transformation that symmetrizes the projected advective Jacobian for such a thermophysical model is introduced. Moreover, the transformed advection Jacobian matrix presents a block diagonal structure, with mass-species and electronic-vibrational energy being completely decoupled from the momentum and total energy sub-system. The advantageous structure of the transformed advective Jacobian can be exploited by contour-integration-based Residual Distribution techniques: established schemes that operate on dense matrices can be substituted by the same scheme operating on the momentum-energy subsystem matrix and repeated application of a scalar scheme to the mass-species and electronic-vibrational energy terms. Finally, the performance gain of the symmetrizing-variables formulation is quantified on a selection of representative test cases, ranging from subsonic to hypersonic, in inviscid or viscous conditions.

  18. Variable Selection for Nonparametric Quantile Regression via Smoothing Spline ANOVA

    PubMed Central

    Lin, Chen-Yen; Bondell, Howard; Zhang, Hao Helen; Zou, Hui

    2014-01-01

    Quantile regression provides a more thorough view of the effect of covariates on a response. Nonparametric quantile regression has become a viable alternative to avoid restrictive parametric assumptions. The problem of variable selection for quantile regression is challenging, since important variables can influence various quantiles in different ways. We tackle the problem via regularization in the context of smoothing spline ANOVA models. The proposed sparse nonparametric quantile regression (SNQR) can identify important variables and provide flexible estimates for quantiles. Our numerical study suggests the promising performance of the new procedure in variable selection and function estimation. Supplementary materials for this article are available online. PMID:24554792
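Quantile regression rests on minimizing the "check" (pinball) loss rather than squared error. A minimal illustration, unrelated to the smoothing spline ANOVA machinery of the paper, showing that the check-loss minimizer recovers an empirical quantile:

```python
import numpy as np

def pinball(u, tau):
    # Quantile "check" loss: rho_tau(u) = u * (tau - 1{u < 0})
    return np.where(u >= 0, tau * u, (tau - 1) * u)

rng = np.random.default_rng(3)
y = rng.standard_normal(1000)
tau = 0.9                                   # target quantile level

# Fit a constant model by minimizing total pinball loss over a grid
grid = np.linspace(-3, 3, 601)
loss = [pinball(y - g, tau).sum() for g in grid]
fitted = grid[int(np.argmin(loss))]         # should match the empirical 0.9-quantile
```

Replacing the constant with a flexible function of covariates, plus a sparsity penalty, gives the kind of penalized nonparametric quantile model the abstract describes.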

  19. Independence-Based Optimization of Epistemic Model Checking

    DTIC Science & Technology

    2017-02-22

    favourite restaurant. Their waiter informs them that arrangements have been made with the maitre d’hotel for the bill to be paid anonymously. One of the...nondeterministically selecting a value of either 0 or 1. Each cryptographer is associated with a set of variables, whose values they are able to observe at each...slotsCi is used to represent which slot, if any, agent Ci has selected for its transmission; slotsCi[0] represents that the agent has nothing to transmit

  20. Dynamic flashing yellow arrow (FYA): a study on variable left-turn mode operational and safety impacts phase II - model expansion and testing : [summary].

    DOT National Transportation Integrated Search

    2016-05-01

    In phase two of this project, the UCF team further developed the DSS to automate selection of FYA left-turn modes based on traffic volumes at intersections acquired in real time from existing sensors.

  1. Medical Decision Making: A Selective Review for Child Psychiatrists and Psychologists

    ERIC Educational Resources Information Center

    Galanter, Cathryn A.; Patel, Vimla L.

    2005-01-01

    Physicians, including child and adolescent psychiatrists, show variability and inaccuracies in diagnosis and treatment of their patients and do not routinely implement evidence-based medical and psychiatric treatments in the community. We believe that it is necessary to characterize the decision-making processes of child and adolescent…

  2. Marketing the Community College Starts with Understanding Students' Perspectives.

    ERIC Educational Resources Information Center

    Absher, Keith; Crawford, Gerald

    1996-01-01

    Examines variables taken into account by community college students in choosing a college, arguing that increased competition for students means that colleges must employ marketing strategies. Discusses the use of the selection factors as market segmentation tools. Identifies five principal market segments based on student classifications of…

  3. ESTIMATED EFFECTIVE CHIMNEY HEIGHTS BASED ON RAWINSONDE OBSERVATIONS AT SELECTED SITES IN THE UNITED STATES

    EPA Science Inventory

    The plume rise equations of Briggs (1975) for variable vertical profiles of temperature and wind speed are described and applied for hypothetical small and very large chimneys at five NWS rawinsonde stations across the United States. From other available data additional informati...

  4. Face-Likeness and Image Variability Drive Responses in Human Face-Selective Ventral Regions

    PubMed Central

    Davidenko, Nicolas; Remus, David A.; Grill-Spector, Kalanit

    2012-01-01

    The human ventral visual stream contains regions that respond selectively to faces over objects. However, it is unknown whether responses in these regions correlate with how face-like stimuli appear. Here, we use parameterized face silhouettes to manipulate the perceived face-likeness of stimuli and measure responses in face- and object-selective ventral regions with high-resolution fMRI. We first use “concentric hyper-sphere” (CH) sampling to define face silhouettes at different distances from the prototype face. Observers rate the stimuli as progressively more face-like the closer they are to the prototype face. Paradoxically, responses in both face- and object-selective regions decrease as face-likeness ratings increase. Because CH sampling produces blocks of stimuli whose variability is negatively correlated with face-likeness, this effect may be driven by more adaptation during high face-likeness (low-variability) blocks than during low face-likeness (high-variability) blocks. We tested this hypothesis by measuring responses to matched-variability (MV) blocks of stimuli with similar face-likeness ratings as with CH sampling. Critically, under MV sampling, we find a face-specific effect: responses in face-selective regions gradually increase with perceived face-likeness, but responses in object-selective regions are unchanged. Our studies provide novel evidence that face-selective responses correlate with the perceived face-likeness of stimuli, but this effect is revealed only when image variability is controlled across conditions. Finally, our data show that variability is a powerful factor that drives responses across the ventral stream. This indicates that controlling variability across conditions should be a critical tool in future neuroimaging studies of face and object representation. PMID:21823208

  5. Stochastic model search with binary outcomes for genome-wide association studies

    PubMed Central

    Malovini, Alberto; Puca, Annibale A; Bellazzi, Riccardo

    2012-01-01

    Objective The spread of case–control genome-wide association studies (GWASs) has stimulated the development of new variable selection methods and predictive models. We introduce a novel Bayesian model search algorithm, Binary Outcome Stochastic Search (BOSS), which addresses the model selection problem when the number of predictors far exceeds the number of binary responses. Materials and methods Our method is based on a latent variable model that links the observed outcomes to the underlying genetic variables. A Markov Chain Monte Carlo approach is used for model search and to evaluate the posterior probability of each predictor. Results BOSS is compared with three established methods (stepwise regression, logistic lasso, and elastic net) in a simulated benchmark. Two real case studies are also investigated: a GWAS on the genetic bases of longevity, and the type 2 diabetes study from the Wellcome Trust Case Control Consortium. Simulations show that BOSS achieves higher precisions than the reference methods while preserving good recall rates. In both experimental studies, BOSS successfully detects genetic polymorphisms previously reported to be associated with the analyzed phenotypes. Discussion BOSS outperforms the other methods in terms of F-measure on simulated data. In the two real studies, BOSS successfully detects biologically relevant features, some of which are missed by univariate analysis and the three reference techniques. Conclusion The proposed algorithm is an advance in the methodology for model selection with a large number of features. Our simulated and experimental results showed that BOSS proves effective in detecting relevant markers while providing a parsimonious model. PMID:22534080
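The sketch below illustrates stochastic model search over binary inclusion indicators in a greatly simplified form: a Metropolis walk over predictor subsets scored by the BIC of a Gaussian working model, rather than the paper's latent-variable MCMC. All data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 150, 12
X = rng.standard_normal((n, p))
# Binary outcome driven by predictors 2 and 7 (a toy stand-in for SNP data)
y = (X[:, 2] - X[:, 7] + 0.5 * rng.standard_normal(n) > 0).astype(float)

def bic(active):
    # BIC of a linear working model on the active predictors
    k = int(active.sum())
    if k == 0:
        resid = y - y.mean()
    else:
        A = np.column_stack([np.ones(n), X[:, active]])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
    return n * np.log(resid @ resid / n) + (k + 1) * np.log(n)

active = np.zeros(p, dtype=bool)   # start from the empty model
cur = bic(active)
incl = np.zeros(p)
iters = 2000
for _ in range(iters):
    j = int(rng.integers(p))
    prop = active.copy()
    prop[j] = not prop[j]          # propose flipping one inclusion indicator
    new = bic(prop)
    if rng.random() < np.exp(min(0.0, (cur - new) / 2)):   # Metropolis accept
        active, cur = prop, new
    incl += active
incl /= iters                      # approximate posterior inclusion probabilities
```

Averaging the indicators along the chain yields per-predictor inclusion probabilities, the quantity BOSS uses to rank predictors when p far exceeds the number of responses.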

  6. Adaptive molecular evolution of the Major Histocompatibility Complex genes, DRA and DQA, in the genus Equus

    PubMed Central

    2011-01-01

    Background Major Histocompatibility Complex (MHC) genes are central to vertebrate immune response and are believed to be under balancing selection by pathogens. This hypothesis has been supported by observations of extremely high polymorphism, elevated nonsynonymous to synonymous base pair substitution rates and trans-species polymorphisms at these loci. In equids, the organization and variability of this gene family have been described; however, the full extent of diversity and selection is unknown. As selection is not expected to act uniformly on a functional gene, maximum likelihood codon-based models of selection that allow heterogeneity in selection across codon positions can be valuable for examining MHC gene evolution and the molecular basis for species adaptations. Results We investigated the evolution of two class II MHC genes of the Equine Lymphocyte Antigen (ELA), DRA and DQA, in the genus Equus with the addition of novel alleles identified in plains zebra (E. quagga, formerly E. burchelli). We found that both genes exhibited a high degree of polymorphism and inter-specific sharing of allele lineages. To our knowledge, DRA allelic diversity was discovered to be higher than has ever been observed in vertebrates. Evidence was also found to support a duplication of the DQA locus. Selection analyses, evaluated in terms of relative rates of nonsynonymous to synonymous mutations (dN/dS) averaged over the gene region, indicated that the majority of codon sites were conserved and under purifying selection (dN

  7. Towards an understanding of Internet-based problem shopping behaviour: The concept of online shopping addiction and its proposed predictors

    PubMed Central

    ROSE, SUSAN; DHANDAYUDHAM, ARUN

    2014-01-01

    Background: Compulsive and addictive forms of consumption and buying behaviour have been researched in both business and medical literature. Shopping enabled via the Internet now introduces new features to the shopping experience that translate to positive benefits for the shopper. Evidence now suggests that this new shopping experience may lead to problematic online shopping behaviour. This paper provides a theoretical review of the literature relevant to online shopping addiction (OSA). Based on this selective review, a conceptual model of OSA is presented. Method: The selective review of the literature draws on searches within databases relevant to both clinical and consumer behaviour literature including EBSCO, ABI Pro-Quest, Web of Science – Social Citations Index, Medline, PsycINFO and Pubmed. The article reviews current thinking on problematic, and specifically addictive, behaviour in relation to online shopping. Results: The review of the literature enables the extension of existing knowledge into the Internet context. A conceptual model of OSA is developed with theoretical support provided for the inclusion of 7 predictor variables: low self-esteem; low self-regulation; negative emotional state; enjoyment; female gender; social anonymity; and cognitive overload. The construct of OSA is defined and six component criteria of OSA are proposed based on established technological addiction criteria. Conclusions: Current Internet-based shopping experiences may trigger problematic behaviours which can be classified on a spectrum that at its extreme end incorporates OSA. The development of a conceptual model provides a basis for the future measurement and testing of the proposed predictor variables and the outcome variable OSA. PMID:25215218

  8. Towards an understanding of Internet-based problem shopping behaviour: The concept of online shopping addiction and its proposed predictors.

    PubMed

    Rose, Susan; Dhandayudham, Arun

    2014-06-01

    Compulsive and addictive forms of consumption and buying behaviour have been researched in both business and medical literature. Shopping enabled via the Internet now introduces new features to the shopping experience that translate to positive benefits for the shopper. Evidence now suggests that this new shopping experience may lead to problematic online shopping behaviour. This paper provides a theoretical review of the literature relevant to online shopping addiction (OSA). Based on this selective review, a conceptual model of OSA is presented. The selective review of the literature draws on searches within databases relevant to both clinical and consumer behaviour literature including EBSCO, ABI Pro-Quest, Web of Science - Social Citations Index, Medline, PsycINFO and Pubmed. The article reviews current thinking on problematic, and specifically addictive, behaviour in relation to online shopping. The review of the literature enables the extension of existing knowledge into the Internet context. A conceptual model of OSA is developed with theoretical support provided for the inclusion of 7 predictor variables: low self-esteem; low self-regulation; negative emotional state; enjoyment; female gender; social anonymity; and cognitive overload. The construct of OSA is defined and six component criteria of OSA are proposed based on established technological addiction criteria. Current Internet-based shopping experiences may trigger problematic behaviours which can be classified on a spectrum that at its extreme end incorporates OSA. The development of a conceptual model provides a basis for the future measurement and testing of the proposed predictor variables and the outcome variable OSA.

  9. Is portion size selection associated with expected satiation, perceived healthfulness or expected tastiness? A case study on pizza using a photograph-based computer task.

    PubMed

    Labbe, D; Rytz, A; Godinot, N; Ferrage, A; Martin, N

    2017-01-01

    Increasing portion sizes over the last 30 years are considered one of the factors underlying overconsumption. Past research on the drivers of portion selection for foods showed that larger portions are selected for foods delivering low expected satiation. However, the respective contribution of expected satiation vs. two other potential drivers of portion size selection, i.e. perceived healthfulness and expected tastiness, has never been explored. In this study, we conjointly explored the role of expected satiation, perceived healthfulness and expected tastiness when selecting portions within a range of six commercial pizzas varying in their toppings and brands. For each product, 63 pizza consumers selected a portion size that would satisfy them for lunch and scored their expected satiation, perceived healthfulness and expected tastiness. As six participants selected an entire pizza as their ideal portion regardless of topping or brand, their data sets were excluded from the data analyses, which were completed on responses from the remaining 57 participants. Hierarchical multiple regression analyses showed that portion size variance was predicted by perceived healthfulness and expected tastiness variables. Two sub-groups of participants with different portion size patterns across pizzas were identified through post-hoc exploratory analysis. The explanatory power of the regression model was significantly improved by adding interaction terms between sub-group and expected satiation variables and between sub-group and perceived healthfulness variables to the model. Analysis at the sub-group level showed either a positive or a negative association between portion size and expected satiation depending on sub-group. For one group, portion size selection was more health-driven; for the other, more hedonic-driven. These results showed that even when considering a well-liked product category, perceived healthfulness can be an important factor influencing portion size decisions.
Copyright © 2016 Nestec S.A. Published by Elsevier Ltd. All rights reserved.
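The improvement from adding sub-group interaction terms, as in the hierarchical regressions above, can be sketched on synthetic data in which the satiation-portion association flips sign between two sub-groups (invented numbers, not the study's data):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 120
group = np.repeat([0, 1], n // 2)            # two consumer sub-groups
satiation = rng.standard_normal(n)           # expected-satiation score
# Portion size: the association with satiation flips sign across groups
portion = np.where(group == 0, 1.0, -1.0) * satiation + 0.3 * rng.standard_normal(n)

def r2(predictors):
    # R-squared of an OLS fit with an intercept
    A = np.column_stack([np.ones(n), predictors])
    beta, *_ = np.linalg.lstsq(A, portion, rcond=None)
    resid = portion - A @ beta
    return 1 - resid @ resid / ((portion - portion.mean()) ** 2).sum()

base = r2(np.column_stack([group, satiation]))                       # main effects only
full = r2(np.column_stack([group, satiation, group * satiation]))    # + interaction
```

With sign-flipping effects, the main-effects model explains almost nothing while the interaction model fits well, which is why a pooled analysis can miss sub-group-specific drivers.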

  10. Bayesian analysis of factors associated with fibromyalgia syndrome subjects

    NASA Astrophysics Data System (ADS)

    Jayawardana, Veroni; Mondal, Sumona; Russek, Leslie

    2015-01-01

    Factors contributing to movement-related fear in subjects with fibromyalgia (FM) were assessed by Russek et al. (2014) based on data collected through a national internet survey of community-based individuals. The study focused on the variables Activities-Specific Balance Confidence scale (ABC), Primary Care Post-Traumatic Stress Disorder screen (PC-PTSD), Tampa Scale of Kinesiophobia (TSK), a Joint Hypermobility Syndrome screen (JHS), Vertigo Symptom Scale (VSS-SF), Obsessive-Compulsive Personality Disorder (OCPD), and pain, work status and physical activity derived from the "Revised Fibromyalgia Impact Questionnaire" (FIQR). The study presented in this paper revisits the same data with a Bayesian analysis in which appropriate priors were introduced for the variables selected in Russek's paper.

  11. Reactibodies generated by kinetic selection couple chemical reactivity with favorable protein dynamics

    PubMed Central

    Smirnov, Ivan; Carletti, Eugénie; Kurkova, Inna; Nachon, Florian; Nicolet, Yvain; Mitkevich, Vladimir A.; Débat, Hélène; Avalle, Bérangère; Belogurov, Alexey A.; Kuznetsov, Nikita; Reshetnyak, Andrey; Masson, Patrick; Tonevitsky, Alexander G.; Ponomarenko, Natalia; Makarov, Alexander A.; Friboulet, Alain; Tramontano, Alfonso; Gabibov, Alexander

    2011-01-01

    Igs offer a versatile template for combinatorial and rational design approaches to the de novo creation of catalytically active proteins. We have used a covalent capture selection strategy to identify biocatalysts from within a human semisynthetic antibody variable fragment library that uses a nucleophilic mechanism. Specific phosphonylation at a single tyrosine within the variable light-chain framework was confirmed in a recombinant IgG construct. High-resolution crystallographic structures of unmodified and phosphonylated Fabs display a 15-Å-deep two-chamber cavity at the interface of variable light (VL) and variable heavy (VH) fragments having a nucleophilic tyrosine at the base of the site. The depth and structure of the pocket are atypical of antibodies in general but can be compared qualitatively with the catalytic site of cholinesterases. A structurally disordered heavy chain complementary determining region 3 loop, constituting a wall of the cleft, is stabilized after covalent modification by hydrogen bonding to the phosphonate tropinol moiety. These features and presteady state kinetics analysis indicate that an induced fit mechanism operates in this reaction. Mutations of residues located in this stabilized loop do not interfere with direct contacts to the organophosphate ligand but can interrogate second shell interactions, because the H3 loop has a conformation adjusted for binding. Kinetic and thermodynamic parameters along with computational docking support the active site model, including plasticity and simple catalytic components. Although relatively uncomplicated, this catalytic machinery displays both stereo- and chemical selectivity. The organophosphate pesticide paraoxon is hydrolyzed by covalent catalysis with rate-limiting dephosphorylation. This reactibody is, therefore, a kinetically selected protein template that has enzyme-like catalytic attributes. PMID:21896761

  12. Empirical Assessment of Spatial Prediction Methods for Location Cost Adjustment Factors

    PubMed Central

    Migliaccio, Giovanni C.; Guindani, Michele; D'Incognito, Maria; Zhang, Linlin

    2014-01-01

    In the feasibility stage, the correct prediction of construction costs ensures that budget requirements are met from the start of a project's lifecycle. A very common approach for performing quick order-of-magnitude estimates is based on using Location Cost Adjustment Factors (LCAFs) that compute historically based costs by project location. Nowadays, numerous LCAF datasets are commercially available in North America, but, obviously, they do not include all locations. Hence, LCAFs for un-sampled locations need to be inferred through spatial interpolation or prediction methods. Currently, practitioners tend to select the value for a location using only one variable, namely the nearest linear distance between two sites. However, construction costs could be affected by socio-economic variables, as suggested by macroeconomic theories. Using a commonly used set of LCAFs, the City Cost Indexes (CCI) by RSMeans, and the socio-economic variables included in the ESRI Community Sourcebook, this article provides several contributions to the body of knowledge. First, the accuracy of various spatial prediction methods in estimating LCAF values for un-sampled locations was evaluated and assessed relative to spatial interpolation methods. Two regression-based prediction models were selected: a global regression analysis and a geographically weighted regression analysis (GWR). Once these models were compared against interpolation methods, the results showed that GWR is the most appropriate way to model CCI as a function of multiple covariates. The outcome of GWR, for each covariate, was studied for all 48 states in the contiguous US. As a direct consequence of spatial non-stationarity, it was possible to discuss the influence of each single covariate differently from state to state. In addition, the article includes a first attempt to determine whether the observed variability in cost index values could be, at least partially, explained by independent socio-economic variables. PMID:25018582
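
    The spatial interpolation baseline that the article compares against can be illustrated with a minimal inverse-distance-weighting (IDW) sketch; the coordinates and cost factors below are invented stand-ins, not RSMeans CCI values.

```python
import numpy as np

def idw_predict(known_xy, known_vals, query_xy, power=2.0):
    """Inverse-distance-weighted prediction of a cost index at an
    un-sampled location (hypothetical data, not the RSMeans CCI)."""
    d = np.linalg.norm(known_xy - query_xy, axis=1)
    if np.any(d == 0):                      # query coincides with a sample
        return float(known_vals[np.argmin(d)])
    w = 1.0 / d ** power
    return float(np.sum(w * known_vals) / np.sum(w))

# Three sampled cities with made-up location cost factors
xy = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
vals = np.array([0.90, 1.10, 1.00])
print(idw_predict(xy, vals, np.array([5.0, 5.0])))  # equidistant -> mean, 1.0
```

    A regression-based predictor such as GWR would instead use covariates at the query location rather than distances alone.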

  13. DISCOVERING THE MISSING 2.2 < z < 3 QUASARS BY COMBINING OPTICAL VARIABILITY AND OPTICAL/NEAR-INFRARED COLORS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu Xuebing; Wang Ran; Bian Fuyan

    2011-09-15

    The identification of quasars in the redshift range 2.2 < z < 3 is known to be very inefficient because the optical colors of such quasars are indistinguishable from those of stars. Recent studies have proposed using optical variability or near-infrared (near-IR) colors to improve the identification of the missing quasars in this redshift range. Here we present a case study combining both methods. We select a sample of 70 quasar candidates from variables in Sloan Digital Sky Survey (SDSS) Stripe 82, which are non-ultraviolet excess sources and have UKIDSS near-IR public data. They are clearly separated into two parts on the Y - K/g - z color-color diagram, and 59 of them meet or lie close to a newly proposed Y - K/g - z selection criterion for z < 4 quasars. Of these 59 sources, 44 were previously identified as quasars in SDSS DR7, and 35 of them are quasars at 2.2 < z < 3. We present spectroscopic observations of 14 of 15 remaining quasar candidates using the Bok 2.3 m telescope and the MMT 6.5 m telescope, and successfully identify all of them as new quasars at z = 2.36-2.88. We also apply this method to a sample of 643 variable quasar candidates with SDSS-UKIDSS nine-band photometric data selected from 1875 new quasar candidates in SDSS Stripe 82 given by Butler and Bloom based on the time-series selections, and find that 188 of them are probably new quasars with photometric redshifts at 2.2 < z < 3. Our results indicate that the combination of optical variability and optical/near-IR colors is probably the most efficient way to find 2.2 < z < 3 quasars and is very helpful for constructing a complete quasar sample. We discuss its implications for ongoing and upcoming large optical and near-IR sky surveys.
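
    A color cut of this kind amounts to a linear criterion in the Y - K vs. g - z plane. The sketch below shows the form of such a selection; the slope and intercept are placeholder assumptions chosen for illustration, not the published criterion, and the colors are invented.

```python
import numpy as np

# Linear Y-K vs. g-z color cut separating quasar candidates from stars.
# slope/intercept are illustrative assumptions, not the published values.
def is_quasar_candidate(g_z, y_k, slope=0.46, intercept=0.82):
    return y_k > slope * np.asarray(g_z) + intercept

g_z = np.array([0.2, 1.5, 0.8])   # made-up optical colors
y_k = np.array([1.2, 0.9, 1.4])   # made-up optical/near-IR colors
print(is_quasar_candidate(g_z, y_k))  # boolean candidate mask
```

    In the paper this photometric cut is combined with a variability selection, so the mask above would be intersected with a variability flag.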

  14. Evidence that divergent selection shapes a developmental cline in a forest tree species complex.

    PubMed

    Costa E Silva, João; Harrison, Peter A; Wiltshire, Robert; Potts, Brad M

    2018-05-19

    Evolutionary change in developmental trajectories (heterochrony) is a major mechanism of adaptation in plants and animals. However, there are few detailed studies of the variation in the timing of developmental events among wild populations. Here we aimed to identify the climatic drivers of, and measure the selection shaping, a genetically based developmental cline among populations of an endemic tree species complex on the island of Tasmania. Seed lots from 38 native provenances encompassing the clinal transition from the heteroblastic Eucalyptus tenuiramis to the homoblastic Eucalyptus risdonii were grown in a common-garden field trial in southern Tasmania for 20 years. We used 27 climatic variables to model the provenance variation in vegetative juvenility as assessed at age 5 years. A phenotypic selection analysis was used to measure the fitness consequences of variation in vegetative juvenility based on its impact on the survival and reproductive capacity of survivors at age 20 years. Significant provenance divergence in vegetative juvenility was shown to be associated with home-site aridity, with the retention of juvenile foliage increasing with increasing aridity. Our results indicated that climate change may lead to different directions of selection across the geographic range of the complex, and at our mesic field site demonstrated that total directional selection within phenotypically variable provenances favoured reduced vegetative juvenility. We provide evidence that heteroblasty is adaptive and argue that, in assessing the impacts of rapid global change, developmental plasticity and heterochrony are underappreciated processes which can contribute to populations of long-lived organisms, such as trees, persisting and ultimately adapting to environmental change.
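
    In its simplest Lande-Arnold form, a phenotypic selection analysis of this kind estimates the directional selection differential as the covariance between relative fitness and a standardized trait. The sketch below uses synthetic data, not the trial measurements.

```python
import numpy as np

# Lande-Arnold-style directional selection differential:
# S = cov(relative fitness, standardized trait). Synthetic data in
# which fitness declines with vegetative juvenility, mirroring the
# direction of selection reported above.
rng = np.random.default_rng(0)
juvenility = rng.normal(size=200)                  # standardized trait
fitness = np.exp(-0.5 * juvenility) + rng.normal(scale=0.05, size=200)
rel_fitness = fitness / fitness.mean()             # relative fitness
S = np.cov(rel_fitness, juvenility)[0, 1]          # selection differential
print(S < 0)  # negative: selection favours reduced juvenility
```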

  15. Bayesian classification for the selection of in vitro human embryos using morphological and clinical data.

    PubMed

    Morales, Dinora Araceli; Bengoetxea, Endika; Larrañaga, Pedro; García, Miguel; Franco, Yosu; Fresnada, Mónica; Merino, Marisa

    2008-05-01

    In vitro fertilization (IVF) is a medically assisted reproduction technique that enables infertile couples to achieve successful pregnancy. Given the uncertainty of the treatment, we propose an intelligent decision support system based on supervised classification by Bayesian classifiers to aid the selection of the most promising embryos that will form the batch to be transferred to the woman's uterus. The aim of the supervised classification system is to improve the overall success rate of each IVF treatment in which a batch of embryos is transferred each time, where success is achieved when implantation (i.e. pregnancy) is obtained. For ethical reasons, different legislative restrictions apply to this technique in every country. In Spain, legislation allows a maximum of three embryos to form each transfer batch. As a result, clinicians prefer to select the embryos by non-invasive embryo examination based on simple methods and observation focused on the morphology and dynamics of embryo development after fertilization. This paper proposes the application of Bayesian classifiers to this embryo selection problem in order to provide a decision support system that allows a more accurate selection than the current procedures, which rely fully on the expertise and experience of embryologists. For this, we propose to take into consideration a reduced subset of feature variables related to embryo morphology and clinical data of patients, and to induce Bayesian classification models from these data. Results obtained applying a filter technique to choose the subset of variables, and the performance of Bayesian classifiers using them, are presented.
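
    A minimal sketch of the approach with a Gaussian naive Bayes classifier from scikit-learn, assuming invented morphology-like features rather than the clinical variables actually used in the paper:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Naive Bayes on synthetic "morphology" features; the feature meanings
# in the comments are hypothetical stand-ins, not the paper's variables.
rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 3))        # e.g. cell number, symmetry, fragmentation
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

clf = GaussianNB().fit(X[:200], y[:200])   # train on first 200 embryos
acc = clf.score(X[200:], y[200:])          # held-out implantation accuracy
print(round(acc, 2))
```

    A filter-based variable subset, as in the paper, would simply restrict the columns of `X` before fitting.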

  16. [Non-destructive detection research for hollow heart of potato based on semi-transmission hyperspectral imaging and SVM].

    PubMed

    Huang, Tao; Li, Xiao-yu; Xu, Meng-ling; Jin, Rui; Ku, Jing; Xu, Sen-miao; Wu, Zhen-zhong

    2015-01-01

    The quality of potatoes is directly related to their edible value and industrial value. Hollow heart of potato, a physiological disease occurring inside the tuber, is difficult to detect. This paper puts forward a non-destructive detection method using semi-transmission hyperspectral imaging with a support vector machine (SVM) to detect hollow heart of potato. Compared to reflection and transmission hyperspectral images, a semi-transmission hyperspectral image is clearer and contains the internal quality information of agricultural products. In this study, 224 potato samples (149 normal samples and 75 hollow samples) were selected as the research object, and a semi-transmission hyperspectral image acquisition system was constructed to acquire the hyperspectral images (390-1 040 nm) of the potato samples; the average spectra of the regions of interest were then extracted for spectral characteristics analysis. Normalization was used to preprocess the original spectra, and a prediction model was developed based on SVM using all wave bands; the recognition rate on the test set was only 87.5%. To simplify the model, the competitive adaptive reweighted sampling algorithm (CARS) and the successive projections algorithm (SPA) were utilized to select important variables from all 520 spectral variables, and 8 variables were selected (454, 601, 639, 664, 748, 827, 874 and 936 nm). A recognition rate of 94.64% on the test set was obtained by using the 8 variables to develop the SVM model. Parameter optimization algorithms, including the artificial fish swarm algorithm (AFSA), the genetic algorithm (GA) and the grid search algorithm, were used to optimize the SVM model parameters: penalty parameter c and kernel parameter g. After comparative analysis, AFSA, a new bionic optimization algorithm based on the foraging behavior of fish swarms, was shown to yield the optimal model parameters (c=10.659 1, g=0.349 7), and a recognition accuracy of 100% was obtained for the AFSA-SVM model. The results indicate that combining semi-transmission hyperspectral imaging technology with CARS-SPA and AFSA-SVM can accurately detect hollow heart of potato, and also provide technical support for rapid non-destructive detection of hollow heart of potato.
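
    The grid-search branch of the parameter optimization can be sketched with scikit-learn; the simulated data below stand in for the 8 selected wavelength variables, and the parameter grid itself is illustrative.

```python
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_classification

# Grid search over the RBF-SVM penalty parameter C and kernel parameter
# gamma (the paper's c and g). Simulated data stand in for the 8
# selected wavelength variables of the 224 potato samples.
X, y = make_classification(n_samples=224, n_features=8, n_informative=5,
                           random_state=0)
grid = GridSearchCV(SVC(kernel="rbf"),
                    {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0]},
                    cv=5)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

    AFSA and GA replace the exhaustive grid with a population-based search over the same (C, gamma) space.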

  17. Optimizing selection of training and auxiliary data for operational land cover classification for the LCMAP initiative

    NASA Astrophysics Data System (ADS)

    Zhu, Zhe; Gallant, Alisa L.; Woodcock, Curtis E.; Pengra, Bruce; Olofsson, Pontus; Loveland, Thomas R.; Jin, Suming; Dahal, Devendra; Yang, Limin; Auch, Roger F.

    2016-12-01

    The U.S. Geological Survey's Land Change Monitoring, Assessment, and Projection (LCMAP) initiative is a new end-to-end capability to continuously track and characterize changes in land cover, use, and condition to better support research and applications relevant to resource management and environmental change. Among the LCMAP product suite are annual land cover maps that will be available to the public. This paper describes an approach to optimize the selection of training and auxiliary data for deriving the thematic land cover maps based on all available clear observations from Landsats 4-8. Training data were selected from map products of the U.S. Geological Survey's Land Cover Trends project. The Random Forest classifier was applied for different classification scenarios based on the Continuous Change Detection and Classification (CCDC) algorithm. We found that extracting training data proportionally to the occurrence of land cover classes was superior to an equal distribution of training data per class, and suggest using a total of 20,000 training pixels to classify an area about the size of a Landsat scene. The problem of unbalanced training data was alleviated by extracting a minimum of 600 training pixels and a maximum of 8000 training pixels per class. We additionally explored removing outliers contained within the training data based on their spectral and spatial criteria, but observed no significant improvement in classification results. We also tested the importance of different types of auxiliary data that were available for the conterminous United States, including: (a) five variables used by the National Land Cover Database, (b) three variables from the cloud screening "Function of mask" (Fmask) statistics, and (c) two variables from the change detection results of CCDC. We found that auxiliary variables such as a Digital Elevation Model and its derivatives (aspect, position index, and slope), potential wetland index, water probability, snow probability, and cloud probability improved the accuracy of land cover classification. Compared to the original strategy of the CCDC algorithm (500 pixels per class), the use of the optimal strategy improved the classification accuracies substantially (15-percentage point increase in overall accuracy and 4-percentage point increase in minimum accuracy).
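
    The proportional-with-caps sampling strategy can be sketched as a small allocation function, assuming the totals stated above (20,000 pixels, 600 minimum and 8000 maximum per class); the class frequencies in the example are hypothetical.

```python
import numpy as np

def allocate_training(class_counts, total=20000, lo=600, hi=8000):
    """Allocate training pixels proportionally to class occurrence,
    clipped to [lo, hi] per class, per the strategy described above."""
    counts = np.asarray(class_counts, dtype=float)
    raw = total * counts / counts.sum()      # proportional allocation
    return np.clip(raw, lo, hi).astype(int)  # cap rare/dominant classes

# Hypothetical per-class pixel frequencies in the map being sampled
print(allocate_training([500_000, 120_000, 30_000, 2_000]))
# -> [8000 3680  920  600]: dominant class capped, rare class floored
```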

  18. Mutual information-based feature selection for radiomics

    NASA Astrophysics Data System (ADS)

    Oubel, Estanislao; Beaumont, Hubert; Iannessi, Antoine

    2016-03-01

    Background The extraction and analysis of image features (radiomics) is a promising field in the precision medicine era, with applications to prognosis, prediction, and quantification of response to treatment. In this work, we present a mutual information-based method for quantifying the reproducibility of features, a necessary qualification step before their inclusion in big data systems. Materials and Methods Ten patients with Non-Small Cell Lung Cancer (NSCLC) lesions were followed over time (7 time points on average) with Computed Tomography (CT). Five observers segmented lesions by using a semi-automatic method and 27 features describing shape and intensity distribution were extracted. Inter-observer reproducibility was assessed by computing the multi-information (MI) of feature changes over time, and the variability of global extrema. Results The highest MI values were obtained for volume-based features (VBF). The lesion mass (M), surface to volume ratio (SVR) and volume (V) presented statistically significantly higher values of MI than the rest of the features. Within the same VBF group, SVR also showed the lowest variability of extrema. The correlation coefficient (CC) of feature values was unable to differentiate between features. Conclusions MI made it possible to discriminate three features (M, SVR, and V) from the rest in a statistically significant manner. This result is consistent with the order obtained when sorting features by increasing values of extrema variability. MI is a promising alternative for selecting features to be considered as surrogate biomarkers in a precision medicine context.
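
    The core idea, that a reproducible feature yields high mutual information between observers, can be sketched with scikit-learn's `mutual_info_score` on discretized synthetic series. This is a simplified two-observer MI rather than the paper's multi-information, and the data are not the NSCLC measurements.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

# Two observers measuring the same underlying feature change agree up
# to segmentation noise, so their discretized series share high MI;
# an unrelated noise series does not. All series are synthetic.
rng = np.random.default_rng(2)
true_change = rng.normal(size=200)
bins = [-1, 0, 1]
obs1 = np.digitize(true_change + rng.normal(scale=0.1, size=200), bins)
obs2 = np.digitize(true_change + rng.normal(scale=0.1, size=200), bins)
noise = np.digitize(rng.normal(size=200), bins)

print(mutual_info_score(obs1, obs2) > mutual_info_score(obs1, noise))
```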

  19. On using surface-source downhole-receiver logging to determine seismic slownesses

    USGS Publications Warehouse

    Boore, D.M.; Thompson, E.M.

    2007-01-01

    We present a method to solve for slowness models from surface-source downhole-receiver seismic travel-times. The method estimates the slownesses in a single inversion of the travel-times from all receiver depths and accounts for refractions at layer boundaries. The number and location of layer interfaces in the model can be selected based on lithologic changes or linear trends in the travel-time data. The interfaces based on linear trends in the data can be picked manually or by an automated algorithm. We illustrate the method with example sites for which geologic descriptions of the subsurface materials and independent slowness measurements are available. At each site we present slowness models that result from different interpretations of the data. The examples were carefully selected to address the reliability of interface selection and the ability of the inversion to identify thin layers, large slowness contrasts, and slowness gradients. Additionally, we compare the models in terms of ground-motion amplification. These plots illustrate the sensitivity of site amplifications to the uncertainties in the slowness model. We show that one-dimensional site amplifications are insensitive to thin layers in the slowness models; although slowness is variable over short ranges of depth, this variability has little effect on ground-motion amplification at frequencies up to 5 Hz.
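
    Ignoring refraction, the single-inversion idea reduces to a linear least-squares problem: each travel time is the sum, over layers, of the ray's path length in that layer times the layer slowness. The geometry and slownesses below are synthetic, and the vertical-ray simplification is an assumption of the sketch, not the paper's full method.

```python
import numpy as np

# Least-squares recovery of layer slownesses from downhole travel times
# for vertical rays: t_i = sum_j A_ij * s_j, where A_ij is the path
# length of ray i in layer j. Synthetic three-layer model.
interfaces = np.array([0.0, 5.0, 15.0, 30.0])    # layer boundaries (m)
true_s = np.array([2.0e-3, 1.0e-3, 0.5e-3])      # slowness per layer (s/m)
depths = np.array([2.0, 6.0, 10.0, 20.0, 28.0])  # receiver depths (m)

A = np.zeros((len(depths), len(true_s)))
for i, z in enumerate(depths):
    for j in range(len(true_s)):
        top, bot = interfaces[j], interfaces[j + 1]
        A[i, j] = max(0.0, min(z, bot) - top)    # path length in layer j
t = A @ true_s                                   # noise-free travel times
s_hat, *_ = np.linalg.lstsq(A, t, rcond=None)    # invert for slownesses
print(np.allclose(s_hat, true_s))
```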

  20. TIMSS 2011 Student and Teacher Predictors for Mathematics Achievement Explored and Identified via Elastic Net.

    PubMed

    Yoo, Jin Eun

    2018-01-01

    A substantial body of research has been conducted on variables relating to students' mathematics achievement with TIMSS. However, most studies have employed conventional statistical methods and have focused on a select few indicators instead of utilizing the hundreds of variables TIMSS provides. This study aimed to find a prediction model for students' mathematics achievement using as many TIMSS student and teacher variables as possible. Elastic net, the machine learning technique selected in this study, takes advantage of both LASSO and ridge in terms of variable selection and multicollinearity, respectively. A logistic regression model was also employed to predict TIMSS 2011 Korean 4th graders' mathematics achievement. Ten-fold cross-validation with mean squared error was employed to determine the elastic net regularization parameter. Among the 162 TIMSS variables explored, 12 student and 5 teacher variables were selected by the elastic net model, and the prediction accuracy, sensitivity, and specificity were 76.06, 70.23, and 80.34%, respectively. This study showed that the elastic net method can be successfully applied to educational large-scale data by selecting a subset of variables with reasonable prediction accuracy and finding new variables to predict students' mathematics achievement. Newly found variables via machine learning can shed light on the existing theories from a totally different perspective, which in turn can prompt the creation of new theories or the refinement of existing ones. This study also examined the current scale development convention from a machine learning perspective.
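
    A sketch of elastic net as a variable selector, using scikit-learn's elastic-net-penalized logistic regression on simulated data; the feature count mirrors the 162 TIMSS variables, but the penalty settings and data are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

# Elastic net combines LASSO-style sparsity (l1) with ridge-style
# shrinkage (l2); l1_ratio balances the two. Simulated data stand in
# for the TIMSS student/teacher variables.
X, y = make_classification(n_samples=500, n_features=162,
                           n_informative=15, random_state=0)
clf = LogisticRegression(penalty="elasticnet", solver="saga",
                         l1_ratio=0.5, C=0.1, max_iter=5000).fit(X, y)
n_selected = int(np.sum(clf.coef_ != 0))
print(n_selected)   # far fewer than 162 coefficients survive
```

    In practice `l1_ratio` and `C` would be tuned by cross-validation, as the ten-fold procedure in the abstract does for the regularization parameter.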

  1. TIMSS 2011 Student and Teacher Predictors for Mathematics Achievement Explored and Identified via Elastic Net

    PubMed Central

    Yoo, Jin Eun

    2018-01-01

    A substantial body of research has been conducted on variables relating to students' mathematics achievement with TIMSS. However, most studies have employed conventional statistical methods and have focused on a select few indicators instead of utilizing the hundreds of variables TIMSS provides. This study aimed to find a prediction model for students' mathematics achievement using as many TIMSS student and teacher variables as possible. Elastic net, the machine learning technique selected in this study, takes advantage of both LASSO and ridge in terms of variable selection and multicollinearity, respectively. A logistic regression model was also employed to predict TIMSS 2011 Korean 4th graders' mathematics achievement. Ten-fold cross-validation with mean squared error was employed to determine the elastic net regularization parameter. Among the 162 TIMSS variables explored, 12 student and 5 teacher variables were selected by the elastic net model, and the prediction accuracy, sensitivity, and specificity were 76.06, 70.23, and 80.34%, respectively. This study showed that the elastic net method can be successfully applied to educational large-scale data by selecting a subset of variables with reasonable prediction accuracy and finding new variables to predict students' mathematics achievement. Newly found variables via machine learning can shed light on the existing theories from a totally different perspective, which in turn can prompt the creation of new theories or the refinement of existing ones. This study also examined the current scale development convention from a machine learning perspective. PMID:29599736

  2. [Rapid assessment of critical quality attributes of Chinese materia medica (II): strategy of NIR assignment].

    PubMed

    Pei, Yan-Ling; Wu, Zhi-Sheng; Shi, Xin-Yuan; Zhou, Lu-Wei; Qiao, Yan-Jiang

    2014-09-01

    The present paper first reviews the research progress and main methods of NIR spectral assignment, together with our own research results. Principal component analysis focuses on characteristic signal extraction to reflect spectral differences. The partial least squares method is concerned with variable selection to discover characteristic absorption bands. Two-dimensional correlation spectroscopy is mainly adopted for spectral assignment: autocorrelation peaks are obtained from spectral changes driven by external perturbations such as concentration, temperature and pressure. Density functional theory is used to calculate energies from substance structure to establish the relationship between molecular energy and spectral change. Based on the methods reviewed above, and taking the NIR spectral assignment of chlorogenic acid as an example, a reliable spectral assignment for critical quality attributes of Chinese materia medica (CMM) was established using deuterium technology and spectral variable selection. The result demonstrated the consistency of the assignment according to the spectral features of different concentrations of chlorogenic acid and the variable selection region of the online NIR model of the extraction process. Although the spectral assignment was initially performed on a single active pharmaceutical ingredient, it is a meaningful step towards assigning the complex components of CMM. It thus provides a methodology for the NIR spectral assignment of critical quality attributes in CMM.
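
    The synchronous two-dimensional correlation spectrum mentioned above has a compact form (Noda): mean-center the dynamic spectra Y (m perturbations x n wavelengths) and take Phi = Yc^T Yc / (m - 1); autocorrelation peaks appear on the diagonal at the wavelengths that respond to the perturbation. A synthetic one-band sketch, not real NIR data:

```python
import numpy as np

# Synchronous 2D correlation spectrum for a concentration perturbation.
# One synthetic absorbing band at wavelength indices 20-24 responds
# linearly to concentration; the autopeak should appear there.
rng = np.random.default_rng(3)
conc = np.linspace(0.1, 1.0, 10)                 # perturbation variable
band = np.zeros(50)
band[20:25] = 1.0                                # one absorbing band
Y = np.outer(conc, band) + rng.normal(scale=0.01, size=(10, 50))

Yc = Y - Y.mean(axis=0)                          # mean-centered dynamic spectra
phi = Yc.T @ Yc / (Y.shape[0] - 1)               # synchronous spectrum
print(int(np.argmax(np.diag(phi))))              # index of strongest autopeak
```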

  3. An international Delphi survey for the definition of the variables for the development of new classification criteria for periodic fever, aphthous stomatitis, pharyngitis, cervical adenitis (PFAPA).

    PubMed

    Vanoni, Federica; Federici, Silvia; Antón, Jordi; Barron, Karyl S; Brogan, Paul; De Benedetti, Fabrizio; Dedeoglu, Fatma; Demirkaya, Erkan; Hentgen, Veronique; Kallinich, Tilmann; Laxer, Ronald; Russo, Ricardo; Toplak, Natasa; Uziel, Yosef; Martini, Alberto; Ruperto, Nicolino; Gattorno, Marco; Hofer, Michael

    2018-04-18

    Diagnosis of periodic fever, aphthous stomatitis, pharyngitis and cervical adenitis (PFAPA) is currently based on a set of criteria proposed in 1999, modified from Marshall's criteria. Nevertheless, no validated, evidence-based set of classification criteria for PFAPA has been established so far. The aim of this study was to identify candidate classification criteria for PFAPA syndrome using international consensus formation through a Delphi questionnaire survey. A first open-ended questionnaire was sent to adult and pediatric clinicians/researchers, asking them to identify the variables thought most likely to be helpful and relevant for the diagnosis of PFAPA. In a second survey, respondents were asked to select, from the list of variables coming from the first survey, the 10 features that they felt were most important, and to rank them in descending order from most important to least important. The response rates to the first and second Delphi were respectively 109/124 (88%) and 141/162 (87%). The numbers of participants that completed the first and second Delphi were 69/124 (56%) and 110/162 (68%). From the first Delphi we obtained a list of 92 variables, of which 62 were selected in the second Delphi. The variables reaching the top five positions of the ranking were regular periodicity, aphthous stomatitis, response to corticosteroids, cervical adenitis, and well-being between flares. Our process led to the identification of the features felt to be the most important candidate classification criteria for PFAPA by a large sample of international rheumatologists. The performance of these items will be tested further in the next phase of the study, through analysis of real patient data.
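
    The abstract does not specify how the ranked ballots were aggregated; a Borda-style count is one common choice for this kind of second-round Delphi ranking, and can be sketched as follows (the ballots are invented):

```python
from collections import Counter

# Borda-style aggregation of ranked ballots: on a ballot of length k,
# the item ranked 1st gets k points, 2nd gets k-1, and so on.
# This scoring rule is an assumption, not the survey's actual method.
def borda(ballots):
    scores = Counter()
    for ballot in ballots:
        k = len(ballot)
        for pos, item in enumerate(ballot):
            scores[item] += k - pos
    return scores.most_common()

ballots = [
    ["periodicity", "aphthous stomatitis", "steroid response"],
    ["periodicity", "steroid response", "cervical adenitis"],
    ["aphthous stomatitis", "periodicity", "cervical adenitis"],
]
print(borda(ballots)[0][0])   # top-ranked candidate criterion
```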

  4. Biomarkers of Progression after HIV Acute/Early Infection: Nothing Compares to CD4+ T-cell Count?

    PubMed Central

    Ghiglione, Yanina; Hormanstorfer, Macarena; Coloccini, Romina; Salido, Jimena; Trifone, César; Ruiz, María Julia; Falivene, Juliana; Caruso, María Paula; Figueroa, María Inés; Salomón, Horacio; Giavedoni, Luis D.; Pando, María de los Ángeles; Gherardi, María Magdalena; Rabinovich, Roberto Daniel; Sued, Omar

    2018-01-01

    Progression of HIV infection is variable among individuals, and a definition of disease progression biomarkers is still needed. Here, we aimed to categorize the predictive potential of several variables using feature selection methods and decision trees. A total of seventy-five treatment-naïve subjects were enrolled during acute/early HIV infection. CD4+ T-cell counts (CD4TC) and viral load (VL) levels were determined at enrollment and for one year. Immune activation, HIV-specific immune response, Human Leukocyte Antigen (HLA) and C-C chemokine receptor type 5 (CCR5) genotypes, and plasma levels of 39 cytokines were determined. Data were analyzed by machine learning and non-parametric methods. Variable hierarchization was performed by Weka correlation-based feature selection and J48 decision tree. Plasma interleukin (IL)-10, interferon gamma-induced protein (IP)-10, soluble IL-2 receptor alpha (sIL-2Rα) and tumor necrosis factor alpha (TNF-α) levels correlated directly with baseline VL, whereas IL-2, TNF-α, fibroblast growth factor (FGF)-2 and macrophage inflammatory protein (MIP)-1β correlated directly with CD4+ T-cell activation (p < 0.05). However, none of these cytokines had good predictive values to distinguish “progressors” from “non-progressors”. Similarly, immune activation, HIV-specific immune responses and HLA/CCR5 genotypes had low discrimination power. Baseline CD4TC was the most potent discerning variable with a cut-off of 438 cells/μL (accuracy = 0.93, κ-Cohen = 0.85). Limited discerning power of the other factors might be related to frequency, variability and/or sampling time. Future studies based on decision trees to identify biomarkers of post-treatment control are warranted. PMID:29342870
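
    The single-split finding can be illustrated with a depth-1 decision tree (a stump) recovering a CD4 cut-off from simulated counts; scikit-learn's CART stands in for Weka's J48, and the data are generated around the reported 438 cells/μL split rather than drawn from the cohort.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# A decision stump learns the threshold that best separates the two
# classes. Counts are simulated and labels are noise-free for clarity.
rng = np.random.default_rng(4)
cd4 = rng.uniform(100, 900, size=300).reshape(-1, 1)   # cells/uL
progressor = (cd4.ravel() < 438).astype(int)           # simulated labels

tree = DecisionTreeClassifier(max_depth=1).fit(cd4, progressor)
threshold = float(tree.tree_.threshold[0])             # learned cut-off
print(round(threshold))   # close to the simulated 438 split
```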

  5. Cluster Analysis to Identify Possible Subgroups in Tinnitus Patients.

    PubMed

    van den Berge, Minke J C; Free, Rolien H; Arnold, Rosemarie; de Kleine, Emile; Hofman, Rutger; van Dijk, J Marc C; van Dijk, Pim

    2017-01-01

    In tinnitus treatment, there is a tendency to shift from a "one size fits all" to a more individual, patient-tailored approach. Insight into the heterogeneity of the tinnitus spectrum might improve the management of tinnitus patients in terms of choice of treatment and identification of patients with severe mental distress. The goal of this study was to identify subgroups in a large group of tinnitus patients. Data were collected from patients with severe tinnitus complaints visiting our tertiary referral tinnitus care group at the University Medical Center Groningen. Patient-reported and physician-reported variables were collected during their visit to our clinic. Cluster analyses were used to characterize subgroups. For the selection of the right variables to enter in the cluster analysis, two approaches were used: (1) variable reduction with principal component analysis and (2) variable selection based on expert opinion. Various variables of 1,783 tinnitus patients were included in the analyses. Cluster analysis (1) included 976 patients and resulted in a four-cluster solution. The effect of external influences was the most discriminative between the groups, or clusters, of patients. The "silhouette measure" of the cluster outcome was low (0.2), indicating a "no substantial" cluster structure. Cluster analysis (2) included 761 patients and resulted in a three-cluster solution, comparable to the first analysis. Again, a "no substantial" cluster structure was found (0.2). Two cluster analyses on a large database of tinnitus patients revealed that clusters of patients are mostly formed by differing responses to external influences on their disease. However, both cluster outcomes based on this dataset showed poor stability, suggesting that our tinnitus population comprises a continuum rather than a number of clearly defined subgroups.
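
    The silhouette check used above can be reproduced in miniature: well-separated synthetic blobs score high, while an unstructured continuum scores markedly lower. The data and cluster counts are illustrative, not the tinnitus variables.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Silhouette as a test of cluster structure: three tight blobs vs. a
# single featureless Gaussian "continuum", both clustered with k=3.
rng = np.random.default_rng(5)
blobs = np.vstack([rng.normal(loc=c, scale=0.3, size=(100, 2))
                   for c in ([0, 0], [5, 5], [0, 5])])
continuum = rng.normal(size=(300, 2))

scores = []
for X in (blobs, continuum):
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
    scores.append(silhouette_score(X, labels))
print([round(s, 2) for s in scores])   # blobs score high, continuum low
```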

  6. Biomarkers of Progression after HIV Acute/Early Infection: Nothing Compares to CD4⁺ T-cell Count?

    PubMed

    Turk, Gabriela; Ghiglione, Yanina; Hormanstorfer, Macarena; Laufer, Natalia; Coloccini, Romina; Salido, Jimena; Trifone, César; Ruiz, María Julia; Falivene, Juliana; Holgado, María Pía; Caruso, María Paula; Figueroa, María Inés; Salomón, Horacio; Giavedoni, Luis D; Pando, María de Los Ángeles; Gherardi, María Magdalena; Rabinovich, Roberto Daniel; Pury, Pedro A; Sued, Omar

    2018-01-13

    Progression of HIV infection is variable among individuals, and a definition of disease progression biomarkers is still needed. Here, we aimed to categorize the predictive potential of several variables using feature selection methods and decision trees. A total of seventy-five treatment-naïve subjects were enrolled during acute/early HIV infection. CD4⁺ T-cell counts (CD4TC) and viral load (VL) levels were determined at enrollment and for one year. Immune activation, HIV-specific immune response, Human Leukocyte Antigen (HLA) and C-C chemokine receptor type 5 (CCR5) genotypes, and plasma levels of 39 cytokines were determined. Data were analyzed by machine learning and non-parametric methods. Variable hierarchization was performed by Weka correlation-based feature selection and J48 decision tree. Plasma interleukin (IL)-10, interferon gamma-induced protein (IP)-10, soluble IL-2 receptor alpha (sIL-2Rα) and tumor necrosis factor alpha (TNF-α) levels correlated directly with baseline VL, whereas IL-2, TNF-α, fibroblast growth factor (FGF)-2 and macrophage inflammatory protein (MIP)-1β correlated directly with CD4⁺ T-cell activation (p < 0.05). However, none of these cytokines had good predictive values to distinguish "progressors" from "non-progressors". Similarly, immune activation, HIV-specific immune responses and HLA/CCR5 genotypes had low discrimination power. Baseline CD4TC was the most potent discerning variable with a cut-off of 438 cells/μL (accuracy = 0.93, κ-Cohen = 0.85). Limited discerning power of the other factors might be related to frequency, variability and/or sampling time. Future studies based on decision trees to identify biomarkers of post-treatment control are warranted.

  7. Pathway-Based Kernel Boosting for the Analysis of Genome-Wide Association Studies

    PubMed Central

    Manitz, Juliane; Burger, Patricia; Amos, Christopher I.; Chang-Claude, Jenny; Wichmann, Heinz-Erich; Kneib, Thomas; Bickeböller, Heike

    2017-01-01

    The analysis of genome-wide association studies (GWAS) benefits from the investigation of biologically meaningful gene sets, such as gene-interaction networks (pathways). We propose an extension to a successful kernel-based pathway analysis approach by integrating kernel functions into a powerful algorithmic framework for variable selection, to enable investigation of multiple pathways simultaneously. We employ genetic similarity kernels from the logistic kernel machine test (LKMT) as base-learners in a boosting algorithm. A model to explain case-control status is created iteratively by selecting pathways that improve its prediction ability. We evaluated our method in simulation studies adopting 50 pathways for different sample sizes and genetic effect strengths. Additionally, we included an exemplary application of kernel boosting to a rheumatoid arthritis and a lung cancer dataset. Simulations indicate that kernel boosting outperforms the LKMT in certain genetic scenarios. Applications to GWAS data on rheumatoid arthritis and lung cancer resulted in sparse models which were based on pathways interpretable in a clinical sense. Kernel boosting is highly flexible in terms of considered variables and overcomes the problem of multiple testing. Additionally, it enables the prediction of clinical outcomes. Thus, kernel boosting constitutes a new, powerful tool in the analysis of GWAS data and towards the understanding of biological processes involved in disease susceptibility. PMID:28785300
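
    A toy, squared-error stand-in for the kernel boosting loop: at each iteration, fit a ridge-penalized kernel regression on the current residuals for every candidate "pathway" kernel and keep the one that most reduces the loss. This deliberately omits the LKMT machinery and the logistic loss; it only illustrates componentwise selection among kernels. All data are synthetic.

```python
import numpy as np

# Componentwise kernel boosting sketch: one linear kernel per "pathway"
# (a group of simulated SNPs); pathway 2 is the only causal one, so the
# boosting loop should keep selecting its kernel.
rng = np.random.default_rng(6)
n, n_path = 100, 5
groups = [rng.normal(size=(n, 10)) for _ in range(n_path)]  # SNPs per pathway
kernels = [G @ G.T for G in groups]                         # linear kernels
y = groups[2] @ rng.normal(size=10)                         # causal pathway 2

def ridge_fit(K, r, lam=1.0):
    """Ridge-penalized kernel regression fit of residuals r."""
    alpha = np.linalg.solve(K + lam * np.eye(len(K)), r)
    return K @ alpha

resid, selected = y.copy(), []
for _ in range(3):                                 # three boosting steps
    fits = [ridge_fit(K, resid) for K in kernels]
    losses = [np.sum((resid - 0.1 * f) ** 2) for f in fits]  # step size 0.1
    best = int(np.argmin(losses))                  # best-reducing pathway
    selected.append(best)
    resid = resid - 0.1 * fits[best]               # update residuals
print(selected)   # the causal pathway should dominate the selections
```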

  8. Pathway-Based Kernel Boosting for the Analysis of Genome-Wide Association Studies.

    PubMed

    Friedrichs, Stefanie; Manitz, Juliane; Burger, Patricia; Amos, Christopher I; Risch, Angela; Chang-Claude, Jenny; Wichmann, Heinz-Erich; Kneib, Thomas; Bickeböller, Heike; Hofner, Benjamin

    2017-01-01

    The analysis of genome-wide association studies (GWAS) benefits from the investigation of biologically meaningful gene sets, such as gene-interaction networks (pathways). We propose an extension to a successful kernel-based pathway analysis approach by integrating kernel functions into a powerful algorithmic framework for variable selection, to enable investigation of multiple pathways simultaneously. We employ genetic similarity kernels from the logistic kernel machine test (LKMT) as base-learners in a boosting algorithm. A model to explain case-control status is created iteratively by selecting pathways that improve its prediction ability. We evaluated our method in simulation studies adopting 50 pathways for different sample sizes and genetic effect strengths. Additionally, we included an exemplary application of kernel boosting to a rheumatoid arthritis and a lung cancer dataset. Simulations indicate that kernel boosting outperforms the LKMT in certain genetic scenarios. Applications to GWAS data on rheumatoid arthritis and lung cancer resulted in sparse models which were based on pathways interpretable in a clinical sense. Kernel boosting is highly flexible in terms of considered variables and overcomes the problem of multiple testing. Additionally, it enables the prediction of clinical outcomes. Thus, kernel boosting constitutes a new, powerful tool in the analysis of GWAS data and towards the understanding of biological processes involved in disease susceptibility.
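    The boosting loop described in these two records can be sketched generically: at each iteration, fit one kernel-based base-learner per candidate pathway to the current residuals and add only the best one, shrunken by a step length. The toy below uses L2-boosting with linear genetic-similarity kernels on synthetic genotypes; it is not the authors' LKMT code, and the pathway sizes, ridge penalty and step length are invented:

```python
# Hedged sketch of pathway-wise kernel boosting (L2-boosting toy).
import numpy as np

rng = np.random.default_rng(1)
n, n_pathways, snps_per_pw = 120, 10, 15
# Synthetic genotypes (0/1/2 minor-allele counts), grouped into pathways.
G = [rng.integers(0, 3, (n, snps_per_pw)).astype(float) for _ in range(n_pathways)]
y = G[2] @ rng.normal(0, 1, snps_per_pw)          # only pathway 2 is causal
y = (y - y.mean()) / y.std()

def kernel_fit(K, r, ridge=1.0):
    """Kernel ridge fit of residuals r; returns fitted values."""
    alpha = np.linalg.solve(K + ridge * np.eye(len(r)), r)
    return K @ alpha

kernels = [g @ g.T for g in G]                    # linear similarity kernels
f, nu, selected = np.zeros(n), 0.1, []
for _ in range(50):                               # boosting iterations
    r = y - f                                     # current residuals
    fits = [kernel_fit(K, r) for K in kernels]
    best = int(np.argmin([np.sum((r - fh) ** 2) for fh in fits]))
    selected.append(best)                         # pathway chosen this round
    f += nu * fits[best]                          # shrunken update

print("most frequently selected pathway:", max(set(selected), key=selected.count))
```

Because only one pathway carries signal, the selection history is sparse, which is the property the abstract highlights: uninformative pathways simply never enter the model.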

  9. Infrastructure features outperform environmental variables explaining rabbit abundance around motorways.

    PubMed

    Planillo, Aimara; Malo, Juan E

    2018-01-01

    Human disturbance is widespread across landscapes in the form of roads that alter wildlife populations. Knowing which road features are responsible for the species response, and their relevance in comparison with environmental variables, will provide useful information for effective conservation measures. We sampled the relative abundance of European rabbits, a very widespread species, in motorway verges at regional scale, in an area with large variability in environmental and infrastructure conditions. Environmental variables included vegetation structure, plant productivity, distance to water sources, and altitude. Infrastructure characteristics were the type of vegetation in verges, verge width, traffic volume, and the presence of embankments. We performed a variance partitioning analysis to determine the relative importance of the two sets of variables on rabbit abundance. Additionally, we identified the most important variables and their effects by model averaging after AICc-based selection among hypothesis-based models. As a group, infrastructure features explained four times more variability in rabbit abundance than environmental variables, with the effects of the former being critical in motorway stretches located in altered landscapes with no available habitat for rabbits, such as agricultural fields. Model selection and Akaike weights showed that verge width and traffic volume are the most important variables explaining the rabbit abundance index, with positive and negative effects, respectively. In the light of these results, the response of species to the infrastructure can be modulated through the modification of motorway features, some of which are manageable in the design phase. The identification of such features leads to suggestions for improvement through low-cost corrective measures and conservation plans. As a general indication, keeping motorway verges less than 10 m wide will prevent high densities of rabbits and avoid the unwanted effects that rabbit populations can generate in some areas.
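    The "model selection and Akaike weights" step used above follows the standard small-sample AIC machinery. A minimal sketch with made-up residual sums of squares for three hypothetical candidate models (the formulas are the usual Burnham-Anderson ones; nothing here reproduces the study's data):

```python
# Hedged sketch: AICc and Akaike weights for a candidate model set.
import math

def aicc(rss, n, k):
    """AICc for a least-squares model: n observations, k estimated parameters."""
    aic = n * math.log(rss / n) + 2 * k
    return aic + 2 * k * (k + 1) / (n - k - 1)   # small-sample correction

def akaike_weights(aicc_values):
    """Normalized evidence weights relative to the best (lowest-AICc) model."""
    best = min(aicc_values)
    rel = [math.exp(-0.5 * (a - best)) for a in aicc_values]
    total = sum(rel)
    return [r / total for r in rel]

# Three hypothetical models (e.g. verge width only, traffic only, both).
scores = [aicc(rss, n=60, k=k) for rss, k in [(40.0, 3), (38.5, 3), (35.0, 4)]]
weights = akaike_weights(scores)
print([round(w, 3) for w in weights])
```

Model-averaged coefficients are then computed by weighting each model's estimates by these weights, which is how the verge-width and traffic effects were summarized.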

  10. Cholinergic enhancement reduces functional connectivity and BOLD variability in visual extrastriate cortex during selective attention.

    PubMed

    Ricciardi, Emiliano; Handjaras, Giacomo; Bernardi, Giulio; Pietrini, Pietro; Furey, Maura L

    2013-01-01

    Enhancing cholinergic function improves performance on various cognitive tasks and alters neural responses in task specific brain regions. We have hypothesized that the changes in neural activity observed during increased cholinergic function reflect an increase in neural efficiency that leads to improved task performance. The current study tested this hypothesis by assessing neural efficiency based on cholinergically-mediated effects on regional brain connectivity and BOLD signal variability. Nine subjects participated in a double-blind, placebo-controlled crossover fMRI study. Following an infusion of physostigmine (1 mg/h) or placebo, echo-planar imaging (EPI) was conducted as participants performed a selective attention task. During the task, two images comprised of superimposed pictures of faces and houses were presented. Subjects were instructed periodically to shift their attention from one stimulus component to the other and to perform a matching task using hand held response buttons. A control condition included phase-scrambled images of superimposed faces and houses that were presented in the same temporal and spatial manner as the attention task; participants were instructed to perform a matching task. Cholinergic enhancement improved performance during the selective attention task, with no change during the control task. Functional connectivity analyses showed that the strength of connectivity between ventral visual processing areas and task-related occipital, parietal and prefrontal regions reduced significantly during cholinergic enhancement, exclusively during the selective attention task. Physostigmine administration also reduced BOLD signal temporal variability relative to placebo throughout temporal and occipital visual processing areas, again during the selective attention task only. 
Together with the observed behavioral improvement, the decreases in connectivity strength throughout task-relevant regions and BOLD variability within stimulus processing regions support the hypothesis that cholinergic augmentation results in enhanced neural efficiency. This article is part of a Special Issue entitled 'Cognitive Enhancers'. Copyright © 2012 Elsevier Ltd. All rights reserved.
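    The two measures compared across drug conditions, functional connectivity and BOLD variability, reduce to simple time-series statistics: the Pearson correlation between two regions' BOLD signals and the temporal standard deviation of a region's signal. A toy illustration on synthetic time series (the regions, TR and signal model are invented; this makes no claim about the study's actual preprocessing pipeline):

```python
# Hedged sketch: connectivity as inter-regional correlation, variability
# as temporal standard deviation, on synthetic BOLD-like time series.
import numpy as np

rng = np.random.default_rng(13)
t = np.arange(200) * 2.0                     # 200 volumes, TR = 2 s (assumed)
shared = np.sin(2 * np.pi * t / 60)          # common task-driven component
roi_a = shared + rng.normal(0, 0.5, t.size)  # e.g. a ventral visual region
roi_b = shared + rng.normal(0, 0.5, t.size)  # e.g. a parietal region

connectivity = np.corrcoef(roi_a, roi_b)[0, 1]   # functional connectivity
variability = roi_a.std()                        # BOLD signal variability
print(f"connectivity r = {connectivity:.2f}, BOLD sd = {variability:.2f}")
```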

  11. Exhaustive Search for Sparse Variable Selection in Linear Regression

    NASA Astrophysics Data System (ADS)

    Igarashi, Yasuhiko; Takenaka, Hikaru; Nakanishi-Ohno, Yoshinori; Uemura, Makoto; Ikeda, Shiro; Okada, Masato

    2018-04-01

    We propose a K-sparse exhaustive search (ES-K) method and a K-sparse approximate exhaustive search (AES-K) method for selecting variables in linear regression. With these methods, K-sparse combinations of variables are tested exhaustively, assuming that the optimal combination of explanatory variables is K-sparse. By collecting the results of exhaustively computing ES-K, various approximate methods for selecting sparse variables can be summarized as a density of states. With this density of states, we can compare different methods for selecting sparse variables, such as relaxation and sampling. For large problems, where the combinatorial explosion of explanatory variables is crucial, the AES-K method enables the density of states to be effectively reconstructed using the replica-exchange Monte Carlo method and the multiple histogram method. Applying the ES-K and AES-K methods to type Ia supernova data, we confirmed the conventional understanding in astronomy when an appropriate K is given beforehand. However, we found it difficult to determine K from the data. Using virtual measurement and analysis, we argue that this is caused by data shortage.
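    The core of the ES-K idea is simple to state: evaluate every K-sparse subset of explanatory variables by least squares and keep the one with minimum residual sum of squares. A self-contained toy (the density-of-states bookkeeping and the AES-K Monte Carlo machinery are omitted; data are synthetic):

```python
# Hedged sketch of exhaustive K-sparse variable selection.
import itertools
import numpy as np

rng = np.random.default_rng(42)
n, p, K = 80, 10, 2
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 1] - 2.0 * X[:, 7] + rng.normal(0, 0.1, n)  # true support {1, 7}

def rss(cols):
    """Residual sum of squares of the OLS fit on the given columns."""
    Xs = X[:, list(cols)]
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    return float(np.sum((y - Xs @ beta) ** 2))

# Exhaustive search over all C(p, K) subsets.
best = min(itertools.combinations(range(p), K), key=rss)
print("selected variables:", best)
```

For p = 10 and K = 2 this is only 45 fits; the combinatorial explosion the abstract mentions is why the approximate AES-K variant exists for realistic p.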

  12. Embroidered Electromyography: A Systematic Design Guide.

    PubMed

    Shafti, Ali; Ribas Manero, Roger B; Borg, Amanda M; Althoefer, Kaspar; Howard, Matthew J

    2017-09-01

    Muscle activity monitoring or electromyography (EMG) is a useful tool. However, EMG is typically invasive, expensive and difficult to use for untrained users. A possible solution is textile-based surface EMG (sEMG) integrated into clothing as a wearable device. This is, however, challenging due to 1) uncertainties in the electrical properties of conductive threads used for electrodes, 2) imprecise fabrication technologies (e.g., embroidery, sewing), and 3) lack of standardization in design variable selection. This paper, for the first time, provides a design guide for such sensors by performing a thorough examination of the effect of design variables on sEMG signal quality. Results show that imprecisions in digital embroidery lead to a trade-off between low electrode impedance and high manufacturing consistency. An optimum set of variables for this trade-off is identified and tested with sEMG during a variable force isometric grip exercise with n = 12 participants, compared with conventional gel-based electrodes. Results show that thread-based electrodes provide a similar level of sensitivity to force variation as gel-based electrodes with about 90% correlation to expected linear behavior. As proof of concept, jogging leggings with integrated embroidered sEMG are made and successfully tested for detection of muscle fatigue while running on different surfaces.

  13. Mining Feature of Data Fusion in the Classification of Beer Flavor Information Using E-Tongue and E-Nose

    PubMed Central

    Men, Hong; Shi, Yan; Fu, Songlin; Jiao, Yanan; Qiao, Yu; Liu, Jingjing

    2017-01-01

    Multi-sensor data fusion can provide more comprehensive and more accurate analysis results. However, it also brings some redundant information, which makes finding a feature-mining method for intuitive and efficient analysis an important issue. This paper demonstrates a feature-mining method based on variable accumulation to find the best expression form and the variables’ behavior affecting beer flavor. First, e-tongue and e-nose were used to gather the taste and olfactory information of beer, respectively. Second, principal component analysis (PCA), genetic algorithm-partial least squares (GA-PLS), and variable importance of projection (VIP) scores were applied to select feature variables of the original fusion set. Finally, classification models based on support vector machine (SVM), random forests (RF), and extreme learning machine (ELM) were established to evaluate the efficiency of the feature-mining method. The results show that the feature-mining method based on variable accumulation captures the main features affecting beer flavor information and achieves the best classification performance, with 96.67%, 94.44%, and 98.33% prediction accuracy for the SVM, RF, and ELM models, respectively. PMID:28753917

  14. Analysis of urban residential environments using color infrared aerial photography: An examination of socioeconomic variables and physical characteristics of selected areas in the Los Angeles basin, with addendum: An application of the concepts of the Los Angeles residential environment study to the Ontario-Upland region of California

    NASA Technical Reports Server (NTRS)

    Mullens, R. H., Jr.; Senger, L. W.

    1969-01-01

    Aerial photographs taken with color infrared film were used to differentiate various types of residential areas in the Los Angeles basin, using characteristics of the physical environment which vary from one type of residential area to another. Residential areas of varying quality were classified based on these characteristics. Features of the physical environment, identifiable on CIR aerial photography were examined to determine which of these are the best indicators of quality of residential areas or social areas, as determined by the socioeconomic characteristics of the inhabitants of the selected areas. Association between several physical features and the socioeconomic variables was found to exist.

  15. Logistic-based patient grouping for multi-disciplinary treatment.

    PubMed

    Maruşter, Laura; Weijters, Ton; de Vries, Geerhard; van den Bosch, Antal; Daelemans, Walter

    2002-01-01

    Present-day healthcare witnesses a growing demand for coordination of patient care. Coordination is needed especially in those cases in which hospitals have structured healthcare into specialty-oriented units, while a substantial portion of patient care is not limited to single units. From a logistic point of view, this multi-disciplinary patient care creates a tension between controlling the hospital's units and the need to control the patient flow between units. A possible solution is the creation of new units in which different specialties work together for specific groups of patients. A first step in this solution is to identify the salient patient groups in need of multi-disciplinary care. Grouping techniques seem to offer a solution. However, most grouping approaches in medicine are driven by a search for pathophysiological homogeneity. In this paper, we present an alternative logistic-driven grouping approach. The starting point of our approach is a database with medical cases for 3,603 patients with peripheral arterial vascular (PAV) diseases. For these medical cases, six basic logistic variables (such as the number of visits to different specialists) are selected. Using these logistic variables, clustering techniques are used to group the medical cases into logistically homogeneous groups. In our approach, the quality of the resulting grouping is not measured by statistical significance, but by (i) the usefulness of the grouping for the creation of new multi-disciplinary units; and (ii) how well patients can be selected for treatment in the new units. Given a priori knowledge of a patient (e.g. age, diagnosis), machine learning techniques are employed to induce rules that can be used to select the patients eligible for treatment in the new units. In the paper, we describe the results of the above-proposed methodology for patients with PAV diseases. Two groupings and the accompanying classification rule sets are presented. One grouping is based on all the logistic variables, and another is based on two latent factors found by applying factor analysis. On the basis of the experimental results, we conclude that it is possible to search for logistically homogeneous groups of medical cases (i) that can be characterized by rules based on the aggregated logistic variables; and (ii) for which we can formulate rules to predict the cluster to which new patients belong.
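    The two-step pipeline above, cluster cases on logistic variables, then induce rules assigning new patients to a cluster from a priori attributes, can be sketched with off-the-shelf components. Everything below is a synthetic stand-in (k-means for the clustering, a shallow decision tree for the rule induction; the original study's choices may differ):

```python
# Hedged sketch: logistic-variable clustering followed by rule induction.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(7)
# Two toy logistic variables: specialties visited, total visits.
mono = np.column_stack([rng.poisson(1, 200) + 1, rng.poisson(3, 200)])
multi = np.column_stack([rng.poisson(4, 100) + 2, rng.poisson(12, 100)])
X_logistic = np.vstack([mono, multi]).astype(float)

# Step 1: group cases into logistically homogeneous clusters.
groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_logistic)

# Step 2: induce rules from a priori patient features (age, diagnosis code).
X_apriori = np.column_stack([rng.normal(65, 10, 300), rng.integers(0, 4, 300)])
rules = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_apriori, groups)
print("rule-model training accuracy:", round(rules.score(X_apriori, groups), 2))
```

Because the a priori features here are random, the rule model performs near chance; in the paper the point is precisely to test how much of the logistic grouping such features can recover.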

  16. The role of density-dependent and -independent processes in spawning habitat selection by salmon in an Arctic riverscape.

    PubMed

    Huntsman, Brock M; Falke, Jeffrey A; Savereide, James W; Bennett, Katrina E

    2017-01-01

    Density-dependent (DD) and density-independent (DI) habitat selection is strongly linked to a species' evolutionary history. Determining the relative importance of each is necessary because declining populations are not always the result of altered DI mechanisms but can often be the result of DD via a reduced carrying capacity. We developed spatially and temporally explicit models throughout the Chena River, Alaska to predict important DI mechanisms that influence Chinook salmon spawning success. We used resource-selection functions to predict suitable spawning habitat based on geomorphic characteristics, a semi-distributed water-and-energy balance hydrologic model to generate stream flow metrics, and modeled stream temperature as a function of climatic variables. Spawner counts were predicted throughout the core and periphery spawning sections of the Chena River from escapement estimates (DD) and DI variables. Additionally, we used isodar analysis to identify whether spawners actively defend spawning habitat or follow an ideal free distribution along the riverscape. Aerial counts were best explained by escapement and reference to the core or periphery, while no models with DI variables were supported in the candidate set. Furthermore, isodar plots indicated habitat selection was best explained by ideal free distributions, although there was strong evidence for active defense of core spawning habitat. Our results are surprising, given salmon commonly defend spawning resources, and are likely due to competition occurring at finer spatial scales than addressed in this study.
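    A resource-selection function of the kind used above is, at its simplest, a logistic regression contrasting used sites (1) with available sites (0) on habitat covariates. A toy sketch with invented geomorphic covariates and effect sizes (the study's actual covariates and model form may differ):

```python
# Hedged sketch: a resource-selection function as used-vs-available logistic
# regression on synthetic spawning-site covariates.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(11)
n = 400
depth = rng.normal(0.8, 0.3, n)          # water depth, m (invented)
substrate = rng.normal(30, 10, n)        # median grain size, mm (invented)

# Toy "true" rule: selection favours coarse substrate, intermediate depth.
logit = -4 + 0.12 * substrate - 2.0 * (depth - 0.8) ** 2
used = rng.random(n) < 1 / (1 + np.exp(-logit))

X = np.column_stack([depth, substrate])
rsf = LogisticRegression().fit(X, used)
print("substrate coefficient:", round(rsf.coef_[0][1], 3))
```

Predicted probabilities from such a model, mapped along the river, give the "suitable spawning habitat" surface the abstract refers to.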

  17. The role of density-dependent and –independent processes in spawning habitat selection by salmon in an Arctic riverscape

    DOE PAGES

    Huntsman, Brock M.; Falke, Jeffrey A.; Savereide, James W.; ...

    2017-05-22

    Density-dependent (DD) and density-independent (DI) habitat selection is strongly linked to a species’ evolutionary history. Determining the relative importance of each is necessary because declining populations are not always the result of altered DI mechanisms but can often be the result of DD via a reduced carrying capacity. Here, we developed spatially and temporally explicit models throughout the Chena River, Alaska to predict important DI mechanisms that influence Chinook salmon spawning success. We used resource-selection functions to predict suitable spawning habitat based on geomorphic characteristics, a semi-distributed water-and-energy balance hydrologic model to generate stream flow metrics, and modeled stream temperature as a function of climatic variables. Spawner counts were predicted throughout the core and periphery spawning sections of the Chena River from escapement estimates (DD) and DI variables. In addition, we used isodar analysis to identify whether spawners actively defend spawning habitat or follow an ideal free distribution along the riverscape. Aerial counts were best explained by escapement and reference to the core or periphery, while no models with DI variables were supported in the candidate set. Moreover, isodar plots indicated habitat selection was best explained by ideal free distributions, although there was strong evidence for active defense of core spawning habitat. These results are surprising, given salmon commonly defend spawning resources, and are likely due to competition occurring at finer spatial scales than addressed in this study.

  18. Path analysis and multi-criteria decision making: an approach for multivariate model selection and analysis in health.

    PubMed

    Vasconcelos, A G; Almeida, R M; Nobre, F F

    2001-08-01

    This paper introduces an approach that includes non-quantitative factors for the selection and assessment of multivariate complex models in health. A goodness-of-fit-based methodology combined with a fuzzy multi-criteria decision-making approach is proposed for model selection. Models were obtained using the Path Analysis (PA) methodology in order to explain the interrelationship between health determinants and the post-neonatal component of infant mortality in 59 municipalities of Brazil in the year 1991. Socioeconomic and demographic factors were used as exogenous variables, and environmental, health service and agglomeration factors as endogenous variables. Five PA models were developed and accepted by statistical criteria of goodness-of-fit. These models were then submitted to a group of experts, seeking to characterize their preferences according to predefined criteria intended to evaluate model relevance and plausibility. Fuzzy set techniques were used to rank the alternative models according to the number of times a model was superior to ("dominated") the others. The best-ranked model explained over 90% of the variation in the endogenous variables, and showed the favorable influences of income and education levels on post-neonatal mortality. It also showed the unfavorable effect on mortality of fast population growth, through precarious dwelling conditions and decreased access to sanitation. It was possible to aggregate expert opinions in model evaluation. The proposed procedure for model selection allowed the inclusion of subjective information in a clear and systematic manner.
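    The dominance-count ranking described above can be illustrated with a crisp (non-fuzzy) simplification: score each candidate model on the expert criteria, then rank models by how many rivals each one dominates (at least as good on every criterion, strictly better on one). The scores and model names below are invented; the paper's fuzzy aggregation of expert preferences is only approximated here:

```python
# Hedged sketch: Pareto-dominance counting as a model-ranking rule.
def dominates(a, b):
    """True if score vector a weakly beats b everywhere and strictly somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

# Rows: candidate models; columns: criteria (relevance, plausibility, fit), 0-1.
scores = {
    "M1": (0.9, 0.8, 0.92),
    "M2": (0.7, 0.8, 0.90),
    "M3": (0.6, 0.5, 0.95),
    "M4": (0.7, 0.7, 0.88),
    "M5": (0.5, 0.6, 0.91),
}
dom_count = {m: sum(dominates(s, scores[o]) for o in scores if o != m)
             for m, s in scores.items()}
ranking = sorted(dom_count, key=dom_count.get, reverse=True)
print("best model:", ranking[0], dom_count)
```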

  19. The role of density-dependent and –independent processes in spawning habitat selection by salmon in an Arctic riverscape

    USGS Publications Warehouse

    Huntsman, Brock M.; Falke, Jeffrey A.; Savereide, James W.; Bennett, Katrina E.

    2017-01-01

    Density-dependent (DD) and density-independent (DI) habitat selection is strongly linked to a species’ evolutionary history. Determining the relative importance of each is necessary because declining populations are not always the result of altered DI mechanisms but can often be the result of DD via a reduced carrying capacity. We developed spatially and temporally explicit models throughout the Chena River, Alaska to predict important DI mechanisms that influence Chinook salmon spawning success. We used resource-selection functions to predict suitable spawning habitat based on geomorphic characteristics, a semi-distributed water-and-energy balance hydrologic model to generate stream flow metrics, and modeled stream temperature as a function of climatic variables. Spawner counts were predicted throughout the core and periphery spawning sections of the Chena River from escapement estimates (DD) and DI variables. Additionally, we used isodar analysis to identify whether spawners actively defend spawning habitat or follow an ideal free distribution along the riverscape. Aerial counts were best explained by escapement and reference to the core or periphery, while no models with DI variables were supported in the candidate set. Furthermore, isodar plots indicated habitat selection was best explained by ideal free distributions, although there was strong evidence for active defense of core spawning habitat. Our results are surprising, given salmon commonly defend spawning resources, and are likely due to competition occurring at finer spatial scales than addressed in this study.

  20. Mental health courts and their selection processes: modeling variation for consistency.

    PubMed

    Wolff, Nancy; Fabrikant, Nicole; Belenko, Steven

    2011-10-01

    Admission into mental health courts is based on a complicated and often variable decision-making process that involves multiple parties representing different expertise and interests. To the extent that eligibility criteria of mental health courts are more suggestive than deterministic, selection bias can be expected. Very little research has focused on the selection processes underpinning problem-solving courts even though such processes may dominate the performance of these interventions. This article describes a qualitative study designed to deconstruct the selection and admission processes of mental health courts. In this article, we describe a multi-stage, complex process for screening and admitting clients into mental health courts. The selection filtering model that is described has three eligibility screening stages: initial, assessment, and evaluation. The results of this study suggest that clients selected by mental health courts are shaped by the formal and informal selection criteria, as well as by the local treatment system.

  1. Red-shouldered hawk nesting habitat preference in south Texas

    USGS Publications Warehouse

    Strobel, Bradley N.; Boal, Clint W.

    2010-01-01

    We examined nesting habitat preference by red-shouldered hawks Buteo lineatus using conditional logistic regression on characteristics measured at 27 occupied nest sites and 68 unused sites in 2005–2009 in south Texas. We measured vegetation characteristics of individual trees (nest trees and unused trees) and corresponding 0.04-ha plots. We evaluated the importance of tree and plot characteristics to nesting habitat selection by comparing a priori tree-specific and plot-specific models using Akaike's information criterion. Models with only plot variables carried 14% more weight than models with only center tree variables. The model-averaged odds ratios indicated red-shouldered hawks selected to nest in taller trees and in areas with higher average diameter at breast height than randomly available within the forest stand. Relative to randomly selected areas, each 1-m increase in nest tree height and 1-cm increase in the plot average diameter at breast height increased the probability of selection by 85% and 10%, respectively. Our results indicate that red-shouldered hawks select nesting habitat based on vegetation characteristics of individual trees as well as the 0.04-ha area surrounding the tree. Our results indicate forest management practices resulting in tall forest stands with large average diameter at breast height would benefit red-shouldered hawks in south Texas.
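    The odds-ratio statements above map directly onto logistic-regression coefficients: an 85% increase in the odds of selection per 1-m increase in tree height corresponds to a coefficient of ln(1.85). A small worked sketch using the abstract's reported numbers (the combined-effect scenario below is hypothetical, and the model itself is not refit here):

```python
# Hedged sketch: converting reported odds ratios back to coefficients
# and combining effects on the (conditional) logistic scale.
import math

or_height, or_dbh = 1.85, 1.10       # per 1 m height, per 1 cm mean DBH
beta_height = math.log(or_height)    # logistic coefficient for height
beta_dbh = math.log(or_dbh)          # logistic coefficient for DBH

# Hypothetical combined effect of +2 m height and +5 cm DBH on selection odds:
combined_or = math.exp(2 * beta_height + 5 * beta_dbh)
print(round(beta_height, 3), round(combined_or, 2))
```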

  2. Firstline treatment for chronic phase chronic myeloid leukemia patients should be based on a holistic approach.

    PubMed

    Breccia, Massimo; Alimena, Giuliana

    2015-02-01

    New, more selective and more potent drugs are now available for the treatment of chronic-phase chronic myeloid leukemia, and physicians in some countries must decide on the best option among them. Which prognostic factors should guide this selection remains a matter of discussion. Introducing a 'holistic approach' for the first time in chronic myeloid leukemia, as practiced in other diseases, and looking at the patient as a complete picture, considering several variables such as comorbidities, age, concomitant drugs, lifestyle and patient expectations, may help to identify, patient by patient, the best therapeutic strategy.

  3. Similar Processes but Different Environmental Filters for Soil Bacterial and Fungal Community Composition Turnover on a Broad Spatial Scale

    PubMed Central

    Chemidlin Prévost-Bouré, Nicolas; Dequiedt, Samuel; Thioulouse, Jean; Lelièvre, Mélanie; Saby, Nicolas P. A.; Jolivet, Claudy; Arrouays, Dominique; Plassart, Pierre; Lemanceau, Philippe; Ranjard, Lionel

    2014-01-01

    Spatial scaling of microorganisms has been demonstrated over the last decade. However, the processes and environmental filters shaping soil microbial community structure on a broad spatial scale still need to be refined and ranked. Here, we compared bacterial and fungal community composition turnovers through a biogeographical approach on the same soil sampling design at a broad spatial scale (area range: 13300 to 31000 km2): i) to examine their spatial structuring; ii) to investigate the relative importance of environmental selection and spatial autocorrelation in determining their community composition turnover; and iii) to identify and rank the relevant environmental filters and scales involved in their spatial variations. Molecular fingerprinting of soil bacterial and fungal communities was performed on 413 soils from four French regions of contrasting environmental heterogeneity (Landes

  4. Similar processes but different environmental filters for soil bacterial and fungal community composition turnover on a broad spatial scale.

    PubMed

    Chemidlin Prévost-Bouré, Nicolas; Dequiedt, Samuel; Thioulouse, Jean; Lelièvre, Mélanie; Saby, Nicolas P A; Jolivet, Claudy; Arrouays, Dominique; Plassart, Pierre; Lemanceau, Philippe; Ranjard, Lionel

    2014-01-01

    Spatial scaling of microorganisms has been demonstrated over the last decade. However, the processes and environmental filters shaping soil microbial community structure on a broad spatial scale still need to be refined and ranked. Here, we compared bacterial and fungal community composition turnovers through a biogeographical approach on the same soil sampling design at a broad spatial scale (area range: 13300 to 31000 km2): i) to examine their spatial structuring; ii) to investigate the relative importance of environmental selection and spatial autocorrelation in determining their community composition turnover; and iii) to identify and rank the relevant environmental filters and scales involved in their spatial variations. Molecular fingerprinting of soil bacterial and fungal communities was performed on 413 soils from four French regions of contrasting environmental heterogeneity (Landes

  5. Selecting Populations for Non-Analogous Climate Conditions Using Universal Response Functions: The Case of Douglas-Fir in Central Europe

    PubMed Central

    Chakraborty, Debojyoti; Wang, Tongli; Andre, Konrad; Konnert, Monika; Lexer, Manfred J.; Matulla, Christoph; Schueler, Silvio

    2015-01-01

    Identifying populations within tree species potentially adapted to future climatic conditions is an important requirement for reforestation and assisted migration programmes. Such populations can be identified either by empirical response functions based on correlations of quantitative traits with climate variables or by climate envelope models that compare the climate of seed sources and potential growing areas. In the present study, we analyzed the intraspecific variation in climate growth response of Douglas-fir planted within the non-analogous climate conditions of Central and continental Europe. With data from 50 common garden trials, we developed Universal Response Functions (URF) for tree height and mean basal area and compared the growth performance of the selected best performing populations with that of populations identified through a climate envelope approach. Climate variables of the trial location were found to be stronger predictors of growth performance than climate variables of the population origin. Although the precipitation regime of the population sources varied strongly none of the precipitation related climate variables of population origin was found to be significant within the models. Overall, the URFs explained more than 88% of variation in growth performance. Populations identified by the URF models originate from western Cascades and coastal areas of Washington and Oregon and show significantly higher growth performance than populations identified by the climate envelope approach under both current and climate change scenarios. The URFs predict decreasing growth performance at low and middle elevations of the case study area, but increasing growth performance on high elevation sites. 
    Our analysis suggests that population recommendations based on empirical approaches should be preferred, and population selections made by climate envelope models without considering climatic constraints on growth performance should be carefully appraised before transferring populations to planting locations with novel or dissimilar climates. PMID:26288363
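    A Universal Response Function of the kind described above is a single regression that predicts growth from the climate of the planting site and the climate of the seed origin, so any population can be "transferred" in silico. A minimal quadratic sketch with one climate variable per role (all variables, coefficients and the 11 °C scenario are invented; the published URFs used more predictors):

```python
# Hedged sketch: a toy Universal Response Function and its use to pick a
# seed origin for a future site climate.
import numpy as np

rng = np.random.default_rng(17)
n = 300
site_temp = rng.uniform(4, 12, n)      # mean annual temp at trial site, C
origin_temp = rng.uniform(4, 12, n)    # mean annual temp at seed origin, C
# Toy "true" response: warmer sites grow taller; mismatch to origin costs growth.
height = (10 + 1.2 * site_temp - 0.15 * (site_temp - origin_temp) ** 2
          + rng.normal(0, 0.8, n))

# Quadratic URF with interaction, fit by least squares.
X = np.column_stack([np.ones(n), site_temp, origin_temp,
                     site_temp ** 2, origin_temp ** 2, site_temp * origin_temp])
beta, *_ = np.linalg.lstsq(X, height, rcond=None)

def predict(site, origin):
    return beta @ np.array([1, site, origin, site ** 2, origin ** 2, site * origin])

# Recommend the best origin for a hypothetical warmer site climate of 11 C.
candidates = np.linspace(4, 12, 81)
best_origin = candidates[np.argmax([predict(11.0, o) for o in candidates])]
print("recommended origin temperature:", round(float(best_origin), 1))
```

Because the toy response penalizes site-origin mismatch, the fitted URF recommends origins whose climate matches the target site, which is the logic behind assisted-migration recommendations.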

  6. The variability of software scoring of the CDMAM phantom associated with a limited number of images

    NASA Astrophysics Data System (ADS)

    Yang, Chang-Ying J.; Van Metter, Richard

    2007-03-01

    Software scoring approaches provide an attractive alternative to human evaluation of CDMAM images from digital mammography systems, particularly for annual quality control testing as recommended by the European Protocol for the Quality Control of the Physical and Technical Aspects of Mammography Screening (EPQCM). Methods for correlating CDCOM-based results with human observer performance have been proposed. A common feature of all methods is the use of a small number (at most eight) of CDMAM images to evaluate the system. This study focuses on the potential variability in the estimated system performance that is associated with these methods. Sets of 36 CDMAM images were acquired under carefully controlled conditions from three different digital mammography systems. The threshold visibility thickness (TVT) for each disk diameter was determined using previously reported post-analysis methods from the CDCOM scorings of a randomly selected group of eight images per measurement trial. This random selection process was repeated 3000 times to estimate the variability in the resulting TVT values for each disk diameter. The results from using different post-analysis methods, different random selection strategies and different digital systems were compared. Additional variability for the 0.1 mm disk diameter was explored by comparing the results from two different image data sets acquired under the same conditions from the same system. The magnitude and the type of error estimated for experimental data were explained through modeling. The modeled results also suggest a limitation in the current phantom design for the 0.1 mm diameter disks. Through modeling, it was also found that, because of the binomial statistical nature of the CDMAM test, the true variability of the test could be underestimated by the commonly used method of random re-sampling.
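    The resampling experiment above, drawing 8 images at random from a pool of 36, many times over, is easy to sketch. Here a simple detection fraction stands in for the TVT estimate, with per-image detection simulated as binomial to mirror the binomial nature of the CDMAM test noted in the abstract (the detection probability is invented):

```python
# Hedged sketch: variability of an 8-image estimate via repeated resampling
# from a 36-image pool of simulated binomial detection outcomes.
import numpy as np

rng = np.random.default_rng(5)
n_images, trials, p_detect = 36, 3000, 0.7
# One simulated detection outcome per image for a given disk (1 = detected).
pool = (rng.random(n_images) < p_detect).astype(float)

estimates = np.array([
    pool[rng.choice(n_images, size=8, replace=False)].mean()
    for _ in range(trials)
])
print(f"mean {estimates.mean():.2f}, sd {estimates.std():.3f} across trials")
```

Note that resampling without replacement from a single finite pool, as here, explores only the variability within that pool, which is one route to the underestimation of true variability the study describes.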

  7. Ecological and personal predictors of science achievement in an urban center

    NASA Astrophysics Data System (ADS)

    Guidubaldi, John Michael

    This study sought to examine selected personal and environmental factors that predict urban students' achievement test scores on the science subject area of the Ohio standardized test. Variables examined were in the general categories of teacher/classroom, student, and parent/home. It was assumed that these clusters might add independent variance to a best predictor model, and that discovering the relative strength of different predictors might lead to better selection of intervention strategies to improve student performance. This study was conducted in an urban school district and comprised teachers and students enrolled in ninth grade science in three of this district's high schools. Consenting teachers (9), students (196), and parents (196) received written surveys with questions designed to examine the predictive power of each variable cluster. Regression analyses were used to determine which factors best correlate with student scores and classroom science grades. Selected factors were then compiled into a best predictive model, predicting success on standardized science tests. Student's t tests of gender and racial subgroups confirmed that there were racial differences in OPT scores, and both gender and racial differences in science grades. Additional examinations were therefore conducted for all 12 variables to determine whether gender and race had an impact on the strength of individual variable predictions and on the final best predictor model. Of the 15 original OPT and cluster variable hypotheses, eight showed significant positive relationships that occurred in the expected direction. However, when the more broadly based end-of-year science class grade was used as a criterion, 13 of the 15 hypotheses showed significant relationships in the expected direction. With both criteria, significant gender and racial differences were observed in the strength of individual predictors and in the composition of best predictor models.

  8. An Integrative Framework for Bayesian Variable Selection with Informative Priors for Identifying Genes and Pathways

    PubMed Central

    Ander, Bradley P.; Zhang, Xiaoshuai; Xue, Fuzhong; Sharp, Frank R.; Yang, Xiaowei

    2013-01-01

    The discovery of genetic or genomic markers plays a central role in the development of personalized medicine. A notable challenge exists when dealing with the high dimensionality of the data sets, as thousands of genes or millions of genetic variants are collected on a relatively small number of subjects. Traditional gene-wise selection methods using univariate analyses face difficulty to incorporate correlational, structural, or functional structures amongst the molecular measures. For microarray gene expression data, we first summarize solutions in dealing with ‘large p, small n’ problems, and then propose an integrative Bayesian variable selection (iBVS) framework for simultaneously identifying causal or marker genes and regulatory pathways. A novel partial least squares (PLS) g-prior for iBVS is developed to allow the incorporation of prior knowledge on gene-gene interactions or functional relationships. From the point of view of systems biology, iBVS enables users to directly target the joint effects of multiple genes and pathways in a hierarchical modeling diagram to predict disease status or phenotype. The estimated posterior selection probabilities offer probabilistic and biological interpretations. Both simulated data and a set of microarray data in predicting stroke status are used in validating the performance of iBVS in a Probit model with binary outcomes. iBVS offers a general framework for effective discovery of various molecular biomarkers by combining data-based statistics and knowledge-based priors. Guidelines on making posterior inferences, determining Bayesian significance levels, and improving computational efficiencies are also discussed. PMID:23844055

  9. An integrative framework for Bayesian variable selection with informative priors for identifying genes and pathways.

    PubMed

    Peng, Bin; Zhu, Dianwen; Ander, Bradley P; Zhang, Xiaoshuai; Xue, Fuzhong; Sharp, Frank R; Yang, Xiaowei

    2013-01-01

    The discovery of genetic or genomic markers plays a central role in the development of personalized medicine. A notable challenge exists when dealing with the high dimensionality of the data sets, as thousands of genes or millions of genetic variants are collected on a relatively small number of subjects. Traditional gene-wise selection methods using univariate analyses face difficulty to incorporate correlational, structural, or functional structures amongst the molecular measures. For microarray gene expression data, we first summarize solutions in dealing with 'large p, small n' problems, and then propose an integrative Bayesian variable selection (iBVS) framework for simultaneously identifying causal or marker genes and regulatory pathways. A novel partial least squares (PLS) g-prior for iBVS is developed to allow the incorporation of prior knowledge on gene-gene interactions or functional relationships. From the point of view of systems biology, iBVS enables users to directly target the joint effects of multiple genes and pathways in a hierarchical modeling diagram to predict disease status or phenotype. The estimated posterior selection probabilities offer probabilistic and biological interpretations. Both simulated data and a set of microarray data in predicting stroke status are used in validating the performance of iBVS in a Probit model with binary outcomes. iBVS offers a general framework for effective discovery of various molecular biomarkers by combining data-based statistics and knowledge-based priors. Guidelines on making posterior inferences, determining Bayesian significance levels, and improving computational efficiencies are also discussed.
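
    The paper's iBVS uses a PLS g-prior with pathway structure; as a far simpler illustration of the general g-prior idea, one can enumerate a small model space, score each model by its Bayes factor against the intercept-only model under a Zellner g-prior, and sum model weights into per-variable posterior inclusion probabilities. All data and settings below are synthetic assumptions, not the paper's method.

```python
import itertools
import numpy as np

# Toy Bayesian variable selection with a Zellner g-prior: enumerate all
# subsets of 4 predictors, weight each model by exp(log Bayes factor vs.
# the null), and compute posterior inclusion probabilities per variable.
rng = np.random.default_rng(1)
n, p = 100, 4
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.standard_normal(n)  # vars 0 and 2 matter
g = float(n)  # unit-information prior

def log_bf(cols):
    """Log Bayes factor of the subset model against the intercept-only model."""
    if not cols:
        return 0.0
    Xc = X[:, cols] - X[:, cols].mean(axis=0)   # centering absorbs the intercept
    yc = y - y.mean()
    beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    r2 = 1.0 - np.sum((yc - Xc @ beta) ** 2) / np.sum(yc ** 2)
    k = len(cols)
    return 0.5 * (n - k - 1) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))

models = [list(c) for r in range(p + 1) for c in itertools.combinations(range(p), r)]
w = np.exp([log_bf(m) for m in models])
w /= w.sum()  # posterior model probabilities under a uniform model prior
incl = [float(sum(wi for wi, m in zip(w, models) if j in m)) for j in range(p)]
print([round(v, 2) for v in incl])  # variables 0 and 2 should dominate
```

    Exhaustive enumeration is only feasible for a handful of predictors; for 'large p' settings the paper's MCMC-based search over the model space is the point of the framework.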

  10. From Metaphors to Formalism: A Heuristic Approach to Holistic Assessments of Ecosystem Health.

    PubMed

    Fock, Heino O; Kraus, Gerd

    2016-01-01

    Environmental policies employ metaphoric objectives such as ecosystem health, resilience and sustainable provision of ecosystem services, which influence corresponding sustainability assessments by means of normative settings such as assumptions on system description, indicator selection, aggregation of information and target setting. A heuristic approach is developed for sustainability assessments to avoid ambiguity, and applications to the EU Marine Strategy Framework Directive (MSFD) and OSPAR assessments are presented. For MSFD, nineteen different assessment procedures have been proposed, but at present no agreed assessment procedure is available. The heuristic assessment framework is a functional-holistic approach comprising an ex-ante/ex-post assessment framework with specifically defined normative and systemic dimensions (EAEPNS). The outer normative dimension defines the ex-ante/ex-post framework, of which the latter branch delivers one measure of ecosystem health based on indicators and the former allows accounting for the multi-dimensional nature of sustainability (social, economic, ecological) in terms of modeling approaches. For MSFD, the ex-ante/ex-post framework replaces the current distinction between assessments based on pressure and state descriptors. The ex-ante and the ex-post branch each comprise an inner normative and a systemic dimension. The inner normative dimension in the ex-post branch considers additive utility models and likelihood functions to standardize variables normalized with Bayesian modeling. Likelihood functions allow precautionary target setting. The ex-post systemic dimension considers a posteriori indicator selection by means of analysis of indicator space to avoid redundant indicator information, as opposed to a priori indicator selection in deconstructive-structural approaches. Indicator information is expressed in terms of ecosystem variability by means of multivariate analysis procedures. The application to the OSPAR assessment for the southern North Sea showed that, with the selected 36 indicators, 48% of ecosystem variability could be explained. Tools for the ex-ante branch are risk and ecosystem models with the capability to analyze trade-offs, generating model output for each of the pressure chains to allow for a phasing-out of human pressures. The Bayesian measure of ecosystem health is sensitive to trends in environmental features, but robust to ecosystem variability in line with state space models. The combination of the ex-ante and ex-post branch is essential to evaluate ecosystem resilience and to adopt adaptive management. Based on requirements of the heuristic approach, three possible developments of this concept can be envisioned, i.e. a governance-driven approach built upon participatory processes, a science-driven functional-holistic approach requiring extensive monitoring to analyze complete ecosystem variability, and an approach with emphasis on ex-ante modeling and ex-post assessment of well-studied subsystems.

  11. From Metaphors to Formalism: A Heuristic Approach to Holistic Assessments of Ecosystem Health

    PubMed Central

    Kraus, Gerd

    2016-01-01

    Environmental policies employ metaphoric objectives such as ecosystem health, resilience and sustainable provision of ecosystem services, which influence corresponding sustainability assessments by means of normative settings such as assumptions on system description, indicator selection, aggregation of information and target setting. A heuristic approach is developed for sustainability assessments to avoid ambiguity, and applications to the EU Marine Strategy Framework Directive (MSFD) and OSPAR assessments are presented. For MSFD, nineteen different assessment procedures have been proposed, but at present no agreed assessment procedure is available. The heuristic assessment framework is a functional-holistic approach comprising an ex-ante/ex-post assessment framework with specifically defined normative and systemic dimensions (EAEPNS). The outer normative dimension defines the ex-ante/ex-post framework, of which the latter branch delivers one measure of ecosystem health based on indicators and the former allows accounting for the multi-dimensional nature of sustainability (social, economic, ecological) in terms of modeling approaches. For MSFD, the ex-ante/ex-post framework replaces the current distinction between assessments based on pressure and state descriptors. The ex-ante and the ex-post branch each comprise an inner normative and a systemic dimension. The inner normative dimension in the ex-post branch considers additive utility models and likelihood functions to standardize variables normalized with Bayesian modeling. Likelihood functions allow precautionary target setting. The ex-post systemic dimension considers a posteriori indicator selection by means of analysis of indicator space to avoid redundant indicator information, as opposed to a priori indicator selection in deconstructive-structural approaches. Indicator information is expressed in terms of ecosystem variability by means of multivariate analysis procedures. The application to the OSPAR assessment for the southern North Sea showed that, with the selected 36 indicators, 48% of ecosystem variability could be explained. Tools for the ex-ante branch are risk and ecosystem models with the capability to analyze trade-offs, generating model output for each of the pressure chains to allow for a phasing-out of human pressures. The Bayesian measure of ecosystem health is sensitive to trends in environmental features, but robust to ecosystem variability in line with state space models. The combination of the ex-ante and ex-post branch is essential to evaluate ecosystem resilience and to adopt adaptive management. Based on requirements of the heuristic approach, three possible developments of this concept can be envisioned, i.e. a governance-driven approach built upon participatory processes, a science-driven functional-holistic approach requiring extensive monitoring to analyze complete ecosystem variability, and an approach with emphasis on ex-ante modeling and ex-post assessment of well-studied subsystems. PMID:27509185

  12. Selected questions on biomechanical exposures for surveillance of upper-limb work-related musculoskeletal disorders

    PubMed Central

    Descatha, Alexis; Roquelaure, Yves; Evanoff, Bradley; Niedhammer, Isabelle; Chastang, Jean François; Mariot, Camille; Ha, Catherine; Imbernon, Ellen; Goldberg, Marcel; Leclerc, Annette

    2007-01-01

    Objective: Questionnaires for assessment of biomechanical exposure are frequently used in surveillance programs, though few studies have evaluated which key questions are needed. We sought to reduce the number of variables on a surveillance questionnaire by identifying which variables best summarized biomechanical exposure in a survey of the French working population. Methods: We used data from the 2002–2003 French experimental network of Upper-limb work-related musculoskeletal disorders (UWMSD), performed on 2685 subjects in which 37 variables assessing biomechanical exposures were available (divided into four ordinal categories, according to the task frequency or duration). Principal Component Analysis (PCA) with orthogonal rotation was performed on these variables. Variables closely associated with factors issued from PCA were retained, except those highly correlated to another variable (rho>0.70). In order to study the relevance of the final list of variables, correlations between a score based on retained variables (PCA score) and the exposure score suggested by the SALTSA group were calculated. The associations between the PCA score and the prevalence of UWMSD were also studied. In a final step, we added back to the list a few variables not retained by PCA, because of their established recognition as risk factors. Results: According to the results of the PCA, seven interpretable factors were identified: posture exposures, repetitiveness, handling of heavy loads, distal biomechanical exposures, computer use, forklift operator specific task, and recovery time. Twenty variables strongly correlated with the factors obtained from PCA were retained. The PCA score was strongly correlated both with the SALTSA score and with UWMSD prevalence (p<0.0001). In the final step, six variables were reintegrated. Conclusion: Twenty-six variables out of 37 were efficiently selected according to their ability to summarize major biomechanical constraints in a working population, with an approach combining statistical analyses and existing knowledge. PMID:17476519
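
    The correlation-based pruning step (dropping one of any pair of variables with rho > 0.70) can be sketched as follows on synthetic data; the variables and the exact traversal order are illustrative, not the survey's.

```python
import numpy as np

# Sketch of redundancy pruning: scan candidate variables in order and keep a
# variable only if its absolute correlation with every already-kept variable
# is at or below the 0.70 threshold.
rng = np.random.default_rng(0)
n = 200
a = rng.standard_normal(n)
data = np.column_stack([
    a,
    a + 0.1 * rng.standard_normal(n),   # near-duplicate of column 0
    rng.standard_normal(n),             # independent
    rng.standard_normal(n),             # independent
])

def prune(X, threshold=0.70):
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return keep

print(prune(data))  # column 1 is dropped as redundant with column 0
```

    A greedy scan like this keeps whichever member of a correlated pair appears first; the study instead retained the member most strongly loaded on the PCA factors, which the sketch does not model.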

  13. Interpretation of tropospheric ozone variability in data with different vertical and temporal resolution

    NASA Astrophysics Data System (ADS)

    Petropavlovskikh, I. V.; Disterhoft, P.; Johnson, B. J.; Rieder, H. E.; Manney, G. L.; Daffer, W.

    2012-12-01

    This work attributes tropospheric ozone variability derived from the ground-based Dobson and Brewer Umkehr measurements and from ozone sonde data to local sources and transport. It assesses capabilities and limitations in both types of measurements that are often used to analyze long- and short-term variability in tropospheric ozone time series. We will address the natural and instrument-related contributions to the variability found in both Umkehr and sonde data. Validation of Umkehr methods is often done by intercomparison against independent ozone measuring techniques such as ozone sounding. We will use ozone-sounding data, in both its original and averaging-kernel (AK) smoothed vertical profiles, for assessment of ozone inter-annual variability over Boulder, CO. We will discuss possible reasons for differences between ozone measuring techniques and their effects on the derived ozone trends. In addition to standard evaluation techniques, we utilize an STL-decomposition method to address temporal variability and trends in the Boulder Umkehr data. Further, we apply a statistical modeling approach to the ozone data set to attribute ozone variability to individual driving forces associated with natural and anthropogenic causes. To this aim we follow earlier work applying a backward selection method (i.e., a stepwise elimination procedure over a set of 44 explanatory variables) to determine those explanatory variables which contribute most significantly to the observed variability. We will also present some results associated with the completeness (sampling rate) of the existing data sets. We will also use MERRA (Modern-Era Retrospective analysis for Research and Applications) re-analysis results selected for the Boulder location as a transfer function in understanding the effects that temporal sampling and vertical resolution bring into trend and ozone variability analysis. Analyzing intra-annual variability in ozone measurements over Boulder, CO, in relation to the upper tropospheric subtropical and polar jets, we will address stratospheric and tropospheric intrusions in the middle-latitude tropospheric ozone field.
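
    A backward stepwise elimination of explanatory variables, of the kind referred to above, might look like the following sketch on synthetic data. The |t| > 2 retention rule, the data, and the 6-variable pool are assumptions for illustration; the study's procedure over its 44 explanatory variables is not reproduced here.

```python
import numpy as np

# Sketch of backward stepwise elimination: repeatedly drop the predictor
# with the smallest |t| statistic until every remaining |t| exceeds 2.
rng = np.random.default_rng(42)
n, p = 300, 6
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] + 2.0 * X[:, 3] + rng.standard_normal(n)  # vars 0 and 3 matter

def t_stats(Xs, ys):
    """OLS t statistics for each column of Xs (no intercept, y is centered)."""
    beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    resid = ys - Xs @ beta
    dof = len(ys) - Xs.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(Xs.T @ Xs)
    return beta / np.sqrt(np.diag(cov))

cols = list(range(p))
while cols:
    t = np.abs(t_stats(X[:, cols], y))
    worst = int(np.argmin(t))
    if t[worst] > 2.0:          # every remaining predictor is significant
        break
    cols.pop(worst)

print(cols)  # the informative predictors (0 and 3) should survive
```

    Real backward selection usually works with p-values and an exit threshold (e.g. 0.05) rather than a fixed |t| cutoff, but the mechanics are the same.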

  14. Can You Hear Me Now? Musical Training Shapes Functional Brain Networks for Selective Auditory Attention and Hearing Speech in Noise

    PubMed Central

    Strait, Dana L.; Kraus, Nina

    2011-01-01

    Even in the quietest of rooms, our senses are perpetually inundated by a barrage of sounds, requiring the auditory system to adapt to a variety of listening conditions in order to extract signals of interest (e.g., one speaker's voice amidst others). Brain networks that promote selective attention are thought to sharpen the neural encoding of a target signal, suppressing competing sounds and enhancing perceptual performance. Here, we ask: does musical training benefit cortical mechanisms that underlie selective attention to speech? To answer this question, we assessed the impact of selective auditory attention on cortical auditory-evoked response variability in musicians and non-musicians. Outcomes indicate strengthened brain networks for selective auditory attention in musicians in that musicians but not non-musicians demonstrate decreased prefrontal response variability with auditory attention. Results are interpreted in the context of previous work documenting perceptual and subcortical advantages in musicians for the hearing and neural encoding of speech in background noise. Musicians’ neural proficiency for selectively engaging and sustaining auditory attention to language indicates a potential benefit of music for auditory training. Given the importance of auditory attention for the development and maintenance of language-related skills, musical training may aid in the prevention, habilitation, and remediation of individuals with a wide range of attention-based language, listening and learning impairments. PMID:21716636

  15. Quantifying Uncertainties from Presence Data Sampling Methods for Species Distribution Modeling: Focused on Vegetation.

    NASA Astrophysics Data System (ADS)

    Sung, S.; Kim, H. G.; Lee, D. K.; Park, J. H.; Mo, Y.; Kil, S.; Park, C.

    2016-12-01

    The impact of climate change has been observed throughout the globe. Ecosystems are experiencing rapid changes such as vegetation shifts and species extinctions. In this context, the Species Distribution Model (SDM) is a popular method for projecting the impact of climate change on ecosystems. An SDM is fundamentally based on the niche of a species, which means presence point data are essential for finding the biological niche of that species. Running an SDM for plants requires certain considerations of the characteristics of vegetation. Normally, remote sensing techniques are used to map vegetation over large areas. As a consequence, the exact locations in presence data carry high uncertainty, since presence points are selected from polygon and raster datasets. Thus, sampling methods for vegetation presence data should be carefully selected. In this study, we used three different methods for sampling vegetation presence data: random sampling, stratified sampling and site-index-based sampling. We used the R package BIOMOD2 to assess uncertainty from modeling, and included BioCLIM variables and other environmental variables as input data. Despite differences among the 10 SDMs, the sampling methods showed differences in ROC values: random sampling showed the lowest ROC value, while site-index-based sampling showed the highest. Through this study, the uncertainties arising from presence data sampling methods and SDMs can be quantified.
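
    Two of the presence-sampling strategies can be contrasted in a toy sketch; the grid, the elevation-band strata, and the sample sizes are hypothetical stand-ins for the vegetation polygons and rasters discussed above.

```python
import random

# Toy contrast of simple random sampling versus stratified sampling of
# presence cells. Strata here are hypothetical elevation bands.
random.seed(7)
# hypothetical presence cells: (x, y, elevation_band), 100 cells per band
cells = [(x, y, band) for band in range(3) for x in range(10) for y in range(10)]

def random_sample(cells, k):
    """Draw k cells uniformly, ignoring strata."""
    return random.sample(cells, k)

def stratified_sample(cells, k_per_band):
    """Draw k_per_band cells from each elevation band."""
    out = []
    for band in sorted({c[2] for c in cells}):
        out += random.sample([c for c in cells if c[2] == band], k_per_band)
    return out

print(len(random_sample(cells, 30)), len(stratified_sample(cells, 10)))
```

    Stratification guarantees that every band is represented, whereas simple random sampling can over- or under-sample a band by chance; the study's third strategy (site-index-based sampling) would replace the uniform draws with weights from a suitability index.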

  16. Chemometric classification of casework arson samples based on gasoline content.

    PubMed

    Sinkov, Nikolai A; Sandercock, P Mark L; Harynuk, James J

    2014-02-01

    Detection and identification of ignitable liquids (ILs) in arson debris is a critical part of arson investigations. The challenge of this task is due to the complex and unpredictable chemical nature of arson debris, which also contains pyrolysis products from the fire. ILs, most commonly gasoline, are complex chemical mixtures containing hundreds of compounds that will be consumed or otherwise weathered by the fire to varying extents depending on factors such as temperature, air flow, the surface on which the IL was placed, etc. While methods such as ASTM E-1618 are effective, data interpretation can be a costly bottleneck in the analytical process for some laboratories. In this study, we address this issue through the application of chemometric tools. Prior to the application of chemometric tools such as PLS-DA and SIMCA, issues of chromatographic alignment and variable selection need to be addressed. Here we use an alignment strategy based on a ladder consisting of perdeuterated n-alkanes. Variable selection and model optimization were automated using a hybrid backward elimination (BE) and forward selection (FS) approach guided by the cluster resolution (CR) metric. In this work, we demonstrate the automated construction, optimization, and application of chemometric tools to casework arson data. The resulting PLS-DA and SIMCA classification models, trained with 165 training set samples, have provided classification of 55 validation set samples based on gasoline content with 100% specificity and sensitivity. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  17. Evaluation of alternative model selection criteria in the analysis of unimodal response curves using CART

    USGS Publications Warehouse

    Ribic, C.A.; Miller, T.W.

    1998-01-01

    We investigated CART performance with a unimodal response curve for one continuous response and four continuous explanatory variables, where two variables were important (i.e., directly related to the response) and the other two were not. We explored performance under three relationship strengths and two explanatory variable conditions: equal importance and one variable four times as important as the other. We compared CART variable selection performance using three tree-selection rules ('minimum risk', 'minimum risk complexity', 'one standard error') to stepwise polynomial ordinary least squares (OLS) under four sample size conditions. The one-standard-error and minimum-risk-complexity methods performed about as well as stepwise OLS with large sample sizes when the relationship was strong. With weaker relationships, equally important explanatory variables and larger sample sizes, the one-standard-error and minimum-risk-complexity rules performed better than stepwise OLS. With weaker relationships and explanatory variables of unequal importance, tree-structured methods did not perform as well as stepwise OLS. Comparing performance within tree-structured methods, the one-standard-error rule was more likely than the other tree-selection rules to choose the correct model 1) with a strong relationship and equally important explanatory variables; 2) with weaker relationships and equally important explanatory variables; and 3) under all relationship strengths when explanatory variables were of unequal importance and sample sizes were lower.
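
    The 'one standard error' tree-selection rule compared above can be sketched as follows; the cross-validation error table is invented for illustration.

```python
# Sketch of the one-standard-error rule: among candidate tree sizes, choose
# the smallest tree whose cross-validated error is within one standard error
# of the minimum CV error. All numbers are illustrative.
cv_results = [          # (tree size, CV error, SE of CV error)
    (1, 0.42, 0.03),
    (3, 0.30, 0.03),
    (5, 0.26, 0.02),
    (8, 0.25, 0.02),    # minimum CV error
    (12, 0.27, 0.03),
]

def one_se_rule(cv):
    min_err, min_se = min((err, se) for _, err, se in cv)
    threshold = min_err + min_se              # 0.25 + 0.02 = 0.27
    return min(size for size, err, _ in cv if err <= threshold)

print(one_se_rule(cv_results))  # → 5
```

    The rule deliberately trades a small increase in estimated error for a simpler tree, which is why it tends to pick more parsimonious (and often more correct) models than the plain minimum-risk rule.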

  18. Preformulation considerations for controlled release dosage forms. Part III. Candidate form selection using numerical weighting and scoring.

    PubMed

    Chrzanowski, Frank

    2008-01-01

    Two numerical methods, Decision Analysis (DA) and Potential Problem Analysis (PPA), are presented as alternative selection methods to the logical method presented in Part I. In DA, properties are weighted and outcomes are scored. The weighted scores for each candidate are totaled and final selection is based on the totals; higher scores indicate better candidates. In PPA, potential problems are assigned a seriousness factor and test outcomes are used to define the probability of occurrence. The seriousness-probability products are totaled and forms with minimal scores are preferred. DA and PPA have never been compared to the logical-elimination method. Additional data were available for two forms of McN-5707 to provide complete preformulation data for five candidate forms. Weight and seriousness factors (independent variables) were obtained from a survey of experienced formulators. Scores and probabilities (dependent variables) were provided independently by Preformulation. The rankings of the five candidate forms, best to worst, were similar for all three methods. These results validate the applicability of DA and PPA for candidate form selection. DA and PPA are particularly applicable in cases where there are many candidate forms and where each form has some degree of unfavorable properties.
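
    The DA weighting-and-scoring scheme can be sketched as follows; the property names, weights, and scores are invented for illustration and are not the McN-5707 data.

```python
# Minimal sketch of Decision Analysis scoring: each property gets a weight,
# each candidate form a score per property, and the candidate with the
# highest weighted total wins. All numbers are hypothetical.
weights = {"stability": 5, "solubility": 4, "hygroscopicity": 2}
candidates = {
    "Form A": {"stability": 8, "solubility": 6, "hygroscopicity": 9},
    "Form B": {"stability": 9, "solubility": 4, "hygroscopicity": 7},
    "Form C": {"stability": 6, "solubility": 9, "hygroscopicity": 5},
}

totals = {name: sum(weights[prop] * score for prop, score in scores.items())
          for name, scores in candidates.items()}
print(max(totals, key=totals.get))  # → Form A
```

    PPA would invert the logic of this sketch: totals would be seriousness-times-probability products, and the form with the minimal total would be preferred.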

  19. Feature Selection for Speech Emotion Recognition in Spanish and Basque: On the Use of Machine Learning to Improve Human-Computer Interaction

    PubMed Central

    Arruti, Andoni; Cearreta, Idoia; Álvarez, Aitor; Lazkano, Elena; Sierra, Basilio

    2014-01-01

    Study of emotions in human–computer interaction is a growing research area. This paper shows an attempt to select the most significant features for emotion recognition in spoken Basque and Spanish languages using different methods for feature selection. The RekEmozio database was used as the experimental data set. Several Machine Learning paradigms were used for the emotion classification task. Experiments were executed in three phases, using different sets of features as classification variables in each phase. Moreover, feature subset selection was applied at each phase in order to search for the most relevant feature subset. The three-phase approach was selected to check the validity of the proposed approach. Achieved results show that an instance-based learning algorithm using feature subset selection techniques based on evolutionary algorithms is the best Machine Learning paradigm in automatic emotion recognition, with all different feature sets, obtaining a mean emotion recognition rate of 80.05% in Basque and 74.82% in Spanish. In order to check the goodness of the proposed process, a greedy searching approach (FSS-Forward) has been applied and a comparison between them is provided. Based on achieved results, a set of the most relevant non-speaker-dependent features is proposed for both languages and new perspectives are suggested. PMID:25279686

  20. Statistical methods and regression analysis of stratospheric ozone and meteorological variables in Isfahan

    NASA Astrophysics Data System (ADS)

    Hassanzadeh, S.; Hosseinibalam, F.; Omidvari, M.

    2008-04-01

    Data of seven meteorological variables (relative humidity, wet temperature, dry temperature, maximum temperature, minimum temperature, ground temperature and sun radiation time) and ozone values have been used for statistical analysis. Meteorological variables and ozone values were analyzed using both multiple linear regression and principal component methods. Data for the period 1999-2004 are analyzed jointly using both methods. For all periods, temperature-dependent variables were highly correlated, but were all negatively correlated with relative humidity. Multiple regression analysis was used to fit the meteorological variables using the ozone values as predictors. A variable selection method based on high loading of varimax rotated principal components was used to obtain subsets of the predictor variables to be included in the linear regression model of the meteorological variables. In 1999, 2001 and 2002, one of the meteorological variables was weakly influenced by the ozone concentrations. For the year 2000, however, the model showed that the meteorological variables were not influenced predominantly by the ozone concentrations, which points to variation in sun radiation. This could be due to other factors that were not explicitly considered in this study.

  1. Selecting climate change scenarios for regional hydrologic impact studies based on climate extremes indices

    NASA Astrophysics Data System (ADS)

    Seo, Seung Beom; Kim, Young-Oh; Kim, Youngil; Eum, Hyung-Il

    2018-04-01

    When selecting a subset of climate change scenarios (GCM models), the priority is to ensure that the subset reflects the comprehensive range of possible model results for all variables concerned. Though many studies have attempted to improve scenario selection, few discuss methods to ensure that the results from a subset of climate models contain the same range of uncertainty in hydrologic variables as when all models are considered. We applied the Katsavounidis-Kuo-Zhang (KKZ) algorithm to select a subset of climate change scenarios and demonstrated its ability to reduce the number of GCM models in an ensemble while preserving the ranges of multiple climate extremes indices. First, we analyzed the role of the 27 ETCCDI climate extremes indices in scenario selection and selected representative climate extreme indices. Before selecting a subset, we excluded a few deficient GCM models that could not represent the observed climate regime. Subsequently, we discovered that a subset of GCM models selected by the KKZ algorithm with the representative climate extreme indices could not capture the full potential range of changes in hydrologic extremes (e.g., 3-day peak flow and 7-day low flow) in some regional case studies. However, applying the KKZ algorithm with a different set of climate indices, correlated with the hydrologic extremes, overcame this limitation. Key climate indices, dependent on the hydrologic extremes to be projected, must therefore be determined prior to selecting a subset of GCM models.
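
    The KKZ algorithm is a farthest-point selection originally proposed for vector-quantization codebook initialization; a minimal sketch of applying it to pick a spread-out subset of GCMs in climate-index space might look like the following. The model count, index count, and index values are synthetic.

```python
import numpy as np

# KKZ-style farthest-point selection: start from the vector with the largest
# norm, then repeatedly add the model whose minimum distance to the already-
# selected set is largest, so the subset spans the index space.
rng = np.random.default_rng(3)
models = rng.standard_normal((27, 5))   # 27 GCMs x 5 climate-index values

def kkz_select(points, k):
    chosen = [int(np.argmax(np.linalg.norm(points, axis=1)))]
    while len(chosen) < k:
        # distance of every model to its nearest already-chosen model
        d = np.min(np.linalg.norm(points[:, None] - points[chosen], axis=2), axis=1)
        chosen.append(int(np.argmax(d)))
    return chosen

subset = kkz_select(models, 5)
print(subset)
```

    In practice the index values would first be standardized so that no single climate index dominates the Euclidean distances, a preprocessing step this sketch omits.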

  2. Covariate Selection for Multilevel Models with Missing Data

    PubMed Central

    Marino, Miguel; Buxton, Orfeu M.; Li, Yi

    2017-01-01

    Missing covariate data hampers variable selection in multilevel regression settings. Current variable selection techniques for multiply-imputed data commonly address missingness in the predictors through list-wise deletion and stepwise-selection methods which are problematic. Moreover, most variable selection methods are developed for independent linear regression models and do not accommodate multilevel mixed effects regression models with incomplete covariate data. We develop a novel methodology that is able to perform covariate selection across multiply-imputed data for multilevel random effects models when missing data is present. Specifically, we propose to stack the multiply-imputed data sets from a multiple imputation procedure and to apply a group variable selection procedure through group lasso regularization to assess the overall impact of each predictor on the outcome across the imputed data sets. Simulations confirm the advantageous performance of the proposed method compared with the competing methods. We applied the method to reanalyze the Healthy Directions-Small Business cancer prevention study, which evaluated a behavioral intervention program targeting multiple risk-related behaviors in a working-class, multi-ethnic population. PMID:28239457
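
    The stacking step that precedes the group-penalized fit can be sketched as follows; the dimensions, the synthetic "imputations", and the 1/m observation weights are illustrative assumptions following the general stacked-imputation idea.

```python
import numpy as np

# Sketch of stacking multiply-imputed data: m imputed copies of the design
# matrix are stacked row-wise, each row down-weighted by 1/m so every subject
# contributes total weight 1. A group penalty over the stacked fit then keeps
# or drops each predictor across all copies at once (not shown here).
rng = np.random.default_rng(0)
n, p, m = 50, 4, 5
imputed = [rng.standard_normal((n, p)) for _ in range(m)]  # hypothetical imputations
y = rng.standard_normal(n)

X_stacked = np.vstack(imputed)            # shape (m * n, p)
y_stacked = np.tile(y, m)                 # outcome repeated per imputation
obs_weights = np.full(m * n, 1.0 / m)     # each subject's total weight is 1

print(X_stacked.shape, y_stacked.shape)
```

    The point of stacking, as the abstract notes, is that a single regularized fit yields one selection decision per predictor, instead of m possibly conflicting decisions from per-imputation fits.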

  3. A progressive data compression scheme based upon adaptive transform coding: Mixture block coding of natural images

    NASA Technical Reports Server (NTRS)

    Rost, Martin C.; Sayood, Khalid

    1991-01-01

    A method for efficiently coding natural images using a vector-quantized variable-blocksize transform source coder is presented. The method, mixture block coding (MBC), incorporates variable-rate coding by using a mixture of discrete cosine transform (DCT) source coders. The choice of coder for any given image region is made through a threshold-driven distortion criterion. In this paper, MBC is used in two different applications. The base method is concerned with single-pass low-rate image data compression. The second is a natural extension of the base method which allows for low-rate progressive transmission (PT). Since the base method adapts easily to progressive coding, it offers the aesthetic advantage of progressive coding without incorporating extensive channel overhead. Image compression rates of approximately 0.5 bit/pel are demonstrated for both monochrome and color images.

  4. A COMPARISON OF INTER-ANALYST DIFFERENCES IN THE CLASSIFICATION OF A LANDSAT ETM+ SCENE IN SOUTH-CENTRAL VIRGINIA

    EPA Science Inventory

    This study examined inter-analyst classification variability based on training site signature selection only for six classifications from a 10 km2 Landsat ETM+ image centered over a highly heterogeneous area in south-central Virginia. Six analysts classified the image...

  5. The Association between Bullying and Psychological Health among Senior High School Students in Ghana, West Africa

    ERIC Educational Resources Information Center

    Owusu, Andrew; Hart, Peter; Oliver, Brittney; Kang, Minsoo

    2011-01-01

    Background: School-based bullying, a global challenge, negatively impacts the health and development of both victims and perpetrators. This study examined the relationship between bullying victimization and selected psychological variables among senior high school (SHS) students in Ghana, West Africa. Methods: This study utilized data from the…

  6. Parents' Perspectives on Coping with Duchenne Muscular Dystrophy and Concomitant Specific Learning Disabilities

    ERIC Educational Resources Information Center

    Webb, Carol L.

    2005-01-01

    This study addresses parental perspectives and coping strategies related to Duchenne muscular dystrophy and specific learning disabilities. Data were collected through individual semi-structured in-depth interviews with fifteen sets of parents. Participants were selected based on variables such as age of children, number of children with both…

  7. EMI-Sensor Data to Identify Areas of Manure Accumulation on a Feedlot Surface

    USDA-ARS?s Scientific Manuscript database

    A study was initiated to test the validity of using electromagnetic induction (EMI) survey data, a prediction-based sampling strategy and ordinary linear regression modeling to predict spatially variable feedlot surface manure accumulation. A 30 m × 60 m feedlot pen with a central mound was selecte...

  8. USE OF GIS AND ANCILLARY VARIABLES TO PREDICT VOLATILE ORGANIC COMPOUND AND NITROGEN DIOXIDE LEVELS AT UNMONITORED LOCATIONS

    EPA Science Inventory

    This paper presents a GIS-based regression spatial method, known as land-use regression (LUR) modeling, to estimate ambient air pollution exposures used in the EPA El Paso Children's Health Study. Passive measurements of select volatile organic compounds (VOC) and nitrogen dioxi...

  9. Calibrating SALT: a sampling scheme to improve estimates of suspended sediment yield

    Treesearch

    Robert B. Thomas

    1986-01-01

    Abstract - SALT (Selection At List Time) is a variable probability sampling scheme that provides unbiased estimates of suspended sediment yield and its variance. SALT performs better than standard schemes, which cannot provide unbiased estimates of variance. Sampling probabilities are based on a sediment rating function which promotes greater sampling intensity during periods of high...

  10. 78 FR 64566 - Self-Regulatory Organizations; Financial Industry Regulatory Authority, Inc.; Notice of Filing...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-10-29

    ... Effectiveness of a Proposed Rule Change To Revise the Series 6 Examination Program October 23, 2013. Pursuant to... selection specifications for the Investment Company and Variable Contracts Products Representative (Series 6... corresponding revisions to the Series 6 question bank. Based on instruction from SEC staff, FINRA is submitting...

  11. Effects of unsteady conditions on propulsion generated by the hand's motion in swimming: a systematic review.

    PubMed

    Gomes, Lara Elena; Loss, Jefferson Fagundes

    2015-01-01

    The understanding of swimming propulsion is a key factor in the improvement of performance in this sport. Propulsive forces have been quantified under steady conditions since the 1970s, but actual swimming involves unsteady conditions. Thus, the purpose of the present article was to review the effects of unsteady conditions on swimming propulsion based on studies that have compared steady and unsteady conditions while exploring their methods, their limitations and their results, as well as encouraging new studies based on the findings of this systematic review. A multiple database search was performed, and only those studies that met all eligibility criteria were included. Six studies that compared steady and unsteady conditions using physical experiments or numerical simulations were selected. The selected studies verified the effects of one or more factors that characterise a condition as unsteady on the propulsive forces. Consequently, much research is necessary to understand the effect of each individual variable that characterises a condition as unsteady on swimming propulsion, as well as the effects of these variables as a whole on swimming propulsion.

  12. Measuring high-density built environment for public health research: Uncertainty with respect to data, indicator design and spatial scale.

    PubMed

    Sun, Guibo; Webster, Chris; Ni, Michael Y; Zhang, Xiaohu

    2018-05-07

    Uncertainty with respect to built environment (BE) data collection, measure conceptualization and spatial scales is evident in urban health research, but most findings are from relatively low-density contexts. We selected Hong Kong, an iconic high-density city, as the study area, as limited research has been conducted on uncertainty in such areas. We used geocoded home addresses (n=5732) from a large population-based cohort in Hong Kong to extract BE measures for the participants' place of residence based on an internationally recognized BE framework. Variability of the measures was mapped, and Spearman's rank correlations were calculated to assess how well the relationships among indicators are preserved across variables and spatial scales. We found extreme variations and uncertainties for the 180 measures collected using comprehensive data and advanced geographic information systems modelling techniques. We highlight the implications of the methodological selection and spatial scales of the measures. The results suggest that more robust information for urban health research in high-density cities would emerge if greater consideration were given to BE data, design methods and the spatial scales of the BE measures.

  13. [Prediction of regional soil quality based on mutual information theory integrated with decision tree algorithm].

    PubMed

    Lin, Fen-Fang; Wang, Ke; Yang, Ning; Yan, Shi-Guang; Zheng, Xin-Yu

    2012-02-01

    In this paper, to precisely characterize the spatial distribution of regional soil quality, mutual information theory was adopted to select the main environmental factors from candidates such as soil type, land use pattern, lithology type, topography, road, and industry type, and the decision tree algorithm See 5.0 was applied to predict the grade of regional soil quality. The main factors affecting regional soil quality were soil type, land use, lithology type, distance to town, distance to water area, altitude, distance to road, and distance to industrial land. The prediction accuracy of the decision tree model with the variables selected by mutual information was markedly higher than that of the model with all variables, and for the former model, the prediction accuracy of both the decision tree and the decision rules exceeded 80%. For continuous and categorical data alike, mutual information theory integrated with a decision tree could not only reduce the number of input parameters for the decision tree algorithm, but also predict and assess regional soil quality effectively.
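
    A rough analogue of this two-step pipeline, using scikit-learn's mutual information estimator and a CART tree in place of See 5.0 (the function name and the top-k retention rule are illustrative assumptions):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.tree import DecisionTreeClassifier

def mi_then_tree(X, y, k=3, random_state=0):
    """Rank features by estimated mutual information with the class
    label, keep the top k, and fit a decision tree on that subset."""
    mi = mutual_info_classif(X, y, random_state=random_state)
    keep = np.argsort(mi)[::-1][:k]  # indices of the k most informative features
    tree = DecisionTreeClassifier(random_state=random_state).fit(X[:, keep], y)
    return keep, tree
```

    Screening by mutual information before fitting the tree mirrors the abstract's finding: fewer, more informative inputs simplify the tree and can raise its predictive accuracy.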

  14. Identification and ranking of environmental threats with ecosystem vulnerability distributions.

    PubMed

    Zijp, Michiel C; Huijbregts, Mark A J; Schipper, Aafke M; Mulder, Christian; Posthuma, Leo

    2017-08-24

    Responses of ecosystems to human-induced stress vary in space and time, because both stressors and ecosystem vulnerabilities vary in space and time. Presently, ecosystem impact assessments mainly take into account variation in stressors, without considering variation in ecosystem vulnerability. We developed a method to address ecosystem vulnerability variation by quantifying ecosystem vulnerability distributions (EVDs) based on monitoring data of local species compositions and environmental conditions. The method incorporates spatial variation of both abiotic and biotic variables to quantify variation in responses among species and ecosystems. We show that EVDs can be derived based on a selection of locations, existing monitoring data and a selected impact boundary, and can be used in stressor identification and ranking for a region. A case study on Ohio's freshwater ecosystems, with freshwater fish as target species group, showed that physical habitat impairment and nutrient loads ranked highest as current stressors, with species losses higher than 5% for at least 6% of the locations. EVDs complement existing approaches of stressor assessment and management, which typically account only for variability in stressors, by accounting for variation in the vulnerability of the responding ecosystems.

  15. Combining cow and bull reference populations to increase accuracy of genomic prediction and genome-wide association studies.

    PubMed

    Calus, M P L; de Haas, Y; Veerkamp, R F

    2013-10-01

    Genomic selection holds the promise to be particularly beneficial for traits that are difficult or expensive to measure, such that access to phenotypes on large daughter groups of bulls is limited. Instead, cow reference populations can be generated, potentially supplemented with existing information from the same or (highly) correlated traits available on bull reference populations. The objective of this study, therefore, was to develop a model to perform genomic predictions and genome-wide association studies based on a combined cow and bull reference data set, with the accuracy of the phenotypes differing between the cow and bull genomic selection reference populations. The developed bivariate Bayesian stochastic search variable selection model allowed for an unbalanced design by imputing residuals in the residual updating scheme for all missing records. The performance of this model is demonstrated on a real data example, where the analyzed trait, being milk fat or protein yield, was either measured only on a cow or a bull reference population, or recorded on both. Our results were that the developed bivariate Bayesian stochastic search variable selection model was able to analyze 2 traits, even though animals had measurements on only 1 of 2 traits. The Bayesian stochastic search variable selection model yielded consistently higher accuracy for fat yield compared with a model without variable selection, both for the univariate and bivariate analyses, whereas the accuracy of both models was very similar for protein yield. The bivariate model identified several additional quantitative trait loci peaks compared with the single-trait models on either trait. In addition, the bivariate models showed a marginal increase in accuracy of genomic predictions for the cow traits (0.01-0.05), although a greater increase in accuracy is expected as the size of the bull population increases. 
Our results emphasize that the choice of priors in Bayesian genomic prediction models is especially important in small data sets. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  16. Mean-variance model for portfolio optimization with background risk based on uncertainty theory

    NASA Astrophysics Data System (ADS)

    Zhai, Jia; Bai, Manying

    2018-04-01

    The aim of this paper is to develop a mean-variance model for portfolio optimization that considers background risk, liquidity and transaction cost based on uncertainty theory. In the portfolio selection problem, returns of securities and asset liquidity are modeled as uncertain variables because of incidental events or a lack of historical data, which are common in the economic and social environment. We provide crisp forms of the model and a hybrid intelligent algorithm to solve it. Within a mean-variance framework, we analyze the portfolio frontier characteristics under independently additive background risk. In addition, we discuss some effects of background risk and the liquidity constraint on portfolio selection. Finally, we demonstrate the proposed models through numerical simulations.
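
    For orientation, the probabilistic skeleton that the uncertain-variable model extends is the classical minimum-variance solution w = C^{-1}1 / (1' C^{-1} 1). A small sketch of that closed form (not the paper's hybrid intelligent algorithm, which additionally handles background risk, liquidity and transaction costs):

```python
import numpy as np

def min_variance_weights(cov):
    """Closed-form minimum-variance portfolio weights for a given
    covariance matrix C: w = C^{-1} 1 / (1' C^{-1} 1), so the
    weights sum to one and total portfolio variance is minimized."""
    ones = np.ones(cov.shape[0])
    w = np.linalg.solve(cov, ones)  # C^{-1} 1 without forming the inverse
    return w / w.sum()
```

    In the uncertainty-theoretic setting, expected value and variance of uncertain returns replace their probabilistic counterparts, which is why crisp equivalent forms must be derived before the problem can be solved numerically.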

  17. Classification of the European Union member states according to the relative level of sustainable development.

    PubMed

    Anna, Bluszcz

    Methods for measuring and assessing the level of sustainable development at the international, national and regional levels are a current research problem that requires multi-dimensional analysis. The aim of the study described in this article was the relative assessment of the sustainability level of the European Union member states and a comparative analysis of the position of Poland relative to the other countries. EU member states were treated as objects in a multi-dimensional space whose dimensions were specified by ten diagnostic variables describing the sustainability level of EU countries in three dimensions: social, economic and environmental. Because the compiled statistical data were expressed in different units of measure, taxonomic methods were used to build an aggregated measure of the level of sustainable development of EU member states, which, through normalisation of the variables, enabled a comparative analysis between countries. The methodology consisted of eight stages, including: defining the data matrices; calculating the variability coefficient for all variables and eliminating those with a coefficient under 10 %; dividing the variables into stimulants and destimulants; selecting the method of variable normalisation; developing matrices of normalised data; selecting the formula and calculating the aggregated indicator of the relative level of sustainable development of the EU countries; calculating partial development indicators for the three studied dimensions (social, economic and environmental); and classifying the EU countries according to the relative level of sustainable development. Statistical data were collected from Polish Central Statistical Office publications.
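
    The normalisation-and-aggregation core of such a taxonomic measure can be sketched as follows (a simplified min-max version; the article's exact normalisation formula may differ):

```python
import numpy as np

def aggregate_indicator(X, destimulant_cols):
    """Min-max normalise each diagnostic variable to [0, 1], invert
    destimulants (variables where lower raw values are better), and
    average across variables to obtain one aggregated development
    measure per object (here: per country)."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    Z = (X - lo) / (hi - lo)
    Z[:, destimulant_cols] = 1.0 - Z[:, destimulant_cols]
    return Z.mean(axis=1)
```

    Normalising first is what makes variables expressed in different units of measure commensurable, so they can be combined into a single ranking.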

  18. Using Bayesian variable selection to analyze regular resolution IV two-level fractional factorial designs

    DOE PAGES

    Chipman, Hugh A.; Hamada, Michael S.

    2016-06-02

    Regular two-level fractional factorial designs have complete aliasing in which the associated columns of multiple effects are identical. Here, we show how Bayesian variable selection can be used to analyze experiments that use such designs. In addition to sparsity and hierarchy, Bayesian variable selection naturally incorporates heredity. This prior information is used to identify the most likely combinations of active terms. We also demonstrate the method on simulated and real experiments.
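
    The strong-heredity constraint mentioned here can be made concrete with a small enumeration sketch (illustrative, not the authors' sampler): an interaction such as AB is admitted into a candidate model only when both parent main effects A and B are present.

```python
import itertools

def heredity_models(main, interactions):
    """Enumerate all candidate models obeying strong heredity: an
    interaction (a, b) may enter only when both parent main effects
    a and b are already in the model.  Each model is a pair
    (set of main effects, set of interactions)."""
    models = []
    for r in range(len(main) + 1):
        for m in itertools.combinations(main, r):
            # interactions whose parents are both present in m
            ok = [i for i in interactions if i[0] in m and i[1] in m]
            for r2 in range(len(ok) + 1):
                for s in itertools.combinations(ok, r2):
                    models.append((set(m), set(s)))
    return models
```

    For main effects A, B and the interaction AB this yields five admissible models instead of the eight unconstrained subsets; the Bayesian prior in the paper encodes the same structure probabilistically rather than as a hard constraint.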

  19. Variable Selection in Logistic Regression.

    DTIC Science & Technology

    1987-06-01

    VARIABLE SELECTION IN LOGISTIC REGRESSION. Z. D. Bai, P. R. Krishnaiah and L. C. Zhao, Center for Multivariate Analysis, University of Pittsburgh. Contract F49620-85-C-0008.

  20. Using Bayesian variable selection to analyze regular resolution IV two-level fractional factorial designs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chipman, Hugh A.; Hamada, Michael S.

    Regular two-level fractional factorial designs have complete aliasing in which the associated columns of multiple effects are identical. Here, we show how Bayesian variable selection can be used to analyze experiments that use such designs. In addition to sparsity and hierarchy, Bayesian variable selection naturally incorporates heredity. This prior information is used to identify the most likely combinations of active terms. We also demonstrate the method on simulated and real experiments.
