Sample records for variable selection methods

  1. Variable Selection in the Presence of Missing Data: Imputation-based Methods.

    PubMed

    Zhao, Yize; Long, Qi

    2017-01-01

    Variable selection plays an essential role in regression analysis as it identifies important variables that are associated with outcomes and is known to improve the predictive accuracy of resulting models. Variable selection methods have been widely investigated for fully observed data. However, in the presence of missing data, methods for variable selection need to be carefully designed to account for missing data mechanisms and for the statistical techniques used to handle missing data. Since imputation is arguably the most popular method for handling missing data due to its ease of use, statistical methods for variable selection that are combined with imputation are of particular interest. These methods, which are valid under the missing at random (MAR) and missing completely at random (MCAR) assumptions, largely fall into three general strategies. The first strategy applies existing variable selection methods to each imputed dataset and then combines the variable selection results across all imputed datasets. The second strategy applies existing variable selection methods to the stacked imputed datasets. The third strategy combines resampling techniques such as the bootstrap with imputation. Despite recent advances, this area remains under-developed and offers fertile ground for further research.
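    A minimal sketch of the second ("stacked") strategy described above: impute the data M times, stack the completed datasets, and run a single lasso on the stack. The crude draw-from-observed imputation and all names are our own illustrative choices, not the paper's procedure.

```python
# Stacked-imputation variable selection (strategy 2 above), sketched.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p, M = 200, 10, 5
X = rng.normal(size=(n, p))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(size=n)
X[rng.random((n, p)) < 0.1] = np.nan            # inject 10% MCAR missingness

stacked_X, stacked_y = [], []
for m in range(M):
    Xm = X.copy()
    for j in range(p):                          # crude stochastic imputation:
        miss = np.isnan(Xm[:, j])               # redraw from observed values
        Xm[miss, j] = rng.choice(Xm[~miss, j], size=miss.sum())
    stacked_X.append(Xm)
    stacked_y.append(y)

lasso = LassoCV(cv=5).fit(np.vstack(stacked_X), np.concatenate(stacked_y))
print("selected variables:", np.flatnonzero(lasso.coef_ != 0))
```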

  2. Efficient Variable Selection Method for Exposure Variables on Binary Data

    NASA Astrophysics Data System (ADS)

    Ohno, Manabu; Tarumi, Tomoyuki

    In this paper, we propose a new variable selection method for "robust" exposure variables. We define "robust" as the property that the same variables are selected from both the original data and perturbed data. Few studies have addressed effective methods for this kind of selection. The problem of selecting exposure variables is almost the same as that of extracting correlation rules without robustness. [Brin 97] suggested that correlation rules can be extracted efficiently using the chi-squared statistic of a contingency table, which has a monotone property on binary data. However, the chi-squared value itself does not have the monotone property, so the method easily judges a variable set to be dependent as the dimension increases even when the set is completely independent, and it is therefore not usable for selecting robust exposure variables. We instead assume an anti-monotone property for independent variables in order to select robust independent variables, and we use the apriori algorithm for this purpose. The apriori algorithm is one of the algorithms that find association rules in market basket data; it exploits the anti-monotone property of the support measure defined for association rules. Independence does not strictly have the anti-monotone property on the AIC of the independence probability model, but the tendency toward anti-monotonicity is strong; therefore, variables selected under the anti-monotone assumption on the AIC are robust. Our method judges whether a certain variable is an exposure variable for an independent variable by comparing AIC values as described above. Our numerical experiments show that our method can select robust exposure variables efficiently and precisely.
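    The apriori-style pruning the authors rely on can be sketched as a levelwise search that only visits a variable set if every subset survived the dependence check. Below, a chi-squared test on 2x2 tables stands in for the AIC-based criterion; this is our simplification, not the paper's algorithm.

```python
# Levelwise (apriori-style) candidate generation under an assumed
# anti-monotone independence criterion, illustrated on binary data.
from itertools import combinations
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(500, 6))           # binary data, 6 variables
X[:, 1] = X[:, 0] ^ (rng.random(500) < 0.1)     # make var 1 depend on var 0

def dependent(i, j, alpha=0.01):
    table = np.histogram2d(X[:, i], X[:, j], bins=2)[0]
    return chi2_contingency(table)[1] < alpha   # p-value below alpha

# Level 1: dependent pairs.  Level 2: triples are explored only if every
# sub-pair survived -- the assumed anti-monotone property.
pairs = [s for s in combinations(range(6), 2) if dependent(*s)]
triples = [s for s in combinations(range(6), 3)
           if all(q in pairs for q in combinations(s, 2))]
print("dependent pairs:", pairs)
print("candidate triples:", triples)
```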

  3. Identification of solid state fermentation degree with FT-NIR spectroscopy: Comparison of wavelength variable selection methods of CARS and SCARS.

    PubMed

    Jiang, Hui; Zhang, Hang; Chen, Quansheng; Mei, Congli; Liu, Guohai

    2015-01-01

    The use of wavelength variable selection before partial least squares discriminant analysis (PLS-DA) for qualitative identification of solid state fermentation degree by FT-NIR spectroscopy was investigated in this study. Two wavelength variable selection methods, competitive adaptive reweighted sampling (CARS) and stability competitive adaptive reweighted sampling (SCARS), were employed to select the important wavelengths. PLS-DA was then applied to calibrate identification models using the wavelength variables selected by CARS and SCARS. Experimental results showed that CARS and SCARS selected 58 and 47 wavelength variables, respectively, from the 1557 original wavelength variables. Compared with full-spectrum PLS-DA, both wavelength variable selection methods enhanced the performance of the identification models. Moreover, compared with the CARS-PLS-DA model, the SCARS-PLS-DA model achieved better results, with an identification rate of 91.43% in validation. The overall results demonstrate that a PLS-DA model constructed with wavelength variables selected by a proper variable selection method can identify solid state fermentation degree more accurately.
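    The competitive, reweighted flavor of CARS can be conveyed with a much simpler loop: repeatedly fit a PLS model and retain an exponentially shrinking number of wavelengths ranked by absolute regression coefficient. This omits CARS's Monte Carlo sampling of calibration subsets and SCARS's stability weighting; it is a sketch of the idea, not the published algorithm.

```python
# CARS-like exponential shrinkage of the wavelength set, simplified.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(2)
n, p = 120, 300                                 # 120 spectra, 300 wavelengths
X = rng.normal(size=(n, p))
y = X[:, 50] + 0.5 * X[:, 120] + 0.1 * rng.normal(size=n)

keep = np.arange(p)
for it in range(1, 31):                         # 30 shrinking iterations
    pls = PLSRegression(n_components=2).fit(X[:, keep], y)
    coef = np.abs(pls.coef_).ravel()            # wavelength "competitiveness"
    k = max(2, int(p * np.exp(-0.15 * it)))     # exponential decay schedule
    keep = keep[np.argsort(coef)[::-1][:k]]
print("retained wavelengths:", np.sort(keep))
```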

  4. Identification of solid state fermentation degree with FT-NIR spectroscopy: Comparison of wavelength variable selection methods of CARS and SCARS

    NASA Astrophysics Data System (ADS)

    Jiang, Hui; Zhang, Hang; Chen, Quansheng; Mei, Congli; Liu, Guohai

    2015-10-01

    The use of wavelength variable selection before partial least squares discriminant analysis (PLS-DA) for qualitative identification of solid state fermentation degree by FT-NIR spectroscopy was investigated in this study. Two wavelength variable selection methods, competitive adaptive reweighted sampling (CARS) and stability competitive adaptive reweighted sampling (SCARS), were employed to select the important wavelengths. PLS-DA was then applied to calibrate identification models using the wavelength variables selected by CARS and SCARS. Experimental results showed that CARS and SCARS selected 58 and 47 wavelength variables, respectively, from the 1557 original wavelength variables. Compared with full-spectrum PLS-DA, both wavelength variable selection methods enhanced the performance of the identification models. Moreover, compared with the CARS-PLS-DA model, the SCARS-PLS-DA model achieved better results, with an identification rate of 91.43% in validation. The overall results demonstrate that a PLS-DA model constructed with wavelength variables selected by a proper variable selection method can identify solid state fermentation degree more accurately.

  5. Group Variable Selection Via Convex Log-Exp-Sum Penalty with Application to a Breast Cancer Survivor Study

    PubMed Central

    Geng, Zhigeng; Wang, Sijian; Yu, Menggang; Monahan, Patrick O.; Champion, Victoria; Wahba, Grace

    2017-01-01

    In many scientific and engineering applications, covariates are naturally grouped. When group structures are available among covariates, researchers are usually interested in identifying both important groups and important variables within the selected groups. Among existing successful group variable selection methods, some fail to conduct within-group selection. Others can conduct both group and within-group selection, but the corresponding objective functions are non-convex, which may require extra numerical effort. In this article, we propose a novel Log-Exp-Sum (LES) penalty for group variable selection. The LES penalty is strictly convex. It can identify important groups as well as select important variables within the group. We develop an efficient group-level coordinate descent algorithm to fit the model. We also derive non-asymptotic error bounds and asymptotic group selection consistency for our method in the high-dimensional setting where the number of covariates can be much larger than the sample size. Numerical results demonstrate the good performance of our method in both variable selection and prediction. We applied the proposed method to an American Cancer Society breast cancer survivor dataset. The findings are clinically meaningful and may help design intervention programs to improve the quality of life of breast cancer survivors. PMID:25257196

  6. Variables selection methods in near-infrared spectroscopy.

    PubMed

    Xiaobo, Zou; Jiewen, Zhao; Povey, Malcolm J W; Holmes, Mel; Hanpin, Mao

    2010-05-14

    Near-infrared (NIR) spectroscopy has increasingly been adopted as an analytical tool in various fields, such as the petrochemical, pharmaceutical, environmental, clinical, agricultural, food and biomedical sectors, during the past 15 years. A NIR spectrum of a sample is typically measured by modern scanning instruments at hundreds of equally spaced wavelengths. The large number of spectral variables in most data sets encountered in NIR spectral chemometrics often renders the prediction of a dependent variable unreliable. Recently, considerable effort has been directed towards developing and evaluating different procedures that objectively identify variables which contribute useful information and/or eliminate variables containing mostly noise. This review focuses on variable selection methods in NIR spectroscopy. Selection methods include classical approaches, such as the manual approach (knowledge-based selection) and "univariate" and "sequential" selection methods; sophisticated methods such as the successive projections algorithm (SPA) and uninformative variable elimination (UVE); elaborate search-based strategies such as simulated annealing (SA), artificial neural networks (ANN) and genetic algorithms (GAs); and interval-based algorithms such as interval partial least squares (iPLS), windows PLS and iterative PLS. Wavelength selection with B-splines, Kalman filtering, Fisher's weights and Bayesian approaches is also mentioned. Finally, the websites of some variable selection software and toolboxes for non-commercial use are given.
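    Of the methods listed, the successive projections algorithm (SPA) is compact enough to sketch: greedily pick the wavelength whose column has the largest norm after projecting out the span of the wavelengths already chosen. A minimal variant with our own naming:

```python
# Minimal successive projections algorithm (SPA) for wavelength selection.
import numpy as np

def spa(X, n_select, first=0):
    X = X.astype(float).copy()
    chosen = [first]
    for _ in range(n_select - 1):
        v = X[:, chosen[-1]]
        X = X - np.outer(v, v @ X) / (v @ v)    # project out the last pick
        norms = np.linalg.norm(X, axis=0)
        norms[chosen] = -1.0                    # never re-pick a column
        chosen.append(int(np.argmax(norms)))
    return chosen

rng = np.random.default_rng(3)
spectra = rng.normal(size=(60, 200))            # 60 samples x 200 wavelengths
print("SPA-selected wavelengths:", spa(spectra, n_select=5))
```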

  7. Variable Selection through Correlation Sifting

    NASA Astrophysics Data System (ADS)

    Huang, Jim C.; Jojic, Nebojsa

    Many applications of computational biology require a variable selection procedure to sift through a large number of input variables and select some smaller number that influence a target variable of interest. For example, in virology, only a small number of viral protein fragments influence the nature of the immune response during viral infection. Due to the large number of variables to be considered, a brute-force search over subsets of variables is in general intractable. To approximate this search, methods based on ℓ1-regularized linear regression have been proposed and found to be particularly successful. It is well understood, however, that such methods fail to choose the correct subset of variables if these are highly correlated with other "decoy" variables. We present a method for sifting through sets of highly correlated variables which leads to higher accuracy in selecting the correct variables. The main innovation is a filtering step that reduces correlations among the variables to be selected, making the ℓ1-regularization effective for datasets on which many variable selection methods fail. The filtering step changes both the values of the predictor variables and the output values by projections onto components obtained through a computationally inexpensive principal components analysis. In this paper we demonstrate the usefulness of our method on synthetic datasets and on novel applications in virology. These include HIV viral load analysis based on patients' HIV sequences and immune types, as well as the analysis of seasonal variation in influenza death rates based on the regions of the influenza genome that undergo diversifying selection in the previous season.
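    A loose reading of the filtering step, sketched under our own simplifications: regress the leading principal components out of both the predictors and the response, then run an ℓ1-penalized fit on the residuals. The component count and data are invented for illustration.

```python
# PCA "sifting" before l1-regularized selection, simplified.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(4)
n, p = 150, 40
Z = rng.normal(size=(n, 1))                     # shared factor -> correlation
X = Z + 0.3 * rng.normal(size=(n, p))           # many correlated "decoys"
y = X[:, 7] + 0.1 * rng.normal(size=n)

pca = PCA(n_components=1).fit(X)
scores = pca.transform(X)                       # leading component scores
Xf = X - pca.inverse_transform(scores)          # filtered predictors
beta, *_ = np.linalg.lstsq(scores, y - y.mean(), rcond=None)
yf = y - y.mean() - scores @ beta               # filtered response
print("selected after sifting:",
      np.flatnonzero(LassoCV(cv=5).fit(Xf, yf).coef_ != 0))
```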

  8. Robust check loss-based variable selection of high-dimensional single-index varying-coefficient model

    NASA Astrophysics Data System (ADS)

    Song, Yunquan; Lin, Lu; Jian, Ling

    2016-07-01

    The single-index varying-coefficient model is an important mathematical modeling approach for nonlinear phenomena in science and engineering. In this paper, we develop a variable selection method for high-dimensional single-index varying-coefficient models using a shrinkage idea. The proposed procedure can simultaneously select significant nonparametric components and parametric components. Under suitable regularity conditions and with an appropriate choice of tuning parameters, the consistency of the variable selection procedure and the oracle property of the estimators are established. Moreover, owing to the robustness of the check loss function to outliers in finite samples, our proposed variable selection method is more robust than those based on the least squares criterion. Finally, the method is illustrated with numerical simulations.
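    The "check loss" above is the quantile-regression loss; its robustness can be illustrated with an ℓ1-penalized median regression, which keeps the informative variables under heavy-tailed noise. The paper's single-index varying-coefficient structure is far richer; this sketch only illustrates the loss.

```python
# l1-penalized median (check loss) regression under heavy-tailed noise.
import numpy as np
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(5)
n, p = 200, 15
X = rng.normal(size=(n, p))
y = 2 * X[:, 0] - X[:, 3] + rng.standard_t(df=1, size=n)  # Cauchy-like noise

qr = QuantileRegressor(quantile=0.5, alpha=0.05, solver="highs").fit(X, y)
print("nonzero coefficients:", np.flatnonzero(np.abs(qr.coef_) > 1e-8))
```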

  9. Exhaustive Search for Sparse Variable Selection in Linear Regression

    NASA Astrophysics Data System (ADS)

    Igarashi, Yasuhiko; Takenaka, Hikaru; Nakanishi-Ohno, Yoshinori; Uemura, Makoto; Ikeda, Shiro; Okada, Masato

    2018-04-01

    We propose a K-sparse exhaustive search (ES-K) method and a K-sparse approximate exhaustive search (AES-K) method for selecting variables in linear regression. With these methods, K-sparse combinations of variables are tested exhaustively, assuming that the optimal combination of explanatory variables is K-sparse. By collecting the results of the exhaustive ES-K computation, various approximate methods for selecting sparse variables can be summarized as a density of states. With this density of states, we can compare different methods for selecting sparse variables, such as relaxation and sampling. For large problems, where the combinatorial explosion of explanatory variables is crucial, the AES-K method enables the density of states to be reconstructed effectively using the replica-exchange Monte Carlo method and the multiple histogram method. Applying the ES-K and AES-K methods to type Ia supernova data, we confirmed the conventional understanding in astronomy when an appropriate K was given beforehand. However, we found it difficult to determine K from the data alone. Using virtual measurement and analysis, we argue that this is caused by data shortage.
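    For small p, the ES-K idea reduces to scoring every K-subset by least-squares error; the replica-exchange machinery of AES-K exists precisely because this brute force explodes combinatorially. A tiny sketch:

```python
# Brute-force K-sparse exhaustive search (feasible only for small p).
from itertools import combinations
import numpy as np

rng = np.random.default_rng(6)
n, p, K = 100, 10, 2
X = rng.normal(size=(n, p))
y = X[:, 2] - 3 * X[:, 8] + 0.1 * rng.normal(size=n)

def rss(cols):
    _, res, *_ = np.linalg.lstsq(X[:, list(cols)], y, rcond=None)
    return float(res[0]) if res.size else np.inf

best = min(combinations(range(p), K), key=rss)
print("best K-sparse subset:", best)
```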

  10. Collective feature selection to identify crucial epistatic variants.

    PubMed

    Verma, Shefali S; Lucas, Anastasia; Zhang, Xinyuan; Veturi, Yogasudha; Dudek, Scott; Li, Binglan; Li, Ruowang; Urbanowicz, Ryan; Moore, Jason H; Kim, Dokyoon; Ritchie, Marylyn D

    2018-01-01

    Machine learning methods have gained popularity and practicality in identifying linear and non-linear effects of variants associated with complex diseases/traits. Detection of epistatic interactions still remains a challenge due to the large number of features and relatively small sample size as input, leading to the so-called "short fat data" problem. The efficiency of machine learning methods can be increased by limiting the number of input features, so it is important to perform variable selection before searching for epistasis. Many methods have been evaluated and proposed for feature selection, but no single method works best in all scenarios. We demonstrate this by conducting two separate simulation analyses, and we propose a collective feature selection approach that selects the features in the "union" of the best-performing methods. We explored various parametric, non-parametric, and data mining approaches to perform feature selection; from the top-performing methods, we take the union of the resulting variables, based on a user-defined percentage of variants selected from each method, forward to downstream analysis. Our simulation analysis shows that non-parametric data mining approaches, such as MDR, may work best under one simulation criterion for high effect size (penetrance) datasets, while non-parametric methods designed for feature selection, such as Ranger and gradient boosting, work best under other simulation criteria. Thus, a collective approach proves more beneficial for selecting variables with epistatic effects, including in low effect size datasets and across different genetic architectures. Following this, we applied the proposed collective feature selection approach to select the top 1% of variables to identify potential interacting variables associated with Body Mass Index (BMI) in ~ 44,000 samples obtained from Geisinger's MyCode Community Health Initiative (on behalf of the DiscovEHR collaboration). In this study, we were able to show via simulation studies that selecting variables using a collective feature selection approach recovers true positive epistatic variables more frequently than applying any single feature selection method. We demonstrated the effectiveness of collective feature selection along with a comparison of many methods in our simulation analysis, and we also applied our method to identify non-linear networks associated with obesity.
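    The "union of top-ranked features" step is easy to sketch. The three selectors here (ANOVA F, mutual information, random forest importance) are generic stand-ins for the parametric, non-parametric and data mining families compared in the paper.

```python
# Collective feature selection: union of the top 10% from each selector.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import f_classif, mutual_info_classif

X, y = make_classification(n_samples=300, n_features=50, n_informative=5,
                           random_state=0)

def top(scores, frac=0.1):
    return set(np.argsort(scores)[::-1][:int(len(scores) * frac)])

union = (top(f_classif(X, y)[0])
         | top(mutual_info_classif(X, y, random_state=0))
         | top(RandomForestClassifier(random_state=0).fit(X, y)
               .feature_importances_))
print("collective selection:", sorted(union))
```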

  11. Selecting predictors for discriminant analysis of species performance: an example from an amphibious softwater plant.

    PubMed

    Vanderhaeghe, F; Smolders, A J P; Roelofs, J G M; Hoffmann, M

    2012-03-01

    Selecting an appropriate variable subset in linear multivariate methods is an important methodological issue for ecologists. Interest often exists in obtaining general predictive capacity or in finding causal inferences from predictor variables. Because of a lack of solid knowledge on a studied phenomenon, scientists explore predictor variables in order to find the most meaningful (i.e. discriminating) ones. As an example, we modelled the response of the amphibious softwater plant Eleocharis multicaulis using canonical discriminant function analysis. We asked how variables can be selected through a comparison of several methods: univariate Pearson chi-square screening, principal components analysis (PCA) and step-wise analysis, as well as combinations of some methods. We expected PCA to perform best. The selected methods were evaluated through the fit and stability of the resulting discriminant functions and through correlations between these functions and the predictor variables. The chi-square subset, at P < 0.05, followed by a step-wise sub-selection, gave the best results. Contrary to expectations, PCA performed poorly, as did step-wise analysis. The different chi-square subset methods all yielded ecologically meaningful variables, while probable noise variables were also selected by PCA and step-wise analysis. We advise against the simple use of PCA or step-wise discriminant analysis to obtain an ecologically meaningful variable subset; the former because it does not take into account the response variable, the latter because noise variables are likely to be selected. We suggest that univariate screening techniques are a worthwhile alternative for variable selection in ecology.

  12. A survey of variable selection methods in two Chinese epidemiology journals

    PubMed Central

    2010-01-01

    Background Although much has been written on developing better procedures for variable selection, there is little research on how it is practiced in actual studies. This review surveys the variable selection methods reported in two high-ranking Chinese epidemiology journals. Methods Articles published in 2004, 2006, and 2008 in the Chinese Journal of Epidemiology and the Chinese Journal of Preventive Medicine were reviewed. Five categories of methods were identified whereby variables were selected using: A - bivariate analyses; B - multivariable analysis, e.g. stepwise selection or individual significance testing of model coefficients; C - bivariate analyses first, followed by multivariable analysis; D - bivariate analyses or multivariable analysis; and E - other criteria such as prior knowledge or personal judgment. Results Among the 287 articles that reported using variable selection methods, 6%, 26%, 30%, 21%, and 17% were in categories A through E, respectively. One hundred sixty-three studies selected variables using bivariate analyses, 80% (130/163) via multiple significance testing at the 5% alpha-level. Of the 219 multivariable analyses, 97 (44%) used stepwise procedures, 89 (41%) tested individual regression coefficients, and 33 (15%) did not mention how variables were selected. Sixty percent (58/97) of the stepwise routines also did not specify the algorithm and/or significance levels. Conclusions The variable selection methods reported in the two journals were limited in variety, and details were often missing. Many studies still relied on problematic techniques like stepwise procedures and/or multiple testing of bivariate associations at the 0.05 alpha-level. These deficiencies should be rectified to safeguard the scientific validity of articles published in Chinese epidemiology journals. PMID:20920252

  13. A Selective Overview of Variable Selection in High Dimensional Feature Space

    PubMed Central

    Fan, Jianqing

    2010-01-01

    High dimensional statistical problems arise from diverse fields of scientific research and technological development. Variable selection plays a pivotal role in contemporary statistical learning and scientific discoveries. The traditional idea of best subset selection methods, which can be regarded as a specific form of penalized likelihood, is computationally too expensive for many modern statistical applications. Other forms of penalized likelihood methods have been successfully developed over the last decade to cope with high dimensionality. They have been widely applied for simultaneously selecting important variables and estimating their effects in high dimensional statistical inference. In this article, we present a brief account of the recent developments of theory, methods, and implementations for high dimensional variable selection. Questions of what limits of dimensionality such methods can handle, what the role of penalty functions is, and what the statistical properties of these methods are rapidly drive the advances of the field. The properties of non-concave penalized likelihood and its roles in high dimensional statistical modeling are emphasized. We also review some recent advances in ultra-high dimensional variable selection, with emphasis on independence screening and two-scale methods. PMID:21572976
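    Independence screening, mentioned at the end of the abstract, is simple enough to show: rank predictors by absolute marginal correlation with the response and keep the top d = n / log(n). A sketch of the idea, not the full sure-independence-screening recipe.

```python
# Marginal correlation screening for ultra-high dimensional data.
import numpy as np

rng = np.random.default_rng(7)
n, p = 200, 5000                                # p >> n
X = rng.normal(size=(n, p))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=n)

r = np.abs((X - X.mean(0)).T @ (y - y.mean())) / (n * X.std(0) * y.std())
d = int(n / np.log(n))                          # common screening size
screened = np.argsort(r)[::-1][:d]
print("true variables survive screening:", {0, 1} <= set(screened))
```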

  14. Automatic variable selection method and a comparison for quantitative analysis in laser-induced breakdown spectroscopy

    NASA Astrophysics Data System (ADS)

    Duan, Fajie; Fu, Xiao; Jiang, Jiajia; Huang, Tingting; Ma, Ling; Zhang, Cong

    2018-05-01

    In this work, an automatic variable selection method for quantitative analysis of soil samples using laser-induced breakdown spectroscopy (LIBS) is proposed, based on full spectrum correction (FSC) and modified iterative predictor weighting-partial least squares (mIPW-PLS). The method features automatic selection without manual intervention. To illustrate the feasibility and effectiveness of the method, a comparison with the genetic algorithm (GA) and the successive projections algorithm (SPA) for the detection of different elements (copper, barium and chromium) in soil was implemented. The experimental results showed that all three methods could accomplish variable selection effectively, among which FSC-mIPW-PLS required a significantly shorter computation time (approximately 12 s for 40,000 initial variables) than the others. Moreover, improved quantification models were obtained with the variable selection approaches. The root mean square errors of prediction (RMSEP) of models built with the new method were 27.47 (copper), 37.15 (barium) and 39.70 (chromium) mg/kg, showing predictive performance comparable to GA and SPA.

  15. A Time-Series Water Level Forecasting Model Based on Imputation and Variable Selection Method.

    PubMed

    Yang, Jun-He; Cheng, Ching-Hsue; Chan, Chia-Pan

    2017-01-01

    Reservoirs are important for households and impact the national economy. This paper proposes a time-series forecasting model that first estimates missing values and then applies variable selection to forecast a reservoir's water level. This study collected data from the Taiwan Shimen Reservoir as well as daily atmospheric data from 2008 to 2015. The two datasets were merged by date into an integrated research dataset. The proposed time-series forecasting model has three main steps. First, the study uses five imputation methods to handle missing values. Second, key variables are identified via factor analysis, and unimportant variables are then deleted sequentially via the variable selection method. Finally, the proposed model uses a random forest to build the forecasting model of the reservoir's water level, which is compared with the listed methods in terms of forecasting error. The experimental results indicate that the random forest forecasting model combined with the proposed variable selection has better forecasting performance than the listed models. In addition, the experiments show that the proposed variable selection helps the five forecasting methods used here improve their forecasting capability.

  16. Novel harmonic regularization approach for variable selection in Cox's proportional hazards model.

    PubMed

    Chu, Ge-Jin; Liang, Yong; Wang, Jia-Xuan

    2014-01-01

    Variable selection is an important issue in regression, and a number of variable selection methods involving nonconvex penalty functions have been proposed. In this paper, we investigate a novel harmonic regularization method, which can approximate nonconvex Lq (1/2 < q < 1) regularizations, to select key risk factors in Cox's proportional hazards model using microarray gene expression data. The harmonic regularization method can be efficiently solved using our proposed direct path seeking approach, which can produce solutions that closely approximate those for the convex loss function and the nonconvex regularization. Simulation results based on artificial datasets and four real microarray gene expression datasets, including diffuse large B-cell lymphoma (DLBCL), lung cancer, and AML datasets, show that the harmonic regularization method can be more accurate for variable selection than existing Lasso series methods.

  17. Input variable selection and calibration data selection for storm water quality regression models.

    PubMed

    Sun, Siao; Bertrand-Krajewski, Jean-Luc

    2013-01-01

    Storm water quality models are useful tools in storm water management. Interest has been growing in analyzing existing data to develop models for urban storm water quality evaluations. It is important to select appropriate model inputs when many candidate explanatory variables are available, and model calibration and verification are essential steps in any storm water quality modeling. This study investigates input variable selection and calibration data selection in storm water quality regression models. The two selection problems interact, and a procedure is developed to fulfil them in sequence. The procedure first selects model input variables using a cross-validation method; an appropriate number of variables is identified as model inputs to ensure that a model is neither overfitted nor underfitted. Based on the model input selection results, calibration data selection is studied. The uncertainty of model performance due to calibration data selection is investigated with a random selection method. A cluster-based approach is applied to enhance model calibration, based on the principle of selecting representative data for calibration. The comparison between results from the cluster selection method and random selection shows that the former can significantly improve the performance of calibrated models. It is found that the information content of the calibration data is important in addition to its size.
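    The cluster-based calibration-data selection can be sketched as: cluster the candidate events and calibrate on one representative per cluster, so the calibration set spans the data instead of being a random draw. The cluster count and features here are arbitrary illustrative choices.

```python
# Representative calibration events via k-means clustering.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin

rng = np.random.default_rng(9)
events = rng.normal(size=(120, 4))              # candidate storm events
km = KMeans(n_clusters=20, n_init=10, random_state=0).fit(events)
calib_idx = pairwise_distances_argmin(km.cluster_centers_, events)
print("calibration events:", sorted(calib_idx.tolist()))
```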

  18. A Selective Review of Group Selection in High-Dimensional Models

    PubMed Central

    Huang, Jian; Breheny, Patrick; Ma, Shuangge

    2013-01-01

    Grouping structures arise naturally in many statistical modeling problems. Several methods have been proposed for variable selection that respect grouping structure in variables. Examples include the group LASSO and several concave group selection methods. In this article, we give a selective review of group selection concerning methodological developments, theoretical properties and computational algorithms. We pay particular attention to group selection methods involving concave penalties. We address both group selection and bi-level selection methods. We describe several applications of these methods in nonparametric additive models, semiparametric regression, seemingly unrelated regressions, genomic data analysis and genome-wide association studies. We also highlight some issues that require further study. PMID:24174707
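    The most common convex instance of the reviewed methodology is the group lasso, whose proximal-gradient solver is a few lines of block soft-thresholding. A generic illustration, not any one reviewed paper's algorithm:

```python
# Tiny proximal-gradient group lasso.
import numpy as np

def group_lasso(X, y, groups, lam=0.1, iters=2000):
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2        # 1/L for the (1/2n)||.||^2 loss
    beta = np.zeros(p)
    for _ in range(iters):
        z = beta - step * X.T @ (X @ beta - y) / n
        for g in groups:                        # block soft-thresholding
            nrm = np.linalg.norm(z[g])
            z[g] = 0.0 if nrm == 0 else max(0.0, 1 - step * lam / nrm) * z[g]
        beta = z
    return beta

rng = np.random.default_rng(10)
X = rng.normal(size=(100, 9))
y = X[:, :3] @ np.array([1.0, -1.0, 0.5]) + 0.1 * rng.normal(size=100)
groups = [list(range(0, 3)), list(range(3, 6)), list(range(6, 9))]
print(np.round(group_lasso(X, y, groups), 2))   # only the first group survives
```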

  19. Novel Harmonic Regularization Approach for Variable Selection in Cox's Proportional Hazards Model

    PubMed Central

    Chu, Ge-Jin; Liang, Yong; Wang, Jia-Xuan

    2014-01-01

    Variable selection is an important issue in regression, and a number of variable selection methods involving nonconvex penalty functions have been proposed. In this paper, we investigate a novel harmonic regularization method, which can approximate nonconvex Lq (1/2 < q < 1) regularizations, to select key risk factors in Cox's proportional hazards model using microarray gene expression data. The harmonic regularization method can be efficiently solved using our proposed direct path seeking approach, which can produce solutions that closely approximate those for the convex loss function and the nonconvex regularization. Simulation results based on artificial datasets and four real microarray gene expression datasets, including diffuse large B-cell lymphoma (DLBCL), lung cancer, and AML datasets, show that the harmonic regularization method can be more accurate for variable selection than existing Lasso series methods. PMID:25506389

  20. A non-linear data mining parameter selection algorithm for continuous variables

    PubMed Central

    Razavi, Marianne; Brady, Sean

    2017-01-01

    In this article, we propose a new data mining algorithm by which one can both capture the non-linearity in data and find the best subset model. To produce an enhanced subset of the original variables, a preferred selection method should have the potential of adding a supplementary level of regression analysis that captures complex relationships in the data via mathematical transformation of the predictors and exploration of synergistic effects of combined variables. The method presented here has the potential to produce an optimal subset of variables, rendering the overall process of model selection more efficient. The algorithm introduces interpretable parameters by transforming the original inputs and provides a faithful fit to the data. The core objective of this paper is to introduce a new estimation technique for the classical least squares regression framework. This new automatic variable transformation and model selection method can offer an optimal and stable model that minimizes the mean square error and variability, combining all-possible-subsets selection with the inclusion of variable transformations and interactions. Moreover, the method controls multicollinearity, leading to an optimal set of explanatory variables. PMID:29131829

  1. Covariate Selection for Multilevel Models with Missing Data

    PubMed Central

    Marino, Miguel; Buxton, Orfeu M.; Li, Yi

    2017-01-01

    Missing covariate data hampers variable selection in multilevel regression settings. Current variable selection techniques for multiply-imputed data commonly address missingness in the predictors through list-wise deletion and stepwise-selection methods, which are problematic. Moreover, most variable selection methods are developed for independent linear regression models and do not accommodate multilevel mixed effects regression models with incomplete covariate data. We develop a novel methodology that is able to perform covariate selection across multiply-imputed data for multilevel random effects models when missing data are present. Specifically, we propose to stack the multiply-imputed data sets from a multiple imputation procedure and to apply a group variable selection procedure through group lasso regularization to assess the overall impact of each predictor on the outcome across the imputed data sets. Simulations confirm the advantageous performance of the proposed method compared with competing methods. We applied the method to reanalyze the Healthy Directions-Small Business cancer prevention study, which evaluated a behavioral intervention program targeting multiple risk-related behaviors in a working-class, multi-ethnic population. PMID:28239457

  2. Identification of spectral regions for the quantification of red wine tannins with fourier transform mid-infrared spectroscopy.

    PubMed

    Jensen, Jacob S; Egebo, Max; Meyer, Anne S

    2008-05-28

    Fast tannin measurement is receiving increased interest, as tannins are important for the mouthfeel and color properties of red wines. Fourier transform mid-infrared spectroscopy allows fast measurement of different wine components, but quantification of tannins is difficult due to interference from the spectral responses of other wine components. Four different variable selection tools were investigated for identifying the spectral regions that allow quantification of tannins from the spectra using partial least-squares regression. The study included the development of a new variable selection tool, iterative backward elimination of changeable size intervals PLS. The spectral regions identified by the different variable selection methods were not identical, but all included two regions (1485-1425 and 1060-995 cm(-1)), which were therefore concluded to be particularly important for tannin quantification. The spectral regions identified by the variable selection methods were used to develop calibration models. All four variable selection methods identified regions that allowed improved quantitative prediction of tannins (RMSEP = 69-79 mg of CE/L; r = 0.93-0.94) compared to a calibration model developed using all variables (RMSEP = 115 mg of CE/L; r = 0.87). Only minor differences in the performance of the variable selection methods were observed.
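    Interval-style screening in the spirit of the region methods above can be sketched by scoring each contiguous spectral window with cross-validated PLS error and keeping the best one. A generic sketch, not the authors' changeable-size-interval procedure:

```python
# Score contiguous spectral windows by cross-validated PLS error.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(12)
n, p, width = 80, 120, 20
X = rng.normal(size=(n, p))
y = X[:, 40:60].sum(axis=1) + 0.5 * rng.normal(size=n)  # informative window

scores = []
for start in range(0, p - width + 1, width):
    mse = -cross_val_score(PLSRegression(n_components=3),
                           X[:, start:start + width], y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    scores.append((mse, (start, start + width)))
print("best window:", min(scores)[1])
```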

  3. Firmness prediction in Prunus persica 'Calrico' peaches by visible/short-wave near infrared spectroscopy and acoustic measurements using optimised linear and non-linear chemometric models.

    PubMed

    Lafuente, Victoria; Herrera, Luis J; Pérez, María del Mar; Val, Jesús; Negueruela, Ignacio

    2015-08-15

    In this work, near infrared spectroscopy (NIR) and an acoustic measure (AWETA), two non-destructive methods, were applied to Prunus persica 'Calrico' fruit (n = 260) to predict Magness-Taylor (MT) firmness. Separate and combined use of these measures was evaluated and compared using partial least squares (PLS) and least squares support vector machine (LS-SVM) regression methods. A mutual-information-based variable selection method, seeking the most significant variables for optimal accuracy of the regression models, was also applied to the joint set of variables (NIR wavelengths and the AWETA measure). The newly proposed combined NIR-AWETA model gave good values of the determination coefficient (R(2)) for the PLS and LS-SVM methods (0.77 and 0.78, respectively), improving the reliability of MT firmness prediction compared with separate NIR and AWETA predictions. The three variables selected by the variable selection method (the AWETA measure plus NIR wavelengths 675 and 697 nm) achieved R(2) values of 0.76 and 0.77 for PLS and LS-SVM, respectively. These results indicate that the proposed mutual-information-based variable selection algorithm is a powerful tool for selecting the most relevant variables.

  4. Petroleomics by electrospray ionization FT-ICR mass spectrometry coupled to partial least squares with variable selection methods: prediction of the total acid number of crude oils.

    PubMed

    Terra, Luciana A; Filgueiras, Paulo R; Tose, Lílian V; Romão, Wanderson; de Souza, Douglas D; de Castro, Eustáquio V R; de Oliveira, Mirela S L; Dias, Júlio C M; Poppi, Ronei J

    2014-10-07

    Negative-ion mode electrospray ionization, ESI(-), with Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS) was coupled to Partial Least Squares (PLS) regression and variable selection methods to estimate the total acid number (TAN) of Brazilian crude oil samples. Generally, ESI(-)-FT-ICR mass spectra have a resolving power of ca. 500,000 and a mass accuracy of less than 1 ppm, producing a data matrix containing over 5700 variables per sample. These variables correspond to heteroatom-containing species detected as deprotonated molecules, [M - H](-) ions, which are identified primarily as naphthenic acids, phenols and carbazole analog species. The TAN values for all samples ranged from 0.06 to 3.61 mg of KOH g(-1). To facilitate spectral interpretation, three methods of variable selection were studied: variable importance in the projection (VIP), interval partial least squares (iPLS) and elimination of uninformative variables (UVE). The UVE method seems to be the most appropriate for selecting important variables, reducing the number of variables to 183 and producing a root mean square error of prediction of 0.32 mg of KOH g(-1). By reducing the size of the data, it was possible to relate the selected variables to their corresponding molecular formulas, thus identifying the main chemical species responsible for the TAN values.
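    Of the three selection tools compared, VIP has a compact closed form computable from a fitted PLS model. This is the commonly used formulation, sketched with random stand-ins for the FT-ICR MS variables and assuming scikit-learn's x_scores_/x_weights_/y_loadings_ attributes.

```python
# Variable importance in the projection (VIP) from a fitted PLS model.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def vip_scores(pls):
    t, w, q = pls.x_scores_, pls.x_weights_, pls.y_loadings_
    ssy = np.sum(t ** 2, axis=0) * q.ravel() ** 2   # SSY explained per comp.
    wn2 = (w / np.linalg.norm(w, axis=0)) ** 2
    return np.sqrt(w.shape[0] * wn2 @ ssy / ssy.sum())

rng = np.random.default_rng(14)
X = rng.normal(size=(50, 30))
tan = X[:, 5] + 0.2 * rng.normal(size=50)           # stand-in TAN values
pls = PLSRegression(n_components=3).fit(X, tan)
print("VIP-selected variables:", np.flatnonzero(vip_scores(pls) > 1.0))
```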

  5. GWASinlps: Nonlocal prior based iterative SNP selection tool for genome-wide association studies.

    PubMed

    Sanyal, Nilotpal; Lo, Min-Tzu; Kauppi, Karolina; Djurovic, Srdjan; Andreassen, Ole A; Johnson, Valen E; Chen, Chi-Hua

    2018-06-19

    Multiple marker analysis of genome-wide association study (GWAS) data has gained ample attention in recent years. However, because of the ultra high-dimensionality of GWAS data, such analysis is challenging. Frequently used penalized regression methods often lead to a large number of false positives, whereas Bayesian methods are computationally very expensive. Motivated to ameliorate both issues simultaneously, we consider the novel approach of using nonlocal priors in an iterative variable selection framework. We develop a variable selection method, named iterative nonlocal prior based selection for GWAS (GWASinlps), that combines, in an iterative variable selection framework, the computational efficiency of the screen-and-select approach based on some association learning and the parsimonious uncertainty quantification provided by the use of nonlocal priors. The hallmark of our method is the introduction of a 'structured screen-and-select' strategy, which performs hierarchical screening based not only on response-predictor associations but also on response-response associations, and concatenates variable selection within that hierarchy. Extensive simulation studies with SNPs having realistic linkage disequilibrium structures demonstrate the advantages of our computationally efficient method compared with several frequentist and Bayesian variable selection methods, in terms of true positive rate, false discovery rate, mean squared error, and effect size estimation error. Further, we provide an empirical power analysis useful for study design. Finally, a real GWAS data application was considered with human height as the phenotype. An R package implementing the GWASinlps method is available at https://cran.r-project.org/web/packages/GWASinlps/index.html. Supplementary data are available at Bioinformatics online.
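    The generic shape of a screen-and-select iteration is easy to show: rank SNPs by marginal association with the current residual, run a sparse selector inside the screened set, and repeat. GWASinlps does the selection with nonlocal priors; the lasso below is only a placeholder.

```python
# Iterative screen-and-select on synthetic SNP dosages.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(15)
n, p = 300, 2000
G = rng.binomial(2, 0.3, size=(n, p)).astype(float)   # SNP dosages 0/1/2
pheno = G[:, 10] - 0.8 * G[:, 500] + rng.normal(size=n)

resid, chosen = pheno.copy(), []
for _ in range(3):                                    # a few outer iterations
    corr = np.abs((G - G.mean(0)).T @ (resid - resid.mean()))
    screen = np.argsort(corr)[::-1][:50]              # screening step
    fit = LassoCV(cv=5).fit(G[:, screen], resid)      # selection step
    chosen += screen[np.flatnonzero(fit.coef_)].tolist()
    resid = resid - fit.predict(G[:, screen])         # move to the residual
print("selected SNPs:", sorted(set(chosen)))
```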

  6. Bayesian Group Bridge for Bi-level Variable Selection.

    PubMed

    Mallick, Himel; Yi, Nengjun

    2017-06-01

    A Bayesian bi-level variable selection method (BAGB: Bayesian Analysis of Group Bridge) is developed for regularized regression and classification. This new development is motivated by grouped data, where generic variables can be divided into multiple groups, with variables in the same group being mechanistically related or statistically correlated. As an alternative to frequentist group variable selection methods, BAGB incorporates structural information among predictors through a group-wise shrinkage prior. Posterior computation proceeds via an efficient MCMC algorithm. In addition to the usual ease of interpretation of hierarchical linear models, the Bayesian formulation produces valid standard errors, a feature that is notably absent in the frequentist framework. The attractiveness of the method is illustrated by extensive Monte Carlo simulations and real data analysis. Finally, several extensions of this new approach are presented, providing a unified framework for bi-level variable selection in general models with flexible penalties.

  7. Canonical Measure of Correlation (CMC) and Canonical Measure of Distance (CMD) between sets of data. Part 3. Variable selection in classification.

    PubMed

    Ballabio, Davide; Consonni, Viviana; Mauri, Andrea; Todeschini, Roberto

    2010-01-11

    In multivariate regression and classification, variable selection is an important procedure used to select an optimal subset of variables, with the aim of producing more parsimonious and possibly more predictive models. Variable selection is often necessary when dealing with methodologies that produce thousands of variables, such as Quantitative Structure-Activity Relationships (QSARs) and highly dimensional analytical procedures. In this paper a novel method for variable selection for classification purposes is introduced. This method exploits the recently proposed Canonical Measure of Correlation between two sets of variables (CMC index). The CMC index is in this case calculated for two specific sets of variables, the former comprising the independent variables and the latter the unfolded class matrix. The CMC values, calculated by considering one variable at a time, can be sorted to give a ranking of the variables on the basis of their class discrimination capabilities. Alternatively, the CMC index can be calculated for all possible combinations of variables and the variable subset with the maximal CMC selected, but this procedure is computationally more demanding and the classification performance of the selected subset is not always the best. The effectiveness of the CMC index in selecting variables with discriminative ability was compared with that of other well-known strategies for variable selection, such as Wilks' Lambda, the VIP index based on Partial Least Squares-Discriminant Analysis, and the selection provided by classification trees. A forward selection based on the CMC index was finally used in conjunction with Linear Discriminant Analysis. This approach was tested on several chemical data sets, and the obtained results were encouraging.
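    For a single variable, canonical correlation with the unfolded class matrix reduces to the multiple correlation of that variable regressed on the class indicators, which gives a simple way to reproduce the one-variable-at-a-time ranking. This is our reading of the procedure, not the CMC formula itself.

```python
# Rank variables by correlation with the unfolded class matrix.
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
Y = np.eye(3)[y]                                    # unfolded class matrix

def canon_corr(xj, Y):
    xc = xj - xj.mean()
    coef, *_ = np.linalg.lstsq(Y, xc, rcond=None)
    r2 = 1 - np.sum((xc - Y @ coef) ** 2) / np.sum(xc ** 2)
    return np.sqrt(max(0.0, r2))

scores = [canon_corr(X[:, j], Y) for j in range(X.shape[1])]
print("variables ranked by class discrimination:", np.argsort(scores)[::-1])
```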

  8. Advances in variable selection methods II: Effect of variable selection method on classification of hydrologically similar watersheds in three Mid-Atlantic ecoregions

    EPA Science Inventory

    Hydrological flow predictions in ungauged and sparsely gauged watersheds use regionalization or classification of hydrologically similar watersheds to develop empirical relationships between hydrologic, climatic, and watershed variables. The watershed classifications may be based...

  9. Input variable selection for data-driven models of Coriolis flowmeters for two-phase flow measurement

    NASA Astrophysics Data System (ADS)

    Wang, Lijuan; Yan, Yong; Wang, Xue; Wang, Tao

    2017-03-01

    Input variable selection is an essential step in the development of data-driven models for environmental, biological and industrial applications. Through input variable selection, irrelevant or redundant variables are eliminated and a suitable subset of variables is identified as the input of a model. At the same time, input variable selection simplifies the model structure and improves computational efficiency. This paper describes the procedures of input variable selection for data-driven models for the measurement of liquid mass flowrate and gas volume fraction under two-phase flow conditions using Coriolis flowmeters. Three advanced input variable selection methods, including partial mutual information (PMI), genetic algorithm-artificial neural network (GA-ANN) and tree-based iterative input selection (IIS), are applied in this study. Typical data-driven models incorporating support vector machines (SVM) are established individually based on the input candidates resulting from the selection methods. The validity of the selection outcomes is assessed through an output performance comparison of the SVM-based data-driven models and sensitivity analysis. The validation and analysis results suggest that the input variables selected by the PMI algorithm provide more effective information for the models to measure liquid mass flowrate, while the IIS algorithm provides fewer but more effective variables for the models to predict gas volume fraction.

  10. A novel variable selection approach that iteratively optimizes variable space using weighted binary matrix sampling.

    PubMed

    Deng, Bai-chuan; Yun, Yong-huan; Liang, Yi-zeng; Yi, Lun-zhao

    2014-10-07

    In this study, a new optimization algorithm called the Variable Iterative Space Shrinkage Approach (VISSA), based on the idea of model population analysis (MPA), is proposed for variable selection. Unlike most existing optimization methods for variable selection, VISSA statistically evaluates the performance of the variable space at each step of the optimization. Weighted binary matrix sampling (WBMS) is proposed to generate sub-models that span the variable subspace. Two rules are highlighted during the optimization procedure. First, the variable space shrinks at each step. Second, the new variable space outperforms the previous one. The second rule, which is rarely satisfied by most existing methods, is the core of the VISSA strategy. Compared with some promising variable selection methods such as competitive adaptive reweighted sampling (CARS), Monte Carlo uninformative variable elimination (MCUVE) and iteratively retaining informative variables (IRIV), VISSA showed better prediction ability for the calibration of NIR data. In addition, VISSA is user-friendly; only a few insensitive parameters are needed, and the program terminates automatically without any additional conditions. The Matlab codes for implementing VISSA are freely available on the website: https://sourceforge.net/projects/multivariateanalysis/files/VISSA/.
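    The weighted-binary-matrix idea can be conveyed in miniature: sample many variable subsets with per-variable inclusion weights, score each subset by cross-validated error, and reset the weights to the inclusion frequencies among the best subsets. The shrinkage bookkeeping of the real VISSA is omitted.

```python
# Weighted binary matrix sampling with frequency-based weight updates.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(16)
n, p = 100, 20
X = rng.normal(size=(n, p))
y = X[:, 0] + X[:, 5] + 0.2 * rng.normal(size=n)

w = np.full(p, 0.5)                             # inclusion probabilities
for _ in range(10):
    masks = rng.random((200, p)) < w            # weighted binary sampling
    masks |= ~masks.any(axis=1, keepdims=True)  # no empty submodels
    err = np.array([-cross_val_score(LinearRegression(), X[:, m], y, cv=3,
                                     scoring="neg_mean_squared_error").mean()
                    for m in masks])
    best = masks[np.argsort(err)[:20]]          # keep the best 10%
    w = best.mean(axis=0)                       # new inclusion frequencies
print("top variables by final weight:", np.argsort(w)[::-1][:4])
```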

  11. Evaluation of variable selection methods for random forests and omics data sets.

    PubMed

    Degenhardt, Frauke; Seifert, Stephan; Szymczak, Silke

    2017-10-16

    Machine learning methods, and in particular random forests, are promising approaches for prediction based on high dimensional omics data sets. They provide variable importance measures to rank predictors according to their predictive power. If building a prediction model is the main goal of a study, often a minimal set of variables with good prediction performance is selected. However, if the objective is the identification of involved variables to find active networks and pathways, approaches that aim to select all relevant variables should be preferred. We evaluated several variable selection procedures based on simulated data as well as publicly available experimental methylation and gene expression data. Our comparison included the Boruta algorithm, the Vita method, recurrent relative variable importance, a permutation approach and its parametric variant (Altmann), as well as recursive feature elimination (RFE). In our simulation studies, Boruta was the most powerful approach, followed closely by the Vita method. Both approaches demonstrated similar stability in variable selection, while Vita was the most robust approach under a pure null model without any predictor variables related to the outcome. In the analysis of the different experimental data sets, Vita demonstrated slightly better stability in variable selection and was less computationally intensive than Boruta. In conclusion, we recommend the Boruta and Vita approaches for the analysis of high-dimensional data sets. Vita is considerably faster than Boruta and thus more suitable for large data sets, but only Boruta can also be applied in low-dimensional settings.
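    The Boruta idea can be shown in miniature: append column-shuffled "shadow" copies of every predictor, fit a random forest, and keep the real variables whose importance beats the best shadow. The real algorithm iterates this with statistical testing; one round is shown.

```python
# One round of a Boruta-style shadow-feature test.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=30, n_informative=4,
                           random_state=0)
rng = np.random.default_rng(17)
shadows = rng.permuted(X, axis=0)               # each column shuffled
rf = RandomForestClassifier(n_estimators=500, random_state=0)
rf.fit(np.hstack([X, shadows]), y)
imp = rf.feature_importances_
print("variables beating all shadows:",
      np.flatnonzero(imp[:30] > imp[30:].max()))
```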

  12. An Ensemble Successive Project Algorithm for Liquor Detection Using Near Infrared Sensor.

    PubMed

    Qu, Fangfang; Ren, Dong; Wang, Jihua; Zhang, Zhong; Lu, Na; Meng, Lei

    2016-01-11

    Spectral analysis based on near infrared (NIR) sensors is a powerful tool for complex information processing and high precision recognition, and it has been widely applied to quality analysis and online inspection of agricultural products. This paper proposes a new method to address the instability of the successive projections algorithm (SPA) under small sample sizes, as well as the lack of association between the selected variables and the analyte. The proposed method is an evaluated bootstrap ensemble SPA method (EBSPA) based on a variable evaluation index (EI) for variable selection, and it is applied to the quantitative prediction of alcohol concentrations in liquor using an NIR sensor. In the experiment, the proposed EBSPA is combined with three kinds of modeling methods to test its performance. In addition, the proposed EBSPA combined with partial least squares is compared with other state-of-the-art variable selection methods. The results show that the proposed method remedies the defects of SPA and has the best generalization performance and stability. Furthermore, the physical meaning of the variables selected from the near infrared sensor data is clear, which can effectively reduce the number of variables and improve prediction accuracy.

  13. Assessing the accuracy and stability of variable selection methods for random forest modeling in ecology.

    PubMed

    Fox, Eric W; Hill, Ryan A; Leibowitz, Scott G; Olsen, Anthony R; Thornbrugh, Darren J; Weber, Marc H

    2017-07-01

    Random forest (RF) modeling has emerged as an important statistical learning method in ecology due to its exceptional predictive performance. However, for large and complex ecological data sets, there is limited guidance on variable selection methods for RF modeling. Typically, either a preselected set of predictor variables are used or stepwise procedures are employed which iteratively remove variables according to their importance measures. This paper investigates the application of variable selection methods to RF models for predicting probable biological stream condition. Our motivating data set consists of the good/poor condition of n = 1365 stream survey sites from the 2008/2009 National Rivers and Stream Assessment, and a large set (p = 212) of landscape features from the StreamCat data set as potential predictors. We compare two types of RF models: a full variable set model with all 212 predictors and a reduced variable set model selected using a backward elimination approach. We assess model accuracy using RF's internal out-of-bag estimate, and a cross-validation procedure with validation folds external to the variable selection process. We also assess the stability of the spatial predictions generated by the RF models to changes in the number of predictors and argue that model selection needs to consider both accuracy and stability. The results suggest that RF modeling is robust to the inclusion of many variables of moderate to low importance. We found no substantial improvement in cross-validated accuracy as a result of variable reduction. Moreover, the backward elimination procedure tended to select too few variables and exhibited numerous issues such as upwardly biased out-of-bag accuracy estimates and instabilities in the spatial predictions. We use simulations to further support and generalize results from the analysis of real data. A main purpose of this work is to elucidate issues of model selection bias and instability to ecologists interested in using RF to develop predictive models with large environmental data sets.
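    The backward elimination procedure scrutinized above is short to write down, which also makes its pitfall visible: the out-of-bag score recorded at each step is computed on data already used by earlier elimination steps, hence the upward bias the authors report. A compact sketch:

```python
# RF backward elimination tracked by (optimistically biased) OOB score.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=40, n_informative=6,
                           random_state=1)
keep = np.arange(X.shape[1])
trace = []
while keep.size > 2:
    rf = RandomForestClassifier(n_estimators=300, oob_score=True,
                                random_state=0).fit(X[:, keep], y)
    trace.append((keep.size, round(rf.oob_score_, 3)))
    keep = np.delete(keep, int(np.argmin(rf.feature_importances_)))
print(trace[:5], "...")
```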

  14. Assessing the accuracy and stability of variable selection ...

    EPA Pesticide Factsheets

    Random forest (RF) modeling has emerged as an important statistical learning method in ecology due to its exceptional predictive performance. However, for large and complex ecological datasets there is limited guidance on variable selection methods for RF modeling. Typically, either a preselected set of predictor variables are used, or stepwise procedures are employed which iteratively add/remove variables according to their importance measures. This paper investigates the application of variable selection methods to RF models for predicting probable biological stream condition. Our motivating dataset consists of the good/poor condition of n=1365 stream survey sites from the 2008/2009 National Rivers and Stream Assessment, and a large set (p=212) of landscape features from the StreamCat dataset. Two types of RF models are compared: a full variable set model with all 212 predictors, and a reduced variable set model selected using a backwards elimination approach. We assess model accuracy using RF's internal out-of-bag estimate, and a cross-validation procedure with validation folds external to the variable selection process. We also assess the stability of the spatial predictions generated by the RF models to changes in the number of predictors, and argue that model selection needs to consider both accuracy and stability. The results suggest that RF modeling is robust to the inclusion of many variables of moderate to low importance. We found no substantial improvement in cross-validated accuracy as a result of variable reduction.

  15. Multivariate fault isolation of batch processes via variable selection in partial least squares discriminant analysis.

    PubMed

    Yan, Zhengbing; Kuang, Te-Hui; Yao, Yuan

    2017-09-01

    In recent years, multivariate statistical monitoring of batch processes has become a popular research topic, wherein multivariate fault isolation is an important step aimed at identifying the faulty variables contributing most to the detected process abnormality. Although contribution plots have been commonly used in statistical fault isolation, such methods suffer from the smearing effect between correlated variables. In particular, in batch process monitoring, the high autocorrelations and cross-correlations in variable trajectories make the smearing effect unavoidable. To address this problem, a variable selection-based fault isolation method is proposed in this research, which transforms the fault isolation problem into a variable selection problem in partial least squares discriminant analysis and solves it by calculating a sparse partial least squares model. Unlike traditional methods, the proposed method emphasizes the relative importance of each process variable. Such information may help process engineers in conducting root-cause diagnosis.

  16. Assessing the accuracy and stability of variable selection methods for random forest modeling in ecology

    EPA Science Inventory

    Random forest (RF) modeling has emerged as an important statistical learning method in ecology due to its exceptional predictive performance. However, for large and complex ecological datasets there is limited guidance on variable selection methods for RF modeling. Typically, e...

  17. Selection Practices of Group Leaders: A National Survey.

    ERIC Educational Resources Information Center

    Riva, Maria T.; Lippert, Laurel; Tackett, M. Jan

    2000-01-01

    Study surveys the selection practices of group leaders. Explores methods of selection, variables used to make selection decisions, and the types of selection errors that leaders have experienced. Results suggest that group leaders use clinical judgment to make selection decisions and endorse using some specific variables in selection. (Contains 22…

  18. Selection of key ambient particulate variables for epidemiological studies - applying cluster and heatmap analyses as tools for data reduction.

    PubMed

    Gu, Jianwei; Pitz, Mike; Breitner, Susanne; Birmili, Wolfram; von Klot, Stephanie; Schneider, Alexandra; Soentgen, Jens; Reller, Armin; Peters, Annette; Cyrys, Josef

    2012-10-01

    The success of epidemiological studies depends on the use of appropriate exposure variables. The purpose of this study is to extract a relatively small selection of variables characterizing ambient particulate matter from a large measurement data set. The original data set comprised a total of 96 particulate matter variables that have been continuously measured since 2004 at an urban background aerosol monitoring site in the city of Augsburg, Germany. Many of the original variables were derived from the measured particle size distribution (PSD) across the particle diameter range 3 nm to 10 μm, including size-segregated particle number concentration, particle length concentration, particle surface concentration and particle mass concentration. The data set was complemented by integral aerosol variables measured by independent instruments, including black carbon, sulfate, particle active surface concentration and particle length concentration. Such a large number of measured variables cannot be used in health effect analyses simultaneously. The aim of this study is a pre-screening and selection of the key variables that will be used as input in forthcoming epidemiological studies. We present two methods of parameter selection and apply them to data from the two-year period 2007 to 2008. We used the agglomerative hierarchical cluster method to find groups of similar variables. In total, we selected 15 key variables from 9 clusters which are recommended for epidemiological analyses. We also applied a two-dimensional visualization technique called "heatmap" analysis to the Spearman correlation matrix; twelve key variables were selected using this method. Moreover, the positive matrix factorization (PMF) method was applied to the PSD data to characterize the possible particle sources. Correlations between the variables and PMF factors were used to interpret the results of the cluster and heatmap analyses. Copyright © 2012 Elsevier B.V. All rights reserved.
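
    A compact sketch of the clustering step, assuming scipy and scikit-learn conventions: variables are clustered on a 1 - |Spearman rho| distance and one representative is kept per cluster. The synthetic data, cut height, and representative rule are illustrative.

```python
# Sketch of cluster-based variable reduction: hierarchically cluster
# variables on 1 - |Spearman rho| and keep one representative per cluster.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 12))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=400)   # correlated pair
X[:, 5] = -X[:, 4] + 0.1 * rng.normal(size=400)  # another pair

rho, _ = spearmanr(X)
dist = 1.0 - np.abs(rho)
np.fill_diagonal(dist, 0.0)
Z = linkage(squareform(dist, checks=False), method="average")
labels = fcluster(Z, t=0.5, criterion="distance")  # cut height is arbitrary

# Keep the variable most correlated, on average, with its cluster mates.
for c in np.unique(labels):
    members = np.where(labels == c)[0]
    rep = members[np.argmax(np.abs(rho)[members][:, members].mean(axis=1))]
    print(f"cluster {c}: members {members}, representative {rep}")
```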

  19. Comparison of Feature Selection Techniques in Machine Learning for Anatomical Brain MRI in Dementia.

    PubMed

    Tohka, Jussi; Moradi, Elaheh; Huttunen, Heikki

    2016-07-01

    We present a comparative split-half resampling analysis of various data-driven feature selection and classification methods for whole-brain voxel-based classification of anatomical magnetic resonance images. We compared support vector machines (SVMs), with or without filter-based feature selection, several embedded feature selection methods, and stability selection. While comparisons of the accuracy of various classification methods have been reported previously, the variability of the out-of-training-sample classification accuracy and of the set of selected features due to independent training and test sets had not previously been addressed in a brain imaging context. We studied two classification problems: 1) Alzheimer's disease (AD) vs. normal control (NC) and 2) mild cognitive impairment (MCI) vs. NC classification. In AD vs. NC classification, the variability in test accuracy due to the subject sample did not differ between methods and exceeded the variability due to different classifiers. In MCI vs. NC classification, particularly with a large training set, embedded feature selection methods outperformed SVM-based ones, with the difference in test accuracy exceeding the test accuracy variability due to the subject sample. The filter and embedded methods produced divergent feature patterns for MCI vs. NC classification, which suggests the utility of embedded feature selection for this problem given its good generalization performance. The stability of the feature sets was strongly correlated with the number of features selected, weakly correlated with the stability of classification accuracy, and uncorrelated with the average classification accuracy.

  20. Transferability of optimally-selected climate models in the quantification of climate change impacts on hydrology

    NASA Astrophysics Data System (ADS)

    Chen, Jie; Brissette, François P.; Lucas-Picher, Philippe

    2016-11-01

    Given the ever-increasing number of climate change simulations being carried out, it has become impractical to use all of them to cover the uncertainty of climate change impacts. Various methods have been proposed to optimally select subsets of a large ensemble of climate simulations for impact studies. However, the behaviour of optimally-selected subsets of climate simulations for climate change impacts is unknown, since the transfer process from climate projections to the impact study world is usually highly non-linear. Consequently, this study investigates the transferability of optimally-selected subsets of climate simulations in the case of hydrological impacts. Two different methods were used for the optimal selection of subsets of climate scenarios, and both were found to be capable of adequately representing the spread of selected climate model variables contained in the original large ensemble. However, in both cases, the optimal subsets had limited transferability to hydrological impacts. To capture a similar variability in the impact model world, many more simulations have to be used than those that are needed to simply cover variability from the climate model variables' perspective. Overall, both optimal subset selection methods were better than random selection when small subsets were selected from a large ensemble for impact studies. However, as the number of selected simulations increased, random selection often performed better than the two optimal methods. To ensure adequate uncertainty coverage, the results of this study imply that selecting as many climate change simulations as possible is the best avenue. Where this was not possible, the two optimal methods were found to perform adequately.

  1. Variable selection under multiple imputation using the bootstrap in a prognostic study

    PubMed Central

    Heymans, Martijn W; van Buuren, Stef; Knol, Dirk L; van Mechelen, Willem; de Vet, Henrica CW

    2007-01-01

    Background Missing data is a challenging problem in many prognostic studies. Multiple imputation (MI) accounts for imputation uncertainty and allows for adequate statistical testing. We developed and tested a methodology combining MI with bootstrapping techniques for studying prognostic variable selection. Method In our prospective cohort study we merged data from three different randomized controlled trials (RCTs) to assess prognostic variables for chronicity of low back pain. Across the outcome and prognostic variables, missing data ranged from 0 to 48.1%. We used four methods to investigate the influence of sampling and imputation variation, respectively: MI only, bootstrap only, and two methods that combine MI and bootstrapping. Variables were selected based on the inclusion frequency of each prognostic variable, i.e. the proportion of times that the variable appeared in the model. The discriminative and calibrative abilities of prognostic models developed by the four methods were assessed at different inclusion levels. Results We found that the effect of imputation variation on the inclusion frequency was larger than the effect of sampling variation. When MI and bootstrapping were combined, at inclusion levels ranging from 0% (full model) to 90%, bootstrap-corrected c-index values of 0.70 to 0.71 and slope values of 0.64 to 0.86 were found. Conclusion We recommend accounting for both imputation and sampling variation in data sets with missing values. The new procedure of combining MI with bootstrapping for variable selection results in multivariable prognostic models with good performance and is therefore attractive to apply to data sets with missing values. PMID:17629912
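
    The inclusion-frequency idea can be sketched briefly. The example below is a simplification under stated assumptions: a single mean imputation per bootstrap sample (the study used multiple imputation) and a lasso as the selector, with scikit-learn.

```python
# Sketch of bootstrap + imputation inclusion frequencies: resample, impute,
# fit a selector, and count how often each variable gets a nonzero weight.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LassoCV

X, y = make_regression(n_samples=300, n_features=15, n_informative=4,
                       noise=10.0, random_state=0)
mask = np.random.default_rng(0).random(X.shape) < 0.1
X[mask] = np.nan                                   # inject missingness

rng = np.random.default_rng(1)
counts = np.zeros(X.shape[1])
B = 50
for _ in range(B):
    idx = rng.integers(0, len(y), size=len(y))     # bootstrap sample
    Xb = SimpleImputer(strategy="mean").fit_transform(X[idx])
    lasso = LassoCV(cv=5).fit(Xb, y[idx])
    counts += lasso.coef_ != 0

print("inclusion frequency per variable:", counts / B)
```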

  2. A New Variable Weighting and Selection Procedure for K-Means Cluster Analysis

    ERIC Educational Resources Information Center

    Steinley, Douglas; Brusco, Michael J.

    2008-01-01

    A variance-to-range ratio variable weighting procedure is proposed. We show how this weighting method is theoretically grounded in the inherent variability found in data exhibiting cluster structure. In addition, a variable selection procedure is proposed to operate in conjunction with the variable weighting technique. The performances of these…
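
    A minimal sketch of the weighting idea, assuming scikit-learn: each variable is scaled by its variance-to-range ratio before k-means, so variables carrying cluster structure receive larger weights. The exact weighting formula used below is a plausible stand-in, not necessarily the authors' own.

```python
# Sketch of variance-to-range weighting before k-means: variables with
# cluster structure have high variance relative to their range and so
# receive larger weights than uniform noise variables.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, n_features=4, random_state=0)
X = np.hstack([X, np.random.default_rng(0).uniform(size=(300, 2))])  # noise vars

rng_width = X.max(axis=0) - X.min(axis=0)
weights = X.var(axis=0) / rng_width**2      # variance-to-range ratio
Xw = X * weights                            # weighted variables

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Xw)
print("weights:", np.round(weights, 3))
```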

  3. Improved Sparse Multi-Class SVM and Its Application for Gene Selection in Cancer Classification

    PubMed Central

    Huang, Lingkang; Zhang, Hao Helen; Zeng, Zhao-Bang; Bushel, Pierre R.

    2013-01-01

    Background Microarray techniques provide promising tools for cancer diagnosis using gene expression profiles. However, molecular diagnosis based on high-throughput platforms presents great challenges due to the overwhelming number of variables versus the small sample size and the complex nature of multi-type tumors. Support vector machines (SVMs) have shown superior performance in cancer classification due to their ability to handle high dimensional low sample size data. The multi-class SVM algorithm of Crammer and Singer provides a natural framework for multi-class learning. Despite its effective performance, the procedure utilizes all variables without selection. In this paper, we propose to improve the procedure by imposing shrinkage penalties in learning to enforce solution sparsity. Results The original multi-class SVM of Crammer and Singer is effective for multi-class classification but does not conduct variable selection. We improved the method by introducing soft-thresholding type penalties to incorporate variable selection into multi-class classification for high dimensional data. The new methods were applied to simulated data and two cancer gene expression data sets. The results demonstrate that the new methods can select a small number of genes for building accurate multi-class classification rules. Furthermore, the important genes selected by the methods overlap significantly, suggesting general agreement among different variable selection schemes. Conclusions High accuracy and sparsity make the new methods attractive for cancer diagnostics with gene expression data and defining targets of therapeutic intervention. Availability: The source MATLAB code is available from http://math.arizona.edu/~hzhang/software.html. PMID:23966761
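
    As an accessible stand-in for the penalized Crammer-Singer formulation, the sketch below uses scikit-learn's L1-penalized linear SVM (one-vs-rest) to obtain a sparse multi-class classifier; the data dimensions and the value of C are illustrative.

```python
# Sketch of sparse multi-class classification with an L1 penalty.
# LinearSVC(penalty="l1") fits one-vs-rest sparse hyperplanes, a simple
# stand-in for the penalized Crammer-Singer SVM described in the paper.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=500, n_informative=10,
                           n_classes=3, n_clusters_per_class=1, random_state=0)
svm = LinearSVC(penalty="l1", dual=False, C=0.1, max_iter=5000).fit(X, y)

selected = np.where(np.any(svm.coef_ != 0, axis=0))[0]   # genes used by any class
print(f"{len(selected)} of {X.shape[1]} variables selected:", selected[:20])
```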

  4. Balancing precision and risk: should multiple detection methods be analyzed separately in N-mixture models?

    USGS Publications Warehouse

    Graves, Tabitha A.; Royle, J. Andrew; Kendall, Katherine C.; Beier, Paul; Stetz, Jeffrey B.; Macleod, Amy C.

    2012-01-01

    Using multiple detection methods can increase the number, kind, and distribution of individuals sampled, which may increase accuracy and precision and reduce cost of population abundance estimates. However, when variables influencing abundance are of interest, if individuals detected via different methods are influenced by the landscape differently, separate analysis of multiple detection methods may be more appropriate. We evaluated the effects of combining two detection methods on the identification of variables important to local abundance using detections of grizzly bears with hair traps (systematic) and bear rubs (opportunistic). We used hierarchical abundance models (N-mixture models) with separate model components for each detection method. If both methods sample the same population, the use of either data set alone should (1) lead to the selection of the same variables as important and (2) provide similar estimates of relative local abundance. We hypothesized that the inclusion of 2 detection methods versus either method alone should (3) yield more support for variables identified in single method analyses (i.e. fewer variables and models with greater weight), and (4) improve precision of covariate estimates for variables selected in both separate and combined analyses because sample size is larger. As expected, joint analysis of both methods increased precision as well as certainty in variable and model selection. However, the single-method analyses identified different variables and the resulting predicted abundances had different spatial distributions. We recommend comparing single-method and jointly modeled results to identify the presence of individual heterogeneity between detection methods in N-mixture models, along with consideration of detection probabilities, correlations among variables, and tolerance to risk of failing to identify variables important to a subset of the population. The benefits of increased precision should be weighed against those risks. The analysis framework presented here will be useful for other species exhibiting heterogeneity by detection method.

  5. [Application of characteristic NIR variables selection in portable detection of soluble solids content of apple by near infrared spectroscopy].

    PubMed

    Fan, Shu-Xiang; Huang, Wen-Qian; Li, Jiang-Bo; Guo, Zhi-Ming; Zhao, Chun-Jiang

    2014-10-01

    In order to detect the soluble solids content (SSC) of apple conveniently and rapidly, a ring fiber probe and a portable spectrometer were applied to obtain the spectra of apple. Different wavelength variable selection methods, including uninformative variable elimination (UVE), competitive adaptive reweighted sampling (CARS) and genetic algorithm (GA), were proposed to select effective wavelength variables of the NIR spectroscopy of the SSC in apple based on PLS. The back interval LS-SVM (BiLS-SVM) and GA were used to select effective wavelength variables based on LS-SVM. Selected wavelength variables and the full wavelength range were set as input variables of the PLS model and LS-SVM model, respectively. The results indicated that the PLS model built using GA-CARS on 50 characteristic variables, selected from the full spectrum of 1512 wavelengths, achieved the optimal performance. The correlation coefficient (Rp) and root mean square error of prediction (RMSEP) for the prediction set were 0.962 and 0.403°Brix, respectively, for SSC. The proposed GA-CARS method could effectively simplify the portable detection model of SSC in apple based on near infrared spectroscopy and enhance the predictive precision. The study can provide a reference for the development of a portable apple soluble solids content spectrometer.
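
    The core loop of a CARS-style selection can be sketched as follows; this is a reduced caricature (fixed retention ratio, no Monte Carlo sampling or exponential shrinkage), assuming scikit-learn and mock spectra.

```python
# Reduced sketch of CARS-style wavelength selection: repeatedly fit a PLS
# model and retain a shrinking fraction of the wavelengths with the
# largest absolute regression coefficients, tracking CV performance.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 300))                 # mock spectra
y = X[:, 50] + 0.5 * X[:, 200] + 0.1 * rng.normal(size=120)

kept = np.arange(X.shape[1])
while len(kept) > 10:
    pls = PLSRegression(n_components=3).fit(X[:, kept], y)
    w = np.abs(pls.coef_).ravel()
    order = np.argsort(w)[::-1]
    kept = kept[order[: max(10, int(0.8 * len(kept)))]]  # keep top 80%
    score = cross_val_score(PLSRegression(n_components=3),
                            X[:, kept], y, cv=5, scoring="r2").mean()
    print(f"{len(kept):4d} wavelengths, CV R^2 = {score:.3f}")
```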

  6. Evaluation of alternative model selection criteria in the analysis of unimodal response curves using CART

    USGS Publications Warehouse

    Ribic, C.A.; Miller, T.W.

    1998-01-01

    We investigated CART performance with a unimodal response curve for one continuous response and four continuous explanatory variables, where two variables were important (i.e., directly related to the response) and the other two were not. We explored performance under three relationship strengths and two explanatory variable conditions: equal importance and one variable four times as important as the other. We compared CART variable selection performance using three tree-selection rules ('minimum risk', 'minimum risk complexity', 'one standard error') to stepwise polynomial ordinary least squares (OLS) under four sample size conditions. The one-standard-error and minimum-risk-complexity methods performed about as well as stepwise OLS with large sample sizes when the relationship was strong. With weaker relationships, equally important explanatory variables and larger sample sizes, the one-standard-error and minimum-risk-complexity rules performed better than stepwise OLS. With weaker relationships and explanatory variables of unequal importance, tree-structured methods did not perform as well as stepwise OLS. Comparing performance within tree-structured methods, with a strong relationship and equally important explanatory variables, the one-standard-error rule was more likely to choose the correct model than were the other tree-selection rules; this also held (1) with weaker relationships and equally important explanatory variables, and (2) under all relationship strengths when explanatory variables were of unequal importance and sample sizes were lower.
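
    The one-standard-error rule itself is easy to illustrate with modern CART implementations. A sketch assuming scikit-learn's cost-complexity pruning, with illustrative data and folds:

```python
# One-standard-error tree selection via cost-complexity pruning: among the
# pruned subtrees, pick the simplest one whose CV error is within one
# standard error of the minimum.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=400, n_features=4, noise=20, random_state=0)
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y)
alphas = path.ccp_alphas[:-1]                    # drop the root-only tree

means, ses = [], []
for a in alphas:
    scores = -cross_val_score(DecisionTreeRegressor(ccp_alpha=a, random_state=0),
                              X, y, cv=5, scoring="neg_mean_squared_error")
    means.append(scores.mean())
    ses.append(scores.std(ddof=1) / np.sqrt(len(scores)))

means, ses = np.array(means), np.array(ses)
best = means.argmin()
threshold = means[best] + ses[best]
one_se = alphas[np.where(means <= threshold)[0][-1]]  # largest alpha within 1 SE
print(f"min-risk alpha = {alphas[best]:.4f}, one-SE alpha = {one_se:.4f}")
```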

  7. Variable selection in discrete survival models including heterogeneity.

    PubMed

    Groll, Andreas; Tutz, Gerhard

    2017-04-01

    Several variable selection procedures are available for continuous time-to-event data. However, if time is measured in a discrete way, and therefore many ties occur, models for continuous time are inadequate. We propose penalized likelihood methods that perform efficient variable selection in discrete survival modeling, with explicit modeling of the heterogeneity in the population. The method is based on a combination of ridge and lasso type penalties that are tailored to the case of discrete survival. The performance is studied in simulation studies and an application to the birth of the first child.
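
    A rough sketch of discrete-time survival variable selection, under simplifying assumptions: the data are expanded to person-period records and an L1-penalized logistic hazard model is fit with scikit-learn; the paper's ridge+lasso combination and frailty term are not reproduced.

```python
# Discrete-time hazard as penalized logistic regression: expand each
# subject into one record per period at risk, with period dummies for the
# baseline hazard, then fit an L1-penalized logistic model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p, T = 300, 10, 6
X = rng.normal(size=(n, p))
time = rng.integers(1, T + 1, size=n)            # discrete event time
event = rng.random(n) < 0.7                      # 30% censored

rows, labels = [], []
for i in range(n):
    for t in range(1, time[i] + 1):
        period = np.zeros(T)
        period[t - 1] = 1.0                      # baseline hazard dummies
        rows.append(np.r_[period, X[i]])
        labels.append(1 if (t == time[i] and event[i]) else 0)

Z, yy = np.array(rows), np.array(labels)
fit = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(Z, yy)
print("selected covariates:", np.where(fit.coef_[0, T:] != 0)[0])
```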

  8. A method for simplifying the analysis of traffic accidents injury severity on two-lane highways using Bayesian networks.

    PubMed

    Mujalli, Randa Oqab; de Oña, Juan

    2011-10-01

    This study describes a method for reducing the number of variables frequently considered in modeling the severity of traffic accidents. The method's efficiency is assessed by constructing Bayesian networks (BN). It is based on a two-stage selection process. Several variable selection algorithms, commonly used in data mining, are applied in order to select subsets of variables. BNs are built using the selected subsets and their performance is compared with the original BN (with all the variables) using five indicators. The BNs that improve the indicators' values are further analyzed to identify the most significant variables (accident type, age, atmospheric factors, gender, lighting, number of injured, and occupant involved). A new BN is built using these variables, and the indicators show, in most of the cases, a statistically significant improvement with respect to the original BN. It is thus possible to reduce the number of variables used to model traffic accident injury severity through BNs without reducing the performance of the model. The study provides safety analysts with a methodology for efficiently determining the injury severity of traffic accidents from a minimal number of variables. Copyright © 2011 Elsevier Ltd. All rights reserved.

  9. Improved Variable Selection Algorithm Using a LASSO-Type Penalty, with an Application to Assessing Hepatitis B Infection Relevant Factors in Community Residents

    PubMed Central

    Guo, Pi; Zeng, Fangfang; Hu, Xiaomin; Zhang, Dingmei; Zhu, Shuming; Deng, Yu; Hao, Yuantao

    2015-01-01

    Objectives In epidemiological studies, it is important to identify independent associations between collective exposures and a health outcome. The current stepwise selection technique ignores stochastic errors and suffers from a lack of stability. The alternative LASSO-penalized regression model can be applied to detect significant predictors from a pool of candidate variables. However, this technique is prone to false positives and tends to create excessive biases. It remains challenging to develop robust variable selection methods and enhance predictability. Material and methods Two improved algorithms, denoted the two-stage hybrid and bootstrap ranking procedures, both using a LASSO-type penalty, were developed for epidemiological association analysis. The performance of the proposed procedures and other methods, including conventional LASSO, Bolasso, stepwise and stability selection models, was evaluated using intensive simulation. In addition, the methods were compared in an empirical analysis based on large-scale survey data of hepatitis B infection-relevant factors among Guangdong residents. Results The proposed procedures produced comparable or less biased selection results when compared to conventional variable selection models. Overall, the two newly proposed procedures were stable across the various simulation scenarios, demonstrating higher power and a lower false positive rate in variable selection than the compared methods. In the empirical analysis, the proposed procedures yielded a sparse set of hepatitis B infection-relevant factors, gave the best predictive performance, and selected a more stringent set of factors. According to the proposed procedures, individual history of hepatitis B vaccination and family and individual history of hepatitis B infection were associated with hepatitis B infection in the studied residents. Conclusions The newly proposed procedures improve the identification of significant variables and provide new insight into epidemiological association analysis. PMID:26214802

  10. Methodological development for selection of significant predictors explaining fatal road accidents.

    PubMed

    Dadashova, Bahar; Arenas-Ramírez, Blanca; Mira-McWilliams, José; Aparicio-Izquierdo, Francisco

    2016-05-01

    Identification of the most relevant factors for explaining road accident occurrence is an important issue in road safety research, particularly for future decision-making processes in transport policy. However, model selection for this particular purpose remains an open research question. In this paper we propose a methodological development for model selection which addresses both explanatory variable selection and adequate model selection issues. A variable selection procedure, the TIM (two-input model) method, is carried out by combining neural network design and statistical approaches. The error structure of the fitted model is assumed to follow an autoregressive process. All models are estimated using the Markov chain Monte Carlo method, where the model parameters are assigned non-informative prior distributions. The final model is built using the results of the variable selection. The proposed methodology was applied to the number of fatal accidents in Spain during 2000-2011. This indicator experienced the largest reduction internationally during those years, making it an interesting time series from a road safety policy perspective. Hence the identification of the variables that have affected this reduction is of particular interest for future decision making. The results of the variable selection process show that the selected variables are main subjects of road safety policy measures. Published by Elsevier Ltd.

  11. Comparison of climate envelope models developed using expert-selected variables versus statistical selection

    USGS Publications Warehouse

    Brandt, Laura A.; Benscoter, Allison; Harvey, Rebecca G.; Speroterra, Carolina; Bucklin, David N.; Romañach, Stephanie; Watling, James I.; Mazzotti, Frank J.

    2017-01-01

    Climate envelope models are widely used to describe potential future distribution of species under different climate change scenarios. It is broadly recognized that there are both strengths and limitations to using climate envelope models and that outcomes are sensitive to initial assumptions, inputs, and modeling methods. Selection of predictor variables, a central step in modeling, is one of the areas where different techniques can yield varying results. Selection of climate variables to use as predictors is often done using statistical approaches that develop correlations between occurrences and climate data. These approaches have received criticism in that they rely on the statistical properties of the data rather than directly incorporating biological information about species responses to temperature and precipitation. We evaluated and compared models and prediction maps for 15 threatened or endangered species in Florida based on two variable selection techniques: expert opinion and a statistical method. We compared model performance between these two approaches for contemporary predictions, and the spatial correlation, spatial overlap and area predicted for contemporary and future climate predictions. In general, experts identified more variables as being important than the statistical method and there was low overlap in the variable sets (<40%) between the two methods. Despite these differences in variable sets (expert versus statistical), models had high performance metrics (>0.9 for area under the curve (AUC) and >0.7 for true skill statistic (TSS)). Spatial overlap, which compares the spatial configuration between maps constructed using the different variable selection techniques, was only moderate overall (about 60%), with a great deal of variability across species. Difference in spatial overlap was even greater under future climate projections, indicating additional divergence of model outputs from different variable selection techniques. Our work is in agreement with other studies which have found that for broad-scale species distribution modeling, using statistical methods of variable selection is a useful first step, especially when there is a need to model a large number of species or expert knowledge of the species is limited. Expert input can then be used to refine models that seem unrealistic or for species that experts believe are particularly sensitive to change. It also emphasizes the importance of using multiple models to reduce uncertainty and improve map outputs for conservation planning. Where outputs overlap or show the same direction of change there is greater certainty in the predictions. Areas of disagreement can be used for learning by asking why the models do not agree, and may highlight areas where additional on-the-ground data collection could improve the models.

  12. [Variable selection methods combined with local linear embedding theory used for optimization of near infrared spectral quantitative models].

    PubMed

    Hao, Yong; Sun, Xu-Dong; Yang, Qiang

    2012-12-01

    A variable selection strategy combined with local linear embedding (LLE) was introduced for the analysis of complex samples by near infrared spectroscopy (NIRS). Three methods, Monte Carlo uninformative variable elimination (MCUVE), the successive projections algorithm (SPA), and MCUVE combined with SPA, were used for eliminating redundant spectral variables. Partial least squares regression (PLSR) and LLE-PLSR were used for modeling the complex samples. The results showed that MCUVE can both extract effective informative variables and improve the precision of models. Compared with PLSR models, LLE-PLSR models achieved more accurate analysis results. MCUVE combined with LLE-PLSR is an effective modeling method for NIRS quantitative analysis.
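
    A minimal sketch of the MCUVE idea, assuming scikit-learn and mock spectra: PLS models are fit on many random subsets of samples, each wavelength is scored by the stability of its regression coefficient (mean over standard deviation), and low-stability wavelengths are dropped. The subset fraction, run count, and retention quantile are arbitrary.

```python
# Sketch of Monte Carlo uninformative variable elimination (MCUVE):
# stability = |mean| / std of each wavelength's PLS coefficient across
# random sample subsets; unstable wavelengths are treated as uninformative.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 200))
y = X[:, 10] - X[:, 150] + 0.1 * rng.normal(size=100)

B, frac = 200, 0.8
coefs = np.zeros((B, X.shape[1]))
for b in range(B):
    idx = rng.choice(len(y), size=int(frac * len(y)), replace=False)
    coefs[b] = PLSRegression(n_components=3).fit(X[idx], y[idx]).coef_.ravel()

stability = np.abs(coefs.mean(axis=0)) / coefs.std(axis=0)
keep = np.where(stability > np.quantile(stability, 0.9))[0]
print("retained wavelengths:", keep)
```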

  13. Unbiased split variable selection for random survival forests using maximally selected rank statistics.

    PubMed

    Wright, Marvin N; Dankowski, Theresa; Ziegler, Andreas

    2017-04-15

    The most popular approach for analyzing survival data is the Cox regression model. The Cox model may, however, be misspecified, and its proportionality assumption may not always be fulfilled. An alternative approach for survival prediction is random forests for survival outcomes. The standard split criterion for random survival forests is the log-rank test statistic, which favors splitting variables with many possible split points. Conditional inference forests avoid this split variable selection bias. However, linear rank statistics are utilized by default in conditional inference forests to select the optimal splitting variable, which cannot detect non-linear effects in the independent variables. An alternative is to use maximally selected rank statistics for the split point selection. As in conditional inference forests, splitting variables are compared on the p-value scale. However, instead of the conditional Monte-Carlo approach used in conditional inference forests, p-value approximations are employed. We describe several p-value approximations and the implementation of the proposed random forest approach. A simulation study demonstrates that unbiased split variable selection is possible. However, there is a trade-off between unbiased split variable selection and runtime. In benchmark studies of prediction performance on simulated and real datasets, the new method performs better than random survival forests if informative dichotomous variables are combined with uninformative variables with more categories and better than conditional inference forests if non-linear covariate effects are included. In a runtime comparison, the method proves to be computationally faster than both alternatives, if a simple p-value approximation is used. Copyright © 2017 John Wiley & Sons, Ltd.

  14. Concave 1-norm group selection

    PubMed Central

    Jiang, Dingfeng; Huang, Jian

    2015-01-01

    Grouping structures arise naturally in many high-dimensional problems. Incorporation of such information can improve model fitting and variable selection. Existing group selection methods, such as the group Lasso, require correct membership. However, in practice it can be difficult to correctly specify group membership of all variables. Thus, it is important to develop group selection methods that are robust against group mis-specification. Also, it is desirable to select groups as well as individual variables in many applications. We propose a class of concave 1-norm group penalties that is robust to grouping structure and can perform bi-level selection. A coordinate descent algorithm is developed to calculate solutions of the proposed group selection method. Theoretical convergence of the algorithm is proved under certain regularity conditions. Comparison with other methods suggests the proposed method is the most robust approach under membership mis-specification. Simulation studies and real data application indicate that the 1-norm concave group selection approach achieves better control of false discovery rates. An R package grppenalty implementing the proposed method is available at CRAN. PMID:25417206
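
    For readers unfamiliar with the penalty family, a schematic objective in our own notation (a sketch, not copied from the paper) is:

```latex
% Schematic form of a concave 1-norm group penalty (notation is ours).
% For groups G_1, ..., G_J and a concave penalty p_lambda (e.g. SCAD or MCP):
\[
  \min_{\beta}\; \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
  \;+\; \sum_{j=1}^{J} p_{\lambda}\!\Big( \lVert \beta_{G_j} \rVert_1 \Big)
\]
% Applying p_lambda to the 1-norm of each group, rather than the 2-norm of
% the group lasso, lets single coefficients inside a group be zeroed out,
% which is the bi-level (group and within-group) selection the abstract
% describes.
```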

  15. [Measurement of Water COD Based on UV-Vis Spectroscopy Technology].

    PubMed

    Wang, Xiao-ming; Zhang, Hai-liang; Luo, Wei; Liu, Xue-mei

    2016-01-01

    Ultraviolet/visible (UV/Vis) spectroscopy technology was used to measure water COD. A total of 135 water samples were collected from Zhejiang province. Raw spectra processed with three different pretreatment methods (multiplicative scatter correction (MSC), standard normal variate (SNV) and first derivative) were compared to determine the optimal pretreatment method for analysis. Spectral variable selection is an important strategy in spectrum modeling analysis, because it leads to parsimonious data representation and can produce multivariate models with better performance. In order to simplify the calibration models, the preprocessed spectra were then used to select sensitive wavelengths by competitive adaptive reweighted sampling (CARS), random frog and genetic algorithm (GA) methods. Different numbers of sensitive wavelengths were selected by the different variable selection methods under SNV preprocessing. Partial least squares (PLS) was used to build models with the full spectra, and an Extreme Learning Machine (ELM) was applied to build models with the selected wavelength variables. The overall results showed that the ELM model performed better than the PLS model, and the ELM model with wavelengths selected by CARS obtained the best results, with determination coefficient (R²), RMSEP and RPD of 0.82, 14.48 and 2.34 for the prediction set. The results indicated that it is feasible to use UV/Vis with characteristic wavelengths obtained by the CARS variable selection method, combined with ELM calibration, for the rapid and accurate determination of COD in aquaculture water. Moreover, this study lays the foundation for further implementation of online analysis of aquaculture water and rapid determination of other water quality parameters.
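
    An ELM is simple enough to sketch in full: a fixed random hidden layer followed by a ridge-regularized linear readout. This generic sketch uses mock data in place of the CARS-selected wavelengths; the hidden-layer size and ridge parameter are illustrative.

```python
# Minimal extreme learning machine (ELM): random hidden weights are fixed,
# and only the linear output layer is solved, here by ridge regression.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(135, 60))                   # mock selected wavelengths
y = X[:, 5] + 0.5 * X[:, 30] + 0.1 * rng.normal(size=135)

n_hidden, lam = 100, 1e-2
W = rng.normal(size=(X.shape[1], n_hidden))      # fixed random weights
b = rng.normal(size=n_hidden)
H = np.tanh(X @ W + b)                           # hidden activations
# Ridge solution for the output weights.
beta = np.linalg.solve(H.T @ H + lam * np.eye(n_hidden), H.T @ y)

y_hat = H @ beta
print("training R^2:", 1 - np.sum((y - y_hat)**2) / np.sum((y - y.mean())**2))
```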

  16. Variable selection in subdistribution hazard frailty models with competing risks data

    PubMed Central

    Do Ha, Il; Lee, Minjung; Oh, Seungyoung; Jeong, Jong-Hyeon; Sylvester, Richard; Lee, Youngjo

    2014-01-01

    The proportional subdistribution hazards model (i.e. the Fine-Gray model) has been widely used for analyzing univariate competing risks data. Recently, this model has been extended to clustered competing risks data via frailty. To the best of our knowledge, however, there has been no literature on variable selection methods for such competing risks frailty models. In this paper, we propose a simple but unified procedure via a penalized h-likelihood (HL) for variable selection of fixed effects in a general class of subdistribution hazard frailty models, in which random effects may be shared or correlated. We consider three penalty functions (LASSO, SCAD and HL) in our variable selection procedure. We show that the proposed method can be easily implemented using a slight modification to existing h-likelihood estimation approaches. Numerical studies demonstrate that the proposed procedure using the HL penalty performs well, providing a higher probability of choosing the true model than LASSO and SCAD methods without losing prediction accuracy. The usefulness of the new method is illustrated using two actual data sets from multi-center clinical trials. PMID:25042872

  17. Variable selection in semiparametric cure models based on penalized likelihood, with application to breast cancer clinical trials.

    PubMed

    Liu, Xiang; Peng, Yingwei; Tu, Dongsheng; Liang, Hua

    2012-10-30

    Survival data with a sizable cure fraction are commonly encountered in cancer research. The semiparametric proportional hazards cure model has been recently used to analyze such data. As seen in the analysis of data from a breast cancer study, a variable selection approach is needed to identify important factors in predicting the cure status and risk of breast cancer recurrence. However, no specific variable selection method for the cure model is available. In this paper, we present a variable selection approach with penalized likelihood for the cure model. The estimation can be implemented easily by combining the computational methods for penalized logistic regression and the penalized Cox proportional hazards models with the expectation-maximization algorithm. We illustrate the proposed approach on data from a breast cancer study. We conducted Monte Carlo simulations to evaluate the performance of the proposed method. We used and compared different penalty functions in the simulation studies. Copyright © 2012 John Wiley & Sons, Ltd.

  18. Diversified models for portfolio selection based on uncertain semivariance

    NASA Astrophysics Data System (ADS)

    Chen, Lin; Peng, Jin; Zhang, Bo; Rosyida, Isnaini

    2017-02-01

    Since the financial markets are complex, the future security returns are sometimes represented mainly by experts' estimations due to a lack of historical data. This paper proposes a semivariance method for diversified portfolio selection, in which the security returns are given by experts' estimations and depicted as uncertain variables. In the paper, three properties of the semivariance of uncertain variables are verified. Based on the concept of semivariance of uncertain variables, two types of mean-semivariance diversified models for uncertain portfolio selection are proposed. Since the models are complex, a hybrid intelligent algorithm based on the 99-method and a genetic algorithm is designed to solve them. In this hybrid intelligent algorithm, the 99-method is applied to compute the expected value and semivariance of uncertain variables, and the genetic algorithm is employed to seek the best allocation plan for portfolio selection. Finally, several numerical examples are presented to illustrate the modelling idea and the effectiveness of the algorithm.

  19. Variable Selection for Regression Models of Percentile Flows

    NASA Astrophysics Data System (ADS)

    Fouad, G.

    2017-12-01

    Percentile flows describe the flow magnitude equaled or exceeded for a given percent of time, and are widely used in water resource management. However, these statistics are normally unavailable since most basins are ungauged. Percentile flows of ungauged basins are often predicted using regression models based on readily observable basin characteristics, such as mean elevation. The number of these independent variables is too large to evaluate all possible models. A subset of models is typically evaluated using automatic procedures, like stepwise regression. This ignores a large variety of methods from the field of feature (variable) selection and physical understanding of percentile flows. A study of 918 basins in the United States was conducted to compare an automatic regression procedure to the following variable selection methods: (1) principal component analysis, (2) correlation analysis, (3) random forests, (4) genetic programming, (5) Bayesian networks, and (6) physical understanding. The automatic regression procedure only performed better than principal component analysis. Poor performance of the regression procedure was due to a commonly used filter for multicollinearity, which rejected the strongest models because they had cross-correlated independent variables. Multicollinearity did not decrease model performance in validation because of a representative set of calibration basins. Variable selection methods based strictly on predictive power (numbers 2-5 from above) performed similarly, likely indicating a limit to the predictive power of the variables. Similar performance was also reached using variables selected based on physical understanding, a finding that substantiates recent calls to emphasize physical understanding in modeling for predictions in ungauged basins. The strongest variables highlighted the importance of geology and land cover, whereas widely used topographic variables were the weakest predictors. Variables suffered from a high degree of multicollinearity, possibly illustrating the co-evolution of climatic and physiographic conditions. Given the ineffectiveness of many variables used here, future work should develop new variables that target specific processes associated with percentile flows.

  20. Delay correlation analysis and representation for VITAL compliant VHDL models

    DOEpatents

    Rich, Marvin J.; Misra, Ashutosh

    2004-11-09

    A method and system unbind a rise/fall tuple of a VHDL generic variable and create rise time and fall time generics of each generic variable that are independent of each other. Then, according to a predetermined correlation policy, the method and system collect delay values in a VHDL standard delay file, sort the delay values, remove duplicate delay values, group the delay values into correlation sets, and output an analysis file. The correlation policy may include collecting all generic variables in a VHDL standard delay file, selecting each generic variable, and performing reductions on the set of delay values associated with each selected generic variable.

  1. Integrating biological knowledge into variable selection: an empirical Bayes approach with an application in cancer biology

    PubMed Central

    2012-01-01

    Background An important question in the analysis of biochemical data is that of identifying subsets of molecular variables that may jointly influence a biological response. Statistical variable selection methods have been widely used for this purpose. In many settings, it may be important to incorporate ancillary biological information concerning the variables of interest. Pathway and network maps are one example of a source of such information. However, although ancillary information is increasingly available, it is not always clear how it should be used nor how it should be weighted in relation to primary data. Results We put forward an approach in which biological knowledge is incorporated using informative prior distributions over variable subsets, with prior information selected and weighted in an automated, objective manner using an empirical Bayes formulation. We employ continuous, linear models with interaction terms and exploit biochemically-motivated sparsity constraints to permit exact inference. We show an example of priors for pathway- and network-based information and illustrate our proposed method on both synthetic response data and by an application to cancer drug response data. Comparisons are also made to alternative Bayesian and frequentist penalised-likelihood methods for incorporating network-based information. Conclusions The empirical Bayes method proposed here can aid prior elicitation for Bayesian variable selection studies and help to guard against mis-specification of priors. Empirical Bayes, together with the proposed pathway-based priors, results in an approach with a competitive variable selection performance. In addition, the overall procedure is fast, deterministic, and has very few user-set parameters, yet is capable of capturing interplay between molecular players. The approach presented is general and readily applicable in any setting with multiple sources of biological prior knowledge. PMID:22578440

  2. Bayesian inference for the genetic control of water deficit tolerance in spring wheat by stochastic search variable selection.

    PubMed

    Safari, Parviz; Danyali, Syyedeh Fatemeh; Rahimi, Mehdi

    2018-06-02

    Drought is the main abiotic stress seriously influencing wheat production. Information about the inheritance of drought tolerance is necessary to determine the most appropriate strategy to develop tolerant cultivars and populations. In this study, generation means analysis to identify the genetic effects controlling grain yield inheritance in water deficit and normal conditions was considered as a model selection problem in a Bayesian framework. Stochastic search variable selection (SSVS) was applied to identify the most important genetic effects and the best fitted models, using different generations obtained from two crosses under two water regimes in two growing seasons. SSVS evaluates the effect of each variable on the dependent variable via posterior variable inclusion probabilities, and the model with the highest posterior probability is selected as the best model. In this study, grain yield was controlled by main effects (additive and non-additive) and epistatic effects. The results demonstrate that breeding methods such as recurrent selection with a subsequent pedigree method, as well as hybrid production, can be useful to improve grain yield.
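
    A compact Gibbs sampler conveys how SSVS works, in the spirit of George and McCulloch rather than the exact model of the paper: each coefficient gets a spike or slab normal prior selected by an indicator, and posterior inclusion probabilities are read off the indicator draws. The noise variance is held fixed and all hyperparameter values are illustrative.

```python
# Compact Gibbs sampler for stochastic search variable selection (SSVS) in
# a linear model: beta_j has a spike N(0, tau^2) or slab N(0, (c*tau)^2)
# prior chosen by a Bernoulli indicator gamma_j.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, p = 200, 8
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, 0, 0, -1.5, 0, 0, 0, 1.0])
y = X @ beta_true + rng.normal(size=n)

sigma2, tau, c, pi = 1.0, 0.05, 20.0, 0.5
gamma = np.ones(p, dtype=int)
XtX, Xty = X.T @ X, X.T @ y
draws = []
for it in range(2000):
    d = np.where(gamma == 1, (c * tau)**2, tau**2)   # prior variances
    A = np.linalg.inv(XtX / sigma2 + np.diag(1.0 / d))
    beta = rng.multivariate_normal(A @ Xty / sigma2, A)
    # Update indicators from the spike/slab posterior odds.
    p1 = pi * norm.pdf(beta, 0, c * tau)
    p0 = (1 - pi) * norm.pdf(beta, 0, tau)
    gamma = (rng.random(p) < p1 / (p0 + p1)).astype(int)
    if it >= 500:                                    # discard burn-in
        draws.append(gamma)

print("posterior inclusion probabilities:", np.mean(draws, axis=0).round(2))
```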

  3. An efficient swarm intelligence approach to feature selection based on invasive weed optimization: Application to multivariate calibration and classification using spectroscopic data

    NASA Astrophysics Data System (ADS)

    Sheykhizadeh, Saheleh; Naseri, Abdolhossein

    2018-04-01

    Variable selection plays a key role in classification and multivariate calibration. Variable selection methods aim at choosing a set of variables, from a large pool of available predictors, relevant to estimating analyte concentrations or to achieving better classification results. Many variable selection techniques have been introduced, among which those based on swarm intelligence optimization methodologies have received particular attention over the last few decades, since they are mainly inspired by nature. In this work, a simple and new variable selection algorithm is proposed according to the invasive weed optimization (IWO) concept. IWO is a bio-inspired metaheuristic mimicking the weeds' ecological behavior in colonizing and finding an appropriate place for growth and reproduction; it has been shown to be highly adaptive and robust to environmental changes. In this paper, the first application of IWO, as a very simple and powerful method, to variable selection is reported using different experimental datasets, including FTIR and NIR data, to undertake classification and multivariate calibration tasks. Accordingly, invasive weed optimization - linear discriminant analysis (IWO-LDA) and invasive weed optimization - partial least squares (IWO-PLS) are introduced for multivariate classification and calibration, respectively.

  4. An efficient swarm intelligence approach to feature selection based on invasive weed optimization: Application to multivariate calibration and classification using spectroscopic data.

    PubMed

    Sheykhizadeh, Saheleh; Naseri, Abdolhossein

    2018-04-05

    Variable selection plays a key role in classification and multivariate calibration. Variable selection methods aim at choosing a set of variables, from a large pool of available predictors, relevant to estimating analyte concentrations or to achieving better classification results. Many variable selection techniques have been introduced, among which those based on swarm intelligence optimization methodologies have received particular attention over the last few decades, since they are mainly inspired by nature. In this work, a simple and new variable selection algorithm is proposed according to the invasive weed optimization (IWO) concept. IWO is a bio-inspired metaheuristic mimicking the weeds' ecological behavior in colonizing and finding an appropriate place for growth and reproduction; it has been shown to be highly adaptive and robust to environmental changes. In this paper, the first application of IWO, as a very simple and powerful method, to variable selection is reported using different experimental datasets, including FTIR and NIR data, to undertake classification and multivariate calibration tasks. Accordingly, invasive weed optimization - linear discriminant analysis (IWO-LDA) and invasive weed optimization - partial least squares (IWO-PLS) are introduced for multivariate classification and calibration, respectively. Copyright © 2018 Elsevier B.V. All rights reserved.
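
    A sketch of IWO adapted to feature selection, under stated assumptions: binary masks play the role of weeds, dispersal is a bit-flip perturbation whose size shrinks over iterations, fitter weeds scatter more seeds, and the colony is truncated to a fixed size. Fitness here is LDA cross-validation accuracy; all IWO parameters are illustrative, not the authors' settings.

```python
# Invasive weed optimization (IWO) sketch for feature selection: binary
# masks reproduce in proportion to fitness, seeds disperse by bit flips
# whose count shrinks over iterations, and competitive exclusion keeps
# only the fittest colony members.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=40, n_informative=6,
                           random_state=0)
rng = np.random.default_rng(0)

def fitness(mask):
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(LinearDiscriminantAnalysis(),
                           X[:, mask], y, cv=5).mean()

pop = [rng.random(40) < 0.3 for _ in range(10)]
for it in range(15):
    fits = np.array([fitness(m) for m in pop])
    lo, hi = fits.min(), fits.max()
    sigma = max(1, int(round(5 * (1 - it / 15))))          # shrinking dispersal
    seeds = []
    for m, f in zip(pop, fits):
        n_seeds = 1 + int(3 * (f - lo) / (hi - lo + 1e-9))  # fitter -> more seeds
        for _ in range(n_seeds):
            child = m.copy()
            flip = rng.choice(40, size=sigma, replace=False)
            child[flip] = ~child[flip]
            seeds.append(child)
    pop = sorted(pop + seeds, key=fitness, reverse=True)[:10]  # exclusion

best = pop[0]
print(f"best subset ({best.sum()} vars), CV acc = {fitness(best):.3f}")
```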

  5. Integrative Bayesian variable selection with gene-based informative priors for genome-wide association studies.

    PubMed

    Zhang, Xiaoshuai; Xue, Fuzhong; Liu, Hong; Zhu, Dianwen; Peng, Bin; Wiemels, Joseph L; Yang, Xiaowei

    2014-12-10

    Genome-wide Association Studies (GWAS) are typically designed to identify phenotype-associated single nucleotide polymorphisms (SNPs) individually using univariate analysis methods. Though providing valuable insights into genetic risks of common diseases, the genetic variants identified by GWAS generally account for only a small proportion of the total heritability for complex diseases. To address this "missing heritability" problem, we implemented a strategy called integrative Bayesian Variable Selection (iBVS), which is based on a hierarchical model that incorporates an informative prior by treating the gene interrelationship as a network. It was applied here to both simulated and real data sets. Simulation studies indicated that the iBVS method was advantageous, with the highest AUC in both variable selection and outcome prediction, when compared to Stepwise and LASSO-based strategies. In an analysis of a leprosy case-control study, iBVS selected 94 SNPs as predictors, while LASSO selected 100 SNPs. Stepwise regression yielded a more parsimonious model with only 3 SNPs. The prediction results demonstrated that the iBVS method had performance comparable to LASSO, but better than Stepwise strategies. The proposed iBVS strategy is a novel and valid method for genome-wide association studies, with the additional advantage that it produces more interpretable posterior probabilities for each variable, unlike LASSO and other penalized regression methods.

  6. Detection of Nitrogen Content in Rubber Leaves Using Near-Infrared (NIR) Spectroscopy with Correlation-Based Successive Projections Algorithm (SPA).

    PubMed

    Tang, Rongnian; Chen, Xupeng; Li, Chuang

    2018-05-01

    Near-infrared spectroscopy is an efficient, low-cost technology that has potential as an accurate method for detecting the nitrogen content of natural rubber leaves. The successive projections algorithm (SPA) is a widely used variable selection method for multivariate calibration, which uses projection operations to select a variable subset with minimum multi-collinearity. However, due to the fluctuation of correlation between variables, high collinearity may still exist in non-adjacent variables of the subset obtained by basic SPA. Based on analysis of the correlation matrix of the spectral data, this paper proposes a correlation-based SPA (CB-SPA) that applies the successive projections algorithm within regions of consistent correlation. The results show that CB-SPA can select variable subsets with more valuable variables and less multi-collinearity. Meanwhile, models established on the CB-SPA subset outperform basic SPA subsets in predicting nitrogen content in terms of both cross-validation and external prediction. Moreover, CB-SPA is more efficient, as the time cost of its selection procedure is one-twelfth that of basic SPA.
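
    Basic SPA itself is short enough to sketch, assuming numpy: starting from one wavelength, the remaining columns are repeatedly projected onto the orthogonal complement of the selected set and the column with the largest residual norm is chosen, which keeps collinearity in the subset low. The correlation-region splitting that distinguishes CB-SPA is not shown.

```python
# Sketch of the successive projections algorithm (SPA): sequential
# Gram-Schmidt deflation, always picking the column with the largest
# residual norm so that selected columns are minimally collinear.
import numpy as np

def spa(X, k, start=0):
    selected = [start]
    P = X.astype(float).copy()
    for _ in range(k - 1):
        v = P[:, selected[-1]]
        # Deflate: remove the component along the last selected column.
        P = P - np.outer(v, v @ P) / (v @ v)
        norms = np.linalg.norm(P, axis=0)
        norms[selected] = -1.0                    # exclude already chosen
        selected.append(int(norms.argmax()))
    return selected

rng = np.random.default_rng(0)
spectra = rng.normal(size=(80, 120))              # mock spectra
print("SPA-selected wavelengths:", spa(spectra, k=8))
```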

  7. Determination of main fruits in adulterated nectars by ATR-FTIR spectroscopy combined with multivariate calibration and variable selection methods.

    PubMed

    Miaw, Carolina Sheng Whei; Assis, Camila; Silva, Alessandro Rangel Carolino Sales; Cunha, Maria Luísa; Sena, Marcelo Martins; de Souza, Scheilla Vitorino Carvalho

    2018-07-15

    Grape, orange, peach and passion fruit nectars were formulated and adulterated by dilution with syrup, apple and cashew juices at 10 levels for each adulterant. Attenuated total reflectance Fourier transform mid infrared (ATR-FTIR) spectra were obtained. Partial least squares (PLS) multivariate calibration models allied to different variable selection methods, such as interval partial least squares (iPLS), ordered predictors selection (OPS) and genetic algorithm (GA), were used to quantify the main fruits. PLS improved by iPLS-OPS variable selection showed the highest predictive capacity to quantify the main fruit contents. The selected variables in the final models varied from 72 to 100; the root mean square errors of prediction were estimated from 0.5 to 2.6%; the correlation coefficients of prediction ranged from 0.948 to 0.990; and, the mean relative errors of prediction varied from 3.0 to 6.7%. All of the developed models were validated. Copyright © 2018 Elsevier Ltd. All rights reserved.

  8. Model selection bias and Freedman's paradox

    USGS Publications Warehouse

    Lukacs, P.M.; Burnham, K.P.; Anderson, D.R.

    2010-01-01

    In situations where limited knowledge of a system exists and the ratio of data points to variables is small, variable selection methods can often be misleading. Freedman (Am Stat 37:152-155, 1983) demonstrated how common it is to select completely unrelated variables as highly "significant" when the number of data points is similar in magnitude to the number of variables. A new type of model averaging estimator based on model selection with Akaike's AIC is used with linear regression to investigate the problems of likely inclusion of spurious effects and model selection bias, the bias introduced while using the data to select a single seemingly "best" model from a (often large) set of models employing many predictor variables. The new model averaging estimator helps reduce these problems and provides confidence interval coverage at the nominal level while traditional stepwise selection has poor inferential properties. © The Institute of Statistical Mathematics, Tokyo 2009.
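
    The model-averaging remedy is easy to illustrate: Akaike weights proportional to exp(-Delta_i / 2) are computed over an all-subsets scan and used to average coefficients, shrinking spurious effects toward zero. A sketch with numpy and illustrative data:

```python
# AIC-based model averaging over all-subsets linear regression: Akaike
# weights w_i = exp(-0.5 * delta_i) / sum(...) average the coefficient
# estimates, damping spurious effects that a single "best" model keeps.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 6
X = rng.normal(size=(n, p))
y = 1.5 * X[:, 0] + rng.normal(size=n)           # only variable 0 matters

aics, betas = [], []
for r in range(p + 1):
    for subset in itertools.combinations(range(p), r):
        Z = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        rss = np.sum((y - Z @ coef) ** 2)
        k = Z.shape[1] + 1                       # +1 for the error variance
        aics.append(n * np.log(rss / n) + 2 * k)
        full = np.zeros(p)
        full[list(subset)] = coef[1:]
        betas.append(full)

aics = np.array(aics)
w = np.exp(-0.5 * (aics - aics.min()))
w /= w.sum()
avg = (w[:, None] * np.array(betas)).sum(axis=0)
print("model-averaged coefficients:", avg.round(3))
```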

  9. Free variable selection QSPR study to predict 19F chemical shifts of some fluorinated organic compounds using Random Forest and RBF-PLS methods

    NASA Astrophysics Data System (ADS)

    Goudarzi, Nasser

    2016-04-01

    In this work, two new and powerful chemometrics methods are applied for the modeling and prediction of the 19F chemical shift values of some fluorinated organic compounds. The radial basis function-partial least squares (RBF-PLS) and random forest (RF) methods are employed to construct models to predict the 19F chemical shifts. In this study, no separate variable selection method was used, since the RF method can serve as both a variable selection and a modeling technique. Effects of the important parameters affecting the RF prediction power, such as the number of trees (nt) and the number of randomly selected variables used to split each node (m), were investigated. The root-mean-square errors of prediction (RMSEP) for the training and prediction sets were 44.70 and 23.86 for the RBF-PLS model and 29.77 and 23.69 for the RF model, respectively. Also, the correlation coefficients of the prediction set for the RBF-PLS and RF models were 0.8684 and 0.9313, respectively. The results obtained reveal that the RF model can be used as a powerful chemometrics tool for quantitative structure-property relationship (QSPR) studies.

  10. A regularized variable selection procedure in additive hazards model with stratified case-cohort design.

    PubMed

    Ni, Ai; Cai, Jianwen

    2018-07-01

    Case-cohort designs are commonly used in large epidemiological studies to reduce the cost associated with covariate measurement. In many such studies the number of covariates is very large. An efficient variable selection method is needed for case-cohort studies where the covariates are only observed in a subset of the sample. Current literature on this topic has been focused on the proportional hazards model. However, in many studies the additive hazards model is preferred over the proportional hazards model, either because the proportional hazards assumption is violated or because the additive hazards model provides more relevant information for the research question. Motivated by one such study, the Atherosclerosis Risk in Communities (ARIC) study, we investigate the properties of a regularized variable selection procedure in stratified case-cohort design under an additive hazards model with a diverging number of parameters. We establish the consistency and asymptotic normality of the penalized estimator and prove its oracle property. Simulation studies are conducted to assess the finite sample performance of the proposed method with a modified cross-validation tuning parameter selection method. We apply the variable selection procedure to the ARIC study to demonstrate its practical use.

  11. An imbalance fault detection method based on data normalization and EMD for marine current turbines.

    PubMed

    Zhang, Milu; Wang, Tianzhen; Tang, Tianhao; Benbouzid, Mohamed; Diallo, Demba

    2017-05-01

    This paper proposes an imbalance fault detection method based on data normalization and Empirical Mode Decomposition (EMD) for variable speed direct-drive Marine Current Turbine (MCT) systems. The method is based on the MCT stator current under conditions of wave and turbulence. Its goal is to extract the blade imbalance fault feature, which is concealed by the supply frequency and environmental noise. First, a Generalized Likelihood Ratio Test (GLRT) detector is developed and the monitoring variable is selected by analyzing the relationships between the variables. Then, the selected monitoring variable is converted into a time series through data normalization, which makes the imbalance fault characteristic frequency a constant. Finally, the monitoring variable is filtered by the EMD method to eliminate the effect of turbulence. Experiments comparing different fault severities and turbulence intensities show that the proposed method is robust against turbulence. In comparison with other methods, the experimental results indicate the feasibility and efficacy of the proposed method. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.

  12. A Study of Quasar Selection in the Supernova Fields of the Dark Energy Survey

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tie, S. S.; Martini, P.; Mudd, D.

    In this paper, we present a study of quasar selection using the supernova fields of the Dark Energy Survey (DES). We used a quasar catalog from an overlapping portion of the SDSS Stripe 82 region to quantify the completeness and efficiency of selection methods involving color, probabilistic modeling, variability, and combinations of color/probabilistic modeling with variability. In all cases, we considered only objects that appear as point sources in the DES images. We examine color selection methods based on the Wide-field Infrared Survey Explorer (WISE) mid-IR W1-W2 color, a mixture of WISE and DES colors (g - i and i - W1), and a mixture of Vista Hemisphere Survey and DES colors (g - i and i - K). For probabilistic quasar selection, we used XDQSO, an algorithm that employs an empirical multi-wavelength flux model of quasars to assign quasar probabilities. Our variability selection uses the multi-band χ²-probability that sources are constant in the DES Year 1 griz-band light curves. The completeness and efficiency are calculated relative to an underlying sample of point sources that are detected in the required selection bands and pass our data quality and photometric error cuts. We conduct our analyses at two magnitude limits, i < 19.8 mag and i < 22 mag. For the subset of sources with W1 and W2 detections, the W1-W2 color or XDQSOz method combined with variability gives the highest completenesses of >85% for both i-band magnitude limits and efficiencies of >80% to the bright limit and >60% to the faint limit; however, the giW1 and giW1+variability methods give the highest quasar surface densities. The XDQSOz method and combinations of W1W2/giW1/XDQSOz with variability are among the better selection methods when both high completeness and high efficiency are desired. We also present the OzDES Quasar Catalog of 1263 spectroscopically confirmed quasars from three years of OzDES observation in the 30 deg² of the DES supernova fields. Finally, the catalog includes quasars with redshifts up to z ~ 4 and brighter than i = 22 mag, although the catalog is not complete up to this magnitude limit.
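
    The variability statistic lends itself to a compact illustration: the sketch below computes the multi-band χ²-probability that a source is constant, with synthetic light curves standing in for the DES photometry.

    ```python
    import numpy as np
    from scipy import stats

    def prob_constant(fluxes, errors):
        """P-value for the hypothesis that every band's light curve is constant."""
        chi2, dof = 0.0, 0
        for f, e in zip(fluxes, errors):       # one array per band (g, r, i, z)
            w = 1.0 / e**2
            mean = np.sum(w * f) / np.sum(w)   # inverse-variance weighted mean
            chi2 += np.sum(((f - mean) / e) ** 2)
            dof += f.size - 1
        return stats.chi2.sf(chi2, dof)        # small value -> variable -> quasar-like

    g = np.array([20.1, 20.5, 19.9, 20.3]); g_err = np.full(4, 0.05)
    r = np.array([19.8, 19.9, 19.7, 19.9]); r_err = np.full(4, 0.05)
    print(prob_constant([g, r], [g_err, r_err]))
    ```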

  13. A Study of Quasar Selection in the Supernova Fields of the Dark Energy Survey

    DOE PAGES

    Tie, S. S.; Martini, P.; Mudd, D.; ...

    2017-02-15

    In this paper, we present a study of quasar selection using the supernova fields of the Dark Energy Survey (DES). We used a quasar catalog from an overlapping portion of the SDSS Stripe 82 region to quantify the completeness and efficiency of selection methods involving color, probabilistic modeling, variability, and combinations of color/probabilistic modeling with variability. In all cases, we considered only objects that appear as point sources in the DES images. We examine color selection methods based on the Wide-field Infrared Survey Explorer (WISE) mid-IR W1-W2 color, a mixture of WISE and DES colors (g - i and i - W1), and a mixture of Vista Hemisphere Survey and DES colors (g - i and i - K). For probabilistic quasar selection, we used XDQSO, an algorithm that employs an empirical multi-wavelength flux model of quasars to assign quasar probabilities. Our variability selection uses the multi-band χ²-probability that sources are constant in the DES Year 1 griz-band light curves. The completeness and efficiency are calculated relative to an underlying sample of point sources that are detected in the required selection bands and pass our data quality and photometric error cuts. We conduct our analyses at two magnitude limits, i < 19.8 mag and i < 22 mag. For the subset of sources with W1 and W2 detections, the W1-W2 color or XDQSOz method combined with variability gives the highest completenesses of >85% for both i-band magnitude limits and efficiencies of >80% to the bright limit and >60% to the faint limit; however, the giW1 and giW1+variability methods give the highest quasar surface densities. The XDQSOz method and combinations of W1W2/giW1/XDQSOz with variability are among the better selection methods when both high completeness and high efficiency are desired. We also present the OzDES Quasar Catalog of 1263 spectroscopically confirmed quasars from three years of OzDES observation in the 30 deg² of the DES supernova fields. Finally, the catalog includes quasars with redshifts up to z ~ 4 and brighter than i = 22 mag, although the catalog is not complete up to this magnitude limit.

  14. TIMSS 2011 Student and Teacher Predictors for Mathematics Achievement Explored and Identified via Elastic Net.

    PubMed

    Yoo, Jin Eun

    2018-01-01

    A substantial body of research has been conducted on variables relating to students' mathematics achievement with TIMSS. However, most studies have employed conventional statistical methods and have focused on a select few indicators instead of utilizing the hundreds of variables TIMSS provides. This study aimed to find a prediction model for students' mathematics achievement using as many TIMSS student and teacher variables as possible. Elastic net, the machine learning technique selected for this study, takes advantage of both LASSO and ridge in terms of variable selection and multicollinearity, respectively. A logistic regression model was also employed to predict TIMSS 2011 Korean 4th graders' mathematics achievement. Ten-fold cross-validation with mean squared error was employed to determine the elastic net regularization parameter. Among the 162 TIMSS variables explored, 12 student and 5 teacher variables were selected by the elastic net model, and the prediction accuracy, sensitivity, and specificity were 76.06%, 70.23%, and 80.34%, respectively. This study showed that the elastic net method can be successfully applied to educational large-scale data, selecting a subset of variables with reasonable prediction accuracy and finding new variables to predict students' mathematics achievement. Newly found variables via machine learning can shed light on existing theories from a totally different perspective, which in turn can prompt the creation of new theories or complement existing ones. This study also examined the current scale development convention from a machine learning perspective.
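
    A minimal sketch of elastic-net selection with 10-fold cross-validation, in the spirit of the study, follows; the data are synthetic stand-ins for the TIMSS variables, and scikit-learn's LogisticRegressionCV is an assumed substitute for the authors' software.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegressionCV

    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 162))            # 162 candidate predictors
    y = (X[:, 0] - X[:, 1] + rng.normal(size=500) > 0).astype(int)

    model = LogisticRegressionCV(
        Cs=10, cv=10, penalty="elasticnet", solver="saga",
        l1_ratios=[0.2, 0.5, 0.8], max_iter=5000,
    ).fit(X, y)

    selected = np.flatnonzero(model.coef_[0])  # predictors with nonzero weights
    print(len(selected), "variables selected")
    ```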

  15. TIMSS 2011 Student and Teacher Predictors for Mathematics Achievement Explored and Identified via Elastic Net

    PubMed Central

    Yoo, Jin Eun

    2018-01-01

    A substantial body of research has been conducted on variables relating to students' mathematics achievement with TIMSS. However, most studies have employed conventional statistical methods and have focused on a select few indicators instead of utilizing the hundreds of variables TIMSS provides. This study aimed to find a prediction model for students' mathematics achievement using as many TIMSS student and teacher variables as possible. Elastic net, the machine learning technique selected for this study, takes advantage of both LASSO and ridge in terms of variable selection and multicollinearity, respectively. A logistic regression model was also employed to predict TIMSS 2011 Korean 4th graders' mathematics achievement. Ten-fold cross-validation with mean squared error was employed to determine the elastic net regularization parameter. Among the 162 TIMSS variables explored, 12 student and 5 teacher variables were selected by the elastic net model, and the prediction accuracy, sensitivity, and specificity were 76.06%, 70.23%, and 80.34%, respectively. This study showed that the elastic net method can be successfully applied to educational large-scale data, selecting a subset of variables with reasonable prediction accuracy and finding new variables to predict students' mathematics achievement. Newly found variables via machine learning can shed light on existing theories from a totally different perspective, which in turn can prompt the creation of new theories or complement existing ones. This study also examined the current scale development convention from a machine learning perspective. PMID:29599736

  16. CORRELATION PURSUIT: FORWARD STEPWISE VARIABLE SELECTION FOR INDEX MODELS

    PubMed Central

    Zhong, Wenxuan; Zhang, Tingting; Zhu, Yu; Liu, Jun S.

    2012-01-01

    In this article, a stepwise procedure, correlation pursuit (COP), is developed for variable selection under the sufficient dimension reduction framework, in which the response variable Y is influenced by the predictors X1, X2, …, Xp through an unknown function of a few linear combinations of them. Unlike linear stepwise regression, COP does not impose a special form of relationship (such as linear) between the response variable and the predictor variables. The COP procedure selects variables that attain the maximum correlation between the transformed response and the linear combination of the variables. Various asymptotic properties of the COP procedure are established; in particular, its variable selection performance under a diverging number of predictors and sample size is investigated. The excellent empirical performance of the COP procedure in comparison with existing methods is demonstrated by both extensive simulation studies and a real example in functional genomics. PMID:23243388
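
    The sketch below illustrates the stepwise idea with a simplified linear surrogate: greedily add the predictor that most increases the correlation between the response and the fitted linear combination. The full COP works with a transformed response under sufficient dimension reduction, which is omitted here; the data are synthetic.

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression

    def correlation_pursuit(X, y, k):
        chosen, remaining = [], list(range(X.shape[1]))
        for _ in range(k):
            best_j, best_r = None, -np.inf
            for j in remaining:
                cols = chosen + [j]
                fit = LinearRegression().fit(X[:, cols], y)
                # correlation between fitted combination and response
                r = np.corrcoef(fit.predict(X[:, cols]), y)[0, 1]
                if r > best_r:
                    best_j, best_r = j, r
            chosen.append(best_j)
            remaining.remove(best_j)
        return chosen

    rng = np.random.default_rng(2)
    X = rng.normal(size=(200, 20))
    y = 2 * X[:, 3] + X[:, 7] + 0.2 * rng.normal(size=200)
    print(correlation_pursuit(X, y, 2))   # expected to recover columns 3 and 7
    ```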

  17. Selection of Variables in Cluster Analysis: An Empirical Comparison of Eight Procedures

    ERIC Educational Resources Information Center

    Steinley, Douglas; Brusco, Michael J.

    2008-01-01

    Eight different variable selection techniques for model-based and non-model-based clustering are evaluated across a wide range of cluster structures. It is shown that several methods have difficulties when non-informative variables (i.e., random noise) are included in the model. Furthermore, the distribution of the random noise greatly impacts the…

  18. Variable selection based near infrared spectroscopy quantitative and qualitative analysis on wheat wet gluten

    NASA Astrophysics Data System (ADS)

    Lü, Chengxu; Jiang, Xunpeng; Zhou, Xingfan; Zhang, Yinqiao; Zhang, Naiqian; Wei, Chongfeng; Mao, Wenhua

    2017-10-01

    Wet gluten is a useful quality indicator for wheat, and short-wave near infrared spectroscopy (NIRS) is a high-performance technique with the advantages of being economical, rapid, and nondestructive. To study the feasibility of short-wave NIRS for analyzing wet gluten directly from wheat seed, 54 representative wheat seed samples were collected and scanned by spectrometer. Eight spectral pretreatment methods and a genetic algorithm (GA) variable selection method were used to optimize the analysis. Both quantitative and qualitative models of wet gluten were built by partial least squares regression and discriminant analysis. For quantitative analysis, normalization is the optimal pretreatment method; 17 wet-gluten-sensitive variables were selected by GA, and the GA model performs better than the all-variable model, with R2V = 0.88 and RMSEV = 1.47. For qualitative analysis, automatic weighted least squares baseline is the optimal pretreatment method, and the all-variable models perform better than the GA models. The correct classification rates for the three classes of wet gluten content (<24%, 24-30%, and >30%) are 95.45%, 84.52%, and 90.00%, respectively. The short-wave NIRS technique shows potential for both quantitative and qualitative analysis of wet gluten in wheat seed.
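
    A toy GA wavelength selector wrapped around a PLS model, sketching the GA + PLS strategy described above, is shown below; the spectra and gluten values are simulated, and the GA settings are arbitrary.

    ```python
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(3)
    X = rng.normal(size=(54, 120))                 # 54 samples x 120 wavelengths
    y = X[:, 10] + X[:, 55] + 0.1 * rng.normal(size=54)

    def fitness(mask):
        if mask.sum() < 2:
            return -np.inf
        pls = PLSRegression(n_components=2)
        return cross_val_score(pls, X[:, mask], y, cv=5, scoring="r2").mean()

    pop = rng.random((30, X.shape[1])) < 0.2       # initial random wavelength masks
    for gen in range(40):
        scores = np.array([fitness(m) for m in pop])
        parents = pop[np.argsort(scores)[-10:]]    # keep the 10 fittest masks
        children = []
        for _ in range(len(pop) - len(parents)):
            a, b = parents[rng.integers(10, size=2)]
            cut = rng.integers(X.shape[1])
            child = np.concatenate([a[:cut], b[cut:]])   # one-point crossover
            child ^= rng.random(X.shape[1]) < 0.01       # random bit-flip mutation
            children.append(child)
        pop = np.vstack([parents, children])

    best = pop[np.argmax([fitness(m) for m in pop])]
    print("selected wavelengths:", np.flatnonzero(best))
    ```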

  19. Manipulating measurement scales in medical statistical analysis and data mining: A review of methodologies

    PubMed Central

    Marateb, Hamid Reza; Mansourian, Marjan; Adibi, Peyman; Farina, Dario

    2014-01-01

    Background: Selecting the correct statistical test and data mining method depends highly on the measurement scale of the data, the type of variables, and the purpose of the analysis. Different measurement scales are studied in detail, and statistical comparison, modeling, and data mining methods are examined using several medical examples. We present two clustering examples with ordinal variables, a more challenging variable type in analysis, using the Wisconsin Breast Cancer Data (WBCD). Ordinal-to-interval scale conversion example: a breast cancer database of nine 10-level ordinal variables for 683 patients was analyzed by two ordinal-scale clustering methods. The performance of the clustering methods was assessed by comparison with the gold standard groups of malignant and benign cases that had been identified by clinical tests. Results: The sensitivity and accuracy of the two clustering methods were 98% and 96%, respectively. Their specificity was comparable. Conclusion: By using an appropriate clustering algorithm based on the measurement scale of the variables in the study, high performance is granted. Moreover, descriptive and inferential statistics, as well as the modeling approach, must be selected based on the scale of the variables. PMID:24672565

  20. Minimization search method for data inversion

    NASA Technical Reports Server (NTRS)

    Fymat, A. L.

    1975-01-01

    A technique has been developed for determining the values of selected subsets of independent variables in mathematical formulations. The required computation time increases with the first power of the number of variables. This is in contrast with classical minimization methods, for which computation time increases with the third power of the number of variables.

  1. Using Bayesian variable selection to analyze regular resolution IV two-level fractional factorial designs

    DOE PAGES

    Chipman, Hugh A.; Hamada, Michael S.

    2016-06-02

    Regular two-level fractional factorial designs have complete aliasing, in which the associated columns of multiple effects are identical. Here, we show how Bayesian variable selection can be used to analyze experiments that use such designs. In addition to sparsity and hierarchy, Bayesian variable selection naturally incorporates heredity. This prior information is used to identify the most likely combinations of active terms. We demonstrate the method on simulated and real experiments.
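
    As a crude stand-in for the full Bayesian computation, the sketch below enumerates models that respect effect heredity (an interaction enters only if both of its parent main effects do) and ranks them by BIC as a rough proxy for posterior model probability; the design and response are simulated.

    ```python
    import itertools
    import numpy as np

    rng = np.random.default_rng(4)
    n, factors = 16, 4
    X = rng.choice([-1.0, 1.0], size=(n, factors))            # two-level design
    y = 2 * X[:, 0] + 1.5 * X[:, 0] * X[:, 1] + rng.normal(scale=0.5, size=n)

    def bic(cols):
        A = np.column_stack([np.ones(n)] + cols)              # intercept + terms
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        rss = np.sum((y - A @ beta) ** 2)
        return n * np.log(rss / n) + A.shape[1] * np.log(n)

    best = (np.inf, None)
    for mains in itertools.chain.from_iterable(
            itertools.combinations(range(factors), k) for k in range(factors + 1)):
        # heredity: admissible interactions are pairs of included main effects
        pairs = list(itertools.combinations(mains, 2))
        for ints in itertools.chain.from_iterable(
                itertools.combinations(pairs, k) for k in range(len(pairs) + 1)):
            cols = [X[:, i] for i in mains] + [X[:, i] * X[:, j] for i, j in ints]
            score = bic(cols)
            if score < best[0]:
                best = (score, (mains, ints))
    print("best model (mains, interactions):", best[1])
    ```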

  2. Using Bayesian variable selection to analyze regular resolution IV two-level fractional factorial designs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chipman, Hugh A.; Hamada, Michael S.

    Regular two-level fractional factorial designs have complete aliasing, in which the associated columns of multiple effects are identical. Here, we show how Bayesian variable selection can be used to analyze experiments that use such designs. In addition to sparsity and hierarchy, Bayesian variable selection naturally incorporates heredity. This prior information is used to identify the most likely combinations of active terms. We demonstrate the method on simulated and real experiments.

  3. Selecting minimum dataset soil variables using PLSR as a regressive multivariate method

    NASA Astrophysics Data System (ADS)

    Stellacci, Anna Maria; Armenise, Elena; Castellini, Mirko; Rossi, Roberta; Vitti, Carolina; Leogrande, Rita; De Benedetto, Daniela; Ferrara, Rossana M.; Vivaldi, Gaetano A.

    2017-04-01

    Long-term field experiments and science-based tools that characterize soil status (namely soil quality indices, SQIs) assume a strategic role in assessing the effect of agronomic techniques and thus in improving soil management, especially in marginal environments. Selecting key soil variables able to best represent soil status is a critical step in the calculation of SQIs. Current studies show the effectiveness of statistical variable selection methods in extracting relevant information from multivariate datasets. Principal component analysis (PCA) has mainly been used; however, supervised multivariate methods and regression techniques are progressively being evaluated (Armenise et al., 2013; de Paul Obade et al., 2016; Pulido Moncada et al., 2014). The present study explores the effectiveness of partial least squares regression (PLSR) in selecting critical soil variables, using a dataset comparing conventional tillage and sod-seeding on durum wheat. The results were compared to those obtained using PCA and stepwise discriminant analysis (SDA). The soil data derived from a long-term field experiment in Southern Italy. On samples collected in April 2015, the following set of variables was quantified: (i) chemical: total organic carbon and nitrogen (TOC and TN), alkali-extractable C (TEC and humic substances - HA-FA), water-extractable N and organic C (WEN and WEOC), Olsen extractable P, exchangeable cations, pH and EC; (ii) physical: texture, dry bulk density (BD), macroporosity (Pmac), air capacity (AC), and relative field capacity (RFC); (iii) biological: carbon of the microbial biomass quantified with the fumigation-extraction method. PCA and SDA had previously been applied to the multivariate dataset (Stellacci et al., 2016). PLSR was carried out on mean-centered and variance-scaled data of the predictors (soil variables) and the response (wheat yield) using the PLS procedure of SAS/STAT. In addition, variable importance for projection (VIP) statistics were used to quantitatively assess the predictors most relevant for response variable estimation and hence for variable selection (Andersen and Bro, 2010). PCA and SDA returned TOC and RFC as influential variables, both on the sets of chemical and physical data analyzed separately and on the whole dataset (Stellacci et al., 2016). Highly weighted variables in PCA were also TEC, followed by K, and AC, followed by Pmac and BD, in the first PC (41.2% of total variance); Olsen P and HA-FA in the second PC (12.6%); and Ca in the third component (10.6%). Variables enabling maximum discrimination among treatments for SDA were WEOC, on the whole dataset, and humic substances, followed by Olsen P, EC and clay, in the separate data analyses. The highest PLS-VIP statistics were recorded for Olsen P and Pmac, followed by TOC, TEC, pH and Mg among the chemical variables, and clay, RFC and AC among the physical variables. The results show that different methods may provide different rankings of the selected variables and that the presence of a response variable, in regression techniques, may affect variable selection. Further investigation with different response variables and with multi-year datasets would allow the advantages and limits of single or combined approaches to be better defined.
Acknowledgment: The work was supported by the projects "BIOTILLAGE, approcci innovative per il miglioramento delle performances ambientali e produttive dei sistemi cerealicoli no-tillage", financed by PSR-Basilicata 2007-2013, and "DESERT, Low-cost water desalination and sensor technology compact module", financed by ERANET-WATERWORKS 2014.
References: Andersen C.M. and Bro R., 2010. Variable selection in regression - a tutorial. Journal of Chemometrics, 24: 728-737. Armenise et al., 2013. Developing a soil quality index to compare soil fitness for agricultural use under different managements in the Mediterranean environment. Soil and Tillage Research, 130: 91-98. de Paul Obade et al., 2016. A standardized soil quality index for diverse field conditions. Science of the Total Environment, 541: 424-434. Pulido Moncada et al., 2014. Data-driven analysis of soil quality indicators using limited data. Geoderma, 235: 271-278. Stellacci et al., 2016. Comparison of different multivariate methods to select key soil variables for soil quality indices computation. XLV Congress of the Italian Society of Agronomy (SIA), Sassari, 20-22 September 2016.
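
    For readers who want to reproduce the PLS-VIP step described in this record, the sketch below computes VIP scores from a scikit-learn PLS fit using the standard formula (cf. the Andersen and Bro tutorial cited above); the soil data and the target are placeholders.

    ```python
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    rng = np.random.default_rng(5)
    X = rng.normal(size=(40, 15))                 # 15 candidate soil variables
    y = X[:, 2] - 0.5 * X[:, 9] + 0.2 * rng.normal(size=40)   # e.g., wheat yield

    pls = PLSRegression(n_components=3).fit(X, y)
    T, W, Q = pls.x_scores_, pls.x_weights_, pls.y_loadings_

    ssy = (Q[0] ** 2) * np.sum(T ** 2, axis=0)    # y-variance explained per component
    wnorm = W / np.linalg.norm(W, axis=0)         # normalized weight vectors
    vip = np.sqrt(X.shape[1] * (wnorm ** 2 @ ssy) / ssy.sum())

    print("variables with VIP > 1:", np.flatnonzero(vip > 1))   # common cutoff
    ```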

  4. Identification of Coffee Varieties Using Laser-Induced Breakdown Spectroscopy and Chemometrics.

    PubMed

    Zhang, Chu; Shen, Tingting; Liu, Fei; He, Yong

    2017-12-31

    We linked coffee quality to its different varieties. This is of interest because the identification of coffee varieties should help coffee trading and consumption. Laser-induced breakdown spectroscopy (LIBS) combined with chemometric methods was used to identify coffee varieties. The wavelet transform (WT) was used to reduce noise in the LIBS spectra. Partial least squares-discriminant analysis (PLS-DA), a radial basis function neural network (RBFNN), and a support vector machine (SVM) were used to build classification models. Loadings from principal component analysis (PCA) were used to select the spectral variables contributing most to the identification of coffee varieties. Twenty wavelength variables corresponding to C I, Mg I, Mg II, Al II, CN, H, Ca II, Fe I, K I, Na I, N I, and O I were selected. The PLS-DA, RBFNN, and SVM models on the selected wavelength variables showed acceptable results. The SVM and RBFNN models performed better, with a classification accuracy of over 80% in the prediction set, for both the full spectra and the selected variables. The overall results indicate that it is feasible to use LIBS and chemometric methods to identify coffee varieties. For further studies, more samples are needed to produce robust classification models, research should be conducted on methods for selecting the spectral peaks corresponding to the elements that contribute most to identification, and methods for acquiring stable spectra should also be studied.
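
    The loading-based selection lends itself to a short sketch: pick the wavelengths with the largest absolute PCA loadings and feed them to an SVM. The LIBS spectra below are simulated, and the number of components and wavelengths kept are arbitrary choices, not the paper's settings.

    ```python
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    rng = np.random.default_rng(6)
    X = rng.normal(size=(120, 500))               # 120 spectra x 500 wavelengths
    y = rng.integers(0, 3, size=120)              # three coffee varieties
    X[y == 1, 100] += 2.0; X[y == 2, 350] += 2.0  # inject variety-specific peaks

    pca = PCA(n_components=5).fit(X)
    importance = np.abs(pca.components_).max(axis=0)   # peak loading per wavelength
    top = np.argsort(importance)[-20:]                 # keep 20 wavelengths

    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    print(cross_val_score(clf, X[:, top], y, cv=5).mean())
    ```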

  5. Identification of Coffee Varieties Using Laser-Induced Breakdown Spectroscopy and Chemometrics

    PubMed Central

    Zhang, Chu; Shen, Tingting

    2017-01-01

    We linked coffee quality to its different varieties. This is of interest because the identification of coffee varieties should help coffee trading and consumption. Laser-induced breakdown spectroscopy (LIBS) combined with chemometric methods was used to identify coffee varieties. The wavelet transform (WT) was used to reduce noise in the LIBS spectra. Partial least squares-discriminant analysis (PLS-DA), a radial basis function neural network (RBFNN), and a support vector machine (SVM) were used to build classification models. Loadings from principal component analysis (PCA) were used to select the spectral variables contributing most to the identification of coffee varieties. Twenty wavelength variables corresponding to C I, Mg I, Mg II, Al II, CN, H, Ca II, Fe I, K I, Na I, N I, and O I were selected. The PLS-DA, RBFNN, and SVM models on the selected wavelength variables showed acceptable results. The SVM and RBFNN models performed better, with a classification accuracy of over 80% in the prediction set, for both the full spectra and the selected variables. The overall results indicate that it is feasible to use LIBS and chemometric methods to identify coffee varieties. For further studies, more samples are needed to produce robust classification models, research should be conducted on methods for selecting the spectral peaks corresponding to the elements that contribute most to identification, and methods for acquiring stable spectra should also be studied. PMID:29301228

  6. Selecting risk factors: a comparison of discriminant analysis, logistic regression and Cox's regression model using data from the Tromsø Heart Study.

    PubMed

    Brenn, T; Arnesen, E

    1985-01-01

    For comparative evaluation, discriminant analysis, logistic regression and Cox's model were used to select risk factors for total and coronary deaths among 6595 men aged 20-49 followed for 9 years. Groups with mortality between 5 and 93 per 1000 were considered. Discriminant analysis selected variable sets only marginally different from those of the logistic and Cox methods, which always selected the same sets. A time-saving option, offered for both the logistic and Cox selection, showed no advantage compared with discriminant analysis. Analysing more than 3800 subjects, the logistic and Cox methods consumed, respectively, 80 and 10 times more computer time than discriminant analysis. When the same set of variables was included in non-stepwise analyses, all methods estimated coefficients that were in most cases almost identical. In conclusion, discriminant analysis is advocated for preliminary or stepwise analysis; otherwise Cox's method should be used.

  7. Non-destructive technique for determining the viability of soybean (Glycine max) seeds using FT-NIR spectroscopy.

    PubMed

    Kusumaningrum, Dewi; Lee, Hoonsoo; Lohumi, Santosh; Mo, Changyeun; Kim, Moon S; Cho, Byoung-Kwan

    2018-03-01

    The viability of seeds is important for determining their quality. A high-quality seed is one with a high capability of germination, which is necessary to ensure high productivity. Hence, developing technology for the detection of seed viability is a high priority in agriculture. Fourier transform near-infrared (FT-NIR) spectroscopy is one of the most popular vibrational spectroscopy techniques. This study aims to use FT-NIR spectroscopy to determine the viability of soybean seeds. Viable seeds and artificially aged seeds, the latter serving as non-viable soybeans, were used in this research. The FT-NIR spectra of soybean seeds were collected and analysed using partial least-squares discriminant analysis (PLS-DA) to classify viable and non-viable soybean seeds. Moreover, the variable importance in projection (VIP) method for variable selection, combined with the PLS-DA, was employed. The most effective wavelengths were selected by the VIP method, which retained 146 optimal variables from the full set of 1557 variables. The results demonstrated that the FT-NIR spectral analysis with the PLS-DA method, using either all variables or the selected variables, showed good performance, with prediction accuracy for soybean viability close to 100%. Hence, FT-NIR techniques with chemometric analysis have the potential for rapidly measuring soybean seed viability. © 2017 Society of Chemical Industry.

  8. Applications of information theory, genetic algorithms, and neural models to predict oil flow

    NASA Astrophysics Data System (ADS)

    Ludwig, Oswaldo; Nunes, Urbano; Araújo, Rui; Schnitman, Leizer; Lepikson, Herman Augusto

    2009-07-01

    This work introduces a new information-theoretic methodology for choosing variables and their time lags in a prediction setting, particularly when neural networks are used in non-linear modeling. The first contribution of this work is the Cross Entropy Function (XEF), proposed to select input variables and their lags in order to compose the input vector of black-box prediction models. The proposed XEF method is more appropriate than the usually applied Cross Correlation Function (XCF) when the relationship between the input and output signals comes from a non-linear dynamic system. The second contribution is a method that minimizes the Joint Conditional Entropy (JCE) between the input and output variables by means of a Genetic Algorithm (GA). The aim is to take into account the dependence among the input variables when selecting the most appropriate set of inputs for a prediction problem. In short, these methods can be used to assist the selection of input training data that carry the information necessary to predict the target data. The proposed methods are applied to a petroleum engineering problem: predicting oil production. Experimental results obtained with a real-world dataset are presented, demonstrating the feasibility and effectiveness of the method.
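
    A sketch in the spirit of the approach follows, scoring candidate input lags by mutual information with the target (a close relative of the cross-entropy criterion described above) using scikit-learn; the series is synthetic and the GA stage is omitted.

    ```python
    import numpy as np
    from sklearn.feature_selection import mutual_info_regression

    rng = np.random.default_rng(7)
    x = rng.normal(size=1200)
    y = np.roll(x, 3) ** 2 + 0.1 * rng.normal(size=1200)  # depends nonlinearly on lag 3

    max_lag = 10
    lags = np.column_stack([np.roll(x, k) for k in range(1, max_lag + 1)])[max_lag:]
    target = y[max_lag:]

    mi = mutual_info_regression(lags, target, random_state=0)
    best = np.argsort(mi)[::-1][:3] + 1          # top three lags, 1-indexed
    print("most informative lags:", best)        # should rank lag 3 first
    ```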

  9. Curve fitting and modeling with splines using statistical variable selection techniques

    NASA Technical Reports Server (NTRS)

    Smith, P. L.

    1982-01-01

    The successful application of statistical variable selection techniques to fit splines is demonstrated. Major emphasis is given to knot selection, but order determination is also discussed. Two FORTRAN backward elimination programs, using the B-spline basis, were developed. The program for knot elimination is compared in detail with two other spline-fitting methods and several statistical software packages. An example is also given for the two-variable case using a tensor product basis, with a theoretical discussion of the difficulties of their use.
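
    A small Python analogue of backward knot elimination is sketched below (the programs described were FORTRAN): repeatedly drop the interior knot whose removal increases the residual sum of squares the least, stopping when the fit degrades. The stopping rule here is ad hoc, not the statistical criterion of the report.

    ```python
    import numpy as np
    from scipy.interpolate import LSQUnivariateSpline

    rng = np.random.default_rng(8)
    x = np.linspace(0, 10, 200)
    y = np.sin(x) + 0.05 * rng.normal(size=x.size)

    knots = list(np.linspace(1, 9, 15))                  # generous initial knot set

    def sse(t):
        s = LSQUnivariateSpline(x, y, t, k=3)            # cubic B-spline least squares
        return np.sum((s(x) - y) ** 2)

    while len(knots) > 1:
        losses = [sse(knots[:i] + knots[i + 1:]) for i in range(len(knots))]
        i = int(np.argmin(losses))
        if losses[i] > 1.5 * sse(knots):                 # stop if removal hurts the fit
            break
        knots.pop(i)                                     # eliminate the cheapest knot

    print(len(knots), "knots retained")
    ```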

  10. Fitting multidimensional splines using statistical variable selection techniques

    NASA Technical Reports Server (NTRS)

    Smith, P. L.

    1982-01-01

    This report demonstrates the successful application of statistical variable selection techniques to fit splines. Major emphasis is given to knot selection, but order determination is also discussed. Two FORTRAN backward elimination programs using the B-spline basis were developed, and the one for knot elimination is compared in detail with two other spline-fitting methods and several statistical software packages. An example is also given for the two-variable case using a tensor product basis, with a theoretical discussion of the difficulties of their use.

  11. Variable selection for zero-inflated and overdispersed data with application to health care demand in Germany

    PubMed Central

    Wang, Zhu; Ma, Shuangge; Wang, Ching-Yun

    2017-01-01

    In health services and outcome research, count outcomes are frequently encountered and often have a large proportion of zeros. The zero-inflated negative binomial (ZINB) regression model has important applications for this type of data. With many possible candidate risk factors, this paper proposes new variable selection methods for the ZINB model. We consider the maximum likelihood function plus a penalty, including the least absolute shrinkage and selection operator (LASSO), smoothly clipped absolute deviation (SCAD) and minimax concave penalty (MCP). An EM (expectation-maximization) algorithm is proposed for estimating the model parameters and conducting variable selection simultaneously. This algorithm consists of estimating penalized weighted negative binomial models and penalized logistic models via the coordinate descent algorithm. Furthermore, statistical properties including the standard error formulae are provided. A simulation study shows that the new algorithm not only has more accurate or at least comparable estimation, but is also more robust than traditional stepwise variable selection. The proposed methods are applied to analyze the health care demand in Germany using the open-source R package mpath. PMID:26059498

  12. Sex Bias in Research Design.

    ERIC Educational Resources Information Center

    Grady, Kathleen E.

    1981-01-01

    Presents feminist criticisms of selected aspects of research methods in psychology. Reviews data relevant to sex bias in topic selection, subject selection and single-sex designs, operationalization of variables, testing for sex differences, and interpretation of results. Suggestions for achieving more "sex fair" research methods are discussed.…

  13. Variable screening via quantile partial correlation

    PubMed Central

    Ma, Shujie; Tsai, Chih-Ling

    2016-01-01

    In quantile linear regression with ultra-high dimensional data, we propose an algorithm for screening all candidate variables and subsequently selecting relevant predictors. Specifically, we first employ quantile partial correlation for screening, and then apply the extended Bayesian information criterion (EBIC) for best subset selection. Our proposed method can successfully select predictors when the variables are highly correlated, and it can also identify variables that make a contribution to the conditional quantiles but are marginally uncorrelated or weakly correlated with the response. Theoretical results show that the proposed algorithm can yield the sure screening set. By controlling the false selection rate, model selection consistency can be achieved theoretically. In practice, we propose using EBIC for best subset selection so that the resulting model is screening consistent. Simulation studies demonstrate that the proposed algorithm performs well, and an empirical example is presented. PMID:28943683
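
    The sketch below gives a simplified marginal version of the screening step: rank each predictor by its reduction in quantile (check) loss relative to an intercept-only fit, using statsmodels. The partial-correlation adjustment and the EBIC subset-selection stage of the full procedure are omitted, and the data are synthetic.

    ```python
    import numpy as np
    import statsmodels.api as sm

    def check_loss(u, tau):
        return np.mean(u * (tau - (u < 0)))          # quantile "check" loss

    rng = np.random.default_rng(9)
    n, p, tau = 300, 50, 0.5
    X = rng.normal(size=(n, p))
    y = X[:, 0] + rng.normal(size=n)                 # only X0 is truly relevant

    base = check_loss(y - np.quantile(y, tau), tau)  # intercept-only benchmark
    scores = []
    for j in range(p):
        fit = sm.QuantReg(y, sm.add_constant(X[:, j])).fit(q=tau)
        scores.append(base - check_loss(y - fit.predict(), tau))

    print("top screened variables:", np.argsort(scores)[::-1][:5])
    ```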

  14. Total sulfur determination in residues of crude oil distillation using FT-IR/ATR and variable selection methods

    NASA Astrophysics Data System (ADS)

    Müller, Aline Lima Hermes; Picoloto, Rochele Sogari; Mello, Paola de Azevedo; Ferrão, Marco Flores; dos Santos, Maria de Fátima Pereira; Guimarães, Regina Célia Lourenço; Müller, Edson Irineu; Flores, Erico Marlon Moraes

    2012-04-01

    Total sulfur concentration was determined in atmospheric residue (AR) and vacuum residue (VR) samples obtained from the petroleum distillation process by Fourier transform infrared spectroscopy with attenuated total reflectance (FT-IR/ATR) in association with chemometric methods. The calibration and prediction sets consisted of 40 and 20 samples, respectively. Calibration models were developed using two variable selection methods: interval partial least squares (iPLS) and synergy interval partial least squares (siPLS). Different treatments and pre-processing steps were also evaluated for the development of the models. The pre-treatment based on multiplicative scatter correction (MSC) and mean-centered data was selected for model construction. The use of siPLS as the variable selection method provided a model with root mean square error of prediction (RMSEP) values significantly better than those obtained by the PLS model using all variables. The best model was obtained using the siPLS algorithm with the spectra divided into 20 intervals and combinations of 3 intervals (911-824, 823-736 and 737-650 cm-1). This model produced an RMSECV of 400 mg kg-1 S and an RMSEP of 420 mg kg-1 S, with a correlation coefficient of 0.990.
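
    A compact sketch of the iPLS idea follows: split the spectrum into 20 intervals, fit a PLS model on each, and keep the interval with the lowest cross-validated error. The spectra below are simulated stand-ins for the FT-IR data, and the synergy (siPLS) combination of intervals is not shown.

    ```python
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import cross_val_predict

    rng = np.random.default_rng(10)
    X = rng.normal(size=(60, 400))                 # 60 spectra x 400 points
    y = X[:, 120:140].sum(axis=1) + 0.1 * rng.normal(size=60)   # sulfur-like target

    n_intervals = 20
    edges = np.linspace(0, X.shape[1], n_intervals + 1, dtype=int)

    rmse = []
    for a, b in zip(edges[:-1], edges[1:]):        # one PLS model per interval
        pred = cross_val_predict(PLSRegression(n_components=3), X[:, a:b], y, cv=5)
        rmse.append(np.sqrt(np.mean((pred.ravel() - y) ** 2)))

    best = int(np.argmin(rmse))
    print(f"best interval: variables {edges[best]}-{edges[best + 1]}")
    ```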

  15. Selection of climate change scenario data for impact modelling.

    PubMed

    Sloth Madsen, M; Maule, C Fox; MacKellar, N; Olesen, J E; Christensen, J Hesselbjerg

    2012-01-01

    Impact models investigating climate change effects on food safety often need detailed climate data. The aim of this study was to select climate change projection data for selected crop phenology and mycotoxin impact models. Using the ENSEMBLES database of climate model output, this study illustrates how the projected climate change signal of important variables such as temperature, precipitation and relative humidity depends on the choice of the climate model. Using climate change projections from at least two different climate models is recommended to account for model uncertainty. To make the climate projections suitable for impact analysis at the local scale, a weather generator approach was adopted. As the weather generator did not treat all the necessary variables, an ad hoc statistical method was developed to synthesise realistic values of the missing variables. The method is presented in this paper, applied to relative humidity, but it could be adapted to other variables if needed.

  16. Using variable rate models to identify genes under selection in sequence pairs: their validity and limitations for EST sequences.

    PubMed

    Church, Sheri A; Livingstone, Kevin; Lai, Zhao; Kozik, Alexander; Knapp, Steven J; Michelmore, Richard W; Rieseberg, Loren H

    2007-02-01

    Using likelihood-based variable selection models, we determined whether positive selection was acting on 523 EST sequence pairs from two lineages of sunflower and lettuce. Variable rate models are generally not used for comparisons of sequence pairs because of the limited information available and the inaccuracy of estimates of specific substitution rates. However, previous studies have shown that the likelihood ratio test (LRT) is reliable for detecting positive selection, even with low numbers of sequences. These analyses identified 56 genes that show a signature of selection, of which 75% were not identified by simpler models that average selection across codons. Subsequent mapping studies in sunflower show that four of five positively selected genes identified by these methods map to domestication QTLs. We discuss the validity and limitations of using variable rate models for comparisons of sequence pairs, as well as the limitations of using ESTs for the identification of positively selected genes.
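
    The LRT at the heart of such analyses is easy to illustrate: twice the log-likelihood gain of the selection model over the neutral model is referred to a chi-squared distribution. The log-likelihood values below are placeholders standing in for codeml-style output, not results from the paper.

    ```python
    from scipy import stats

    ll_neutral, ll_selection, extra_params = -2310.4, -2305.1, 2   # hypothetical
    lrt = 2 * (ll_selection - ll_neutral)
    p_value = stats.chi2.sf(lrt, df=extra_params)
    print(f"LRT = {lrt:.2f}, p = {p_value:.4f}")   # small p suggests positive selection
    ```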

  17. Variable selection for distribution-free models for longitudinal zero-inflated count responses.

    PubMed

    Chen, Tian; Wu, Pan; Tang, Wan; Zhang, Hui; Feng, Changyong; Kowalski, Jeanne; Tu, Xin M

    2016-07-20

    Zero-inflated count outcomes arise quite often in research and practice. Parametric models such as the zero-inflated Poisson and zero-inflated negative binomial are widely used to model such responses. Like most parametric models, they are quite sensitive to departures from assumed distributions. Recently, new approaches have been proposed to provide distribution-free, or semi-parametric, alternatives. These methods extend the generalized estimating equations to provide robust inference for population mixtures defined by zero-inflated count outcomes. In this paper, we propose methods to extend smoothly clipped absolute deviation (SCAD)-based variable selection methods to these new models. Variable selection has been gaining popularity in modern clinical research studies, as determining differential treatment effects of interventions for different subgroups has become the norm, rather than the exception, in the era of patient-centered outcomes research. Such moderation analysis in general creates many explanatory variables in regression analysis, and the advantages of SCAD-based methods over their traditional counterparts make them a great choice for addressing this important and timely issue in clinical research. We illustrate the proposed approach with both simulated and real study data. Copyright © 2016 John Wiley & Sons, Ltd.

  18. Natural image classification driven by human brain activity

    NASA Astrophysics Data System (ADS)

    Zhang, Dai; Peng, Hanyang; Wang, Jinqiao; Tang, Ming; Xue, Rong; Zuo, Zhentao

    2016-03-01

    Natural image classification has been a hot topic in the computer vision and pattern recognition research fields. Since the performance of an image classification system can be improved by feature selection, many image feature selection methods have been developed. However, existing supervised feature selection methods are typically driven by class label information that is identical for different samples from the same class, ignoring within-class image variability and therefore degrading feature selection performance. In this study, we propose a novel feature selection method driven by human brain activity signals collected using fMRI while human subjects were viewing natural images of different categories. The fMRI signals associated with subjects viewing different images encode the human perception of natural images, and therefore may capture image variability within and across categories. We then select image features with the guidance of fMRI signals from brain regions with active responses to image viewing. In particular, bag-of-words features based on the GIST descriptor are extracted from natural images for classification, and a sparse regression-based feature selection method is adapted to select the image features that best predict the fMRI signals. Finally, a classification model is built on the selected image features to classify images without fMRI signals. Validation experiments classifying images from four categories for two subjects demonstrate that our method can achieve much better classification performance than classifiers built on image features selected by traditional feature selection methods.

  19. Variable selection for zero-inflated and overdispersed data with application to health care demand in Germany.

    PubMed

    Wang, Zhu; Ma, Shuangge; Wang, Ching-Yun

    2015-09-01

    In health services and outcome research, count outcomes are frequently encountered and often have a large proportion of zeros. The zero-inflated negative binomial (ZINB) regression model has important applications for this type of data. With many possible candidate risk factors, this paper proposes new variable selection methods for the ZINB model. We consider the maximum likelihood function plus a penalty, including the least absolute shrinkage and selection operator (LASSO), smoothly clipped absolute deviation (SCAD), and minimax concave penalty (MCP). An EM (expectation-maximization) algorithm is proposed for estimating the model parameters and conducting variable selection simultaneously. This algorithm consists of estimating penalized weighted negative binomial models and penalized logistic models via the coordinate descent algorithm. Furthermore, statistical properties including the standard error formulae are provided. A simulation study shows that the new algorithm not only has more accurate or at least comparable estimation, but also is more robust than the traditional stepwise variable selection. The proposed methods are applied to analyze the health care demand in Germany using the open-source R package mpath. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  20. Identifying public water facilities with low spatial variability of disinfection by-products for epidemiological investigations

    PubMed Central

    Hinckley, A; Bachand, A; Nuckols, J; Reif, J

    2005-01-01

    Background and Aims: Epidemiological studies of disinfection by-products (DBPs) and reproductive outcomes have been hampered by misclassification of exposure. In most epidemiological studies conducted to date, all persons living within the boundaries of a water distribution system have been assigned a common exposure value based on facility-wide averages of trihalomethane (THM) concentrations. Since THMs do not develop uniformly throughout a distribution system, assignment of facility-wide averages may be inappropriate. One approach to mitigate this potential for misclassification is to select communities for epidemiological investigations that are served by distribution systems with consistently low spatial variability of THMs. Methods and Results: A feasibility study was conducted to develop methods for community selection using the Information Collection Rule (ICR) database, assembled by the US Environmental Protection Agency. The ICR database contains quarterly DBP concentrations collected between 1997 and 1998 from the distribution systems of 198 public water facilities with minimum service populations of 100 000 persons. Facilities with low spatial variation of THMs were identified using two methods; 33 facilities were found with low spatial variability based on one or both methods. Because brominated THMs may be important predictors of risk for adverse reproductive outcomes, sites were categorised into three exposure profiles according to proportion of brominated THM species and average TTHM concentration. The correlation between THMs and haloacetic acids (HAAs) in these facilities was evaluated to see whether selection by total trihalomethanes (TTHMs) corresponds to low spatial variability for HAAs. TTHMs were only moderately correlated with HAAs (r = 0.623). Conclusions: Results provide a simple method for a priori selection of sites with low spatial variability from state or national public water facility datasets as a means to reduce exposure misclassification in epidemiological studies of DBPs. PMID:15961627

  1. Harmonize input selection for sediment transport prediction

    NASA Astrophysics Data System (ADS)

    Afan, Haitham Abdulmohsin; Keshtegar, Behrooz; Mohtar, Wan Hanna Melini Wan; El-Shafie, Ahmed

    2017-09-01

    In this paper, three modeling approaches, a Neural Network (NN), the Response Surface Method (RSM), and a response surface method based on Global Harmony Search (GHS), are applied to predict the daily time series of suspended sediment load. Generally, the input variables for forecasting suspended sediment load are selected manually, based on the maximum correlations of the input variables, in NN- and RSM-based modeling approaches. Here, the RSM is improved so that the input variables are selected using the error terms of the training data via the GHS, yielding the response surface method with global harmony search (RSM-GHS) modeling method. A second-order polynomial function with cross terms is applied to calibrate the time series of suspended sediment load with three, four and five input variables in the proposed RSM-GHS. The linear, squared and cross terms of twenty input variables comprising antecedent values of suspended sediment load and water discharge are investigated to achieve the best predictions of the RSM based on the GHS method. The performances of the NN, RSM and proposed RSM-GHS, in terms of both accuracy and simplicity, are compared through several comparative prediction and error statistics. The results illustrate that the proposed RSM-GHS is as uncomplicated as the RSM but performs better, with fewer errors and better correlation (R = 0.95, MAE = 18.09 ton/day, RMSE = 25.16 ton/day) compared to the ANN (R = 0.91, MAE = 20.17 ton/day, RMSE = 33.09 ton/day) and RSM (R = 0.91, MAE = 20.06 ton/day, RMSE = 31.92 ton/day) for all types of input variables.
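
    A minimal second-order response-surface fit with cross terms, the model form named above, is sketched below; the harmony-search selection of antecedent inputs is replaced by a fixed choice of three lags, purely for illustration, and the series are simulated.

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.default_rng(11)
    q = np.abs(rng.normal(size=500)).cumsum()                 # discharge-like series
    s = 0.3 * q + 5 * np.sin(q / 50) + rng.normal(size=500)   # sediment-like series

    # three antecedent inputs: s(t-1), s(t-2), q(t-1)
    X = np.column_stack([s[1:-1], s[:-2], q[1:-1]])
    y = s[2:]

    rsm = make_pipeline(PolynomialFeatures(degree=2, include_bias=True),
                        LinearRegression())        # squares and cross terms included
    rsm.fit(X, y)
    print("training R^2:", rsm.score(X, y))
    ```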

  2. Stock price forecasting for companies listed on Tehran stock exchange using multivariate adaptive regression splines model and semi-parametric splines technique

    NASA Astrophysics Data System (ADS)

    Rounaghi, Mohammad Mahdi; Abbaszadeh, Mohammad Reza; Arashi, Mohammad

    2015-11-01

    One of the topics of greatest interest to investors is stock price changes. Investors whose goals are long term are sensitive to stock prices and their changes and react to them. In this regard, we used the multivariate adaptive regression splines (MARS) model and a semi-parametric splines technique for predicting stock prices in this study. The MARS model is an adaptive nonparametric regression method that suits problems with high dimensions and several variables. The semi-parametric technique used here is smoothing splines, a nonparametric regression method. In this study, we used 40 variables (30 accounting variables and 10 economic variables) for predicting stock prices with the MARS model and the semi-parametric splines technique. After investigating the models, we selected 4 accounting variables (book value per share, predicted earnings per share, P/E ratio and risk) as influential variables for predicting stock price using the MARS model. After fitting the semi-parametric splines technique, only 4 accounting variables (dividends, net EPS, EPS forecast and P/E ratio) were selected as variables effective in forecasting stock prices.

  3. Total sulfur determination in residues of crude oil distillation using FT-IR/ATR and variable selection methods.

    PubMed

    Müller, Aline Lima Hermes; Picoloto, Rochele Sogari; de Azevedo Mello, Paola; Ferrão, Marco Flores; de Fátima Pereira dos Santos, Maria; Guimarães, Regina Célia Lourenço; Müller, Edson Irineu; Flores, Erico Marlon Moraes

    2012-04-01

    Total sulfur concentration was determined in atmospheric residue (AR) and vacuum residue (VR) samples obtained from the petroleum distillation process by Fourier transform infrared spectroscopy with attenuated total reflectance (FT-IR/ATR) in association with chemometric methods. The calibration and prediction sets consisted of 40 and 20 samples, respectively. Calibration models were developed using two variable selection methods: interval partial least squares (iPLS) and synergy interval partial least squares (siPLS). Different treatments and pre-processing steps were also evaluated for the development of the models. The pre-treatment based on multiplicative scatter correction (MSC) and mean-centered data was selected for model construction. The use of siPLS as the variable selection method provided a model with root mean square error of prediction (RMSEP) values significantly better than those obtained by the PLS model using all variables. The best model was obtained using the siPLS algorithm with the spectra divided into 20 intervals and combinations of 3 intervals (911-824, 823-736 and 737-650 cm-1). This model produced an RMSECV of 400 mg kg-1 S and an RMSEP of 420 mg kg-1 S, with a correlation coefficient of 0.990. Copyright © 2011 Elsevier B.V. All rights reserved.

  4. Linear and nonlinear variable selection in competing risks data.

    PubMed

    Ren, Xiaowei; Li, Shanshan; Shen, Changyu; Yu, Zhangsheng

    2018-06-15

    The subdistribution hazard model for competing risks data has been applied extensively in clinical research. Variable selection methods for linear effects with competing risks data have been studied in the past decade, but there is no existing work on the selection of potential nonlinear effects for the subdistribution hazard model. We propose a two-stage procedure to select linear and nonlinear covariates simultaneously and to estimate the selected covariate effects. We use a spectral decomposition approach to distinguish the linear and nonlinear parts of each covariate and the adaptive LASSO to select each of the two components. Extensive numerical studies demonstrate that the proposed procedure achieves good selection accuracy in the first stage and small estimation biases in the second stage. The proposed method is applied to analyze a cardiovascular disease dataset with competing causes of death. Copyright © 2018 John Wiley & Sons, Ltd.
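
    The adaptive LASSO stage admits a short sketch via the usual column-rescaling trick, shown below on synthetic data in a plain regression setting; the competing-risks likelihood and the spectral decomposition of the paper are omitted.

    ```python
    import numpy as np
    from sklearn.linear_model import LassoCV, LinearRegression

    rng = np.random.default_rng(12)
    X = rng.normal(size=(200, 12))
    y = 1.5 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(size=200)

    beta_init = LinearRegression().fit(X, y).coef_        # pilot estimates
    w = 1.0 / np.maximum(np.abs(beta_init), 1e-8)         # adaptive penalty weights
    lasso = LassoCV(cv=5).fit(X / w, y)                   # rescaling => weighted L1
    beta = lasso.coef_ / w                                # map back to original scale

    print("selected covariates:", np.flatnonzero(beta))
    ```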

  5. Digital mapping of soil properties in Canadian managed forests at 250 m of resolution using the k-nearest neighbor method

    NASA Astrophysics Data System (ADS)

    Mansuy, N. R.; Paré, D.; Thiffault, E.

    2015-12-01

    Large-scale mapping of soil properties is increasingly important for environmental resource management. While forested areas play critical environmental roles at local and global scales, forest soil maps are typically at low resolution. The objective of this study was to generate continuous national maps of selected soil variables (C, N and soil texture) for the Canadian managed forest landbase at 250 m resolution. We produced these maps using the kNN method with a training dataset of 538 ground plots from the National Forest Inventory (NFI) across Canada and 18 environmental predictor variables. The best predictor variables (7 topographic and 5 climatic variables) were selected using the Least Absolute Shrinkage and Selection Operator method. On average, for all soil variables, topographic predictors explained 37% of the total variance versus 64% for the climatic predictors. The relative root mean square error (RMSE%) calculated with the leave-one-out cross-validation method gave values ranging between 22% and 99%, depending on the soil variables tested. RMSE values below 40% can be considered a good imputation in light of the low density of points used in this study. The study demonstrates strong capabilities for mapping forest soil properties at 250 m resolution, compared with the current Soil Landscape of Canada System, which is largely oriented towards the agricultural landbase. The methodology used here can potentially contribute to the national and international need for spatially explicit soil information in resource management science.
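
    A condensed sketch of the workflow, LASSO-based predictor selection followed by kNN imputation over map pixels, is given below; all data are synthetic placeholders at a much smaller scale than the national maps.

    ```python
    import numpy as np
    from sklearn.linear_model import LassoCV
    from sklearn.neighbors import KNeighborsRegressor

    rng = np.random.default_rng(15)
    plots = rng.normal(size=(538, 18))              # 18 topo/climate predictors
    soil_c = plots[:, 0] + 0.5 * plots[:, 5] + rng.normal(scale=0.3, size=538)

    keep = np.flatnonzero(LassoCV(cv=10).fit(plots, soil_c).coef_)  # LASSO subset
    knn = KNeighborsRegressor(n_neighbors=5).fit(plots[:, keep], soil_c)

    grid = rng.normal(size=(10000, 18))             # map pixels (250 m cells)
    soil_c_map = knn.predict(grid[:, keep])         # imputed soil C per pixel
    print("selected predictors:", keep, "first pixels:", soil_c_map[:3])
    ```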

  6. Diagnostic for two-mode variable valve activation device

    DOEpatents

    Fedewa, Andrew M

    2014-01-07

    A method is provided for diagnosing a multi-mode valve train device which selectively provides high lift and low lift to a combustion valve of an internal combustion engine having a camshaft phaser actuated by an electric motor. The method includes applying a variable electric current to the electric motor to achieve a desired camshaft phaser operational mode and commanding the multi-mode valve train device to a desired valve train device operational mode selected from a high lift mode and a low lift mode. The method also includes monitoring the variable electric current and calculating a first characteristic of this parameter. The method then compares the calculated first characteristic against a predetermined value of the first characteristic measured when the multi-mode valve train device is known to be in the desired valve train device operational mode.

  7. Evaluation of aircraft crash hazard at Los Alamos National Laboratory facilities

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Selvage, R.D.

    This report selects a method for use in calculating the frequency of an aircraft crash occurring at selected facilities at the Los Alamos National Laboratory (the Laboratory). The Solomon method was chosen to determine these probabilities. Each variable in the Solomon method is defined and a value for each variable is selected for fourteen facilities at the Laboratory. These values and calculated probabilities are to be used in all safety analysis reports and hazards analyses for the facilities addressed in this report. This report also gives detailed directions for performing aircraft-crash frequency calculations for other facilities. This will ensure that future aircraft-crash frequency calculations are consistent with the calculations in this report.

  8. Robust Variable Selection with Exponential Squared Loss.

    PubMed

    Wang, Xueqin; Jiang, Yunlu; Huang, Mian; Zhang, Heping

    2013-04-01

    Robust variable selection procedures through penalized regression have been gaining increased attention in the literature. They can be used to perform variable selection and are expected to yield robust estimates. However, to the best of our knowledge, the robustness of those penalized regression procedures has not been well characterized. In this paper, we propose a class of penalized robust regression estimators based on exponential squared loss. The motivation for this new procedure is that it enables us to characterize its robustness, which has not been done for the existing procedures, while its performance is near optimal and superior to some recently developed methods. Specifically, under defined regularity conditions, our estimators are √n-consistent and possess the oracle property. Importantly, we show that our estimators can achieve the highest asymptotic breakdown point of 1/2 and that their influence functions are bounded with respect to the outliers in either the response or the covariate domain. We performed simulation studies to compare our proposed method with some recent methods, using the oracle method as the benchmark. We consider common sources of influential points. Our simulation studies reveal that our proposed method performs similarly to the oracle method in terms of the model error and the positive selection rate even in the presence of influential points. In contrast, other existing procedures have a much lower non-causal selection rate. Furthermore, we re-analyze the Boston Housing Price Dataset and the Plasma Beta-Carotene Level Dataset, which are commonly used examples for regression diagnostics of influential points. Our analysis unravels the discrepancies of using our robust method versus the other penalized regression method, underscoring the importance of developing and applying robust penalized regression methods.
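
    Fitting under the exponential squared loss is straightforward to sketch: each residual r contributes 1 - exp(-r²/γ), which caps the influence of outliers. The sketch below fixes γ and omits the penalty term that drives variable selection; the data are simulated.

    ```python
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(13)
    X = np.column_stack([np.ones(100), rng.normal(size=100)])
    y = 2.0 + 3.0 * X[:, 1] + rng.normal(size=100)
    y[:5] += 25                                     # gross outliers in the response

    def exp_sq_loss(beta, gamma=2.0):
        r = y - X @ beta
        return np.sum(1.0 - np.exp(-r**2 / gamma))  # bounded loss: outliers saturate

    ols = np.linalg.lstsq(X, y, rcond=None)[0]      # least-squares start (outlier-pulled)
    fit = minimize(exp_sq_loss, x0=ols, method="Nelder-Mead")
    print("robust estimate:", fit.x)                # close to (2, 3) despite outliers
    ```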

  9. Robust Variable Selection with Exponential Squared Loss

    PubMed Central

    Wang, Xueqin; Jiang, Yunlu; Huang, Mian; Zhang, Heping

    2013-01-01

    Robust variable selection procedures through penalized regression have been gaining increased attention in the literature. They can be used to perform variable selection and are expected to yield robust estimates. However, to the best of our knowledge, the robustness of those penalized regression procedures has not been well characterized. In this paper, we propose a class of penalized robust regression estimators based on exponential squared loss. The motivation for this new procedure is that it enables us to characterize its robustness, which has not been done for the existing procedures, while its performance is near optimal and superior to some recently developed methods. Specifically, under defined regularity conditions, our estimators are √n-consistent and possess the oracle property. Importantly, we show that our estimators can achieve the highest asymptotic breakdown point of 1/2 and that their influence functions are bounded with respect to the outliers in either the response or the covariate domain. We performed simulation studies to compare our proposed method with some recent methods, using the oracle method as the benchmark. We consider common sources of influential points. Our simulation studies reveal that our proposed method performs similarly to the oracle method in terms of the model error and the positive selection rate even in the presence of influential points. In contrast, other existing procedures have a much lower non-causal selection rate. Furthermore, we re-analyze the Boston Housing Price Dataset and the Plasma Beta-Carotene Level Dataset, which are commonly used examples for regression diagnostics of influential points. Our analysis unravels the discrepancies of using our robust method versus the other penalized regression method, underscoring the importance of developing and applying robust penalized regression methods. PMID:23913996

  10. Variable selection for confounder control, flexible modeling and Collaborative Targeted Minimum Loss-based Estimation in causal inference

    PubMed Central

    Schnitzer, Mireille E.; Lok, Judith J.; Gruber, Susan

    2015-01-01

    This paper investigates the appropriateness of the integration of flexible propensity score modeling (nonparametric or machine learning approaches) in semiparametric models for the estimation of a causal quantity, such as the mean outcome under treatment. We begin with an overview of some of the issues involved in knowledge-based and statistical variable selection in causal inference and the potential pitfalls of automated selection based on the fit of the propensity score. Using a simple example, we directly show the consequences of adjusting for pure causes of the exposure when using inverse probability of treatment weighting (IPTW). Such variables are likely to be selected when using a naive approach to model selection for the propensity score. We describe how the method of Collaborative Targeted minimum loss-based estimation (C-TMLE; van der Laan and Gruber, 2010) capitalizes on the collaborative double robustness property of semiparametric efficient estimators to select covariates for the propensity score based on the error in the conditional outcome model. Finally, we compare several approaches to automated variable selection in low- and high-dimensional settings through a simulation study. From this simulation study, we conclude that using IPTW with flexible prediction for the propensity score can result in inferior estimation, while Targeted minimum loss-based estimation and C-TMLE may benefit from flexible prediction and remain robust to the presence of variables that are highly correlated with treatment. However, in our study, standard influence function-based methods for the variance underestimated the standard errors, resulting in poor coverage under certain data-generating scenarios. PMID:26226129

  11. Variable Selection for Confounder Control, Flexible Modeling and Collaborative Targeted Minimum Loss-Based Estimation in Causal Inference.

    PubMed

    Schnitzer, Mireille E; Lok, Judith J; Gruber, Susan

    2016-05-01

    This paper investigates the appropriateness of the integration of flexible propensity score modeling (nonparametric or machine learning approaches) in semiparametric models for the estimation of a causal quantity, such as the mean outcome under treatment. We begin with an overview of some of the issues involved in knowledge-based and statistical variable selection in causal inference and the potential pitfalls of automated selection based on the fit of the propensity score. Using a simple example, we directly show the consequences of adjusting for pure causes of the exposure when using inverse probability of treatment weighting (IPTW). Such variables are likely to be selected when using a naive approach to model selection for the propensity score. We describe how the method of Collaborative Targeted minimum loss-based estimation (C-TMLE; van der Laan and Gruber, 2010 [27]) capitalizes on the collaborative double robustness property of semiparametric efficient estimators to select covariates for the propensity score based on the error in the conditional outcome model. Finally, we compare several approaches to automated variable selection in low- and high-dimensional settings through a simulation study. From this simulation study, we conclude that using IPTW with flexible prediction for the propensity score can result in inferior estimation, while Targeted minimum loss-based estimation and C-TMLE may benefit from flexible prediction and remain robust to the presence of variables that are highly correlated with treatment. However, in our study, standard influence function-based methods for the variance underestimated the standard errors, resulting in poor coverage under certain data-generating scenarios.
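
    The pitfall described above is easy to reproduce: adding a pure cause of exposure (an instrument) to the propensity model leaves the IPTW point estimate roughly unbiased but inflates its variability. The following is a minimal sketch under assumed simulated data, using a Hajek-normalized IPTW estimator of the mean outcome under treatment; it is not the C-TMLE procedure itself, and all names are illustrative.

```python
# IPTW sketch: including an instrument in the propensity model pushes
# weights toward extremes and inflates variance without removing any
# bias. Hajek-normalized IPTW for the mean outcome under treatment;
# data are simulated stand-ins, not the paper's analysis.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5000
X = rng.normal(size=(n, 2))      # X[:, 0] is a confounder, X[:, 1] an instrument
A = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] + 1.5 * X[:, 1]))))
Y = 2.0 * A + X[:, 0] + rng.normal(size=n)     # true E[Y(1)] = 2

def iptw_mean_treated(covs):
    g = LogisticRegression().fit(covs, A).predict_proba(covs)[:, 1]
    return np.mean(A * Y / g) / np.mean(A / g)

print("confounder only:        ", iptw_mean_treated(X[:, [0]]))
print("confounder + instrument:", iptw_mean_treated(X))
```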

  12. Evaluation of resource selection methods with different definitions of availability

    Treesearch

    Seth A. McClean; Mark A. Rumble; Rudy M. King; William L. Baker

    1998-01-01

    Because resource selection is of paramount importance to ecology and management of any species, we compared 6 statistical methods of analyzing resource selection data, given the known biological requirements of radiomarked Merriam’s wild turkey (Meleagris gallopavo merriami) hens with poults in the Black Hills of South Dakota. A single variable,...

  13. Order Selection for General Expression of Nonlinear Autoregressive Model Based on Multivariate Stepwise Regression

    NASA Astrophysics Data System (ADS)

    Shi, Jinfei; Zhu, Songqing; Chen, Ruwen

    2017-12-01

    An order selection method based on multivariate stepwise regression is proposed for the General Expression of the Nonlinear Autoregressive (GNAR) model, which converts the model-order problem into variable selection for a multiple linear regression equation. The partial autocorrelation function is adopted to define the linear terms in the GNAR model. The result is set as the initial model, and the nonlinear terms are then introduced gradually. Statistics are chosen to assess how both the newly introduced and the previously included variables improve the model, and these statistics determine which variables to retain or eliminate. The optimal model is then obtained through goodness-of-fit measures or significance tests. Results on simulated and classic time-series data show that the proposed method is simple, reliable, and applicable in practical engineering.
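
    As a rough illustration of the idea, the sketch below runs a forward stepwise search over candidate linear and nonlinear autoregressive terms, keeping a term only while it lowers the AIC. The candidate set, data, and stopping rule are hypothetical stand-ins for the procedure in the paper.

```python
# Forward stepwise selection of autoregressive terms by AIC on a toy
# nonlinear AR(2) series; candidate terms and data are invented.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
T = 300
y = np.zeros(T)
for t in range(2, T):                       # toy nonlinear AR(2) process
    y[t] = 0.5 * y[t-1] - 0.3 * y[t-1] * y[t-2] + rng.normal(scale=0.1)

lag1, lag2, target = y[1:-1], y[:-2], y[2:]
candidates = {"y(t-1)": lag1, "y(t-2)": lag2,
              "y(t-1)^2": lag1**2, "y(t-1)y(t-2)": lag1 * lag2}

selected, best_aic = [], np.inf
improved = True
while improved:
    improved = False
    for name in (c for c in candidates if c not in selected):
        cols = np.column_stack([candidates[c] for c in selected + [name]])
        aic = sm.OLS(target, sm.add_constant(cols)).fit().aic
        if aic < best_aic:                  # track the best improving term
            best_name, best_aic, improved = name, aic, True
    if improved:
        selected.append(best_name)
print("selected terms:", selected, "| AIC:", round(best_aic, 1))
```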

  14. Selecting AGN through Variability in SN Datasets

    NASA Astrophysics Data System (ADS)

    Boutsia, K.; Leibundgut, B.; Trevese, D.; Vagnetti, F.

    2010-07-01

    Variability is a main property of Active Galactic Nuclei (AGN), and it has been adopted as a selection criterion using multi-epoch surveys conducted for the detection of supernovae (SNe). We used two SN datasets. First we selected the AXAF field of the STRESS project, centered on the Chandra Deep Field South, where various optical catalogs exist in addition to the deep X-ray surveys. Our method yielded 132 variable AGN candidates. We then extended our method to include the dataset of the ESSENCE project, which has been active for 6 years, producing high-quality light curves in the R and I bands. We obtained a sample of ~4800 variable sources, down to R=22, in the whole 12 deg² ESSENCE field. Among them, a subsample of ~500 high-priority AGN candidates was created using the shape of the structure function as a secondary criterion. In a pilot spectroscopic run we confirmed the AGN nature of nearly all of our candidates.

  15. Hyperspectral imaging for predicting the allicin and soluble solid content of garlic with variable selection algorithms and chemometric models.

    PubMed

    Rahman, Anisur; Faqeerzada, Mohammad A; Cho, Byoung-Kwan

    2018-03-14

    Allicin and soluble solid content (SSC) are responsible for the pungent flavor and odor of garlic. However, current conventional methods, such as high-pressure liquid chromatography and refractometry, have critical drawbacks in that they are time-consuming, labor-intensive and destructive. The present study aimed to predict allicin and SSC in garlic using hyperspectral imaging in combination with variable selection algorithms and calibration models. Hyperspectral images of 100 garlic cloves were acquired covering two spectral ranges, from which the mean spectra of each clove were extracted. The calibration models included partial least squares (PLS) and least squares-support vector machine (LS-SVM) regression, as well as different spectral pre-processing techniques, from which the highest-performing spectral preprocessing technique and spectral range were selected. Then, variable selection methods, such as regression coefficients, variable importance in projection (VIP) and the successive projections algorithm (SPA), were evaluated for the selection of effective wavelengths (EWs). Furthermore, PLS and LS-SVM regression methods were applied to quantitatively predict the quality attributes of garlic using the selected EWs. Of the established models, the SPA-LS-SVM model obtained an Rpred² of 0.90 and standard error of prediction (SEP) of 1.01% for SSC prediction, whereas the VIP-LS-SVM model produced the best result with an Rpred² of 0.83 and SEP of 0.19 mg g⁻¹ for allicin prediction in the range 1000-1700 nm. Furthermore, chemical images of garlic were developed using the best predictive model to facilitate visualization of the spatial distributions of allicin and SSC. The present study clearly demonstrates that hyperspectral imaging combined with an appropriate chemometrics method can potentially be employed as a fast, non-invasive method to predict the allicin and SSC in garlic. © 2018 Society of Chemical Industry.
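
    Of the criteria named above, VIP scores are straightforward to compute from a fitted PLS model. Here is a sketch using the standard single-response VIP formula on synthetic "spectra"; the attribute names come from scikit-learn's PLSRegression, and the data are stand-ins for the hyperspectral measurements.

```python
# VIP (variable importance in projection) scores from a fitted PLS
# model, the standard single-response formula; data are synthetic.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 50))            # 100 "spectra" x 50 "wavelengths"
y = X[:, 5] - 2 * X[:, 20] + rng.normal(scale=0.2, size=100)

pls = PLSRegression(n_components=4).fit(X, y)
W, T, Q = pls.x_weights_, pls.x_scores_, pls.y_loadings_
p = X.shape[1]
ss = (Q.ravel()**2) * np.sum(T**2, axis=0)   # variance explained per component
wnorm = W / np.linalg.norm(W, axis=0)        # normalized weight vectors
vip = np.sqrt(p * (wnorm**2 @ ss) / ss.sum())
print("top wavelengths by VIP:", np.argsort(vip)[::-1][:5])
```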

  16. No difference in variability of unique hue selections and binary hue selections.

    PubMed

    Bosten, J M; Lawrance-Owen, A J

    2014-04-01

    If unique hues have special status in phenomenological experience as perceptually pure, it seems reasonable to assume that they are represented more precisely by the visual system than are other colors. Following the method of Malkoc et al. (J. Opt. Soc. Am. A22, 2154 [2005]), we gathered unique and binary hue selections from 50 subjects. For these subjects we repeated the measurements in two separate sessions, allowing us to measure test-retest reliabilities (0.52≤ρ≤0.78; p≪0.01). We quantified the within-individual variability for selections of each hue. Adjusting for the differences in variability intrinsic to different regions of chromaticity space, we compared the within-individual variability for unique hues to that for binary hues. Surprisingly, we found that selections of unique hues did not show consistently lower variability than selections of binary hues. We repeated hue measurements in a single session for an independent sample of 58 subjects, using a different relative scaling of the cardinal axes of MacLeod-Boynton chromaticity space. Again, we found no consistent difference in adjusted within-individual variability for selections of unique and binary hues. Our finding does not depend on the particular scaling chosen for the Y axis of MacLeod-Boynton chromaticity space.

  17. Use of data mining techniques to classify soil CO2 emission induced by crop management in sugarcane field.

    PubMed

    Farhate, Camila Viana Vieira; Souza, Zigomar Menezes de; Oliveira, Stanley Robson de Medeiros; Tavares, Rose Luiza Moraes; Carvalho, João Luís Nunes

    2018-01-01

    Soil CO2 emissions are regarded as one of the largest flows of the global carbon cycle, and small changes in their magnitude can have a large effect on the CO2 concentration in the atmosphere. Thus, a better understanding of this attribute would enable the identification of promoters and the development of strategies to mitigate the risks of climate change. Therefore, our study aimed at using data mining techniques to predict the soil CO2 emission induced by crop management in sugarcane areas in Brazil. To do so, we used different variable selection methods (correlation, chi-square, wrapper) and classifiers (decision tree, Bayesian models, neural networks, support vector machine, bagging with logistic regression), and finally we tested the efficiency of the different approaches through the Receiver Operating Characteristic (ROC) curve. The original dataset consisted of 19 variables (18 independent variables and one dependent (or response) variable). The combination of cover crops and minimum tillage is an effective strategy to mitigate soil CO2 emissions, with average CO2 emissions of 63 kg ha⁻¹ day⁻¹. The variables soil moisture, soil temperature (Ts), rainfall, pH, and organic carbon were most frequently selected for soil CO2 emission classification by the different attribute selection methods. According to the results of the ROC curve, the best approaches for soil CO2 emission classification were the following: (I) the Multilayer Perceptron classifier with attribute selection through the wrapper method, which presented a false positive rate of 13.50%, a true positive rate of 94.20%, and an area under the curve (AUC) of 89.90%; (II) the Bagging classifier with logistic regression with attribute selection through the chi-square method, which presented a false positive rate of 13.50%, a true positive rate of 94.20%, and an AUC of 89.90%. However, approach (I) stands out in relation to (II) for its higher positive-class accuracy (high CO2 emission) and lower computational cost.

  18. Use of data mining techniques to classify soil CO2 emission induced by crop management in sugarcane field

    PubMed Central

    de Souza, Zigomar Menezes; Oliveira, Stanley Robson de Medeiros; Tavares, Rose Luiza Moraes; Carvalho, João Luís Nunes

    2018-01-01

    Soil CO2 emissions are regarded as one of the largest flows of the global carbon cycle, and small changes in their magnitude can have a large effect on the CO2 concentration in the atmosphere. Thus, a better understanding of this attribute would enable the identification of promoters and the development of strategies to mitigate the risks of climate change. Therefore, our study aimed at using data mining techniques to predict the soil CO2 emission induced by crop management in sugarcane areas in Brazil. To do so, we used different variable selection methods (correlation, chi-square, wrapper) and classifiers (decision tree, Bayesian models, neural networks, support vector machine, bagging with logistic regression), and finally we tested the efficiency of the different approaches through the Receiver Operating Characteristic (ROC) curve. The original dataset consisted of 19 variables (18 independent variables and one dependent (or response) variable). The combination of cover crops and minimum tillage is an effective strategy to mitigate soil CO2 emissions, with average CO2 emissions of 63 kg ha⁻¹ day⁻¹. The variables soil moisture, soil temperature (Ts), rainfall, pH, and organic carbon were most frequently selected for soil CO2 emission classification by the different attribute selection methods. According to the results of the ROC curve, the best approaches for soil CO2 emission classification were the following: (I) the Multilayer Perceptron classifier with attribute selection through the wrapper method, which presented a false positive rate of 13.50%, a true positive rate of 94.20%, and an area under the curve (AUC) of 89.90%; (II) the Bagging classifier with logistic regression with attribute selection through the chi-square method, which presented a false positive rate of 13.50%, a true positive rate of 94.20%, and an AUC of 89.90%. However, approach (I) stands out in relation to (II) for its higher positive-class accuracy (high CO2 emission) and lower computational cost. PMID:29513765
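
    A stripped-down version of the model comparison above can be run with scikit-learn, scoring a multilayer perceptron against bagged logistic regression by cross-validated ROC AUC. The attribute-selection stage is omitted and the dataset is synthetic, so this only mirrors the shape of the analysis.

```python
# Compare an MLP with bagged logistic regression by cross-validated
# ROC AUC; synthetic data stand in for the soil CO2 dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=400, n_features=18, random_state=0)

mlp = make_pipeline(StandardScaler(), MLPClassifier(max_iter=2000, random_state=0))
bag = BaggingClassifier(LogisticRegression(max_iter=1000), n_estimators=50,
                        random_state=0)

for name, model in [("MLP", mlp), ("Bagged LR", bag)]:
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: AUC = {auc.mean():.3f} +/- {auc.std():.3f}")
```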

  19. Missing in space: an evaluation of imputation methods for missing data in spatial analysis of risk factors for type II diabetes.

    PubMed

    Baker, Jannah; White, Nicole; Mengersen, Kerrie

    2014-11-20

    Spatial analysis is increasingly important for identifying modifiable geographic risk factors for disease. However, spatial health data from surveys are often incomplete, ranging from missing data for only a few variables to missing data for many variables. For spatial analyses of health outcomes, selection of an appropriate imputation method is critical in order to produce the most accurate inferences. We present a cross-validation approach for selecting between three imputation methods for health survey data with correlated lifestyle covariates, using, as a case study, type II diabetes mellitus (DM II) risk across 71 Queensland Local Government Areas (LGAs). We compare the accuracy of mean imputation to imputation using multivariate normal and conditional autoregressive prior distributions. The best choice of imputation method depends upon the application and is not necessarily the most complex method; mean imputation was selected as the most accurate method in this application. Selecting an appropriate imputation method for health survey data, after accounting for spatial correlation and correlation between covariates, allows more complete analysis of geographic risk factors for disease with more confidence in the results to inform public policy decision-making.

  20. The variability of software scoring of the CDMAM phantom associated with a limited number of images

    NASA Astrophysics Data System (ADS)

    Yang, Chang-Ying J.; Van Metter, Richard

    2007-03-01

    Software scoring approaches provide an attractive alternative to human evaluation of CDMAM images from digital mammography systems, particularly for annual quality control testing as recommended by the European Protocol for the Quality Control of the Physical and Technical Aspects of Mammography Screening (EPQCM). Methods for correlating CDCOM-based results with human observer performance have been proposed. A common feature of all methods is the use of a small number (at most eight) of CDMAM images to evaluate the system. This study focuses on the potential variability in the estimated system performance that is associated with these methods. Sets of 36 CDMAM images were acquired under carefully controlled conditions from three different digital mammography systems. The threshold visibility thickness (TVT) for each disk diameter was determined from the CDCOM scorings of a randomly selected group of eight images per measurement trial, using previously reported post-analysis methods. This random selection process was repeated 3000 times to estimate the variability in the resulting TVT values for each disk diameter. The results from different post-analysis methods, different random selection strategies and different digital systems were compared. Additional variability for the 0.1 mm disk diameter was explored by comparing the results from two different image datasets acquired under the same conditions from the same system. The magnitude and the type of error estimated for experimental data were explained through modeling. The modeled results also suggest a limitation in the current phantom design for the 0.1 mm diameter disks. Through modeling, it was also found that, because of the binomial nature of the CDMAM test statistics, the true variability of the test can be underestimated by the commonly used method of random re-sampling.
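
    The random-selection scheme is simple to emulate: draw 8 of the 36 images, compute an estimate from the subset, and repeat many times to see the trial-to-trial spread. In the toy sketch below, per-image Bernoulli "detections" stand in for CDCOM scorings and the subset mean stands in for a TVT estimate; all values are invented.

```python
# Resampling variability: repeatedly score a random 8-image subset of
# 36 images and examine the spread of the resulting estimates.
import numpy as np

rng = np.random.default_rng(4)
n_images, n_trials = 36, 3000
scores = rng.binomial(1, 0.7, size=n_images)   # assumed per-image detections

estimates = np.empty(n_trials)
for i in range(n_trials):
    subset = rng.choice(n_images, size=8, replace=False)
    estimates[i] = scores[subset].mean()       # stand-in for a TVT estimate

print(f"mean = {estimates.mean():.3f}, sd = {estimates.std():.3f}")
```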

  1. Multivariate calibration on NIR data: development of a model for the rapid evaluation of ethanol content in bakery products.

    PubMed

    Bello, Alessandra; Bianchi, Federica; Careri, Maria; Giannetto, Marco; Mori, Giovanni; Musci, Marilena

    2007-11-05

    A new NIR method based on multivariate calibration for the determination of ethanol in industrially packed wholemeal bread was developed and validated. GC-FID was used as the reference method for determining the actual ethanol concentration of different samples of wholemeal bread with controlled amounts of added ethanol, ranging from 0 to 3.5% (w/w). Stepwise discriminant analysis was carried out on the NIR dataset in order to reduce the number of original variables by selecting those able to discriminate between samples of different ethanol concentrations. With the variables thus selected, a multivariate calibration model was then obtained by multiple linear regression. The predictive power of the linear model was optimized by a new "leave one out" method, further reducing the number of original variables.

  2. Strategies in identifying individuals in a segregant population of common bean and implications of genotype x environment interaction in the success of selection.

    PubMed

    Mendes, M P; Ramalho, M A P; Abreu, A F B

    2012-04-10

    The objective of this study was to compare the BLUP selection method with different selection strategies in F(2:4) and to assess the efficiency of this method for the early choice of the best common bean (Phaseolus vulgaris) lines. Fifty-one F(2:4) progenies were produced from a cross between the CVIII8511 and RP-26 lines. A randomized block design was used with 20 replications and one-plant field plots. Character data on plant architecture and grain yield were obtained, and the sum of the standardized variables was then estimated for simultaneous selection of both traits. Analysis was carried out by mixed models (BLUP) and the least squares method to compare different selection strategies, such as mass selection, stratified mass selection, and between- and within-progeny selection. The progenies selected by BLUP were assessed in advanced generations, always selecting the greatest and smallest sums of the standardized variables. Analyses by the least squares method and the BLUP procedure ranked the progenies in the same way. The coincidence of the individuals identified by BLUP and by between- and within-progeny selection was high, and was greatest when BLUP was compared with mass selection. Although BLUP is the best estimator of genotypic value, its efficiency in the response to long-term selection is not different from that of the other methods, because it is likewise unable to predict the future effect of the progenies x environments interaction. It was inferred that selection success will always depend on the most accurate possible progeny assessment and on using alternatives to reduce the progenies x environments interaction effect.

  3. K-Means Algorithm Performance Analysis With Determining The Value Of Starting Centroid With Random And KD-Tree Method

    NASA Astrophysics Data System (ADS)

    Sirait, Kamson; Tulus; Budhiarti Nababan, Erna

    2017-12-01

    Clustering methods with high accuracy and time efficiency are necessary for the filtering process. One well-known and widely applied clustering method is K-Means. In its application, the initial choice of cluster centers greatly affects the results of the K-Means algorithm. This research compares the results of K-Means clustering with starting centroids determined randomly and by a KD-Tree method. On a dataset of 1,000 students' academic records used to classify potential dropouts, random initial centroids gave an SSE of 952,972 for the quality variable and 232.48 for the GPA variable, whereas KD-Tree initial centroids gave an SSE of 504,302 for the quality variable and 214.37 for the GPA variable. The smaller SSE values indicate that K-Means clustering with KD-Tree initial centroid selection is more accurate than K-Means clustering with random initial centroid selection.
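
    The comparison above reduces to computing the within-cluster sum of squared errors (SSE) under different initializations. scikit-learn exposes no KD-Tree initializer, so in this sketch k-means++ stands in for the smarter seeding; the data are synthetic, and n_init=1 keeps each run sensitive to its starting centroids.

```python
# Compare K-means SSE (inertia_) under random vs. smarter seeding;
# k-means++ is a stand-in for the KD-Tree initializer of the paper.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=1000, centers=5, random_state=0)

for init in ("random", "k-means++"):
    km = KMeans(n_clusters=5, init=init, n_init=1, random_state=0).fit(X)
    print(f"{init:10s} SSE = {km.inertia_:.1f}")
```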

  4. SVM-RFE based feature selection and Taguchi parameters optimization for multiclass SVM classifier.

    PubMed

    Huang, Mei-Ling; Hung, Yung-Hsiang; Lee, W M; Li, R K; Jiang, Bo-Ru

    2014-01-01

    The support vector machine (SVM) has shown excellent performance in classification and prediction and is widely used in disease diagnosis and medical assistance. However, the basic SVM handles only two-group classification problems. This study combines feature selection and SVM recursive feature elimination (SVM-RFE) to investigate the classification accuracy of multiclass problems for the Dermatology and Zoo databases. The Dermatology dataset contains 33 feature variables, 1 class variable, and 366 testing instances; the Zoo dataset contains 16 feature variables, 1 class variable, and 101 testing instances. The feature variables in the two datasets were sorted in descending order by explanatory power, and different feature sets were selected by SVM-RFE to explore classification accuracy. Meanwhile, the Taguchi method was combined with the SVM classifier in order to optimize the parameters C and γ and thereby increase classification accuracy for multiclass classification. The experimental results show that classification accuracy can exceed 95% after SVM-RFE feature selection and Taguchi parameter optimization for the Dermatology and Zoo databases.

  5. SVM-RFE Based Feature Selection and Taguchi Parameters Optimization for Multiclass SVM Classifier

    PubMed Central

    Huang, Mei-Ling; Hung, Yung-Hsiang; Lee, W. M.; Li, R. K.; Jiang, Bo-Ru

    2014-01-01

    The support vector machine (SVM) has shown excellent performance in classification and prediction and is widely used in disease diagnosis and medical assistance. However, the basic SVM handles only two-group classification problems. This study combines feature selection and SVM recursive feature elimination (SVM-RFE) to investigate the classification accuracy of multiclass problems for the Dermatology and Zoo databases. The Dermatology dataset contains 33 feature variables, 1 class variable, and 366 testing instances; the Zoo dataset contains 16 feature variables, 1 class variable, and 101 testing instances. The feature variables in the two datasets were sorted in descending order by explanatory power, and different feature sets were selected by SVM-RFE to explore classification accuracy. Meanwhile, the Taguchi method was combined with the SVM classifier in order to optimize the parameters C and γ and thereby increase classification accuracy for multiclass classification. The experimental results show that classification accuracy can exceed 95% after SVM-RFE feature selection and Taguchi parameter optimization for the Dermatology and Zoo databases. PMID:25295306
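
    The core SVM-RFE loop, refitting a linear SVM while recursively eliminating the weakest features, is available directly in scikit-learn. Below is a minimal sketch on synthetic data sized like the Dermatology set; the Taguchi tuning of C and γ is not shown.

```python
# SVM-RFE: a linear SVM is refit while the least important features are
# eliminated recursively; synthetic data stand in for the real sets.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=366, n_features=33, n_informative=8,
                           random_state=0)

selector = RFE(SVC(kernel="linear"), n_features_to_select=10).fit(X, y)
acc = cross_val_score(SVC(kernel="linear"), selector.transform(X), y, cv=5)
print("kept features:", list(selector.get_support(indices=True)))
print(f"CV accuracy: {acc.mean():.3f}")
```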

  6. Quality by design approach for understanding the critical quality attributes of cyclosporine ophthalmic emulsion.

    PubMed

    Rahman, Ziyaur; Xu, Xiaoming; Katragadda, Usha; Krishnaiah, Yellela S R; Yu, Lawrence; Khan, Mansoor A

    2014-03-03

    Restasis is an ophthalmic cyclosporine emulsion used for the treatment of dry eye syndrome. No generic versions of this product exist, probably because of the limitations of establishing in vivo bioequivalence and the lack of alternative in vitro bioequivalence testing methods. The present investigation was carried out to understand and identify appropriate in vitro methods that can discriminate the effect of formulation and process variables on the critical quality attributes (CQA) of cyclosporine microemulsion formulations having the same qualitative (Q1) and quantitative (Q2) composition as Restasis. A quality by design (QbD) approach was used to understand the effect of formulation and process variables on the CQAs of the cyclosporine microemulsion. The formulation variables chosen were mixing order method, phase volume ratio, and pH adjustment method, while the process variables were the temperature of primary and raw emulsion formation, microfluidizer pressure, and number of pressure cycles. The responses selected were particle size, turbidity, zeta potential, viscosity, osmolality, surface tension, contact angle, pH, and drug diffusion. The selected independent variables showed a statistically significant (p < 0.05) effect on droplet size, zeta potential, viscosity, turbidity, and osmolality. However, surface tension, contact angle, pH, and drug diffusion were not significantly affected by the independent variables. In summary, in vitro methods can detect formulation and manufacturing changes and would thus be important for quality control or for establishing the sameness of cyclosporine ophthalmic products.

  7. Characterizing the Optical Variability of Bright Blazars: Variability-based Selection of Fermi Active Galactic Nuclei

    NASA Astrophysics Data System (ADS)

    Ruan, John J.; Anderson, Scott F.; MacLeod, Chelsea L.; Becker, Andrew C.; Burnett, T. H.; Davenport, James R. A.; Ivezić, Željko; Kochanek, Christopher S.; Plotkin, Richard M.; Sesar, Branimir; Stuart, J. Scott

    2012-11-01

    We investigate the use of optical photometric variability to select and identify blazars in large-scale time-domain surveys, in part to aid in the identification of blazar counterparts to the ~30% of γ-ray sources in the Fermi 2FGL catalog still lacking reliable associations. Using data from the optical LINEAR asteroid survey, we characterize the optical variability of blazars by fitting a damped random walk model to individual light curves with two main model parameters, the characteristic timescale of variability τ and the driving amplitude on short timescales σ̂. Imposing cuts on minimum τ and σ̂ allows for blazar selection with high efficiency E and completeness C. To test the efficacy of this approach, we apply this method to optically variable LINEAR objects that fall within the several-arcminute error ellipses of γ-ray sources in the Fermi 2FGL catalog. Despite the extreme stellar contamination at the shallow depth of the LINEAR survey, we are able to recover previously associated optical counterparts to Fermi active galactic nuclei with E ≥ 88% and C = 88% in Fermi 95% confidence error ellipses having semimajor axis r < 8'. We find that the suggested radio counterpart to Fermi source 2FGL J1649.6+5238 has optical variability consistent with other γ-ray blazars and is likely to be the γ-ray source. Our results suggest that the variability of the non-thermal jet emission in blazars is stochastic in nature, with unique variability properties due to the effects of relativistic beaming. After correcting for beaming, we estimate that the characteristic timescale of blazar variability is ~3 years in the rest frame of the jet, in contrast with the ~320 day disk flux timescale observed in quasars. The variability-based selection method presented will be useful for blazar identification in time-domain optical surveys and is also a probe of jet physics.

  8. New variable selection methods for zero-inflated count data with applications to the substance abuse field

    PubMed Central

    Buu, Anne; Johnson, Norman J.; Li, Runze; Tan, Xianming

    2011-01-01

    Zero-inflated count data are very common in health surveys. This study develops new variable selection methods for the zero-inflated Poisson regression model. Our simulations demonstrate the negative consequences that arise from ignoring zero-inflation. Among the competing methods, the one-step SCAD method is recommended because it has the highest specificity, sensitivity, and exact fit, and the lowest estimation error. The design of the simulations is based on the special features of two large national databases commonly used in the alcoholism and substance abuse field, so that our findings can be readily generalized to real settings. Applications of the methodology are demonstrated by empirical analyses of data from a well-known alcohol study. PMID:21563207
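
    A zero-inflated Poisson model of the kind discussed above can be fit with statsmodels; the SCAD-penalized variants the paper develops are not implemented there, so the sketch below shows only the unpenalized ZIP fit that such a selection procedure would build on. Data and coefficients are simulated.

```python
# Unpenalized zero-inflated Poisson fit with statsmodels; the one-step
# SCAD penalty from the paper is not part of statsmodels.
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(5)
n = 1000
x = rng.normal(size=(n, 2))
counts = rng.poisson(np.exp(0.5 + 0.8 * x[:, 0]))   # mean depends on x1 only
counts[rng.random(n) < 0.3] = 0                     # 30% structural zeros

fit = ZeroInflatedPoisson(counts, sm.add_constant(x),
                          exog_infl=np.ones((n, 1)),  # intercept-only inflation
                          inflation="logit").fit(method="bfgs", maxiter=500,
                                                 disp=False)
print(fit.params)
```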

  9. Effect of genetic algorithm as a variable selection method on different chemometric models applied for the analysis of binary mixture of amoxicillin and flucloxacillin: A comparative study

    NASA Astrophysics Data System (ADS)

    Attia, Khalid A. M.; Nassar, Mohammed W. I.; El-Zeiny, Mohamed B.; Serag, Ahmed

    2016-03-01

    Different chemometric models were applied to the quantitative analysis of amoxicillin (AMX) and flucloxacillin (FLX) in their binary mixtures, namely, partial least squares (PLS), spectral residual augmented classical least squares (SRACLS), concentration residual augmented classical least squares (CRACLS) and artificial neural networks (ANNs). All methods were applied with and without a variable selection procedure (genetic algorithm, GA). The methods were used for the quantitative analysis of the drugs in laboratory-prepared mixtures and a real market sample by processing the UV spectral data. More robust and simpler models were obtained by applying GA. The proposed methods were found to be rapid, simple and to require no preliminary separation steps.

  10. Efficient robust doubly adaptive regularized regression with applications.

    PubMed

    Karunamuni, Rohana J; Kong, Linglong; Tu, Wei

    2018-01-01

    We consider the problem of estimation and variable selection for general linear regression models. Regularized regression procedures have been widely used for variable selection, but most existing methods perform poorly in the presence of outliers. We construct a new penalized procedure that simultaneously attains full efficiency and maximum robustness. Furthermore, the proposed procedure satisfies the oracle properties. The new procedure is designed to achieve sparse and robust solutions by imposing adaptive weights on both the decision loss and the penalty function. The proposed method of estimation and variable selection attains full efficiency when the model is correct and, at the same time, achieves maximum robustness when outliers are present. We examine the robustness properties using the finite-sample breakdown point and an influence function. We show that the proposed estimator attains the maximum breakdown point. Furthermore, there is no loss in efficiency when there are no outliers or the error distribution is normal. For practical implementation of the proposed method, we present a computational algorithm. We examine the finite-sample and robustness properties using Monte Carlo studies. Two datasets are also analyzed.

  11. Rank-based estimation in the {ell}1-regularized partly linear model for censored outcomes with application to integrated analyses of clinical predictors and gene expression data.

    PubMed

    Johnson, Brent A

    2009-10-01

    We consider estimation and variable selection in the partial linear model for censored data. The partial linear model for censored data is a direct extension of the accelerated failure time model, the latter of which is a very important alternative to the proportional hazards model. We extend rank-based lasso-type estimators to a model that may contain nonlinear effects. Variable selection in such a partial linear model has direct application to high-dimensional survival analyses that attempt to adjust for clinical predictors. In the microarray setting, previous methods can adjust for other clinical predictors by assuming that clinical and gene expression data enter the model linearly in the same fashion. Here, we select important variables after adjusting for prognostic clinical variables, but the clinical effects are assumed to be nonlinear. Our estimator is based on stratification and can be extended naturally to account for multiple nonlinear effects. We illustrate the utility of our method through simulation studies and application to the Wisconsin prognostic breast cancer dataset.

  12. Advanced reliability methods for structural evaluation

    NASA Technical Reports Server (NTRS)

    Wirsching, P. H.; Wu, Y.-T.

    1985-01-01

    Fast probability integration (FPI) methods, which can yield approximate solutions to such general structural reliability problems as the computation of the probabilities of complicated functions of random variables, are known to require one-tenth the computer time of Monte Carlo methods for a probability level of 0.001; lower probabilities yield even more dramatic differences. A strategy is presented in which a computer routine is run k times with selected perturbed values of the variables to obtain k solutions for a response variable Y. An approximating polynomial is fit to the k 'data' sets, and FPI methods are employed for this explicit form.
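
    The strategy in the last sentence translates almost directly into code: evaluate the expensive model at k perturbed points, fit an approximating polynomial, then sample the cheap surrogate. In the sketch below, plain Monte Carlo on the surrogate stands in for the FPI step, and the response function is an invented placeholder for a structural model.

```python
# Response-surface strategy: k expensive runs -> polynomial fit ->
# cheap sampling of the surrogate to estimate a small probability.
import numpy as np

rng = np.random.default_rng(6)

def expensive_model(u, v):                   # placeholder structural response
    return u**2 + 3.0 * v + 0.5 * u * v

def features(u, v):                          # quadratic response-surface basis
    return np.column_stack([np.ones_like(u), u, v, u**2, v**2, u * v])

k = 25
U = rng.normal(size=(k, 2))                  # k perturbed variable sets
Yk = expensive_model(U[:, 0], U[:, 1])
coef, *_ = np.linalg.lstsq(features(U[:, 0], U[:, 1]), Yk, rcond=None)

S = rng.normal(size=(1_000_000, 2))          # cheap sampling on the surrogate
y_hat = features(S[:, 0], S[:, 1]) @ coef
print("P(Y > 12) ~", np.mean(y_hat > 12.0))
```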

  13. Selection of latent variables for multiple mixed-outcome models

    PubMed Central

    ZHOU, LING; LIN, HUAZHEN; SONG, XINYUAN; LI, YI

    2014-01-01

    Latent variable models have been widely used for modeling the dependence structure of multiple-outcome data. However, the formulation of a latent variable model is often unknown a priori, and misspecification will distort the dependence structure and lead to unreliable model inference. Moreover, multiple outcomes with varying types present enormous analytical challenges. In this paper, we present a class of general latent variable models that can accommodate mixed types of outcomes. We propose a novel selection approach that simultaneously selects latent variables and estimates parameters. We show that the proposed estimator is consistent, asymptotically normal and has the oracle property. The practical utility of the methods is confirmed via simulations as well as an application to the analysis of the World Values Survey, a global research project that explores people's values and beliefs and the social and personal characteristics that might influence them. PMID:27642219

  14. Penalized regression procedures for variable selection in the potential outcomes framework

    PubMed Central

    Ghosh, Debashis; Zhu, Yeying; Coffman, Donna L.

    2015-01-01

    A recent topic of much interest in causal inference is model selection. In this article, we describe a framework in which to consider penalized regression approaches to variable selection for causal effects. The framework leads to a simple ‘impute, then select’ class of procedures that is agnostic to the type of imputation algorithm as well as penalized regression used. It also clarifies how model selection involves a multivariate regression model for causal inference problems, and that these methods can be applied for identifying subgroups in which treatment effects are homogeneous. Analogies and links with the literature on machine learning methods, missing data and imputation are drawn. A difference LASSO algorithm is defined, along with its multiple imputation analogues. The procedures are illustrated using a well-known right heart catheterization dataset. PMID:25628185

  15. Covariate selection with group lasso and doubly robust estimation of causal effects

    PubMed Central

    Koch, Brandon; Vock, David M.; Wolfson, Julian

    2017-01-01

    The efficiency of doubly robust estimators of the average causal effect (ACE) of a treatment can be improved by including in the treatment and outcome models only those covariates which are related to both treatment and outcome (i.e., confounders) or related only to the outcome. However, it is often challenging to identify such covariates among the large number that may be measured in a given study. In this paper, we propose GLiDeR (Group Lasso and Doubly Robust Estimation), a novel variable selection technique for identifying confounders and predictors of outcome using an adaptive group lasso approach that simultaneously performs coefficient selection, regularization, and estimation across the treatment and outcome models. The selected variables and corresponding coefficient estimates are used in a standard doubly robust ACE estimator. We provide asymptotic results showing that, for a broad class of data generating mechanisms, GLiDeR yields a consistent estimator of the ACE when either the outcome or treatment model is correctly specified. A comprehensive simulation study shows that GLiDeR is more efficient than doubly robust methods using standard variable selection techniques and has substantial computational advantages over a recently proposed doubly robust Bayesian model averaging method. We illustrate our method by estimating the causal treatment effect of bilateral versus single-lung transplant on forced expiratory volume in one year after transplant using an observational registry. PMID:28636276

  16. Covariate selection with group lasso and doubly robust estimation of causal effects.

    PubMed

    Koch, Brandon; Vock, David M; Wolfson, Julian

    2018-03-01

    The efficiency of doubly robust estimators of the average causal effect (ACE) of a treatment can be improved by including in the treatment and outcome models only those covariates which are related to both treatment and outcome (i.e., confounders) or related only to the outcome. However, it is often challenging to identify such covariates among the large number that may be measured in a given study. In this article, we propose GLiDeR (Group Lasso and Doubly Robust Estimation), a novel variable selection technique for identifying confounders and predictors of outcome using an adaptive group lasso approach that simultaneously performs coefficient selection, regularization, and estimation across the treatment and outcome models. The selected variables and corresponding coefficient estimates are used in a standard doubly robust ACE estimator. We provide asymptotic results showing that, for a broad class of data generating mechanisms, GLiDeR yields a consistent estimator of the ACE when either the outcome or treatment model is correctly specified. A comprehensive simulation study shows that GLiDeR is more efficient than doubly robust methods using standard variable selection techniques and has substantial computational advantages over a recently proposed doubly robust Bayesian model averaging method. We illustrate our method by estimating the causal treatment effect of bilateral versus single-lung transplant on forced expiratory volume in one year after transplant using an observational registry. © 2017, The International Biometric Society.

  17. Surface Estimation, Variable Selection, and the Nonparametric Oracle Property.

    PubMed

    Storlie, Curtis B; Bondell, Howard D; Reich, Brian J; Zhang, Hao Helen

    2011-04-01

    Variable selection for multivariate nonparametric regression is an important, yet challenging, problem due, in part, to the infinite dimensionality of the function space. An ideal selection procedure should be automatic, stable, easy to use, and have desirable asymptotic properties. In particular, we define a selection procedure to be nonparametric oracle (np-oracle) if it consistently selects the correct subset of predictors and at the same time estimates the smooth surface at the optimal nonparametric rate, as the sample size goes to infinity. In this paper, we propose a model selection procedure for nonparametric models, and explore the conditions under which the new method enjoys the aforementioned properties. Developed in the framework of smoothing spline ANOVA, our estimator is obtained via solving a regularization problem with a novel adaptive penalty on the sum of functional component norms. Theoretical properties of the new estimator are established. Additionally, numerous simulated and real examples further demonstrate that the new approach substantially outperforms other existing methods in the finite sample setting.

  18. Surface Estimation, Variable Selection, and the Nonparametric Oracle Property

    PubMed Central

    Storlie, Curtis B.; Bondell, Howard D.; Reich, Brian J.; Zhang, Hao Helen

    2010-01-01

    Variable selection for multivariate nonparametric regression is an important, yet challenging, problem due, in part, to the infinite dimensionality of the function space. An ideal selection procedure should be automatic, stable, easy to use, and have desirable asymptotic properties. In particular, we define a selection procedure to be nonparametric oracle (np-oracle) if it consistently selects the correct subset of predictors and at the same time estimates the smooth surface at the optimal nonparametric rate, as the sample size goes to infinity. In this paper, we propose a model selection procedure for nonparametric models, and explore the conditions under which the new method enjoys the aforementioned properties. Developed in the framework of smoothing spline ANOVA, our estimator is obtained via solving a regularization problem with a novel adaptive penalty on the sum of functional component norms. Theoretical properties of the new estimator are established. Additionally, numerous simulated and real examples further demonstrate that the new approach substantially outperforms other existing methods in the finite sample setting. PMID:21603586

  19. Classification of 'Chemlali' accessions according to the geographical area using chemometric methods of phenolic profiles analysed by HPLC-ESI-TOF-MS.

    PubMed

    Taamalli, Amani; Arráez Román, David; Zarrouk, Mokhtar; Segura-Carretero, Antonio; Fernández-Gutiérrez, Alberto

    2012-05-01

    The present work describes a classification method for Tunisian 'Chemlali' olive oils based on their phenolic composition and geographical area. For this purpose, data obtained by HPLC-ESI-TOF-MS from 13 samples of extra virgin olive oil, collected from different production areas throughout the country, were used, focusing on the 23 phenolic compounds detected. The quantitative results showed significant variability among the analysed oil samples. Factor analysis using principal components was applied to the data in order to reduce the number of factors that explain the variability of the selected compounds. The data matrix thus constructed was subjected to a canonical discriminant analysis (CDA) in order to classify the oil samples. The results showed that 100% of cross-validated original group cases were correctly classified, which proves the usefulness of the selected variables. Copyright © 2011 Elsevier Ltd. All rights reserved.
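
    The analysis pipeline, dimension reduction followed by a discriminant classifier, is easy to reproduce in outline. In the sketch below, scikit-learn's LDA stands in for canonical discriminant analysis and a random matrix stands in for the 13 x 23 table of phenolic quantities, with leave-one-out validation mirroring the cross-validated classification.

```python
# Dimension reduction + discriminant classification with leave-one-out
# validation; LDA approximates CDA and the data are synthetic.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
X = rng.normal(size=(13, 23))               # 13 oils x 23 phenolic compounds
y = np.repeat([0, 1, 2], [5, 4, 4])         # three hypothetical growing areas
X[y == 1] += 1.5                            # inject group separation
X[y == 2] -= 1.5

model = make_pipeline(StandardScaler(), PCA(n_components=4),
                      LinearDiscriminantAnalysis())
acc = cross_val_score(model, X, y, cv=LeaveOneOut())
print(f"LOO accuracy: {acc.mean():.2f}")
```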

  20. Study of a Vocal Feature Selection Method and Vocal Properties for Discriminating Four Constitution Types

    PubMed Central

    Kim, Keun Ho; Ku, Boncho; Kang, Namsik; Kim, Young-Su; Jang, Jun-Su; Kim, Jong Yeol

    2012-01-01

    The voice has been used in traditional Korean medicine to classify the four constitution types and to recognize a subject's health condition by extracting meaningful physical quantities. In this paper, we propose a method for selecting reliable variables from various voice features, such as frequency-derivative features, frequency band ratios, and intensity, from vowels and a sentence. Further, we suggest a process for extracting independent variables by eliminating explanatory variables, reducing their correlation, and removing outlying data to enable reliable discriminant analysis. Moreover, the suitable division of data for analysis, according to the gender and age of subjects, is discussed. Finally, the vocal features are applied to a discriminant analysis to classify each constitution type. This method of voice classification can be widely used in the u-Healthcare system of personalized medicine and for improving diagnostic accuracy. PMID:22529874

  1. Adhesive bonding using variable frequency microwave energy

    DOEpatents

    Lauf, Robert J.; McMillan, April D.; Paulauskas, Felix L.; Fathi, Zakaryae; Wei, Jianghua

    1998-01-01

    Methods of facilitating the adhesive bonding of various components with variable frequency microwave energy are disclosed. The time required to cure a polymeric adhesive is decreased by placing components to be bonded via the adhesive in a microwave heating apparatus having a multimode cavity and irradiated with microwaves of varying frequencies. Methods of uniformly heating various articles having conductive fibers disposed therein are provided. Microwave energy may be selectively oriented to enter an edge portion of an article having conductive fibers therein. An edge portion of an article having conductive fibers therein may be selectively shielded from microwave energy.

  2. Adhesive bonding using variable frequency microwave energy

    DOEpatents

    Lauf, R.J.; McMillan, A.D.; Paulauskas, F.L.; Fathi, Z.; Wei, J.

    1998-08-25

    Methods of facilitating the adhesive bonding of various components with variable frequency microwave energy are disclosed. The time required to cure a polymeric adhesive is decreased by placing components to be bonded via the adhesive in a microwave heating apparatus having a multimode cavity and irradiated with microwaves of varying frequencies. Methods of uniformly heating various articles having conductive fibers disposed therein are provided. Microwave energy may be selectively oriented to enter an edge portion of an article having conductive fibers therein. An edge portion of an article having conductive fibers therein may be selectively shielded from microwave energy. 26 figs.

  3. Adhesive bonding using variable frequency microwave energy

    DOEpatents

    Lauf, R.J.; McMillan, A.D.; Paulauskas, F.L.; Fathi, Z.; Wei, J.

    1998-09-08

    Methods of facilitating the adhesive bonding of various components with variable frequency microwave energy are disclosed. The time required to cure a polymeric adhesive is decreased by placing components to be bonded via the adhesive in a microwave heating apparatus having a multimode cavity and irradiated with microwaves of varying frequencies. Methods of uniformly heating various articles having conductive fibers disposed therein are provided. Microwave energy may be selectively oriented to enter an edge portion of an article having conductive fibers therein. An edge portion of an article having conductive fibers therein may be selectively shielded from microwave energy. 26 figs.

  4. VARIABLE SELECTION FOR QUALITATIVE INTERACTIONS IN PERSONALIZED MEDICINE WHILE CONTROLLING THE FAMILY-WISE ERROR RATE

    PubMed Central

    Gunter, Lacey; Zhu, Ji; Murphy, Susan

    2012-01-01

    For many years, subset analysis has been a popular topic in the biostatistics and clinical trials literature. In more recent years, the discussion has focused on finding subsets of genomes which play a role in the effect of treatment, often referred to as stratified or personalized medicine. Though highly sought after, methods for detecting subsets with differing treatment effects are limited and lacking in power. In this article we discuss variable selection for qualitative interactions with the aim of discovering these critical patient subsets. We propose a new technique designed specifically to find these interaction variables among a large set of variables while still controlling the number of false discoveries. We compare this new method against standard qualitative interaction tests using simulations and give an example of its use on data from a randomized controlled trial for the treatment of depression. PMID:22023676

  5. Identification of Genes Involved in Breast Cancer Metastasis by Integrating Protein-Protein Interaction Information with Expression Data.

    PubMed

    Tian, Xin; Xin, Mingyuan; Luo, Jian; Liu, Mingyao; Jiang, Zhenran

    2017-02-01

    The selection of relevant genes for breast cancer metastasis is critical for the treatment and prognosis of cancer patients. Although much effort has been devoted to gene selection procedures using different statistical analysis methods or computational techniques, the interpretation of the variables in the resulting survival models has been limited so far. This article proposes a new Random Forest (RF)-based algorithm to identify important variables highly related to breast cancer metastasis, based on the importance scores of two variable selection algorithms: the mean decrease Gini (MDG) criterion of Random Forest and the GeneRank algorithm with protein-protein interaction (PPI) information. The new gene selection algorithm is called PPIRF. The improved prediction accuracy illustrates the reliability and high interpretability of the gene list selected by the PPIRF approach.
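
    A rough rendering of the two ingredients PPIRF draws on: mean-decrease-Gini importances from a random forest, and a GeneRank-style network score, approximated here by personalized PageRank on a random stand-in for the PPI graph. The equal-weight merge rule below is an invented illustration, not the published algorithm.

```python
# Combine random-forest MDG importances with a network score from
# personalized PageRank on a hypothetical PPI graph.
import networkx as nx
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           random_state=0)

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
mdg = rf.feature_importances_                 # mean decrease Gini

G = nx.erdos_renyi_graph(20, 0.15, seed=0)    # stand-in PPI network
pr = nx.pagerank(G, alpha=0.85,
                 personalization=dict(enumerate(mdg / mdg.sum())))
net = np.array([pr[i] for i in range(20)])

combined = 0.5 * mdg / mdg.max() + 0.5 * net / net.max()
print("top-ranked genes:", np.argsort(combined)[::-1][:5])
```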

  6. An improved partial least-squares regression method for Raman spectroscopy

    NASA Astrophysics Data System (ADS)

    Momenpour Tehran Monfared, Ali; Anis, Hanan

    2017-10-01

    It is known that the performance of partial least-squares (PLS) regression analysis can be improved using the backward variable selection method (BVSPLS). In this paper, we further improve BVSPLS with a novel selection mechanism. The proposed method is based on sorting the weighted regression coefficients, after which the importance of each variable in the sorted list is evaluated using the root mean square error of prediction (RMSEP) criterion at each iteration step. Our Improved BVSPLS (IBVSPLS) method has been applied to leukemia and heparin datasets and led to an improvement in the limit of detection of Raman biosensing ranging from 10% to 43% compared to PLS. Our IBVSPLS was also compared to the jack-knifing (simpler) and Genetic Algorithm (more complex) methods. Our method was consistently better than the jack-knifing method and showed a similar or better performance compared to the genetic algorithm.
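
    The backward-elimination core of BVSPLS can be sketched as a loop that drops, at each step, the variable whose removal most improves cross-validated RMSEP. The weighted-coefficient sorting that distinguishes IBVSPLS is not reproduced; data and sizes are synthetic.

```python
# Backward variable selection around PLS, scored by cross-validated
# RMSEP; a generic BVSPLS-style loop, not the IBVSPLS refinement.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(8)
X = rng.normal(size=(80, 15))
y = X[:, 0] - 2 * X[:, 3] + rng.normal(scale=0.3, size=80)

def rmsep(cols):
    pred = cross_val_predict(PLSRegression(n_components=2), X[:, cols], y, cv=5)
    return np.sqrt(np.mean((y - pred.ravel())**2))

keep = list(range(15))
best = rmsep(keep)
while len(keep) > 2:
    err, victim = min((rmsep([c for c in keep if c != v]), v) for v in keep)
    if err >= best:
        break                               # no removal helps any more
    keep.remove(victim)
    best = err
print("kept variables:", keep, f"| RMSEP = {best:.3f}")
```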

  7. [Study on Application of NIR Spectral Information Screening in Identification of Maca Origin].

    PubMed

    Wang, Yuan-zhong; Zhao, Yan-li; Zhang, Ji; Jin, Hang

    2016-02-01

    The medicinal and edible plant Maca is rich in various nutrients and has great medicinal value. Based on near-infrared diffuse reflectance spectra, 139 Maca samples collected from Peru and Yunnan were used to identify their geographical origins. Multiplicative signal correction (MSC) coupled with second derivative (SD) and Norris derivative filtering (ND) was employed in spectral pretreatment. The spectral range (7,500-4,061 cm⁻¹) was chosen based on the spectral standard deviation. Combined with principal component analysis-Mahalanobis distance (PCA-MD), the appropriate number of principal components was selected as 5. Based on the spectral range and the number of principal components selected, two abnormal samples were eliminated by the modular group iterative singular sample diagnosis method. Then, four methods were used to filter spectral variable information: competitive adaptive reweighted sampling (CARS), Monte Carlo-uninformative variable elimination (MC-UVE), genetic algorithm (GA) and subwindow permutation analysis (SPA). The spectral variable information filtered was evaluated by model population analysis (MPA). The results showed that RMSECV(SPA) > RMSECV(CARS) > RMSECV(MC-UVE) > RMSECV(GA), at 2.14, 2.05, 2.02, and 1.98, and the numbers of spectral variables were 250, 240, 250 and 70, respectively. According to the spectral variables filtered, partial least squares discriminant analysis (PLS-DA) was used to build the model, with a random selection of 97 samples as the training set and the other 40 samples as the validation set. The results showed that, for R²: GA > MC-UVE > CARS > SPA; for RMSEC and RMSEP: GA < MC-UVE < CARS

  8. Preformulation considerations for controlled release dosage forms. Part III. Candidate form selection using numerical weighting and scoring.

    PubMed

    Chrzanowski, Frank

    2008-01-01

    Two numerical methods, Decision Analysis (DA) and Potential Problem Analysis (PPA), are presented as alternative selection methods to the logical method presented in Part I. In DA, properties are weighted and outcomes are scored. The weighted scores for each candidate are totaled, and final selection is based on the totals. Higher scores indicate better candidates. In PPA, potential problems are assigned a seriousness factor, and test outcomes are used to define the probability of occurrence. The seriousness-probability products are totaled, and forms with minimal scores are preferred. DA and PPA have never been compared to the logical-elimination method. Additional data were available for two forms of McN-5707, providing complete preformulation data for five candidate forms. Weight and seriousness factors (independent variables) were obtained from a survey of experienced formulators. Scores and probabilities (dependent variables) were provided independently by Preformulation. The rankings of the five candidate forms, best to worst, were similar for all three methods. These results validate the applicability of DA and PPA for candidate form selection. DA and PPA are particularly applicable in cases where there are many candidate forms and where each form has some degree of unfavorable properties.
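
    The DA arithmetic is a plain weighted sum, as in this invented three-property, two-candidate example; properties, weights, and scores are all hypothetical.

```python
# Decision Analysis scoring: weight each property, score each outcome,
# total the weighted scores per candidate; higher totals are better.
weights = {"solubility": 5, "stability": 4, "hygroscopicity": 2}
scores = {
    "Form A": {"solubility": 8, "stability": 6, "hygroscopicity": 9},
    "Form B": {"solubility": 6, "stability": 9, "hygroscopicity": 4},
}
for form, s in scores.items():
    total = sum(weights[p] * s[p] for p in weights)
    print(form, "weighted total =", total)
```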

  9. Classification and quantitation of milk powder by near-infrared spectroscopy and mutual information-based variable selection and partial least squares

    NASA Astrophysics Data System (ADS)

    Chen, Hui; Tan, Chao; Lin, Zan; Wu, Tong

    2018-01-01

    Milk is among the most popular nutrient sources worldwide and is of great interest due to its beneficial medicinal properties. The feasibility of classifying milk powder samples with respect to their brands, and of determining protein concentration, is investigated by NIR spectroscopy along with chemometrics. Two datasets were prepared for the experiment. One contains 179 samples of four brands for classification, and the other contains 30 samples for quantitative analysis. Principal component analysis (PCA) was used for exploratory analysis. Based on an effective model-independent variable selection method, i.e., minimal-redundancy maximal-relevance (MRMR), only 18 variables were selected to construct a partial least-squares discriminant analysis (PLS-DA) model. On the test set, the PLS-DA model based on the selected variable set was compared with the full-spectrum PLS-DA model, both of which achieved 100% accuracy. In the quantitative analysis, the partial least-squares regression (PLSR) model constructed from the selected subset of 260 variables significantly outperforms the full-spectrum model. The combination of NIR spectroscopy, MRMR and PLS-DA or PLSR appears to be a powerful tool for classifying different brands of milk and determining protein content.
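
    MRMR-style selection can be sketched as a greedy loop: at each step, pick the feature with the largest relevance (mutual information with the class) minus average redundancy (mutual information with already-selected features). The sketch below uses scikit-learn's MI estimators on synthetic data and is a generic MRMR, not the exact variant used in the paper.

```python
# Greedy MRMR: maximize relevance to the class while penalizing
# redundancy with already-selected features; synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

X, y = make_classification(n_samples=300, n_features=30, n_informative=6,
                           random_state=0)

relevance = mutual_info_classif(X, y, random_state=0)
selected = [int(np.argmax(relevance))]
while len(selected) < 8:
    best_score, best_j = -np.inf, None
    for j in range(X.shape[1]):
        if j in selected:
            continue
        redundancy = np.mean([
            mutual_info_regression(X[:, [k]], X[:, j], random_state=0)[0]
            for k in selected])
        if relevance[j] - redundancy > best_score:
            best_score, best_j = relevance[j] - redundancy, j
    selected.append(best_j)
print("MRMR-selected features:", selected)
```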

  10. Comparison of statistical methods for detection of serum lipid biomarkers for mesothelioma and asbestos exposure.

    PubMed

    Xu, Rengyi; Mesaros, Clementina; Weng, Liwei; Snyder, Nathaniel W; Vachani, Anil; Blair, Ian A; Hwang, Wei-Ting

    2017-07-01

    We compared three statistical methods for selecting a panel of serum lipid biomarkers for mesothelioma and asbestos exposure. Serum samples from mesothelioma, asbestos-exposed subjects and controls (40 per group) were analyzed. Three variable selection methods were considered: top-ranked predictors from univariate models, stepwise selection, and the least absolute shrinkage and selection operator (LASSO). Cross-validated area under the receiver operating characteristic curve was used to compare prediction performance. Lipids with high cross-validated area under the curve were identified. The lipid with a mass-to-charge ratio of 372.31 was selected by all three methods when comparing mesothelioma versus control. Lipids with mass-to-charge ratios of 1464.80 and 329.21 were selected by two models for asbestos exposure versus control. Different methods selected a similar set of serum lipids. Combining candidate biomarkers can improve prediction.
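
    Of the three methods, the LASSO step is the most compact to sketch: an L1-penalized logistic regression selects a lipid panel and reports a cross-validated AUC. The regularization strength C is an illustrative choice, not a value from the study:

      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler

      def lasso_panel(X, y, C=0.1):
          # X: n x p matrix of lipid intensities; y: 0/1 group labels.
          model = make_pipeline(
              StandardScaler(),
              LogisticRegression(penalty="l1", solver="liblinear", C=C))
          cv_auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
          model.fit(X, y)
          coefs = model.named_steps["logisticregression"].coef_.ravel()
          return np.flatnonzero(coefs), cv_auc  # indices of retained lipids, CV AUC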

  11. C-fuzzy variable-branch decision tree with storage and classification error rate constraints

    NASA Astrophysics Data System (ADS)

    Yang, Shiueng-Bien

    2009-10-01

    The C-fuzzy decision tree (CFDT), which is based on the fuzzy C-means algorithm, has recently been proposed. The CFDT is grown by selecting the nodes to be split according to their classification error rates. However, the CFDT design does not consider the classification time taken to classify the input vector. Thus, the CFDT can be improved. We propose a new C-fuzzy variable-branch decision tree (CFVBDT) with storage and classification error rate constraints. The design of the CFVBDT consists of two phases: growing and pruning. The CFVBDT is grown by selecting the nodes to be split according to the classification error rate and the classification time in the decision tree. Additionally, the pruning method selects the nodes to prune based on the storage requirement and the classification time of the CFVBDT. Furthermore, the number of branches of each internal node is variable in the CFVBDT. Experimental results indicate that the proposed CFVBDT outperforms the CFDT and other methods.

  12. The predictive validity of a situational judgement test, a clinical problem solving test and the core medical training selection methods for performance in specialty training.

    PubMed

    Patterson, Fiona; Lopes, Safiatu; Harding, Stephen; Vaux, Emma; Berkin, Liz; Black, David

    2017-02-01

    The aim of this study was to follow up a sample of physicians who began core medical training (CMT) in 2009. This paper examines the long-term validity of CMT and GP selection methods in predicting performance in the Membership of the Royal College of Physicians (MRCP(UK)) examinations. We performed a longitudinal study, examining the extent to which the GP and CMT selection methods (T1) predict performance in the MRCP(UK) examinations (T2). A total of 2,569 applicants from 2008-09 who completed CMT and GP selection methods were included in the study. Looking at MRCP(UK) Part 1, Part 2 Written and PACES scores, both CMT and GP selection methods show evidence of predictive validity for the outcome variables, and hierarchical regressions show the GP methods add significant value to the CMT selection process. CMT selection methods predict performance in important outcomes and have good evidence of validity; the GP methods may have an additional role alongside the CMT selection methods. © Royal College of Physicians 2017. All rights reserved.

  13. Variable mechanical ventilation

    PubMed Central

    Fontela, Paula Caitano; Prestes, Renata Bernardy; Forgiarini Jr., Luiz Alberto; Friedman, Gilberto

    2017-01-01

    Objective To review the literature on the use of variable mechanical ventilation and the main outcomes of this technique. Methods Search, selection, and analysis of all original articles on variable ventilation, without restriction on the period of publication and language, available in the electronic databases LILACS, MEDLINE®, and PubMed, by searching the terms "variable ventilation" OR "noisy ventilation" OR "biologically variable ventilation". Results A total of 36 studies were selected. Of these, 24 were original studies, including 21 experimental studies and three clinical studies. Conclusion Several experimental studies reported the beneficial effects of distinct variable ventilation strategies on lung function using different models of lung injury and healthy lungs. Variable ventilation seems to be a viable strategy for improving gas exchange and respiratory mechanics and preventing lung injury associated with mechanical ventilation. However, further clinical studies are necessary to assess the potential of variable ventilation strategies for the clinical improvement of patients undergoing mechanical ventilation. PMID:28444076

  14. Advances in variable selection methods I: Causal selection methods versus stepwise regression and principal component analysis on data of known and unknown functional relationships

    EPA Science Inventory

    Hydrological predictions at a watershed scale are commonly based on extrapolation and upscaling of hydrological behavior at plot and hillslope scales. Yet, dominant hydrological drivers at a hillslope may not be as dominant at the watershed scale because of the heterogeneity of w...

  15. Quantifying natural delta variability using a multiple-point geostatistics prior uncertainty model

    NASA Astrophysics Data System (ADS)

    Scheidt, Céline; Fernandes, Anjali M.; Paola, Chris; Caers, Jef

    2016-10-01

    We address the question of quantifying uncertainty associated with autogenic pattern variability in a channelized transport system by means of a modern geostatistical method. This question has considerable relevance for practical subsurface applications as well, particularly those related to uncertainty quantification relying on Bayesian approaches. Specifically, we show how the autogenic variability in a laboratory experiment can be represented and reproduced by a multiple-point geostatistical prior uncertainty model. The latter geostatistical method requires selection of a limited set of training images from which a possibly infinite set of geostatistical model realizations, mimicking the training image patterns, can be generated. To that end, we investigate two methods to determine how many training images and what training images should be provided to reproduce natural autogenic variability. The first method relies on distance-based clustering of overhead snapshots of the experiment; the second method relies on a rate of change quantification by means of a computer vision algorithm termed the demon algorithm. We show quantitatively that with either training image selection method, we can statistically reproduce the natural variability of the delta formed in the experiment. In addition, we study the nature of the patterns represented in the set of training images as a representation of the "eigenpatterns" of the natural system. The eigenpatterns in the training image sets display patterns consistent with previous physical interpretations of the fundamental modes of this type of delta system: a highly channelized, incisional mode; a poorly channelized, depositional mode; and an intermediate mode between the two.

  16. Using Propensity Score Methods to Approximate Factorial Experimental Designs to Analyze the Relationship between Two Variables and an Outcome

    ERIC Educational Resources Information Center

    Dong, Nianbo

    2015-01-01

    Researchers have become increasingly interested in programs' main and interaction effects of two variables (A and B, e.g., two treatment variables or one treatment variable and one moderator) on outcomes. A challenge for estimating main and interaction effects is to eliminate selection bias across A-by-B groups. I introduce Rubin's causal model to…

  17. Selectivity in reversed-phase separations: general influence of solvent type and mobile phase pH.

    PubMed

    Neue, Uwe D; Méndez, Alberto

    2007-05-01

    The influence of the mobile phase on retention is studied in this paper for a group of over 70 compounds with a broad range of multiple functional groups. We varied the pH of the mobile phase (pH 3, 7, and 10) and the organic modifier (methanol, acetonitrile (ACN), and tetrahydrofuran (THF)), using 15 different stationary phases. In this paper, we describe the overall retention and selectivity changes observed with these variables. We focus on the primary effects of solvent choice and pH. For example, transfer rules for solvent composition resulting in equivalent retention depend on the packing as well as on the type of analyte. Based on the retention patterns, one can calculate selectivity difference values for different variables. The selectivity difference is a measure of the importance of the different variables involved in method development. Selectivity changes specific to the type of analyte are described. The largest selectivity differences are obtained with pH changes.

  18. Application of different spectrophotometric methods for simultaneous determination of elbasvir and grazoprevir in pharmaceutical preparation

    NASA Astrophysics Data System (ADS)

    Attia, Khalid A. M.; El-Abasawi, Nasr M.; El-Olemy, Ahmed; Abdelazim, Ahmed H.

    2018-01-01

    Three UV spectrophotometric methods have been developed for the first time for the simultaneous determination of two newly FDA-approved drugs, elbasvir and grazoprevir, in their combined pharmaceutical dosage form. These methods include the simultaneous equation method and partial least squares with and without a variable selection procedure (genetic algorithm). For the simultaneous equation method, the absorbance values at 369 nm (λmax of elbasvir) and 253 nm (λmax of grazoprevir) have been selected to form the two simultaneous equations required for the mathematical processing and quantitative analysis of the studied drugs. Alternatively, partial least squares with and without the variable selection procedure (genetic algorithm) has been applied to the spectral analysis, because the simultaneous inclusion of many wavelengths, rather than a single or dual wavelength, greatly increases the precision and predictive ability of the methods. The drugs were successfully assayed in their pharmaceutical formulation by the proposed methods. A statistical comparison of the obtained results with those of the manufacturers' methods has been performed. It is noteworthy that there was no significant difference between the proposed methods and the manufacturers' methods with respect to the validation parameters.

  19. Selecting Optimal Random Forest Predictive Models: A Case Study on Predicting the Spatial Distribution of Seabed Hardness

    PubMed Central

    Li, Jin; Tran, Maggie; Siwabessy, Justy

    2016-01-01

    Spatially continuous predictions of seabed hardness are important baseline environmental information for sustainable management of Australia’s marine jurisdiction. Seabed hardness is often inferred from multibeam backscatter data with unknown accuracy and can be inferred from underwater video footage at limited locations. In this study, we classified the seabed into four classes based on two new seabed hardness classification schemes (i.e., hard90 and hard70). We developed optimal predictive models to predict seabed hardness using random forest (RF) based on the point data of hardness classes and spatially continuous multibeam data. Five feature selection (FS) methods, namely variable importance (VI), averaged variable importance (AVI), knowledge-informed AVI (KIAVI), Boruta and regularized RF (RRF), were tested based on predictive accuracy. Effects of highly correlated, important and unimportant predictors on the accuracy of RF predictive models were examined. Finally, spatial predictions generated using the most accurate models were visually examined and analysed. This study confirmed that: 1) hard90 and hard70 are effective seabed hardness classification schemes; 2) seabed hardness of four classes can be predicted with a high degree of accuracy; 3) the typical approach used to pre-select predictive variables by excluding highly correlated variables needs to be re-examined; 4) the identification of the important and unimportant predictors provides useful guidelines for further improving predictive models; 5) FS methods select the most accurate predictive model(s) instead of the most parsimonious ones, and AVI and Boruta are recommended for future studies; and 6) RF is an effective modelling method with high predictive accuracy for multi-level categorical data and can be applied to ‘small p and large n’ problems in environmental sciences. Additionally, automated computational programs for AVI need to be developed to increase its computational efficiency and caution should be taken when applying filter FS methods in selecting predictive models. PMID:26890307

  20. Selecting Optimal Random Forest Predictive Models: A Case Study on Predicting the Spatial Distribution of Seabed Hardness.

    PubMed

    Li, Jin; Tran, Maggie; Siwabessy, Justy

    2016-01-01

    Spatially continuous predictions of seabed hardness are important baseline environmental information for sustainable management of Australia's marine jurisdiction. Seabed hardness is often inferred from multibeam backscatter data with unknown accuracy and can be inferred from underwater video footage at limited locations. In this study, we classified the seabed into four classes based on two new seabed hardness classification schemes (i.e., hard90 and hard70). We developed optimal predictive models to predict seabed hardness using random forest (RF) based on the point data of hardness classes and spatially continuous multibeam data. Five feature selection (FS) methods, namely variable importance (VI), averaged variable importance (AVI), knowledge-informed AVI (KIAVI), Boruta and regularized RF (RRF), were tested based on predictive accuracy. Effects of highly correlated, important and unimportant predictors on the accuracy of RF predictive models were examined. Finally, spatial predictions generated using the most accurate models were visually examined and analysed. This study confirmed that: 1) hard90 and hard70 are effective seabed hardness classification schemes; 2) seabed hardness of four classes can be predicted with a high degree of accuracy; 3) the typical approach used to pre-select predictive variables by excluding highly correlated variables needs to be re-examined; 4) the identification of the important and unimportant predictors provides useful guidelines for further improving predictive models; 5) FS methods select the most accurate predictive model(s) instead of the most parsimonious ones, and AVI and Boruta are recommended for future studies; and 6) RF is an effective modelling method with high predictive accuracy for multi-level categorical data and can be applied to 'small p and large n' problems in environmental sciences. Additionally, automated computational programs for AVI need to be developed to increase its computational efficiency and caution should be taken when applying filter FS methods in selecting predictive models.

  1. Model-Averaged ℓ1 Regularization using Markov Chain Monte Carlo Model Composition

    PubMed Central

    Fraley, Chris; Percival, Daniel

    2014-01-01

    Bayesian Model Averaging (BMA) is an effective technique for addressing model uncertainty in variable selection problems. However, current BMA approaches have computational difficulty dealing with data in which there are many more measurements (variables) than samples. This paper presents a method for combining ℓ1 regularization and Markov chain Monte Carlo model composition techniques for BMA. By treating the ℓ1 regularization path as a model space, we propose a method to resolve the model uncertainty issues arising in model averaging from solution path point selection. We show that this method is computationally and empirically effective for regression and classification in high-dimensional datasets. We apply our technique in simulations, as well as to some applications that arise in genomics. PMID:25642001

  2. Sparse representation based biomarker selection for schizophrenia with integrated analysis of fMRI and SNPs.

    PubMed

    Cao, Hongbao; Duan, Junbo; Lin, Dongdong; Shugart, Yin Yao; Calhoun, Vince; Wang, Yu-Ping

    2014-11-15

    Integrative analysis of multiple data types can take advantage of their complementary information and therefore may provide higher power to identify potential biomarkers that would be missed using individual data analysis. Due to the different natures of diverse data modalities, data integration is challenging. Here we address the data integration problem by developing a generalized sparse model (GSM) using weighting factors to integrate multi-modality data for biomarker selection. As an example, we applied the GSM model to a joint analysis of two types of schizophrenia data sets: 759,075 SNPs and 153,594 functional magnetic resonance imaging (fMRI) voxels in 208 subjects (92 cases/116 controls). To solve this small-sample-large-variable problem, we developed a novel sparse representation based variable selection (SRVS) algorithm, with the primary aim of identifying biomarkers associated with schizophrenia. To validate the effectiveness of the selected variables, we performed multivariate classification followed by a ten-fold cross validation. We compared our proposed SRVS algorithm with an earlier sparse model based variable selection algorithm for integrated analysis. In addition, we compared with the traditional statistical method for univariate data analysis (Chi-squared test for SNP data and ANOVA for fMRI data). Results showed that our proposed SRVS method can identify novel biomarkers that show stronger capability in distinguishing schizophrenia patients from healthy controls. Moreover, better classification ratios were achieved using biomarkers from both types of data, suggesting the importance of integrative analysis. Copyright © 2014 Elsevier Inc. All rights reserved.

  3. The Cramér-Rao Bounds and Sensor Selection for Nonlinear Systems with Uncertain Observations.

    PubMed

    Wang, Zhiguo; Shen, Xiaojing; Wang, Ping; Zhu, Yunmin

    2018-04-05

    This paper considers the problems of the posterior Cramér-Rao bound and sensor selection for multi-sensor nonlinear systems with uncertain observations. In order to effectively overcome the difficulties caused by uncertainty, we investigate two methods to derive the posterior Cramér-Rao bound. The first method is based on the recursive formula of the Cramér-Rao bound and the Gaussian mixture model. Nevertheless, it needs to compute a complex integral based on the joint probability density function of the sensor measurements and the target state. The computation burden of this method is relatively high, especially in large sensor networks. Inspired by the idea of the expectation maximization algorithm, the second method introduces some 0-1 latent variables to deal with the Gaussian mixture model. Since the regularity condition of the posterior Cramér-Rao bound is not satisfied for the discrete uncertain system, we use some continuous variables to approximate the discrete latent variables. Then, a new Cramér-Rao bound can be achieved by a limiting process of the Cramér-Rao bound of the continuous system. This avoids the complex integral, which reduces the computation burden. Based on the new posterior Cramér-Rao bound, the optimal solution of the sensor selection problem can be derived analytically. Thus, it can be used to deal with the sensor selection of large-scale sensor networks. Two typical numerical examples verify the effectiveness of the proposed methods.

  4. Sampling in health geography: reconciling geographical objectives and probabilistic methods. An example of a health survey in Vientiane (Lao PDR)

    PubMed Central

    Vallée, Julie; Souris, Marc; Fournet, Florence; Bochaton, Audrey; Mobillion, Virginie; Peyronnie, Karine; Salem, Gérard

    2007-01-01

    Background Geographical objectives and probabilistic methods are difficult to reconcile in a unique health survey. Probabilistic methods focus on individuals to provide estimates of a variable's prevalence with a certain precision, while geographical approaches emphasise the selection of specific areas to study interactions between spatial characteristics and health outcomes. A sample selected from a small number of specific areas creates statistical challenges: the observations are not independent at the local level, and this results in poor statistical validity at the global level. Therefore, it is difficult to construct a sample that is appropriate for both geographical and probability methods. Methods We used a two-stage selection procedure with a first non-random stage of selection of clusters. Instead of randomly selecting clusters, we deliberately chose a group of clusters, which as a whole would contain all the variation in health measures in the population. As there was no health information available before the survey, we selected a priori determinants that can influence the spatial homogeneity of the health characteristics. This method yields a distribution of variables in the sample that closely resembles that in the overall population, something that cannot be guaranteed with randomly-selected clusters, especially if the number of selected clusters is small. In this way, we were able to survey specific areas while minimising design effects and maximising statistical precision. Application We applied this strategy in a health survey carried out in Vientiane, Lao People's Democratic Republic. We selected well-known health determinants with unequal spatial distribution within the city: nationality and literacy. We deliberately selected a combination of clusters whose distribution of nationality and literacy is similar to the distribution in the general population. Conclusion This paper describes the conceptual reasoning behind the construction of the survey sample and shows that it can be advantageous to choose clusters using reasoned hypotheses, based on both probability and geographical approaches, in contrast to a conventional, random cluster selection strategy. PMID:17543100

  5. Analysis of Factors Influencing Energy Consumption at an Air Force Base.

    DTIC Science & Technology

    1995-12-01

    ...include them in energy consumption projections. [Table 2-3: Selected Independent Variables (Morill, 1985).] ...the most appropriate method for forecasting energy consumption (Weck, 1981; Tinsley, 1981; and Morill, 1985). This section will present a brief...

  6. Estimation of selection intensity under overdominance by Bayesian methods.

    PubMed

    Buzbas, Erkan Ozge; Joyce, Paul; Abdo, Zaid

    2009-01-01

    A balanced pattern in the allele frequencies of polymorphic loci is a potential sign of selection, particularly of overdominance. Although this type of selection is of some interest in population genetics, there exist no likelihood-based approaches specifically tailored to making inferences on selection intensity. To fill this gap, we present Bayesian methods to estimate selection intensity under k-allele models with overdominance. Our model allows for an arbitrary number of loci and alleles within a locus. The neutral and selected variability within each locus are modeled with corresponding k-allele models. To estimate the posterior distribution of the mean selection intensity in a multilocus region, a hierarchical setup between loci is used. The methods are demonstrated with data at the Human Leukocyte Antigen loci from worldwide populations.

  7. [Study on correction of data bias caused by different missing mechanisms in survey of medical expenditure among students enrolling in Urban Resident Basic Medical Insurance].

    PubMed

    Zhang, Haixia; Zhao, Junkang; Gu, Caijiao; Cui, Yan; Rong, Huiying; Meng, Fanlong; Wang, Tong

    2015-05-01

    A study of medical expenditure and its influencing factors among students enrolled in Urban Resident Basic Medical Insurance (URBMI) in Taiyuan indicated that non-response bias and selection bias coexist in the dependent variable of the survey data. Unlike previous studies that focused on only one missing mechanism, this study suggests a two-stage method that deals with the two missing mechanisms simultaneously by combining multiple imputation with a sample selection model. A total of 1,190 questionnaires were returned by the students (or their parents) selected in child care settings, schools and universities in Taiyuan by stratified cluster random sampling in 2012. In the returned questionnaires, the dependent variable was not missing at random (NMAR) in 2.52% of cases and missing at random (MAR) in 7.14% of cases. First, multiple imputation was conducted for the MAR values by using the complete data; then a sample selection model was used to correct the NMAR values within the multiple imputation, and a multi-factor analysis model was established. Based on 1,000 resamplings, the best scheme for filling the randomly missing values was the predictive mean matching (PMM) method at the observed missing proportion. With this optimal scheme, a two-stage analysis was conducted. Finally, it was found that the influencing factors on annual medical expenditure among the students enrolling in URBMI in Taiyuan included population group, annual household gross income, affordability of medical insurance expenditure, chronic disease, seeking medical care in hospital, seeking medical care in a community health center or private clinic, hospitalization, hospitalization canceled for certain reasons, self-medication and acceptable proportion of self-paid medical expenditure. The two-stage method combining multiple imputation with a sample selection model can effectively deal with non-response bias and selection bias in the dependent variable of survey data.
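
    A minimal sketch of the first (MAR) stage using predictive mean matching via statsmodels, assuming a numeric data frame loaded from a hypothetical file survey.csv. The NMAR correction stage would additionally require a Heckman-type sample selection model, which statsmodels does not provide out of the box and is not sketched here:

      import pandas as pd
      from statsmodels.imputation.mice import MICEData

      df = pd.read_csv("survey.csv")   # hypothetical file; numeric columns assumed
      imp = MICEData(df, k_pmm=20)     # PMM draws from the 20 closest donors
      imp.update_all(10)               # run 10 imputation cycles
      completed = imp.data             # data frame with the MAR gaps filled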

  8. Quality assessment of data discrimination using self-organizing maps.

    PubMed

    Mekler, Alexey; Schwarz, Dmitri

    2014-10-01

    One of the important aspects of the data classification problem lies in making the most appropriate selection of features. The set of variables should be small and, at the same time, should provide reliable discrimination of the classes. A method for evaluating discriminating power that enables comparison between different sets of variables is therefore useful in the search for such a set. A new approach to feature selection is presented. Two methods for evaluating the discriminating power of a feature set are suggested. Both methods implement self-organizing maps (SOMs) and newly introduced indices of the degree of data clusterization on the SOM. The first method is based on the comparison of intraclass and interclass distances on the map. The second method evaluates the relative number of each best matching unit's (BMU's) nearest neighbors belonging to the same class. Both methods make it possible to evaluate the discriminating power of a feature set in cases where the set provides nonlinear discrimination of the classes. The algorithms in program code can be downloaded for free at http://mekler.narod.ru/Science/Articles_support.html, as well as the supporting data files. Copyright © 2014 Elsevier Inc. All rights reserved.
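
    A rough sketch of the second criterion, under stated assumptions (the MiniSom package rather than the authors' code, a 10 x 10 map, labels as a NumPy array): for each sample, measure the fraction of samples mapped to the same or an adjacent map unit that share its class.

      import numpy as np
      from minisom import MiniSom

      def bmu_neighbor_agreement(X, labels, grid=(10, 10), n_iter=5000):
          # X: n x d feature matrix; labels: length-n NumPy array of class labels.
          som = MiniSom(grid[0], grid[1], X.shape[1], sigma=1.0,
                        learning_rate=0.5, random_seed=0)
          som.train_random(X, n_iter)
          bmus = np.array([som.winner(x) for x in X])  # grid position of each BMU
          scores = []
          for i in range(len(X)):
              # Chebyshev distance <= 1: the same unit or one of its 8 neighbours.
              near = np.max(np.abs(bmus - bmus[i]), axis=1) <= 1
              near[i] = False
              if near.any():
                  scores.append(np.mean(labels[near] == labels[i]))
          return float(np.mean(scores))  # higher = stronger class discrimination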

  9. Incorporating biological information in sparse principal component analysis with application to genomic data.

    PubMed

    Li, Ziyi; Safo, Sandra E; Long, Qi

    2017-07-11

    Sparse principal component analysis (PCA) is a popular tool for dimensionality reduction, pattern recognition, and visualization of high dimensional data. It has been recognized that complex biological mechanisms occur through concerted relationships of multiple genes working in networks that are often represented by graphs. Recent work has shown that incorporating such biological information improves feature selection and prediction performance in regression analysis, but there has been limited work on extending this approach to PCA. In this article, we propose two new sparse PCA methods called Fused and Grouped sparse PCA that enable incorporation of prior biological information in variable selection. Our simulation studies suggest that, compared to existing sparse PCA methods, the proposed methods achieve higher sensitivity and specificity when the graph structure is correctly specified, and are fairly robust to misspecified graph structures. Application to a glioblastoma gene expression dataset identified pathways that are suggested in the literature to be related with glioblastoma. The proposed sparse PCA methods Fused and Grouped sparse PCA can effectively incorporate prior biological information in variable selection, leading to improved feature selection and more interpretable principal component loadings and potentially providing insights on molecular underpinnings of complex diseases.

  10. Statistical methods and regression analysis of stratospheric ozone and meteorological variables in Isfahan

    NASA Astrophysics Data System (ADS)

    Hassanzadeh, S.; Hosseinibalam, F.; Omidvari, M.

    2008-04-01

    Data of seven meteorological variables (relative humidity, wet temperature, dry temperature, maximum temperature, minimum temperature, ground temperature and sun radiation time) and ozone values have been used for statistical analysis. Meteorological variables and ozone values were analyzed using both multiple linear regression and principal component methods. Data for the period 1999-2004 are analyzed jointly using both methods. For all periods, temperature-dependent variables were highly correlated, but were all negatively correlated with relative humidity. Multiple regression analysis was used to fit the ozone values using the meteorological variables as predictors. A variable selection method based on high loadings of varimax-rotated principal components was used to obtain subsets of the predictor variables to be included in the linear regression model. In 1999, 2001 and 2002, the ozone concentrations were influenced predominantly, though weakly, by one of the meteorological variables. For the year 2000, however, the model indicated no predominant influence, which points to variation in sun radiation. This could be due to other factors that were not explicitly considered in this study.
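
    The varimax-based selection step can be sketched in a few lines of NumPy. This is a generic implementation of varimax rotation, not the authors' code, and the loading threshold mentioned in the closing comment is an illustrative assumption:

      import numpy as np

      def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
          """Rotate a p x k PCA loading matrix to the varimax criterion."""
          p, k = loadings.shape
          R, d = np.eye(k), 0.0
          for _ in range(max_iter):
              L = loadings @ R
              u, s, vt = np.linalg.svd(loadings.T @
                  (L**3 - (gamma / p) * L @ np.diag((L**2).sum(axis=0))))
              R = u @ vt
              if s.sum() < d * (1 + tol):
                  break
              d = s.sum()
          return loadings @ R

      # Keep variables whose largest absolute rotated loading exceeds, say, 0.7.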

  11. Optimal information networks: Application for data-driven integrated health in populations

    PubMed Central

    Servadio, Joseph L.; Convertino, Matteo

    2018-01-01

    Development of composite indicators for integrated health in populations typically relies on a priori assumptions rather than model-free, data-driven evidence. Traditional variable selection processes tend not to consider relatedness and redundancy among variables, instead considering only individual correlations. In addition, a unified method for assessing integrated health statuses of populations is lacking, making systematic comparison among populations impossible. We propose the use of maximum entropy networks (MENets) that use transfer entropy to assess interrelatedness among selected variables considered for inclusion in a composite indicator. We also define optimal information networks (OINs) that are scale-invariant MENets, which use the information in constructed networks for optimal decision-making. Health outcome data from multiple cities in the United States are applied to this method to create a systemic health indicator, representing integrated health in a city. PMID:29423440

  12. Variable selection in near-infrared spectroscopy: benchmarking of feature selection methods on biodiesel data.

    PubMed

    Balabin, Roman M; Smirnov, Sergey V

    2011-04-29

    During the past several years, near-infrared (near-IR/NIR) spectroscopy has increasingly been adopted as an analytical tool in various fields from petroleum to biomedical sectors. The NIR spectrum (above 4000 cm(-1)) of a sample is typically measured by modern instruments at a few hundred wavelengths. Recently, considerable effort has been directed towards developing procedures to identify variables (wavelengths) that contribute useful information. Variable selection (VS) or feature selection, also called frequency selection or wavelength selection, is a critical step in data analysis for vibrational spectroscopy (infrared, Raman, or NIRS). In this paper, we compare the performance of 16 different feature selection methods for the prediction of properties of biodiesel fuel, including density, viscosity, methanol content, and water concentration. The feature selection algorithms tested include stepwise multiple linear regression (MLR-step), interval partial least squares regression (iPLS), backward iPLS (BiPLS), forward iPLS (FiPLS), moving window partial least squares regression (MWPLS), (modified) changeable size moving window partial least squares (CSMWPLS/MCSMWPLSR), searching combination moving window partial least squares (SCMWPLS), successive projections algorithm (SPA), uninformative variable elimination (UVE, including UVE-SPA), simulated annealing (SA), back-propagation artificial neural networks (BP-ANN), Kohonen artificial neural network (K-ANN), and genetic algorithms (GAs, including GA-iPLS). Two linear techniques for calibration model building, namely multiple linear regression (MLR) and partial least squares regression/projection to latent structures (PLS/PLSR), are used for the evaluation of biofuel properties. A comparison with a non-linear calibration model, artificial neural networks (ANN-MLP), is also provided. Discussion of gasoline, ethanol-gasoline (bioethanol), and diesel fuel data is presented. The results of applying other spectroscopic techniques, such as Raman, ultraviolet-visible (UV-vis), or nuclear magnetic resonance (NMR) spectroscopy, can likewise be greatly improved by an appropriate choice of feature selection. Copyright © 2011 Elsevier B.V. All rights reserved.
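
    Of the listed algorithms, iPLS is among the simplest to sketch. The following is a minimal illustration with scikit-learn, not one of the benchmarked implementations; X is assumed to hold spectra row-wise and y a fuel property such as density:

      import numpy as np
      from sklearn.cross_decomposition import PLSRegression
      from sklearn.model_selection import cross_val_score

      def ipls_rank(X, y, n_intervals=20, n_components=5):
          """Rank equal-width spectral intervals by cross-validated RMSE of local PLS models."""
          bounds = np.linspace(0, X.shape[1], n_intervals + 1, dtype=int)
          rmse = []
          for lo, hi in zip(bounds[:-1], bounds[1:]):
              pls = PLSRegression(n_components=min(n_components, hi - lo))
              scores = cross_val_score(pls, X[:, lo:hi], y, cv=5,
                                       scoring="neg_root_mean_squared_error")
              rmse.append(-scores.mean())
          return np.argsort(rmse)  # lowest-RMSE (most informative) intervals first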

  13. Purposeful Variable Selection and Stratification to Impute Missing FAST Data in Trauma Research

    PubMed Central

    Fuchs, Paul A.; del Junco, Deborah J.; Fox, Erin E.; Holcomb, John B.; Rahbar, Mohammad H.; Wade, Charles A.; Alarcon, Louis H.; Brasel, Karen J.; Bulger, Eileen M.; Cohen, Mitchell J.; Myers, John G.; Muskat, Peter; Phelan, Herb A.; Schreiber, Martin A.; Cotton, Bryan A.

    2013-01-01

    Background The Focused Assessment with Sonography for Trauma (FAST) exam is an important variable in many retrospective trauma studies. The purpose of this study was to devise an imputation method to overcome missing data for the FAST exam. Due to variability in patients’ injuries and trauma care, these data are unlikely to be missing completely at random (MCAR), raising concern for validity when analyses exclude patients with missing values. Methods Imputation was conducted under a less restrictive, more plausible missing at random (MAR) assumption. Patients with missing FAST exams had available data on alternate, clinically relevant elements that were strongly associated with FAST results in complete cases, especially when considered jointly. Subjects with missing data (32.7%) were divided into eight mutually exclusive groups based on selected variables that both described the injury and were associated with missing FAST values. Additional variables were selected within each group to classify missing FAST values as positive or negative, and correct FAST exam classification based on these variables was determined for patients with non-missing FAST values. Results Severe head/neck injury (odds ratio, OR=2.04), severe extremity injury (OR=4.03), severe abdominal injury (OR=1.94), no injury (OR=1.94), other abdominal injury (OR=0.47), other head/neck injury (OR=0.57) and other extremity injury (OR=0.45) groups had significant ORs for missing data; the other group odds ratio was not significant (OR=0.84). All 407 missing FAST values were imputed, with 109 classified as positive. Correct classification of non-missing FAST results using the alternate variables was 87.2%. Conclusions Purposeful imputation for missing FAST exams based on interactions among selected variables assessed by simple stratification may be a useful adjunct to sensitivity analysis in the evaluation of imputation strategies under different missing data mechanisms. This approach has the potential for widespread application in clinical and translational research and validation is warranted. Level of Evidence Level II Prognostic or Epidemiological PMID:23778515

  14. Part 1. Statistical Learning Methods for the Effects of Multiple Air Pollution Constituents.

    PubMed

    Coull, Brent A; Bobb, Jennifer F; Wellenius, Gregory A; Kioumourtzoglou, Marianthi-Anna; Mittleman, Murray A; Koutrakis, Petros; Godleski, John J

    2015-06-01

    The United States Environmental Protection Agency (U.S. EPA) currently regulates individual air pollutants on a pollutant-by-pollutant basis, adjusted for other pollutants and potential confounders. However, the National Academies of Science concluded that a multipollutant regulatory approach that takes into account the joint effects of multiple constituents is likely to be more protective of human health. Unfortunately, the large majority of existing research has focused on health effects of air pollution for one pollutant or for one pollutant with control for the independent effects of a small number of copollutants. Limitations in existing statistical methods are at least partially responsible for this lack of information on joint effects. The goal of this project was to fill this gap by developing flexible statistical methods to estimate the joint effects of multiple pollutants, while allowing for potential nonlinear or nonadditive associations between a given pollutant and the health outcome of interest. We proposed Bayesian kernel machine regression (BKMR) methods as a way to simultaneously achieve the multifaceted goals of variable selection, flexible estimation of the exposure-response relationship, and inference on the strength of the association between individual pollutants and health outcomes in a health effects analysis of mixtures. We first developed a BKMR variable-selection approach, which we call component-wise variable selection, to make estimating such a potentially complex exposure-response function possible by effectively using two types of penalization (or regularization) of the multivariate exposure-response surface. Next we developed an extension of this first variable-selection approach that incorporates knowledge about how pollutants might group together, such as multiple constituents of particulate matter that might represent a common pollution source category. This second grouped, or hierarchical, variable-selection procedure is applicable when groups of highly correlated pollutants are being studied. To investigate the properties of the proposed methods, we conducted three simulation studies designed to evaluate the ability of BKMR to estimate environmental mixtures responsible for health effects under potentially complex but plausible exposure-response relationships. An attractive feature of our simulation studies is that we used actual exposure data rather than simulated values. This real-data simulation approach allowed us to evaluate the performance of BKMR and several other models under realistic joint distributions of multipollutant exposure. The simulation studies compared the two proposed variable-selection approaches (component-wise and hierarchical variable selection) with each other and with existing frequentist treatments of kernel machine regression (KMR). After the simulation studies, we applied the newly developed methods to an epidemiologic data set and to a toxicologic data set. To illustrate the applicability of the proposed methods to human epidemiologic data, we estimated associations between short-term exposures to fine particulate matter constituents and blood pressure in the Maintenance of Balance, Independent Living, Intellect, and Zest in the Elderly (MOBILIZE) Boston study, a prospective cohort study of elderly subjects.
To illustrate the applicability of these methods to animal toxicologic studies, we analyzed data on the associations between both blood pressure and heart rate in canines exposed to a composition of concentrated ambient particles (CAPs) in a study conducted at the Harvard T. H. Chan School of Public Health (the Harvard Chan School; formerly Harvard School of Public Health; Bartoli et al. 2009). We successfully developed the theory and computational tools required to apply the proposed methods to the motivating data sets. Collectively, the three simulation studies showed that component-wise variable selection can identify important pollutants within a mixture as long as the correlations among pollutant concentrations are low to moderate. The hierarchical variable-selection method was more effective in high-dimension, high-correlation settings. Variable selection in existing frequentist KMR models can incur inflated type I error rates, particularly when pollutants are highly correlated. The analyses of the MOBILIZE data yielded evidence of a linear and additive association of black carbon (BC) or Cu exposure with standing diastolic blood pressure (DBP), and a linear association of S exposure with standing systolic blood pressure (SBP). Cu is thought to be a marker of urban road dust associated with traffic; and S is a marker of power plant emissions or regional long-range transported air pollution or both. Therefore, these analyses of the MOBILIZE data set suggest that emissions from these three source categories were most strongly associated with hemodynamic responses in this cohort. In contrast, in the Harvard Chan School canine study, after controlling for an overall effect of CAPs exposure, we did not observe any associations between DBP or SBP and any elemental concentrations. Instead, we observed strong evidence of an association between Mn concentrations and heart rate in that heart rate increased linearly with increasing concentrations of Mn. According to the positive matrix factorization (PMF) source apportionment analyses of the multipollutant data set from the Harvard Chan School Boston Supersite, Mn loads on the two factors that represent the mobile and road dust source categories. The results of the BKMR analyses in both the MOBILIZE and canine studies were similar to those from existing linear mixed model analyses of the same multipollutant data because the effects have linear and additive forms that could also have been detected using standard methods. This work provides several contributions to the KMR literature. First, to our knowledge this is the first time KMR methods have been used to estimate the health effects of multipollutant mixtures. Second, we developed a novel hierarchical variable-selection approach within BKMR that is able to account for the structure of the mixture and systematically handle highly correlated exposures. The analyses of the epidemiologic and toxicologic data on associations between fine particulate matter constituents and blood pressure or heart rate demonstrated associations with constituents that are typically associated with traffic emissions, power plants, and long-range transported pollutants. The simulation studies showed that the BKMR methods proposed here work well for small to moderate data sets; more work is needed to develop computationally fast methods for large data sets. This will be a goal of future work.
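
    BKMR itself is distributed as an R package; as a rough non-Bayesian analogue of the kernel machine exposure-response surface h(z), one can fit Gaussian-kernel ridge regression to a mixture matrix. The sketch below omits BKMR's spike-and-slab variable selection and hierarchical grouping entirely, and the tuning grids are illustrative:

      from sklearn.kernel_ridge import KernelRidge
      from sklearn.model_selection import GridSearchCV

      def fit_exposure_response(Z, y):
          # Z: n x p matrix of pollutant concentrations; y: health outcome.
          grid = {"alpha": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1, 1.0]}
          surface = GridSearchCV(KernelRidge(kernel="rbf"), grid, cv=5).fit(Z, y)
          return surface  # surface.predict() traces h(z) along any pollutant axis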

  15. Evaluation of redundancy analysis to identify signatures of local adaptation.

    PubMed

    Capblancq, Thibaut; Luu, Keurcien; Blum, Michael G B; Bazin, Eric

    2018-05-26

    Ordination is a common tool in ecology that aims at representing complex biological information in a reduced space. In landscape genetics, ordination methods such as principal component analysis (PCA) have been used to detect adaptive variation based on genomic data. Taking advantage of environmental data in addition to genotype data, redundancy analysis (RDA) is another ordination approach that is useful to detect adaptive variation. This paper aims at proposing a test statistic based on RDA to search for loci under selection. We compare redundancy analysis to pcadapt, which is a nonconstrained ordination method, and to a latent factor mixed model (LFMM), which is a univariate genotype-environment association method. Individual-based simulations identify evolutionary scenarios where RDA genome scans have a greater statistical power than genome scans based on PCA. By constraining the analysis with environmental variables, RDA performs better than PCA in identifying adaptive variation when selection gradients are weakly correlated with population structure. Additionally, we show that if RDA and LFMM have a similar power to identify genetic markers associated with environmental variables, the RDA-based procedure has the advantage to identify the main selective gradients as a combination of environmental variables. To give a concrete illustration of RDA in population genomics, we apply this method to the detection of outliers and selective gradients on an SNP data set of Populus trichocarpa (Geraldes et al., 2013). The RDA-based approach identifies the main selective gradient contrasting southern and coastal populations to northern and continental populations in the northwestern American coast. This article is protected by copyright. All rights reserved.
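
    The core of an RDA-style genome scan can be sketched as a constrained ordination in NumPy: regress genotypes on environmental variables, then take singular vectors of the fitted values. This is a bare-bones illustration, not the authors' procedure; G and E are assumed column-centered:

      import numpy as np

      def rda_loadings(G, E, n_axes=2):
          # G: n x m centered genotype matrix; E: n x q centered environmental matrix.
          B, *_ = np.linalg.lstsq(E, G, rcond=None)   # multivariate regression
          G_fit = E @ B                               # variation explained by environment
          _, _, vt = np.linalg.svd(G_fit, full_matrices=False)
          return vt[:n_axes].T  # m x n_axes locus loadings on the constrained axes

    Loci with extreme loadings on the leading constrained axes are outlier candidates; a formal test statistic, as proposed in the paper, would additionally calibrate these loadings against a null distribution.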

  16. Advanced colorectal neoplasia risk stratification by penalized logistic regression.

    PubMed

    Lin, Yunzhi; Yu, Menggang; Wang, Sijian; Chappell, Richard; Imperiale, Thomas F

    2016-08-01

    Colorectal cancer is the second leading cause of death from cancer in the United States. To facilitate the efficiency of colorectal cancer screening, there is a need to stratify risk for colorectal cancer among the 90% of US residents who are considered "average risk." In this article, we investigate such risk stratification rules for advanced colorectal neoplasia (colorectal cancer and advanced, precancerous polyps). We use a recently completed large cohort study of subjects who underwent a first screening colonoscopy. Logistic regression models have been used in the literature to estimate the risk of advanced colorectal neoplasia based on quantifiable risk factors. However, logistic regression may be prone to overfitting and instability in variable selection. Since most of the risk factors in our study have several categories, it was tempting to collapse these categories into fewer risk groups. We propose a penalized logistic regression method that automatically and simultaneously selects variables, groups categories, and estimates their coefficients by penalizing the L1-norm of both the coefficients and their differences. Hence, it encourages sparsity in the categories, i.e. grouping of the categories, and sparsity in the variables, i.e. variable selection. We apply the penalized logistic regression method to our data. The important variables are selected, with close categories simultaneously grouped, by penalized regression models with and without the interactions terms. The models are validated with 10-fold cross-validation. The receiver operating characteristic curves of the penalized regression models dominate the receiver operating characteristic curve of naive logistic regressions, indicating a superior discriminative performance. © The Author(s) 2013.
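
    scikit-learn offers no fused penalty, so the sketch below shows only the variable-selection half (plain L1 on dummy-coded categories); the paper's method additionally penalizes differences between category coefficients so that close categories can merge. File and column names are hypothetical:

      import pandas as pd
      from sklearn.linear_model import LogisticRegressionCV

      df = pd.read_csv("neoplasia.csv")   # hypothetical risk-factor data set
      X = pd.get_dummies(df.drop(columns="advanced_neoplasia"), drop_first=True)
      y = df["advanced_neoplasia"]
      model = LogisticRegressionCV(penalty="l1", solver="liblinear",
                                   Cs=10, cv=10, scoring="roc_auc").fit(X, y)
      kept = X.columns[model.coef_.ravel() != 0]  # surviving variable/category dummies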

  17. Parameters selection in gene selection using Gaussian kernel support vector machines by genetic algorithm.

    PubMed

    Mao, Yong; Zhou, Xiao-Bo; Pi, Dao-Ying; Sun, You-Xian; Wong, Stephen T C

    2005-10-01

    In microarray-based cancer classification, gene selection is an important issue owing to the large number of variables and small number of samples as well as its non-linearity. It is difficult to get satisfying results by using conventional linear statistical methods. Recursive feature elimination based on support vector machines (SVM-RFE) is an effective algorithm for gene selection and cancer classification, which are integrated into a consistent framework. In this paper, we propose a new method for selecting the parameters of this algorithm implemented with Gaussian-kernel SVMs: a genetic algorithm is used to search for the optimal parameter pair, as a better alternative to the common practice of selecting the apparently best parameters. Fast implementation issues for this method are also discussed for pragmatic reasons. The proposed method was tested on two representative datasets, hereditary breast cancer and acute leukaemia. The experimental results indicate that the proposed method performs well in selecting genes and achieves high classification accuracies with these genes.
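
    A simplified sketch with scikit-learn: RFE needs linear-kernel coefficients for the elimination ranking, and a grid search stands in for the paper's genetic algorithm when tuning the Gaussian-kernel classifier (C, gamma) on the selected genes. The grid values are illustrative:

      from sklearn.feature_selection import RFE
      from sklearn.model_selection import GridSearchCV
      from sklearn.svm import SVC

      def svm_rfe(X, y, n_genes=50):
          # RFE needs linear-kernel weights (coef_) to rank genes for elimination.
          rfe = RFE(SVC(kernel="linear", C=1.0),
                    n_features_to_select=n_genes, step=0.1).fit(X, y)
          # Tune the Gaussian-kernel classifier (C, gamma) on the selected genes.
          grid = GridSearchCV(SVC(kernel="rbf"),
                              {"C": [1, 10, 100], "gamma": [1e-4, 1e-3, 1e-2]},
                              cv=5).fit(X[:, rfe.support_], y)
          return rfe.support_, grid.best_estimator_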

  18. Variable selection with stepwise and best subset approaches

    PubMed Central

    2016-01-01

    While purposeful selection is performed partly by software and partly by hand, the stepwise and best subset approaches are automatically performed by software. Two R functions stepAIC() and bestglm() are well designed for stepwise and best subset regression, respectively. The stepAIC() function begins with a full or null model, and methods for stepwise regression can be specified in the direction argument with character values “forward”, “backward” and “both”. The bestglm() function begins with a data frame containing explanatory variables and response variables. The response variable should be in the last column. Varieties of goodness-of-fit criteria can be specified in the IC argument. The Bayesian information criterion (BIC) usually results in a more parsimonious model than the Akaike information criterion. PMID:27162786
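
    For readers working outside R, a minimal Python counterpart of stepAIC()'s forward direction can be written with statsmodels. This is an illustrative loop, not a port of either R function:

      import numpy as np
      import statsmodels.api as sm

      def forward_aic(X, y):
          # X: pandas DataFrame of candidate predictors; y: response series.
          remaining, chosen = list(X.columns), []
          best_aic = sm.OLS(y, np.ones(len(y))).fit().aic  # intercept-only model
          while remaining:
              aic, col = min((sm.OLS(y, sm.add_constant(X[chosen + [c]])).fit().aic, c)
                             for c in remaining)
              if aic >= best_aic:
                  break  # no candidate lowers the AIC any further
              best_aic = aic
              chosen.append(col)
              remaining.remove(col)
          return chosen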

  19. Estimation of Monthly Near Surface Air Temperature Using Geographically Weighted Regression in China

    NASA Astrophysics Data System (ADS)

    Wang, M. M.; He, G. J.; Zhang, Z. M.; Zhang, Z. J.; Liu, X. G.

    2018-04-01

    Near surface air temperature (NSAT) is a primary descriptor of terrestrial environmental conditions. The availability of NSAT with high spatial resolution is deemed necessary for several applications such as hydrology, meteorology and ecology. In this study, a regression-based NSAT mapping method is proposed. This method combines remote sensing variables with geographical variables and uses geographically weighted regression to estimate NSAT. Altitude was selected as the geographical variable, and the remote sensing variables include land surface temperature (LST) and the normalized difference vegetation index (NDVI). The performance of the proposed method was assessed by predicting monthly minimum, mean, and maximum NSAT from point station measurements in China, a domain with a large area, complex topography, and highly variable station density, and the NSAT maps were validated against the meteorological observations. Validation results with meteorological data show the proposed method achieved an accuracy of 1.58 °C. It is concluded that the proposed method for mapping NSAT is operational and has good precision.
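
    The local fitting step of geographically weighted regression reduces to weighted least squares with a distance-decay kernel. A minimal NumPy sketch, assuming a Gaussian kernel and a fixed bandwidth (implementation details the abstract does not specify):

      import numpy as np

      def gwr_predict(coords, X, y, coord0, x0, bandwidth):
          # coords: n x 2 station locations; X: n x p predictors (e.g. LST, NDVI,
          # altitude); coord0, x0: location and predictors of the prediction point.
          d = np.linalg.norm(coords - coord0, axis=1)
          w = np.exp(-(d / bandwidth) ** 2)            # Gaussian distance-decay weights
          Xc = np.column_stack([np.ones(len(X)), X])   # add intercept column
          W = np.diag(w)
          beta = np.linalg.solve(Xc.T @ W @ Xc, Xc.T @ W @ y)  # local WLS fit
          return np.concatenate([[1.0], x0]) @ beta    # NSAT estimate at coord0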

  20. Sampling in health geography: reconciling geographical objectives and probabilistic methods. An example of a health survey in Vientiane (Lao PDR).

    PubMed

    Vallée, Julie; Souris, Marc; Fournet, Florence; Bochaton, Audrey; Mobillion, Virginie; Peyronnie, Karine; Salem, Gérard

    2007-06-01

    Geographical objectives and probabilistic methods are difficult to reconcile in a unique health survey. Probabilistic methods focus on individuals to provide estimates of a variable's prevalence with a certain precision, while geographical approaches emphasise the selection of specific areas to study interactions between spatial characteristics and health outcomes. A sample selected from a small number of specific areas creates statistical challenges: the observations are not independent at the local level, and this results in poor statistical validity at the global level. Therefore, it is difficult to construct a sample that is appropriate for both geographical and probability methods. We used a two-stage selection procedure with a first non-random stage of selection of clusters. Instead of randomly selecting clusters, we deliberately chose a group of clusters, which as a whole would contain all the variation in health measures in the population. As there was no health information available before the survey, we selected a priori determinants that can influence the spatial homogeneity of the health characteristics. This method yields a distribution of variables in the sample that closely resembles that in the overall population, something that cannot be guaranteed with randomly-selected clusters, especially if the number of selected clusters is small. In this way, we were able to survey specific areas while minimising design effects and maximising statistical precision. We applied this strategy in a health survey carried out in Vientiane, Lao People's Democratic Republic. We selected well-known health determinants with unequal spatial distribution within the city: nationality and literacy. We deliberately selected a combination of clusters whose distribution of nationality and literacy is similar to the distribution in the general population. This paper describes the conceptual reasoning behind the construction of the survey sample and shows that it can be advantageous to choose clusters using reasoned hypotheses, based on both probability and geographical approaches, in contrast to a conventional, random cluster selection strategy.

  1. Analysis of Observational Studies in the Presence of Treatment Selection Bias: Effects of Invasive Cardiac Management on AMI Survival Using Propensity Score and Instrumental Variable Methods

    PubMed Central

    Stukel, Thérèse A.; Fisher, Elliott S; Wennberg, David E.; Alter, David A.; Gottlieb, Daniel J.; Vermeulen, Marian J.

    2007-01-01

    Context Comparisons of outcomes between patients treated and untreated in observational studies may be biased due to differences in patient prognosis between groups, often because of unobserved treatment selection biases. Objective To compare 4 analytic methods for removing the effects of selection bias in observational studies: multivariable model risk adjustment, propensity score risk adjustment, propensity-based matching, and instrumental variable analysis. Design, Setting, and Patients A national cohort of 122,124 patients who were elderly (aged 65–84 years), receiving Medicare, and hospitalized with acute myocardial infarction (AMI) in 1994–1995, and who were eligible for cardiac catheterization. Baseline chart reviews were taken from the Cooperative Cardiovascular Project and linked to Medicare health administrative data to provide a rich set of prognostic variables. Patients were followed up for 7 years through December 31, 2001, to assess the association between long-term survival and cardiac catheterization within 30 days of hospital admission. Main Outcome Measure Risk-adjusted relative mortality rate using each of the analytic methods. Results Patients who received cardiac catheterization (n=73,238) were younger and had lower AMI severity than those who did not. After adjustment for prognostic factors by using standard statistical risk-adjustment methods, cardiac catheterization was associated with a 50% relative decrease in mortality (for multivariable model risk adjustment: adjusted relative risk [RR], 0.51; 95% confidence interval [CI], 0.50–0.52; for propensity score risk adjustment: adjusted RR, 0.54; 95% CI, 0.53–0.55; and for propensity-based matching: adjusted RR, 0.54; 95% CI, 0.52–0.56). Using regional catheterization rate as an instrument, instrumental variable analysis showed a 16% relative decrease in mortality (adjusted RR, 0.84; 95% CI, 0.79–0.90). The survival benefits of routine invasive care from randomized clinical trials are between 8% and 21%. Conclusions Estimates of the observational association of cardiac catheterization with long-term AMI mortality are highly sensitive to analytic method. All standard risk-adjustment methods have the same limitations regarding removal of unmeasured treatment selection biases. Compared with standard modeling, instrumental variable analysis may produce less biased estimates of treatment effects, but is more suited to answering policy questions than specific clinical questions. PMID:17227979
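
    Two of the four analytic methods lend themselves to a compact sketch: estimating propensity scores with logistic regression and forming 1:1 nearest-neighbour matches on them. This is a generic illustration, not the study's actual analysis:

      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.neighbors import NearestNeighbors

      def propensity_scores(X, treated):
          # X: baseline covariates; treated: boolean array (catheterization yes/no).
          return LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]

      def match_controls(ps, treated):
          # 1:1 nearest-neighbour match of each treated patient to a control on
          # the propensity score (with replacement, for brevity).
          nn = NearestNeighbors(n_neighbors=1).fit(ps[~treated].reshape(-1, 1))
          _, idx = nn.kneighbors(ps[treated].reshape(-1, 1))
          return np.flatnonzero(~treated)[idx.ravel()]  # matched control indices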

  2. Development of variable LRFD φ factors for deep foundation design due to site variability.

    DOT National Transportation Integrated Search

    2012-04-01

    The current Load and Resistance Factor Design (LRFD) guidelines specify constant values for deep foundation design, based on the analytical method selected and the degree of redundancy of the pier. However, investigation of multiple sites in ...

  3. Annual Research Review: Embracing Not Erasing Contextual Variability in Children's Behavior--Theory and Utility in the Selection and Use of Methods and Informants in Developmental Psychopathology

    ERIC Educational Resources Information Center

    Dirks, Melanie A.; De Los Reyes, Andres; Briggs-Gowan, Margaret; Cella, David; Wakschlag, Lauren S.

    2012-01-01

    This paper examines the selection and use of multiple methods and informants for the assessment of disruptive behavior syndromes and attention deficit/hyperactivity disorder, providing a critical discussion of (a) the bidirectional linkages between theoretical models of childhood psychopathology and current assessment techniques; and (b) current…

  4. Variability aware compact model characterization for statistical circuit design optimization

    NASA Astrophysics Data System (ADS)

    Qiao, Ying; Qian, Kun; Spanos, Costas J.

    2012-03-01

    Variability modeling at the compact transistor model level can enable statistically optimized designs in view of limitations imposed by the fabrication technology. In this work we propose an efficient variability-aware compact model characterization methodology based on the linear propagation of variance. Hierarchical spatial variability patterns of selected compact model parameters are directly calculated from transistor array test structures. This methodology has been implemented and tested using transistor I-V measurements and the EKV-EPFL compact model. Calculation results compare well to full-wafer direct model parameter extractions. Further studies are done on the proper selection of both compact model parameters and electrical measurement metrics used in the method.
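
    The linear propagation of variance underlying the methodology can be sketched in a few lines: the variance of a metric f(p) is approximated as J Σ Jᵀ, with J the Jacobian of f with respect to the model parameters and Σ their covariance. The "compact model" below is a toy stand-in, not the EKV-EPFL model, and all values are illustrative.

```python
import numpy as np

def drain_current(p):
    # Toy stand-in for a compact model: I_D as a function of
    # p = [vth, mobility_factor]. Not a real EKV equation.
    vgs = 0.9
    return p[1] * (vgs - p[0]) ** 2

p0 = np.array([0.35, 2e-3])                 # nominal parameter values
sigma = np.array([[4e-4, 1e-7],             # parameter covariance matrix
                  [1e-7, 1e-9]])

# Central-difference Jacobian of the metric at the nominal point.
eps = 1e-6
J = np.array([
    (drain_current(p0 + eps * np.eye(2)[k]) -
     drain_current(p0 - eps * np.eye(2)[k])) / (2 * eps)
    for k in range(2)
])

var_f = J @ sigma @ J                       # scalar metric: J is a vector
print("nominal I_D:", drain_current(p0))
print("approximate std of I_D:", np.sqrt(var_f))
```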

  5. [Research on fast detecting tomato seedlings nitrogen content based on NIR characteristic spectrum selection].

    PubMed

    Wu, Jing-zhu; Wang, Feng-zhu; Wang, Li-li; Zhang, Xiao-chao; Mao, Wen-hua

    2015-01-01

    In order to improve the accuracy and robustness of detecting tomato seedling nitrogen content with near-infrared spectroscopy (NIR), 4 characteristic spectrum selection methods were studied in the present paper: competitive adaptive reweighted sampling (CARS), Monte Carlo uninformative variables elimination (MCUVE), backward interval partial least squares (BiPLS), and synergy interval partial least squares (SiPLS). A total of 60 tomato seedlings were cultivated at 10 different nitrogen-treatment levels (urea concentration from 0 to 120 mg·L-1), with 6 samples at each level, covering statuses ranging from nitrogen excess through moderate and deficient nitrogen to no nitrogen. Leaves from each sample were scanned over the near-infrared range from 12 500 to 3 600 cm-1, and quantitative models based on the above 4 methods were established. According to the experimental results, the calibration models based on the CARS and MCUVE selection methods performed better in calibration than those based on BiPLS and SiPLS, but their prediction ability was much lower than that of the latter. Among them, the model built with BiPLS had the best prediction performance: the correlation coefficient (r), root mean square error of prediction (RMSEP), and ratio of performance to standard deviation (RPD) were 0.9527, 0.1183, and 3.291, respectively. Therefore, NIR technology combined with characteristic spectrum selection can improve model performance, but no single selection method is universal. Models built on single-wavelength variable selection are more sensitive and better suited to uniform samples, whereas models built on wavelength-interval selection have much stronger anti-interference ability and are better suited to uneven samples with poor reproducibility. Characteristic spectrum selection therefore only improves a model when combined with consideration of the sample state and the model indexes.
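
    For reference, the three reported figures of merit are straightforward to compute; a minimal sketch with placeholder arrays (RPD is the standard deviation of the reference values divided by RMSEP):

```python
import numpy as np

def evaluate(y_ref, y_pred):
    r = np.corrcoef(y_ref, y_pred)[0, 1]               # correlation coefficient
    rmsep = np.sqrt(np.mean((y_ref - y_pred) ** 2))    # RMSE of prediction
    rpd = np.std(y_ref, ddof=1) / rmsep                # performance/SD ratio
    return r, rmsep, rpd

# Placeholder reference and predicted nitrogen contents.
y_ref = np.array([2.10, 2.45, 1.80, 2.90, 2.60, 1.95])
y_pred = np.array([2.05, 2.50, 1.90, 2.80, 2.55, 2.05])
r, rmsep, rpd = evaluate(y_ref, y_pred)
print(f"r = {r:.3f}, RMSEP = {rmsep:.3f}, RPD = {rpd:.2f}")
```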

  6. Variable Step-Size Selection Methods for Implicit Integration Schemes

    DTIC Science & Technology

    2005-10-01

    This report explores variable step-size selection methods for implicit integration schemes on two test problems: the Lotka-Volterra predator-prey model and the Kepler problem. One variation of the Lotka-Volterra problem considered is $\begin{pmatrix} \dot{u} \\ \dot{v} \end{pmatrix} = \begin{pmatrix} u^2 v (v - 2) \\ v^2 u (1 - u) \end{pmatrix} = f(u, v), \quad t \in [0, 50]$.
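
    A minimal sketch, assuming SciPy, of integrating this system with a variable-step implicit scheme; the Radau method here is a stand-in for the report's own step-size selection strategy, with the error tolerances driving the adaptive step size.

```python
import numpy as np
from scipy.integrate import solve_ivp

def f(t, s):
    u, v = s
    return [u**2 * v * (v - 2.0), v**2 * u * (1.0 - u)]

# Implicit Radau IIA method with adaptive step-size control.
sol = solve_ivp(f, (0.0, 50.0), [1.1, 2.1], method="Radau",
                rtol=1e-6, atol=1e-9)

steps = np.diff(sol.t)
print(f"{sol.t.size} accepted steps; step size ranged "
      f"from {steps.min():.2e} to {steps.max():.2e}")
```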

  7. Random Survival Forest in practice: a method for modelling complex metabolomics data in time to event analysis.

    PubMed

    Dietrich, Stefan; Floegel, Anna; Troll, Martina; Kühn, Tilman; Rathmann, Wolfgang; Peters, Anette; Sookthai, Disorn; von Bergen, Martin; Kaaks, Rudolf; Adamski, Jerzy; Prehn, Cornelia; Boeing, Heiner; Schulze, Matthias B; Illig, Thomas; Pischon, Tobias; Knüppel, Sven; Wang-Sattler, Rui; Drogan, Dagmar

    2016-10-01

    The application of metabolomics in prospective cohort studies is statistically challenging. Given the importance of appropriate statistical methods for selection of disease-associated metabolites in highly correlated complex data, we combined random survival forest (RSF) with an automated backward elimination procedure that addresses such issues. Our RSF approach was illustrated with data from the European Prospective Investigation into Cancer and Nutrition (EPIC)-Potsdam study, with concentrations of 127 serum metabolites as exposure variables and time to development of type 2 diabetes mellitus (T2D) as outcome variable. An analysis of this data set using Cox regression with a stepwise selection method was published recently. The methodological comparison (RSF versus Cox regression) was replicated in two independent cohorts. Finally, the R-code for implementing the metabolite selection procedure into the RSF-syntax is provided. The application of the RSF approach in EPIC-Potsdam resulted in the identification of 16 incident T2D-associated metabolites, which slightly improved prediction of T2D when used in addition to traditional T2D risk factors and also when used together with classical biomarkers. The identified metabolites partly agreed with previous findings using Cox regression, though RSF selected a higher number of highly correlated metabolites. The RSF method appeared to be a promising approach for identification of disease-associated variables in complex data with time to event as outcome. The demonstrated RSF approach provides comparable findings to the generally used Cox regression, but also addresses the problem of multicollinearity and is suitable for high-dimensional data. © The Author 2016; all rights reserved. Published by Oxford University Press on behalf of the International Epidemiological Association.
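
    The paper supplies R code for the selection procedure; as a rough Python analogue (assuming the scikit-survival package and its bundled WHAS500 example data, not the EPIC-Potsdam metabolites), a backward-elimination loop driven by a simple permutation drop in concordance might look like this:

```python
import numpy as np
from sksurv.datasets import load_whas500
from sksurv.ensemble import RandomSurvivalForest

X, y = load_whas500()
X = X.astype(float)
rng = np.random.default_rng(0)

features = list(X.columns)
while len(features) > 5:                      # stopping rule is illustrative
    rsf = RandomSurvivalForest(n_estimators=50, random_state=0)
    rsf.fit(X[features], y)
    base = rsf.score(X[features], y)          # Harrell's concordance index

    drops = {}
    for f in features:
        Xp = X[features].copy()
        Xp[f] = rng.permutation(Xp[f].values) # break the variable's association
        drops[f] = base - rsf.score(Xp, y)    # importance = drop in C-index

    worst = min(drops, key=drops.get)         # least important variable
    features.remove(worst)

print("retained variables:", features)
```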

  8. Estimate variable importance for recurrent event outcomes with an application to identify hypoglycemia risk factors.

    PubMed

    Duan, Ran; Fu, Haoda

    2015-08-30

    Recurrent event data are an important data type for medical research. In particular, many safety endpoints are recurrent outcomes, such as hypoglycemic events. For such a situation, it is important to identify the factors causing these events and rank these factors by their importance. Traditional model selection methods are not able to provide variable importance in this context. Methods that are able to evaluate the variable importance, such as gradient boosting and random forest algorithms, cannot directly be applied to recurrent events data. In this paper, we propose a two-step method that enables us to evaluate the variable importance for recurrent events data. We evaluated the performance of our proposed method by simulations and applied it to a data set from a diabetes study. Copyright © 2015 John Wiley & Sons, Ltd.

  9. Optical phase conjugation assisted scattering lens: variable focusing and 3D patterning

    PubMed Central

    Ryu, Jihee; Jang, Mooseok; Eom, Tae Joong; Yang, Changhuei; Chung, Euiheon

    2016-01-01

    Variable light focusing is the ability to flexibly select the focal distance of a lens. This feature presents technical challenges, but is significant for optical interrogation of three-dimensional objects. Numerous lens designs have been proposed to provide flexible light focusing, including zoom, fluid, and liquid-crystal lenses. Although these lenses are useful for macroscale applications, they have limited utility in micron-scale applications due to restricted modulation range and exacting requirements for fabrication and control. Here, we present a holographic focusing method that enables variable light focusing without any physical modification to the lens element. In this method, a scattering layer couples low-angle (transverse wave vector) components into a full angular spectrum, and a digital optical phase conjugation (DOPC) system characterizes and plays back the wavefront that focuses through the scattering layer. We demonstrate micron-scale light focusing and patterning over a wide range of focal distances of 22–51 mm. The interferometric nature of the focusing scheme also enables an aberration-free scattering lens. The proposed method provides a unique variable focusing capability for imaging thick specimens or selective photoactivation of neuronal networks. PMID:27049442

  10. An investigation of dynamic-analysis methods for variable-geometry structures

    NASA Technical Reports Server (NTRS)

    Austin, F.

    1980-01-01

    Selected space structure configurations were reviewed in order to define dynamic analysis problems associated with variable geometry. The dynamics of a beam being constructed from a flexible base and the relocation of the completed beam by rotating the remote manipulator system about the shoulder joint were selected. Equations of motion were formulated in physical coordinates for both of these problems, and FORTRAN programs were developed to generate solutions by numerically integrating the equations. These solutions served as a standard of comparison to gauge the accuracy of approximate solution techniques that were developed and studied. Good control was achieved in both problems. Unstable control system coupling with the system flexibility did not occur. An approximate method was developed for each problem to enable the analyst to investigate variable geometry effects during a short time span using standard fixed geometry programs such as NASTRAN. The average angle and average length techniques are discussed.

  11. Statistical Analysis of Big Data on Pharmacogenomics

    PubMed Central

    Fan, Jianqing; Liu, Han

    2013-01-01

    This paper discusses statistical methods for estimating complex correlation structure from large pharmacogenomic datasets. We selectively review several prominent statistical methods for estimating large covariance matrix for understanding correlation structure, inverse covariance matrix for network modeling, large-scale simultaneous tests for selecting significantly differently expressed genes and proteins and genetic markers for complex diseases, and high dimensional variable selection for identifying important molecules for understanding molecule mechanisms in pharmacogenomics. Their applications to gene network estimation and biomarker selection are used to illustrate the methodological power. Several new challenges of Big data analysis, including complex data distribution, missing data, measurement error, spurious correlation, endogeneity, and the need for robust statistical methods, are also discussed. PMID:23602905

  12. Stochastic model search with binary outcomes for genome-wide association studies.

    PubMed

    Russu, Alberto; Malovini, Alberto; Puca, Annibale A; Bellazzi, Riccardo

    2012-06-01

    The spread of case-control genome-wide association studies (GWASs) has stimulated the development of new variable selection methods and predictive models. We introduce a novel Bayesian model search algorithm, Binary Outcome Stochastic Search (BOSS), which addresses the model selection problem when the number of predictors far exceeds the number of binary responses. Our method is based on a latent variable model that links the observed outcomes to the underlying genetic variables. A Markov Chain Monte Carlo approach is used for model search and to evaluate the posterior probability of each predictor. BOSS is compared with three established methods (stepwise regression, logistic lasso, and elastic net) in a simulated benchmark. Two real case studies are also investigated: a GWAS on the genetic bases of longevity, and the type 2 diabetes study from the Wellcome Trust Case Control Consortium. Simulations show that BOSS achieves higher precisions than the reference methods while preserving good recall rates. In both experimental studies, BOSS successfully detects genetic polymorphisms previously reported to be associated with the analyzed phenotypes. BOSS outperforms the other methods in terms of F-measure on simulated data. In the two real studies, BOSS successfully detects biologically relevant features, some of which are missed by univariate analysis and the three reference techniques. The proposed algorithm is an advance in the methodology for model selection with a large number of features. Our simulated and experimental results showed that BOSS proves effective in detecting relevant markers while providing a parsimonious model.

  13. 14 CFR 35.21 - Variable and reversible pitch propellers.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... is shown to be extremely remote under § 35.15. (b) For propellers incorporating a method to select... manual. The method for sensing and indicating the propeller blade pitch position must be such that its...

  14. 14 CFR 35.21 - Variable and reversible pitch propellers.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... is shown to be extremely remote under § 35.15. (b) For propellers incorporating a method to select... manual. The method for sensing and indicating the propeller blade pitch position must be such that its...

  15. 14 CFR 35.21 - Variable and reversible pitch propellers.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... is shown to be extremely remote under § 35.15. (b) For propellers incorporating a method to select... manual. The method for sensing and indicating the propeller blade pitch position must be such that its...

  16. 14 CFR 35.21 - Variable and reversible pitch propellers.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... is shown to be extremely remote under § 35.15. (b) For propellers incorporating a method to select... manual. The method for sensing and indicating the propeller blade pitch position must be such that its...

  17. 14 CFR 35.21 - Variable and reversible pitch propellers.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... is shown to be extremely remote under § 35.15. (b) For propellers incorporating a method to select... manual. The method for sensing and indicating the propeller blade pitch position must be such that its...

  18. Empirical Performance of Cross-Validation With Oracle Methods in a Genomics Context.

    PubMed

    Martinez, Josue G; Carroll, Raymond J; Müller, Samuel; Sampson, Joshua N; Chatterjee, Nilanjan

    2011-11-01

    When employing model selection methods with oracle properties such as the smoothly clipped absolute deviation (SCAD) and the Adaptive Lasso, it is typical to estimate the smoothing parameter by m-fold cross-validation, for example, m = 10. In problems where the true regression function is sparse and the signals large, such cross-validation typically works well. However, in regression modeling of genomic studies involving Single Nucleotide Polymorphisms (SNP), the true regression functions, while thought to be sparse, do not have large signals. We demonstrate empirically that in such problems, the number of selected variables using SCAD and the Adaptive Lasso, with 10-fold cross-validation, is a random variable that has considerable and surprising variation. Similar remarks apply to non-oracle methods such as the Lasso. Our study strongly questions the suitability of performing only a single run of m-fold cross-validation with any oracle method, and not just the SCAD and Adaptive Lasso.
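
    The phenomenon is easy to reproduce as a sketch: keeping the data fixed and re-running 10-fold cross-validation with different random partitions (here with scikit-learn's LassoCV as a stand-in for SCAD or the Adaptive Lasso) yields noticeably different counts of selected variables when signals are weak.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n, p, k = 100, 200, 10
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:k] = 0.25                     # sparse but weak signals, SNP-like
y = X @ beta + rng.normal(size=n)

counts = []
for seed in range(20):              # same data, different CV partitions
    cv = KFold(n_splits=10, shuffle=True, random_state=seed)
    model = LassoCV(cv=cv).fit(X, y)
    counts.append(int(np.sum(model.coef_ != 0)))

print("selected-variable counts across CV splits:", counts)
print("min/max:", min(counts), max(counts))
```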

  19. Diversity pattern in Sesamum mutants selected for a semi-arid cropping system.

    PubMed

    Murty, B R; Oropeza, F

    1989-02-01

    Due to the complex requirements of moisture stress, substantial genetic diversity with a wide array of character combinations and effective simultaneous selection for several variables is necessary for improving the productivity and adaptation of a component crop in order for it to fit into a cropping system under semi-arid tropical conditions. Sesamum indicum L. is grown in Venezuela after rice/sorghum/or maize under such conditions. A mutation breeding program was undertaken using six locally adapted varieties to develop genotypes suitable for the above system. The diversity pattern for nine variables was assessed by multivariate analysis in 301 M4 progenies. Analysis of the characteristic roots and principal components in three methods of selection, i.e., M2 bulks (A), individual plant selection throughout (B), and selection in M3 for single variable (C), revealed differences in the pattern of variation between varieties, selection methods, and varieties x methods interactions. Method B was superior to the others and gave 17 of the 21 best M5 progenies. 'Piritu' and 'CF' varieties yielded the most productive progenies in M5 and M6. Diversity was large and selection was effective for such developmental traits as earliness and synchrony, combined with multiple disease resistance, which could be related to their importance by multivariate analyses. Considerable differences in the variety of character combinations among the high yielding. M5 progenies of 'CF' and 'Piritu' suggested possible further yield improvement. The superior response of 'Piritu' and 'CF' over other varieties in yield and adaptation was due to major changes in plant type and character associations. Multilocation testing of M5 generations revealed that the mutant progenies had a 40%-100% yield superiority over the parents; this was combined with earliness, synchrony, and multiple disease resistance, and was confirmed in the M6 generation grown on a commercial scale. This study showed that multivariate analysis is an effective tool for assessing diversity patterns, choice of appropriate variety, and selection methodology in order to make rapid progress in meeting the complex requirements of semi-arid cropping systems.

  20. Enhancement of hepatitis virus immunoassay outcome predictions in imbalanced routine pathology data by data balancing and feature selection before the application of support vector machines.

    PubMed

    Richardson, Alice M; Lidbury, Brett A

    2017-08-14

    Data mining techniques such as support vector machines (SVMs) have been successfully used to predict outcomes for complex problems, including for human health. Much health data is imbalanced, with many more controls than positive cases. The impact of three balancing methods and one feature selection method is explored, to assess the ability of SVMs to classify imbalanced diagnostic pathology data associated with the laboratory diagnosis of hepatitis B (HBV) and hepatitis C (HCV) infections. Random forests (RFs) for predictor variable selection, and data reshaping to overcome a large imbalance of negative to positive test results in relation to HBV and HCV immunoassay results, are examined. The methodology is illustrated using data from ACT Pathology (Canberra, Australia), consisting of laboratory test records from 18,625 individuals who underwent hepatitis virus testing over the decade from 1997 to 2007. Overall, the prediction of HCV test results by immunoassay was more accurate than for HBV immunoassay results associated with identical routine pathology predictor variable data. HBV and HCV negative results were vastly in excess of positive results, so three approaches to handling the negative/positive data imbalance were compared. Generating datasets by the Synthetic Minority Oversampling Technique (SMOTE) resulted in significantly more accurate prediction than single downsizing or multiple downsizing (MDS) of the dataset. For downsized data sets, applying a RF for predictor variable selection had a small effect on the performance, which varied depending on the virus. For SMOTE, a RF had a negative effect on performance. An analysis of variance of the performance across settings supports these findings. Finally, age and assay results for alanine aminotransferase (ALT), sodium for HBV and urea for HCV were found to have a significant impact upon laboratory diagnosis of HBV or HCV infection using an optimised SVM model. Laboratories looking to include machine learning via SVM as part of their decision support need to be aware that the balancing method, predictor variable selection and the virus type interact to affect the laboratory diagnosis of hepatitis virus infection with routine pathology laboratory variables in different ways depending on which combination is being studied. This awareness should lead to careful use of existing machine learning methods, thus improving the quality of laboratory diagnosis.
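
    The general recipe, balance the training data, select predictors with a random forest, then fit the SVM, can be sketched as follows; this assumes the imbalanced-learn package and uses synthetic data in place of the pathology variables.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=4000, n_features=20,
                           weights=[0.97, 0.03],    # heavy class imbalance
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# 1) Balance the training data only; the test set keeps its natural imbalance.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)

# 2) Random-forest importances as the predictor-selection step.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_bal, y_bal)
keep = np.argsort(rf.feature_importances_)[-10:]    # top-10 predictors

# 3) SVM on the balanced, reduced data.
svm = SVC(kernel="rbf", gamma="scale").fit(X_bal[:, keep], y_bal)
print(classification_report(y_te, svm.predict(X_te[:, keep]), digits=3))
```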

  1. Variable selection based on clustering analysis for improvement of polyphenols prediction in green tea using synchronous fluorescence spectra

    NASA Astrophysics Data System (ADS)

    Shan, Jiajia; Wang, Xue; Zhou, Hao; Han, Shuqing; Riza, Dimas Firmanda Al; Kondo, Naoshi

    2018-04-01

    Synchronous fluorescence spectra, combined with multivariate analysis, were used to predict flavonoids content in green tea rapidly and nondestructively. This paper presented a new and efficient spectral interval selection method called clustering based partial least squares (CL-PLS), which selected informative wavelengths by combining the clustering concept and partial least squares (PLS) methods to improve model performance on synchronous fluorescence spectra. The fluorescence spectra of tea samples were obtained, and k-means and Kohonen self-organizing map clustering algorithms were carried out to cluster full spectra into several clusters, and a sub-PLS regression model was developed on each cluster. Finally, CL-PLS models consisting of gradually selected clusters were built. The correlation coefficient (R) was used to evaluate the prediction performance of the PLS models. In addition, variable influence on projection partial least squares (VIP-PLS), selectivity ratio partial least squares (SR-PLS), interval partial least squares (iPLS) models and the full-spectrum PLS model were investigated and the results were compared. The results showed that CL-PLS presented the best result for flavonoids prediction using synchronous fluorescence spectra.
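
    A rough sketch of the CL-PLS idea as described, with synthetic spectrum-like data standing in for the tea samples: wavelengths are clustered, each cluster gets its own PLS model, and clusters are then added greedily while cross-validated performance improves.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, p = 80, 300                               # samples x wavelengths
X = rng.normal(size=(n, p)).cumsum(axis=1)   # smooth, spectrum-like curves
y = X[:, 40:60].mean(axis=1) + 0.1 * rng.normal(size=n)

# Cluster wavelengths by the similarity of their profiles across samples.
k = 6
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X.T)

def cv_r2(cols):
    pls = PLSRegression(n_components=min(5, len(cols)))
    return cross_val_score(pls, X[:, cols], y, cv=5, scoring="r2").mean()

# Rank clusters by their individual cross-validated performance...
order = sorted(range(k), key=lambda c: -cv_r2(np.where(labels == c)[0]))

# ...then accumulate clusters while performance keeps improving.
chosen, best = [], -np.inf
for c in order:
    cols = np.concatenate([np.where(labels == cl)[0] for cl in chosen + [c]])
    s = cv_r2(cols)
    if s > best:
        chosen.append(c)
        best = s

print("selected clusters:", chosen, " cv R^2 =", round(best, 3))
```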

  2. Variable selection based on clustering analysis for improvement of polyphenols prediction in green tea using synchronous fluorescence spectra.

    PubMed

    Shan, Jiajia; Wang, Xue; Zhou, Hao; Han, Shuqing; Riza, Dimas Firmanda Al; Kondo, Naoshi

    2018-03-13

    Synchronous fluorescence spectra, combined with multivariate analysis, were used to predict flavonoids content in green tea rapidly and nondestructively. This paper presented a new and efficient spectral interval selection method called clustering based partial least squares (CL-PLS), which selected informative wavelengths by combining the clustering concept and partial least squares (PLS) methods to improve model performance on synchronous fluorescence spectra. The fluorescence spectra of tea samples were obtained, and k-means and Kohonen self-organizing map clustering algorithms were carried out to cluster full spectra into several clusters, and a sub-PLS regression model was developed on each cluster. Finally, CL-PLS models consisting of gradually selected clusters were built. The correlation coefficient (R) was used to evaluate the prediction performance of the PLS models. In addition, variable influence on projection partial least squares (VIP-PLS), selectivity ratio partial least squares (SR-PLS), interval partial least squares (iPLS) models and the full-spectrum PLS model were investigated and the results were compared. The results showed that CL-PLS presented the best result for flavonoids prediction using synchronous fluorescence spectra.

  3. [Rapid assessment of critical quality attributes of Chinese materia medica (II): strategy of NIR assignment].

    PubMed

    Pei, Yan-Ling; Wu, Zhi-Sheng; Shi, Xin-Yuan; Zhou, Lu-Wei; Qiao, Yan-Jiang

    2014-09-01

    The present paper first reviews the research progress and main methods of NIR spectral assignment, together with our own results. Principal component analysis focuses on extracting characteristic signals that reflect spectral differences. Partial least squares is concerned with variable selection to discover characteristic absorption bands. Two-dimensional correlation spectroscopy is mainly adopted for spectral assignment: autocorrelation peaks are obtained from spectral changes driven by external perturbations such as concentration, temperature, and pressure. Density functional theory is used to calculate energies from molecular structure, establishing the relationship between molecular energy and spectral change. Based on the reviewed methods, and taking the NIR spectral assignment of chlorogenic acid as an example, a reliable spectral assignment for critical quality attributes of Chinese materia medica (CMM) was established using deuterium techniques and spectral variable selection. The result demonstrated the consistency of the assignment between the spectral features of different concentrations of chlorogenic acid and the variable selection region of the online NIR model of the extraction process. Although the spectral assignment was initially demonstrated on a single active pharmaceutical ingredient, the approach holds promise for the complex components of CMM and provides a methodology for NIR spectral assignment of critical quality attributes in CMM.

  4. Regularization Methods for High-Dimensional Instrumental Variables Regression With an Application to Genetical Genomics

    PubMed Central

    Lin, Wei; Feng, Rui; Li, Hongzhe

    2014-01-01

    In genetical genomics studies, it is important to jointly analyze gene expression data and genetic variants in exploring their associations with complex traits, where the dimensionality of gene expressions and genetic variants can both be much larger than the sample size. Motivated by such modern applications, we consider the problem of variable selection and estimation in high-dimensional sparse instrumental variables models. To overcome the difficulty of high dimensionality and unknown optimal instruments, we propose a two-stage regularization framework for identifying and estimating important covariate effects while selecting and estimating optimal instruments. The methodology extends the classical two-stage least squares estimator to high dimensions by exploiting sparsity using sparsity-inducing penalty functions in both stages. The resulting procedure is efficiently implemented by coordinate descent optimization. For the representative L1 regularization and a class of concave regularization methods, we establish estimation, prediction, and model selection properties of the two-stage regularized estimators in the high-dimensional setting where the dimensionality of co-variates and instruments are both allowed to grow exponentially with the sample size. The practical performance of the proposed method is evaluated by simulation studies and its usefulness is illustrated by an analysis of mouse obesity data. Supplementary materials for this article are available online. PMID:26392642
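
    A much-simplified sketch of the two-stage idea (illustrative only, using plain cross-validated lasso in both stages rather than the paper's penalty family and theory): an L1-penalized first stage selects instruments and predicts each endogenous covariate, and an L1-penalized second stage selects covariate effects from the fitted values.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p_z, p_x = 200, 100, 40
Z = rng.normal(size=(n, p_z))                      # instruments (e.g. genotypes)
G = np.zeros((p_z, p_x)); G[:5, :3] = 0.8          # sparse instrument effects
U = rng.normal(size=(n, 1))                        # hidden confounder
X = Z @ G + U + rng.normal(size=(n, p_x))          # endogenous covariates
beta = np.zeros(p_x); beta[:3] = 1.0
y = X @ beta + 2.0 * U[:, 0] + rng.normal(size=n)

# Stage 1: lasso of each covariate on the instruments -> fitted values.
X_hat = np.column_stack([
    LassoCV(cv=5).fit(Z, X[:, j]).predict(Z) for j in range(p_x)
])

# Stage 2: lasso of the outcome on the fitted covariates.
stage2 = LassoCV(cv=5).fit(X_hat, y)
print("selected covariates:", np.flatnonzero(stage2.coef_))
```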

  5. Stochastic model search with binary outcomes for genome-wide association studies

    PubMed Central

    Malovini, Alberto; Puca, Annibale A; Bellazzi, Riccardo

    2012-01-01

    Objective The spread of case–control genome-wide association studies (GWASs) has stimulated the development of new variable selection methods and predictive models. We introduce a novel Bayesian model search algorithm, Binary Outcome Stochastic Search (BOSS), which addresses the model selection problem when the number of predictors far exceeds the number of binary responses. Materials and methods Our method is based on a latent variable model that links the observed outcomes to the underlying genetic variables. A Markov Chain Monte Carlo approach is used for model search and to evaluate the posterior probability of each predictor. Results BOSS is compared with three established methods (stepwise regression, logistic lasso, and elastic net) in a simulated benchmark. Two real case studies are also investigated: a GWAS on the genetic bases of longevity, and the type 2 diabetes study from the Wellcome Trust Case Control Consortium. Simulations show that BOSS achieves higher precisions than the reference methods while preserving good recall rates. In both experimental studies, BOSS successfully detects genetic polymorphisms previously reported to be associated with the analyzed phenotypes. Discussion BOSS outperforms the other methods in terms of F-measure on simulated data. In the two real studies, BOSS successfully detects biologically relevant features, some of which are missed by univariate analysis and the three reference techniques. Conclusion The proposed algorithm is an advance in the methodology for model selection with a large number of features. Our simulated and experimental results showed that BOSS proves effective in detecting relevant markers while providing a parsimonious model. PMID:22534080

  6. Multi-omics facilitated variable selection in Cox-regression model for cancer prognosis prediction.

    PubMed

    Liu, Cong; Wang, Xujun; Genchev, Georgi Z; Lu, Hui

    2017-07-15

    New developments in high-throughput genomic technologies have enabled the measurement of diverse types of omics biomarkers in a cost-efficient and clinically-feasible manner. Developing computational methods and tools for analysis and translation of such genomic data into clinically-relevant information is an ongoing and active area of investigation. For example, several studies have utilized an unsupervised learning framework to cluster patients by integrating omics data. Despite such recent advances, predicting cancer prognosis using integrated omics biomarkers remains a challenge. There is also a shortage of computational tools for predicting cancer prognosis by using supervised learning methods. The current standard approach is to fit a Cox regression model by concatenating the different types of omics data in a linear manner, while penalty could be added for feature selection. A more powerful approach, however, would be to incorporate data by considering relationships among omics datatypes. Here we developed two methods: a SKI-Cox method and a wLASSO-Cox method to incorporate the association among different types of omics data. Both methods fit the Cox proportional hazards model and predict a risk score based on mRNA expression profiles. SKI-Cox borrows the information generated by these additional types of omics data to guide variable selection, while wLASSO-Cox incorporates this information as a penalty factor during model fitting. We show that SKI-Cox and wLASSO-Cox models select more true variables than a LASSO-Cox model in simulation studies. We assess the performance of SKI-Cox and wLASSO-Cox using TCGA glioblastoma multiforme and lung adenocarcinoma data. In each case, mRNA expression, methylation, and copy number variation data are integrated to predict the overall survival time of cancer patients. Our methods achieve better performance in predicting patients' survival in glioblastoma and lung adenocarcinoma. Copyright © 2017. Published by Elsevier Inc.

  7. Protein construct storage: Bayesian variable selection and prediction with mixtures.

    PubMed

    Clyde, M A; Parmigiani, G

    1998-07-01

    Determining optimal conditions for protein storage while maintaining a high level of protein activity is an important question in pharmaceutical research. A designed experiment based on a space-filling design was conducted to understand the effects of factors affecting protein storage and to establish optimal storage conditions. Different model-selection strategies to identify important factors may lead to very different answers about optimal conditions. Uncertainty about which factors are important, or model uncertainty, can be a critical issue in decision-making. We use Bayesian variable selection methods for linear models to identify important variables in the protein storage data, while accounting for model uncertainty. We also use the Bayesian framework to build predictions based on a large family of models, rather than an individual model, and to evaluate the probability that certain candidate storage conditions are optimal.

  8. Sharpening method of satellite thermal image based on the geographical statistical model

    NASA Astrophysics Data System (ADS)

    Qi, Pengcheng; Hu, Shixiong; Zhang, Haijun; Guo, Guangmeng

    2016-04-01

    To improve the effectiveness of thermal sharpening in mountainous regions while paying closer attention to the laws of land-surface energy balance, a thermal sharpening method based on a geographical statistical model (GSM) is proposed. Explanatory variables were selected from the processes of the land-surface energy budget and thermal infrared electromagnetic radiation transmission, and high spatial resolution (57 m) raster layers were generated for these variables through spatial simulation or by using other raster data as proxies. Based on this, the locally adapted statistical relationship between brightness temperature (BT) and the explanatory variables, i.e., the GSM, was built at 1026-m resolution using multivariate adaptive regression splines. Finally, the GSM was applied to the high-resolution (57-m) explanatory variables to obtain the high-resolution (57-m) BT image. This method produced a sharpening result with low error and good visual quality. The method avoids a blind choice of explanatory variables and removes the dependence on synchronous imagery in the visible and near-infrared bands. The influences of the explanatory-variable combination, the sampling method, and the residual error correction on the sharpening results were analyzed in detail, and their influence mechanisms are reported herein.
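
    Schematically, the workflow is: fit the BT-versus-explanatory-variable model at coarse resolution, apply it to the fine-resolution layers, and add back the redistributed coarse residual. The sketch below uses a random forest as a stand-in for multivariate adaptive regression splines and toy rasters in place of the real layers.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Toy rasters: 3 explanatory variables on a fine 180x180 grid (57-m stand-in).
fine = rng.normal(size=(3, 180, 180)).cumsum(axis=1).cumsum(axis=2)
bt_fine_true = 290 + 0.02 * fine[0] - 0.01 * fine[1] + 0.005 * fine[2]

def aggregate(a, f=18):                 # block-average to coarse cells
    s = a.shape[-1] // f
    return a.reshape(*a.shape[:-2], s, f, s, f).mean(axis=(-3, -1))

coarse = aggregate(fine)                # 10x10 coarse grid (1026-m stand-in)
bt_coarse = aggregate(bt_fine_true)     # "observed" coarse BT

# Fit the statistical model at coarse resolution...
Xc = coarse.reshape(3, -1).T
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(Xc, bt_coarse.ravel())

# ...apply it to the fine-resolution explanatory layers...
bt_sharp = model.predict(fine.reshape(3, -1).T).reshape(180, 180)

# ...and correct with the coarse residual, redistributed to fine cells.
residual = bt_coarse - aggregate(bt_sharp)
bt_sharp += np.kron(residual, np.ones((18, 18)))

print("RMSE after sharpening:",
      np.sqrt(np.mean((bt_sharp - bt_fine_true) ** 2)).round(3))
```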

  9. Model selection for semiparametric marginal mean regression accounting for within-cluster subsampling variability and informative cluster size.

    PubMed

    Shen, Chung-Wei; Chen, Yi-Hau

    2018-03-13

    We propose a model selection criterion for semiparametric marginal mean regression based on generalized estimating equations. The work is motivated by a longitudinal study on the physical frailty outcome in the elderly, where the cluster size, that is, the number of the observed outcomes in each subject, is "informative" in the sense that it is related to the frailty outcome itself. The new proposal, called Resampling Cluster Information Criterion (RCIC), is based on the resampling idea utilized in the within-cluster resampling method (Hoffman, Sen, and Weinberg, 2001, Biometrika 88, 1121-1134) and accommodates informative cluster size. The implementation of RCIC, however, is free of performing actual resampling of the data and hence is computationally convenient. Compared with the existing model selection methods for marginal mean regression, the RCIC method incorporates an additional component accounting for variability of the model over within-cluster subsampling, and leads to remarkable improvements in selecting the correct model, regardless of whether the cluster size is informative or not. Applying the RCIC method to the longitudinal frailty study, we identify being female, old age, low income and life satisfaction, and chronic health conditions as significant risk factors for physical frailty in the elderly. © 2018, The International Biometric Society.

  10. NLSCIDNT user's guide maximum likelihood parameter identification computer program with nonlinear rotorcraft model

    NASA Technical Reports Server (NTRS)

    1979-01-01

    A nonlinear, maximum likelihood, parameter identification computer program (NLSCIDNT) is described which evaluates rotorcraft stability and control coefficients from flight test data. The optimal estimates of the parameters (stability and control coefficients) are determined (identified) by minimizing the negative log likelihood cost function. The minimization technique is the Levenberg-Marquardt method, which behaves like the steepest descent method when it is far from the minimum and behaves like the modified Newton-Raphson method when it is nearer the minimum. Twenty-one states and 40 measurement variables are modeled, and any subset may be selected. States which are not integrated may be fixed at an input value, or time history data may be substituted for the state in the equations of motion. Any aerodynamic coefficient may be expressed as a nonlinear polynomial function of selected 'expansion variables'.

  11. Genetic potential of common bean progenies obtained by different breeding methods evaluated in various environments.

    PubMed

    Pontes Júnior, V A; Melo, P G S; Pereira, H S; Melo, L C

    2016-09-02

    Grain yield is strongly influenced by the environment, has polygenic and complex inheritance, and is a key trait in the selection and recommendation of cultivars. Breeding programs should efficiently explore the genetic variability resulting from crosses by selecting the most appropriate method for breeding in segregating populations. The goal of this study was to evaluate and compare the genetic potential of common bean progenies of carioca grain for grain yield, obtained by different breeding methods and evaluated in different environments. Progenies originating from crosses between the lines CNFC 7812 and CNFC 7829 were advanced up to the F7 generation using three breeding methods in segregating populations: population (bulk), bulk within F2 progenies, and single-seed descent (SSD). Fifteen F8 progenies per method, two controls (BRS Estilo and Perola), and the parents were evaluated in a 7 × 7 simple lattice design, with plots of two 4-m rows. The tests were conducted in 10 environments in four states of Brazil and in three growing seasons in 2009 and 2010. Genetic parameters including genetic variance, heritability, variance of the interaction, and expected selection gain were estimated. Genetic variability among progenies and the effect of progeny-environment interactions were determined for the three methods. The breeding methods differed significantly due to the effects of sampling procedures on the progenies and due to natural selection, which mainly affected the bulk method. The SSD and bulk methods provided populations with better estimates of genetic parameters and more stable progenies that were less affected by interaction with the environment.

  12. New insights into time series analysis. II - Non-correlated observations

    NASA Astrophysics Data System (ADS)

    Ferreira Lopes, C. E.; Cross, N. J. G.

    2017-08-01

    Context. Statistical parameters are used to draw conclusions in a vast number of fields such as finance, weather, industry, and science. These parameters are also used to identify variability patterns in photometric data in order to select non-stochastic variations that are indicative of astrophysical effects. New, more efficient selection methods are mandatory to analyze the huge amount of astronomical data. Aims: We seek to improve the current methods used to select non-stochastic variations in non-correlated data. Methods: We used standard and new data-mining parameters to analyze non-correlated data to find the best way to discriminate between stochastic and non-stochastic variations. A new approach that includes a modified Strateva function was used to select non-stochastic variations. Monte Carlo simulations and public time-domain data were used to estimate its accuracy and performance. Results: We introduce 16 modified statistical parameters covering different features of statistical distributions such as average, dispersion, and shape. Many of the dispersion and shape parameters are unbound parameters, i.e., expressions that do not require the calculation of the average. Unbound parameters can be computed in a single loop, decreasing the running time. Moreover, the majority of these parameters have lower errors than previous parameters, mainly for distributions with few measurements. A set of non-correlated variability indices, sample-size corrections, and a new noise model, along with tests of different apertures and cut-offs on the data (BAS approach), are introduced. The number of mis-selections is reduced by about 520% using a single waveband and 1200% combining all wavebands. On the other hand, the even-mean also improves the correlated indices introduced in Paper I. The mis-selection rate is reduced by about 18% if the even-mean is used instead of the mean to compute the correlated indices in the WFCAM database. Even-statistics allows us to improve the effectiveness of both correlated and non-correlated indices. Conclusions: The selection of non-stochastic variations is improved by non-correlated indices. The even-averages provide a better estimation of the mean and median for almost all statistical distributions analyzed. The correlated variability indices proposed in the first paper of this series are also improved if the even-mean is used. The even-parameters will also be useful for classifying light curves in the last step of this project. We consider the first step of this project, where we set out new techniques and methods that provide a huge improvement in the efficiency of selection of variable stars, to be complete. Many of these techniques may be useful for a large number of fields. Next, we will commence a new step of this project concerning the analysis of period-search methods.

  13. Interpretation of tropospheric ozone variability in data with different vertical and temporal resolution

    NASA Astrophysics Data System (ADS)

    Petropavlovskikh, I. V.; Disterhoft, P.; Johnson, B. J.; Rieder, H. E.; Manney, G. L.; Daffer, W.

    2012-12-01

    This work attributes tropospheric ozone variability derived from ground-based Dobson and Brewer Umkehr measurements and from ozonesonde data to local sources and transport. It assesses the capabilities and limitations of both types of measurements, which are often used to analyze long- and short-term variability in tropospheric ozone time series. We will address the natural and instrument-related contributions to the variability found in both Umkehr and sonde data. Validation of Umkehr methods is often done by intercomparison against independent ozone measuring techniques such as ozone sounding. We will use ozone soundings, both as original profiles and as AK-smoothed vertical profiles, to assess ozone inter-annual variability over Boulder, CO. We will discuss possible reasons for differences between ozone measuring techniques and their effects on the derived ozone trends. In addition to standard evaluation techniques, we utilize an STL-decomposition method to address temporal variability and trends in the Boulder Umkehr data. Further, we apply a statistical modeling approach to the ozone data set to attribute ozone variability to individual driving forces associated with natural and anthropogenic causes. To this aim we follow earlier work applying a backward selection method (i.e., a stepwise elimination procedure over a set of 44 explanatory variables) to determine the explanatory variables that contribute most significantly to the observed variability. We will also present some results associated with the completeness (sampling rate) of the existing data sets, and will use MERRA (Modern-Era Retrospective analysis for Research and Applications) reanalysis results selected for the Boulder location as a transfer function for understanding the effects that temporal sampling and vertical resolution have on trend and ozone variability analysis. Analyzing intra-annual variability in ozone measurements over Boulder, CO, in relation to the upper-tropospheric subtropical and polar jets, we will address stratospheric and tropospheric intrusions into the mid-latitude tropospheric ozone field.

  14. Construction of Response Surface with Higher Order Continuity and Its Application to Reliability Engineering

    NASA Technical Reports Server (NTRS)

    Krishnamurthy, T.; Romero, V. J.

    2002-01-01

    The usefulness of piecewise polynomials with C1 and C2 derivative continuity for response surface construction is examined. A Moving Least Squares (MLS) method is developed and compared with four other interpolation methods, including kriging. First, the selected methods are applied and compared with one another on a problem with two design variables and a known theoretical response function. Next, the methods are tested on a problem with four design variables taken from a reliability-based design application. In general, the piecewise polynomial methods with higher-order derivative continuity produce less error in the response prediction. The MLS method was found to be superior for response surface construction among the methods evaluated.
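
    A compact sketch of an MLS response surface in two design variables: each prediction solves a locally weighted quadratic least-squares problem centered on the query point. The Gaussian weight and support width are illustrative choices, not those of the paper.

```python
import numpy as np

def basis(x):
    # Quadratic polynomial basis in two variables.
    x1, x2 = x[..., 0], x[..., 1]
    return np.stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2], axis=-1)

def mls_predict(query, pts, vals, h=0.35):
    w = np.exp(-np.sum((pts - query) ** 2, axis=1) / h**2)  # Gaussian weights
    B = basis(pts)                                          # (n, 6)
    A = B.T @ (w[:, None] * B)                              # weighted normal eqs
    b = B.T @ (w * vals)
    coef = np.linalg.solve(A, b)
    return basis(query) @ coef

rng = np.random.default_rng(0)
pts = rng.uniform(0, 1, size=(60, 2))                       # sampled designs
vals = np.sin(3 * pts[:, 0]) * np.cos(2 * pts[:, 1])        # "true" response

q = np.array([0.4, 0.6])
print("MLS prediction:", mls_predict(q, pts, vals))
print("true value:    ", np.sin(3 * q[0]) * np.cos(2 * q[1]))
```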

  15. Path Finding on High-Dimensional Free Energy Landscapes

    NASA Astrophysics Data System (ADS)

    Díaz Leines, Grisell; Ensing, Bernd

    2012-07-01

    We present a method for determining the average transition path and the free energy along this path in the space of selected collective variables. The formalism is based upon a history-dependent bias along a flexible path variable within the metadynamics framework but with a trivial scaling of the cost with the number of collective variables. Controlling the sampling of the orthogonal modes recovers the average path and the minimum free energy path as the limiting cases. The method is applied to resolve the path and the free energy of a conformational transition in alanine dipeptide.

  16. Analysis of Genome-Wide Association Studies with Multiple Outcomes Using Penalization

    PubMed Central

    Liu, Jin; Huang, Jian; Ma, Shuangge

    2012-01-01

    Genome-wide association studies have been extensively conducted, searching for markers for biologically meaningful outcomes and phenotypes. Penalization methods have been adopted in the analysis of the joint effects of a large number of SNPs (single nucleotide polymorphisms) and marker identification. This study is partly motivated by the analysis of heterogeneous stock mice dataset, in which multiple correlated phenotypes and a large number of SNPs are available. Existing penalization methods designed to analyze a single response variable cannot accommodate the correlation among multiple response variables. With multiple response variables sharing the same set of markers, joint modeling is first employed to accommodate the correlation. The group Lasso approach is adopted to select markers associated with all the outcome variables. An efficient computational algorithm is developed. Simulation study and analysis of the heterogeneous stock mice dataset show that the proposed method can outperform existing penalization methods. PMID:23272092
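
    The group-lasso-across-outcomes idea maps onto scikit-learn's MultiTaskLasso (an L2,1-penalized model that zeroes a marker's coefficients for all outcomes jointly), which can serve as an off-the-shelf stand-in for the authors' implementation; the data below are synthetic.

```python
import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(0)
n, p, q = 150, 400, 4                  # samples, SNPs, correlated phenotypes
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)  # additive genotype coding
B = np.zeros((p, q))
B[:6, :] = rng.normal(0.5, 0.1, size=(6, q))          # 6 truly shared markers
cov = 0.5 * np.eye(q) + 0.5                           # correlated phenotype noise
Y = X @ B + rng.multivariate_normal(np.zeros(q), cov, size=n)

model = MultiTaskLasso(alpha=0.1).fit(X, Y)
# The L2,1 penalty zeroes whole columns: a marker is in or out for all outcomes.
selected = np.flatnonzero(np.linalg.norm(model.coef_, axis=0))
print("markers selected jointly for all outcomes:", selected)
```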

  17. Active Learning to Overcome Sample Selection Bias: Application to Photometric Variable Star Classification

    NASA Astrophysics Data System (ADS)

    Richards, Joseph W.; Starr, Dan L.; Brink, Henrik; Miller, Adam A.; Bloom, Joshua S.; Butler, Nathaniel R.; James, J. Berian; Long, James P.; Rice, John

    2012-01-01

    Despite the great promise of machine-learning algorithms to classify and predict astrophysical parameters for the vast numbers of astrophysical sources and transients observed in large-scale surveys, the peculiarities of the training data often manifest as strongly biased predictions on the data of interest. Typically, training sets are derived from historical surveys of brighter, more nearby objects than those from more extensive, deeper surveys (testing data). This sample selection bias can cause catastrophic errors in predictions on the testing data because (1) standard assumptions for machine-learned model selection procedures break down and (2) dense regions of testing space might be completely devoid of training data. We explore possible remedies to sample selection bias, including importance weighting, co-training, and active learning (AL). We argue that AL—where the data whose inclusion in the training set would most improve predictions on the testing set are queried for manual follow-up—is an effective approach and is appropriate for many astronomical applications. For a variable star classification problem on a well-studied set of stars from Hipparcos and Optical Gravitational Lensing Experiment, AL is the optimal method in terms of error rate on the testing data, beating the off-the-shelf classifier by 3.4% and the other proposed methods by at least 3.0%. To aid with manual labeling of variable stars, we developed a Web interface which allows for easy light curve visualization and querying of external databases. Finally, we apply AL to classify variable stars in the All Sky Automated Survey, finding dramatic improvement in our agreement with the ASAS Catalog of Variable Stars, from 65.5% to 79.5%, and a significant increase in the classifier's average confidence for the testing set, from 14.6% to 42.9%, after a few AL iterations.
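
    A generic active-learning loop in this spirit, sketched with a simple least-confidence (smallest-margin) query rule and a random forest as a placeholder classifier; the paper's actual query criterion and classifiers differ.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=3000, n_features=15, n_classes=3,
                           n_informative=8, random_state=0)
rng = np.random.default_rng(0)
labeled = list(rng.choice(len(X), size=30, replace=False))  # small seed set
pool = [i for i in range(len(X)) if i not in set(labeled)]

clf = RandomForestClassifier(n_estimators=200, random_state=0)
for it in range(10):                       # 10 AL iterations, 20 queries each
    clf.fit(X[labeled], y[labeled])
    proba = np.sort(clf.predict_proba(X[pool]), axis=1)
    margin = proba[:, -1] - proba[:, -2]   # small margin = least confident
    query = np.argsort(margin)[:20]
    for j in sorted(query, reverse=True):  # move queried items to labeled set
        labeled.append(pool.pop(j))

clf.fit(X[labeled], y[labeled])            # final refit with all queried labels
print("final training-set size:", len(labeled))
print("accuracy on remaining pool:", clf.score(X[pool], y[pool]).round(3))
```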

  18. Classification of the European Union member states according to the relative level of sustainable development.

    PubMed

    Bluszcz, Anna

    Methods for measuring and assessing the level of sustainable development at the international, national, and regional levels are a current research problem that requires multi-dimensional analysis. The aim of the study reported in this article was the relative assessment of the sustainability level of the European Union member states and a comparative analysis of the position of Poland relative to the other countries. EU member states were treated as objects in a multi-dimensional space whose dimensions were specified by ten diagnostic variables describing the sustainability level of EU countries in three dimensions: social, economic, and environmental. Because the compiled statistical data were expressed in different units of measure, taxonomic methods were used to build an aggregated measure for assessing the level of sustainable development of EU member states; normalisation of the variables enabled the comparative analysis between countries. The methodology consisted of eight stages, including, among others: defining the data matrices; calculating the coefficient of variation for all variables (variables with a coefficient under 10% were discarded); dividing the variables into stimulants and destimulants; selecting the method of variable normalisation; developing matrices of normalised data; selecting the formula for, and calculating, the aggregated indicator of the relative level of sustainable development of the EU countries; calculating partial development indicators for the three dimensions studied (social, economic, and environmental); and classifying the EU countries according to their relative level of sustainable development. Statistical data were collected from publications of the Polish Central Statistical Office.
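
    The normalisation-and-aggregation step can be sketched numerically: stimulants and destimulants are rescaled to [0, 1] in opposite directions and averaged into an aggregated measure. The numbers below are invented placeholders, not official statistics.

```python
import numpy as np

# rows = countries, columns = diagnostic variables (placeholder values)
data = np.array([[71.2,  9.4, 0.42],
                 [68.9, 12.1, 0.55],
                 [74.5,  7.8, 0.31],
                 [66.3, 14.0, 0.61]])
is_stimulant = np.array([True, False, False])  # higher-is-better flags

spread = data.max(axis=0) - data.min(axis=0)
z = np.where(is_stimulant,
             (data - data.min(axis=0)) / spread,   # stimulants
             (data.max(axis=0) - data) / spread)   # destimulants

aggregate = z.mean(axis=1)                         # aggregated indicator
ranking = np.argsort(-aggregate)
print("aggregated measures:", aggregate.round(3))
print("ranking (best first):", ranking)
```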

  19. Variable-mesh method of solving differential equations

    NASA Technical Reports Server (NTRS)

    Van Wyk, R.

    1969-01-01

    Multistep predictor-corrector method for numerical solution of ordinary differential equations retains high local accuracy and convergence properties. In addition, the method was developed in a form conducive to the generation of effective criteria for the selection of subsequent step sizes in step-by-step solution of differential equations.

  20. Selection Index in the Study of Adaptability and Stability in Maize

    PubMed Central

    Lunezzo de Oliveira, Rogério; Garcia Von Pinho, Renzo; Furtado Ferreira, Daniel; Costa Melo, Wagner Mateus

    2014-01-01

    This paper proposes an alternative method for evaluating the stability and adaptability of maize hybrids using a genotype-ideotype distance index (GIDI) for selection. Data from seven variables were used, obtained through evaluation of 25 maize hybrids at six sites in southern Brazil. The GIDI was estimated by means of the generalized Mahalanobis distance for each plot of the test. We then proceeded to GGE biplot analysis in order to compare the predictive accuracy of the GGE models and the grouping of environments, and to select the best five hybrids. The G × E interaction was significant for both variables assessed. The GGE model with two principal components obtained a predictive accuracy (PRECORR) of 0.8913 for the GIDI and 0.8709 for yield (t ha−1). Two groups of environments were obtained when analyzing the GIDI, whereas all the environments remained in the same group when analyzing yield. Only two hybrids coincided between the selections based on the two criteria. The GIDI assessment provided for selection of hybrids that combine adaptability and stability across most of the variables assessed, making its use more highly recommended than analyzing each variable separately. Not all the higher-yielding hybrids were the best in the other variables assessed. PMID:24696641
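
    A sketch of the distance underlying the GIDI: the generalized Mahalanobis distance between each hybrid's multi-trait vector and an ideotype vector. The trait values, the ideotype rule, and the covariance below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
traits = rng.normal(size=(25, 7)) + 5.0      # 25 hybrids x 7 variables (toy data)
ideotype = traits.max(axis=0)                # "ideal" value per trait (toy rule)

cov = np.cov(traits, rowvar=False)           # phenotypic covariance matrix
cov_inv = np.linalg.inv(cov)

diff = traits - ideotype
# Generalized Mahalanobis distance of each hybrid to the ideotype.
gidi = np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))

best5 = np.argsort(gidi)[:5]                 # smaller distance = closer to ideal
print("five hybrids closest to the ideotype:", best5)
```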

  1. Continuous-variable measurement-device-independent quantum key distribution with virtual photon subtraction

    NASA Astrophysics Data System (ADS)

    Zhao, Yijia; Zhang, Yichen; Xu, Bingjie; Yu, Song; Guo, Hong

    2018-04-01

    Postselection has recently been proposed and verified as a method of improving the performance of continuous-variable quantum key distribution protocols. In continuous-variable measurement-device-independent quantum key distribution (CV-MDI QKD) protocols, the measurement results are obtained from an untrusted third party, Charlie, and there has been no effective method of improving CV-MDI QKD by postselection with untrusted measurement. We propose a method to improve the performance of the coherent-state CV-MDI QKD protocol by virtual photon subtraction via non-Gaussian postselection. The non-Gaussian postselection of transmitted data is equivalent to an ideal photon subtraction on the two-mode squeezed vacuum state, which is favorable for enhancing the performance of CV-MDI QKD. In the CV-MDI QKD protocol with non-Gaussian postselection, the two users select their own data independently. We demonstrate that the optimal performance of the renovated CV-MDI QKD protocol is obtained with the transmitted data selected only by Alice. By setting appropriate parameters for the virtual photon subtraction, both the secret key rate and the tolerable excess noise are improved at long transmission distance. The method provides an effective optimization scheme for the application of CV-MDI QKD protocols.

  2. The Multifaceted Variable Approach: Selection of Method in Solving Simple Linear Equations

    ERIC Educational Resources Information Center

    Tahir, Salma; Cavanagh, Michael

    2010-01-01

    This paper presents a comparison of the solution strategies used by two groups of Year 8 students as they solved linear equations. The experimental group studied algebra following a multifaceted variable approach, while the comparison group used a traditional approach. Students in the experimental group employed different solution strategies,…

  3. Adjusting for radiotelemetry error to improve estimates of habitat use.

    Treesearch

    Scott L. Findholt; Bruce K. Johnson; Lyman L. McDonald; John W. Kern; Alan Ager; Rosemary J. Stussy; Larry D. Bryant

    2002-01-01

    Animal locations estimated from radiotelemetry have traditionally been treated as error-free when analyzed in relation to habitat variables. Location error lowers the power of statistical tests of habitat selection. We describe a method that incorporates the error surrounding point estimates into measures of environmental variables determined from a geographic...

  4. Reinforcement Learning Trees

    PubMed Central

    Zhu, Ruoqing; Zeng, Donglin; Kosorok, Michael R.

    2015-01-01

    In this paper, we introduce a new type of tree-based method, reinforcement learning trees (RLT), which exhibits significantly improved performance over traditional methods such as random forests (Breiman, 2001) under high-dimensional settings. The innovations are three-fold. First, the new method implements reinforcement learning at each selection of a splitting variable during the tree construction processes. By splitting on the variable that brings the greatest future improvement in later splits, rather than choosing the one with largest marginal effect from the immediate split, the constructed tree utilizes the available samples in a more efficient way. Moreover, such an approach enables linear combination cuts at little extra computational cost. Second, we propose a variable muting procedure that progressively eliminates noise variables during the construction of each individual tree. The muting procedure also takes advantage of reinforcement learning and prevents noise variables from being considered in the search for splitting rules, so that towards terminal nodes, where the sample size is small, the splitting rules are still constructed from only strong variables. Last, we investigate asymptotic properties of the proposed method under basic assumptions and discuss rationale in general settings. PMID:26903687

  5. Hybrid Model Based on Genetic Algorithms and SVM Applied to Variable Selection within Fruit Juice Classification

    PubMed Central

    Fernandez-Lozano, C.; Canto, C.; Gestal, M.; Andrade-Garda, J. M.; Rabuñal, J. R.; Dorado, J.; Pazos, A.

    2013-01-01

    Given the background of the use of neural networks in problems of apple juice classification, this paper aims at implementing a newly developed method in the field of machine learning: the Support Vector Machine (SVM). A hybrid model that combines genetic algorithms and support vector machines is therefore suggested in such a way that, when using the SVM as the fitness function of the Genetic Algorithm (GA), the most representative variables for a specific classification problem can be selected. PMID:24453933
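
    A rough sketch of the hybrid scheme, using cross-validated SVM accuracy as the GA fitness over feature bitmasks; the dataset, population size, generation count, and mutation rate below are illustrative assumptions, not the authors' settings:

        import numpy as np
        from sklearn.datasets import load_wine
        from sklearn.model_selection import cross_val_score
        from sklearn.svm import SVC

        rng = np.random.default_rng(42)
        X, y = load_wine(return_X_y=True)
        n_feat, pop_size, n_gen = X.shape[1], 20, 15

        def fitness(mask):
            """Cross-validated SVM accuracy acts as the GA fitness function."""
            if not mask.any():
                return 0.0
            return cross_val_score(SVC(kernel='rbf'), X[:, mask], y, cv=3).mean()

        pop = rng.random((pop_size, n_feat)) < 0.5          # random bitmask chromosomes
        for _ in range(n_gen):
            scores = np.array([fitness(ind) for ind in pop])
            new_pop = [pop[scores.argmax()]]                # elitism: keep the best mask
            while len(new_pop) < pop_size:
                a, b = pop[rng.choice(pop_size, 2, p=scores / scores.sum())]
                cut = rng.integers(1, n_feat)               # one-point crossover
                child = np.concatenate([a[:cut], b[cut:]])
                child ^= rng.random(n_feat) < 0.02          # bit-flip mutation
                new_pop.append(child)
            pop = np.array(new_pop)

        best = max(pop, key=fitness)
        print('selected variables:', np.flatnonzero(best))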

  6. Variability-selected active galactic nuclei in the VST-SUDARE/VOICE survey of the COSMOS field

    NASA Astrophysics Data System (ADS)

    De Cicco, D.; Paolillo, M.; Covone, G.; Falocco, S.; Longo, G.; Grado, A.; Limatola, L.; Botticella, M. T.; Pignata, G.; Cappellaro, E.; Vaccari, M.; Trevese, D.; Vagnetti, F.; Salvato, M.; Radovich, M.; Brandt, W. N.; Capaccioli, M.; Napolitano, N. R.; Schipani, P.

    2015-02-01

    Context. Active galaxies are characterized by variability at every wavelength, with timescales from hours to years depending on the observing window. Optical variability has proven to be an effective way of detecting AGNs in imaging surveys lasting from weeks to years. Aims: In the present work we test the use of optical variability as a tool to identify active galactic nuclei in the VST multiepoch survey of the COSMOS field, originally tailored to detect supernova events. Methods: We make use of the multiwavelength data provided by other COSMOS surveys to discuss the reliability of the method and the nature of our AGN candidates. Results: The selection on the basis of optical variability returns a sample of 83 AGN candidates; based on a number of diagnostics, we conclude that 67 of them are confirmed AGNs (81% purity), 12 are classified as supernovae, while the nature of the remaining 4 is unknown. For the subsample of AGNs with some spectroscopic classification, we find that Type 1 are prevalent (89%) compared to Type 2 AGNs (11%). Overall, our approach is able to retrieve on average 15% of all AGNs in the field identified by means of spectroscopic or X-ray classification, with a strong dependence on the source apparent magnitude (completeness ranging from 26% to 5%). In particular, the completeness for Type 1 AGNs is 25%, while it drops to 6% for Type 2 AGNs. The rest of the X-ray selected AGN population presents on average a larger rms variability than the bulk of non-variable sources, indicating that variability detection for at least some of these objects is prevented only by the photometric accuracy of the data. The low completeness is in part due to the short observing span: we show that increasing the temporal baseline results in larger samples as expected for sources with a red-noise power spectrum. Our results allow us to assess the usefulness of this AGN selection technique in view of future wide-field surveys. Observations were provided by the ESO programs 088.D-0370 and 088.D-4013 (PI G. Pignata). Table 3 is available in electronic form at http://www.aanda.org

  7. Exploring the Impact of Early Decisions in Variable Ordering for Constraint Satisfaction Problems.

    PubMed

    Ortiz-Bayliss, José Carlos; Amaya, Ivan; Conant-Pablos, Santiago Enrique; Terashima-Marín, Hugo

    2018-01-01

    When solving constraint satisfaction problems (CSPs), it is common practice to rely on heuristics to decide which variable should be instantiated at each stage of the search, and this ordering influences the search cost. Even so, and to the best of our knowledge, no earlier work has dealt with how first variable orderings affect the overall cost. In this paper, we explore the cost of finding high-quality orderings of variables within constraint satisfaction problems. We also study differences among the orderings produced by some commonly used heuristics and the way bad first decisions affect the search cost. One of the most important findings of this work confirms the paramount importance of first decisions. Another is the evidence that many of the existing variable ordering heuristics fail to appropriately select the first variable to instantiate. We propose a simple method to improve the early decisions of heuristics; by using it, the performance of the heuristics increases.
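
    To make the role of the ordering concrete, here is a toy backtracking solver (with forward checking) that counts explored nodes under a static order versus a min-remaining-values order; the CSP instance and the two heuristics are illustrative, not the paper's benchmark:

        def solve(domains, assigned, constraints, order_fn, stats):
            """Backtracking search with forward checking; order_fn picks the next variable."""
            if len(assigned) == len(domains):
                return domains
            var = order_fn({v: d for v, d in domains.items() if v not in assigned})
            for val in domains[var]:
                stats['nodes'] += 1
                new = {v: list(d) for v, d in domains.items()}
                new[var] = [val]
                ok = True
                for a, b in constraints:                # prune unassigned neighbours of var
                    for x, y in ((a, b), (b, a)):
                        if x == var and y not in assigned:
                            new[y] = [w for w in new[y] if w != val]
                            ok = ok and bool(new[y])    # empty domain -> dead end
                if ok:
                    result = solve(new, assigned | {var}, constraints, order_fn, stats)
                    if result:
                        return result
            return None

        # toy graph-colouring CSP: adjacent nodes must take different colours
        constraints = [('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'A'), ('A', 'C')]
        domains = {v: [0, 1, 2] for v in 'ABCD'}
        for name, order_fn in [('static lexicographic', lambda ds: min(ds)),
                               ('min-remaining-values', lambda ds: min(ds, key=lambda v: len(ds[v])))]:
            stats = {'nodes': 0}
            solve(domains, frozenset(), constraints, order_fn, stats)
            print(name, '->', stats['nodes'], 'nodes explored')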

  8. Exploring the Impact of Early Decisions in Variable Ordering for Constraint Satisfaction Problems

    PubMed Central

    Amaya, Ivan

    2018-01-01

    When solving constraint satisfaction problems (CSPs), it is common practice to rely on heuristics to decide which variable should be instantiated at each stage of the search, and this ordering influences the search cost. Even so, and to the best of our knowledge, no earlier work has dealt with how first variable orderings affect the overall cost. In this paper, we explore the cost of finding high-quality orderings of variables within constraint satisfaction problems. We also study differences among the orderings produced by some commonly used heuristics and the way bad first decisions affect the search cost. One of the most important findings of this work confirms the paramount importance of first decisions. Another is the evidence that many of the existing variable ordering heuristics fail to appropriately select the first variable to instantiate. We propose a simple method to improve the early decisions of heuristics; by using it, the performance of the heuristics increases. PMID:29681923

  9. A critique of assumptions about selecting chemical-resistant gloves: a case for workplace evaluation of glove efficacy.

    PubMed

    Klingner, Thomas D; Boeniger, Mark F

    2002-05-01

    Wearing chemical-resistant gloves and clothing is the primary method used to prevent skin exposure to toxic chemicals in the workplace. The process for selecting gloves is usually based on manufacturers' laboratory-generated chemical permeation data. However, such data may not reflect conditions in the workplace where many variables are encountered (e.g., elevated temperature, flexing, pressure, and product variation between suppliers). Thus, the reliance on this selection process is questionable. Variables that may influence the performance of chemical-resistant gloves are identified and discussed. Passive dermal monitoring is recommended to evaluate glove performance under actual-use conditions and can bridge the gap between laboratory data and real-world performance.

  10. Probabilistic structural analysis methods for improving Space Shuttle engine reliability

    NASA Technical Reports Server (NTRS)

    Boyce, L.

    1989-01-01

    Probabilistic structural analysis methods are particularly useful in the design and analysis of critical structural components and systems that operate in very severe and uncertain environments. These methods have recently found application in space propulsion systems to improve the structural reliability of Space Shuttle Main Engine (SSME) components. A computer program, NESSUS, based on a deterministic finite-element program and a method of probabilistic analysis (fast probability integration) provides probabilistic structural analysis for selected SSME components. While computationally efficient, it considers both correlated and nonnormal random variables as well as an implicit functional relationship between independent and dependent variables. The program is used to determine the response of a nickel-based superalloy SSME turbopump blade. Results include blade tip displacement statistics due to the variability in blade thickness, modulus of elasticity, Poisson's ratio or density. Modulus of elasticity significantly contributed to blade tip variability while Poisson's ratio did not. Thus, a rational method for choosing parameters to be modeled as random is provided.

  11. Improving Cluster Analysis with Automatic Variable Selection Based on Trees

    DTIC Science & Technology

    2014-12-01

    …regression trees; Daisy, DISsimilAritY; PAM, partitioning around medoids; PMA, penalized multivariate analysis; SPC, sparse principal components; UPGMA, unweighted pair-group average method. Clusters are joined with the unweighted pair-group average method (UPGMA), which measures dissimilarities between all objects in two clusters and takes the average value

  12. Chemical spoilage extent traceability of two kinds of processed pork meats using one multispectral system developed by hyperspectral imaging combined with effective variable selection methods.

    PubMed

    Cheng, Weiwei; Sun, Da-Wen; Pu, Hongbin; Wei, Qingyi

    2017-04-15

    The feasibility of hyperspectral imaging (HSI) (400-1000 nm) for tracing the chemical spoilage extent of the raw meat used for two kinds of processed meats was investigated. Calibration models established separately for salted and cooked meats using full wavebands showed good results, with determination coefficients in prediction (R²P) of 0.887 and 0.832, respectively. To simplify the calibration models, two variable selection methods were used and compared. The results showed that genetic algorithm-partial least squares (GA-PLS), with as many continuous wavebands selected as possible, always had better performance. The potential of HSI to develop one multispectral system for simultaneously tracing the chemical spoilage extent of the two kinds of processed meats was also studied. A good result, with an R²P of 0.854, was obtained using GA-PLS as the dimension reduction method, which was thus used to visualize total volatile base nitrogen (TVB-N) contents corresponding to each pixel of the image. Copyright © 2016 Elsevier Ltd. All rights reserved.

  13. Selection of Optimal Auxiliary Soil Nutrient Variables for Cokriging Interpolation

    PubMed Central

    Song, Genxin; Zhang, Jing; Wang, Ke

    2014-01-01

    In order to explore the selection of the best auxiliary variables (BAVs) when using the Cokriging method for soil attribute interpolation, this paper investigated the selection of BAVs from terrain parameters, soil trace elements, and soil nutrient attributes when applying Cokriging interpolation to soil nutrients (organic matter, total N, available P, and available K). In total, 670 soil samples were collected in Fuyang, and the nutrient and trace element attributes of the soil samples were determined. Based on the spatial autocorrelation of soil attributes, the Digital Elevation Model (DEM) data for Fuyang were combined to explore the correlative relationship among terrain parameters, trace elements, and soil nutrient attributes. Variables with a high correlation to soil nutrient attributes were selected as BAVs for Cokriging interpolation of soil nutrients, and variables with poor correlation were selected as poor auxiliary variables (PAVs). The results of Cokriging interpolations using BAVs and PAVs were then compared. The results indicated that Cokriging interpolation with BAVs yielded more accurate results than Cokriging interpolation with PAVs (the mean absolute errors of the BAV interpolation results for organic matter, total N, available P, and available K were 0.020, 0.002, 7.616, and 12.4702, respectively, while the mean absolute errors of the PAV interpolation results were 0.052, 0.037, 15.619, and 0.037, respectively). The results indicated that Cokriging interpolation with BAVs can significantly improve the accuracy of Cokriging interpolation for soil nutrient attributes. This study provides meaningful guidance and reference for the selection of auxiliary parameters for the application of Cokriging interpolation to soil nutrient attributes. PMID:24927129
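
    The selection step amounts to ranking candidate covariables by their correlation with the target nutrient. A small pandas sketch with hypothetical column names, effect sizes, and thresholds (none taken from the study):

        import numpy as np
        import pandas as pd

        # hypothetical soil dataset: one target nutrient plus candidate auxiliary
        # variables (terrain parameters, trace elements); all names and values illustrative
        rng = np.random.default_rng(1)
        df = pd.DataFrame(rng.normal(size=(670, 6)),
                          columns=['organic_matter', 'elevation', 'slope', 'Zn', 'Cu', 'Fe'])
        df['organic_matter'] += 0.8 * df['Zn'] + 0.5 * df['elevation']

        target = 'organic_matter'
        corr = df.corr()[target].drop(target).abs().sort_values(ascending=False)
        bavs = corr[corr > 0.5].index.tolist()    # strong correlation -> best auxiliary variables
        pavs = corr[corr < 0.2].index.tolist()    # weak correlation -> poor auxiliary variables
        print('BAVs for cokriging:', bavs, '| PAVs:', pavs)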

  14. Extending the Peak Bandwidth of Parameters for Softmax Selection in Reinforcement Learning.

    PubMed

    Iwata, Kazunori

    2016-05-11

    Softmax selection is one of the most popular methods for action selection in reinforcement learning. Although various recently proposed methods may be more effective with full parameter tuning, implementing a complicated method that requires the tuning of many parameters can be difficult. Thus, softmax selection is still worth revisiting, considering the cost savings of its implementation and tuning. In fact, this method works adequately in practice with only one parameter appropriately set for the environment. The aim of this paper is to improve the variable setting of this method to extend the bandwidth of good parameters, thereby reducing the cost of implementation and parameter tuning. To achieve this, we take advantage of the asymptotic equipartition property in a Markov decision process to extend the peak bandwidth of softmax selection. Using a variety of episodic tasks, we show that our setting is effective in extending the bandwidth and that it yields a better policy in terms of stability. The bandwidth is quantitatively assessed in a series of statistical tests.
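
    For reference, the basic softmax (Boltzmann) rule with its single temperature parameter looks as follows; the action values and temperatures are illustrative:

        import numpy as np

        def softmax_select(q_values, tau, rng):
            """Boltzmann/softmax action selection: larger tau -> more exploration."""
            prefs = np.asarray(q_values) / tau
            prefs -= prefs.max()                      # shift for numerical stability
            probs = np.exp(prefs) / np.exp(prefs).sum()
            return rng.choice(len(probs), p=probs)

        rng = np.random.default_rng(0)
        q = [1.0, 1.5, 0.2]                           # estimated action values
        for tau in (0.1, 1.0, 10.0):                  # the single temperature parameter
            picks = [softmax_select(q, tau, rng) for _ in range(1000)]
            print('tau =', tau, '-> action frequencies:',
                  np.bincount(picks, minlength=3) / 1000)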

  15. A general procedure to generate models for urban environmental-noise pollution using feature selection and machine learning methods.

    PubMed

    Torija, Antonio J; Ruiz, Diego P

    2015-02-01

    The prediction of environmental noise in urban environments requires the solution of a complex and non-linear problem, since there are complex relationships among the multitude of variables involved in the characterization and modelling of environmental noise and environmental-noise magnitudes. Moreover, the inclusion of the great spatial heterogeneity characteristic of urban environments seems to be essential in order to achieve an accurate environmental-noise prediction in cities. This problem is addressed in this paper, where a procedure based on feature-selection techniques and machine-learning regression methods is proposed and applied to this environmental problem. Three machine-learning regression methods, which are considered very robust in solving non-linear problems, are used to estimate the energy-equivalent sound-pressure level descriptor (LAeq). These three methods are: (i) multilayer perceptron (MLP), (ii) sequential minimal optimisation (SMO), and (iii) Gaussian processes for regression (GPR). In addition, because of the high number of input variables involved in environmental-noise modelling and estimation in urban environments, which make LAeq prediction models quite complex and costly in terms of time and resources for application to real situations, three different techniques are used to approach feature selection or data reduction. The feature-selection techniques used are: (i) correlation-based feature-subset selection (CFS), (ii) wrapper for feature-subset selection (WFS), and the data reduction technique is principal-component analysis (PCA). The subsequent analysis leads to a proposal of different schemes, depending on the needs regarding data collection and accuracy. The use of WFS as the feature-selection technique with the implementation of SMO or GPR as regression algorithm provides the best LAeq estimation (R² = 0.94 and mean absolute error (MAE) = 1.14-1.16 dB(A)). Copyright © 2014 Elsevier B.V. All rights reserved.
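
    A compact sketch of the WFS + GPR combination using scikit-learn's sequential (wrapper-style) selector on synthetic stand-in data; the dataset sizes and selector settings are assumptions, not the study's configuration:

        import numpy as np
        from sklearn.datasets import make_regression
        from sklearn.feature_selection import SequentialFeatureSelector
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.metrics import mean_absolute_error
        from sklearn.model_selection import train_test_split

        # synthetic stand-in for an LAeq dataset: many candidate urban descriptors, few informative
        X, y = make_regression(n_samples=300, n_features=15, n_informative=4,
                               noise=5.0, random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

        gpr = GaussianProcessRegressor(alpha=1e-1, normalize_y=True)
        # wrapper feature-subset selection: greedily add features that improve the GPR's CV score
        wfs = SequentialFeatureSelector(gpr, n_features_to_select=4, direction='forward',
                                        scoring='neg_mean_absolute_error', cv=3)
        wfs.fit(X_tr, y_tr)
        print('selected features:', np.flatnonzero(wfs.get_support()))

        gpr.fit(wfs.transform(X_tr), y_tr)
        pred = gpr.predict(wfs.transform(X_te))
        print('MAE on held-out data:', mean_absolute_error(y_te, pred))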

  16. [Correlation coefficient-based classification method of hydrological dependence variability: With auto-regression model as example].

    PubMed

    Zhao, Yu Xi; Xie, Ping; Sang, Yan Fang; Wu, Zi Yi

    2018-04-01

    Hydrological process evaluation is temporally dependent. Hydrological time series that include dependence components do not meet the data consistency assumption for hydrological computation. Both factors cause great difficulty for water research. Given the existence of hydrological dependence variability, we proposed a correlation-coefficient-based method for significance evaluation of hydrological dependence based on an auto-regression model. By calculating the correlation coefficient between the original series and its dependence component and selecting reasonable thresholds of the correlation coefficient, this method divides the significance degree of dependence into no variability, weak variability, mid variability, strong variability, and drastic variability. By deducing the relationship between the correlation coefficient and the auto-correlation coefficients of each order of the series, we found that the correlation coefficient is mainly determined by the magnitude of the auto-correlation coefficients from order 1 to order p, which clarifies the theoretical basis of the method. With the first-order and second-order auto-regression models as examples, the reasonability of the deduced formula was verified through Monte-Carlo experiments classifying the relationship between the correlation coefficient and the auto-correlation coefficients. The method was used to analyze three observed hydrological time series. The results indicated the coexistence of stochastic and dependence characteristics in hydrological processes.
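
    A simplified numpy illustration of the idea: fit an AR(1) model, correlate the series with its dependence component, and grade the result. The thresholds below are illustrative, not the paper's calibrated values:

        import numpy as np

        def classify_dependence(x, thresholds=(0.2, 0.4, 0.6, 0.8)):
            """Fit AR(1), correlate the series with its dependence component, grade variability."""
            x = np.asarray(x, dtype=float) - np.mean(x)
            phi = np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1])   # least-squares AR(1) coefficient
            dep = phi * x[:-1]                                     # dependence component
            r = abs(np.corrcoef(x[1:], dep)[0, 1])
            labels = ['no', 'weak', 'mid', 'strong', 'drastic']
            return phi, r, labels[np.searchsorted(thresholds, r)]

        rng = np.random.default_rng(3)
        for true_phi in (0.0, 0.3, 0.9):
            e = rng.normal(size=2000)
            x = np.zeros(2000)
            for t in range(1, 2000):
                x[t] = true_phi * x[t - 1] + e[t]
            phi, r, grade = classify_dependence(x)
            print('true phi=%.1f: r=%.2f -> %s variability' % (true_phi, r, grade))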

  17. Mendelian randomization with fine-mapped genetic data: Choosing from large numbers of correlated instrumental variables.

    PubMed

    Burgess, Stephen; Zuber, Verena; Valdes-Marquez, Elsa; Sun, Benjamin B; Hopewell, Jemma C

    2017-12-01

    Mendelian randomization uses genetic variants to make causal inferences about the effect of a risk factor on an outcome. With fine-mapped genetic data, there may be hundreds of genetic variants in a single gene region any of which could be used to assess this causal relationship. However, using too many genetic variants in the analysis can lead to spurious estimates and inflated Type 1 error rates. But if only a few genetic variants are used, then the majority of the data is ignored and estimates are highly sensitive to the particular choice of variants. We propose an approach based on summarized data only (genetic association and correlation estimates) that uses principal components analysis to form instruments. This approach has desirable theoretical properties: it takes the totality of data into account and does not suffer from numerical instabilities. It also has good properties in simulation studies: it is not particularly sensitive to varying the genetic variants included in the analysis or the genetic correlation matrix, and it does not have greatly inflated Type 1 error rates. Overall, the method gives estimates that are less precise than those from variable selection approaches (such as using a conditional analysis or pruning approach to select variants), but are more robust to seemingly arbitrary choices in the variable selection step. Methods are illustrated by an example using genetic associations with testosterone for 320 genetic variants to assess the effect of sex hormone related pathways on coronary artery disease risk, in which variable selection approaches give inconsistent inferences. © 2017 The Authors Genetic Epidemiology Published by Wiley Periodicals, Inc.
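
    A minimal numpy sketch of the summary-data idea: eigendecompose the variant correlation matrix, project the associations onto the leading components, and apply inverse-variance weighting to the components. All data are simulated and the weighting is a simplified reading of the approach, not the authors' exact procedure:

        import numpy as np

        rng = np.random.default_rng(7)
        p, k = 100, 10                                       # 100 correlated variants, keep 10 PCs

        # simulated summarized data (all values illustrative)
        idx = np.arange(p)
        corr = 0.8 ** np.abs(idx[:, None] - idx[None, :])    # AR(1)-style LD correlation
        bx = rng.normal(0.1, 0.02, size=p)                   # variant-exposure associations
        theta = 0.5                                          # true causal effect to recover
        se_y = np.full(p, 0.01)
        cov_by = np.outer(se_y, se_y) * corr                 # covariance of outcome associations
        by = theta * bx + rng.multivariate_normal(np.zeros(p), cov_by)

        # principal components of the correlation matrix act as the instruments
        evals, evecs = np.linalg.eigh(corr)
        W = evecs[:, ::-1][:, :k]                            # loadings of the k leading components
        bx_pc, by_pc = W.T @ bx, W.T @ by
        var_pc = np.einsum('ip,ij,jp->p', W, cov_by, W)      # variance of each projected association

        # inverse-variance weighted estimate on the near-uncorrelated components
        theta_hat = np.sum(bx_pc * by_pc / var_pc) / np.sum(bx_pc ** 2 / var_pc)
        print('IVW estimate on PCs: %.3f (true effect %.1f)' % (theta_hat, theta))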

  18. Determination of biodiesel content in biodiesel/diesel blends using NIR and visible spectroscopy with variable selection.

    PubMed

    Fernandes, David Douglas Sousa; Gomes, Adriano A; Costa, Gean Bezerra da; Silva, Gildo William B da; Véras, Germano

    2011-12-15

    This work evaluates the use of the visible and near-infrared (NIR) ranges, separately and combined, to determine the biodiesel content in biodiesel/diesel blends using Multiple Linear Regression (MLR) and variable selection by the Successive Projections Algorithm (SPA). Full-spectrum models employing Partial Least Squares (PLS), variable selection by Stepwise (SW) regression coupled with Multiple Linear Regression (MLR), and PLS models with variable selection by Jack-Knife (Jk) were compared with the proposed methodology. Several preprocessing options were evaluated; the Savitzky-Golay derivative with a second-order polynomial and a 17-point window was chosen for the NIR and visible-NIR ranges, with offset correction. A total of 100 blends with biodiesel content between 5 and 50% (v/v) were prepared starting from ten samples of biodiesel. In the NIR and visible regions the best model was SPA-MLR, using only two and eight wavelengths with RMSEP values of 0.6439% and 0.5741% (v/v), respectively, while in the visible-NIR region the best model was SW-MLR, using five wavelengths with an RMSEP of 0.9533% (v/v). The results indicate that both spectral ranges evaluated show potential for developing a rapid and nondestructive method to quantify biodiesel in blends with mineral diesel. Finally, the improvement in prediction error obtained with the variable selection procedure was significant. Copyright © 2011 Elsevier B.V. All rights reserved.

  19. Methods of rapid, early selection of poplar clones for maximum yield potential: a manual of procedures.

    Treesearch

    USDA FS

    1982-01-01

    Instructions, illustrated with examples and experimental results, are given for the controlled-environment propagation and selection of poplar clones. Greenhouse and growth-room culture of poplar stock plants and scions are described, and statistical techniques for discriminating among clones on the basis of growth variables are emphasized.

  20. Fruit Quality Evaluation Using Spectroscopy Technology: A Review

    PubMed Central

    Wang, Hailong; Peng, Jiyu; Xie, Chuanqi; Bao, Yidan; He, Yong

    2015-01-01

    An overview is presented with regard to applications of visible and near infrared (Vis/NIR) spectroscopy, multispectral imaging and hyperspectral imaging techniques for quality attributes measurement and variety discrimination of various fruit species, i.e., apple, orange, kiwifruit, peach, grape, strawberry, jujube, banana, mango and others. Some commonly utilized chemometrics including pretreatment methods, variable selection methods, discriminant methods and calibration methods are briefly introduced. The comprehensive review of applications, which concentrates primarily on Vis/NIR spectroscopy, is arranged according to fruit species. Most of the applications are focused on variety discrimination or the measurement of soluble solids content (SSC), acidity and firmness, but also some measurements involving dry matter, vitamin C, polyphenols and pigments have been reported. The feasibility of different spectral modes, i.e., reflectance, interactance and transmittance, is discussed. Optimal variable selection methods and calibration methods for measuring different attributes of different fruit species are addressed. Special attention is paid to sample preparation and the influence of the environment. Areas where further investigation is needed and problems concerning model robustness and model transfer are identified. PMID:26007736

  1. Influences of Social and Style Variables on Adult Usage of African American English Features

    ERIC Educational Resources Information Center

    Craig, Holly K.; Grogger, Jeffrey T.

    2012-01-01

    Purpose: In this study, the authors examined the influences of selected social (gender, employment status, educational achievement level) and style variables (race of examiner, interview topic) on the production of African American English (AAE) by adults. Method: Participants were 50 African American men and women, ages 20-30 years. The authors…

  2. Conditional screening for ultra-high dimensional covariates with survival outcomes

    PubMed Central

    Hong, Hyokyoung G.; Li, Yi

    2017-01-01

    Identifying important biomarkers that are predictive for cancer patients' prognosis is key in gaining better insights into the biological influences on the disease and has become a critical component of precision medicine. The emergence of large-scale biomedical survival studies, which typically involve an excessive number of biomarkers, has brought high demand in designing efficient screening tools for selecting predictive biomarkers. The vast amount of biomarkers defies any existing variable selection methods via regularization. The recently developed variable screening methods, though powerful in many practical settings, fail to incorporate prior information on the importance of each biomarker and are less powerful in detecting marginally weak but jointly important signals. We propose a new conditional screening method for survival outcome data by computing the marginal contribution of each biomarker given a priori known biological information. This is based on the premise that some biomarkers are known to be associated with disease outcomes a priori. Our method possesses sure screening properties and a vanishing false selection rate. The utility of the proposal is further confirmed with extensive simulation studies and analysis of a diffuse large B-cell lymphoma dataset. We are pleased to dedicate this work to Jack Kalbfleisch, who has made instrumental contributions to the development of modern methods of analyzing survival data. PMID:27933468

  3. The challenge for genetic epidemiologists: how to analyze large numbers of SNPs in relation to complex diseases.

    PubMed

    Heidema, A Geert; Boer, Jolanda M A; Nagelkerke, Nico; Mariman, Edwin C M; van der A, Daphne L; Feskens, Edith J M

    2006-04-21

    Genetic epidemiologists have taken the challenge to identify genetic polymorphisms involved in the development of diseases. Many have collected data on large numbers of genetic markers but are not familiar with available methods to assess their association with complex diseases. Statistical methods have been developed for analyzing the relation between large numbers of genetic and environmental predictors to disease or disease-related variables in genetic association studies. In this commentary we discuss logistic regression analysis, neural networks, including the parameter decreasing method (PDM) and genetic programming optimized neural networks (GPNN) and several non-parametric methods, which include the set association approach, combinatorial partitioning method (CPM), restricted partitioning method (RPM), multifactor dimensionality reduction (MDR) method and the random forests approach. The relative strengths and weaknesses of these methods are highlighted. Logistic regression and neural networks can handle only a limited number of predictor variables, depending on the number of observations in the dataset. Therefore, they are less useful than the non-parametric methods to approach association studies with large numbers of predictor variables. GPNN on the other hand may be a useful approach to select and model important predictors, but its performance to select the important effects in the presence of large numbers of predictors needs to be examined. Both the set association approach and random forests approach are able to handle a large number of predictors and are useful in reducing these predictors to a subset of predictors with an important contribution to disease. The combinatorial methods give more insight in combination patterns for sets of genetic and/or environmental predictor variables that may be related to the outcome variable. As the non-parametric methods have different strengths and weaknesses we conclude that to approach genetic association studies using the case-control design, the application of a combination of several methods, including the set association approach, MDR and the random forests approach, will likely be a useful strategy to find the important genes and interaction patterns involved in complex diseases.

  4. Durbin-Watson partial least-squares regression applied to MIR data on adulteration with edible oils of different origins.

    PubMed

    Jović, Ozren

    2016-12-15

    A novel method for quantitative prediction and variable selection on spectroscopic data, called Durbin-Watson partial least-squares regression (dwPLS), is proposed in this paper. The idea is to inspect serial correlation in infrared data, which are known to consist of highly correlated neighbouring variables. The method selects only those variables whose intervals have a lower Durbin-Watson statistic (dw) than a certain optimal cutoff. For each interval, dw is calculated on a vector of regression coefficients. Adulteration of cold-pressed linseed oil (L), a well-known nutrient beneficial to health, is studied in this work by mixing it with cheaper oils: rapeseed oil (R), sesame oil (Se) and sunflower oil (Su). The samples for each botanical origin of oil vary with respect to producer, content and geographic origin. The results obtained indicate that MIR-ATR, combined with dwPLS, could be implemented for quantitative determination of edible-oil adulteration. Copyright © 2016 Elsevier Ltd. All rights reserved.
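
    A small sketch of the interval screening step: fit a PLS model, compute the Durbin-Watson statistic on each interval of the coefficient vector, and keep low-dw intervals. The synthetic spectra, interval width, and cutoff rule are illustrative, not the paper's settings:

        import numpy as np
        from sklearn.cross_decomposition import PLSRegression

        def durbin_watson(b):
            # dw of a coefficient vector: low when neighbouring coefficients vary smoothly
            return np.sum(np.diff(b) ** 2) / np.sum(b ** 2)

        # synthetic collinear spectra (illustrative): the signal lives in variables 100-139
        rng = np.random.default_rng(0)
        n, p, width = 120, 500, 50
        base = rng.normal(size=(n, p // 10)).repeat(10, axis=1)   # blocks of correlated neighbours
        X = base + 0.05 * rng.normal(size=(n, p))
        y = X[:, 100:140].mean(axis=1) + 0.1 * rng.normal(size=n)

        coef = PLSRegression(n_components=5).fit(X, y).coef_.ravel()
        starts = range(0, p, width)
        dws = {s: durbin_watson(coef[s:s + width]) for s in starts}
        cutoff = np.quantile(list(dws.values()), 0.3)             # illustrative cutoff choice
        print('dw per interval:', {s: round(d, 3) for s, d in dws.items()})
        print('intervals selected (start indices):', [s for s, d in dws.items() if d <= cutoff])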

  5. On the Accretion Rates of SW Sextantis Nova-like Variables

    NASA Astrophysics Data System (ADS)

    Ballouz, Ronald-Louis; Sion, Edward M.

    2009-06-01

    We present accretion rates for selected samples of nova-like variables having IUE archival spectra and distances uniformly determined using an infrared method by Knigge. A comparison with accretion rates derived independently with a multiparametric optimization modeling approach by Puebla et al. is carried out. The accretion rates of SW Sextantis nova-like systems are compared with the accretion rates of non-SW Sextantis systems in the Puebla et al. sample and in our sample, which was selected in the orbital period range of three to four and a half hours, with all systems having distances using the method of Knigge. Based upon the two independent modeling approaches, we find no significant difference between the accretion rates of SW Sextantis systems and non-SW Sextantis nova-like systems insofar as optically thick disk models are appropriate. We find little evidence to suggest that the SW Sex stars have higher accretion rates than other nova-like cataclysmic variables (CVs) above the period gap within the same range of orbital periods.

  6. Empirical Performance of Cross-Validation With Oracle Methods in a Genomics Context

    PubMed Central

    Martinez, Josue G.; Carroll, Raymond J.; Müller, Samuel; Sampson, Joshua N.; Chatterjee, Nilanjan

    2012-01-01

    When employing model selection methods with oracle properties such as the smoothly clipped absolute deviation (SCAD) and the Adaptive Lasso, it is typical to estimate the smoothing parameter by m-fold cross-validation, for example, m = 10. In problems where the true regression function is sparse and the signals large, such cross-validation typically works well. However, in regression modeling of genomic studies involving Single Nucleotide Polymorphisms (SNP), the true regression functions, while thought to be sparse, do not have large signals. We demonstrate empirically that in such problems, the number of selected variables using SCAD and the Adaptive Lasso, with 10-fold cross-validation, is a random variable that has considerable and surprising variation. Similar remarks apply to non-oracle methods such as the Lasso. Our study strongly questions the suitability of performing only a single run of m-fold cross-validation with any oracle method, and not just the SCAD and Adaptive Lasso. PMID:22347720
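
    The reported variability is easy to reproduce in spirit with the (non-oracle) Lasso: the sketch below reruns 10-fold cross-validation with reshuffled folds and counts the selected variables each time. The dataset sizes and signal strengths are illustrative stand-ins for SNP-style data:

        import numpy as np
        from sklearn.datasets import make_regression
        from sklearn.linear_model import LassoCV

        # sparse truth with weak signals relative to the noise (illustrative sizes)
        X, y = make_regression(n_samples=200, n_features=100, n_informative=8,
                               noise=50.0, random_state=1)

        counts = []
        for seed in range(20):                      # re-run 10-fold CV with reshuffled folds
            rng = np.random.default_rng(seed)
            perm = rng.permutation(len(y))          # permuting rows changes the fold assignment
            model = LassoCV(cv=10).fit(X[perm], y[perm])
            counts.append(int(np.sum(model.coef_ != 0)))

        print('selected-variable counts over 20 CV runs:', counts)
        print('min %d, max %d' % (min(counts), max(counts)))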

  7. Adaptive Elastic Net for Generalized Methods of Moments.

    PubMed

    Caner, Mehmet; Zhang, Hao Helen

    2014-01-30

    Model selection and estimation are crucial parts of econometrics. This paper introduces a new technique that can simultaneously estimate and select the model in the generalized method of moments (GMM) context. The GMM is particularly powerful for analyzing complex data sets such as longitudinal and panel data, and it has wide applications in econometrics. This paper extends the least-squares-based adaptive elastic net estimator of Zou and Zhang (2009) to nonlinear equation systems with endogenous variables. The extension is not trivial and involves a new proof technique because the estimators lack closed-form solutions. Compared to the Bridge-GMM of Caner (2009), we allow the number of parameters to diverge to infinity as well as collinearity among a large number of variables, and the redundant parameters are set to zero via a data-dependent technique. This method has the oracle property, meaning that we can estimate the nonzero parameters with their standard limit while the redundant parameters are dropped from the equations simultaneously. Numerical examples are used to illustrate the performance of the new method.

  8. ALGORITHM TO REDUCE APPROXIMATION ERROR FROM THE COMPLEX-VARIABLE BOUNDARY-ELEMENT METHOD APPLIED TO SOIL FREEZING.

    USGS Publications Warehouse

    Hromadka, T.V.; Guymon, G.L.

    1985-01-01

    An algorithm is presented for the numerical solution of the Laplace equation boundary-value problem, which is assumed to apply to soil freezing or thawing. The Laplace equation is numerically approximated by the complex-variable boundary-element method. The algorithm aids in reducing integrated relative error by providing a true measure of modeling error along the solution domain boundary. This measure of error can be used to select locations for adding, removing, or relocating nodal points on the boundary or to provide bounds for the integrated relative error of unknown nodal variable values along the boundary.

  9. Inferring Weighted Directed Association Network from Multivariate Time Series with a Synthetic Method of Partial Symbolic Transfer Entropy Spectrum and Granger Causality

    PubMed Central

    Hu, Yanzhu; Ai, Xinbo

    2016-01-01

    Complex network methodology is very useful for complex system exploration. However, the relationships among variables in complex systems are usually not clear. Therefore, inferring association networks among variables from their observed data has been a popular research topic. We propose a synthetic method, named small-shuffle partial symbolic transfer entropy spectrum (SSPSTES), for inferring association networks from multivariate time series. The method synthesizes surrogate data, partial symbolic transfer entropy (PSTE) and Granger causality. Proper threshold selection is crucial for common correlation identification methods, and it is not easy for users. The proposed method can not only identify strong correlation without selecting a threshold but also has the ability of correlation quantification, direction identification and temporal relation identification. The method can be divided into three layers, i.e. a data layer, a model layer and a network layer. In the model layer, the method identifies all possible pair-wise correlations. In the network layer, we introduce a filter algorithm to remove indirect weak correlations and retain strong correlations. Finally, we build a weighted adjacency matrix, the value of each entry representing the correlation level between pair-wise variables, and then get the weighted directed association network. Two simulated datasets, from a linear system and a nonlinear system, are used to illustrate the steps and performance of the proposed approach. The ability of the proposed method is finally confirmed by an application. PMID:27832153

  10. Effects of environmental variables on invasive amphibian activity: Using model selection on quantiles for counts

    USGS Publications Warehouse

    Muller, Benjamin J.; Cade, Brian S.; Schwarzkoph, Lin

    2018-01-01

    Many different factors influence animal activity. Often, the value of an environmental variable may influence significantly the upper or lower tails of the activity distribution. For describing relationships with heterogeneous boundaries, quantile regressions predict a quantile of the conditional distribution of the dependent variable. A quantile count model extends linear quantile regression methods to discrete response variables, and is useful if activity is quantified by trapping, where there may be many tied (equal) values in the activity distribution, over a small range of discrete values. Additionally, different environmental variables in combination may have synergistic or antagonistic effects on activity, so examining their effects together, in a modeling framework, is a useful approach. Thus, model selection on quantile counts can be used to determine the relative importance of different variables in determining activity, across the entire distribution of capture results. We conducted model selection on quantile count models to describe the factors affecting activity (numbers of captures) of cane toads (Rhinella marina) in response to several environmental variables (humidity, temperature, rainfall, wind speed, and moon luminosity) over eleven months of trapping. Environmental effects on activity are understudied in this pest animal. In the dry season, model selection on quantile count models suggested that rainfall positively affected activity, especially near the lower tails of the activity distribution. In the wet season, wind speed limited activity near the maximum of the distribution, while minimum activity increased with minimum temperature. This statistical methodology allowed us to explore, in depth, how environmental factors influenced activity across the entire distribution, and is applicable to any survey or trapping regime, in which environmental variables affect activity.

  11. Counter-propagation network with variable degree variable step size LMS for single switch typing recognition.

    PubMed

    Yang, Cheng-Huei; Luo, Ching-Hsing; Yang, Cheng-Hong; Chuang, Li-Yeh

    2004-01-01

    Morse code is now being harnessed for use in rehabilitation applications of augmentative-alternative communication and assistive technology, including mobility, environmental control and adapted worksite access. In this paper, Morse code is selected as a communication adaptive device for disabled persons who suffer from muscle atrophy, cerebral palsy or other severe handicaps. A stable typing rate is strictly required for Morse code to be effective as a communication tool. This restriction is a major hindrance. Therefore, a switch adaptive automatic recognition method with a high recognition rate is needed. The proposed system combines counter-propagation networks with a variable degree variable step size LMS algorithm. It is divided into five stages: space recognition, tone recognition, learning process, adaptive processing, and character recognition. Statistical analyses demonstrated that the proposed method elicited a better recognition rate in comparison to alternative methods in the literature.
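
    As background, a generic variable-step-size LMS filter (step size driven by the error power) looks like this on a toy system-identification task; this is a common textbook rule, not the authors' exact variable degree variable step size algorithm:

        import numpy as np

        def vss_lms(x, d, order=4, mu0=0.05, alpha=0.97, gamma=1e-3):
            """LMS filter whose step size grows with the error power and decays geometrically."""
            w, mu = np.zeros(order), mu0
            for n in range(order - 1, len(x)):
                u = x[n - order + 1:n + 1][::-1]     # most recent samples first
                e = d[n] - w @ u
                mu = alpha * mu + gamma * e * e      # step size tracks the squared error
                w += 2 * mu * e * u                  # standard LMS weight update
            return w

        # identify an unknown FIR system from noisy observations (illustrative setup)
        rng = np.random.default_rng(2)
        x = rng.normal(size=5000)
        true_w = np.array([0.6, -0.3, 0.1, 0.05])
        d = np.convolve(x, true_w)[:len(x)] + 0.01 * rng.normal(size=len(x))
        print('estimated taps:', np.round(vss_lms(x, d), 3), '| true taps:', true_w)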

  12. [Effect of stock abundance and environmental factors on the recruitment success of small yellow croaker in the East China Sea].

    PubMed

    Liu, Zun-lei; Yuan, Xing-wei; Yang, Lin-lin; Yan, Li-ping; Zhang, Hui; Cheng, Jia-hua

    2015-02-01

    Multiple hypotheses are available to explain recruitment rate. Model selection methods can be used to identify the best model that supports a particular hypothesis. However, using a single model for estimating recruitment success is often inadequate for an overexploited population because of high model uncertainty. In this study, stock-recruitment data of small yellow croaker in the East China Sea, collected from fishery-dependent and independent surveys between 1992 and 2012, were used to examine density-dependent effects on recruitment success. Model selection methods based on frequentist (AIC, maximum adjusted R², and P-values) and Bayesian (Bayesian model averaging, BMA) approaches were applied to identify the relationship between recruitment and environmental conditions. Interannual variability of the East China Sea environment was indicated by sea surface temperature (SST), meridional wind stress (MWS), zonal wind stress (ZWS), sea surface pressure (SPP) and runoff of the Changjiang River (RCR). Mean absolute error, mean squared predictive error and continuous ranked probability score were calculated to evaluate the predictive performance for recruitment success. The results showed that the model structures were not consistent across the three model selection methods: the predictive variables were spawning abundance and MWS under AIC; spawning abundance alone under P-values; and spawning abundance, MWS and RCR under maximum adjusted R². Recruitment success decreased linearly with stock abundance (P < 0.01), suggesting that the overcompensation effect in recruitment success might be due to cannibalism or food competition. Meridional wind intensity showed a marginally significant positive effect on recruitment success (P = 0.06), while runoff of the Changjiang River showed a marginally negative effect (P = 0.07). Based on mean absolute error and continuous ranked probability score, the predictive error associated with models obtained from BMA was the smallest among the different approaches, while that from models selected based on the P-values of the independent variables was the highest; however, the mean squared predictive error from models selected based on maximum adjusted R² was the highest. We found that the BMA method could improve the prediction of recruitment success, derive more accurate prediction intervals and quantitatively evaluate model uncertainty.
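
    To illustrate the BMA idea in its simplest form, the sketch below weights all-subsets OLS models by BIC-approximated posterior weights; the data and effect sizes are simulated, only loosely echoing the variables named above:

        import itertools
        import numpy as np

        rng = np.random.default_rng(4)
        n = 21                                             # 21 yearly stock-recruit observations
        S = rng.normal(size=n)                             # spawning abundance (standardized)
        MWS = rng.normal(size=n)                           # meridional wind stress
        RCR = rng.normal(size=n)                           # Changjiang River runoff
        y = -0.6 * S + 0.3 * MWS + rng.normal(0, 0.5, n)   # recruitment success (illustrative)

        names = ['S', 'MWS', 'RCR']
        Xall = np.column_stack([S, MWS, RCR])

        def bic_of_ols(X, y):
            """OLS fit returning the BIC under Gaussian errors."""
            X1 = np.column_stack([np.ones(len(y)), X])
            beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
            rss = np.sum((y - X1 @ beta) ** 2)
            return len(y) * np.log(rss / len(y)) + X1.shape[1] * np.log(len(y))

        # BIC-approximated posterior weights over every candidate predictor subset
        models, bics = [], []
        for r in range(1, 4):
            for subset in itertools.combinations(range(3), r):
                models.append(subset)
                bics.append(bic_of_ols(Xall[:, subset], y))
        w = np.exp(-0.5 * (np.array(bics) - min(bics)))
        w /= w.sum()
        for m, wi in sorted(zip(models, w), key=lambda t: -t[1]):
            print([names[i] for i in m], 'weight = %.3f' % wi)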

  13. [Study of near infrared spectral preprocessing and wavelength selection methods for endometrial cancer tissue].

    PubMed

    Zhao, Li-Ting; Xiang, Yu-Hong; Dai, Yin-Mei; Zhang, Zhuo-Yong

    2010-04-01

    Near infrared spectroscopy was applied to measure tissue slices of endometrial tissues and collect their spectra. A total of 154 spectra were obtained from 154 samples; the numbers of normal, hyperplasia, and malignant samples were 36, 60, and 58, respectively. Original near infrared spectra are composed of many variables and contain interference information, including instrument errors and physical effects such as particle size and light scatter. In order to reduce these influences, the original spectral data should be treated with different preprocessing methods to compress variables and extract useful information. The methods of spectral preprocessing and wavelength selection therefore play an important role in the near infrared spectroscopy technique. In the present paper the raw spectra were processed using various preprocessing methods including first derivative, multiplicative scatter correction, the Savitzky-Golay first derivative algorithm, standard normal variate, smoothing, and moving-window median. The standard deviation was used to select the optimal spectral region of 4000-6000 cm⁻¹. Then principal component analysis was used for classification. Principal component analysis results showed that the three types of samples could be discriminated completely, with accuracy of almost 100%. This study demonstrated that near infrared spectroscopy technology and chemometric methods could be a fast, efficient, and novel means to diagnose cancer. The proposed methods would be a promising and significant diagnosis technique for early stage cancer.
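
    A small scipy/scikit-learn sketch of the described pipeline (Savitzky-Golay first derivative, SNV-style scatter correction, region selection, then PCA) on mock spectra; all data and window settings are illustrative, not the study's:

        import numpy as np
        from scipy.signal import savgol_filter
        from sklearn.decomposition import PCA

        rng = np.random.default_rng(5)
        # mock NIR spectra (illustrative): 154 samples x 800 wavenumber points
        wn = np.linspace(4000, 10000, 800)
        spectra = np.exp(-((wn - 5000) / 600) ** 2)[None, :] * rng.uniform(0.8, 1.2, (154, 1))
        spectra += rng.normal(0, 0.01, spectra.shape) + rng.uniform(0, 0.3, (154, 1))

        # Savitzky-Golay first derivative removes baseline offsets
        d1 = savgol_filter(spectra, window_length=15, polyorder=2, deriv=1, axis=1)
        # standard normal variate corrects multiplicative scatter per spectrum
        snv = (d1 - d1.mean(axis=1, keepdims=True)) / d1.std(axis=1, keepdims=True)
        # restrict to the informative spectral region before PCA
        region = (wn >= 4000) & (wn <= 6000)
        scores = PCA(n_components=3).fit_transform(snv[:, region])
        print('PCA scores shape:', scores.shape)   # (154, 3), ready for class visualisation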

  14. An Integrated Optimization Design Method Based on Surrogate Modeling Applied to Diverging Duct Design

    NASA Astrophysics Data System (ADS)

    Hanan, Lu; Qiushi, Li; Shaobin, Li

    2016-12-01

    This paper presents an integrated optimization design method in which uniform design, response surface methodology and genetic algorithm are used in combination. In detail, uniform design is used to select the experimental sampling points in the experimental domain, and the system performance is evaluated by means of computational fluid dynamics to construct a database. After that, response surface methodology is employed to generate a surrogate mathematical model relating the optimization objective to the design variables. Subsequently, a genetic algorithm is applied to the surrogate model to acquire the optimal solution subject to some constraints. The method has been applied to the optimization design of an axisymmetric diverging duct, dealing with three design variables including one qualitative variable and two quantitative variables. The modeling and optimization design method performs well in improving the duct aerodynamic performance, can also be applied to wider fields of mechanical design, and can serve as a useful tool for engineering designers by reducing design time and computational cost.
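
    The three-stage scheme is easy to prototype: sample the design space, fit a polynomial response surface, then search the surrogate with an evolutionary optimizer. In this sketch, differential evolution stands in for the genetic algorithm and a cheap analytic function stands in for the CFD evaluation; everything is illustrative:

        import numpy as np
        from scipy.optimize import differential_evolution
        from sklearn.linear_model import LinearRegression
        from sklearn.preprocessing import PolynomialFeatures

        def expensive_evaluation(x):
            """Stand-in for the CFD run that scores a candidate duct geometry."""
            return (x[0] - 0.3) ** 2 + 2 * (x[1] - 0.7) ** 2 + 0.1 * x[0] * x[1]

        # 1) space-filling sample of the design domain (a jittered grid standing in
        #    for a formal uniform design)
        rng = np.random.default_rng(0)
        grid = np.stack(np.meshgrid(np.linspace(0, 1, 7), np.linspace(0, 1, 7)), -1).reshape(-1, 2)
        X = np.clip(grid + rng.normal(0, 0.02, grid.shape), 0, 1)
        y = np.array([expensive_evaluation(x) for x in X])       # build the database

        # 2) response surface: quadratic polynomial surrogate of the objective
        poly = PolynomialFeatures(degree=2)
        surrogate = LinearRegression().fit(poly.fit_transform(X), y)

        # 3) evolutionary search on the cheap surrogate
        result = differential_evolution(
            lambda x: surrogate.predict(poly.transform(x.reshape(1, -1)))[0],
            bounds=[(0, 1), (0, 1)], seed=0)
        print('surrogate optimum:', np.round(result.x, 3), '(true optimum near [0.27, 0.69])')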

  15. Quantifying Uncertainties from Presence Data Sampling Methods for Species Distribution Modeling: Focused on Vegetation.

    NASA Astrophysics Data System (ADS)

    Sung, S.; Kim, H. G.; Lee, D. K.; Park, J. H.; Mo, Y.; Kil, S.; Park, C.

    2016-12-01

    The impact of climate change has been observed throughout the globe. Ecosystems experience rapid changes such as vegetation shifts and species extinctions. In this context, the Species Distribution Model (SDM) is one of the most popular methods to project the impact of climate change on ecosystems. An SDM is based on the niche of a species, so presence point data are essential for fitting it and characterizing that niche. To run an SDM for plants, the characteristics of vegetation data require particular care. Vegetation data over large areas are normally produced with remote sensing techniques, so the exact location of a presence point carries high uncertainty when presence data are selected from polygon and raster datasets. Thus, sampling methods for modeling vegetation presence data should be carefully selected. In this study, we used three different sampling methods for the selection of vegetation presence data: random sampling, stratified sampling, and site-index-based sampling. We used the R package BIOMOD2 to assess uncertainty from modeling, and we included BioCLIM variables and other environmental variables as input data. Despite differences among the 10 SDMs, the sampling methods showed clear differences in ROC values: random sampling yielded the lowest ROC value, while site-index-based sampling yielded the highest. As a result of this study, the uncertainties arising from presence data sampling methods and SDMs can be quantified.

  16. Recursive feature selection with significant variables of support vectors.

    PubMed

    Tsai, Chen-An; Huang, Chien-Hsun; Chang, Ching-Wei; Chen, Chun-Houh

    2012-01-01

    The development of DNA microarrays allows researchers to screen thousands of genes simultaneously and helps determine high- and low-expression-level genes in normal and disease tissues. Selecting relevant genes for cancer classification is an important issue. Most gene selection methods use univariate ranking criteria and arbitrarily choose a threshold to select genes. However, the parameter setting may not be compatible with the selected classification algorithm. In this paper, we propose a new gene selection method (SVM-t) based on t-statistics embedded in the support vector machine. We compared its performance to two similar SVM-based methods: SVM recursive feature elimination (SVMRFE) and the recursive support vector machine (RSVM). The three methods were compared based on extensive simulation experiments and analyses of two published microarray datasets. In the simulation experiments, we found that the proposed method is more robust in selecting informative genes than SVMRFE and RSVM and is capable of attaining good classification performance when the variations of informative and noninformative genes differ. In the analysis of the two microarray datasets, the proposed method yields better performance in identifying fewer genes with good prediction accuracy, compared to SVMRFE and RSVM.

  17. Determinants of Students' Academic Performance in Four Selected Accounting Courses at University of Zimbabwe

    ERIC Educational Resources Information Center

    Nyikahadzoi, Loveness; Matamande, Wilson; Taderera, Ever; Mandimika, Elinah

    2013-01-01

    The study seeks to establish scientific evidence of the factors affecting academic performance for first year accounting students using four selected courses at the University of Zimbabwe. It uses the Ordinary Least Squares method to analyse the influence of personal and family background on performance. The findings show that the variables age, gender,…

  18. A Review of Validation Research on Psychological Variables Used in Hiring Police Officers.

    ERIC Educational Resources Information Center

    Malouff, John M.; Schutte Nicola S.

    This paper reviews the methods and findings of published research on the validity of police selection procedures. As a preface to the review, the typical police officer selection process is briefly described. Several common methodological deficiencies of the validation research are identified and discussed in detail: (1) use of past-selection…

  19. Bayesian Techniques for Comparing Time-dependent GRMHD Simulations to Variable Event Horizon Telescope Observations

    NASA Astrophysics Data System (ADS)

    Kim, Junhan; Marrone, Daniel P.; Chan, Chi-Kwan; Medeiros, Lia; Özel, Feryal; Psaltis, Dimitrios

    2016-12-01

    The Event Horizon Telescope (EHT) is a millimeter-wavelength, very-long-baseline interferometry (VLBI) experiment that is capable of observing black holes with horizon-scale resolution. Early observations have revealed variable horizon-scale emission in the Galactic Center black hole, Sagittarius A* (Sgr A*). Comparing such observations to time-dependent general relativistic magnetohydrodynamic (GRMHD) simulations requires statistical tools that explicitly consider the variability in both the data and the models. We develop here a Bayesian method to compare time-resolved simulation images to variable VLBI data, in order to infer model parameters and perform model comparisons. We use mock EHT data based on GRMHD simulations to explore the robustness of this Bayesian method and contrast it to approaches that do not consider the effects of variability. We find that time-independent models lead to offset values of the inferred parameters with artificially reduced uncertainties. Moreover, neglecting the variability in the data and the models often leads to erroneous model selections. We finally apply our method to the early EHT data on Sgr A*.

  20. BAYESIAN TECHNIQUES FOR COMPARING TIME-DEPENDENT GRMHD SIMULATIONS TO VARIABLE EVENT HORIZON TELESCOPE OBSERVATIONS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kim, Junhan; Marrone, Daniel P.; Chan, Chi-Kwan

    2016-12-01

    The Event Horizon Telescope (EHT) is a millimeter-wavelength, very-long-baseline interferometry (VLBI) experiment that is capable of observing black holes with horizon-scale resolution. Early observations have revealed variable horizon-scale emission in the Galactic Center black hole, Sagittarius A* (Sgr A*). Comparing such observations to time-dependent general relativistic magnetohydrodynamic (GRMHD) simulations requires statistical tools that explicitly consider the variability in both the data and the models. We develop here a Bayesian method to compare time-resolved simulation images to variable VLBI data, in order to infer model parameters and perform model comparisons. We use mock EHT data based on GRMHD simulations to explore the robustness of this Bayesian method and contrast it to approaches that do not consider the effects of variability. We find that time-independent models lead to offset values of the inferred parameters with artificially reduced uncertainties. Moreover, neglecting the variability in the data and the models often leads to erroneous model selections. We finally apply our method to the early EHT data on Sgr A*.

  1. Predictors of Start of Different Antidepressants in Patient Charts among Patients with Depression

    PubMed Central

    Kim, Hyungjin Myra; Zivin, Kara; Choe, Hae Mi; Stano, Clare M.; Ganoczy, Dara; Walters, Heather; Valenstein, Marcia

    2016-01-01

    Background In usual psychiatric care, antidepressant treatments are selected based on physician and patient preferences rather than being randomly allocated, which can produce spurious associations between treatments and outcomes in observational studies. Objectives To identify factors recorded in electronic medical chart progress notes predictive of antidepressant selection among patients who had received a depression diagnosis. Methods This retrospective study sample consisted of 556 randomly selected Veterans Health Administration (VHA) patients diagnosed with depression from April 1, 1999 to September 30, 2004, stratified by the antidepressant agent, geographic region, gender, and year of depression cohort entry. Predictors were obtained from administrative data, and additional variables were abstracted from electronic medical chart notes in the year prior to the start of the antidepressant in five categories: clinical symptoms and diagnoses, substance use, life stressors, behavioral/ideation measures (e.g., suicide attempts), and treatments received. Multinomial logistic regression analysis was used to assess the predictors associated with different antidepressant prescribing, and adjusted relative risk ratios (RRR) are reported. Results Of the administrative data-based variables, gender, age, illicit drug abuse or dependence, and the number of psychiatric medications in the prior year were significantly associated with antidepressant selection. After adjusting for administrative data-based variables, sleep problems (RRR = 2.47) or marital issues (RRR = 2.64) identified in the charts were significantly associated with prescribing mirtazapine rather than sertraline; however, no other chart-based variables showed a significant association or an association of large magnitude. Conclusion Some chart data-based variables were predictive of antidepressant selection, but few such variables were found, and none were highly predictive of antidepressant selection in patients treated for depression. PMID:25943003
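
    For readers unfamiliar with the reported effect measure, exponentiated multinomial-logit coefficients give the adjusted RRRs versus the baseline drug. A statsmodels sketch on simulated chart data; the variables, drugs, and effect sizes are hypothetical, chosen only to roughly echo the reported magnitudes:

        import numpy as np
        import pandas as pd
        import statsmodels.api as sm

        # illustrative chart-review dataset: three antidepressants, sertraline as baseline
        rng = np.random.default_rng(8)
        n = 556
        df = pd.DataFrame({
            'age': rng.normal(50, 12, n).round(),
            'sleep_problems': rng.integers(0, 2, n),
            'marital_issues': rng.integers(0, 2, n),
        })
        lin = np.column_stack([
            np.zeros(n),                                              # sertraline (base category)
            -1.0 + 0.9 * df.sleep_problems + 1.0 * df.marital_issues, # mirtazapine
            -0.5 * np.ones(n),                                        # a third agent
        ])
        probs = np.exp(lin) / np.exp(lin).sum(axis=1, keepdims=True)
        df['drug'] = [rng.choice(3, p=pr) for pr in probs]

        X = sm.add_constant(df[['age', 'sleep_problems', 'marital_issues']])
        fit = sm.MNLogit(df['drug'], X).fit(disp=False)
        print(np.exp(fit.params))   # exponentiated coefficients = adjusted RRRs vs baseline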

  2. Spectroscopic follow-up of variability-selected active galactic nuclei in the Chandra Deep Field South

    NASA Astrophysics Data System (ADS)

    Boutsia, K.; Leibundgut, B.; Trevese, D.; Vagnetti, F.

    2009-04-01

    Context: Supermassive black holes with masses of 10^5-10^9 M⊙ are believed to inhabit most, if not all, nuclear regions of galaxies, and both observational evidence and theoretical models suggest a scenario where galaxy and black hole evolution are tightly related. Luminous AGNs are usually selected by their non-stellar colours or their X-ray emission. Colour selection cannot be used to select low-luminosity AGNs, since their emission is dominated by the host galaxy. Objects with a low X-ray to optical ratio escape even the deepest X-ray surveys performed so far. In a previous study we presented a sample of candidates selected through optical variability in the Chandra Deep Field South, where repeated optical observations were performed in the framework of the STRESS supernova survey. Aims: The analysis is devoted to breaking down the sample into AGNs, starburst galaxies, and low-ionisation narrow-emission line objects, to providing new information about the possible dependence of the emission mechanisms on nuclear luminosity and black-hole mass, and eventually studying the evolution in cosmic time of the different populations. Methods: We obtained new optical spectroscopy for a sample of variability-selected candidates with the ESO NTT telescope. We analysed the new spectra, together with those existing in the literature, and studied the distribution of the objects in U-B and B-V colours, optical and X-ray luminosity, and variability amplitude. Results: A large fraction (17/27) of the observed candidates are broad-line luminous AGNs, confirming the efficiency of variability in detecting quasars. We detect: i) extended objects which would have escaped the colour selection and ii) objects of very low X-ray to optical ratio, in a few cases without any X-ray detection at all. Several objects turned out to be narrow-emission line galaxies where variability indicates nuclear activity, while no emission lines were detected in others. Some of these galaxies have variability and X-ray to optical ratios close to active galactic nuclei, while others have much lower variability and X-ray to optical ratios. This result can be explained by the dilution of the nuclear light by the host galaxy. Conclusions: Our results demonstrate the effectiveness of supernova search programmes in detecting large samples of low-luminosity AGNs. A sizable fraction of the AGNs in our variability sample had escaped X-ray detection (5/47) and/or colour selection (9/48). Spectroscopic follow-up to fainter flux limits is strongly encouraged. Based on observations collected at the European Southern Observatory, Chile, 080.B-0187(A).

  3. Simplified methods of evaluating colonies for levels of Varroa Sensitive Hygiene (VSH)

    USDA-ARS's Scientific Manuscript database

    Varroa sensitive hygiene (VSH) is a trait of honey bees, Apis mellifera, that supports resistance to varroa mites, Varroa destructor. Components of VSH were evaluated to identify simple methods for selection of the trait. Varroa mite population growth was measured in colonies with variable levels of...

  4. Conditional maximum-entropy method for selecting prior distributions in Bayesian statistics

    NASA Astrophysics Data System (ADS)

    Abe, Sumiyoshi

    2014-11-01

    The conditional maximum-entropy method (abbreviated here as C-MaxEnt) is formulated for selecting prior probability distributions in Bayesian statistics for parameter estimation. This method is inspired by a statistical-mechanical approach to systems governed by dynamics with largely separated time scales and is based on three key concepts: conjugate pairs of variables, dimensionless integration measures with coarse-graining factors and partial maximization of the joint entropy. The method enables one to calculate a prior purely from a likelihood in a simple way. It is shown, in particular, how it not only yields Jeffreys's rules but also reveals new structures hidden behind them.

  5. Multivariate Bayesian variable selection exploiting dependence structure among outcomes: Application to air pollution effects on DNA methylation.

    PubMed

    Lee, Kyu Ha; Tadesse, Mahlet G; Baccarelli, Andrea A; Schwartz, Joel; Coull, Brent A

    2017-03-01

    The analysis of multiple outcomes is becoming increasingly common in modern biomedical studies. It is well-known that joint statistical models for multiple outcomes are more flexible and more powerful than fitting a separate model for each outcome; they yield more powerful tests of exposure or treatment effects by taking into account the dependence among outcomes and pooling evidence across outcomes. It is, however, unlikely that all outcomes are related to the same subset of covariates. Therefore, there is interest in identifying exposures or treatments associated with particular outcomes, which we term outcome-specific variable selection. In this work, we propose a variable selection approach for multivariate normal responses that incorporates not only information on the mean model, but also information on the variance-covariance structure of the outcomes. The approach effectively leverages evidence from all correlated outcomes to estimate the effect of a particular covariate on a given outcome. To implement this strategy, we develop a Bayesian method that builds a multivariate prior for the variable selection indicators based on the variance-covariance of the outcomes. We show via simulation that the proposed variable selection strategy can boost power to detect subtle effects without increasing the probability of false discoveries. We apply the approach to the Normative Aging Study (NAS) epigenetic data and identify a subset of five genes in the asthma pathway for which gene-specific DNA methylations are associated with exposures to either black carbon, a marker of traffic pollution, or sulfate, a marker of particles generated by power plants. © 2016, The International Biometric Society.

  6. Interval ridge regression (iRR) as a fast and robust method for quantitative prediction and variable selection applied to edible oil adulteration.

    PubMed

    Jović, Ozren; Smrečki, Neven; Popović, Zora

    2016-04-01

    A novel quantitative prediction and variable selection method called interval ridge regression (iRR) is studied in this work. The method is performed on six data sets of FTIR, two data sets of UV-vis and one data set of DSC. The obtained results show that models built with ridge regression on optimal variables selected with iRR significantly outperform models built with ridge regression on all variables in both calibration (6 out of 9 cases) and validation (2 out of 9 cases). In this study, iRR is also compared with interval partial least squares regression (iPLS). iRR outperformed iPLS in validation (insignificantly in 6 out of 9 cases and significantly in one out of 9 cases for p<0.05). iRR can also be a fast alternative to iPLS, especially when the degree of complexity of the analyzed system is unknown, i.e., when the upper limit on the number of latent variables is not easily estimated for iPLS. Adulteration of hempseed (H) oil, a well-known health-beneficial nutrient, is studied in this work by mixing it with cheap and widely used oils such as soybean (So) oil, rapeseed (R) oil and sunflower (Su) oil. Binary mixture sets of hempseed oil with these three oils (HSo, HR and HSu) and a ternary mixture set of H oil, R oil and Su oil (HRSu) were considered. The obtained accuracy indicates that using iRR on FTIR and UV-vis data, each particular oil can be very successfully quantified (in all 8 cases RMSEP<1.2%). This means that FTIR-ATR coupled with iRR can very rapidly and effectively determine the level of adulteration in adulterated hempseed oil (R(2)>0.99). Copyright © 2015 Elsevier B.V. All rights reserved.
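
    A minimal sketch of the interval idea behind iRR, under the simplifying assumption that the spectrum is split into fixed windows, ridge regression is fit per window, and the window with the lowest cross-validated RMSE is kept; the synthetic spectra and window width are illustrative, not the paper's algorithm.

    ```python
    # Interval-based ridge selection sketch on synthetic "spectra".
    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(1)
    n_samples, n_wavelengths = 60, 200
    X = rng.normal(size=(n_samples, n_wavelengths))
    y = X[:, 80:100].sum(axis=1) + 0.1 * rng.normal(size=n_samples)  # informative band

    interval_width = 20
    best = None
    for start in range(0, n_wavelengths, interval_width):
        cols = slice(start, start + interval_width)
        rmse = -cross_val_score(Ridge(alpha=1.0), X[:, cols], y,
                                scoring="neg_root_mean_squared_error", cv=5).mean()
        if best is None or rmse < best[0]:
            best = (rmse, start)

    print(f"best interval starts at variable {best[1]}, CV-RMSE = {best[0]:.3f}")
    ```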

  7. Genetic-evolution-based optimization methods for engineering design

    NASA Technical Reports Server (NTRS)

    Rao, S. S.; Pan, T. S.; Dhingra, A. K.; Venkayya, V. B.; Kumar, V.

    1990-01-01

    This paper presents the applicability of a biological model, based on genetic evolution, for engineering design optimization. Algorithms embodying the ideas of reproduction, crossover, and mutation are developed and applied to solve different types of structural optimization problems. Both continuous and discrete variable optimization problems are solved. A two-bay truss for maximum fundamental frequency is considered to demonstrate the continuous variable case. The selection of locations of actuators in an actively controlled structure, for minimum energy dissipation, is considered to illustrate the discrete variable case.
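
    A toy sketch of the three genetic operators named above (selection for reproduction, crossover, mutation) on a generic continuous design vector; the fitness function is a stand-in, not the truss or actuator-placement problems of the paper.

    ```python
    # Minimal genetic-algorithm sketch: selection, uniform crossover, mutation.
    import numpy as np

    rng = np.random.default_rng(2)

    def fitness(x):
        # Toy objective standing in for, e.g., fundamental frequency of a truss.
        return -np.sum((x - 0.7) ** 2, axis=1)

    pop = rng.uniform(0, 1, size=(40, 5))           # 40 designs, 5 design variables
    for generation in range(100):
        f = fitness(pop)
        parents = pop[np.argsort(f)[-20:]]           # select the fittest half
        i = rng.integers(0, 20, size=(40, 2))        # reproduction: pick parent pairs
        mask = rng.random((40, 5)) < 0.5             # uniform crossover
        pop = np.where(mask, parents[i[:, 0]], parents[i[:, 1]])
        mutate = rng.random(pop.shape) < 0.05        # mutation
        pop[mutate] = rng.uniform(0, 1, mutate.sum())
    print("best design:", pop[np.argmax(fitness(pop))])
    ```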

  8. Analyses of the most influential factors for vibration monitoring of planetary power transmissions in pellet mills by adaptive neuro-fuzzy technique

    NASA Astrophysics Data System (ADS)

    Milovančević, Miloš; Nikolić, Vlastimir; Anđelković, Boban

    2017-01-01

    Vibration-based structural health monitoring is widely recognized as an attractive strategy for early damage detection in civil structures. Vibration monitoring and prediction are important for any system, since they can avert many unpredictable behaviors of the system. If vibration monitoring is properly managed, it can ensure economic and safe operations. Potential for further improvement of vibration monitoring lies in the improvement of current control strategies. One of the options is the introduction of model predictive control. Multistep-ahead predictive models of vibration are a starting point for creating a successful model predictive strategy. For the purpose of this article, predictive models were created for vibration monitoring of planetary power transmissions in pellet mills. The models were developed using a novel method based on ANFIS (adaptive neuro-fuzzy inference system). The aim of this study is to investigate the potential of ANFIS for selecting the most relevant variables for predictive models of vibration monitoring of pellet mill power transmissions. The vibration data are collected by PIC (Programmable Interface Controller) microcontrollers. The goal of the predictive vibration monitoring of planetary power transmissions in pellet mills is to indicate deterioration in the vibration of the power transmissions before the actual failure occurs. The ANFIS process for variable selection was implemented in order to detect the predominant variables affecting the prediction of vibration monitoring. It was also used to select the minimal input subset of variables from the initial set of input variables - current and lagged variables (up to 11 steps) of vibration. The obtained results could be used to simplify predictive methods so as to avoid multiple input variables. Models with fewer inputs were preferable because of overfitting between training and testing data. While the obtained results are promising, further work is required in order to get results that could be directly applied in practice.

  9. Fast periodic stimulation (FPS): a highly effective approach in fMRI brain mapping.

    PubMed

    Gao, Xiaoqing; Gentile, Francesco; Rossion, Bruno

    2018-06-01

    Defining the neural basis of perceptual categorization in a rapidly changing natural environment with low-temporal-resolution methods such as functional magnetic resonance imaging (fMRI) is challenging. Here, we present a novel fast periodic stimulation (FPS)-fMRI approach to define face-selective brain regions with natural images. Human observers are presented with a dynamic stream of widely variable natural object images alternating at a fast rate (6 images/s). Every 9 s, a short burst of variable face images contrasting with object images in pairs induces an objective face-selective neural response at 0.111 Hz. A model-free Fourier analysis achieves a twofold increase in signal-to-noise ratio compared to a conventional block-design approach with identical stimuli and scanning duration, allowing a comprehensive map of face-selective areas in the ventral occipito-temporal cortex, including the anterior temporal lobe (ATL), to be derived in all individual brains. Critically, the periodicity of the desired category contrast and the random variability among widely diverse images effectively eliminate the contribution of low-level visual cues and lead to the highest values (80-90%) of test-retest reliability in the spatial activation map yet reported in imaging higher-level visual functions. FPS-fMRI opens a new avenue for understanding brain function with low-temporal-resolution methods.
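
    The model-free Fourier readout behind such periodic designs can be sketched in a few lines: simulate a noisy time series containing a weak component at the stimulation frequency (1/9 ≈ 0.111 Hz) and measure the spectral SNR at that bin; all acquisition numbers here are invented, not the study's parameters.

    ```python
    # Spectral SNR at a known stimulation frequency, on synthetic data.
    import numpy as np

    fs, duration, f_stim = 2.0, 360.0, 1.0 / 9.0     # Hz, s, Hz
    t = np.arange(0, duration, 1.0 / fs)
    rng = np.random.default_rng(3)
    signal = 0.5 * np.sin(2 * np.pi * f_stim * t) + rng.normal(0, 1, t.size)

    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(t.size, d=1.0 / fs)
    k = np.argmin(np.abs(freqs - f_stim))            # bin closest to 0.111 Hz
    # SNR: amplitude at the stimulation bin over the mean of neighbouring bins.
    noise = np.mean(np.r_[spectrum[k - 12:k - 2], spectrum[k + 3:k + 13]])
    print(f"SNR at {freqs[k]:.3f} Hz = {spectrum[k] / noise:.1f}")
    ```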

  10. Weighted partial least squares based on the error and variance of the recovery rate in calibration set.

    PubMed

    Yu, Shaohui; Xiao, Xue; Ding, Hong; Xu, Ge; Li, Haixia; Liu, Jing

    2017-08-05

    Quantitative analysis is very difficult for the excitation-emission fluorescence spectroscopy of multi-component mixtures whose fluorescence peaks overlap severely. As an effective method for quantitative analysis, partial least squares can extract latent variables from both the independent variables and the dependent variables, so it can model multiple correlations between variables. However, several factors usually affect the prediction results of partial least squares, such as the noise and the distribution and amount of the samples in the calibration set. This work focuses on the problems in the calibration set mentioned above. Firstly, the outliers in the calibration set are removed by leave-one-out cross-validation. Then, according to two different prediction requirements, the EWPLS method and the VWPLS method are proposed. The independent and dependent variables are weighted in the EWPLS method by the maximum error of the recovery rate and in the VWPLS method by the maximum variance of the recovery rate. Three organic compounds with severely overlapping excitation-emission fluorescence spectra are selected for the experiments. The step adjustment parameter, the iteration number and the sample amount in the calibration set are discussed. The results show the EWPLS method and the VWPLS method are superior to the PLS method, especially in the case of few samples in the calibration set. Copyright © 2017 Elsevier B.V. All rights reserved.

  11. Variable-intercept panel model for deformation zoning of a super-high arch dam.

    PubMed

    Shi, Zhongwen; Gu, Chongshi; Qin, Dong

    2016-01-01

    This study determines dam deformation similarity indexes based on an analysis of deformation zoning features and panel data clustering theory, with comprehensive consideration of the actual deformation law of super-high arch dams and the spatial-temporal features of dam deformation. Measurement methods for these indexes are studied. Based on the established deformation similarity criteria, the principle used to determine the number of dam deformation zones is constructed through the entropy weight method. This study proposes a deformation zoning method for super-high arch dams and its implementation steps, analyzes the effect of special influencing factors of different dam zones on the deformation, introduces dummy variables that represent the special effect on dam deformation, and establishes a variable-intercept panel model for deformation zoning of super-high arch dams. Based on different patterns of the special effect in the variable-intercept panel model, two panel analysis models were established to model fixed and random effects of dam deformation. The Hausman test method of model selection and a model effectiveness assessment method are discussed. Finally, the effectiveness of the established models is verified through a case study.

  12. Optimization Of Mean-Semivariance-Skewness Portfolio Selection Model In Fuzzy Random Environment

    NASA Astrophysics Data System (ADS)

    Chatterjee, Amitava; Bhattacharyya, Rupak; Mukherjee, Supratim; Kar, Samarjit

    2010-10-01

    The purpose of this paper is to construct a mean-semivariance-skewness portfolio selection model in a fuzzy random environment. The objective is to maximize the skewness with a predefined maximum risk tolerance and minimum expected return. Here the security returns in the objectives and constraints are assumed to be fuzzy random variables, and their vagueness is transformed into fuzzy variables similar to trapezoidal numbers. The newly formed fuzzy model is then converted into a deterministic optimization model. The feasibility and effectiveness of the proposed method are verified by a numerical example extracted from the Bombay Stock Exchange (BSE). The exact parameters of the fuzzy membership function and probability density function are obtained through fuzzy random simulation of past data.

  13. Developing deterioration models for Wyoming bridges.

    DOT National Transportation Integrated Search

    2016-05-01

    Deterioration models for the Wyoming Bridge Inventory were developed using both stochastic and deterministic models. The selection of explanatory variables is investigated and a new method using LASSO regression to eliminate human bias in explana...
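
    Although the record is truncated, the LASSO idea it names is standard and easy to sketch (here with scikit-learn, not the report's code): a cross-validated L1 penalty shrinks uninformative coefficients exactly to zero, removing analyst discretion from variable choice. Data and effect sizes below are made up.

    ```python
    # LASSO with cross-validated penalty as automatic variable selection.
    import numpy as np
    from sklearn.linear_model import LassoCV

    rng = np.random.default_rng(4)
    X = rng.normal(size=(300, 12))                 # e.g., age, traffic, climate, ...
    beta = np.array([0.8, 0, 0, -0.5, 0, 0, 0, 0.3, 0, 0, 0, 0])
    y = X @ beta + 0.2 * rng.normal(size=300)      # condition-rating change

    model = LassoCV(cv=5).fit(X, y)
    kept = np.flatnonzero(model.coef_)             # indices with nonzero coefficients
    print("variables retained by LASSO:", kept)
    ```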

  14. A Discriminant Distance Based Composite Vector Selection Method for Odor Classification

    PubMed Central

    Choi, Sang-Il; Jeong, Gu-Min

    2014-01-01

    We present a composite vector selection method for an effective electronic nose system that performs well even in noisy environments. Each composite vector generated from an electronic nose data sample is evaluated by computing the discriminant distance. By quantitatively measuring the amount of discriminative information in each composite vector, composite vectors containing informative variables can be distinguished, and the final composite features for odor classification are extracted using the selected composite vectors. Using only the informative composite vectors, instead of all the generated composite vectors, also helps extract better composite features. Experimental results with different volatile organic compound data show that the proposed system has good classification performance even in a noisy environment compared to other methods. PMID:24747735
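
    A rough sketch of the underlying idea, assuming a Fisher-type discriminant distance (between-class separation over within-class spread) computed per group of variables; the paper's exact measure and grouping may differ, and the electronic-nose data here are simulated.

    ```python
    # Score groups of variables by a Fisher-type discriminant ratio, keep the best.
    import numpy as np

    def discriminant_distance(X, y):
        classes = np.unique(y)
        mu = X.mean(axis=0)
        between = sum((X[y == c].mean(axis=0) - mu) ** 2 * (y == c).sum()
                      for c in classes)
        within = sum(((X[y == c] - X[y == c].mean(axis=0)) ** 2).sum(axis=0)
                     for c in classes)
        return (between / within).sum()

    rng = np.random.default_rng(5)
    y = np.repeat([0, 1, 2], 30)                    # three odor classes
    X = rng.normal(size=(90, 16)) + 0.8 * y[:, None] * (np.arange(16) < 4)

    groups = [slice(0, 4), slice(4, 8), slice(8, 12), slice(12, 16)]
    scores = [discriminant_distance(X[:, g], y) for g in groups]
    keep = np.argsort(scores)[-2:]                  # retain two most discriminative
    print("scores:", np.round(scores, 2), "keep groups:", keep)
    ```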

  15. DISCOVERING THE MISSING 2.2 < z < 3 QUASARS BY COMBINING OPTICAL VARIABILITY AND OPTICAL/NEAR-INFRARED COLORS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu Xuebing; Wang Ran; Bian Fuyan

    2011-09-15

    The identification of quasars in the redshift range 2.2 < z < 3 is known to be very inefficient because the optical colors of such quasars are indistinguishable from those of stars. Recent studies have proposed using optical variability or near-infrared (near-IR) colors to improve the identification of the missing quasars in this redshift range. Here we present a case study combining both methods. We select a sample of 70 quasar candidates from variables in Sloan Digital Sky Survey (SDSS) Stripe 82, which are non-ultraviolet excess sources and have UKIDSS near-IR public data. They are clearly separated into two parts on the Y - K/g - z color-color diagram, and 59 of them meet or lie close to a newly proposed Y - K/g - z selection criterion for z < 4 quasars. Of these 59 sources, 44 were previously identified as quasars in SDSS DR7, and 35 of them are quasars at 2.2 < z < 3. We present spectroscopic observations of 14 of the 15 remaining quasar candidates using the Bok 2.3 m telescope and the MMT 6.5 m telescope, and successfully identify all of them as new quasars at z = 2.36-2.88. We also apply this method to a sample of 643 variable quasar candidates with SDSS-UKIDSS nine-band photometric data selected from 1875 new quasar candidates in SDSS Stripe 82 given by Butler and Bloom based on the time-series selections, and find that 188 of them are probably new quasars with photometric redshifts at 2.2 < z < 3. Our results indicate that the combination of optical variability and optical/near-IR colors is probably the most efficient way to find 2.2 < z < 3 quasars and is very helpful for constructing a complete quasar sample. We discuss its implications for ongoing and upcoming large optical and near-IR sky surveys.
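
    The selection logic reduces to boolean cuts on a candidate catalogue; the sketch below uses a line in the Y-K versus g-z plane, as described in the abstract, but its coefficients are illustrative placeholders rather than the published criterion, and the catalogue is simulated.

    ```python
    # Combine a variability flag with a colour-colour cut to flag candidates.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(6)
    cat = pd.DataFrame({
        "variable": rng.random(1000) < 0.1,          # flagged by light-curve variability
        "g_z": rng.normal(0.5, 0.6, 1000),           # g - z colour
        "Y_K": rng.normal(1.0, 0.5, 1000),           # Y - K colour
    })
    # Hypothetical selection line in the Y-K / g-z plane (placeholder coefficients).
    above_line = cat["Y_K"] > 0.46 * cat["g_z"] + 0.53
    candidates = cat[cat["variable"] & above_line]
    print(f"{len(candidates)} joint variability+colour candidates out of {len(cat)}")
    ```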

  16. Variability-selected active galactic nuclei from supernova search in the Chandra deep field south

    NASA Astrophysics Data System (ADS)

    Trevese, D.; Boutsia, K.; Vagnetti, F.; Cappellaro, E.; Puccetti, S.

    2008-09-01

    Context: Variability is a property shared by virtually all active galactic nuclei (AGNs), and was adopted as a criterion for their selection using data from multi-epoch surveys. Low-luminosity AGNs (LLAGNs) are contaminated by the light of their host galaxies, and cannot therefore be detected by the usual colour techniques. For this reason, their evolution in cosmic time is poorly known. Consistency with the evolution derived from X-ray detected samples has not been clearly established so far, also because the low-luminosity population consists of a mixture of different object types. LLAGNs can be detected by the nuclear optical variability of extended objects. Aims: Several variability surveys have been, or are being, conducted for the detection of supernovae (SNe). We propose to re-analyse these SNe data using a variability criterion optimised for AGN detection, to select a new AGN sample and study its properties. Methods: We analysed images acquired with the wide field imager at the 2.2 m ESO/MPI telescope, in the framework of the STRESS supernova survey. We selected the AXAF field centred on the Chandra Deep Field South where, besides the deep X-ray survey, various optical data exist, originating in the EIS and COMBO-17 photometric surveys and the spectroscopic database of GOODS. Results: We obtained a catalogue of 132 variable AGN candidates. Several of the candidates are X-ray sources. We compare our results with an HST variability study of X-ray and IR detected AGNs, finding consistent results. The relatively high fraction of confirmed AGNs in our sample (60%) allowed us to extract a list of reliable AGN candidates for spectroscopic follow-up observations.

  17. Comparison of statistical tests for association between rare variants and binary traits.

    PubMed

    Bacanu, Silviu-Alin; Nelson, Matthew R; Whittaker, John C

    2012-01-01

    Genome-wide association studies have found thousands of common genetic variants associated with a wide variety of diseases and other complex traits. However, a large portion of the predicted genetic contribution to many traits remains unknown. One plausible explanation is that some of the missing variation is due to the effects of rare variants. Nonetheless, the statistical analysis of rare variants is challenging. A commonly used method is to contrast, within the same region (gene), the frequency of minor alleles at rare variants between cases and controls. However, this strategy is most useful under the assumption that the tested variants have similar effects. We previously proposed a method that can accommodate heterogeneous effects in the analysis of quantitative traits. Here we extend this method to binary traits in a way that can accommodate covariates. We use simulations for a variety of causal and covariate impact scenarios to compare the performance of the proposed method to standard logistic regression, C-alpha, SKAT, and EREC. We found that (i) logistic regression methods perform well when the heterogeneity of the effects is not extreme and (ii) SKAT and EREC have good performance under all tested scenarios, but they can be computationally intensive. Consequently, it would be more computationally desirable to use a two-step strategy: (i) selecting promising genes by faster methods and (ii) analyzing selected genes using SKAT/EREC. To select promising genes one can use (1) regression methods when effect heterogeneity is assumed to be low and the covariates explain a non-negligible part of the trait's variability, (2) C-alpha when heterogeneity is assumed to be large and covariates explain a small fraction of the trait's variability, and (3) the proposed trend and heterogeneity test when the heterogeneity is assumed to be non-trivial and the covariates explain a large fraction of the trait's variability.
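
    As a sketch of the fast first-step screen discussed in the conclusions, a burden-style test collapses rare minor-allele counts per gene and tests the count in a covariate-adjusted logistic regression; the genotypes below are simulated and this is not the authors' implementation.

    ```python
    # Burden-style logistic regression screen for a single gene region.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(13)
    n, n_variants = 2000, 25
    G = rng.binomial(2, 0.005, size=(n, n_variants))   # rare-variant genotypes
    burden = G.sum(axis=1)                             # minor-allele count per subject
    age = rng.normal(50, 10, n)                        # covariate
    logit = -1.0 + 0.35 * burden + 0.01 * (age - 50)
    y = rng.random(n) < 1 / (1 + np.exp(-logit))       # case/control status

    X = sm.add_constant(np.column_stack([burden, age]))
    fit = sm.Logit(y.astype(int), X).fit(disp=0)
    print("burden p-value:", fit.pvalues[1])
    ```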

  18. A method for the dynamic management of genetic variability in dairy cattle

    PubMed Central

    Colleau, Jean-Jacques; Moureaux, Sophie; Briend, Michèle; Bechu, Jérôme

    2004-01-01

    According to the general approach developed in this paper, dynamic management of genetic variability in selected populations of dairy cattle is carried out for three simultaneous purposes: procreation of young bulls to be further progeny-tested, use of service bulls already selected, and approval of recently progeny-tested bulls for use. At each step, the objective is to minimize the average pairwise relationship coefficient in the future population born from programmed matings and the existing population. As a common constraint, the average estimated breeding value of the new population, for a selection goal including many important traits, is set to a desired value. For the procreation of young bulls, breeding costs are additionally constrained. Optimization is fully analytical and directly considers matings. The corresponding algorithms are presented in detail. The efficiency of these procedures was tested on the current Norman population. Comparisons between optimized and real matings clearly showed that optimization would have saved substantial genetic variability without reducing short-term genetic gains. PMID:15231230

  19. Detection of genomic loci associated with environmental variables using generalized linear mixed models.

    PubMed

    Lobréaux, Stéphane; Melodelima, Christelle

    2015-02-01

    We tested the use of Generalized Linear Mixed Models (GLMMs) to detect associations between genetic loci and environmental variables, taking into account the population structure of sampled individuals. We used a simulation approach to generate datasets under demographically and selectively explicit models. These datasets were used to analyze and optimize the capacity of GLMMs to detect associations between markers and selection coefficients, used as environmental data, in terms of false and true positive rates. Different sampling strategies were tested, maximizing the number of populations sampled, sites sampled per population, or individuals sampled per site, and the effect of different selection intensities on the efficiency of the method was determined. Finally, we apply these models to an Arabidopsis thaliana SNP dataset from different accessions, looking for loci associated with minimal spring temperature. We identified 25 regions that exhibit unusual correlations with the climatic variable and contain genes with functions related to temperature stress. Copyright © 2014 Elsevier Inc. All rights reserved.

  20. Multivariate normal maximum likelihood with both ordinal and continuous variables, and data missing at random.

    PubMed

    Pritikin, Joshua N; Brick, Timothy R; Neale, Michael C

    2018-04-01

    A novel method for the maximum likelihood estimation of structural equation models (SEM) with both ordinal and continuous indicators is introduced using a flexible multivariate probit model for the ordinal indicators. A full information approach ensures unbiased estimates for data missing at random. Exceeding the capability of prior methods, up to 13 ordinal variables can be included before integration time increases beyond 1 s per row. The method relies on the axiom of conditional probability to split apart the distribution of continuous and ordinal variables. Due to the symmetry of the axiom, two similar methods are available. A simulation study provides evidence that the two similar approaches offer equal accuracy. A further simulation is used to develop a heuristic to automatically select the most computationally efficient approach. Joint ordinal continuous SEM is implemented in OpenMx, free and open-source software.

  1. Quantifying Effects of Pharmacological Blockers of Cardiac Autonomous Control Using Variability Parameters.

    PubMed

    Miyabara, Renata; Berg, Karsten; Kraemer, Jan F; Baltatu, Ovidiu C; Wessel, Niels; Campos, Luciana A

    2017-01-01

    Objective: The aim of this study was to identify the most sensitive heart rate and blood pressure variability (HRV and BPV) parameters from a given set of well-known methods for the quantification of cardiovascular autonomic function after several autonomic blockades. Methods: Cardiovascular sympathetic and parasympathetic functions were studied in freely moving rats following peripheral muscarinic (methylatropine), β1-adrenergic (metoprolol), muscarinic + β1-adrenergic, α1-adrenergic (prazosin), and ganglionic (hexamethonium) blockades. Time-domain, frequency-domain and symbolic dynamics measures for each of HRV and BPV were classified through the paired Wilcoxon test for all autonomic drugs separately. In order to select those variables that have a high relevance to, and stable influence on, our target measurements (HRV, BPV), we used Fisher's method to combine the p-values of multiple tests. Results: This analysis led to the following best set of cardiovascular variability parameters: the mean normal beat-to-beat interval/value (HRV/BPV: meanNN), the coefficient of variation (cvNN = standard deviation over meanNN) and the root mean square of successive differences (RMSSD) from the time-domain analysis. In the frequency-domain analysis the very-low-frequency (VLF) component was selected. From symbolic dynamics, the Shannon entropy of the word distribution (FWSHANNON) as well as POLVAR3, the nonlinear parameter to detect intermittently decreased variability, showed the best ability to discriminate between the different autonomic blockades. Conclusion: Through a complex comparative analysis of HRV and BPV measures altered by a set of autonomic drugs, we identified the most sensitive set of informative cardiovascular variability indexes able to pick up the modifications imposed by the autonomic challenges. These indexes may help to increase our understanding of cardiovascular sympathetic and parasympathetic functions in translational studies of experimental diseases.
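
    A compact sketch of the statistical pipeline described in the Methods, assuming simulated measurements: per-index paired Wilcoxon tests across blockades, combined with Fisher's method (both available in scipy). Index names follow the abstract; effect sizes are invented.

    ```python
    # Rank variability indexes by Fisher-combined Wilcoxon p-values across blockades.
    import numpy as np
    from scipy.stats import wilcoxon, combine_pvalues

    rng = np.random.default_rng(9)
    indexes = ["meanNN", "cvNN", "RMSSD", "VLF", "FWSHANNON", "POLVAR3"]
    n_rats = 12
    blockades = ["methylatropine", "metoprolol", "prazosin", "hexamethonium"]

    for idx, shift in zip(indexes, [1.0, 0.8, 0.6, 0.5, 0.4, 0.3]):
        pvals = []
        for _ in blockades:
            baseline = rng.normal(0, 1, n_rats)
            blocked = baseline + shift + rng.normal(0, 1, n_rats)  # drug effect
            pvals.append(wilcoxon(baseline, blocked).pvalue)       # paired test
        stat, p_combined = combine_pvalues(pvals, method="fisher")
        print(f"{idx:10s} combined p = {p_combined:.4f}")
    ```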

  2. A comparison of regression methods for model selection in individual-based landscape genetic analysis.

    PubMed

    Shirk, Andrew J; Landguth, Erin L; Cushman, Samuel A

    2018-01-01

    Anthropogenic migration barriers fragment many populations and limit the ability of species to respond to climate-induced biome shifts. Conservation actions designed to conserve habitat connectivity and mitigate barriers are needed to unite fragmented populations into larger, more viable metapopulations, and to allow species to track their climate envelope over time. Landscape genetic analysis provides an empirical means to infer landscape factors influencing gene flow and thereby inform such conservation actions. However, there are currently many methods available for model selection in landscape genetics, and considerable uncertainty as to which provide the greatest accuracy in identifying the true landscape model influencing gene flow among competing alternative hypotheses. In this study, we used population genetic simulations to evaluate the performance of seven regression-based model selection methods on a broad array of landscapes that varied by the number and type of variables contributing to resistance, the magnitude and cohesion of resistance, as well as the functional relationship between variables and resistance. We also assessed the effect of transformations designed to linearize the relationship between genetic and landscape distances. We found that linear mixed effects models had the highest accuracy in every way we evaluated model performance; however, other methods also performed well in many circumstances, particularly when landscape resistance was high and the correlation among competing hypotheses was limited. Our results provide guidance for which regression-based model selection methods provide the most accurate inferences in landscape genetic analysis and thereby best inform connectivity conservation actions. Published 2017. This article is a U.S. Government work and is in the public domain in the USA.

  3. Genetic Algorithm for Initial Orbit Determination with Too Short Arc

    NASA Astrophysics Data System (ADS)

    Li, Xin-ran; Wang, Xin

    2017-01-01

    A huge quantity of too-short-arc (TSA) observational data have been obtained in sky surveys of space objects. However, reasonable results for the TSAs can hardly be obtained with the classical methods of initial orbit determination (IOD). In this paper, the IOD is reduced to a two-stage hierarchical optimization problem containing three variables for each stage. Using the genetic algorithm, a new method of the IOD for TSAs is established, through the selections of the optimized variables and the corresponding genetic operators for specific problems. Numerical experiments based on the real measurements show that the method can provide valid initial values for the follow-up work.

  4. Genetic Algorithm for Initial Orbit Determination with Too Short Arc

    NASA Astrophysics Data System (ADS)

    Li, X. R.; Wang, X.

    2016-01-01

    The sky surveys of space objects have obtained a huge quantity of too-short-arc (TSA) observation data. However, the classical method of initial orbit determination (IOD) can hardly get reasonable results for the TSAs. The IOD is reduced to a two-stage hierarchical optimization problem containing three variables for each stage. Using the genetic algorithm, a new method of the IOD for TSAs is established, through the selection of optimizing variables as well as the corresponding genetic operator for specific problems. Numerical experiments based on the real measurements show that the method can provide valid initial values for the follow-up work.

  5. Socio-economic variables influencing mean age at marriage in Karnataka and Kerala.

    PubMed

    Prakasam, C P; Upadhyay, R B

    1985-01-01

    "In this paper an attempt was made to study the influence of certain socio-economic variables on the male and the female age at marriage in Karnataka and Kerala [India] for the year 1971. Step-wise regression method has been used to select the predictor variables influencing mean age at marriage. The results reveal that percent female literate...and percent female in labour force...are found to influence female mean age at marriage in Kerala, while the variables for Karnataka were percent female literate..., percent male literate..., and percent urban male population...." excerpt

  6. Phenotypic variability and selection of lipid-producing microalgae in a microfluidic centrifuge

    NASA Astrophysics Data System (ADS)

    Estévez-Torres, André.; Mestler, Troy; Austin, Robert H.

    2010-03-01

    Isogenic cells are known to display various expression levels that may result in different phenotypes within a population. Here we focus on the phenotypic variability of a species of unicellular algae that produces neutral lipids. Lipid-producing algae are one of the most promising sources of biofuel. We have implemented a simple microfluidic method, relying on density differences, to assess lipid-production variability in a population of algae. We will discuss the reasons for this variability and address the promising avenues of this technique for directing the evolution of algae towards high lipid productivity.

  7. Predicted long-term effects of group selection on species composition and stand structure in northern hardwood forests

    Treesearch

    Corey R. Halpin; Craig G. Lorimer; Jacob J. Hanson; Brian J. Palik

    2017-01-01

    The group selection method can potentially increase the proportion of shade-intolerant and midtolerant tree species in forests dominated by shade-tolerant species, but previous results have been variable, and concerns have been raised about possible effects on forest fragmentation and forest structure. Limited evidence is available on these issues for forests managed...

  8. A Design for a Multi-Use Object Editor with Connections

    DTIC Science & Technology

    1991-10-01

    [Table-of-contents fragment] 3.6.1 Variables of Class Select; 3.6.2 Methods of Class Select; 3.6.3 Application States; 3.7 Class Clipboard; 3.8 Class Crestore; 3.9 Conclusions. [...] this is not directly related to cutting and pasting between applications. 3.8 Class Crestore: class crestore holds a connection until all the

  9. Determination of variables in the prediction of strontium distribution coefficients for selected sediments

    USGS Publications Warehouse

    Pace, M.N.; Rosentreter, J.J.; Bartholomay, R.C.

    2001-01-01

    Idaho State University and the US Geological Survey, in cooperation with the US Department of Energy, conducted a study to determine and evaluate strontium distribution coefficients (Kds) of subsurface materials at the Idaho National Engineering and Environmental Laboratory (INEEL). The Kds were determined to aid in assessing the variability of strontium Kds and their effects on chemical transport of strontium-90 in the Snake River Plain aquifer system. Data from batch experiments done to determine strontium Kds of five sediment-infill samples and six standard reference material samples were analyzed by using multiple linear regression analysis and the stepwise variable-selection method in the statistical program, Statistical Product and Service Solutions, to derive an equation of variables that can be used to predict strontium Kds of sediment-infill samples. The sediment-infill samples were from basalt vesicles and fractures from a selected core at the INEEL; strontium Kds ranged from about 201 to 356 ml g-1. The standard material samples consisted of clay minerals and calcite. The statistical analyses of the batch-experiment results showed that the amount of strontium in the initial solution, the amount of manganese oxide in the sample material, and the amount of potassium in the initial solution are the most important variables in predicting strontium Kds of sediment-infill samples.

  10. Vis-NIR spectrometric determination of Brix and sucrose in sugar production samples using kernel partial least squares with interval selection based on the successive projections algorithm.

    PubMed

    de Almeida, Valber Elias; de Araújo Gomes, Adriano; de Sousa Fernandes, David Douglas; Goicoechea, Héctor Casimiro; Galvão, Roberto Kawakami Harrop; Araújo, Mario Cesar Ugulino

    2018-05-01

    This paper proposes a new variable selection method for nonlinear multivariate calibration, combining the Successive Projections Algorithm for interval selection (iSPA) with the Kernel Partial Least Squares (Kernel-PLS) modelling technique. The proposed iSPA-Kernel-PLS algorithm is employed in a case study involving a Vis-NIR spectrometric dataset with complex nonlinear features. The analytical problem consists of determining Brix and sucrose content in samples from a sugar production system, on the basis of transflectance spectra. As compared to full-spectrum Kernel-PLS, the iSPA-Kernel-PLS models involve a smaller number of variables and display statistically significant superiority in terms of accuracy and/or bias in the predictions. Published by Elsevier B.V.

  11. Wavelet neural networks: a practical guide.

    PubMed

    Alexandridis, Antonios K; Zapranis, Achilleas D

    2013-06-01

    Wavelet networks (WNs) are a new class of networks that have been used with great success in a wide range of applications. However, a generally accepted framework for applying WNs is missing from the literature. In this study, we present a complete statistical model identification framework in order to apply WNs in various applications. The following subjects were thoroughly examined: the structure of a WN, training methods, initialization algorithms, variable significance and variable selection algorithms, model selection methods and, finally, methods to construct confidence and prediction intervals. In addition, the complexity of each algorithm is discussed. Our proposed framework was tested in two simulated cases, in one chaotic time series described by the Mackey-Glass equation and in three real datasets described by daily temperatures in Berlin, daily wind speeds in New York and breast cancer classification. Our results show that the proposed algorithms produce stable and robust results, indicating that our proposed framework can be applied in various applications. Copyright © 2013 Elsevier Ltd. All rights reserved.

  12. Evaluation of Scat Deposition Transects versus Radio Telemetry for Developing a Species Distribution Model for a Rare Desert Carnivore, the Kit Fox.

    PubMed

    Dempsey, Steven J; Gese, Eric M; Kluever, Bryan M; Lonsinger, Robert C; Waits, Lisette P

    2015-01-01

    Development and evaluation of noninvasive methods for monitoring species distribution and abundance is a growing area of ecological research. While noninvasive methods have the advantage of reducing the risks associated with capture, comparisons to methods using more traditional invasive sampling are lacking. Historically, kit foxes (Vulpes macrotis) occupied the desert and semi-arid regions of southwestern North America. Once the most abundant carnivore in the Great Basin Desert of Utah, the species is now considered rare. In recent decades, attempts have been made to model the environmental variables influencing kit fox distribution. Using noninvasive scat deposition surveys for determination of kit fox presence, we modeled resource selection functions to predict kit fox distribution using three popular techniques (Maxent, fixed-effects, and mixed-effects generalized linear models) and compared these with similar models developed from invasive sampling (telemetry locations from radio-collared foxes). Resource selection functions were developed using a combination of landscape variables including elevation, slope, aspect, vegetation height, and soil type. All models were tested against subsequent scat collections as a method of model validation. We demonstrate the importance of comparing multiple model types for development of resource selection functions used to predict a species distribution, and of evaluating the importance of environmental variables on species distribution. All models we examined showed a large effect of elevation on kit fox presence, followed by slope and vegetation height. However, the invasive sampling method (i.e., radio-telemetry) appeared to be better at determining resource selection, and therefore may be more robust in predicting kit fox distribution. In contrast, the distribution maps created from the noninvasive sampling (i.e., scat transects) were significantly different from those of the invasive method, so scat transects may be appropriate when used in an occupancy framework to predict species distribution. We concluded that while scat deposition transects may be useful for monitoring kit fox abundance and possibly occupancy, they do not appear to be appropriate for determining resource selection. On our study area, scat transects were biased to roadways, while data collected using radio-telemetry were dictated by movements of the kit foxes themselves. We recommend that future studies applying noninvasive scat sampling consider a more robust random sampling design across the landscape (e.g., random transects or more complete road coverage) that would provide a more accurate and unbiased depiction of resource selection useful for predicting kit fox distribution.

  13. Neuropsychological Test Selection for Cognitive Impairment Classification: A Machine Learning Approach

    PubMed Central

    Williams, Jennifer A.; Schmitter-Edgecombe, Maureen; Cook, Diane J.

    2016-01-01

    Introduction Reducing the amount of testing required to accurately detect cognitive impairment is clinically relevant. The aim of this research was to determine the fewest number of clinical measures required to accurately classify participants as healthy older adult, mild cognitive impairment (MCI) or dementia using a suite of classification techniques. Methods Two variable selection machine learning models (i.e., naive Bayes, decision tree), a logistic regression, and two participant datasets (i.e., clinical diagnosis; clinical dementia rating, CDR) were explored. Participants classified using clinical diagnosis criteria included 52 individuals with dementia, 97 with MCI, and 161 cognitively healthy older adults. Participants classified using CDR included 154 individuals with CDR = 0, 93 individuals with CDR = 0.5, and 25 individuals with CDR = 1.0+. Twenty-seven demographic, psychological, and neuropsychological variables were available for variable selection. Results No significant difference was observed between naive Bayes, decision tree, and logistic regression models for classification of both the clinical diagnosis and CDR datasets. Participant classification (70.0-99.1%), geometric mean (60.9-98.1%), sensitivity (44.2-100%), and specificity (52.7-100%) were generally satisfactory. Unsurprisingly, the MCI/CDR = 0.5 participant group was the most challenging to classify. Through variable selection, only 2-9 variables were required for classification, and these varied between datasets in a clinically meaningful way. Conclusions The current study results reveal that machine learning techniques can accurately classify cognitive impairment and reduce the number of measures required for diagnosis. PMID:26332171
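
    A sketch of the test-reduction idea using scikit-learn, assuming univariate selection of a handful of measures feeding naive Bayes and a decision tree; the data are simulated and the study's own selection procedure may differ.

    ```python
    # Select a few informative measures, then classify three diagnostic groups.
    import numpy as np
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(11)
    X = rng.normal(size=(310, 27))                   # 27 candidate measures
    y = rng.integers(0, 3, 310)                      # healthy / MCI / dementia
    X[:, :4] += y[:, None] * 0.9                     # 4 genuinely informative tests

    for clf in (GaussianNB(), DecisionTreeClassifier(max_depth=3)):
        pipe = make_pipeline(SelectKBest(f_classif, k=5), clf)
        acc = cross_val_score(pipe, X, y, cv=5).mean()
        print(type(clf).__name__, f"accuracy with 5 selected measures: {acc:.2f}")
    ```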

  14. SPECTROPHOTOMETRIC DETERMINATION OF CALCIUM WITH GLYOXAL BIS (2-HYDROXY- ANIL)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Florence, T.M.; Morgan, J.

    1961-03-01

    A selective method is described for the spectrophotometric determination of calcium using glyoxal bis(2-hydroxy-anil) as the chromogenic agent. A comprehensive study of interferences and reagent variables was made. (auth)

  15. Genome-wide regression and prediction with the BGLR statistical package.

    PubMed

    Pérez, Paulino; de los Campos, Gustavo

    2014-10-01

    Many modern genomic data analyses require implementing regressions where the number of parameters (p, e.g., the number of marker effects) exceeds sample size (n). Implementing these large-p-with-small-n regressions poses several statistical and computational challenges, some of which can be confronted using Bayesian methods. This approach allows integrating various parametric and nonparametric shrinkage and variable selection procedures in a unified and consistent manner. The BGLR R-package implements a large collection of Bayesian regression models, including parametric variable selection and shrinkage methods and semiparametric procedures (Bayesian reproducing kernel Hilbert spaces regressions, RKHS). The software was originally developed for genomic applications; however, the methods implemented are useful for many nongenomic applications as well. The response can be continuous (censored or not) or categorical (either binary or ordinal). The algorithm is based on a Gibbs sampler with scalar updates and the implementation takes advantage of efficient compiled C and Fortran routines. In this article we describe the methods implemented in BGLR, present examples of the use of the package, and discuss practical issues emerging in real-data analysis. Copyright © 2014 by the Genetics Society of America.

  16. Economic evaluation of genomic selection in small ruminants: a sheep meat breeding program.

    PubMed

    Shumbusho, F; Raoul, J; Astruc, J M; Palhiere, I; Lemarié, S; Fugeray-Scarbel, A; Elsen, J M

    2016-06-01

    Recent genomic evaluation studies using real data and predicting genetic gain by modeling breeding programs have reported moderate expected benefits from the replacement of classic selection schemes by genomic selection (GS) in small ruminants. The objectives of this study were to compare the cost, monetary genetic gain and economic efficiency of classic selection and GS schemes in the meat sheep industry. Deterministic methods were used to model selection based on multi-trait indices from a sheep meat breeding program. Decisional variables related to male selection candidates and progeny testing were optimized to maximize the annual monetary genetic gain (AMGG), that is, a weighted sum of the annual genetic gains for meat and maternal traits. For GS, a reference population of 2000 individuals was assumed and genomic information was available for the evaluation of male candidates only. In the classic selection scheme, males' breeding values were estimated from their own and offspring phenotypes. In GS, different scenarios were considered, differing by the information used to select males (genomic only, genomic + own performance, genomic + offspring phenotypes). The results showed that all GS scenarios were associated with higher total variable costs than classic selection (if the cost of genotyping was 123 euros/animal). In terms of AMGG and economic returns, GS scenarios were found to be superior to classic selection only if genomic information was combined with the males' own meat phenotypes (GS-Pheno) or with their progeny test information. The predicted economic efficiency, defined as returns (proportional to the number of expressions of AMGG in the nucleus and commercial flocks) minus total variable costs, showed that the best GS scenario (GS-Pheno) was up to 15% more efficient than classic selection. For all selection scenarios, optimization increased the overall AMGG, returns and economic efficiency. In conclusion, our study shows that some forms of GS strategies are more advantageous than classic selection, provided that GS is already initiated (i.e. the initial reference population is available). Optimizing decisional variables of the classic selection scheme could be of greater benefit than including genomic information in optimized designs.

  17. Factors that influence utilisation of HIV/AIDS prevention methods among university students residing at a selected university campus.

    PubMed

    Ndabarora, Eléazar; Mchunu, Gugu

    2014-01-01

    Various studies have reported that university students, who are mostly young people, rarely use existing HIV/AIDS preventive methods. Although studies have shown that young university students have a high degree of knowledge about HIV/AIDS and HIV modes of transmission, they are still not utilising the existing HIV prevention methods and still engage in risky sexual practices favourable to HIV. Some variables, such as awareness of existing HIV/AIDS prevention methods, have been associated with utilisation of such methods. The study aimed to explore factors that influence use of existing HIV/AIDS prevention methods among university students residing on a selected campus, using the Health Belief Model (HBM) as a theoretical framework. A quantitative research approach and an exploratory-descriptive design were used to describe perceived factors that influence utilisation by university students of HIV/AIDS prevention methods. A total of 335 students completed online and manual questionnaires. Study findings showed that the factors which influenced utilisation of HIV/AIDS prevention methods were mainly determined by awareness of the existing university-based HIV/AIDS prevention strategies. The most utilised prevention methods were voluntary counselling and testing services and free condoms. The perceived susceptibility and perceived threat of HIV/AIDS score was also found to correlate with the HIV risk index score. Perceived susceptibility and perceived threat of HIV/AIDS showed correlation with self-efficacy on condoms and their utilisation. Most HBM variables were not predictors of utilisation of HIV/AIDS prevention methods among students. Interventions aiming to improve the utilisation of HIV/AIDS prevention methods among students at the selected university should focus on removing identified barriers, promoting HIV/AIDS prevention services and providing appropriate resources to implement such programmes.

  18. Implementation of MCA Method for Identification of Factors for Conceptual Cost Estimation of Residential Buildings

    NASA Astrophysics Data System (ADS)

    Juszczyk, Michał; Leśniak, Agnieszka; Zima, Krzysztof

    2013-06-01

    Conceptual cost estimation is important for construction projects. Either underestimation or overestimation of the cost of erecting a building may lead to the failure of a project. In this paper, the authors present an application of multicriteria comparative analysis (MCA) to select factors influencing the cost of erecting residential buildings. The aim of the analysis is to indicate key factors useful in conceptual cost estimation at the early design stage. Key factors are investigated on the basis of elementary information about the function, form and structure of the building, and the primary assumptions of the technological and organizational solutions applied in the construction process. The mentioned factors are considered as variables of a model whose aim is to make conceptual cost estimation fast and satisfactorily accurate. The analysis included three steps: preliminary research, choice of a set of potential variables, and reduction of this set to select the final set of variables. Multicriteria comparative analysis is applied to solve the problem. The performed analysis allowed selecting a group of factors, defined well enough at the conceptual stage of the design process, to be used as describing variables of the model.

  19. Empirical Assessment of Spatial Prediction Methods for Location Cost Adjustment Factors

    PubMed Central

    Migliaccio, Giovanni C.; Guindani, Michele; D'Incognito, Maria; Zhang, Linlin

    2014-01-01

    In the feasibility stage, the correct prediction of construction costs ensures that budget requirements are met from the start of a project's lifecycle. A very common approach for performing quick order-of-magnitude estimates is based on using Location Cost Adjustment Factors (LCAFs) that compute historically based costs by project location. Nowadays, numerous LCAF datasets are commercially available in North America, but, obviously, they do not include all locations. Hence, LCAFs for un-sampled locations need to be inferred through spatial interpolation or prediction methods. Currently, practitioners tend to select the value for a location using only one variable, namely the nearest linear distance between two sites. However, construction costs could be affected by socio-economic variables, as suggested by macroeconomic theories. Using a commonly used set of LCAFs, the City Cost Indexes (CCI) by RSMeans, and the socio-economic variables included in the ESRI Community Sourcebook, this article provides several contributions to the body of knowledge. First, the accuracy of various spatial prediction methods in estimating LCAF values for un-sampled locations was evaluated and assessed with respect to spatial interpolation methods. Two regression-based prediction models were selected, a global regression analysis and a geographically weighted regression analysis (GWR). Once these models were compared against interpolation methods, the results showed that GWR is the most appropriate way to model CCI as a function of multiple covariates. The outcome of GWR, for each covariate, was studied for all 48 states in the contiguous US. As a direct consequence of spatial non-stationarity, it was possible to discuss the influence of each single covariate differently from state to state. In addition, the article includes a first attempt to determine whether the observed variability in cost index values could be, at least partially, explained by independent socio-economic variables. PMID:25018582
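
    A bare-bones geographically weighted regression, assuming the usual Gaussian-kernel weighted least squares at each location, to show how a covariate's effect is allowed to drift over space; the coordinates, covariate, and bandwidth are invented rather than taken from the article.

    ```python
    # Minimal GWR: local weighted least squares at each site.
    import numpy as np

    rng = np.random.default_rng(14)
    n = 200
    coords = rng.uniform(0, 100, size=(n, 2))          # site locations
    income = rng.normal(size=n)                        # socio-economic covariate
    beta_true = coords[:, 0] / 100                     # effect drifts west to east
    cci = 1.0 + beta_true * income + 0.1 * rng.normal(size=n)

    bandwidth = 20.0
    X = np.column_stack([np.ones(n), income])
    local_beta = np.empty(n)
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)        # Gaussian kernel weights
        W = np.diag(w)
        b = np.linalg.solve(X.T @ W @ X, X.T @ W @ cci)  # weighted least squares
        local_beta[i] = b[1]
    print("local slope range:", local_beta.min(), local_beta.max())
    ```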

  20. Rate-Compatible Protograph LDPC Codes

    NASA Technical Reports Server (NTRS)

    Nguyen, Thuy V. (Inventor); Nosratinia, Aria (Inventor); Divsalar, Dariush (Inventor)

    2014-01-01

    Digital communication coding methods resulting in rate-compatible low density parity-check (LDPC) codes built from protographs. Described digital coding methods start with a desired code rate and a selection of the numbers of variable nodes and check nodes to be used in the protograph. Constraints are set to satisfy a linear minimum distance growth property for the protograph. All possible edges in the graph are searched for the minimum iterative decoding threshold and the protograph with the lowest iterative decoding threshold is selected. Protographs designed in this manner are used in decode and forward relay channels.

  1. A multi-fidelity analysis selection method using a constrained discrete optimization formulation

    NASA Astrophysics Data System (ADS)

    Stults, Ian C.

    The purpose of this research is to develop a method for selecting the fidelity of contributing analyses in computer simulations. Model uncertainty is a significant component of result validity, yet it is neglected in most conceptual design studies. When it is considered, it is done so in only a limited fashion, and therefore brings the validity of selections made based on these results into question. Neglecting model uncertainty can potentially cause costly redesigns of concepts later in the design process or can even cause program cancellation. Rather than neglecting it, if one were to instead not only realize the model uncertainty in tools being used but also use this information to select the tools for a contributing analysis, studies could be conducted more efficiently and trust in results could be quantified. Methods for performing this are generally not rigorous or traceable, and in many cases the improvement and additional time spent performing enhanced calculations are washed out by less accurate calculations performed downstream. The intent of this research is to resolve this issue by providing a method which will minimize the amount of time spent conducting computer simulations while meeting accuracy and concept resolution requirements for results. In many conceptual design programs, only limited data is available for quantifying model uncertainty. Because of this data sparsity, traditional probabilistic means for quantifying uncertainty should be reconsidered. This research proposes to instead quantify model uncertainty using an evidence theory formulation (also referred to as Dempster-Shafer theory) in lieu of the traditional probabilistic approach. Specific weaknesses in using evidence theory for quantifying model uncertainty are identified and addressed for the purposes of the Fidelity Selection Problem. A series of experiments was conducted to address these weaknesses using n-dimensional optimization test functions. These experiments found that model uncertainty present in analyses with 4 or fewer input variables could be effectively quantified using a strategic distribution creation method; if more than 4 input variables exist, a Frontier Finding Particle Swarm Optimization should instead be used. Once model uncertainty in contributing analysis code choices has been quantified, a selection method is required to determine which of these choices should be used in simulations. Because much of the selection done for engineering problems is driven by the physics of the problem, these are poor candidate problems for testing the true fitness of a candidate selection method. Specifically moderate and high dimensional problems' variability can often be reduced to only a few dimensions and scalability often cannot be easily addressed. For these reasons a simple academic function was created for the uncertainty quantification, and a canonical form of the Fidelity Selection Problem (FSP) was created. Fifteen best- and worst-case scenarios were identified in an effort to challenge the candidate selection methods both with respect to the characteristics of the tradeoff between time cost and model uncertainty and with respect to the stringency of the constraints and problem dimensionality. The results from this experiment show that a Genetic Algorithm (GA) was able to consistently find the correct answer, but under certain circumstances, a discrete form of Particle Swarm Optimization (PSO) was able to find the correct answer more quickly. 
To better illustrate how the uncertainty quantification and discrete optimization might be conducted for a "real world" problem, an illustrative example was conducted using gas turbine engines.
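
    In its canonical form, the Fidelity Selection Problem reduces to a discrete search over fidelity combinations: minimize total analysis time subject to an aggregate constraint on model uncertainty. Below is a minimal brute-force sketch of such a problem; every analysis name, cost, and constraint value is invented for illustration, and for larger problems a GA or discrete PSO, as studied in the dissertation, would replace the exhaustive loop.

```python
from itertools import product

# Hypothetical fidelity options per contributing analysis:
# (time cost in hours, model-uncertainty score); all values invented.
analyses = {
    "aero":       [(0.5, 0.30), (4.0, 0.12), (24.0, 0.05)],
    "structures": [(0.2, 0.25), (2.0, 0.10)],
    "propulsion": [(1.0, 0.20), (8.0, 0.08)],
}

MAX_UNCERTAINTY = 0.45  # illustrative constraint on summed uncertainty

best = None
for choice in product(*analyses.values()):
    time = sum(t for t, _ in choice)   # total simulation time
    unc = sum(u for _, u in choice)    # aggregate model uncertainty
    if unc <= MAX_UNCERTAINTY and (best is None or time < best[0]):
        best = (time, unc, choice)

print("fastest feasible fidelity combination:", best)
```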

  2. Use of EPANET solver to manage water distribution in Smart City

    NASA Astrophysics Data System (ADS)

    Antonowicz, A.; Brodziak, R.; Bylka, J.; Mazurkiewicz, J.; Wojtecki, S.; Zakrzewski, P.

    2018-02-01

    This paper presents a method of using the EPANET solver to support the management of a water distribution system in a Smart City. The main task was to develop an application that allows remote access to a simulation model of the water distribution network developed in the EPANET environment. The application can perform both single and cyclic simulations with a specified step of change in the values of selected process variables. The paper presents the architecture of the application. The application supports the selection of the best device control algorithm using optimization methods; optimization procedures are available with the following methods: brute force, SLSQP (Sequential Least SQuares Programming), and the modified Powell method. The article is supplemented by an example of using the developed computer tool.
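
    As a rough illustration of this workflow, the sketch below drives an EPANET model from Python and tunes one process variable with SLSQP. It assumes the open-source WNTR package as the EPANET wrapper, which is not the authors' application; the input file name, pump name, and target pressure are hypothetical.

```python
# A minimal sketch, assuming WNTR as the EPANET interface; 'network.inp',
# the pump name, and the target pressure are hypothetical.
import numpy as np
import wntr
from scipy.optimize import minimize

wn = wntr.network.WaterNetworkModel("network.inp")

def mean_pressure(speed):
    """Run one EPANET simulation for a given pump speed setting."""
    wn.get_link("PUMP1").base_speed = float(speed[0])
    results = wntr.sim.EpanetSimulator(wn).run_sim()
    return results.node["pressure"].mean().mean()

# Cyclic simulation: step a process variable over a range of values.
for s in np.linspace(0.8, 1.2, 5):
    print(s, mean_pressure([s]))

# Optimization with SLSQP, one of the methods listed in the paper.
target = 40.0  # illustrative target mean pressure (m of head)
res = minimize(lambda s: (mean_pressure(s) - target) ** 2,
               x0=[1.0], bounds=[(0.5, 1.5)], method="SLSQP")
print("optimal pump speed:", res.x)
```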

  3. Robust nonlinear variable selective control for networked systems

    NASA Astrophysics Data System (ADS)

    Rahmani, Behrooz

    2016-10-01

    This paper is concerned with the networked control of a class of uncertain nonlinear systems. To this end, Takagi-Sugeno (T-S) fuzzy modelling is used to extend the previously proposed variable selective control (VSC) methodology to nonlinear systems. This extension is based upon the decomposition of the nonlinear system into a set of fuzzy-blended, locally linearised subsystems and further application of the VSC methodology to each subsystem. To increase the applicability of the T-S approach for uncertain nonlinear networked control systems, this study considers asynchronous premise variables in the plant and the controller, and then introduces a robust stability analysis and control synthesis. The resulting optimal switching-fuzzy controller provides a minimum guaranteed cost on an H2 performance index. Simulation studies on three nonlinear benchmark problems demonstrate the effectiveness of the proposed method.
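
    The core T-S idea, blending locally linearised subsystems through fuzzy membership weights, can be sketched in a few lines. The matrices and membership functions below are illustrative only; none of the paper's networked-control or robust-synthesis machinery is included.

```python
import numpy as np

# Two locally linearised subsystems x' = A_i x (illustrative matrices).
A1 = np.array([[0.0, 1.0], [-2.0, -0.5]])   # around operating point 1
A2 = np.array([[0.0, 1.0], [-6.0, -1.5]])   # around operating point 2

def memberships(x1):
    """Fuzzy weights from the premise variable x1 (triangular here)."""
    h1 = np.clip(1.0 - abs(x1) / 2.0, 0.0, 1.0)
    return np.array([h1, 1.0 - h1])

x, dt = np.array([1.5, 0.0]), 0.01
for _ in range(1000):
    h = memberships(x[0])
    A = h[0] * A1 + h[1] * A2    # fuzzy-blended dynamics
    x = x + dt * (A @ x)         # explicit Euler step
print("state after 10 s:", x)
```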

  4. Fast classification of hazelnut cultivars through portable infrared spectroscopy and chemometrics

    NASA Astrophysics Data System (ADS)

    Manfredi, Marcello; Robotti, Elisa; Quasso, Fabio; Mazzucco, Eleonora; Calabrese, Giorgio; Marengo, Emilio

    2018-01-01

    The authentication and traceability of hazelnuts is very important for both the consumer and the food industry, to safeguard the protected varieties and the food quality. This study investigates the use of a portable FTIR spectrometer coupled to multivariate statistical analysis for the classification of raw hazelnuts. The method discriminates hazelnuts from different origins/cultivars based on differences in the signal intensities of their IR spectra. The multivariate classification methods, namely principal component analysis (PCA) followed by linear discriminant analysis (LDA) and partial least squares discriminant analysis (PLS-DA), with or without variable selection, allowed a very good discrimination among the groups, with PLS-DA coupled to variable selection providing the best results. Owing to the fast analysis, high sensitivity, simplicity and lack of sample preparation, the proposed analytical methodology could be successfully used to verify the cultivar of hazelnuts, and the analysis can be performed quickly and directly on site.
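
    A hedged sketch of the PLS-DA step with scikit-learn follows; the spectra and cultivar labels are synthetic placeholders, and the variable selection step is omitted.

```python
# A minimal PLS-DA sketch: one-hot encode the class labels, fit a PLS
# regression, and classify by the largest predicted response.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer

rng = np.random.default_rng(0)
X = rng.normal(size=(90, 600))                    # placeholder spectra
y = np.repeat(["cultivar_A", "cultivar_B", "cultivar_C"], 30)
X[y == "cultivar_A", 100:120] += 0.8              # toy class-specific band
X[y == "cultivar_B", 300:320] += 0.8

Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

lb = LabelBinarizer()
pls = PLSRegression(n_components=10).fit(Xtr, lb.fit_transform(ytr))
pred = lb.classes_[pls.predict(Xte).argmax(axis=1)]
print("accuracy:", np.mean(pred == yte))
```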

  5. Method and apparatus for determining the hydraulic conductivity of earthen material

    DOEpatents

    Sisson, James B.; Honeycutt, Thomas K.; Hubbell, Joel M.

    1996-01-01

    An earthen material hydraulic conductivity determining apparatus includes, a) a semipermeable membrane having a fore earthen material bearing surface and an opposing rear liquid receiving surface; b) a pump in fluid communication with the semipermeable membrane rear surface, the pump being capable of delivering liquid to the membrane rear surface at a plurality of selected variable flow rates or at a plurality of selected variable pressures; c) a liquid reservoir in fluid communication with the pump, the liquid reservoir retaining a liquid for pumping to the membrane rear surface; and d) a pressure sensor in fluid communication with the membrane rear surface to measure pressure of liquid delivered to the membrane by the pump. Preferably, the pump comprises a pair of longitudinally opposed and aligned syringes which are operable to simultaneously fill one syringe while emptying the other. Methods of determining the hydraulic conductivity of earthen material are also disclosed.

  6. Method and apparatus for determining the hydraulic conductivity of earthen material

    DOEpatents

    Sisson, J.B.; Honeycutt, T.K.; Hubbell, J.M.

    1996-05-28

    An earthen material hydraulic conductivity determining apparatus includes: (a) a semipermeable membrane having a fore earthen material bearing surface and an opposing rear liquid receiving surface; (b) a pump in fluid communication with the semipermeable membrane rear surface, the pump being capable of delivering liquid to the membrane rear surface at a plurality of selected variable flow rates or at a plurality of selected variable pressures; (c) a liquid reservoir in fluid communication with the pump, the liquid reservoir retaining a liquid for pumping to the membrane rear surface; and (d) a pressure sensor in fluid communication with the membrane rear surface to measure pressure of liquid delivered to the membrane by the pump. Preferably, the pump comprises a pair of longitudinally opposed and aligned syringes which are operable to simultaneously fill one syringe while emptying the other. Methods of determining the hydraulic conductivity of earthen material are also disclosed. 15 figs.

  7. Machine learning search for variable stars

    NASA Astrophysics Data System (ADS)

    Pashchenko, Ilya N.; Sokolovsky, Kirill V.; Gavras, Panagiotis

    2018-04-01

    Photometric variability detection is often considered as a hypothesis testing problem: an object is variable if the null hypothesis that its brightness is constant can be ruled out given the measurements and their uncertainties. The practical applicability of this approach is limited by uncorrected systematic errors. We propose a new variability detection technique sensitive to a wide range of variability types while being robust to outliers and underestimated measurement uncertainties. We consider variability detection as a classification problem that can be approached with machine learning. Logistic Regression (LR), Support Vector Machines (SVM), k Nearest Neighbours (kNN), Neural Nets (NN), Random Forests (RF), and Stochastic Gradient Boosting classifier (SGB) are applied to 18 features (variability indices) quantifying scatter and/or correlation between points in a light curve. We use a subset of Optical Gravitational Lensing Experiment phase two (OGLE-II) Large Magellanic Cloud (LMC) photometry (30 265 light curves) that was searched for variability using traditional methods (168 known variable objects) as the training set and then apply the NN to a new test set of 31 798 OGLE-II LMC light curves. Among 205 candidates selected in the test set, 178 are real variables, while 13 low-amplitude variables are new discoveries. The machine learning classifiers considered are found to be more efficient (select more variables and fewer false candidates) compared to traditional techniques using individual variability indices or their linear combination. The NN, SGB, SVM, and RF show a higher efficiency compared to LR and kNN.
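
    This classification framing translates directly to standard tooling; below is a minimal sketch with a Random Forest trained on a handful of variability indices. The features and labels are synthetic stand-ins, not the OGLE-II data or the paper's 18 indices.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 1000
indices = np.column_stack([
    rng.normal(size=n),   # stand-in for e.g. reduced chi-squared
    rng.normal(size=n),   # stand-in for e.g. a Stetson index
    rng.normal(size=n),   # stand-in for e.g. interquartile range
])
# Toy labels: objects with a large first index tend to be variable.
is_variable = indices[:, 0] + 0.5 * rng.normal(size=n) > 1.0

clf = RandomForestClassifier(n_estimators=300, class_weight="balanced",
                             random_state=0)
print("cross-validated F1:",
      cross_val_score(clf, indices, is_variable, scoring="f1").mean())
```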

  8. An evaluation of unsupervised and supervised learning algorithms for clustering landscape types in the United States

    USGS Publications Warehouse

    Wendel, Jochen; Buttenfield, Barbara P.; Stanislawski, Larry V.

    2016-01-01

    Knowledge of landscape type can inform cartographic generalization of hydrographic features, because landscape characteristics provide an important geographic context that affects variation in channel geometry, flow pattern, and network configuration. Landscape types are characterized by expansive spatial gradients lacking abrupt changes between adjacent classes, and by a limited number of outliers that might confound classification. The US Geological Survey (USGS) is exploring methods to automate generalization of features in the National Hydrography Dataset (NHD), to associate specific sequences of processing operations and parameters with specific landscape characteristics, thus obviating manual selection of a unique processing strategy for every NHD watershed unit. A chronology of methods to delineate physiographic regions for the United States is described, including a recent maximum likelihood classification based on seven input variables. This research compares unsupervised and supervised algorithms applied to these seven input variables, to evaluate and possibly refine the recent classification. Evaluation metrics for unsupervised methods include the Davies–Bouldin index, the Silhouette index, and the Dunn index as well as quantization and topographic error metrics. Cross validation and misclassification rate analysis are used to evaluate supervised classification methods. The paper reports the comparative analysis and its impact on the selection of landscape regions. The compared solutions show problems in areas of high landscape diversity. There is some indication that additional input variables, additional classes, or more sophisticated methods can refine the existing classification.
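
    The unsupervised evaluation step might look like the sketch below, which scores KMeans solutions with two of the cited indices as implemented in scikit-learn (the Dunn index would need a separate implementation); the seven-variable matrix is a placeholder.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score, silhouette_score

# Placeholder: one row per watershed, seven standardized input variables.
X = np.random.default_rng(2).normal(size=(500, 7))

for k in range(3, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # Higher silhouette is better; lower Davies-Bouldin is better.
    print(k, silhouette_score(X, labels), davies_bouldin_score(X, labels))
```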

  9. Variable importance in nonlinear kernels (VINK): classification of digitized histopathology.

    PubMed

    Ginsburg, Shoshana; Ali, Sahirzeeshan; Lee, George; Basavanhally, Ajay; Madabhushi, Anant

    2013-01-01

    Quantitative histomorphometry is the process of modeling appearance of disease morphology on digitized histopathology images via image-based features (e.g., texture, graphs). Due to the curse of dimensionality, building classifiers with large numbers of features requires feature selection (which may require a large training set) or dimensionality reduction (DR). DR methods map the original high-dimensional features in terms of eigenvectors and eigenvalues, which limits the potential for feature transparency or interpretability. Although methods exist for variable selection and ranking on embeddings obtained via linear DR schemes (e.g., principal components analysis (PCA)), similar methods do not yet exist for nonlinear DR (NLDR) methods. In this work we present a simple yet elegant method for approximating the mapping between the data in the original feature space and the transformed data in the kernel PCA (KPCA) embedding space; this mapping provides the basis for quantification of variable importance in nonlinear kernels (VINK). We show how VINK can be implemented in conjunction with the popular Isomap and Laplacian eigenmap algorithms. VINK is evaluated in the contexts of three different problems in digital pathology: (1) predicting five year PSA failure following radical prostatectomy, (2) predicting Oncotype DX recurrence risk scores for ER+ breast cancers, and (3) distinguishing good and poor outcome p16+ oropharyngeal tumors. We demonstrate that subsets of features identified by VINK provide similar or better classification or regression performance compared to the original high dimensional feature sets.

  10. Sample selection and spatial models of housing price indexes, and, A disequilibrium analysis of the U.S. gasoline market using panel data

    NASA Astrophysics Data System (ADS)

    Hu, Haixin

    This dissertation consists of two parts. The first part studies sample selection and spatial models of housing price indexes using transaction data on detached single-family houses in two California metropolitan areas from 1990 through 2008. House prices are often spatially correlated due to shared amenities, or when the properties are viewed as close substitutes in a housing submarket. There have been many studies that address spatial correlation in the context of housing markets. However, to the best of my knowledge, none has used spatial models to construct housing price indexes at the zip code level for the entire time period analyzed in this dissertation. In this paper, I study a first-order autoregressive spatial model with four different weighting matrix schemes. Four sets of housing price indexes are constructed accordingly. Gatzlaff and Haurin (1997, 1998) study the sample selection problem in housing indexes by using Heckman's two-step method. This method, however, is generally inefficient and can cause multicollinearity problems. Also, it requires data on unsold houses in order to carry out the first-step probit regression. The maximum likelihood (ML) method can be used to estimate a truncated incidental model, which allows one to correct for sample selection based on transaction data only; however, convergence problems are very prevalent in practice. In this paper I adopt Lewbel's (2007) sample selection correction method, which does not require one to model or estimate the selection model beyond some very general assumptions. I then extend this method to correct for spatial correlation. In the second part, I analyze the U.S. gasoline market with a disequilibrium model that allows lagged latent variables, endogenous prices, and panel data with fixed effects. Most existing studies of the gasoline market (see the survey of Espey, 1998, Energy Economics) assume equilibrium. In practice, however, prices do not always adjust fast enough to clear the market. Equilibrium assumptions greatly simplify statistical inference, but are very restrictive and can produce conflicting estimates. For example, econometric models of markets that assume equilibrium often produce more elastic estimates of demand price elasticity than their disequilibrium counterparts (Holt and Johnson, 1989, Review of Economics and Statistics; Oczkowski, 1998, Economics Letters). The few studies that allow disequilibrium, however, have been limited to macroeconomic time-series data without lagged latent variables. While time-series data allow one to investigate national trends, they cannot be used to identify and analyze regional differences and the role of local markets. Exclusion of the lagged latent variables is also undesirable because such variables capture adjustment costs and inter-temporal spillovers. Simulation methods offer tractable solutions to dynamic and panel data disequilibrium models (Lee, 1997, Journal of Econometrics), but assume normally distributed errors. This paper compares estimates of price/income elasticity and excess supply/demand across time periods, regions, and model specifications, using both equilibrium and disequilibrium methods. In the equilibrium model, I compare the within-group estimator with Anderson and Hsiao's first-difference 2SLS estimator. In the disequilibrium model, I extend Amemiya's 2SLS by using Newey's efficient estimator with optimal instruments.

  11. VARIABLE SELECTION IN NONPARAMETRIC ADDITIVE MODELS

    PubMed Central

    Huang, Jian; Horowitz, Joel L.; Wei, Fengrong

    2010-01-01

    We consider a nonparametric additive model of a conditional mean function in which the number of variables and additive components may be larger than the sample size but the number of nonzero additive components is “small” relative to the sample size. The statistical problem is to determine which additive components are nonzero. The additive components are approximated by truncated series expansions with B-spline bases. With this approximation, the problem of component selection becomes that of selecting the groups of coefficients in the expansion. We apply the adaptive group Lasso to select nonzero components, using the group Lasso to obtain an initial estimator and reduce the dimension of the problem. We give conditions under which the group Lasso selects a model whose number of components is comparable with the underlying model, and the adaptive group Lasso selects the nonzero components correctly with probability approaching one as the sample size increases and achieves the optimal rate of convergence. The results of Monte Carlo experiments show that the adaptive group Lasso procedure works well with samples of moderate size. A data example is used to illustrate the application of the proposed method. PMID:21127739
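
    The component-selection idea can be sketched as follows: expand each variable in a B-spline basis and apply a group penalty over each variable's block of coefficients. This is a plain group lasso solved by proximal gradient descent (ISTA), not the paper's two-stage adaptive procedure, and the data are synthetic.

```python
import numpy as np
from sklearn.preprocessing import SplineTransformer

rng = np.random.default_rng(3)
n, p = 200, 10
Z = rng.uniform(size=(n, p))
y = np.sin(2 * np.pi * Z[:, 0]) + Z[:, 1] ** 2 + 0.1 * rng.normal(size=n)
y -= y.mean()                                # center the response

spline = SplineTransformer(n_knots=6, degree=3)
X = spline.fit_transform(Z)                  # columns come in p blocks
k = X.shape[1] // p                          # basis functions per variable
groups = [np.arange(j * k, (j + 1) * k) for j in range(p)]

lam, beta = 0.05, np.zeros(X.shape[1])
step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n)  # 1 / Lipschitz constant
for _ in range(500):
    z = beta - step * X.T @ (X @ beta - y) / n   # gradient step
    for g in groups:                             # block soft-thresholding
        nrm = np.linalg.norm(z[g])
        z[g] = max(0.0, 1.0 - step * lam / nrm) * z[g] if nrm > 0 else 0.0
    beta = z

selected = [j for j, g in enumerate(groups) if np.linalg.norm(beta[g]) > 1e-8]
print("nonzero additive components:", selected)
```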

  12. Analysis of host response to bacterial infection using error model based gene expression microarray experiments

    PubMed Central

    Stekel, Dov J.; Sarti, Donatella; Trevino, Victor; Zhang, Lihong; Salmon, Mike; Buckley, Chris D.; Stevens, Mark; Pallen, Mark J.; Penn, Charles; Falciani, Francesco

    2005-01-01

    A key step in the analysis of microarray data is the selection of genes that are differentially expressed. Ideally, such experiments should be properly replicated in order to infer both technical and biological variability, and the data should be subjected to rigorous hypothesis tests to identify the differentially expressed genes. However, in microarray experiments involving the analysis of very large numbers of biological samples, replication is not always practical. Therefore, there is a need for a method to select differentially expressed genes in a rational way from insufficiently replicated data. In this paper, we describe a simple method that uses bootstrapping to generate an error model from a replicated pilot study that can be used to identify differentially expressed genes in subsequent large-scale studies on the same platform, but in which there may be no replicated arrays. The method builds a stratified error model that includes array-to-array variability, feature-to-feature variability and the dependence of error on signal intensity. We apply this model to the characterization of the host response in a model of bacterial infection of human intestinal epithelial cells. We demonstrate the effectiveness of error model based microarray experiments and propose this as a general strategy for a microarray-based screening of large collections of biological samples. PMID:15800204

  13. Urban Heat Wave Vulnerability Analysis Considering Climate Change

    NASA Astrophysics Data System (ADS)

    JE, M.; KIM, H.; Jung, S.

    2017-12-01

    Much attention has been paid to thermal environments in Seoul, South Korea, since 2016, when the city suffered its worst heatwave in 22 years. To cope with heat wave-related damage, it is necessary to provide selective measures by singling out vulnerable regions in advance. This study aims to analyze and categorize the regions of Seoul that are vulnerable to poor thermal environments, and to analyze and discuss the contributing factors and risks for each type. To do this, this study conducted the following processes: first, based on a review of the literature, indices that can evaluate the vulnerability of a region's thermal environment were collated. The indices were divided into a climate exposure index related to temperature, a sensitivity index including demographic, social, and economic indices, and an adaptation index related to the urban environment and the status of climate adaptation policy. Second, significant variables for evaluating thermal-environment vulnerability were derived from the indices summarized above; the relationship between the number of heat-related patients in Seoul and the variables affecting that number was analyzed using multivariate statistical analysis. Third, the importance of each variable was calculated quantitatively by integrating the statistical analysis results with the analytic hierarchy process (AHP) method. Fourth, the distribution of data for each index was identified based on the selected variables, and the indices were normalized and overlaid. Fifth, for the climate exposure index, evaluations were conducted in the same way as the current vulnerability evaluation method by selecting the future temperature of Seoul, predicted through the representative concentration pathways (RCPs) climate change scenarios, as the evaluation variable. The results of this study can be utilized as foundational data to establish countermeasures against heatwaves in Seoul. Although heatwave occurrences themselves cannot be completely controlled, improvements to the environment for heatwave alleviation and response can be made. In particular, if vulnerable regions can be identified and managed in advance, the study results are expected to serve as a basis for policy in local communities.
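
    The AHP weighting step in the third stage is standard: weights come from the principal eigenvector of a pairwise comparison matrix, with Saaty's consistency ratio as a sanity check. The sketch below uses a hypothetical comparison matrix over the three index groups.

```python
import numpy as np

# Hypothetical pairwise comparisons among the three index groups:
# climate exposure, sensitivity, adaptation.
A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])

eigvals, eigvecs = np.linalg.eig(A)
i = np.argmax(eigvals.real)
w = np.abs(eigvecs[:, i].real)
w /= w.sum()                          # normalized priority weights

n = A.shape[0]
ci = (eigvals.real[i] - n) / (n - 1)  # consistency index
cr = ci / 0.58                        # Saaty's random index for n = 3
print("weights:", w, "consistency ratio:", cr)
```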

  14. Application-Dedicated Selection of Filters (ADSF) using covariance maximization and orthogonal projection.

    PubMed

    Hadoux, Xavier; Kumar, Dinesh Kant; Sarossy, Marc G; Roger, Jean-Michel; Gorretta, Nathalie

    2016-05-19

    Visible and near-infrared (Vis-NIR) spectra are generated by the combination of numerous low resolution features. Spectral variables are thus highly correlated, which can cause problems for selecting the most appropriate ones for a given application. Some decomposition bases such as Fourier or wavelet generally help highlight the spectral features that are important, but are by nature constrained to have both positive and negative components. Thus, in addition to complicating the interpretability of the selected features, this impedes their use for application-dedicated sensors. In this paper we propose a new method for feature selection: Application-Dedicated Selection of Filters (ADSF). This method relaxes the shape constraint by enabling the selection of any type of user-defined custom features. By considering only relevant features, based on the underlying nature of the data, high regularization of the final model can be obtained, even in the small-sample-size context often encountered in spectroscopic applications. For larger-scale deployment of application-dedicated sensors, these predefined feature constraints can lead to application-specific optical filters, e.g., lowpass, highpass, bandpass or bandstop filters with positive-only coefficients. In a similar fashion to Partial Least Squares, ADSF successively selects features using covariance maximization and deflates their influence using orthogonal projection in order to optimally tune the selection to the data with limited redundancy. ADSF is well suited for spectroscopic data as it can deal with large numbers of highly correlated variables in supervised learning, even with many correlated responses. Copyright © 2016 Elsevier B.V. All rights reserved.
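
    The select-and-deflate loop can be sketched as below. The Gaussian bandpass filter bank is an illustrative stand-in for user-defined custom features, and this is a schematic reading of the approach rather than the published implementation.

```python
# Score each candidate filter by covariance with the response, keep the
# best, then project its contribution out of the data and repeat.
import numpy as np

rng = np.random.default_rng(4)
n, p = 100, 300
X = rng.normal(size=(n, p))                   # spectra (samples x wavelengths)
y = X[:, 100:110].sum(axis=1) + 0.1 * rng.normal(size=n)

centers = np.arange(10, p - 10, 10)
bank = np.array([np.exp(-0.5 * ((np.arange(p) - c) / 5.0) ** 2)
                 for c in centers])           # positive-only filters

Xd, selected = X - X.mean(axis=0), []
for _ in range(3):
    scores = Xd @ bank.T                      # filter responses
    cov = np.abs(scores.T @ (y - y.mean()))   # covariance with response
    best = int(np.argmax(cov))
    selected.append(int(centers[best]))
    t = scores[:, best:best + 1]              # selected feature scores
    Xd = Xd - t @ (t.T @ Xd) / (t.T @ t)      # deflate by orthogonal projection
print("selected filter centers:", selected)
```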

  15. A Permutation Approach for Selecting the Penalty Parameter in Penalized Model Selection

    PubMed Central

    Sabourin, Jeremy A; Valdar, William; Nobel, Andrew B

    2015-01-01

    Summary: We describe a simple, computationally efficient, permutation-based procedure for selecting the penalty parameter in LASSO penalized regression. The procedure, permutation selection, is intended for applications where variable selection is the primary focus, and can be applied in a variety of structural settings, including that of generalized linear models. We briefly discuss connections between permutation selection and existing theory for the LASSO. In addition, we present a simulation study and an analysis of real biomedical data sets in which permutation selection is compared with selection based on the following: cross-validation (CV), the Bayesian information criterion (BIC), Scaled Sparse Linear Regression, and a selection method based on recently developed testing procedures for the LASSO. PMID:26243050
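
    One simple variant of the permutation idea can be sketched as follows, using the fact that for centered, standardized data the smallest LASSO penalty yielding an all-zero solution is max|X'y|/n; the 0.95 quantile rule below is illustrative, not necessarily the paper's exact rule.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(5)
n, p = 100, 50
X = rng.normal(size=(n, p))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(size=n)
X = (X - X.mean(0)) / X.std(0)           # standardize predictors
yc = y - y.mean()

# Smallest penalty giving an all-zero LASSO fit for each permuted response
# (for scikit-learn's parameterization this is max|X'y|/n).
lam_null = np.array([np.max(np.abs(X.T @ rng.permutation(yc))) / n
                     for _ in range(200)])
lam = np.quantile(lam_null, 0.95)        # penalty from the permutation null

print("penalty:", lam)
print("selected variables:", np.flatnonzero(Lasso(alpha=lam).fit(X, yc).coef_))
```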

  16. Intractable Ménière's disease. Modelling of the treatment by means of statistical analysis.

    PubMed

    Sanchez-Ferrandiz, Noelia; Fernandez-Gonzalez, Secundino; Guillen-Grima, Francisco; Perez-Fernandez, Nicolas

    2010-08-01

    To evaluate the value of different variables from the clinical history, auditory and vestibular tests and handicap measurements to define intractable or disabling Ménière's disease. This is a prospective study of 212 patients, of whom 155 were treated with intratympanic gentamicin and considered to be suffering from medically intractable Ménière's disease. Age and sex adjustments were performed with the 11 variables selected. Discriminant analysis was performed either using the aforementioned variables or following the stepwise method. Several variables needed to be sex- and/or age-adjusted, and both data were included in the discriminant function. Two different mathematical formulas were obtained and four models were analyzed. With the model selected, diagnostic accuracy is 77.7%, sensitivity is 94.9% and specificity is 52.8%. After discriminant analysis we found that the most informative variables were the number of vertigo spells, the speech discrimination score, the time constant of the VOR and a measure of handicap, the "dizziness index". Copyright 2009 Elsevier Ireland Ltd. All rights reserved.

  17. The Effect of Latent Binary Variables on the Uncertainty of the Prediction of a Dichotomous Outcome Using Logistic Regression Based Propensity Score Matching.

    PubMed

    Szekér, Szabolcs; Vathy-Fogarassy, Ágnes

    2018-01-01

    Logistic regression based propensity score matching is a widely used method in case-control studies to select the individuals of the control group. This method creates a suitable control group if all factors affecting the output variable are known. However, if relevant latent variables exist as well, which are not taken into account during the calculations, the quality of the control group is uncertain. In this paper, we present a statistics-based study in which we try to determine the relationship between the accuracy of the logistic regression model and the uncertainty of the dependent variable of the control group defined by propensity score matching. Our analyses show that there is a linear correlation between the fit of the logistic regression model and the uncertainty of the output variable. In certain cases, a latent binary explanatory variable can result in a relative error of up to 70% in the prediction of the outcome variable. The observed phenomenon calls the attention of analysts to an important point, which must be taken into account when drawing conclusions.
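
    The setting is easy to reproduce synthetically. The sketch below fits a logistic propensity model that deliberately omits a latent binary variable u and matches each treated unit to its nearest control on the estimated score; all variables are simulated.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(6)
n = 2000
X = rng.normal(size=(n, 3))               # observed covariates
u = rng.integers(0, 2, size=n)            # latent binary variable
treated = rng.random(n) < 1 / (1 + np.exp(-(X[:, 0] + u - 0.5)))

# Propensity model sees X only; the latent u is omitted, as in the paper.
ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

nn = NearestNeighbors(n_neighbors=1).fit(ps[~treated].reshape(-1, 1))
_, idx = nn.kneighbors(ps[treated].reshape(-1, 1))
controls = np.flatnonzero(~treated)[idx.ravel()]
print("matched control group size:", controls.size)
```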

  18. Multiple approaches to detect outliers in a genome scan for selection in ocellated lizards (Lacerta lepida) along an environmental gradient.

    PubMed

    Nunes, Vera L; Beaumont, Mark A; Butlin, Roger K; Paulo, Octávio S

    2011-01-01

    Identification of loci with adaptive importance is a key step to understand the speciation process in natural populations, because those loci are responsible for phenotypic variation that affects fitness in different environments. We conducted an AFLP genome scan in populations of ocellated lizards (Lacerta lepida) to search for candidate loci influenced by selection along an environmental gradient in the Iberian Peninsula. This gradient is strongly influenced by climatic variables, and two subspecies can be recognized at the opposite extremes: L. lepida iberica in the northwest and L. lepida nevadensis in the southeast. Both subspecies show substantial morphological differences that may be involved in their local adaptation to the climatic extremes. To investigate how the use of a particular outlier detection method can influence the results, a frequentist method, DFDIST, and a Bayesian method, BayeScan, were used to search for outliers influenced by selection. Additionally, the spatial analysis method was used to test for associations of AFLP marker band frequencies with 54 climatic variables by logistic regression. Results obtained with each method highlight differences in their sensitivity. DFDIST and BayeScan detected a similar proportion of outliers (3-4%), but only a few loci were simultaneously detected by both methods. Several loci detected as outliers were also associated with temperature, insolation or precipitation according to the spatial analysis method. These results are in accordance with reported data in the literature on morphological and life-history variation of L. lepida subspecies along the environmental gradient. © 2010 Blackwell Publishing Ltd.

  19. Reliability study of tibialis posterior and selected leg muscle EMG and multi-segment foot kinematics in rheumatoid arthritis associated pes planovalgus

    PubMed Central

    Barn, Ruth; Rafferty, Daniel; Turner, Deborah E.; Woodburn, James

    2012-01-01

    Objective: To determine within- and between-day reliability characteristics of electromyographic (EMG) activity patterns of selected lower leg muscles and kinematic variables in patients with rheumatoid arthritis (RA) and pes planovalgus. Methods: Five patients with RA underwent gait analysis barefoot and shod on two occasions 1 week apart. Fine-wire (tibialis posterior [TP]) and surface EMG for selected muscles and 3D kinematics using a multi-segmented foot model were undertaken barefoot and shod. Reliability of pre-determined variables including EMG activity patterns and inter-segment kinematics was analysed using coefficients of multiple correlation, intraclass correlation coefficients (ICC) and the standard error of the measurement (SEM). Results: Muscle activation patterns within- and between-day ranged from fair-to-good to excellent in both conditions. Discrete temporal and amplitude variables were highly variable across all muscle groups in both conditions but particularly poor for TP and peroneus longus. SEMs ranged from 1% to 9% of stance and 4% to 27% of maximum voluntary contraction; in most cases the 95% confidence interval crossed zero. Excellent within-day reliability was found for the inter-segment kinematics in both conditions. Between-day reliability ranged from fair-to-good to excellent for kinematic variables and all ICCs were excellent; the SEM ranged from 0.60° to 1.99°. Conclusion: Multi-segmented foot kinematics can be reliably measured in RA patients with pes planovalgus. Serial measurement of discrete variables for TP and other selected leg muscles via EMG is not supported from the findings in this cohort of RA patients. Caution should be exercised when EMG measurements are considered to study disease progression or intervention effects. PMID:22721819

  20. Variability in Floral Scent in Rewarding and Deceptive Orchids: The Signature of Pollinator-imposed Selection?

    PubMed Central

    Salzmann, Charlotte C.; Nardella, Antonio M.; Cozzolino, Salvatore; Schiestl, Florian P.

    2007-01-01

    Background and Aims: A comparative investigation was made of floral scent variation in the closely related, food-rewarding Anacamptis coriophora and the food-deceptive Anacamptis morio in order to identify patterns of variability of odour compounds in the two species and their role in pollinator attraction/avoidance learning. Methods: Scent was collected from plants in natural populations and samples were analysed via quantitative gas chromatography and mass spectrometry. Combined gas chromatography and electroantennographic detection was used to identify compounds that are detected by the pollinators. Experimental reduction of scent variability was performed in the field with plots of A. morio plants supplemented with a uniform amount of anisaldehyde. Key Results: Both orchid species emitted complex odour bouquets. In A. coriophora the two main benzenoid compounds, hydroquinone dimethyl ether (1,4-dimethoxybenzene) and anisaldehyde (methoxybenzaldehyde), triggered electrophysiological responses in olfactory neurons of honey-bee and bumble-bee workers. The scent of A. morio, however, was too weak to elicit any electrophysiological responses. The overall variation in scent was significantly lower in the rewarding A. coriophora than in the deceptive A. morio, suggesting pollinator avoidance-learning selecting for high variation in the deceptive species. A. morio flowers supplemented with non-variable scent in plot experiments, however, did not show significantly reduced pollination success. Conclusions: Whereas in the rewarding A. coriophora stabilizing selection imposed by floral constancy of the pollinators may reduce scent variability, in the deceptive A. morio the emitted scent seems to be too weak to be detected by pollinators and thus its high variability may result from relaxed selection on this floral trait. PMID:17684024

  1. Procedures for shape optimization of gas turbine disks

    NASA Technical Reports Server (NTRS)

    Cheu, Tsu-Chien

    1989-01-01

    Two procedures, the feasible direction method and sequential linear programming, for shape optimization of gas turbine disks are presented. The objective of these procedures is to obtain optimal designs of turbine disks with geometric and stress constraints. The coordinates of selected points on the disk contours are used as the design variables. Structural weight, stress and their derivatives with respect to the design variables are calculated by an efficient finite element method for design sensitivity analysis. Numerical examples of the optimal designs of a disk subjected to thermo-mechanical loadings are presented to illustrate and compare the effectiveness of the two procedures.

  2. LandScape: a simple method to aggregate p-values and other stochastic variables without a priori grouping.

    PubMed

    Wiuf, Carsten; Schaumburg-Müller Pallesen, Jonatan; Foldager, Leslie; Grove, Jakob

    2016-08-01

    In many areas of science it is customary to perform many, potentially millions, of tests simultaneously. To gain statistical power it is common to group tests based on a priori criteria such as predefined regions or sliding windows. However, it is not straightforward to choose grouping criteria, and the results might depend on the chosen criteria. Methods that summarize, or aggregate, test statistics or p-values without relying on a priori criteria are therefore desirable. We present a simple method to aggregate a sequence of stochastic variables, such as test statistics or p-values, into fewer variables without assuming a priori defined groups. We provide different ways to evaluate the significance of the aggregated variables based on theoretical considerations and resampling techniques, and show that under certain assumptions the FWER is controlled in the strong sense. Validity of the method was demonstrated using simulations and real data analyses. Our method may be a useful supplement to standard procedures relying on evaluation of test statistics individually. Moreover, by being agnostic and not relying on predefined regions, it might be a practical alternative to conventionally used methods of aggregating p-values over regions. The method is implemented in Python and freely available online (through GitHub, see the Supplementary information).

  3. Quantitative structure-activity relationship study of P2X7 receptor inhibitors using combination of principal component analysis and artificial intelligence methods.

    PubMed

    Ahmadi, Mehdi; Shahlaei, Mohsen

    2015-01-01

    P2X7 antagonist activity for a set of 49 molecules of the P2X7 receptor antagonists, derivatives of purine, was modeled with the aid of chemometric and artificial intelligence techniques. The activity of these compounds was estimated by means of a combination of principal component analysis (PCA), as a well-known data reduction method, a genetic algorithm (GA), as a variable selection technique, and an artificial neural network (ANN), as a non-linear modeling method. First, a linear regression combined with PCA (principal component regression) was performed to model the structure-activity relationships, and afterwards a combination of the PCA and ANN algorithms was employed to accurately predict the biological activity of the P2X7 antagonists. PCA preserves as much of the information as possible contained in the original data set. The seven PCs most important to the studied activity were selected as the inputs of the ANN by an efficient variable selection method, the GA. The best computational neural network model was a fully-connected, feed-forward model with 7-7-1 architecture. The developed ANN model was fully evaluated by different validation techniques, including internal and external validation, and chemical applicability domain. All validations showed that the constructed quantitative structure-activity relationship model is robust and satisfactory.
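
    A condensed sketch of the modelling chain (descriptors, PCA, feed-forward network) is given below; the GA step that picks the seven most informative PCs is replaced by simply taking the first seven components, and the data are placeholders.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
descriptors = rng.normal(size=(49, 120))   # 49 molecules, 120 descriptors
activity = rng.normal(size=49)             # placeholder activity values

# Scale, reduce to 7 components, then fit a small 7-7-1 style network.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=7),
    MLPRegressor(hidden_layer_sizes=(7,), max_iter=5000, random_state=0),
)
print(cross_val_score(model, descriptors, activity, cv=5).mean())
```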

  4. Quantitative structure–activity relationship study of P2X7 receptor inhibitors using combination of principal component analysis and artificial intelligence methods

    PubMed Central

    Ahmadi, Mehdi; Shahlaei, Mohsen

    2015-01-01

    P2X7 antagonist activity for a set of 49 molecules of the P2X7 receptor antagonists, derivatives of purine, was modeled with the aid of chemometric and artificial intelligence techniques. The activity of these compounds was estimated by means of a combination of principal component analysis (PCA), as a well-known data reduction method, a genetic algorithm (GA), as a variable selection technique, and an artificial neural network (ANN), as a non-linear modeling method. First, a linear regression combined with PCA (principal component regression) was performed to model the structure–activity relationships, and afterwards a combination of the PCA and ANN algorithms was employed to accurately predict the biological activity of the P2X7 antagonists. PCA preserves as much of the information as possible contained in the original data set. The seven PCs most important to the studied activity were selected as the inputs of the ANN by an efficient variable selection method, the GA. The best computational neural network model was a fully-connected, feed-forward model with 7-7-1 architecture. The developed ANN model was fully evaluated by different validation techniques, including internal and external validation, and chemical applicability domain. All validations showed that the constructed quantitative structure–activity relationship model is robust and satisfactory. PMID:26600858

  5. A public dataset of running biomechanics and the effects of running speed on lower extremity kinematics and kinetics

    PubMed Central

    Fukuchi, Claudiane A.; Duarte, Marcos

    2017-01-01

    Background: The goals of this study were (1) to present the set of data evaluating running biomechanics (kinematics and kinetics), including data on running habits, demographics, and levels of muscle strength and flexibility made available at Figshare (DOI: 10.6084/m9.figshare.4543435); and (2) to examine the effect of running speed on selected gait-biomechanics variables related to both running injuries and running economy. Methods: The lower-extremity kinematics and kinetics data of 28 regular runners were collected using a three-dimensional (3D) motion-capture system and an instrumented treadmill while the subjects ran at 2.5 m/s, 3.5 m/s, and 4.5 m/s wearing standard neutral shoes. Results: A dataset comprising raw and processed kinematics and kinetics signals pertaining to this experiment is available in various file formats. In addition, a file of metadata, including demographics, running characteristics, foot-strike patterns, and muscle strength and flexibility measurements is provided. Overall, there was an effect of running speed on most of the gait-biomechanics variables selected for this study. However, the foot-strike patterns were not affected by running speed. Discussion: Several applications of this dataset can be anticipated, including testing new methods of data reduction and variable selection; for educational purposes; and answering specific research questions. This last application was exemplified in the study’s second objective. PMID:28503379

  6. Variance Component Selection With Applications to Microbiome Taxonomic Data.

    PubMed

    Zhai, Jing; Kim, Juhyun; Knox, Kenneth S; Twigg, Homer L; Zhou, Hua; Zhou, Jin J

    2018-01-01

    High-throughput sequencing technology has enabled population-based studies of the role of the human microbiome in disease etiology and exposure response. Microbiome data are summarized as counts or composition of the bacterial taxa at different taxonomic levels. An important problem is to identify the bacterial taxa that are associated with a response. One method is to test the association of a specific taxon with phenotypes in a linear mixed-effects model, which incorporates phylogenetic information among bacterial communities. Another type of approach considers all taxa in a joint model and achieves selection via a penalization method, which ignores phylogenetic information. In this paper, we consider regression analysis by treating bacterial taxa at different levels as multiple random effects. For each taxon, a kernel matrix is calculated based on distance measures in the phylogenetic tree and acts as one variance component in the joint model. Taxonomic selection is then achieved by the lasso (least absolute shrinkage and selection operator) penalty on the variance components. Our method integrates biological information into the variable selection problem and greatly improves selection accuracy. Simulation studies demonstrate the superiority of our method over existing methods such as the group lasso. Finally, we apply our method to a longitudinal microbiome study of Human Immunodeficiency Virus (HIV) infected patients. We implement our method in the high-performance computing language Julia. Software and detailed documentation are freely available at https://github.com/JingZhai63/VCselection.

  7. Methods for Improving Information from "Undesigned" Human Factors Experiments. Technical Report No. p75-287.

    ERIC Educational Resources Information Center

    Simon, Charles W.

    An "undesigned" experiment is one in which the predictor variables are correlated, either due to a failure to complete a design or because the investigator was unable to select or control relevant experimental conditions. The traditional method of analyzing this class of experiment--multiple regression analysis based on a least squares…

  8. Detrended fluctuation analysis as a regression framework: Estimating dependence at different scales

    NASA Astrophysics Data System (ADS)

    Kristoufek, Ladislav

    2015-02-01

    We propose a framework combining detrended fluctuation analysis with standard regression methodology. The method is built on detrended variances and covariances and is designed to estimate regression parameters at different scales and under potential nonstationarity and power-law correlations. The former feature allows for distinguishing between effects for a pair of variables from different temporal perspectives. The latter features make the method a significant improvement over standard least squares estimation. Theoretical claims are supported by Monte Carlo simulations. The method is then applied to selected examples from physics, finance, environmental science, and epidemiology. For most of the studied cases, the relationship between variables of interest varies strongly across scales.
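
    In broad strokes, the scale-specific regression coefficient is the ratio of a detrended covariance to a detrended variance of the integrated series, beta(s) = F_xy(s) / F_xx(s). The sketch below uses first-order polynomial detrending and non-overlapping windows; details of the published estimator may differ.

```python
import numpy as np

def detrended_cov(a, b, s):
    """Mean residual covariance of integrated profiles over windows of size s."""
    cov = []
    for i in range(0, len(a) - s + 1, s):
        t = np.arange(s)
        ra = a[i:i + s] - np.polyval(np.polyfit(t, a[i:i + s], 1), t)
        rb = b[i:i + s] - np.polyval(np.polyfit(t, b[i:i + s], 1), t)
        cov.append(np.mean(ra * rb))
    return np.mean(cov)

rng = np.random.default_rng(8)
n = 4000
x = rng.normal(size=n)
y = 0.7 * x + rng.normal(size=n)            # true slope 0.7 at all scales
X, Y = np.cumsum(x - x.mean()), np.cumsum(y - y.mean())  # integrated profiles

for s in (16, 64, 256):
    beta = detrended_cov(X, Y, s) / detrended_cov(X, X, s)
    print(f"scale {s}: beta = {beta:.3f}")
```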

  9. General shape optimization capability

    NASA Technical Reports Server (NTRS)

    Chargin, Mladen K.; Raasch, Ingo; Bruns, Rudolf; Deuermeyer, Dawson

    1991-01-01

    A method is described for calculating shape sensitivities, within MSC/NASTRAN, in a simple manner without resort to external programs. The method uses natural design variables to define the shape changes in a given structure. Once the shape sensitivities are obtained, the shape optimization process is carried out in a manner similar to property optimization processes. The capability of this method is illustrated by two examples: the shape optimization of a cantilever beam with holes, loaded by a point load at the free end (with the shape of the holes and the thickness of the beam selected as the design variables), and the shape optimization of a connecting rod subjected to several different loading and boundary conditions.

  10. A Bayesian method for assessing multiscale species-habitat relationships

    USGS Publications Warehouse

    Stuber, Erica F.; Gruber, Lutz F.; Fontaine, Joseph J.

    2017-01-01

    Context: Scientists face several theoretical and methodological challenges in appropriately describing fundamental wildlife-habitat relationships in models. The spatial scales of habitat relationships are often unknown, and are expected to follow a multi-scale hierarchy. Typical frequentist or information theoretic approaches often suffer under collinearity in multi-scale studies, fail to converge when models are complex or represent an intractable computational burden when candidate model sets are large. Objectives: Our objective was to implement an automated, Bayesian method for inference on the spatial scales of habitat variables that best predict animal abundance. Methods: We introduce Bayesian latent indicator scale selection (BLISS), a Bayesian method to select spatial scales of predictors using latent scale indicator variables that are estimated with reversible-jump Markov chain Monte Carlo sampling. BLISS does not suffer from collinearity, and substantially reduces the computation time of studies. We present a simulation study to validate our method and apply our method to a case study of land cover predictors for ring-necked pheasant (Phasianus colchicus) abundance in Nebraska, USA. Results: Our method returns accurate descriptions of the explanatory power of multiple spatial scales, and unbiased and precise parameter estimates under commonly encountered data limitations including spatial scale autocorrelation, effect size, and sample size. BLISS outperforms commonly used model selection methods including stepwise and AIC, and reduces runtime by 90%. Conclusions: Given the pervasiveness of scale-dependency in ecology, and the implications of mismatches between the scales of analyses and ecological processes, identifying the spatial scales over which species are integrating habitat information is an important step in understanding species-habitat relationships. BLISS is a widely applicable method for identifying important spatial scales, propagating scale uncertainty, and testing hypotheses of scaling relationships.

  11. Evaluation of Maryland abutment scour equation through selected threshold velocity methods

    USGS Publications Warehouse

    Benedict, S.T.

    2010-01-01

    The U.S. Geological Survey, in cooperation with the Maryland State Highway Administration, used field measurements of scour to evaluate the sensitivity of the Maryland abutment scour equation to the critical (or threshold) velocity variable. Four selected methods for estimating threshold velocity were applied to the Maryland abutment scour equation, and the predicted scour was compared with the field measurements. Results indicated that the performance of the Maryland abutment scour equation was sensitive to the threshold velocity, with some threshold velocity methods producing better estimates of predicted scour than others. In addition, results indicated that regional stream characteristics can affect the performance of the Maryland abutment scour equation, with moderate-gradient streams performing differently from low-gradient streams. On the basis of the findings of the investigation, guidance for selecting threshold velocity methods for application to the Maryland abutment scour equation is provided, and limitations are noted.

  12. Primary outcome indices in illicit drug dependence treatment research: systematic approach to selection and measurement of drug use end-points in clinical trials

    PubMed Central

    Donovan, Dennis M.; Bigelow, George E.; Brigham, Gregory S.; Carroll, Kathleen M.; Cohen, Allan J.; Gardin, John G.; Hamilton, John A.; Huestis, Marilyn A.; Hughes, John R.; Lindblad, Robert; Marlatt, G. Alan; Preston, Kenzie L.; Selzer, Jeffrey A.; Somoza, Eugene C.; Wakim, Paul G.; Wells, Elizabeth A.

    2012-01-01

    Aims: Clinical trials test the safety and efficacy of behavioral and pharmacological interventions in drug-dependent individuals. However, there is no consensus about the most appropriate outcome(s) to consider in determining treatment efficacy or on the most appropriate methods for assessing selected outcome(s). We summarize the discussion and recommendations of treatment and research experts, convened by the US National Institute on Drug Abuse, to select appropriate primary outcomes for drug dependence treatment clinical trials, and in particular the feasibility of selecting a common outcome to be included in all or most trials. Methods: A brief history of outcomes employed in prior drug dependence treatment research, incorporating perspectives from tobacco and alcohol research, is included. The relative merits and limitations of focusing on drug-taking behavior, as measured by self-report and qualitative or quantitative biological markers, are evaluated. Results: Drug-taking behavior, measured ideally by a combination of self-report and biological indicators, is seen as the most appropriate proximal primary outcome in drug dependence treatment clinical trials. Conclusions: We conclude that the most appropriate outcome will vary as a function of salient variables inherent in the clinical trial, such as the type of intervention, its target, treatment goals (e.g. abstinence or reduction of use) and the perspective being taken (e.g. researcher, clinical program, patient, society). It is recommended that a decision process, based on such trial variables, be developed to guide the selection of primary and secondary outcomes as well as the methods to assess them. PMID:21781202

  13. Identification of Urinary Polyphenol Metabolite Patterns Associated with Polyphenol-Rich Food Intake in Adults from Four European Countries.

    PubMed

    Noh, Hwayoung; Freisling, Heinz; Assi, Nada; Zamora-Ros, Raul; Achaintre, David; Affret, Aurélie; Mancini, Francesca; Boutron-Ruault, Marie-Christine; Flögel, Anna; Boeing, Heiner; Kühn, Tilman; Schübel, Ruth; Trichopoulou, Antonia; Naska, Androniki; Kritikou, Maria; Palli, Domenico; Pala, Valeria; Tumino, Rosario; Ricceri, Fulvio; Santucci de Magistris, Maria; Cross, Amanda; Slimani, Nadia; Scalbert, Augustin; Ferrari, Pietro

    2017-07-25

    We identified urinary polyphenol metabolite patterns by a novel algorithm that combines dimension reduction and variable selection methods to explain polyphenol-rich food intake, and compared their respective performance with that of single biomarkers in the European Prospective Investigation into Cancer and Nutrition (EPIC) study. The study included 475 adults from four European countries (Germany, France, Italy, and Greece). Dietary intakes were assessed with 24-h dietary recalls (24-HDR) and dietary questionnaires (DQ). Thirty-four polyphenols were measured by ultra-performance liquid chromatography-electrospray ionization-tandem mass spectrometry (UPLC-ESI-MS-MS) in 24-h urine. Reduced rank regression-based variable importance in projection (RRR-VIP) and least absolute shrinkage and selection operator (LASSO) methods were used to select polyphenol metabolites. Reduced rank regression (RRR) was then used to identify patterns in these metabolites, maximizing the explained variability in intake of pre-selected polyphenol-rich foods. The performance of RRR models was evaluated using internal cross-validation to control for over-optimistic findings from over-fitting. High performance was observed for explaining recent intake (24-HDR) of red wine (r = 0.65; AUC = 89.1%), coffee (r = 0.51; AUC = 89.1%), and olives (r = 0.35; AUC = 82.2%). These metabolite patterns performed better or equally well compared to single polyphenol biomarkers. Neither metabolite patterns nor single biomarkers performed well in explaining habitual intake (as reported in the DQ) of polyphenol-rich foods. This proposed strategy of biomarker pattern identification has the potential of expanding the currently still limited list of available dietary intake biomarkers.

  14. Optical wavelength selection for portable hemoglobin determination by near-infrared spectroscopy method

    NASA Astrophysics Data System (ADS)

    Tian, Han; Li, Ming; Wang, Yue; Sheng, Dinggao; Liu, Jun; Zhang, Linna

    2017-11-01

    Hemoglobin concentration is commonly used in clinical medicine to diagnose anemia, identify bleeding, and manage red blood cell transfusions. The gold standard method for determining hemoglobin concentration in blood requires a reagent. Spectral methods have the advantage of fast, reagent-free measurement. However, model calibration with the full spectrum is time-consuming. Moreover, it is necessary to use only a few variables given the size and cost of instrumentation, especially for a portable biomedical instrument. This study compares different methods of selecting optical wavelengths for determining total hemoglobin concentration in whole blood. The results showed that a model using only a two-wavelength combination (1143 nm, 1298 nm) retains the predictive performance of the full spectrum. It appears that proper selection of optical wavelengths can be more effective than using the whole spectrum for determining hemoglobin in whole blood. We also discuss the influence of water absorptivity on the wavelength selection. This research provides valuable references for designing portable NIR instruments for determining hemoglobin concentration, and may inform noninvasive hemoglobin measurement by NIR methods.
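
    With only two wavelengths in play, exhaustive pair search is feasible. The sketch below scores every wavelength pair by the cross-validated R² of a linear model; the spectra and reference hemoglobin values are placeholders, not the study's data.

```python
from itertools import combinations

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(9)
spectra = rng.normal(size=(60, 40))        # 60 samples, 40 wavelengths
hb = 2.0 * spectra[:, 12] - spectra[:, 30] + 0.1 * rng.normal(size=60)

# Evaluate every pair of wavelength channels and keep the best one.
best = max(
    combinations(range(spectra.shape[1]), 2),
    key=lambda pair: cross_val_score(
        LinearRegression(), spectra[:, list(pair)], hb, cv=5, scoring="r2"
    ).mean(),
)
print("best wavelength pair (column indices):", best)
```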

  15. On using surface-source downhole-receiver logging to determine seismic slownesses

    USGS Publications Warehouse

    Boore, D.M.; Thompson, E.M.

    2007-01-01

    We present a method to solve for slowness models from surface-source downhole-receiver seismic travel-times. The method estimates the slownesses in a single inversion of the travel-times from all receiver depths and accounts for refractions at layer boundaries. The number and location of layer interfaces in the model can be selected based on lithologic changes or linear trends in the travel-time data. The interfaces based on linear trends in the data can be picked manually or by an automated algorithm. We illustrate the method with example sites for which geologic descriptions of the subsurface materials and independent slowness measurements are available. At each site we present slowness models that result from different interpretations of the data. The examples were carefully selected to address the reliability of interface selection and the ability of the inversion to identify thin layers, large slowness contrasts, and slowness gradients. Additionally, we compare the models in terms of ground-motion amplification. These plots illustrate the sensitivity of site amplifications to the uncertainties in the slowness model. We show that one-dimensional site amplifications are insensitive to thin layers in the slowness models; although slowness is variable over short ranges of depth, this variability has little effect on ground-motion amplification at frequencies up to 5 Hz.

  16. Selection of optimal welding condition for GTA pulse welding in root-pass of V-groove butt joint

    NASA Astrophysics Data System (ADS)

    Yun, Seok-Chul; Kim, Jae-Woong

    2010-12-01

    In the manufacture of high-quality welds or pipeline, a full-penetration weld has to be made along the weld joint. Therefore, root-pass welding is very important, and its conditions have to be selected carefully. In this study, an experimental method for the selection of optimal welding conditions is proposed for gas tungsten arc (GTA) pulse welding in the root pass, which is done along the V-grooved butt-weld joint. This method uses response surface analysis, in which the width and height of the back bead are chosen as quality variables of the weld. The overall desirability function, which combines the desirability functions for the two quality variables, is used as the objective function to obtain the optimal welding conditions. In our experiments, the target values of back bead width and height were 4 mm and zero, respectively, for a V-grooved butt-weld joint of a 7-mm-thick steel plate. The optimal welding conditions yielded a back bead profile (width and height) of 4.012 mm and 0.02 mm. From a series of welding tests, it was revealed that a uniform and full-penetration weld bead can be obtained by adopting the optimal welding conditions determined according to the proposed method.
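
    An overall desirability function of this kind is conventionally built as the geometric mean of per-response desirabilities; the sketch below uses linear ramps around the 4 mm width and zero height targets. The tolerances are invented for illustration, and the paper's exact desirability shapes may differ.

```python
import math

def d_target(value, target, tol):
    """Linear desirability: 1 at the target, falling to 0 at +/- tol."""
    return max(0.0, 1.0 - abs(value - target) / tol)

def overall_desirability(width_mm, height_mm):
    d1 = d_target(width_mm, target=4.0, tol=1.0)   # back bead width
    d2 = d_target(height_mm, target=0.0, tol=0.5)  # back bead height
    return math.sqrt(d1 * d2)                      # geometric mean

print(overall_desirability(4.012, 0.02))  # near-optimal bead profile
```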

  17. Radiograph and passive data analysis using mixed variable optimization

    DOEpatents

    Temple, Brian A.; Armstrong, Jerawan C.; Buescher, Kevin L.; Favorite, Jeffrey A.

    2015-06-02

    Disclosed herein are representative embodiments of methods, apparatus, and systems for performing radiography analysis. For example, certain embodiments perform radiographic analysis using mixed variable computation techniques. One exemplary system comprises a radiation source, a two-dimensional detector for detecting radiation transmitted through an object between the radiation source and the detector, and a computer. In this embodiment, the computer is configured to input the radiographic image data from the two-dimensional detector and to determine one or more materials that form the object by using an iterative analysis technique that selects the one or more materials from hierarchically arranged solution spaces of discrete material possibilities and selects the layer interfaces from the optimization of continuous interface data.

  18. Determination of oleamide and erucamide in polyethylene films by pressurised fluid extraction and gas chromatography.

    PubMed

    Garrido-López, Alvaro; Esquiu, Vanesa; Tena, María Teresa

    2006-08-18

    A pressurized fluid extraction (PFE) and gas chromatography-flame ionization detection (GC-FID) method is proposed to determine slip agents in polyethylene (PE) films. The study of the PFE variables was performed using a fractional factorial design (FFD) for screening and a central composite design (CCD) for optimizing the main variables obtained from the Pareto charts. The variables studied include temperature, static time, percentage of cyclohexane, and the number of extraction cycles. The final condition selected was pure isopropanol (two cycles) at 105 degrees C for 16 min. The recovery of spiked oleamide and erucamide was around 100%. The repeatability of the method, expressed as relative standard deviation, was 9.6% for oleamide and 8% for erucamide. Finally, the method was applied to determine oleamide and erucamide in several polyethylene films, and the results were statistically equal to those obtained by pyrolysis and gas-phase chemiluminescence (CL).

  19. Quantifying human disturbance in watersheds: Variable selection and performance of a GIS-based disturbance index for predicting the biological condition of perennial streams

    USGS Publications Warehouse

    Falcone, James A.; Carlisle, Daren M.; Weber, Lisa C.

    2010-01-01

    Characterizing the relative severity of human disturbance in watersheds is often part of stream assessments and is frequently done with the aid of Geographic Information System (GIS)-derived data. However, the choice of variables and how they are used to quantify disturbance are often subjective. In this study, we developed a number of disturbance indices by testing sets of variables, scoring methods, and weightings of 33 potential disturbance factors derived from readily available GIS data. The indices were calibrated using 770 watersheds located in the western United States for which the severity of disturbance had previously been classified from detailed local data by the United States Environmental Protection Agency (USEPA) Environmental Monitoring and Assessment Program (EMAP). The indices were calibrated by determining which variable or variable combinations and aggregation method best differentiated between least- and most-disturbed sites. Indices composed of several variables performed better than any individual variable, and best results came from a threshold method of scoring using six uncorrelated variables: housing unit density, road density, pesticide application, dam storage, land cover along a mainstem buffer, and distance to nearest canal/pipeline. The final index was validated with 192 withheld watersheds and correctly classified about two-thirds (68%) of least- and most-disturbed sites. These results provide information about the potential for using a disturbance index as a screening tool for a priori ranking of watersheds at a regional/national scale, and which landscape variables and methods of combination may be most helpful in doing so.
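
    A hedged sketch of the threshold scoring idea: each of the six variables contributes one point when it crosses a disturbance threshold, and the index is the sum. The thresholds and the direction of "more disturbed" for each variable are illustrative assumptions, not the study's calibrated values.

    ```python
    # Illustrative threshold-scored disturbance index over the six
    # variables named above; thresholds and directions are assumed.
    import numpy as np

    variables = ["housing_density", "road_density", "pesticide",
                 "dam_storage", "buffer_landcover", "canal_distance"]
    thresholds = np.array([10.0, 2.0, 0.5, 1e4, 0.2, 5.0])
    higher_is_disturbed = np.array([True, True, True, True, True, False])

    def disturbance_index(x):
        """x: array of the six variable values for one watershed."""
        exceed = np.where(higher_is_disturbed, x > thresholds, x < thresholds)
        return int(exceed.sum())   # 0 (least disturbed) .. 6 (most disturbed)

    print(disturbance_index(np.array([25.0, 3.1, 0.1, 2e4, 0.5, 12.0])))
    ```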

  20. Research of Water Level Prediction for a Continuous Flood due to Typhoons Based on a Machine Learning Method

    NASA Astrophysics Data System (ADS)

    Nakatsugawa, M.; Kobayashi, Y.; Okazaki, R.; Taniguchi, Y.

    2017-12-01

    This research aims to improve the accuracy of water level prediction calculations for more effective river management. In August 2016, Hokkaido was visited by four typhoons, whose heavy rainfall caused severe flooding. In the Tokoro river basin of Eastern Hokkaido, the water level (WL) at the Kamikawazoe gauging station, which is in the lower reaches, exceeded the design high-water level, and the water rose to the highest level on record. To predict such flood conditions and mitigate disaster damage, it is necessary to improve the accuracy of prediction as well as to prolong the lead time (LT) required for disaster mitigation measures such as flood-fighting activities and evacuation by residents. There is a need to predict the river water level around the peak stage earlier and more accurately. Previous research on WL prediction proposed a method in which the WL at the lower reaches is estimated from its correlation with the WL at the upper reaches (hereinafter: "the water level correlation method"). Additionally, a runoff model-based method has generally been used in which the discharge is estimated by feeding rainfall prediction data to a runoff model such as a storage function model, and the WL is then estimated from that discharge by using a WL-discharge rating curve (H-Q curve). In this research, an attempt was made to predict WL by applying the Random Forest (RF) method, a machine learning method that can estimate the contribution of explanatory variables. Furthermore, from a practical point of view, we investigated WL prediction based on a multiple correlation (MC) method using the explanatory variables with high contributions in the RF method, and we examined the proper selection of explanatory variables and the extension of LT. The following results were found: 1) based on the RF method trained on previous floods, the WL for the abnormal flood of August 2016 was properly predicted with a lead time of 6 h; 2) based on the contributions of explanatory variables, factors were selected for the MC method, and plausible prediction results were obtained.
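
    A minimal sketch of the Random Forest step described above, assuming lagged rainfall and upstream water levels as explanatory variables; the data and feature names are synthetic placeholders, and the contribution of each variable is read from the fitted importances.

    ```python
    # Sketch: train a Random Forest on lagged explanatory variables and
    # rank them by contribution. Data and feature names are placeholders.
    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(1)
    n = 500
    X = pd.DataFrame({
        "rain_lag6h":  rng.gamma(2.0, 2.0, n),
        "rain_lag12h": rng.gamma(2.0, 2.0, n),
        "upstream_wl_lag6h": rng.normal(2.0, 0.5, n),
    })
    y = 0.8 * X["upstream_wl_lag6h"] + 0.05 * X["rain_lag6h"] + rng.normal(0, 0.1, n)

    rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)
    for name, imp in sorted(zip(X.columns, rf.feature_importances_),
                            key=lambda t: -t[1]):
        print(f"{name:20s} importance {imp:.3f}")
    ```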

  1. Phage diabody repertoires for selection of large numbers of bispecific antibody fragments.

    PubMed

    McGuinness, B T; Walter, G; FitzGerald, K; Schuler, P; Mahoney, W; Duncan, A R; Hoogenboom, H R

    1996-09-01

    Methods for the generation of large numbers of different bispecific antibodies are presented. Cloning strategies are detailed to create repertoires of bispecific diabody molecules with variability at one or both of the antigen binding sites. This diabody format, when combined with the power of phage display technology, allows the generation and analysis of thousands of different bispecific molecules. Selection for binding presumably also selects for more stable diabodies. Phage diabody libraries enable screening or selection of the best combination bispecific molecule with regard to binding affinity, epitope recognition, and pairing before manufacture of the best candidate.

  2. Rapid and Simultaneous Prediction of Eight Diesel Quality Parameters through ATR-FTIR Analysis.

    PubMed

    Nespeca, Maurilio Gustavo; Hatanaka, Rafael Rodrigues; Flumignan, Danilo Luiz; de Oliveira, José Eduardo

    2018-01-01

    Quality assessment of diesel fuel is highly necessary for society, but the costs and time involved in standard methods are very high. Therefore, this study aimed to develop an analytical method capable of simultaneously determining eight diesel quality parameters (density; flash point; total sulfur content; distillation temperatures at 10% (T10), 50% (T50), and 85% (T85) recovery; cetane index; and biodiesel content) through attenuated total reflection Fourier transform infrared (ATR-FTIR) spectroscopy and the multivariate regression method partial least squares (PLS). For this purpose, the quality parameters of 409 samples were determined using standard methods, and their spectra were acquired in the range of 4000-650 cm-1. The use of the multivariate filters generalized least squares weighting (GLSW) and orthogonal signal correction (OSC) was evaluated to improve the signal-to-noise ratio of the models. Likewise, four variable selection approaches were tested: manual exclusion, forward interval PLS (FiPLS), backward interval PLS (BiPLS), and genetic algorithm (GA). The multivariate filters and variable selection algorithms generated better-fitted and more accurate PLS models. According to the validation, the FTIR/PLS models presented accuracy comparable to the reference methods and, therefore, the proposed method can be applied in routine diesel monitoring to significantly reduce costs and analysis time.
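
    A rough sketch of the forward interval PLS (FiPLS) idea named above: partition the spectrum into intervals and greedily add the interval that most improves cross-validated error. The spectra, interval count, and component number below are illustrative assumptions.

    ```python
    # Rough FiPLS sketch: greedy forward selection of spectral intervals
    # scored by cross-validated PLS error. Spectra are random placeholders.
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(2)
    X = rng.normal(size=(100, 200))      # stand-in ATR-FTIR spectra
    y = X[:, 40:50].sum(axis=1) + rng.normal(0, 0.5, 100)   # synthetic property
    intervals = np.array_split(np.arange(X.shape[1]), 20)   # 20 spectral intervals

    selected, remaining = [], list(range(len(intervals)))
    best_rmse = np.inf
    while remaining:
        scores = []
        for i in remaining:
            cols = np.concatenate([intervals[j] for j in selected + [i]])
            rmse = -cross_val_score(PLSRegression(n_components=2), X[:, cols], y,
                                    cv=5,
                                    scoring="neg_root_mean_squared_error").mean()
            scores.append((rmse, i))
        rmse, i = min(scores)
        if rmse >= best_rmse:
            break                        # stop once adding intervals stops helping
        best_rmse = rmse
        selected.append(i)
        remaining.remove(i)

    print("selected intervals:", sorted(selected), "CV RMSE:", round(best_rmse, 3))
    ```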

  3. Rapid and Simultaneous Prediction of Eight Diesel Quality Parameters through ATR-FTIR Analysis

    PubMed Central

    Hatanaka, Rafael Rodrigues; Flumignan, Danilo Luiz; de Oliveira, José Eduardo

    2018-01-01

    Quality assessment of diesel fuel is highly necessary for society, but the costs and time involved in standard methods are very high. Therefore, this study aimed to develop an analytical method capable of simultaneously determining eight diesel quality parameters (density; flash point; total sulfur content; distillation temperatures at 10% (T10), 50% (T50), and 85% (T85) recovery; cetane index; and biodiesel content) through attenuated total reflection Fourier transform infrared (ATR-FTIR) spectroscopy and the multivariate regression method partial least squares (PLS). For this purpose, the quality parameters of 409 samples were determined using standard methods, and their spectra were acquired in the range of 4000–650 cm−1. The use of the multivariate filters generalized least squares weighting (GLSW) and orthogonal signal correction (OSC) was evaluated to improve the signal-to-noise ratio of the models. Likewise, four variable selection approaches were tested: manual exclusion, forward interval PLS (FiPLS), backward interval PLS (BiPLS), and genetic algorithm (GA). The multivariate filters and variable selection algorithms generated better-fitted and more accurate PLS models. According to the validation, the FTIR/PLS models presented accuracy comparable to the reference methods and, therefore, the proposed method can be applied in routine diesel monitoring to significantly reduce costs and analysis time. PMID:29629209

  4. A comparison of small-area estimation techniques to estimate selected stand attributes using LiDAR-derived auxiliary variables

    Treesearch

    Michael E. Goerndt; Vicente J. Monleon; Hailemariam Temesgen

    2011-01-01

    One of the challenges often faced in forestry is the estimation of forest attributes for smaller areas of interest within a larger population. Small-area estimation (SAE) is a set of techniques well suited to estimation of forest attributes for small areas in which the existing sample size is small and auxiliary information is available. Selected SAE methods were...

  5. An effective selection method for low-mass active black holes and first spectroscopic identification

    NASA Astrophysics Data System (ADS)

    Morokuma, Tomoki; Tominaga, Nozomu; Tanaka, Masaomi; Yasuda, Naoki; Furusawa, Hisanori; Taniguchi, Yuki; Kato, Takahiro; Jiang, Ji-an; Nagao, Tohru; Kuncarayakti, Hanindyo; Morokuma-Matsui, Kana; Ikeda, Hiroyuki; Blinnikov, Sergei; Nomoto, Ken'ichi; Kokubo, Mitsuru; Doi, Mamoru

    2016-06-01

    We present a new method for effectively selecting objects which may be low-mass active black holes (BHs) at galaxy centers using high-cadence optical imaging data, and our first spectroscopic identification of an active 2.7 × 106 M⊙ BH at z = 0.164. This active BH was originally selected because of its rapid optical variability, on timescales from a few hours to a day, based on Subaru Hyper Suprime-Cam g-band imaging data taken with a 1 hr cadence. Broad and narrow Hα lines and many other emission lines are detected in our optical spectra taken with Subaru FOCAS, and the BH mass is measured via the broad Hα emission line width (1880 km s-1) and luminosity (4.2 × 1040 erg s-1) after careful correction for the atmospheric absorption around 7580-7720 Å. We measure the Eddington ratio and find it to be as low as 0.05, considerably smaller than those in a previous SDSS sample with similar BH mass and redshift, which demonstrates one particular strength of our Subaru survey. The g - r color and morphology of the extended component indicate that the host galaxy is a star-forming galaxy. We also show the effectiveness of our variability selection for low-mass active BHs.

  6. [Characteristic wavelengths selection of soluble solids content of pear based on NIR spectral and LS-SVM].

    PubMed

    Fan, Shu-xiang; Huang, Wen-qian; Li, Jiang-bo; Zhao, Chun-jiang; Zhang, Bao-hua

    2014-08-01

    The aim was to improve the precision and robustness of the NIR model for the soluble solids content (SSC) of pear. A total of 160 pears was divided into calibration (n=120) and prediction (n=40) sets. Different spectral pretreatment methods, including standard normal variate (SNV) and multiplicative scatter correction (MSC), were used before further analysis. A combination of genetic algorithm (GA) and successive projections algorithm (SPA) was proposed to select the most effective wavelengths after uninformative variable elimination (UVE) from the original spectra, the SNV-pretreated spectra, and the MSC-pretreated spectra, respectively. The selected variables were used as inputs to least squares-support vector machine (LS-SVM) models for determining the SSC of pear. The results indicated that the LS-SVM model built using SNV-UVE-GA-SPA on 30 characteristic wavelengths selected from the full spectrum of 3112 wavelengths achieved the optimal performance. The correlation coefficient (Rp) and root mean square error of prediction (RMSEP) for the prediction set were 0.956 and 0.271 for SSC. The model is reliable and its predictions are effective. The method can meet the requirements of rapid SSC measurement for pear and might be important for the development of portable instruments and online monitoring.
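
    For readers unfamiliar with the successive projections algorithm (SPA) used above, the following is a minimal numpy sketch under simplifying assumptions (centered spectra, fixed starting wavelength); the input spectra are random placeholders.

    ```python
    # Minimal SPA sketch: iteratively pick the column with the largest
    # norm in the orthogonal complement of the already-selected columns.
    import numpy as np

    def spa(X, n_select, start=0):
        """Select n_select columns of X with minimal collinearity."""
        X = X - X.mean(axis=0)            # work on centered spectra
        selected = [start]
        P = X.copy()
        for _ in range(n_select - 1):
            v = P[:, selected[-1]]
            # project every column onto the orthogonal complement of v
            P = P - np.outer(v, v @ P) / (v @ v)
            P[:, selected] = 0.0          # never re-select a chosen wavelength
            selected.append(int(np.argmax(np.linalg.norm(P, axis=0))))
        return selected

    rng = np.random.default_rng(3)
    spectra = rng.normal(size=(120, 311))   # stand-in NIR spectra
    print(spa(spectra, n_select=10))
    ```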

  7. Genome scans for divergent selection in natural populations of the widespread hardwood species Eucalyptus grandis (Myrtaceae) using microsatellites

    PubMed Central

    Song, Zhijiao; Zhang, Miaomiao; Li, Fagen; Weng, Qijie; Zhou, Chanpin; Li, Mei; Li, Jie; Huang, Huanhua; Mo, Xiaoyong; Gan, Siming

    2016-01-01

    Identification of loci or genes under natural selection is important for both understanding the genetic basis of local adaptation and practical applications, and genome scans provide a powerful means for such identification purposes. In this study, genome-wide simple sequence repeat markers (SSRs) were used to scan for molecular footprints of divergent selection in Eucalyptus grandis, a hardwood species occurring widely in coastal areas from 32° S to 16° S in Australia. High population diversity levels and weak population structure were detected with putatively neutral genomic SSRs. Using three FST outlier detection methods, a total of 58 outlying SSRs were collectively identified as loci under divergent selection against three non-correlated climatic variables, namely, mean annual temperature, isothermality and annual precipitation. Using a spatial analysis method, nine significant associations were revealed between FST outlier allele frequencies and climatic variables, involving seven alleles from five SSR loci. Of the five significant SSRs, two (EUCeSSR1044 and Embra394) contained alleles of putative genes with known functional importance for response to climatic factors. Our study presents critical information on the population diversity and structure of the important woody species E. grandis and provides insight into the adaptive responses of perennial trees to climatic variations. PMID:27748400

  8. Can Geostatistical Models Represent Nature's Variability? An Analysis Using Flume Experiments

    NASA Astrophysics Data System (ADS)

    Scheidt, C.; Fernandes, A. M.; Paola, C.; Caers, J.

    2015-12-01

    The lack of understanding of the Earth's geological and physical processes governing sediment deposition renders subsurface modeling subject to large uncertainty. Geostatistics is often used to model uncertainty because of its capability to stochastically generate spatially varying realizations of the subsurface. These methods can generate a range of realizations of a given pattern - but how representative are these of the full natural variability? And how can we identify the minimum set of images that represent this natural variability? Here we use this minimum set to define the geostatistical prior model: a set of training images that represent the range of patterns generated by autogenic variability in the sedimentary environment under study. The proper definition of the prior model is essential in capturing the variability of the depositional patterns. This work starts with a set of overhead images from an experimental basin that showed ongoing autogenic variability. We use the images to analyze the essential characteristics of this suite of patterns. In particular, our goal is to define a prior model (a minimal set of selected training images) such that geostatistical algorithms, when applied to this set, can reproduce the full measured variability. A necessary prerequisite is to define a measure of variability. In this study, we measure variability using a dissimilarity distance between the images. The distance indicates whether two snapshots contain similar depositional patterns. To reproduce the variability in the images, we apply a multiple-point statistics (MPS) algorithm to the set of selected snapshots of the sedimentary basin that serve as training images. The training images are chosen from among the initial set by using the distance measure to ensure that only dissimilar images are chosen. Preliminary investigations show that MPS can reproduce the natural variability of the experimental depositional system fairly accurately. Furthermore, the selected training images provide process information. They fall into three basic patterns: a channelized end member, a sheet flow end member, and one intermediate case. These represent the continuum between autogenic bypass or erosion, and net deposition.
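
    One simple way to realize "only dissimilar images are chosen" is greedy farthest-point selection on a precomputed dissimilarity matrix, sketched below; the matrix here is a random symmetric placeholder rather than an image-derived distance.

    ```python
    # Sketch: select mutually dissimilar training images by greedy
    # farthest-point selection on a dissimilarity matrix D (placeholder).
    import numpy as np

    rng = np.random.default_rng(4)
    n_images = 50
    A = rng.random((n_images, n_images))
    D = (A + A.T) / 2.0
    np.fill_diagonal(D, 0.0)              # D[i, j] = dissimilarity of snapshots i, j

    def select_training_images(D, n_keep):
        chosen = [int(np.argmax(D.sum(axis=1)))]   # start from the most unusual image
        while len(chosen) < n_keep:
            # pick the image farthest from everything already chosen
            dist_to_chosen = D[:, chosen].min(axis=1)
            dist_to_chosen[chosen] = -np.inf
            chosen.append(int(np.argmax(dist_to_chosen)))
        return chosen

    print(select_training_images(D, n_keep=3))
    ```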

  9. Variably insulating portable heater/cooler

    DOEpatents

    Potter, Thomas F.

    1998-01-01

    A compact vacuum insulation panel comprising a chamber enclosed by two sheets of metal, glass-like spacers disposed in the chamber between the sidewalls, and a high-grade vacuum in the chamber includes apparatus and methods for enabling and disabling, or turning "on" and "off" the thermal insulating capability of the panel. One type of enabling and disabling apparatus and method includes a metal hydride for releasing hydrogen gas into the chamber in response to heat, and a hydrogen grate between the metal hydride and the chamber for selectively preventing and allowing return of the hydrogen gas to the metal hydride. Another type of enabling and disabling apparatus and method includes a variable emissivity coating on the sheets of metal in which the emissivity is controllably variable by heat or electricity. Still another type of enabling and disabling apparatus and method includes metal-to-metal contact devices that can be actuated to establish or break metal-to-metal heat paths or thermal short circuits between the metal sidewalls.

  10. Variably insulating portable heater/cooler

    DOEpatents

    Potter, T.F.

    1998-09-29

    A compact vacuum insulation panel is described comprising a chamber enclosed by two sheets of metal, glass-like spacers disposed in the chamber between the sidewalls, and a high-grade vacuum in the chamber includes apparatus and methods for enabling and disabling, or turning "on" and "off" the thermal insulating capability of the panel. One type of enabling and disabling apparatus and method includes a metal hydride for releasing hydrogen gas into the chamber in response to heat, and a hydrogen grate between the metal hydride and the chamber for selectively preventing and allowing return of the hydrogen gas to the metal hydride. Another type of enabling and disabling apparatus and method includes a variable emissivity coating on the sheets of metal in which the emissivity is controllably variable by heat or electricity. Still another type of enabling and disabling apparatus and method includes metal-to-metal contact devices that can be actuated to establish or break metal-to-metal heat paths or thermal short circuits between the metal sidewalls. 25 figs.

  11. Quantifying Variability of Avian Colours: Are Signalling Traits More Variable?

    PubMed Central

    Delhey, Kaspar; Peters, Anne

    2008-01-01

    Background Increased variability in sexually selected ornaments, a key assumption of evolutionary theory, is thought to be maintained through condition-dependence. Condition-dependent handicap models of sexual selection predict that (a) sexually selected traits show amplified variability compared to equivalent non-sexually selected traits, and since males are usually the sexually selected sex, that (b) males are more variable than females, and (c) sexually dimorphic traits more variable than monomorphic ones. So far these predictions have only been tested for metric traits. Surprisingly, they have not been examined for bright coloration, one of the most prominent sexual traits. This omission stems from computational difficulties: different types of colours are quantified on different scales precluding the use of coefficients of variation. Methodology/Principal Findings Based on physiological models of avian colour vision we develop an index to quantify the degree of discriminable colour variation as it can be perceived by conspecifics. A comparison of variability in ornamental and non-ornamental colours in six bird species confirmed (a) that those coloured patches that are sexually selected or act as indicators of quality show increased chromatic variability. However, we found no support for (b) that males generally show higher levels of variability than females, or (c) that sexual dichromatism per se is associated with increased variability. Conclusions/Significance We show that it is currently possible to realistically estimate variability of animal colours as perceived by them, something difficult to achieve with other traits. Increased variability of known sexually-selected/quality-indicating colours in the studied species, provides support to the predictions borne from sexual selection theory but the lack of increased overall variability in males or dimorphic colours in general indicates that sexual differences might not always be shaped by similar selective forces. PMID:18301766

  12. Framework for making better predictions by directly estimating variables' predictivity.

    PubMed

    Lo, Adeline; Chernoff, Herman; Zheng, Tian; Lo, Shaw-Hwa

    2016-12-13

    We propose approaching prediction from a framework grounded in the theoretical correct prediction rate of a variable set as a parameter of interest. This framework allows us to define a measure of predictivity that enables assessing variable sets for, preferably high, predictivity. We first define the prediction rate for a variable set and consider, and ultimately reject, the naive estimator, a statistic based on the observed sample data, due to its inflated bias for moderate sample size and its sensitivity to noisy useless variables. We demonstrate that the [Formula: see text]-score of the PR method of VS yields a relatively unbiased estimate of a parameter that is not sensitive to noisy variables and is a lower bound to the parameter of interest. Thus, the PR method using the [Formula: see text]-score provides an effective approach to selecting highly predictive variables. We offer simulations and an application of the [Formula: see text]-score on real data to demonstrate the statistic's predictive performance on sample data. We conjecture that using the partition retention and [Formula: see text]-score can aid in finding variable sets with promising prediction rates; however, further research in the avenue of sample-based measures of predictivity is much desired.

  13. Potential benefits of genomic selection on genetic gain of small ruminant breeding programs.

    PubMed

    Shumbusho, F; Raoul, J; Astruc, J M; Palhiere, I; Elsen, J M

    2013-08-01

    In conventional small ruminant breeding programs, only pedigree and phenotype records are used to make selection decisions but prospects of including genomic information are now under consideration. The objective of this study was to assess the potential benefits of genomic selection on the genetic gain in French sheep and goat breeding designs of today. Traditional and genomic scenarios were modeled with deterministic methods for 3 breeding programs. The models included decisional variables related to male selection candidates, progeny testing capacity, and economic weights that were optimized to maximize annual genetic gain (AGG) of i) a meat sheep breeding program that improved a meat trait of heritability (h(2)) = 0.30 and a maternal trait of h(2) = 0.09 and ii) dairy sheep and goat breeding programs that improved a milk trait of h(2) = 0.30. Values of ±0.20 of genetic correlation between meat and maternal traits were considered to study their effects on AGG. The Bulmer effect was accounted for and the results presented here are the averages of AGG after 10 generations of selection. Results showed that current traditional breeding programs provide an AGG of 0.095 genetic standard deviation (σa) for meat and 0.061 σa for maternal trait in meat breed and 0.147 σa and 0.120 σa in sheep and goat dairy breeds, respectively. By optimizing decisional variables, the AGG with traditional selection methods increased to 0.139 σa for meat and 0.096 σa for maternal traits in meat breeding programs and to 0.174 σa and 0.183 σa in dairy sheep and goat breeding programs, respectively. With a medium-sized reference population (nref) of 2,000 individuals, the best genomic scenarios gave an AGG that was 17.9% greater than with traditional selection methods with optimized values of decisional variables for combined meat and maternal traits in meat sheep, 51.7% in dairy sheep, and 26.2% in dairy goats. The superiority of genomic schemes increased with the size of the reference population and genomic selection gave the best results when nref > 1,000 individuals for dairy breeds and nref > 2,000 individuals for meat breed. Genetic correlation between meat and maternal traits had a large impact on the genetic gain of both traits. Changes in AGG due to correlation were greatest for low heritable maternal traits. As a general rule, AGG was increased both by optimizing selection designs and including genomic information.

  14. Statistical downscaling modeling with quantile regression using lasso to estimate extreme rainfall

    NASA Astrophysics Data System (ADS)

    Santri, Dewi; Wigena, Aji Hamim; Djuraidah, Anik

    2016-02-01

    Rainfall is one of the climatic elements with high variability, and it has many negative impacts, especially extreme rainfall. Therefore, several methods are required to minimize the damage that may occur. So far, global circulation models (GCMs) are the best method to forecast global climate changes, including extreme rainfall. Statistical downscaling (SD) is a technique for developing the relationship between GCM output as global-scale independent variables and rainfall as a local-scale response variable. Using GCM output directly is difficult when assessed against observations because it is high dimensional and exhibits multicollinearity between variables. Common methods used to handle this problem are principal components analysis (PCA) and partial least squares regression. A newer method that can be used is the lasso, which has the advantage of simultaneously controlling the variance of the fitted coefficients and performing automatic variable selection. Quantile regression is a method that can be used to detect extreme rainfall at the dry and wet extremes. The objective of this study is to model SD using quantile regression with the lasso to predict extreme rainfall in Indramayu. The results showed that extreme rainfall (extreme wet in January, February, and December) in Indramayu could be predicted properly by the model at the 90th quantile.
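
    As a hedged sketch of the modeling idea, scikit-learn's QuantileRegressor fits a quantile regression with an L1 (lasso) penalty, so variables with zero coefficients drop out; the predictors below are random stand-ins for GCM output, not Indramayu data.

    ```python
    # Sketch of lasso-penalized quantile regression at the 90th quantile;
    # predictors are random placeholders for GCM output variables.
    import numpy as np
    from sklearn.linear_model import QuantileRegressor

    rng = np.random.default_rng(5)
    n, p = 300, 25
    X = rng.normal(size=(n, p))                 # stand-in GCM output variables
    rainfall = 50 + 8 * X[:, 0] - 5 * X[:, 1] + rng.gamma(2.0, 10.0, n)

    model = QuantileRegressor(quantile=0.9, alpha=0.1, solver="highs").fit(X, rainfall)
    kept = np.flatnonzero(np.abs(model.coef_) > 1e-8)
    print("variables kept by the lasso penalty:", kept)
    ```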

  15. VARSEDIG: an algorithm for morphometric characters selection and statistical validation in morphological taxonomy.

    PubMed

    Guisande, Cástor; Vari, Richard P; Heine, Jürgen; García-Roselló, Emilio; González-Dacosta, Jacinto; Perez-Schofield, Baltasar J García; González-Vilas, Luis; Pelayo-Villamil, Patricia

    2016-09-12

    We present and discuss VARSEDIG, an algorithm which identifies the morphometric features that significantly discriminate two taxa and validates the morphological distinctness between them via a Monte-Carlo test. VARSEDIG is freely available as a function of the RWizard application PlotsR (http://www.ipez.es/RWizard) and as an R package on CRAN. The variables selected by VARSEDIG with the overlap method were very similar to those selected by logistic regression and discriminant analysis, but VARSEDIG overcomes some shortcomings of these methods. VARSEDIG is, therefore, a good alternative to current classical classification methods for identifying morphometric features that significantly discriminate a taxon and for validating its morphological distinctness from other taxa. As a demonstration of the potential of VARSEDIG for this purpose, we analyze morphological discrimination among some species of the Neotropical freshwater family Characidae.

  16. Generalized structural equations improve sexual-selection analyses

    PubMed Central

    Santini, Giacomo; Marchetti, Giovanni Maria; Focardi, Stefano

    2017-01-01

    Sexual selection is an intense evolutionary force, which operates through competition for the access to breeding resources. There are many cases where male copulatory success is highly asymmetric, and few males are able to sire most females. Two main hypotheses were proposed to explain this asymmetry: “female choice” and “male dominance”. The literature reports contrasting results. This variability may reflect actual differences among studied populations, but it may also be generated by methodological differences and statistical shortcomings in data analysis. A review of the statistical methods used so far in lek studies shows a prevalence of Linear Models (LM) and Generalized Linear Models (GLM), which may be affected by problems in inferring cause-effect relationships, multi-collinearity among explanatory variables, and erroneous handling of non-normal and non-continuous distributions of the response variable. In lek breeding, selective pressure is maximal, because large numbers of males and females congregate in small arenas. We used a dataset on lekking fallow deer (Dama dama) to contrast the methods and procedures employed so far, and we propose a novel approach based on Generalized Structural Equations Models (GSEMs). GSEMs combine the power and flexibility of both SEM and GLM in a unified modeling framework. We showed that LMs fail to identify several important predictors of male copulatory success and yield very imprecise parameter estimates. Minor variations in data transformation yield wide changes in results, and the method appears unreliable. GLMs improved the analysis, but GSEMs provided better results, because the use of latent variables decreases the impact of measurement errors. Using GSEMs, we were able to test contrasting hypotheses and calculate both direct and indirect effects, and we reached a high precision of the estimates, which implies a high predictive ability. In summary, we recommend the use of GSEMs in studies on lekking behaviour, and we provide guidelines to implement these models. PMID:28809923

  17. The cross-validated AUC for MCP-logistic regression with high-dimensional data.

    PubMed

    Jiang, Dingfeng; Huang, Jian; Zhang, Ying

    2013-10-01

    We propose a cross-validated area under the receiver operating characteristic (ROC) curve (CV-AUC) criterion for tuning parameter selection for penalized methods in sparse, high-dimensional logistic regression models. We use this criterion in combination with the minimax concave penalty (MCP) method for variable selection. The CV-AUC criterion is specifically designed for optimizing the classification performance for binary outcome data. To implement the proposed approach, we derive an efficient coordinate descent algorithm to compute the MCP-logistic regression solution surface. Simulation studies are conducted to evaluate the finite sample performance of the proposed method and its comparison with existing methods, including the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the extended BIC (EBIC). The model selected based on the CV-AUC criterion tends to have a larger predictive AUC and smaller classification error than those with tuning parameters selected using the AIC, BIC or EBIC. We illustrate the application of the MCP-logistic regression with the CV-AUC criterion on three microarray datasets from studies that attempt to identify genes related to cancers. Our simulation studies and data examples demonstrate that the CV-AUC is an attractive method for tuning parameter selection for penalized methods in high-dimensional logistic regression models.
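
    Scikit-learn has no MCP penalty, so the sketch below substitutes an L1-penalized logistic regression to show the tuning idea itself: sweep the penalty parameter and keep the value that maximizes cross-validated AUC. The data are simulated.

    ```python
    # CV-AUC tuning sketch with an L1 logistic regression standing in for
    # the MCP penalty (MCP itself is not available in scikit-learn).
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=200, n_features=500, n_informative=10,
                               random_state=0)

    grid = np.logspace(-2, 1, 10)
    aucs = [cross_val_score(LogisticRegression(penalty="l1", C=C,
                                               solver="liblinear"),
                            X, y, cv=5, scoring="roc_auc").mean() for C in grid]
    best_C = grid[int(np.argmax(aucs))]
    print(f"selected C = {best_C:.3g}, CV-AUC = {max(aucs):.3f}")
    ```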

  18. Principal components analysis in clinical studies.

    PubMed

    Zhang, Zhongheng; Castelló, Adela

    2017-09-01

    In multivariate analysis, independent variables are usually correlated with each other, which can introduce multicollinearity into the regression models. One approach to solving this problem is to apply principal components analysis (PCA) to these variables. This method uses an orthogonal transformation to represent sets of potentially correlated variables with principal components (PCs) that are linearly uncorrelated. PCs are ordered so that the first PC has the largest possible variance, and only some components are selected to represent the correlated variables. As a result, the dimension of the variable space is reduced. This tutorial illustrates how to perform PCA in the R environment, using as an example a simulated dataset in which two PCs are responsible for the majority of the variance in the data. Furthermore, the visualization of PCA is highlighted.
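
    The tutorial itself works in R; for consistency with the other sketches in this section, here is the same idea in Python: simulated correlated variables are reduced to a small number of uncorrelated principal components.

    ```python
    # PCA sketch: eight correlated variables driven by two latent factors
    # are reduced to two uncorrelated principal components.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(6)
    latent = rng.normal(size=(200, 2))                       # two underlying factors
    loadings = rng.normal(size=(2, 8))
    X = latent @ loadings + 0.1 * rng.normal(size=(200, 8))  # correlated variables

    pca = PCA(n_components=2).fit(X)
    print("variance explained:", pca.explained_variance_ratio_.round(3))
    scores = pca.transform(X)   # use these PCs as uncorrelated regressors
    ```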

  19. Space Operations Center orbit altitude selection strategy

    NASA Technical Reports Server (NTRS)

    Indrikis, J.; Myers, H. L.

    1982-01-01

    The strategy for selecting the operational altitude has to respond to the Space Operations Center's (SOC) maintenance requirements and the logistics demands of the missions to be supported by the SOC. Three orbit strategies are developed: two at constant altitude and one at variable altitude. In order to minimize the effect of atmospheric uncertainty, the dynamic altitude method is recommended. In this approach the SOC will operate at the optimum altitude for the prevailing atmospheric conditions and logistics model, provided that mission safety constraints are not violated. Over a typical solar activity cycle this method produces significant savings in the overall logistics cost.

  20. An image-based search for pulsars among Fermi unassociated LAT sources

    NASA Astrophysics Data System (ADS)

    Frail, D. A.; Ray, P. S.; Mooley, K. P.; Hancock, P.; Burnett, T. H.; Jagannathan, P.; Ferrara, E. C.; Intema, H. T.; de Gasperin, F.; Demorest, P. B.; Stovall, K.; McKinnon, M. M.

    2018-03-01

    We describe an image-based method that uses two radio criteria, compactness and spectral index, to identify promising pulsar candidates among Fermi Large Area Telescope (LAT) unassociated sources. These criteria are applied to those radio sources from the Giant Metrewave Radio Telescope all-sky survey at 150 MHz (TGSS ADR1) found within the error ellipses of unassociated sources from the 3FGL catalogue and a preliminary source list based on 7 yr of LAT data. After follow-up interferometric observations to identify extended or variable sources, a list of 16 compact, steep-spectrum candidates is generated. An ongoing search for pulsations in these candidates, in gamma rays and radio, has found six millisecond pulsars and one normal pulsar. A comparison of this method with existing selection criteria based on gamma-ray spectral and variability properties suggests that the pulsar discovery space using Fermi may be larger than previously thought. Radio imaging is a hitherto underutilized source selection method that can be used, as with other multiwavelength techniques, in the search for Fermi pulsars.

  1. Resolving the Conflict Between Associative Overdominance and Background Selection

    PubMed Central

    Zhao, Lei; Charlesworth, Brian

    2016-01-01

    In small populations, genetic linkage between a polymorphic neutral locus and loci subject to selection, either against partially recessive mutations or in favor of heterozygotes, may result in an apparent selective advantage to heterozygotes at the neutral locus (associative overdominance) and a retardation of the rate of loss of variability by genetic drift at this locus. In large populations, selection against deleterious mutations has previously been shown to reduce variability at linked neutral loci (background selection). We describe analytical, numerical, and simulation studies that shed light on the conditions under which retardation vs. acceleration of loss of variability occurs at a neutral locus linked to a locus under selection. We consider a finite, randomly mating population initiated from an infinite population in equilibrium at a locus under selection. With mutation and selection, retardation occurs only when S, the product of twice the effective population size and the selection coefficient, is of order 1. With S >> 1, background selection always causes an acceleration of loss of variability. Apparent heterozygote advantage at the neutral locus is, however, always observed when mutations are partially recessive, even if there is an accelerated rate of loss of variability. With heterozygote advantage at the selected locus, loss of variability is nearly always retarded. The results shed light on experiments on the loss of variability at marker loci in laboratory populations and on the results of computer simulations of the effects of multiple selected loci on neutral variability. PMID:27182952

  2. Modification of the Integrated Sasang Constitutional Diagnostic Model

    PubMed Central

    Nam, Jiho

    2017-01-01

    In 2012, the Korea Institute of Oriental Medicine proposed an objective and comprehensive physical diagnostic model to address quantification problems in the existing Sasang constitutional diagnostic method. However, certain issues have been raised regarding a revision of the proposed diagnostic model. In this paper, we propose various methodological approaches to address the problems of the previous diagnostic model. Firstly, more useful variables are selected in each component. Secondly, the least absolute shrinkage and selection operator is used to reduce multicollinearity without the modification of explanatory variables. Thirdly, proportions of SC types and age are considered to construct individual diagnostic models and classify the training set and the test set for reflecting the characteristics of the entire dataset. Finally, an integrated model is constructed with explanatory variables of individual diagnosis models. The proposed integrated diagnostic model significantly improves the sensitivities for both the male SY type (36.4% → 62.0%) and the female SE type (43.7% → 64.5%), which were areas of limitation of the previous integrated diagnostic model. The ideas of these new algorithms are expected to contribute not only to the scientific development of Sasang constitutional medicine in Korea but also to that of other diagnostic methods for traditional medicine. PMID:29317897

  3. Design sensitivity analysis of rotorcraft airframe structures for vibration reduction

    NASA Technical Reports Server (NTRS)

    Murthy, T. Sreekanta

    1987-01-01

    Optimization of rotorcraft structures for vibration reduction was studied. The objective of this study is to develop practical computational procedures for structural optimization of airframes subject to steady-state vibration response constraints. One of the key elements of any such computational procedure is design sensitivity analysis. A method for design sensitivity analysis of airframes under vibration response constraints is presented. The mathematical formulation of the method and its implementation as a new solution sequence in MSC/NASTRAN are described. The results of the application of the method to a simple finite element stick model of the AH-1G helicopter airframe are presented and discussed. Selection of design variables that are most likely to bring about changes in the response at specified locations in the airframe is based on consideration of forced response strain energy. Sensitivity coefficients are determined for the selected design variable set. Constraints on the natural frequencies are also included in addition to the constraints on the steady-state response. Sensitivity coefficients for these constraints are determined. Results of the analysis and insights gained in applying the method to the airframe model are discussed. The general nature of future work to be conducted is described.

  4. Independence screening for high dimensional nonlinear additive ODE models with applications to dynamic gene regulatory networks.

    PubMed

    Xue, Hongqi; Wu, Shuang; Wu, Yichao; Ramirez Idarraga, Juan C; Wu, Hulin

    2018-05-02

    Mechanism-driven low-dimensional ordinary differential equation (ODE) models are often used to model viral dynamics at cellular levels and epidemics of infectious diseases. However, low-dimensional mechanism-based ODE models are limited for modeling infectious diseases at molecular levels such as transcriptomic or proteomic levels, which is critical to understand pathogenesis of diseases. Although linear ODE models have been proposed for gene regulatory networks (GRNs), nonlinear regulations are common in GRNs. The reconstruction of large-scale nonlinear networks from time-course gene expression data remains an unresolved issue. Here, we use high-dimensional nonlinear additive ODEs to model GRNs and propose a 4-step procedure to efficiently perform variable selection for nonlinear ODEs. To tackle the challenge of high dimensionality, we couple the 2-stage smoothing-based estimation method for ODEs and a nonlinear independence screening method to perform variable selection for the nonlinear ODE models. We have shown that our method possesses the sure screening property and it can handle problems with non-polynomial dimensionality. Numerical performance of the proposed method is illustrated with simulated data and a real data example for identifying the dynamic GRN of Saccharomyces cerevisiae. Copyright © 2018 John Wiley & Sons, Ltd.

  5. Flood-frequency prediction methods for unregulated streams of Tennessee, 2000

    USGS Publications Warehouse

    Law, George S.; Tasker, Gary D.

    2003-01-01

    Up-to-date flood-frequency prediction methods for unregulated, ungaged rivers and streams of Tennessee have been developed. Prediction methods include the regional-regression method and the newer region-of-influence method. The prediction methods were developed using stream-gage records from unregulated streams draining basins having from 1 percent to about 30 percent total impervious area. These methods, however, should not be used in heavily developed or storm-sewered basins with impervious areas greater than 10 percent. The methods can be used to estimate 2-, 5-, 10-, 25-, 50-, 100-, and 500-year recurrence-interval floods of most unregulated rural streams in Tennessee. A computer application was developed that automates the calculation of flood frequency for unregulated, ungaged rivers and streams of Tennessee. Regional-regression equations were derived by using both single-variable and multivariable regional-regression analysis. Contributing drainage area is the explanatory variable used in the single-variable equations. Contributing drainage area, main-channel slope, and a climate factor are the explanatory variables used in the multivariable equations. Deleted-residual standard errors for the single-variable equations ranged from 32 to 65 percent. Deleted-residual standard errors for the multivariable equations ranged from 31 to 63 percent. These equations are included in the computer application to allow easy comparison of results produced by the different methods. The region-of-influence method calculates multivariable regression equations for each ungaged site and recurrence interval using basin characteristics from 60 similar sites selected from the study area. Explanatory variables that may be used in regression equations computed by the region-of-influence method include contributing drainage area, main-channel slope, a climate factor, and a physiographic-region factor. Deleted-residual standard errors for the region-of-influence method tended to be only slightly smaller than those for the regional-regression method and ranged from 27 to 62 percent.

  6. A Novel Group-Fused Sparse Partial Correlation Method for Simultaneous Estimation of Functional Networks in Group Comparison Studies.

    PubMed

    Liang, Xiaoyun; Vaughan, David N; Connelly, Alan; Calamante, Fernando

    2018-05-01

    The conventional way to estimate functional networks is primarily based on Pearson correlation along with the classic Fisher Z test. In general, networks are usually calculated at the individual level and subsequently aggregated to obtain group-level networks. However, such estimated networks are inevitably affected by the inherent large inter-subject variability. A joint graphical model with stability selection (JGMSS) method was recently shown to effectively reduce inter-subject variability, mainly caused by confounding variations, by simultaneously estimating individual-level networks from a group. However, its benefits might be compromised when two groups are being compared, given that JGMSS is blinded to other groups when it is applied to estimate networks from a given group. We propose a novel method for robustly estimating networks from two groups by using group-fused multiple graphical lasso combined with stability selection, named GMGLASS. Specifically, by simultaneously estimating similar within-group networks and the between-group difference, it is possible to address both the inter-subject variability of estimated individual networks inherent in existing methods such as the Fisher Z test and the issue that JGMSS ignores between-group information in group comparisons. To evaluate the performance of GMGLASS in terms of a few key network metrics, and to compare it with JGMSS and the Fisher Z test, the three methods are applied to both simulated and in vivo data. As a method aimed at group comparison studies, our study involves two groups for each case, i.e., normal control and patient groups; for the in vivo data, we focus on a group of patients with right mesial temporal lobe epilepsy.
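
    Scikit-learn does not implement the group-fused variant or stability selection, so the sketch below uses the ordinary graphical lasso per group and compares the resulting partial-correlation networks; all time-series features are random placeholders.

    ```python
    # Sparse partial-correlation networks per group via the ordinary
    # graphical lasso (fusion and stability selection omitted).
    import numpy as np
    from sklearn.covariance import GraphicalLasso

    rng = np.random.default_rng(7)
    n_subjects, n_rois = 40, 15
    controls = rng.normal(size=(n_subjects, n_rois))   # stand-in ROI features
    patients = rng.normal(size=(n_subjects, n_rois))

    def sparse_network(data, alpha=0.3):
        prec = GraphicalLasso(alpha=alpha).fit(data).precision_
        d = np.sqrt(np.diag(prec))
        partial_corr = -prec / np.outer(d, d)          # precision -> partial corr
        np.fill_diagonal(partial_corr, 1.0)
        return partial_corr

    diff = sparse_network(controls) - sparse_network(patients)
    print("largest between-group edge difference:", np.abs(diff).max().round(3))
    ```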

  7. Parametric and Nonparametric Statistical Methods for Genomic Selection of Traits with Additive and Epistatic Genetic Architectures

    PubMed Central

    Howard, Réka; Carriquiry, Alicia L.; Beavis, William D.

    2014-01-01

    Parametric and nonparametric methods have been developed for purposes of predicting phenotypes. These methods are based on retrospective analyses of empirical data consisting of genotypic and phenotypic scores. Recent reports have indicated that parametric methods are unable to predict phenotypes of traits with known epistatic genetic architectures. Herein, we review parametric methods including least squares regression, ridge regression, Bayesian ridge regression, least absolute shrinkage and selection operator (LASSO), Bayesian LASSO, best linear unbiased prediction (BLUP), Bayes A, Bayes B, Bayes C, and Bayes Cπ. We also review nonparametric methods including Nadaraya-Watson estimator, reproducing kernel Hilbert space, support vector machine regression, and neural networks. We assess the relative merits of these 14 methods in terms of accuracy and mean squared error (MSE) using simulated genetic architectures consisting of completely additive or two-way epistatic interactions in an F2 population derived from crosses of inbred lines. Each simulated genetic architecture explained either 30% or 70% of the phenotypic variability. The greatest impact on estimates of accuracy and MSE was due to genetic architecture. Parametric methods were unable to predict phenotypic values when the underlying genetic architecture was based entirely on epistasis. Parametric methods were slightly better than nonparametric methods for additive genetic architectures. Distinctions among parametric methods for additive genetic architectures were incremental. Heritability, i.e., proportion of phenotypic variability, had the second greatest impact on estimates of accuracy and MSE. PMID:24727289
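
    A toy version of the paper's central contrast can be simulated directly: a parametric ridge model against a nonparametric RBF kernel model on additive versus purely epistatic architectures. The genotypes, effect sizes, and hyperparameters below are illustrative assumptions, not the paper's simulation design.

    ```python
    # Illustrative contrast: linear ridge vs. RBF kernel ridge on
    # simulated additive and epistatic genetic architectures.
    import numpy as np
    from sklearn.kernel_ridge import KernelRidge
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(8)
    n, m = 300, 40
    G = rng.integers(0, 3, size=(n, m)).astype(float)   # F2-style genotypes 0/1/2

    y_additive  = G[:, :5].sum(axis=1) + rng.normal(0, 1.0, n)
    y_epistatic = (G[:, 0] * G[:, 1]) + (G[:, 2] * G[:, 3]) + rng.normal(0, 1.0, n)

    for label, y in [("additive", y_additive), ("epistatic", y_epistatic)]:
        for name, model in [("ridge", Ridge(alpha=1.0)),
                            ("kernel", KernelRidge(kernel="rbf", alpha=1.0))]:
            r2 = cross_val_score(model, G, y, cv=5, scoring="r2").mean()
            print(f"{label:9s} {name:6s} CV R^2 = {r2:.2f}")
    ```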

  8. A family of variable step-size affine projection adaptive filter algorithms using statistics of channel impulse response

    NASA Astrophysics Data System (ADS)

    Shams Esfand Abadi, Mohammad; AbbasZadeh Arani, Seyed Ali Asghar

    2011-12-01

    This paper extends the recently introduced variable step-size (VSS) approach to the family of affine projection adaptive filter algorithms. This method uses prior knowledge of the channel impulse response statistics. Accordingly, the optimal step-size vector is obtained by minimizing the mean-square deviation (MSD). The presented algorithms are the VSS affine projection algorithm (VSS-APA), the VSS selective partial update NLMS (VSS-SPU-NLMS), the VSS-SPU-APA, and the VSS selective regressor APA (VSS-SR-APA). In the VSS-SPU adaptive algorithms, the filter coefficients are partially updated, which reduces the computational complexity. In VSS-SR-APA, an optimal selection of input regressors is performed during adaptation. The presented algorithms feature good convergence speed, low steady-state mean square error (MSE), and low computational complexity. We demonstrate the good performance of the proposed algorithms through several simulations in a system identification scenario.
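
    As background for the family of algorithms above, the sketch below implements the plain NLMS baseline in a system-identification scenario; the MSD-optimal variable step-size vector of the published algorithms is omitted, and the fixed step size is an assumption.

    ```python
    # Plain NLMS baseline in a system-identification scenario; the VSS
    # machinery of the published algorithms is intentionally omitted.
    import numpy as np

    rng = np.random.default_rng(9)
    n, taps = 5000, 16
    h_true = rng.normal(size=taps)             # unknown channel impulse response
    x = rng.normal(size=n)                     # input signal
    d = np.convolve(x, h_true)[:n] + 0.01 * rng.normal(size=n)  # noisy desired signal

    w = np.zeros(taps)
    mu, eps = 0.5, 1e-6
    for k in range(taps - 1, n):
        u = x[k - taps + 1:k + 1][::-1]        # regressor (most recent sample first)
        e = d[k] - w @ u                       # a-priori error
        w += (mu / (u @ u + eps)) * e * u      # normalized LMS update

    print("misalignment:", np.linalg.norm(w - h_true) / np.linalg.norm(h_true))
    ```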

  9. Climate Change Impacts at Department of Defense

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kotamarthi, Rao; Wang, Jiali; Zoebel, Zach

    This project is aimed at providing the U.S. Department of Defense (DoD) with a comprehensive analysis of the uncertainty associated with generating climate projections at the regional scale that can be used by stakeholders and decision makers to quantify and plan for the impacts of future climate change at specific locations. The merits and limitations of commonly used downscaling models, ranging from simple to complex, are compared, and their appropriateness for application at installation scales is evaluated. Downscaled climate projections are generated at selected DoD installations using dynamic and statistical methods, with an emphasis on generating probability distributions of climate variables and their associated uncertainties. The selection of sites, variables, and parameters for downscaling was based on a comprehensive understanding of the current and projected roles that weather and climate play in operating, maintaining, and planning DoD facilities and installations.

  10. Factor analysis of some socio-economic and demographic variables for Bangladesh.

    PubMed

    Islam, S M

    1986-01-01

    The author carries out an exploratory factor analysis of some socioeconomic and demographic variables for Bangladesh using the classical or common factor approach with the varimax rotation method. The socioeconomic and demographic indicators used in this study include literacy, rate of growth, female employment, economic development, urbanization, population density, childlessness, sex ratio, proportion of women ever married, and fertility. The 18 administrative districts of Bangladesh constitute the unit of analysis. 3 common factors--modernization, fertility, and social progress--are identified in this study to explain the correlations among the set of selected socioeconomic and demographic variables.
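
    A minimal sketch of a common-factor analysis with varimax rotation on district-level indicators; the data are random placeholders standing in for the 18 districts and ten indicators, so the extracted factors are not the study's.

    ```python
    # Common-factor model with varimax rotation; random placeholder data
    # stand in for 18 districts x 10 standardized indicators.
    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    rng = np.random.default_rng(10)
    indicators = ["literacy", "growth_rate", "female_employment", "econ_dev",
                  "urbanization", "density", "childlessness", "sex_ratio",
                  "ever_married", "fertility"]
    X = rng.normal(size=(18, len(indicators)))   # 18 districts, standardized scores

    fa = FactorAnalysis(n_components=3, rotation="varimax").fit(X)
    for name, row in zip(indicators, fa.components_.T):
        print(f"{name:18s}", np.round(row, 2))   # loadings on the three factors
    ```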

  11. Enhanced individual selection for selecting fast growing fish: the "PROSPER" method, with application on brown trout (Salmo trutta fario)

    PubMed Central

    Chevassus, Bernard; Quillet, Edwige; Krieg, Francine; Hollebecq, Marie-Gwénola; Mambrini, Muriel; Fauré, André; Labbé, Laurent; Hiseux, Jean-Pierre; Vandeputte, Marc

    2004-01-01

    Growth rate is the main breeding goal of fish breeders, but individual selection has often shown poor responses in fish species. The PROSPER method was developed to overcome possible factors that may contribute to this low success, using (1) a variable base population and high number of breeders (Ne > 100), (2) selection within groups with low non-genetic effects and (3) repeated growth challenges. Using calculations, we show that individual selection within groups, with appropriate management of maternal effects, can be superior to mass selection as soon as the maternal effect ratio exceeds 0.15, when heritability is 0.25. Practically, brown trout were selected on length at the age of one year with the PROSPER method. The genetic gain was evaluated against an unselected control line. After four generations, the mean response per generation in length at one year was 6.2% of the control mean, while the mean correlated response in weight was 21.5% of the control mean per generation. At the 4th generation, selected fish also appeared to be leaner than control fish when compared at the same size, and the response on weight was maximal (≈130% of the control mean) between 386 and 470 days post fertilisation. This high response is promising, however, the key points of the method have to be investigated in more detail. PMID:15496285

  12. Heterogeneity among violence-exposed women: applying person-oriented research methods.

    PubMed

    Nurius, Paula S; Macy, Rebecca J

    2008-03-01

    Variability of experience and outcomes among violence-exposed people pose considerable challenges toward developing effective prevention and treatment protocols. To address these needs, the authors present an approach to research and a class of methodologies referred to as person oriented. Person-oriented tools support assessment of meaningful patterns among people that distinguish one group from another, subgroups for whom different interventions are indicated. The authors review the conceptual base of person-oriented methods, outline their distinction from more familiar variable-oriented methods, present descriptions of selected methods as well as empirical applications of person-oriented methods germane to violence exposure, and conclude with discussion of implications for future research and translation between research and practice. The authors focus on violence against women as a population, drawing on stress and coping theory as a theoretical framework. However, person-oriented methods hold utility for investigating diversity among violence-exposed people's experiences and needs across populations and theoretical foundations.

  13. Mining Feature of Data Fusion in the Classification of Beer Flavor Information Using E-Tongue and E-Nose

    PubMed Central

    Men, Hong; Shi, Yan; Fu, Songlin; Jiao, Yanan; Qiao, Yu; Liu, Jingjing

    2017-01-01

    Multi-sensor data fusion can provide more comprehensive and more accurate analysis results. However, it also brings some redundant information, so finding a feature-mining method that supports intuitive and efficient analysis is an important issue. This paper demonstrates a feature-mining method based on variable accumulation for finding the expression form and variable behavior that best capture beer flavor. First, e-tongue and e-nose were used to gather the taste and olfactory information of beer, respectively. Second, principal component analysis (PCA), genetic algorithm-partial least squares (GA-PLS), and variable importance in projection (VIP) scores were applied to select feature variables from the original fusion set. Finally, classification models based on support vector machine (SVM), random forests (RF), and extreme learning machine (ELM) were established to evaluate the efficiency of the feature-mining method. The results show that the feature-mining method based on variable accumulation captures the main features affecting beer flavor information and yields the best classification performance, with 96.67%, 94.44%, and 98.33% prediction accuracy for the SVM, RF, and ELM models, respectively. PMID:28753917

  14. Exact tests using two correlated binomial variables in contemporary cancer clinical trials.

    PubMed

    Yu, Jihnhee; Kepner, James L; Iyer, Renuka

    2009-12-01

    New therapy strategies for the treatment of cancer are rapidly emerging because of recent technology advances in genetics and molecular biology. Although newer targeted therapies can improve survival without measurable changes in tumor size, clinical trial conduct has remained nearly unchanged. When potentially efficacious therapies are tested, current clinical trial design and analysis methods may not be suitable for detecting therapeutic effects. We propose an exact method for testing cytostatic cancer treatments, using correlated bivariate binomial random variables to simultaneously assess two primary outcomes. The method is easy to implement. It does not increase the sample size over that of the univariate exact test and in most cases reduces the sample size required. Sample size calculations are provided for selected designs.

  15. Practical way to avoid spurious geometrical contributions in Brillouin light scattering experiments at variable scattering angles.

    PubMed

    Battistoni, Andrea; Bencivenga, Filippo; Fioretto, Daniele; Masciovecchio, Claudio

    2014-10-15

    In this Letter, we present a simple method to avoid the well-known spurious contributions to the Brillouin light scattering (BLS) spectrum arising from the finite aperture of collection optics. The method relies on the use of special spatial filters able to select the scattered light with arbitrary precision around a given value of the momentum transfer (Q). We demonstrate the effectiveness of such filters by analyzing the BLS spectra of a reference sample as a function of scattering angle. This practical and inexpensive method could be an extremely useful tool to fully exploit the potential of Brillouin acoustic spectroscopy, as it will easily allow for effective Q-variable experiments with unparalleled luminosity and resolution.

  16. Gene features selection for three-class disease classification via multiple orthogonal partial least square discriminant analysis and S-plot using microarray data.

    PubMed

    Yang, Mingxing; Li, Xiumin; Li, Zhibin; Ou, Zhimin; Liu, Ming; Liu, Suhuan; Li, Xuejun; Yang, Shuyu

    2013-01-01

    DNA microarray analysis is characterized by obtaining a large number of gene variables from a small number of observations. Cluster analysis is widely used to analyze DNA microarray data to make classification and diagnosis of disease. Because there are so many irrelevant and insignificant genes in a dataset, a feature selection approach must be employed in data analysis. The performance of cluster analysis of this high-throughput data depends on whether the feature selection approach chooses the most relevant genes associated with disease classes. Here we proposed a new method using multiple Orthogonal Partial Least Squares-Discriminant Analysis (mOPLS-DA) models and S-plots to select the most relevant genes to conduct three-class disease classification and prediction. We tested our method using Golub's leukemia microarray data. For three classes with subtypes, we proposed hierarchical orthogonal partial least squares-discriminant analysis (OPLS-DA) models and S-plots to select features for two main classes and their subtypes. For three classes in parallel, we employed three OPLS-DA models and S-plots to choose marker genes for each class. The power of feature selection to classify and predict three-class disease was evaluated using cluster analysis. Further, the general performance of our method was tested using four public datasets and compared with those of four other feature selection methods. The results revealed that our method effectively selected the most relevant features for disease classification and prediction, and its performance was better than that of the other methods.

  17. Comparison of two methods for estimating base flow in selected reaches of the South Platte River, Colorado

    USGS Publications Warehouse

    Capesius, Joseph P.; Arnold, L. Rick

    2012-01-01

    The Mass Balance results were so variable over time that they appeared suspect with respect to the concept of groundwater flow as gradual and slow. The large degree of variability in the day-to-day and month-to-month Mass Balance results is likely the result of many factors. These factors could include ungaged stream inflows or outflows, short-term streamflow losses to and gains from temporary bank storage, and any lag in streamflow accounting owing to the lag time of flow within a reach. The Pilot Point time-series results were much less variable than the Mass Balance results, and extreme values were effectively constrained. Less day-to-day variability, smaller-magnitude extreme values, and smoother transitions in base-flow estimates provided by the Pilot Point method are more consistent with a conceptual model of groundwater flow as gradual and slow. The Pilot Point method provided a better fit to the conceptual model of groundwater flow and appeared to provide reasonable estimates of base flow.

  18. Boosting for detection of gene-environment interactions.

    PubMed

    Pashova, H; LeBlanc, M; Kooperberg, C

    2013-01-30

    In genetic association studies, it is typically thought that genetic variants and environmental variables jointly will explain more of the inheritance of a phenotype than either of these two components separately. Traditional methods to identify gene-environment interactions typically consider only one measured environmental variable at a time. However, in practice, multiple environmental factors may each be imprecise surrogates for the underlying physiological process that actually interacts with the genetic factors. In this paper, we develop a variant of L(2) boosting that is specifically designed to identify combinations of environmental variables that jointly modify the effect of a gene on a phenotype. Because the effect modifiers might have a small signal compared with the main effects, working in a space that is orthogonal to the main predictors allows us to focus on the interaction space. In a simulation study that investigates some plausible underlying model assumptions, our method outperforms the least absolute shrinkage and selection operator (LASSO) and the Akaike Information Criterion and Bayesian Information Criterion model selection procedures, achieving the lowest test error. In an example from the Women's Health Initiative-Population Architecture using Genomics and Epidemiology study, the dedicated boosting method was able to pick out two single-nucleotide polymorphisms for which effect modification appears present. The performance was evaluated on an independent test set, and the results are promising. Copyright © 2012 John Wiley & Sons, Ltd.
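
    The orthogonalization step is the heart of the method, so a minimal sketch may help. Assuming simulated data, the code below projects candidate gene-by-environment products onto the complement of the main-effect space and then runs component-wise L2 boosting on the residuals; it illustrates the general idea in Python/NumPy and is not the authors' implementation.

      import numpy as np

      rng = np.random.default_rng(0)
      n = 200
      G = rng.integers(0, 3, n).astype(float)      # SNP genotype coded 0/1/2
      E = rng.normal(size=(n, 4))                  # four environmental surrogates
      y = 0.5 * G + E[:, 0] + 0.8 * G * E[:, 1] + rng.normal(size=n)

      main = np.column_stack([np.ones(n), G, E])   # main-effect design matrix
      H = main @ np.linalg.pinv(main)              # projection onto main effects
      Z = G[:, None] * E - H @ (G[:, None] * E)    # orthogonalized G x E terms
      r = y - H @ y                                # residualized outcome

      beta, nu = np.zeros(Z.shape[1]), 0.1         # nu is the boosting step size
      for _ in range(200):                         # component-wise L2 boosting
          u = r - Z @ beta                         # current residual
          gain = (Z.T @ u) ** 2 / (Z ** 2).sum(axis=0)
          j = int(np.argmax(gain))                 # best single interaction term
          beta[j] += nu * (Z[:, j] @ u) / (Z[:, j] @ Z[:, j])
      print(beta)                                  # the G x E2 term should dominate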

  19. The Fisher-Markov selector: fast selecting maximally separable feature subset for multiclass classification with applications to high-dimensional data.

    PubMed

    Cheng, Qiang; Zhou, Hongbo; Cheng, Jie

    2011-06-01

    Selecting features for multiclass classification is a critically important task for pattern recognition and machine learning applications. Especially challenging is selecting an optimal subset of features from high-dimensional data, which typically have many more variables than observations and contain significant noise, missing components, or outliers. Existing methods either cannot handle high-dimensional data efficiently or scalably, or can only obtain a local optimum instead of the global optimum. Toward the selection of the globally optimal subset of features efficiently, we introduce a new selector--which we call the Fisher-Markov selector--to identify those features that are the most useful in describing essential differences among the possible groups. In particular, in this paper we present a way to represent essential discriminating characteristics together with the sparsity as an optimization objective. With properly identified measures for the sparseness and discriminativeness in possibly high-dimensional settings, we take a systematic approach for optimizing the measures to choose the best feature subset. We use Markov random field optimization techniques to solve the formulated objective functions for simultaneous feature selection. Our results are noncombinatorial, and they can achieve the exact global optimum of the objective function for some special kernels. The method is fast; in particular, it can be linear in the number of features and quadratic in the number of observations. We apply our procedure to a variety of real-world data, including a mid-dimensional optical handwritten digit dataset and high-dimensional microarray gene expression datasets. The effectiveness of our method is confirmed by experimental results. In pattern recognition and from a model selection viewpoint, our procedure shows that it is possible to select the most discriminating subset of variables by solving a very simple unconstrained objective function which in fact can be obtained with an explicit expression.

  20. Optimization and design of ibuprofen-loaded nanostructured lipid carriers using a hybrid-design approach for ocular drug delivery

    NASA Astrophysics Data System (ADS)

    Rathod, Vishal

    The objective of the present project was to develop ibuprofen-loaded nanostructured lipid carriers (IBU-NLCs) for topical ocular delivery based on substantial pre-formulation screening of the components and an understanding of the interplay between the formulation and process variables. The BCS Class II drug ibuprofen was selected as the model drug for the current study. IBU-NLCs were prepared by a melt emulsification and ultrasonication technique. Extensive pre-formulation studies were performed to screen the lipid components (solid and liquid) based on the drug's solubility and affinity as well as the components' compatibility. The results from DSC and XRD assisted in selecting the most suitable ratio to be utilized for future studies. Dynasan™ 114 was selected as the solid lipid and Miglyol™ 840 as the liquid lipid based on preliminary lipid screening. The ratio of 6:4 was predicted to be the best based on its crystallinity index and thermal events. As many variables are involved in further optimization of the formulation, a single design approach is not always adequate. A hybrid-design approach was applied by employing a Plackett-Burman design (PBD) for preliminary screening of 7 critical variables, followed by a Box-Behnken design (BBD), a subtype of response surface methodology (RSM), using 2 relatively significant variables from the former design and incorporating the surfactant/co-surfactant ratio as the third variable. Comparatively, Kolliphor™ HS15 demonstrated lower mean particle size (PS) and polydispersity index (PDI), and Kolliphor™ P188 resulted in zeta potential (ZP) < -20 mV during the surfactant screening and stability studies. Hence, the surfactant/co-surfactant ratio was employed as the third variable to understand its synergistic effect on the response variables. We selected PS, PDI, and ZP as critical response variables in the PBD since they significantly influence the stability and performance of NLCs. Formulations prepared using the BBD were further characterized and evaluated with respect to PS, PDI, ZP, and entrapment efficiency (EE) to identify the multi-factor interactions between the selected formulation variables. In vitro release studies were performed using a Spectra/Por dialysis membrane on a Franz diffusion cell with phosphate-buffered saline (pH 7.4) as the medium. Samples for assay, EE, loading capacity (LC), solubility studies, and in vitro release were filtered using an Amicon 50K filter and analyzed via a UPLC system (Waters) at a detection wavelength of 220 nm. Significant variables were selected through the PBD, and the third variable was incorporated based on the surfactant screening and stability studies for the next design. The assay of the BBD-based formulations was found to be within 95-104% of the theoretically calculated values. The formulations were further investigated for PS, PDI, ZP, and EE. PS was found to be in the range of 103-194 nm, with PDI ranging from 0.118 to 0.265. The ZP and EE were observed to be in the range of -22.2 to -11 mV and 90 to 98.7%, respectively. Drug release of 30% was observed from the optimized formulation in the first 6 h of in vitro studies, followed by sustained release of ibuprofen over several hours. These values also confirm that the production method, and all other selected variables, effectively promoted the incorporation of ibuprofen in the NLCs. A Quality by Design (QbD) approach was successfully implemented in developing a robust ophthalmic formulation with superior physicochemical and morphometric properties. NLCs thus show promise as nanocarriers for topical delivery of poorly water-soluble drugs.

  1. Bootstrap Enhanced Penalized Regression for Variable Selection with Neuroimaging Data.

    PubMed

    Abram, Samantha V; Helwig, Nathaniel E; Moodie, Craig A; DeYoung, Colin G; MacDonald, Angus W; Waller, Niels G

    2016-01-01

    Recent advances in fMRI research highlight the use of multivariate methods for examining whole-brain connectivity. Complementary data-driven methods are needed for determining the subset of predictors related to individual differences. Although commonly used for this purpose, ordinary least squares (OLS) regression may not be ideal due to multi-collinearity and over-fitting issues. Penalized regression is a promising and underutilized alternative to OLS regression. In this paper, we propose a nonparametric bootstrap quantile (QNT) approach for variable selection with neuroimaging data. We use real and simulated data, as well as annotated R code, to demonstrate the benefits of our proposed method. Our results illustrate the practical potential of our proposed bootstrap QNT approach. Our real data example demonstrates how our method can be used to relate individual differences in neural network connectivity with an externalizing personality measure. Also, our simulation results reveal that the QNT method is effective under a variety of data conditions. Penalized regression yields more stable estimates and sparser models than OLS regression in situations with large numbers of highly correlated neural predictors. Our results demonstrate that penalized regression is a promising method for examining associations between neural predictors and clinically relevant traits or behaviors. These findings have important implications for the growing field of functional connectivity research, where multivariate methods produce numerous, highly correlated brain networks.
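
    Although the authors provide annotated R code, the bootstrap-quantile idea is easy to sketch in any language. The Python version below refits a penalized model on bootstrap resamples and retains predictors whose bootstrap coefficient interval excludes zero; the penalty type, interval level, and simulated data are illustrative assumptions, not the paper's settings.

      import numpy as np
      from sklearn.linear_model import ElasticNet

      rng = np.random.default_rng(0)
      X = rng.normal(size=(100, 50))               # many candidate predictors
      beta = np.zeros(50); beta[:3] = 1.0          # three true signals
      y = X @ beta + rng.normal(size=100)

      B = 500
      coefs = np.empty((B, X.shape[1]))
      for b in range(B):
          idx = rng.integers(0, len(y), len(y))    # bootstrap resample
          coefs[b] = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X[idx], y[idx]).coef_

      lo, hi = np.percentile(coefs, [2.5, 97.5], axis=0)
      selected = np.where((lo > 0) | (hi < 0))[0]  # interval excludes zero
      print(selected)                              # ideally recovers {0, 1, 2}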

  2. Bootstrap Enhanced Penalized Regression for Variable Selection with Neuroimaging Data

    PubMed Central

    Abram, Samantha V.; Helwig, Nathaniel E.; Moodie, Craig A.; DeYoung, Colin G.; MacDonald, Angus W.; Waller, Niels G.

    2016-01-01

    Recent advances in fMRI research highlight the use of multivariate methods for examining whole-brain connectivity. Complementary data-driven methods are needed for determining the subset of predictors related to individual differences. Although commonly used for this purpose, ordinary least squares (OLS) regression may not be ideal due to multi-collinearity and over-fitting issues. Penalized regression is a promising and underutilized alternative to OLS regression. In this paper, we propose a nonparametric bootstrap quantile (QNT) approach for variable selection with neuroimaging data. We use real and simulated data, as well as annotated R code, to demonstrate the benefits of our proposed method. Our results illustrate the practical potential of our proposed bootstrap QNT approach. Our real data example demonstrates how our method can be used to relate individual differences in neural network connectivity with an externalizing personality measure. Also, our simulation results reveal that the QNT method is effective under a variety of data conditions. Penalized regression yields more stable estimates and sparser models than OLS regression in situations with large numbers of highly correlated neural predictors. Our results demonstrate that penalized regression is a promising method for examining associations between neural predictors and clinically relevant traits or behaviors. These findings have important implications for the growing field of functional connectivity research, where multivariate methods produce numerous, highly correlated brain networks. PMID:27516732

  3. Microbiome Selection Could Spur Next-Generation Plant Breeding Strategies.

    PubMed

    Gopal, Murali; Gupta, Alka

    2016-01-01

    "No plant is an island too…" Plants, though sessile, have developed a unique strategy to counter biotic and abiotic stresses by symbiotically co-evolving with microorganisms and tapping into their genome for this purpose. Soil is the bank of microbial diversity from which a plant selectively sources its microbiome to suit its needs. Besides soil, seeds, which carry the genetic blueprint of plants during trans-generational propagation, are home to diverse microbiota that act as the principal source of microbial inoculum in crop cultivation. Overall, a plant is ensconced both outside and inside with a diverse assemblage of microbiota. Together, the plant genome and the genes of the microbiota that the plant harbors in different plant tissues, i.e., the 'plant microbiome,' form the holobiome, which is now considered a unit of selection: 'the holobiont.' The 'plant microbiome' not only helps plants remain fit but also offers critical genetic variability, hitherto not employed in the breeding strategies of plant breeders, who traditionally have exploited the genetic variability of the host for developing high-yielding, disease-tolerant, or drought-resistant varieties. This fresh knowledge of the microbiome, particularly of the rhizosphere, offering genetic variability to plants opens up new horizons for breeding that could usher in the cultivation of next-generation crops depending less on inorganic inputs, resistant to insect pests and diseases, and resilient to climatic perturbations. We surmise, from ever-increasing evidence, that plants and their microbial symbionts need to be co-propagated as life-long partners in future strategies for plant breeding. In this perspective, we propose a bottom-up approach to co-propagate the co-evolved plant along with its target microbiome, through (i) a reciprocal soil transplantation method, (ii) an artificial ecosystem selection method using synthetic microbiome inocula, or (iii) exploration of a microRNA transfer method, in order to realize this next-generation plant breeding approach. Our aim, thus, is to bring the information accrued through advanced nucleotide sequencing and bioinformatics, in conjunction with conventional culture-dependent isolation methods, closer to practical application in plant breeding and overall agriculture.

  4. Selecting a proper design period for heliostat field layout optimization using Campo code

    NASA Astrophysics Data System (ADS)

    Saghafifar, Mohammad; Gadalla, Mohamed

    2016-09-01

    In this paper, different approaches are considered to calculate the cosine factor that is utilized in the Campo code to expand the heliostat field layout and maximize its annual thermal output. Furthermore, three heliostat fields containing different numbers of mirrors are taken into consideration. The cosine factor is determined using either instantaneous or time-averaged approaches. For the instantaneous method, different design days and design hours are selected. For the time-averaged method, daily, monthly, seasonal, and yearly time-averaged cosine factor determinations are considered. Results indicate that instantaneous methods are more appropriate for small-scale heliostat field optimization. Consequently, it is proposed to consider the design period as a second design variable to ensure the best outcome. For medium- and large-scale heliostat fields, selecting an appropriate design period is more important. Therefore, it is more reliable to select one of the recommended time-averaged methods to optimize the field layout. The optimum annual weighted efficiencies for the small, medium, and large heliostat fields containing 350, 1460, and 3450 mirrors are 66.14%, 60.87%, and 54.04%, respectively.

  5. Dry and wet arc track propagation resistance testing

    NASA Technical Reports Server (NTRS)

    Beach, Rex

    1995-01-01

    The wet arc-propagation resistance test for wire insulation provides an assessment of the ability of an insulation to prevent damage in an electrical arc environment. Results of an arc-propagation test may vary slightly with the method of arc initiation; therefore, a standard test method must be selected to evaluate the general arc-propagation resistance characteristics of an insulation. This test method initiates an arc by dripping salt water over pre-damaged wires, which creates a conductive path between the wires. The power supply, test current, circuit resistances, and other variables are optimized for testing 20-gauge wires. The use of other wire sizes may require modifications to the test variables. The dry arc-propagation resistance test for wire insulation also provides an assessment of the ability of an insulation to prevent damage in an electrical arc environment. In service, electrical arcs may originate from a variety of factors, including insulation deterioration, faulty installation, and chafing. Here too, a standard test method must be selected to evaluate the general arc-propagation resistance characteristics of an insulation. This test method initiates an arc with a vibrating blade. The test also evaluates the ability of the insulation to prevent further arc propagation when the electrical arc is re-energized.

  6. Framework for making better predictions by directly estimating variables’ predictivity

    PubMed Central

    Chernoff, Herman; Lo, Shaw-Hwa

    2016-01-01

    We propose approaching prediction from a framework grounded in the theoretical correct prediction rate of a variable set as a parameter of interest. This framework allows us to define a measure of predictivity that enables assessing variable sets for, preferably high, predictivity. We first define the prediction rate for a variable set and consider, and ultimately reject, the naive estimator, a statistic based on the observed sample data, due to its inflated bias for moderate sample sizes and its sensitivity to noisy useless variables. We demonstrate that the I-score of the partition retention (PR) method of variable selection yields a relatively unbiased estimate of a parameter that is not sensitive to noisy variables and is a lower bound on the parameter of interest. Thus, the PR method using the I-score provides an effective approach to selecting highly predictive variables. We offer simulations and an application of the I-score on real data to demonstrate the statistic's predictive performance on sample data. We conjecture that using partition retention and the I-score can aid in finding variable sets with promising prediction rates; however, further research in the avenue of sample-based measures of predictivity is much desired. PMID:27911830
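
    One common published form of the I-score can be sketched in a few lines: partition the observations by the joint levels of a candidate variable set, then sum squared cell-size-weighted deviations of cell means from the grand mean. Normalization conventions vary across the related papers, so the constant below is only one choice (the ranking of variable sets is unaffected); the XOR-style toy data illustrate why a partition-based score can detect interactions that marginal tests miss.

      import numpy as np

      def i_score(X_cols, y):
          # group observations by the joint levels of the chosen (discrete)
          # variables, then score between-cell mean differences
          cells = {}
          for i, key in enumerate(map(tuple, X_cols)):
              cells.setdefault(key, []).append(i)
          n, ybar, s2 = len(y), y.mean(), y.var()
          return sum(len(idx) ** 2 * (y[idx].mean() - ybar) ** 2
                     for idx in cells.values()) / (n * s2)

      rng = np.random.default_rng(0)
      X = rng.integers(0, 2, (500, 10))            # binary candidate variables
      y = (X[:, 0] ^ X[:, 1]).astype(float)        # interaction-only signal
      print(i_score(X[:, [0, 1]], y))              # high score for the true pair
      print(i_score(X[:, [2, 3]], y))              # near zero for noise variables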

  7. Firefly algorithm versus genetic algorithm as powerful variable selection tools and their effect on different multivariate calibration models in spectroscopy: A comparative study

    NASA Astrophysics Data System (ADS)

    Attia, Khalid A. M.; Nassar, Mohammed W. I.; El-Zeiny, Mohamed B.; Serag, Ahmed

    2017-01-01

    For the first time, a new variable selection method based on swarm intelligence, namely the firefly algorithm, is coupled with three different multivariate calibration models, namely concentration residual augmented classical least squares, artificial neural network, and support vector regression, using UV spectral data. A comparative study between the firefly algorithm and the well-known genetic algorithm was carried out. The results revealed the superiority of this new, powerful algorithm over the well-known genetic algorithm. Moreover, different statistical tests were performed and no significant differences were found between the models regarding their predictive abilities. This indicates that simpler and faster models can be obtained without any deterioration of calibration quality.
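
    As a rough sketch of what evolutionary wavelength selection looks like in practice, the code below runs a bare-bones genetic search (selection and mutation only; crossover is omitted for brevity) over binary wavelength masks, scoring each mask by cross-validated PLS error. The data, population size, and rates are all illustrative; the firefly variant differs mainly in how candidate solutions move through the search space.

      import numpy as np
      from sklearn.cross_decomposition import PLSRegression
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(0)
      X = rng.normal(size=(80, 120))               # spectra: 80 samples, 120 wavelengths
      y = X[:, 10] - 0.5 * X[:, 55] + rng.normal(scale=0.1, size=80)

      def fitness(mask):
          mask = mask.astype(bool)
          if mask.sum() < 2:                       # PLS needs a few wavelengths
              return -1e9
          return cross_val_score(PLSRegression(n_components=2), X[:, mask], y,
                                 cv=5, scoring="neg_root_mean_squared_error").mean()

      pop = rng.integers(0, 2, size=(30, X.shape[1]))   # random wavelength masks
      for gen in range(40):
          scores = np.array([fitness(m) for m in pop])
          parents = pop[np.argsort(scores)[-10:]]       # selection of the fittest
          children = parents[rng.integers(0, 10, 30)].copy()
          flip = rng.random(children.shape) < 0.02      # mutation: flip a few bits
          children[flip] = 1 - children[flip]
          pop = children
      best = pop[np.argmax([fitness(m) for m in pop])]
      print(best.sum(), "wavelengths retained")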

  8. tICA-Metadynamics: Accelerating Metadynamics by Using Kinetically Selected Collective Variables.

    PubMed

    M Sultan, Mohammad; Pande, Vijay S

    2017-06-13

    Metadynamics is a powerful enhanced molecular dynamics sampling method that accelerates simulations by adding history-dependent multidimensional Gaussians along selected collective variables (CVs). In practice, choosing a small number of slow CVs remains challenging due to the inherent high dimensionality of biophysical systems. Here we show that time-structure based independent component analysis (tICA), a recent advance in the Markov state model literature, can be used to identify a set of variationally optimal slow coordinates for use as CVs for metadynamics. We show that linear and nonlinear tICA-metadynamics can complement existing MD studies by explicitly sampling the system's slowest modes, and can drive transitions along those modes even when no such transitions are observed in unbiased simulations.
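
    For readers unfamiliar with tICA, a minimal sketch follows, under simplifying assumptions (mean-free features and a symmetrized lag covariance, i.e., reversible dynamics): the slowest coordinates solve the generalized eigenproblem C_tau w = lambda C_0 w. The toy trajectory and lag time are illustrative; production work would use a dedicated analysis package rather than this hand-rolled version.

      import numpy as np
      from scipy.linalg import eigh

      def tica(X, lag):
          # X: (frames, features) trajectory; returns slowest modes first
          X = X - X.mean(axis=0)
          C0 = X.T @ X / len(X)                    # instantaneous covariance
          Ct = X[:-lag].T @ X[lag:] / (len(X) - lag)
          Ct = 0.5 * (Ct + Ct.T)                   # symmetrize (reversibility)
          evals, evecs = eigh(Ct, C0)              # generalized eigenproblem
          order = np.argsort(evals)[::-1]
          return evals[order], evecs[:, order]

      rng = np.random.default_rng(0)
      t = np.arange(5000)
      X = np.column_stack([np.sin(t / 300.0) + 0.1 * rng.normal(size=5000),
                           rng.normal(size=5000), rng.normal(size=5000)])
      lam, W = tica(X, lag=50)
      cv = X @ W[:, 0]    # the slowest tIC would serve as the metadynamics CV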

  9. Explaining the Positive Relationship between Fourth-Grade Children’s Body Mass Index and Energy Intake at School-Provided Meals (Breakfast and Lunch)

    PubMed Central

    Baxter, Suzanne Domel; Royer, Julie A.; Hitchcock, David B.

    2013-01-01

    BACKGROUND A positive relationship exists between children's body mass index (BMI) and energy intake at school-provided meals. To help explain this relationship, we investigated 7 outcome variables concerning aspects of school-provided meals: energy content of items selected, number of meal components selected, number of meal components eaten, amounts eaten of standardized school-meal portions, energy intake from flavored milk, energy intake received in trades, and energy content given in trades. METHODS We observed children in grade 4 (N=465) eating school-provided breakfast and lunch on one to four days per child. We measured children's weight and height. For daily values at school meals, a generalized linear model was fit with BMI (dependent variable) and the 7 outcome variables, sex, and age (independent variables). RESULTS BMI was positively related to amounts eaten of standardized school-meal portions (p < .0001) and increased 8.45 kg/m2 per serving, controlling for other variables in the model. BMI was positively related to energy intake from flavored milk (p = .0041) and increased 0.347 kg/m2 for every 100 kcal consumed. BMI was negatively related to energy intake received in trades (p = .0003) and decreased 0.468 kg/m2 for every 100 kcal received. BMI was not significantly related to the other 4 outcome variables. CONCLUSIONS Knowing that relationships between BMI and actual consumption, not selection, at school-provided meals explain the (previously found) positive relationship between BMI and energy intake at school-provided meals is helpful for school-based obesity interventions. PMID:23517000

  10. Recursive regularization for inferring gene networks from time-course gene expression profiles

    PubMed Central

    Shimamura, Teppei; Imoto, Seiya; Yamaguchi, Rui; Fujita, André; Nagasaki, Masao; Miyano, Satoru

    2009-01-01

    Background Inferring gene networks from time-course microarray experiments with a vector autoregressive (VAR) model is the process of identifying functional associations between genes through multivariate time series. This problem can be cast as a variable selection problem in statistics. One of the promising methods for variable selection is the elastic net proposed by Zou and Hastie (2005). However, VAR modeling with the elastic net succeeds in increasing the number of true positives but also increases the number of false positives. Results By incorporating the relative importance of the VAR coefficients into the elastic net, we propose a new class of regularization, called the recursive elastic net, to increase the capability of the elastic net and estimate gene networks based on the VAR model. The recursive elastic net can reduce the number of false positives gradually by updating the importance. Numerical simulations and comparisons demonstrate that the proposed method succeeds in reducing the number of false positives drastically while keeping a high number of true positives in the network inference, and achieves a two or more times higher true discovery rate (the proportion of true positives among the selected edges) than the competing methods, even when the number of time points is small. We also compared our method with various reverse-engineering algorithms on experimental data of MCF-7 breast cancer cells stimulated with two ErbB ligands, EGF and HRG. Conclusion The recursive elastic net is a powerful tool for inferring gene networks from time-course gene expression profiles. PMID:19386091
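
    A minimal sketch of the recursion, assuming the reweighting works like an adaptive elastic net (variables with large previous coefficients receive lighter penalties). scikit-learn's ElasticNet takes no per-feature weights, so the weights are applied by rescaling columns; all data and hyperparameters are illustrative, not the paper's settings.

      import numpy as np
      from sklearn.linear_model import ElasticNet

      rng = np.random.default_rng(0)
      X = rng.normal(size=(60, 40))
      beta = np.zeros(40); beta[[0, 5, 9]] = 1.5   # three true network edges
      y = X @ beta + rng.normal(scale=0.5, size=60)

      w = np.ones(X.shape[1])                      # per-variable penalty weights
      for _ in range(5):                           # recursive importance updates
          enet = ElasticNet(alpha=0.2, l1_ratio=0.7).fit(X / w, y)
          b = enet.coef_ / w                       # back-transform coefficients
          w = 1.0 / (np.abs(b) + 1e-6)             # important variables: less penalty
      print(np.nonzero(np.abs(b) > 1e-8)[0])       # surviving (selected) edges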

  11. Variable Selection for Road Segmentation in Aerial Images

    NASA Astrophysics Data System (ADS)

    Warnke, S.; Bulatov, D.

    2017-05-01

    For the extraction of road pixels from combined image and elevation data, Wegner et al. (2015) proposed classification of superpixels into road and non-road, after which a refinement of the classification results using minimum cost paths and non-local optimization methods took place. We believed that the variable set used for classification was to a certain extent suboptimal, because many variables were redundant while several features known to be useful in Photogrammetry and Remote Sensing were missing. This motivated us to implement a variable selection approach which builds a model for classification using portions of the training data and subsets of features, evaluates this model, updates the feature set, and terminates when a stopping criterion is satisfied. The choice of classifier is flexible; however, we tested the approach with Logistic Regression and Random Forests, and tailored the evaluation module to the chosen classifier. To guarantee a fair comparison, we kept the segment-based approach and most of the variables from the related work, but extended them with additional, mostly higher-level features. Applying these superior features, removing the redundant ones, and using more accurately acquired 3D data allowed us to keep stable or even reduce the misclassification error on a challenging dataset.
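
    One simple instance of such a build-evaluate-update loop is greedy forward selection with a cross-validated Random Forest, sketched below on simulated data. The paper's actual procedure also samples portions of the training data and feature subsets, which this sketch simplifies; all sizes and the stopping rule are illustrative.

      import numpy as np
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(0)
      X = rng.normal(size=(300, 12))               # candidate features (simulated)
      y = (X[:, 0] + X[:, 3] > 0).astype(int)      # labels driven by features 0 and 3

      remaining, chosen, best = set(range(X.shape[1])), [], -np.inf
      while remaining:                             # build-evaluate-update loop
          scores = {j: cross_val_score(RandomForestClassifier(n_estimators=50),
                                       X[:, chosen + [j]], y, cv=3).mean()
                    for j in remaining}
          j, s = max(scores.items(), key=lambda kv: kv[1])
          if s <= best:                            # stopping criterion: no gain
              break
          chosen.append(j); remaining.remove(j); best = s
      print(chosen, best)                          # ideally picks features 0 and 3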

  12. On the period determination of ASAS eclipsing binaries

    NASA Astrophysics Data System (ADS)

    Mayangsari, L.; Priyatikanto, R.; Putra, M.

    2014-03-01

    Variable stars, particularly eclipsing binaries, are essential astronomical phenomena. Surveys are the backbone of astronomy, and many discoveries of variable stars are the results of surveys. The All-Sky Automated Survey (ASAS) is one of the observing projects whose ultimate goal is photometric monitoring of variable stars. Since its first light in 1997, ASAS has collected 50,099 variable stars, with 11,076 eclipsing binaries among them. In the present work we focus on the period determination of the eclipsing binaries. Because each ASAS eclipsing binary light curve contains only a sparse number of data points, period determination is not a straightforward process. For 30 samples of such systems we compare the FFT-based Lomb-Scargle algorithm with the non-FFT-based Phase Dispersion Minimization (PDM) method for determining their periods. It is demonstrated that PDM performs better at handling eclipsing detached (ED) systems, whose variability is non-sinusoidal. Moreover, using semi-automatic recipes, we obtain better period solutions and satisfactorily improve 53% of the selected objects' light curves, though the method fails for another 7% of the selected objects.
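
    To make the comparison concrete, here is a hedged sketch on a simulated, sparsely sampled eclipse-like light curve: Lomb-Scargle via astropy, and a small hand-rolled PDM (phase-fold, bin, and minimize the pooled within-bin variance relative to the total variance). The sampling, eclipse shape, and bin count are illustrative.

      import numpy as np
      from astropy.timeseries import LombScargle

      rng = np.random.default_rng(0)
      t = np.sort(rng.uniform(0, 400, 250))        # sparse survey epochs (days)
      true_p = 3.7                                 # illustrative orbital period
      eclipse = np.abs((t / true_p) % 1 - 0.5) < 0.05
      mag = 12 + 0.3 * eclipse + rng.normal(0, 0.02, t.size)

      freq, power = LombScargle(t, mag).autopower()
      p_ls = 1 / freq[np.argmax(power)]            # Lomb-Scargle period estimate

      def pdm_theta(period, nbins=10):
          phase = (t / period) % 1
          s2 = n = 0
          for k in range(nbins):
              m = mag[(phase >= k / nbins) & (phase < (k + 1) / nbins)]
              if m.size > 1:
                  s2 += (m.size - 1) * m.var(ddof=1); n += m.size - 1
          return (s2 / n) / mag.var(ddof=1)        # theta < 1: coherent folding

      periods = np.linspace(1, 10, 5000)
      p_pdm = periods[np.argmin([pdm_theta(p) for p in periods])]
      print(p_ls, p_pdm)    # PDM handles the non-sinusoidal eclipse shape well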

  13. Sparse High Dimensional Models in Economics

    PubMed Central

    Fan, Jianqing; Lv, Jinchi; Qi, Lei

    2010-01-01

    This paper reviews the literature on sparse high-dimensional models and discusses some applications in economics and finance. Recent developments in theory, methods, and implementations of penalized least squares and penalized likelihood methods are highlighted. These variable selection methods have proved effective in high-dimensional sparse modeling. The limits of dimensionality that regularization methods can handle, the role of penalty functions, and their statistical properties are detailed. Some recent advances in ultra-high-dimensional sparse modeling are also briefly discussed. PMID:22022635

  14. Development of a Robust Identifier for NPPs Transients Combining ARIMA Model and EBP Algorithm

    NASA Astrophysics Data System (ADS)

    Moshkbar-Bakhshayesh, Khalil; Ghofrani, Mohammad B.

    2014-08-01

    This study introduces a novel identification method for recognition of nuclear power plant (NPP) transients by combining the autoregressive integrated moving-average (ARIMA) model and a neural network with the error backpropagation (EBP) learning algorithm. The proposed method consists of three steps. First, an EBP-based identifier is adopted to distinguish the plant's normal states from faulty ones. In the second step, the integrated (I) component of the ARIMA models converts non-stationary data of the selected variables into stationary data. Subsequently, ARIMA processes, including autoregressive (AR), moving-average (MA), or autoregressive moving-average (ARMA), are used to forecast time series of the selected plant variables. In the third step, to identify the type of transient, the forecasted time series are fed to the modular identifier, which has been developed using the latest advances of the EBP learning algorithm. Bushehr nuclear power plant (BNPP) transients are probed to analyze the ability of the proposed identifier. Recognition of a transient is based on the similarity of its statistical properties to the reference one, rather than the values of input patterns. Greater robustness against noisy data and an improved balance between memorization and generalization are salient advantages of the proposed identifier. Other merits include a reduction in false identifications, dependence of identification solely on the sign of each output signal, selection of the plant variables for transient training independently of each other, and extendibility to the identification of more transients without unfavorable effects.
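
    A compressed sketch of the forecasting stage, assuming statsmodels for ARIMA and scikit-learn for a stand-in backpropagation network; the differencing order d=1 plays the role of the integrated (I) step. The series, model orders, and classifier inputs are all simulated placeholders, not plant data or the authors' modular identifier.

      import numpy as np
      from statsmodels.tsa.arima.model import ARIMA
      from sklearn.neural_network import MLPClassifier

      rng = np.random.default_rng(0)
      x = np.cumsum(rng.normal(0.1, 1.0, 300))     # non-stationary sensor series

      model = ARIMA(x, order=(2, 1, 1)).fit()      # d=1: the integrated (I) step
      forecast = model.forecast(steps=20)          # forecasted time series

      # in the full method, forecasts of several plant variables would feed the
      # modular EBP identifier; here a small MLP stands in, on simulated labels
      X_train = rng.normal(size=(100, 20))
      labels = rng.integers(0, 3, 100)             # three transient types (toy)
      clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000).fit(X_train, labels)
      transient_type = clf.predict(forecast.reshape(1, -1))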

  15. Variable selection in a flexible parametric mixture cure model with interval-censored data.

    PubMed

    Scolas, Sylvie; El Ghouch, Anouar; Legrand, Catherine; Oulhaj, Abderrahim

    2016-03-30

    In standard survival analysis, it is generally assumed that every individual will someday experience the event of interest. However, this is not always the case, as some individuals may not be susceptible to this event. Also, in medical studies, it is frequent that patients come to scheduled interviews and that the time to the event is only known to occur between two visits. That is, the data are interval-censored with a cure fraction. Variable selection in such a setting is of particular interest. Covariates impacting the survival are not necessarily the same as those impacting the probability of experiencing the event. The objective of this paper is to develop a parametric but flexible statistical model to analyze data that are interval-censored and include a fraction of cured individuals when the number of potential covariates may be large. We use the parametric mixture cure model with an accelerated failure time regression model for the survival, along with the extended generalized gamma distribution for the error term. To overcome the issue of non-stable and non-continuous variable selection procedures, we extend the adaptive LASSO to our model. By means of simulation studies, we show the good performance of our method and discuss the behavior of the estimates with varying cure and censoring proportions. Lastly, our proposed method is illustrated with a real dataset studying the time until conversion to mild cognitive impairment, a possible precursor of Alzheimer's disease. © 2015 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

  16. Novel high-resolution computed tomography-based radiomic classifier for screen-identified pulmonary nodules in the National Lung Screening Trial.

    PubMed

    Peikert, Tobias; Duan, Fenghai; Rajagopalan, Srinivasan; Karwoski, Ronald A; Clay, Ryan; Robb, Richard A; Qin, Ziling; Sicks, JoRean; Bartholmai, Brian J; Maldonado, Fabien

    2018-01-01

    Optimization of the clinical management of screen-detected lung nodules is needed to avoid unnecessary diagnostic interventions. Herein we demonstrate the potential value of a novel radiomics-based approach for the classification of screen-detected indeterminate nodules. Independent quantitative variables assessing various radiologic nodule features, such as sphericity, flatness, elongation, spiculation, lobulation, and curvature, were developed from the NLST dataset using 726 indeterminate nodules (all ≥ 7 mm; benign, n = 318 and malignant, n = 408). Multivariate analysis was performed using the least absolute shrinkage and selection operator (LASSO) method for variable selection and regularization in order to enhance the prediction accuracy and interpretability of the multivariate model. The bootstrapping method was then applied for internal validation, and the optimism-corrected AUC was reported for the final model. Eight of the originally considered 57 quantitative radiologic features were selected by LASSO multivariate modeling. These 8 features include variables capturing Location: vertical location (Offset carina centroid z); Size: volume estimate (Minimum enclosing brick); Shape: flatness; Density: texture analysis (Score Indicative of Lesion/Lung Aggression/Abnormality (SILA) texture); and surface characteristics: surface complexity (Maximum shape index and Average shape index) and estimates of surface curvature (Average positive mean curvature and Minimum mean curvature), all with P<0.01. The optimism-corrected AUC for these 8 features is 0.939. Our novel radiomic LDCT-based approach for indeterminate screen-detected nodule characterization appears extremely promising; however, independent external validation is needed.
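
    A condensed sketch of this modelling pipeline (L1-penalized logistic regression plus a Harrell-style bootstrap optimism correction of the AUC), with simulated placeholders standing in for the 57 radiomic features; the regularization strength and bootstrap count are illustrative assumptions.

      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.metrics import roc_auc_score

      rng = np.random.default_rng(0)
      X = rng.normal(size=(726, 57))               # 57 radiomic features (simulated)
      y = rng.integers(0, 2, 726)                  # benign/malignant labels (toy)

      lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
      apparent = roc_auc_score(y, lasso.fit(X, y).decision_function(X))

      optimism = []
      for _ in range(200):
          idx = rng.integers(0, len(y), len(y))    # bootstrap sample
          fit = lasso.fit(X[idx], y[idx])
          auc_boot = roc_auc_score(y[idx], fit.decision_function(X[idx]))
          auc_orig = roc_auc_score(y, fit.decision_function(X))
          optimism.append(auc_boot - auc_orig)     # how much the fit flatters itself
      corrected_auc = apparent - np.mean(optimism)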

  17. Attributing uncertainty in streamflow simulations due to variable inputs via the Quantile Flow Deviation metric

    NASA Astrophysics Data System (ADS)

    Shoaib, Syed Abu; Marshall, Lucy; Sharma, Ashish

    2018-06-01

    Every model to characterise a real world process is affected by uncertainty. Selecting a suitable model is a vital aspect of engineering planning and design. Observation or input errors make the prediction of modelled responses more uncertain. By way of a recently developed attribution metric, this study is aimed at developing a method for analysing variability in model inputs together with model structure variability to quantify their relative contributions in typical hydrological modelling applications. The Quantile Flow Deviation (QFD) metric is used to assess these alternate sources of uncertainty. The Australian Water Availability Project (AWAP) precipitation data for four different Australian catchments is used to analyse the impact of spatial rainfall variability on simulated streamflow variability via the QFD. The QFD metric attributes the variability in flow ensembles to uncertainty associated with the selection of a model structure and input time series. For the case study catchments, the relative contribution of input uncertainty due to rainfall is higher than that due to potential evapotranspiration, and overall input uncertainty is significant compared to model structure and parameter uncertainty. Overall, this study investigates the propagation of input uncertainty in a daily streamflow modelling scenario and demonstrates how input errors manifest across different streamflow magnitudes.

  18. A review of covariate selection for non-experimental comparative effectiveness research.

    PubMed

    Sauer, Brian C; Brookhart, M Alan; Roy, Jason; VanderWeele, Tyler

    2013-11-01

    This paper addresses strategies for selecting variables for adjustment in non-experimental comparative effectiveness research and uses causal graphs to illustrate the causal network that relates treatment to outcome. Variables in the causal network take on multiple structural forms. Adjustment for a common cause pathway between treatment and outcome can remove confounding, whereas adjustment for other structural types may increase bias. For this reason, variable selection would ideally be based on an understanding of the causal network; however, the true causal network is rarely known. Therefore, we describe more practical variable selection approaches based on background knowledge when the causal structure is only partially known. These approaches include adjustment for all observed pretreatment variables thought to have some connection to the outcome, all known risk factors for the outcome, and all direct causes of the treatment or the outcome. Empirical approaches, such as forward and backward selection and automatic high-dimensional proxy adjustment, are also discussed. As there is a continuum between knowing and not knowing the causal, structural relations of variables, we recommend addressing variable selection in a practical way that involves a combination of background knowledge and empirical selection and that uses high-dimensional approaches. This empirical approach can be used to select from a set of a priori variables based on the researcher's knowledge to be included in the final analysis or to identify additional variables for consideration. This more limited use of empirically derived variables may reduce confounding while simultaneously reducing the risk of including variables that may increase bias. Copyright © 2013 John Wiley & Sons, Ltd.

  19. A Review of Covariate Selection for Nonexperimental Comparative Effectiveness Research

    PubMed Central

    Sauer, Brian C.; Brookhart, Alan; Roy, Jason; Vanderweele, Tyler

    2014-01-01

    This paper addresses strategies for selecting variables for adjustment in non-experimental comparative effectiveness research (CER), and uses causal graphs to illustrate the causal network that relates treatment to outcome. Variables in the causal network take on multiple structural forms. Adjustment for a common cause pathway between treatment and outcome can remove confounding, while adjustment for other structural types may increase bias. For this reason, variable selection would ideally be based on an understanding of the causal network; however, the true causal network is rarely known. Therefore, we describe more practical variable selection approaches based on background knowledge when the causal structure is only partially known. These approaches include adjustment for all observed pretreatment variables thought to have some connection to the outcome, all known risk factors for the outcome, and all direct causes of the treatment or the outcome. Empirical approaches, such as forward and backward selection and automatic high-dimensional proxy adjustment, are also discussed. As there is a continuum between knowing and not knowing the causal, structural relations of variables, we recommend addressing variable selection in a practical way that involves a combination of background knowledge and empirical selection and that uses high-dimensional approaches. This empirical approach can be used to select from a set of a priori variables based on the researcher's knowledge to be included in the final analysis or to identify additional variables for consideration. This more limited use of empirically derived variables may reduce confounding while simultaneously reducing the risk of including variables that may increase bias. PMID:24006330

  20. The Determinants of College Student Retention

    ERIC Educational Resources Information Center

    Guerrero, Adam A.

    2010-01-01

    This study attempts to add to the college student dropout literature by examining persistence decisions at a private, non-selective university using previously unstudied explanatory variables and advanced econometric methods. Three main contributions are provided. First, proprietary data obtained from a type of university that is underrepresented in…

  1. Labor Productivity Standards in Texas School Foodservice Operations

    ERIC Educational Resources Information Center

    Sherrin, A. Rachelle; Bednar, Carolyn; Kwon, Junehee

    2009-01-01

    Purpose: Purpose of this research was to investigate utilization of labor productivity standards and variables that affect productivity in Texas school foodservice operations. Methods: A questionnaire was developed, validated, and pilot tested, then mailed to 200 randomly selected Texas school foodservice directors. Descriptive statistics for…

  2. Back propagation artificial neural network for community Alzheimer's disease screening in China.

    PubMed

    Tang, Jun; Wu, Lei; Huang, Helang; Feng, Jiang; Yuan, Yefeng; Zhou, Yueping; Huang, Peng; Xu, Yan; Yu, Chao

    2013-01-25

    Alzheimer's disease patients diagnosed with the Chinese Classification of Mental Disorders diagnostic criteria were selected from the community through on-site sampling. Levels of macro and trace elements were measured in blood samples using an atomic absorption method, and neurotransmitters were measured using a radioimmunoassay method. SPSS 13.0 was used to establish a database, and a back propagation artificial neural network for Alzheimer's disease prediction was simulated using Clementine 12.0 software. With scores of activities of daily living, creatinine, 5-hydroxytryptamine, age, dopamine and aluminum as input variables, the results revealed that the area under the curve in our back propagation artificial neural network was 0.929 (95% confidence interval: 0.868-0.968), sensitivity was 90.00%, specificity was 95.00%, and accuracy was 92.50%. The findings indicated that the results of back propagation artificial neural network established based on the above six variables were satisfactory for screening and diagnosis of Alzheimer's disease in patients selected from the community.
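
    As a rough sense of scale for such a screening model, the sketch below trains a small backpropagation network on six simulated inputs standing in for the reported variables (ADL score, creatinine, 5-hydroxytryptamine, age, dopamine, aluminum) and reports a held-out ROC AUC. It uses Python/scikit-learn rather than the Clementine workflow used in the study, and the data are placeholders.

      import numpy as np
      from sklearn.neural_network import MLPClassifier
      from sklearn.model_selection import train_test_split
      from sklearn.metrics import roc_auc_score

      rng = np.random.default_rng(0)
      X = rng.normal(size=(200, 6))                # six screening inputs (simulated)
      y = (X[:, 0] + X[:, 3] + rng.normal(size=200) > 0).astype(int)

      X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
      net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=3000).fit(X_tr, y_tr)
      print(roc_auc_score(y_te, net.predict_proba(X_te)[:, 1]))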

  3. Back propagation artificial neural network for community Alzheimer's disease screening in China★

    PubMed Central

    Tang, Jun; Wu, Lei; Huang, Helang; Feng, Jiang; Yuan, Yefeng; Zhou, Yueping; Huang, Peng; Xu, Yan; Yu, Chao

    2013-01-01

    Alzheimer's disease patients diagnosed with the Chinese Classification of Mental Disorders diagnostic criteria were selected from the community through on-site sampling. Levels of macro and trace elements were measured in blood samples using an atomic absorption method, and neurotransmitters were measured using a radioimmunoassay method. SPSS 13.0 was used to establish a database, and a back propagation artificial neural network for Alzheimer's disease prediction was simulated using Clementine 12.0 software. With scores of activities of daily living, creatinine, 5-hydroxytryptamine, age, dopamine and aluminum as input variables, the results revealed that the area under the curve in our back propagation artificial neural network was 0.929 (95% confidence interval: 0.868–0.968), sensitivity was 90.00%, specificity was 95.00%, and accuracy was 92.50%. The findings indicated that the results of back propagation artificial neural network established based on the above six variables were satisfactory for screening and diagnosis of Alzheimer's disease in patients selected from the community. PMID:25206598

  4. A selective review of the first 20 years of instrumental variables models in health-services research and medicine.

    PubMed

    Cawley, John

    2015-01-01

    The method of instrumental variables (IV) is useful for estimating causal effects. Intuitively, it exploits exogenous variation in the treatment, sometimes called natural experiments or instruments. This study reviews the literature in health-services research and medical research that applies the method of instrumental variables, documents trends in its use, and offers examples of various types of instruments. A literature search of the PubMed and EconLit research databases for English-language journal articles published after 1990 yielded a total of 522 original research articles. Citation counts for each article were derived from the Web of Science. A selective review was conducted, with articles prioritized based on number of citations, validity and power of the instrument, and type of instrument. The average annual number of papers in health-services research and medical research that apply the method of instrumental variables rose from 1.2 in 1991-1995 to 41.8 in 2006-2010. Commonly used instruments (natural experiments) in health and medicine are relative distance to a medical care provider offering the treatment and the medical care provider's historic tendency to administer the treatment. Less common but still noteworthy instruments include randomization of treatment for reasons other than research, randomized encouragement to undertake the treatment, day of week of admission as an instrument for waiting time for surgery, and genes as an instrument for whether the respondent has a heritable condition. The use of the method of IV has increased dramatically in the past 20 years, and a wide range of instruments have been used. Applications of the method of IV have in several cases upended conventional wisdom that was based on correlations and led to important insights about health and healthcare. Future research should pursue new applications of existing instruments and search for new instruments that are powerful and valid.
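
    The IV logic is easy to see in a minimal two-stage least squares (2SLS) sketch, where distance-to-provider plays the instrument's role; all data below are simulated for illustration, and the coefficient values are arbitrary assumptions.

      import numpy as np

      rng = np.random.default_rng(0)
      n = 5000
      u = rng.normal(size=n)                       # unobserved confounder
      z = rng.normal(size=n)                       # instrument, e.g., distance
      treat = (0.8 * z + u + rng.normal(size=n) > 0).astype(float)
      y = 1.0 * treat + 2.0 * u + rng.normal(size=n)   # true effect = 1.0

      X = np.column_stack([np.ones(n), treat])     # endogenous regressor
      Z = np.column_stack([np.ones(n), z])         # instrument matrix

      # stage 1: project treatment on the instrument; stage 2: regress y on it
      X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
      beta_iv = np.linalg.lstsq(X_hat, y, rcond=None)[0]
      beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
      print(beta_iv[1], beta_ols[1])               # IV near 1.0; OLS biased upward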

  5. Developing the formula for state subsidies for health care in Finland.

    PubMed

    Häkkinen, Unto; Järvelin, Jutta

    2004-01-01

    The aim was to generate a research-based proposal for a new subsidy formula for municipal healthcare services in Finland. Small-area data on potential need variables, supply of and access to services, and age-, sex- and case-mix-standardised service utilisation per capita were used. Utilisation was regressed in order to identify need variables and the cost weights for the selected need variables were subsequently derived using various multilevel models and structural equation methods. The variables selected for the subsidy formula were as follows: age- and sex-standardised mortality (age under 65 years) and income for outpatient primary health services; age- and sex-standardised mortality (all ages) and index of overcrowded housing for elderly care and long-term inpatient care; index of disability pensions for those aged 15-55 years and migration for specialised non-psychiatric care; and index of living alone and income for psychiatric care. Decisions on the amount of state subsidies can be divided into three stages, of which the first two are mainly political and the third is based on the results of this study.

  6. Quantitative effects of composting state variables on C/N ratio through GA-aided multivariate analysis.

    PubMed

    Sun, Wei; Huang, Guo H; Zeng, Guangming; Qin, Xiaosheng; Yu, Hui

    2011-03-01

    It is widely known that variation of the C/N ratio depends on many state variables during composting processes. This study attempted to develop a genetic-algorithm-aided stepwise cluster analysis (GASCA) method to describe the nonlinear relationships between selected state variables and the C/N ratio in food waste composting. The experimental data from six bench-scale composting reactors were used to demonstrate the applicability of GASCA. Within the GASCA framework, the GA searched for optimal sets of both the specified state variables and SCA's internal parameters; SCA established statistical nonlinear relationships between the state variables and the C/N ratio; and, to avoid unnecessary and time-consuming calculation, a proxy table was introduced, saving around 70% of the computational effort. The obtained GASCA cluster trees had smaller sizes and higher prediction accuracy than the conventional SCA trees. Based on the optimal GASCA tree, the effects of the GA-selected state variables on the C/N ratio were ranked in descending order as: NH₄⁺-N concentration > Moisture content > Ash content > Mean temperature > Mesophilic bacteria biomass. This ranking implied that the variation of ammonium nitrogen concentration, the associated temperature and moisture conditions, the total loss of both organic matter and available mineral constituents, and the mesophilic bacteria activity were critical factors affecting the C/N ratio during the investigated food waste composting. This first application of GASCA to composting modelling indicated that more direct search algorithms could be coupled with SCA or other multivariate analysis methods to analyze complicated relationships during composting and many other environmental processes. Copyright © 2010 Elsevier B.V. All rights reserved.

  7. Bayesian LASSO, scale space and decision making in association genetics.

    PubMed

    Pasanen, Leena; Holmström, Lasse; Sillanpää, Mikko J

    2015-01-01

    LASSO is a penalized regression method that facilitates model fitting in situations where there are as many, or even more explanatory variables than observations, and only a few variables are relevant in explaining the data. We focus on the Bayesian version of LASSO and consider four problems that need special attention: (i) controlling false positives, (ii) multiple comparisons, (iii) collinearity among explanatory variables, and (iv) the choice of the tuning parameter that controls the amount of shrinkage and the sparsity of the estimates. The particular application considered is association genetics, where LASSO regression can be used to find links between chromosome locations and phenotypic traits in a biological organism. However, the proposed techniques are relevant also in other contexts where LASSO is used for variable selection. We separate the true associations from false positives using the posterior distribution of the effects (regression coefficients) provided by Bayesian LASSO. We propose to solve the multiple comparisons problem by using simultaneous inference based on the joint posterior distribution of the effects. Bayesian LASSO also tends to distribute an effect among collinear variables, making detection of an association difficult. We propose to solve this problem by considering not only individual effects but also their functionals (i.e. sums and differences). Finally, whereas in Bayesian LASSO the tuning parameter is often regarded as a random variable, we adopt a scale space view and consider a whole range of fixed tuning parameters, instead. The effect estimates and the associated inference are considered for all tuning parameters in the selected range and the results are visualized with color maps that provide useful insights into data and the association problem considered. The methods are illustrated using two sets of artificial data and one real data set, all representing typical settings in association genetics.
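
    The scale-space view, stripped of its Bayesian machinery, can be pictured with an ordinary LASSO coefficient path: examine the effect estimates over a whole range of fixed penalties rather than at a single tuned value, and look for effects that persist across scales. A minimal sketch with simulated markers follows; it is an analogy to the authors' color-map visualizations, not their method.

      import numpy as np
      import matplotlib.pyplot as plt
      from sklearn.linear_model import lasso_path

      rng = np.random.default_rng(0)
      X = rng.normal(size=(150, 60))               # 60 markers, 150 individuals
      beta = np.zeros(60); beta[[3, 17]] = 1.2     # two true effects
      y = X @ beta + rng.normal(size=150)

      alphas, coefs, _ = lasso_path(X, y, n_alphas=100)
      plt.plot(np.log10(alphas), coefs.T)          # one curve per marker
      plt.xlabel("log10 penalty (scale)")
      plt.ylabel("effect estimate")
      plt.show()                                   # persistent curves = candidates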

  8. A formal and data-based comparison of measures of motor-equivalent covariation.

    PubMed

    Verrel, Julius

    2011-09-15

    Different analysis methods have been developed for assessing motor-equivalent organization of movement variability. In the uncontrolled manifold (UCM) method, the structure of variability is analyzed by comparing goal-equivalent and non-goal-equivalent variability components at the level of elemental variables (e.g., joint angles). In contrast, in the covariation by randomization (CR) approach, motor-equivalent organization is assessed by comparing variability at the task level between empirical and decorrelated surrogate data. UCM effects can be due to both covariation among elemental variables and selective channeling of variability to elemental variables with low task sensitivity ("individual variation"), suggesting a link between the UCM and CR method. However, the precise relationship between the notion of covariation in the two approaches has not been analyzed in detail yet. Analysis of empirical and simulated data from a study on manual pointing shows that in general the two approaches are not equivalent, but the respective covariation measures are highly correlated (ρ > 0.7) for two proposed definitions of covariation in the UCM context. For one-dimensional task spaces, a formal comparison is possible and in fact the two notions of covariation are equivalent. In situations in which individual variation does not contribute to UCM effects, for which necessary and sufficient conditions are derived, this entails the equivalence of the UCM and CR analysis. Implications for the interpretation of UCM effects are discussed. Copyright © 2011 Elsevier B.V. All rights reserved.
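
    For concreteness, here is a hedged sketch of the UCM variance decomposition the passage refers to: trial-to-trial joint-angle deviations are split into the null space of a task Jacobian (goal-equivalent) and its orthogonal complement. The Jacobian and data are random placeholders; a real analysis would linearize the actual forward kinematics of the pointing task.

      import numpy as np
      from scipy.linalg import null_space

      rng = np.random.default_rng(0)
      J = rng.normal(size=(2, 7))                  # task Jacobian: 2-D goal, 7 joints
      angles = rng.normal(size=(100, 7))           # joint configurations, 100 trials
      dev = angles - angles.mean(axis=0)           # trial-to-trial deviations

      N = null_space(J)                            # basis of the UCM (goal-equivalent)
      ucm = dev @ N                                # components within the UCM
      ort = dev - ucm @ N.T                        # orthogonal, task-relevant part

      v_ucm = (ucm ** 2).sum() / (N.shape[1] * len(dev))        # per-DOF variance
      v_ort = (ort ** 2).sum() / ((7 - N.shape[1]) * len(dev))
      print(v_ucm / v_ort)                         # ratio > 1 suggests synergy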

  9. Linear data mining the Wichita clinical matrix suggests sleep and allostatic load involvement in chronic fatigue syndrome.

    PubMed

    Gurbaxani, Brian M; Jones, James F; Goertzel, Benjamin N; Maloney, Elizabeth M

    2006-04-01

    To provide a mathematical introduction to the Wichita (KS, USA) clinical dataset, which is all of the nongenetic data (no microarray or single nucleotide polymorphism data) from the 2-day clinical evaluation, and to show the preliminary findings and limitations of popular, matrix algebra-based data mining techniques. An initial matrix of 440 variables by 227 human subjects was reduced to 183 variables by 164 subjects. Variables were excluded that strongly correlated with chronic fatigue syndrome (CFS) case classification by design (for example, the multidimensional fatigue inventory [MFI] data), that were otherwise self-reporting in nature and also tended to correlate strongly with CFS classification, or that were sparse or nonvarying between case and control. Subjects were excluded if they did not clearly fall into well-defined CFS classifications, had comorbid depression with melancholic features, or had other medical or psychiatric exclusions. The popular data mining techniques principal component analysis (PCA) and linear discriminant analysis (LDA) were used to determine how well the data separated into groups. Two different feature selection methods helped identify the most discriminating parameters. Although purely biological features (variables) were found to separate CFS cases from controls, including many allostatic load and sleep-related variables, most parameters were not statistically significant individually. However, biological correlates of CFS, such as heart rate and heart rate variability, require further investigation. Feature selection of a limited number of variables from the purely biological dataset produced better separation between groups than a PCA of the entire dataset. Feature selection highlighted the importance of many of the allostatic load variables studied in more detail by Maloney and colleagues in this issue [1], as well as some sleep-related variables. Nonetheless, matrix linear algebra-based data mining approaches appeared to be of limited utility when compared with more sophisticated nonlinear analyses on richer data types, such as those found in Maloney and colleagues [1] and Goertzel and colleagues [2] in this issue.

  10. Satisfiability Test with Synchronous Simulated Annealing on the Fujitsu AP1000 Massively-Parallel Multiprocessor

    NASA Technical Reports Server (NTRS)

    Sohn, Andrew; Biswas, Rupak

    1996-01-01

    Solving the hard Satisfiability Problem is time-consuming even for modest-sized problem instances. Solving the Random L-SAT Problem is especially difficult due to the ratio of clauses to variables. This report presents a parallel synchronous simulated annealing method for solving the Random L-SAT Problem on a large-scale distributed-memory multiprocessor. In particular, we use a parallel synchronous simulated annealing procedure, called Generalized Speculative Computation, which guarantees the same decision sequence as sequential simulated annealing. To demonstrate the performance of the parallel method, we have selected problem instances varying in size from 100 variables/425 clauses to 5000 variables/21,250 clauses. Experimental results on the AP1000 multiprocessor indicate that our approach can satisfy 99.9 percent of the clauses while giving almost a 70-fold speedup on 500 processors.
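
    A sequential sketch of simulated annealing on a random 3-SAT instance; the paper's contribution (Generalized Speculative Computation) parallelizes this kind of loop while preserving its decision sequence. The clause encoding, cooling schedule, and parameters are illustrative assumptions.

        import math, random

        def sa_sat(clauses, n_vars, t0=2.0, cooling=0.9995, steps=20000):
            """Minimize the number of unsatisfied clauses by annealing.
            A literal is +v (variable v true) or -v (variable v false)."""
            assign = [random.random() < 0.5 for _ in range(n_vars)]
            unsat = lambda a: sum(not any(a[abs(l) - 1] == (l > 0) for l in c)
                                  for c in clauses)
            cost, t = unsat(assign), t0
            for _ in range(steps):
                v = random.randrange(n_vars)
                assign[v] = not assign[v]          # propose a single-bit flip
                new = unsat(assign)
                if new <= cost or random.random() < math.exp((cost - new) / t):
                    cost = new                     # accept the move
                else:
                    assign[v] = not assign[v]      # reject: undo the flip
                t *= cooling                       # cool the temperature
            return assign, cost

        n = 100                                    # 100 variables, ratio 4.25
        clauses = [[random.choice([-1, 1]) * random.randrange(1, n + 1)
                    for _ in range(3)] for _ in range(int(4.25 * n))]
        best, remaining = sa_sat(clauses, n)
        print(f"{1 - remaining / len(clauses):.1%} of clauses satisfied")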

  11. Using genetic algorithms to achieve an automatic and global optimization of analogue methods for statistical downscaling of precipitation

    NASA Astrophysics Data System (ADS)

    Horton, Pascal; Weingartner, Rolf; Obled, Charles; Jaboyedoff, Michel

    2017-04-01

    Analogue methods (AMs) rely on the hypothesis that similar situations, in terms of atmospheric circulation, are likely to result in similar local or regional weather conditions. These methods consist of sampling a certain number of past situations, based on different synoptic-scale meteorological variables (predictors), in order to construct a probabilistic prediction for a local weather variable of interest (predictand). They are often used for daily precipitation prediction, either in the context of real-time forecasting, reconstruction of past weather conditions, or future climate impact studies. The relationship between predictors and predictands is defined by several parameters (predictor variable, spatial and temporal windows used for the comparison, analogy criteria, and number of analogues), which are often calibrated by means of a semi-automatic sequential procedure that has strong limitations. AMs may include several subsampling levels (e.g., first sorting a set of analogues in terms of circulation, then restricting to those with a similar moisture status). The parameter space of the AMs can be very complex, with substantial co-dependencies between the parameters. Thus, global optimization techniques are likely to be necessary for calibrating most AM variants, as they can optimize all parameters of all analogy levels simultaneously. Genetic algorithms (GAs) were found to be successful in finding optimal values of AM parameters. They take parameter interdependencies into account and objectively select parameters that previously had to be chosen manually (such as the pressure levels and the temporal windows of the predictor variables), and thus obviate the need to assess a large number of combinations. The performance scores of the optimized methods increased compared to reference methods, and even more so for days with high precipitation totals. The resulting parameters were found to be relevant and spatially coherent. Moreover, they were obtained automatically and objectively, which reduces the effort invested in exploration attempts when adapting the method to a new region or a new predictand. In addition, the approach allowed for new degrees of freedom, such as a weighting between the pressure levels and non-overlapping spatial windows. Genetic algorithms were then used further in order to automatically select predictor variables and analogy criteria. This resulted in interesting outputs, providing new predictor-criterion combinations. However, some limitations of the approach were encountered, and expert input is likely to remain necessary. Nevertheless, letting GAs explore a dataset for the best predictor for a predictand of interest is certainly a useful tool, particularly when applied to a new predictand or a new region with different climatic characteristics.
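
    A minimal real-coded GA loop of the kind such a calibration could use, assuming a box-bounded parameter vector (window sizes, analogue counts, level weights, ...) and a skill function supplied by the caller; the operators here (tournament selection, blend crossover, Gaussian mutation) are generic stand-ins, not the authors' exact configuration.

        import numpy as np

        def calibrate(score, bounds, pop_size=40, gens=50, mut=0.1, seed=2):
            """Maximize `score` (a stand-in for forecast skill, e.g. a
            CRPS-based measure) over box-bounded AM parameters."""
            rng = np.random.default_rng(seed)
            lo, hi = np.array(bounds, dtype=float).T
            pop = rng.uniform(lo, hi, size=(pop_size, len(bounds)))
            for _ in range(gens):
                fit = np.array([score(ind) for ind in pop])
                # binary tournament selection of parents
                idx = rng.integers(pop_size, size=(pop_size, 2))
                parents = pop[np.where(fit[idx[:, 0]] > fit[idx[:, 1]],
                                       idx[:, 0], idx[:, 1])]
                # blend crossover plus Gaussian mutation, clipped to bounds
                w = rng.uniform(size=(pop_size, 1))
                pop = w * parents + (1 - w) * parents[::-1]
                pop += rng.normal(scale=mut * (hi - lo), size=pop.shape)
                pop = np.clip(pop, lo, hi)
            return pop[np.argmax([score(ind) for ind in pop])]

        # Toy usage: two "parameters" with a known optimum at (3, 0.5)
        best = calibrate(lambda p: -(p[0] - 3) ** 2 - (p[1] - 0.5) ** 2,
                         bounds=[(0, 10), (0, 1)])
        print(best)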

  12. A progressive data compression scheme based upon adaptive transform coding: Mixture block coding of natural images

    NASA Technical Reports Server (NTRS)

    Rost, Martin C.; Sayood, Khalid

    1991-01-01

    A method for efficiently coding natural images using a vector-quantized variable-blocksize transform source coder is presented. The method, mixture block coding (MBC), incorporates variable-rate coding by using a mixture of discrete cosine transform (DCT) source coders. The choice of coder for any given image region is made through a threshold-driven distortion criterion. In this paper, MBC is used in two different applications. The base method is concerned with single-pass low-rate image data compression. The second is a natural extension of the base method which allows for low-rate progressive transmission (PT). Since the base method adapts easily to progressive coding, it offers the aesthetic advantage of progressive coding without incurring extensive channel overhead. Image compression rates of approximately 0.5 bit/pel are demonstrated for both monochrome and color images.
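
    A sketch of the threshold-driven coder choice at the heart of MBC, reduced to two uniform DCT quantizers on fixed 8x8 blocks; the block size, quantization steps, and MSE threshold are illustrative assumptions (the actual method uses vector quantization and variable block sizes).

        import numpy as np
        from scipy.fft import dctn, idctn

        def code_block(block, q):
            """DCT-transform, uniformly quantize, and reconstruct a block."""
            c = dctn(block, norm="ortho")
            return idctn(np.round(c / q) * q, norm="ortho")

        def mbc(image, threshold=25.0, q_coarse=40.0, q_fine=10.0, b=8):
            """Try the cheap coarse coder first; fall back to the finer
            (higher-rate) coder when block distortion exceeds the threshold.
            Assumes image dimensions are multiples of the block size."""
            out = np.empty(image.shape, dtype=float)
            for i in range(0, image.shape[0], b):
                for j in range(0, image.shape[1], b):
                    blk = image[i:i + b, j:j + b].astype(float)
                    rec = code_block(blk, q_coarse)
                    if np.mean((blk - rec) ** 2) > threshold:
                        rec = code_block(blk, q_fine)
                    out[i:i + b, j:j + b] = rec
            return out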

  13. Framing 100-year overflowing and overtopping marine submersion hazard resulting from the propagation of 100-year joint hydrodynamic conditions

    NASA Astrophysics Data System (ADS)

    Nicolae Lerma, A.; Bulteau, T.; Elineau, S.; Paris, F.; Pedreros, R.

    2016-12-01

    Marine submersion is an increasing concern for coastal cities, as urban development reinforces their vulnerabilities while climate change is likely to increase the frequency and magnitude of submersions. Characterising the coastal flooding hazard is therefore of paramount importance to ensure the security of people living in such places and for coastal planning. A hazard is commonly defined as an adverse phenomenon, often represented by the magnitude of a variable of interest (e.g., flooded area), hereafter called the response variable, associated with a probability of exceedance or, alternatively, a return period. Characterising the coastal flooding hazard consists in finding the correspondence between the magnitude and the return period. The difficulty lies in the fact that the assessment is usually performed using physical numerical models taking as inputs scenarios composed of multiple forcing conditions that are most of the time interdependent. Indeed, a time series of the response variable is usually not available, so we have to deal instead with time series of forcing variables (e.g., water level, waves). Thus, the problem is twofold: on the one hand, the definition of scenarios is a multivariate matter; on the other hand, it is difficult, and only approximately valid, to associate the resulting response, being the output of the physical numerical model, with the return period defined for the scenarios. In this study, we illustrate the problem on the district of Leucate, located on the French Mediterranean coast. A multivariate extreme value analysis of waves and water levels is performed offshore using a conditional extremes model; then two different methods are used to define and select 100-year scenarios of forcing variables: one based on joint exceedance probability contours, a method classically used in coastal risk studies, the other based on environmental contours, which are commonly used in the field of structural design engineering. We show that these two methods enable one to frame the true 100-year response variable. The selected scenarios are propagated to the shore through high-resolution flood modelling coupling overflowing and overtopping processes. Results in terms of inundated areas and inland water volumes are finally compared for the two methods, giving upper and lower bounds for the true response variables.
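
    A sketch of the environmental-contour construction (inverse FORM) for two offshore forcing variables, assuming independent Gumbel marginals and an assumed storm-event rate; the study itself uses a conditional extremes model, so the dependence treatment here is deliberately simplified.

        import numpy as np
        from scipy import stats

        # Assumed marginal fits: significant wave height (m) and still
        # water level (m); parameter values are purely illustrative.
        f_hs = stats.gumbel_r(loc=3.0, scale=0.6)
        f_wl = stats.gumbel_r(loc=0.8, scale=0.15)

        events_per_year = 50                  # assumed storm-event rate
        p_e = 1.0 / (100 * events_per_year)   # per-event exceedance, 100-yr
        beta = stats.norm.ppf(1 - p_e)        # reliability index

        theta = np.linspace(0, 2 * np.pi, 200)
        u1, u2 = beta * np.cos(theta), beta * np.sin(theta)
        hs = f_hs.ppf(stats.norm.cdf(u1))     # back to physical space
        wl = f_wl.ppf(stats.norm.cdf(u2))
        # (hs, wl) pairs trace the 100-year contour; selected points on it
        # would then be propagated through the flood model.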

  14. Optimization of Robust HPLC Method for Quantitation of Ambroxol Hydrochloride and Roxithromycin Using a DoE Approach.

    PubMed

    Patel, Rashmin B; Patel, Nilay M; Patel, Mrunali R; Solanki, Ajay B

    2017-03-01

    The aim of this work was to develop and optimize a robust HPLC method for the separation and quantitation of ambroxol hydrochloride and roxithromycin utilizing a Design of Experiments (DoE) approach. A Plackett-Burman design was used to assess the impact of the independent variables (concentration of organic phase, mobile phase pH, flow rate and column temperature) on peak resolution, USP tailing and number of plates. A central composite design was utilized to evaluate the main, interaction, and quadratic effects of the independent variables on the selected dependent variables. The optimized HPLC method was validated according to the ICH Q2(R1) guideline and was used to separate and quantify ambroxol hydrochloride and roxithromycin in tablet formulations. The findings showed that the DoE approach could be effectively applied to optimize a robust HPLC method for quantification of ambroxol hydrochloride and roxithromycin in tablet formulations. Statistical comparison between the results of the proposed and a reported HPLC method revealed no significant difference, indicating the suitability of the proposed HPLC method for analysis of ambroxol hydrochloride and roxithromycin in pharmaceutical formulations. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
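
    A sketch of building a central composite design in coded units for the four factors named above, constructed by hand rather than with a DoE library; the axial distance and number of center points are conventional defaults, not the paper's settings.

        import numpy as np
        from itertools import product

        def central_composite(k, alpha=None, n_center=4):
            """Coded-unit CCD: 2^k factorial corners, 2k axial (star)
            points at +/-alpha, and replicated center points. The default
            alpha is the rotatable value (2^k)**0.25."""
            alpha = alpha if alpha is not None else (2 ** k) ** 0.25
            corners = np.array(list(product([-1.0, 1.0], repeat=k)))
            axial = np.vstack([a * np.eye(k) for a in (-alpha, alpha)])
            center = np.zeros((n_center, k))
            return np.vstack([corners, axial, center])

        # Factors: organic-phase %, mobile-phase pH, flow rate, column temp.
        design = central_composite(4)
        print(design.shape)    # (2**4 + 2*4 + 4, 4) = (28, 4) runs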

  15. Missing-value estimation using linear and non-linear regression with Bayesian gene selection.

    PubMed

    Zhou, Xiaobo; Wang, Xiaodong; Dougherty, Edward R

    2003-11-22

    Data from microarray experiments are usually in the form of large matrices of expression levels of genes under different experimental conditions. For various reasons, there are frequently missing values. Estimating these missing values is important because they affect downstream analysis, such as clustering, classification and network design. Several methods of missing-value estimation are in use. The problem has two parts: (1) selection of genes for estimation and (2) design of an estimation rule. We propose Bayesian variable selection to obtain the genes to be used for estimation, and employ both linear and nonlinear regression for the estimation rule itself. Fast implementation issues for these methods are discussed, including the use of QR decomposition for parameter estimation. The proposed methods are tested on data sets arising from hereditary breast cancer and small round blue-cell tumors. The results compare very favorably with currently used methods based on the normalized root-mean-square error. The appendix is available from http://gspsnap.tamu.edu/gspweb/zxb/missing_zxb/ (user: gspweb; passwd: gsplab).
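
    A sketch of the estimation-rule step for the linear case: given an already-selected set of genes, fit the rule by least squares via QR decomposition and predict the missing entry. The expression data and the number of selected genes are simulated assumptions.

        import numpy as np

        def fit_qr(A, y):
            """Least-squares coefficients via QR: solve R b = Q^T y."""
            Q, R = np.linalg.qr(A)
            return np.linalg.solve(R, Q.T @ y)

        rng = np.random.default_rng(3)
        expr = rng.normal(size=(30, 6))       # 30 arrays x 6 selected genes
        target = expr @ np.array([1.2, -0.4, 0.0, 0.7, 0.0, 0.3]) \
                 + 0.1 * rng.normal(size=30)

        missing = 7                           # array with the missing value
        train = np.delete(np.arange(30), missing)
        A = np.column_stack([np.ones(train.size), expr[train]])  # + intercept
        b = fit_qr(A, target[train])
        est = np.concatenate([[1.0], expr[missing]]) @ b
        print(f"estimate {est:.3f} vs. true {target[missing]:.3f}")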

  16. Searching for faint AGN in the CDFS: an X-ray (Chandra) vs optical variability (HST) comparison.

    NASA Astrophysics Data System (ADS)

    Georgantopoulos, I.; Pouliasis, E.; Bonanos, A.; Sokolovsky, K.; Yang, M.; Hatzidimitriou, D.; Bellas, I.; Gavras, P.; Spetsieri, Z.

    2017-10-01

    X-ray surveys are believed to be the most efficient way to detect AGN. Recently, though, optical variability studies have been claimed to probe even fainter AGN. We present results from an HST study aimed at identifying Active Galactic Nuclei (AGN) through optical variability selection in the CDFS. This work is part of the 'Hubble Catalogue of Variables' project of ESA, which aims to identify variable sources in the Hubble Source Catalogue. In particular, we used Hubble Space Telescope (HST) z-band images taken over 5 epochs and performed aperture photometry to derive the lightcurves of the sources. Two statistical methods (standard deviation and interquartile range) were applied, resulting in a final sample of 175 variable AGN candidates, after removing artifacts by visual inspection as well as known stars and supernovae. The fact that the majority of the sources are extended and variable indicates AGN activity. We assess the efficiency of the method by comparison with the 7 Ms Chandra detections. Our work shows that optical variability probes AGN at comparable redshifts but at deeper optical magnitudes. Our candidate AGN (not detected in X-rays) have luminosities of L_x < 6×10^{40} erg/s at z ≈ 0.7, suggesting that they are associated with low-luminosity Seyferts and LINERs.
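
    A sketch of flagging variable candidates by the two statistics named above (standard deviation and interquartile range of the lightcurve), with cuts expressed relative to the typical photometric error; the array layout and the threshold values are assumptions.

        import numpy as np

        def flag_variables(lightcurves, errors, k_std=3.0, k_iqr=3.0):
            """`lightcurves`/`errors`: (n_sources, n_epochs) magnitudes.
            A source is flagged when its scatter across epochs exceeds
            what the photometric errors allow, by both criteria."""
            std = lightcurves.std(axis=1, ddof=1)
            q75, q25 = np.percentile(lightcurves, [75, 25], axis=1)
            typical = np.median(errors, axis=1)
            return (std > k_std * typical) & ((q75 - q25) > k_iqr * typical)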

  17. What weather variables are important in predicting heat-related mortality? A new application of statistical learning methods

    PubMed Central

    Zhang, Kai; Li, Yun; Schwartz, Joel D.; O'Neill, Marie S.

    2014-01-01

    Hot weather increases the risk of mortality. Previous studies used different sets of weather variables to characterize heat stress, resulting in variation in heat-mortality associations depending on the metric used. We employed a statistical learning method – random forests – to examine which of various weather variables had the greatest impact on heat-related mortality. We compiled a summertime daily weather and mortality counts dataset from four U.S. cities (Chicago, IL; Detroit, MI; Philadelphia, PA; and Phoenix, AZ) from 1998 to 2006. A variety of weather variables were ranked by their importance in predicting deviation from typical daily all-cause and cause-specific death counts. Ranks of weather variables varied with city and health outcome. Apparent temperature appeared to be the most important predictor of all-cause heat-related mortality. Absolute humidity was, on average, among the variables most frequently selected as top predictors for all-cause mortality and seven cause-specific mortality categories. Our analysis affirms that apparent temperature is a reasonable variable for activating heat alerts and warnings, which are commonly based on predictions of total mortality in the next few days. Additionally, absolute humidity should be included in future heat-health studies. Finally, random forests can be used to guide the choice of weather variables in heat epidemiology studies. PMID:24834832
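
    A sketch of ranking weather variables with random-forest importances, on simulated stand-ins for the study variables; the variable set, the simulated mortality response, and the forest settings are illustrative assumptions.

        import numpy as np
        from sklearn.ensemble import RandomForestRegressor

        rng = np.random.default_rng(4)
        n = 800                                  # simulated summer days
        weather = {
            "apparent_temp": rng.normal(30, 4, n),
            "abs_humidity":  rng.normal(15, 3, n),
            "min_temp":      rng.normal(22, 3, n),
            "wind_speed":    rng.normal(3, 1, n),
        }
        X = np.column_stack(list(weather.values()))
        # Deviation of daily deaths from the seasonal baseline (simulated)
        y = (0.8 * weather["apparent_temp"]
             + 0.3 * weather["abs_humidity"] + rng.normal(0, 3, n))

        rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)
        for name, imp in sorted(zip(weather, rf.feature_importances_),
                                key=lambda t: -t[1]):
            print(f"{name:15s} {imp:.3f}")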

  18. Isolation of phage-display library-derived scFv antibody specific to Listeria monocytogenes by a novel immobilized method.

    PubMed

    Nguyen, X-H; Trinh, T-L; Vu, T-B-H; Le, Q-H; To, K-A

    2018-02-01

    To select Listeria monocytogenes-specific single-chain fragment variable (scFv) antibodies from a phage-display library by a novel, simple and cost-effective immobilization method. Light expanded clay aggregate (LECA) was used as a biomass support matrix for biopanning of a phage-display library to select L. monocytogenes-specific scFv antibodies. Four rounds of positive selection against LECA-immobilized L. monocytogenes and an additional subtractive panning against Listeria innocua were performed. The phage clones selected using this panning scheme and the LECA-based immobilization method exhibited the ability to bind L. monocytogenes without cross-reactivity toward 10 other non-L. monocytogenes bacteria. One of the selected phage clones was able to specifically recognize three major pathogenic serotypes (1/2a, 1/2b and 4b) of L. monocytogenes and 11 tested L. monocytogenes strains isolated from foods. The LECA-based immobilization method is applicable for isolating species-specific anti-L. monocytogenes scFv antibodies by phage display. The isolated scFv antibody has potential use in the development of immunoassay-based methods for rapid detection of L. monocytogenes in food and environmental samples. In addition, the LECA immobilization method described here could feasibly be employed to isolate specific monoclonal antibodies against any given species of pathogenic bacteria from phage-display libraries. © 2017 The Society for Applied Microbiology.

  19. How Genes Modulate Patterns of Aging-Related Changes on the Way to 100: Biodemographic Models and Methods in Genetic Analyses of Longitudinal Data

    PubMed Central

    Yashin, Anatoliy I.; Arbeev, Konstantin G.; Wu, Deqing; Arbeeva, Liubov; Kulminski, Alexander; Kulminskaya, Irina; Akushevich, Igor; Ukraintseva, Svetlana V.

    2016-01-01

    Background and Objective To clarify mechanisms of genetic regulation of human aging and longevity traits, a number of genome-wide association studies (GWAS) of these traits have been performed. However, the results of these analyses did not meet the researchers' expectations. Most detected genetic associations have not reached a genome-wide level of statistical significance and have suffered from a lack of replication in studies of independent populations. The reasons for slow progress in this research area include the low efficiency of the statistical methods used in data analyses, the genetic heterogeneity of aging- and longevity-related traits, the possibility of pleiotropic (e.g., age-dependent) effects of genetic variants on such traits, underestimation of the effects of (i) mortality selection in genetically heterogeneous cohorts and (ii) external factors and differences in the genetic backgrounds of individuals in the populations under study, and the weakness of a conceptual biological framework that does not fully account for the above factors. One more limitation of the conducted studies is that they did not fully realize the potential of longitudinal data, which allow for evaluating how genetic influences on life span are mediated by physiological variables and other biomarkers during the life course. The objective of this paper is to address these issues. Data and Methods We performed GWAS of human life span using different subsets of data from the original Framingham Heart Study cohort corresponding to different quality control (QC) procedures and used one subset of selected genetic variants for further analyses. We used a simulation study to show that this approach to combining data improves the quality of GWAS. We used FHS longitudinal data to compare average age trajectories of physiological variables in carriers and non-carriers of selected genetic variants. We used a stochastic process model of human mortality and aging to investigate genetic influence on hidden biomarkers of aging and on the dynamic interaction between aging and longevity. We investigated properties of genes related to the selected variants and their roles in signaling and metabolic pathways. Results We showed that the use of different QC procedures results in different sets of genetic variants associated with life span. We selected 24 genetic variants negatively associated with life span. We showed that joint analyses of genetic data collected at the time of bio-specimen collection and follow-up data substantially improved the significance of associations between the selected 24 SNPs and life span. We also showed that aging-related changes in physiological variables and in hidden biomarkers of aging differ between carriers and non-carriers of the selected variants. Conclusions The results of these analyses demonstrated the benefits of using biodemographic models and methods in genetic association studies of these traits. Our findings showed that the absence of a large number of genetic variants with deleterious effects may make a substantial contribution to exceptional longevity. These effects are dynamically mediated by a number of physiological variables and hidden biomarkers of aging. The results of this research demonstrated the benefits of using integrative statistical models of mortality risks in genetic studies of human aging and longevity. PMID:27773987

  20. Breath-Group Intelligibility in Dysarthria: Characteristics and Underlying Correlates

    ERIC Educational Resources Information Center

    Yunusova, Yana; Weismer, Gary; Kent, Ray D.; Rusche, Nicole M.

    2005-01-01

    Purpose: This study was designed to determine whether within-speaker fluctuations in speech intelligibility occurred among speakers with dysarthria who produced a reading passage, and, if they did, whether selected linguistic and acoustic variables predicted the variations in speech intelligibility. Method: Participants with dysarthria included a…

  1. Selection of a Geostatistical Method to Interpolate Soil Properties of the State Crop Testing Fields using Attributes of a Digital Terrain Model

    NASA Astrophysics Data System (ADS)

    Sahabiev, I. A.; Ryazanov, S. S.; Kolcova, T. G.; Grigoryan, B. R.

    2018-03-01

    The three most common techniques for interpolating soil properties at a field scale—ordinary kriging (OK), regression kriging with a multiple linear regression drift model (RK + MLR), and regression kriging with a principal component regression drift model (RK + PCR)—were examined. The results of the study were compiled into an algorithm for choosing the most appropriate soil mapping technique. Relief attributes were used as the auxiliary variables. When the spatial dependence of a target variable was strong, the OK method showed more accurate interpolation results, and the inclusion of the auxiliary data resulted in an insignificant improvement in prediction accuracy. According to the algorithm, the RK + PCR method effectively eliminates multicollinearity of explanatory variables. However, if the number of predictors is less than ten, the probability of multicollinearity is reduced, and application of the PCR becomes unwarranted. In that case, multiple linear regression should be used instead.
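
    The selection algorithm described above can be paraphrased as a small decision function; the nugget-to-sill ratio used here as the "strong spatial dependence" criterion is an assumed rule of thumb, not a threshold given in the abstract.

        def choose_interpolator(nugget, sill, n_predictors,
                                strong_ratio=0.25):
            """Prefer OK when spatial dependence is strong; otherwise use
            regression kriging, reserving PCR for cases where enough
            predictors make multicollinearity likely."""
            if sill > 0 and nugget / sill < strong_ratio:
                return "OK"           # auxiliary data adds little
            if n_predictors < 10:
                return "RK + MLR"     # multicollinearity unlikely
            return "RK + PCR"         # PCR absorbs collinear predictors

        print(choose_interpolator(nugget=0.1, sill=1.0, n_predictors=14))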

  2. Variable Neighborhood Search Heuristics for Selecting a Subset of Variables in Principal Component Analysis

    ERIC Educational Resources Information Center

    Brusco, Michael J.; Singh, Renu; Steinley, Douglas

    2009-01-01

    The selection of a subset of variables from a pool of candidates is an important problem in several areas of multivariate statistics. Within the context of principal component analysis (PCA), a number of authors have argued that subset selection is crucial for identifying those variables that are required for correct interpretation of the…

  3. Stability indicating methods for the analysis of cefprozil in the presence of its alkaline induced degradation product

    NASA Astrophysics Data System (ADS)

    Attia, Khalid A. M.; Nassar, Mohammed W. I.; El-Zeiny, Mohamed B.; Serag, Ahmed

    2016-04-01

    Three simple, specific, accurate and precise spectrophotometric methods were developed for the determination of cefprozil (CZ) in the presence of its alkaline-induced degradation product (DCZ). The first method was the bivariate method, while the other two methods were multivariate: partial least squares (PLS) and spectral residual augmented classical least squares (SRACLS). The multivariate methods were applied both with and without a variable selection procedure (genetic algorithm, GA). These methods were tested by analyzing laboratory-prepared mixtures of the above drug with its alkaline-induced degradation product, and they were applied to its commercial pharmaceutical products.
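
    A sketch of the PLS calibration step on synthetic two-component UV spectra, with the GA wavelength selection reduced to a fixed boolean mask for brevity; the band shapes, wavelength grid, and mask are assumptions.

        import numpy as np
        from sklearn.cross_decomposition import PLSRegression

        rng = np.random.default_rng(5)
        wl = np.linspace(220, 320, 101)                  # nm grid (assumed)
        pure_cz = np.exp(-0.5 * ((wl - 262) / 12) ** 2)  # toy CZ band
        pure_dcz = np.exp(-0.5 * ((wl - 240) / 15) ** 2) # toy DCZ band

        conc = rng.uniform(0, 1, size=(40, 2))           # training mixtures
        A = conc @ np.vstack([pure_cz, pure_dcz]) \
            + 0.005 * rng.normal(size=(40, 101))

        selected = (wl > 235) & (wl < 290)   # stand-in for a GA-chosen mask
        pls = PLSRegression(n_components=2).fit(A[:, selected], conc)
        print(pls.predict(A[:5, selected]).round(2))     # recovered conc.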

  4. [Discrimination of varieties of brake fluid using visual-near infrared spectra].

    PubMed

    Jiang, Lu-lu; Tan, Li-hong; Qiu, Zheng-jun; Lu, Jiang-feng; He, Yong

    2008-06-01

    A new method was developed to rapidly discriminate brands of brake fluid by means of visible/near-infrared spectroscopy. Five different brands of brake fluid were analyzed using a handheld near-infrared spectrograph manufactured by ASD Company, and 60 samples were obtained from each brand of brake fluid. The sample data were pretreated using average smoothing and the standard normal variate method, and then analyzed using principal component analysis (PCA). A 2-dimensional plot was drawn based on the first and second principal components, and the plot indicated that the clustering characteristics of the different brake fluids are distinct. The first 6 principal components were taken as input variables, and the brand of brake fluid as the output variable, to build a discriminant model by the stepwise discriminant analysis method. Two hundred twenty-five samples selected randomly were used to create the model, and the remaining 75 samples to verify it. The result showed that the correct discrimination rate was 94.67%, indicating that the method proposed in this paper has good performance in classification and discrimination. It provides a new way to rapidly discriminate different brands of brake fluid.

  5. Plasticity models of material variability based on uncertainty quantification techniques

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jones, Reese E.; Rizzi, Francesco; Boyce, Brad

    The advent of fabrication techniques like additive manufacturing has focused attention on the considerable variability of material response due to defects and other micro-structural aspects. This variability motivates the development of an enhanced design methodology that incorporates inherent material variability to provide robust predictions of performance. In this work, we develop plasticity models capable of representing the distribution of mechanical responses observed in experiments, using traditional plasticity models of the mean response and recently developed uncertainty quantification (UQ) techniques. Lastly, we demonstrate that the new method provides predictive realizations that are superior to more traditional ones, and we show how these UQ techniques can be used in model selection and in assessing the quality of calibrated physical parameters.

  6. Epithelial–mesenchymal transition biomarkers and support vector machine guided model in preoperatively predicting regional lymph node metastasis for rectal cancer

    PubMed Central

    Fan, X-J; Wan, X-B; Huang, Y; Cai, H-M; Fu, X-H; Yang, Z-L; Chen, D-K; Song, S-X; Wu, P-H; Liu, Q; Wang, L; Wang, J-P

    2012-01-01

    Background: Current imaging modalities are inadequate for preoperatively predicting regional lymph node metastasis (RLNM) status in rectal cancer (RC). Here, we designed a support vector machine (SVM) model to address this issue by integrating epithelial–mesenchymal transition (EMT)-related biomarkers along with clinicopathological variables. Methods: Using tissue microarrays and immunohistochemistry, the expression of EMT-related biomarkers was measured in 193 RC patients. Of these, 74 patients were assigned to the training set, used to select robust variables for designing the SVM model. The SVM model's predictive value was validated in the testing set (119 patients). Results: In the training set, eight variables, including six EMT-related biomarkers and two clinicopathological variables, were selected to devise the SVM model. In the testing set, we identified 63 patients at high risk of RLNM and 56 patients at low risk. The sensitivity, specificity and overall accuracy of the SVM in predicting RLNM were 68.3%, 81.1% and 72.3%, respectively. Importantly, multivariate logistic regression analysis showed that the SVM model was indeed an independent predictor of RLNM status (odds ratio, 11.536; 95% confidence interval, 4.113–32.361; P<0.0001). Conclusion: Our SVM-based model displayed moderately strong predictive power in defining RLNM status in RC patients, providing an important approach to selecting the RLNM high-risk subgroup for neoadjuvant chemoradiotherapy. PMID:22538975

  7. Individual treatment selection for patients with posttraumatic stress disorder.

    PubMed

    Deisenhofer, Anne-Katharina; Delgadillo, Jaime; Rubel, Julian A; Böhnke, Jan R; Zimmermann, Dirk; Schwartz, Brian; Lutz, Wolfgang

    2018-04-16

    Trauma-focused cognitive behavioral therapy (Tf-CBT) and eye movement desensitization and reprocessing (EMDR) are two highly effective treatment options for posttraumatic stress disorder (PTSD). Yet, on an individual level, PTSD patients vary substantially in treatment response. The aim of this paper is to test the application of a treatment selection method based on a personalized advantage index (PAI). The study used clinical data for patients accessing treatment for PTSD in a primary care mental health service in the north of England. PTSD patients received either EMDR (N = 75) or Tf-CBT (N = 242). The Patient Health Questionnaire (PHQ-9) was used as an outcome measure for depressive symptoms associated with PTSD. Variables predicting differential treatment response were identified using an automated variable selection approach (a genetic algorithm) and afterwards included in regression models, allowing the calculation of each patient's PAI. Age, employment status, gender, and functional impairment were identified as relevant variables for Tf-CBT. For EMDR, baseline depressive symptoms as well as prescribed antidepressant medication were selected as predictor variables. Fifty-six percent of the patients (n = 125) had a PAI equal to or higher than one standard deviation. Of those patients, 62 (50%) did not receive their model-predicted treatment and could have benefited from a treatment assignment based on the PAI. Using a PAI-based algorithm has the potential to improve clinical decision making and to enhance individual patient outcomes, although further replication is necessary before such an approach can be implemented in prospective studies. © 2018 Wiley Periodicals, Inc.
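
    A sketch of the PAI idea: fit one outcome model per treatment arm and score a new patient by the difference between the two predictions. Plain linear regression and the simulated numpy inputs are assumptions; the study's arm-specific models used the GA-selected predictors listed above.

        import numpy as np
        from sklearn.linear_model import LinearRegression

        def personalized_advantage(X, y, arm, x_new):
            """PAI for one patient: predicted EMDR outcome minus predicted
            Tf-CBT outcome (lower PHQ-9 = better, so a negative value
            favors EMDR)."""
            m_e = LinearRegression().fit(X[arm == "EMDR"], y[arm == "EMDR"])
            m_t = LinearRegression().fit(X[arm == "TfCBT"], y[arm == "TfCBT"])
            x = x_new.reshape(1, -1)
            return m_e.predict(x)[0] - m_t.predict(x)[0]

        rng = np.random.default_rng(7)
        X = rng.normal(size=(317, 4))         # e.g. GA-selected predictors
        arm = rng.choice(["EMDR", "TfCBT"], size=317)
        y = (X @ np.array([0.5, -0.3, 0.2, 0.4])
             + (arm == "EMDR") * X[:, 0] + rng.normal(size=317))
        print(personalized_advantage(X, y, arm, X[0]))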

  8. Computation of Sensitivity Derivatives of Navier-Stokes Equations using Complex Variables

    NASA Technical Reports Server (NTRS)

    Vatsa, Veer N.

    2004-01-01

    Accurate computation of sensitivity derivatives is becoming an important item in Computational Fluid Dynamics (CFD) because of recent emphasis on using nonlinear CFD methods in aerodynamic design, optimization, stability and control related problems. Several techniques are available to compute gradients or sensitivity derivatives of desired flow quantities or cost functions with respect to selected independent (design) variables. Perhaps the oldest and most common method is to use straightforward finite differences for the evaluation of sensitivity derivatives. Although very simple, this method is prone to errors associated with the choice of step sizes and can be cumbersome for geometric variables. The cost per design variable for computing sensitivity derivatives with central differencing is at least equal to the cost of three full analyses, but is usually much larger in practice due to the difficulty in choosing step sizes. Another approach gaining popularity is the use of Automatic Differentiation software (such as ADIFOR) to process the source code, which in turn can be used to evaluate the sensitivity derivatives of preselected functions with respect to chosen design variables. In principle, this approach is also very straightforward and quite promising. The main drawback is the large memory requirement, because memory use increases linearly with the number of design variables. ADIFOR software can also be cumbersome for large CFD codes and has not yet reached full maturity for production codes, especially in parallel computing environments.
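
    The complex-variable technique the title refers to is commonly realized as the complex-step derivative, which avoids the subtractive cancellation that makes finite-difference step sizes so delicate; a minimal sketch (the test function is an arbitrary example, not from the report).

        import numpy as np

        def complex_step(f, x, h=1e-20):
            """f'(x) ~= Im(f(x + i*h)) / h: no subtraction of nearly equal
            numbers, so h can be tiny and accuracy near machine precision."""
            return np.imag(f(x + 1j * h)) / h

        f = lambda x: np.exp(x) / np.sqrt(np.sin(x) ** 3 + np.cos(x) ** 3)
        print(complex_step(f, 0.5))                      # complex step
        print((f(0.5 + 1e-8) - f(0.5 - 1e-8)) / 2e-8)    # central difference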

  9. A soft computing based approach using modified selection strategy for feature reduction of medical systems.

    PubMed

    Zuhtuogullari, Kursat; Allahverdi, Novruz; Arikan, Nihat

    2013-01-01

    Systems with high-dimensional input spaces require long processing times and large memory usage. Most attribute selection algorithms suffer from limits on input dimensionality and from information storage problems. These problems are eliminated by the feature reduction software developed here, which uses a new modified selection mechanism that adds solution candidates from the middle region. The hybrid system software is constructed for reducing the input attributes of systems with a large number of input variables. The designed software also supports the roulette wheel selection mechanism. Linear order crossover is used as the recombination operator. In genetic algorithm based soft computing methods, premature convergence to local solutions is also a problem, and it is eliminated by the developed software. Faster and more effective results are obtained in the test procedures. Twelve input variables of the urological system have been reduced to reducts (reduced input attribute sets) with seven, six, and five elements. It can be seen from the obtained results that the developed software with modified selection has advantages in memory allocation, execution time, classification accuracy, sensitivity, and specificity when compared with other reduction algorithms on the urological test data.
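
    A sketch of the roulette-wheel (fitness-proportionate) selection mechanism mentioned above; non-negative fitness values are assumed, and the example numbers are arbitrary.

        import numpy as np

        def roulette_wheel(fitness, n_parents, rng=None):
            """Sample parent indices with probability proportional to
            fitness: p_i = fitness_i / sum(fitness)."""
            rng = rng or np.random.default_rng()
            fitness = np.asarray(fitness, dtype=float)
            return rng.choice(fitness.size, size=n_parents,
                              p=fitness / fitness.sum())

        print(roulette_wheel([4.0, 1.0, 3.0, 2.0], n_parents=6))
        # indices are biased toward the fitter individuals (0 and 2)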

  10. The Schaake shuffle: A method for reconstructing space-time variability in forecasted precipitation and temperature fields

    USGS Publications Warehouse

    Clark, M.R.; Gangopadhyay, S.; Hay, L.; Rajagopalan, B.; Wilby, R.

    2004-01-01

    A number of statistical methods that are used to provide local-scale ensemble forecasts of precipitation and temperature do not contain realistic spatial covariability between neighboring stations or realistic temporal persistence for subsequent forecast lead times. To demonstrate this point, output from a global-scale numerical weather prediction model is used in a stepwise multiple linear regression approach to downscale precipitation and temperature to individual stations located in and around four study basins in the United States. Output from the forecast model is downscaled for lead times up to 14 days. Residuals in the regression equation are modeled stochastically to provide 100 ensemble forecasts. The precipitation and temperature ensembles from this approach have a poor representation of the spatial variability and temporal persistence. The spatial correlations for downscaled output are considerably lower than observed spatial correlations at short forecast lead times (e.g., less than 5 days) when there is high accuracy in the forecasts. At longer forecast lead times, the downscaled spatial correlations are close to zero. Similarly, the observed temporal persistence is only partly present at short forecast lead times. A method is presented for reordering the ensemble output in order to recover the space-time variability in precipitation and temperature fields. In this approach, the ensemble members for a given forecast day are ranked and matched with the rank of precipitation and temperature data from days randomly selected from similar dates in the historical record. The ensembles are then reordered to correspond to the original order of the selection of historical data. Using this approach, the observed intersite correlations, intervariable correlations, and the observed temporal persistence are almost entirely recovered. This reordering methodology also has applications for recovering the space-time variability in modeled streamflow. © 2004 American Meteorological Society.
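
    A sketch of the reordering step for a single site and variable, following the rank-matching rule described above; in practice the historical trajectories are drawn from dates similar to the forecast dates.

        import numpy as np

        def schaake_shuffle(ensemble, historical):
            """Reorder each day's ensemble so its rank structure matches
            the historical trajectories. Both inputs are (n_members,
            n_days) arrays; output member k follows the day-to-day rank
            pattern of historical trajectory k, restoring temporal
            persistence (and, across sites, spatial correlation)."""
            out = np.empty_like(ensemble, dtype=float)
            for d in range(ensemble.shape[1]):
                ranks = np.argsort(np.argsort(historical[:, d]))
                out[:, d] = np.sort(ensemble[:, d])[ranks]
            return out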

  11. A GPU-Based Implementation of the Firefly Algorithm for Variable Selection in Multivariate Calibration Problems

    PubMed Central

    de Paula, Lauro C. M.; Soares, Anderson S.; de Lima, Telma W.; Delbem, Alexandre C. B.; Coelho, Clarimar J.; Filho, Arlindo R. G.

    2014-01-01

    Several variable selection algorithms in multivariate calibration can be accelerated using Graphics Processing Units (GPU). Among these algorithms, the Firefly Algorithm (FA) is a recently proposed metaheuristic that may be used for variable selection. This paper presents a GPU-based FA (FA-MLR) with a multiobjective formulation for variable selection in multivariate calibration problems and compares it with some traditional sequential algorithms in the literature. The advantage of the proposed implementation is demonstrated in an example involving a relatively large number of variables. The results showed that the FA-MLR, in comparison with the traditional algorithms, is a more suitable choice and a relevant contribution for the variable selection problem. Additionally, the results also demonstrated that the FA-MLR performed on a GPU can be five times faster than its sequential implementation. PMID:25493625

  12. An Assessment of Farmers' Willingness to Pay for Extension Services Using the Contingent Valuation Method (CVM): The Case of Oyo State, Nigeria

    ERIC Educational Resources Information Center

    Ajayi, A. O.

    2006-01-01

    This study assessed farmers' willingness to pay (WTP) for extension services. The Contingent Valuation Method (CVM) was used to assess the amount which farmers are willing to pay. Primary data on the demographic, socio-economic variables of farmers and their WTP were collected from 228 farmers selected randomly in a stage-wise sampling procedure…

  13. Effects of Three Sets of Instructional Strategies and Three Demographic Variables on the Food and Nutrition Test Performance of Some Jamaican Tenth-Graders

    ERIC Educational Resources Information Center

    Kelly, Dennis; Soyibo, Kola

    2005-01-01

    This study was designed to find out if students taught food and nutrition concepts using the lecture method and practical work would perform significantly better than their counterparts taught with the lecture and teacher demonstrations and the lecture method only. The sample comprised 114 Jamaican 10th-graders (56 boys, 58 girls) selected from…

  14. Effects of Variable Resistance Training on Maximal Strength: A Meta-Analysis.

    PubMed

    Soria-Gila, Miguel A; Chirosa, Ignacio J; Bautista, Iker J; Baena, Salvador; Chirosa, Luis J

    2015-11-01

    Variable resistance training (VRT) methods improve the rate of force development, the coordination between antagonist and synergist muscles, and the recruitment of motor units, and reduce the drop in force produced in the sticking region. However, the beneficial effects of long-term VRT on maximal strength, both in athletes and in untrained individuals, have been much disputed. The purpose of this study was to compare, in a meta-analysis, the effects of a long-term (≥7 weeks) VRT program using chains or elastic bands with those of a similar constant-resistance program, in both trained adults practicing different sports and untrained individuals. Intervention effect sizes were compared among investigations meeting our selection and inclusion criteria using a random-effects model. The published studies considered were those addressing VRT effects on the 1 repetition maximum. Seven studies involving 235 subjects fulfilled the selection and inclusion criteria. Variable resistance training led to a significantly greater mean strength gain (weighted mean difference: 5.03 kg; 95% confidence interval: 2.26-7.80 kg; Z = 3.55; p < 0.001) than the gain recorded in response to conventional weight training. Long-term VRT using chains or elastic bands attached to the barbell emerged as an effective evidence-based method of improving maximal strength both in athletes with different sports backgrounds and in untrained subjects.
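
    A sketch of random-effects pooling of weighted mean differences (in the standard DerSimonian-Laird form, which the abstract does not spell out); the per-study numbers below are made up for illustration and are not the seven studies' data.

        import numpy as np

        def random_effects_wmd(diffs, ses):
            """Pool per-study mean differences (kg) with standard errors."""
            d, v = np.asarray(diffs), np.asarray(ses) ** 2
            w = 1 / v                                  # fixed-effect weights
            q = np.sum(w * (d - np.sum(w * d) / w.sum()) ** 2)
            c = w.sum() - np.sum(w ** 2) / w.sum()
            tau2 = max(0.0, (q - (d.size - 1)) / c)    # between-study var.
            w_re = 1 / (v + tau2)                      # random-effects weights
            est = np.sum(w_re * d) / w_re.sum()
            se = np.sqrt(1 / w_re.sum())
            return est, (est - 1.96 * se, est + 1.96 * se)

        est, ci = random_effects_wmd([6.0, 3.5, 5.2, 4.8, 7.1, 2.9, 5.6],
                                     [2.0, 1.8, 2.5, 1.5, 3.0, 2.2, 1.9])
        print(f"pooled WMD {est:.2f} kg (95% CI {ci[0]:.2f} to {ci[1]:.2f})")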

  15. The Smile Esthetic Index (SEI): A method to measure the esthetics of the smile. An intra-rater and inter-rater agreement study.

    PubMed

    Rotundo, Roberto; Nieri, Michele; Bonaccini, Daniele; Mori, Massimiliano; Lamberti, Elena; Massironi, Domenico; Giachetti, Luca; Franchi, Lorenzo; Venezia, Piero; Cavalcanti, Raffaele; Bondi, Elena; Farneti, Mauro; Pinchi, Vilma; Buti, Jacopo

    2015-01-01

    To propose a method to measure the esthetics of the smile and to report its validation by means of an intra-rater and inter-rater agreement analysis. Ten variables were chosen as determinants of the esthetics of a smile: smile line, facial midline, tooth alignment, tooth deformity, tooth dischromy, gingival dischromy, gingival recession, gingival excess, gingival scars and diastema/missing papillae. One examiner consecutively selected seventy frontal-view smile pictures. Ten examiners, with different levels of clinical experience and different specialties, applied the proposed assessment method twice on the selected pictures, independently and blindly. Intraclass correlation coefficient (ICC) and Fleiss' kappa statistics were used to analyse the intra-rater and inter-rater agreement. Considering the cumulative assessment of the Smile Esthetic Index (SEI), the ICC value for the inter-rater agreement of the 10 examiners was 0.62 (95% CI: 0.51 to 0.72), representing substantial agreement. Intra-rater agreement ranged from 0.86 to 0.99. Inter-rater agreement (Fleiss' kappa statistics) calculated for each variable ranged from 0.17 to 0.75. The SEI is a reproducible method for assessing the esthetic component of the smile, useful for the diagnostic phase and for setting appropriate treatment plans.

  16. Continuous-time discrete-space models for animal movement

    USGS Publications Warehouse

    Hanks, Ephraim M.; Hooten, Mevin B.; Alldredge, Mat W.

    2015-01-01

    The processes influencing animal movement and resource selection are complex and varied. Past efforts to model behavioral changes over time used Bayesian statistical models with variable parameter space, such as reversible-jump Markov chain Monte Carlo approaches, which are computationally demanding and inaccessible to many practitioners. We present a continuous-time discrete-space (CTDS) model of animal movement that can be fit using standard generalized linear modeling (GLM) methods. This CTDS approach allows for the joint modeling of location-based as well as directional drivers of movement. Changing behavior over time is modeled using a varying-coefficient framework which maintains the computational simplicity of a GLM approach, and variable selection is accomplished using a group lasso penalty. We apply our approach to a study of two mountain lions (Puma concolor) in Colorado, USA.

  17. Firefly algorithm versus genetic algorithm as powerful variable selection tools and their effect on different multivariate calibration models in spectroscopy: A comparative study.

    PubMed

    Attia, Khalid A M; Nassar, Mohammed W I; El-Zeiny, Mohamed B; Serag, Ahmed

    2017-01-05

    For the first time, a new variable selection method based on swarm intelligence, namely the firefly algorithm, is coupled with three different multivariate calibration models, namely concentration residual augmented classical least squares, artificial neural network and support vector regression, applied to UV spectral data. A comparative study between the firefly algorithm and the well-known genetic algorithm was carried out. The results revealed the superiority of this new, powerful algorithm over the genetic algorithm. Moreover, different statistical tests were performed and no significant differences were found among the models regarding their predictive ability. This ensures that simpler and faster models were obtained without any deterioration of the quality of the calibration. Copyright © 2016 Elsevier B.V. All rights reserved.

  18. Multi-objective flexible job shop scheduling problem using variable neighborhood evolutionary algorithm

    NASA Astrophysics Data System (ADS)

    Wang, Chun; Ji, Zhicheng; Wang, Yan

    2017-07-01

    In this paper, the multi-objective flexible job shop scheduling problem (MOFJSP) was studied with the objectives of minimizing makespan, total workload and critical workload. A variable neighborhood evolutionary algorithm (VNEA) was proposed to obtain a set of Pareto optimal solutions. First, two novel crowding operators, defined in terms of the decision space and the objective space, were proposed and used in mating selection and environmental selection, respectively. Then, two well-designed neighborhood structures were used in the local search; they take the problem characteristics into account and maintain fast convergence. Finally, an extensive comparison was carried out with state-of-the-art methods specifically proposed for solving the MOFJSP on well-known benchmark instances. The results show that the proposed VNEA is more effective than the other algorithms in solving the MOFJSP.

  1. Quantitative structure-activity relationships studies of CCR5 inhibitors and toxicity of aromatic compounds using gene expression programming.

    PubMed

    Shi, Weimin; Zhang, Xiaoya; Shen, Qi

    2010-01-01

    Quantitative structure-activity relationship (QSAR) studies of the chemokine receptor 5 (CCR5) binding affinity of substituted 1-(3,3-diphenylpropyl)-piperidinyl amides and ureas and of the toxicity of aromatic compounds have been performed. Gene expression programming (GEP) was used to select variables and simultaneously produce nonlinear QSAR models from the selected variables. In our GEP implementation, a simple and convenient method is proposed to infer the K-expression from the number of arguments (arity) of each function in a gene, without building the expression tree. The results were compared to those obtained by an artificial neural network (ANN) and a support vector machine (SVM). It has been demonstrated that GEP is a useful tool for QSAR modeling. Copyright 2009 Elsevier Masson SAS. All rights reserved.
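
    A sketch of the arity-counting idea: scanning the gene left to right and tracking how many arguments are still owed locates the end of the expressed K-expression without building the expression tree. The function set and example gene are assumptions.

        ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "Q": 1}  # Q = square root

        def expressed_length(gene):
            """Terminals (e.g. 'a', 'b') take no arguments; each function
            symbol adds (arity - 1) to the outstanding-argument count."""
            need = 1            # symbols still required for a complete tree
            for i, sym in enumerate(gene):
                need += ARITY.get(sym, 0) - 1
                if need == 0:
                    return i + 1          # K-expression is gene[:i + 1]
            raise ValueError("gene too short to express a complete tree")

        print(expressed_length("+*Qaabbab"))  # -> 6: '+*Qaab' is expressed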

  2. A Cautious Note on Auxiliary Variables That Can Increase Bias in Missing Data Problems.

    PubMed

    Thoemmes, Felix; Rose, Norman

    2014-01-01

    The treatment of missing data in the social sciences has changed tremendously during the last decade. Modern missing data techniques such as multiple imputation and full-information maximum likelihood are used much more frequently. These methods assume that data are missing at random. One very common approach to increasing the likelihood that the missing-at-random assumption holds consists of including many covariates as so-called auxiliary variables. These variables are either included based on data considerations or in an inclusive fashion, that is, taking all available auxiliary variables. In this article, we point out that there are some instances in which auxiliary variables exhibit the surprising property of increasing bias in missing data problems. In a series of focused simulation studies, we highlight some situations in which this type of biasing behavior can occur. We briefly discuss possible ways to avoid selecting bias-inducing covariates as auxiliary variables.

  3. Predicting Speech Intelligibility with a Multiple Speech Subsystems Approach in Children with Cerebral Palsy

    ERIC Educational Resources Information Center

    Lee, Jimin; Hustad, Katherine C.; Weismer, Gary

    2014-01-01

    Purpose: Speech acoustic characteristics of children with cerebral palsy (CP) were examined with a multiple speech subsystems approach; speech intelligibility was evaluated using a prediction model in which acoustic measures were selected to represent three speech subsystems. Method: Nine acoustic variables reflecting different subsystems, and…

  4. Selective Attention, Anxiety, Depressive Symptomatology and Academic Performance in Adolescents

    ERIC Educational Resources Information Center

    Fernandez-Castillo, Antonio; Gutierrez-Rojas, Maria Esperanza

    2009-01-01

    Introduction: In this cross-sectional, descriptive research we studied the relation between three psychological variables (anxiety, depression and attention) in order to analyze their possible association with and predictive power for academic achievement (as expressed in school grades) in a sample of secondary students. Method: For this purpose…

  5. Juvenile Offender Recidivism: An Examination of Risk Factors

    ERIC Educational Resources Information Center

    Calley, Nancy G.

    2012-01-01

    One hundred and seventy three male juvenile offenders were followed two years postrelease from a residential treatment facility to assess recidivism and factors related to recidivism. The overall recidivism rate was 23.9%. Logistic regression with stepwise and backward variable selection methods was used to examine the relationship between…

  6. Reconsidering the evolution of brain, cognition, and behavior in birds and mammals

    PubMed Central

    Willemet, Romain

    2013-01-01

    Despite decades of research, some of the most basic issues concerning the extraordinarily complex brains and behavior of birds and mammals, such as the factors responsible for the diversity of brain size and composition, are still unclear. This is partly due to a number of conceptual and methodological issues. Determining species and group differences in brain composition requires accounting for the presence of taxon-cerebrotypes and the use of precise statistical methods. The role of allometry in determining brain variables should be revised. In particular, bird and mammalian brains appear to have evolved in response to a variety of selective pressures influencing both brain size and composition. "Brain" and "cognition" are indeed meta-variables, made up of the variables that are ecologically relevant and evolutionarily selected. External indicators of species differences in cognition and behavior are limited by the complexity of these differences. Indeed, behavioral differences between species and individuals are caused by cognitive and affective components. Although intra-species variability forms the basis of species evolution, some of the mechanisms underlying individual differences in brain and behavior appear to differ from those between species. While many issues have persisted over the years because of a lack of appropriate data or methods to test them, several fallacies, particularly those related to the human brain, reflect scientists' preconceptions. The theoretical framework on the evolution of brain, cognition, and behavior in birds and mammals should be reconsidered with these biases in mind. PMID:23847570

  7. A New Approach to Extreme Value Estimation Applicable to a Wide Variety of Random Variables

    NASA Technical Reports Server (NTRS)

    Holland, Frederic A., Jr.

    1997-01-01

    Designing reliable structures requires an estimate of the maximum and minimum values (i.e., strength and load) that may be encountered in service. Yet designs based on very extreme values (to ensure safety) can result in extra material usage and hence uneconomic systems. In aerospace applications, severe over-design cannot be tolerated, making it almost mandatory to design closer to the assumed limits of the design random variables. The issue then is predicting extreme values that are practical, i.e., neither overly conservative nor nonconservative. Obtaining design values by employing safety factors is well known to often result in overly conservative designs. Safety factor values have historically been selected rather arbitrarily, often lacking a sound rational basis. Answering the question of how safe a design needs to be has led design theorists to probabilistic and statistical methods. The so-called three-sigma approach is one such method and has been described as the first step in utilizing information about the data dispersion. However, this method is based on the assumption that the random variable is dispersed symmetrically about the mean and is essentially limited to normally distributed random variables. Use of this method can therefore result in unsafe or overly conservative design allowables if the common assumption of normality is incorrect.
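
    A small illustration of the limitation noted above: for a skewed random variable, the three-sigma value neither matches the quantile it targets under normality nor bounds the exceedance probability. The lognormal parameters are arbitrary.

        import numpy as np
        from scipy import stats

        load = stats.lognorm(s=0.6, scale=10.0)   # right-skewed "load"
        mu, sigma = load.mean(), load.std()

        three_sigma = mu + 3 * sigma              # symmetric-dispersion rule
        target_q = load.ppf(stats.norm.cdf(3))    # quantile +3 sigma targets
        print(f"three-sigma value : {three_sigma:.1f}")
        print(f"targeted quantile : {target_q:.1f}")
        print(f"P(load > 3-sigma) = {load.sf(three_sigma):.1e}"
              "  (normal theory: 1.3e-03)")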

  8. Regional flood-frequency relations for streams with many years of no flow

    USGS Publications Warehouse

    Hjalmarson, Hjalmar W.; Thomas, Blakemore E.; ,

    1990-01-01

    In the southwestern United States, flood-frequency relations for streams that drain small arid basins are difficult to estimate, largely because of the extreme temporal and spatial variability of floods and the many years of no flow. A method based on the station-year approach is proposed that produces regional flood-frequency relations using all available annual peak-discharge data. The prediction errors for the relations are directly assessed using randomly selected subsamples of the annual peak discharges.

  9. Thin layer activation techniques at the U-120 cyclotron of Bucharest

    NASA Astrophysics Data System (ADS)

    Constantinescu, B.; Ivanov, E. A.; Pascovici, G.; Popa-Simil, L.; Racolta, P. M.

    1994-05-01

    The Thin Layer Activation (TLA) technique is a nuclear method used especially for different types of wear (or corrosion) investigations. Experimental results for selection criteria of nuclear reactions for various tribological studies, using the IPNE U-120 classical variable-energy cyclotron, are presented. Measuring methods for the main types of wear phenomena and home-made instrumentation dedicated to TLA industrial applications are also reported. Some typical TLA tribological applications, including a nuclear scanning method to obtain the wear profile of piston rings, are presented as well.

  10. Mutual information-based feature selection for radiomics

    NASA Astrophysics Data System (ADS)

    Oubel, Estanislao; Beaumont, Hubert; Iannessi, Antoine

    2016-03-01

    Background The extraction and analysis of image features (radiomics) is a promising field in the precision medicine era, with applications to prognosis, prediction, and quantification of response to treatment. In this work, we present a mutual information-based method for quantifying the reproducibility of features, a necessary step for qualification before their inclusion in big data systems. Materials and Methods Ten patients with Non-Small Cell Lung Cancer (NSCLC) lesions were followed over time (7 time points on average) with Computed Tomography (CT). Five observers segmented lesions by using a semi-automatic method, and 27 features describing shape and intensity distribution were extracted. Inter-observer reproducibility was assessed by computing the multi-information (MI) of feature changes over time, and the variability of global extrema. Results The highest MI values were obtained for volume-based features (VBF). The lesion mass (M), surface to volume ratio (SVR) and volume (V) presented statistically significantly higher values of MI than the rest of the features. Within the same VBF group, SVR also showed the lowest variability of extrema. The correlation coefficient (CC) of feature values was unable to differentiate between features. Conclusions MI allowed discrimination of three features (M, SVR, and V) from the rest in a statistically significant manner. This result is consistent with the order obtained when sorting features by increasing values of extrema variability. MI is a promising alternative for selecting features to be considered as surrogate biomarkers in a precision medicine context.
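
    The following minimal sketch illustrates the general idea of scoring a feature's inter-observer reproducibility with mutual information. It is a simplified two-observer stand-in for the paper's multi-information across five observers, and it uses synthetic data and scikit-learn's mutual_info_regression estimator rather than the authors' exact estimator.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

# Simplified two-observer sketch: estimate the mutual information between the
# longitudinal changes of one radiomic feature as measured by two observers.
# Synthetic placeholders for the real multi-observer, multi-time-point data.
rng = np.random.default_rng(1)
true_change = rng.normal(size=70)                     # e.g. 10 lesions x 7 time points
obs_a = true_change + rng.normal(scale=0.1, size=70)  # a reproducible feature
obs_b = true_change + rng.normal(scale=0.1, size=70)

mi = mutual_info_regression(obs_a.reshape(-1, 1), obs_b, random_state=0)[0]
print(f"estimated MI between observers: {mi:.2f} nats")  # high MI -> reproducible
```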

  11. Clustering and variable selection in the presence of mixed variable types and missing data.

    PubMed

    Storlie, C B; Myers, S M; Katusic, S K; Weaver, A L; Voigt, R G; Croarkin, P E; Stoeckel, R E; Port, J D

    2018-05-17

    We consider the problem of model-based clustering in the presence of many correlated, mixed continuous and discrete variables, some of which may have missing values. Discrete variables are treated with a latent continuous variable approach, and the Dirichlet process is used to construct a mixture model with an unknown number of components. Variable selection is also performed to identify the variables that are most influential for determining cluster membership. The work is motivated by the need to cluster patients thought to potentially have autism spectrum disorder on the basis of many cognitive and/or behavioral test scores. There are a modest number of patients (486) in the data set along with many (55) test score variables (many of which are discrete valued and/or missing). The goal of the work is to (1) cluster these patients into similar groups to help identify those with similar clinical presentation and (2) identify a sparse subset of tests that inform the clusters in order to eliminate unnecessary testing. The proposed approach compares very favorably with other methods via simulation of problems of this type. The results of the autism spectrum disorder analysis suggested 3 clusters to be most likely, while only 4 test scores had high (>0.5) posterior probability of being informative. This will result in much more efficient and informative testing. The need to cluster observations on the basis of many correlated, continuous/discrete variables with missing values is a common problem in the health sciences as well as in many other disciplines. Copyright © 2018 John Wiley & Sons, Ltd.

  12. Correlations of Handgrip Strength with Selected Hand-Arm-Anthropometric Variables in Indian Inter-university Female Volleyball Players

    PubMed Central

    Koley, Shyamal; Pal Kaur, Satinder

    2011-01-01

    Purpose The purpose of this study was to estimate the dominant handgrip strength and its correlations with some hand and arm anthropometric variables in 101 randomly selected Indian inter-university female volleyball players aged 18-25 years (mean age 20.52±1.40) from six Indian universities. Methods Three anthropometric variables, i.e., height, weight, and BMI; two hand anthropometric variables, viz., right and left hand width and length; four arm anthropometric variables, i.e., upper arm length, lower arm length, upper extremity length, and upper arm circumference; and dominant (right) and non-dominant (left) handgrip strength were measured among Indian inter-university female volleyball players by standard anthropometric techniques. Results The findings of the present study indicated that Indian female volleyball players had higher mean values in eleven variables and lower mean values in two variables than their control counterparts, showing significant differences (P<0.032-0.001) in height (t=2.63), weight (t=8.66), left hand width (t=2.10), left and right hand length (t=9.99 and 10.40, respectively), right upper arm length (t=8.48), right forearm length (t=5.41), and dominant (right) and non-dominant (left) handgrip strength (t=9.37 and 6.76, respectively). In female volleyball players, dominant handgrip strength had significantly positive correlations (P=0.01) with all the variables studied. Conclusion It may be concluded that dominant handgrip strength had strong positive correlations with all the variables studied in Indian inter-university female volleyball players. PMID:22375242

  13. Computed statistics at streamgages, and methods for estimating low-flow frequency statistics and development of regional regression equations for estimating low-flow frequency statistics at ungaged locations in Missouri

    USGS Publications Warehouse

    Southard, Rodney E.

    2013-01-01

    The weather and precipitation patterns in Missouri vary considerably from year to year. In 2008, the statewide average rainfall was 57.34 inches and in 2012, the statewide average rainfall was 30.64 inches. This variability in precipitation and resulting streamflow in Missouri underlies the necessity for water managers and users to have reliable streamflow statistics and a means to compute selected statistics at ungaged locations for a better understanding of water availability. Knowledge of surface-water availability is dependent on the streamflow data that have been collected and analyzed by the U.S. Geological Survey for more than 100 years at approximately 350 streamgages throughout Missouri. The U.S. Geological Survey, in cooperation with the Missouri Department of Natural Resources, computed streamflow statistics at streamgages through the 2010 water year, defined periods of drought, defined methods to estimate streamflow statistics at ungaged locations, and developed regional regression equations to compute selected streamflow statistics at ungaged locations. Streamflow statistics and flow durations were computed for 532 streamgages in Missouri and in neighboring States. For streamgages with more than 10 years of record, Kendall’s tau was computed to evaluate for trends in streamflow data. If trends were detected, the variable length method was used to define the period of no trend. Water years were removed from the dataset from the beginning of the record for a streamgage until no trend was detected. Low-flow frequency statistics were then computed for the entire period of record and for the period of no trend if 10 or more years of record were available for each analysis. Three methods are presented for computing selected streamflow statistics at ungaged locations. The first method uses power curve equations developed for 28 selected streams in Missouri and neighboring States that have multiple streamgages on the same streams. Statistics for one of these streams can be estimated at an ungaged location whose drainage area is between 40 percent of the drainage area of the farthest upstream streamgage and 150 percent of the drainage area of the farthest downstream streamgage along the stream of interest. The second method may be used on any stream with a streamgage that has operated for 10 years or longer and for which anthropogenic effects have not changed the low-flow characteristics at the ungaged location since collection of the streamflow data. A ratio of the drainage area of the stream at the ungaged location to the drainage area of the stream at the streamgage was computed to estimate the statistic at the ungaged location. The range of applicability is between 40 and 150 percent of the drainage area of the streamgage, and the ungaged location must be located on the same stream as the streamgage. The third method uses regional regression equations to estimate selected low-flow frequency statistics for unregulated streams in Missouri. This report presents regression equations to estimate frequency statistics for the 10-year recurrence interval and for the N-day durations of 1, 2, 3, 7, 10, 30, and 60 days. Basin and climatic characteristics were computed using geographic information system software and digital geospatial data. A total of 35 characteristics were computed for use in preliminary statewide and regional regression analyses based on existing digital geospatial data and previous studies.
Spatial analyses for geographical bias in the predictive accuracy of the regional regression equations defined three low-flow regions within the State, representing the three major physiographic provinces in Missouri. Region 1 includes the Central Lowlands, Region 2 includes the Ozark Plateaus, and Region 3 includes the Mississippi Alluvial Plain. A total of 207 streamgages were used in the regression analyses for the regional equations. Of the 207 U.S. Geological Survey streamgages, 77 were located in Region 1, 120 were located in Region 2, and 10 were located in Region 3. Streamgages located outside of Missouri were selected to extend the range of data used for the independent variables in the regression analyses. Streamgages included in the regression analyses had 10 or more years of record and were considered to be affected minimally by anthropogenic activities or trends. Regional regression analyses identified three characteristics as statistically significant for the development of regional equations. For Region 1, drainage area, longest flow path, and streamflow-variability index were statistically significant. The range in the standard error of estimate for Region 1 is 79.6 to 94.2 percent. For Region 2, drainage area and streamflow-variability index were statistically significant, and the range in the standard error of estimate is 48.2 to 72.1 percent. For Region 3, drainage area and streamflow-variability index also were statistically significant, with a range in the standard error of estimate of 48.1 to 96.2 percent. Limitations on the use of the methods for estimating low-flow frequency statistics at ungaged locations depend on the method used. The first method outlined for use in Missouri, power curve equations, was developed to estimate the selected statistics for ungaged locations on 28 selected streams with multiple streamgages located on the same stream. The second method uses a drainage-area ratio to compute statistics at an ungaged location using data from a single streamgage on the same stream with 10 or more years of record. Ungaged locations on these streams may use the ratio of the drainage area at the ungaged location to the drainage area at the streamgage location to scale the selected statistic value from the streamgage location to the ungaged location. This method can be used if the drainage area of the ungaged location is within 40 to 150 percent of the streamgage drainage area. The third method is the use of the regional regression equations. The limits for the use of these equations are based on the ranges of the characteristics used as independent variables and on the requirement that streams be affected minimally by anthropogenic activities.
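
    The drainage-area ratio method described above is simple enough to state in a few lines of code. The sketch below (Python; all numbers hypothetical) applies the ratio and enforces the 40- to 150-percent applicability range.

```python
def drainage_area_ratio_estimate(stat_gaged, area_gaged, area_ungaged):
    """Scale a low-flow statistic from a streamgage to an ungaged site on the
    same stream using the drainage-area ratio method described above.

    Applicable only when the ungaged drainage area is within 40 to 150 percent
    of the streamgage drainage area. All values here are hypothetical.
    """
    ratio = area_ungaged / area_gaged
    if not 0.40 <= ratio <= 1.50:
        raise ValueError("ungaged site outside the 40-150 percent range")
    return stat_gaged * ratio

# Example: a 7-day, 10-year low flow of 12.0 ft3/s at a gage draining 250 mi2,
# scaled to an ungaged site draining 180 mi2 on the same stream.
print(drainage_area_ratio_estimate(12.0, 250.0, 180.0))  # -> 8.64 ft3/s
```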

  14. A site specific model and analysis of the neutral somatic mutation rate in whole-genome cancer data.

    PubMed

    Bertl, Johanna; Guo, Qianyun; Juul, Malene; Besenbacher, Søren; Nielsen, Morten Muhlig; Hornshøj, Henrik; Pedersen, Jakob Skou; Hobolth, Asger

    2018-04-19

    Detailed modelling of the neutral mutational process in cancer cells is crucial for identifying driver mutations and understanding the mutational mechanisms that act during cancer development. The neutral mutational process is very complex: whole-genome analyses have revealed that the mutation rate differs between cancer types, between patients and along the genome depending on the genetic and epigenetic context. Therefore, methods that predict the number of different types of mutations in regions or specific genomic elements must consider local genomic explanatory variables. A major drawback of most methods is the need to average the explanatory variables across the entire region or genomic element. This procedure is particularly problematic if the explanatory variable varies dramatically in the element under consideration. To take into account the fine scale of the explanatory variables, we model the probabilities of different types of mutations for each position in the genome by multinomial logistic regression. We analyse 505 cancer genomes from 14 different cancer types and compare the performance of region-based and site-specific models in predicting mutation rates. We show that for 1000 randomly selected genomic positions, the site-specific model predicts the mutation rate much better than region-based models. We use a forward selection procedure to identify the most important explanatory variables. The procedure identifies site-specific conservation (phyloP), replication timing, and expression level as the best predictors for the mutation rate. Finally, our model confirms and quantifies certain well-known mutational signatures. We find that our site-specific multinomial regression model outperforms the region-based models. The possibility of including genomic variables on different scales and patient-specific variables makes it a versatile framework for studying different mutational mechanisms. Our model can serve as the neutral null model for the mutational process; regions that deviate from the null model are candidates for elements that drive cancer development.
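
    A minimal sketch of the modelling idea, using synthetic data: each genomic position contributes one row of site-specific explanatory variables, and a multinomial logistic regression predicts the probability of each mutation class at that position. The feature names and effect sizes below are illustrative placeholders, not values from the study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Site-specific idea: predict the mutation class at each genomic position from
# local explanatory variables rather than region-averaged values.
rng = np.random.default_rng(2)
n = 5000
X = np.column_stack([
    rng.normal(size=n),       # site-specific conservation (phyloP), synthetic
    rng.uniform(size=n),      # replication timing, synthetic
    rng.exponential(size=n),  # expression level, synthetic
])
# Classes: 0 = no mutation, 1 = C>T, 2 = other substitution (illustrative).
logits = np.column_stack([np.zeros(n), 0.8 * X[:, 0], 0.5 * X[:, 1]])
y = np.array([rng.choice(3, p=np.exp(l) / np.exp(l).sum()) for l in logits])

model = LogisticRegression(max_iter=1000).fit(X, y)
print(model.predict_proba(X[:3]))  # per-site probabilities of each mutation type
```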

  15. Comparing supervised learning methods for classifying sex, age, context and individual Mudi dogs from barking.

    PubMed

    Larrañaga, Ana; Bielza, Concha; Pongrácz, Péter; Faragó, Tamás; Bálint, Anna; Larrañaga, Pedro

    2015-03-01

    Barking is perhaps the most characteristic form of vocalization in dogs; however, very little is known about its role in the intraspecific communication of this species. Besides the obvious need for ethological research, both in the field and in the laboratory, the possible information content of barks can also be explored by computerized acoustic analyses. This study compares four different supervised learning methods (naive Bayes, classification trees, k-nearest neighbors and logistic regression) combined with three strategies for selecting variables (all variables, filter and wrapper feature subset selections) to classify Mudi dogs by sex, age, context and individual from their barks. The classification accuracy of the models obtained was estimated by means of k-fold cross-validation. Percentages of correct classifications were 85.13 % for determining sex, 80.25 % for predicting age (recodified as young, adult and old), 55.50 % for classifying contexts (seven situations) and 67.63 % for recognizing individuals (8 dogs), so the results are encouraging. The best-performing method was k-nearest neighbors following a wrapper feature selection approach. The results for classifying contexts and recognizing individual dogs were better with this method than they were for other approaches reported in the specialized literature. This is the first time that the sex and age of domestic dogs have been predicted with the help of sound analysis. This study shows that dog barks carry ample information regarding the caller's indexical features. Our computerized analysis provides indirect proof that barks may serve as an important source of information for dogs as well.
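
    The best-performing recipe reported above, k-nearest neighbors with a wrapper feature subset selection evaluated by k-fold cross-validation, can be sketched as follows with scikit-learn on synthetic stand-in data. A forward sequential search is used here as the wrapper; the authors' exact wrapper strategy may differ.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-ins for the acoustic bark features (4 classes, 20 features).
X, y = make_classification(n_samples=300, n_features=20, n_informative=6,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)
# Wrapper selection: forward search scored by 10-fold cross-validated accuracy.
selector = SequentialFeatureSelector(knn, n_features_to_select=6,
                                     direction="forward", cv=10)
selector.fit(X, y)

acc = cross_val_score(knn, selector.transform(X), y, cv=10).mean()
print(f"selected features: {selector.get_support(indices=True)}, CV acc: {acc:.2f}")
```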

  16. Pitfalls in statistical landslide susceptibility modelling

    NASA Astrophysics Data System (ADS)

    Schröder, Boris; Vorpahl, Peter; Märker, Michael; Elsenbeer, Helmut

    2010-05-01

    The use of statistical methods is a well-established approach to predict landslide occurrence probabilities and to assess landslide susceptibility. This is achieved by applying statistical methods relating historical landslide inventories to topographic indices as predictor variables. In our contribution, we compare several new and powerful methods developed in machine learning and well-established in landscape ecology and macroecology for predicting the distribution of shallow landslides in tropical mountain rainforests in southern Ecuador (among others: boosted regression trees, multivariate adaptive regression splines, maximum entropy). Although these methods are powerful, we think it is necessary to follow a basic set of guidelines to avoid some pitfalls regarding data sampling, predictor selection, and model quality assessment, especially if a comparison of different models is contemplated. We therefore suggest applying a novel toolbox to evaluate approaches to the statistical modelling of landslide susceptibility. Additionally, we propose some methods to open the "black box" as an inherent part of machine learning methods in order to achieve further explanatory insights into preparatory factors that control landslides. Sampling of training data should be guided by hypotheses regarding processes that lead to slope failure, taking into account their respective spatial scales. This approach leads to the selection of a set of candidate predictor variables considered on adequate spatial scales. This set should be checked for multicollinearity in order to facilitate interpretation of model response curves. Model quality assessment determines how well a model is able to reproduce independent observations of its response variable. This includes criteria to evaluate different aspects of model performance, i.e. model discrimination, model calibration, and model refinement. In order to assess a possible violation of the assumption of independence in the training samples or a possible lack of explanatory information in the chosen set of predictor variables, the model residuals need to be checked for spatial autocorrelation. Therefore, we calculate spline correlograms. In addition to this, we investigate partial dependence plots and bivariate interaction plots considering possible interactions between predictors to improve model interpretation. Aiming at presenting this toolbox for model quality assessment, we investigate the influence of strategies in the construction of training datasets for statistical models on model quality.
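
    One common way to implement the multicollinearity check recommended above is to compute variance inflation factors (VIFs) for the candidate predictor set. The sketch below uses statsmodels and synthetic stand-ins for topographic predictors; the VIF threshold of 10 is a conventional rule of thumb, not a value from this contribution.

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Synthetic stand-ins for topographic predictor variables.
rng = np.random.default_rng(3)
n = 200
slope = rng.normal(size=n)
curvature = rng.normal(size=n)
wetness = 0.9 * slope + 0.1 * rng.normal(size=n)  # nearly collinear with slope

# Design matrix with an intercept column (needed for meaningful VIFs).
X = np.column_stack([np.ones(n), slope, curvature, wetness])
for i, name in enumerate(["slope", "curvature", "wetness"], start=1):
    vif = variance_inflation_factor(X, i)
    print(f"{name}: VIF = {vif:.1f}")  # VIF > 10 is a common collinearity warning
```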

  17. Selective spectroscopic imaging of hyperpolarized pyruvate and its metabolites using a single-echo variable phase advance method in balanced SSFP

    PubMed Central

    Varma, Gopal; Wang, Xiaoen; Vinogradov, Elena; Bhatt, Rupal S.; Sukhatme, Vikas; Seth, Pankaj; Lenkinski, Robert E.; Alsop, David C.; Grant, Aaron K.

    2015-01-01

    Purpose In balanced steady state free precession (bSSFP), the signal intensity has a well-known dependence on the off-resonance frequency, or, equivalently, the phase advance between successive radiofrequency (RF) pulses. The signal profile can be used to resolve the contributions from the spectrally separated metabolites. This work describes a method based on use of a variable RF phase advance to acquire spatial and spectral data in a time-efficient manner for hyperpolarized 13C MRI. Theory and Methods The technique relies on the frequency response from a bSSFP acquisition to acquire relatively rapid, high-resolution images that may be reconstructed to separate contributions from different metabolites. The ability to produce images from spectrally separated metabolites was demonstrated in-vitro, as well as in-vivo following administration of hyperpolarized 1-13C pyruvate in mice with xenograft tumors. Results In-vivo images of pyruvate, alanine, pyruvate hydrate and lactate were reconstructed from 4 images acquired in 2 seconds with an in-plane resolution of 1.25 × 1.25 mm² and a 5 mm slice thickness. Conclusions The phase advance method allowed acquisition of spectroscopically selective images with high spatial and temporal resolution. This method provides an alternative approach to hyperpolarized 13C spectroscopic MRI that can be combined with other techniques such as multi-echo or fluctuating equilibrium bSSFP. PMID:26507361

  18. Uncertainty Analysis of the NASA Glenn 8x6 Supersonic Wind Tunnel

    NASA Technical Reports Server (NTRS)

    Stephens, Julia; Hubbard, Erin; Walter, Joel; McElroy, Tyler

    2016-01-01

    This paper presents methods and results of a detailed measurement uncertainty analysis that was performed for the 8- by 6-foot Supersonic Wind Tunnel located at the NASA Glenn Research Center. The statistical methods and engineering judgments used to estimate elemental uncertainties are described. The Monte Carlo method of propagating uncertainty was selected to determine the uncertainty of calculated variables of interest. A detailed description of the Monte Carlo method as applied for this analysis is provided. Detailed uncertainty results for the uncertainty in average free stream Mach number as well as other variables of interest are provided. All results are presented as random (variation in observed values about a true value), systematic (potential offset between observed and true value), and total (random and systematic combined) uncertainty. The largest sources contributing to uncertainty are determined and potential improvement opportunities for the facility are investigated.
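
    The core of the Monte Carlo propagation step can be sketched in a few lines: sample each measured input from a distribution representing its elemental uncertainty, push the samples through the data-reduction equation, and summarize the spread of the result. The pressures, uncertainties, and the isentropic Mach relation below are illustrative assumptions, not the facility's actual data-reduction system.

```python
import numpy as np

# Monte Carlo uncertainty propagation sketch for a calculated variable (Mach
# number) derived from measured pressures. All numbers are illustrative.
rng = np.random.default_rng(4)
N = 100_000

p_total  = rng.normal(101_325.0, 150.0, N)  # Pa; elemental uncertainties lumped
p_static = rng.normal(18_800.0, 100.0, N)   # Pa; into one normal draw for brevity

# Isentropic relation for Mach number (gamma = 1.4):
#   p0/p = (1 + 0.2 M^2)^3.5  ->  M = sqrt(5 ((p0/p)^(2/7) - 1))
mach = np.sqrt(5.0 * ((p_total / p_static) ** (2.0 / 7.0) - 1.0))

print(f"M = {mach.mean():.4f} +/- {mach.std(ddof=1):.4f} (1-sigma)")
```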

  19. Implementations of geographically weighted lasso in spatial data with multicollinearity (Case study: Poverty modeling of Java Island)

    NASA Astrophysics Data System (ADS)

    Setiyorini, Anis; Suprijadi, Jadi; Handoko, Budhi

    2017-03-01

    Geographically Weighted Regression (GWR) is a regression model that takes into account the effect of spatial heterogeneity. In applications of GWR, inference on regression coefficients is often of interest, as is estimation and prediction of the response variable. Empirical research has demonstrated that local correlation between explanatory variables can lead to estimated regression coefficients in GWR that are strongly correlated, a condition named multicollinearity. This in turn results in large standard errors for the estimated regression coefficients and, hence, is problematic for inference on relationships between variables. Geographically Weighted Lasso (GWL) is a method capable of dealing with spatial heterogeneity and local multicollinearity in spatial data sets. GWL is a further development of the GWR method, which adds a LASSO (Least Absolute Shrinkage and Selection Operator) constraint to parameter estimation. In this study, GWL is applied using a fixed exponential kernel weights matrix to establish a poverty model of Java Island, Indonesia. The results of applying the GWL to the poverty datasets show that this method stabilizes regression coefficients in the presence of multicollinearity and produces lower prediction and estimation errors for the response variable than GWR does.
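
    A minimal sketch of the central GWL computation at a single regression point, under stated assumptions: observations are weighted by a fixed exponential kernel of their distance to the point, and a lasso is fitted with those weights so that locally redundant predictors are shrunk to zero. Bandwidth, penalty, and data below are illustrative placeholders, not the study's poverty data.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Geographically weighted lasso, one regression point: kernel-weight the
# observations, then fit a lasso with those sample weights.
rng = np.random.default_rng(5)
n, p = 120, 6
coords = rng.uniform(0, 100, size=(n, 2))            # observation locations
X = rng.normal(size=(n, p))                          # synthetic predictors
y = X @ np.array([1.5, -0.8, 0.0, 0.0, 0.6, 0.0]) + rng.normal(scale=0.5, size=n)

site = np.array([50.0, 50.0])                        # regression point
d = np.linalg.norm(coords - site, axis=1)
w = np.exp(-d / 20.0)                                # fixed exponential kernel, h = 20

local_fit = Lasso(alpha=0.1).fit(X, y, sample_weight=w)
print("local coefficients:", np.round(local_fit.coef_, 2))  # zeros = dropped locally
```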

  20. Zoonotic importance of canine scabies and dermatophytosis in relation to knowledge level of dog owners

    PubMed Central

    Raval, Heli S.; Nayak, J. B.; Patel, B. M.; Bhadesiya, C. M.

    2015-01-01

    Aim: The present study was undertaken to understand the zoonotic importance of canine scabies and dermatophytosis with special reference to the knowledge level of dog owners in urban areas of Gujarat. Materials and Methods: The study was carried out on 120 randomly selected dog owners in 3 urban cities (viz., Ahmedabad, Anand and Vadodara) of Gujarat state, India. Dog owners (i.e., respondents) were subjected to a detailed interview regarding the zoonotic importance of canine scabies and dermatophytosis in dogs. An ex-post-facto research design was selected because of the pre-existing independent variables of the selected respondent population. The principal data-collection method was a field survey designed to generate the null hypothesis (Ho1). Available data were subjected to statistical analysis. Results: Three independent variables, viz., extension contact (r=0.522**), mass-media exposure (r=0.205*) and management orientation (r=0.264**), had a significant relationship with the knowledge of dog owners about zoonotic diseases. The other independent variables, viz., education, experience in dog keeping and housing space, were observed to have a negative and non-significant relationship with the knowledge of dog owners about zoonotic diseases. Conclusion: Extension contact, exposure to extension mass-media, management orientation and innovation proneness among dog owners of 3 urban cities of Gujarat state had a significant relationship with the knowledge of dog owners on zoonotic aspects of canine scabies and dermatophytosis. The data provided new insights on the present status of zoonotic disease awareness, which would be an aid in planning preventive measures. PMID:27065644

  1. Best Design for Multidimensional Computerized Adaptive Testing With the Bifactor Model

    PubMed Central

    Seo, Dong Gi; Weiss, David J.

    2015-01-01

    Most computerized adaptive tests (CATs) have been studied using the framework of unidimensional item response theory. However, many psychological variables are multidimensional and might benefit from using a multidimensional approach to CATs. This study investigated the accuracy, fidelity, and efficiency of a fully multidimensional CAT algorithm (MCAT) with a bifactor model using simulated data. Four item selection methods in MCAT were examined for three bifactor pattern designs using two multidimensional item response theory models. To compare MCAT item selection and estimation methods, a fixed test length was used. The Ds-optimality item selection improved θ estimates with respect to a general factor, and either D- or A-optimality improved estimates of the group factors in three bifactor pattern designs under two multidimensional item response theory models. The MCAT model without a guessing parameter functioned better than the MCAT model with a guessing parameter. The MAP (maximum a posteriori) estimation method provided more accurate θ estimates than the EAP (expected a posteriori) method under most conditions, and MAP showed lower observed standard errors than EAP under most conditions, except for a general factor condition using Ds-optimality item selection. PMID:29795848

  2. Genome-Wide Association Analysis of Adaptation Using Environmentally Predicted Traits.

    PubMed

    van Heerwaarden, Joost; van Zanten, Martijn; Kruijer, Willem

    2015-10-01

    Current methods for studying the genetic basis of adaptation evaluate genetic associations with ecologically relevant traits or single environmental variables, under the implicit assumption that natural selection imposes correlations between phenotypes, environments and genotypes. In practice, observed trait and environmental data are manifestations of unknown selective forces and are only indirectly associated with adaptive genetic variation. In theory, improved estimation of these forces could enable more powerful detection of loci under selection. Here we present an approach in which we approximate adaptive variation by modeling phenotypes as a function of the environment and using the predicted trait in multivariate and univariate genome-wide association analysis (GWAS). Based on computer simulations and published flowering time data from the model plant Arabidopsis thaliana, we find that environmentally predicted traits lead to higher recovery of functional loci in multivariate GWAS and are more strongly correlated to allele frequencies at adaptive loci than individual environmental variables. Our results provide an example of the use of environmental data to obtain independent and meaningful information on adaptive genetic variation.
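
    The two-step approach can be sketched as follows with synthetic data: first model the phenotype as a function of environmental variables, then use the environmentally predicted trait as the response in a per-SNP association scan. Correction for population structure, which a real GWAS would require, is omitted for brevity.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(6)
n, n_snps = 400, 50
env = rng.normal(size=(n, 3))                    # e.g. temperature, precipitation, ...
snps = rng.binomial(2, 0.3, size=(n, n_snps))    # allele counts at 50 loci

# Data generation: the allele at locus 0 tracks the environment, mimicking a
# locus under environment-driven selection; the trait is shaped by environment.
p_adaptive = 1.0 / (1.0 + np.exp(-2.0 * env[:, 0]))
snps[:, 0] = rng.binomial(2, p_adaptive)
phen = 2.0 * env[:, 0] + rng.normal(size=n)

# Step 1: model the phenotype as a function of the environment.
predicted_trait = LinearRegression().fit(env, phen).predict(env)

# Step 2: per-SNP association scan against the predicted trait.
pvals = [stats.linregress(snps[:, j], predicted_trait).pvalue for j in range(n_snps)]
print("top SNP:", int(np.argmin(pvals)), "p =", f"{min(pvals):.2e}")
```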

  3. Using multiple classifiers for predicting the risk of endovascular aortic aneurysm repair re-intervention through hybrid feature selection.

    PubMed

    Attallah, Omneya; Karthikesalingam, Alan; Holt, Peter Je; Thompson, Matthew M; Sayers, Rob; Bown, Matthew J; Choke, Eddie C; Ma, Xianghong

    2017-11-01

    Feature selection is essential in the medical area; however, the process becomes complicated in the presence of censoring, which is the unique characteristic of survival analysis. Most survival feature selection methods are based on Cox's proportional hazards model, even though machine learning classifiers would often be preferred. Machine learning classifiers are less employed in survival analysis because censoring prevents them from being applied directly to survival data. Among the few works that employed machine learning classifiers, the partial logistic artificial neural network with auto-relevance determination is a well-known method that deals with censoring and performs feature selection for survival data. However, it depends on data replication to handle censoring, which leads to unbalanced and biased prediction results, especially in highly censored data. Other methods cannot deal with high censoring. Therefore, in this article, a new hybrid feature selection method is proposed which presents a solution to high-level censoring. It combines support vector machine, neural network, and K-nearest neighbor classifiers using simple majority voting and a new weighted majority voting method based on a survival metric to construct a multiple classifier system. The new hybrid feature selection process uses the multiple classifier system as a wrapper method and merges it with an iterated feature ranking filter method to further reduce features. Two endovascular aortic repair datasets containing 91% censored patients, collected from two centers, were used to construct a multicenter study to evaluate the performance of the proposed approach. The results showed that the proposed technique outperformed individual classifiers and variable selection methods based on Cox's model, such as the Akaike and Bayesian information criteria and the least absolute shrinkage and selection operator, in p values of the log-rank test, sensitivity, and concordance index. This indicates that the proposed classifier is more powerful in correctly predicting the risk of re-intervention, enabling doctors to select patients' future follow-up plans.
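
    A minimal sketch of the multiple-classifier core, combining SVM, neural network, and k-NN by simple majority voting with scikit-learn. The article's survival-metric-weighted voting, censoring handling, and iterated feature-ranking filter are not reproduced here; the data are synthetic stand-ins for the re-intervention outcome.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data for the binary re-intervention outcome.
X, y = make_classification(n_samples=200, n_features=15, random_state=0)

mcs = VotingClassifier(
    estimators=[
        ("svm", make_pipeline(StandardScaler(), SVC())),
        ("nn", make_pipeline(StandardScaler(),
                             MLPClassifier(max_iter=2000, random_state=0))),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ],
    voting="hard",  # simple majority vote across the three classifiers
)
print("CV accuracy:", cross_val_score(mcs, X, y, cv=5).mean().round(2))
```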

  4. Feature Selection for Nonstationary Data: Application to Human Recognition Using Medical Biometrics.

    PubMed

    Komeili, Majid; Louis, Wael; Armanfard, Narges; Hatzinakos, Dimitrios

    2018-05-01

    Electrocardiogram (ECG) and transient evoked otoacoustic emission (TEOAE) are among the physiological signals that have attracted significant interest in the biometric community due to their inherent robustness to replay and falsification attacks. However, they are time-dependent signals, and this makes them hard to deal with in an across-session human recognition scenario where only one session is available for enrollment. This paper presents a novel feature selection method to address this issue. It is based on an auxiliary dataset with multiple sessions, from which it selects a subset of features that are more persistent across different sessions. It uses local information in terms of sample margins while enforcing an across-session measure. This makes it a perfect fit for the aforementioned biometric recognition problem. Comprehensive experiments on ECG and TEOAE variability due to time lapse and body posture are performed. The performance of the proposed method is compared against seven state-of-the-art feature selection algorithms as well as another six approaches in the area of ECG and TEOAE biometric recognition. Experimental results demonstrate that the proposed method performs noticeably better than the other algorithms.

  5. Adaptability and stability of soybean cultivars for grain yield and seed quality.

    PubMed

    Silva, K B; Bruzi, A T; Zambiazzi, E V; Soares, I O; Pereira, J L A R; Carvalho, M L M

    2017-05-10

    This study aimed at verifying the adaptability and stability of soybean cultivars, considering grain yield and seed quality, adopting univariate and multivariate approaches. The experiments were conducted in three environments over the 2013/2014 and 2014/2015 crop seasons, in the counties of Inconfidentes, Lavras, and Patos de Minas, in Minas Gerais State, Brazil. We evaluated 17 commercial soybean cultivars. For the adaptability and stability evaluations, the Graphic and GGE biplot methods were employed. Beforehand, a selection index was estimated based on the sum of the standardized variables (Z index). The data for grain yield, one-thousand-grain mass, the uniformity test (sieve retention), and the germination test were standardized (Zij) per cultivar. With the sum of Zij, we obtained the selection index for the four traits evaluated together. In the Graphic method evaluation, cultivars NA 7200 RR and CD 2737 RR presented the highest values for the selection index Z. By the GGE biplot method, we verified that cultivar NA 7200 RR presented greater stability in both univariate evaluations, for grain yield and for the selection index Z.
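
    The Z index described above is straightforward to compute: standardize each trait across cultivars and sum the standardized values per cultivar. The sketch below uses pandas with illustrative trait values, not the study's data.

```python
import pandas as pd

# Illustrative trait values for three cultivars (not the study's data).
traits = pd.DataFrame(
    {"grain_yield": [3100, 3420, 2950],
     "thousand_grain_mass": [152, 160, 148],
     "sieve_retention": [0.82, 0.88, 0.80],
     "germination": [0.91, 0.95, 0.89]},
    index=["NA 7200 RR", "CD 2737 RR", "cultivar C"],
)

z_ij = (traits - traits.mean()) / traits.std(ddof=1)  # standardize each trait
z_index = z_ij.sum(axis=1)                            # sum across the four traits
print(z_index.sort_values(ascending=False))           # higher = better overall
```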

  6. Global Design Optimization for Fluid Machinery Applications

    NASA Technical Reports Server (NTRS)

    Shyy, Wei; Papila, Nilay; Tucker, Kevin; Vaidyanathan, Raj; Griffin, Lisa

    2000-01-01

    Recent experiences in utilizing the global optimization methodology, based on polynomial and neural network techniques, for fluid machinery design are summarized. Global optimization methods can utilize information collected from various sources and by different tools. These methods offer multi-criterion optimization, handle the existence of multiple design points and trade-offs via insight into the entire design space, can easily perform tasks in parallel, and are often effective in filtering the noise intrinsic to numerical and experimental data. Another advantage is that these methods do not need to calculate the sensitivity of each design variable locally. However, a successful application of the global optimization method needs to address issues related to data requirements, which grow with the number of design variables, and methods for predicting model performance. Examples of applications selected from rocket propulsion components, including a supersonic turbine, an injector element, and a turbulent flow diffuser, are used to illustrate the usefulness of the global optimization method.

  7. Applying causal mediation analysis to personality disorder research.

    PubMed

    Walters, Glenn D

    2018-01-01

    This article is designed to address fundamental issues in the application of causal mediation analysis to research on personality disorders. Causal mediation analysis is used to identify mechanisms of effect by testing variables as putative links between the independent and dependent variables. As such, it would appear to have relevance to personality disorder research. It is argued that proper implementation of causal mediation analysis requires that investigators take several factors into account. These factors are discussed under 5 headings: variable selection, model specification, significance evaluation, effect size estimation, and sensitivity testing. First, care must be taken when selecting the independent, dependent, mediator, and control variables for a mediation analysis. Some variables make better mediators than others and all variables should be based on reasonably reliable indicators. Second, the mediation model needs to be properly specified. This requires that the data for the analysis be prospectively or historically ordered and possess proper causal direction. Third, it is imperative that the significance of the identified pathways be established, preferably with a nonparametric bootstrap resampling approach. Fourth, effect size estimates should be computed or competing pathways compared. Finally, investigators employing the mediation method are advised to perform a sensitivity analysis. Additional topics covered in this article include parallel and serial multiple mediation designs, moderation, and the relationship between mediation and moderation. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
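
    As a small illustration of the bootstrap step recommended above, the sketch below estimates the indirect effect a*b in a simple X -> M -> Y mediation model and obtains a percentile confidence interval by nonparametric bootstrap resampling. The data and effect sizes are synthetic; a real analysis would use properly ordered study variables and control variables.

```python
import numpy as np

# Synthetic mediation data: X affects M, and M (plus a direct path) affects Y.
rng = np.random.default_rng(7)
n = 300
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)            # mediator
y = 0.4 * m + 0.2 * x + rng.normal(size=n)  # outcome

def indirect_effect(x, m, y):
    a = np.polyfit(x, m, 1)[0]              # slope of M on X
    # Slope of Y on M, adjusting for X (first coefficient of [m, x, 1]):
    b = np.linalg.lstsq(np.column_stack([m, x, np.ones_like(x)]), y, rcond=None)[0][0]
    return a * b

# Nonparametric bootstrap: resample cases, recompute a*b, take percentiles.
boot = [indirect_effect(x[idx], m[idx], y[idx])
        for idx in (rng.integers(0, n, n) for _ in range(2000))]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"indirect effect = {indirect_effect(x, m, y):.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```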

  8. A practical approach to Sasang constitutional diagnosis using vocal features

    PubMed Central

    2013-01-01

    Background Sasang constitutional medicine (SCM) is a type of tailored medicine that divides human beings into four Sasang constitutional (SC) types. Diagnosis of SC types is crucial to proper treatment in SCM. Voice characteristics have been used as an essential clue for diagnosing SC types. In the past, many studies tried to extract quantitative vocal features to build diagnosis models; however, these studies were flawed by limited data collected from one or a few sites, long recording times, and low accuracy. We propose a practical diagnosis model having only a few variables, which decreases model complexity. This, in turn, makes our model appropriate for clinical applications. Methods A total of 2,341 participants’ voice recordings were used in making a SC classification model and to test the generalization ability of the model. Although the voice data consisted of five vowels and two repeated sentences per participant, we used only the sentence part for our study. A total of 21 features were extracted, and an advanced feature selection method—the least absolute shrinkage and selection operator (LASSO)—was applied to reduce the number of variables for classifier learning. A SC classification model was developed using multinomial logistic regression via LASSO. Results We compared the proposed classification model to the previous study, which used both sentences and five vowels from the same patient group. The classification accuracies for the test set were 47.9% and 40.4% for males and females, respectively. Our results showed that the proposed method was superior to the previous study in that it required shorter voice recordings, is more applicable to practical use, and had better generalization performance. Conclusions We proposed a practical SC classification method and showed that our model having fewer variables outperformed the model having many variables in the generalization test. We attempted to reduce the number of variables in two ways: 1) the initial number of candidate features was decreased by considering shorter voice recordings, and 2) LASSO was introduced to reduce model complexity. The proposed method is suitable for an actual clinical environment. Moreover, we expect it to yield more stable results because of the model’s simplicity. PMID:24200041

  9. OPTIMAL TIME-SERIES SELECTION OF QUASARS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Butler, Nathaniel R.; Bloom, Joshua S.

    2011-03-15

    We present a novel method for the optimal selection of quasars using time-series observations in a single photometric bandpass. Utilizing the damped random walk model of Kelly et al., we parameterize the ensemble quasar structure function in Sloan Stripe 82 as a function of observed brightness. The ensemble model fit can then be evaluated rigorously for and calibrated with individual light curves with no parameter fitting. This yields a classification based on two statistics, one describing the fit confidence and the other describing the probability of a false alarm, which can be tuned, a priori, to achieve high quasar detection fractions (99% completeness with default cuts), given an acceptable rate of false alarms. We establish the typical rate of false alarms due to known variable stars as ≲3% (high purity). Applying the classification, we increase the sample of potential quasars relative to those known in Stripe 82 by as much as 29%, and by nearly a factor of two in the redshift range 2.5 < z < 3, where selection by color is extremely inefficient. This represents 1875 new quasars in a 290 deg² field. The observed rates of both quasars and stars agree well with the model predictions, with >99% of quasars exhibiting the expected variability profile. We discuss the utility of the method at high redshift and in the regime of noisy and sparse data. Our time-series selection complements independent selection based on quasar colors well and has strong potential for identifying high-redshift quasars for Baryon Acoustic Oscillations and other cosmology studies in the LSST era.

  10. Entropic criterion for model selection

    NASA Astrophysics Data System (ADS)

    Tseng, Chih-Yuan

    2006-10-01

    Model or variable selection is usually achieved by ranking models in increasing order of preference. One such method applies the Kullback-Leibler distance, or relative entropy, as a selection criterion. Yet this raises two questions: why use this criterion, and are there any other criteria? Besides, conventional approaches require a reference prior, which is usually difficult to obtain. Following the logic of inductive inference proposed by Caticha [Relative entropy and inductive inference, in: G. Erickson, Y. Zhai (Eds.), Bayesian Inference and Maximum Entropy Methods in Science and Engineering, AIP Conference Proceedings, vol. 707, 2004 (available from arXiv.org/abs/physics/0311093)], we show relative entropy to be a unique criterion, which requires no prior information and can be applied to different fields. We examine this criterion by considering a physical problem, simple fluids, and the results are promising.

  11. Calibration Variable Selection and Natural Zero Determination for Semispan and Canard Balances

    NASA Technical Reports Server (NTRS)

    Ulbrich, Norbert M.

    2013-01-01

    Independent calibration variables for the characterization of semispan and canard wind tunnel balances are discussed. It is shown that the variable selection for a semispan balance is determined by the location of the resultant normal and axial forces that act on the balance. These two forces are the first and second calibration variables. The pitching moment becomes the third calibration variable after the normal and axial forces are shifted to the pitch axis of the balance. Two geometric distances, i.e., the rolling and yawing moment arms, are the fourth and fifth calibration variables. They are traditionally substituted by the corresponding moments to simplify the use of calibration data during a wind tunnel test. A canard balance is related to a semispan balance. It also only measures loads on one half of a lifting surface. However, the axial force and yawing moment are of no interest to users of a canard balance. Therefore, its calibration variable set is reduced to the normal force, pitching moment, and rolling moment. The combined load diagrams of the rolling and yawing moment for a semispan balance are discussed. They may be used to illustrate connections between the wind tunnel model geometry, the test section size, and the calibration load schedule. Then, methods are reviewed that may be used to obtain the natural zeros of a semispan or canard balance. In addition, characteristics of three semispan balance calibration rigs are discussed. Finally, basic requirements for a full characterization of a semispan balance are reviewed.

  12. Optimal Combinations of Diagnostic Tests Based on AUC.

    PubMed

    Huang, Xin; Qin, Gengsheng; Fang, Yixin

    2011-06-01

    When several diagnostic tests are available, one can combine them to achieve better diagnostic accuracy. This article considers the optimal linear combination that maximizes the area under the receiver operating characteristic curve (AUC); the estimates of the combination's coefficients can be obtained via a nonparametric procedure. However, for estimating the AUC associated with the estimated coefficients, the apparent estimate obtained by re-substitution is too optimistic. To adjust for the upward bias, several methods are proposed. Among them, the cross-validation approach is especially advocated, and an approximated cross-validation is developed to reduce the computational cost. Furthermore, these proposed methods can be applied for variable selection to select important diagnostic tests. The proposed methods are examined through simulation studies and applications to three real examples. © 2010, The International Biometric Society.
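
    A minimal sketch of the combination step with synthetic data: since the AUC is invariant to rescaling of the combined score, the coefficient vector can be fixed to (1, theta) and the single free coefficient found by a derivative-free search. The apparent (re-substitution) AUC printed at the end is exactly the optimistic quantity that the article's cross-validation procedures are designed to correct.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(8)
n = 200
y = rng.binomial(1, 0.5, n)                         # disease status (synthetic)
tests = np.column_stack([rng.normal(0.8 * y, 1.0),  # test 1 scores
                         rng.normal(0.6 * y, 1.0)]) # test 2 scores

def neg_auc(theta):
    # Combination fixed to (1, theta), since AUC is scale-invariant.
    t = float(np.atleast_1d(theta)[0])
    return -roc_auc_score(y, tests[:, 0] + t * tests[:, 1])

res = minimize(neg_auc, x0=[1.0], method="Nelder-Mead")  # derivative-free search
print(f"coefficient ratio: {res.x[0]:.2f}, apparent AUC: {-res.fun:.3f}")
```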

  13. Bayesian inference of selection in a heterogeneous environment from genetic time-series data.

    PubMed

    Gompert, Zachariah

    2016-01-01

    Evolutionary geneticists have sought to characterize the causes and molecular targets of selection in natural populations for many years. Although this research programme has been somewhat successful, most statistical methods employed were designed to detect consistent, weak to moderate selection. In contrast, phenotypic studies in nature show that selection varies in time and that individual bouts of selection can be strong. Measurements of the genomic consequences of such fluctuating selection could help test and refine hypotheses concerning the causes of ecological specialization and the maintenance of genetic variation in populations. Herein, I propose a Bayesian nonhomogeneous hidden Markov model to estimate effective population sizes and quantify variable selection in heterogeneous environments from genetic time-series data. The model is described and then evaluated using a series of simulated data, including cases where selection occurs on a trait with a simple or polygenic molecular basis. The proposed method accurately distinguished neutral loci from non-neutral loci under strong selection, but not from those under weak selection. Selection coefficients were accurately estimated when selection was constant or when the fitness values of genotypes varied linearly with the environment, but these estimates were less accurate when fitness was polygenic or the relationship between the environment and the fitness of genotypes was nonlinear. Past studies of temporal evolutionary dynamics in laboratory populations have been remarkably successful. The proposed method makes similar analyses of genetic time-series data from natural populations more feasible and thereby could help answer fundamental questions about the causes and consequences of evolution in the wild. © 2015 John Wiley & Sons Ltd.

  14. Biomarkers of Progression after HIV Acute/Early Infection: Nothing Compares to CD4+ T-cell Count?

    PubMed Central

    Ghiglione, Yanina; Hormanstorfer, Macarena; Coloccini, Romina; Salido, Jimena; Trifone, César; Ruiz, María Julia; Falivene, Juliana; Caruso, María Paula; Figueroa, María Inés; Salomón, Horacio; Giavedoni, Luis D.; Pando, María de los Ángeles; Gherardi, María Magdalena; Rabinovich, Roberto Daniel; Sued, Omar

    2018-01-01

    Progression of HIV infection is variable among individuals, and a definition of disease progression biomarkers is still needed. Here, we aimed to categorize the predictive potential of several variables using feature selection methods and decision trees. A total of seventy-five treatment-naïve subjects were enrolled during acute/early HIV infection. CD4+ T-cell counts (CD4TC) and viral load (VL) levels were determined at enrollment and for one year. Immune activation, HIV-specific immune response, Human Leukocyte Antigen (HLA) and C-C chemokine receptor type 5 (CCR5) genotypes, and plasma levels of 39 cytokines were determined. Data were analyzed by machine learning and non-parametric methods. Variable hierarchization was performed by Weka correlation-based feature selection and the J48 decision tree. Plasma interleukin (IL)-10, interferon gamma-induced protein (IP)-10, soluble IL-2 receptor alpha (sIL-2Rα) and tumor necrosis factor alpha (TNF-α) levels correlated directly with baseline VL, whereas IL-2, TNF-α, fibroblast growth factor (FGF)-2 and macrophage inflammatory protein (MIP)-1β correlated directly with CD4+ T-cell activation (p < 0.05). However, none of these cytokines had good predictive values to distinguish “progressors” from “non-progressors”. Similarly, immune activation, HIV-specific immune responses and HLA/CCR5 genotypes had low discrimination power. Baseline CD4TC was the most potent discerning variable, with a cut-off of 438 cells/μL (accuracy = 0.93, κ-Cohen = 0.85). The limited discerning power of the other factors might be related to frequency, variability and/or sampling time. Future studies based on decision trees to identify biomarkers of post-treatment control are warranted. PMID:29342870

  15. Biomarkers of Progression after HIV Acute/Early Infection: Nothing Compares to CD4⁺ T-cell Count?

    PubMed

    Turk, Gabriela; Ghiglione, Yanina; Hormanstorfer, Macarena; Laufer, Natalia; Coloccini, Romina; Salido, Jimena; Trifone, César; Ruiz, María Julia; Falivene, Juliana; Holgado, María Pía; Caruso, María Paula; Figueroa, María Inés; Salomón, Horacio; Giavedoni, Luis D; Pando, María de Los Ángeles; Gherardi, María Magdalena; Rabinovich, Roberto Daniel; Pury, Pedro A; Sued, Omar

    2018-01-13

    Progression of HIV infection is variable among individuals, and a definition of disease progression biomarkers is still needed. Here, we aimed to categorize the predictive potential of several variables using feature selection methods and decision trees. A total of seventy-five treatment-naïve subjects were enrolled during acute/early HIV infection. CD4⁺ T-cell counts (CD4TC) and viral load (VL) levels were determined at enrollment and for one year. Immune activation, HIV-specific immune response, Human Leukocyte Antigen (HLA) and C-C chemokine receptor type 5 (CCR5) genotypes, and plasma levels of 39 cytokines were determined. Data were analyzed by machine learning and non-parametric methods. Variable hierarchization was performed by Weka correlation-based feature selection and the J48 decision tree. Plasma interleukin (IL)-10, interferon gamma-induced protein (IP)-10, soluble IL-2 receptor alpha (sIL-2Rα) and tumor necrosis factor alpha (TNF-α) levels correlated directly with baseline VL, whereas IL-2, TNF-α, fibroblast growth factor (FGF)-2 and macrophage inflammatory protein (MIP)-1β correlated directly with CD4⁺ T-cell activation (p < 0.05). However, none of these cytokines had good predictive values to distinguish "progressors" from "non-progressors". Similarly, immune activation, HIV-specific immune responses and HLA/CCR5 genotypes had low discrimination power. Baseline CD4TC was the most potent discerning variable, with a cut-off of 438 cells/μL (accuracy = 0.93, κ-Cohen = 0.85). The limited discerning power of the other factors might be related to frequency, variability and/or sampling time. Future studies based on decision trees to identify biomarkers of post-treatment control are warranted.
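
    The decision-tree step can be illustrated with a depth-one tree on synthetic CD4 counts: the single learned split plays the role of the discriminating cut-off. The 438 cells/μL value quoted above comes from the article, not from this toy data, and scikit-learn's CART tree stands in for the Weka J48 implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic baseline CD4 counts for two outcome groups (not study data).
rng = np.random.default_rng(9)
cd4_prog = rng.normal(320, 90, 40).clip(50)      # progressors
cd4_nonprog = rng.normal(560, 110, 35).clip(50)  # non-progressors
X = np.concatenate([cd4_prog, cd4_nonprog]).reshape(-1, 1)
y = np.array([1] * 40 + [0] * 35)                # 1 = progressor

# A depth-one tree learns a single cut-off on the baseline count.
tree = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["baseline_CD4TC"]))
```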

  16. Survey of methods for calculating sensitivity of general eigenproblems

    NASA Technical Reports Server (NTRS)

    Murthy, Durbha V.; Haftka, Raphael T.

    1987-01-01

    A survey of methods for sensitivity analysis of the algebraic eigenvalue problem for non-Hermitian matrices is presented. In addition, a modification of one method based on a better normalizing condition is proposed. Methods are classified as Direct or Adjoint and are evaluated for efficiency. Operation counts are presented in terms of matrix size, number of design variables and number of eigenvalues and eigenvectors of interest. The effect of the sparsity of the matrix and its derivatives is also considered, and typical solution times are given. General guidelines are established for the selection of the most efficient method.
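
    The direct methods surveyed here build on the classical first-order formula for a simple eigenvalue of a non-Hermitian matrix, dlambda/dp = y^H (dA/dp) x / (y^H x), where x and y are the right and left eigenvectors. A minimal numerical sketch follows; the matrix and its derivative are illustrative, not from the survey.

```python
import numpy as np
from scipy.linalg import eig

# Illustrative system matrix A(p) and its derivative with respect to one
# design variable p.
A = np.array([[2.0, 1.0], [0.5, 3.0]])
dA_dp = np.array([[1.0, 0.0], [0.0, 0.0]])

# Right (vr) and left (vl) eigenvectors; vl[:, k]^H A = w[k] vl[:, k]^H.
w, vl, vr = eig(A, left=True, right=True)
k = np.argmax(w.real)        # pick the eigenvalue of interest
x, y = vr[:, k], vl[:, k]

# First-order sensitivity of a simple eigenvalue:
dlam_dp = (y.conj() @ dA_dp @ x) / (y.conj() @ x)
print(f"eigenvalue {w[k]:.3f}, sensitivity dlambda/dp = {dlam_dp.real:.4f}")
```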

  17. A method to estimate weight and dimensions of aircraft gas turbine engines. Volume 1: Method of analysis

    NASA Technical Reports Server (NTRS)

    Pera, R. J.; Onat, E.; Klees, G. W.; Tjonneland, E.

    1977-01-01

    Weight and envelope dimensions of aircraft gas turbine engines are estimated within plus or minus 5% to 10% using a computer method based on correlations of component weight and design features of 29 data base engines. Rotating components are estimated by a preliminary design procedure where blade geometry, operating conditions, material properties, shaft speed, hub-tip ratio, etc., are the primary independent variables used. The development and justification of the method selected, the various methods of analysis, the use of the program, and a description of the input/output data are discussed.

  18. Dual-threshold segmentation using Arimoto entropy based on chaotic bee colony optimization

    NASA Astrophysics Data System (ADS)

    Li, Li

    2018-03-01

    In order to extract targets from complex backgrounds more quickly and accurately, and to further improve the detection of defects, a method of dual-threshold segmentation using Arimoto entropy based on chaotic bee colony optimization was proposed. Firstly, the method of single-threshold selection based on Arimoto entropy was extended to dual-threshold selection in order to separate the target from the background more accurately. Then the intermediate variables in the formulae for Arimoto entropy dual-threshold selection were calculated recursively to eliminate redundant computation and reduce the amount of calculation. Finally, the local search phase of the artificial bee colony algorithm was improved with a chaotic sequence based on the tent map. The fast search for the two optimal thresholds was achieved using the improved bee colony optimization algorithm, so the search was substantially accelerated. A large number of experimental results show that, compared with existing segmentation methods such as the multi-threshold segmentation method using maximum Shannon entropy, the two-dimensional Shannon entropy segmentation method, the two-dimensional Tsallis gray entropy segmentation method and the multi-threshold segmentation method using reciprocal gray entropy, the proposed method can segment the target more quickly and accurately with superior segmentation effect. It proves to be a fast and effective method for image segmentation.

  19. Variable Selection with Prior Information for Generalized Linear Models via the Prior LASSO Method.

    PubMed

    Jiang, Yuan; He, Yunxiao; Zhang, Heping

    LASSO is a popular statistical tool often used in conjunction with generalized linear models that can simultaneously select variables and estimate parameters. When there are many variables of interest, as in current biological and biomedical studies, the power of LASSO can be limited. Fortunately, so much biological and biomedical data have been collected, and they may contain useful information about the importance of certain variables. This paper proposes an extension of LASSO, namely, prior LASSO (pLASSO), to incorporate that prior information into penalized generalized linear models. The goal is achieved by adding to the LASSO criterion function an additional measure of the discrepancy between the prior information and the model. For linear regression, the whole solution path of the pLASSO estimator can be found with a procedure similar to the Least Angle Regression (LARS). Asymptotic theories and simulation results show that pLASSO provides significant improvement over LASSO when the prior information is relatively accurate. When the prior information is less reliable, pLASSO shows great robustness to misspecification. We illustrate the application of pLASSO using a real data set from a genome-wide association study.

  20. Crystal diffraction lens with variable focal length

    DOEpatents

    Smither, R.K.

    1991-04-02

    A method and apparatus are disclosed for altering the focal length of a focusing element to one of a plurality of predetermined focal lengths by changing heat transfer within selected portions of the element by controlled amounts. Control over heat transfer is accomplished by manipulating one or more of a number of variables, including: the amount of heating or cooling applied to surfaces; the types of fluids pumped through channels for heating and cooling; the temperatures, directions of flow, and rates of flow of the fluids; and the placement of the channels. 19 figures.

  1. Transceivers and receivers for quantum key distribution and methods pertaining thereto

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    DeRose, Christopher; Sarovar, Mohan; Soh, Daniel B.S.

    Various technologies for performing continuous-variable (CV) and discrete-variable (DV) quantum key distribution (QKD) with integrated electro-optical circuits are described herein. An integrated DV-QKD system uses Mach-Zehnder modulators to modulate a polarization of photons at a transmitter and select a photon polarization measurement basis at a receiver. An integrated CV-QKD system uses wavelength division multiplexing to send and receive amplitude-modulated and phase-modulated optical signals with a local oscillator signal while maintaining phase coherence between the modulated signals and the local oscillator signal.
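
    As a toy illustration of the DV-QKD scheme described above, the following BB84-style sketch shows random basis preparation and measurement with sifting; it is a protocol sketch, not a model of the integrated Mach-Zehnder hardware:

        # Toy BB84-style basis selection and sifting (protocol sketch only).
        import random

        def bb84_sift(n_photons=20, seed=1):
            rng = random.Random(seed)
            key = []
            for _ in range(n_photons):
                bit = rng.randint(0, 1)
                basis_tx = rng.choice("RD")  # rectilinear or diagonal preparation
                basis_rx = rng.choice("RD")  # independently chosen measurement basis
                if basis_tx == basis_rx:     # matching bases: bit survives sifting
                    key.append(bit)
            return key

        print(bb84_sift())  # roughly half the photons survive basis sifting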

  2. [Cardiac rhythm variability as an index of vegetative heart regulation in a situation of psychoemotional tension].

    PubMed

    Revina, N E

    2006-01-01

    The differentiated roles of the segmental and suprasegmental levels of cardiac rhythm variability regulation in the dynamics of motivational human conflict were studied for the first time. The author used an original method allowing simultaneous analysis of psychological and physiological parameters of human activity. The study demonstrates that will and anxiety, as components of the motivational activity spectrum, form the "energetic" basis of voluntary-constructive and involuntary-affective behavioral strategies, selectively uniting various levels of suprasegmental and segmental control of human heart functioning in a conflict situation.

  3. Design sensitivity analysis with Applicon IFAD using the adjoint variable method

    NASA Technical Reports Server (NTRS)

    Frederick, Marjorie C.; Choi, Kyung K.

    1984-01-01

    A numerical method is presented to implement structural design sensitivity analysis using the versatility and convenience of an existing finite element structural analysis program and the theoretical foundations of structural design sensitivity analysis. Conventional design variables, such as thickness and cross-sectional areas, are considered. Structural performance functionals considered include compliance, displacement, and stress. It is shown that the calculations can be carried out outside existing finite element codes, using postprocessing data only; that is, design sensitivity analysis software does not have to be embedded in an existing finite element code. The finite element structural analysis program used in this implementation is IFAD. Feasibility of the method is shown through analysis of several problems, including built-up structures. Accurate design sensitivity results are obtained without the numerical uncertainty associated with selecting a finite-difference perturbation.
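
    For the compliance functional the problem is self-adjoint (the adjoint vector equals the displacement), so the sensitivity reduces to dC/db_i = -u^T (dK/db_i) u, computable from the analysis solution alone. A minimal sketch on a hypothetical two-spring system, checked against finite differences:

        # Adjoint sensitivity of compliance C = f^T u with K(b) u = f.
        import numpy as np

        def stiffness(k1, k2):
            return np.array([[k1 + k2, -k2], [-k2, k2]])

        f = np.array([0.0, 1.0])
        k1, k2 = 3.0, 2.0
        u = np.linalg.solve(stiffness(k1, k2), f)
        C = f @ u

        dK_dk1 = np.array([[1.0, 0.0], [0.0, 0.0]])
        dK_dk2 = np.array([[1.0, -1.0], [-1.0, 1.0]])
        grad_adjoint = [-u @ dK_dk1 @ u, -u @ dK_dk2 @ u]

        # Compare with the finite-difference perturbation the abstract avoids.
        eps = 1e-6
        for i, g in enumerate(grad_adjoint):
            kk = [k1, k2]
            kk[i] += eps
            C_eps = f @ np.linalg.solve(stiffness(*kk), f)
            print(f"dC/dk{i + 1}: adjoint={g:.6f}  finite-diff={(C_eps - C) / eps:.6f}")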

  4. Genetic variability in Jatropha curcas L. from diallel crossing.

    PubMed

    Ribeiro, D O; Silva-Mann, R; Alvares-Carvalho, S V; Souza, E M S; Vasconcelos, M C; Blank, A F

    2017-05-18

    Physic nut (Jatropha curcas L.) presents high oilseed yield and low production cost. However, technical-scientific knowledge of this crop is still limited. This study aimed to evaluate and estimate the genetic variability of hybrids obtained from diallel crossing. Genetic variability was assessed using ISSR molecular markers: nine primers were tested, and six were selected, yielding 80.7% polymorphism. Genetic similarity was computed using the NTSYSpc 2.1 software, and cluster analysis was performed by the UPGMA method. Mean genetic similarity among hybrids was 58.4%; the most divergent pair was H1 and H10, and the most similar pair was H9 and H10. ISSR-PCR markers provided a quick and highly informative system for DNA fingerprinting and also allowed establishing genetic relationships among the Jatropha hybrids.
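
    UPGMA is average-linkage hierarchical clustering; a minimal sketch with invented ISSR similarity values (the study itself used NTSYSpc 2.1):

        # UPGMA (average-linkage) clustering on distances from similarities.
        import numpy as np
        from scipy.spatial.distance import squareform
        from scipy.cluster.hierarchy import linkage, fcluster

        labels = ["H1", "H2", "H9", "H10"]
        similarity = np.array([[1.00, 0.60, 0.55, 0.40],
                               [0.60, 1.00, 0.58, 0.57],
                               [0.55, 0.58, 1.00, 0.80],
                               [0.40, 0.57, 0.80, 1.00]])  # made-up values

        distance = 1.0 - similarity  # dissimilarity from similarity
        Z = linkage(squareform(distance, checks=False), method="average")  # UPGMA
        print(Z)
        print(dict(zip(labels, fcluster(Z, t=2, criterion="maxclust"))))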

  5. [Instruments for quantitative methods of nursing research].

    PubMed

    Vellone, E

    2000-01-01

    Instruments for quantitative nursing research are a means to objectify and measure a variable or a phenomenon in scientific research. There are direct instruments to measure concrete variables and indirect instruments to measure abstract concepts (Burns, Grove, 1997). Indirect instruments measure the attributes of which a concept is composed. Furthermore, there are instruments for physiologic variables (e.g., for weight), observational instruments (checklists and rating scales), interviews, questionnaires, diaries, and scales (checklists, rating scales, Likert scales, semantic differential scales, and visual analogue scales). The choice of one instrument over another depends on the research question and design. Instruments are very useful in research both to describe variables and to detect statistically significant relationships. Their use in clinical practice for diagnostic assessment should be considered very carefully.

  6. Modulation Depth Estimation and Variable Selection in State-Space Models for Neural Interfaces

    PubMed Central

    Hochberg, Leigh R.; Donoghue, John P.; Brown, Emery N.

    2015-01-01

    Rapid developments in neural interface technology are making it possible to record increasingly large signal sets of neural activity. Various factors such as asymmetrical information distribution and across-channel redundancy may, however, limit the benefit of high-dimensional signal sets, and the increased computational complexity may not yield corresponding improvement in system performance. High-dimensional system models may also lead to overfitting and lack of generalizability. To address these issues, we present a generalized modulation depth measure using the state-space framework that quantifies the tuning of a neural signal channel to relevant behavioral covariates. For a dynamical system, we develop computationally efficient procedures for estimating modulation depth from multivariate data. We show that this measure can be used to rank neural signals and select an optimal channel subset for inclusion in the neural decoding algorithm. We present a scheme for choosing the optimal subset based on model order selection criteria. We apply this method to neuronal ensemble spike-rate decoding in neural interfaces, using our framework to relate motor cortical activity with intended movement kinematics. With offline analysis of intracortical motor imagery data obtained from individuals with tetraplegia using the BrainGate neural interface, we demonstrate that our variable selection scheme is useful for identifying and ranking the most information-rich neural signals. We demonstrate that our approach offers several orders of magnitude lower complexity but virtually identical decoding performance compared to greedy search and other selection schemes. Our statistical analysis shows that the modulation depth of human motor cortical single-unit signals is well characterized by the generalized Pareto distribution. Our variable selection scheme has wide applicability in problems involving multisensor signal modeling and estimation in biomedical engineering systems. PMID:25265627
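
    A simplified sketch of the channel-ranking idea follows, with a linear-fit R^2 tuning score standing in for the paper's state-space modulation depth estimator; all signals are simulated:

        # Rank neural channels by a simple tuning score, then keep the top-k.
        import numpy as np

        rng = np.random.default_rng(2)
        T, n_channels = 500, 12
        velocity = np.cumsum(rng.standard_normal(T)) * 0.1  # intended kinematics
        depths = rng.uniform(0.0, 2.0, n_channels)          # true tuning strengths
        rates = (depths[None, :] * velocity[:, None]
                 + rng.standard_normal((T, n_channels)))

        def tuning_score(x, y):
            """R^2 of a linear fit of channel rate y on covariate x."""
            X = np.column_stack([np.ones_like(x), x])
            resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
            return 1.0 - resid.var() / y.var()

        scores = np.array([tuning_score(velocity, rates[:, c])
                           for c in range(n_channels)])
        ranking = np.argsort(scores)[::-1]
        print("channel ranking:", ranking)
        print("top-4 subset for decoding:", ranking[:4])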

  7. Estimating the efficacy of Alcoholics Anonymous without self-selection bias: An instrumental variables re-analysis of randomized clinical trials

    PubMed Central

    Humphreys, Keith; Blodgett, Janet C.; Wagner, Todd H.

    2014-01-01

    Background Observational studies of Alcoholics Anonymous’ (AA) effectiveness are vulnerable to self-selection bias because individuals choose whether or not to attend AA. The present study therefore employed an innovative statistical technique to derive a selection bias-free estimate of AA’s impact. Methods Six datasets from 5 National Institutes of Health-funded randomized trials (one with two independent parallel arms) of AA facilitation interventions were analyzed using instrumental variables models. Alcohol dependent individuals in one of the datasets (n = 774) were analyzed separately from the rest of sample (n = 1582 individuals pooled from 5 datasets) because of heterogeneity in sample parameters. Randomization itself was used as the instrumental variable. Results Randomization was a good instrument in both samples, effectively predicting increased AA attendance that could not be attributed to self-selection. In five of the six data sets, which were pooled for analysis, increased AA attendance that was attributable to randomization (i.e., free of self-selection bias) was effective at increasing days of abstinence at 3-month (B = .38, p = .001) and 15-month (B = 0.42, p = .04) follow-up. However, in the remaining dataset, in which pre-existing AA attendance was much higher, further increases in AA involvement caused by the randomly assigned facilitation intervention did not affect drinking outcome. Conclusions For most individuals seeking help for alcohol problems, increasing AA attendance leads to short and long term decreases in alcohol consumption that cannot be attributed to self-selection. However, for populations with high pre-existing AA involvement, further increases in AA attendance may have little impact. PMID:25421504
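
    The instrumental-variables logic can be sketched with two-stage least squares on simulated data (the trial data are not reproduced here); randomization serves as the instrument, and the confounder stands in for self-selection:

        # 2SLS: stage 1 predicts attendance from random assignment; stage 2
        # regresses the outcome on the predicted (self-selection-free) attendance.
        import numpy as np

        rng = np.random.default_rng(3)
        n = 1000
        z = rng.integers(0, 2, n)            # randomized facilitation arm
        motivation = rng.standard_normal(n)  # unobserved confounder
        attendance = 2.0 * z + 1.5 * motivation + rng.standard_normal(n)
        abstinence = 0.4 * attendance + 2.0 * motivation + rng.standard_normal(n)

        def ols(X, y):
            A = np.column_stack([np.ones(len(y)), X])
            return np.linalg.lstsq(A, y, rcond=None)[0]

        naive_b = ols(attendance, abstinence)[1]  # biased by self-selection
        attendance_hat = np.column_stack([np.ones(n), z]) @ ols(z, attendance)
        iv_b = ols(attendance_hat, abstinence)[1]  # instrumented estimate
        print(f"naive OLS: {naive_b:.2f}   IV (2SLS): {iv_b:.2f}   truth: 0.40")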

  8. Predicting the graft survival for heart-lung transplantation patients: an integrated data mining methodology.

    PubMed

    Oztekin, Asil; Delen, Dursun; Kong, Zhenyu James

    2009-12-01

    Predicting the survival of heart-lung transplant patients has the potential to play a critical role in understanding and improving the matching procedure between recipient and graft. Although voluminous data related to transplantation procedures are being collected and stored, only a small subset of the predictive factors has been used in modeling heart-lung transplantation outcomes. Previous studies have mainly focused on applying statistical techniques to a small set of factors selected by domain experts in order to reveal simple linear relationships between the factors and survival. The collection of methods known as 'data mining' offers significant advantages over conventional statistical techniques in dealing with the latter's limitations, such as the normality assumption for observations, the independence of observations from each other, and the linearity of the relationship between the observations and the output measure(s). There are statistical methods that overcome these limitations, yet they are computationally more expensive and do not provide the fast and flexible solutions that data mining techniques offer on large datasets. The main objective of this study is to improve the prediction of outcomes following combined heart-lung transplantation by proposing an integrated data-mining methodology. A large and feature-rich dataset (16,604 cases with 283 variables) is used to (1) develop machine learning based predictive models and (2) extract the most important predictive factors. Then, using three different variable selection methods, namely, (i) machine-learning-driven variables, using decision trees, neural networks, and logistic regression, (ii) literature-review-based expert-defined variables, and (iii) common-sense-based interaction variables, a consolidated set of factors is generated and used to develop Cox regression models for heart-lung graft survival. The predictive models' performance in terms of 10-fold cross-validation accuracy rates for two multi-imputed datasets ranged from 79% to 86% for neural networks, from 78% to 86% for logistic regression, and from 71% to 79% for decision trees. The results indicate that the proposed integrated data mining methodology using Cox hazard models predicted graft survival better, with different variables, than the conventional approaches commonly used in the literature. This result is validated by comparing the corresponding gains charts for the proposed methodology and the literature-review-based Cox results, and by comparing the Akaike information criterion (AIC) values obtained from each. The data mining-based methodology proposed in this study reveals that there are undiscovered relationships (i.e., interactions of the existing variables) among the survival-related variables, which help better predict the survival of heart-lung transplants. It also brings a different set of variables into the scene to be evaluated by domain experts and considered prior to organ transplantation.
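
    A minimal sketch of the final modeling step, with hypothetical column names, simulated data, and lifelines chosen as one common Cox implementation (not necessarily the study's tool); variable sets are compared by AIC as in the abstract:

        # Fit Cox models on alternative variable sets and compare by AIC.
        import numpy as np
        import pandas as pd
        from lifelines import CoxPHFitter

        rng = np.random.default_rng(4)
        n = 300
        df = pd.DataFrame({
            "recipient_age": rng.uniform(20, 65, n),   # hypothetical columns
            "ischemia_time": rng.uniform(2, 8, n),
            "donor_age": rng.uniform(15, 60, n),
        })
        risk = 0.03 * df["recipient_age"] + 0.3 * df["ischemia_time"]
        df["time"] = rng.exponential(1.0 / np.exp(risk - risk.mean()))
        df["event"] = rng.integers(0, 2, n)

        def fit_aic(cols):
            cph = CoxPHFitter().fit(df[cols + ["time", "event"]],
                                    duration_col="time", event_col="event")
            return -2.0 * cph.log_likelihood_ + 2.0 * len(cols)

        print("expert set AIC:", fit_aic(["recipient_age"]))
        print("consolidated set AIC:",
              fit_aic(["recipient_age", "ischemia_time", "donor_age"]))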

  9. The management of abdominal wall hernias – in search of consensus

    PubMed Central

    Bury, Kamil; Śmietański, Maciej

    2015-01-01

    Introduction Laparoscopic repair is becoming an increasingly popular alternative in the treatment of abdominal wall hernias. In spite of numerous studies evaluating this technique, indications for laparoscopic surgery have not been established. Similarly, implant selection and fixation techniques have not been unified and are the subject of scientific discussion. Aim To assess whether there is a consensus on the management of the most common ventral abdominal wall hernias among recognised experts. Material and methods Fourteen specialists representing the boards of European surgical societies were surveyed to determine their choice of surgical technique for nine typical primary ventral and incisional hernias. The access method, type of operation, mesh prosthesis and fixation method were evaluated. In addition to the laparoscopic procedures, the number of tackers and their arrangement were assessed. Results In none of the cases presented was a consensus of experts obtained. Laparoscopic and open techniques were used equally often. Especially in the group of large hernias, decisions on repair methods were characterised by high variability. The technique of laparoscopic mesh fixation was a subject of great variability in terms of both method selection and the numbers of tackers and sutures used. Conclusions Recognised experts have not reached a consensus on the management of abdominal wall hernias. Our survey results indicate the need for further research and the inclusion of large cohorts of patients in the dedicated registries to evaluate the results of different surgical methods, which would help in the development of treatment algorithms for surgical education in the future. PMID:25960793

  10. Original method to compute epipoles using variable homography: application to measure emergent fibers on textile fabrics

    NASA Astrophysics Data System (ADS)

    Xu, Jun; Cudel, Christophe; Kohler, Sophie; Fontaine, Stéphane; Haeberlé, Olivier; Klotz, Marie-Louise

    2012-04-01

    Fabric smoothness is a key factor in determining the quality of finished textile products and has great influence on the functionality of industrial textiles and high-end textile products. With the popularization of the zero-defect industrial concept, identifying and measuring defective material at an early stage of production is of great interest to the industry. In the current market, many systems are able to achieve automatic monitoring and control of fabric, paper, and nonwoven material during the entire production process; however, online measurement of hairiness is still an open topic and highly desirable for industrial applications. We propose a computer vision approach that computes epipoles using variable homography, which can be used to measure emergent fiber length on textile fabrics. The main challenges addressed in this paper are the application of variable homography to textile monitoring and measurement, as well as the accuracy of the resulting estimates. We propose that a fibrous structure can be considered as a two-layer structure, and then show how variable homography combined with epipolar geometry can estimate the length of fiber defects. Simulations are carried out to show the effectiveness of this method. The true length of selected fibers is measured precisely using a digital optical microscope, and then the same fibers are tested by our method. Our experimental results suggest that smoothness monitoring by variable homography is an accurate and robust method of quality control for important industrial fabrics.
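
    A classical building block for the epipole computation: the epipole is the right null vector of the fundamental matrix, F e = 0, recoverable by SVD. The matrix below is constructed from hypothetical camera data rather than the paper's variable-homography derivation:

        # Epipole as the right null vector of a rank-2 fundamental matrix.
        import numpy as np

        def epipole(F):
            """Right epipole of a fundamental matrix (homogeneous coordinates)."""
            _, _, Vt = np.linalg.svd(F)
            e = Vt[-1]       # singular vector for the (near-)zero singular value
            return e / e[2]  # normalize to pixel coordinates

        # Build a valid rank-2 F = skew(t) @ A from hypothetical camera data.
        rng = np.random.default_rng(5)
        A = rng.standard_normal((3, 3))
        t = np.array([1.0, 0.5, 0.2])
        Tx = np.array([[0, -t[2], t[1]],
                       [t[2], 0, -t[0]],
                       [-t[1], t[0], 0]])
        F = Tx @ A  # rank 2 by construction

        e = epipole(F)
        print("epipole:", e)
        print("residual |F e|:", np.linalg.norm(F @ e))  # ~0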

  11. Parameterizing the Variability and Uncertainty of Wind and Solar in CEMs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Frew, Bethany

    We present current and improved methods for estimating the capacity value and curtailment impacts from variable generation (VG) in capacity expansion models (CEMs). The ideal calculation of these variability metrics is through an explicit co-optimized investment-dispatch model using multiple years of VG and load data. Because of data and computational limitations, existing CEMs typically approximate these metrics using a subset of all hours from a single year and/or using statistical methods, which often do not capture the tail-event impacts or the broader set of interactions between VG, storage, and conventional generators. In our proposed new methods, we use hourly generation and load values across all hours of the year to characterize (1) the contribution of VG to system capacity during high-load hours, (2) the curtailment level of VG, and (3) the reduction in VG curtailment due to storage and shutdown of select thermal generators. Using CEM model outputs from a preceding model solve period, we apply these methods to exogenously calculate capacity value and curtailment metrics for the subsequent model solve period. Preliminary results suggest that these hourly methods offer improved capacity value and curtailment representations of VG in the CEM over existing approximation methods, without additional computational burden.
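
    The two hourly metrics can be sketched as follows; the 8760-hour profiles, the top-100-hour window, and the nameplate proxy are illustrative assumptions, not the report's actual parameters:

        # Hourly capacity value and curtailment from synthetic VG/load profiles.
        import numpy as np

        rng = np.random.default_rng(6)
        hours = 8760
        load = (800 + 200 * np.sin(np.arange(hours) * 2 * np.pi / 24)
                + rng.normal(0, 50, hours))                   # system load (MW)
        vg = np.clip(rng.normal(300, 150, hours), 0, None)    # variable gen (MW)
        must_run = 400.0                                      # inflexible thermal

        top = np.argsort(load)[-100:]                 # highest-load hours
        capacity_value = vg[top].mean() / vg.max()    # contribution at peak

        headroom = np.clip(load - must_run, 0, None)  # room for VG each hour
        curtailed = np.clip(vg - headroom, 0, None)
        curtailment_rate = curtailed.sum() / vg.sum()

        print(f"capacity value ~ {capacity_value:.2f} of nameplate")
        print(f"curtailment ~ {100 * curtailment_rate:.1f}% of available VG energy")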

  12. Punishment induced behavioural and neurophysiological variability reveals dopamine-dependent selection of kinematic movement parameters

    PubMed Central

    Galea, Joseph M.; Ruge, Diane; Buijink, Arthur; Bestmann, Sven; Rothwell, John C.

    2013-01-01

    Action selection describes the high-level process which selects between competing movements. In animals, behavioural variability is critical for the motor exploration required to select the action which optimizes reward and minimizes cost/punishment, and is guided by dopamine (DA). The aim of this study was to test in humans whether low-level movement parameters are affected by punishment and reward in ways similar to high-level action selection. Moreover, we addressed the proposed dependence of behavioural and neurophysiological variability on DA, and whether this may underpin the exploration of kinematic parameters. Participants performed an out-and-back index finger movement and were instructed that monetary reward and punishment were based on its maximal acceleration (MA). In fact, the feedback was not contingent on the participant’s behaviour but pre-determined. Blocks highly-biased towards punishment were associated with increased MA variability relative to blocks with either reward or without feedback. This increase in behavioural variability was positively correlated with neurophysiological variability, as measured by changes in cortico-spinal excitability with transcranial magnetic stimulation over the primary motor cortex. Following the administration of a DA-antagonist, the variability associated with punishment diminished and the correlation between behavioural and neurophysiological variability no longer existed. Similar changes in variability were not observed when participants executed a pre-determined MA, nor did DA influence resting neurophysiological variability. Thus, under conditions of punishment, DA-dependent processes influence the selection of low-level movement parameters. We propose that the enhanced behavioural variability reflects the exploration of kinematic parameters for less punishing, or conversely more rewarding, outcomes. PMID:23447607

  13. Application of Plackett-Burman experimental design in the development of muffin using adlay flour

    NASA Astrophysics Data System (ADS)

    Valmorida, J. S.; Castillo-Israel, K. A. T.

    2018-01-01

    A Plackett-Burman experimental design was applied to identify significant formulation and process variables in the development of a muffin using adlay flour. Of the seven screened variables, the levels of sugar, the levels of butter, and the baking temperature had the most significant influence on the product model in terms of physicochemical properties and sensory acceptability. The results further demonstrate the effectiveness of the Plackett-Burman design in choosing the best adlay variety for muffin production. Hence, the statistical method used in the study permits an efficient selection of the important variables needed in the development of a muffin from adlay, which can then be optimized using response surface methodology.
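
    An 8-run Plackett-Burman design for up to seven two-level factors can be read off a normalized Hadamard matrix by dropping the all-ones column; the factor names below are hypothetical muffin variables, not the study's exact set:

        # Construct an 8-run Plackett-Burman screening design for 7 factors.
        from scipy.linalg import hadamard

        H = hadamard(8)       # 8 x 8 matrix of +/-1, first column all ones
        design = H[:, 1:]     # 8 runs x 7 factor columns
        factors = ["sugar", "butter", "bake_temp", "bake_time",
                   "adlay_ratio", "leavening", "mixing_time"]

        for run in design:
            print({f: ("+" if level == 1 else "-")
                   for f, level in zip(factors, run)})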

  14. Data driven model generation based on computational intelligence

    NASA Astrophysics Data System (ADS)

    Gemmar, Peter; Gronz, Oliver; Faust, Christophe; Casper, Markus

    2010-05-01

    The simulation of discharges at a local gauge or the modeling of large-scale river catchments are effectively involved in estimation and decision tasks of hydrological research and practical applications like flood prediction or water resource management. However, modeling such processes using analytical or conceptual approaches is made difficult by both the complexity of process relations and the heterogeneity of processes. It has been shown many times that unknown or assumed process relations can in principle be described by computational methods, and that system models can automatically be derived from observed behavior or measured process data. This study describes the development of hydrological process models using computational methods, including Fuzzy logic and artificial neural networks (ANN), in a comprehensive and automated manner. Methods: We consider a closed concept for data-driven development of hydrological models based on measured (experimental) data. The concept is centered on a Fuzzy system using rules of Takagi-Sugeno-Kang type, which formulate the input-output relation in a generic structure like R_i: IF q(t) = low AND ... THEN q(t+Δt) = a_i0 + a_i1 q(t) + a_i2 p(t-Δt_i1) + a_i3 p(t+Δt_i2) + .... The rule's premise part (IF) describes process states involving available process information, e.g., actual outlet q(t) is low, where low is one of several Fuzzy sets defined over the variable q(t). The rule's conclusion (THEN) estimates the expected outlet q(t+Δt) by a linear function over selected system variables, e.g., the actual outlet q(t) and previous and/or forecasted precipitation p(t-Δt_ik). In the case of river catchment modeling, we also use head gauges and tributary and upriver gauges in the conclusion part. In addition, we consider temperature and temporal (season) information in the premise part. By creating a set of rules R = {R_i | i = 1, ..., N}, the space of process states can be covered as concisely as necessary. Model adaptation is achieved by finding an optimal set A = (a_ij) of conclusion parameters with respect to a defined rating function and experimental data. To find A, we use, for example, a linear equation solver and an RMSE rating function. In practical process models, the number of Fuzzy sets and the corresponding number of rules is fairly low. Nevertheless, creating the optimal model requires some experience. Therefore, we improved this development step with methods for the automatic generation of Fuzzy sets, rules, and conclusions. Basically, the model quality depends to a great extent on the selection of the conclusion variables. The aim is that the variables having the most influence on the system reaction are considered and superfluous ones are neglected. First, we use Kohonen maps, a specialized ANN, to identify relevant input variables from the large set of available system variables. A greedy algorithm selects a comprehensive set of dominant and uncorrelated variables. Next, the premise variables are analyzed with clustering methods (e.g., Fuzzy C-means), and Fuzzy sets are then derived from cluster centers and outlines. The rule base is automatically constructed by permutation of the Fuzzy sets of the premise variables. Finally, the conclusion parameters are calculated, the total coverage of the input space is iteratively tested with experimental data, rarely firing rules are combined, and coarse coverage of sensitive process states results in refined Fuzzy sets and rules. Results: The described methods were implemented and integrated in a development system for process models.
A series of models has already been built, e.g., for rainfall-runoff modeling and for flood prediction (up to 72 hours) in river catchments. The models required significantly less development effort and showed improved simulation results compared to conventional models. The models can be used operationally, and a simulation takes only a few minutes on a standard PC, e.g., for a gauge forecast (up to 72 hours) for the whole Mosel (Germany) river catchment.
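
    A tiny Takagi-Sugeno-Kang evaluation matching the rule structure described in this record can be sketched as follows; the membership breakpoints and conclusion coefficients are invented:

        # TSK inference: fuzzy premises over q(t), linear conclusions,
        # weighted-average defuzzification.
        import numpy as np

        def tri(x, a, b, c):
            """Triangular membership with support [a, c] and peak at b."""
            return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

        def tsk_predict(q_t, p_t):
            # Premise: fuzzy sets "low" and "high" over the current outlet q(t).
            w = np.array([tri(q_t, -5.0, 0.0, 5.0),   # mu_low(q)
                          tri(q_t, 0.0, 5.0, 10.0)])  # mu_high(q)
            # Conclusions: linear models a_i0 + a_i1 q(t) + a_i2 p(t) per rule.
            y = np.array([0.1 + 0.9 * q_t + 0.3 * p_t,   # rule for low outlet
                          0.5 + 0.8 * q_t + 0.6 * p_t])  # rule for high outlet
            return (w @ y) / w.sum()  # weighted-average defuzzification

        print(tsk_predict(q_t=2.0, p_t=1.0))  # forecast of q(t + dt)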

  15. Genetic potential of common bean progenies selected for crude fiber content obtained through different breeding methods.

    PubMed

    Júnior, V A P; Melo, P G S; Pereira, H S; Bassinello, P Z; Melo, L C

    2015-05-29

    Gastrointestinal health is of great importance due to the increasing consumption of functional foods, especially those concerning diets rich in fiber. The common bean has been valued as a nutritious food due to its appreciable fiber content and the fact that it is consumed in many countries. The current study aimed to evaluate and compare the genetic potential of common bean progenies of the carioca group, developed through different breeding methods, for crude fiber content. The progenies originated from the hybridization of two advanced strains, CNFC 7812 and CNFC 7829, advanced up to the F7 generation using three breeding methods: bulk-population, bulk within F2 families, and single seed descent. Fifteen F8 progenies were evaluated for each method, as well as two check cultivars and both parents, using a 7 x 7 simple lattice design, with experimental plots comprised of two 4-m long rows. Field trials were conducted in eleven environments encompassing four Brazilian states and three different sowing times during 2009 and 2010. Estimates of genetic parameters indicate differences among the breeding methods, which seem to be related to the different processes for sampling the advanced progenies inherent to each method, given that the trait in question is not subject to natural selection. Variability among progenies occurred within all three breeding methods, and there was also a significant effect of environment on the progenies for all methods. Progenies developed by the bulk-population method attained the highest estimates of genetic parameters, had less interaction with the environment, and showed greater variability.

  16. The Association between Bullying and Psychological Health among Senior High School Students in Ghana, West Africa

    ERIC Educational Resources Information Center

    Owusu, Andrew; Hart, Peter; Oliver, Brittney; Kang, Minsoo

    2011-01-01

    Background: School-based bullying, a global challenge, negatively impacts the health and development of both victims and perpetrators. This study examined the relationship between bullying victimization and selected psychological variables among senior high school (SHS) students in Ghana, West Africa. Methods: This study utilized data from the…

  17. College Choice and Student Selection of an Information-Technology Degree Program

    ERIC Educational Resources Information Center

    Severson, Christopher S.

    2017-01-01

    This mixed-methods study analyzed data from 906 Ruffalo Noel Levitz Student Satisfaction Surveys (RNLSSS) at a two-year technical college in Wisconsin, to determine influences on participants' choices of programs in information technology (IT). Quantitative results showed that none of eight variables (i.e., cost, financial assistance, academic…

  18. USE OF GIS AND ANCILLARY VARIABLES TO PREDICT VOLATILE ORGANIC COMPOUND AND NITROGEN DIOXIDE LEVELS AT UNMONITORED LOCATIONS

    EPA Science Inventory

    This paper presents a GIS-based regression spatial method, known as land-use regression (LUR) modeling, to estimate ambient air pollution exposures used in the EPA El Paso Children's Health Study. Passive measurements of select volatile organic compounds (VOC) and nitrogen dioxi...

  19. Assessing the Microstructure of Written Language Using a Retelling Paradigm

    ERIC Educational Resources Information Center

    Puranik, Cynthia S.; Lombardino, Linda J.; Altmann, Lori J. P.

    2008-01-01

    Purpose: The primary goal of this study was to document the progression of the microstructural elements of written language in children at 4 grade levels. The secondary purpose was to ascertain whether the variables selected for examination could be classified into valid categories that reflect the multidimensional nature of writing. Method:…

  20. Farmers as Consumers of Agricultural Education Services: Willingness to Pay and Spend Time

    ERIC Educational Resources Information Center

    Charatsari, Chrysanthi; Papadaki-Klavdianou, Afroditi; Michailidis, Anastasios

    2011-01-01

    This study assessed farmers' willingness to pay for and spend time attending an Agricultural Educational Program (AEP). Primary data on the demographic and socio-economic variables of farmers were collected from 355 farmers selected randomly from Northern Greece. Descriptive statistics and multivariate analysis methods were used in order to meet…
