ERIC Educational Resources Information Center
Hafner, Lawrence E.
A study developed a multiple regression prediction equation for each of six selected achievement variables in a popular standardized test of achievement. Subjects, 42 fourth-grade pupils randomly selected across several classes in a large elementary school in a north Florida city, were administered several standardized tests to determine predictor…
Testing Different Model Building Procedures Using Multiple Regression.
ERIC Educational Resources Information Center
Thayer, Jerome D.
The stepwise regression method of selecting predictors for computer-assisted multiple regression analysis was compared with forward, backward, and best-subsets regression, using 16 data sets. The results indicated that the stepwise method was preferred, because of its practical nature, when the models chosen by the different selection methods were similar…
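Forward selection, one of the methods compared above, can be sketched generically: at each step, add the predictor that most reduces the residual sum of squares. The implementation below is an illustrative pure-Python sketch of the general technique, not the procedure used in the study.

```python
# Forward selection sketch: greedily add the predictor that most
# reduces residual sum of squares (RSS), using OLS via normal equations.

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def rss(X_cols, y):
    """Residual sum of squares for OLS with intercept on the given columns."""
    n = len(y)
    cols = [[1.0] * n] + X_cols          # prepend intercept column
    p = len(cols)
    XtX = [[sum(cols[i][k] * cols[j][k] for k in range(n)) for j in range(p)]
           for i in range(p)]
    Xty = [sum(cols[i][k] * y[k] for k in range(n)) for i in range(p)]
    beta = solve(XtX, Xty)
    fit = [sum(beta[i] * cols[i][k] for i in range(p)) for k in range(n)]
    return sum((y[k] - fit[k]) ** 2 for k in range(n))

def forward_select(X_cols, y, n_keep):
    """Greedily pick n_keep predictor indices that minimize RSS."""
    chosen = []
    while len(chosen) < n_keep:
        best = min((j for j in range(len(X_cols)) if j not in chosen),
                   key=lambda j: rss([X_cols[i] for i in chosen + [j]], y))
        chosen.append(best)
    return chosen
```

Backward elimination and stepwise selection differ only in whether variables are also removed once entered; best-subsets regression instead evaluates every candidate subset.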
Regression Model Term Selection for the Analysis of Strain-Gage Balance Calibration Data
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert Manfred; Volden, Thomas R.
2010-01-01
The paper discusses the selection of regression model terms for the analysis of wind tunnel strain-gage balance calibration data. Different function class combinations are presented that may be used to analyze calibration data using either a non-iterative or an iterative method. The role of the intercept term in a regression model of calibration data is reviewed. In addition, useful algorithms and metrics originating from linear algebra and statistics are recommended that will help an analyst (i) to identify and avoid both linear and near-linear dependencies between regression model terms and (ii) to make sure that the selected regression model of the calibration data uses only statistically significant terms. Three different tests are suggested that may be used to objectively assess the predictive capability of the final regression model of the calibration data. These tests use both the original data points and regression model independent confirmation points. Finally, data from a simplified manual calibration of the Ames MK40 balance is used to illustrate the application of some of the metrics and tests to a realistic calibration data set.
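One of the linear-algebra screens the abstract alludes to, identifying near-linear dependencies between regression model terms, can be sketched with a simple pairwise-correlation check. This is an illustrative simplification; a fuller diagnostic would also use the condition number of the design matrix or variance inflation factors.

```python
# Simple collinearity screen for regression model terms: flag pairs of
# columns whose absolute Pearson correlation exceeds a threshold.
from math import sqrt

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((a[i] - ma) * (b[i] - mb) for i in range(n))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((x - mb) ** 2 for x in b)
    return cov / sqrt(va * vb)

def near_dependent_pairs(columns, threshold=0.99):
    """Return index pairs of model terms that are nearly linearly dependent."""
    hits = []
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            if abs(pearson(columns[i], columns[j])) > threshold:
                hits.append((i, j))
    return hits
```

Terms flagged this way are candidates for removal before fitting, since near-dependent columns make the estimated coefficients unstable.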
A Survey of UML Based Regression Testing
NASA Astrophysics Data System (ADS)
Fahad, Muhammad; Nadeem, Aamer
Regression testing is the process of ensuring software quality by analyzing whether changed parts behave as intended and unchanged parts are not affected by the modifications. Since it is a costly process, many techniques have been proposed in the research literature to help testers build a regression test suite from an existing test suite at minimum cost. In this paper, we discuss the advantages and drawbacks of using UML diagrams for regression testing and show that UML models help identify changes for effective regression test selection. We survey the existing UML-based regression testing techniques and provide an analysis matrix to give quick insight into the prominent features of the literature. We also discuss the open research issues, such as managing and reducing the size of the regression test suite and prioritizing test cases under strict schedules and resources, that remain to be addressed for UML-based regression testing.
Simultaneous Estimation of Regression Functions for Marine Corps Technical Training Specialties.
1985-01-03
Applies Bayesian techniques for simultaneous estimation to the specification of regression weights for selection tests used in various Marine Corps technical training courses.
A Demonstration of Regression False Positive Selection in Data Mining
ERIC Educational Resources Information Center
Pinder, Jonathan P.
2014-01-01
Business analytics courses, such as marketing research, data mining, forecasting, and advanced financial modeling, have substantial predictive modeling components. The predictive modeling in these courses requires students to estimate and test many linear regressions. As a result, false positive variable selection ("type I errors") is…
Asghari, Mehdi Poursheikhali; Hayatshahi, Sayyed Hamed Sadat; Abdolmaleki, Parviz
2012-01-01
From both the structural and functional points of view, β-turns play important biological roles in proteins. In the present study, a novel two-stage hybrid procedure has been developed to identify β-turns in proteins. Binary logistic regression was initially used, for the first time, to select significant sequence parameters for the identification of β-turns via a re-substitution test procedure. The sequence parameters consisted of 80 amino acid positional occurrences and 20 amino acid percentages in the sequence. Among these parameters, the most significant ones selected by the binary logistic regression model were the percentages of Gly and Ser and the occurrence of Asn at position i+2. These significant parameters have the highest effect on the constitution of a β-turn sequence. A neural network model was then constructed and fed with the parameters selected by binary logistic regression to build a hybrid predictor. The networks were trained and tested on a non-homologous dataset of 565 protein chains. Applying a nine-fold cross-validation test on the dataset, the network reached an overall accuracy (Qtotal) of 74, which is comparable with the results of other β-turn prediction methods. In conclusion, this study demonstrates that the parameter selection ability of binary logistic regression together with the prediction capability of neural networks leads to the development of more precise models for identifying β-turns in proteins. PMID:27418910
Random forest models to predict aqueous solubility.
Palmer, David S; O'Boyle, Noel M; Glen, Robert C; Mitchell, John B O
2007-01-01
Random Forest regression (RF), Partial-Least-Squares (PLS) regression, Support Vector Machines (SVM), and Artificial Neural Networks (ANN) were used to develop QSPR models for the prediction of aqueous solubility, based on experimental data for 988 organic molecules. The Random Forest regression model predicted aqueous solubility more accurately than those created by PLS, SVM, and ANN and offered methods for automatic descriptor selection, an assessment of descriptor importance, and an in-parallel measure of predictive ability, all of which serve to recommend its use. The prediction of log molar solubility for an external test set of 330 molecules that are solid at 25 degrees C gave an r2 = 0.89 and RMSE = 0.69 log S units. For a standard data set selected from the literature, the model performed well with respect to other documented methods. Finally, the diversity of the training and test sets is compared to the chemical space occupied by molecules in the MDL drug data report, on the basis of molecular descriptors selected by the regression analysis.
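The external-test-set figures quoted above (r2 and RMSE) are the standard regression evaluation metrics; a minimal sketch of how they are computed, not tied to the paper's data:

```python
# Coefficient of determination (r^2) and root mean square error (RMSE)
# for a set of predictions against observed values.
from math import sqrt

def rmse(y_true, y_pred):
    n = len(y_true)
    return sqrt(sum((y_true[i] - y_pred[i]) ** 2 for i in range(n)) / n)

def r_squared(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y_true[i] - y_pred[i]) ** 2 for i in range(len(y_true)))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot
```

An r2 of 0.89 means the model explains 89% of the variance in the test-set solubilities; RMSE is reported in the same units as the response (log S).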
Mental chronometry with simple linear regression.
Chen, J Y
1997-10-01
Typically, mental chronometry is performed by introducing an independent variable postulated to affect selectively some stage of a presumed multistage process. However, the effect could be a global one that spreads proportionally over all stages of the process. Currently, there is no method to test this possibility, although simple linear regression might serve the purpose. In the present study, the regression approach was tested with tasks (memory scanning and mental rotation) that, according to the dominant theories, involve a selective effect, and with a task (the word superiority effect) that involves a global effect. The results indicate that (1) the manipulation of the size of a memory set or of angular disparity affects the intercept of the regression function that relates the times for memory scanning with different set sizes or for mental rotation with different angular disparities, and (2) the manipulation of context affects the slope of the regression function that relates the times for detecting a target character under word and nonword conditions. These results ratify the regression approach as a useful method for mental chronometry.
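The logic of the regression approach can be illustrated with an ordinary least-squares line fit: regressing one condition's response times on another's, an additive (stage-selective) effect shows up in the intercept, while a proportional (global) effect shows up in the slope. A minimal sketch with invented response times:

```python
# Simple least-squares line fit: y = slope * x + intercept.
def fit_line(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((x[i] - mx) * (y[i] - my) for i in range(n)) / \
            sum((xi - mx) ** 2 for xi in x)
    intercept = my - slope * mx
    return slope, intercept

rt = [400.0, 500.0, 600.0]           # hypothetical baseline times (ms)
additive = [t + 50.0 for t in rt]    # selective effect: +50 ms to one stage
proportional = [1.2 * t for t in rt] # global effect: all stages 20% slower
```

Fitting `additive` against `rt` gives slope 1 and intercept 50; fitting `proportional` against `rt` gives slope 1.2 and intercept 0, which is exactly the diagnostic contrast the abstract describes.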
ERIC Educational Resources Information Center
Moses, Tim
2008-01-01
Nine statistical strategies for selecting equating functions in an equivalent groups design were evaluated. The strategies of interest were likelihood ratio chi-square tests, regression tests, Kolmogorov-Smirnov tests, and significance tests for equated score differences. The most accurate strategies in the study were the likelihood ratio tests…
Water quality parameter measurement using spectral signatures
NASA Technical Reports Server (NTRS)
White, P. E.
1973-01-01
Regression analysis is applied to the problem of measuring water quality parameters from remote sensing spectral signature data. The equations necessary to perform regression analysis are presented and methods of testing the strength and reliability of a regression are described. An efficient algorithm for selecting an optimal subset of the independent variables available for a regression is also presented.
Model building strategy for logistic regression: purposeful selection.
Zhang, Zhongheng
2016-03-01
Logistic regression is one of the most commonly used models to account for confounders in the medical literature. This article introduces how to perform the purposeful selection model-building strategy with R. I stress the use of the likelihood ratio test to see whether deleting a variable has a significant impact on model fit. A deleted variable should also be checked for whether it is an important adjustment of the remaining covariates. Interactions should be checked to disentangle complex relationships between covariates and their synergistic effects on the response variable. The model should be checked for goodness of fit (GOF), in other words, how well the fitted model reflects the real data. The Hosmer-Lemeshow GOF test is the most widely used for logistic regression models.
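The likelihood ratio test at the heart of purposeful selection compares the log-likelihoods of the full model and the model with one variable deleted. Only the test-statistic step is sketched here (the log-likelihoods would come from the fitted models); for a single dropped parameter the statistic is chi-squared with 1 degree of freedom, whose upper-tail probability has the closed form erfc(sqrt(G/2)).

```python
# Likelihood ratio test for dropping one covariate from a nested model.
from math import erfc, sqrt

def lr_test_1df(loglik_full, loglik_reduced):
    """Return (G, p) for nested models differing by one parameter."""
    G = 2.0 * (loglik_full - loglik_reduced)
    p = erfc(sqrt(G / 2.0)) if G > 0 else 1.0
    return G, p
```

If p is large, deleting the variable does not significantly worsen the fit, and purposeful selection would then also check whether the deletion changes the remaining coefficients materially before removing it.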
Simultaneous Estimation of Regression Functions for Marine Corps Technical Training Specialties.
ERIC Educational Resources Information Center
Dunbar, Stephen B.; And Others
This paper considers the application of Bayesian techniques for simultaneous estimation to the specification of regression weights for selection tests used in various technical training courses in the Marine Corps. Results of a method for m-group regression developed by Molenaar and Lewis (1979) suggest that common weights for training courses…
The Predictive Value of Selection Criteria in an Urban Magnet School
ERIC Educational Resources Information Center
Lohmeier, Jill Hendrickson; Raad, Jennifer
2012-01-01
The predictive value of selection criteria on outcome data from two cohorts of students (Total N = 525) accepted to an urban magnet high school was evaluated. Regression analyses of typical screening variables (suspensions, absences, metropolitan achievement tests, middle school grade point averages [GPAs], Matrix Analogies test scores, and…
A Permutation Approach for Selecting the Penalty Parameter in Penalized Model Selection
Sabourin, Jeremy A; Valdar, William; Nobel, Andrew B
2015-01-01
We describe a simple, computationally efficient, permutation-based procedure for selecting the penalty parameter in LASSO penalized regression. The procedure, permutation selection, is intended for applications where variable selection is the primary focus, and can be applied in a variety of structural settings, including that of generalized linear models. We briefly discuss connections between permutation selection and existing theory for the LASSO. In addition, we present a simulation study and an analysis of real biomedical data sets in which permutation selection is compared with selection based on the following: cross-validation (CV), the Bayesian information criterion (BIC), Scaled Sparse Linear Regression, and a selection method based on recently developed testing procedures for the LASSO. PMID:26243050
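The permutation idea can be sketched in simplified form: under permuted responses no predictor is truly associated with y, so a high quantile of the largest absolute predictor-response inner product over many permutations gives a data-driven penalty level. The function below is an illustrative simplification under that assumption, with invented scaling, not the authors' implementation (which is tied to the LASSO solution path).

```python
# Simplified permutation-based penalty selection: lambda is set to a
# quantile of max_j |x_j . y_perm| / n over random permutations of y.
# Predictors and response are assumed already centered.
import random

def permutation_lambda(X_cols, y, n_perm=200, quantile=0.95, seed=0):
    rng = random.Random(seed)
    n = len(y)
    maxima = []
    for _ in range(n_perm):
        yp = y[:]
        rng.shuffle(yp)
        maxima.append(max(abs(sum(col[i] * yp[i] for i in range(n))) / n
                          for col in X_cols))
    maxima.sort()
    return maxima[int(quantile * (len(maxima) - 1))]
```

Any penalty at or above this level would, with high probability, select no variables from pure-noise data, which is the calibration property permutation selection targets.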
The relative toxic response of 27 selected phenols in the 96-hr acute flowthrough Pimephales promelas (fathead minnow) and the 48- to 60-hr chronic static Tetrahymena pyriformis (ciliate protozoan) test systems was evaluated. Log Kow-dependent linear regression analyses revealed ...
Comparing State SAT Scores: Problems, Biases, and Corrections.
ERIC Educational Resources Information Center
Gohmann, Stephen F.
1988-01-01
One method to correct for selection bias in comparing Scholastic Aptitude Test (SAT) scores among states is presented, which is a modification of J. J. Heckman's Selection Bias Correction (1976, 1979). Empirical results suggest that sample selection bias is present in SAT score regressions. (SLD)
Robust support vector regression networks for function approximation with outliers.
Chuang, Chen-Chia; Su, Shun-Feng; Jeng, Jin-Tsong; Hsiao, Chih-Ching
2002-01-01
Support vector regression (SVR) employs the support vector machine (SVM) to tackle problems of function approximation and regression estimation. SVR has been shown to have good robustness against noise. When the parameters used in SVR are improperly selected, however, overfitting may still occur, and the selection of the various parameters is not straightforward. Besides, in SVR, outliers may be taken as support vectors, and such an inclusion of outliers among the support vectors may lead to serious overfitting. In this paper, a novel regression approach, termed the robust support vector regression (RSVR) network, is proposed to enhance the robustness of SVR. In the approach, traditional robust learning approaches are employed to improve the learning performance for any selected parameters. The simulation results show that RSVR improves the performance of the learned systems in all cases. Moreover, even when training lasted for a long period, the testing errors did not go up; in other words, the overfitting phenomenon is indeed suppressed.
Javed, Faizan; Chan, Gregory S H; Savkin, Andrey V; Middleton, Paul M; Malouf, Philip; Steel, Elizabeth; Mackie, James; Lovell, Nigel H
2009-01-01
This paper uses non-linear support vector regression (SVR) to model the blood volume and heart rate (HR) responses in 9 hemodynamically stable kidney failure patients during hemodialysis. Using radial basis function (RBF) kernels, non-parametric models of relative blood volume (RBV) change with time, as well as percentage change in HR with respect to RBV, were obtained. The ε-insensitive loss function was used for SVR modeling. Selection of the design parameters, which include the capacity (C), the insensitivity region (ε) and the RBF kernel parameter (sigma), was made based on a grid-search approach, and the selected models were cross-validated using the average mean square error (AMSE) calculated from testing data based on a k-fold cross-validation technique. Linear regression was also applied to fit the curves and the AMSE was calculated for comparison with SVR. For the model of RBV with time, SVR gave a lower AMSE for both training (AMSE=1.5) and testing data (AMSE=1.4) compared to linear regression (AMSE=1.8 and 1.5). SVR also provided a better fit for HR with RBV for both training and testing data (AMSE=15.8 and 16.4) compared to linear regression (AMSE=25.2 and 20.1).
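The k-fold cross-validated AMSE used above to compare models can be sketched generically for any fit/predict pair. The trivial mean-predictor here is only a stand-in for SVR or linear regression, so the example stays self-contained.

```python
# k-fold cross-validation of average mean square error (AMSE).
def k_fold_amse(x, y, k, fit, predict):
    n = len(x)
    folds = [list(range(i, n, k)) for i in range(k)]   # interleaved folds
    mses = []
    for hold in folds:
        train = [i for i in range(n) if i not in hold]
        model = fit([x[i] for i in train], [y[i] for i in train])
        preds = [predict(model, x[i]) for i in hold]
        mse = sum((y[hold[j]] - preds[j]) ** 2
                  for j in range(len(hold))) / len(hold)
        mses.append(mse)
    return sum(mses) / k

def fit_mean(xs, ys):
    """Stand-in 'model': remember the training mean."""
    return sum(ys) / len(ys)

def predict_mean(model, xi):
    """Predict the training mean everywhere."""
    return model
```

Grid search over (C, ε, sigma), as in the paper, would simply call `k_fold_amse` once per parameter combination and keep the combination with the lowest AMSE.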
Notes on power of normality tests of error terms in regression models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Střelec, Luboš
2015-03-10
Normality is one of the basic assumptions in applying statistical procedures. For example, in linear regression most of the inferential procedures are based on the assumption of normality, i.e., the disturbance vector is assumed to be normally distributed. Failure to detect non-normality of the error terms may lead to incorrect results from the usual statistical inference techniques such as the t-test or F-test. Thus, error terms should be normally distributed in order to allow exact inferences. As a consequence, normally distributed stochastic errors are necessary for inferences that are not misleading, which explains the necessity and importance of robust tests of normality. Therefore, the aim of this contribution is to discuss normality testing of error terms in regression models. In this contribution, we introduce the general RT class of robust tests for normality, and present and discuss the trade-off between power and robustness of selected classical and robust normality tests of error terms in regression models.
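As a concrete member of the classical moment-based family (not the authors' RT class), the Jarque-Bera test checks the skewness and kurtosis of the regression residuals against their normal-distribution values; its statistic is asymptotically chi-squared with 2 degrees of freedom, whose survival function is exactly exp(-x/2).

```python
# Jarque-Bera normality test applied to regression residuals.
from math import exp

def jarque_bera(residuals):
    n = len(residuals)
    m = sum(residuals) / n
    m2 = sum((r - m) ** 2 for r in residuals) / n
    m3 = sum((r - m) ** 3 for r in residuals) / n
    m4 = sum((r - m) ** 4 for r in residuals) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return jb, exp(-jb / 2.0)      # statistic and asymptotic p-value
```

A small p-value signals non-normal error terms, in which case the usual t- and F-based inference discussed above is suspect.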
2009-01-01
Background Genomic selection (GS) uses molecular breeding values (MBV) derived from dense markers across the entire genome for selection of young animals. The accuracy of MBV prediction is important for a successful application of GS. Recently, several methods have been proposed to estimate MBV. Initial simulation studies have shown that these methods can accurately predict MBV. In this study we compared the accuracies and possible bias of five different regression methods in an empirical application in dairy cattle. Methods Genotypes of 7,372 SNP and highly accurate EBV of 1,945 dairy bulls were used to predict MBV for protein percentage (PPT) and a profit index (Australian Selection Index, ASI). Marker effects were estimated by least squares regression (FR-LS), Bayesian regression (Bayes-R), random regression best linear unbiased prediction (RR-BLUP), partial least squares regression (PLSR) and nonparametric support vector regression (SVR) in a training set of 1,239 bulls. Accuracy and bias of MBV prediction were calculated from cross-validation of the training set and tested against a test team of 706 young bulls. Results For both traits, FR-LS using a subset of SNP was significantly less accurate than all other methods which used all SNP. Accuracies obtained by Bayes-R, RR-BLUP, PLSR and SVR were very similar for ASI (0.39-0.45) and for PPT (0.55-0.61). Overall, SVR gave the highest accuracy. All methods resulted in biased MBV predictions for ASI, for PPT only RR-BLUP and SVR predictions were unbiased. A significant decrease in accuracy of prediction of ASI was seen in young test cohorts of bulls compared to the accuracy derived from cross-validation of the training set. This reduction was not apparent for PPT. Combining MBV predictions with pedigree based predictions gave 1.05 - 1.34 times higher accuracies compared to predictions based on pedigree alone. 
Some methods have largely different computational requirements, with PLSR and RR-BLUP requiring the least computing time. Conclusions The four methods which use information from all SNP namely RR-BLUP, Bayes-R, PLSR and SVR generate similar accuracies of MBV prediction for genomic selection, and their use in the selection of immediate future generations in dairy cattle will be comparable. The use of FR-LS in genomic selection is not recommended. PMID:20043835
Testing a single regression coefficient in high dimensional linear models
Zhong, Ping-Shou; Li, Runze; Wang, Hansheng; Tsai, Chih-Ling
2017-01-01
In linear regression models with high dimensional data, the classical z-test (or t-test) for testing the significance of each single regression coefficient is no longer applicable. This is mainly because the number of covariates exceeds the sample size. In this paper, we propose a simple and novel alternative by introducing the Correlated Predictors Screening (CPS) method to control for predictors that are highly correlated with the target covariate. Accordingly, the classical ordinary least squares approach can be employed to estimate the regression coefficient associated with the target covariate. In addition, we demonstrate that the resulting estimator is consistent and asymptotically normal even if the random errors are heteroscedastic. This enables us to apply the z-test to assess the significance of each covariate. Based on the p-value obtained from testing the significance of each covariate, we further conduct multiple hypothesis testing by controlling the false discovery rate at the nominal level. Then, we show that the multiple hypothesis testing achieves consistent model selection. Simulation studies and empirical examples are presented to illustrate the finite sample performance and the usefulness of the proposed method, respectively. PMID:28663668
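The FDR-controlling multiple-testing step described above is the standard Benjamini-Hochberg procedure; a minimal sketch:

```python
# Benjamini-Hochberg procedure: reject the hypotheses with the k
# smallest p-values, where k is the largest rank whose p-value is at
# most rank * alpha / m.
def benjamini_hochberg(pvals, alpha=0.05):
    """Return indices of hypotheses rejected at FDR level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k = rank
    return sorted(order[:k])
```

Applied to the per-covariate p-values from the CPS-based z-tests, the rejected set is the selected model, which is what the consistency result in the abstract is about.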
Molecular markers of neuropsychological functioning and Alzheimer's disease.
Edwards, Melissa; Balldin, Valerie Hobson; Hall, James; O'Bryant, Sid
2015-03-01
The current project sought to examine molecular markers of neuropsychological functioning among elders with and without Alzheimer's disease (AD) and determine the predictive ability of combined molecular markers and select neuropsychological tests in detecting disease presence. Data were analyzed from 300 participants (n = 150, AD and n = 150, controls) enrolled in the Texas Alzheimer's Research and Care Consortium. Linear regression models were created to examine the link between the top five molecular markers from our AD blood profile and neuropsychological test scores. Logistic regressions were used to predict AD presence using serum biomarkers in combination with select neuropsychological measures. Using the neuropsychological test with the least amount of variance overlap with the molecular markers, the combined neuropsychological test and molecular markers was highly accurate in detecting AD presence. This work provides the foundation for the generation of a point-of-care device that can be used to screen for AD.
Bennett, Bradley C; Husby, Chad E
2008-03-28
Botanical pharmacopoeias are non-random subsets of floras, with some taxonomic groups over- or under-represented. Moerman [Moerman, D.E., 1979. Symbols and selectivity: a statistical analysis of Native American medical ethnobotany, Journal of Ethnopharmacology 1, 111-119] introduced linear regression/residual analysis to examine these patterns. However, regression, the commonly-employed analysis, suffers from several statistical flaws. We use contingency table and binomial analyses to examine patterns of Shuar medicinal plant use (from Amazonian Ecuador). We first analyzed the Shuar data using Moerman's approach, modified to better meet requirements of linear regression analysis. Second, we assessed the exact randomization contingency table test for goodness of fit. Third, we developed a binomial model to test for non-random selection of plants in individual families. Modified regression models (which accommodated assumptions of linear regression) reduced R(2) from 0.59 to 0.38, but did not eliminate all problems associated with regression analyses. Contingency table analyses revealed that the entire flora departs from the null model of equal proportions of medicinal plants in all families. In the binomial analysis, only 10 angiosperm families (of 115) differed significantly from the null model. These 10 families are largely responsible for patterns seen at higher taxonomic levels. Contingency table and binomial analyses offer an easy and statistically valid alternative to the regression approach.
Hybrid fuel formulation and technology development
NASA Technical Reports Server (NTRS)
Dean, D. L.
1995-01-01
The objective was to develop an improved hybrid fuel with higher regression rate, a regression rate expression exponent close to 0.5, lower cost, and higher density. The approach was to formulate candidate fuels based on promising concepts, perform thermomechanical analyses to select the most promising candidates, develop laboratory processes to fabricate fuel grains as needed, fabricate fuel grains and test in a small lab-scale motor, select the best candidate, and then scale up and validate performance in a 2500 lbf scale, 11-inch diameter motor. The characteristics of a high performance fuel have been verified in 11-inch motor testing. The advanced fuel exhibits a 15% increase in density over an all hydrocarbon formulation accompanied by a 50% increase in regression rate, which when multiplied by the increase in density yields a 70% increase in fuel mass flow rate; has a significantly lower oxidizer-to-fuel (O/F) ratio requirement at 1.5; has a significantly decreased axial regression rate variation making for more uniform propellant flow throughout motor operation; is very clean burning; extinguishes cleanly and quickly; and burns with a high combustion efficiency.
NASA Astrophysics Data System (ADS)
Shi, Jinfei; Zhu, Songqing; Chen, Ruwen
2017-12-01
An order selection method based on multiple stepwise regressions is proposed for the General Expression of Nonlinear Autoregressive (GNAR) model, which converts the model order problem into variable selection for a multiple linear regression equation. The partial autocorrelation function is adopted to define the linear terms in the GNAR model. The result is set as the initial model, and the nonlinear terms are then introduced gradually. Statistics are chosen to measure how much each newly introduced or originally included variable improves the model characteristics, and these are used to decide which model variables to retain or eliminate. The optimal model is then obtained through measurement of the data-fitting effect or a significance test. Simulation results and experiments on classic time-series data show that the proposed method is simple, reliable and applicable to practical engineering.
Odegård, J; Klemetsdal, G; Heringstad, B
2005-04-01
Several selection criteria for reducing incidence of mastitis were developed from a random regression sire model for test-day somatic cell score (SCS). For comparison, sire transmitting abilities were also predicted based on a cross-sectional model for lactation mean SCS. Only first-crop daughters were used in genetic evaluation of SCS, and the different selection criteria were compared based on their correlation with incidence of clinical mastitis in second-crop daughters (measured as mean daughter deviations). Selection criteria were predicted based on both complete and reduced first-crop daughter groups (261 or 65 daughters per sire, respectively). For complete daughter groups, predicted transmitting abilities at around 30 d in milk showed the best predictive ability for incidence of clinical mastitis, closely followed by average predicted transmitting abilities over the entire lactation. Both of these criteria were derived from the random regression model. These selection criteria improved accuracy of selection by approximately 2% relative to a cross-sectional model. However, for reduced daughter groups, the cross-sectional model yielded increased predictive ability compared with the selection criteria based on the random regression model. This result may be explained by the cross-sectional model being more robust, i.e., less sensitive to precision of (co)variance components estimates and effects of data structure.
Determination of suitable drying curve model for bread moisture loss during baking
NASA Astrophysics Data System (ADS)
Soleimani Pour-Damanab, A. R.; Jafary, A.; Rafiee, S.
2013-03-01
This study presents mathematical modelling of bread moisture loss, or drying, during baking in a conventional bread baking process. In order to estimate and select the appropriate moisture-loss curve equation, 11 different models, semi-theoretical and empirical, were fitted to the experimental data by nonlinear regression analysis and compared according to their correlation coefficients, chi-squared test values and root mean square errors. Of all the drying models, the Page model was selected as the best, according to the correlation coefficient, chi-squared test, and root mean square error values and its simplicity. The mean absolute estimation error of the proposed model by linear regression analysis for natural and forced convection modes was 2.43% and 4.74%, respectively.
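The Page thin-layer drying model referenced above is MR(t) = exp(-k * t^n). Taking logarithms twice linearizes it, ln(-ln MR) = ln k + n ln t, so k and n can be recovered with an ordinary least-squares line fit. An illustrative sketch on synthetic, noise-free data (the parameter values are invented, not the paper's):

```python
# Fit the Page drying model MR(t) = exp(-k * t**n) by linearization.
from math import exp, log

def page_mr(t, k, n):
    """Moisture ratio predicted by the Page model."""
    return exp(-k * t ** n)

def fit_page_model(t, mr):
    """Recover (k, n) via linear regression on (ln t, ln(-ln MR))."""
    xs = [log(ti) for ti in t]
    ys = [log(-log(m)) for m in mr]
    n_pts = len(xs)
    mx, my = sum(xs) / n_pts, sum(ys) / n_pts
    n_exp = sum((xs[i] - mx) * (ys[i] - my) for i in range(n_pts)) / \
            sum((x - mx) ** 2 for x in xs)
    k = exp(my - n_exp * mx)
    return k, n_exp
```

With noisy measurements the same transformed regression gives the starting values, and the correlation coefficient, chi-squared and RMSE criteria in the abstract are then computed on the back-transformed predictions.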
NASA Astrophysics Data System (ADS)
Erener, Arzu; Sivas, A. Abdullah; Selcuk-Kestel, A. Sevtap; Düzgün, H. Sebnem
2017-07-01
All quantitative landslide susceptibility mapping (QLSM) methods require two basic data types, namely a landslide inventory and factors that influence landslide occurrence (landslide influencing factors, LIF). Depending on the type of landslides, the nature of the triggers and the LIF, the accuracy of QLSM methods differs. Moreover, how to balance the number of 0's (non-occurrence) and 1's (occurrence) in the training set obtained from the landslide inventory, and how to select which of the 1's and 0's to include in QLSM models, play a critical role in the accuracy of QLSM. Although the performance of various QLSM methods is extensively investigated in the literature, the challenge of training set construction is not adequately investigated for QLSM methods. In order to tackle this challenge, in this study three different training set selection strategies, along with the original data set, are used to test the performance of three different regression methods, namely Logistic Regression (LR), Bayesian Logistic Regression (BLR) and Fuzzy Logistic Regression (FLR). The first sampling strategy is proportional random sampling (PRS), which takes into account a weighted selection of landslide occurrences in the sample set. The second method, non-selective nearby sampling (NNS), includes randomly selected sites and their surrounding neighboring points at certain preselected distances to include the impact of clustering. Selective nearby sampling (SNS) is the third method, which concentrates on the group of 1's and their surrounding neighborhood; a randomly selected group of landslide sites and their neighborhood are considered in the analyses, similar to the NNS parameters. It is found that the LR-PRS, FLR-PRS and BLR-whole-data set-ups, in that order, yield the best fits among the alternatives. The results indicate that in QLSM based on regression models, avoidance of spatial correlation in the data set is critical for model performance.
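The 0/1 balancing problem described above can be sketched with a simple under-sampling routine: keep every occurrence (1) and draw a random subset of non-occurrences (0) at a chosen ratio. This is an illustrative simplification in the spirit of the sampling strategies discussed, not any of the three strategies as implemented in the study (which also use spatial neighborhoods).

```python
# Build a balanced training set: all 1's plus a random sample of 0's.
import random

def balanced_training_set(labels, ratio=1.0, seed=0):
    """Return sorted indices: every 1, and about ratio 0's per 1."""
    rng = random.Random(seed)
    ones = [i for i, v in enumerate(labels) if v == 1]
    zeros = [i for i, v in enumerate(labels) if v == 0]
    n_zero = min(len(zeros), int(round(ratio * len(ones))))
    return sorted(ones + rng.sample(zeros, n_zero))
```

A spatially aware variant, closer to NNS/SNS, would additionally restrict the sampled 0's to lie beyond (or within) preselected distances of the landslide sites.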
Howard B. Stauffer; Cynthia J. Zabel; Jeffrey R. Dunk
2005-01-01
We compared a set of competing logistic regression habitat selection models for Northern Spotted Owls (Strix occidentalis caurina) in California. The habitat selection models were estimated, compared, evaluated, and tested using multiple sample datasets collected on federal forestlands in northern California. We used Bayesian methods in interpreting...
Suzuki, Taku; Iwamoto, Takuji; Shizu, Kanae; Suzuki, Katsuji; Yamada, Harumoto; Sato, Kazuki
2017-05-01
This retrospective study was designed to investigate prognostic factors for postoperative outcomes of cubital tunnel syndrome (CubTS) using multiple logistic regression analysis with a large number of patients. Eighty-three patients with CubTS who underwent surgery were enrolled. The following potential prognostic factors for disease severity were selected according to previous reports: sex, age, type of surgery, disease duration, body mass index, cervical lesion, presence of diabetes mellitus, Workers' Compensation status, preoperative severity, and preoperative electrodiagnostic testing. Postoperative severity of disease was assessed 2 years after surgery by Messina's criteria, an outcome measure specific to CubTS. Bivariate analysis was performed to select candidate prognostic factors for the multiple regression analyses. Multiple logistic regression analysis was then conducted to identify associations between postoperative severity and the selected prognostic factors. Both the bivariate and the multiple regression analyses revealed only preoperative severity as an independent risk factor for poor prognosis; no other factor showed a significant association. Although conflicting results exist regarding the prognosis of CubTS, this study supports evidence from previous studies and concludes that early surgical intervention portends the most favorable prognosis. Copyright © 2017 The Japanese Orthopaedic Association. Published by Elsevier B.V. All rights reserved.
Fang, Xingang; Bagui, Sikha; Bagui, Subhash
2017-08-01
The readily available high-throughput screening (HTS) data in the PubChem database provide an opportunity for mining small molecules in a variety of biological systems using machine learning techniques. Among the thousands of molecular descriptors developed to encode useful chemical information about molecules, descriptor selection is an essential step in building an optimal quantitative structure-activity relationship (QSAR) model. Developing a systematic descriptor selection strategy requires an understanding of the relationships between (i) the descriptor selection, (ii) the choice of machine learning model, and (iii) the characteristics of the target biomolecule. In this work, we employed the Signature descriptor to generate a dataset from the human kallikrein 5 (hK5) inhibition confirmatory assay data and compared multiple classification models, including logistic regression, support vector machine, random forest, and k-nearest neighbor. Under optimal conditions, the logistic regression model provided extremely high overall accuracy (98%) and precision (90%), with good sensitivity (65%) in the cross-validation test. In testing the primary HTS screening data with more than 200K molecular structures, the logistic regression model was capable of eliminating more than 99.9% of the inactive structures. As part of our exploration of the descriptor-model-target relationship, the excellent predictive performance of the combination of the Signature descriptor and the logistic regression model on the hK5 assay data suggests a feasible descriptor/model selection strategy for similar targets. Copyright © 2017 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Tang, Jie; Liu, Rong; Zhang, Yue-Li; Liu, Mou-Ze; Hu, Yong-Fang; Shao, Ming-Jie; Zhu, Li-Jun; Xin, Hua-Wen; Feng, Gui-Wen; Shang, Wen-Jun; Meng, Xiang-Guang; Zhang, Li-Rong; Ming, Ying-Zi; Zhang, Wei
2017-02-01
Tacrolimus has a narrow therapeutic window and considerable variability in clinical use. Our goal was to compare the performance of multiple linear regression (MLR) and eight machine learning techniques in pharmacogenetic algorithm-based prediction of tacrolimus stable dose (TSD) in a large Chinese cohort. A total of 1,045 renal transplant patients were recruited, 80% of which were randomly selected as the “derivation cohort” to develop dose-prediction algorithm, while the remaining 20% constituted the “validation cohort” to test the final selected algorithm. MLR, artificial neural network (ANN), regression tree (RT), multivariate adaptive regression splines (MARS), boosted regression tree (BRT), support vector regression (SVR), random forest regression (RFR), lasso regression (LAR) and Bayesian additive regression trees (BART) were applied and their performances were compared in this work. Among all the machine learning models, RT performed best in both derivation [0.71 (0.67-0.76)] and validation cohorts [0.73 (0.63-0.82)]. In addition, the ideal rate of RT was 4% higher than that of MLR. To our knowledge, this is the first study to use machine learning models to predict TSD, which will further facilitate personalized medicine in tacrolimus administration in the future.
Wildfire Selectivity for Land Cover Type: Does Size Matter?
Barros, Ana M. G.; Pereira, José M. C.
2014-01-01
Previous research has shown that fires burn certain land cover types disproportionately to their abundance. We used quantile regression to study land cover proneness to fire as a function of fire size, under the hypothesis that selectivity and fire size are inversely related for all land cover types. Using five years of fire perimeters, we estimated conditional quantile functions for the lower (avoidance) and upper (preference) quantiles of fire selectivity for five land cover types: annual crops, evergreen oak woodlands, eucalypt forests, pine forests, and shrublands. The slope of a significant regression quantile describes the rate of change in fire selectivity (avoidance or preference) as a function of fire size. We used Monte-Carlo methods to randomly permute fires in order to obtain the distribution of fire selectivity expected by chance. This distribution was used to test the null hypotheses that 1) mean fire selectivity does not differ from that obtained by randomly relocating the observed fire perimeters, and 2) land cover proneness to fire does not vary with fire size. Our results show that land cover proneness to fire is higher for shrublands and pine forests than for annual crops and evergreen oak woodlands. As fire size increases, selectivity decreases for all land cover types tested. Moreover, the rate of change in selectivity with fire size is higher for preference than for avoidance. Comparison between the observed and randomized data led us to reject both null hypotheses (α = 0.05) and to conclude that it is very unlikely that the observed values of fire selectivity, and of its change with fire size, are due to chance. PMID:24454747
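The Monte-Carlo logic described above can be sketched in a few lines. The data, the selectivity statistic, and all parameter values below are synthetic assumptions, not the study's: the observed statistic is compared against a null distribution built by randomly permuting the burned/unburned assignment across cells:

```python
import numpy as np

rng = np.random.default_rng(1)

cover = rng.integers(0, 2, size=500)              # 1 = shrubland cell, 0 = other
burn_p = np.where(cover == 1, 0.4, 0.1)           # shrubland burns more often
burned = (rng.random(500) < burn_p).astype(int)

def selectivity(cover, burned):
    """Burned fraction in shrubland minus the overall burned fraction."""
    return burned[cover == 1].mean() - burned.mean()

obs = selectivity(cover, burned)
null = np.array([selectivity(cover, rng.permutation(burned)) for _ in range(999)])
p_value = (np.sum(null >= obs) + 1) / (len(null) + 1)
```

A small `p_value` means the observed preference for shrubland is very unlikely under random relocation of fire, the same reasoning used to reject the study's null hypotheses.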
RRegrs: an R package for computer-aided model selection with multiple regression models.
Tsiliki, Georgia; Munteanu, Cristian R; Seoane, Jose A; Fernandez-Lozano, Carlos; Sarimveis, Haralambos; Willighagen, Egon L
2015-01-01
Predictive regression models can be created with many different modelling approaches. Choices must be made about data set splitting, cross-validation methods, specific regression parameters, and best-model criteria, as they all affect the accuracy and efficiency of the resulting predictive models and therefore raise issues of model reproducibility and comparison. Cheminformatics and bioinformatics use predictive modelling extensively and need standardization of these methodologies in order to assist model selection and speed up the development of predictive models. A tool accessible to all users, irrespective of their statistical knowledge, would be valuable if it tested several simple and complex regression models and validation schemes, produced unified reports, and offered the option of integration into more extensive studies. Additionally, such a methodology should be implemented as a free programming package, so that it can be continuously adapted and redistributed by others. We propose an integrated framework for creating multiple regression models, called RRegrs. The tool offers ten simple and complex regression methods combined with repeated 10-fold and leave-one-out cross-validation. The methods include multiple linear regression, generalized linear model with stepwise feature selection, partial least squares regression, lasso regression, and support vector machines with recursive feature elimination. The new framework is an automated, fully validated procedure that produces standardized reports for quickly assessing the impact of choices in modelling algorithms and for evaluating the model and cross-validation results. The methodology is implemented as an open source R package, available at https://www.github.com/enanomapper/RRegrs, reusing and extending the caret package. The universality of the new methodology is demonstrated using five standard data sets from different scientific fields.
Its efficiency in cheminformatics and QSAR modelling is shown with three use cases: proteomics data for surface-modified gold nanoparticles, nano-metal oxide descriptor data, and molecular descriptors for acute aquatic toxicity data. The results show that for all data sets RRegrs reports models with equal or better performance, for both training and test sets, than those reported in the original publications. Its good performance, together with its adaptability in terms of parameter optimization, could make RRegrs a popular framework for the initial exploration of predictive models and, with that, for the design of more comprehensive in silico screening applications. Graphical abstract: RRegrs is a computer-aided model selection framework for R multiple regression models; it is a fully validated procedure with application to QSAR modelling.
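RRegrs itself is an R package, but the core workflow it standardizes, the same repeated k-fold split applied to several regression methods, can be sketched in Python. Everything here (the data, the closed-form OLS/ridge fits, the fold and repeat counts) is an illustrative assumption, not RRegrs code:

```python
import numpy as np

rng = np.random.default_rng(2)

n, p = 120, 10
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[:3] = [1.5, -2.0, 1.0]   # three informative predictors
y = X @ beta + rng.standard_normal(n)

def fit_ridge(X, y, lam):
    """beta_hat = (X'X + lam I)^-1 X'y; lam = 0 gives OLS."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def repeated_kfold_rmse(X, y, lam, k=10, repeats=3, rng=rng):
    """Mean test RMSE over `repeats` random k-fold splits."""
    errs = []
    for _ in range(repeats):
        idx = rng.permutation(len(y))
        for fold in np.array_split(idx, k):
            train = np.setdiff1d(idx, fold)
            b = fit_ridge(X[train], y[train], lam)
            errs.append(np.sqrt(np.mean((y[fold] - X[fold] @ b) ** 2)))
    return float(np.mean(errs))

rmse_ols = repeated_kfold_rmse(X, y, lam=0.0)
rmse_ridge = repeated_kfold_rmse(X, y, lam=1.0)
```

Because both methods see identical splits, the two RMSE values are directly comparable, which is the point of standardizing the validation scheme.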
Genetic analysis of groups of mid-infrared predicted fatty acids in milk.
Narayana, S G; Schenkel, F S; Fleming, A; Koeck, A; Malchiodi, F; Jamrozik, J; Johnston, J; Sargolzaei, M; Miglior, F
2017-06-01
The objective of this study was to investigate the genetic variability of mid-infrared predicted fatty acid groups in Canadian Holstein cattle. Genetic parameters were estimated for 5 groups of fatty acids: short-chain (4 to 10 carbons), medium-chain (11 to 16 carbons), long-chain (17 to 22 carbons), saturated, and unsaturated fatty acids. The data set included 49,127 test-day records from 10,029 first-lactation Holstein cows in 810 herds. The random regression animal test-day model included days in milk, herd-test date, and age-season of calving (polynomial regression) as fixed effects; herd-year of calving, the animal additive genetic effect, and permanent environment effects as random polynomial regressions; and a random residual effect. Legendre polynomials of the third degree were selected for the fixed regression on age-season of calving, and Legendre polynomials of the fourth degree were selected for the random regressions for the animal additive genetic, permanent environment, and herd-year effects. The average daily heritability over the lactation for the medium-chain fatty acid group (0.32) was higher than for the short-chain (0.24) and long-chain (0.23) fatty acid groups. The average daily heritability for the saturated fatty acid group (0.33) was greater than for the unsaturated fatty acid group (0.21). Estimated average daily genetic correlations were positive among all fatty acid groups and ranged from moderate to high (0.63-0.96). The genetic correlations illustrated similarities and differences in the origin of the groups and in their makeup based on chain length and saturation. These results provide evidence for the existence of genetic variation in mid-infrared predicted fatty acid groups, and for the possibility of improving the milk fatty acid profile through genetic selection in Canadian dairy cattle. Copyright © 2017 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
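The Legendre covariates used in such random regression test-day models can be sketched as follows; the days-in-milk range and the polynomial degree below are assumptions for illustration, not the study's exact settings:

```python
import numpy as np
from numpy.polynomial import legendre

dim = np.arange(5, 306)                           # days in milk 5..305 (assumed)
x = 2 * (dim - dim.min()) / (dim.max() - dim.min()) - 1   # map DIM onto [-1, 1]

def legendre_basis(x, degree):
    """Columns P_0(x) .. P_degree(x) of the Legendre polynomials."""
    return np.column_stack([
        legendre.legval(x, [0] * d + [1]) for d in range(degree + 1)
    ])

Z = legendre_basis(x, degree=4)                   # 4th-degree basis: 5 covariates
```

Each animal's random regression then amounts to animal-specific coefficients on these 5 columns, letting genetic and permanent-environment effects vary smoothly across the lactation.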
Balabin, Roman M; Smirnov, Sergey V
2011-04-29
During the past several years, near-infrared (near-IR/NIR) spectroscopy has increasingly been adopted as an analytical tool in fields ranging from the petroleum to the biomedical sector. The NIR spectrum (above 4000 cm⁻¹) of a sample is typically measured by modern instruments at a few hundred wavelengths. Recently, considerable effort has been directed towards developing procedures to identify the variables (wavelengths) that contribute useful information. Variable selection (VS) or feature selection, also called frequency or wavelength selection, is a critical step in data analysis for vibrational spectroscopy (infrared, Raman, or NIR). In this paper, we compare the performance of 16 different feature selection methods for the prediction of properties of biodiesel fuel, including density, viscosity, methanol content, and water concentration. The feature selection algorithms tested include stepwise multiple linear regression (MLR-step), interval partial least squares regression (iPLS), backward iPLS (BiPLS), forward iPLS (FiPLS), moving window partial least squares regression (MWPLS), (modified) changeable size moving window partial least squares (CSMWPLS/MCSMWPLSR), searching combination moving window partial least squares (SCMWPLS), the successive projections algorithm (SPA), uninformative variable elimination (UVE, including UVE-SPA), simulated annealing (SA), back-propagation artificial neural networks (BP-ANN), the Kohonen artificial neural network (K-ANN), and genetic algorithms (GAs, including GA-iPLS). Two linear techniques for calibration model building, namely multiple linear regression (MLR) and partial least squares regression/projection to latent structures (PLS/PLSR), are used for the evaluation of biofuel properties. A comparison with a non-linear calibration model, an artificial neural network (ANN-MLP), is also provided. Gasoline, ethanol-gasoline (bioethanol), and diesel fuel data are also discussed.
The results of applying other spectroscopic techniques, such as Raman, ultraviolet-visible (UV-vis), or nuclear magnetic resonance (NMR) spectroscopy, can likewise be greatly improved by an appropriate choice of feature selection method. Copyright © 2011 Elsevier B.V. All rights reserved.
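Among the methods listed, stepwise MLR is the simplest to sketch. The toy below is a hypothetical greedy forward selection on synthetic "wavelengths" using a plain residual-sum-of-squares criterion, not the paper's MLR-step implementation:

```python
import numpy as np

rng = np.random.default_rng(3)

n, p = 80, 30
X = rng.standard_normal((n, p))                   # p candidate "wavelengths"
y = 2.0 * X[:, 4] - 1.5 * X[:, 17] + 0.1 * rng.standard_normal(n)

def rss(X, y, cols):
    """Residual sum of squares of an OLS fit on the given columns."""
    Xs = X[:, cols]
    b, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    r = y - Xs @ b
    return float(r @ r)

def forward_select(X, y, n_keep):
    """Greedily add the variable whose inclusion most reduces RSS."""
    selected = []
    for _ in range(n_keep):
        rest = [j for j in range(X.shape[1]) if j not in selected]
        best = min(rest, key=lambda j: rss(X, y, selected + [j]))
        selected.append(best)
    return selected

chosen = forward_select(X, y, n_keep=2)
```

On this synthetic example the procedure recovers the two informative columns; real stepwise implementations add stopping rules (F-tests or information criteria) rather than a fixed `n_keep`.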
Complex Intellect vs the IQ Test as a Predictor of Performance.
ERIC Educational Resources Information Center
Dees, James W.
In order to test the ubiquity of the structure of the intellect for predictors of performance, a psychomotor skill (M16 rifle proficiency test), a measure of perseverance (completion of or resignation from the OCS Program), and a measure of leadership ability (peer ratings) were selected as criteria on which multiple regressions were conducted with a…
Plant selection for ethnobotanical uses on the Amalfi Coast (Southern Italy).
Savo, V; Joy, R; Caneva, G; McClatchey, W C
2015-07-15
Many ethnobotanical studies have investigated selection criteria for medicinal and non-medicinal plants. In this paper we test several statistical methods on different ethnobotanical datasets in order to 1) define to what extent the nature of the datasets can affect the interpretation of results, and 2) determine whether selection for different plant uses is based on phylogeny or on other criteria. We considered three ethnobotanical datasets (two of medicinal plants and one of non-medicinal plants, covering handicraft production and domestic and agro-pastoral practices) and two floras of the Amalfi Coast. We performed residual analysis from linear regression, the binomial test, and a Bayesian approach to identify under-used and over-used plant families within the ethnobotanical datasets. Percentages of agreement were calculated to compare the results of the analyses. We also analyzed the relationship between plant selection and phylogeny, chorology, life form, and habitat using the chi-square test. Pearson's residuals from each significant chi-square analysis were examined to investigate alternative hypotheses about plant selection criteria. The results of the three statistical methods differed within the same dataset, and between datasets and floras, though with some similarities. In the two medicinal datasets, only Lamiaceae was identified in both floras as an over-used family by all three statistical methods. For one flora, all statistical methods agreed that Malvaceae was over-used and Poaceae under-used, but this was not consistent with the second flora, for which one result was non-significant. All other families showed some discrepancy in significance across methods or floras. Significant over- or under-use was observed in only a minority of cases. The chi-square analyses were significant for phylogeny, life form, and habitat.
Pearson's residuals indicated a non-random selection of woody species for non-medicinal uses and an under-use of plants of temperate forests for medicinal uses. Our study showed that selection criteria for plant uses (including medicinal) are not always based on phylogeny. The comparison of different statistical methods (regression, binomial and Bayesian) under different conditions led to the conclusion that the most conservative results are obtained using regression analysis.
Fast function-on-scalar regression with penalized basis expansions.
Reiss, Philip T; Huang, Lei; Mennes, Maarten
2010-01-01
Regression models for functional responses and scalar predictors are often fitted by means of basis functions, with quadratic roughness penalties applied to avoid overfitting. The fitting approach described by Ramsay and Silverman in the 1990s amounts to a penalized ordinary least squares (P-OLS) estimator of the coefficient functions. We recast this estimator as a generalized ridge regression estimator, and present a penalized generalized least squares (P-GLS) alternative. We describe algorithms by which both estimators can be implemented, with automatic selection of optimal smoothing parameters, in a more computationally efficient manner than has heretofore been available. We discuss pointwise confidence intervals for the coefficient functions, simultaneous inference by permutation tests, and model selection, including a novel notion of pointwise model selection. P-OLS and P-GLS are compared in a simulation study. Our methods are illustrated with an analysis of age effects in a functional magnetic resonance imaging data set, as well as a reanalysis of a now-classic Canadian weather data set. An R package implementing the methods is publicly available.
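The penalized least squares idea behind P-OLS can be sketched with a quadratic roughness penalty on basis coefficients. The Gaussian-bump basis, second-difference penalty, and smoothing parameter below are illustrative assumptions standing in for the paper's spline machinery:

```python
import numpy as np

rng = np.random.default_rng(4)

t = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * t) + 0.2 * rng.standard_normal(t.size)  # noisy curve

knots = np.linspace(0, 1, 12)
B = np.exp(-(t[:, None] - knots[None, :]) ** 2 / (2 * 0.08 ** 2))  # bump basis
D = np.diff(np.eye(B.shape[1]), n=2, axis=0)      # second-difference operator

def penalized_fit(B, y, D, lam):
    """Solve (B'B + lam * D'D) c = B'y and return the fitted curve B c."""
    c = np.linalg.solve(B.T @ B + lam * (D.T @ D), B.T @ y)
    return B @ c

fit = penalized_fit(B, y, D, lam=0.1)
resid_sd = float(np.std(y - fit))
```

Larger `lam` forces neighboring basis coefficients toward a straight line (small second differences), trading fidelity for smoothness; in the paper this trade-off is tuned automatically rather than fixed by hand.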
A Solution to Separation and Multicollinearity in Multiple Logistic Regression
Shen, Jianzhao; Gao, Sujuan
2010-01-01
In dementia screening tests, item selection for shortening an existing screening test can be achieved using multiple logistic regression. However, maximum likelihood estimates for such logistic regression models often experience serious bias or even non-existence because of separation and multicollinearity problems resulting from a large number of highly correlated items. Firth (1993, Biometrika, 80(1), 27–38) proposed a penalized likelihood estimator for generalized linear models, which was shown to reduce bias and the non-existence problems. Ridge regression has been used in logistic regression to stabilize the estimates in cases of multicollinearity. However, neither method solves the other's problem. In this paper, we propose a double penalized maximum likelihood estimator combining Firth's penalized likelihood equation with a ridge parameter. We present a simulation study evaluating the empirical performance of the double penalized likelihood estimator in small to moderate sample sizes. We demonstrate the proposed approach using current screening data from a community-based dementia study. PMID:20376286
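The stabilizing effect of a ridge penalty under separation can be sketched directly. The example below shows only the ridge half of the proposal via penalized Newton/IRLS steps; Firth's Jeffreys-prior correction is a further modification of the score equations not reproduced here, and the tiny separated dataset is invented:

```python
import numpy as np

X = np.array([[1.0, -2.0],
              [1.0, -1.0],
              [1.0,  1.0],
              [1.0,  2.0]])                       # intercept + one item score
y = np.array([0.0, 0.0, 1.0, 1.0])                # perfectly separated outcomes

def ridge_logistic(X, y, lam=1.0, iters=50):
    """Newton/IRLS for the logistic log-likelihood minus 0.5*lam*||beta||^2.
    (For simplicity the intercept is penalized too.)"""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        grad = X.T @ (y - p) - lam * beta
        hess = X.T @ (X * W[:, None]) + lam * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

beta = ridge_logistic(X, y)                       # finite despite separation
```

Without the penalty (`lam = 0`) the slope estimate diverges on these data, since any increase in the coefficient improves the likelihood; the ridge term keeps the maximizer finite, which is the behavior the double-penalty estimator exploits.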
2015-01-01
Background Over the past 50,000 years, shifts in human-environmental or human-human interactions shaped genetic differences within and among human populations, including variants under positive selection. Shaped by environmental factors, such variants influence the genetics of modern health, disease, and treatment outcome. Because evolutionary processes tend to act on gene regulation, we test whether regulatory variants are under positive selection. We introduce a new approach to enhance detection of genetic markers undergoing positive selection, using conditional entropy to capture recent local selection signals. Results We use conditional logistic regression to compare our Adjusted Haplotype Conditional Entropy (H|H) measure of positive selection to existing positive-selection measures. H|H and the existing measures were applied to published regulatory variants acting in cis (cis-eQTLs), with conditional logistic regression testing whether regulatory variants undergo stronger positive selection than the surrounding gene. These cis-eQTLs were drawn from six independent studies of genotype and RNA expression. The conditional logistic regression shows that, overall, H|H is substantially more powerful than existing positive-selection methods in identifying cis-eQTLs against other single nucleotide polymorphisms (SNPs) in the same genes. When broken down by Gene Ontology, H|H predictions are particularly strong in some biological process categories, where regulatory variants are under strong positive selection compared to the bulk of the gene, distinct from the GO categories under overall positive selection. However, cis-eQTLs in a second group of genes lack positive-selection signatures detectable by H|H, consistent with ancient short haplotypes compared to the surrounding gene (for example, in innate immunity, GO:0042742); under such other modes of selection, H|H would not be expected to be a strong predictor.
These conditional logistic regression models are adjusted for minor allele frequency (MAF); otherwise, ascertainment bias is a major factor in all eQTL data sets. Relationships between Gene Ontology categories, positive selection, and eQTL specificity were replicated with H|H in a single larger data set. Our measure, Adjusted Haplotype Conditional Entropy (H|H), was essential in generating all of the results above because it 1) is a stronger overall predictor for eQTLs than comparable existing approaches, and 2) shows low sequential auto-correlation, overcoming problems with convergence of the conditional regression models. Conclusions Our new method, H|H, provides a consistently more robust signal associated with cis-eQTLs than existing methods. We interpret this to indicate that some cis-eQTLs are under positive selection compared to their surrounding genes. Conditional entropy indicative of a selective sweep is an especially strong predictor of eQTLs for genes in several biological processes of medical interest. Where conditional entropy is a weak or negative predictor of eQTLs, as for innate immune genes, this would be consistent with balancing selection acting on such eQTLs over long time periods. Different measures of selection may be needed for variant prioritization under other modes of evolutionary selection. PMID:26111110
Wavelet regression model in forecasting crude oil price
NASA Astrophysics Data System (ADS)
Hamid, Mohd Helmie; Shabri, Ani
2017-05-01
This study presents the performance of the wavelet multiple linear regression (WMLR) technique in daily crude oil forecasting. The WMLR model was developed by integrating the discrete wavelet transform (DWT) with a multiple linear regression (MLR) model. The original time series was decomposed into sub-series at different scales by wavelet theory. Correlation analysis was conducted to assist in the selection of optimal decomposed components as inputs for the WMLR model. The daily WTI crude oil price series was used to test the prediction capability of the proposed model. The forecasting performance of the WMLR model was also compared with that of regular multiple linear regression (MLR), the autoregressive integrated moving average (ARIMA) model, and generalized autoregressive conditional heteroscedasticity (GARCH), using root mean square error (RMSE) and mean absolute error (MAE). Based on the experimental results, the WMLR model performs better than the other forecasting techniques tested in this study.
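The decompose-then-regress idea can be sketched with a hand-rolled one-level Haar transform standing in for the paper's DWT (a real implementation would use a wavelet library and the paper's correlation-based component selection). All data and modelling choices below are assumptions; regressors are taken at odd time indices so that no future value leaks into the predictors:

```python
import numpy as np

rng = np.random.default_rng(5)

n = 256
price = np.cumsum(rng.standard_normal(n)) + 50.0  # synthetic random-walk "price"

def haar_level1(x):
    """One-level Haar split into full-length smooth and detail components."""
    a = (x[0::2] + x[1::2]) / 2.0                 # approximation coefficients
    d = (x[0::2] - x[1::2]) / 2.0                 # detail coefficients
    approx = np.repeat(a, 2)
    detail = np.repeat(d, 2) * np.tile([1.0, -1.0], d.size)
    return approx, detail

approx, detail = haar_level1(price)
assert np.allclose(approx + detail, price)        # perfect reconstruction

# At odd t, approx[t] and detail[t] depend only on price[t-1] and price[t],
# so predicting price[t+1] from them uses no future information.
t = np.arange(1, n - 1, 2)
X = np.column_stack([np.ones(t.size), approx[t], detail[t]])
coef, *_ = np.linalg.lstsq(X, price[t + 1], rcond=None)
rmse = float(np.sqrt(np.mean((price[t + 1] - X @ coef) ** 2)))
```

The smooth and detail parts become separate MLR inputs, which is the structural point of WMLR; the forecasting gain in the paper comes from decomposing at multiple scales and selecting components by correlation.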
Evidence-based selection process to the Master of Public Health program at Medical University.
Panczyk, Mariusz; Juszczyk, Grzegorz; Zarzeka, Aleksander; Samoliński, Łukasz; Belowska, Jarosława; Cieślak, Ilona; Gotlib, Joanna
2017-09-11
We evaluated the predictive validity of selected sociodemographic factors and admission criteria for Master's studies in Public Health at the Faculty of Health Sciences, Medical University of Warsaw (MUW). Recruitment data and learning results of students enrolled between 2008 and 2012 were used (N = 605, average age 22.9 ± 3.01). The predictive analysis was performed using multiple linear regression. Twelve predictors were selected for the regression model, including sex, age, professional degree (BA), the Bachelor's studies grade point average (GPA), and the total score of the preliminary examination broken down into five thematic areas. Depending on the tested model, one of two dependent variables was used: first-year GPA or cumulative GPA in the Master's program. The regression model based on the cumulative Master's GPA was better matched to the data than the model based on first-year GPA (adjusted R² 0.413 versus 0.476, respectively). The Bachelor's GPA and each of the five subtests comprising the entrance examination were significant predictors of the success achieved by a student both after the first year and at the end of the course of studies. Admission criteria combining the total multiple-choice examination score with the Bachelor's GPA can be successfully used to select candidates for Master's degree studies in Public Health. The high predictive validity of the recruitment system confirms the validity of the admission policy adopted at MUW.
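The adjusted R² reported above is a standard statistic and easy to reproduce on synthetic data; the design below merely mimics the study's dimensions (N = 605, 12 predictors) and is otherwise invented:

```python
import numpy as np

rng = np.random.default_rng(6)

n, p = 605, 12                                    # mimic N and predictor count
X = rng.standard_normal((n, p))
y = X[:, :4] @ np.array([0.5, 0.4, 0.3, 0.2]) + rng.standard_normal(n)

Xd = np.column_stack([np.ones(n), X])             # add intercept
beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
resid = y - Xd @ beta
ss_res = float(resid @ resid)
ss_tot = float(((y - y.mean()) ** 2).sum())
r2 = 1.0 - ss_res / ss_tot
adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
```

The adjustment penalizes R² for the number of predictors, which is why it is the fairer statistic for comparing the study's two models.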
NASA Astrophysics Data System (ADS)
He, Song-Bing; Ben Hu; Kuang, Zheng-Kun; Wang, Dong; Kong, De-Xin
2016-11-01
Adenosine receptors (ARs) are potential therapeutic targets for Parkinson's disease, diabetes, pain, stroke and cancers. Prediction of subtype selectivity is therefore important from both therapeutic and mechanistic perspectives. In this paper, we introduce a shape-similarity profile as a molecular descriptor, namely the three-dimensional biologically relevant spectrum (BRS-3D), for AR selectivity prediction. Pairwise regression and discrimination models were built with the support vector machine method. The average coefficient of determination (r2) of the regression models was 0.664 (for test sets). The 2B-3 (A2B vs A3) model performed best, with q2 = 0.769 for training sets (10-fold cross-validation), and r2 = 0.766, RMSE = 0.828 for test sets. The models' robustness and stability were validated with 100 resampling runs and 500 Y-randomization runs. We compared the performance of BRS-3D with 3D descriptors calculated by MOE; BRS-3D performed as well as, or better than, the MOE 3D descriptors. The performance of the discrimination models was also encouraging, with an average accuracy (ACC) of 0.912 and Matthews correlation coefficient (MCC) of 0.792 (test set). The 2A-3 (A2A vs A3) selectivity discrimination model (ACC = 0.882 and MCC = 0.715 for the test set) outperformed an earlier reported one (ACC = 0.784). These results demonstrate that, through multiple-conformation encoding, BRS-3D can be used as an effective molecular descriptor for AR subtype selectivity prediction.
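The two discrimination metrics quoted above, ACC and the Matthews correlation coefficient, can be computed from a confusion matrix as follows; the labels and predictions are synthetic stand-ins, not the study's classifier output:

```python
import numpy as np

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 1])

tp = int(np.sum((y_true == 1) & (y_pred == 1)))   # true positives:  3
tn = int(np.sum((y_true == 0) & (y_pred == 0)))   # true negatives:  5
fp = int(np.sum((y_true == 0) & (y_pred == 1)))   # false positives: 1
fn = int(np.sum((y_true == 1) & (y_pred == 0)))   # false negatives: 1

acc = (tp + tn) / len(y_true)                     # (3 + 5) / 10 = 0.8
mcc = (tp * tn - fp * fn) / np.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
```

Unlike accuracy, MCC accounts for all four confusion-matrix cells, which makes it the more informative of the two when the classes are imbalanced, as is typical in selectivity screens.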
ERIC Educational Resources Information Center
Melguizo, Tatiana
2010-01-01
The study takes advantage of the nontraditional selection process of the Gates Millennium Scholars (GMS) program to test the association between selectivity of 4-year institution attended as well as other noncognitive variables on the college completion rates of a sample of students of color. The results of logistic regression and propensity score…
Eggert, D L; Nielsen, M K
2006-02-01
Three replications of mouse selection populations for high heat loss (MH), low heat loss (ML), and a nonselected control (MC) were used to estimate the feed energy costs of maintenance and gain and to test whether selection had changed these costs. At 21 and 49 d of age, mice were weighed and subjected to dual x-ray densitometry measurement for prediction of body composition. At 21 d, mice were randomly assigned to an ad libitum, an 80% of ad libitum, or a 60% of ad libitum feeding group for 28-d collection of individual feed intake. Data were analyzed using 3 approaches. The first approach attempted to partition energy intake among the costs of maintenance, fat deposition, and lean deposition for each replicate, sex, and line by multiple regression of feed intake on the sum of daily metabolic weight (kg^0.75), fat gain, and lean gain. Approach II was a less restrictive attempt to partition energy intake between the costs of maintenance and total gain for each replicate, sex, and line by multiple regression of feed intake on the sum of daily metabolic weight and total gain. Approach III used multiple regression on the entire data set with pooled regressions on fat and lean gains and subclass regressions for maintenance. Contrasts were conducted to test the effect of selection (MH - ML) and the asymmetry of selection [(MH + ML)/2 - MC] for the various energy costs. In approach I, there were no differences between lines for the costs of maintenance, fat deposition, or protein deposition, but we question our ability to estimate these accurately. In approach II, selection changed both the cost of maintenance (P = 0.03) and the cost of gain (P = 0.05); MH mice had greater per-unit costs than ML mice for both. Asymmetry of the selection response was found in approach II for the cost of maintenance (P = 0.06). In approach III, the effect of selection (P < 0.01) contributed to differences in the maintenance cost, but asymmetry of selection (P > 0.17) was not evident.
Sex effects were found for the cost of fat deposition (P = 0.02) in approach I and the cost of gain (P = 0.001) in approach II; females had a greater cost per unit than males. When costs per unit of fat and per unit of lean gain were assumed to be the same for both sexes (approach III), females had a somewhat greater estimate for maintenance cost (P = 0.10). We conclude that selection for heat loss has changed the costs for maintenance per unit size but probably not the costs for gain.
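Approach II above is a two-predictor multiple regression whose slopes are interpreted as per-unit costs; a minimal sketch with invented numbers shows how those costs are recovered (all values below are made up for illustration, not the study's estimates):

```python
import numpy as np

rng = np.random.default_rng(7)

n = 90
metabolic_days = rng.uniform(0.4, 0.9, n)         # summed daily kg^0.75 over 28 d
total_gain = rng.uniform(5.0, 15.0, n)            # total gain over the test period

MAINT_COST, GAIN_COST = 110.0, 2.0                # assumed true per-unit costs
intake = (MAINT_COST * metabolic_days + GAIN_COST * total_gain
          + rng.standard_normal(n))               # feed intake with noise

# No intercept: intake is modeled entirely as maintenance plus gain costs.
A = np.column_stack([metabolic_days, total_gain])
(maint_hat, gain_hat), *_ = np.linalg.lstsq(A, intake, rcond=None)
```

The fitted slopes estimate the maintenance cost per unit of metabolic weight and the cost per unit of gain, and line contrasts such as MH - ML in the study are then contrasts between such slope estimates.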
ERIC Educational Resources Information Center
Vrieze, Scott I.
2012-01-01
This article reviews the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) in model selection and the appraisal of psychological theory. The focus is on latent variable models, given their growing use in theory testing and construction. Theoretical statistical results in regression are discussed, and more important…
Linear and nonlinear models for predicting fish bioconcentration factors for pesticides.
Yuan, Jintao; Xie, Chun; Zhang, Ting; Sun, Jinfang; Yuan, Xuejie; Yu, Shuling; Zhang, Yingbiao; Cao, Yunyuan; Yu, Xingchen; Yang, Xuan; Yao, Wu
2016-08-01
This work is devoted to the application of multiple linear regression (MLR), multilayer perceptron neural network (MLP NN) and projection pursuit regression (PPR) to quantitative structure-property relationship analysis of bioconcentration factors (BCFs) of pesticides tested on bluegill (Lepomis macrochirus). Molecular descriptors of a total of 107 pesticides were calculated with the DRAGON software and selected by the inverse enhanced replacement method. Based on the selected DRAGON descriptors, a linear model was built by MLR, and nonlinear models were developed using MLP NN and PPR. The robustness of the obtained models was assessed by cross-validation and external validation using a test set. Outliers were also examined and deleted to improve predictive power. Comparative results revealed that PPR achieved the most accurate predictions. This study offers useful models and information for BCF prediction, risk assessment, and pesticide formulation. Copyright © 2016 Elsevier Ltd. All rights reserved.
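The MLR branch of this workflow, with its external-validation step, can be sketched on simulated data; the five descriptors, coefficients, and noise level below are made up, not the study's actual DRAGON descriptors:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 107, 5                         # 107 pesticides, 5 hypothetical descriptors
X = rng.normal(size=(n, p))
coef = np.array([0.8, -0.5, 0.3, 0.0, 0.2])
log_bcf = X @ coef + rng.normal(0.0, 0.3, n)

# Fit MLR on a training split, then validate on the held-out test set
train, test = np.arange(80), np.arange(80, n)
A = np.column_stack([np.ones(train.size), X[train]])
b, *_ = np.linalg.lstsq(A, log_bcf[train], rcond=None)

pred = np.column_stack([np.ones(test.size), X[test]]) @ b
ss_res = np.sum((log_bcf[test] - pred) ** 2)
ss_tot = np.sum((log_bcf[test] - log_bcf[test].mean()) ** 2)
r2_ext = 1.0 - ss_res / ss_tot        # external predictive R^2
```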
Regression Models For Multivariate Count Data
Zhang, Yiwen; Zhou, Hua; Zhou, Jin; Sun, Wei
2016-01-01
Data with multivariate count responses frequently occur in modern applications. The commonly used multinomial-logit model is limiting due to its restrictive mean-variance structure. For instance, analyzing count data from the recent RNA-seq technology by the multinomial-logit model leads to serious errors in hypothesis testing. The ubiquity of over-dispersion and complicated correlation structures among multivariate counts calls for more flexible regression models. In this article, we study some generalized linear models that incorporate various correlation structures among the counts. Current literature lacks a treatment of these models, partly due to the fact that they do not belong to the natural exponential family. We study the estimation, testing, and variable selection for these models in a unifying framework. The regression models are compared on both synthetic and real RNA-seq data. PMID:28348500
ERIC Educational Resources Information Center
Myers, Douglas D.
Regression analysis was employed to determine if there were any similarities between the tests administered to participants of the Mountain-Plains program, a residential, family-based education program developed to improve the economic potential and lifestyle of selected student families in a six-state area. The study compared the Wide Range…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bernad-Beltrán, D.; Simó, A.; Bovea, M.D., E-mail: bovea@uji.es
Highlights: • Attitude towards incorporating biowaste selective collection is analysed. • Willingness to participate and to pay in biowaste selective collection is obtained. • Socioeconomic aspects affecting WtParticipate and WtPay are identified. - Abstract: European waste legislation has for years encouraged the incorporation of selective collection systems for the biowaste fraction. European countries are therefore incorporating it into their current municipal solid waste management (MSWM) systems. However, this incorporation involves changes in the current waste management habits of households. In this paper, the attitude of the public towards the incorporation of selective collection of biowaste into an existing MSWM system in a Spanish municipality is analysed. A semi-structured telephone interview was used to obtain information regarding aspects such as: level of participation in current waste collection systems, willingness to participate in selective collection of biowaste, reasons and barriers that affect participation, willingness to pay for the incorporation of the selective collection of biowaste and the socioeconomic characteristics of citizens who are willing to participate and pay for selective collection of biowaste. The results showed that approximately 81% of the respondents were willing to participate in selective collection of biowaste. This percentage would increase to 89% if the Town Council provided specific waste bins and bags, since the main barrier to participation in the new selective collection system is the need to use specific waste bins and bags for the separation of biowaste. A logit response model was applied to estimate the average willingness to pay, obtaining an estimated mean of 7.5% on top of the current waste management annual tax.
The relationship of willingness to participate and willingness to pay for the implementation of this new selective collection with the socioeconomic variables (age, gender, size of the household, work, education and income) was analysed. Chi-square independence tests and binary logistic regression were used for willingness to participate, and no significant relationship was obtained. Chi-square independence tests, ordinal logistic regression and ordinary linear regression were applied for willingness to pay, obtaining statistically significant relationships for most of the socioeconomic variables.
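A minimal sketch of the logit (logistic) response model underlying such willingness-to-pay estimates, fitted by Newton-Raphson on simulated survey data; the income effect, intercept, and sample size are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
income = rng.normal(0.0, 1.0, n)          # standardized income (hypothetical)
eta = -1.0 + 1.5 * income                 # assumed true log-odds of "willing"
willing = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

# Logistic (logit) regression fitted by Newton-Raphson
X = np.column_stack([np.ones(n), income])
beta = np.zeros(2)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    grad = X.T @ (willing - p)                  # score vector
    H = (X * (p * (1.0 - p))[:, None]).T @ X    # Fisher information
    beta += np.linalg.solve(H, grad)
```

The fitted probabilities, averaged over respondents, give the kind of mean willingness estimate the abstract reports.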
Forecasting volatility with neural regression: a contribution to model adequacy.
Refenes, A N; Holt, W T
2001-01-01
Neural nets' usefulness for forecasting is limited by problems of overfitting and the lack of rigorous procedures for model identification, selection and adequacy testing. This paper describes a methodology for neural model misspecification testing. We introduce a generalization of the Durbin-Watson statistic for neural regression and discuss the general issues of misspecification testing using residual analysis. We derive a generalized influence matrix for neural estimators which enables us to evaluate the distribution of the statistic. We deploy Monte Carlo simulation to compare the power of the test for neural and linear regressors. While residual testing is not a sufficient condition for model adequacy, it is nevertheless a necessary condition to demonstrate that the model is a good approximation to the data generating process, particularly as neural-network estimation procedures are susceptible to partial convergence. The work is also an important step toward developing rigorous procedures for neural model identification, selection and adequacy testing which have started to appear in the literature. We demonstrate its applicability in the nontrivial problem of forecasting implied volatility innovations using high-frequency stock index options. Each step of the model building process is validated using statistical tests to verify variable significance and model adequacy with the results confirming the presence of nonlinear relationships in implied volatility innovations.
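The classical statistic being generalized can be sketched directly; the residual series below are simulated, contrasting a well-specified model (white-noise residuals) with a misspecified one that leaves first-order autocorrelation behind:

```python
import numpy as np

def durbin_watson(resid):
    """DW = sum of squared successive differences over sum of squares.
    Values near 2 indicate no first-order autocorrelation; values near 0
    indicate strong positive autocorrelation left in the residuals."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(3)
white = rng.normal(size=500)       # residuals of a well-specified model
ar = np.zeros(500)                 # residuals with leftover AR(1) structure
for t in range(1, 500):
    ar[t] = 0.8 * ar[t - 1] + rng.normal()
```

The paper's contribution is evaluating the distribution of this statistic for neural, rather than linear, regressors via a generalized influence matrix.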
Using exogenous variables in testing for monotonic trends in hydrologic time series
Alley, William M.
1988-01-01
One approach that has been used in performing a nonparametric test for monotonic trend in a hydrologic time series consists of a two-stage analysis. First, a regression equation is estimated for the variable being tested as a function of an exogenous variable. A nonparametric trend test such as the Kendall test is then performed on the residuals from the equation. By analogy to stagewise regression and through Monte Carlo experiments, it is demonstrated that this approach will tend to underestimate the magnitude of the trend and to result in some loss in power as a result of ignoring the interaction between the exogenous variable and time. An alternative approach, referred to as the adjusted variable Kendall test, is demonstrated to generally have increased statistical power and to provide more reliable estimates of the trend slope. In addition, the utility of including an exogenous variable in a trend test is examined under selected conditions.
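The attenuation effect can be reproduced with least squares standing in for the Kendall statistic (the slopes and noise levels below are made up): when the exogenous variable itself trends, the stage-1 regression absorbs part of the time trend, and the residual trend is biased toward zero.

```python
import numpy as np

rng = np.random.default_rng(4)
t = np.arange(100.0)
x = 0.05 * t + rng.normal(0.0, 1.0, 100)            # exogenous variable, also trending
y = 0.10 * t + 2.0 * x + rng.normal(0.0, 1.0, 100)  # true trend slope is 0.10

# Stage 1: regress the tested variable on the exogenous variable only
A = np.column_stack([np.ones(100), x])
b, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ b

# Trend slope recovered from the residuals: attenuated, because part of
# the time trend was absorbed into the coefficient on x
slope_resid = np.polyfit(t, resid, 1)[0]

# Adjusting for x and time jointly recovers the full trend
b2, *_ = np.linalg.lstsq(np.column_stack([np.ones(100), x, t]), y, rcond=None)
slope_joint = b2[2]
```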
Kothe, Christian; Hissbach, Johanna; Hampe, Wolfgang
2013-01-01
Introduction: The present study examines the question whether the selection of dental students should be based solely on average school-leaving grades (GPA) or whether it could be improved by using a subject-specific aptitude test. Methods: The HAM-Nat Natural Sciences Test was piloted with freshmen during their first study week in 2006 and 2007. In 2009 and 2010 it was used in the dental student selection process. The sample size in the regression models varies between 32 and 55 students. Results: Used as a supplement to the German GPA, the HAM-Nat test explained up to 12% of the variance in preclinical examination performance. We confirmed the prognostic validity of GPA reported in earlier studies in some, but not all of the individual preclinical examination results. Conclusion: The HAM-Nat test is a reliable selection tool for dental students. Use of the HAM-Nat yielded a significant improvement in prediction of preclinical academic success in dentistry. PMID:24282449
Saleem, Taimur; Ishaque, Sidra; Habib, Nida; Hussain, Syedda Saadia; Jawed, Areeba; Khan, Aamir Ali; Ahmad, Muhammad Imran; Iftikhar, Mian Omer; Mughal, Hamza Pervez; Jehan, Imtiaz
2009-01-01
Background To determine the knowledge, attitudes and practices regarding organ donation in a selected adult population in Pakistan. Methods Convenience sampling was used to generate a sample of 440; 408 interviews were successfully completed and used for analysis. Data collection was carried out via a face-to-face interview based on a pre-tested questionnaire in selected public areas of Karachi, Pakistan. Data were analyzed using SPSS v.15, and associations were tested using Pearson's chi-square test. Multiple logistic regression was used to find independent predictors of knowledge status and motivation of organ donation. Results Knowledge about organ donation was significantly associated with education (p = 0.000) and socioeconomic status (p = 0.038). 70/198 (35.3%) people expressed a high motivation to donate. Allowance of organ donation in religion was significantly associated with the motivation to donate (p = 0.000). Multiple logistic regression analysis revealed that higher level of education and higher socioeconomic status were significant (p < 0.05) independent predictors of knowledge status of organ donation. For motivation, multiple logistic regression revealed that higher socioeconomic status, adequate knowledge score and belief that organ donation is allowed in religion were significant (p < 0.05) independent predictors. Television emerged as the major source of information. Only 3.5% had themselves donated an organ, with only one person being an actual kidney donor. Conclusion Better knowledge may ultimately translate into the act of donation. Effective measures should be taken to educate people with relevant information with the involvement of media, doctors and religious scholars. PMID:19534793
NASA Astrophysics Data System (ADS)
Min, Qing-xu; Zhu, Jun-zhen; Feng, Fu-zhou; Xu, Chao; Sun, Ji-wei
2017-06-01
In this paper, lock-in vibrothermography (LVT) is utilized for defect detection. Specifically, for a metal plate with an artificial fatigue crack, the temperature rise of the defective area is used for analyzing the influence of different test conditions, i.e. engagement force, excitation intensity, and modulated frequency. The multivariate nonlinear and logistic regression models are employed to estimate the POD (probability of detection) and POA (probability of alarm) of the fatigue crack, respectively. The resulting optimal selection of test conditions is presented. The study aims to provide an optimized method for selecting test conditions in a vibrothermography system with enhanced detection ability.
Lee, Tsair-Fwu; Chao, Pei-Ju; Ting, Hui-Min; Chang, Liyun; Huang, Yu-Jie; Wu, Jia-Ming; Wang, Hung-Yu; Horng, Mong-Fong; Chang, Chun-Ming; Lan, Jen-Hong; Huang, Ya-Yu; Fang, Fu-Min; Leung, Stephen Wan
2014-01-01
Purpose The aim of this study was to develop a multivariate logistic regression model with least absolute shrinkage and selection operator (LASSO) to make valid predictions about the incidence of moderate-to-severe patient-rated xerostomia among head and neck cancer (HNC) patients treated with IMRT. Methods and Materials Quality of life questionnaire datasets from 206 patients with HNC were analyzed. The European Organization for Research and Treatment of Cancer QLQ-H&N35 and QLQ-C30 questionnaires were used as the endpoint evaluation. The primary endpoint (grade 3+ xerostomia) was defined as moderate-to-severe xerostomia at 3 (XER3m) and 12 months (XER12m) after the completion of IMRT. Normal tissue complication probability (NTCP) models were developed. The optimal and suboptimal numbers of prognostic factors for a multivariate logistic regression model were determined using the LASSO with bootstrapping technique. Statistical analysis was performed using the scaled Brier score, Nagelkerke R2, chi-squared test, Omnibus, Hosmer-Lemeshow test, and the AUC. Results Eight prognostic factors were selected by LASSO for the 3-month time point: Dmean-c, Dmean-i, age, financial status, T stage, AJCC stage, smoking, and education. Nine prognostic factors were selected for the 12-month time point: Dmean-i, education, Dmean-c, smoking, T stage, baseline xerostomia, alcohol abuse, family history, and node classification. In the selection of the suboptimal number of prognostic factors by LASSO, three suboptimal prognostic factors were fine-tuned by Hosmer-Lemeshow test and AUC, i.e., Dmean-c, Dmean-i, and age for the 3-month time point. Five suboptimal prognostic factors were also selected for the 12-month time point, i.e., Dmean-i, education, Dmean-c, smoking, and T stage. The overall performance for both time points of the NTCP model in terms of scaled Brier score, Omnibus, and Nagelkerke R2 was satisfactory and corresponded well with the expected values. 
Conclusions Multivariate NTCP models with LASSO can be used to predict patient-rated xerostomia after IMRT. PMID:24586971
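The core selection step, L1-penalised (LASSO) logistic regression, can be sketched with a proximal-gradient (ISTA) fit on simulated data; the patient count matches the study, but the factors, true effects, and penalty value are illustrative, and this is not the paper's bootstrap procedure:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 206, 10                          # 206 patients, 10 candidate factors
X = rng.normal(size=(n, p))
true = np.zeros(p)
true[:3] = [1.2, -0.9, 0.7]             # only the first three factors matter
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true))).astype(float)

# L1-penalised logistic regression via proximal gradient descent (ISTA):
# gradient step on the logistic loss, then soft-thresholding for the L1 term
lam, step = 8.0, 1.0 / n
beta = np.zeros(p)
for _ in range(3000):
    p_hat = 1.0 / (1.0 + np.exp(-X @ beta))
    beta = beta - step * (X.T @ (p_hat - y))
    beta = np.sign(beta) * np.maximum(np.abs(beta) - step * lam, 0.0)

selected = np.flatnonzero(beta)         # the LASSO-selected prognostic factors
```

Sweeping the penalty, as the bootstrapped LASSO in the study effectively does, trades off how many prognostic factors survive against how strongly their coefficients are shrunk.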
LANDSAT (MSS): Image demographic estimations
NASA Technical Reports Server (NTRS)
Dejesusparada, N. (Principal Investigator); Foresti, C.
1977-01-01
The author has identified the following significant results. Two sets of urban test sites, one with 35 cities and one with 70 cities, were selected in the state of São Paulo. A high degree of collinearity (0.96) was found between urban and areal measurements taken from aerial photographs and LANDSAT MSS imagery. High coefficients were observed when census data were regressed against aerial information (0.95) and LANDSAT data (0.92). The validity of population estimations was tested by regressing three urban variables against three classes of cities. Results supported the effectiveness of LANDSAT to estimate large city populations, with diminishing effectiveness as urban areas decrease in size.
Sun, Yanqing; Sun, Liuquan; Zhou, Jie
2013-07-01
This paper studies the generalized semiparametric regression model for longitudinal data where the covariate effects are constant for some and time-varying for others. Different link functions can be used to allow more flexible modelling of longitudinal data. The nonparametric components of the model are estimated using a local linear estimating equation and the parametric components are estimated through a profile estimating function. The method automatically adjusts for heterogeneity of sampling times, allowing the sampling strategy to depend on the past sampling history as well as possibly time-dependent covariates without specifically modelling such dependence. A [Formula: see text]-fold cross-validation bandwidth selection is proposed as a working tool for locating an appropriate bandwidth. A criterion for selecting the link function is proposed to provide a better fit to the data. Large sample properties of the proposed estimators are investigated. Large sample pointwise and simultaneous confidence intervals for the regression coefficients are constructed. Formal hypothesis testing procedures are proposed to check for the covariate effects and whether the effects are time-varying. A simulation study is conducted to examine the finite sample performances of the proposed estimation and hypothesis testing procedures. The methods are illustrated with a data example.
Kim, Sun Mi; Kim, Yongdai; Jeong, Kuhwan; Jeong, Heeyeong; Kim, Jiyoung
2018-01-01
The aim of this study was to compare the performance of image analysis for predicting breast cancer using two distinct regression models and to evaluate the usefulness of incorporating clinical and demographic data (CDD) into the image analysis in order to improve the diagnosis of breast cancer. This study included 139 solid masses from 139 patients who underwent an ultrasonography-guided core biopsy and had available CDD between June 2009 and April 2010. Three breast radiologists retrospectively reviewed 139 breast masses and described each lesion using the Breast Imaging Reporting and Data System (BI-RADS) lexicon. We applied and compared two regression methods, stepwise logistic (SL) regression and logistic least absolute shrinkage and selection operator (LASSO) regression, in which the BI-RADS descriptors and CDD were used as covariates. We investigated the performances of these regression methods and the agreement of radiologists in terms of test misclassification error and the area under the curve (AUC) of the tests. Logistic LASSO regression was superior (P<0.05) to SL regression, regardless of whether CDD was included in the covariates, in terms of test misclassification errors (0.234 vs. 0.253, without CDD; 0.196 vs. 0.258, with CDD) and AUC (0.785 vs. 0.759, without CDD; 0.873 vs. 0.735, with CDD). However, it was inferior (P<0.05) to the agreement of three radiologists in terms of test misclassification errors (0.234 vs. 0.168, without CDD; 0.196 vs. 0.088, with CDD) and the AUC without CDD (0.785 vs. 0.844, P<0.001), but was comparable to the AUC with CDD (0.873 vs. 0.880, P=0.141). Logistic LASSO regression based on BI-RADS descriptors and CDD showed better performance than SL in predicting the presence of breast cancer. The use of CDD as a supplement to the BI-RADS descriptors significantly improved the prediction of breast cancer using logistic LASSO regression.
New robust statistical procedures for the polytomous logistic regression models.
Castilla, Elena; Ghosh, Abhik; Martin, Nirian; Pardo, Leandro
2018-05-17
This article derives a new family of estimators, namely the minimum density power divergence estimators, as a robust generalization of the maximum likelihood estimator for the polytomous logistic regression model. Based on these estimators, a family of Wald-type test statistics for linear hypotheses is introduced. Robustness properties of both the proposed estimators and the test statistics are theoretically studied through the classical influence function analysis. Appropriate real life examples are presented to justify the requirement of suitable robust statistical procedures in place of the likelihood based inference for the polytomous logistic regression model. The validity of the theoretical results established in the article is further confirmed empirically through suitable simulation studies. Finally, an approach for the data-driven selection of the robustness tuning parameter is proposed with empirical justifications. © 2018, The International Biometric Society.
Ensemble habitat mapping of invasive plant species
Stohlgren, T.J.; Ma, P.; Kumar, S.; Rocca, M.; Morisette, J.T.; Jarnevich, C.S.; Benson, N.
2010-01-01
Ensemble species distribution models combine the strengths of several species environmental matching models, while minimizing the weakness of any one model. Ensemble models may be particularly useful in risk analysis of recently arrived, harmful invasive species because species may not yet have spread to all suitable habitats, leaving species-environment relationships difficult to determine. We tested five individual models (logistic regression, boosted regression trees, random forest, multivariate adaptive regression splines (MARS), and maximum entropy model or Maxent) and ensemble modeling for selected nonnative plant species in Yellowstone and Grand Teton National Parks, Wyoming; Sequoia and Kings Canyon National Parks, California, and areas of interior Alaska. The models are based on field data provided by the park staffs, combined with topographic, climatic, and vegetation predictors derived from satellite data. For the four invasive plant species tested, ensemble models were the only models that ranked in the top three models for both field validation and test data. Ensemble models may be more robust than individual species-environment matching models for risk analysis. © 2010 Society for Risk Analysis.
Impact of multicollinearity on small sample hydrologic regression models
NASA Astrophysics Data System (ADS)
Kroll, Charles N.; Song, Peter
2013-06-01
Often hydrologic regression models are developed with ordinary least squares (OLS) procedures. The use of OLS with highly correlated explanatory variables produces multicollinearity, which creates highly sensitive parameter estimators with inflated variances and improper model selection. It is not clear how to best address multicollinearity in hydrologic regression models. Here a Monte Carlo simulation is developed to compare four techniques to address multicollinearity: OLS, OLS with variance inflation factor screening (VIF), principal component regression (PCR), and partial least squares regression (PLS). The performance of these four techniques was observed for varying sample sizes, correlation coefficients between the explanatory variables, and model error variances consistent with hydrologic regional regression models. The negative effects of multicollinearity are magnified at smaller sample sizes, higher correlations between the variables, and larger model error variances (smaller R2). The Monte Carlo simulation indicates that if the true model is known, multicollinearity is present, and the estimation and statistical testing of regression parameters are of interest, then PCR or PLS should be employed. If the model is unknown, or if the interest is solely on model predictions, it is recommended that OLS be employed since using more complicated techniques did not produce any improvement in model performance. A leave-one-out cross-validation case study was also performed using low-streamflow data sets from the eastern United States. Results indicate that OLS with stepwise selection generally produces models across study regions with varying levels of multicollinearity that are as good as biased regression techniques such as PCR and PLS.
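The VIF screening step can be sketched directly; the data below are simulated so that two of three explanatory variables are nearly collinear, and the conventional rule-of-thumb flag is a VIF above 10:

```python
import numpy as np

def vif(X):
    """Variance inflation factor per column: 1 / (1 - R_j^2), where R_j^2
    comes from regressing column j on all remaining columns."""
    out = []
    for j in range(X.shape[1]):
        A = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ b
        r2 = 1.0 - resid.var() / X[:, j].var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(6)
n = 50
z = rng.normal(size=n)
x1 = z + 0.1 * rng.normal(size=n)   # x1, x2 nearly collinear by construction
x2 = z + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)             # independent explanatory variable
vifs = vif(np.column_stack([x1, x2, x3]))
```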
[In vitro testing of yeast resistance to antimycotic substances].
Potel, J; Arndt, K
1982-01-01
Investigations have been carried out in order to clarify the antibiotic susceptibility determination of yeasts. 291 yeast strains of different species were tested for sensitivity to 7 antimycotics: amphotericin B, flucytosin, nystatin, pimaricin, clotrimazol, econazol and miconazol. In addition to the evaluation of inhibition zone diameters and MIC values, the influence of pH was examined. 1. The dependence of inhibition zone diameters upon pH values varies with the antimycotic tested. For standardizing purposes, pH 6.0 is proposed; moreover, further experimental parameters, such as nutrient composition, agar depth, cell density, incubation time and temperature, have to be standardized. 2. The relation between inhibition zone size and logarithmic MIC does not fit a linear regression analysis when all species are considered together. Therefore, regression functions have to be calculated separately for the individual species. In the case of the antimycotics amphotericin B, nystatin and pimaricin, the low scattering of the MIC values does not allow regression analysis. 3. A quantitative susceptibility determination of yeasts, particularly to the fungistatic substances with systemic applicability (flucytosin and miconazol), is advocated by the results of the MIC tests.
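The species-wise calibration described in point 2, regressing inhibition zone size on logarithmic MIC, can be sketched with entirely hypothetical numbers (the intercept, slope, and scatter below are invented):

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical calibration data for a single species: zone diameter (mm)
# decreasing linearly with log2(MIC)
log2_mic = rng.integers(-3, 6, 40).astype(float)
zone = 30.0 - 2.5 * log2_mic + rng.normal(0.0, 1.5, 40)

# Species-specific regression of zone size on logarithmic MIC
slope, intercept = np.polyfit(log2_mic, zone, 1)

# Inverting the fit reads a MIC estimate off a measured zone diameter
est_log2_mic = (20.0 - intercept) / slope
```

Pooling species with different slopes into one such fit is exactly what the authors found does not work, hence the per-species regression functions.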
Weigel, K A; de los Campos, G; González-Recio, O; Naya, H; Wu, X L; Long, N; Rosa, G J M; Gianola, D
2009-10-01
The objective of the present study was to assess the predictive ability of subsets of single nucleotide polymorphism (SNP) markers for development of low-cost, low-density genotyping assays in dairy cattle. Dense SNP genotypes of 4,703 Holstein bulls were provided by the USDA Agricultural Research Service. A subset of 3,305 bulls born from 1952 to 1998 was used to fit various models (training set), and a subset of 1,398 bulls born from 1999 to 2002 was used to evaluate their predictive ability (testing set). After editing, data included genotypes for 32,518 SNP and August 2003 and April 2008 predicted transmitting abilities (PTA) for lifetime net merit (LNM$), the latter resulting from progeny testing. The Bayesian least absolute shrinkage and selection operator method was used to regress August 2003 PTA on marker covariates in the training set to arrive at estimates of marker effects and direct genomic PTA. The coefficient of determination (R²) from regressing the April 2008 progeny test PTA of bulls in the testing set on their August 2003 direct genomic PTA was 0.375. Subsets of 300, 500, 750, 1,000, 1,250, 1,500, and 2,000 SNP were created by choosing equally spaced and highly ranked SNP, with the latter based on the absolute value of their estimated effects obtained from the training set. The SNP effects were re-estimated from the training set for each subset of SNP, and the 2008 progeny test PTA of bulls in the testing set were regressed on corresponding direct genomic PTA. The R² values for subsets of 300, 500, 750, 1,000, 1,250, 1,500, and 2,000 SNP with largest effects (evenly spaced SNP) were 0.184 (0.064), 0.236 (0.111), 0.269 (0.190), 0.289 (0.179), 0.307 (0.228), 0.313 (0.268), and 0.322 (0.291), respectively.
These results indicate that a low-density assay comprising selected SNP could be a cost-effective alternative for selection decisions and that significant gains in predictive ability may be achieved by increasing the number of SNP allocated to such an assay from 300 or fewer to 1,000 or more.
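The top-ranked versus evenly spaced comparison can be reproduced at toy scale; everything below is simulated (ridge regression also stands in for the paper's Bayesian LASSO, and the animal and SNP counts are far smaller than the study's):

```python
import numpy as np

rng = np.random.default_rng(8)
n_train, n_test, p = 400, 200, 200         # animals and SNPs (toy scale)
X = rng.binomial(2, 0.3, size=(n_train + n_test, p)).astype(float)
X -= X.mean(axis=0)                        # center genotype codes
causal = rng.choice(p, 20, replace=False)
effects = np.zeros(p)
effects[causal] = rng.choice([-0.3, 0.3], 20)
y = X @ effects + rng.normal(0.0, 1.0, n_train + n_test)

Xt, yt = X[:n_train], y[:n_train] - y[:n_train].mean()

# Ridge regression stands in here for the Bayesian LASSO of the study
bhat = np.linalg.solve(Xt.T @ Xt + 10.0 * np.eye(p), Xt.T @ yt)

def subset_r2(cols):
    """Re-estimate effects from the chosen SNPs only, then R^2 in the test set."""
    bs = np.linalg.solve(Xt[:, cols].T @ Xt[:, cols] + 10.0 * np.eye(cols.size),
                         Xt[:, cols].T @ yt)
    pred = X[n_train:][:, cols] @ bs
    return np.corrcoef(pred, y[n_train:])[0, 1] ** 2

top = np.argsort(np.abs(bhat))[-40:]       # 40 SNPs with largest |effect|
even = np.arange(0, p, 5)                  # 40 evenly spaced SNPs
r2_top, r2_even = subset_r2(top), subset_r2(even)
```

As in the study's results, the subset chosen by estimated effect size predicts the held-out phenotypes better than an evenly spaced subset of the same size.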
Prediction of Baseflow Index of Catchments using Machine Learning Algorithms
NASA Astrophysics Data System (ADS)
Yadav, B.; Hatfield, K.
2017-12-01
We present the results of eight machine learning techniques for predicting the baseflow index (BFI) of ungauged basins using a surrogate of catchment scale climate and physiographic data. The tested algorithms include ordinary least squares, ridge regression, least absolute shrinkage and selection operator (lasso), elasticnet, support vector machine, gradient boosted regression trees, random forests, and extremely randomized trees. Our work seeks to identify the dominant controls of BFI that can be readily obtained from ancillary geospatial databases and remote sensing measurements, such that the developed techniques can be extended to ungauged catchments. More than 800 gauged catchments spanning the continental United States were selected to develop the general methodology. The BFI calculation was based on the baseflow separated from the daily streamflow hydrograph using the HYSEP filter. The surrogate catchment attributes were compiled from multiple sources, including digital elevation models, soil, land use, and climate data, and other publicly available ancillary and geospatial data. 80% of the catchments were used to train the ML algorithms, and the remaining 20% of the catchments were used as an independent test set to measure the generalization performance of the fitted models. A k-fold cross-validation using exhaustive grid search was used to tune the hyperparameters of each model. Initial model development was based on 19 independent variables, but after variable selection and feature ranking, we generated revised sparse models of BFI prediction that are based on only six catchment attributes. These key predictive variables selected after the careful evaluation of bias-variance tradeoff include average catchment elevation, slope, fraction of sand, permeability, temperature, and precipitation.
The most promising algorithms exceeding an accuracy score (r-square) of 0.7 on test data include support vector machine, gradient boosted regression trees, random forests, and extremely randomized trees. Considering both the accuracy and the computational complexity of these algorithms, we identify the extremely randomized trees as the best performing algorithm for BFI prediction in ungauged basins.
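One of the best-performing algorithm families here, gradient boosted regression trees, can be sketched from scratch with depth-1 trees (stumps); the two "catchment attributes" and the response surface below are simulated stand-ins:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 300
X = rng.uniform(-1.0, 1.0, size=(n, 2))   # two stand-in catchment attributes
y = np.sin(3.0 * X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(0.0, 0.1, n)

def fit_stump(x, r):
    """Best single-split regression stump (feature, threshold, leaf means)."""
    best = None
    for j in range(x.shape[1]):
        for t in np.quantile(x[:, j], np.linspace(0.1, 0.9, 9)):
            left = x[:, j] <= t
            if left.all() or not left.any():
                continue
            lv, rv = r[left].mean(), r[~left].mean()
            err = np.sum((r - np.where(left, lv, rv)) ** 2)
            if best is None or err < best[0]:
                best = (err, j, t, lv, rv)
    return best[1:]

# Gradient boosting for squared error: each stump is fitted to the residuals
# of the current ensemble, then added with a small learning rate
train, test = slice(0, 240), slice(240, n)
pred = np.zeros(n)
for _ in range(200):
    j, t, lv, rv = fit_stump(X[train], y[train] - pred[train])
    pred += 0.1 * np.where(X[:, j] <= t, lv, rv)

ss_res = np.sum((y[test] - pred[test]) ** 2)
ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
r2_test = 1.0 - ss_res / ss_tot
```

The held-out r2_test plays the role of the accuracy score the abstract reports on its 20% independent test set.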
Berry, D P; Buckley, F; Dillon, P; Evans, R D; Rath, M; Veerkamp, R F
2003-11-01
Genetic (co)variances between body condition score (BCS), body weight (BW), milk yield, and fertility were estimated using a random regression animal model extended to multivariate analysis. The data analyzed included 81,313 BCS observations, 91,937 BW observations, and 100,458 milk test-day yields from 8,725 multiparous Holstein-Friesian cows. A cubic random regression was sufficient to model the changing genetic variances for BCS, BW, and milk across different days in milk. The genetic correlations between BCS and fertility changed little over the lactation; genetic correlations between BCS and interval to first service and between BCS and pregnancy rate to first service varied from -0.47 to -0.31, and from 0.15 to 0.38, respectively. This suggests that maximum genetic gain in fertility from indirect selection on BCS should be based on measurements taken in midlactation, when the genetic variance for BCS is largest. Selection for increased BW resulted in shorter intervals to first service, but more services and poorer pregnancy rates; genetic correlations between BW and pregnancy rate to first service varied from -0.52 to -0.45. Genetic selection for higher lactation milk yield alone through selection on increased milk yield in early lactation is likely to have a more deleterious effect on genetic merit for fertility than selection on higher milk yield in late lactation.
USDA-ARS?s Scientific Manuscript database
Selective principal component regression analysis (SPCR) uses a subset of the original image bands for principal component transformation and regression. For optimal band selection before the transformation, this paper used genetic algorithms (GA). In this case, the GA process used the regression co...
Shrinkage regression-based methods for microarray missing value imputation.
Wang, Hsiuying; Chiu, Chia-Chun; Wu, Yi-Ching; Wu, Wei-Sheng
2013-01-01
Missing values commonly occur in microarray data; datasets usually contain more than 5% missing values, with up to 90% of genes affected. Inaccurate missing value estimation reduces the power of downstream microarray data analyses. Many types of methods have been developed to estimate missing values. Among them, the regression-based methods are very popular and have been shown to perform better than other types of methods on many testing microarray datasets. To further improve the performance of the regression-based methods, we propose shrinkage regression-based methods. Our methods take advantage of the correlation structure in the microarray data and select genes similar to the target gene by Pearson correlation coefficients. In addition, our methods incorporate the least-squares principle, utilize a shrinkage estimation approach to adjust the coefficients of the regression model, and then use the new coefficients to estimate missing values. Simulation results show that the proposed methods provide more accurate missing value estimation on six testing microarray datasets than the existing regression-based methods do. Imputation of missing values is a very important aspect of microarray data analysis because most downstream analyses require a complete dataset. Therefore, exploring accurate and efficient methods for estimating missing values has become an essential issue. Since our proposed shrinkage regression-based methods provide accurate missing value estimation, they are competitive alternatives to the existing regression-based methods.
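The shrinkage regression idea reads roughly as: pick the genes most correlated with the target, fit least squares on the observed samples, shrink the slope coefficients, and predict the missing entries. A hypothetical numpy sketch follows; the published estimator's exact shrinkage rule differs, and `impute_gene` and its arguments are invented for illustration.

```python
import numpy as np

def impute_gene(data, target, k=5, shrink=0.9):
    """Sketch of shrinkage regression imputation (hypothetical
    simplification). `data`: genes x samples matrix with np.nan
    marking the missing entries of row `target`."""
    y = data[target]
    miss = np.isnan(y)
    obs = ~miss
    # 1) pick the k genes most correlated with the target's observed part
    complete = [g for g in range(data.shape[0])
                if g != target and not np.isnan(data[g]).any()]
    cors = [abs(np.corrcoef(data[g, obs], y[obs])[0, 1]) for g in complete]
    top = [complete[i] for i in np.argsort(cors)[-k:]]
    # 2) least-squares fit on the observed samples
    X = data[top][:, obs].T
    X1 = np.column_stack([np.ones(X.shape[0]), X])
    beta, *_ = np.linalg.lstsq(X1, y[obs], rcond=None)
    # 3) shrink the slopes, then predict the missing entries
    beta[1:] *= shrink
    Xm = np.column_stack([np.ones(miss.sum()), data[top][:, miss].T])
    filled = y.copy()
    filled[miss] = Xm @ beta
    return filled
```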
ERIC Educational Resources Information Center
Dickson, Ginger L.; Jepsen, David A.
2007-01-01
The authors surveyed a national sample of master's-level counseling students regarding their multicultural training experiences and their multicultural counseling competencies. A series of hierarchical regression models tested the prediction of inventoried competencies from measures of selected training experiences: (a) program cultural ambience…
Cui, Zaixu; Gong, Gaolang
2018-06-02
Individualized behavioral/cognitive prediction using machine learning (ML) regression approaches is becoming increasingly applied. The specific ML regression algorithm and sample size are two key factors that non-trivially influence prediction accuracies. However, the effects of the ML regression algorithm and sample size on individualized behavioral/cognitive prediction performance have not been comprehensively assessed. To address this issue, the present study included six commonly used ML regression algorithms: ordinary least squares (OLS) regression, least absolute shrinkage and selection operator (LASSO) regression, ridge regression, elastic-net regression, linear support vector regression (LSVR), and relevance vector regression (RVR), to perform specific behavioral/cognitive predictions based on different sample sizes. Specifically, the publicly available resting-state functional MRI (rs-fMRI) dataset from the Human Connectome Project (HCP) was used, and whole-brain resting-state functional connectivity (rsFC) or rsFC strength (rsFCS) were extracted as prediction features. Twenty-five sample sizes (ranging from 20 to 700) were studied by sub-sampling from the entire HCP cohort. The analyses showed that rsFC-based LASSO regression performed remarkably worse than the other algorithms, and rsFCS-based OLS regression performed markedly worse than the other algorithms. Regardless of the algorithm and feature type, both the prediction accuracy and its stability exponentially increased with increasing sample size. The specific patterns of the observed algorithm and sample size effects were well replicated in the prediction using re-testing fMRI data, data processed by different imaging preprocessing schemes, and different behavioral/cognitive scores, thus indicating excellent robustness/generalization of the effects.
The current findings provide critical insight into how the selected ML regression algorithm and sample size influence individualized predictions of behavior/cognition and offer important guidance for choosing the ML regression algorithm or sample size in relevant investigations. Copyright © 2018 Elsevier Inc. All rights reserved.
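The reported sample-size effect is easy to reproduce on synthetic data: fit one of the six algorithms (ridge regression here) on progressively larger training subsets and score the Pearson correlation between predicted and observed values on a fixed test set. This is a toy sketch with invented names, not the HCP pipeline.

```python
import numpy as np

def accuracy_vs_n(sizes, p=30, n_test=200, alpha=1.0, seed=0):
    """Fit ridge regression on training sets of increasing size and
    return the Pearson r on a fixed test set for each size
    (illustrative replication of the sample-size effect)."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=p)

    def draw(n):
        X = rng.normal(size=(n, p))
        return X, X @ w_true + rng.normal(size=n)

    X_test, y_test = draw(n_test)
    accs = []
    for n in sizes:
        X, y = draw(n)
        # closed-form ridge fit
        w = np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)
        accs.append(float(np.corrcoef(X_test @ w, y_test)[0, 1]))
    return accs
```

Running it with sizes such as [20, 100, 500] shows the accuracy climbing toward its asymptote as the training set grows, mirroring the exponential improvement described above.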
Model selection with multiple regression on distance matrices leads to incorrect inferences.
Franckowiak, Ryan P; Panasci, Michael; Jarvis, Karl J; Acuña-Rodriguez, Ian S; Landguth, Erin L; Fortin, Marie-Josée; Wagner, Helene H
2017-01-01
In landscape genetics, model selection procedures based on Information Theoretic and Bayesian principles have been used with multiple regression on distance matrices (MRM) to test the relationship between multiple vectors of pairwise genetic, geographic, and environmental distance. Using Monte Carlo simulations, we examined the ability of model selection criteria based on Akaike's information criterion (AIC), its small-sample correction (AICc), and the Bayesian information criterion (BIC) to reliably rank candidate models when applied with MRM while varying the sample size. The results showed a serious problem: all three criteria exhibit a systematic bias toward selecting unnecessarily complex models containing spurious random variables and erroneously suggest a high level of support for the incorrectly ranked best model. These problems became more severe with increasing sample size. The failure of AIC, AICc, and BIC was likely driven by the inflated sample size (the number of pairwise distances rather than the number of independent sites) and the different sum of squares partitioned by MRM, and the resulting effect on delta values. Based on these findings, we strongly discourage the continued application of AIC, AICc, and BIC for model selection with MRM.
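The inflation the authors point to is mechanical: MRM unfolds each distance matrix into a vector of n(n-1)/2 pairwise entries, and information criteria then treat that pair count, rather than the number of independent sites, as the sample size. A minimal sketch with illustrative helper names:

```python
import numpy as np

def pairwise_vector(x):
    """Lower-triangle unfold of a 1-D site attribute into the vector
    of pairwise distances used by MRM: n sites -> n*(n-1)/2 rows."""
    n = len(x)
    return np.array([abs(x[i] - x[j])
                     for i in range(n) for j in range(i + 1, n)])

def aic(rss, n_obs, k):
    """Gaussian AIC. In MRM, n_obs is the pair count, not the number
    of independent sites; the fixed 2k penalty becomes negligible
    relative to the inflated likelihood term, favoring complex models."""
    return n_obs * np.log(rss / n_obs) + 2 * k
```

For example, 10 sites already yield 45 non-independent "observations", and 100 sites yield 4,950, so the complexity penalty is quickly swamped.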
Valente, Bruno D.; Morota, Gota; Peñagaricano, Francisco; Gianola, Daniel; Weigel, Kent; Rosa, Guilherme J. M.
2015-01-01
The term “effect” in additive genetic effect suggests a causal meaning. However, inferences of such quantities for selection purposes are typically viewed and conducted as a prediction task. Predictive ability as tested by cross-validation is currently the most acceptable criterion for comparing models and evaluating new methodologies. Nevertheless, it does not directly indicate if predictors reflect causal effects. Such evaluations would require causal inference methods that are not typical in genomic prediction for selection. This suggests that the usual approach to infer genetic effects contradicts the label of the quantity inferred. Here we investigate if genomic predictors for selection should be treated as standard predictors or if they must reflect a causal effect to be useful, requiring causal inference methods. Conducting the analysis as a prediction or as a causal inference task affects, for example, how covariates of the regression model are chosen, which may heavily affect the magnitude of genomic predictors and therefore selection decisions. We demonstrate that selection requires learning causal genetic effects. However, genomic predictors from some models might capture noncausal signal, providing good predictive ability but poorly representing true genetic effects. Simulated examples are used to show that aiming for predictive ability may lead to poor modeling decisions, while causal inference approaches may guide the construction of regression models that better infer the target genetic effect even when they underperform in cross-validation tests. In conclusion, genomic selection models should be constructed to aim primarily for identifiability of causal genetic effects, not for predictive ability. PMID:25908318
Comparison of statistical tests for association between rare variants and binary traits.
Bacanu, Silviu-Alin; Nelson, Matthew R; Whittaker, John C
2012-01-01
Genome-wide association studies have found thousands of common genetic variants associated with a wide variety of diseases and other complex traits. However, a large portion of the predicted genetic contribution to many traits remains unknown. One plausible explanation is that some of the missing variation is due to the effects of rare variants. Nonetheless, the statistical analysis of rare variants is challenging. A commonly used method is to contrast, within the same region (gene), the frequency of minor alleles at rare variants between cases and controls. However, this strategy is most useful under the assumption that the tested variants have similar effects. We previously proposed a method that can accommodate heterogeneous effects in the analysis of quantitative traits. Here we extend this method to binary traits and accommodate covariates. We use simulations for a variety of causal and covariate impact scenarios to compare the performance of the proposed method to standard logistic regression, C-alpha, SKAT, and EREC. We found that (i) logistic regression methods perform well when the heterogeneity of the effects is not extreme and (ii) SKAT and EREC have good performance under all tested scenarios but can be computationally intensive. Consequently, it would be more computationally desirable to use a two-step strategy: (i) selecting promising genes by faster methods and (ii) analyzing the selected genes using SKAT/EREC. To select promising genes one can use (1) regression methods when effect heterogeneity is assumed to be low and the covariates explain a non-negligible part of trait variability, (2) C-alpha when heterogeneity is assumed to be large and covariates explain a small fraction of the trait's variability, and (3) the proposed trend and heterogeneity test when the heterogeneity is assumed to be non-trivial and the covariates explain a large fraction of trait variability.
Selection of higher order regression models in the analysis of multi-factorial transcription data.
Prazeres da Costa, Olivia; Hoffman, Arthur; Rey, Johannes W; Mansmann, Ulrich; Buch, Thorsten; Tresch, Achim
2014-01-01
Many studies examine gene expression data that has been obtained under the influence of multiple factors, such as genetic background, environmental conditions, or exposure to diseases. The interplay of multiple factors may lead to effect modification and confounding. Higher order linear regression models can account for these effects. We present a new methodology for linear model selection and apply it to microarray data of bone marrow-derived macrophages. This experiment investigates the influence of three variable factors: the genetic background of the mice from which the macrophages were obtained, Yersinia enterocolitica infection (two strains, and a mock control), and treatment/non-treatment with interferon-γ. We set up four different linear regression models in a hierarchical order. We introduce the eruption plot as a new practical tool for model selection complementary to global testing. It visually compares the size and significance of effect estimates between two nested models. Using this methodology we were able to select the most appropriate model by keeping only relevant factors showing additional explanatory power. Application to experimental data allowed us to qualify the interaction of factors as either neutral (no interaction), alleviating (co-occurring effects are weaker than expected from the single effects), or aggravating (stronger than expected). We find a biologically meaningful gene cluster of putative C2TA target genes that appear to be co-regulated with MHC class II genes. We introduced the eruption plot as a tool for visual model comparison to identify relevant higher order interactions in the analysis of expression data obtained under the influence of multiple factors. We conclude that model selection in higher order linear regression models should generally be performed for the analysis of multi-factorial microarray data.
Al-Ghatani, Ali M; Obonsawin, Marc C; Binshaig, Basmah A; Al-Moutaery, Khalaf R
2011-01-01
There are 2 aims for this study: first, to collect normative data for the Wisconsin Card Sorting Test (WCST), Stroop test, Test of Non-verbal Intelligence (TONI-3), Picture Completion (PC) and Vocabulary (VOC) sub-test of the Wechsler Adult Intelligence Scale-Revised for use in a Saudi Arabian culture, and second, to use the normative data provided to generate the regression equations. To collect the normative data and generate the regression equations, 198 healthy individuals were selected to provide a representative distribution for age, gender, years of education, and socioeconomic class. The WCST, Stroop test, TONI-3, PC, and VOC were administered to the healthy individuals. This study was carried out at the Department of Clinical Neurosciences, Riyadh Military Hospital, Riyadh, Kingdom of Saudi Arabia from January 2000 to July 2002. Normative data were obtained for all tests, and tables were constructed to interpret scores for different age groups. Regression equations to predict performance on the 3 tests of frontal function from scores on tests of fluid (TONI-3) and premorbid intelligence were generated from the data from the healthy individuals. The data collected in this study provide normative tables for 3 tests of frontal lobe function and for tests of general intellectual ability for use in Saudi Arabia. The data also provide a method to estimate pre-injury ability without the use of verbally based tests.
Creating a non-linear total sediment load formula using polynomial best subset regression model
NASA Astrophysics Data System (ADS)
Okcu, Davut; Pektas, Ali Osman; Uyumaz, Ali
2016-08-01
The aim of this study is to derive a new total sediment load formula that is more accurate and has fewer application constraints than the well-known formulae of the literature. The five best-known stream-power-concept sediment formulae approved by ASCE are used for benchmarking on a wide range of datasets that includes both field and flume (lab) observations. The dimensionless parameters of these widely used formulae are used as inputs in a new regression approach, called polynomial best subset regression (PBSR) analysis. The aim of the PBSR analysis is to fit and test all possible combinations of the input variables and select the best subset. All input variables, together with their second and third powers, are included in the regression to test the possible relation between the explanatory variables and the dependent variable. In selecting the best subset, a multistep approach is used that depends on significance values and on the degree of multicollinearity among the inputs. The new formula is compared to the others on a holdout dataset, and detailed performance investigations are conducted for the field and lab datasets within this holdout data. Different goodness-of-fit statistics are used, as they represent different perspectives on model accuracy. After the detailed comparisons were carried out, we identified the most accurate equation, which is also applicable to both flume and river data. In particular, on the field dataset the prediction performance of the proposed formula outperformed the benchmark formulations.
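The core of the PBSR procedure, expanding each input with its second and third powers and then exhaustively fitting every subset, can be sketched as follows. This simplified version scores subsets by adjusted R² only; the study's multistep approach additionally screens terms for significance and multicollinearity, and the function name is invented for the example.

```python
import numpy as np
from itertools import combinations

def best_subset_poly(x, y, max_terms=3):
    """Exhaustive best-subset search over {x, x^2, x^3} scored by
    adjusted R^2 (illustrative sketch of the PBSR idea)."""
    n = len(y)
    tss = float(np.sum((y - y.mean()) ** 2))
    terms = np.column_stack([x ** p for p in (1, 2, 3)])
    best = (-np.inf, None)
    for r in range(1, max_terms + 1):
        for cols in combinations(range(terms.shape[1]), r):
            A = np.column_stack([np.ones(n), terms[:, cols]])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((A @ beta - y) ** 2))
            adj_r2 = 1 - (rss / (n - r - 1)) / (tss / (n - 1))
            if adj_r2 > best[0]:
                best = (adj_r2, cols)
    return best  # (adjusted R^2, selected term indices)
```

With several inputs the expanded term pool grows quickly, which is why the study screens candidates before the exhaustive fit.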
NASA Astrophysics Data System (ADS)
Kamaruddin, Ainur Amira; Ali, Zalila; Noor, Norlida Mohd.; Baharum, Adam; Ahmad, Wan Muhamad Amir W.
2014-07-01
Logistic regression analysis examines the influence of various factors on a dichotomous outcome by estimating the probability of the event's occurrence. Logistic regression, also called a logit model, is a statistical procedure used to model dichotomous outcomes. In the logit model, the log odds of the dichotomous outcome are modeled as a linear combination of the predictor variables. The log odds ratio in logistic regression provides a description of the probabilistic relationship between the variables and the outcome. In conducting logistic regression, selection procedures are used to select important predictor variables; diagnostics are used to check that assumptions are valid, including independence of errors, linearity in the logit for continuous variables, absence of multicollinearity, and lack of strongly influential outliers; and a test statistic is calculated to determine the aptness of the model. This study used the binary logistic regression model to investigate overweight and obesity among rural secondary school students on the basis of their demographic profile, medical history, diet, and lifestyle. The results indicate that overweight and obesity of students are influenced by obesity in the family and the interaction between a student's ethnicity and routine meal intake. The odds of a student being overweight and obese are higher for a student with a family history of obesity and for a non-Malay student who frequently takes routine meals, as compared to a Malay student.
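The logit model described above can be written down directly: the log odds are a linear combination of the predictors (including any interaction column the caller builds), and exponentiated coefficients are odds ratios. A minimal Newton-Raphson fitting sketch with illustrative names, not the study's exact coding:

```python
import numpy as np

def fit_logit(X, y, iters=25):
    """Newton-Raphson maximum-likelihood fit of a logit model:
    log(p/(1-p)) = b0 + b1*x1 + ... ; exp(b[j]) is the odds ratio
    for predictor j. Interaction terms are passed as extra columns."""
    X1 = np.column_stack([np.ones(len(y)), X])   # add intercept
    b = np.zeros(X1.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X1 @ b))            # fitted probabilities
        W = p * (1 - p)                          # IRLS weights
        g = X1.T @ (y - p)                       # score (gradient)
        H = X1.T @ (X1 * W[:, None])             # observed information
        b = b + np.linalg.solve(H, g)            # Newton update
    return b
```

In the study's setting, one column of `X` would encode family history of obesity and another the ethnicity-by-meal-intake interaction; `np.exp(b)` then gives the reported odds ratios.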
Use of AMMI and linear regression models to analyze genotype-environment interaction in durum wheat.
Nachit, M M; Nachit, G; Ketata, H; Gauch, H G; Zobel, R W
1992-03-01
The joint durum wheat (Triticum turgidum L. var. 'durum') breeding program of the International Maize and Wheat Improvement Center (CIMMYT) and the International Center for Agricultural Research in the Dry Areas (ICARDA) for the Mediterranean region employs extensive multilocation testing. Multilocation testing produces significant genotype-environment (GE) interaction that reduces the accuracy of estimating yield and selecting appropriate germ plasm. The sum of squares (SS) of the GE interaction was partitioned by linear regression techniques into joint, genotypic, and environmental regressions, and by the Additive Main effects and Multiplicative Interaction (AMMI) model into five significant Interaction Principal Component Axes (IPCA). The AMMI model was more effective in partitioning the interaction SS than the linear regression technique: the SS contained in the AMMI model was 6 times higher than the SS for all three regressions. Postdictive assessment recommended the use of the first five IPCA axes, while predictive assessment recommended AMMI1 (main effects plus IPCA1). After elimination of random variation, AMMI1 estimates for genotypic yields within sites were more precise than unadjusted means. This increased precision was equivalent to increasing the number of replications by a factor of 3.7.
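The AMMI decomposition itself is compact: remove the additive genotype and environment main effects, then take the leading singular axes of the residual (GE interaction) matrix as the IPCA terms. A sketch under the usual assumption of a balanced genotype-by-environment table of mean yields:

```python
import numpy as np

def ammi(Y, n_axes=1):
    """AMMI fit of a genotypes x environments matrix of mean yields:
    additive main effects plus the first n_axes multiplicative
    (IPCA) terms from an SVD of the interaction residuals."""
    grand = Y.mean()
    g = Y.mean(axis=1) - grand               # genotype main effects
    e = Y.mean(axis=0) - grand               # environment main effects
    resid = Y - grand - g[:, None] - e[None, :]   # GE interaction
    U, s, Vt = np.linalg.svd(resid, full_matrices=False)
    model = grand + g[:, None] + e[None, :]
    for k in range(n_axes):                  # add IPCA1..IPCAk
        model += s[k] * np.outer(U[:, k], Vt[k])
    return model
```

`ammi(Y, n_axes=1)` corresponds to the AMMI1 model recommended by the predictive assessment; retaining all axes reproduces the observed table exactly.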
Efficient logistic regression designs under an imperfect population identifier.
Albert, Paul S; Liu, Aiyi; Nansel, Tonja
2014-03-01
Motivated by actual study designs, this article considers efficient logistic regression designs where the population is identified with a binary test that is subject to diagnostic error. We consider the case where the imperfect test is obtained on all participants, while the gold standard test is measured on a small chosen subsample. Under maximum-likelihood estimation, we evaluate the optimal design in terms of sample selection as well as verification. We show that there may be substantial efficiency gains by choosing a small percentage of individuals who test negative on the imperfect test for inclusion in the sample (e.g., verifying 90% test-positive cases). We also show that a two-stage design may be a good practical alternative to a fixed design in some situations. Under optimal and nearly optimal designs, we compare maximum-likelihood and semi-parametric efficient estimators under correct and misspecified models with simulations. The methodology is illustrated with an analysis from a diabetes behavioral intervention trial. © 2013, The International Biometric Society.
NASA Astrophysics Data System (ADS)
Lombardo, L.; Cama, M.; Maerker, M.; Parisi, L.; Rotigliano, E.
2014-12-01
This study aims at comparing the performances of Binary Logistic Regression (BLR) and Boosted Regression Trees (BRT) methods in assessing landslide susceptibility for multiple-occurrence regional landslide events within the Mediterranean region. A test area was selected in the north-eastern sector of Sicily (southern Italy), corresponding to the catchments of the Briga and the Giampilieri streams, both stretching for a few kilometres from the Peloritan ridge (eastern Sicily, Italy) to the Ionian sea. This area was struck on 1 October 2009 by an extreme climatic event resulting in thousands of rapid shallow landslides, mainly debris flows and debris avalanches involving the weathered layer of a low- to high-grade metamorphic bedrock. Exploiting the same set of predictors and the 2009 landslide archive, BLR- and BRT-based susceptibility models were obtained for the two catchments separately, adopting a random partition (RP) technique for validation; in addition, the models trained in one of the two catchments (Briga) were tested in predicting the landslide distribution in the other (Giampilieri), adopting a spatial partition (SP) based validation procedure. All the validation procedures were based on multi-fold tests so as to evaluate and compare the reliability of the fitting, the prediction skill, the coherence in the predictor selection, and the precision of the susceptibility estimates. All the obtained models for the two methods produced very high predictive performances, with a general congruence between BLR and BRT in predictor importance. In particular, the research highlighted that BRT models reached a higher prediction performance with respect to BLR models for RP-based modelling, whilst for the SP-based models the difference in predictive skill between the two methods dropped drastically, converging to an analogous excellent performance.
However, when looking at the precision of the probability estimates, BLR produced more robust models in terms of selected predictors and coefficients, as well as of the dispersion of the estimated probabilities around the mean value for each mapped pixel. This difference in behaviour could be interpreted as the result of overfitting effects, which affect decision-tree classification more heavily than logistic regression techniques.
Gimelfarb, A.; Willis, J. H.
1994-01-01
An experiment was conducted to investigate the offspring-parent regression for three quantitative traits (weight, abdominal bristles and wing length) in Drosophila melanogaster. Linear and polynomial models were fitted for the regressions of a character in offspring on both parents. It is demonstrated that responses by the characters to selection predicted by the nonlinear regressions may differ substantially from those predicted by the linear regressions. This is true even, and especially, if selection is weak. The realized heritability for a character under selection is shown to be determined not only by the offspring-parent regression but also by the distribution of the character and by the form and strength of selection. PMID:7828818
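The point of the abstract above, that responses to selection predicted from linear and polynomial offspring-parent regressions can differ substantially even under weak selection, can be illustrated with a toy calculation. The data and function name below are hypothetical; the predicted response is the mean predicted offspring value of the selected parents minus the population mean.

```python
import numpy as np

def predicted_response(midparent, offspring, selected, degree):
    """Predicted response to selection from an offspring-midparent
    regression of the given polynomial degree (illustrative sketch
    of the linear-vs-nonlinear comparison in the abstract)."""
    coef = np.polyfit(midparent, offspring, degree)
    return float(np.polyval(coef, selected).mean() - offspring.mean())
```

With any curvature in the offspring-parent relation, truncation selection on the upper tail yields clearly different predicted responses from the linear and quadratic fits, because the fits diverge exactly where the selected parents lie.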
"Salivary exRNA biomarkers to detect gingivitis and monitor disease regression".
Kaczor-Urbanowicz, Karolina Elżbieta; Trivedi, Harsh M; Lima, Patricia O; Camargo, Paulo M; Giannobile, William V; Grogan, Tristan R; Gleber-Netto, Frederico O; Whiteman, Yair; Li, Feng; Lee, Hyo Jung; Dharia, Karan; Aro, Katri; Carerras-Presas, Carmen Martin; Amuthan, Saarah; Vartak, Manjiri; Akin, David; Al-Adbullah, Hiba; Bembey, Kanika; Klokkevold, Perry R; Elashoff, David; Barnes, Virginia Monsul; Richter, Rose; DeVizio, William; Masters, James G; Wong, David
2018-05-19
This study tests the hypothesis that salivary extracellular RNA (exRNA) biomarkers can be developed for gingivitis detection and monitoring disease regression. Salivary exRNA biomarker candidates were developed from a total of 100 gingivitis and non-gingivitis individuals using Affymetrix expression microarrays. The top ten differentially expressed exRNAs were tested in a clinical cohort to determine whether the discovered salivary exRNA markers for gingivitis were associated with clinical gingivitis and disease regression. For this purpose, unstimulated saliva was collected from 30 randomly selected gingivitis subjects, gingival and plaque index scores were taken at baseline and at 3 and 6 weeks, and salivary exRNAs were assayed by means of reverse transcription quantitative polymerase chain reaction. Eight salivary exRNA biomarkers developed for gingivitis changed statistically significantly over time, consistent with disease regression. A panel of four salivary exRNAs [SPRR1A, lnc-TET3-2:1, FAM25A, CRCT1] can detect gingivitis with a clinical performance of 0.91 area under the curve (AUC), with 71% sensitivity and 100% specificity. The clinical values of the developed salivary exRNA biomarkers are associated with gingivitis regression. They offer strong potential to be advanced for definitive validation and clinical laboratory-developed test (LDT) development. This article is protected by copyright. All rights reserved.
Szyda, Joanna; Liu, Zengting; Zatoń-Dobrowolska, Magdalena; Wierzbicki, Heliodor; Rzasa, Anna
2008-01-01
We analysed data from a selective DNA pooling experiment with 130 individuals of the arctic fox (Alopex lagopus), which originated from 2 types differing in body size. The association between alleles of 6 selected unlinked molecular markers and body size was tested by using univariate and multinomial logistic regression models, applying odds ratios and test statistics from the power divergence family. Due to the small sample size and the resulting sparseness of the data table, in hypothesis testing we could not rely on the asymptotic distributions of the tests. Instead, we tried to account for data sparseness by (i) modifying the confidence intervals of the odds ratio; (ii) using a normal approximation of the asymptotic distribution of the power divergence tests, with different approaches for calculating the moments of the statistics; and (iii) assessing P values empirically, based on bootstrap samples. As a result, a significant association was observed for 3 markers. Furthermore, we used simulations to assess the validity of the normal approximation of the asymptotic distribution of the test statistics under the conditions of small and sparse samples.
Robertson, Sam; Woods, Carl; Gastin, Paul
2015-09-01
To develop a physiological performance and anthropometric attribute model to predict Australian Football League draft selection. Cross-sectional observational. Data were obtained (n=4902) from three Under-18 Australian football competitions between 2010 and 2013. Players were allocated into one of three groups based on their highest level of selection in their final year of junior football (Australian Football League Drafted, n=292; National Championship, n=293; State-level club, n=4317). Physiological performance (vertical jumps, agility, speed and running endurance) and anthropometric (body mass and height) data were obtained. Hedges' effect sizes were calculated to assess the influence of selection level and competition on these physical attributes, with logistic regression models constructed to discriminate Australian Football League Drafted and National Championship players. Rule induction analysis was undertaken to determine a set of rules for discriminating selection level. Effect size comparisons revealed a range of small to moderate differences between State-level club players and both other groups for all attributes, with trivial to small differences between Australian Football League Drafted and National Championship players. Logistic regression models showed the multistage fitness test, height, and 20 m sprint time to be the most important attributes in predicting draft success. Rule induction analysis showed that players displaying multistage fitness test scores of >14.01 and/or 20 m sprint times of <2.99 s were most likely to be recruited. High levels of performance in aerobic and/or speed tests increase the likelihood of elite junior Australian football players being recruited to the highest level of the sport. Copyright © 2014 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.
Procedures for adjusting regional regression models of urban-runoff quality using local data
Hoos, A.B.; Sisolak, J.K.
1993-01-01
Statistical operations termed model-adjustment procedures (MAPs) can be used to incorporate local data into existing regression models to improve the prediction of urban-runoff quality. Each MAP is a form of regression analysis in which the local data base is used as a calibration data set. Regression coefficients are determined from the local data base, and the resulting 'adjusted' regression models can then be used to predict storm-runoff quality at unmonitored sites. The response variable in the regression analyses is the observed load or mean concentration of a constituent in storm runoff for a single storm. The set of explanatory variables used in the regression analyses is different for each MAP, but always includes the predicted value of load or mean concentration from a regional regression model. The four MAPs examined in this study were: single-factor regression against the regional model prediction, P (termed MAP-1F-P); regression against P (termed MAP-R-P); regression against P and additional local variables (termed MAP-R-P+nV); and a weighted combination of P and a local-regression prediction (termed MAP-W). The procedures were tested by means of split-sample analysis, using data from three cities included in the Nationwide Urban Runoff Program: Denver, Colorado; Bellevue, Washington; and Knoxville, Tennessee. The MAP that provided the greatest predictive accuracy for the verification data set differed among the three test data bases and among model types (MAP-W for Denver and Knoxville, MAP-1F-P and MAP-R-P for Bellevue load models, and MAP-R-P+nV for Bellevue concentration models) and, in many cases, was not clearly indicated by the values of standard error of estimate for the calibration data set. A scheme to guide MAP selection, based on exploratory data analysis of the calibration data set, is presented and tested. The MAPs were tested for sensitivity to the size of a calibration data set.
As expected, predictive accuracy of all MAPs for the verification data set decreased as the calibration data-set size decreased, but predictive accuracy was not as sensitive for the MAPs as it was for the local regression models.
Eash, David A.; Barnes, Kimberlee K.
2017-01-01
A statewide study was conducted to develop regression equations for estimating six selected low-flow frequency statistics and harmonic mean flows for ungaged stream sites in Iowa. The estimation equations developed for the six low-flow frequency statistics include: the annual 1-, 7-, and 30-day mean low flows for a recurrence interval of 10 years, the annual 30-day mean low flow for a recurrence interval of 5 years, and the seasonal (October 1 through December 31) 1- and 7-day mean low flows for a recurrence interval of 10 years. Estimation equations also were developed for the harmonic-mean-flow statistic. Estimates of these seven selected statistics are provided for 208 U.S. Geological Survey continuous-record streamgages using data through September 30, 2006. The study area comprises streamgages located within Iowa and 50 miles beyond the State's borders. Because trend analyses indicated statistically significant positive trends when considering the entire period of record for the majority of the streamgages, the longest, most recent period of record without a significant trend was determined for each streamgage for use in the study. The median number of years of record used to compute each of these seven selected statistics was 35. Geographic information system software was used to measure 54 selected basin characteristics for each streamgage. Following the removal of two streamgages from the initial data set, data collected for 206 streamgages were compiled to investigate three approaches for regionalization of the seven selected statistics. Regionalization, a process using statistical regression analysis, provides a relation for efficiently transferring information from a group of streamgages in a region to ungaged sites in the region. The three regionalization approaches tested included statewide, regional, and region-of-influence regressions. 
For the regional regression, the study area was divided into three low-flow regions on the basis of hydrologic characteristics, landform regions, and soil regions. A comparison of root mean square errors and average standard errors of prediction for the statewide, regional, and region-of-influence regressions determined that the regional regression provided the best estimates of the seven selected statistics at ungaged sites in Iowa. Because a significant number of streams in Iowa reach zero flow as their minimum flow during low-flow years, four different types of regression analyses were used: left-censored, logistic, generalized-least-squares, and weighted-least-squares regression. A total of 192 streamgages were included in the development of 27 regression equations for the three low-flow regions. For the northeast and northwest regions, a censoring threshold was used to develop 12 left-censored regression equations to estimate the 6 low-flow frequency statistics for each region. For the southern region a total of 12 regression equations were developed; 6 logistic regression equations were developed to estimate the probability of zero flow for the 6 low-flow frequency statistics and 6 generalized least-squares regression equations were developed to estimate the 6 low-flow frequency statistics, if nonzero flow is estimated first by use of the logistic equations. A weighted-least-squares regression equation was developed for each region to estimate the harmonic-mean-flow statistic. Average standard errors of estimate for the left-censored equations for the northeast region range from 64.7 to 88.1 percent and for the northwest region range from 85.8 to 111.8 percent. Misclassification percentages for the logistic equations for the southern region range from 5.6 to 14.0 percent. 
Average standard errors of prediction for generalized least-squares equations for the southern region range from 71.7 to 98.9 percent and pseudo coefficients of determination for the generalized least-squares equations range from 87.7 to 91.8 percent. Average standard errors of prediction for weighted-least-squares equations developed for estimating the harmonic-mean-flow statistic for each of the three regions range from 66.4 to 80.4 percent. The regression equations are applicable only to stream sites in Iowa with low flows not significantly affected by regulation, diversion, or urbanization and with basin characteristics within the range of those used to develop the equations. If the equations are used at ungaged sites on regulated streams, or on streams affected by water-supply and agricultural withdrawals, then the estimates will need to be adjusted by the amount of regulation or withdrawal to reflect actual flow conditions, if that is of interest. Caution is advised when applying the equations for basins with characteristics near the applicable limits of the equations and for basins located in karst topography. A test of two drainage-area ratio methods using 31 pairs of streamgages, for the annual 7-day mean low-flow statistic for a recurrence interval of 10 years, indicates that a weighted drainage-area ratio method provides better estimates than regional regression equations for an ungaged site on a gaged stream in Iowa when the drainage-area ratio is between 0.5 and 1.4. These regression equations will be implemented within the U.S. Geological Survey StreamStats web-based geographic-information-system tool. StreamStats allows users to click on any ungaged site on a river and compute estimates of the seven selected statistics; in addition, 90-percent prediction intervals and the measured basin characteristics for the ungaged sites are provided.
StreamStats also allows users to click on any streamgage in Iowa and estimates computed for these seven selected statistics are provided for the streamgage.
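The drainage-area ratio transfer mentioned above can be sketched in a few lines: a low-flow statistic at a gaged site is scaled by the ratio of drainage areas, and the method is only recommended inside the 0.5-1.4 ratio window the abstract reports. The exponent and flow values below are hypothetical.

```python
# Illustrative drainage-area ratio (DAR) transfer: scale a low-flow
# statistic from a gaged site to an ungaged site on the same stream by
# the ratio of drainage areas. The 0.5-1.4 applicability window follows
# the abstract; the exponent and flow values are invented.

def dar_estimate(q_gaged, area_ungaged, area_gaged, exponent=1.0):
    ratio = area_ungaged / area_gaged
    if not 0.5 <= ratio <= 1.4:
        raise ValueError("DAR method not recommended outside ratio 0.5-1.4")
    return q_gaged * ratio ** exponent

# 7-day, 10-year low flow transferred to a nearby ungaged site (ratio 1.2)
q = dar_estimate(q_gaged=12.0, area_ungaged=180.0, area_gaged=150.0)
```

The weighted variant tested in the study blends this estimate with a regression estimate; the blending weights are not reproduced here.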
The Role of Resources and Incentives in Education Production
ERIC Educational Resources Information Center
Saavedra, Juan Esteban
2009-01-01
Chapter 1 examines the effects of college quality on students' learning, employment and earnings in Colombia. Scores on a national college "entry" test solely determine admission to many selective Colombian universities, creating exogenous peer and resource quality variation near admission cutoffs. In one regression discontinuity (RD)…
SPReM: Sparse Projection Regression Model For High-dimensional Linear Regression *
Sun, Qiang; Zhu, Hongtu; Liu, Yufeng; Ibrahim, Joseph G.
2014-01-01
The aim of this paper is to develop a sparse projection regression modeling (SPReM) framework to perform multivariate regression modeling with a large number of responses and a multivariate covariate of interest. We propose two novel heritability ratios to simultaneously perform dimension reduction, response selection, estimation, and testing, while explicitly accounting for correlations among multivariate responses. Our SPReM is devised to specifically address the low statistical power issue of many standard statistical approaches, such as Hotelling's T2 test statistic or a mass univariate analysis, for high-dimensional data. We formulate the estimation problem of SPReM as a novel sparse unit rank projection (SURP) problem and propose a fast optimization algorithm for SURP. Furthermore, we extend SURP to the sparse multi-rank projection (SMURP) by adopting a sequential SURP approximation. Theoretically, we have systematically investigated the convergence properties of SURP and the convergence rate of SURP estimates. Our simulation results and real data analysis have shown that SPReM outperforms other state-of-the-art methods. PMID:26527844
Predictive equations for the estimation of body size in seals and sea lions (Carnivora: Pinnipedia)
Churchill, Morgan; Clementz, Mark T; Kohno, Naoki
2014-01-01
Body size plays an important role in pinniped ecology and life history. However, body size data are often absent for historical, archaeological, and fossil specimens. To estimate the body size of pinnipeds (seals, sea lions, and walruses) for today and the past, we used 14 commonly preserved cranial measurements to develop sets of single variable and multivariate predictive equations for pinniped body mass and total length. Principal components analysis (PCA) was used to test whether separate family-specific regressions were more appropriate than single predictive equations for Pinnipedia. The influence of phylogeny was tested with phylogenetic independent contrasts (PIC). The accuracy of these regressions was then assessed using a combination of the coefficient of determination, percent prediction error, and standard error of estimation. Three different methods of multivariate analysis were examined: bidirectional stepwise model selection using Akaike information criteria; all-subsets model selection using Bayesian information criteria (BIC); and partial least squares regression. The PCA showed clear discrimination between Otariidae (fur seals and sea lions) and Phocidae (earless seals) for the 14 measurements, indicating the need for family-specific regression equations. The PIC analysis found that phylogeny had a minor influence on the relationship between morphological variables and body size. The regressions for total length were more accurate than those for body mass, and equations specific to Otariidae were more accurate than those for Phocidae. Of the three multivariate methods, the all-subsets approach required the fewest variables to estimate body size accurately. We then used the single variable predictive equations and the all-subsets approach to estimate the body size of two recently extinct pinniped taxa, the Caribbean monk seal (Monachus tropicalis) and the Japanese sea lion (Zalophus japonicus).
Body size estimates using single variable regressions generally under- or overestimated body size; however, the all-subsets regression produced body size estimates that were close to historically recorded body lengths for these two species. This indicates that the all-subsets regression equations developed in this study can estimate body size accurately. PMID:24916814
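Single-variable predictive equations of this kind are typically fit on log-transformed data, so that body size scales as a power law of the cranial measurement. A minimal sketch, with invented measurements rather than the study's data:

```python
import math

# Hypothetical single-variable allometric predictor of the kind used for
# pinniped body size: fit log(total length) = a + b * log(skull measure),
# then back-transform to predict length for a new specimen.
# All values below are invented for illustration.

def fit_loglog(x, y):
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
         / sum((u - mx) ** 2 for u in lx))
    a = my - b * mx
    return a, b

skull = [20.0, 25.0, 30.0, 35.0]       # cm, a cranial measurement
length = [150.0, 185.0, 222.0, 260.0]  # cm, total body length
a, b = fit_loglog(skull, length)
pred = math.exp(a + b * math.log(28.0))  # predicted length, 28 cm skull
```

Percent prediction error, as used in the study, would compare such back-transformed predictions against known lengths.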
John W. Edwards; Susan C. Loeb; David C. Guynn
1994-01-01
Multiple regression and use-availability analyses are two methods for examining habitat selection. Use-availability analysis is commonly used to evaluate macrohabitat selection whereas multiple regression analysis can be used to determine microhabitat selection. We compared these techniques using behavioral observations (n = 5534) and telemetry locations (n = 2089) of...
Lacagnina, Valerio; Leto-Barone, Maria S; La Piana, Simona; Seidita, Aurelio; Pingitore, Giuseppe; Di Lorenzo, Gabriele
2014-01-01
This article uses the logistic regression model for diagnostic decision making in patients with chronic nasal symptoms. We studied the ability of the logistic regression model, obtained by the evaluation of a database, to detect patients with a positive allergy skin-prick test (SPT) and patients with a negative SPT. The model developed was validated using a data set obtained from another medical institution. The analysis was performed using a database obtained from a questionnaire administered to patients with nasal symptoms, containing personal data, clinical data, and results of allergy testing (SPT). All variables found to be significantly different between patients with positive and negative SPT (p < 0.05) were selected for the logistic regression models and were analyzed with backward stepwise logistic regression, evaluated with the area under the receiver operating characteristic (ROC) curve. A second set of patients from another institution was used to validate the model. The accuracy of the model in identifying, in the second set, patients with both positive and negative SPT was high. The model detected 96% of patients with nasal symptoms and positive SPT and correctly classified 94% of those with negative SPT. This study is preliminary to the creation of software that could help primary care doctors in the diagnostic decision-making process (whether allergy testing is needed) in patients complaining of chronic nasal symptoms.
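The core of such a diagnostic model is a logistic regression fit on one patient set and checked for classification accuracy on a second set. A self-contained sketch using gradient descent on invented symptom indicators (the predictors and data are hypothetical, not the study's variables):

```python
import math

# Minimal logistic-regression sketch of the diagnostic-model idea:
# fit P(positive SPT) from symptom predictors by gradient descent,
# then classify a second (validation) set. Data are invented.

def fit_logistic(X, y, lr=0.5, steps=2000):
    w = [0.0] * (len(X[0]) + 1)           # intercept + coefficients
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - yi                   # gradient of log-loss
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(y) for wj, g in zip(w, grad)]
    return w

def predict(w, xi):
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
    return 1 if z > 0 else 0

# Two binary symptom indicators; 1 = positive skin-prick test
X_train = [[0, 0], [0, 1], [1, 0], [1, 1], [1, 1], [0, 0]]
y_train = [0, 0, 1, 1, 1, 0]
w = fit_logistic(X_train, y_train)
X_valid = [[1, 0], [0, 1]]
preds = [predict(w, xi) for xi in X_valid]
```

A full workup would also compute the ROC AUC on the validation set, as the abstract describes.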
Sullivan, Sarah; Lewis, Glyn; Mohr, Christine; Herzig, Daniela; Corcoran, Rhiannon; Drake, Richard; Evans, Jonathan
2014-01-01
There is some cross-sectional evidence that theory of mind ability is associated with social functioning in those with psychosis but the direction of this relationship is unknown. This study investigates the longitudinal association between both theory of mind and psychotic symptoms and social functioning outcome in first-episode psychosis. Fifty-four people with first-episode psychosis were followed up at 6 and 12 months. Random effects regression models were used to estimate the stability of theory of mind over time and the association between baseline theory of mind and psychotic symptoms and social functioning outcome. Neither baseline theory of mind ability (regression coefficients: Hinting test 1.07, 95% CI -0.74 to 2.88; Visual Cartoon test -2.91, 95% CI -7.32 to 1.51) nor baseline symptoms (regression coefficients: positive symptoms -0.04, 95% CI -1.24 to 1.16; selected negative symptoms -0.15, 95% CI -2.63 to 2.32) were associated with social functioning outcome. There was evidence that theory of mind ability was stable over time (regression coefficients: Hinting test 5.92, 95% CI -6.66 to 8.92; Visual Cartoon test score 0.13, 95% CI -0.17 to 0.44). Neither baseline theory of mind ability nor psychotic symptoms are associated with social functioning outcome. Further longitudinal work is needed to understand the origin of social functioning deficits in psychosis.
Sharp, T G
1984-02-01
The study was designed to determine whether any one of seven selected variables or a combination of the variables is predictive of performance on the State Board Test Pool Examination (SBTPE). The selected variables studied were: high school grade point average (HSGPA), The University of Tennessee, Knoxville, College of Nursing grade point average (GPA), and American College Test Assessment (ACT) standard scores (English, ENG; mathematics, MA; social studies, SS; natural sciences, NSC; composite, COMP). Data utilized were from graduates of the baccalaureate program of The University of Tennessee, Knoxville, College of Nursing from 1974 through 1979. The sample of 322 was selected from a total population of 572. The Statistical Analysis System (SAS) was used to analyze the predictive relationship of each of the seven selected variables to SBTPE performance (pass or fail); a stepwise discriminant analysis was used to determine the predictive relationship of the strongest combination of the independent variables to overall SBTPE performance (pass or fail); and stepwise multiple regression analysis was used to determine the strongest predictive combination of selected variables for each of the five subexams of the SBTPE. The selected variables were each found to be predictive of SBTPE performance (pass or fail). The strongest combination for predicting SBTPE performance (pass or fail) was found to be GPA, MA, and NSC.
Mixed conditional logistic regression for habitat selection studies.
Duchesne, Thierry; Fortin, Daniel; Courbin, Nicolas
2010-05-01
1. Resource selection functions (RSFs) are becoming a dominant tool in habitat selection studies. RSF coefficients can be estimated with unconditional (standard) and conditional logistic regressions. While the advantage of mixed-effects models is recognized for standard logistic regression, mixed conditional logistic regression remains largely overlooked in ecological studies. 2. We demonstrate the significance of mixed conditional logistic regression for habitat selection studies. First, we use spatially explicit models to illustrate how mixed-effects RSFs can be useful in the presence of inter-individual heterogeneity in selection and when the assumption of independence from irrelevant alternatives (IIA) is violated. The IIA hypothesis states that the strength of preference for habitat type A over habitat type B does not depend on the other habitat types also available. Secondly, we demonstrate the significance of mixed-effects models to evaluate habitat selection of free-ranging bison Bison bison. 3. When movement rules were homogeneous among individuals and the IIA assumption was respected, fixed-effects RSFs adequately described habitat selection by simulated animals. In situations violating the inter-individual homogeneity and IIA assumptions, however, RSFs were best estimated with mixed-effects regressions, and fixed-effects models could even provide faulty conclusions. 4. Mixed-effects models indicate that bison did not select farmlands, but exhibited strong inter-individual variations in their response to farmlands. Less than half of the bison preferred farmlands over forests. Conversely, the fixed-effect model simply suggested an overall selection for farmlands. 5. Conditional logistic regression is recognized as a powerful approach to evaluate habitat selection when resource availability changes. This regression is increasingly used in ecological studies, but almost exclusively in the context of fixed-effects models. 
Fitness maximization can imply differences in trade-offs among individuals, which can yield inter-individual differences in selection and lead to departure from IIA. These situations are best modelled with mixed-effects models. Mixed-effects conditional logistic regression should become a valuable tool for ecological research.
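In a conditional (matched) logistic RSF of the kind discussed above, each used habitat unit is compared only against the units available in its own stratum: the probability of the used unit is its exponentiated selection score normalized over the choice set. A small sketch with invented coefficients and covariates:

```python
import math

# Conditional-logit sketch of a resource selection function (RSF):
# within one stratum (a used habitat unit matched to its available
# alternatives), P(used unit) = exp(x'b) / sum over the choice set.
# Coefficients and covariate values are invented for illustration.

def used_probability(used_x, available_xs, beta):
    def score(x):
        return math.exp(sum(b * xi for b, xi in zip(beta, x)))
    return score(used_x) / sum(score(x) for x in available_xs)

beta = [1.2, -0.4]                 # e.g. forest cover, distance to road
used = [0.9, 0.2]                  # covariates of the unit the animal used
available = [used, [0.1, 0.5], [0.4, 0.8]]   # the full choice set
p = used_probability(used, available, beta)
```

A mixed-effects version, as advocated in the paper, would draw an individual-specific deviation for each element of `beta` rather than holding it fixed across animals.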
Using Data Mining for Wine Quality Assessment
NASA Astrophysics Data System (ADS)
Cortez, Paulo; Teixeira, Juliana; Cerdeira, António; Almeida, Fernando; Matos, Telmo; Reis, José
Certification and quality assessment are crucial issues within the wine industry. Currently, wine quality is mostly assessed by physicochemical (e.g. alcohol levels) and sensory (e.g. human expert evaluation) tests. In this paper, we propose a data mining approach to predict wine preferences that is based on easily available analytical tests at the certification step. A large dataset is considered with white vinho verde samples from the Minho region of Portugal. Wine quality is modeled under a regression approach, which preserves the order of the grades. Explanatory knowledge is given in terms of a sensitivity analysis, which measures the response changes when a given input variable is varied through its domain. Three regression techniques were applied, under a computationally efficient procedure that performs simultaneous variable and model selection and that is guided by the sensitivity analysis. The support vector machine achieved promising results, outperforming the multiple regression and neural network methods. Such a model is useful for understanding how physicochemical tests affect sensory preferences. Moreover, it can support wine expert evaluations and ultimately improve the production.
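The sensitivity analysis described above is simple to sketch: vary one input through its domain while holding the others fixed, and record the span of the model's response. The stand-in model below is a hypothetical linear function, not the paper's fitted SVM.

```python
# Sketch of one-dimensional sensitivity analysis: vary a single input
# through its domain, hold the other inputs fixed, and measure how much
# the predicted quality score moves. The model here is an invented
# linear stand-in, not the paper's support vector machine.

def model(alcohol, acidity):
    return 3.0 + 0.5 * alcohol - 0.2 * acidity   # hypothetical scorer

def sensitivity(var_range, fixed_acidity=5.0):
    preds = [model(a, fixed_acidity) for a in var_range]
    return max(preds) - min(preds)               # response span

alcohol_range = [8.0, 9.0, 10.0, 11.0, 12.0]     # domain of the input
effect = sensitivity(alcohol_range)
```

Inputs with a larger response span are ranked as more influential, which is how the procedure guides variable selection.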
Willke, Richard J; Zheng, Zhiyuan; Subedi, Prasun; Althin, Rikard; Mullins, C Daniel
2012-12-13
Implicit in the growing interest in patient-centered outcomes research is a growing need for better evidence regarding how responses to a given intervention or treatment may vary across patients, referred to as heterogeneity of treatment effect (HTE). A variety of methods are available for exploring HTE, each associated with unique strengths and limitations. This paper reviews a selected set of methodological approaches to understanding HTE, focusing largely but not exclusively on their uses with randomized trial data. It is oriented for the "intermediate" outcomes researcher, who may already be familiar with some methods, but would value a systematic overview of both more and less familiar methods with attention to when and why they may be used. Drawing from the biomedical, statistical, epidemiological and econometrics literature, we describe the steps involved in choosing an HTE approach, focusing on whether the intent of the analysis is for exploratory, initial testing, or confirmatory testing purposes. We also map HTE methodological approaches to data considerations as well as the strengths and limitations of each approach. Methods reviewed include formal subgroup analysis, meta-analysis and meta-regression, various types of predictive risk modeling including classification and regression tree analysis, series of n-of-1 trials, latent growth and growth mixture models, quantile regression, and selected non-parametric methods. In addition to an overview of each HTE method, examples and references are provided for further reading. By guiding the selection of the methods and analysis, this review is meant to better enable outcomes researchers to understand and explore aspects of HTE in the context of patient-centered outcomes research.
Exhaustive Search for Sparse Variable Selection in Linear Regression
NASA Astrophysics Data System (ADS)
Igarashi, Yasuhiko; Takenaka, Hikaru; Nakanishi-Ohno, Yoshinori; Uemura, Makoto; Ikeda, Shiro; Okada, Masato
2018-04-01
We propose a K-sparse exhaustive search (ES-K) method and a K-sparse approximate exhaustive search method (AES-K) for selecting variables in linear regression. With these methods, K-sparse combinations of variables are tested exhaustively, assuming that the optimal combination of explanatory variables is K-sparse. By collecting the results of exhaustively computing ES-K, various approximate methods for selecting sparse variables can be summarized as a density of states. With this density of states, we can compare different methods for selecting sparse variables, such as relaxation and sampling. For large problems, where the combinatorial explosion of explanatory variables is crucial, the AES-K method enables the density of states to be effectively reconstructed by using the replica-exchange Monte Carlo method and the multiple histogram method. Applying the ES-K and AES-K methods to type Ia supernova data, we confirmed the conventional understanding in astronomy when an appropriate K is given beforehand. However, we found it difficult to determine K from the data. Using virtual measurement and analysis, we argue that this is caused by data shortage.
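The exhaustive step of ES-K can be sketched directly: for fixed K, score every K-subset of variables by its least-squares residual and keep the best one. The synthetic data below (invented, not the supernova data) make the true support easy to verify.

```python
import itertools

# Sketch of the ES-K idea: for a fixed sparsity K, exhaustively evaluate
# every K-subset of explanatory variables by its residual sum of squares
# and keep the best combination. Here y depends only on variables 0 and
# 2, so the best 2-subset should be (0, 2). Data are synthetic.

def rss(X_cols, y):
    """RSS of y regressed (with intercept) on the given columns,
    via normal equations solved by Gaussian elimination."""
    cols = [[1.0] * len(y)] + [list(c) for c in X_cols]
    k = len(cols)
    A = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)]
         for i in range(k)]
    b = [sum(c * yi for c, yi in zip(col, y)) for col in cols]
    for i in range(k):                       # forward elimination
        for j in range(i + 1, k):
            f = A[j][i] / A[i][i]
            A[j] = [aj - f * ai for aj, ai in zip(A[j], A[i])]
            b[j] -= f * b[i]
    beta = [0.0] * k
    for i in reversed(range(k)):             # back substitution
        beta[i] = (b[i] - sum(A[i][j] * beta[j]
                              for j in range(i + 1, k))) / A[i][i]
    fitted = [sum(bi * col[r] for bi, col in zip(beta, cols))
              for r in range(len(y))]
    return sum((f - yi) ** 2 for f, yi in zip(fitted, y))

X = [[1, 2, 3, 4, 5, 6],       # variable 0 (relevant)
     [2, 1, 2, 1, 2, 1],       # variable 1 (noise)
     [1, 0, 1, 0, 0, 1],       # variable 2 (relevant)
     [5, 4, 3, 2, 2, 0]]       # variable 3 (irrelevant)
y = [2 * a + 3 * c for a, c in zip(X[0], X[2])]

K = 2
best = min(itertools.combinations(range(4), K),
           key=lambda s: rss([X[i] for i in s], y))
```

Collecting the RSS of every subset, rather than only the minimizer, is what yields the density of states used to compare selection methods.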
Patterson, Fiona; Lopes, Safiatu; Harding, Stephen; Vaux, Emma; Berkin, Liz; Black, David
2017-02-01
The aim of this study was to follow up a sample of physicians who began core medical training (CMT) in 2009. This paper examines the long-term validity of CMT and GP selection methods in predicting performance in the Membership of Royal College of Physicians (MRCP(UK)) examinations. We performed a longitudinal study, examining the extent to which the GP and CMT selection methods (T1) predict performance in the MRCP(UK) examinations (T2). A total of 2,569 applicants from 2008-09 who completed CMT and GP selection methods were included in the study. Looking at MRCP(UK) part 1, part 2 written and PACES scores, both CMT and GP selection methods show evidence of predictive validity for the outcome variables, and hierarchical regressions show the GP methods add significant value to the CMT selection process. CMT selection methods predict performance in important outcomes and have good evidence of validity; the GP methods may have an additional role alongside the CMT selection methods. © Royal College of Physicians 2017. All rights reserved.
Parametric regression model for survival data: Weibull regression model as an example
2016-01-01
The Weibull regression model is one of the most popular parametric regression models in that it provides an estimate of the baseline hazard function, as well as coefficients for covariates. Because of technical difficulties, the Weibull regression model is seldom used in the medical literature compared with the semi-parametric proportional hazard model. To make clinical investigators familiar with the Weibull regression model, this article introduces some basic knowledge on the Weibull regression model and then illustrates how to fit the model with R software. The SurvRegCensCov package is useful in converting estimated coefficients to clinically relevant statistics such as the hazard ratio (HR) and event time ratio (ETR). Model adequacy can be assessed by inspecting Kaplan-Meier curves stratified by categorical variables. The eha package provides an alternative method to fit the Weibull regression model. The check.dist() function helps to assess the goodness-of-fit of the model. Variable selection is based on the importance of a covariate, which can be tested using the anova() function. Alternatively, backward elimination starting from a full model is an efficient way for model development. Visualization of the Weibull regression model after model development is useful in that it provides another way to report the findings. PMID:28149846
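The coefficient conversion mentioned above follows from the Weibull model's dual accelerated-failure-time and proportional-hazards parameterizations: an AFT coefficient beta on log time gives an event time ratio ETR = exp(beta), and with Weibull shape k (= 1/scale) a hazard ratio HR = exp(-beta * k). A sketch in the spirit of the package's conversion, with illustrative numbers rather than a fitted model:

```python
import math

# Conversion sketch for a Weibull survival model: an accelerated-
# failure-time coefficient beta (effect on log event time) maps to
#   ETR = exp(beta)          (event time ratio)
#   HR  = exp(-beta * k)     (hazard ratio, k = Weibull shape = 1/scale)
# The beta and shape values below are illustrative, not fitted.

def weibull_aft_to_hr_etr(beta, shape):
    return math.exp(-beta * shape), math.exp(beta)

hr, etr = weibull_aft_to_hr_etr(beta=0.30, shape=1.5)
# beta > 0 lengthens event times (ETR > 1), hence lowers the hazard (HR < 1)
```

This is why a single fitted Weibull model can be reported either as time ratios or as hazard ratios, as the article discusses.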
1981-01-01
explanatory variable has been omitted. Ramsey (1974) has developed a rather interesting test for detecting specification errors using estimates of the...Peter. (1979) A Guide to Econometrics, Cambridge, MA: The MIT Press. Ramsey, J.B. (1974), "Classical Model Selection Through Specification Error... Tests," in P. Zarembka, Ed., Frontiers in Econometrics, New York: Academic Press. Theil, Henri. (1971), Principles of Econometrics, New York: John Wiley
Gettings, S D; Lordo, R A; Hintze, K L; Bagley, D M; Casterton, P L; Chudkowski, M; Curren, R D; Demetrulias, J L; Dipasquale, L C; Earl, L K; Feder, P I; Galli, C L; Glaza, S M; Gordon, V C; Janus, J; Kurtz, P J; Marenus, K D; Moral, J; Pape, W J; Renskers, K J; Rheins, L A; Roddy, M T; Rozen, M G; Tedeschi, J P; Zyracki, J
1996-01-01
The CTFA Evaluation of Alternatives Program is an evaluation of the relationship between data from the Draize primary eye irritation test and comparable data from a selection of promising in vitro eye irritation tests. In Phase III, data from the Draize test and 41 in vitro endpoints on 25 representative surfactant-based personal care formulations were compared. As in Phase I and Phase II, regression modeling of the relationship between maximum average Draize score (MAS) and in vitro endpoint was the primary approach adopted for evaluating in vitro assay performance. The degree of confidence in prediction of MAS for a given in vitro endpoint is quantified in terms of the relative widths of prediction intervals constructed about the fitted regression curve. Prediction intervals reflect not only the error attributed to the model but also the material-specific components of variation in both the Draize and the in vitro assays. Among the in vitro assays selected for regression modeling in Phase III, the relationship between MAS and in vitro score was relatively well defined. The prediction bounds on MAS were most narrow for materials at the lower or upper end of the effective irritation range (MAS = 0-45), where variability in MAS was smallest. Thus, the confidence with which the MAS of surfactant-based formulations is predicted is greatest when MAS approaches zero or when MAS approaches 45 (no comment is made on prediction of MAS > 45 since extrapolation beyond the range of observed data is not possible). No single in vitro endpoint was found to exhibit relative superiority with regard to prediction of MAS. Variability associated with Draize test outcome (e.g. in MAS values) must be considered in any future comparisons of in vivo and in vitro test results if the purpose is to predict in vivo response using in vitro data.
Missing-value estimation using linear and non-linear regression with Bayesian gene selection.
Zhou, Xiaobo; Wang, Xiaodong; Dougherty, Edward R
2003-11-22
Data from microarray experiments are usually in the form of large matrices of expression levels of genes under different experimental conditions. Owing to various reasons, there are frequently missing values. Estimating these missing values is important because they affect downstream analysis, such as clustering, classification and network design. Several methods of missing-value estimation are in use. The problem has two parts: (1) selection of genes for estimation and (2) design of an estimation rule. We propose Bayesian variable selection to obtain genes to be used for estimation, and employ both linear and nonlinear regression for the estimation rule itself. Fast implementation issues for these methods are discussed, including the use of QR decomposition for parameter estimation. The proposed methods are tested on data sets arising from hereditary breast cancer and small round blue-cell tumors. The results compare very favorably with currently used methods based on the normalized root-mean-square error. The appendix is available from http://gspsnap.tamu.edu/gspweb/zxb/missing_zxb/ (user: gspweb; passwd: gsplab).
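The estimation rule above, for the linear case, reduces to least squares on the selected genes; the abstract notes that QR decomposition is used for fast parameter estimation. A self-contained sketch with synthetic expression values (the gene data and the "missing" entry are invented):

```python
# Sketch of the least-squares step via QR decomposition, as mentioned
# for fast parameter estimation: decompose X = QR by Gram-Schmidt, then
# solve R beta = Q^T y by back substitution. A "missing" expression
# value is then estimated from the fitted linear rule. Data are synthetic.

def qr_least_squares(X, y):
    # X given column-wise; an intercept column is prepended.
    cols = [[1.0] * len(y)] + [list(c) for c in X]
    Q, R = [], [[0.0] * len(cols) for _ in cols]
    for j, col in enumerate(cols):          # classical Gram-Schmidt
        v = list(col)
        for i, q in enumerate(Q):
            R[i][j] = sum(a * b for a, b in zip(q, col))
            v = [a - R[i][j] * b for a, b in zip(v, q)]
        R[j][j] = sum(a * a for a in v) ** 0.5
        Q.append([a / R[j][j] for a in v])
    qty = [sum(a * b for a, b in zip(q, y)) for q in Q]
    beta = [0.0] * len(cols)
    for i in reversed(range(len(cols))):    # back substitution
        beta[i] = (qty[i] - sum(R[i][j] * beta[j]
                                for j in range(i + 1, len(cols)))) / R[i][i]
    return beta

# Expression of a target gene modeled from two selected genes.
g1 = [1.0, 2.0, 3.0, 4.0]
g2 = [0.0, 1.0, 0.0, 1.0]
target = [2.0, 4.5, 6.0, 8.5]              # = 2*g1 + 0.5*g2
beta = qr_least_squares([g1, g2], target)
missing = beta[0] + beta[1] * 2.5 + beta[2] * 1.0  # estimate a missing value
```

In the paper, Bayesian variable selection chooses which genes play the role of `g1` and `g2` before this estimation step.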
Bae, Jong-Myon; Kim, Eun Hee
2016-03-01
Research on how the risk of gastric cancer increases with Epstein-Barr virus (EBV) infection is lacking. In a systematic review that investigated studies published until September 2014, the authors did not calculate the summary odds ratio (SOR) due to heterogeneity across studies. Therefore, we include here additional studies published until October 2015 and conduct a meta-analysis with meta-regression that controls for the heterogeneity among studies. Using the studies selected in the previously published systematic review, we formulated lists of references, cited articles, and related articles provided by PubMed. From the lists, only case-control studies that detected EBV in tissue samples were selected. In order to control for the heterogeneity among studies, subgroup analysis and meta-regression were performed. In the 33 case-control results with adjacent non-cancer tissue, the total number of test samples in the case and control groups was 5280 and 4962, respectively. In the 14 case-control results with normal tissue, the total number of test samples in case and control groups was 1393 and 945, respectively. Upon meta-regression, the type of control tissue was found to be a statistically significant variable with regard to heterogeneity. When the control tissue was normal tissue of healthy individuals, the SOR was 3.41 (95% CI, 1.78 to 6.51; I-squared, 65.5%). The results of the present study support the argument that EBV infection increases the risk of gastric cancer. In the future, age-matched and sex-matched case-control studies should be conducted.
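The summary odds ratio is conventionally pooled by inverse-variance weighting of per-study log odds ratios; meta-regression, as used in the study, extends this by regressing the log odds ratios on study-level covariates such as control-tissue type. A minimal fixed-effect pooling sketch with invented 2x2 counts:

```python
import math

# Fixed-effect inverse-variance pooling sketch for a summary odds ratio
# (SOR): combine per-study log odds ratios weighted by 1/variance.
# The 2x2 counts below are invented, not from the review.

def study_log_or(a, b, c, d):
    # a, b: EBV-positive/negative among cases; c, d: among controls
    log_or = math.log((a * d) / (b * c))
    var = 1 / a + 1 / b + 1 / c + 1 / d      # Woolf's variance estimate
    return log_or, var

studies = [(30, 70, 10, 90), (25, 75, 12, 88), (40, 60, 15, 85)]
num = den = 0.0
for counts in studies:
    lo, v = study_log_or(*counts)
    num += lo / v
    den += 1 / v
sor = math.exp(num / den)   # pooled summary odds ratio
```

A random-effects or meta-regression version would add a between-study variance component before weighting, which is how heterogeneity (e.g. I-squared of 65.5%) is handled.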
EMI-Sensor Data to Identify Areas of Manure Accumulation on a Feedlot Surface
USDA-ARS?s Scientific Manuscript database
A study was initiated to test the validity of using electromagnetic induction (EMI) survey data, a prediction-based sampling strategy and ordinary linear regression modeling to predict spatially variable feedlot surface manure accumulation. A 30 m × 60 m feedlot pen with a central mound was selecte...
Lee, Seung Hee; Jang, Hyung Suk; Yang, Young Hee
2016-10-01
This study was done to investigate factors influencing successful aging in middle-aged women. A convenience sample of 103 middle-aged women was selected from the community. Data were collected using a structured questionnaire and analyzed using descriptive statistics, two-sample t-test, one-way ANOVA, Kruskal Wallis test, Pearson correlations, Spearman correlations and multiple regression analysis with the SPSS/WIN 22.0 program. Results of regression analysis showed that significant factors influencing successful aging were post-traumatic growth and social support. This regression model explained 48% of the variance in successful aging. Findings show that the concept 'post-traumatic growth' is an important factor influencing successful aging in middle-aged women. In addition, social support from friends/co-workers had greater influence on successful aging than social support from family. Thus, we need to consider the positive impact of post-traumatic growth and increase the chances of social participation in a successful aging program for middle-aged women.
Quantifying prosthetic gait deviation using simple outcome measures
Kark, Lauren; Odell, Ross; McIntosh, Andrew S; Simmons, Anne
2016-01-01
AIM: To develop a subset of simple outcome measures to quantify prosthetic gait deviation without needing three-dimensional gait analysis (3DGA). METHODS: Eight unilateral, transfemoral amputees and 12 unilateral, transtibial amputees were recruited. Twenty-eight able-bodied controls were recruited. All participants underwent 3DGA, the timed-up-and-go test and the six-minute walk test (6MWT). The lower-limb amputees also completed the Prosthesis Evaluation Questionnaire. Results from 3DGA were summarised using the gait deviation index (GDI), which was subsequently regressed, using stepwise regression, against the other measures. RESULTS: Step-length (SL), self-selected walking speed (SSWS) and the distance walked during the 6MWT (6MWD) were significantly correlated with GDI. The 6MWD was the strongest single predictor of the GDI, followed by SL and SSWS. The predictive ability of the regression equations was improved following inclusion of self-report data related to mobility and prosthetic utility. CONCLUSION: This study offers a practicable alternative for quantifying kinematic deviation without the need to conduct complete 3DGA. PMID:27335814
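The stepwise regression used above to relate the simple measures to the GDI can be illustrated with a generic forward procedure based on partial F-tests; the synthetic data, entry threshold, and exact selection rules here are assumptions, not those of the paper.

```python
import numpy as np
from scipy import stats

def forward_stepwise(X, y, alpha_enter=0.05):
    """Forward stepwise selection: at each step, add the candidate
    predictor with the smallest partial-F p-value; stop when no
    candidate enters at the alpha_enter level. Returns selected
    column indices."""
    n, p = X.shape
    selected, remaining = [], list(range(p))

    def sse(cols):
        A = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        r = y - A @ beta
        return r @ r

    sse_cur = sse([])
    while remaining:
        best = None
        for c in remaining:
            sse_new = sse(selected + [c])
            df2 = n - len(selected) - 2          # residual df of larger model
            F = (sse_cur - sse_new) / (sse_new / df2)
            pval = stats.f.sf(F, 1, df2)
            if best is None or pval < best[1]:
                best = (c, pval, sse_new)
        if best[1] >= alpha_enter:
            break
        selected.append(best[0])
        remaining.remove(best[0])
        sse_cur = best[2]
    return selected
```

Applied to measures like 6MWD, SL, and SSWS, the first column selected would correspond to the strongest single predictor.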
NASA Astrophysics Data System (ADS)
Ghazali, Amirul Syafiq Mohd; Ali, Zalila; Noor, Norlida Mohd; Baharum, Adam
2015-10-01
Multinomial logistic regression is widely used to model the outcomes of a polytomous response variable, a categorical dependent variable with more than two categories. The model assumes that the conditional mean of the dependent categorical variable is the logistic function of an affine combination of predictor variables. Its procedure gives a number of logistic regression models that make specific comparisons of the response categories. When there are q categories of the response variable, the model consists of q-1 logit equations which are fitted simultaneously. The model is validated by variable selection procedures, tests of regression coefficients, a significance test of the overall model, goodness-of-fit measures, and validation of predicted probabilities using odds ratios. This study used the multinomial logistic regression model to investigate obesity and overweight among primary school students in a rural area on the basis of their demographic profiles, lifestyles, and diet and food intake. The results indicated that obesity and overweight among students are related to gender, religion, sleep duration, time spent on electronic games, breakfast intake in a week, with whom meals are taken, and protein intake, as well as the interaction between breakfast intake in a week and sleep duration, and the interaction between gender and protein intake.
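The q-1 logit equations described above map directly onto code: with q categories, each non-reference category gets its own affine predictor, and normalizing against the reference category turns the q-1 linear predictors into q probabilities. The coefficients below are illustrative placeholders, not the study's estimates.

```python
import numpy as np

def multinomial_probs(X, B):
    """Category probabilities from a multinomial logit model.

    B has shape (q-1, p+1): one logit equation per non-reference
    category, each comparing category j to the reference (category 0):
        log(P(y=j) / P(y=0)) = b_j0 + b_j1*x1 + ... + b_jp*xp
    """
    n = X.shape[0]
    Z = np.column_stack([np.ones(n), X])       # prepend intercept
    eta = Z @ B.T                              # (n, q-1) linear predictors
    expo = np.exp(eta)
    denom = 1.0 + expo.sum(axis=1, keepdims=True)
    p_ref = 1.0 / denom                        # reference-category probability
    return np.column_stack([p_ref, expo / denom])   # rows sum to 1
```

With q = 3 weight categories (e.g., normal / overweight / obese), B would hold two fitted logit equations.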
Basis Selection for Wavelet Regression
NASA Technical Reports Server (NTRS)
Wheeler, Kevin R.; Lau, Sonie (Technical Monitor)
1998-01-01
A wavelet basis selection procedure is presented for wavelet regression. Both the basis and the threshold are selected using cross-validation. The method includes the capability of incorporating prior knowledge on the smoothness (or shape of the basis functions) into the basis selection procedure. The results of the method are demonstrated on sampled functions widely used in the wavelet regression literature. The results of the method are contrasted with other published methods.
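A minimal version of the thresholding step in wavelet regression can be shown with a one-level orthonormal Haar transform; the cited method additionally selects the basis and threshold by cross-validation, which is omitted here for brevity.

```python
import numpy as np

def haar_dwt(x):
    """One-level orthonormal Haar transform (len(x) must be even)."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail coefficients
    return a, d

def haar_idwt(a, d):
    """Inverse of haar_dwt."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def denoise(x, thresh):
    """Soft-threshold the detail coefficients and reconstruct."""
    a, d = haar_dwt(x)
    d = np.sign(d) * np.maximum(np.abs(d) - thresh, 0.0)
    return haar_idwt(a, d)
```

With a threshold of zero the transform is lossless; large thresholds suppress all detail coefficients, leaving a smoothed signal.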
Wei, Chang-Na; Zhou, Qing-He; Wang, Li-Zhong
2017-01-01
Currently, there is no consensus on how to determine the optimal dose of intrathecal bupivacaine for an individual undergoing an elective cesarean section. In this study, we developed a regression equation between intrathecal 0.5% hyperbaric bupivacaine volume and abdominal girth and vertebral column length, to determine a suitable block level (T5) for elective cesarean section patients. In phase I, we analyzed 374 parturients undergoing an elective cesarean section who received a suitable dose of intrathecal 0.5% hyperbaric bupivacaine after a combined spinal-epidural (CSE) was performed at the L3/4 interspace. Parturients with T5 blockade to pinprick were selected for establishing the regression equation between 0.5% hyperbaric bupivacaine volume and vertebral column length and abdominal girth. Six parturient and neonatal variables, intrathecal 0.5% hyperbaric bupivacaine volume, and spinal anesthesia spread were recorded. Bivariate linear correlation analyses, multiple linear regression analyses, and 2-tailed t tests or chi-square tests were performed, as appropriate. In phase II, another 200 parturients with CSE for elective cesarean section were enrolled to verify the accuracy of the regression equation. In phase I, a total of 143 parturients were selected to establish the following regression equation: YT5 = 0.074X1 − 0.022X2 − 0.017 (YT5 = 0.5% hyperbaric bupivacaine volume for T5 block level; X1 = vertebral column length; and X2 = abdominal girth). In phase II, a total of 189 participants were enrolled in the study to verify the accuracy of the regression equation, and 155 parturients with T5 blockade were deemed eligible, accounting for 82.01% of all participants. This study evaluated parturients with T5 blockade to pinprick after a CSE for elective cesarean section to establish a regression equation between parturient vertebral column length and abdominal girth and 0.5% hyperbaric intrathecal bupivacaine volume.
This equation can accurately predict the suitable intrathecal hyperbaric bupivacaine dose for elective cesarean section. PMID:28834913
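The reported phase I equation can be evaluated directly. This is an illustration of the arithmetic only, not clinical guidance; the measurement units (centimeters) and the example values are assumptions for demonstration.

```python
def bupivacaine_volume_t5(vertebral_len_cm, abdominal_girth_cm):
    """Regression equation reported in phase I of the study:
        Y_T5 = 0.074 * X1 - 0.022 * X2 - 0.017
    where X1 is vertebral column length, X2 is abdominal girth, and
    Y_T5 is the 0.5% hyperbaric bupivacaine volume (mL) targeting a
    T5 block level. Illustrative arithmetic only, not clinical advice."""
    return 0.074 * vertebral_len_cm - 0.022 * abdominal_girth_cm - 0.017
```

For hypothetical measurements X1 = 60 and X2 = 95, the equation yields a volume of about 2.33 mL.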
Kelly, Maureen E; Regan, Daniel; Dunne, Fidelma; Henn, Patrick; Newell, John; O'Flynn, Siun
2013-05-10
Internationally, tests of general mental ability are used in the selection of medical students. Examples include the Medical College Admission Test, Undergraduate Medicine and Health Sciences Admission Test and the UK Clinical Aptitude Test. The most widely used measure of their efficacy is predictive validity. A new tool, the Health Professions Admission Test-Ireland (HPAT-Ireland), was introduced in 2009. Traditionally, selection to Irish undergraduate medical schools relied on academic achievement. Since 2009, Irish and EU applicants are selected on a combination of their secondary school academic record (measured predominately by the Leaving Certificate Examination) and HPAT-Ireland score. This is the first study to report on the predictive validity of the HPAT-Ireland for early undergraduate assessments of communication and clinical skills. Students enrolled at two Irish medical schools in 2009 were followed up for two years. Data collected were gender; HPAT-Ireland total and subsection scores; Leaving Certificate Examination plus HPAT-Ireland combined score; Year 1 Objective Structured Clinical Examination (OSCE) scores (total score, communication and clinical subtest scores); Year 1 Multiple Choice Questions; and Year 2 OSCE and subset scores. We report descriptive statistics, Pearson correlation coefficients and multiple linear regression models. Data were available for 312 students. In Year 1 none of the selection criteria were significantly related to student OSCE performance. The Leaving Certificate Examination and Leaving Certificate plus HPAT-Ireland combined scores correlated with MCQ marks. In Year 2 a series of significant correlations emerged between the HPAT-Ireland and subsections thereof with OSCE Communication Z-scores, OSCE Clinical Z-scores, and Total OSCE Z-scores. However, on multiple regression only the relationship between Total OSCE Score and the Total HPAT-Ireland score remained significant, albeit with modest predictive power.
We found that none of our selection criteria strongly predict clinical and communication skills. The HPAT-Ireland appears to measure ability in domains different from those assessed by the Leaving Certificate Examination. While some significant associations did emerge in Year 2 between the HPAT-Ireland and total OSCE scores, further evaluation is required to establish whether this pattern continues during the senior years of the medical course.
Fouad, Marwa A; Tolba, Enas H; El-Shal, Manal A; El Kerdawy, Ahmed M
2018-05-11
The continual emergence of new β-lactam antibiotics creates a need for suitable analytical methods that accelerate and facilitate their analysis. A face-centered central composite experimental design was adopted using different levels of phosphate buffer pH and acetonitrile percentage at zero time and after 15 min in a gradient program to obtain the optimum chromatographic conditions for the elution of 31 β-lactam antibiotics. Retention factors were used as the target property to build two QSRR models, utilizing the conventional forward selection algorithm and the advanced nature-inspired firefly algorithm for descriptor selection, each coupled with multiple linear regression. The obtained models showed high performance in both internal and external validation, indicating their robustness and predictive ability. A Williams-Hotelling test and Student's t-test showed no statistically significant difference between the models' results. Y-randomization validation showed that the obtained models reflect a significant correlation between the selected molecular descriptors and the analytes' chromatographic retention. These results indicate that the generated FS-MLR and FFA-MLR models show comparable quality at both the training and validation levels. They also give comparable information about the molecular features that influence the retention behavior of β-lactams under the current chromatographic conditions. We conclude that, in some cases, a simple conventional feature selection algorithm can generate robust and predictive models comparable to those generated using advanced ones.
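The Y-randomization validation mentioned above can be sketched in a few lines: shuffle the response, refit the multiple linear regression, and compare the true model's R-squared against the permuted distribution. The data here are synthetic placeholders, not the chromatographic descriptors or retention factors of the study.

```python
import numpy as np

def r2(X, y):
    """R-squared of an ordinary least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

def y_randomization(X, y, n_perm=100, seed=0):
    """Refit the MLR after shuffling the response n_perm times.
    A true-model R-squared far above the permuted values suggests the
    descriptor-retention correlation is not due to chance."""
    rng = np.random.default_rng(seed)
    r2_true = r2(X, y)
    r2_perm = np.array([r2(X, rng.permutation(y)) for _ in range(n_perm)])
    return r2_true, r2_perm.mean()
```

A large gap between the two returned values corresponds to passing the Y-randomization check described in the abstract.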
Rafindadi, Abdulkadir Abdulrashid; Yusof, Zarinah; Zaman, Khalid; Kyophilavong, Phouphet; Akhmat, Ghulam
2014-10-01
The objective of the study is to examine the relationship between air pollution, fossil fuel energy consumption, water resources, and natural resource rents in a panel of selected Asia-Pacific countries over the period 1975-2012. The study includes a number of variables in the model for robust analysis. The results of the cross-sectional analysis show that there is a significant relationship between air pollution, energy consumption, and water productivity in the individual countries of Asia-Pacific. However, the results for each country vary according to time-invariant shocks. For this purpose, the study employed the panel least squares technique, which includes panel least squares regression, panel fixed-effect regression, and panel two-stage least squares regression. In general, all the panel tests indicate a significant and positive relationship between air pollution, energy consumption, and water resources in the region. Fossil fuel energy consumption has the dominant impact on changes in air pollution in the region.
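The panel fixed-effect regression named above is, in its simplest form, the within estimator: demean the outcome and regressors within each country, then run pooled OLS on the demeaned data. A generic numpy sketch, with synthetic data standing in for the Asia-Pacific panel:

```python
import numpy as np

def fixed_effects_ols(y, X, group):
    """Panel fixed-effects (within) estimator: subtract each group's
    mean from y and X, then fit pooled OLS on the demeaned data. The
    group intercepts (country effects) are differenced away."""
    group = np.asarray(group)
    yd = y.astype(float).copy()
    Xd = X.astype(float).copy()
    for g in np.unique(group):
        m = group == g
        yd[m] -= yd[m].mean()
        Xd[m] -= Xd[m].mean(axis=0)
    beta, *_ = np.linalg.lstsq(Xd, yd, rcond=None)
    return beta
```

The test below simulates five "countries" with different fixed intercepts and a common slope of 2, which the within estimator recovers.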
Zhao, Ni; Chen, Jun; Carroll, Ian M.; Ringel-Kulka, Tamar; Epstein, Michael P.; Zhou, Hua; Zhou, Jin J.; Ringel, Yehuda; Li, Hongzhe; Wu, Michael C.
2015-01-01
High-throughput sequencing technology has enabled population-based studies of the role of the human microbiome in disease etiology and exposure response. Distance-based analysis is a popular strategy for evaluating the overall association between microbiome diversity and outcome, wherein the phylogenetic distance between individuals’ microbiome profiles is computed and tested for association via permutation. Despite their practical popularity, distance-based approaches suffer from important challenges, especially in selecting the best distance and extending the methods to alternative outcomes, such as survival outcomes. We propose the microbiome regression-based kernel association test (MiRKAT), which directly regresses the outcome on the microbiome profiles via the semi-parametric kernel machine regression framework. MiRKAT allows for easy covariate adjustment and extension to alternative outcomes while non-parametrically modeling the microbiome through a kernel that incorporates phylogenetic distance. It uses a variance-component score statistic to test for the association with analytical p value calculation. The model also allows simultaneous examination of multiple distances, alleviating the problem of choosing the best distance. Our simulations demonstrated that MiRKAT provides correctly controlled type I error and adequate power in detecting overall association. “Optimal” MiRKAT, which considers multiple candidate distances, is robust in that it suffers from little power loss in comparison to when the best distance is used and can achieve tremendous power gain in comparison to when a poor distance is chosen. Finally, we applied MiRKAT to real microbiome datasets to show that microbial communities are associated with smoking and with fecal protease levels after confounders are controlled for. PMID:25957468
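A stripped-down version of the MiRKAT idea can be sketched directly: Gower-center a distance matrix into a kernel K, form a score-type statistic Q = r'Kr on the centered outcome, and assess it by permutation. The real method adds covariate adjustment and an analytic p-value, both omitted in this toy sketch.

```python
import numpy as np

def kernel_from_distance(D):
    """Gower-center a squared distance matrix into a kernel matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * J @ (D ** 2) @ J

def mirkat_like_test(D, y, n_perm=199, seed=0):
    """Toy distance-kernel association test in the spirit of MiRKAT:
    Q = r' K r with r the centered outcome; significance by permuting
    the outcome. No covariate adjustment, unlike the real method."""
    rng = np.random.default_rng(seed)
    K = kernel_from_distance(D)
    r = y - y.mean()
    q_obs = r @ K @ r
    count = 0
    for _ in range(n_perm):
        p = rng.permutation(r)
        if p @ K @ p >= q_obs:
            count += 1
    return q_obs, (count + 1) / (n_perm + 1)
```

When the outcome aligns with the geometry encoded in the distances, Q is large relative to its permutation distribution and the p-value is small.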
NASA Astrophysics Data System (ADS)
Sahabiev, I. A.; Ryazanov, S. S.; Kolcova, T. G.; Grigoryan, B. R.
2018-03-01
The three most common techniques to interpolate soil properties at a field scale—ordinary kriging (OK), regression kriging with a multiple linear regression drift model (RK + MLR), and regression kriging with a principal component regression drift model (RK + PCR)—were examined. The results of the study were compiled into an algorithm for choosing the most appropriate soil mapping technique. Relief attributes were used as the auxiliary variables. When the spatial dependence of a target variable was strong, the OK method showed more accurate interpolation results, and the inclusion of the auxiliary data resulted in an insignificant improvement in prediction accuracy. According to the algorithm, the RK + PCR method effectively eliminates multicollinearity of explanatory variables. However, if the number of predictors is less than ten, the probability of multicollinearity is reduced, and application of the PCR becomes unnecessary. In that case, multiple linear regression should be used instead.
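Regression kriging as described above splits the prediction into a drift model fit to auxiliary variables plus interpolation of the residuals. The drift step alone is ordinary multiple linear regression on relief attributes; a minimal sketch of that step, with synthetic relief data (the residual-kriging step is omitted):

```python
import numpy as np

def fit_drift(relief_attrs, soil_prop):
    """Fit the MLR drift model of regression kriging: regress the soil
    property on auxiliary relief attributes. Returns coefficients
    (intercept first) and the residuals left for kriging."""
    n = relief_attrs.shape[0]
    A = np.column_stack([np.ones(n), relief_attrs])
    beta, *_ = np.linalg.lstsq(A, soil_prop, rcond=None)
    residuals = soil_prop - A @ beta
    return beta, residuals
```

In full regression kriging, the returned residuals would then be interpolated with ordinary kriging and added back to the drift prediction.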
Mirmohseni, A; Abdollahi, H; Rostamizadeh, K
2007-02-28
A net analyte signal (NAS)-based method called HLA/GO was applied for the selective determination of a binary mixture of ethanol and water with a quartz crystal nanobalance (QCN) sensor. A full factorial design was applied for the formation of calibration and prediction sets in the concentration ranges 5.5-22.2 microg mL(-1) for ethanol and 7.01-28.07 microg mL(-1) for water. An optimal time range was selected by a procedure based on the calculation of the net analyte signal regression plot in each considered time window for each test sample. A moving window strategy was used to search for the region with maximum linearity of the NAS regression plot (minimum error indicator) and minimum PRESS value. On the basis of the obtained results, the differences in the adsorption profiles in the time range between 1 and 600 s were used to determine mixtures of both compounds by the HLA/GO method. The calculation of the net analyte signal using the HLA/GO method allows determination of several figures of merit, such as selectivity, sensitivity, analytical sensitivity and limit of detection, for each component. To check the ability of the proposed method to select linear regions of the adsorption profile, a test for detecting non-linear regions of the adsorption profile data in the presence of methanol was also described. The results showed that the method was successfully applied for the determination of ethanol and water.
Williams, Jennifer A.; Schmitter-Edgecombe, Maureen; Cook, Diane J.
2016-01-01
Introduction Reducing the amount of testing required to accurately detect cognitive impairment is clinically relevant. The aim of this research was to determine the fewest number of clinical measures required to accurately classify participants as healthy older adult, mild cognitive impairment (MCI) or dementia using a suite of classification techniques. Methods Two variable selection machine learning models (i.e., naive Bayes, decision tree), a logistic regression, and two participant datasets (i.e., clinical diagnosis, clinical dementia rating; CDR) were explored. Participants classified using clinical diagnosis criteria included 52 individuals with dementia, 97 with MCI, and 161 cognitively healthy older adults. Participants classified using CDR included 154 individuals with CDR = 0, 93 individuals with CDR = 0.5, and 25 individuals with CDR = 1.0+. Twenty-seven demographic, psychological, and neuropsychological variables were available for variable selection. Results No significant difference was observed between the naive Bayes, decision tree, and logistic regression models for classification of either the clinical diagnosis or the CDR dataset. Participant classification (70.0-99.1%), geometric mean (60.9-98.1%), sensitivity (44.2-100%), and specificity (52.7-100%) were generally satisfactory. Unsurprisingly, the MCI/CDR = 0.5 participant group was the most challenging to classify. Through variable selection, only 2-9 variables were required for classification, and these varied between datasets in a clinically meaningful way. Conclusions The current study results reveal that machine learning techniques can accurately classify cognitive impairment and reduce the number of measures required for diagnosis. PMID:26332171
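One of the classifiers compared above, naive Bayes, reduces to a few lines under a Gaussian model for each (class, feature) pair. This is a generic sketch on synthetic two-feature data, not the study's 27-variable diagnostic models.

```python
import numpy as np

class GaussianNB:
    """Minimal Gaussian naive Bayes: assumes class-conditional feature
    independence, with one normal density per (class, feature) pair."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.prior = np.array([(y == c).mean() for c in self.classes])
        self.mu = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.var = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])
        return self

    def predict(self, X):
        # log P(c) + sum over features of log N(x_f; mu_cf, var_cf)
        ll = [np.log(p) - 0.5 * np.sum(np.log(2 * np.pi * v)
                                       + (X - m) ** 2 / v, axis=1)
              for p, m, v in zip(self.prior, self.mu, self.var)]
        return self.classes[np.argmax(np.column_stack(ll), axis=1)]
```

On well-separated classes this tiny model is already near-perfect, which is part of why such simple classifiers held their own in the comparison above.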
Space Shuttle Main Engine performance analysis
NASA Technical Reports Server (NTRS)
Santi, L. Michael
1993-01-01
For a number of years, NASA has relied primarily upon periodically updated versions of Rocketdyne's power balance model (PBM) to provide space shuttle main engine (SSME) steady-state performance prediction. A recent computational study indicated that PBM predictions do not satisfy fundamental energy conservation principles. More recently, SSME test results provided by the Technology Test Bed (TTB) program have indicated significant discrepancies between PBM flow and temperature predictions and TTB observations. Results of these investigations have diminished confidence in the predictions provided by PBM, and motivated the development of new computational tools for supporting SSME performance analysis. A multivariate least squares regression algorithm was developed and implemented during this effort in order to efficiently characterize TTB data. This procedure, called the 'gains model,' was used to approximate the variation of SSME performance parameters such as flow rate, pressure, temperature, speed, and assorted hardware characteristics in terms of six assumed independent influences. These six influences were engine power level, mixture ratio, fuel inlet pressure and temperature, and oxidizer inlet pressure and temperature. A BFGS optimization algorithm provided the base procedure for determining regression coefficients for both linear and full quadratic approximations of parameter variation. Statistical information relative to data deviation from regression derived relations was also computed. A new strategy for integrating test data with theoretical performance prediction was also investigated. The current integration procedure employed by PBM treats test data as pristine and adjusts hardware characteristics in a heuristic manner to achieve engine balance. Within PBM, this integration procedure is called 'data reduction.' 
By contrast, the new data integration procedure, termed 'reconciliation,' uses mathematical optimization techniques, and requires both measurement and balance uncertainty estimates. The reconciler attempts to select operational parameters that minimize the difference between theoretical prediction and observation. Selected values are further constrained to fall within measurement uncertainty limits and to satisfy fundamental physical relations (mass conservation, energy conservation, pressure drop relations, etc.) within uncertainty estimates for all SSME subsystems. The parameter selection problem described above is a traditional nonlinear programming problem. The reconciler employs a mixed penalty method to determine optimum values of SSME operating parameters associated with this problem formulation.
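The reconciliation idea, selecting operating parameters that stay close to the measurements while satisfying physical balance relations via a penalty method, can be sketched on a toy flow-split balance. The measurements, uncertainties, and single balance equation below are hypothetical stand-ins for the SSME subsystem relations, not values from the program.

```python
import numpy as np
from scipy.optimize import minimize

def reconcile(measured, sigma, balance, mu=1e6):
    """Data reconciliation as penalized least squares: minimize the
    uncertainty-weighted mismatch to the measurements plus a quadratic
    penalty on the physical balance residual (a simple penalty method,
    standing in for the mixed penalty method described above)."""
    def objective(x):
        mismatch = np.sum(((x - measured) / sigma) ** 2)
        return mismatch + mu * balance(x) ** 2
    res = minimize(objective, measured, method='BFGS')
    return res.x

# Toy example: an inlet flow should equal the sum of two branch flows,
# but the raw measurements are inconsistent by 0.3 units.
meas = np.array([10.3, 6.1, 3.9])      # hypothetical measurements
sig = np.array([0.2, 0.1, 0.1])        # hypothetical 1-sigma uncertainties
rec = reconcile(meas, sig, lambda x: x[0] - x[1] - x[2])
```

The reconciled values nearly satisfy the mass balance while each staying within its measurement uncertainty band.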
Regression analysis for LED color detection of visual-MIMO system
NASA Astrophysics Data System (ADS)
Banik, Partha Pratim; Saha, Rappy; Kim, Ki-Doo
2018-04-01
Color detection from a light emitting diode (LED) array using a smartphone camera is very difficult in a visual multiple-input multiple-output (visual-MIMO) system. In this paper, we propose a method to determine the LED color using a smartphone camera by applying regression analysis. We employ a multivariate regression model to identify the LED color. After taking a picture of an LED array, we select the LED array region and detect the LEDs using an image processing algorithm. We then apply the k-means clustering algorithm to determine the number of candidate colors for feature extraction of each LED. Finally, we apply the multivariate regression model to predict the color of the transmitted LEDs. We show results for three environmental light conditions: room light, low light (560 lux), and strong light (2450 lux). We compare the results of the proposed algorithm in terms of training and test R-squared (%) values and the percentage of closeness between transmitted and predicted colors, and we also discuss the number of distorted test data points using a distortion bar graph in the CIE 1931 color space.
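The k-means step used above to find candidate colors within each detected LED region can be written as plain Lloyd iterations over RGB pixel vectors; the cluster data in the test are synthetic, not camera pixels.

```python
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on an (N, d) array of points, e.g. Nx3 RGB
    pixel vectors from one LED region. Returns centers and labels."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest center
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute each center as the mean of its assigned points
        new = np.array([points[labels == j].mean(axis=0)
                        if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels
```

In the pipeline above, the resulting cluster centers would serve as the color features fed to the multivariate regression model.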
Lamont, Andrea E.; Vermunt, Jeroen K.; Van Horn, M. Lee
2016-01-01
Regression mixture models are increasingly used as an exploratory approach to identify heterogeneity in the effects of a predictor on an outcome. In this simulation study, we test the effects of violating an implicit assumption often made in these models – i.e., independent variables in the model are not directly related to latent classes. Results indicated that the major risk of failing to model the relationship between predictor and latent class was an increase in the probability of selecting additional latent classes and biased class proportions. Additionally, this study tests whether regression mixture models can detect a piecewise relationship between a predictor and outcome. Results suggest that these models are able to detect piecewise relations, but only when the relationship between the latent class and the predictor is included in model estimation. We illustrate the implications of making this assumption through a re-analysis of applied data examining heterogeneity in the effects of family resources on academic achievement. We compare previous results (which assumed no relation between independent variables and latent class) to the model where this assumption is lifted. Implications and analytic suggestions for conducting regression mixture models based on these findings are noted. PMID:26881956
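A regression mixture of the kind studied above can be fit with a small EM loop: weighted least squares per latent class in the M-step, Gaussian responsibilities in the E-step. This minimal sketch omits the key extension the paper examines, namely letting the predictor also affect class membership, and initializes responsibilities from a crude median split of the outcome.

```python
import numpy as np

def regression_mixture_em(x, y, iters=100):
    """EM for a two-class regression mixture: each latent class k has
    its own line y = a_k + b_k * x + N(0, s_k^2). Minimal sketch only;
    the class-membership model is a constant mixing proportion."""
    n = len(x)
    X = np.column_stack([np.ones(n), x])
    resp = np.where(y > np.median(y), 0.9, 0.1)   # initial P(class 1 | point)
    params = []
    for _ in range(iters):
        params = []
        for w in (resp, 1.0 - resp):
            # M-step: weighted least squares for this class
            beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
            res = y - X @ beta
            s2 = np.sum(w * res ** 2) / np.sum(w)
            params.append((beta, s2, w.mean()))
        # E-step: posterior responsibility of class 1
        dens = [pi * np.exp(-(y - X @ b) ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)
                for b, s2, pi in params]
        resp = dens[0] / np.maximum(dens[0] + dens[1], 1e-300)
    return params
```

With two well-separated regression lines, the EM loop recovers both slopes, one per latent class.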
2012-01-01
Implicit in the growing interest in patient-centered outcomes research is a growing need for better evidence regarding how responses to a given intervention or treatment may vary across patients, referred to as heterogeneity of treatment effect (HTE). A variety of methods are available for exploring HTE, each associated with unique strengths and limitations. This paper reviews a selected set of methodological approaches to understanding HTE, focusing largely but not exclusively on their uses with randomized trial data. It is oriented for the “intermediate” outcomes researcher, who may already be familiar with some methods, but would value a systematic overview of both more and less familiar methods with attention to when and why they may be used. Drawing from the biomedical, statistical, epidemiological and econometrics literature, we describe the steps involved in choosing an HTE approach, focusing on whether the intent of the analysis is for exploratory, initial testing, or confirmatory testing purposes. We also map HTE methodological approaches to data considerations as well as the strengths and limitations of each approach. Methods reviewed include formal subgroup analysis, meta-analysis and meta-regression, various types of predictive risk modeling including classification and regression tree analysis, series of n-of-1 trials, latent growth and growth mixture models, quantile regression, and selected non-parametric methods. In addition to an overview of each HTE method, examples and references are provided for further reading. By guiding the selection of the methods and analysis, this review is meant to better enable outcomes researchers to understand and explore aspects of HTE in the context of patient-centered outcomes research. PMID:23234603
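The most familiar confirmatory HTE method in the list above, formal subgroup analysis, amounts to testing a treatment-by-subgroup interaction term in a regression. A minimal sketch with one binary effect modifier and simulated data (the covariate names and effect sizes are illustrative):

```python
import numpy as np
from scipy import stats

def interaction_test(y, treat, subgroup):
    """Regress the outcome on treatment, subgroup indicator, and their
    interaction; the t-test on the interaction coefficient asks whether
    the treatment effect differs across the subgroup (i.e., HTE)."""
    n = len(y)
    X = np.column_stack([np.ones(n), treat, subgroup, treat * subgroup])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - 4)                 # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)            # coefficient covariance
    t = beta[3] / np.sqrt(cov[3, 3])
    p = 2 * stats.t.sf(abs(t), n - 4)
    return beta[3], p
```

A small p-value for the interaction is the classical evidence that the treatment effect is not homogeneous across the subgroup.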
Bricklemyer, Ross S; Brown, David J; Turk, Philip J; Clegg, Sam M
2013-10-01
Laser-induced breakdown spectroscopy (LIBS) provides a potential method for rapid, in situ soil C measurement. In previous research on the application of LIBS to intact soil cores, we hypothesized that ultraviolet (UV) spectrum LIBS (200-300 nm) might not provide sufficient elemental information to reliably discriminate between soil organic C (SOC) and inorganic C (IC). In this study, using a custom complete spectrum (245-925 nm) core-scanning LIBS instrument, we analyzed 60 intact soil cores from six wheat fields. Predictive multi-response partial least squares (PLS2) models using full and reduced spectrum LIBS were compared for directly determining soil total C (TC), IC, and SOC. Two regression shrinkage and variable selection approaches, the least absolute shrinkage and selection operator (LASSO) and sparse multivariate regression with covariance estimation (MRCE), were tested for soil C predictions and the identification of wavelengths important for soil C prediction. Using complete spectrum LIBS for PLS2 modeling reduced the calibration standard error of prediction (SEP) 15 and 19% for TC and IC, respectively, compared to UV spectrum LIBS. The LASSO and MRCE approaches provided significantly improved calibration accuracy and reduced SEP 32-55% over UV spectrum PLS2 models. We conclude that (1) complete spectrum LIBS is superior to UV spectrum LIBS for predicting soil C for intact soil cores without pretreatment; (2) LASSO and MRCE approaches provide improved calibration prediction accuracy over PLS2 but require additional testing with increased soil and target analyte diversity; and (3) measurement errors associated with analyzing intact cores (e.g., sample density and surface roughness) require further study and quantification.
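The LASSO used above for shrinkage and variable (wavelength) selection can be illustrated with plain cyclic coordinate descent; this generic sketch on random data with a fixed penalty is not the paper's LASSO/MRCE pipeline.

```python
import numpy as np

def lasso_cd(X, y, lam, iters=500):
    """LASSO by cyclic coordinate descent: minimize
        (1/2n) * ||y - X b||^2 + lam * ||b||_1.
    Columns are standardized internally, so the coordinate update is a
    single soft-threshold; returned coefficients are on the
    standardized-predictor scale."""
    n, p = X.shape
    Xs = (X - X.mean(0)) / X.std(0)
    yc = y - y.mean()
    b = np.zeros(p)
    for _ in range(iters):
        for j in range(p):
            r = yc - Xs @ b + Xs[:, j] * b[j]        # partial residual
            rho = Xs[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0)   # soft-threshold
    return b
```

Irrelevant predictors are driven exactly to zero, which is the selection behavior that made LASSO attractive for picking informative wavelengths.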
Anna Luisa de Brito, Pacheco; Isabel Cristina, Olegário; Clarissa Calil, Bonifácio; Ana Flávia Bissoto, Calvo; José Carlos Pettorossi, Imparato; Daniela Prócida, Raggio
2017-11-06
Good survival rates for single-surface Atraumatic Restorative Treatment (ART) restorations have been reported, while multi-surface ART restorations have not shown similar results. The aim of this study was to evaluate the survival rate of occluso-proximal ART restorations using two different filling materials: Ketac Molar EasyMix (3M ESPE) and Vitro Molar (DFL). A total of 117 primary molars with occluso-proximal caries lesions were selected in 4- to 8-year-old children in Barueri city, Brazil. Only one tooth was selected per child. The subjects were randomly allocated to two groups according to the filling material. All treatments were performed following the ART premises, and all restorations were evaluated after 2, 6 and 12 months. Restoration survival was evaluated using Kaplan-Meier survival analysis and the log-rank test, while Cox regression analysis was used to test associations with clinical factors (α = 5%). There was no difference in survival rate between the materials tested (HR = 1.60, CI = 0.98-2.62, p = 0.058). The overall survival rate of restorations was 42.74%; per group, the survival rate was 50.8% for Ketac Molar and 34.5% for Vitro Molar. Cox regression showed no association between the analyzed clinical variables and the success of the restorations. After 12 months of evaluation, no difference in the survival rate of ART occluso-proximal restorations was found between the tested materials.
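The Kaplan-Meier estimator used in such survival analyses can be illustrated with a minimal sketch; the follow-up times and failure indicators below are hypothetical, not the study's restoration data:

```python
# Kaplan-Meier survival curve from (time, event) pairs -- an illustrative
# sketch of the analysis type reported, with made-up follow-up data.

def kaplan_meier(times, events):
    """Return [(time, survival)] at each observed failure time.
    events[i] is 1 for a restoration failure, 0 for a censored case."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, curve = 1.0, []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for tt, e in data if tt == t and e == 1)
        if deaths:
            surv *= (1 - deaths / n_at_risk)
            curve.append((t, surv))
        ties = sum(1 for tt, _ in data if tt == t)
        n_at_risk -= ties
        i += ties
    return curve

# Hypothetical follow-up (months); 1 = failed restoration, 0 = censored
times  = [2, 2, 6, 6, 12, 12, 12, 12]
events = [1, 0, 1, 1, 0, 0, 0, 1]
curve = kaplan_meier(times, events)
```

Censored cases leave the risk set without counting as failures, which is why the product-limit estimate differs from a naive failure fraction.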
ERIC Educational Resources Information Center
Maholchic-Nelson, Suzy
2010-01-01
This correlational study tested the efficacy of the social-ecological theory (Moos, 1979) by employing the University Residential Environmental Scale and multiple regression analysis to examine the influences of personal attributes (SAT, parents' level of education, race/ethnicity, and high school drinking) and environmental factors (high/low…
ERIC Educational Resources Information Center
Floyd, Randy G.; Evans, Jeffrey J.; McGrew, Kevin S.
2003-01-01
Cognitive clusters from the Woodcock-Johnson III (WJ III) Tests of Cognitive Abilities that measure select Cattell-Horn-Carroll broad and narrow cognitive abilities were shown to be significantly related to mathematics achievement in a large, nationally representative sample of children and adolescents. Multiple regression analyses were used to…
ERIC Educational Resources Information Center
Joyce, Beverly A.; Farenga, Stephen J.
1999-01-01
Examines specific science-related attitudes, informal science-related experiences, future interest in science, and gender of young high-ability students (n=111) who completed the Test of Science Related Attitudes (TOSRA), the Science Experience Survey (SES), and the Course Selection Sheet (CSS). Develops two regression models to predict the number…
Genetic prediction of type 2 diabetes using deep neural network.
Kim, J; Kim, J; Kwak, M J; Bajaj, M
2018-04-01
Type 2 diabetes (T2DM) has strong heritability, but genetic models to explain that heritability have been challenging. We tested a deep neural network (DNN) to predict T2DM using the nested case-control studies of the Nurses' Health Study (3326 females, 45.6% T2DM) and the Health Professionals Follow-up Study (2502 males, 46.5% T2DM). We selected 96, 214, 399, and 678 single-nucleotide polymorphisms (SNPs) through Fisher's exact test and L1-penalized logistic regression. We split each dataset randomly 4:1 to train prediction models and test their performance. DNN and logistic regressions showed better area under the curve (AUC) of ROC curves than the clinical model when 399 or more SNPs were included. DNN was superior to logistic regression in AUC with 399 or more SNPs in males and 678 SNPs in females. Addition of clinical factors consistently increased the AUC of the DNN but failed to improve logistic regressions with 214 or more SNPs. In conclusion, we show that DNN can be a versatile tool to predict T2DM incorporating large numbers of SNPs and clinical information. Limitations include a relatively small number of subjects, mostly of European ethnicity. Further studies are warranted to confirm and improve the performance of genetic prediction models using DNN in different ethnic groups. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
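The AUC comparisons above rest on the Mann-Whitney interpretation of ROC area: the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. A minimal sketch with invented scores:

```python
# ROC AUC computed as the probability that a random case scores above a
# random control (ties count half). Scores below are invented.

def auc(scores_pos, scores_neg):
    wins = ties = 0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1
            elif p == n:
                ties += 1
    return (wins + 0.5 * ties) / (len(scores_pos) * len(scores_neg))

cases    = [0.9, 0.8, 0.7, 0.55]   # predicted T2DM risk for cases
controls = [0.6, 0.4, 0.3, 0.2]    # predicted risk for controls
print(auc(cases, controls))        # -> 0.9375
```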
Comparison of random regression test-day models for Polish Black and White cattle.
Strabel, T; Szyda, J; Ptak, E; Jamrozik, J
2005-10-01
Test-day milk yields of first-lactation Black and White cows were used to select the model for routine genetic evaluation of dairy cattle in Poland. The population of Polish Black and White cows is characterized by small herd size, low level of production, and relatively early peak of lactation. Several random regression models for first-lactation milk yield were initially compared using the "percentage of squared bias" criterion and the correlations between true and predicted breeding values. Models with random herd-test-date effects, fixed age-season and herd-year curves, and random additive genetic and permanent environmental curves (Legendre polynomials of different orders were used for all regressions) were chosen for further studies. Additional comparisons included analyses of the residuals and shapes of variance curves in days in milk. The low production level and early peak of lactation of the breed required the use of Legendre polynomials of order 5 to describe age-season lactation curves. For the other curves, Legendre polynomials of order 3 satisfactorily described daily milk yield variation. Fitting third-order polynomials for the permanent environmental effect made it possible to adequately account for heterogeneous residual variance at different stages of lactation.
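The Legendre-polynomial curves used in such random regression models can be sketched directly from the three-term recurrence; the mapping of days in milk onto [-1, 1] and the curve coefficients below are illustrative assumptions, not the Polish evaluation model:

```python
# Legendre polynomials on [-1, 1] via the three-term recurrence, as used
# for fixed and random lactation curves (a generic sketch, not the
# authors' evaluation software).

def legendre(order, x):
    """Return [P_0(x), ..., P_order(x)]."""
    p = [1.0, x]
    for n in range(1, order):
        # (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
        p.append(((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1))
    return p[:order + 1]

def dim_to_x(dim, start=5, end=305):
    """Map days in milk onto the [-1, 1] domain of the polynomials."""
    return 2.0 * (dim - start) / (end - start) - 1.0

# A daily-yield curve is a weighted sum of the polynomial values:
coeffs = [20.0, -2.0, -1.5, 0.5]     # hypothetical curve coefficients
x = dim_to_x(60)
yield_60 = sum(c * p for c, p in zip(coeffs, legendre(3, x)))
```

In the evaluation model each animal carries its own random coefficient vector for the genetic and permanent environmental curves; the fixed age-season curve simply uses a higher polynomial order.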
Gregory, Simon; Patterson, Fiona; Baron, Helen; Knight, Alec; Walsh, Kieran; Irish, Bill; Thomas, Sally
2016-10-01
Increasing pressure is being placed on external accountability and cost efficiency in medical education and training internationally. We present an illustrative data analysis of the value-added of postgraduate medical education. We analysed historical selection (entry) and licensure (exit) examination results for trainees sitting the UK Membership of the Royal College of General Practitioners (MRCGP) licensing examination (N = 2291). Selection data comprised: a clinical problem solving test (CPST); a situational judgement test (SJT); and a selection centre (SC). Exit data was an applied knowledge test (AKT) from MRCGP. Ordinary least squares (OLS) regression analyses were used to model differences in attainment in the AKT based on performance at selection (the value-added score). Results were aggregated to the regional level for comparisons. We discovered significant differences in the value-added score between regional training providers. Whilst three training providers confer significant value-added, one training provider was significantly lower than would be predicted based on the attainment of trainees at selection. Value-added analysis in postgraduate medical education potentially offers useful information, although the methodology is complex, controversial, and has significant limitations. Developing models further could offer important insights to support continuous improvement in medical education in future.
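A bare-bones version of a value-added calculation is ordinary least squares of exit scores on entry scores, followed by averaging the residuals by training provider; the scores and provider labels below are invented, and the real analysis used several selection predictors rather than one:

```python
# Value-added sketch: regress exit (AKT-style) scores on an entry
# (selection) score, then average residuals by provider. Toy data only.

def ols(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b            # intercept, slope

entry = [50, 55, 60, 65, 70, 75]
exit_ = [52, 60, 61, 70, 69, 80]
provider = ["A", "A", "B", "B", "C", "C"]

a, b = ols(entry, exit_)
resid = [yi - (a + b * xi) for xi, yi in zip(entry, exit_)]

# Mean residual per provider = crude value-added score
value_added = {}
for p, r in zip(provider, resid):
    value_added.setdefault(p, []).append(r)
value_added = {p: sum(rs) / len(rs) for p, rs in value_added.items()}
```

A positive mean residual means trainees outperformed what their selection scores predicted; a negative one corresponds to the under-predicted provider described in the abstract.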
Reporting quality of multivariable logistic regression in selected Indian medical journals.
Kumar, R; Indrayan, A; Chhabra, P
2012-01-01
Use of multivariable logistic regression (MLR) modeling has steeply increased in the medical literature over the past few years. Testing of model assumptions and adequate reporting of MLR allow the reader to interpret results more accurately. To review the fulfillment of assumptions and the reporting quality of MLR in selected Indian medical journals using established criteria. Analysis of published literature. Medknow.com publishes 68 Indian medical journals with open access. Eight of these journals had at least five articles using MLR between the years 1994 and 2008. Articles from each of these journals were evaluated according to previously established 10-point quality criteria for reporting and for testing the MLR model assumptions. Statistical analysis used SPSS 17 software and non-parametric tests (Kruskal-Wallis H, Mann-Whitney U, Spearman correlation). One hundred and nine articles were found to use MLR for analyzing data in the selected eight journals. The number of such articles gradually increased after the year 2003, but the quality score remained almost unchanged over time. The P value, odds ratio, and 95% confidence interval for coefficients in MLR were reported in 75.2% of the articles, and sufficient cases (>10) per covariate, to limit sample-size problems, were reported in 58.7%. No article reported the test for conformity of linear gradient for continuous covariates. The total score was not significantly different across the journals. However, involvement of a statistician or epidemiologist as a co-author improved the average quality score significantly (P=0.014). Reporting of MLR in many Indian journals is incomplete. Only one of the 109 articles under review managed to score 8 out of 10; all others scored less. Appropriate guidelines in instructions to authors, and pre-publication review of articles using MLR by a qualified statistician, may improve the quality of reporting.
Variable Selection for Nonparametric Quantile Regression via Smoothing Spline ANOVA
Lin, Chen-Yen; Bondell, Howard; Zhang, Hao Helen; Zou, Hui
2014-01-01
Quantile regression provides a more thorough view of the effect of covariates on a response. Nonparametric quantile regression has become a viable alternative to avoid restrictive parametric assumptions. The problem of variable selection for quantile regression is challenging, since important variables can influence various quantiles in different ways. We tackle the problem via regularization in the context of smoothing spline ANOVA models. The proposed sparse nonparametric quantile regression (SNQR) can identify important variables and provide flexible estimates for quantiles. Our numerical study suggests the promising performance of the new procedure in variable selection and function estimation. Supplementary materials for this article are available online. PMID:24554792
Cao, Qingqing; Wu, Zhenqiang; Sun, Ying; Wang, Tiezhu; Han, Tengwei; Gu, Chaomei; Sun, Yehuan
2011-11-01
To explore the application of negative binomial regression and modified Poisson regression in analyzing influential factors for injury frequency and the risk factors associated with increased injury frequency. 2917 primary and secondary school students were selected from Hefei by cluster random sampling and surveyed by questionnaire. The count data on injury events were used to fit modified Poisson regression and negative binomial regression models, and the risk factors associated with increased unintentional injury frequency among juvenile students were explored, so as to probe the efficiency of these two models in studying influential factors for injury frequency. The Poisson model showed over-dispersion (P < 0.0001) on the Lagrange multiplier test; the over-dispersed data were therefore better fitted by the modified Poisson regression and negative binomial regression models. Both showed that male gender, younger age, a father working outside the hometown, a guardian educated above junior high school level, and smoking were associated with higher injury frequencies. For clustered frequency data on injury events, both modified Poisson regression and negative binomial regression can be used; however, for our data the modified Poisson regression fitted better and gave a more accurate interpretation of the relevant factors affecting injury frequency.
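The overdispersion that rules out a plain Poisson model can be checked informally by comparing the sample variance to the mean; the injury counts below are invented, and this variance-to-mean ratio is a rough diagnostic, not the Lagrange multiplier test the authors used:

```python
# Quick overdispersion check: for a Poisson outcome the variance should
# roughly equal the mean; a ratio well above 1 motivates the negative
# binomial (or a robust/modified Poisson) model. Counts are invented.

def dispersion_ratio(counts):
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)  # sample variance
    return var / mean

injuries = [0, 0, 0, 1, 0, 2, 0, 5, 1, 0, 3, 0]  # injuries per student
ratio = dispersion_ratio(injuries)
```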
NASA Astrophysics Data System (ADS)
Guan, Yafu; Yang, Shuo; Zhang, Dong H.
2018-04-01
Gaussian process regression (GPR) is an efficient non-parametric method for constructing multi-dimensional potential energy surfaces (PESs) for polyatomic molecules. Since not only the posterior mean but also the posterior variance can be easily calculated, GPR provides a well-established model for active learning, through which PESs can be constructed more efficiently and accurately. We propose a strategy of active data selection for the construction of PESs with emphasis on low-energy regions. The validity of this strategy is verified through a three-dimensional (3D) example of H3. The PESs for two prototypical reactive systems, the H + H2O ↔ H2 + OH reaction and the H + CH4 ↔ H2 + CH3 reaction, are then reconstructed; only 920 and 4000 points, respectively, are needed. The accuracy of the GP PESs is not only tested by energy errors but also validated by quantum scattering calculations.
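The role of the posterior variance in active data selection shows up already in a one-dimensional GPR sketch with an RBF kernel: the variance collapses near existing training points and stays high elsewhere, flagging where new ab initio points would be most informative. This is a generic sketch, not the authors' PES code:

```python
# 1D Gaussian process regression with an RBF kernel; the posterior
# variance drives active learning. Training data are a toy sine curve.
import numpy as np

def rbf(a, b, length=1.0):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

X = np.array([-2.0, 0.0, 2.0])           # training inputs
y = np.sin(X)                            # training "energies" (toy)
Xs = np.array([0.0, 1.0])                # query points

K = rbf(X, X) + 1e-8 * np.eye(len(X))    # jitter for numerical stability
Ks = rbf(Xs, X)
alpha = np.linalg.solve(K, y)
mean = Ks @ alpha                                              # posterior mean
var = 1.0 - np.einsum('ij,ij->i', Ks @ np.linalg.inv(K), Ks)   # posterior variance
```

An active-learning loop would repeatedly add the candidate geometry with the largest posterior variance (possibly weighted toward low predicted energy, per the abstract) and refit.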
NASA Astrophysics Data System (ADS)
Khazaei, Ardeshir; Sarmasti, Negin; Seyf, Jaber Yousefi
2016-03-01
Quantitative structure-activity relationships (QSAR) were used to study a series of curcumin-related compounds with inhibitory effects on prostate cancer PC-3 cells, pancreas cancer Panc-1 cells, and colon cancer HT-29 cells. The sphere exclusion method was used to split the data into training and test sets. Multiple linear regression, principal component regression and partial least squares were used as the regression methods. To investigate the effect of feature selection, stepwise selection, a genetic algorithm, and simulated annealing were used. In two cases (PC-3 cells and Panc-1 cells), the best models were generated by a combination of multiple linear regression and stepwise selection (PC-3 cells: r2 = 0.86, q2 = 0.82, pred_r2 = 0.93, and r2m (test) = 0.43; Panc-1 cells: r2 = 0.85, q2 = 0.80, pred_r2 = 0.71, and r2m (test) = 0.68). For the HT-29 cells, principal component regression with stepwise selection (r2 = 0.69, q2 = 0.62, pred_r2 = 0.54, and r2m (test) = 0.41) was the best method. The QSAR study reveals descriptors that play a crucial role in the inhibitory properties of curcumin-like compounds. 6ChainCount, T_C_C_1, and T_O_O_7 are the most important descriptors, with the greatest effect. To design and optimize novel, efficient curcumin-related compounds, it is useful to introduce heteroatoms such as nitrogen, oxygen, and sulfur into the chemical structure (reducing the contribution of the T_C_C_1 descriptor) and to increase the contributions of the 6ChainCount and T_O_O_7 descriptors. The models can be useful in the better design of novel curcumin-related compounds for the treatment of prostate, pancreas, and colon cancers.
Kwan, Johnny S H; Kung, Annie W C; Sham, Pak C
2011-09-01
Selective genotyping can increase power in quantitative trait association. One example of selective genotyping is two-tail extreme selection, but simple linear regression analysis gives a biased genetic effect estimate. Here, we present a simple correction for the bias.
Estimating effects of limiting factors with regression quantiles
Cade, B.S.; Terrell, J.W.; Schroeder, R.L.
1999-01-01
In a recent Concepts paper in Ecology, Thomson et al. emphasized that assumptions of conventional correlation and regression analyses fundamentally conflict with the ecological concept of limiting factors, and they called for new statistical procedures to address this problem. The analytical issue is that unmeasured factors may be the active limiting constraint and may induce a pattern of unequal variation in the biological response variable through an interaction with the measured factors. Consequently, changes near the maxima, rather than at the center of response distributions, are better estimates of the effects expected when the observed factor is the active limiting constraint. Regression quantiles provide estimates for linear models fit to any part of a response distribution, including near the upper bounds, and require minimal assumptions about the form of the error distribution. Regression quantiles extend the concept of one-sample quantiles to the linear model by solving an optimization problem of minimizing an asymmetric function of absolute errors. Rank-score tests for regression quantiles provide tests of hypotheses and confidence intervals for parameters in linear models with heteroscedastic errors, conditions likely to occur in models of limiting ecological relations. We used selected regression quantiles (e.g., 5th, 10th, ..., 95th) and confidence intervals to test hypotheses that parameters equal zero for estimated changes in average annual acorn biomass due to forest canopy cover of oak (Quercus spp.) and oak species diversity. Regression quantiles also were used to estimate changes in glacier lily (Erythronium grandiflorum) seedling numbers as a function of lily flower numbers, rockiness, and pocket gopher (Thomomys talpoides fossor) activity, data that motivated the query by Thomson et al. for new statistical procedures. 
Both example applications showed that effects of limiting factors estimated by changes in some upper regression quantile (e.g., 90-95th) were greater than if effects were estimated by changes in the means from standard linear model procedures. Estimating a range of regression quantiles (e.g., 5-95th) provides a comprehensive description of biological response patterns for exploratory and inferential analyses in observational studies of limiting factors, especially when sampling large spatial and temporal scales.
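The asymmetric absolute-error ("check") loss that defines regression quantiles can be demonstrated in the one-parameter case, where minimizing it recovers the sample quantile; the data and brute-force grid search below are purely illustrative (real fits solve a linear program):

```python
# The check function minimized by regression quantiles: errors above the
# fit are weighted tau, errors below are weighted 1 - tau.

def check_loss(tau, residual):
    return tau * residual if residual >= 0 else (tau - 1) * residual

def quantile_fit(tau, y, grid):
    """Brute-force one-parameter fit: pick the grid value minimizing loss."""
    return min(grid, key=lambda q: sum(check_loss(tau, yi - q) for yi in y))

y = [1.0, 2.0, 3.0, 4.0, 10.0]
grid = [v / 10 for v in range(0, 120)]
q90 = quantile_fit(0.9, y, grid)   # tracks the upper edge of the data
q50 = quantile_fit(0.5, y, grid)   # tau = 0.5 recovers the median
```

This asymmetry is exactly why upper quantiles (e.g., the 90th-95th) track the bound imposed by an active limiting factor while the mean is dragged down by unconstrained observations.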
Chung, Seungjoon; Seo, Chang Duck; Choi, Jae-Hoon; Chung, Jinwook
2014-01-01
Membrane distillation (MD) is an emerging desalination technology and an energy-saving alternative to conventional distillation and reverse osmosis. The selection of an appropriate membrane is a prerequisite for the design of an optimized MD process. We proposed a simple approximation method to evaluate the performance of membranes for the MD process. Three hollow fibre-type commercial membranes with different thicknesses and pore sizes were tested. Experimental results showed that one membrane was advantageous because it had the highest flux, whereas another had the lowest feed temperature drop. Regression analyses and multi-stage calculations were used to account for the trade-off between flux and feed temperature drop. The most desirable membrane was selected from the tested membranes in terms of mean flux in a multi-stage process. This method would be useful for selecting membranes without complicated simulation techniques.
William L. Gaines; Andrea L. Lyons; John F. Lehmkuhl; Kenneth J. Raedeke
2005-01-01
We used logistic regression to derive scaled resource selection functions (RSFs) for female black bears at two study areas in the North Cascades Mountains. We tested the hypothesis that the influence of roads would result in potential habitat effectiveness (RSFs without the influence of roads) being greater than realized habitat effectiveness (RSFs with roads). Roads...
ERIC Educational Resources Information Center
Drewery, David; Nevison, Colleen; Pretti, T. Judene; Cormier, Lauren; Barclay, Sage; Pennaforte, Antoine
2016-01-01
This study discusses and tests a conceptual model of co-op work-term quality from a student perspective. Drawing from an earlier exploration of co-op students' perceptions of work-term quality, variables related to role characteristics, interpersonal dynamics, and organizational elements were used in a multiple linear regression analysis to…
Tu, Shin-Ping; Li, Lin; Tsai, Jenny Hsin-Chun; Yip, Mei-Po; Terasaki, Genji; Teh, Chong; Yasui, Yutaka; Hislop, T Gregory; Taylor, Vicky
2013-01-01
Background The Western Pacific region has the highest level of endemic hepatitis B virus (HBV) infection in the world, with the Chinese representing nearly one-third of infected persons globally. HBV carriers are potentially infectious to others and have an increased risk of chronic active hepatitis, cirrhosis, and hepatocellular carcinoma. Studies from the U.S. and Canada demonstrate that immigrants, particularly from Asia, are disproportionately affected by liver cancer. Purpose Given the different health care systems in Seattle and Vancouver, two geographically proximate cities, we examined HBV testing levels and factors associated with testing among Chinese residents of these cities. Methods We surveyed Chinese living in areas of Seattle and Vancouver with relatively high proportions of Chinese residents. In-person interviews were conducted in Cantonese, Mandarin, or English. Our bivariate analyses consisted of the chi-square test, with Fisher’s Exact test as necessary. We then performed unconditional logistic regression, first examining only the city effect as the sole explanatory variable of the model, then assessing the adjusted city effect in a final main-effects model that was constructed through backward selection to select statistically significant variables at alpha = 0.05. Results Survey cooperation rates for Seattle and Vancouver were 58% and 59%, respectively. In Seattle, 48% reported HBV testing, whereas in Vancouver, 55% reported testing. HBV testing in Seattle was lower than in Vancouver, with a crude odds ratio of 0.73 (95% CI = 0.56, 0.94). However, after adjusting for demographic, health care access, knowledge, and social support variables, we found no significant differences in HBV testing between the two cities. In our logistic regression model, the odds of HBV testing were greatest when the doctor recommended the test, followed by when the employer asked for the test.
Discussion Findings from this study support the need for additional research to examine the effectiveness of clinic-based and workplace interventions to promote HBV testing among immigrants to North America. PMID:19640196
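The crude odds ratio reported above comes from a 2x2 table; below is a sketch of the standard Woolf logit-based confidence interval, with invented cell counts chosen only to roughly echo the reported testing percentages, not the study's actual sample:

```python
# Crude odds ratio and 95% CI from a 2x2 table (city x tested).
# Cell counts are hypothetical, not the survey data.
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """a/b: tested/untested in group 1; c/d: tested/untested in group 2."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of log OR (Woolf)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# Hypothetical counts: Seattle 48% tested (240/500), Vancouver 55% (275/500)
or_, lo, hi = odds_ratio_ci(240, 260, 275, 225)
```

A CI entirely below 1 marks the crude difference as significant; adjustment in the logistic model is what removed that difference in the study.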
NASA Astrophysics Data System (ADS)
Kim, Saejoon
2018-01-01
We consider the problem of low-volatility portfolio selection, which has been the subject of extensive research in the field of portfolio selection. To improve currently existing techniques that rely purely on past information to select low-volatility portfolios, this paper investigates the use of time series regression techniques that forecast future volatility to select the portfolios. In particular, for the first time, the utility of support vector regression and its enhancements as portfolio selection techniques is demonstrated. It is shown that our regression-based portfolio selection provides attractive outperformance compared to the benchmark index and to the portfolio defined by a well-known strategy, on data sets of the S&P 500 and the KOSPI 200.
A survey of variable selection methods in two Chinese epidemiology journals
2010-01-01
Background Although much has been written on developing better procedures for variable selection, there is little research on how it is practiced in actual studies. This review surveys the variable selection methods reported in two high-ranking Chinese epidemiology journals. Methods Articles published in 2004, 2006, and 2008 in the Chinese Journal of Epidemiology and the Chinese Journal of Preventive Medicine were reviewed. Five categories of methods were identified whereby variables were selected using: A - bivariate analyses; B - multivariable analysis; e.g. stepwise or individual significance testing of model coefficients; C - first bivariate analyses, followed by multivariable analysis; D - bivariate analyses or multivariable analysis; and E - other criteria like prior knowledge or personal judgment. Results Among the 287 articles that reported using variable selection methods, 6%, 26%, 30%, 21%, and 17% were in categories A through E, respectively. One hundred sixty-three studies selected variables using bivariate analyses, 80% (130/163) via multiple significance testing at the 5% alpha-level. Of the 219 multivariable analyses, 97 (44%) used stepwise procedures, 89 (41%) tested individual regression coefficients, but 33 (15%) did not mention how variables were selected. Sixty percent (58/97) of the stepwise routines also did not specify the algorithm and/or significance levels. Conclusions The variable selection methods reported in the two journals were limited in variety, and details were often missing. Many studies still relied on problematic techniques like stepwise procedures and/or multiple testing of bivariate associations at the 0.05 alpha-level. These deficiencies should be rectified to safeguard the scientific validity of articles published in Chinese epidemiology journals. PMID:20920252
An updated Italian normative dataset for the Stroop color word test (SCWT).
Brugnolo, A; De Carli, F; Accardo, J; Amore, M; Bosia, L E; Bruzzaniti, C; Cappa, S F; Cocito, L; Colazzo, G; Ferrara, M; Ghio, L; Magi, E; Mancardi, G L; Nobili, F; Pardini, M; Rissotto, R; Serrati, C; Girtler, N
2016-03-01
The Stroop color and word test (SCWT) is widely used to evaluate attention, information processing speed, selective attention, and cognitive flexibility. Normative values for the Italian population are available only for selected age groups, or for the short version of the test. The aim of this study was to provide updated normal values for the full version, balancing groups across gender, age decades, and education. Two kinds of indexes were derived from the performance of 192 normal subjects, divided by decade (from 20 to 90) and level of education (4 levels: 3-5; 6-8; 9-13; >13 years). They were (i) the correct answers achieved for each table in the first 30 s (word items, WI; color items, CI; color word items, CWI) and (ii) the total time required for reading the three tables (word time, WT; color time, CT; color word time, CWT). For each index, the regression model was evaluated using age, education, and gender as independent variables. The normative data were then computed following the equivalent scores method. In the regression model, age and education significantly influenced the performance in each of the 6 indexes, whereas gender had no significant effect. This study confirms the effect of age and education on the main indexes of the Stroop test and provides updated normative data for an Italian healthy population, well balanced across age, education, and gender. It will be useful to Italian researchers studying attentional functions in health and disease.
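Regression-based norming of this kind typically adjusts a raw score toward a reference age and education level before equivalent scores are assigned; the coefficients and reference values below are invented placeholders, not the published Italian norms:

```python
# Regression-based norming sketch: remove the expected effect of age and
# education from a raw score before comparing it to norms. Coefficients
# and reference values are invented, not the published normative data.

def adjusted_score(raw, age, edu,
                   b_age=-0.20, b_edu=0.65, ref_age=55, ref_edu=10):
    """Shift the raw score to the reference age and education level."""
    expected_shift = b_age * (age - ref_age) + b_edu * (edu - ref_edu)
    return raw - expected_shift

# A 75-year-old with 5 years of schooling scoring 40 word items in 30 s
# gets credited for being older and less educated than the reference:
score = adjusted_score(40, age=75, edu=5)
```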
Optimizing data collection for public health decisions: a data mining approach
2014-01-01
Background Collecting data can be cumbersome and expensive. Lack of relevant, accurate and timely data for research to inform policy may negatively impact public health. The aim of this study was to test if the careful removal of items from two community nutrition surveys guided by a data mining technique called feature selection, can (a) identify a reduced dataset, while (b) not damaging the signal inside that data. Methods The Nutrition Environment Measures Surveys for stores (NEMS-S) and restaurants (NEMS-R) were completed on 885 retail food outlets in two counties in West Virginia between May and November of 2011. A reduced dataset was identified for each outlet type using feature selection. Coefficients from linear regression modeling were used to weight items in the reduced datasets. Weighted item values were summed with the error term to compute reduced item survey scores. Scores produced by the full survey were compared to the reduced item scores using a Wilcoxon rank-sum test. Results Feature selection identified 9 store and 16 restaurant survey items as significant predictors of the score produced from the full survey. The linear regression models built from the reduced feature sets had R2 values of 92% and 94% for restaurant and grocery store data, respectively. Conclusions While there are many potentially important variables in any domain, the most useful set may only be a small subset. The use of feature selection in the initial phase of data collection to identify the most influential variables may be a useful tool to greatly reduce the amount of data needed thereby reducing cost. PMID:24919484
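The reduced-survey idea, selecting influential items, re-weighting them by regression, and comparing against the full score, can be sketched as follows; the data are simulated, and ranking items by absolute correlation is a stand-in for the feature selector the authors used:

```python
# Reduced-survey sketch: keep the k items most related to the full score,
# refit weights by least squares, and check how much signal survives.
import numpy as np

rng = np.random.default_rng(0)
items = rng.normal(size=(60, 10))       # 60 outlets x 10 survey items
true_w = np.array([3, 2, 0, 0, 1.5, 0, 0, 0, 0, 0.0])
full_score = items @ true_w + rng.normal(scale=0.1, size=60)

# Feature selection proxy: rank items by |correlation| with the full score
corr = [abs(np.corrcoef(items[:, j], full_score)[0, 1]) for j in range(10)]
keep = np.argsort(corr)[-3:]            # reduced item set (k = 3)

# Re-weight the reduced item set by linear regression
X = items[:, keep]
w, *_ = np.linalg.lstsq(X, full_score, rcond=None)
reduced_score = X @ w

# Fraction of full-score variance the reduced survey reproduces
r2 = 1 - np.sum((full_score - reduced_score) ** 2) / \
        np.sum((full_score - full_score.mean()) ** 2)
```

As in the study, a small subset of items can reproduce most of the variance of the full instrument when the remaining items carry little independent signal.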
Semisupervised Clustering by Iterative Partition and Regression with Neuroscience Applications
Qian, Guoqi; Wu, Yuehua; Ferrari, Davide; Qiao, Puxue; Hollande, Frédéric
2016-01-01
Regression clustering is a statistical learning and data mining method, mixing unsupervised and supervised learning, that is found in a wide range of applications including artificial intelligence and neuroscience. It performs unsupervised learning when it clusters the data according to their respective unobserved regression hyperplanes. The method also performs supervised learning when it fits regression hyperplanes to the corresponding data clusters. Applying regression clustering in practice requires means of determining the underlying number of clusters in the data, finding the cluster label of each data point, and estimating the regression coefficients of the model. In this paper, we review the estimation and selection issues in regression clustering with regard to the least squares and robust statistical methods. We also provide a model-selection-based technique to determine the number of regression clusters underlying the data. We further develop a computing procedure for regression clustering estimation and selection. Finally, simulation studies are presented for assessing the procedure, together with an analysis of a real data set on RGB cell marking in neuroscience to illustrate and interpret the method. PMID:27212939
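The iterative partition-and-regression idea can be sketched in a few lines: alternately assign each point to the regression hyperplane with the smallest residual, then refit each hyperplane to its cluster. The sketch below is a generic two-line illustration on synthetic data with k fixed at 2; it leaves out the paper's model-selection step for choosing k and its robust variants.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two hidden regression lines: y = 2x + 1 and y = -x + 4, with small noise.
x = rng.uniform(-3, 3, size=200)
z = rng.integers(0, 2, size=200)               # true (unobserved) labels
y = np.where(z == 0, 2 * x + 1, -x + 4) + rng.normal(0, 0.1, size=200)
X = np.column_stack([x, np.ones_like(x)])

# Iterative partition-and-regression with k = 2 clusters.
coef = np.array([[1.0, 0.0], [0.0, 2.0]])      # rough initial [slope, intercept] rows
for _ in range(50):
    resid = np.abs(y[:, None] - X @ coef.T)    # residual of each point to each line
    labels = resid.argmin(axis=1)              # partition step
    for k in range(2):                         # regression step
        mask = labels == k
        if mask.any():
            coef[k], *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)

fitted_slopes = sorted(coef[:, 0])
print(fitted_slopes)  # close to [-1, 2] if the iteration converged
```

As with k-means, convergence to the true hyperplanes depends on the initialization; the paper's estimation and selection procedure addresses exactly these practical issues.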
Retrieval and Mapping of Heavy Metal Concentration in Soil Using Time Series Landsat 8 Imagery
NASA Astrophysics Data System (ADS)
Fang, Y.; Xu, L.; Peng, J.; Wang, H.; Wong, A.; Clausi, D. A.
2018-04-01
Heavy metal pollution is a critical global environmental problem and a long-standing concern. The traditional approach to obtaining heavy metal concentrations, relying on field sampling and lab testing, is expensive and time consuming. Although many related studies use spectrometer data to build a relational model between heavy metal concentration and spectral information, and then use the model to perform prediction from hyperspectral imagery, this approach can hardly map the soil metal concentration of an area quickly and accurately because of the discrepancies between spectrometer data and remote sensing imagery. Taking advantage of the easy accessibility of Landsat 8 data, this study utilizes Landsat 8 imagery to retrieve soil Cu concentration and map its distribution in the study area. To enlarge the spectral information for more accurate retrieval and mapping, 11 single-date Landsat 8 images from 2013-2017 are selected to form a time series. Three regression methods, partial least squares regression (PLSR), artificial neural network (ANN) and support vector regression (SVR), are used for model construction. By comparing these models unbiasedly, the best model is selected to map the Cu concentration distribution. The produced distribution map shows good spatial autocorrelation and consistency with the mining area locations.
Schorer, Jörg; Rienhoff, Rebecca; Fischer, Lennart; Baker, Joseph
2017-01-01
In most sports, the development of elite athletes is a long-term process of talent identification and support. Typically, talent selection systems administer a multi-faceted strategy including national coach observations and varying physical and psychological tests when deciding who is chosen for talent development. The aim of this exploratory study was to evaluate the prognostic validity of talent selections by varying groups 10 years after they had been conducted. This study used a unique, multi-phased approach. Phase 1 involved players (n = 68) in 2001 completing a battery of general and sport-specific tests of handball ‘talent’ and performance. In Phase 2, national and regional coaches (n = 7) in 2001 who attended training camps identified the most talented players. In Phase 3, current novice and advanced handball players (n = 12 in each group) selected the most talented from short videos of matches played during the talent camp. Analyses compared predictions among all groups with a best model-fit derived from the motor tests. Results revealed little difference between regional and national coaches in the prediction of future performance and little difference in forecasting performance between novices and players. The best model-fit regression by the motor-tests outperformed all predictions. While several limitations are discussed, this study is a useful starting point for future investigations considering athlete selection decisions in talent identification in sport. PMID:28744238
NASA Astrophysics Data System (ADS)
Kneringer, Philipp; Dietz, Sebastian; Mayr, Georg J.; Zeileis, Achim
2017-04-01
Low-visibility conditions have a large impact on aviation safety and the economic efficiency of airports and airlines. To support decision makers, we develop a statistical probabilistic nowcasting tool for the occurrence of capacity-reducing operations related to low visibility. The probabilities of four different low-visibility classes are predicted with an ordered logistic regression model based on time series of meteorological point measurements. Potential predictor variables for the statistical models are visibility, humidity, temperature and wind measurements at several measurement sites. A stepwise variable selection method indicates that visibility and humidity measurements are the most important model inputs. The forecasts are tested at 30-minute forecast intervals up to two hours, which is a sufficient time span for tactical planning at Vienna Airport. The ordered logistic regression models outperform persistence and are competitive with human forecasters.
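An ordered (proportional-odds) logistic regression of the kind used here can be fitted by maximizing the cumulative-logit likelihood directly. The sketch below is a minimal illustration on synthetic data, not the Vienna Airport measurements: two invented predictors play the role of visibility and humidity, and the four ordered classes are generated from a latent logistic variable.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(2)

# Synthetic stand-ins for two predictors (think visibility and humidity) and
# four ordered low-visibility classes derived from a latent logistic variable.
n = 500
X = rng.normal(size=(n, 2))
true_beta = np.array([1.5, -1.0])
latent = X @ true_beta + rng.logistic(size=n)
y = np.digitize(latent, [-1.0, 0.5, 2.0])      # ordered classes 0..3

def neg_log_lik(params):
    beta = params[:2]
    # strictly increasing cut points via exponentiated increments
    cuts = np.cumsum([params[2], np.exp(params[3]), np.exp(params[4])])
    cum = expit(cuts[None, :] - (X @ beta)[:, None])   # P(y <= k)
    cum = np.hstack([np.zeros((n, 1)), cum, np.ones((n, 1))])
    probs = cum[np.arange(n), y + 1] - cum[np.arange(n), y]
    return -np.sum(np.log(np.clip(probs, 1e-12, None)))

res = minimize(neg_log_lik, x0=np.zeros(5), method="BFGS")
beta_hat = res.x[:2]
print(beta_hat)  # should roughly recover true_beta
```

The same likelihood underlies off-the-shelf ordered-logit routines; the hand-rolled version simply makes the cumulative-probability structure of the four classes explicit.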
A comparison of fitness-case sampling methods for genetic programming
NASA Astrophysics Data System (ADS)
Martínez, Yuliana; Naredo, Enrique; Trujillo, Leonardo; Legrand, Pierrick; López, Uriel
2017-11-01
Genetic programming (GP) is an evolutionary computation paradigm for automatic program induction. GP has produced impressive results but it still needs to overcome some practical limitations, particularly its high computational cost, overfitting and excessive code growth. Recently, many researchers have proposed fitness-case sampling methods to overcome some of these problems, with mixed results in several limited tests. This paper presents an extensive comparative study of four fitness-case sampling methods, namely: Interleaved Sampling, Random Interleaved Sampling, Lexicase Selection and Keep-Worst Interleaved Sampling. The algorithms are compared on 11 symbolic regression problems and 11 supervised classification problems, using 10 synthetic benchmarks and 12 real-world data-sets. They are evaluated based on test performance, overfitting and average program size, comparing them with a standard GP search. Comparisons are carried out using non-parametric multigroup tests and post hoc pairwise statistical tests. The experimental results suggest that fitness-case sampling methods are particularly useful for difficult real-world symbolic regression problems, improving performance, reducing overfitting and limiting code growth. On the other hand, it seems that fitness-case sampling cannot improve upon GP performance when considering supervised binary classification.
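Of the four sampling methods compared, lexicase selection is the easiest to state compactly: fitness cases are shuffled and applied one at a time as elimination filters, keeping only the individuals that are elite on each case. The sketch below is a generic plain-Python illustration, not the authors' implementation; the toy population and error matrix are invented.

```python
import random

def lexicase_select(population, errors, rng=random):
    """Select one parent by lexicase selection: visit fitness cases in random
    order, keeping only the individuals that are elite on each case."""
    candidates = list(range(len(population)))
    cases = list(range(len(errors[0])))
    rng.shuffle(cases)
    for c in cases:
        best = min(errors[i][c] for i in candidates)
        candidates = [i for i in candidates if errors[i][c] == best]
        if len(candidates) == 1:
            break
    return population[rng.choice(candidates)]

# Toy usage: individual "b" is elite on every case, so it always survives
# the filtering regardless of the case order.
pop = ["a", "b", "c"]
errs = [[3, 2, 5],   # a
        [1, 0, 1],   # b
        [4, 6, 1]]   # c (ties with b on the last case only)
print(lexicase_select(pop, errs))  # -> b
```

Because selection pressure comes from individual cases rather than aggregate fitness, lexicase tends to preserve specialists, which is one explanation for its strength on difficult symbolic regression problems.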
McManus, I C; Dewberry, Chris; Nicholson, Sandra; Dowell, Jonathan S; Woolf, Katherine; Potts, Henry W W
2013-11-14
Measures used for medical student selection should predict future performance during training. A problem for any selection study is that predictor-outcome correlations are known only in those who have been selected, whereas selectors need to know how measures would predict in the entire pool of applicants. That problem of interpretation can be solved by calculating construct-level predictive validity, an estimate of true predictor-outcome correlation across the range of applicant abilities. Construct-level predictive validities were calculated in six cohort studies of medical student selection and training (student entry, 1972 to 2009) for a range of predictors, including A-levels, General Certificates of Secondary Education (GCSEs)/O-levels, and aptitude tests (AH5 and UK Clinical Aptitude Test (UKCAT)). Outcomes included undergraduate basic medical science and finals assessments, as well as postgraduate measures of Membership of the Royal Colleges of Physicians of the United Kingdom (MRCP(UK)) performance and entry in the Specialist Register. Construct-level predictive validity was calculated with the method of Hunter, Schmidt and Le (2006), adapted to correct for right-censorship of examination results due to grade inflation. Meta-regression analyzed 57 separate predictor-outcome correlations (POCs) and construct-level predictive validities (CLPVs). Mean CLPVs are substantially higher (.450) than mean POCs (.171). Mean CLPVs for first-year examinations, were high for A-levels (.809; CI: .501 to .935), and lower for GCSEs/O-levels (.332; CI: .024 to .583) and UKCAT (mean = .245; CI: .207 to .276). A-levels had higher CLPVs for all undergraduate and postgraduate assessments than did GCSEs/O-levels and intellectual aptitude tests. CLPVs of educational attainment measures decline somewhat during training, but continue to predict postgraduate performance. Intellectual aptitude tests have lower CLPVs than A-levels or GCSEs/O-levels. 
Educational attainment has strong CLPVs for undergraduate and postgraduate performance, accounting for perhaps 65% of true variance in first year performance. Such CLPVs justify the use of educational attainment measure in selection, but also raise a key theoretical question concerning the remaining 35% of variance (and measurement error, range restriction and right-censorship have been taken into account). Just as in astrophysics, 'dark matter' and 'dark energy' are posited to balance various theoretical equations, so medical student selection must also have its 'dark variance', whose nature is not yet properly characterized, but explains a third of the variation in performance during training. Some variance probably relates to factors which are unpredictable at selection, such as illness or other life events, but some is probably also associated with factors such as personality, motivation or study skills.
Scheme, Erik J; Englehart, Kevin B
2013-07-01
When controlling a powered upper limb prosthesis it is important not only to know how to move the device, but also when not to move. A novel approach to pattern recognition control, using a selective multiclass one-versus-one classification scheme, has been shown to be capable of rejecting unintended motions. This method was shown to outperform other popular classification schemes when presented with muscle contractions that did not correspond to desired actions. In this work, a 3-D Fitts' Law test is proposed as a suitable alternative to using virtual limb environments for evaluating real-time myoelectric control performance. The test is used to compare the selective approach to a state-of-the-art linear discriminant analysis classification based scheme. The framework is shown to obey Fitts' Law for both control schemes, producing linear regression fittings with high coefficients of determination (R2 > 0.936). Additional performance metrics focused on quality of control are discussed and incorporated in the evaluation. Using this framework the selective classification based scheme is shown to produce significantly higher efficiency and completion rates, and significantly lower overshoot and stopping distances, with no significant difference in throughput.
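The Fitts' Law check reported here (a linear fit of movement time against the index of difficulty, judged by its coefficient of determination) can be illustrated on synthetic timings. The distances, widths and noise level below are invented, and the Shannon formulation of the index of difficulty is assumed.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical target distances D and widths W (arbitrary units), with the
# Shannon formulation of the index of difficulty: ID = log2(D/W + 1).
D = np.array([4.0, 8.0, 16.0, 32.0, 8.0, 16.0])
W = np.array([1.0, 1.0, 2.0, 2.0, 0.5, 0.5])
ID = np.log2(D / W + 1.0)

# Invented movement times obeying Fitts' law MT = a + b*ID, plus noise.
MT = 0.3 + 0.15 * ID + rng.normal(0, 0.01, size=ID.size)

# Linear regression of MT on ID and its coefficient of determination; the
# paper's criterion was fits of this kind with R2 > 0.936.
A = np.column_stack([np.ones_like(ID), ID])
coef, *_ = np.linalg.lstsq(A, MT, rcond=None)
r2 = 1 - np.sum((MT - A @ coef) ** 2) / np.sum((MT - MT.mean()) ** 2)
throughput = ID / MT                      # bits per second, per trial
print(round(r2, 3))
```

Throughput, one of the additional metrics mentioned above, falls out of the same quantities as the ratio of difficulty to movement time.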
Feature Selection for Ridge Regression with Provable Guarantees.
Paul, Saurabh; Drineas, Petros
2016-04-01
We introduce single-set spectral sparsification as a deterministic sampling-based feature selection technique for regularized least-squares classification, which is the classification analog to ridge regression. The method is unsupervised and gives worst-case guarantees of the generalization power of the classification function after feature selection with respect to the classification function obtained using all features. We also introduce leverage-score sampling as an unsupervised randomized feature selection method for ridge regression. We provide risk bounds for both single-set spectral sparsification and leverage-score sampling on ridge regression in the fixed design setting and show that the risk in the sampled space is comparable to the risk in the full-feature space. We perform experiments on synthetic and real-world data sets; a subset of TechTC-300 data sets, to support our theory. Experimental results indicate that the proposed methods perform better than the existing feature selection methods.
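Leverage-score sampling for feature selection can be sketched as follows: compute the leverage score of each column from the top-k right singular vectors, then sample columns with probability proportional to those scores and rescale. This is a generic illustration on a synthetic low-rank matrix; the sizes, rank and sampling budget are arbitrary, and no claim is made that this matches the authors' exact construction.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic design matrix: 100 samples, 50 features, approximately rank 5.
n, d, k = 100, 50, 5
X = rng.normal(size=(n, k)) @ rng.normal(size=(k, d)) + 0.01 * rng.normal(size=(n, d))

# Leverage score of each feature (column) from the top-k right singular vectors.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
scores = np.sum(Vt[:k] ** 2, axis=0)      # one score per column; they sum to k
probs = scores / scores.sum()

# Randomized feature selection: sample r columns with replacement in
# proportion to leverage, rescaling so expectations are preserved.
r = 20
idx = rng.choice(d, size=r, p=probs)
X_sampled = X[:, idx] / np.sqrt(r * probs[idx])
print(X_sampled.shape)
```

Ridge regression can then be fitted in the sampled feature space; the paper's risk bounds compare that fit to the one obtained with all d features.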
Variable Selection in Logistic Regression.
1987-06-01
Z. D. Bai, P. R. Krishnaiah and L. C. Zhao, Center for Multivariate Analysis, University of Pittsburgh. Contract/Grant: F49620-85-C-0008.
Goodarzi, Mohammad; Jensen, Richard; Vander Heyden, Yvan
2012-12-01
A Quantitative Structure-Retention Relationship (QSRR) is proposed to estimate the chromatographic retention of 83 diverse drugs on a Unisphere polybutadiene (PBD) column, using isocratic elutions at pH 11.7. Previous work has generated QSRR models for them using Classification And Regression Trees (CART). In this work, Ant Colony Optimization is used as a feature selection method to find the best molecular descriptors from a large pool. In addition, several other selection methods have been applied, such as Genetic Algorithms, Stepwise Regression and the Relief method, not only to evaluate Ant Colony Optimization as a feature selection method but also to investigate its ability to find the important descriptors in QSRR. Multiple Linear Regression (MLR) and Support Vector Machines (SVMs) were applied as linear and nonlinear regression methods, respectively, giving excellent correlation between the experimental logarithms of the retention factors of the drugs (log kw), i.e. extrapolated to a mobile phase consisting of pure water, and the predicted values. The overall best model was the SVM one built using descriptors selected by ACO. Copyright © 2012 Elsevier B.V. All rights reserved.
Study of Personnel Attrition and Revocation within U.S. Marine Corps Air Traffic Control Specialties
2012-03-01
Entrance Processing Stations (MEPS) and recruit depots, to include non-cognitive testing, such as the Navy Computer Adaptive Personality Scales (NCAPS), during recruitment. It is also recommended that an economic analysis be conducted comparing the
Keywords: Revocation, Selection, MOS, Regression, Probit, dProbit, STATA, Statistics, Marginal Effects, ASVAB, AFQT, Composite Scores, Screening, NCAPS
Accounting for informatively missing data in logistic regression by means of reassessment sampling.
Lin, Ji; Lyles, Robert H
2015-05-20
We explore the 'reassessment' design in a logistic regression setting, where a second wave of sampling is applied to recover a portion of the missing data on a binary exposure and/or outcome variable. We construct a joint likelihood function based on the original model of interest and a model for the missing data mechanism, with emphasis on non-ignorable missingness. The estimation is carried out by numerical maximization of the joint likelihood function with close approximation of the accompanying Hessian matrix, using sharable programs that take advantage of general optimization routines in standard software. We show how likelihood ratio tests can be used for model selection and how they facilitate direct hypothesis testing for whether missingness is at random. Examples and simulations are presented to demonstrate the performance of the proposed method. Copyright © 2015 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Kiram, J. J.; Sulaiman, J.; Swanto, S.; Din, W. A.
2015-10-01
This study aims to construct a mathematical model of the relationship between a student's Language Learning Strategy usage and English Language proficiency. Fifty-six pre-university students of University Malaysia Sabah participated in this study. A self-report questionnaire called the Strategy Inventory for Language Learning was administered to them to measure their language learning strategy preferences before they sat for the Malaysian University English Test (MUET), the results of which were utilised to measure their English language proficiency. We fitted a multiple linear regression model, with variable selection performed using stepwise regression. We conducted various assessments of the model obtained, including the global F-test, root mean square error and R-squared. The model obtained suggests that not all language learning strategies should be included in the model in an attempt to predict language proficiency.
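Forward stepwise selection of the kind used for the model above can be sketched as a greedy loop that keeps adding the predictor giving the largest gain in adjusted R-squared and stops when no candidate improves it. The data below are synthetic stand-ins, not the SILL/MUET scores, and adjusted R-squared is only one of several possible entry criteria.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic stand-in: 56 students, 6 candidate strategy scores, of which
# only predictors 1 and 4 actually drive the outcome.
n, d = 56, 6
X = rng.normal(size=(n, d))
y = 2.0 * X[:, 1] - 1.5 * X[:, 4] + rng.normal(0, 0.5, size=n)

def adjusted_r2(cols):
    A = np.column_stack([np.ones(n), X[:, cols]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    p = len(cols)                          # number of predictors in the model
    return 1 - (rss / tss) * (n - 1) / (n - p - 1)

# Greedy forward stepwise selection on adjusted R-squared.
selected, best = [], -np.inf
while True:
    gains = {j: adjusted_r2(selected + [j]) for j in range(d) if j not in selected}
    if not gains:
        break
    j_star = max(gains, key=gains.get)
    if gains[j_star] <= best:
        break
    selected.append(j_star)
    best = gains[j_star]

print(sorted(selected))  # includes the informative predictors 1 and 4
```

Full stepwise procedures also consider dropping previously entered variables at each step; the forward-only variant above keeps the sketch short.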
Zhang, J; Feng, J-Y; Ni, Y-L; Wen, Y-J; Niu, Y; Tamba, C L; Yue, C; Song, Q; Zhang, Y-M
2017-06-01
Multilocus genome-wide association studies (GWAS) have become the state-of-the-art procedure to identify quantitative trait nucleotides (QTNs) associated with complex traits. However, implementation of a multilocus model in GWAS is still difficult. In this study, we integrated least angle regression with empirical Bayes to perform multilocus GWAS under polygenic background control. We used an algorithm of model transformation that whitened the covariance matrix of the polygenic matrix K and environmental noise. Markers on one chromosome were included simultaneously in a multilocus model and least angle regression was used to select the most potentially associated single-nucleotide polymorphisms (SNPs), whereas the markers on the other chromosomes were used to calculate the kinship matrix as polygenic background control. The selected SNPs in the multilocus model were further tested for their association with the trait by empirical Bayes and a likelihood ratio test. We herein refer to this method as pLARmEB (polygenic-background-control-based least angle regression plus empirical Bayes). Results from simulation studies showed that pLARmEB was more powerful in QTN detection and more accurate in QTN effect estimation, had a lower false positive rate and required less computing time than the Bayesian hierarchical generalized linear model, efficient mixed model association (EMMA) and least angle regression plus empirical Bayes. pLARmEB, multilocus random-SNP-effect mixed linear model and fast multilocus random-SNP-effect EMMA methods had almost equal power of QTN detection in simulation experiments. However, only pLARmEB identified 48 previously reported genes for 7 flowering time-related traits in Arabidopsis thaliana.
Age and motives for volunteering: testing hypotheses derived from socioemotional selectivity theory.
Okun, Morris A; Schultz, Amy
2003-06-01
Following a meta-analysis of the relations between age and volunteer motives (career, understanding, enhancement, protective, making friends, social, and values), the authors tested hypotheses derived from socioemotional selectivity theory regarding the effects of age on these volunteer motives. The Volunteer Functions Inventory was completed by 523 volunteers from 2 affiliates of the International Habitat for Humanity. Multiple regression analyses revealed, as predicted, that as age increases, career and understanding volunteer motivation decrease and social volunteer motivation increases. Contrary to expectations, age did not contribute to the prediction of enhancement, protective, and values volunteer motivations and the relation between age and making friends volunteer motivation was nonlinear. The results were discussed in the context of age-differential and age-similarity perspectives on volunteer motivation.
Guillaume, Bryan; Wang, Changqing; Poh, Joann; Shen, Mo Jun; Ong, Mei Lyn; Tan, Pei Fang; Karnani, Neerja; Meaney, Michael; Qiu, Anqi
2018-06-01
Statistical inference on neuroimaging data is often conducted using a mass-univariate model, equivalent to fitting a linear model at every voxel with a known set of covariates. Due to the large number of linear models, it is challenging to check if the selection of covariates is appropriate and to modify this selection adequately. The use of standard diagnostics, such as residual plotting, is clearly not practical for neuroimaging data. However, the selection of covariates is crucial for linear regression to ensure valid statistical inference. In particular, the mean model of regression needs to be reasonably well specified. Unfortunately, this issue is often overlooked in the field of neuroimaging. This study aims to adopt the existing Confounder Adjusted Testing and Estimation (CATE) approach and to extend it for use with neuroimaging data. We propose a modification of CATE that can yield valid statistical inferences using Principal Component Analysis (PCA) estimators instead of Maximum Likelihood (ML) estimators. We then propose a non-parametric hypothesis testing procedure that can improve upon parametric testing. Monte Carlo simulations show that the modification of CATE allows for more accurate modelling of neuroimaging data and can in turn yield a better control of False Positive Rate (FPR) and Family-Wise Error Rate (FWER). We demonstrate its application to an Epigenome-Wide Association Study (EWAS) on neonatal brain imaging and umbilical cord DNA methylation data obtained as part of a longitudinal cohort study. Software for this CATE study is freely available at http://www.bioeng.nus.edu.sg/cfa/Imaging_Genetics2.html. Copyright © 2018 The Author(s). Published by Elsevier Inc. All rights reserved.
A metabolomic study of low estimated GFR in non-proteinuric type 2 diabetes mellitus.
Ng, D P K; Salim, A; Liu, Y; Zou, L; Xu, F G; Huang, S; Leong, H; Ong, C N
2012-02-01
We carried out a urinary metabolomic study to gain insight into low estimated GFR (eGFR) in patients with non-proteinuric type 2 diabetes. Patients were identified as being non-proteinuric using multiple urinalyses. Cases (n = 44) with low eGFR and controls (n = 46) had eGFR values <60 and ≥60 ml min(-1) 1.73 m(-2), respectively, as calculated using the Modification of Diet in Renal Disease formula. Urine samples were analysed by liquid chromatography/mass spectrometry (LC/MS) and GC/MS. False discovery rates were used to adjust for multiple hypotheses testing, and selection of metabolites that best predicted low eGFR status was achieved using least absolute shrinkage and selection operator logistic regression. Eleven GC/MS metabolites were strongly associated with low eGFR after correction for multiple hypotheses testing (smallest adjusted p value = 2.62 × 10(-14), largest adjusted p value = 3.84 × 10(-2)). In regression analysis, octanol, oxalic acid, phosphoric acid, benzamide, creatinine, 3,5-dimethoxymandelic amide and N-acetylglutamine were selected as the best subset for prediction and allowed excellent classification of low eGFR (AUC = 0.996). In LC/MS, 19 metabolites remained significant after multiple hypotheses testing had been taken into account (smallest adjusted p value = 2.04 × 10(-4), largest adjusted p value = 4.48 × 10(-2)), and several metabolites showed stronger evidence of association relative to the uraemic toxin, indoxyl sulphate (adjusted p value = 3.03 × 10(-2)). The potential effect of confounding on the association between metabolites was excluded. Our study has yielded substantial new insight into low eGFR and provided a collection of potential urinary biomarkers for its detection.
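The lasso (L1-penalized) logistic regression used above for selecting the best metabolite subset can be illustrated with scikit-learn on synthetic data. The dimensions, penalty strength C and the "true" predictive features below are invented, and in-sample AUC is used only as a rough sanity check; the AUC of 0.996 quoted in the abstract refers to the study's own data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(6)

# Synthetic stand-in: 90 subjects, 30 candidate metabolites, the first 5
# of which actually carry signal for low-eGFR status.
n, d = 90, 30
X = rng.normal(size=(n, d))
logits = X[:, :5] @ np.array([1.5, -1.2, 1.0, 0.8, -0.9])
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

# Lasso (L1-penalized) logistic regression zeroes out most coefficients,
# leaving a sparse subset of predictors; C controls the penalty strength.
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
selected = np.flatnonzero(model.coef_[0])
auc = roc_auc_score(y, model.decision_function(X))
print(len(selected), round(auc, 3))
```

In practice C would be tuned by cross-validation, and classification performance should be assessed on held-out data rather than in-sample.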
Cawley, Gavin C; Talbot, Nicola L C
2006-10-01
Gene selection algorithms for cancer classification, based on the expression of a small number of biomarker genes, have been the subject of considerable research in recent years. Shevade and Keerthi propose a gene selection algorithm based on sparse logistic regression (SLogReg) incorporating a Laplace prior to promote sparsity in the model parameters, and provide a simple but efficient training procedure. The degree of sparsity obtained is determined by the value of a regularization parameter, which must be carefully tuned in order to optimize performance. This normally involves a model selection stage, based on a computationally intensive search for the minimizer of the cross-validation error. In this paper, we demonstrate that a simple Bayesian approach can be taken to eliminate this regularization parameter entirely, by integrating it out analytically using an uninformative Jeffrey's prior. The improved algorithm (BLogReg) is then typically two or three orders of magnitude faster than the original algorithm, as there is no longer a need for a model selection step. The BLogReg algorithm is also free from selection bias in performance estimation, a common pitfall in the application of machine learning algorithms in cancer classification. The SLogReg, BLogReg and Relevance Vector Machine (RVM) gene selection algorithms are evaluated over the well-studied colon cancer and leukaemia benchmark datasets. The leave-one-out estimates of the probability of test error and cross-entropy of the BLogReg and SLogReg algorithms are very similar, however the BlogReg algorithm is found to be considerably faster than the original SLogReg algorithm. Using nested cross-validation to avoid selection bias, performance estimation for SLogReg on the leukaemia dataset takes almost 48 h, whereas the corresponding result for BLogReg is obtained in only 1 min 24 s, making BLogReg by far the more practical algorithm. 
BLogReg also demonstrates better estimates of conditional probability than the RVM, which are of great importance in medical applications, with similar computational expense. A MATLAB implementation of the sparse logistic regression algorithm with Bayesian regularization (BLogReg) is available from http://theoval.cmp.uea.ac.uk/~gcc/cbl/blogreg/
Association Between Socio-Demographic Background and Self-Esteem of University Students.
Haq, Muhammad Ahsan Ul
2016-12-01
The purpose of this study was to scrutinize the self-esteem of university students and explore the association of self-esteem with academic achievement, gender and other factors. A sample of 346 students was selected from Punjab University, Lahore, Pakistan. The Rosenberg self-esteem scale with demographic variables was used for data collection. Besides descriptive statistics, binary logistic regression and the t test were used for analysing the data. A significant gender difference was observed: self-esteem was significantly higher in males than in females. Logistic regression indicates that age, medium of instruction, family income, student monthly expenditures, GPA and area of residence have a direct effect on self-esteem, while the number of siblings showed an inverse effect.
Barriers and benefits of a healthy diet in Spain: comparison with other European member states.
Holgado, B; de Irala-Estévez, J; Martínez-González, M A; Gibney, M; Kearney, J; Martínez, J A
2000-06-01
Our purpose was to identify the main barriers and benefits perceived by the European citizens in regard to following a healthy diet and to assess the differences in expected benefits and difficulties between Spain and the remaining countries of the European Union. A cross-sectional study in which quota-controlled, nationally representative samples of approximately 1000 adults from each country completed a questionnaire. The survey was carried out between October 1995 and February 1996 in the 15 member states of the European Union. Participants (aged 15 y and older) were selected and interviewed in their homes about their attitudes towards healthy diets. They were asked to select two options from a list of 22 potential barriers to achieve a healthy diet and the benefits derived from a healthy diet. The associations of the perceived benefits or barriers with the sociodemographic variables within Spain and the rest of the European Union were compared with the Pearson chi-squared test and the chi-squared linear trend test. Two multivariate logistic regression models were also fitted to assess the characteristics independently related to the selection of 'Resistance to change' among the main barriers and to the selection of 'Prevent disease/stay healthy' as the main perceived benefits. The barrier most frequently mentioned in Spain was 'Irregular work hours' (29.7%) in contrast with the rest of the European Union where 'Giving up foods that I like' was the barrier most often chosen (26.2%). In the multivariate logistic regression model studying resistance to change, Spaniards were less resistant to change than the rest of the European Union. The benefit most frequently mentioned across Europe was 'Prevent disease/stay healthy'. In the multivariate logistic regression model, women, older individuals, and people with a higher educational level were more likely to choose this benefit. 
It is apparent that there are many barriers to achieving healthy eating, most notably lack of time. For this reason, greater availability of foods in line with nutrition guidelines could be helpful. The population could also benefit from better knowledge of the benefits derived from a healthy diet.
[How do medical students perform academically by admission type?].
Kim, Se-Hoon; Lee, Keumho; Hur, Yera; Kim, Ji-Ha
2013-09-01
Despite the importance of selecting students who are capable of completing medical education and becoming good doctors, few studies have addressed this question. This study analyzed differences in medical students' academic performance (grade point average, GPA), failure (flunk) rates, and dropout rates by admission type. We gathered admission data for 369 students who entered Konyang University College of Medicine between 2004 and 2010 and analyzed the relationships between admission type and academic achievement and the differences in failure and dropout rates. Analysis of variance (ANOVA), ordinary least squares, and logistic regression were used. Rolling-admission students showed higher academic achievement from years 1 to 3 than regular-admission students (p < 0.01). Similar results were obtained in a multiple regression model using admission type as a control variable; unlike the ANOVA results, however, GPA differences by admission type appeared not only in the lower academic years but also in year 6 (p < 0.01). In the regression analysis of failure and dropout rates by admission type, regular-admission students showed a higher dropout rate than rolling-admission students, which demonstrates that admission type has a significant effect on failure and dropout rates in medical students (p < 0.01). Rolling-admission students tend to show lower failure and dropout rates and to perform better academically. This implies that selecting students primarily by the Korean College Scholastic Ability Test does not guarantee academic success in medical education. We therefore suggest a more in-depth, comprehensive method of selecting students appropriate to each medical school's educational goals.
A Robust Shape Reconstruction Method for Facial Feature Point Detection.
Tan, Shuqiu; Chen, Dongyi; Guo, Chenggang; Huang, Zhiqi
2017-01-01
Facial feature point detection has seen great research advances in recent years. Numerous methods have been developed and applied in practical face analysis systems. However, it remains a challenging task because of the large variability in expressions and gestures and the presence of occlusions in real-world photographs. In this paper, we present a robust sparse reconstruction method for face alignment problems. Instead of a direct regression between the feature space and the shape space, the concept of shape increment reconstruction is introduced. Moreover, a set of coupled overcomplete dictionaries, termed the shape increment dictionary and the local appearance dictionary, are learned in a regressive manner to select robust features and fit shape increments. Additionally, to make the learned model more generalizable, we select the best-matched parameter set through extensive validation tests. Experimental results on three public datasets demonstrate that the proposed method achieves better robustness than state-of-the-art methods.
Penalized regression procedures for variable selection in the potential outcomes framework
Ghosh, Debashis; Zhu, Yeying; Coffman, Donna L.
2015-01-01
A recent topic of much interest in causal inference is model selection. In this article, we describe a framework in which to consider penalized regression approaches to variable selection for causal effects. The framework leads to a simple ‘impute, then select’ class of procedures that is agnostic to both the type of imputation algorithm and the penalized regression method used. It also clarifies how model selection involves a multivariate regression model for causal inference problems, and shows that these methods can be applied to identify subgroups in which treatment effects are homogeneous. Analogies and links with the literature on machine learning methods, missing data and imputation are drawn. A difference LASSO algorithm is defined, along with its multiple imputation analogues. The procedures are illustrated using a well-known right heart catheterization dataset. PMID:25628185
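The 'impute, then select' recipe can be sketched generically. The following is a hypothetical illustration, not the authors' implementation: the function names, the mean-imputation choice, and the plain coordinate-descent LASSO are ours, and the paper's difference LASSO is not reproduced here.

```python
def mean_impute(X):
    """Replace None entries with the column mean of the observed values."""
    d = len(X[0])
    means = []
    for j in range(d):
        obs = [row[j] for row in X if row[j] is not None]
        means.append(sum(obs) / len(obs))
    return [[row[j] if row[j] is not None else means[j] for j in range(d)]
            for row in X]

def soft(a, lam):
    """Soft-thresholding operator, the building block of LASSO."""
    return (a - lam) if a > lam else (a + lam) if a < -lam else 0.0

def lasso_cd(X, y, lam, iters=200):
    """Coordinate-descent LASSO (no intercept), minimizing
    0.5 * ||y - X beta||^2 + lam * ||beta||_1."""
    n, d = len(X), len(X[0])
    beta = [0.0] * d
    for _ in range(iters):
        for j in range(d):
            # correlation of feature j with the partial residual excluding j
            rho = sum(X[i][j] * (y[i] - sum(X[i][k] * beta[k]
                                            for k in range(d) if k != j))
                      for i in range(n))
            z = sum(X[i][j] ** 2 for i in range(n))
            beta[j] = soft(rho, lam) / z
    return beta

def impute_then_select(X, y, lam):
    """'Impute, then select': complete the data, then run penalized regression."""
    return lasso_cd(mean_impute(X), y, lam)
```

Any imputation routine and any penalized regression could be swapped into `impute_then_select`, which is the point of the framework.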
Liquid electrolyte informatics using an exhaustive search with linear regression.
Sodeyama, Keitaro; Igarashi, Yasuhiko; Nakayama, Tomofumi; Tateyama, Yoshitaka; Okada, Masato
2018-06-14
Exploring new liquid electrolyte materials is a fundamental target for developing new high-performance lithium-ion batteries. In contrast to solid materials, the properties of disordered liquid solutions have been less studied with data-driven informatics techniques. Here, we examined the estimation accuracy and efficiency of three informatics techniques, multiple linear regression (MLR), least absolute shrinkage and selection operator (LASSO), and exhaustive search with linear regression (ES-LiR), using coordination energy and melting point as test liquid properties. We confirmed that ES-LiR gives the most accurate estimation among the three techniques. We also found that ES-LiR can reveal the relationship between prediction accuracy and calculation cost via a weight diagram of descriptors. This makes it possible to balance accuracy against cost when searching through a huge number of candidate materials.
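As a concrete illustration of the exhaustive-search idea, the sketch below (hypothetical code, not the authors' implementation; the real ES-LiR additionally produces weight diagrams) fits ordinary least squares on every descriptor subset and ranks subsets by leave-one-out cross-validation error, exposing the accuracy-versus-cost trade-off across subset sizes.

```python
from itertools import combinations

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols_fit(X, y):
    """Ordinary least squares with intercept via the normal equations."""
    Z = [[1.0] + list(row) for row in X]
    p = len(Z[0])
    XtX = [[sum(z[r] * z[c] for z in Z) for c in range(p)] for r in range(p)]
    Xty = [sum(z[r] * yi for z, yi in zip(Z, y)) for r in range(p)]
    return solve(XtX, Xty)

def loo_cv_error(X, y, cols):
    """Mean squared leave-one-out prediction error for a descriptor subset."""
    err = 0.0
    for i in range(len(X)):
        Xtr = [[X[j][c] for c in cols] for j in range(len(X)) if j != i]
        ytr = [y[j] for j in range(len(y)) if j != i]
        beta = ols_fit(Xtr, ytr)
        pred = beta[0] + sum(b * X[i][c] for b, c in zip(beta[1:], cols))
        err += (pred - y[i]) ** 2
    return err / len(X)

def es_lir(X, y):
    """Exhaustively score every non-empty descriptor subset, best first."""
    d = len(X[0])
    scored = []
    for k in range(1, d + 1):
        for cols in combinations(range(d), k):
            scored.append((loo_cv_error(X, y, cols), cols))
    return sorted(scored)
```

The exhaustive enumeration is exponential in the number of descriptors, which is exactly the calculation cost the weight-diagram analysis is meant to manage.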
New insights into old methods for identifying causal rare variants.
Wang, Haitian; Huang, Chien-Hsun; Lo, Shaw-Hwa; Zheng, Tian; Hu, Inchi
2011-11-29
Advances in high-throughput next-generation sequencing technology have made the analysis of rare variants possible. However, the investigation of rare variants in data sets of unrelated individuals faces the challenge of low power, and most methods circumvent the difficulty by using various collapsing procedures based on genes, pathways, or gene clusters. We suggest a new way to identify causal rare variants using the F-statistic and sliced inverse regression. The procedure was tested on the data set provided by Genetic Analysis Workshop 17 (GAW17). After preliminary data reduction, we ranked markers according to their F-statistic values. Top-ranked markers were then subjected to sliced inverse regression, and those with higher absolute coefficients in the most significant sliced inverse regression direction were selected. The procedure yields good false discovery rates for the GAW17 data and thus is a promising method for future studies of rare variants.
Proportional estimation of finger movements from high-density surface electromyography.
Celadon, Nicolò; Došen, Strahinja; Binder, Iris; Ariano, Paolo; Farina, Dario
2016-08-04
The importance of restoring hand function following injury or disease of the nervous system has led to the development of novel rehabilitation interventions. Surface electromyography can be used to create user-driven control of a rehabilitation robot, in which the subject engages actively by using spared voluntary activation to trigger the assistance of the robot. This study investigated methods for the selective estimation of individual finger movements from high-density surface electromyographic signals (HD-sEMG) with minimal interference between movements of other fingers. Regression was evaluated in online and offline control tests with nine healthy subjects (per test) using a linear discriminant analysis classifier (LDA), a common spatial patterns proportional estimator (CSP-PE), and a thresholding (THR) algorithm. In all tests, the subjects performed an isometric force-tracking task guided by a moving visual marker indicating the contraction type (flexion/extension), the desired activation level, and the finger that should be moved. The outcome measures were the mean square error between the reference and generated trajectories normalized to the peak-to-peak value of the reference (nMSE), the classification accuracy (CA), the mean amplitude of the false activations (MAFA) and, in the offline tests only, the Pearson correlation coefficient (PCORR). The offline tests demonstrated that, for a reduced number of electrodes (≤24), the CSP-PE outperformed the LDA, with higher precision of proportional estimation and less crosstalk between the movement classes (e.g., 8 electrodes, median MAFA ~ 0.6 vs. 1.1 %, median nMSE ~ 4.3 vs. 5.5 %). The LDA and the CSP-PE performed similarly in the online tests (median nMSE < 3.6 %, median MAFA < 0.7 %), but the CSP-PE provided more stable performance across the tested conditions (less improvement between different sessions). 
Furthermore, THR, which exploits topographical information about single-finger activity from HD-sEMG, provided in many cases a regression accuracy similar to that of the pattern recognition techniques, but its performance was not consistent across subjects and fingers. The CSP-PE is the method of choice for selective individual finger control with a limited number of electrodes (<24), whereas for higher-resolution recordings either method (CSP-PE or LDA) can be used with similar performance. Despite the abundance of detection points, the simple THR proved significantly worse than both pattern recognition/regression methods. Nevertheless, THR is a simple method to apply (no training), and it could still give satisfactory performance in some subjects and/or simpler scenarios (e.g., control of selected fingers). These conclusions are important for guiding future developments towards the clinical application of methods for individual finger control in rehabilitation robotics.
LINKING LUNG AIRWAY STRUCTURE TO PULMONARY FUNCTION VIA COMPOSITE BRIDGE REGRESSION
Chen, Kun; Hoffman, Eric A.; Seetharaman, Indu; Jiao, Feiran; Lin, Ching-Long; Chan, Kung-Sik
2017-01-01
The human lung airway is a complex inverted tree-like structure. Detailed airway measurements can be extracted from MDCT-scanned lung images, such as segmental wall thickness, airway diameter, parent-child branch angles, etc. The wealth of lung airway data provides a unique opportunity for advancing our understanding of the fundamental structure-function relationships within the lung. An important problem is to construct and identify important lung airway features in normal subjects and connect these to standardized pulmonary function test results such as FEV1%. Among other things, the problem is complicated by the fact that a particular airway feature may be an important (relevant) predictor only when it pertains to segments of certain generations. Thus, the key is an efficient, consistent method for simultaneously conducting group selection (lung airway feature types) and within-group variable selection (airway generations), i.e., bi-level selection. Here we streamline a comprehensive procedure to process the lung airway data via imputation, normalization, transformation and groupwise principal component analysis, and then adopt a new composite penalized regression approach for conducting bi-level feature selection. As a prototype of composite penalization, the proposed composite bridge regression method is shown to admit an efficient algorithm, enjoy bi-level oracle properties, and outperform several existing methods. We analyze the MDCT lung image data from a cohort of 132 subjects with normal lung function. Our results show that lung function in terms of FEV1% is promoted by a less dense and more homogeneous lung comprising airways whose segments exhibit more heterogeneity in wall thickness and larger mean diameters, lumen areas, and branch angles. These data hold the potential to define more accurately the “normal” subject population with borderline atypical lung function that is clearly influenced by many genetic and environmental factors. 
PMID:28280520
Dynamic Dimensionality Selection for Bayesian Classifier Ensembles
2015-03-19
WANBIA-C performs discriminative learning of weights in an otherwise generatively learned naive Bayes classifier. WANBIA-C is very competitive with logistic regression but much more efficient. Keywords: classifier, generative learning, discriminative learning, naive Bayes, feature selection, logistic regression, higher-order attribute independence.
Brügemann, K; Gernand, E; von Borstel, U U; König, S
2011-08-01
Data used in the present study included 1,095,980 first-lactation test-day records for protein yield of 154,880 Holstein cows housed on 196 large-scale dairy farms in Germany. Data were recorded between 2002 and 2009 and merged with meteorological data from public weather stations. The maximum distance between each farm and its corresponding weather station was 50 km. Hourly temperature-humidity indexes (THI) were calculated using the mean of hourly measurements of dry bulb temperature and relative humidity. On the phenotypic scale, an increase in THI was generally associated with a decrease in daily protein yield. For genetic analyses, a random regression model was applied using time-dependent (d in milk, DIM) and THI-dependent covariates. Additive genetic and permanent environmental effects were fitted with this random regression model and Legendre polynomials of order 3 for DIM and THI. In addition, the fixed curve was modeled with Legendre polynomials of order 3. Heterogeneous residuals were fitted by dividing DIM into 5 classes, and by dividing THI into 4 classes, resulting in 20 different classes. Additive genetic variances for daily protein yield decreased with increasing degrees of heat stress and were lowest at the beginning of lactation and at extreme THI. Due to higher additive genetic variances, slightly higher permanent environment variances, and similar residual variances, heritabilities were highest for low THI in combination with DIM at the end of lactation. Genetic correlations among individual values for THI were generally >0.90. These trends from the complex random regression model were verified by applying relatively simple bivariate animal models for protein yield measured in 2 THI environments; that is, defining a THI value of 60 as a threshold. These high correlations indicate the absence of any substantial genotype × environment interaction for protein yield. 
However, heritabilities and additive genetic variances from the random regression model tended to be slightly higher in the THI range corresponding to cows' comfort zone. Selecting such superior environments for progeny testing can contribute to an accurate genetic differentiation among selection candidates. Copyright © 2011 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
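The Legendre covariates that such random regression models use as time- (DIM) and THI-dependent regressors are straightforward to construct. A small hypothetical sketch (the function name and standardization choice are illustrative, not from the study): standardize the covariate to [-1, 1] and apply Bonnet's recursion up to order 3.

```python
def legendre_covariates(t, t_min, t_max, order=3):
    """Standardize t to [-1, 1] and return Legendre polynomials P0..P_order,
    the covariates of an order-3 random regression (e.g., over DIM or THI)."""
    x = -1.0 + 2.0 * (t - t_min) / (t_max - t_min)
    P = [1.0, x]
    for n in range(1, order):
        # Bonnet's recursion: (n+1) P_{n+1}(x) = (2n+1) x P_n(x) - n P_{n-1}(x)
        P.append(((2 * n + 1) * x * P[n] - n * P[n - 1]) / (n + 1))
    return P[: order + 1]
```

Each record then contributes one such covariate vector per random effect (additive genetic and permanent environmental), with the fixed lactation curve modeled on the same basis.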
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tharrington, Arnold N.
2015-09-09
The NCCS Regression Test Harness is a software package that provides a framework for performing regression and acceptance testing on NCCS High Performance Computers. The package is written in Python, and its only dependency is a Subversion repository to store the regression tests.
Eash, David A.; Barnes, Kimberlee K.; Veilleux, Andrea G.
2013-01-01
A statewide study was performed to develop regional regression equations for estimating selected annual exceedance-probability statistics for ungaged stream sites in Iowa. The study area comprises streamgages located within Iowa and 50 miles beyond the State’s borders. Annual exceedance-probability estimates were computed for 518 streamgages by using the expected moments algorithm to fit a Pearson Type III distribution to the logarithms of annual peak discharges for each streamgage using annual peak-discharge data through 2010. The estimation of the selected statistics included a Bayesian weighted least-squares/generalized least-squares regression analysis to update regional skew coefficients for the 518 streamgages. Low-outlier and historic information were incorporated into the annual exceedance-probability analyses, and a generalized Grubbs-Beck test was used to detect multiple potentially influential low flows. Also, geographic information system software was used to measure 59 selected basin characteristics for each streamgage. Regional regression analysis, using generalized least-squares regression, was used to develop a set of equations for each flood region in Iowa for estimating discharges for ungaged stream sites with 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent annual exceedance probabilities, which are equivalent to annual flood-frequency recurrence intervals of 2, 5, 10, 25, 50, 100, 200, and 500 years, respectively. A total of 394 streamgages were included in the development of regional regression equations for three flood regions (regions 1, 2, and 3) that were defined for Iowa based on landform regions and soil regions. Average standard errors of prediction range from 31.8 to 45.2 percent for flood region 1, 19.4 to 46.8 percent for flood region 2, and 26.5 to 43.1 percent for flood region 3. 
The pseudo coefficients of determination for the generalized least-squares equations range from 90.8 to 96.2 percent for flood region 1, 91.5 to 97.9 percent for flood region 2, and 92.4 to 96.0 percent for flood region 3. The regression equations are applicable only to stream sites in Iowa with flows not significantly affected by regulation, diversion, channelization, backwater, or urbanization and with basin characteristics within the range of those used to develop the equations. These regression equations will be implemented within the U.S. Geological Survey StreamStats Web-based geographic information system tool. StreamStats allows users to click on any ungaged site on a river and compute estimates of the eight selected statistics; in addition, 90-percent prediction intervals and the measured basin characteristics for the ungaged sites also are provided by the Web-based tool. StreamStats also allows users to click on any streamgage in Iowa and estimates computed for these eight selected statistics are provided for the streamgage.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhong, H; Wang, J; Shen, L
Purpose: The purpose of this study is to investigate the relationship between computed tomographic (CT) texture features of primary lesions and metastasis-free survival for rectal cancer patients, and to develop a data-mining prediction model using texture features. Methods: A total of 220 rectal cancer patients treated with neoadjuvant chemo-radiotherapy (CRT) were enrolled in this study. All patients underwent CT scans before CRT. The primary lesions on the CT images were delineated by two experienced oncologists. The CT images were filtered by Laplacian of Gaussian (LoG) filters with different filter values (1.0–2.5: from fine to coarse). Both filtered and unfiltered images were analyzed using Gray-level Co-occurrence Matrix (GLCM) texture analysis with different directions (transversal, sagittal, and coronal). In total, 270 texture features with different species, directions, and filter values were extracted. Texture features were examined with Student’s t-test to select predictive features. Principal Component Analysis (PCA) was performed on the selected features to reduce feature collinearity. An artificial neural network (ANN) and logistic regression were applied to establish metastasis prediction models. Results: Forty-six of 220 patients developed metastasis with a follow-up time of more than 2 years. Sixty-seven texture features were significantly different in the t-test (p<0.05) between patients with and without metastasis, and 12 of them were extremely significant (p<0.001). The area under the curve (AUC) of the ANN was 0.72, and the concordance index (CI) of the logistic regression was 0.71. The predictability of the ANN was slightly better than that of the logistic regression. Conclusion: CT texture features of primary lesions are related to metastasis-free survival of rectal cancer patients. Both ANN- and logistic regression-based models can be developed for prediction.
Wan, Jian; Chen, Yi-Chieh; Morris, A Julian; Thennadil, Suresh N
2017-07-01
Near-infrared (NIR) spectroscopy is being widely used in various fields ranging from pharmaceutics to the food industry for analyzing chemical and physical properties of the substances concerned. Its advantages over other analytical techniques include available physical interpretation of spectral data, nondestructive nature and high speed of measurements, and little or no need for sample preparation. The successful application of NIR spectroscopy relies on three main aspects: pre-processing of spectral data to eliminate nonlinear variations due to temperature, light scattering effects and many others, selection of those wavelengths that contribute useful information, and identification of suitable calibration models using linear/nonlinear regression. Several methods have been developed for each of these three aspects and many comparative studies of different methods exist for an individual aspect or some combinations. However, there is still a lack of comparative studies for the interactions among these three aspects, which can shed light on what role each aspect plays in the calibration and how to combine various methods of each aspect together to obtain the best calibration model. This paper aims to provide such a comparative study based on four benchmark data sets using three typical pre-processing methods, namely, orthogonal signal correction (OSC), extended multiplicative signal correction (EMSC) and optical path-length estimation and correction (OPLEC); two existing wavelength selection methods, namely, stepwise forward selection (SFS) and genetic algorithm optimization combined with partial least squares regression for spectral data (GAPLSSP); four popular regression methods, namely, partial least squares (PLS), least absolute shrinkage and selection operator (LASSO), least squares support vector machine (LS-SVM), and Gaussian process regression (GPR). 
The comparative study indicates that, in general, pre-processing of spectral data can play a significant role in the calibration while wavelength selection plays a marginal role and the combination of certain pre-processing, wavelength selection, and nonlinear regression methods can achieve superior performance over traditional linear regression-based calibration.
Lee, Mi Hee; Lee, Soo Bong; Eo, Yang Dam; Kim, Sun Woong; Woo, Jung-Hun; Han, Soo Hee
2017-07-01
Landsat optical images have sufficient spatial and spectral resolution to analyze vegetation growth characteristics. However, clouds and water vapor often degrade image quality, which limits the availability of usable images for time-series vegetation vitality measurement. To overcome this shortcoming, simulated images are used as an alternative. In this study, the weighted average method, the spatial and temporal adaptive reflectance fusion model (STARFM) method, and multilinear regression analysis were tested to produce simulated Landsat normalized difference vegetation index (NDVI) images of the Korean Peninsula. The test results showed that the weighted average method produced the images most similar to the actual images, provided that input images were available within 1 month before and after the target date. The STARFM method gives good results when the input image date is close to the target date. Careful regional and seasonal consideration is required in selecting input images. During the summer season, due to clouds, it is very difficult to get images close enough to the target date. Multilinear regression analysis gives meaningful results even when the input image date is not close to the target date. Average R2 values for the weighted average method, STARFM, and multilinear regression analysis were 0.741, 0.70, and 0.61, respectively.
Variable selection and model choice in geoadditive regression models.
Kneib, Thomas; Hothorn, Torsten; Tutz, Gerhard
2009-06-01
Model choice and variable selection are issues of major concern in practical regression analyses, arising in many biometric applications such as habitat suitability analyses, where the aim is to identify the influence of potentially many environmental conditions on certain species. We describe regression models for breeding bird communities that facilitate both model choice and variable selection, by a boosting algorithm that works within a class of geoadditive regression models comprising spatial effects, nonparametric effects of continuous covariates, interaction surfaces, and varying coefficients. The major modeling components are penalized splines and their bivariate tensor product extensions. All smooth model terms are represented as the sum of a parametric component and a smooth component with one degree of freedom to obtain a fair comparison between the model terms. A generic representation of the geoadditive model allows us to devise a general boosting algorithm that automatically performs model choice and variable selection.
Ruan, Cheng-Jiang; Xu, Xue-Xuan; Shao, Hong-Bo; Jaleel, Cheruth Abdul
2010-09-01
In the past 20 years, the major effort in plant breeding has shifted from quantitative to molecular genetics, with emphasis on quantitative trait loci (QTL) identification and marker-assisted selection (MAS). However, results have been modest. This has been due to several factors, including the absence of tightly linked QTL, the non-availability of mapping populations, and the substantial time needed to develop such populations. To overcome these limitations, and as an alternative to planned populations, molecular marker-trait associations have been identified by combining germplasm with regression techniques. In the present review, the authors (1) survey successful applications of germplasm-regression-combined (GRC) molecular marker-trait association identification in plants; (2) describe how to perform GRC analysis and how it differs from QTL mapping based on linkage maps reconstructed from planned populations; (3) consider the factors that affect GRC association identification, including selection of optimal germplasm and molecular markers and testing of the identification efficiency of markers associated with traits; and (4) discuss the future prospects of GRC marker-trait association analysis in plant MAS/QTL breeding programs, especially for long-juvenile woody plants when no other genetic information such as linkage maps and QTL is available.
Qin, Zijian; Wang, Maolin; Yan, Aixia
2017-07-01
In this study, quantitative structure-activity relationship (QSAR) models using various descriptor sets and training/test set selection methods were explored to predict the bioactivity of hepatitis C virus (HCV) NS3/4A protease inhibitors using a multiple linear regression (MLR) and a support vector machine (SVM) method. 512 HCV NS3/4A protease inhibitors and their IC50 values, all determined by the same FRET assay, were collected from the literature to build a dataset. All the inhibitors were represented with nine selected global descriptors and 12 2D property-weighted autocorrelation descriptors calculated with the program CORINA Symphony. The dataset was divided into a training set and a test set by a random and a Kohonen's self-organizing map (SOM) method. The correlation coefficients (r2) of the training and test sets were 0.75 and 0.72 for the best MLR model, and 0.87 and 0.85 for the best SVM model, respectively. In addition, a series of sub-dataset models were also developed. All the best sub-dataset models performed better than the whole-dataset models. We believe that the combination of the best sub- and whole-dataset SVM models can be used as a reliable lead-design tool for new NS3/4A protease inhibitor scaffolds in a drug discovery pipeline. Copyright © 2017 Elsevier Ltd. All rights reserved.
A Selective Review of Group Selection in High-Dimensional Models
Huang, Jian; Breheny, Patrick; Ma, Shuangge
2013-01-01
Grouping structures arise naturally in many statistical modeling problems. Several methods have been proposed for variable selection that respect grouping structure in variables. Examples include the group LASSO and several concave group selection methods. In this article, we give a selective review of group selection concerning methodological developments, theoretical properties and computational algorithms. We pay particular attention to group selection methods involving concave penalties. We address both group selection and bi-level selection methods. We describe several applications of these methods in nonparametric additive models, semiparametric regression, seemingly unrelated regressions, genomic data analysis, and genome-wide association studies. We also highlight some issues that require further study. PMID:24174707
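The mechanics behind convex group selection can be illustrated with the groupwise soft-thresholding (proximal) rule of the group LASSO, in which an entire group of coefficients is retained or zeroed together according to its Euclidean norm. A hypothetical sketch (function name and interface are ours):

```python
import math

def group_soft_threshold(beta, groups, lam):
    """Groupwise soft-thresholding, the proximal step of the group LASSO:
    beta_g <- max(0, 1 - lam / ||beta_g||_2) * beta_g for each group g."""
    out = list(beta)
    for g in groups:
        norm = math.sqrt(sum(beta[j] ** 2 for j in g))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for j in g:
            out[j] = scale * beta[j]
    return out
```

Bi-level methods differ precisely in replacing this all-or-nothing group rule with penalties that also allow sparsity within a surviving group.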
Ahn, Jae Joon; Kim, Young Min; Yoo, Keunje; Park, Joonhong; Oh, Kyong Joo
2012-11-01
For groundwater conservation and management, it is important to accurately assess groundwater pollution vulnerability. This study proposed an integrated model using ridge regression and a genetic algorithm (GA) to effectively select the major hydro-geological parameters influencing groundwater pollution vulnerability in an aquifer. The GA-Ridge regression method determined that depth to water, net recharge, topography, and the impact of vadose zone media were the hydro-geological parameters that influenced trichloroethene pollution vulnerability in a Korean aquifer. When using these selected hydro-geological parameters, the accuracy was improved for various statistical nonlinear and artificial intelligence (AI) techniques, such as multinomial logistic regression, decision trees, artificial neural networks, and case-based reasoning. These results provide a proof of concept that the GA-Ridge regression is effective at determining influential hydro-geological parameters for the pollution vulnerability of an aquifer, and in turn, improves the AI performance in assessing groundwater pollution vulnerability.
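The ridge component of a GA-Ridge scheme has a closed form once the GA has proposed a parameter subset. A hypothetical two-predictor sketch (Cramer's rule on the penalized normal equations; not the study's code, and the GA wrapper is omitted):

```python
def ridge_2feature(X, y, lam):
    """Ridge coefficients for two (centered/standardized) predictors by
    solving (X'X + lam * I) beta = X'y with Cramer's rule."""
    s00 = sum(r[0] * r[0] for r in X) + lam
    s11 = sum(r[1] * r[1] for r in X) + lam
    s01 = sum(r[0] * r[1] for r in X)
    t0 = sum(r[0] * yi for r, yi in zip(X, y))
    t1 = sum(r[1] * yi for r, yi in zip(X, y))
    det = s00 * s11 - s01 * s01
    return ((t0 * s11 - t1 * s01) / det, (s00 * t1 - s01 * t0) / det)
```

In the full scheme, a GA would search over subsets of hydro-geological parameters, scoring each subset by the fit of such a ridge model; the lam > 0 term keeps the solve stable even for correlated parameters.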
Support vector regression methodology for estimating global solar radiation in Algeria
NASA Astrophysics Data System (ADS)
Guermoui, Mawloud; Rabehi, Abdelaziz; Gairaa, Kacem; Benkaciali, Said
2018-01-01
Accurate estimation of Daily Global Solar Radiation (DGSR) has been a major goal for solar energy applications. In this paper we show the possibility of developing a simple model based on Support Vector Regression (SVM-R) that could be used to estimate DGSR on a horizontal surface in Algeria using only the sunshine ratio as input. The SVM model was developed and tested using a data set recorded over three years (2005-2007). The data were collected at the Applied Research Unit for Renewable Energies (URAER) in Ghardaïa city. The data collected in 2005-2006 were used to train the model, while the 2007 data were used to test the performance of the selected model. The measured and estimated values of DGSR were compared statistically during the testing phase using the Root Mean Square Error (RMSE), relative Root Mean Square Error (rRMSE), and correlation coefficient (r2), which amount to 1.59 MJ/m2, 8.46, and 97.4%, respectively. The obtained results show that SVM-R is well suited to DGSR estimation using only the sunshine ratio.
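The reported error metrics follow standard definitions; since the abstract does not spell out the formulas, the sketch below uses the usual ones (RMSE; relative RMSE as a percentage of the mean observed value; r2 as the squared Pearson correlation between measured and estimated DGSR). These definitions are assumptions, not quoted from the paper.

```python
import math

def rmse(obs, est):
    """Root mean square error between observed and estimated values."""
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(obs, est)) / len(obs))

def rrmse(obs, est):
    """Relative RMSE, expressed as a percentage of the mean observed value."""
    return 100.0 * rmse(obs, est) / (sum(obs) / len(obs))

def r2(obs, est):
    """Squared Pearson correlation between observed and estimated values."""
    n = len(obs)
    mo, me = sum(obs) / n, sum(est) / n
    cov = sum((o - mo) * (e - me) for o, e in zip(obs, est))
    vo = sum((o - mo) ** 2 for o in obs)
    ve = sum((e - me) ** 2 for e in est)
    return cov * cov / (vo * ve)
```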
Seroprevalence of human hydatidosis using the ELISA method in Qom Province, central Iran.
Rakhshanpour, A; Harandi, M Fasihi; Moazezi, Ss; Rahimi, Mt; Mohebali, M; Mowlavi, Ghh; Babaei, Z; Ariaeipour, M; Heidari, Z; Rokni, Mb
2012-01-01
The objective of this study was to determine the prevalence of cystic echinococcosis (CE) in Qom Province, central Iran, using the ELISA test. Overall, 1564 serum samples (800 males and 764 females) were collected from subjects selected by randomized cluster sampling in 2011-2012. Sera were analyzed by ELISA using AgB. Before sampling, a questionnaire was filled out for each case. Data were analyzed using the Chi-square test and multivariate logistic regression for risk factor analysis. Seropositivity was 1.6% (25 cases). Males (2.2%) showed significantly more positivity than females (0.9%) (P = 0.03). There was no significant association between CE seropositivity and age group, occupation, or region. The 30-60 year age group had the highest rate of positivity. The seropositivity of CE was 2.1% and 1.2% for urban and rural cases, respectively. Binary logistic regression showed that males were at 2.5 times higher risk of infection than females. Although the seroprevalence of CE is relatively low in Qom Province, due to the importance of the disease, all preventive measures should be taken into consideration.
Sensitivity and specificity of memory and naming tests for identifying left temporal-lobe epilepsy.
Umfleet, Laura Glass; Janecek, Julie K; Quasney, Erin; Sabsevitz, David S; Ryan, Joseph J; Binder, Jeffrey R; Swanson, Sara J
2015-01-01
The sensitivity and specificity of the Selective Reminding Test (SRT) Delayed Recall, Wechsler Memory Scale (WMS) Logical Memory, the Boston Naming Test (BNT), and two nonverbal memory measures for detecting lateralized dysfunction in association with side of seizure focus were examined in a sample of 143 patients with left or right temporal-lobe epilepsy (TLE). Scores on the SRT and BNT were statistically significantly lower in the left TLE group compared with the right TLE group, whereas no group differences emerged on the Logical Memory subtest. No significant group differences were found with nonverbal memory measures. When the SRT and BNT were both entered as predictors in a logistic regression, the BNT, although significant, added minimal value to the model beyond the variance accounted for by the SRT Delayed Recall. Both variables emerged as significant predictors of side of seizure focus when entered into separate regressions. Sensitivity and specificity of the SRT and BNT ranged from 56% to 65%. The WMS Logical Memory and nonverbal memory measures were not significant predictors of the side of seizure focus.
Meta-regression approximations to reduce publication selection bias.
Stanley, T D; Doucouliagos, Hristos
2014-03-01
Publication selection bias is a serious challenge to the integrity of all empirical sciences. We derive meta-regression approximations to reduce this bias. Our approach employs Taylor polynomial approximations to the conditional mean of a truncated distribution. A quadratic approximation without a linear term, precision-effect estimate with standard error (PEESE), is shown to have the smallest bias and mean squared error in most cases and to outperform conventional meta-analysis estimators, often by a great deal. Monte Carlo simulations also demonstrate how a new hybrid estimator that conditionally combines PEESE and the Egger regression intercept can provide a practical solution to publication selection bias. PEESE is easily expanded to accommodate systematic heterogeneity along with complex and differential publication selection bias that is related to moderator variables. By providing an intuitive reason for these approximations, we can also explain why the Egger regression works so well and when it does not. These meta-regression methods are applied to several policy-relevant areas of research including antidepressant effectiveness, the value of a statistical life, the minimum wage, and nicotine replacement therapy. Copyright © 2013 John Wiley & Sons, Ltd.
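A hedged sketch of the PEESE idea described above: regress the reported effect sizes on the squared standard errors (a quadratic term with no linear term), weighting by inverse variance; the intercept is the bias-corrected effect estimate. The data are toy values, and the solver is a hand-rolled weighted least squares, not the authors' implementation.

```python
def peese(effects, ses):
    # weighted least squares for effect_i = b0 + b1 * se_i^2, weights 1/se_i^2
    w = [1.0 / s ** 2 for s in ses]
    x = [s ** 2 for s in ses]
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, effects))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, effects))
    det = sw * swxx - swx * swx
    b0 = (swxx * swy - swx * swxy) / det  # corrected effect estimate
    b1 = (sw * swxy - swx * swy) / det    # selection-bias term
    return b0, b1

effects = [0.42, 0.30, 0.18, 0.12, 0.10]  # hypothetical study estimates
ses = [0.30, 0.22, 0.12, 0.08, 0.05]      # hypothetical standard errors
b0, b1 = peese(effects, ses)
print(round(b0, 3))
```

With no publication selection (effects unrelated to precision), b1 shrinks toward zero and b0 approaches the ordinary weighted mean.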
Learning accurate and interpretable models based on regularized random forests regression
2014-01-01
Background: Many biology-related research works combine data from multiple sources in an effort to understand the underlying problems. It is important to find and interpret the most important information from these sources. Thus, it would be beneficial to have an effective algorithm that can simultaneously extract decision rules and select critical features for good interpretation while preserving prediction performance. Methods: In this study, we focus on regression problems for biological data where the target outcomes are continuous. In general, models constructed by linear regression approaches are relatively easy to interpret. However, many practical biological applications are nonlinear in essence, and we can hardly find a direct linear relationship between input and output. Nonlinear regression techniques can reveal nonlinear relationships in data but are generally hard for humans to interpret. We propose a rule-based regression algorithm that uses 1-norm regularized random forests. The proposed approach simultaneously extracts a small number of rules from the generated random forests and eliminates unimportant features. Results: We tested the approach on several biological data sets. The proposed approach is able to construct a significantly smaller set of regression rules using a subset of attributes while achieving prediction performance comparable to that of random forests regression. Conclusion: It demonstrates high potential in aiding prediction and interpretation of nonlinear relationships of the subject being studied. PMID:25350120
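A sketch of the core idea above: each forest rule becomes a binary feature (1 if a sample satisfies the rule's conditions), and a 1-norm (lasso) penalty on the rule weights drives most of them to zero, leaving a small rule set. The rules, data, and the minimal coordinate-descent solver here are all illustrative, not the paper's implementation.

```python
def rule(conditions):
    # conditions: list of (feature_index, threshold, is_greater)
    def indicator(x):
        return 1.0 if all((x[i] > t) == g for i, t, g in conditions) else 0.0
    return indicator

def lasso_cd(X, y, lam, iters=200):
    # minimal coordinate descent for (1/2)||y - Xb||^2 + lam * ||b||_1, no intercept
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        for j in range(p):
            r = [y[i] - sum(beta[k] * X[i][k] for k in range(p) if k != j) for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n))
            z = sum(X[i][j] ** 2 for i in range(n))
            if z == 0:
                continue
            # soft-thresholding update: small-contribution rules get weight exactly 0
            if rho > lam:
                beta[j] = (rho - lam) / z
            elif rho < -lam:
                beta[j] = (rho + lam) / z
            else:
                beta[j] = 0.0
    return beta

rules = [rule([(0, 0.5, True)]), rule([(1, 0.3, False)]), rule([(0, 0.9, True)])]
samples = [[0.2, 0.1], [0.7, 0.5], [0.95, 0.2], [0.6, 0.8]]  # hypothetical inputs
y = [1.0, 2.0, 3.0, 2.0]                                     # hypothetical continuous outcomes
X = [[r(s) for r in rules] for s in samples]                 # rule-indicator feature matrix
beta = lasso_cd(X, y, lam=0.5)
print([round(b, 2) for b in beta])
```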
MacKenzie, R K; Dowell, J; Ayansina, D; Cleland, J A
2017-05-01
Traditional methods of assessing personality traits in medical school selection have been heavily criticised. To address this at the point of selection, "non-cognitive" tests were included in the UK Clinical Aptitude Test (UKCAT: http://www.ukcat.ac.uk/ ), the most widely used aptitude test in UK medical education. We examined the predictive validity of these non-cognitive traits for performance during and on exit from medical school. We sampled all students graduating in 2013 from the 30 UKCAT consortium medical schools. The analysis included candidate demographics, UKCAT non-cognitive scores, and medical school performance data: the Educational Performance Measure (EPM) and national exit situational judgement test (SJT) outcomes. We examined the relationships between these variables and SJT and EPM scores. Multilevel modelling was used to assess the relationships adjusting for confounders. The 3343 students who had taken the UKCAT non-cognitive tests and had both EPM and SJT data were entered into the analysis. There were four types of non-cognitive test: (1) libertarian-communitarian; (2) NACE: narcissism, aloofness, confidence and empathy; (3) MEARS: self-esteem, optimism, control, self-discipline, emotional-nondefensiveness (END) and faking; (4) an abridged version of 1 and 2 combined. Multilevel regression showed that, after correcting for demographic factors, END predicted SJT and EPM decile. Aloofness and empathy in NACE were predictive of SJT score. This is the first large-scale study examining the relationship between performance on non-cognitive selection tests and medical school exit assessments. The predictive validity of these tests was limited, and the relationships revealed do not fit neatly with theoretical expectations. This study does not support their use in selection.
Learning epistatic interactions from sequence-activity data to predict enantioselectivity
NASA Astrophysics Data System (ADS)
Zaugg, Julian; Gumulya, Yosephine; Malde, Alpeshkumar K.; Bodén, Mikael
2017-12-01
Enzymes with high selectivity are desirable for improving the economics of chemical synthesis of enantiopure compounds. To improve enzyme selectivity, mutations are often introduced near the catalytic active site. In this compact environment, epistatic interactions between residues, where contributions to selectivity are non-additive, play a significant role in determining the degree of selectivity. Using support vector machine regression models, we map mutations to the experimentally characterised enantioselectivities for a set of 136 variants of the epoxide hydrolase from the fungus Aspergillus niger (AnEH). We investigate whether the influence a mutation has on enzyme selectivity can be accurately predicted through linear models, and whether prediction accuracy can be improved using higher-order counterparts. Comparing linear and polynomial (degree = 2) models, mean Pearson coefficients (r) from 50 × 5-fold cross-validation increase from 0.84 to 0.91, respectively. Equivalent models tested on interaction-minimised sequences achieve values of r = 0.90 and r = 0.93. As expected, testing on a simulated control data set with no interactions results in no significant improvements from higher-order models. Additional experimentally derived AnEH mutants are tested with linear and polynomial (degree = 2) models, with values increasing from r = 0.51 to r = 0.87, respectively. The study demonstrates that linear models perform well; however, the representation of epistatic interactions in predictive models improves identification of selectivity-enhancing mutations. The improvement is attributed to higher-order kernel functions that represent epistatic interactions between residues.
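A sketch of why the degree-2 polynomial kernel can represent pairwise (epistatic) interactions: for mutation-indicator vectors x and z, the kernel (x·z + 1)^2 equals an inner product in a feature space that contains all pairwise products x_i * x_j, i.e. interaction terms. The vectors below are toy indicators, not AnEH data.

```python
import math

def poly2_kernel(x, z):
    # inhomogeneous polynomial kernel of degree 2
    return (sum(a * b for a, b in zip(x, z)) + 1.0) ** 2

def explicit_map(x):
    # explicit feature map of (x.z + 1)^2:
    # 1, sqrt(2)*x_i, x_i^2, and sqrt(2)*x_i*x_j for i < j (the interaction terms)
    n = len(x)
    feats = [1.0]
    feats += [math.sqrt(2) * xi for xi in x]
    feats += [xi * xi for xi in x]
    feats += [math.sqrt(2) * x[i] * x[j] for i in range(n) for j in range(i + 1, n)]
    return feats

x = [1.0, 0.0, 1.0]  # hypothetical mutation-presence indicators
z = [1.0, 1.0, 0.0]
lhs = poly2_kernel(x, z)
rhs = sum(a * b for a, b in zip(explicit_map(x), explicit_map(z)))
print(round(lhs, 6) == round(rhs, 6))  # True: the kernel equals the explicit inner product
```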
NASA Astrophysics Data System (ADS)
Bajaj, Ketan; Anbazhagan, P.
2018-01-01
Advancement in seismic networks has resulted in the formulation of different functional forms for developing new ground motion prediction equations (GMPEs) for a region. To date, various guidelines and tools are available for selecting a suitable GMPE for a seismic study area. However, these methods are efficient at quantifying a GMPE but not at determining a proper functional form or capturing the epistemic uncertainty associated with GMPE selection. In this study, the compatibility of recent functional forms for active regions is tested for distance and magnitude scaling. The analysis is carried out by determining residuals from the recorded and predicted spectral acceleration values at different periods. Mixed-effects regressions are performed on the calculated residuals to determine the intra- and interevent residuals, and spatial correlation is incorporated into the mixed-effects regression by modifying its likelihood function. Distance scaling and magnitude scaling are examined, respectively, by studying the trends of intraevent residuals with distance and the trend of the event term with magnitude; these trends are then tested statistically for each functional form. Further, a genetic algorithm and a Monte Carlo method are used, respectively, to calculate the hinge point and the standard error of magnitude and distance scaling for a newly determined functional form. The whole procedure is applied and tested on the available strong-motion data for the Himalayan region. The functional forms tested comprise five Himalayan GMPEs, five GMPEs developed under the NGA-West2 project, two from the Pan-European region, and one from Japan. It is observed that a bilinear functional form with magnitude and distance hinged at Mw 6.5 and 300 km, respectively, is suitable for the Himalayan region. Finally, a new regression coefficient for peak ground acceleration is derived for a suitable functional form that governs the attenuation characteristics of the Himalayan region.
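A minimal sketch of the bilinear (hinged) scaling term the study found suitable: a predictor whose slope changes at a hinge point (Mw 6.5 for magnitude, 300 km for distance) while remaining continuous there. The slope values below are hypothetical placeholders, not the derived GMPE coefficients.

```python
def bilinear(x, hinge, slope_low, slope_high):
    # continuous piecewise-linear term, centered so it is 0 at the hinge
    if x <= hinge:
        return slope_low * (x - hinge)
    return slope_high * (x - hinge)

MAG_HINGE = 6.5  # Mw, from the abstract
print(bilinear(MAG_HINGE, MAG_HINGE, 1.2, 0.4))  # 0.0 at the hinge
print(bilinear(5.5, MAG_HINGE, 1.2, 0.4))        # slope 1.2 below the hinge
print(bilinear(7.5, MAG_HINGE, 1.2, 0.4))        # slope 0.4 above the hinge
```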
Seo, Eun Hyun; Han, Ji Young; Sohn, Bo Kyung; Byun, Min Soo; Lee, Jun Ho; Choe, Young Min; Ahn, Suzy; Woo, Jong Inn; Jun, Jongho; Lee, Dong Young
2017-01-01
We aimed to develop a word-reading test for Korean-speaking adults using irregularly pronounced words that would be useful for estimating premorbid intelligence. A linguist who specialized in Korean phonology selected 94 words that have an irregular relationship between orthography and phonology. Sixty cognitively normal elderly (CN) and 31 patients with Alzheimer’s disease (AD) were asked to read the words out loud and were administered the Wechsler Adult Intelligence Scale, 4th edition, Korean version (K-WAIS-IV). Among the 94 words, the 50 words that did not show a significant difference between the CN and AD groups were selected and constituted the KART. Using a 30-member CN calculation group (CNc), a linear regression equation was obtained in which the observed full-scale IQ (FSIQ) was regressed on the reading errors of the KART, with education included as an additional variable. When the regression equation computed from the CNc was applied to the 30 CN individuals of the validation group (CNv), the predicted FSIQ adequately fit the observed FSIQ (R2 = 0.63). In addition, an independent-sample t-test showed that the KART-predicted IQs were not significantly different between the CNv and AD groups, whereas the performance of the AD group was significantly worse on the observed IQs. An extended validation of the KART was then performed with a separate sample consisting of 84 CN, 56 elderly with mild cognitive impairment (MCI), and 43 AD patients who were administered comprehensive neuropsychological assessments in addition to the KART. When the equation obtained from the CNc was applied to the extended validation sample, the KART-predicted IQs of the AD, MCI, and CN groups did not significantly differ, whereas their current global cognition scores differed significantly between the groups. In conclusion, the results support the validity of KART-predicted IQ as an index of premorbid IQ in individuals with AD. PMID:28723964
Benson, Rebecca; von Hippel, Paul T; Lynch, Jamie L
2017-03-21
More educated adults have lower average body mass index (BMI). This may be due to selection, if adolescents with lower BMI attain higher levels of education, or it may be due to causation, if higher educational attainment reduces BMI gain in adulthood. We test for selection and causation in the National Longitudinal Survey of Youth 1979, which has followed a representative US cohort from age 14-22 in 1979 through age 47-55 in 2012. Using ordinal logistic regression, we test the selection hypothesis that overweight and obese adolescents were less likely to earn high school diplomas and bachelor's degrees. Then, controlling for selection with individual fixed effects, we estimate the causal effect of degree completion on BMI and obesity status. Among 18-year-old women, but not among men, being overweight or obese predicts lower odds of attaining higher levels of education. At age 47-48, higher education is associated with lower BMI, but 70-90% of the association is due to selection. Net of selection, a bachelor's degree predicts less than a 1 kg reduction in body weight, and a high school credential does not reduce BMI. Copyright © 2017 Elsevier Ltd. All rights reserved.
Experiment Design for Complex VTOL Aircraft with Distributed Propulsion and Tilt Wing
NASA Technical Reports Server (NTRS)
Murphy, Patrick C.; Landman, Drew
2015-01-01
Selected experimental results from a wind tunnel study of a subscale VTOL concept with distributed propulsion and tilt lifting surfaces are presented. The vehicle complexity and automated test facility were ideal for use with a randomized designed experiment. Design of Experiments and Response Surface Methods were invoked to produce run efficient, statistically rigorous regression models with minimized prediction error. Static tests were conducted at the NASA Langley 12-Foot Low-Speed Tunnel to model all six aerodynamic coefficients over a large flight envelope. This work supports investigations at NASA Langley in developing advanced configurations, simulations, and advanced control systems.
Friedrich, Torben; Rahmann, Sven; Weigel, Wilfried; Rabsch, Wolfgang; Fruth, Angelika; Ron, Eliora; Gunzer, Florian; Dandekar, Thomas; Hacker, Jörg; Müller, Tobias; Dobrindt, Ulrich
2010-10-21
The Enterobacteriaceae comprise a large number of clinically relevant species with several individual subspecies. Overlapping virulence-associated gene pools and high overall genome plasticity often interfere with correct enterobacterial strain typing and risk assessment. Array technology offers a fast, reproducible and standardisable means for bacterial typing and thus provides many advantages for bacterial diagnostics, risk assessment and surveillance. The development of highly discriminative broad-range microbial diagnostic microarrays remains a challenge because of the marked genome plasticity of many bacterial pathogens. We developed a DNA microarray for strain typing and detection of major antimicrobial resistance genes of clinically relevant enterobacteria. For this purpose, we applied a global genome-wide probe selection strategy to 32 available complete enterobacterial genomes, combined with a regression model for pathogen classification. The discriminative power of the probe set was further tested in silico on 15 additional complete enterobacterial genome sequences. DNA microarrays based on the selected probes were used to type 92 clinical enterobacterial isolates. Phenotypic tests confirmed the array-based typing results and corroborated that the selected probes allowed correct typing and prediction of major antibiotic resistances of clinically relevant Enterobacteriaceae, down to the subspecies level, e.g. the reliable distinction of different E. coli pathotypes. Our results demonstrate that the global probe selection approach based on longest common factor statistics, together with the design of a DNA microarray with a restricted set of discriminative probes, enables robust discrimination of different enterobacterial variants and represents a proof of concept that can be adopted for diagnostics of a wide range of microbial pathogens.
Our approach circumvents misclassifications arising from the application of virulence markers, which are highly affected by horizontal gene transfer. Moreover, a broad range of pathogens have been covered by an efficient probe set size enabling the design of high-throughput diagnostics.
NASA Astrophysics Data System (ADS)
Camera, Corrado; Bruggeman, Adriana; Hadjinicolaou, Panos; Pashiardis, Stelios; Lange, Manfred A.
2014-01-01
High-resolution gridded daily data sets are essential for natural resource management and the analyses of climate changes and their effects. This study aims to evaluate the performance of 15 simple or complex interpolation techniques in reproducing daily precipitation at a resolution of 1 km2 over topographically complex areas. Methods are tested considering two different sets of observation densities and different rainfall amounts. We used rainfall data that were recorded at 74 and 145 observational stations, respectively, spread over the 5760 km2 of the Republic of Cyprus, in the Eastern Mediterranean. Regression analyses utilizing geographical copredictors and neighboring interpolation techniques were evaluated both in isolation and combined. Linear multiple regression (LMR) and geographically weighted regression methods (GWR) were tested. These included a step-wise selection of covariables, as well as inverse distance weighting (IDW), kriging, and 3D-thin plate splines (TPS). The relative rank of the different techniques changes with different station density and rainfall amounts. Our results indicate that TPS performs well for low station density and large-scale events and also when coupled with regression models. It performs poorly for high station density. The opposite is observed when using IDW. Simple IDW performs best for local events, while a combination of step-wise GWR and IDW proves to be the best method for large-scale events and high station density. This study indicates that the use of step-wise regression with a variable set of geographic parameters can improve the interpolation of large-scale events because it facilitates the representation of local climate dynamics.
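A minimal inverse distance weighting (IDW) sketch, the simple neighboring interpolator compared in the study above: each unsampled point gets a weighted average of station values, with weights decaying with distance. Station coordinates and rainfall values are hypothetical, and the power parameter p = 2 is a common default rather than the study's choice.

```python
def idw(stations, target, p=2.0):
    # stations: list of ((x, y), value); target: (x, y) to interpolate
    num, den = 0.0, 0.0
    for (sx, sy), value in stations:
        d2 = (sx - target[0]) ** 2 + (sy - target[1]) ** 2
        if d2 == 0:
            return value  # target coincides with a station: exact interpolation
        w = 1.0 / d2 ** (p / 2.0)  # weight = 1 / distance^p
        num += w * value
        den += w
    return num / den

stations = [((0.0, 0.0), 10.0), ((1.0, 0.0), 20.0), ((0.0, 1.0), 30.0)]  # hypothetical rain gauges (mm)
print(round(idw(stations, (0.0, 0.0)), 1))  # 10.0: exact at a station
print(round(idw(stations, (0.5, 0.5)), 2))  # a weighted blend of the three stations
```

In a gridded product, this function would simply be evaluated at every cell center of the 1 km2 grid.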
Arbitrator Evaluation and Selection: A Policy Capturing Approach.
1980-09-01
Lorenzo-Seva, Urbano; Ferrando, Pere J
2011-03-01
We provide an SPSS program that implements currently recommended techniques and recent developments for selecting variables in multiple linear regression analysis via the relative importance of predictors. The approach consists of: (1) optimally splitting the data for cross-validation, (2) selecting the final set of predictors to be retained in the regression equation, and (3) assessing the behavior of the chosen model using standard indices and procedures. The SPSS syntax, a short manual, and data files related to this article are available as supplemental materials from brm.psychonomic-journals.org/content/supplemental.
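A sketch of step (1) above, splitting the data into a calibration half and a cross-validation half. A simple seeded random split is shown for illustration; the program's "optimal" splitting strategy is more elaborate than this.

```python
import random

def split_data(rows, frac=0.5, seed=42):
    # shuffle indices reproducibly, then cut into calibration/validation halves
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    cut = int(len(rows) * frac)
    calibration = [rows[i] for i in idx[:cut]]
    validation = [rows[i] for i in idx[cut:]]
    return calibration, validation

rows = [(x, 2 * x + 1) for x in range(10)]  # hypothetical (predictor, outcome) pairs
calib, valid = split_data(rows)
print(len(calib), len(valid))  # 5 5
```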
The cross-validated AUC for MCP-logistic regression with high-dimensional data.
Jiang, Dingfeng; Huang, Jian; Zhang, Ying
2013-10-01
We propose a cross-validated area under the receiver operating characteristic (ROC) curve (CV-AUC) criterion for tuning parameter selection for penalized methods in sparse, high-dimensional logistic regression models. We use this criterion in combination with the minimax concave penalty (MCP) method for variable selection. The CV-AUC criterion is specifically designed to optimize classification performance for binary outcome data. To implement the proposed approach, we derive an efficient coordinate descent algorithm to compute the MCP-logistic regression solution surface. Simulation studies are conducted to evaluate the finite sample performance of the proposed method and to compare it with existing methods, including the Akaike information criterion (AIC), Bayesian information criterion (BIC), and extended BIC (EBIC). The model selected based on the CV-AUC criterion tends to have a larger predictive AUC and smaller classification error than those with tuning parameters selected using the AIC, BIC, or EBIC. We illustrate the application of MCP-logistic regression with the CV-AUC criterion on three microarray datasets from studies that attempt to identify genes related to cancers. Our simulation studies and data examples demonstrate that CV-AUC is an attractive method for tuning parameter selection for penalized methods in high-dimensional logistic regression models.
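A sketch of the AUC computation at the heart of the CV-AUC criterion: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counting one half. Scores and labels are toy values, not from the paper; the pairwise formula is O(n^2) and shown only for clarity.

```python
def auc(labels, scores):
    # rank-based AUC: fraction of positive/negative pairs ordered correctly
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties split the credit
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]  # hypothetical predicted probabilities
print(round(auc(labels, scores), 3))  # 0.889 (8 of 9 pairs correctly ordered)
```

In the CV-AUC criterion, this quantity is computed on held-out folds for each candidate tuning parameter, and the parameter maximizing it is selected.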
Olivera, André Rodrigues; Roesler, Valter; Iochpe, Cirano; Schmidt, Maria Inês; Vigo, Álvaro; Barreto, Sandhi Maria; Duncan, Bruce Bartholow
2017-01-01
Type 2 diabetes is a chronic disease associated with a wide range of serious health complications that have a major impact on overall health. The aims here were to develop and validate predictive models for detecting undiagnosed diabetes using data from the Longitudinal Study of Adult Health (ELSA-Brasil) and to compare the performance of different machine-learning algorithms in this task. This was a comparison of machine-learning algorithms for developing predictive models using data from ELSA-Brasil. After selecting a subset of 27 candidate variables from the literature, models were built and validated in four sequential steps: (i) parameter tuning with tenfold cross-validation, repeated three times; (ii) automatic variable selection using forward selection, a wrapper strategy with four different machine-learning algorithms and tenfold cross-validation (repeated three times) to evaluate each subset of variables; (iii) error estimation of model parameters with tenfold cross-validation, repeated ten times; and (iv) generalization testing on an independent dataset. The models were created with the following machine-learning algorithms: logistic regression, artificial neural network, naïve Bayes, K-nearest neighbor, and random forest. The best models were created using artificial neural networks and logistic regression; these achieved mean areas under the curve of, respectively, 75.24% and 74.98% in the error estimation step and 74.17% and 74.41% in the generalization testing step. Most of the predictive models produced similar results and demonstrated the feasibility of identifying the individuals with the highest probability of having undiagnosed diabetes from easily obtained clinical data.
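A hedged sketch of the forward-selection wrapper in step (ii): greedily add whichever candidate variable most improves a scoring function, stopping when no candidate helps. The scorer below is a toy stand-in with made-up variable names and gains; the study instead scored each subset with cross-validated machine-learning models.

```python
def forward_select(candidates, score_fn):
    # greedy forward selection: grow the subset while the score keeps improving
    selected = []
    best = score_fn(selected)
    improved = True
    while improved and candidates:
        improved = False
        gains = [(score_fn(selected + [c]), c) for c in candidates]
        top_score, top_var = max(gains)
        if top_score > best:
            selected.append(top_var)
            candidates = [c for c in candidates if c != top_var]
            best = top_score
            improved = True
    return selected, best

# toy scorer: each informative variable adds a fixed gain, noise variables hurt
useful = {"age": 0.05, "bmi": 0.04, "glucose_hist": 0.08}
score = lambda chosen: 0.5 + sum(useful.get(v, -0.01) for v in chosen)
sel, s = forward_select(["age", "bmi", "noise1", "glucose_hist"], score)
print(sel)  # ['glucose_hist', 'age', 'bmi'] -- the noise variable is never added
```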
Rostami, Reza; Sadeghi, Vahid; Zarei, Jamileh; Haddadi, Parvaneh; Mohazzab-Torabi, Saman; Salamati, Payman
2013-04-01
The aim of this study was to compare the Persian version of the Wechsler Intelligence Scale for Children - Fourth Edition (WISC-IV) and the Cognitive Assessment System (CAS) tests, to determine the correlation between their scales, and to evaluate the probable concurrent validity of these tests in patients with learning disorders. One hundred sixty-two children with learning disorders who presented at the Atieh Comprehensive Psychiatry Center were selected in a consecutive non-randomized order. All of the patients were assessed with the WISC-IV and CAS questionnaires. The Pearson correlation coefficient was used to analyze the correlation between the data and to assess the concurrent validity of the two tests. Linear regression was used for statistical modeling. The maximum type I error rate was set at 5%. There was a strong correlation between the total score of the WISC-IV test and the total score of the CAS test in the patients (r = 0.75, P < 0.001). The correlations among the other scales were mostly high, and all of them were statistically significant (P < 0.001). A linear regression model was obtained (α = 0.51, β = 0.81, P < 0.001). There is an acceptable correlation between the WISC-IV scales and the CAS test in children with learning disorders. Concurrent validity is established between the two tests and their scales.
Advanced colorectal neoplasia risk stratification by penalized logistic regression.
Lin, Yunzhi; Yu, Menggang; Wang, Sijian; Chappell, Richard; Imperiale, Thomas F
2016-08-01
Colorectal cancer is the second leading cause of death from cancer in the United States. To facilitate the efficiency of colorectal cancer screening, there is a need to stratify risk for colorectal cancer among the 90% of US residents who are considered "average risk." In this article, we investigate such risk stratification rules for advanced colorectal neoplasia (colorectal cancer and advanced, precancerous polyps). We use a recently completed large cohort study of subjects who underwent a first screening colonoscopy. Logistic regression models have been used in the literature to estimate the risk of advanced colorectal neoplasia based on quantifiable risk factors. However, logistic regression may be prone to overfitting and instability in variable selection. Since most of the risk factors in our study have several categories, it was tempting to collapse these categories into fewer risk groups. We propose a penalized logistic regression method that automatically and simultaneously selects variables, groups categories, and estimates their coefficients by penalizing the L1-norm of both the coefficients and their differences. Hence, it encourages sparsity in the categories, i.e. grouping of the categories, and sparsity in the variables, i.e. variable selection. We apply the penalized logistic regression method to our data. The important variables are selected, with close categories simultaneously grouped, by penalized regression models with and without the interaction terms. The models are validated with 10-fold cross-validation. The receiver operating characteristic curves of the penalized regression models dominate the receiver operating characteristic curve of naive logistic regression, indicating superior discriminative performance. © The Author(s) 2013.
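A sketch of the penalty described above: an L1-norm on the coefficients (dropping whole variables) plus an L1-norm on differences between coefficients of adjacent categories (fusing neighboring categories into one risk group). The coefficient values and tuning weights are illustrative only; only the penalty's form follows the abstract.

```python
def grouping_penalty(coefs, lam1, lam2):
    # coefs: ordered category coefficients of one risk factor
    sparsity = sum(abs(b) for b in coefs)  # drives coefficients to 0 (variable selection)
    fusion = sum(abs(coefs[k + 1] - coefs[k])
                 for k in range(len(coefs) - 1))  # drives adjacent categories to merge
    return lam1 * sparsity + lam2 * fusion

# four ordered categories; the middle two share a coefficient, i.e. they are "grouped"
coefs = [0.0, 0.3, 0.3, 0.7]
print(round(grouping_penalty(coefs, lam1=1.0, lam2=1.0), 6))  # 2.0
```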
NASA Technical Reports Server (NTRS)
Colwell, R. N. (Principal Investigator)
1984-01-01
The geometric quality of TM film and digital products is evaluated by making selective photomeasurements and by measuring the coordinates of known features on both the TM products and map products. These paired observations are related using a standard linear least squares regression approach. Using regression equations and coefficients developed from 225 (TM film product) and 20 (TM digital product) control points, map coordinates of test points are predicted. Residual error vectors were computed, and an analysis of variance (ANOVA) was performed on the east and north residuals using nine image segments (blocks) as treatments. Based on the root mean square error of the 223 (TM film product) and 22 (TM digital product) test points, users of TM data can expect the planimetric accuracy of mapped points to be within 91 meters and 117 meters for the film products, and within 12 meters and 14 meters for the digital products.
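The control-point regression can be sketched as an affine least-squares mapping from image to map coordinates, with accuracy summarized by the root-mean-square length of the residual error vectors. The control points, scale factors, and noise level below are invented for illustration and are not the study's values:

```python
import numpy as np

# Hypothetical control points: image (col, row) vs. map (east, north).
rng = np.random.default_rng(1)
img = rng.uniform(0, 6000, size=(225, 2))
A_true = np.array([[28.5, 0.3], [-0.2, 28.5]])   # assumed pixel-to-meter map
b_true = np.array([500_000.0, 5_300_000.0])      # assumed map-frame offset
map_xy = img @ A_true.T + b_true + rng.normal(0, 30, size=(225, 2))

# Fit (east, north) = A @ (col, row) + b by ordinary least squares.
X = np.hstack([img, np.ones((len(img), 1))])
coef, *_ = np.linalg.lstsq(X, map_xy, rcond=None)

# Root-mean-square length of the residual error vectors at the control points.
resid = map_xy - X @ coef
rmse = np.sqrt((resid ** 2).sum(axis=1).mean())
```

Map coordinates for independent test points would then be predicted as `np.hstack([test_img, ones]) @ coef`, with planimetric accuracy assessed by the same RMSE statistic.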
Cohen, Jérémie F.; Cohen, Robert; Bidet, Philippe; Elbez, Annie; Levy, Corinne; Bossuyt, Patrick M.; Chalumeau, Martin
2017-01-01
Background There is controversy whether physicians can rely on signs and symptoms to select children with pharyngitis who should undergo a rapid antigen detection test (RADT) for group A streptococcus (GAS). Our objective was to evaluate the efficiency of signs and symptoms in selectively testing children with pharyngitis. Materials and methods In this multicenter, prospective, cross-sectional study, French primary care physicians collected clinical data and double throat swabs from 676 consecutive children with pharyngitis; the first swab was used for the RADT and the second was used for a throat culture (reference standard). We developed a logistic regression model combining signs and symptoms with GAS as the outcome. We then derived a model-based selective testing strategy, assuming that children with low and high calculated probability of GAS (<0.12 and >0.85) would be managed without the RADT. Main outcomes and measures were performance of the model (c-index and calibration) and efficiency of the model-based strategy (proportion of participants in whom RADT could be avoided). Results Throat culture was positive for GAS in 280 participants (41.4%). Out of 17 candidate signs and symptoms, eight were retained in the prediction model. The model had an optimism-corrected c-index of 0.73; calibration of the model was good. With the model-based strategy, RADT could be avoided in 6.6% of participants (95% confidence interval 4.7% to 8.5%), as compared to a RADT-for-all strategy. Conclusions This study demonstrated that relying on signs and symptoms for selectively testing children with pharyngitis is not efficient. We recommend using a RADT in all children with pharyngitis. PMID:28235012
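The c-index reported above is, for a binary outcome, the probability that a randomly chosen GAS-positive child receives a higher predicted probability than a randomly chosen GAS-negative one (equivalently, the area under the ROC curve). A minimal sketch with made-up scores (not the study's data):

```python
import numpy as np

def c_index(y, p):
    """Concordance of predicted probabilities p with binary outcomes y.
    Ties are counted as half-concordant (equivalent to ROC AUC)."""
    pos, neg = p[y == 1], p[y == 0]
    wins = (pos[:, None] > neg[None, :]).sum()   # concordant pairs
    ties = (pos[:, None] == neg[None, :]).sum()  # tied pairs
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# Toy example: three GAS-positive and three GAS-negative children.
y = np.array([1, 1, 1, 0, 0, 0])
p = np.array([0.9, 0.8, 0.6, 0.3, 0.8, 0.1])
cidx = c_index(y, p)   # → 7.5/9 ≈ 0.833
```

A model that perfectly separates the two groups would score 1.0; a model no better than chance would score about 0.5, which puts the study's optimism-corrected 0.73 in context.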
The role of multidimensional attentional abilities in academic skills of children with ADHD.
Preston, Andrew S; Heaton, Shelley C; McCann, Sarah J; Watson, William D; Selke, Gregg
2009-01-01
Despite reports of academic difficulties in children with attention-deficit/hyperactivity disorder (ADHD), little is known about the relationship between performance on tests of academic achievement and measures of attention. The current study assessed intellectual ability, parent-reported inattention, academic achievement, and attention in 45 children (ages 7-15) diagnosed with ADHD. Hierarchical regressions were performed with selective, sustained, and attentional control/switching domains of the Test of Everyday Attention for Children as predictor variables and with performance on the Wechsler Individual Achievement Test-Second Edition as dependent variables. It was hypothesized that sustained attention and attentional control/switching would predict performance on achievement tests. Results demonstrate that attentional control/switching accounted for a significant amount of variance in all academic areas (reading, math, and spelling), even after accounting for verbal IQ and parent-reported inattention. Sustained attention predicted variance only in math, whereas selective attention did not account for variance in any achievement domain. Therefore, attentional control/switching, which involves components of executive functions, plays an important role in academic performance.
Balogun, Anthony Gbenro; Balogun, Shyngle Kolawole; Onyencho, Chidi Victor
2017-02-13
This study investigated the moderating role of achievement motivation in the relationship between test anxiety and academic performance. Three hundred and ninety-three participants (192 males and 201 females), selected from a public university in Ondo State, Nigeria using a purposive sampling technique, participated in the study. They responded to measures of test anxiety and achievement motivation. Three hypotheses were tested using moderated hierarchical multiple regression analysis. Results showed that test anxiety had a negative impact on academic performance (β = -.23; p < .05). Achievement motivation had a positive impact on academic performance (β = .38; p < .05). Also, achievement motivation significantly moderated the relationship between test anxiety and academic performance (β = .10; p < .01). These findings suggest that university management should design appropriate psycho-educational interventions that would enhance students' achievement motivation.
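Moderation of this kind is tested by adding a product (interaction) term to the regression. The sketch below simulates data whose true coefficients mimic the reported betas (the data themselves are invented) and recovers them by ordinary least squares:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 393  # same sample size as the study; the data are simulated
anxiety = rng.normal(size=n)
motivation = rng.normal(size=n)
# Simulated performance: anxiety hurts, motivation helps, and motivation
# buffers (moderates) the anxiety effect through the interaction term.
perf = (-0.23 * anxiety + 0.38 * motivation
        + 0.10 * anxiety * motivation
        + rng.normal(scale=0.5, size=n))

# Moderated regression: predictors, their product, and an intercept column.
X = np.column_stack([anxiety, motivation, anxiety * motivation, np.ones(n)])
beta, *_ = np.linalg.lstsq(X, perf, rcond=None)
```

A significant coefficient on the product column (`beta[2]`) is what "achievement motivation moderated the relationship" means operationally; in practice the predictors are usually mean-centered first so the main effects stay interpretable.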
Two Paradoxes in Linear Regression Analysis.
Feng, Ge; Peng, Jing; Tu, Dongke; Zheng, Julia Z; Feng, Changyong
2016-12-25
Regression is one of the favorite tools in applied statistics. However, misuse and misinterpretation of results from regression analysis are common in biomedical research. In this paper we use statistical theory and simulation studies to clarify some paradoxes around this popular statistical method. In particular, we show that a widely used model selection procedure employed in many publications in top medical journals is wrong. Formal procedures based on solid statistical theory should be used in model selection.
Mendak-Ziółko, Magdalena; Konopka, Tomasz; Bogucki, Zdzisław Artur
2012-09-01
The objective of this study was to identify, among an array of potential risk factors for burning mouth syndrome (BMS), those that are potentially the most significant in the development of the disease. Sixty-three participants, divided into group I (with BMS: 33 patients ages 41 to 82 years [mean age: 61.5 ± 9.4]) and group II (without BMS: 30 healthy volunteers ages 42-83 years [mean age: 60.5 ± 10.5]) were studied. All underwent a dental examination and psychological tests. Neurological tests (neurophysiological test, electroneurography, and tests of the autonomic nervous system) were performed. Mean parameters were analyzed by Student t test, Kruskal-Wallis test, and χ² test, and multifactor analysis was performed with logistic regression and by calculating the odds ratio. In the logistic regression test, 3 factors were significant in the etiopathogenesis of BMS: a value more than 39 μV for the amplitude of the positive peak of the potential induced by stimulating the trigeminal nerve on the left side (P2-L); a value above 5.96 ms for the latency of wave V of the brainstem auditory evoked potentials on the right side (V-R); and a value over 2.35 ms for the latency of the sensory ulnar nerve response. The BMS sufferer was characterized as having mild sensory and autonomic small fiber neuropathy with concomitant central disorders. Copyright © 2012 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Mekanik, F.; Imteaz, M. A.; Gato-Trinidad, S.; Elmahdi, A.
2013-10-01
In this study, the application of Artificial Neural Networks (ANN) and multiple regression analysis (MR) to forecast long-term seasonal spring rainfall in Victoria, Australia was investigated using lagged El Nino Southern Oscillation (ENSO) and Indian Ocean Dipole (IOD) as potential predictors. The use of dual (combined lagged ENSO-IOD) input sets for calibrating and validating ANN and MR models is proposed to investigate the simultaneous effect of past values of these two major climate modes on long-term spring rainfall prediction. The MR models that did not violate the limits of statistical significance and multicollinearity were selected for future spring rainfall forecast. The ANN was developed in the form of a multilayer perceptron using the Levenberg-Marquardt algorithm. Both MR and ANN modelling were assessed statistically using mean square error (MSE), mean absolute error (MAE), Pearson correlation (r) and the Willmott index of agreement (d). The developed MR and ANN models were tested on out-of-sample test sets; the MR models showed very poor generalisation ability for east Victoria with correlation coefficients of -0.99 to -0.90 compared to ANN with correlation coefficients of 0.42-0.93; ANN models also showed better generalisation ability for central and west Victoria with correlation coefficients of 0.68-0.85 and 0.58-0.97 respectively. The ability of multiple regression models to forecast out-of-sample sets is compatible with ANN for Daylesford in central Victoria and Kaniva in west Victoria (r = 0.92 and 0.67 respectively). The errors of the testing sets for ANN models are generally lower compared to multiple regression models. The statistical analysis suggests the potential of ANN over MR models for rainfall forecasting using large-scale climate modes.
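The four skill scores used above (MSE, MAE, Pearson r, and the Willmott index of agreement d) can be sketched directly; the Willmott index compares squared forecast errors against the "potential error" around the observed mean. The rainfall numbers below are illustrative, not the study's:

```python
import numpy as np

def forecast_skill(obs, pred):
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    mse = np.mean((obs - pred) ** 2)      # mean square error
    mae = np.mean(np.abs(obs - pred))     # mean absolute error
    r = np.corrcoef(obs, pred)[0, 1]      # Pearson correlation
    om = obs.mean()
    # Willmott index of agreement: 1 = perfect, 0 = no agreement.
    d = 1.0 - ((obs - pred) ** 2).sum() / (
        (np.abs(pred - om) + np.abs(obs - om)) ** 2).sum()
    return mse, mae, r, d

obs = np.array([80.0, 95.0, 60.0, 110.0])    # e.g. observed spring rainfall (mm)
pred = np.array([75.0, 100.0, 70.0, 105.0])  # model forecasts (illustrative)
mse, mae, r, d = forecast_skill(obs, pred)
```

Unlike r, the Willmott d penalizes bias as well as scatter, which is why the two metrics are often reported together for forecast validation.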
Wang, LiQiang; Li, CuiFeng
2014-10-01
A genetic algorithm (GA) coupled with multiple linear regression (MLR) was used to extract useful features from amino acids and g-gap dipeptides for distinguishing between thermophilic and non-thermophilic proteins. The method was trained by a benchmark dataset of 915 thermophilic and 793 non-thermophilic proteins. The method reached an overall accuracy of 95.4 % in a jackknife test using nine amino acids, 38 0-gap dipeptides and 29 1-gap dipeptides. The accuracy as a function of protein size ranged between 85.8 and 96.9 %. The overall accuracies of three independent tests were 93, 93.4 and 91.8 %. The observed results of detecting thermophilic proteins suggest that the GA-MLR approach described herein should be a powerful method for selecting features that describe thermostability and should aid in the design of more stable proteins.
Caffarra, Paolo; Ghetti, Caterina; Ruffini, Livia; Spallazzi, Marco; Spotti, Annamaria; Barocco, Federica; Guzzo, Caterina; Marchi, Massimo; Gardini, Simona
2016-01-01
Free and Cued Selective Reminding Test (FCSRT) measures immediate and delayed episodic memory and cueing sensitivity and is suitable to detect prodromal Alzheimer's disease (AD). The present study aimed at investigating the segregation effect of FCSRT scores on brain metabolism of memory-related structures, usually affected by AD pathology, in the Mild Cognitive Impairment (MCI) stage. A cohort of forty-eight MCI patients underwent FCSRT and 18F-FDG-PET. Multiple regression analysis showed that Immediate Free Recall correlated with brain metabolism in the bilateral anterior cingulate and delayed free recall with the left anterior cingulate and medial frontal gyrus, whereas semantic cueing sensitivity with the left posterior cingulate. FCSRT in MCI is associated with neuro-functional activity of specific regions of memory-related structures connected to hippocampal formation, such as the cingulate cortex, usually damaged in AD.
Sparse partial least squares regression for simultaneous dimension reduction and variable selection
Chun, Hyonho; Keleş, Sündüz
2010-01-01
Partial least squares regression has been an alternative to ordinary least squares for handling multicollinearity in several areas of scientific research since the 1960s. It has recently gained much attention in the analysis of high dimensional genomic data. We show that known asymptotic consistency of the partial least squares estimator for a univariate response does not hold with the very large p and small n paradigm. We derive a similar result for a multivariate response regression with partial least squares. We then propose a sparse partial least squares formulation which aims simultaneously to achieve good predictive performance and variable selection by producing sparse linear combinations of the original predictors. We provide an efficient implementation of sparse partial least squares regression and compare it with well-known variable selection and dimension reduction approaches via simulation experiments. We illustrate the practical utility of sparse partial least squares regression in a joint analysis of gene expression and genomewide binding data. PMID:20107611
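Standard (non-sparse) partial least squares for a univariate response can be sketched with the NIPALS algorithm, which extracts latent components that maximize covariance with the response; the sparse variant proposed above additionally zeroes out loadings, which this sketch does not do. The latent-variable data below are invented to mimic a collinear setting:

```python
import numpy as np

def pls1(X, y, ncomp=2):
    """NIPALS PLS1: latent components chosen to covary maximally with y."""
    Xm, ym = X.mean(0), y.mean()
    Xc, yc = X - Xm, y - ym
    W, P, Q = [], [], []
    for _ in range(ncomp):
        w = Xc.T @ yc                    # weight vector ∝ cov(X, y)
        w /= np.linalg.norm(w)
        t = Xc @ w                       # component scores
        tt = t @ t
        p = Xc.T @ t / tt                # X loadings
        q = yc @ t / tt                  # y loading
        Xc = Xc - np.outer(t, p)         # deflate X
        yc = yc - q * t                  # deflate y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    B = W @ np.linalg.solve(P.T @ W, Q)  # regression vector in original space
    return B, Xm, ym

def pls1_predict(model, X):
    B, Xm, ym = model
    return (X - Xm) @ B + ym

# Collinear predictors driven by two latent factors (illustrative data).
rng = np.random.default_rng(4)
T = rng.normal(size=(100, 2))
L = rng.normal(size=(2, 20))
X = T @ L + 0.01 * rng.normal(size=(100, 20))
y = T[:, 0] - 2.0 * T[:, 1] + 0.01 * rng.normal(size=100)

model = pls1(X, y, ncomp=2)
yhat = pls1_predict(model, X)
```

Because the 20 predictors are nearly rank-2, ordinary least squares is ill-conditioned here, while two PLS components recover the signal; this is the multicollinearity setting the abstract describes.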
Fuchs, P C; Barry, A L; Thornsberry, C; Gavan, T L; Jones, R N
1983-01-01
Augmentin (Beecham Laboratories, Bristol, Tenn.), a combination drug consisting of two parts amoxicillin to one part clavulanic acid, a potent beta-lactamase inhibitor, was evaluated in vitro in comparison with ampicillin or amoxicillin or both for its inhibitory and bactericidal activities against selected clinical isolates. Regression analysis was performed and tentative disk diffusion susceptibility breakpoints were determined. A multicenter performance study of the disk diffusion test was conducted with three quality control organisms to determine tentative quality control limits. All methicillin-susceptible staphylococci and Haemophilus influenzae isolates were susceptible to Augmentin, although the minimal inhibitory concentrations for beta-lactamase-producing strains of both groups were, on the average, fourfold higher than those for enzyme-negative strains. Among the Enterobacteriaceae, Augmentin exhibited significantly greater activity than did ampicillin against Klebsiella pneumoniae, Citrobacter diversus, Proteus vulgaris, and about one-third of the Escherichia coli strains tested. Bactericidal activity usually occurred at the minimal inhibitory concentration. There was a slight inoculum concentration effect on the Augmentin minimal inhibitory concentrations. On the basis of regression and error rate-bounded analyses, the suggested interpretive disk diffusion susceptibility breakpoints for Augmentin are: susceptible, greater than or equal to 18 mm; resistant, less than or equal to 13 mm (gram-negative bacilli); and susceptible, greater than or equal to 20 mm (staphylococci and H. influenzae). The use of a beta-lactamase-producing organism, such as E. coli Beecham 1532, is recommended for quality assurance of Augmentin susceptibility testing. PMID:6625554
Lee, SeokHyun; Cho, KwangHyun; Park, MiNa; Choi, TaeJung; Kim, SiDong; Do, ChangHee
2016-01-01
This study was conducted to estimate the genetic parameters of β-hydroxybutyrate (BHBA) and acetone concentration in milk by Fourier transform infrared spectroscopy along with test-day milk production traits including fat %, protein % and milk yield based on monthly samples of milk obtained as part of a routine milk recording program in Korea. Additionally, the feasibility of using such data in the official dairy cattle breeding system for selection of cows with low susceptibility of ketosis was evaluated. A total of 57,190 monthly test-day records for parities 1, 2, and 3 of 7,895 cows with pedigree information were collected from April 2012 to August 2014 from herds enrolled in the Korea Animal Improvement Association. Multi-trait random regression models were separately applied to estimate genetic parameters of test-day records for each parity. The model included fixed herd test-day effects, calving age and season effects, and random regressions for additive genetic and permanent environmental effects. The abundant variation in milk acetone concentration may provide a more sensitive indication of ketosis than milk BHBA, for which many observations were zero. Heritabilities of milk BHBA levels ranged from 0.04 to 0.17 with a mean of 0.09 for the interval between 4 and 305 days in milk during three lactations. The average heritabilities for milk acetone concentration were 0.29, 0.29, and 0.22 for parities 1, 2, and 3, respectively. There was no clear genetic association of the concentration of two ketone bodies with three test-day milk production traits, even if some correlations among breeding values of the test-day records in this study were observed. These results suggest that genetic selection for low susceptibility of ketosis in early lactation is possible. Further, it is desirable for the breeding scheme of dairy cattle to include the records of milk acetone rather than the records of milk BHBA. PMID:27608643
NASA Astrophysics Data System (ADS)
Mohd. Rijal, Omar; Mohd. Noor, Norliza; Teng, Shee Lee
A statistical method of comparing two digital chest radiographs of pulmonary tuberculosis (PTB) patients has been proposed. After applying appropriate image registration procedures, a selected subset of each image is converted to an image histogram (or box plot). Comparing two chest X-ray images is then equivalent to directly comparing the two corresponding histograms. From each histogram, eleven percentiles (of image intensity) are calculated. The number of percentiles that shift to the left (NLSP) when the second image is compared with the first has been shown to be an indicator of patients' progress. In this study, the values of NLSP are compared with the actual diagnosis (Y) of several medical practitioners. A logistic regression model is used to study the relationship between NLSP and Y. This study showed that NLSP may be used as an alternative or second opinion for Y. The proposed regression model also shows that important explanatory variables such as outcomes of the sputum test (Z) and degree of image registration (W) may be omitted when estimating Y-values.
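The NLSP statistic can be sketched directly from two registered intensity samples. Which eleven percentiles the authors used is not stated here, so the deciles 0-100 are assumed for illustration, and the images are synthetic:

```python
import numpy as np

ELEVEN_PERCENTILES = np.linspace(0, 100, 11)   # assumed choice: 0, 10, ..., 100

def nlsp(first_img, second_img, qs=ELEVEN_PERCENTILES):
    """Number of intensity percentiles that shift left (decrease)
    when the second image is compared with the first."""
    p1 = np.percentile(np.ravel(first_img), qs)
    p2 = np.percentile(np.ravel(second_img), qs)
    return int(np.sum(p2 < p1))

# Toy example: a uniformly darker follow-up image shifts every percentile left.
rng = np.random.default_rng(5)
img1 = rng.normal(128, 20, size=(64, 64))
img2 = img1 - 10.0
```

NLSP therefore ranges from 0 (no leftward shift) to 11 (the whole intensity distribution moved left), giving a single ordinal predictor for the logistic regression against the clinicians' diagnosis Y.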
Zhang, Yan; Zou, Hong-Yan; Shi, Pei; Yang, Qin; Tang, Li-Juan; Jiang, Jian-Hui; Wu, Hai-Long; Yu, Ru-Qin
2016-01-01
Determination of benzo[a]pyrene (BaP) in cigarette smoke can be very important for tobacco quality control and the assessment of its harm to human health. In this study, mid-infrared spectroscopy (MIR) coupled with a chemometric algorithm (DPSO-WPT-PLS), based on the wavelet packet transform (WPT), the discrete particle swarm optimization algorithm (DPSO) and partial least squares regression (PLS), was used to quantify the harmful ingredient benzo[a]pyrene in cigarette mainstream smoke with promising results. Furthermore, the proposed method provided better performance compared to several other chemometric models, i.e., PLS, radial basis function-based PLS (RBF-PLS), PLS with stepwise regression variable selection (Stepwise-PLS) as well as WPT-PLS with informative wavelet coefficients selected by correlation coefficient test (rtest-WPT-PLS). It can be expected that the proposed strategy could become a new effective, rapid quantitative analysis technique for analyzing the harmful ingredient BaP in cigarette mainstream smoke. Copyright © 2015 Elsevier B.V. All rights reserved.
Brenn, T; Arnesen, E
1985-01-01
For comparative evaluation, discriminant analysis, logistic regression and Cox's model were used to select risk factors for total and coronary deaths among 6595 men aged 20-49 followed for 9 years. Groups with mortality between 5 and 93 per 1000 were considered. Discriminant analysis selected variable sets only marginally different from the logistic and Cox methods which always selected the same sets. A time-saving option, offered for both the logistic and Cox selection, showed no advantage compared with discriminant analysis. Analysing more than 3800 subjects, the logistic and Cox methods consumed, respectively, 80 and 10 times more computer time than discriminant analysis. When including the same set of variables in non-stepwise analyses, all methods estimated coefficients that in most cases were almost identical. In conclusion, discriminant analysis is advocated for preliminary or stepwise analysis, otherwise Cox's method should be used.
Fan, Wenzhe; Zhang, Yu; Carr, Peter W; Rutan, Sarah C; Dumarey, Melanie; Schellinger, Adam P; Pritts, Wayne
2009-09-18
Fourteen judiciously selected reversed phase columns were tested with 18 cationic drug solutes under the isocratic elution conditions advised in the Snyder-Dolan (S-D) hydrophobic subtraction method of column classification. The standard errors (S.E.) of the least squares regressions of log k' vs. log k'(REF) were obtained for a given column against a reference column and used to compare and classify columns based on their selectivity. The results are consistent with those obtained with a study of the 16 test solutes recommended by Snyder and Dolan. To the extent these drugs are representative, these results show that the S-D classification scheme is also generally applicable to pharmaceuticals under isocratic conditions. That is, those columns judged to be similar based on the 16 S-D solutes were similar based on the 18 drugs; furthermore those columns judged to have significantly different selectivities based on the 16 S-D probes appeared to be quite different for the drugs as well. Given that the S-D method has been used to classify more than 400 different types of reversed phases the extension to cationic drugs is a significant finding.
Williams-Sether, Tara
2004-01-01
The Dakota Water Resources Act, passed by the U.S. Congress on December 15, 2000, authorized the Secretary of the Interior to conduct a comprehensive study of future water-quantity and quality needs of the Red River of the North Basin in North Dakota and possible options to meet those water needs. Previous Red River of the North Basin studies conducted by the Bureau of Reclamation used streamflow and water-quality databases developed by the U.S. Geological Survey that included data for 1931-84. As a result of the recent congressional authorization and results of previous studies by the Bureau of Reclamation, redevelopment of the streamflow and water-quality databases with current data through 1999 is needed in order to evaluate and predict the water-quantity and quality effects within the Red River of the North Basin. This report provides updated statistical summaries of selected water-quality constituents and streamflow and the regression relations between them. Available data for 1931-99 were used to develop regression equations between 5 selected water-quality constituents and streamflow for 38 gaging stations in the Red River of the North Basin. The water-quality constituents that were regressed against streamflow were hardness (as CaCO3), sodium, chloride, sulfate, and dissolved solids. Statistical summaries of the selected water-quality constituents and streamflow for the gaging stations used in the regression equations development and the applications and limitations of the regression equations are presented in this report.
[Analysis on willingness to pay for HIV antibody saliva rapid test and related factors].
Li, Junjie; Huo, Junli; Cui, Wenqing; Zhang, Xiujie; Hu, Yi; Su, Xingfang; Zhang, Wanyue; Li, Youfang; Shi, Yuhua; Jia, Manhong
2015-02-01
To understand the willingness to pay for an HIV antibody saliva rapid test and its influencing factors among people seeking HIV counseling and testing, STD clinic patients, university students, migrant people, female sex workers (FSWs), men who have sex with men (MSM) and injecting drug users (IDUs). An anonymous questionnaire survey was conducted among 511 subjects in the 7 groups selected by different sampling methods, and 509 valid questionnaires were collected. The majority of subjects were males (54.8%) and aged 20-29 years (41.5%). Among the subjects, 60.3% had an education level of high school or above, 55.4% were unmarried, 37.3% were unemployed, 73.3% had a monthly expenditure <2 000 Yuan RMB, 44.2% had received an HIV test, 28.3% knew of the HIV saliva test, 21.0% were willing to receive an HIV saliva test, 2.0% had received an HIV saliva test, only 1.0% had bought an HIV test kit for self-testing, and 84.1% were willing to pay for an HIV antibody saliva rapid test. Univariate logistic regression analysis indicated that subject group, age, education level, employment status, monthly expenditure level, HIV test experience and willingness to receive an HIV saliva test were statistically correlated with willingness to pay for the HIV antibody saliva rapid test. Multivariate logistic regression analysis showed that subject group and monthly expenditure level were statistically correlated with willingness to pay. The willingness to pay for the HIV antibody saliva rapid test and the acceptable price of the test varied across areas and populations; the affordability of the test could influence the willingness to pay for it.
An interactive website for analytical method comparison and bias estimation.
Bahar, Burak; Tuncel, Ayse F; Holmes, Earle W; Holmes, Daniel T
2017-12-01
Regulatory standards mandate laboratories to perform studies to ensure accuracy and reliability of their test results. Method comparison and bias estimation are important components of these studies. We developed an interactive website for evaluating the relative performance of two analytical methods using R programming language tools. The website can be accessed at https://bahar.shinyapps.io/method_compare/. The site has an easy-to-use interface that allows both copy-pasting and manual entry of data. It also allows selection of a regression model and creation of regression and difference plots. Available regression models include Ordinary Least Squares, Weighted-Ordinary Least Squares, Deming, Weighted-Deming, Passing-Bablok and Passing-Bablok for large datasets. The server processes the data and generates downloadable reports in PDF or HTML format. Our website provides clinical laboratories a practical way to assess the relative performance of two analytical methods. Copyright © 2017 The Canadian Society of Clinical Chemists. Published by Elsevier Inc. All rights reserved.
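Of the regression models listed above, Deming regression has a simple closed form when the ratio of the two methods' error variances (lam, often written λ) is known; the sketch below is a minimal implementation on invented measurements, not the website's code:

```python
import numpy as np

def deming(x, y, lam=1.0):
    """Deming regression: errors-in-both-variables fit used in method
    comparison; lam is the assumed ratio of the two methods' error
    variances (1.0 when the methods are equally precise)."""
    mx, my = x.mean(), y.mean()
    sxx = ((x - mx) ** 2).mean()
    syy = ((y - my) ** 2).mean()
    sxy = ((x - mx) * (y - my)).mean()
    slope = (syy - lam * sxx
             + np.sqrt((syy - lam * sxx) ** 2 + 4 * lam * sxy ** 2)) / (2 * sxy)
    return slope, my - slope * mx

# Illustrative paired results from two analytical methods.
x = np.linspace(1.0, 20.0, 20)   # method A
y = 1.05 * x + 0.3               # method B: 5% proportional + 0.3 constant bias
slope, intercept = deming(x, y)  # → slope ≈ 1.05, intercept ≈ 0.3
```

Unlike ordinary least squares, which assumes the x-method is error-free and so underestimates the slope when it is not, Deming regression attributes error to both methods, which is why it is a standard choice for bias estimation between two analytical methods.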
Li, Zhenghua; Cheng, Fansheng; Xia, Zhining
2011-01-01
The chemical structures of 114 polycyclic aromatic sulfur heterocycles (PASHs) have been studied by the molecular electronegativity-distance vector (MEDV). The linear relationships between the gas chromatographic retention index and the MEDV were established by a multiple linear regression (MLR) model. The results of variable selection by stepwise multiple regression (SMR), and the predictive ability of the optimized model appraised by leave-one-out cross-validation, showed that the optimized model, with a correlation coefficient (R) of 0.9947 and a cross-validated correlation coefficient (Rcv) of 0.9940, possessed the best statistical quality. Furthermore, when the 114 PASH compounds were divided into calibration and test sets in the ratio of 2:1, the statistical analysis showed that our models possessed almost equal statistical quality, very similar regression coefficients, and good robustness. The quantitative structure-retention relationship (QSRR) model established here may provide a convenient and powerful method for predicting the gas chromatographic retention of PASHs.
A Feature-Free 30-Disease Pathological Brain Detection System by Linear Regression Classifier.
Chen, Yi; Shao, Ying; Yan, Jie; Yuan, Ti-Fei; Qu, Yanwen; Lee, Elizabeth; Wang, Shuihua
2017-01-01
The number of Alzheimer's disease patients is increasing rapidly every year, and scholars tend to use computer vision methods to develop automatic diagnosis systems. In 2015, Gorji et al. proposed a novel method using the pseudo Zernike moment. They tested four classifiers: a learning vector quantization neural network and pattern recognition neural networks trained by Levenberg-Marquardt, by resilient backpropagation, and by scaled conjugate gradient. This study presents an improved method by introducing a relatively new classifier, linear regression classification. Our method selects one axial slice from the 3D brain image and employs the pseudo Zernike moment with a maximum order of 15 to extract 256 features from each image. Finally, linear regression classification is harnessed as the classifier. The proposed approach obtains an accuracy of 97.51%, a sensitivity of 96.71%, and a specificity of 97.73%. Our method performs better than Gorji's approach and five other state-of-the-art approaches. Therefore, it can be used to detect Alzheimer's disease. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
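Linear regression classification can be sketched as class-wise least squares: represent the test feature vector in each class's training subspace and assign it to the class with the smallest reconstruction residual. The data below are synthetic toy vectors, not brain features:

```python
import numpy as np

def lrc_predict(train_by_class, x):
    """Assign x to the class whose training columns reconstruct it best."""
    best_label, best_err = None, np.inf
    for label, Xc in train_by_class.items():
        beta, *_ = np.linalg.lstsq(Xc, x, rcond=None)   # least-squares coding
        err = np.linalg.norm(x - Xc @ beta)             # reconstruction residual
        if err < best_err:
            best_label, best_err = label, err
    return best_label

# Two classes living in different 2-dimensional subspaces of R^10.
rng = np.random.default_rng(6)
A = np.zeros((10, 5)); A[:2] = rng.normal(size=(2, 5))   # class A spans axes 0-1
B = np.zeros((10, 5)); B[2:4] = rng.normal(size=(2, 5))  # class B spans axes 2-3
x = np.zeros(10); x[0], x[1] = 1.0, -2.0                 # lies in class A's span
```

In the study's setting the columns of each class matrix would be the 256 pseudo-Zernike feature vectors of that class's training images; the classifier itself needs no training beyond storing them.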
Parsaeian, M; Mohammad, K; Mahmoudi, M; Zeraati, H
2012-01-01
Background: The purpose of this investigation was to empirically compare the predictive ability of an artificial neural network with that of logistic regression in the prediction of low back pain. Methods: Data from the second national health survey were considered in this investigation. These data include information on low back pain and its associated risk factors among Iranian people aged 15 years and older. Artificial neural network and logistic regression models were developed using a training set of 17,294 records and were validated on a test set of 17,295 records. The Hosmer-Lemeshow recommendation for model selection was used in fitting the logistic regression. A three-layer perceptron with 9 input, 3 hidden, and 1 output neurons was employed. The efficiency of the two models was compared by receiver operating characteristic (ROC) analysis, root mean square, and -2 log-likelihood criteria. Results: The area under the ROC curve (SE), root mean square, and -2 log-likelihood of the logistic regression were 0.752 (0.004), 0.3832, and 14769.2, respectively. The corresponding values for the artificial neural network were 0.754 (0.004), 0.3770, and 14757.6. Conclusions: Based on these three criteria, the artificial neural network gave better performance than logistic regression. Although the difference is statistically significant, it does not seem to be clinically significant. PMID:23113198
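The area under the ROC curve used to compare the two models equals the probability that a randomly chosen case outscores a randomly chosen non-case, which gives a short way to compute it. The toy risk scores below are illustrative, not the survey data from the study.

```python
def auc(scores_pos, scores_neg):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical predicted risks for subjects with (pos) / without (neg) low back pain.
model_a_pos, model_a_neg = [0.8, 0.7, 0.6, 0.55], [0.5, 0.4, 0.3, 0.65]
model_b_pos, model_b_neg = [0.9, 0.8, 0.7, 0.6], [0.5, 0.4, 0.3, 0.2]

print(auc(model_a_pos, model_a_neg))   # 0.875: one negative outscores two positives
print(auc(model_b_pos, model_b_neg))   # 1.0: perfect separation
```

Small AUC differences such as the 0.752 vs. 0.754 reported above can be statistically significant in large samples while remaining clinically negligible, which is exactly the authors' conclusion.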
Threshold regression to accommodate a censored covariate.
Qian, Jing; Chiou, Sy Han; Maye, Jacqueline E; Atem, Folefac; Johnson, Keith A; Betensky, Rebecca A
2018-06-22
In several common study designs, regression modeling is complicated by the presence of censored covariates. Examples of such covariates include maternal age of onset of dementia that may be right censored in an Alzheimer's amyloid imaging study of healthy subjects, metabolite measurements that are subject to limit of detection censoring in a case-control study of cardiovascular disease, and progressive biomarkers whose baseline values are of interest, but are measured post-baseline in longitudinal neuropsychological studies of Alzheimer's disease. We propose threshold regression approaches for linear regression models with a covariate that is subject to random censoring. Threshold regression methods allow for immediate testing of the significance of the effect of a censored covariate. In addition, they provide for unbiased estimation of the regression coefficient of the censored covariate. We derive the asymptotic properties of the resulting estimators under mild regularity conditions. Simulations demonstrate that the proposed estimators have good finite-sample performance, and often offer improved efficiency over existing methods. We also derive a principled method for selection of the threshold. We illustrate the approach in application to an Alzheimer's disease study that investigated brain amyloid levels in older individuals, as measured through positron emission tomography scans, as a function of maternal age of dementia onset, with adjustment for other covariates. We have developed an R package, censCov, for implementation of our method, available at CRAN. © 2018, The International Biometric Society.
Nutrition Report Cards: An Opportunity to Improve School Lunch Selection
Wansink, Brian; Just, David R.; Patterson, Richard W.; Smith, Laura E.
2013-01-01
Objective: To explore the feasibility and implementation efficiency of Nutrition Report Cards (NRCs) in helping children make healthier food choices at school. Methods: Pilot testing was conducted in a rural New York school district (K-12). Over a five-week period, 27 parents received a weekly e-mail containing an NRC listing how many meal components (fruits, vegetables, starches, milk), snacks, and a-la-carte foods their child selected. We analyzed the choices of students in the NRC group vs. the control group, both prior to and during the intervention period. Point-of-sale system data for a-la-carte items were analyzed using Generalized Least Squares regressions with clustered standard errors. Results: NRCs encouraged more home conversations about nutrition and more awareness of food selections. Despite the small sample, the NRC was associated with reduced selection of some items; for example, the percentage of students selecting cookies decreased from 14.3 to 6.5 percent. Additionally, despite requiring new keys on the check-out registers to generate the NRC, checkout times increased by only 0.16 seconds per transaction, and compiling and sending the NRCs required a total weekly investment of 30 minutes of staff time. Conclusions: This test of concept suggests that NRCs are a feasible and inexpensive tool to guide children towards healthier choices. PMID:24098324
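The "regressions with clustered standard errors" mentioned above rest on the sandwich variance estimator, which sums per-cluster score contributions so that correlated transactions from the same student do not understate uncertainty. This is a minimal OLS sketch with cluster-robust standard errors on simulated data; the cluster structure and coefficients are assumptions for illustration, not the study's point-of-sale data.

```python
import numpy as np

rng = np.random.default_rng(1)
clusters = np.repeat(np.arange(10), 5)     # 10 "students", 5 transactions each
x = rng.normal(size=50)
u = rng.normal(size=10)[clusters]          # shock shared within each student
y = 1.0 + 0.5 * x + u + rng.normal(scale=0.2, size=50)

X = np.column_stack([np.ones(50), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y                   # OLS point estimates
resid = y - X @ beta

# "Meat" of the sandwich: outer products of per-cluster score sums.
meat = np.zeros((2, 2))
for g in np.unique(clusters):
    Xg, rg = X[clusters == g], resid[clusters == g]
    s = Xg.T @ rg
    meat += np.outer(s, s)
se = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))   # cluster-robust SEs
print(beta.round(2), se.round(3))
```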
Cross-validation pitfalls when selecting and assessing regression and classification models.
Krstajic, Damjan; Buturovic, Ljubomir J; Leahy, David E; Thomas, Simon
2014-03-29
We address the problem of selecting and assessing classification and regression models using cross-validation. Current state-of-the-art methods can yield models with high variance, rendering them unsuitable for a number of practical applications including QSAR. In this paper we describe and evaluate best practices which improve reliability and increase confidence in selected models. A key operational component of the proposed methods is cloud computing which enables routine use of previously infeasible approaches. We describe in detail an algorithm for repeated grid-search V-fold cross-validation for parameter tuning in classification and regression, and we define a repeated nested cross-validation algorithm for model assessment. As regards variable selection and parameter tuning we define two algorithms (repeated grid-search cross-validation and double cross-validation), and provide arguments for using the repeated grid-search in the general case. We show results of our algorithms on seven QSAR datasets. The variation of the prediction performance, which is the result of choosing different splits of the dataset in V-fold cross-validation, needs to be taken into account when selecting and assessing classification and regression models. We demonstrate the importance of repeating cross-validation when selecting an optimal model, as well as the importance of repeating nested cross-validation when assessing a prediction error.
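The core point above, that prediction performance varies with the choice of V-fold split and that cross-validation should therefore be repeated, can be illustrated with a deliberately simple mean-only predictor on synthetic data; the spread of scores across repeats is the quantity the paper argues must not be ignored. The data and predictor are stand-ins, not the QSAR models of the study.

```python
import random

def vfold_indices(n, v, rng):
    """Random partition of range(n) into v folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::v] for i in range(v)]

def cv_mse(ys, v, rng):
    """MSE of predicting each fold by the mean of the remaining folds."""
    sq_err = 0.0
    for fold in vfold_indices(len(ys), v, rng):
        held = set(fold)
        train = [y for i, y in enumerate(ys) if i not in held]
        mu = sum(train) / len(train)
        sq_err += sum((ys[i] - mu) ** 2 for i in fold)
    return sq_err / len(ys)

rng = random.Random(7)
ys = [rng.gauss(0.0, 1.0) for _ in range(60)]
scores = [cv_mse(ys, 5, rng) for _ in range(20)]   # 20 repeated 5-fold CVs
spread = max(scores) - min(scores)                 # split-to-split variation
print(round(min(scores), 3), round(max(scores), 3))
```

Averaging over the repeats, rather than trusting one split, is the repeated cross-validation remedy the abstract recommends.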
Auditory dysfunction associated with solvent exposure
2013-01-01
Background: A number of studies have demonstrated that solvents may induce auditory dysfunction. However, there is still little knowledge regarding the main signs and symptoms of solvent-induced hearing loss (SIHL). The aim of this research was to investigate the association between solvent exposure and adverse effects on peripheral and central auditory functioning with a comprehensive audiological test battery. Methods: Seventy-two solvent-exposed workers and 72 non-exposed workers were selected to participate in the study. The test battery comprised pure-tone audiometry (PTA), transient evoked otoacoustic emissions (TEOAE), Random Gap Detection (RGD), and the Hearing-in-Noise Test (HINT). Results: Solvent-exposed subjects presented with poorer mean test results than non-exposed subjects. Bivariate and multivariate linear regression model analyses were performed, with one model constructed independently for each auditory outcome (PTA, TEOAE, RGD, and HINT). In all of the models, solvent exposure was significantly associated with the auditory outcome. Age was also significantly associated with some auditory outcomes. Conclusions: This study provides further evidence of the possible adverse effects of solvents on peripheral and central auditory functioning. These effects and the utility of the selected hearing tests for assessing SIHL are discussed. PMID:23324255
Palomo, R; Casals-Coll, M; Sánchez-Benavides, G; Quintana, M; Manero, R M; Rognoni, T; Calvo, L; Aranciva, F; Tamayo, F; Peña-Casanova, J
2013-05-01
The Rey-Osterrieth Complex Figure (ROCF) and the Free and Cued Selective Reminding Test (FCSRT) are widely used in clinical practice. The ROCF assesses visual perception, constructional praxis, and visuo-spatial memory. The FCSRT assesses verbal learning and memory. In this study, as part of the Spanish normative studies project in young adults (NEURONORMA young adults), we present age- and education-adjusted normative data for both tests obtained by using linear regression techniques. The sample consisted of 179 healthy participants ranging in age from 18 to 49 years. We provide tables for converting raw scores to scaled scores in addition to tables with scores adjusted by socio-demographic factors. The results showed that education affects scores for some of the memory tests and the figure-copying task. Age was only found to have an effect on the performance of visuo-spatial memory tests, and the effect of sex was negligible. The normative data obtained will be extremely useful in the clinical neuropsychological evaluation of young Spanish adults. Copyright © 2011 Sociedad Española de Neurología. Published by Elsevier Espana. All rights reserved.
Seasonal mean pressure reconstruction for the North Atlantic (1750 1850) based on early marine data
NASA Astrophysics Data System (ADS)
Gallego, D.; Garcia-Herrera, R.; Ribera, P.; Jones, P. D.
2005-12-01
Measurements of wind strength and direction abstracted from European ships' logbooks during the recently finished CLIWOC project have been used to produce the first gridded Sea Level Pressure (SLP) reconstruction for the 1750-1850 period over the North Atlantic based solely on marine data. The reconstruction is based on a spatial regression analysis calibrated using data taken from the ICOADS database. An objective methodology was developed to select the optimal calibration period and spatial domain of the reconstruction by testing several thousand possible models. The area finally selected, limited by the performance of the regression equations and by the availability of data, covers the region between 28° N and 52° N close to the European coast and between 28° N and 44° N in the open ocean. The results provide a direct measure of the strength and extension of the Azores High during the 101 years of the study period. Comparison with the recent land-based SLP reconstruction by Luterbacher et al. (2002) indicates the presence of a common signal. The interannual variability of the CLIWOC reconstructions is rather high owing to the current scarcity of abstracted wind data in the areas with the best response in the regression. Guidelines are proposed to optimize the efficiency of future abstraction work.
Meta-Regression Approximations to Reduce Publication Selection Bias
ERIC Educational Resources Information Center
Stanley, T. D.; Doucouliagos, Hristos
2014-01-01
Publication selection bias is a serious challenge to the integrity of all empirical sciences. We derive meta-regression approximations to reduce this bias. Our approach employs Taylor polynomial approximations to the conditional mean of a truncated distribution. A quadratic approximation without a linear term, precision-effect estimate with…
Borquis, Rusbel Raul Aspilcueta; Neto, Francisco Ribeiro de Araujo; Baldi, Fernando; Hurtado-Lugo, Naudin; de Camargo, Gregório M F; Muñoz-Berrocal, Milthon; Tonhati, Humberto
2013-09-01
In this study, genetic parameters for test-day milk, fat, and protein yield were estimated for the first lactation. The data analyzed consisted of 1,433 first lactations of Murrah buffaloes, daughters of 113 sires from 12 herds in the state of São Paulo, Brazil, with calvings from 1985 to 2007. Ten monthly classes of days in milk were considered for the test-day yields. The (co)variance components for the 3 traits were estimated by random regression analyses using Bayesian inference, applying an animal model via Gibbs sampling. Contemporary groups were defined as herd-year-month of the test day. In the model, the random effects were additive genetic, permanent environment, and residual. The fixed effects were contemporary group, number of milkings (1 or 2), the linear and quadratic effects of the covariable age of the buffalo at calving, and the mean lactation curve of the population, which was modeled by orthogonal Legendre polynomials of fourth order. The random effects for the traits studied were modeled by Legendre polynomials of third and fourth order for additive genetic and permanent environment, respectively; the residual variances were modeled considering 4 residual classes. The heritability estimates for the traits were moderate (0.21-0.38), with higher estimates in the intermediate lactation phase. The genetic correlation estimates within and among the traits varied from 0.05 to 0.99. The results indicate that selection for any test-day trait will result in an indirect genetic gain for milk, fat, and protein yield in all periods of the lactation curve. The accuracy associated with estimated breeding values obtained using multi-trait random regression was slightly higher (around 8%) than with single-trait random regression. This difference may be due to the greater amount of information available per animal. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
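The orthogonal Legendre polynomials used above to model the lactation curve can be sketched as follows: days in milk are rescaled to [-1, 1], a Legendre basis is evaluated there, and a curve is fitted by least squares. The toy lactation curve and its parameters are illustrative assumptions, not estimates from the buffalo data.

```python
import numpy as np
from numpy.polynomial import legendre

days = np.linspace(5, 305, 10)       # ten monthly test days in milk
t = 2 * (days - days.min()) / (days.max() - days.min()) - 1   # rescale to [-1, 1]
yields = 8 + 4 * np.exp(-((days - 60) / 90) ** 2)   # toy lactation curve, kg/day

Phi = legendre.legvander(t, 3)       # Legendre basis up to third order
coef, *_ = np.linalg.lstsq(Phi, yields, rcond=None)
fitted = Phi @ coef
rmse = float(np.sqrt(np.mean((yields - fitted) ** 2)))
print(round(rmse, 3))                # a smooth curve is captured well by 4 terms
```

In a random regression model, each animal gets its own coefficients on such a basis, which is how test-day records inform the whole lactation curve.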
Combining Relevance Vector Machines and exponential regression for bearing residual life estimation
NASA Astrophysics Data System (ADS)
Di Maio, Francesco; Tsui, Kwok Leung; Zio, Enrico
2012-08-01
In this paper we present a new procedure for estimating bearing Residual Useful Life (RUL) by combining data-driven and model-based techniques: we resort to (i) Relevance Vector Machines (RVMs) for selecting a small number of significant basis functions, called Relevant Vectors (RVs), and (ii) exponential regression to compute and continuously update residual life estimates. The combination of these techniques is developed with reference to partially degraded thrust ball bearings and tested on real-world vibration-based degradation data. On the case study considered, the proposed procedure outperforms other model-based methods, with the added value of an adequate representation of the uncertainty associated with the estimates, whose credibility is quantified by the Prognostic Horizon (PH) metric.
NASA Technical Reports Server (NTRS)
Maahs, H. G.
1972-01-01
Eighteen material properties were measured on 45 different, commercially available, artificial graphites. The ablation performance of these same graphites was also measured in a Mach 2 airstream at a stagnation pressure of 5.6 atm. Correlations were developed, where possible, between pairs of the material properties. Multiple regression equations were then formulated relating ablation performance to the various material properties, thus identifying those material properties having the strongest effect on ablation performance. These regression equations reveal that ablation performance in the present test environment depends primarily on maximum grain size, density, ash content, thermal conductivity, and mean pore radius. For optimum ablation performance, grain size should be small, ash content low, density and thermal conductivity high, and mean pore radius large.
Pereira, R J; Bignardi, A B; El Faro, L; Verneque, R S; Vercesi Filho, A E; Albuquerque, L G
2013-01-01
Studies investigating the use of random regression models for genetic evaluation of milk production in Zebu cattle are scarce. In this study, 59,744 test-day milk yield records from 7,810 first lactations of purebred dairy Gyr (Bos indicus) and crossbred (dairy Gyr × Holstein) cows were used to compare random regression models in which additive genetic and permanent environmental effects were modeled using orthogonal Legendre polynomials or linear spline functions. Residual variances were modeled considering 1, 5, or 10 classes of days in milk. Five classes fitted the changes in residual variances over the lactation adequately and were used for model comparison. The model that fitted linear spline functions with 6 knots provided the lowest sum of residual variances across lactation. On the other hand, according to the deviance information criterion (DIC) and Bayesian information criterion (BIC), a model using third-order and fourth-order Legendre polynomials for additive genetic and permanent environmental effects, respectively, provided the best fit. However, the high rank correlation (0.998) between this model and that applying third-order Legendre polynomials for both additive genetic and permanent environmental effects indicates that, in practice, the same bulls would be selected by both models. The latter model, which is less parameterized, is a parsimonious option for fitting dairy Gyr breed test-day milk yield records. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Covariate Selection for Multilevel Models with Missing Data
Marino, Miguel; Buxton, Orfeu M.; Li, Yi
2017-01-01
Missing covariate data hamper variable selection in multilevel regression settings. Current variable selection techniques for multiply-imputed data commonly address missingness in the predictors through list-wise deletion and stepwise-selection methods, which are problematic. Moreover, most variable selection methods are developed for independent linear regression models and do not accommodate multilevel mixed effects regression models with incomplete covariate data. We develop a novel methodology that is able to perform covariate selection across multiply-imputed data for multilevel random effects models when missing data are present. Specifically, we propose to stack the multiply-imputed data sets from a multiple imputation procedure and to apply a group variable selection procedure through group lasso regularization to assess the overall impact of each predictor on the outcome across the imputed data sets. Simulations confirm the advantageous performance of the proposed method compared with competing methods. We applied the method to reanalyze the Healthy Directions-Small Business cancer prevention study, which evaluated a behavioral intervention program targeting multiple risk-related behaviors in a working-class, multi-ethnic population. PMID:28239457
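The stacking idea above can be sketched without the group lasso machinery: the m imputed data sets are concatenated, each row is weighted by 1/m, and a single weighted least-squares fit is run on the stack (a group penalty, omitted here, would then keep or drop each predictor across all imputations at once). The data generation and the crude random imputation are assumptions for illustration only, not the paper's imputation procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 3
x_true = rng.normal(size=n)
y = 2.0 * x_true + rng.normal(scale=0.1, size=n)

# Make the covariate partially missing, then draw m crude random imputations.
x_obs = x_true.copy()
x_obs[:10] = np.nan
imputations = []
for _ in range(m):
    xi = x_obs.copy()
    xi[np.isnan(xi)] = rng.normal(size=10)
    imputations.append(xi)

# Stack the m completed data sets, weight each row by 1/m, fit WLS once.
X = np.column_stack([np.ones(n * m), np.concatenate(imputations)])
Y = np.tile(y, m)
w = np.full(n * m, 1.0 / m)
beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * Y))
print(beta.round(2))   # slope attenuated below 2 by the randomly imputed rows
```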
Factors Affecting the Selection of Patients on Waiting List: A Single Center Study.
Can, Ö; Kasapoğlu, U; Boynueğri, B; Tuğcu, M; Çağlar Ruhi, B; Canbakan, M; Murat Gökçe, A; Ata, P; İzzet Titiz, M; Apaydın, S
2015-06-01
There is an increasing gap between organ supply and demand for cadaveric transplantation in our country. Our aim was to evaluate factors affecting the selection of patients on the waiting list at our hospital. Patients still waiting on the list and patients who were transplanted were compared in order to identify factors that affected patient selection. In this retrospective case-control study, the non-parametric Mann-Whitney U test was used for comparisons and Cox regression analysis was used to find the risk factors that decrease the probability of transplantation. Patients in the transplanted group were significantly younger and had a relatively lower body mass index than the awaiting group. Cardiovascular diseases were more common in the awaiting group than in the transplanted group. There were no patients with diabetes in the transplanted group, whereas fifteen diabetic patients were in the awaiting group. Selected patients had lower immunologic risk with regard to peak panel reactive antibody levels. No significant difference was found for gender, hypertension, hyperlipidemia, viral serology, time spent on dialysis, or time on the waiting list between the two groups. In the Cox regression analysis, female gender, older age, diabetes mellitus, high body mass index, positive hepatitis B serology, and high peak class 1-2 panel reactive antibody positivity were found to be risk factors that decrease the probability of transplantation. This study revealed a tendency to select low-risk patients. Time- and energy-consuming complications and short allograft survival after transplantation in high-risk patients, together with the scarcity of the cadaveric donor pool in our country, may contribute to this tendency. Copyright © 2015 Elsevier Inc. All rights reserved.
Kinetic rate constant prediction supports the conformational selection mechanism of protein binding.
Moal, Iain H; Bates, Paul A
2012-01-01
The prediction of protein-protein kinetic rate constants provides a fundamental test of our understanding of molecular recognition and will play an important role in the modeling of complex biological systems. In this paper, a feature selection and regression algorithm is applied to mine a large set of molecular descriptors and construct simple models for association and dissociation rate constants using empirical data. Using separate test data for validation, the predicted rate constants can be combined to calculate binding affinity with accuracy matching that of state-of-the-art empirical free energy functions. The models show that the rate of association is linearly related to the proportion of unbound proteins in the bound conformational ensemble relative to the unbound conformational ensemble, indicating that the binding partners must adopt a geometry near to that of the bound form prior to binding. Mirroring the conformational selection and population shift mechanism of protein binding, the models provide a strong separate line of evidence for the preponderance of this mechanism in protein-protein binding, complementing structural and theoretical studies.
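The step of combining predicted rate constants into a binding affinity follows from standard thermodynamics: the dissociation constant is Kd = k_off / k_on, and the binding free energy is RT ln Kd. A minimal sketch with illustrative rate constants (the example values are assumptions, not predictions from the paper's models):

```python
import math

R = 1.987e-3        # gas constant, kcal/(mol*K)
T = 298.15          # temperature, K

def binding_free_energy(k_on, k_off):
    """dG of binding in kcal/mol from k_on (1/(M*s)) and k_off (1/s)."""
    kd = k_off / k_on            # dissociation constant, M
    return R * T * math.log(kd)  # negative for favorable binding

dg = binding_free_energy(k_on=1e6, k_off=1e-3)   # Kd = 1 nM
print(round(dg, 2))   # ≈ -12.28 kcal/mol
```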
Afantitis, Antreas; Melagraki, Georgia; Sarimveis, Haralambos; Koutentis, Panayiotis A; Markopoulos, John; Igglessi-Markopoulou, Olga
2006-08-01
A quantitative structure-activity relationship was obtained by applying Multiple Linear Regression Analysis to a series of 80 1-[2-hydroxyethoxy-methyl]-6-(phenylthio)thymine (HEPT) derivatives with significant anti-HIV activity. For the selection of the best among 37 different descriptors, the Elimination Selection Stepwise Regression Method (ES-SWR) was utilized. The resulting QSAR model (R²(CV) = 0.8160; S(PRESS) = 0.5680) proved to be very accurate in both the training and prediction stages.
Statistical validation of normal tissue complication probability models.
Xu, Cheng-Jian; van der Schaaf, Arjen; Van't Veld, Aart A; Langendijk, Johannes A; Schilstra, Cornelis
2012-09-01
To investigate the applicability and value of double cross-validation and permutation tests as established statistical approaches in the validation of normal tissue complication probability (NTCP) models. A penalized regression method, LASSO (least absolute shrinkage and selection operator), was used to build NTCP models for xerostomia after radiation therapy treatment of head-and-neck cancer. Model assessment was based on the likelihood function and the area under the receiver operating characteristic curve. Repeated double cross-validation showed the uncertainty and instability of the NTCP models and indicated that the statistical significance of model performance can be obtained by permutation testing. Repeated double cross-validation and permutation tests are recommended to validate NTCP models before clinical use. Copyright © 2012 Elsevier Inc. All rights reserved.
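A permutation test of model performance, as recommended above, compares the observed score of fixed predictions against scores obtained after repeatedly shuffling the outcome labels; the fraction of shuffles that match or beat the observed score estimates the p-value. The toy predictions below are illustrative stand-ins for NTCP model output, not data from the xerostomia study.

```python
import random

def accuracy(pred, truth):
    """Fraction of predictions matching the outcome labels."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

rng = random.Random(42)
truth = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0]
pred  = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0]   # 10 of 12 correct

obs = accuracy(pred, truth)
null = []
for _ in range(2000):                 # permutation null distribution
    shuffled = truth[:]
    rng.shuffle(shuffled)
    null.append(accuracy(pred, shuffled))
p_value = sum(s >= obs for s in null) / len(null)
print(round(obs, 3), p_value)         # a small p: performance beats chance
```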
[Evaluation of using statistical methods in selected national medical journals].
Sych, Z
1996-01-01
The paper evaluates the frequency with which statistical methods were applied in works published in six selected national medical journals in the years 1988-1992. The journals chosen for analysis were: Klinika Oczna, Medycyna Pracy, Pediatria Polska, Polski Tygodnik Lekarski, Roczniki Państwowego Zakładu Higieny, and Zdrowie Publiczne. From the respective volumes of Pol. Tyg. Lek., a number of works corresponding to the average in the remaining journals was randomly selected. The study did not include works, whether national or international, in which no statistical analysis was implemented. This exclusion also extended to review papers, case reports, reviews of books, handbooks, and monographs, reports from scientific congresses, and papers on historical topics. The number of works was determined for each volume. Next, the mode of selecting a suitable sample in the respective studies was analyzed, differentiating two categories: random and targeted selection. Attention was also paid to the presence of a control sample in the individual works, and to the characterization of the sample, using three categories: complete, partial, and lacking. The results of the studies are presented in tables and figures (Tab. 1, 3). The rate of employing statistical methods was analyzed in the relevant volumes of the six selected national medical journals for the years 1988-1992, simultaneously determining the number of works in which no statistical methods were used, as well as the frequency with which the individual statistical methods were applied.
Particular attention was given to fundamental methods of descriptive statistics (measures of position, measures of dispersion) and to the most important methods of mathematical statistics, such as parametric tests of significance, analysis of variance (in single and dual classifications), non-parametric tests of significance, and correlation and regression. Works employing multiple correlation, multiple regression, or more complex methods of studying relationships between two or more variables were counted among the works using correlation and regression, as were other methods, e.g., statistical methods used in epidemiology (coefficients of incidence and morbidity, standardization of coefficients, survival tables), factor analysis by the Jacobi-Hotelling method, taxonomic methods, and others. On the basis of the performed studies it was established that the frequency of employing statistical methods in the six selected national medical journals in the years 1988-1992 was 61.1-66.0% of the analyzed works (Tab. 3), generally similar to the frequency reported in English-language medical journals. On the whole, no significant differences were disclosed in the frequency of the applied statistical methods (Tab. 4) or in the frequency of random samples (Tab. 3) in the analyzed works appearing in the medical journals in the respective years 1988-1992. The statistical methods most frequently used in the analyzed works for 1988-1992 were measures of position (44.2-55.6%), measures of dispersion (32.5-38.5%), and parametric tests of significance (26.3-33.1% of the works analyzed) (Tab. 4). To increase the frequency and reliability of the statistical methods used, the teaching of biostatistics should be expanded in medical studies and in postgraduate training for physicians and scientific-didactic workers.
Cider fermentation process monitoring by Vis-NIR sensor system and chemometrics.
Villar, Alberto; Vadillo, Julen; Santos, Jose I; Gorritxategi, Eneko; Mabe, Jon; Arnaiz, Aitor; Fernández, Luis A
2017-04-15
Optimization of a multivariate calibration process has been undertaken for a Visible-Near Infrared (400-1100 nm) sensor system applied to monitoring the fermentation process of cider produced in the Basque Country (Spain). The main parameters monitored included alcoholic proof, L-lactic acid content, glucose+fructose content and acetic acid content. The multivariate calibration was carried out using a combination of different variable selection techniques, and the most suitable pre-processing strategies were selected based on the spectral characteristics obtained by the sensor system. The variable selection techniques studied in this work include the Martens uncertainty test, interval Partial Least Squares regression (iPLS) and a Genetic Algorithm (GA). This procedure arises from the need to improve the prediction ability of the calibration models for cider monitoring. Copyright © 2016 Elsevier Ltd. All rights reserved.
Guisande, Cástor; Vari, Richard P; Heine, Jürgen; García-Roselló, Emilio; González-Dacosta, Jacinto; Perez-Schofield, Baltasar J García; González-Vilas, Luis; Pelayo-Villamil, Patricia
2016-09-12
We present and discuss VARSEDIG, an algorithm that identifies the morphometric features that significantly discriminate two taxa and validates the morphological distinctness between them via a Monte-Carlo test. VARSEDIG is freely available as a function of the RWizard application PlotsR (http://www.ipez.es/RWizard) and as an R package on CRAN. The variables selected by VARSEDIG with the overlap method were very similar to those selected by logistic regression and discriminant analysis, but the method overcomes some shortcomings of those approaches. VARSEDIG is therefore a good alternative to current classical classification methods for identifying the morphometric features that significantly discriminate a taxon and for validating its morphological distinctness from other taxa. As a demonstration of its potential for this purpose, we analyze morphological discrimination among some species of the Neotropical freshwater family Characidae.
NASA Astrophysics Data System (ADS)
Belciug, Smaranda; Serbanescu, Mircea-Sebastian
2015-09-01
Feature selection is considered a key factor in classification/decision problems. It is currently used in designing intelligent decision systems to choose the features that allow the best performance. This paper proposes a regression-based approach to selecting the most important predictors in order to significantly increase classification performance. Application to breast cancer detection and recurrence using publicly available datasets proved the efficiency of this technique.
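A minimal sketch of regression-based predictor selection of this general kind, using scikit-learn's bundled breast-cancer data as a stand-in for the publicly available datasets; the L1-penalized logistic regression, its penalty strength, and the pipeline layout are illustrative assumptions, not the authors' actual procedure.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Keep predictors whose L1-penalized regression coefficients are non-zero,
# then fit the final classifier on the reduced feature set.
selector = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1))
model = make_pipeline(StandardScaler(), selector,
                      LogisticRegression(max_iter=1000))
score = cross_val_score(model, X, y, cv=5).mean()

model.fit(X, y)
n_kept = int(model.named_steps["selectfrommodel"].get_support().sum())
```

The point of the pattern is that a regression model both ranks the predictors and discards the uninformative ones before the final classifier is trained.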
Eash, David A.; Barnes, Kimberlee K.; O'Shea, Padraic S.
2016-09-19
A statewide study was conducted to develop regression equations for estimating three selected spring and three selected fall low-flow frequency statistics for ungaged stream sites in Iowa. The estimation equations developed for the six low-flow frequency statistics include spring (April through June) 1-, 7-, and 30-day mean low flows for a recurrence interval of 10 years and fall (October through December) 1-, 7-, and 30-day mean low flows for a recurrence interval of 10 years. Estimates of the three selected spring statistics are provided for 241 U.S. Geological Survey continuous-record streamgages, and estimates of the three selected fall statistics are provided for 238 of these streamgages, using data through June 2014. Because only 9 years of fall streamflow record were available, three streamgages included in the development of the spring regression equations were not included in the development of the fall regression equations. Because of regulation, diversion, or urbanization, 30 of the 241 streamgages were not included in the development of the regression equations. The study area includes Iowa and adjacent areas within 50 miles of the Iowa border. Because trend analyses indicated statistically significant positive trends when considering the period of record for most of the streamgages, the longest, most recent period of record without a significant trend was determined for each streamgage for use in the study. Geographic information system software was used to measure 63 selected basin characteristics for each of the 211 streamgages used to develop the regional regression equations.
The study area was divided into three low-flow regions that were defined in a previous study for the development of regional regression equations. Because several streamgages included in the development of regional regression equations have estimates of zero flow calculated from observed streamflow for selected spring and fall low-flow frequency statistics, the final equations for the three low-flow regions were developed using two types of regression analyses—left-censored and generalized-least-squares regression analyses. A total of 211 streamgages were included in the development of nine spring regression equations—three equations for each of the three low-flow regions. A total of 208 streamgages were included in the development of nine fall regression equations—three equations for each of the three low-flow regions. A censoring threshold was used to develop 15 left-censored regression equations to estimate the three fall low-flow frequency statistics for each of the three low-flow regions and to estimate the three spring low-flow frequency statistics for the southern and northwest regions. For the northeast region, generalized-least-squares regression was used to develop three equations to estimate the three spring low-flow frequency statistics. For the northeast region, average standard errors of prediction range from 32.4 to 48.4 percent for the spring equations and average standard errors of estimate range from 56.4 to 73.8 percent for the fall equations. For the northwest region, average standard errors of estimate range from 58.9 to 62.1 percent for the spring equations and from 83.2 to 109.4 percent for the fall equations.
For the southern region, average standard errors of estimate range from 43.2 to 64.0 percent for the spring equations and from 78.1 to 78.7 percent for the fall equations. The regression equations are applicable only to stream sites in Iowa with low flows not substantially affected by regulation, diversion, or urbanization and with basin characteristics within the range of those used to develop the equations. The regression equations will be implemented within the U.S. Geological Survey StreamStats Web-based geographic information system application. StreamStats allows users to click on any ungaged stream site and compute estimates of the six selected spring and fall low-flow statistics; in addition, 90-percent prediction intervals and the measured basin characteristics for the ungaged site are provided. StreamStats also allows users to click on any Iowa streamgage to obtain computed estimates for the six selected spring and fall low-flow statistics.
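The left-censored regression used for zero-flow sites can be illustrated with a generic Tobit-type maximum-likelihood fit: observations at or below the censoring threshold contribute a cumulative-probability term to the likelihood rather than a density term. The synthetic data, zero threshold, and optimizer choice below are illustrative assumptions, not the USGS methodology.

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
y_latent = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
c = 0.0                              # censoring threshold (e.g. zero flow)
y = np.maximum(y_latent, c)          # observed values, left-censored at c
censored = y <= c

def neg_loglik(params):
    """Negative log-likelihood of a left-censored normal regression."""
    b0, b1, log_sigma = params
    sigma = np.exp(log_sigma)
    mu = b0 + b1 * x
    ll_obs = stats.norm.logpdf(y[~censored], mu[~censored], sigma).sum()
    ll_cens = stats.norm.logcdf((c - mu[censored]) / sigma).sum()
    return -(ll_obs + ll_cens)

fit = optimize.minimize(neg_loglik, x0=[0.0, 0.0, 0.0], method="Nelder-Mead")
b0_hat, b1_hat = fit.x[0], fit.x[1]
```

Ordinary least squares on the censored responses would bias the coefficients upward at low flows; the likelihood above avoids that by modeling the censoring explicitly.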
Using Dominance Analysis to Determine Predictor Importance in Logistic Regression
ERIC Educational Resources Information Center
Azen, Razia; Traxel, Nicole
2009-01-01
This article proposes an extension of dominance analysis that allows researchers to determine the relative importance of predictors in logistic regression models. Criteria for choosing logistic regression R[superscript 2] analogues were determined and measures were selected that can be used to perform dominance analysis in logistic regression. A…
Predicting Treatment Response in Social Anxiety Disorder From Functional Magnetic Resonance Imaging
Doehrmann, Oliver; Ghosh, Satrajit S.; Polli, Frida E.; Reynolds, Gretchen O.; Horn, Franziska; Keshavan, Anisha; Triantafyllou, Christina; Saygin, Zeynep M.; Whitfield-Gabrieli, Susan; Hofmann, Stefan G.; Pollack, Mark; Gabrieli, John D.
2013-01-01
Context: Current behavioral measures poorly predict treatment outcome in social anxiety disorder (SAD). To our knowledge, this is the first study to examine neuroimaging-based treatment prediction in SAD. Objective: To measure brain activation in patients with SAD as a biomarker to predict subsequent response to cognitive behavioral therapy (CBT). Design: Functional magnetic resonance imaging (fMRI) data were collected prior to CBT intervention. Changes in clinical status were regressed on brain responses and tested for selectivity for social stimuli. Setting: Patients were treated with protocol-based CBT at anxiety disorder programs at Boston University or Massachusetts General Hospital and underwent neuroimaging data collection at Massachusetts Institute of Technology. Patients: Thirty-nine medication-free patients meeting DSM-IV criteria for the generalized subtype of SAD. Interventions: Brain responses to angry vs neutral faces or emotional vs neutral scenes were examined with fMRI prior to initiation of CBT. Main Outcome Measures: Whole-brain regression analyses with differential fMRI responses for angry vs neutral faces and changes in Liebowitz Social Anxiety Scale score as the treatment outcome measure. Results: Pretreatment responses significantly predicted subsequent treatment outcome of patients selectively for social stimuli and particularly in regions of higher-order visual cortex. Combining the brain measures with information on clinical severity accounted for more than 40% of the variance in treatment response and substantially exceeded predictions based on clinical measures at baseline. Prediction success was unaffected by testing for potential confounding factors such as depression severity at baseline.
Conclusions: The results suggest that brain imaging can provide biomarkers that substantially improve predictions for the success of cognitive behavioral interventions and, more generally, suggest that such biomarkers may offer evidence-based, personalized medicine approaches for optimally selecting among treatment options for a patient. PMID:22945462
Effect of xylitol versus sorbitol: a quantitative systematic review of clinical trials.
Mickenautsch, Steffen; Yengopal, Veerasamy
2012-08-01
This study aimed to appraise, within the context of tooth caries, the current clinical evidence and its risk for bias regarding the effects of xylitol in comparison with sorbitol. Databases were searched for clinical trials to 19 March 2011. Inclusion criteria required studies to: test a caries-related primary outcome; compare the effects of xylitol with those of sorbitol; describe a clinical trial with two or more arms, and utilise a prospective study design. Articles were excluded if they did not report computable data or did not follow up test and control groups in the same way. Individual dichotomous and continuous datasets were extracted from accepted articles. Selection and performance/detection bias were assessed. Sensitivity analysis was used to investigate attrition bias. Egger's regression and funnel plotting were used to investigate risk for publication bias. Nine articles were identified. Of these, eight were accepted and one was excluded. Ten continuous and eight dichotomous datasets were extracted. Because of high clinical heterogeneity, no meta-analysis was performed. Most of the datasets favoured xylitol, but this was not consistent. The accepted trials may be limited by selection bias. Results of the sensitivity analysis indicate a high risk for attrition bias. The funnel plot and Egger's regression results suggest a low publication bias risk. External fluoride exposure and stimulated saliva flow may have confounded the measured anticariogenic effect of xylitol. The evidence identified in support of xylitol over sorbitol is contradictory, is at high risk for selection and attrition bias and may be limited by confounder effects. Future high-quality randomised controlled trials are needed to show whether xylitol has a greater anticariogenic effect than sorbitol. © 2012 FDI World Dental Federation.
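Egger's regression test mentioned in the methods can be sketched as a regression check for funnel-plot asymmetry: each trial's standardized effect is regressed on its precision, and an intercept far from zero suggests small-study effects. The simulated effect sizes and standard errors below are made up for illustration, not data from the review.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
k = 10                                   # number of trials in the meta-analysis
se = rng.uniform(0.05, 0.5, size=k)      # standard error of each trial's effect
effect = rng.normal(0.2, se)             # simulated effects, no small-study bias

# Egger's test: regress standardized effect on precision; a non-zero
# intercept suggests funnel-plot asymmetry (possible publication bias).
z = effect / se
precision = 1.0 / se
X = np.column_stack([np.ones(k), precision])
beta, *_ = np.linalg.lstsq(X, z, rcond=None)
resid = z - X @ beta
s2 = resid @ resid / (k - 2)
se_intercept = np.sqrt(s2 * np.linalg.inv(X.T @ X)[0, 0])
t_stat = beta[0] / se_intercept
p_value = 2 * stats.t.sf(abs(t_stat), df=k - 2)
```

A large p-value here is consistent with the review's finding of low publication-bias risk; with only a handful of trials the test has limited power, which is why the funnel plot is inspected alongside it.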
Mohammadi, Seyed-Farzad; Sabbaghi, Mostafa; Z-Mehrjardi, Hadi; Hashemi, Hassan; Alizadeh, Somayeh; Majdi, Mercede; Taee, Farough
2012-03-01
To apply artificial intelligence models to predict the occurrence of posterior capsule opacification (PCO) after phacoemulsification. Farabi Eye Hospital, Tehran, Iran. Clinic-based cross-sectional study. The posterior capsule status of eyes operated on for age-related cataract and the need for laser capsulotomy were determined. After a literature review, data polishing, and expert consultation, 10 input variables were selected. The QUEST algorithm was used to develop a decision tree. Three back-propagation artificial neural networks were constructed with 4, 20, and 40 neurons in 2 hidden layers and trained with the same transfer functions (log-sigmoid and linear transfer) and training protocol on randomly selected eyes. They were then tested on the remaining eyes and compared for their performance. Performance indices were used to compare the resultant models with the results of logistic regression analysis. The models were trained using 282 randomly selected eyes and then tested using 70 eyes. Laser capsulotomy for clinically significant PCO was indicated or had been performed 2 years postoperatively in 40 eyes. A sample decision tree was produced with an accuracy of 50% (likelihood ratio 0.8). The best artificial neural network, which showed 87% accuracy and a positive likelihood ratio of 8, was achieved with 40 neurons. The area under the receiver-operating-characteristic curve was 0.71. In comparison, logistic regression reached an accuracy of 80%; however, the likelihood ratio was not measurable because the sensitivity was zero. A prototype artificial neural network was developed that predicted posterior capsule status (requiring capsulotomy) with reasonable accuracy. No author has a financial or proprietary interest in any material or method mentioned. Copyright © 2012 ASCRS and ESCRS. Published by Elsevier Inc. All rights reserved.
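A hedged sketch of the kind of back-propagation network described above (two hidden layers, sigmoid activations, a 282/70 train/test split); the synthetic data and every hyperparameter other than those named in the abstract are assumptions, not the study's clinical variables or tuned settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# 352 synthetic "eyes" with 10 input variables, split 282/70 as in the study.
X, y = make_classification(n_samples=352, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=282, random_state=0)

# Two hidden layers with log-sigmoid ("logistic") activations, trained
# by back-propagation; the 20-neuron layers are one of the sizes tested.
net = MLPClassifier(hidden_layer_sizes=(20, 20), activation="logistic",
                    max_iter=2000, random_state=0)
net.fit(X_train, y_train)
accuracy = net.score(X_test, y_test)
```

In the study the competing architectures were compared on the same held-out eyes, with logistic regression as a baseline; the same comparison could be run here by swapping the classifier.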
Abbiati, Milena; Baroffio, Anne; Gerbase, Margaret W.
2016-01-01
Introduction: A consistent body of literature highlights the importance of a broader approach to selecting medical school candidates, assessing both cognitive capacity and individual characteristics. However, selection in a great number of medical schools worldwide is still based on knowledge exams, a procedure that might overlook students with the personal characteristics needed for future medical practice. We investigated whether the personal profile of students selected through a knowledge-based exam differed from that of those not selected. Methods: Students applying for medical school (N=311) completed questionnaires assessing motivations for becoming a doctor, learning approaches, personality traits, empathy, and coping styles. Selection was based on the results of MCQ tests. Principal component analysis was used to draw a profile of the students. Differences between selected and non-selected students were examined by multivariate ANOVAs, and their impact on selection by logistic regression analysis. Results: Students demonstrating a profile of diligence, with higher conscientiousness, a deep learning approach, and task-focused coping, were more frequently selected (p=0.01). Other personal characteristics such as motivation, sociability, and empathy did not significantly differ between selected and non-selected students. Conclusion: Selection through a knowledge-based exam privileged diligent students. It neither advantaged nor precluded candidates with a more humane profile. PMID:27079886
Propensity Score Estimation with Data Mining Techniques: Alternatives to Logistic Regression
ERIC Educational Resources Information Center
Keller, Bryan S. B.; Kim, Jee-Seon; Steiner, Peter M.
2013-01-01
Propensity score analysis (PSA) is a methodological technique which may correct for selection bias in a quasi-experiment by modeling the selection process using observed covariates. Because logistic regression is well understood by researchers in a variety of fields and easy to implement in a number of popular software packages, it has…
2013-06-01
From the report's list of abbreviations: … Character in Sports Index; CV: Cross Validation; FAS: Faculty Appraisal Score; FFM: Five-Factor Model, also known as the "Big Five"; GAM… USMA does not allow personality testing as a selection tool. However, perhaps we may discover whether pre-admission information can predict…characteristic, and personality factors as described by the Five-Factor Model (FFM) to determine their effect on one's academic performance at USMA (Clark
1993-03-01
statistical mathematics, began in the late 1800s when Sir Francis Galton first attempted to use practical mathematical techniques to investigate the...randomly collected (sampled) many pairs of parent/child height measurements (data), Galton observed that for a given parent-height average, the...only Maximum Adjusted R² will be discussed. However, Maximum Adjusted R² and Minimum MSE test exactly the same thing. Adjusted R² is related to R
Robust Variable Selection with Exponential Squared Loss.
Wang, Xueqin; Jiang, Yunlu; Huang, Mian; Zhang, Heping
2013-04-01
Robust variable selection procedures through penalized regression have been gaining increased attention in the literature. They can be used to perform variable selection and are expected to yield robust estimates. However, to the best of our knowledge, the robustness of those penalized regression procedures has not been well characterized. In this paper, we propose a class of penalized robust regression estimators based on exponential squared loss. The motivation for this new procedure is that it enables us to characterize its robustness that has not been done for the existing procedures, while its performance is near optimal and superior to some recently developed methods. Specifically, under defined regularity conditions, our estimators are n-consistent and possess the oracle property. Importantly, we show that our estimators can achieve the highest asymptotic breakdown point of 1/2 and that their influence functions are bounded with respect to the outliers in either the response or the covariate domain. We performed simulation studies to compare our proposed method with some recent methods, using the oracle method as the benchmark. We consider common sources of influential points. Our simulation studies reveal that our proposed method performs similarly to the oracle method in terms of the model error and the positive selection rate even in the presence of influential points. In contrast, other existing procedures have a much lower non-causal selection rate. Furthermore, we re-analyze the Boston Housing Price Dataset and the Plasma Beta-Carotene Level Dataset that are commonly used examples for regression diagnostics of influential points. Our analysis unravels the discrepancies of using our robust method versus the other penalized regression method, underscoring the importance of developing and applying robust penalized regression methods.
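The exponential squared loss at the heart of the proposal can be sketched as follows; for brevity this omits the penalty term and the data-driven tuning of gamma, so it is an unpenalized illustration of the loss's robustness to response outliers, not the authors' full estimator.

```python
import numpy as np
from scipy import optimize

rng = np.random.default_rng(3)
n = 100
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + 0.2 * rng.normal(size=n)
y[:5] += 20.0                            # gross outliers in the response

def exp_squared_obj(beta, gamma=1.0):
    """Sum of exponential squared losses 1 - exp(-r^2 / gamma).

    The loss is bounded, so large residuals (outliers) saturate instead
    of dominating the fit the way squared error would.
    """
    r = y - (beta[0] + beta[1] * x)
    return np.sum(1.0 - np.exp(-r ** 2 / gamma))

slope_ols, intercept_ols = np.polyfit(x, y, 1)   # OLS start, distorted by outliers
fit = optimize.minimize(exp_squared_obj, x0=[intercept_ols, slope_ols],
                        method="Nelder-Mead")
intercept_rob, slope_rob = fit.x
```

Starting from the (outlier-distorted) OLS fit and minimizing the bounded loss pulls the line back toward the bulk of the data, which is the behavior the paper's breakdown-point and influence-function results formalize.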
Selection and study performance: comparing three admission processes within one medical school.
Schripsema, Nienke R; van Trigt, Anke M; Borleffs, Jan C C; Cohen-Schotanus, Janke
2014-12-01
This study was conducted to: (i) analyse whether students admitted to one medical school based on top pre-university grades, a voluntary multifaceted selection process, or lottery, respectively, differed in study performance; (ii) examine whether students who were accepted in the multifaceted selection process outperformed their rejected peers, and (iii) analyse whether participation in the multifaceted selection procedure was related to performance. We examined knowledge test and professionalism scores, study progress and dropout in three cohorts of medical students admitted to the University of Groningen, the Netherlands in 2009, 2010 and 2011 (n = 1055). We divided the lottery-admitted group into, respectively, students who had not participated and students who had been rejected in the multifaceted selection process. We used ANCOVA modelling, logistic regression and Bonferroni post hoc multiple-comparison tests and controlled for gender and cohort. The top pre-university grade group achieved higher knowledge test scores and more Year 1 course credits than all other groups (p < 0.05). This group received the highest possible professionalism score more often than the lottery-admitted group that had not participated in the multifaceted selection process (p < 0.05). The group of students accepted in the multifaceted selection process obtained higher written test scores than the lottery-admitted group that had not participated (p < 0.05) and achieved the highest possible professionalism score more often than both lottery-admitted groups. The lottery-admitted group that had not participated in the multifaceted selection process earned fewer Year 1 and 2 course credits than all other groups (p < 0.05). Dropout rates differed among the groups (p < 0.05), but correction for multiple comparisons rendered all pairwise differences non-significant. A top pre-university grade point average was the best predictor of performance.
For so-called non-academic performance, the multifaceted selection process was efficient in identifying applicants with suitable skills. Participation in the multifaceted selection procedure seems to be predictive of higher performance. Further research is needed to assess whether our results are generalisable to other medical schools. © 2014 John Wiley & Sons Ltd.
Husbands, Adrian; Mathieson, Alistair; Dowell, Jonathan; Cleland, Jennifer; MacKenzie, Rhoda
2014-04-23
The UK Clinical Aptitude Test (UKCAT) was designed to address issues identified with traditional methods of selection. This study aims to examine the predictive validity of the UKCAT and compare this to traditional selection methods in the senior years of medical school. This was a follow-up study of two cohorts of students from two medical schools who had previously taken part in a study examining the predictive validity of the UKCAT in first year. The sample consisted of 4th and 5th Year students who commenced their studies at the University of Aberdeen or University of Dundee medical schools in 2007. Data collected were: demographics (gender and age group), UKCAT scores; Universities and Colleges Admissions Service (UCAS) form scores; admission interview scores; Year 4 and 5 degree examination scores. Pearson's correlations were used to examine the relationships between admissions variables, examination scores, gender and age group, and to select variables for multiple linear regression analysis to predict examination scores. Ninety-nine and 89 students at Aberdeen medical school from Years 4 and 5 respectively, and 51 Year 4 students in Dundee, were included in the analysis. Neither UCAS form nor interview scores were statistically significant predictors of examination performance. Conversely, the UKCAT yielded statistically significant validity coefficients between .24 and .36 in four of five assessments investigated. Multiple regression analysis showed the UKCAT made a statistically significant unique contribution to variance in examination performance in the senior years. Results suggest the UKCAT appears to predict performance better in the later years of medical school compared to earlier years and provides modest supportive evidence for the UKCAT's role in student selection within these institutions. Further research is needed to assess the predictive validity of the UKCAT against professional and behavioural outcomes as the cohort commences working life.
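The two-step analysis pattern described above (Pearson screening of admissions variables, then multiple linear regression on examination scores) can be sketched on simulated data; the variable names, effect sizes, and significance cut-off are invented for illustration and are not the UKCAT cohort data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 150
admissions = {
    "ukcat": rng.normal(size=n),
    "interview": rng.normal(size=n),
    "ucas_form": rng.normal(size=n),
}
# Simulate an examination score in which only the UKCAT carries signal.
exam = 0.5 * admissions["ukcat"] + rng.normal(scale=0.8, size=n)

# Step 1: keep admissions variables significantly correlated with exam score.
selected = [name for name, v in admissions.items()
            if stats.pearsonr(v, exam)[1] < 0.05]

# Step 2: multiple linear regression of exam score on the retained variables.
X = np.column_stack([np.ones(n)] + [admissions[name] for name in selected])
beta, *_ = np.linalg.lstsq(X, exam, rcond=None)
```

This mirrors the reported finding in miniature: a predictor with genuine validity survives the correlation screen and contributes to the regression, while uninformative admissions measures tend not to.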
Backset and cervical retraction capacity among occupants in a modern car.
Jonsson, Bertil; Stenlund, Hans; Svensson, Mats Y; Björnstig, Ulf
2007-03-01
The horizontal distance between the back of the head and the front of the head restraint (backset) and rearward head movement relative to the torso (cervical retraction) were studied in different occupant postures and positions in a modern car. A stratified randomized population of 154 test subjects was studied in a model year 2003 Volvo V70 car, in the driver, front passenger, and rear passenger positions. In each position, the subjects adopted (i) a self-selected posture, (ii) a sagging posture, and (iii) an erect posture. Cervical retraction, backset, and the vertical distance from the top of the head restraint to the occipital protuberance at the back of the subject's head were measured. These data were analyzed using repeated-measures ANOVA and linear regression analysis with a significance level set at p < 0.05. In the self-selected posture, the average backset was 61 mm for drivers, 29 mm for front passengers, and 103 mm for rear passengers (p < 0.001). Women had a lower mean backset (40 mm) than men (81 mm), particularly in the self-selected driving position. Backset was larger and cervical retraction capacity lower in the sagging posture than in the self-selected posture for occupants in all three positions. Rear passengers had the largest backset values. Backset values decreased with increased age. The average cervical retraction capacity in the self-selected posture was 35 mm for drivers, 30 mm for front passengers, and 33 mm for rear passengers (p < 0.001). Future design of rear-end impact protection may take these results into account when trying to reduce backset before impact. Our results might also inform the future development and use of BioRID manikins and rear-end tests in consumer rating programs such as Euro-NCAP.
Long-term neurodevelopmental outcome after selective feticide in monochorionic pregnancies.
van Klink, Jmm; Koopman, H M; Middeldorp, J M; Klumper, F J; Rijken, M; Oepkes, D; Lopriore, E
2015-10-01
To assess the incidence of and risk factors for adverse long-term neurodevelopmental outcome in complicated monochorionic pregnancies treated with selective feticide at our centre between 2000 and 2011. Observational cohort study. National referral centre for fetal therapy (Leiden University Medical Centre, the Netherlands). Neurodevelopmental outcome was assessed in 74 long-term survivors. Children, at least 2 years of age, underwent an assessment of neurologic, motor and cognitive development using standardised psychometric tests, and the parents completed a behavioural questionnaire. The primary outcome was a composite termed neurodevelopmental impairment, comprising cerebral palsy (GMFCS II-V), a cognitive and/or motor test score of <70, bilateral blindness, or bilateral deafness requiring amplification. A total of 131 monochorionic pregnancies were treated with selective feticide at the Leiden University Medical Centre. The overall survival rate was 88/131 (67%). Long-term outcome was assessed in 74/88 (84%). Neurodevelopmental impairment was detected in 5/74 [6.8%, 95% confidence interval (CI) 1.1-12.5] of survivors. Overall adverse outcome, including perinatal mortality or neurodevelopmental impairment, was 48/131 (36.6%). In multivariate analysis, parental educational level was associated with cognitive test scores (regression coefficient B 3.9, 95% CI 1.8-6.0). Behavioural problems were reported in 10/69 (14.5%). Adverse long-term outcome in surviving twins of complicated monochorionic pregnancies treated with selective feticide appears to be more prevalent than in the general population. Cognitive test scores were associated with parental educational level. Neurodevelopmental impairment after selective feticide was detected in 5/74 (6.8%, 95% CI 1.1-12.5) of survivors. © 2015 Royal College of Obstetricians and Gynaecologists.
Selective adsorption of flavor-active components on hydrophobic resins.
Saffarionpour, Shima; Sevillano, David Mendez; Van der Wielen, Luuk A M; Noordman, T Reinoud; Brouwer, Eric; Ottens, Marcel
2016-12-09
This work aims to propose an optimum resin for use in an industrial adsorption process for tuning flavor-active components or removing ethanol to produce an alcohol-free beer. A procedure is reported for selective adsorption of volatile aroma components from water/ethanol mixtures on synthetic hydrophobic resins. High-throughput batch uptake experiments in 96-well microtiter plates are applied to screen resins for adsorption of esters (i.e. isoamyl acetate and ethyl acetate), higher alcohols (i.e. isoamyl alcohol and isobutyl alcohol), a diketone (diacetyl) and ethanol. The miniaturized batch uptake method is adapted for adsorption of volatile components and validated with column breakthrough analysis. The results of single-component adsorption tests on Sepabeads SP20-SS are expressed in single-component Langmuir, Freundlich, and Sips isotherm models, and multi-component versions of the Langmuir and Sips models are applied to express the multi-component adsorption results obtained on several tested resins. The adsorption parameters are regressed and the selectivity over ethanol is calculated for each tested component and resin. Resin scores are obtained for four different scenarios: selective adsorption of esters, higher alcohols, diacetyl, and ethanol. The optimal resin for adsorption of esters is Sepabeads SP20-SS with a resin score of 87%; for selective removal of higher alcohols, XAD16N and XAD4 from the Amberlite resin series are proposed, with scores of 80% and 74%, respectively. For adsorption of diacetyl, the XAD16N and XAD4 resins with a score of 86% are the optimum choice, and the Sepabeads SP2MGS and XAD761 resins showed the highest affinity towards ethanol. Copyright © 2016 Elsevier B.V. All rights reserved.
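Regressing single-component adsorption data onto a Langmuir isotherm, as done for the tested resins, can be sketched as a nonlinear least-squares fit of q = q_max·K·c/(1 + K·c); the concentrations, loadings, and parameter values below are illustrative, not the Sepabeads SP20-SS measurements.

```python
import numpy as np
from scipy.optimize import curve_fit

def langmuir(c, q_max, K):
    """Langmuir isotherm: loading q as a function of liquid concentration c."""
    return q_max * K * c / (1.0 + K * c)

c = np.array([0.1, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0])   # concentration (arbitrary units)
rng = np.random.default_rng(5)
q_obs = langmuir(c, 5.0, 1.2) + rng.normal(scale=0.05, size=c.size)  # noisy loadings

# Regress the isotherm parameters from the batch-uptake data.
(q_max_hat, K_hat), _ = curve_fit(langmuir, c, q_obs, p0=[1.0, 1.0])
```

The Freundlich and Sips models mentioned in the abstract can be fitted the same way by swapping the model function; selectivity over ethanol then follows from the ratio of fitted affinities.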
Watson, Kara M.; McHugh, Amy R.
2014-01-01
Regional regression equations were developed for estimating monthly flow-duration and monthly low-flow frequency statistics for ungaged streams in Coastal Plain and non-coastal regions of New Jersey for baseline and current land- and water-use conditions. The equations were developed to estimate 87 different streamflow statistics, which include the monthly 99-, 90-, 85-, 75-, 50-, and 25-percentile flow-durations of the minimum 1-day daily flow; the August–September 99-, 90-, and 75-percentile minimum 1-day daily flow; and the monthly 7-day, 10-year (M7D10Y) low-flow frequency. These 87 streamflow statistics were computed for 41 continuous-record streamflow-gaging stations (streamgages) with 20 or more years of record and 167 low-flow partial-record stations in New Jersey with 10 or more streamflow measurements. The regression analyses used to develop equations to estimate selected streamflow statistics were performed by testing the relation of the flow-duration and low-flow frequency statistics to 32 basin characteristics (physical characteristics, land use, surficial geology, and climate) at the 41 streamgages and 167 low-flow partial-record stations. The regression analyses determined that drainage area, soil permeability, average April precipitation, average June precipitation, and percent storage (water bodies and wetlands) were the significant explanatory variables for estimating the selected flow-duration and low-flow frequency statistics. Streamflow estimates were computed for two land- and water-use conditions in New Jersey—land- and water-use during the baseline period of record (defined as the years a streamgage had little to no change in development and water use) and current land- and water-use conditions (1989–2008)—for each selected station using data collected through water year 2008. The baseline period of record is representative of a period when the basin was unaffected by change in development.
The current period is representative of the increased development of the last 20 years (1989–2008). The two different land- and water-use conditions were used as surrogates for development to determine whether there have been changes in low-flow statistics as a result of changes in development over time. The State was divided into two low-flow regression regions, the Coastal Plain and the non-coastal region, in order to improve the accuracy of the regression equations. The left-censored parametric survival regression method was used for the analyses to account for streamgages and partial-record stations that had zero flow values for some of the statistics. The average standard error of estimate for the 348 regression equations ranged from 16 to 340 percent. These regression equations and basin characteristics are presented in the U.S. Geological Survey (USGS) StreamStats Web-based geographic information system application. This tool allows users to click on an ungaged site on a stream in New Jersey and get the estimated flow-duration and low-flow frequency statistics. Additionally, the user can click on a streamgage or partial-record station and get the “at-site” streamflow statistics. The low-flow characteristics of a stream ultimately affect the use of the stream by humans. Specific information on the low-flow characteristics of streams is essential to water managers who deal with problems related to municipal and industrial water supply, fish and wildlife conservation, and dilution of wastewater.
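Regional equations of this kind are typically log-linear in drainage area with additive terms for the other significant basin characteristics. A minimal sketch of that functional form with made-up coefficients (not the published New Jersey equations):

```python
import math

def estimate_low_flow(drainage_area_mi2, soil_perm_in_hr, pct_storage,
                      b=(-0.5, 1.05, 0.12, -0.01)):
    """Illustrative regional low-flow equation of the usual form
    log10(Q) = b0 + b1*log10(DA) + b2*permeability + b3*storage.
    The coefficient tuple b is a hypothetical placeholder."""
    b0, b1, b2, b3 = b
    log_q = (b0 + b1 * math.log10(drainage_area_mi2)
             + b2 * soil_perm_in_hr + b3 * pct_storage)
    return 10.0 ** log_q
```

Working in log space keeps the predicted flows positive and makes the residuals closer to homoscedastic, which is why the percent standard errors quoted above apply multiplicatively.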
Xing, Jian; Burkom, Howard; Tokars, Jerome
2011-12-01
Automated surveillance systems require statistical methods to recognize increases in visit counts that might indicate an outbreak. In prior work we presented methods to enhance the sensitivity of C2, a commonly used time series method. In this study, we compared the enhanced C2 method with five regression models. We used emergency department chief complaint data from US CDC BioSense surveillance system, aggregated by city (total of 206 hospitals, 16 cities) during 5/2008-4/2009. Data for six syndromes (asthma, gastrointestinal, nausea and vomiting, rash, respiratory, and influenza-like illness) was used and was stratified by mean count (1-19, 20-49, ≥50 per day) into 14 syndrome-count categories. We compared the sensitivity for detecting single-day artificially-added increases in syndrome counts. Four modifications of the C2 time series method, and five regression models (two linear and three Poisson), were tested. A constant alert rate of 1% was used for all methods. Among the regression models tested, we found that a Poisson model controlling for the logarithm of total visits (i.e., visits both meeting and not meeting a syndrome definition), day of week, and 14-day time period was best. Among 14 syndrome-count categories, time series and regression methods produced approximately the same sensitivity (<5% difference) in 6; in six categories, the regression method had higher sensitivity (range 6-14% improvement), and in two categories the time series method had higher sensitivity. When automated data are aggregated to the city level, a Poisson regression model that controls for total visits produces the best overall sensitivity for detecting artificially added visit counts. This improvement was achieved without increasing the alert rate, which was held constant at 1% for all methods. These findings will improve our ability to detect outbreaks in automated surveillance system data. Published by Elsevier Inc.
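A constant 1% alert rate corresponds to flagging a day only when the observed count exceeds the 99th percentile of the model's predicted count distribution. A sketch of that thresholding step for a Poisson mean (the regression fit itself, with log total visits and day-of-week terms, is omitted here):

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    term, total = math.exp(-lam), math.exp(-lam)
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total

def alert_threshold(lam, alert_rate=0.01):
    """Smallest count c with P(X > c) <= alert_rate under Poisson(lam);
    a day is flagged as an alert when the observed count exceeds c."""
    c = 0
    while 1.0 - poisson_cdf(c, lam) > alert_rate:
        c += 1
    return c
```

Because the threshold scales with the fitted mean, controlling for total visits lets busy and quiet days be judged against different expected counts while holding the overall alert rate fixed.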
Dry Eye Disease: Concordance Between the Diagnostic Tests in African Eyes.
Onwubiko, Stella N; Eze, Boniface I; Udeh, Nnenma N; Onwasigwe, Ernest N; Umeh, Rich E
2016-11-01
To assess the concordance between the diagnostic tests for dry eye disease (DED) in a Nigerian hospital population. The study was a hospital-based cross-sectional survey of adults (≥18 years) presenting at the eye clinic of the University of Nigeria Teaching Hospital (UNTH), Enugu; September-December, 2011. Participants' socio-demographic data were collected. Each subject was assessed for DED using the "Ocular Surface Disease Index" (OSDI) questionnaire, tear-film breakup time (TBUT), and Schirmer test. The intertest concordance was assessed using the kappa statistic, correlation, and regression coefficients. The participants (n=402; men: 193) were aged 50.1±19.1 years (mean±standard deviation; range: 18-94 years). Dry eye disease was diagnosed in 203 by TBUT, 170 by Schirmer test, and 295 by OSDI; the concordance between the tests was: OSDI versus TBUT (kappa, κ=-0.194); OSDI versus Schirmer (κ=-0.276); and TBUT versus Schirmer (κ=0.082). Ocular Surface Disease Index was inversely correlated with the Schirmer test (Spearman ρ=-0.231, P<0.001) and TBUT (ρ=-0.237, P<0.001). In the linear regression model, OSDI was poorly predicted by TBUT (β=-0.09; 95% confidence interval (CI): -0.26 to -0.03, P=0.14) and Schirmer test (β=-0.35, 95% CI: -0.53 to -0.18, P=0.18). At UNTH, there is poor agreement, and almost equally weak correlation, between the subjective and objective tests for DED. Therefore, the selection of a diagnostic test for DED should be informed by cost-effectiveness and diagnostic resource availability, not diagnostic efficiency or utility.
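For reference, the kappa statistic used above quantifies agreement beyond chance between two diagnostic calls; a minimal implementation for binary positive/negative results:

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two raters' binary calls (lists of 0/1):
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    p_chance = pa * pb + (1 - pa) * (1 - pb)
    return (p_obs - p_chance) / (1 - p_chance)
```

Values near zero (or negative, as in the OSDI comparisons above) indicate agreement no better than, or worse than, chance.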
Discriminative least squares regression for multiclass classification and feature selection.
Xiang, Shiming; Nie, Feiping; Meng, Gaofeng; Pan, Chunhong; Zhang, Changshui
2012-11-01
This paper presents a framework of discriminative least squares regression (LSR) for multiclass classification and feature selection. The core idea is to enlarge the distance between different classes under the conceptual framework of LSR. First, a technique called ε-dragging is introduced to force the regression targets of different classes to move along opposite directions, so that the distances between classes are enlarged. Then, the ε-draggings are integrated into the LSR model for multiclass classification. Our learning framework, referred to as discriminative LSR, has a compact model form, with no need to train multiple independent two-class machines. With its compact form, this model can be naturally extended for feature selection. This goal is achieved via the L2,1 norm of the transformation matrix, which yields a sparse learning model for feature selection. The model for multiclass classification and its extension for feature selection are finally solved elegantly and efficiently. Experimental evaluation over a range of benchmark datasets indicates the validity of our method.
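One way to picture the ε-dragging step is through the relaxed targets it produces: starting from one-hot target rows, the true-class entry is dragged upward and the others downward by nonnegative amounts M. This is only a sketch of the target construction; in the paper M is learned jointly with the regression, whereas here it is simply given:

```python
def epsilon_drag_targets(labels, n_classes, M):
    """Relaxed regression targets in the spirit of ε-dragging: entry
    (i, j) becomes 1 + M[i][j] for the true class and -M[i][j]
    otherwise, enlarging the margins between class targets."""
    T = []
    for i, y in enumerate(labels):
        row = []
        for j in range(n_classes):
            if j == y:
                row.append(1.0 + M[i][j])   # drag true class upward
            else:
                row.append(-M[i][j])        # drag other classes downward
        T.append(row)
    return T
```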
Harmsen, Wouter J; Ribbers, Gerard M; Slaman, Jorrit; Heijenbrok-Kal, Majanka H; Khajeh, Ladbon; van Kooten, Fop; Neggers, Sebastiaan J C M M; van den Berg-Emons, Rita J
2017-05-01
Peak oxygen uptake (VO2peak) established during progressive cardiopulmonary exercise testing (CPET) is the "gold standard" for cardiorespiratory fitness. However, CPET measurements may be limited in patients with aneurysmal subarachnoid hemorrhage (a-SAH) by disease-related complaints, such as cardiovascular health risks or anxiety. Furthermore, CPET with gas-exchange analyses requires specialized knowledge and infrastructure with limited availability in most rehabilitation facilities. To determine whether an easy-to-administer six-minute walk test (6MWT) is a valid clinical alternative to progressive CPET in order to predict VO2peak in individuals with a-SAH. Twenty-seven patients performed the 6MWT and CPET with gas-exchange analyses on a cycle ergometer. Univariate and multivariate regression models were constructed to investigate the predictability of VO2peak from the six-minute walk distance (6MWD). Univariate regression showed that the 6MWD was strongly related to VO2peak (r = 0.75, p < 0.001), with an explained variance of 56% and a prediction error of 4.12 ml/kg/min, representing 18% of mean VO2peak. Adding age and sex in an extended multivariate regression model improved this relationship (r = 0.82, p < 0.001), with an explained variance of 67% and a prediction error of 3.67 ml/kg/min, corresponding to 16% of mean VO2peak. The 6MWT is an easy-to-administer submaximal exercise test that can be selected to estimate cardiorespiratory fitness at an aggregated level, in groups of patients with a-SAH, which may help to evaluate interventions in a clinical or research setting. However, the relatively large prediction error does not allow for an accurate prediction in individual patients.
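The univariate step is ordinary least squares of VO2peak on 6MWD; a minimal sketch (the walking distances and uptake values below are fabricated for illustration, not study data):

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor: returns (a, b)
    such that y is approximated by a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

# Fabricated example: predict VO2peak (ml/kg/min) from 6MWD (m)
walk_m = [380.0, 450.0, 520.0, 600.0]
vo2 = [16.0, 19.5, 23.0, 27.0]
a, b = fit_line(walk_m, vo2)
```

Extending this to the reported multivariate model simply adds age and sex as further columns in the design matrix.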
1990-03-01
Neter, J., W. Wasserman, and M.H. Kutner. Applied Linear Regression Models. Homewood, IL: Richard D. Irwin, Inc., 1983. Pritsker, A. Alan B. Introduction to Simulation and SLAM... "Control Variates in Simulation," European Journal of Operational Research, 42 (1989).
Jaworski, Mariusz; Panczyk, Mariusz; Cedro, Małgorzata; Kucharska, Alicja
2018-01-01
Adherence by diabetic patients to dietary recommendations is important for effective therapy. Taking patients' expectations regarding diet into account is significant in this regard. The aim of this paper was to analyze the relationship between selected independent variables (eg, regular blood glucose testing) and patients' adherence to dietary recommendations, bearing in mind that the degree of disease acceptance might play a mediating role. A cross-sectional study was conducted in 91 patients treated for type 2 diabetes mellitus in a public medical facility. Paper-and-pencil interviewing was administered ahead of the planned visit with a diabetes specialist. Two measures were applied in the study: the Acceptance and Action Diabetes Questionnaire and the Patient Diet Adherence in Diabetes Scale. Additionally, data related to sociodemographic characteristics, lifestyle-related factors, and the course of the disease (management, incidence of complications, and dietician's supervision) were also collected. The regression method was used in the analysis, and Cohen's methodology was used to estimate partial mediation. Significance of the mediation effect was assessed by the Goodman test. P-values of <0.05 were considered statistically significant. Patients' non-adherence to dietary recommendations was related to a low level of disease acceptance (standardized regression coefficient = -0.266; P = 0.010). Moreover, failure to perform regular blood glucose testing was associated with a lack of disease acceptance (standardized regression coefficient = -0.455; P < 0.001). However, the lack of regular blood glucose testing and a low level of acceptance had only a partial negative impact on adherence to dietary recommendations (Goodman mediation test, Z = 1.939; P = 0.054). This dependence was not seen in patients treated with diet and concomitant oral medicines and/or insulin therapy.
Effective dietary education should include activities promoting a more positive attitude toward the disease. This may be obtained by individual counseling, respecting the patient's needs, and focus on regular blood glucose testing.
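The Goodman test used above assesses whether the indirect (mediated) effect, the product of the two path coefficients, differs from zero. A sketch with hypothetical path coefficients and standard errors (not the study's estimates):

```python
import math

def goodman_z(a, se_a, b, se_b):
    """Goodman (version I) z statistic for an indirect effect a*b in a
    simple mediation model; a and b are the two path coefficients and
    se_a, se_b their standard errors. The subtracted term can make the
    variance estimate negative when both paths are very noisy."""
    var = (b * b * se_a * se_a + a * a * se_b * se_b
           - se_a * se_a * se_b * se_b)
    return (a * b) / math.sqrt(var)
```

A |z| below about 1.96, as with the Z = 1.939 reported above, falls just short of the conventional 5% significance level.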
Kothe, Christian; Hissbach, Johanna; Hampe, Wolfgang
2014-01-01
Although some recent studies concluded that dexterity is not a reliable predictor of performance in preclinical laboratory courses in dentistry, they could not disprove earlier findings which confirmed the worth of manual dexterity tests in dental admission. We developed a wire bending test (HAM-Man) which was administered during dental freshmen's first week in 2008, 2009, and 2010. The purpose of our study was to evaluate whether the HAM-Man is a useful selection criterion in addition to the high school grade point average (GPA) in dental admission. Regression analysis revealed that GPA only accounted for a maximum of 9% of students' performance in preclinical laboratory courses; in six out of eight models the explained variance was below 2%. The HAM-Man incrementally explained up to 20.5% of preclinical practical performance over GPA. In line with findings from earlier studies, the HAM-Man test of manual dexterity showed satisfactory incremental validity. While GPA has a focus on cognitive abilities, the HAM-Man reflects learning of unfamiliar psychomotor skills, spatial relationships, and dental techniques needed in preclinical laboratory courses. The wire bending test HAM-Man is a valuable additional selection instrument for dental school applicants.
Wood, Molly S.; Fosness, Ryan L.; Skinner, Kenneth D.; Veilleux, Andrea G.
2016-06-27
The U.S. Geological Survey, in cooperation with the Idaho Transportation Department, updated regional regression equations to estimate peak-flow statistics at ungaged sites on Idaho streams using recent streamflow (flow) data and new statistical techniques. Peak-flow statistics with 80-, 67-, 50-, 43-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent annual exceedance probabilities (1.25-, 1.50-, 2.00-, 2.33-, 5.00-, 10.0-, 25.0-, 50.0-, 100-, 200-, and 500-year recurrence intervals, respectively) were estimated for 192 streamgages in Idaho and bordering States with at least 10 years of annual peak-flow record through water year 2013. The streamgages were selected from drainage basins with little or no flow diversion or regulation. The peak-flow statistics were estimated by fitting a log-Pearson type III distribution to records of annual peak flows and applying two additional statistical methods: (1) the Expected Moments Algorithm to help describe uncertainty in annual peak flows and to better represent missing and historical record; and (2) the generalized Multiple Grubbs Beck Test to screen out potentially influential low outliers and to better fit the upper end of the peak-flow distribution. Additionally, a new regional skew was estimated for the Pacific Northwest and used to weight at-station skew at most streamgages. The streamgages were grouped into six regions (numbered 1_2, 3, 4, 5, 6_8, and 7, to maintain consistency in region numbering with a previous study), and the estimated peak-flow statistics were related to basin and climatic characteristics to develop regional regression equations using a generalized least squares procedure. Four out of 24 evaluated basin and climatic characteristics were selected for use in the final regional peak-flow regression equations. Overall, the standard error of prediction for the regional peak-flow regression equations ranged from 22 to 132 percent.
Among all regions, regression model fit was best for region 4 in west-central Idaho (average standard error of prediction=46.4 percent; pseudo-R2>92 percent) and region 5 in central Idaho (average standard error of prediction=30.3 percent; pseudo-R2>95 percent). Regression model fit was poor for region 7 in southern Idaho (average standard error of prediction=103 percent; pseudo-R2<78 percent) compared to other regions because few streamgages in region 7 met the criteria for inclusion in the study, and the region's semi-arid climate and associated variability in precipitation patterns cause substantial variability in peak flows. A drainage area ratio-adjustment method, using ratio exponents estimated using generalized least-squares regression, was presented as an alternative to the regional regression equations if peak-flow estimates are desired at an ungaged site that is close to a streamgage selected for inclusion in this study. The alternative drainage area ratio-adjustment method is appropriate for use when the drainage area ratio between the ungaged and gaged sites is between 0.5 and 1.5. The updated regional peak-flow regression equations had lower total error (standard error of prediction) than all regression equations presented in a 1982 study and in four of six regions presented in 2002 and 2003 studies in Idaho. A more extensive streamgage screening process used in the current study resulted in fewer streamgages used than in the 1982, 2002, and 2003 studies. The smaller number of streamgages and the selection of different explanatory variables were likely causes of increased error in some regions compared to previous studies, but overall, regional peak-flow regression model fit was generally improved for Idaho.
The revised statistical procedures and increased streamgage screening applied in the current study most likely resulted in a more accurate representation of natural peak-flow conditions. The updated regional peak-flow regression equations will be integrated in the U.S. Geological Survey StreamStats program to allow users to estimate basin and climatic characteristics and peak-flow statistics at ungaged locations of interest. StreamStats estimates peak-flow statistics with quantifiable certainty only when used at sites with basin and climatic characteristics within the range of input variables used to develop the regional regression equations. Both the regional regression equations and StreamStats should be used to estimate peak-flow statistics only in naturally flowing, relatively unregulated streams without substantial local influences to flow, such as large seeps, springs, or other groundwater-surface water interactions that are not widespread or characteristic of the respective region.
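The drainage-area ratio adjustment described above takes the simple transfer form Q_u = Q_g (A_u/A_g)^b. A sketch with an illustrative default exponent (the study estimated region-specific exponents by generalized least-squares regression):

```python
def dar_adjust(q_gaged, area_gaged, area_ungaged, exponent=0.9):
    """Transfer a peak-flow statistic from a gaged site to a nearby
    ungaged site on the same stream: Q_u = Q_g * (A_u / A_g)**b.
    The default exponent is an illustrative placeholder."""
    ratio = area_ungaged / area_gaged
    if not 0.5 <= ratio <= 1.5:
        raise ValueError("drainage-area ratio outside the 0.5-1.5 "
                         "range for which the method is recommended")
    return q_gaged * ratio ** exponent
```

The guard clause enforces the 0.5-1.5 applicability range stated above; outside it, the regional regression equations are the appropriate tool.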
Estimates of genetic parameters and eigenvector indices for milk production of Holstein cows.
Savegnago, R P; Rosa, G J M; Valente, B D; Herrera, L G G; Carneiro, R L R; Sesana, R C; El Faro, L; Munari, D P
2013-01-01
The objectives of the present study were to estimate genetic parameters of monthly test-day milk yield (TDMY) of the first lactation of Brazilian Holstein cows using random regression (RR), and to compare the genetic gains for milk production and persistency derived from RR models using eigenvector indices and selection indices that did not consider eigenvectors. The data set contained monthly TDMY of 3,543 first lactations of Brazilian Holstein cows calving between 1994 and 2011. The RR model included the fixed effect of the contemporary group (herd-month-year of test days), the covariate calving age (linear and quadratic effects), and a fourth-order regression on Legendre orthogonal polynomials of days in milk (DIM) to model the population-based mean curve. Additive genetic and nongenetic animal effects were fit as RR, with the residual variance modeled in 4 classes. Eigenvector indices based on the additive genetic RR covariance matrix were used to evaluate the genetic gains of milk yield and persistency compared with the traditional selection index (a selection index based on breeding values of milk yield until 305 DIM). The heritability estimates for monthly TDMY ranged from 0.12 ± 0.04 to 0.31 ± 0.04. The estimated correlations of the additive genetic and nongenetic animal effects were close to 1 at adjacent monthly TDMY, with a tendency to diminish as the time between DIM classes increased. The first eigenvector was related to the increase of the genetic response of milk yield; the second eigenvector was related to the increase of the genetic gains of persistency, but it contributed to a decrease in the genetic gains for total milk yield. Therefore, using this eigenvector to improve persistency will not contribute to changing the shape of the genetic curve pattern.
If the breeding goal is to improve milk production and persistency, complete sequential eigenvector indices (selection indices composed of all eigenvectors) could be used, with higher economic values placed on persistency. However, if the breeding goal is to improve only milk yield, the traditional selection index is indicated. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
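The Legendre basis used to model the lactation curve evaluates the orthogonal polynomials on days in milk rescaled to [-1, 1]; a sketch of that basis construction (the 5-305 DIM range below is an assumed convention, not taken from the paper):

```python
def legendre_basis(order, t):
    """Legendre polynomials P_0..P_order at t in [-1, 1], built with
    the recurrence (n+1) P_{n+1} = (2n+1) t P_n - n P_{n-1}."""
    p = [1.0, t]
    for n in range(1, order):
        p.append(((2 * n + 1) * t * p[n] - n * p[n - 1]) / (n + 1))
    return p[:order + 1]

def scale_dim(dim, dim_min=5.0, dim_max=305.0):
    """Map days in milk onto [-1, 1] for the polynomial basis."""
    return -1.0 + 2.0 * (dim - dim_min) / (dim_max - dim_min)
```

Each animal's random regression coefficients then multiply this basis, and the eigenvectors of their covariance matrix define the indices compared above.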
NASA Astrophysics Data System (ADS)
Dyar, M. D.; Carmosino, M. L.; Breves, E. A.; Ozanne, M. V.; Clegg, S. M.; Wiens, R. C.
2012-04-01
A remote laser-induced breakdown spectrometer (LIBS) designed to simulate the ChemCam instrument on the Mars Science Laboratory Rover Curiosity was used to probe 100 geologic samples at a 9-m standoff distance. ChemCam consists of an integrated remote LIBS instrument that will probe samples up to 7 m from the mast of the rover and a remote micro-imager (RMI) that will record context images. The elemental compositions of 100 igneous and highly-metamorphosed rocks are determined with LIBS using three variations of multivariate analysis, with a goal of improving the analytical accuracy. Two forms of partial least squares (PLS) regression are employed with finely-tuned parameters: PLS-1 regresses a single response variable (elemental concentration) against the observation variables (spectra, or intensity at each of 6144 spectrometer channels), while PLS-2 simultaneously regresses multiple response variables (concentrations of the ten major elements in rocks) against the observation predictor variables, taking advantage of natural correlations between elements. Those results are contrasted with those from the multivariate regression technique of the least absolute shrinkage and selection operator (lasso), which is a penalized shrunken regression method that selects the specific channels for each element that explain the most variance in the concentration of that element. To make this comparison, we use results of cross-validation and of held-out testing, and employ unscaled and uncentered spectral intensity data because all of the input variables are already in the same units. Results demonstrate that the lasso, PLS-1, and PLS-2 all yield comparable results in terms of accuracy for this dataset. However, the interpretability of these methods differs greatly in terms of fundamental understanding of LIBS emissions. 
PLS techniques generate principal components, linear combinations of intensities at any number of spectrometer channels, which explain as much variance in the response variables as possible while avoiding multicollinearity between principal components. When the selected number of principal components is projected back into the original feature space of the spectra, 6144 correlation coefficients are generated, a small fraction of which are mathematically significant to the regression. In contrast, the lasso models require only a small number (< 24) of non-zero correlation coefficients (β values) to determine the concentration of each of the ten major elements. Causality between the positively-correlated emission lines chosen by the lasso and the elemental concentration was examined. In general, the higher the lasso coefficient (β), the greater the likelihood that the selected line results from an emission of that element. Emission lines with negative β values should arise from elements that are anti-correlated with the element being predicted. For elements except Fe, Al, Ti, and P, the lasso-selected wavelength with the highest β value corresponds to the element being predicted, e.g. 559.8 nm for neutral Ca. However, the specific lines chosen by the lasso with positive β values are not always those from the element being predicted. Other wavelengths and the elements that most strongly correlate with them to predict concentration are obviously related to known geochemical correlations or close overlap of emission lines, while others must result from matrix effects. Use of the lasso technique thus directly informs our understanding of the underlying physical processes that give rise to LIBS emissions by determining which lines can best represent concentration, and which lines from other elements are causing matrix effects.
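The sparsity contrasted with PLS above comes from the lasso's soft-thresholding update, which drives the coefficients of uninformative channels exactly to zero. A generic coordinate-descent sketch (not the authors' implementation), run on raw unscaled data as in the study:

```python
def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent lasso minimizing
    0.5 * ||y - X w||^2 + lam * ||w||_1 on raw (unscaled) data."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # partial residual excluding feature j
            r = [y[i] - sum(w[k] * X[i][k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n))
            z = sum(X[i][j] ** 2 for i in range(n))
            # soft-thresholding: weak correlations give exactly w_j = 0
            if rho > lam:
                w[j] = (rho - lam) / z
            elif rho < -lam:
                w[j] = (rho + lam) / z
            else:
                w[j] = 0.0
    return w
```

With 6144 spectrometer channels as columns of X, this is what leaves fewer than 24 non-zero β values per element, each attributable to a specific emission line.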
Reversed-phase liquid chromatography column testing: robustness study of the test.
Le Mapihan, K; Vial, J; Jardy, A
2004-12-24
Choosing the right RPLC column for an actual separation among the more than 600 commercially available ones still represents a real challenge for the analyst, particularly when basic solutes are involved. Many tests dedicated to the characterization and classification of stationary phases have been proposed in the literature, and some of them highlighted the need for a better understanding of retention properties to allow a rational choice of columns. However, unlike classical chromatographic methods, the problem of their robustness evaluation has often been left unaddressed. In the present study, we present a robustness study that was applied to the chromatographic testing procedure we had developed and optimized previously. A design of experiments (DoE) approach was implemented. Four factors, previously identified as potentially influential, were selected and subjected to small controlled variations: solvent fraction, temperature, pH, and buffer concentration. As our model comprised quadratic terms instead of a simple linear model, we chose a D-optimal design in order to minimize the number of experiments. As a previous batch-to-batch study [K. Le Mapihan, Caractérisation et classification des phases stationnaires utilisées pour l'analyse CPL de produits pharmaceutiques (Characterization and classification of stationary phases used for LC analysis of pharmaceutical products), Ph.D. Thesis, Pierre and Marie Curie University, 2004] had shown low variability on the selected stationary phase, it was possible to split the design into two parts according to the solvent nature, each part using one column. Because our testing procedure involves assays both with methanol and with acetonitrile as organic modifier, this approach avoided a possible bias due to column ageing, given the number of experiments required (16 + 6 center points). Experimental results were computed with a Partial Least Squares regression procedure, better suited than classical regression to handle factors and responses that are not completely independent.
The results showed the behavior of the solutes in relation to their physico-chemical properties and confirmed the relevance of the second-degree terms of our model. Finally, the robust domain of the test was clearly identified, so that any potential user knows precisely to what extent each experimental parameter must be controlled when our testing procedure is to be implemented.
NASA Astrophysics Data System (ADS)
Attia, Khalid A. M.; Nassar, Mohammed W. I.; El-Zeiny, Mohamed B.; Serag, Ahmed
2017-01-01
For the first time, a new variable selection method based on swarm intelligence, namely the firefly algorithm, is coupled with three different multivariate calibration models, namely concentration residual augmented classical least squares, artificial neural networks, and support vector regression, on UV spectral data. A comparative study between the firefly algorithm and the well-known genetic algorithm was carried out, and it revealed the superiority of this new, powerful algorithm. Moreover, different statistical tests were performed and no significant differences were found among the models regarding their predictive ability. This ensures that simpler and faster models were obtained without any deterioration of the quality of the calibration.
Moore, Eric J; Price, Daniel L; Van Abel, Kathryn M; Carlson, Matthew L
2015-02-01
Application to otolaryngology-head and neck surgery residency is highly competitive, and the interview process strives to select qualified applicants with a high aptitude for the specialty. Commonly employed criteria for applicant selection have failed to show correlation with proficiency during residency training. We evaluated the correlation between the results of a surgical aptitude test administered to otolaryngology resident applicants and their performance during residency. Retrospective study at an academic otolaryngology-head and neck surgery residency program. Between 2007 and 2013, 224 resident applicants participated in a previously described surgical aptitude test administered at a microvascular surgical station. The composite and attitudinal scores for the 24 consecutive residents who matched at our institution were recorded, and their residency performance was analyzed by faculty survey on a five-point scale. The composite and attitudinal scores were analyzed for correlation with the residency performance score by regression analysis. The 24 residents were evaluated for overall quality as clinicians by eight faculty members who were blinded to the results of surgical aptitude testing. The results of these surveys showed good inter-rater reliability. Both the overall aptitude test score and the attitudinal subscore were reliable predictors of performance during residency training. The goal of the residency selection process is to evaluate the candidate's potential for success in residency and beyond. The results of this study suggest that a simple-to-administer clinical skills test may have predictive value for success in residency and clinician quality. Level of evidence: 4. © 2014 The American Laryngological, Rhinological and Otological Society, Inc.
NASA Astrophysics Data System (ADS)
Sirenko, M. A.; Tarasenko, P. F.; Pushkarev, M. I.
2017-01-01
One of the most noticeable features of sign-based statistical procedures is the opportunity to build an exact test for simple hypothesis testing of parameters in a regression model. In this article, we extended the sign-based approach to the nonlinear case with dependent noise. The examined model is a multi-quantile regression, which makes it possible to test hypotheses not only about the regression parameters, but about the noise parameters as well.
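The exactness property is easiest to see in the classical sign test, the elementary building block of sign-based procedures: under a median-zero, sign-symmetric noise null, the number of positive residuals is Binomial(n, 1/2) regardless of the noise distribution. A sketch of that building block (the paper's multi-quantile construction is more general):

```python
from math import comb

def sign_statistic(residuals):
    """Number of positive residuals; Binomial(n, 1/2) under the null."""
    return sum(1 for r in residuals if r > 0)

def binom_two_sided_p(k, n):
    """Exact two-sided p-value for k successes under Binomial(n, 1/2):
    sum the probabilities of all outcomes no more likely than k."""
    pmf = lambda i: comb(n, i) / 2 ** n
    p_obs = pmf(k)
    return min(1.0, sum(pmf(i) for i in range(n + 1)
                        if pmf(i) <= p_obs + 1e-12))
```

Because the null distribution is discrete and known exactly, no asymptotic approximation is needed, which is the sense in which such tests are exact.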
Influence of child rearing by grandparent on the development of children aged six to twelve years.
Nanthamongkolchai, Sutham; Munsawaengsub, Chokchai; Nanthamongkolchai, Chantira
2009-03-01
To investigate the influence of child rearing by grandparents on the development of children aged six to twelve years. A cross-sectional study was conducted in 320 children, selected by cluster sampling, who were cared for by a parent or a grandparent. The data were collected between March 10 and April 8, 2006 by a questionnaire about child and family factors. The TONI-III test was used to assess child development. Data were analyzed by frequency distribution, logistic regression, and multiple logistic regression. The child's caregiver had a significant influence on child development (p-value < 0.05). Children reared by a grandparent had a 2.0 times higher chance of delayed development compared with those reared by a parent. In addition, significant family factors with an impact on child development were child rearing and family income. In conclusion, child rearing by a grandparent carried a 2.0 times higher chance of delayed development than rearing by a parent. Therefore, family and health personnel should plan to ensure the development and learning process of children who are cared for by a grandparent.
NASA Astrophysics Data System (ADS)
Kusumaningsih, W.; Rachmayanti, S.; Werdhani, R. A.
2017-08-01
Hypertension and diabetes mellitus are the most common risk factors of stroke. The study aimed to determine the relationship between the hypertension and diabetes mellitus risk factors and dependence on assistance with activities of daily living in chronic stroke patients. The study used an analytical observational cross-sectional design. The study’s sample included 44 stroke patients selected using the quota sampling method. The relationship between the variables was analyzed using the bivariate chi-squared test and multivariate logistic regression. Based on the chi-squared test, the relationships between the Modified Shah Barthel Index (MSBI) score and the stroke risk factors hypertension and diabetes mellitus were p = 0.122 and p = 0.002, respectively. The logistic regression results suggest that hypertension and diabetes mellitus are stroke risk factors related to the MSBI score: p = 0.076 (OR 4.076; 95% CI 0.861-19.297) and p = 0.007 (OR 22.690; 95% CI 2.332-220.722), respectively. Diabetes mellitus is the most prominent risk factor of severe dependency on assistance with activities of daily living in chronic stroke patients.
Moscetti, Roberto; Sturm, Barbara; Crichton, Stuart Oj; Amjad, Waseem; Massantini, Riccardo
2018-05-01
The potential of hyperspectral imaging (500-1010 nm) was evaluated for monitoring the quality of potato slices (var. Anuschka) of 5, 7 and 9 mm thickness subjected to air drying at 50 °C. The study investigated three different feature selection methods for the prediction of dry basis moisture content and colour of potato slices using partial least squares regression (PLS). The feature selection strategies tested included interval PLS regression (iPLS) and the differences and ratios between raw reflectance values for each possible pair of wavelengths (R[λ1]-R[λ2] and R[λ1]:R[λ2], respectively). Moreover, the combination of spectral and spatial domains was tested. Excellent results were obtained using the iPLS algorithm; however, features from the datasets of raw reflectance differences and ratios represent suitable alternatives for the development of low-complexity prediction models. Finally, the dry basis moisture content was predicted with high accuracy by combining the spectral (i.e. R[511 nm]-R[994 nm]) and spatial (i.e. relative area shrinkage of the slice) domains. Modelling the data acquired during drying through hyperspectral imaging can provide useful information concerning the chemical and physicochemical changes of the product. With this information, the proposed approach lays the foundations for a more efficient smart dryer whose design and process can be optimized for drying of potato slices. © 2017 Society of Chemical Industry.
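The wavelength-pair strategy described above can be sketched in a few lines: screen every difference feature R[λ1]-R[λ2] by its correlation with moisture, then calibrate a simple line on the best pair. The spectra, band positions and moisture response below are synthetic assumptions, not values from the study.

```python
import numpy as np

rng = np.random.default_rng(0)
wl = np.linspace(500, 1010, 120)                     # wavelength grid, nm
n = 60
moisture = rng.uniform(0.1, 4.0, n)                  # dry-basis moisture (synthetic)
R = 0.5 + 0.01 * rng.standard_normal((n, wl.size))   # baseline reflectance + noise
i1 = int(np.argmin(np.abs(wl - 511)))                # assumed moisture-sensitive bands
i2 = int(np.argmin(np.abs(wl - 994)))
R[:, i1] += 0.05 * moisture
R[:, i2] -= 0.03 * moisture

# Screen every pairwise difference feature R[l1] - R[l2] by |correlation|
yc = moisture - moisture.mean()
best_r, best_pair = 0.0, (0, 0)
for a in range(wl.size):
    X = R[:, [a]] - R                                # all pairs (a, b) at once
    Xc = X - X.mean(axis=0)
    denom = np.sqrt((Xc**2).sum(axis=0) * (yc**2).sum()) + 1e-12
    r = np.abs((Xc * yc[:, None]).sum(axis=0) / denom)
    b = int(r.argmax())
    if r[b] > best_r:
        best_r, best_pair = float(r[b]), (a, b)

# Calibrate moisture on the winning difference feature
feature = R[:, best_pair[0]] - R[:, best_pair[1]]
slope, intercept = np.polyfit(feature, moisture, 1)
pred = slope * feature + intercept
r2 = 1 - ((moisture - pred)**2).sum() / (yc**2).sum()
print(best_pair, round(best_r, 3), round(r2, 3))
```

In this toy setup the screen recovers the two planted bands; in practice the search is over real reflectance spectra and the winning pair is whatever the data support.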
Tantipoj, Chanita; Sakoolnamarka, Serena Siraratna; Supa-amornkul, Sirirak; Lohsoonthorn, Vitool; Deerochanawong, Chaicharn; Khovidhunkit, Siribangon Piboonniyom; Hiransuthikul, Narin
2017-03-01
Diabetes mellitus type 2 (DM) is associated with oral diseases. Some studies indicated that patients who seek dental treatment could have an undiagnosed hyperglycemic condition. The aim of this study was to assess the prevalence of undiagnosed hyperglycemia and selected associated factors among Thai dental patients. Dental patients without a history of hyperglycemia were recruited from the Special Clinic, Faculty of Dentistry, Mahidol University, Bangkok, Thailand and His Majesty the King’s Dental Service Unit, Thailand. The patients were randomly selected and a standardized questionnaire was used to collect demographic data from each patient. Blood pressure, body mass index (BMI), and waist circumference were recorded for each subject. The number of missing teeth, periodontal status, and salivary flow rate were also investigated. HbA1c was assessed using a finger prick blood sample and analyzed with a point-of-care testing machine. Hyperglycemia was defined as an HbA1c ≥5.7%. The prevalence of hyperglycemia among participants was calculated and multivariate logistic regression analysis was used to identify risk factors. A total of 724 participants were included in the study; 33.8% had hyperglycemia. On multiple logistic regression analysis, older age, family history of DM, being overweight (BMI ≥23 kg/m²), having central obesity and having severe periodontitis were significantly associated with hyperglycemia. The high prevalence of hyperglycemia in this study of dental patients suggests this setting may be appropriate to screen for patients with hyperglycemia.
Development and validation of the neck dissection impairment index: a quality of life measure.
Taylor, Rodney J; Chepeha, Judith C; Teknos, Theodoros N; Bradford, Carol R; Sharma, Pramod K; Terrell, Jeffrey E; Hogikyan, Norman D; Wolf, Gregory T; Chepeha, Douglas B
2002-01-01
To validate a health-related quality-of-life (QOL) instrument for patients following neck dissection and to identify the factors that affect QOL following neck dissection. Cross-sectional validation study. The outpatient clinic of a tertiary care cancer center. Convenience sample of 54 patients previously treated for head and neck cancer who underwent a selective neck dissection or modified radical neck dissection (64 total neck dissections). Patients had a minimum postoperative convalescence of 11 months. Thirty-two underwent accessory nerve-sparing modified radical neck dissection, and 32 underwent selective neck dissection. A 10-item, self-report instrument, the Neck Dissection Impairment Index (NDII), was developed and validated. Reliability was evaluated with test-retest correlation and internal consistency using the Cronbach alpha coefficient. Convergent validity was assessed using the 36-Item Short-Form Health Survey (SF-36) and the Constant Shoulder Scale, a shoulder function test. Multiple variable regression was used to determine the variables that most affected QOL following neck dissection. The 10-item NDII test-retest correlation was 0.91 (P<.001) with an internal consistency Cronbach alpha coefficient of .95. The NDII correlated with the Constant Shoulder Scale (r = 0.85, P<.001) and with the SF-36 physical functioning (r = 0.50, P<.001) and role-physical functioning (r = 0.60, P<.001) domains. Using multiple variable regression, the variables that contributed most to QOL score were patient's age and weight, radiation treatment, and neck dissection type. The NDII is a valid, reliable instrument for assessing neck dissection impairment. Patient's age, weight, radiation treatment, and neck dissection type were important factors that affect QOL following neck dissection.
Rank estimation and the multivariate analysis of in vivo fast-scan cyclic voltammetric data
Keithley, Richard B.; Carelli, Regina M.; Wightman, R. Mark
2010-01-01
Principal component regression has been used in the past to separate current contributions from different neuromodulators measured with in vivo fast-scan cyclic voltammetry. Traditionally, a percent cumulative variance approach has been used to determine the rank of the training set voltammetric matrix during model development; however, this approach suffers from several disadvantages, including the use of arbitrary percentages and the requirement of extreme precision of training sets. Here we propose that Malinowski’s F-test, a method based on a statistical analysis of the variance contained within the training set, can be used to improve factor selection for the analysis of in vivo fast-scan cyclic voltammetric data. These two methods of rank estimation were compared at all steps in the calibration protocol including the number of principal components retained, overall noise levels, model validation as determined using a residual analysis procedure, and predicted concentration information. By analyzing 119 training sets from two different laboratories amassed over several years, we were able to gain insight into the heterogeneity of in vivo fast-scan cyclic voltammetric data and study how differences in factor selection propagate throughout the entire principal component regression analysis procedure. Visualizing cyclic voltammetric representations of the data contained in the retained and discarded principal components showed that using Malinowski’s F-test for rank estimation of in vivo training sets allowed for noise to be more accurately removed. Malinowski’s F-test also improved the robustness of our criterion for judging multivariate model validity, even though signal-to-noise ratios of the data varied. In addition, pH change was the majority noise carrier of in vivo training sets while dopamine prediction was more sensitive to noise. PMID:20527815
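A minimal sketch of the rank-estimation idea: Malinowski's reduced-eigenvalue F-test keeps factors until the next eigenvalue is statistically indistinguishable from the noise pool. The formulation below (REV_n = ev_n / ((r-n+1)(c-n+1)), tested with F(1, s-n)) is one common published variant; constants differ slightly across Malinowski's papers, and the data matrix is synthetic rather than voltammetric.

```python
import numpy as np
from scipy.stats import f as f_dist

def malinowski_rank(D, alpha=0.05):
    """Estimate the rank of data matrix D with a reduced-eigenvalue F-test."""
    r, c = D.shape
    s = min(r, c)
    ev = np.linalg.svd(D, compute_uv=False) ** 2          # eigenvalues of D^T D
    rev = ev / np.array([(r - n + 1) * (c - n + 1)        # reduced eigenvalues
                         for n in range(1, s + 1)])
    for n in range(1, s):
        F = (s - n) * rev[n - 1] / rev[n:].sum()          # factor n vs noise pool
        if f_dist.sf(F, 1, s - n) > alpha:                # first non-significant factor
            return n - 1
    return s

# Synthetic "training set": 50 observations x 20 variables, true rank 2 + noise
rng = np.random.default_rng(4)
D = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 20)) \
    + 0.05 * rng.standard_normal((50, 20))
print(malinowski_rank(D))
```

Unlike a fixed cumulative-variance cutoff, the retained rank here adapts to the noise level actually present in the matrix.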
Genetic variation in efficiency to deposit fat and lean meat in Norwegian Landrace and Duroc pigs.
Martinsen, K H; Ødegård, J; Olsen, D; Meuwissen, T H E
2015-08-01
Feed costs amount to approximately 70% of the total costs in pork production, and feed efficiency is, therefore, an important trait for improving pork production efficiency. Production efficiency is generally improved by selection for high lean growth rate, reduced backfat, and low feed intake. These traits have given an effective slaughter pig but may cause problems in piglet production due to sows with limited body reserves. The aim of the present study was to develop a measure for feed efficiency that expressed the feed requirements per 1 kg of deposited lean meat and fat, and that is not improved by depositing less fat. Norwegian Landrace (n = 8,161) and Duroc (n = 7,202) boars from Topigs Norsvin's testing station were computed tomography scanned to determine their deposition of lean meat and fat. The trait was analyzed in a univariate animal model, where total feed intake in the test period was the dependent variable and fat and lean meat were included as random regression cofactors. These cofactors were measures for the fat and lean meat efficiencies of individual boars. The fraction of total genetic variance due to lean meat or fat efficiency was calculated as the ratio between the genetic variance of the random regression cofactor and the total genetic variance in total feed intake during the test period. Genetic variance components suggested there was significant genetic variance among Norwegian Landrace and Duroc boars in efficiency for deposition of lean meat (0.23 ± 0.04 and 0.38 ± 0.06) and fat (0.26 ± 0.03 and 0.17 ± 0.03) during the test period. The fraction of the total genetic variance in feed intake explained by lean meat deposition was 12% for Norwegian Landrace and 15% for Duroc. Genetic fractions explained by fat deposition were 20% for Norwegian Landrace and 10% for Duroc. The results suggested a significant part of the total genetic variance in feed intake in the test period was explained by fat and lean meat efficiency.
These new efficiency measures may give breeders opportunities to select for animals with a genetic potential to deposit lean meat efficiently and at low feed cost in slaughter pigs, rather than selecting for reduced feed intake and backfat.
Gu, Yingxin; Wylie, Bruce K.; Boyte, Stephen; Picotte, Joshua J.; Howard, Danny; Smith, Kelcy; Nelson, Kurtis
2016-01-01
Regression tree models have been widely used for remote sensing-based ecosystem mapping. Improper use of the sample data (model training and testing data) may cause overfitting and underfitting effects in the model. The goal of this study is to develop an optimal sampling data usage strategy for any dataset and identify an appropriate number of rules in the regression tree model that will improve its accuracy and robustness. Landsat 8 data and Moderate-Resolution Imaging Spectroradiometer-scaled Normalized Difference Vegetation Index (NDVI) were used to develop regression tree models. A Python procedure was designed to generate random replications of model parameter options across a range of model development data sizes and rule number constraints. The mean absolute difference (MAD) between the predicted and actual NDVI (scaled NDVI, value from 0–200) and its variability across the different randomized replications were calculated to assess the accuracy and stability of the models. In our case study, a six-rule regression tree model developed from 80% of the sample data had the lowest MAD (MADtraining = 2.5 and MADtesting = 2.4), which was suggested as the optimal model. This study demonstrates how the training data and rule number selections impact model accuracy and provides important guidance for future remote-sensing-based ecosystem modeling.
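The sampling-strategy experiment above can be sketched with a deliberately simple stand-in for the rule-based regression tree: a piecewise-constant model whose number of bins plays the role of the rule count. The NDVI response curve, training fractions and rule counts below are illustrative assumptions, not the study's values.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.uniform(0, 1, n)                                # e.g. a Landsat-band predictor
ndvi = 100 + 60 * np.sin(3 * x) + rng.normal(0, 5, n)   # scaled NDVI (0-200), synthetic

def fit_predict(xtr, ytr, xte, n_rules):
    """Piecewise-constant 'rule' model: quantile bins on the predictor."""
    edges = np.quantile(xtr, np.linspace(0, 1, n_rules + 1))
    lab = np.clip(np.searchsorted(edges, xtr, side="right") - 1, 0, n_rules - 1)
    means = np.array([ytr[lab == k].mean() if np.any(lab == k) else ytr.mean()
                      for k in range(n_rules)])
    lab_te = np.clip(np.searchsorted(edges, xte, side="right") - 1, 0, n_rules - 1)
    return means[lab_te]

# Sweep training fraction and rule count over randomized replications,
# scoring each setting by mean absolute difference (MAD) on held-out data
mad = {}
for frac in (0.5, 0.8):
    for n_rules in (2, 6, 50):
        reps = []
        for _ in range(20):
            perm = rng.permutation(n)
            ntr = int(frac * n)
            tr, te = perm[:ntr], perm[ntr:]
            pred = fit_predict(x[tr], ndvi[tr], x[te], n_rules)
            reps.append(np.abs(pred - ndvi[te]).mean())
        mad[(frac, n_rules)] = float(np.mean(reps))
print(min(mad, key=mad.get))
```

Averaging the MAD over replications, as in the study, separates a genuinely better parameter choice from a lucky train/test split.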
Fan, Wenzhe; Zhang, Yu; Carr, Peter W.; Rutan, Sarah C.; Dumarey, Melanie; Schellinger, Adam P.; Pritts, Wayne
2011-01-01
Fourteen judiciously selected reversed-phase columns were tested with 18 cationic drug solutes under the isocratic elution conditions advised in the Snyder-Dolan (S-D) hydrophobic subtraction method of column classification. The standard errors (S.E.) of the least squares regressions of log k′ for a given column vs. log k′ for a reference column were obtained and used to compare and classify columns based on their selectivity. The results are consistent with those obtained with a study of the 16 test solutes recommended by Snyder and Dolan. To the extent that these drugs are representative these results show that the S-D classification scheme is also generally applicable to pharmaceuticals under isocratic conditions. That is, those columns judged to be similar based on the S-D 16 solutes were similar based on the 18 drugs; furthermore those columns judged to have significantly different selectivities based on the 16 S-D probes appeared to be quite different for the drugs as well. Given that the S-D method has been used to classify more than 400 different types of reversed phases the extension to cationic drugs is a significant finding. PMID:19698948
Neither fixed nor random: weighted least squares meta-regression.
Stanley, T D; Doucouliagos, Hristos
2017-03-01
Our study revisits and challenges two core conventional meta-regression estimators: the prevalent use of 'mixed-effects' or random-effects meta-regression analysis and the correction of standard errors that defines fixed-effects meta-regression analysis (FE-MRA). We show how and explain why an unrestricted weighted least squares MRA (WLS-MRA) estimator is superior to conventional random-effects (or mixed-effects) meta-regression when there is publication (or small-sample) bias, is as good as FE-MRA in all cases, and is better than fixed effects in most practical applications. Simulations and statistical theory show that WLS-MRA provides satisfactory estimates of meta-regression coefficients that are practically equivalent to mixed effects or random effects when there is no publication bias. When there is publication selection bias, WLS-MRA always has smaller bias than mixed effects or random effects. In practical applications, an unrestricted WLS meta-regression is likely to give practically equivalent or superior estimates to fixed-effects, random-effects, and mixed-effects meta-regression approaches. However, random-effects meta-regression remains viable and perhaps somewhat preferable if selection for statistical significance (publication bias) can be ruled out and when random, additive normal heterogeneity is known to directly affect the 'true' regression coefficient. Copyright © 2016 John Wiley & Sons, Ltd.
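The estimator can be sketched directly: weight each study by 1/SE², fit least squares, and estimate a single multiplicative dispersion from the weighted residuals rather than fixing it at 1 (fixed effects) or adding a tau² term (random effects). This intercept-only toy (no heterogeneity, no publication bias) is an illustration under assumed data, not the paper's simulation design.

```python
import numpy as np

rng = np.random.default_rng(7)
k = 200
se = rng.uniform(0.05, 0.4, k)                   # reported standard errors
effect = 0.5 + se * rng.standard_normal(k)       # true mean effect = 0.5

X = np.ones((k, 1))                              # intercept only; a full MRA adds moderator columns
w = 1.0 / se**2                                  # inverse-variance weights
XtWX = X.T @ (w[:, None] * X)
beta = np.linalg.solve(XtWX, X.T @ (w * effect)) # WLS point estimate
resid = effect - X @ beta
phi = (w * resid**2).sum() / (k - X.shape[1])    # unrestricted multiplicative dispersion
var_beta = phi * np.linalg.inv(XtWX)             # WLS-MRA standard errors use phi freely
print(round(float(beta[0]), 3), round(float(np.sqrt(var_beta[0, 0])), 4))
```

With no excess heterogeneity, phi estimates close to 1 and WLS coincides with fixed effects; with heterogeneity, phi inflates the standard errors without the additive tau² assumption.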
A Highly Efficient Design Strategy for Regression with Outcome Pooling
Mitchell, Emily M.; Lyles, Robert H.; Manatunga, Amita K.; Perkins, Neil J.; Schisterman, Enrique F.
2014-01-01
The potential for research involving biospecimens can be hindered by the prohibitive cost of performing laboratory assays on individual samples. To mitigate this cost, strategies such as randomly selecting a portion of specimens for analysis or randomly pooling specimens prior to performing laboratory assays may be employed. These techniques, while effective in reducing cost, are often accompanied by a considerable loss of statistical efficiency. We propose a novel pooling strategy based on the k-means clustering algorithm to reduce laboratory costs while maintaining a high level of statistical efficiency when predictor variables are measured on all subjects, but the outcome of interest is assessed in pools. We perform simulations motivated by the BioCycle study to compare this k-means pooling strategy with current pooling and selection techniques under simple and multiple linear regression models. While all of the methods considered produce unbiased estimates and confidence intervals with appropriate coverage, pooling under k-means clustering provides the most precise estimates, closely approximating results from the full data and losing minimal precision as the total number of pools decreases. The benefits of k-means clustering evident in the simulation study are then applied to an analysis of the BioCycle dataset. In conclusion, when the number of lab tests is limited by budget, pooling specimens based on k-means clustering prior to performing lab assays can be an effective way to save money with minimal information loss in a regression setting. PMID:25220822
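The pooling idea above can be sketched in a regression toy: cluster specimens on the predictor with k-means, "assay" each pool once (its mean outcome), and fit a size-weighted regression on the pool-level means. The sample size, pool count and 1-D k-means implementation are illustrative assumptions, not the BioCycle design.

```python
import numpy as np

rng = np.random.default_rng(3)
n, n_pools = 300, 30
x = rng.uniform(0, 10, n)                     # predictor measured on all subjects
y = 1.0 + 2.0 * x + rng.standard_normal(n)    # outcome, assayed only in pools

def kmeans_1d(v, k, iters=25):
    """Tiny 1-D k-means with deterministic quantile initialization."""
    centers = np.quantile(v, (np.arange(k) + 0.5) / k)
    for _ in range(iters):
        lab = np.abs(v[:, None] - centers[None, :]).argmin(axis=1)
        for j in range(k):
            if np.any(lab == j):
                centers[j] = v[lab == j].mean()
    return lab

lab = kmeans_1d(x, n_pools)
sizes = np.array([(lab == j).sum() for j in range(n_pools)])
pools = [j for j in range(n_pools) if sizes[j] > 0]
xp = np.array([x[lab == j].mean() for j in pools])   # pool-level predictor mean
yp = np.array([y[lab == j].mean() for j in pools])   # one "lab assay" per pool

# Weighted least squares on pool means (weights proportional to pool size)
slope, intercept = np.polyfit(xp, yp, 1, w=np.sqrt(sizes[pools]))
print(round(slope, 2), round(intercept, 2))
```

Because k-means pools are homogeneous in the predictor, little between-subject information is averaged away, which is why the pooled fit tracks the full-data coefficients closely.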
Vidaki, Athina; Ballard, David; Aliferi, Anastasia; Miller, Thomas H; Barron, Leon P; Syndercombe Court, Denise
2017-05-01
The ability to estimate the age of the donor from recovered biological material at a crime scene can be of substantial value in forensic investigations. Aging can be complex and is associated with various molecular modifications in cells that accumulate over a person's lifetime, including epigenetic patterns. The aim of this study was to use age-specific DNA methylation patterns to generate an accurate model for the prediction of chronological age using data from whole blood. In total, 45 age-associated CpG sites were selected based on their reported age coefficients in a previous extensive study and investigated using publicly available methylation data obtained from 1156 whole blood samples (aged 2-90 years) analysed with Illumina's genome-wide methylation platforms (27K/450K). Applying stepwise regression for variable selection, 23 of these CpG sites were identified that could significantly contribute to age prediction modelling, and multiple regression analysis carried out with these markers provided an accurate prediction of age (R² = 0.92, mean absolute error (MAE) = 4.6 years). However, applying machine learning, and more specifically a generalised regression neural network model, the age prediction significantly improved (R² = 0.96) with an MAE = 3.3 years for the training set and 4.4 years for a blind test set of 231 cases. The machine learning approach used 16 CpG sites, located in 16 different genomic regions, with the top 3 predictors of age belonging to the genes NHLRC1, SCGN and CSNK1D. The proposed model was further tested using independent cohorts of 53 monozygotic twins (MAE = 7.1 years) and a cohort of 1011 disease state individuals (MAE = 7.2 years). Furthermore, we highlighted the age markers' potential applicability in samples other than blood by predicting age with similar accuracy in 265 saliva samples (R² = 0.96) with an MAE = 3.2 years (training set) and 4.0 years (blind test).
In an attempt to create a sensitive and accurate age prediction test, a next generation sequencing (NGS)-based method able to quantify the methylation status of the selected 16 CpG sites was developed using the Illumina MiSeq® platform. The method was validated using DNA standards of known methylation levels, and the age prediction accuracy was initially assessed in a set of 46 whole blood samples. Although the resulting prediction accuracy using the NGS data was lower compared to the original model (MAE = 7.5 years), it is expected that future optimization of our strategy to account for technical variation, as well as increasing the sample size, will improve both the prediction accuracy and reproducibility. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
Fisher, Charles K; Mehta, Pankaj
2015-06-01
Feature selection, identifying a subset of variables that are relevant for predicting a response, is an important and challenging component of many methods in statistics and machine learning. Feature selection is especially difficult and computationally intensive when the number of variables approaches or exceeds the number of samples, as is often the case for many genomic datasets. Here, we introduce a new approach, the Bayesian Ising Approximation (BIA), to rapidly calculate posterior probabilities for feature relevance in L2 penalized linear regression. In the regime where the regression problem is strongly regularized by the prior, we show that computing the marginal posterior probabilities for features is equivalent to computing the magnetizations of an Ising model with weak couplings. Using a mean field approximation, we show it is possible to rapidly compute the feature selection path described by the posterior probabilities as a function of the L2 penalty. We present simulations and analytical results illustrating the accuracy of the BIA on some simple regression problems. Finally, we demonstrate the applicability of the BIA to high-dimensional regression by analyzing a gene expression dataset with nearly 30,000 features. These results also highlight the impact of correlations between features on Bayesian feature selection. An implementation of the BIA in C++, along with data for reproducing our gene expression analyses, are freely available at http://physics.bu.edu/∼pankajm/BIACode. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Linden, Ariel; Adams, John L; Roberts, Nancy
2006-04-01
Although disease management (DM) has been in existence for over a decade, there is still much uncertainty as to its effectiveness in improving health status and reducing medical cost. The main reason is that most programme evaluations typically follow weak observational study designs that are subject to bias, most notably selection bias and regression to the mean. The regression discontinuity (RD) design may be the best alternative to randomized studies for evaluating DM programme effectiveness. The most crucial element of the RD design is its use of a 'cut-off' score on a pre-test measure to determine assignment to intervention or control. A valuable feature of this technique is that the pre-test measure does not have to be the same as the outcome measure, thus maximizing the programme's ability to use research-based practice guidelines, survey instruments and other tools to identify those individuals in greatest need of the programme intervention. Similarly, the cut-off score can be based on clinical understanding of the disease process, empirically derived, or resource-based. In the RD design, programme effectiveness is determined by a change in the pre-post relationship at the cut-off point. While the RD design is uniquely suitable for DM programme evaluation, its success will depend, in large part, on fundamental changes being made in the way DM programmes identify and assign individuals to the programme intervention.
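The cut-off logic described above can be sketched numerically: patients below a pre-test risk cut-off receive the intervention, and the programme effect is estimated as the jump in the pre-post relationship at the cut-off. All numbers below (cut-off, slopes, effect size) are illustrative assumptions; a real RD analysis would also check functional form and allow the slope to change at the cut-off.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 2000
pre = rng.uniform(0, 100, n)                   # pre-test risk score
cutoff = 40.0
treated = (pre < cutoff).astype(float)         # neediest patients get the programme
# Post measure depends smoothly on the pre-test plus a discontinuity of -5
post = 20.0 + 0.6 * pre - 5.0 * treated + rng.normal(0, 3.0, n)

# Regress post on (pre - cutoff) and treatment; the treatment coefficient
# is the estimated jump at the cut-off, i.e. the programme effect
X = np.column_stack([np.ones(n), pre - cutoff, treated])
beta, *_ = np.linalg.lstsq(X, post, rcond=None)
print(round(float(beta[2]), 2))                # close to the true effect of -5
```

Note that the design needs no randomization: identification comes entirely from the deterministic assignment rule at the cut-off, which is what makes RD attractive for DM programmes that must treat the sickest patients.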
The natural history of cystic echinococcosis in untreated and albendazole-treated patients.
Solomon, N; Kachani, M; Zeyhle, E; Macpherson, C N L
2017-07-01
The World Health Organization (WHO) treatment protocols for cystic echinococcosis (CE) are based on the standardized ultrasound (US) classification. This study examined whether the classification reflected the natural history of CE in untreated and albendazole-treated patients. Data were collected during mass US screenings in CE endemic regions among transhumant populations, the Turkana and Berber peoples of Kenya and Morocco. Cysts were classified using the WHO classification. Patient records occurring prior to treatment, and after albendazole administration, were selected. 852 paired before/after observations of 360 cysts from 257 patients were analyzed. A McNemar-Bowker χ² test for symmetry was significant (p<0.0001). 744 observations (87.3%) maintained the same class, and 101 (11.9%) progressed, consistent with the classification. Regression to CE3B occurred in seven of 116 CE4 cyst observations (6.0%). A McNemar-Bowker χ² test of 1414 paired before/after observations of 288 cysts from 157 albendazole-treated patients was significant (p<0.0001). 1236 observations (87.4%) maintained the same class, and 149 (10.5%) progressed, consistent with the classification. Regression to CE3B occurred in 29 of 206 CE4 observations (14.1%). Significant asymmetry confirms the WHO classification's applicability to the natural history of CE and albendazole-induced changes. Regressions may reflect the stability of CE3B cysts. Copyright © 2017. Published by Elsevier B.V.
Li, Y.; Graubard, B. I.; Huang, P.; Gastwirth, J. L.
2015-01-01
Determining the extent of a disparity, if any, between groups of people, for example, race or gender, is of interest in many fields, including public health for medical treatment and prevention of disease. An observed difference in the mean outcome between an advantaged group (AG) and disadvantaged group (DG) can be due to differences in the distribution of relevant covariates. The Peters–Belson (PB) method fits a regression model with covariates to the AG to predict, for each DG member, their outcome measure as if they had been from the AG. The difference between the mean predicted and the mean observed outcomes of DG members is the (unexplained) disparity of interest. We focus on applying the PB method to estimate the disparity based on binary/multinomial/proportional odds logistic regression models using data collected from complex surveys with more than one DG. Estimators of the unexplained disparity, an analytic variance–covariance estimator that is based on the Taylor linearization variance–covariance estimation method, as well as a Wald test for testing a joint null hypothesis of zero for unexplained disparities between two or more minority groups and a majority group, are provided. Simulation studies with data selected from simple random sampling and cluster sampling, as well as the analyses of disparity in body mass index in the National Health and Nutrition Examination Survey 1999–2004, are conducted. Empirical results indicate that the Taylor linearization variance–covariance estimation is accurate and that the proposed Wald test maintains the nominal level. PMID:25382235
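The Peters-Belson decomposition can be sketched with a linear outcome (the paper itself works with binary/multinomial/proportional-odds logistic models and survey weights; plain OLS keeps the three steps visible). All data and coefficients below are synthetic assumptions, including a planted unexplained disparity of 2.0.

```python
import numpy as np

rng = np.random.default_rng(5)
n_ag, n_dg = 1000, 400
x_ag = rng.normal(0.0, 1.0, (n_ag, 2))          # covariates, advantaged group (AG)
x_dg = rng.normal(0.3, 1.0, (n_dg, 2))          # DG also differs in covariates
y_ag = 25.0 + 1.5 * x_ag[:, 0] + 0.8 * x_ag[:, 1] + rng.normal(0, 1, n_ag)
y_dg = 25.0 + 1.5 * x_dg[:, 0] + 0.8 * x_dg[:, 1] + 2.0 + rng.normal(0, 1, n_dg)

# Step 1: fit the outcome model in the AG only
X_ag = np.column_stack([np.ones(n_ag), x_ag])
beta, *_ = np.linalg.lstsq(X_ag, y_ag, rcond=None)

# Step 2: predict each DG member's outcome as if they had been in the AG
X_dg = np.column_stack([np.ones(n_dg), x_dg])
pred_dg = X_dg @ beta

# Step 3: unexplained disparity = mean observed - mean predicted in the DG
disparity = y_dg.mean() - pred_dg.mean()
print(round(float(disparity), 2))
```

The covariate shift between groups is absorbed by the prediction step, so the remaining gap is the disparity not explained by the measured covariates; the paper's contribution is the variance estimation and Wald test for this quantity under complex survey sampling with several DGs.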
Inhibitory saccadic dysfunction is associated with cerebellar injury in multiple sclerosis.
Kolbe, Scott C; Kilpatrick, Trevor J; Mitchell, Peter J; White, Owen; Egan, Gary F; Fielding, Joanne
2014-05-01
Cognitive dysfunction is common in patients with multiple sclerosis (MS). Saccadic eye movement paradigms such as antisaccades (AS) can sensitively interrogate cognitive function, in particular, the executive and attentional processes of response selection and inhibition. Although we have previously demonstrated significant deficits in the generation of AS in MS patients, the neuropathological changes underlying these deficits were not elucidated. In this study, 24 patients with relapsing-remitting MS underwent testing using an AS paradigm. Rank correlation and multiple regression analyses were subsequently used to determine whether AS errors in these patients were associated with: (i) neurological and radiological abnormalities, as measured by standard clinical techniques, (ii) cognitive dysfunction, and (iii) regionally specific cerebral white and gray-matter damage. Although AS error rates in MS patients did not correlate with clinical disability (using the Expanded Disability Status Score), T2 lesion load or brain parenchymal fraction, AS error rate did correlate with performance on the Paced Auditory Serial Addition Task and the Symbol Digit Modalities Test, neuropsychological tests commonly used in MS. Further, voxel-wise regression analyses revealed associations between AS errors and reduced fractional anisotropy throughout most of the cerebellum, and increased mean diffusivity in the cerebellar vermis. Region-wise regression analyses confirmed that AS errors also correlated with gray-matter atrophy in the cerebellum right VI subregion. These results support the use of the AS paradigm as a marker for cognitive dysfunction in MS and implicate structural and microstructural changes to the cerebellum as a contributing mechanism for AS deficits in these patients. Copyright © 2013 Wiley Periodicals, Inc.
Reduced Lung Cancer Mortality With Lower Atmospheric Pressure.
Merrill, Ray M; Frutos, Aaron
2018-01-01
Research has shown that higher altitude is associated with lower risk of lung cancer and improved survival among patients. The current study assessed the influence of county-level atmospheric pressure (a measure reflecting both altitude and temperature) on age-adjusted lung cancer mortality rates in the contiguous United States, using two forms of spatial regression. Ordinary least squares regression and geographically weighted regression models were used to evaluate the impact of climate and other selected variables on lung cancer mortality, based on 2974 counties. Atmospheric pressure was significantly positively associated with lung cancer mortality, after controlling for sunlight, precipitation, PM2.5 (µg/m³), current smoking, and other selected variables. Positive county-level β coefficient estimates (P < .05) for atmospheric pressure were observed throughout the United States, with higher values in the eastern half of the country. The spatial regression models showed that atmospheric pressure is positively associated with age-adjusted lung cancer mortality rates, after controlling for other selected variables.
NASA Astrophysics Data System (ADS)
Ahn, Hyunjun; Jung, Younghun; Om, Ju-Seong; Heo, Jun-Haeng
2014-05-01
Selecting an appropriate probability distribution is a central task in statistical hydrology, and a goodness-of-fit test is a statistical method for choosing a probability model for a given data set. The probability plot correlation coefficient (PPCC) test, one such goodness-of-fit test, was originally developed for the normal distribution and has since been widely applied to other probability models. The PPCC test is regarded as one of the best goodness-of-fit tests because of its comparatively high rejection power. In this study, we focus on PPCC tests for the widely used GEV distribution. Several plotting position formulas have been suggested for the GEV model; however, PPCC statistics have been derived only for those formulas (Goel and De, In-na and Nguyen, and Kim et al.) that include the skewness coefficient (or shape parameter). Regression equations for the test statistic are then derived as a function of the shape parameter and sample size for a given significance level, and the rejection powers of these formulas are compared using Monte Carlo simulation. Keywords: goodness-of-fit test, probability plot correlation coefficient test, plotting position, Monte Carlo simulation. ACKNOWLEDGEMENTS: This research was supported by a grant, 'Establishing Active Disaster Management System of Flood Control Structures by using 3D BIM Technique' [NEMA-12-NH-57], from the Natural Hazard Mitigation Research Group, National Emergency Management Agency of Korea.
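The PPCC statistic itself is simple to compute: correlate the ordered sample with the candidate distribution's quantiles evaluated at a plotting position. The sketch below uses the Cunnane plotting position and a known GEV shape parameter for illustration; in the study the statistic's critical values are tabulated by regression against shape and sample size, and the shape would be estimated.

```python
import numpy as np
from scipy.stats import genextreme, norm

# Synthetic annual-maximum-like sample from a GEV parent
c = -0.3                                  # scipy shape convention (c = -xi)
x = np.sort(genextreme.rvs(c, loc=100, scale=30, size=200, random_state=2))
n = x.size
p = (np.arange(1, n + 1) - 0.4) / (n + 0.2)   # Cunnane plotting position

def ppcc(sample_sorted, quantiles):
    """Probability plot correlation coefficient."""
    return float(np.corrcoef(sample_sorted, quantiles)[0, 1])

r_gev = ppcc(x, genextreme.ppf(p, c))     # correct parent distribution
r_norm = ppcc(x, norm.ppf(p))             # wrong parent, for comparison
print(round(r_gev, 4), round(r_norm, 4))
```

A sample from the true parent plots nearly linearly against its own quantiles (PPCC near 1), while the heavy GEV tail bends the normal probability plot and lowers its PPCC; the hypothesis test compares the observed statistic with the tabulated critical value.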
ERIC Educational Resources Information Center
Leow, Christine; Wen, Xiaoli; Korfmacher, Jon
2015-01-01
This article compares regression modeling and propensity score analysis as different types of statistical techniques used in addressing selection bias when estimating the impact of two-year versus one-year Head Start on children's school readiness. The analyses were based on the national Head Start secondary dataset. After controlling for…
Foong, Hui Foh; Hamid, Tengku Aizan; Ibrahim, Rahimah; Haron, Sharifah Azizah
2018-04-01
Research has found that depression in later life is associated with cognitive impairment, so mechanisms that reduce the effect of depression on cognitive function are of interest. In this paper, we examine whether intrinsic religiosity moderates the association between depression and cognitive function. The study included 2322 nationally representative community-dwelling elderly in Malaysia, selected through multi-stage proportional cluster random sampling from Peninsular Malaysia. The elderly were surveyed on socio-demographic information, cognitive function, depression, and intrinsic religiosity. A four-step moderated hierarchical regression analysis was employed to test the moderating effect. Statistical analyses were performed using SPSS (version 15.0). Bivariate analyses showed that both depression and intrinsic religiosity had significant relationships with cognitive function. In addition, the four-step moderated hierarchical regression analysis revealed that intrinsic religiosity moderated the association between depression and cognitive function, after controlling for selected socio-demographic characteristics. Intrinsic religiosity might reduce the negative effect of depression on cognitive function. Professionals who work with depressed older adults should seek ways to improve their intrinsic religiosity as one strategy to prevent cognitive impairment.
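The moderation step of such a hierarchical regression amounts to testing whether an interaction term adds explanatory power beyond the main effects. A minimal numpy sketch with simulated (not the study's) data and hypothetical variable names:

```python
import numpy as np

def r_squared(X, y):
    """Coefficient of determination for a least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

rng = np.random.default_rng(2)
n = 1000
depression = rng.normal(size=n)
religiosity = rng.normal(size=n)
# Simulated outcome in which religiosity buffers the depression effect
cognition = -1.0 * depression + 0.5 * depression * religiosity \
            + rng.normal(size=n)

ones = np.ones(n)
X_main = np.column_stack([ones, depression, religiosity])
X_mod = np.column_stack([ones, depression, religiosity,
                         depression * religiosity])

r2_main = r_squared(X_main, cognition)   # step without the interaction
r2_mod = r_squared(X_mod, cognition)     # step with the interaction
delta_r2 = r2_mod - r2_main              # moderation's incremental fit
```

A meaningful increase in R² at the interaction step (assessed with an F test in practice) is the evidence of moderation.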
Efficient least angle regression for identification of linear-in-the-parameters models
Beach, Thomas H.; Rezgui, Yacine
2017-01-01
Least angle regression, as a promising model selection method, differentiates itself from conventional stepwise and stagewise methods in that it is neither too greedy nor too slow. It is closely related to L1-norm optimization, which achieves low prediction variance by sacrificing some model bias in order to enhance model generalization capability. In this paper, we propose an efficient least angle regression algorithm for model selection for a large class of linear-in-the-parameters models, with the purpose of accelerating the model selection process. The entire algorithm works in a completely recursive manner, where the correlations between model terms and residuals, the evolving directions, and other pertinent variables are derived explicitly and updated successively at every subset selection step. The model coefficients are only computed when the algorithm finishes, so direct matrix inversions are avoided. A detailed computational complexity analysis indicates that the proposed algorithm possesses significant computational efficiency compared with the original approach, in which the well-known efficient Cholesky decomposition is used to solve least angle regression. Three artificial and real-world examples are employed to demonstrate the effectiveness, efficiency, and numerical stability of the proposed algorithm. PMID:28293140
Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses.
Faul, Franz; Erdfelder, Edgar; Buchner, Axel; Lang, Albert-Georg
2009-11-01
G*Power is a free power analysis program for a variety of statistical tests. We present extensions and improvements of the version introduced by Faul, Erdfelder, Lang, and Buchner (2007) in the domain of correlation and regression analyses. In the new version, we have added procedures to analyze the power of tests based on (1) single-sample tetrachoric correlations, (2) comparisons of dependent correlations, (3) bivariate linear regression, (4) multiple linear regression based on the random predictor model, (5) logistic regression, and (6) Poisson regression. We describe these new features and provide a brief introduction to their scope and handling.
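For the bivariate correlation case, the power computation can be approximated with the Fisher z transform; this is a textbook approximation of what such a program computes, not G*Power's exact routine:

```python
import numpy as np
from scipy import stats

def power_correlation(rho, n, alpha=0.05):
    """Approximate power of a two-sided test of H0: rho = 0, using the
    Fisher z transform: atanh(r) ~ Normal(atanh(rho), 1/(n-3))."""
    z_crit = stats.norm.ppf(1 - alpha / 2)
    shift = np.arctanh(rho) * np.sqrt(n - 3)
    return stats.norm.cdf(shift - z_crit) + stats.norm.cdf(-shift - z_crit)

p_small = power_correlation(0.3, 50)    # modest sample
p_large = power_correlation(0.3, 200)   # larger sample: power near 1
```

Inverting this relationship (solving for n at a target power) is the a priori sample-size calculation such tools provide.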
Evidence of Adverse Selection in Iranian Supplementary Health Insurance Market
Mahdavi, Gh; Izadi, Z
2012-01-01
Background: The existence or non-existence of adverse selection in an insurance market is one of the important questions insurers face. Adverse selection is one of the consequences of asymmetric information: the theory of adverse selection states that high-risk individuals demand insurance more than low-risk individuals do. Methods: The presence of adverse selection in Iran's supplementary health insurance market is tested in this paper. The study group consists of 420 practitioner individuals aged 20 to 59. We estimate two logistic regression models in order to determine the effect of individual characteristics on the decision to purchase health insurance coverage and on loss occurrence. Using the correlation between claim occurrence and the decision to purchase health insurance, the adverse selection problem in the Iranian supplementary health insurance market is examined. Results: Individuals with higher education and income levels purchase less supplementary health insurance and make fewer claims than others, and there is a positive correlation between claim occurrence and the decision to purchase supplementary health insurance. Conclusion: Our findings provide evidence of adverse selection in the Iranian supplementary health insurance market. PMID:23113209
Christidi, Foteini; Zalonis, Ioannis; Smyrnis, Nikolaos; Evdokimidis, Ioannis
2012-09-01
The present study investigates selective attention and verbal free recall in amyotrophic lateral sclerosis (ALS) and examines the contribution of selective attention, encoding, consolidation, and retrieval memory processes to patients' verbal free recall. We examined 22 non-demented patients with sporadic ALS and 22 demographically related controls using the Stroop Neuropsychological Screening Test (SNST; selective attention) and the Rey Auditory Verbal Learning Test (RAVLT; immediate & delayed verbal free recall). The item-specific deficit approach (ISDA) was applied to RAVLT to evaluate encoding, consolidation, and retrieval difficulties. ALS patients performed worse than controls on SNST (p < .001) and RAVLT immediate and delayed recall (p < .001) and showed deficient encoding (p = .001) and consolidation (p = .002) but not retrieval (p = .405). Hierarchical regression analysis revealed that SNST and ISDA indices accounted for: (a) 91.1% of the variance in RAVLT immediate recall, with encoding (p = .016), consolidation (p < .001), and retrieval (p = .032) significantly contributing to the overall model and the SNST alone accounting for 41.6%; and (b) 85.2% of the variance in RAVLT delayed recall, with consolidation (p < .001) and retrieval (p = .008) significantly contributing to the overall model and the SNST alone accounting for 39.8%. Thus, selective attention, encoding, and consolidation, and to a lesser extent retrieval, influenced both immediate and delayed verbal free recall. In conclusion, selective attention and the memory processes of encoding, consolidation, and retrieval should be considered when interpreting patients' impaired free recall. (JINS, 2012, 18, 1-10).
ERIC Educational Resources Information Center
Werts, Charles E.; And Others
1979-01-01
It is shown how partial covariance, part and partial correlation, and regression weights can be estimated and tested for significance by means of a factor analytic model. Comparable partial covariance, correlations, and regression weights have identical significance tests. (Author)
Togashi, K; Lin, C Y
2008-07-01
The objective of this study was to compare 6 selection criteria in terms of 3-parity total milk yield and 9 selection criteria in terms of total net merit (H) comprising 3-parity total milk yield and total lactation persistency. The 6 selection criteria compared were as follows: first-parity milk estimated breeding value (EBV; M1), first 2-parity milk EBV (M2), first 3-parity milk EBV (M3), first-parity eigen index (EI(1)), first 2-parity eigen index (EI(2)), and first 3-parity eigen index (EI(3)). The 9 selection criteria compared in terms of H were M1, M2, M3, EI(1), EI(2), EI(3), and first-parity, first 2-parity, and first 3-parity selection indices (I(1), I(2), and I(3), respectively). In terms of total milk yield, selection on M3 or EI(3) achieved the greatest genetic response, whereas selection on EI(1) produced the largest genetic progress per day. In terms of total net merit, selection on I(3) brought the largest response, whereas selection on EI(1) yielded the greatest genetic progress per day. A multiple-lactation random regression test-day model simultaneously yields the EBV of the 3 lactations for all animals included in the analysis even though the younger animals do not have the opportunity to complete the first 3 lactations. It is important to use the first 3 lactation EBV for selection decisions rather than only the first lactation EBV, despite the fact that the first-parity selection criteria achieved faster genetic progress per day than the 3-parity selection criteria. Under a multiple-lactation random regression animal model analysis, the use of the first 3 lactation EBV for selection decisions does not prolong the generation interval as compared with the use of only the first lactation EBV. Thus, it is justified to compare genetic response on a lifetime basis rather than on a per-day basis. The results suggest the use of M3 or EI(3) for genetic improvement of total milk yield and the use of I(3) for genetic improvement of total net merit H.
Although this study deals with selection for 3-parity milk production, the same principle applies to selection for lifetime milk production.
New machine-learning algorithms for prediction of Parkinson's disease
NASA Astrophysics Data System (ADS)
Mandal, Indrajit; Sairam, N.
2014-03-01
This article presents an enhanced prediction accuracy of diagnosis of Parkinson's disease (PD) to prevent the delay and misdiagnosis of patients using the proposed robust inference system. New machine-learning methods are proposed, and performance comparisons are based on specificity, sensitivity, accuracy, and other measurable parameters. The robust methods for diagnosing PD include sparse multinomial logistic regression, a rotation forest ensemble with support vector machines and principal components analysis, artificial neural networks, and boosting methods. A new ensemble method comprising the Bayesian network optimised by a Tabu search algorithm as classifier and Haar wavelets as projection filter is used for relevant feature selection and ranking. The highest accuracy, obtained by linear logistic regression and sparse multinomial logistic regression, is 100%, with sensitivity and specificity of 0.983 and 0.996, respectively. All the experiments are conducted over 95% and 99% confidence levels and establish the results with corrected t-tests. This work shows a high degree of advancement in software reliability and quality of the computer-aided diagnosis system and experimentally shows best results with supportive statistical inference.
[Establishment of cervical vertebral skeletal maturation of female children in Shanghai].
Sun, Yan; Chen, Rong-jing; Yu, Quan; Fan, Li; Chen, Wei; Shen, Gang
2009-06-01
To establish a method for quantitatively evaluating skeletal maturation of cervical vertebrae of female children in Shanghai. The samples were selected from lateral cephalometric radiographs of 240 Shanghai girls, aged 8 to 15 years. The parameters were measured to indicate the morphological changes of the third (C3) and fourth (C4) vertebrae in width, height, and the depth of the inferior curvature. The independent-sample t test and stepwise multiple regression analysis were used to estimate the growth status and the ratios of the C3 and C4 cervical vertebrae with the SPSS 15.0 software package. The physical and morphological contour of the C3 and C4 cervical vertebrae increased proportionately with the increment of age. The regression formula for indicating cervical vertebral skeletal age of female children in Shanghai was expressed by the equation Y = -5.696 + 8.010 AH3/AP3 + 6.654 AH3/H3 + 6.045 AH4/PH4 (r = 0.912). The regression formula resulting from morphological measurements quantitatively indicates the skeletal maturation of cervical vertebrae of female children in Shanghai.
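The reported equation can be applied directly once the three vertebral ratios are measured; the input values below are hypothetical:

```python
def cervical_skeletal_age(ah3_ap3, ah3_h3, ah4_ph4):
    """Skeletal age from the study's equation:
    Y = -5.696 + 8.010*AH3/AP3 + 6.654*AH3/H3 + 6.045*AH4/PH4"""
    return -5.696 + 8.010 * ah3_ap3 + 6.654 * ah3_h3 + 6.045 * ah4_ph4

# Hypothetical ratio measurements for one radiograph
age = cervical_skeletal_age(0.8, 0.7, 0.75)
```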
Landslide Hazard Mapping in Rwanda Using Logistic Regression
NASA Astrophysics Data System (ADS)
Piller, A.; Anderson, E.; Ballard, H.
2015-12-01
Landslides in the United States cause more than $1 billion in damages and 50 deaths per year (USGS 2014). Globally, figures are much more grave, yet monitoring, mapping, and forecasting of these hazards are less than adequate. Seventy-five percent of the population of Rwanda earns a living from farming, mostly subsistence. Loss of farmland, housing, or life to landslides is a very real hazard. Landslides in Rwanda have an impact at the economic, social, and environmental level. In a developing nation that faces challenges in tracking, cataloging, and predicting the numerous landslides that occur each year, satellite imagery and spatial analysis allow for remote study. We have focused on the development of a landslide inventory and a statistical methodology for assessing landslide hazards. Using logistic regression on approximately 30 candidate variables (e.g., slope, soil type, land cover) and a sample of over 200 landslides, we determine which variables are statistically most relevant to landslide occurrence in Rwanda. A preliminary predictive hazard map for Rwanda has been produced, using the variables selected from the logistic regression analysis.
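The core of such a hazard model is a logistic regression of landslide occurrence on terrain variables; a minimal numpy sketch fitted by gradient descent on synthetic slope and rainfall data (variable names and coefficients are illustrative, not Rwanda's):

```python
import numpy as np

def fit_logistic(X, y, lr=1.0, steps=5000):
    """Logistic regression fitted by batch gradient descent on the log-loss."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

rng = np.random.default_rng(3)
n = 2000
slope = rng.uniform(0, 1, n)          # normalized terrain slope per cell
rain = rng.uniform(0, 1, n)           # normalized rainfall per cell
X = np.column_stack([np.ones(n), slope, rain])
true_w = np.array([-3.0, 4.0, 1.0])   # steeper, wetter cells slide more
prob = 1.0 / (1.0 + np.exp(-X @ true_w))
y = (rng.uniform(size=n) < prob).astype(float)

w_hat = fit_logistic(X, y)            # slope coefficient should dominate
```

Comparing fitted coefficient magnitudes (or, in practice, Wald or likelihood-ratio statistics) is what identifies which of the ~30 candidate variables matter most; the fitted probabilities over a grid then become the hazard map.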
Monthly streamflow forecasting with auto-regressive integrated moving average
NASA Astrophysics Data System (ADS)
Nasir, Najah; Samsudin, Ruhaidah; Shabri, Ani
2017-09-01
Forecasting of streamflow is one of the many ways that can contribute to better decision making for water resource management. The auto-regressive integrated moving average (ARIMA) model was selected in this research for monthly streamflow forecasting, with enhancement made by pre-processing the data using singular spectrum analysis (SSA). This study also proposed an extension of the SSA technique to include a step where clustering is performed on the eigenvector pairs before reconstruction of the time series. The monthly streamflow data of Sungai Muda at Jeniang, Sungai Muda at Jambatan Syed Omar, and Sungai Ketil at Kuala Pegang were gathered from the Department of Irrigation and Drainage Malaysia. A ratio of 9:1 was used to divide the data into training and testing sets. The ARIMA, SSA-ARIMA, and Clustered SSA-ARIMA models were all developed in R software. Results from the proposed model are then compared to those of a conventional auto-regressive integrated moving average model using the root-mean-square error and mean absolute error values. It was found that the proposed model can outperform the conventional model.
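The SSA pre-processing step can be sketched as: embed the series in a Hankel trajectory matrix, take its SVD, keep the leading components, and average back along anti-diagonals. The window length and rank below are assumptions for a synthetic seasonal series:

```python
import numpy as np

def ssa_reconstruct(series, window, rank):
    """Singular spectrum analysis: reconstruct `series` from the leading
    `rank` SVD components of its Hankel trajectory matrix."""
    n = len(series)
    k = n - window + 1
    X = np.column_stack([series[i:i + window] for i in range(k)])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Xr = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    out = np.zeros(n)                 # Hankel (anti-diagonal) averaging
    cnt = np.zeros(n)
    for j in range(k):
        out[j:j + window] += Xr[:, j]
        cnt[j:j + window] += 1
    return out / cnt

t = np.arange(240)                    # 20 years of monthly values
clean = np.sin(2 * np.pi * t / 12)    # 12-month seasonal signal
noisy = clean + 0.3 * np.random.default_rng(4).normal(size=240)
smooth = ssa_reconstruct(noisy, window=24, rank=2)  # a sinusoid has rank 2
```

The denoised series, rather than the raw one, is then handed to the ARIMA model; the clustering extension described above groups the eigenvector pairs before this reconstruction step.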
Linear regression analysis: part 14 of a series on evaluation of scientific publications.
Schneider, Astrid; Hommel, Gerhard; Blettner, Maria
2010-11-01
Regression analysis is an important statistical method for the analysis of medical data. It enables the identification and characterization of relationships among multiple factors. It also enables the identification of prognostically relevant risk factors and the calculation of risk scores for individual prognostication. This article is based on selected textbooks of statistics, a selective review of the literature, and our own experience. After a brief introduction of the uni- and multivariable regression models, illustrative examples are given to explain what the important considerations are before a regression analysis is performed, and how the results should be interpreted. The reader should then be able to judge whether the method has been used correctly and interpret the results appropriately. The performance and interpretation of linear regression analysis are subject to a variety of pitfalls, which are discussed here in detail. The reader is made aware of common errors of interpretation through practical examples. Both the opportunities for applying linear regression analysis and its limitations are presented.
What Physical Fitness Component Is Most Closely Associated With Adolescents' Blood Pressure?
Nunes, Heloyse E G; Alves, Carlos A S; Gonçalves, Eliane C A; Silva, Diego A S
2017-12-01
This study aimed to determine which of four selected physical fitness variables would be most associated with blood pressure changes (systolic and diastolic) in a large sample of adolescents. This was a descriptive, cross-sectional, epidemiological study of 1,117 adolescents aged 14-19 years from southern Brazil. Systolic and diastolic blood pressure were measured by a digital pressure device, and the selected physical fitness variables were body composition (body mass index), flexibility (sit-and-reach test), muscle strength/resistance (manual dynamometer), and aerobic fitness (Modified Canadian Aerobic Fitness Test). Simple and multiple linear regression analyses revealed that aerobic fitness and muscle strength/resistance best explained variations in systolic blood pressure for boys (17.3% and 7.4% of variance) and girls (7.4% of variance). Aerobic fitness, body composition, and muscle strength/resistance are all important indicators of blood pressure control, but aerobic fitness was a stronger predictor of systolic blood pressure in boys and of diastolic blood pressure in both sexes.
Wang, Yonghua; Li, Yan; Wang, Bin
2007-01-01
Nicotine and a variety of other drugs and toxins are metabolized by cytochrome P450 (CYP) 2A6. The aim of the present study was to build a quantitative structure-activity relationship (QSAR) model to predict the activities of nicotine analogues on CYP2A6. Kernel partial least squares (K-PLS) regression was employed with electro-topological descriptors to build the computational models. Both the internal and external predictabilities of the models were evaluated with test sets to ensure their validity and reliability. As a comparison to K-PLS, a standard PLS algorithm was also applied on the same training and test sets. Our results show that K-PLS outperformed the PLS model on these datasets. The obtained K-PLS model will be helpful for the design of novel nicotine-like selective CYP2A6 inhibitors.
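The standard PLS baseline referenced here can be sketched with the NIPALS algorithm for a single response (K-PLS replaces the descriptor matrix with kernel evaluations; only the linear variant is shown, on synthetic data):

```python
import numpy as np

def pls1(X, y, n_components):
    """PLS1 regression via NIPALS; X and y are assumed mean-centered.
    Returns B such that y_hat = X @ B."""
    Xk, yk = X.copy(), y.copy()
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                 # weight: covariance direction
        w /= np.linalg.norm(w)
        t = Xk @ w                    # score
        tt = t @ t
        p = Xk.T @ t / tt             # X loading
        c = yk @ t / tt               # y loading
        Xk = Xk - np.outer(t, p)      # deflate X and y
        yk = yk - c * t
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 8))         # synthetic descriptor matrix
true_b = np.array([1.0, -2.0, 0.5, 0, 0, 0, 0, 0])
y = X @ true_b + 0.1 * rng.normal(size=100)
X_c, y_c = X - X.mean(axis=0), y - y.mean()

B = pls1(X_c, y_c, n_components=4)
pred = X_c @ B
```

PLS is preferred over ordinary regression here because QSAR descriptor matrices are typically wide and collinear; the kernel variant extends the same machinery to nonlinear structure-activity relationships.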
Lorenz, David L.; Sanocki, Chris A.; Kocian, Matthew J.
2010-01-01
Knowledge of the peak flow of floods of a given recurrence interval is essential for regulation and planning of water resources and for design of bridges, culverts, and dams along Minnesota's rivers and streams. Statistical techniques are needed to estimate peak flow at ungaged sites because long-term streamflow records are available at relatively few places. Because of the need to have up-to-date peak-flow frequency information in order to estimate peak flows at ungaged sites, the U.S. Geological Survey (USGS) conducted a peak-flow frequency study in cooperation with the Minnesota Department of Transportation and the Minnesota Pollution Control Agency. Estimates of peak-flow magnitudes for 1.5-, 2-, 5-, 10-, 25-, 50-, 100-, and 500-year recurrence intervals are presented for 330 streamflow-gaging stations in Minnesota and adjacent areas in Iowa and South Dakota based on data through water year 2005. The peak-flow frequency information was subsequently used in regression analyses to develop equations relating peak flows for selected recurrence intervals to various basin and climatic characteristics. Two statistically derived techniques, the regional regression equation and the region of influence regression, can be used to estimate peak flow on ungaged streams smaller than 3,000 square miles in Minnesota. Regional regression equations were developed for selected recurrence intervals in each of six regions in Minnesota: A (northwestern), B (north central and east central), C (northeastern), D (west central and south central), E (southwestern), and F (southeastern). The regression equations can be used to estimate peak flows at ungaged sites. The region of influence regression technique dynamically selects streamflow-gaging stations with characteristics similar to a site of interest. Thus, the region of influence regression technique allows use of a potentially unique set of gaging stations for estimating peak flow at each site of interest.
Two methods of selecting streamflow-gaging stations, similarity and proximity, can be used for the region of influence regression technique. The regional regression equation technique is the preferred technique for estimating peak flow at ungaged sites in all six regions. The region of influence regression technique is not appropriate for regions C, E, and F because the interrelations of some characteristics of those regions do not agree with the interrelations throughout the rest of the State. Both the similarity and proximity methods for the region of influence technique can be used in the other regions (A, B, and D) to provide additional estimates of peak flow. The peak-flow-frequency estimates and basin characteristics for selected streamflow-gaging stations and regional peak-flow regression equations are included in this report.
Chen, Lidong; Basu, Anup; Zhang, Maojun; Wang, Wei; Liu, Yu
2014-03-20
A complementary catadioptric imaging technique was proposed to solve the problem of low and nonuniform resolution in omnidirectional imaging. Building on this research, our paper focuses on how to generate a high-resolution panoramic image from the captured omnidirectional image. To avoid the interference between the inner and outer images while fusing the two complementary views, a cross-selection kernel regression method is proposed. First, in view of the complementarity of sampling resolution in the tangential and radial directions between the inner and the outer images, respectively, the horizontal gradients in the expected panoramic image are estimated based on the scattered neighboring pixels mapped from the outer image, while the vertical gradients are estimated using the inner image. Then, the size and shape of the regression kernel are adaptively steered based on the local gradients. Furthermore, the neighboring pixels in the next interpolation step of kernel regression are also selected based on the comparison between the horizontal and vertical gradients. In simulation and real-image experiments, the proposed method outperforms existing kernel regression methods and our previous wavelet-based fusion method in terms of both visual quality and objective evaluation.
Creasy, John M; Midya, Abhishek; Chakraborty, Jayasree; Adams, Lauryn B; Gomes, Camilla; Gonen, Mithat; Seastedt, Kenneth P; Sutton, Elizabeth J; Cercek, Andrea; Kemeny, Nancy E; Shia, Jinru; Balachandran, Vinod P; Kingham, T Peter; Allen, Peter J; DeMatteo, Ronald P; Jarnagin, William R; D'Angelica, Michael I; Do, Richard K G; Simpson, Amber L
2018-06-19
This study investigates whether quantitative image analysis of pretreatment CT scans can predict volumetric response to chemotherapy for patients with colorectal liver metastases (CRLM). Patients treated with chemotherapy for CRLM (hepatic artery infusion (HAI) combined with systemic, or systemic alone) were included in the study. Patients were imaged at baseline and approximately 8 weeks after treatment. Response was measured as the percentage change in tumour volume from baseline. Quantitative imaging features were derived from the index hepatic tumour on pretreatment CT, and features statistically significant on univariate analysis were included in a linear regression model to predict volumetric response. The regression model was constructed from 70% of the data, while 30% were reserved for testing. Test data were input into the trained model. Model performance was evaluated with mean absolute prediction error (MAPE) and R². Clinicopathologic factors were assessed for correlation with response. 157 patients were included, split into training (n = 110) and validation (n = 47) sets. MAPE from the multivariate linear regression model was 16.5% (R² = 0.774) and 21.5% in the training and validation sets, respectively. Stratified by HAI utilisation, MAPE in the validation set was 19.6% for HAI and 25.1% for systemic chemotherapy alone. Clinical factors associated with differences in median tumour response were treatment strategy, systemic chemotherapy regimen, age, and KRAS mutation status (p < 0.05). Quantitative imaging features extracted from pretreatment CT are promising predictors of volumetric response to chemotherapy in patients with CRLM. Pretreatment predictors of response have the potential to better select patients for specific therapies. • Colorectal liver metastases (CRLM) are downsized with chemotherapy but predicting the patients that will respond to chemotherapy is currently not possible.
• Heterogeneity and enhancement patterns of CRLM can be measured with quantitative imaging. • Prediction model constructed that predicts volumetric response with 20% error suggesting that quantitative imaging holds promise to better select patients for specific treatments.
Complex Environmental Data Modelling Using Adaptive General Regression Neural Networks
NASA Astrophysics Data System (ADS)
Kanevski, Mikhail
2015-04-01
The research deals with an adaptation and application of Adaptive General Regression Neural Networks (GRNN) to high dimensional environmental data. GRNN [1,2,3] are efficient modelling tools both for spatial and temporal data and are based on nonparametric kernel methods closely related to the classical Nadaraya-Watson estimator. Adaptive GRNN, using anisotropic kernels, can also be applied to feature selection tasks when working with high dimensional data [1,3]. In the present research, Adaptive GRNN are used to study geospatial data predictability and relevant feature selection using both simulated and real data case studies. The original raw data were either three dimensional monthly precipitation data or monthly wind speeds embedded into a 13 dimensional space constructed from geographical coordinates and geo-features calculated from a digital elevation model. GRNN were applied in two different ways: 1) adaptive GRNN with the resulting list of features ordered according to their relevancy; and 2) adaptive GRNN applied to evaluate all possible models N [in case of wind fields N=(2^13 -1)=8191] and rank them according to the cross-validation error. In both cases, training was carried out using a leave-one-out procedure. An important result of the study is that the set of the most relevant features depends on the month (strong seasonal effect) and year. The predictabilities of precipitation and wind field patterns, estimated using the cross-validation and testing errors of raw and shuffled data, were studied in detail. The results of both approaches were qualitatively and quantitatively compared. In conclusion, Adaptive GRNN, with their ability to select features and efficiently model complex high dimensional data, can be widely used in automatic/on-line mapping and as an integrated part of environmental decision support systems. 1. Kanevski M., Pozdnoukhov A., Timonin V. Machine Learning for Spatial Environmental Data. Theory, applications and software. EPFL Press.
With a CD: data, software, guides. (2009). 2. Kanevski M. Spatial Predictions of Soil Contamination Using General Regression Neural Networks. Systems Research and Information Systems, Volume 8, number 4, 1999. 3. Robert S., Foresti L., Kanevski M. Spatial prediction of monthly wind speeds in complex terrain with adaptive general regression neural networks. International Journal of Climatology, 33 pp. 1793-1804, 2013.
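GRNN is essentially the Nadaraya-Watson kernel regression estimator, and the anisotropic kernels mentioned above amount to one bandwidth per feature: inflating a feature's bandwidth effectively removes it, which is what enables feature-relevance ranking. A minimal sketch on synthetic data:

```python
import numpy as np

def grnn_predict(X_train, y_train, X_query, sigmas):
    """GRNN / Nadaraya-Watson prediction with an anisotropic Gaussian
    kernel: one bandwidth per feature. A feature given a very large
    bandwidth is effectively ignored."""
    d = (X_query[:, None, :] - X_train[None, :, :]) / sigmas
    w = np.exp(-0.5 * np.sum(d ** 2, axis=2))   # (n_query, n_train)
    return w @ y_train / w.sum(axis=1)

rng = np.random.default_rng(6)
X = rng.uniform(-1, 1, size=(300, 2))
y = np.sin(3 * X[:, 0])               # only feature 0 carries signal
Xq = rng.uniform(-1, 1, size=(50, 2))

# Tight bandwidth on the relevant feature, huge one on the irrelevant one
pred = grnn_predict(X, y, Xq, sigmas=np.array([0.1, 100.0]))
err = np.mean((pred - np.sin(3 * Xq[:, 0])) ** 2)
```

Optimizing the per-feature bandwidths by leave-one-out cross-validation, as described in the abstract, is what produces the relevancy ordering of the 13 input features.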
Kennedy, Jeffrey R.; Paretti, Nicholas V.; Veilleux, Andrea G.
2014-01-01
Regression equations, which allow predictions of n-day flood-duration flows for selected annual exceedance probabilities at ungaged sites, were developed using generalized least-squares regression and flood-duration flow frequency estimates at 56 streamgaging stations within a single, relatively uniform physiographic region in the central part of Arizona, between the Colorado Plateau and Basin and Range Province, called the Transition Zone. Drainage area explained most of the variation in the n-day flood-duration annual exceedance probabilities, but mean annual precipitation and mean elevation were also significant variables in the regression models. Standard error of prediction for the regression equations varies from 28 to 53 percent and generally decreases with increasing n-day duration. Outside the Transition Zone there are insufficient streamgaging stations to develop regression equations, but flood-duration flow frequency estimates are presented at select streamgaging stations.
Image quality (IQ) guided multispectral image compression
NASA Astrophysics Data System (ADS)
Zheng, Yufeng; Chen, Genshe; Wang, Zhonghai; Blasch, Erik
2016-05-01
Image compression is necessary for data transportation, as it saves both transferring time and storage space. In this paper, we focus our discussion on lossy compression. There are many standard image formats and corresponding compression algorithms, for example, JPEG (DCT -- discrete cosine transform), JPEG 2000 (DWT -- discrete wavelet transform), BPG (better portable graphics) and TIFF (LZW -- Lempel-Ziv-Welch). The image quality (IQ) of the decompressed image will be measured by numerical metrics such as root mean square error (RMSE), peak signal-to-noise ratio (PSNR), and Structural Similarity (SSIM) index. Given an image and a specified IQ, we will investigate how to select a compression method and its parameters to achieve an expected compression. Our scenario consists of 3 steps. The first step is to compress a set of interested images by varying parameters and compute their IQs for each compression method. The second step is to create several regression models per compression method after analyzing the IQ-measurement versus compression-parameter from a number of compressed images. The third step is to compress the given image with the specified IQ using the selected compression method (JPEG, JPEG2000, BPG, or TIFF) according to the regressed models. The IQ may be specified by a compression ratio (e.g., 100), in which case we will select the compression method of the highest IQ (SSIM or PSNR). Or the IQ may be specified by an IQ metric (e.g., SSIM = 0.8, or PSNR = 50), in which case we will select the compression method of the highest compression ratio. Our experiments tested on thermal (long-wave infrared) images (in gray scales) showed very promising results.
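The IQ metrics driving the regression models can be computed directly; RMSE and PSNR for 8-bit images are shown below (SSIM omitted for brevity):

```python
import numpy as np

def rmse(a, b):
    """Root mean square error between two images."""
    return float(np.sqrt(np.mean((a.astype(float) - b.astype(float)) ** 2)))

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB for images with maximum value `peak`."""
    e = rmse(a, b)
    return float("inf") if e == 0 else 20.0 * np.log10(peak / e)

original = np.zeros((8, 8), dtype=np.uint8)
decompressed = np.full((8, 8), 10, dtype=np.uint8)  # uniform error of 10
# rmse = 10, so psnr = 20*log10(255/10), roughly 28.13 dB
```

In the scenario above, these metrics are computed for each (method, parameter) pair, and the regression models map compression parameters to the resulting metric values.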
Factors associated with abnormal eating attitudes among Greek adolescents.
Bilali, Aggeliki; Galanis, Petros; Velonakis, Emmanuel; Katostaras, Theofanis
2010-01-01
To estimate the prevalence of abnormal eating attitudes among Greek adolescents and identify possible risk factors associated with these attitudes. Cross-sectional, school-based study. Six randomly selected schools in Patras, southern Greece. The study population consisted of 540 Greek students aged 13-18 years, and the response rate was 97%. The dependent variable was scores on the Eating Attitudes Test-26, with scores > or = 20 indicating abnormal eating attitudes. Bivariate analysis included independent Student t test, chi-square test, and Fisher's exact test. Multivariate logistic regression analysis was applied for the identification of the predictive factors, which were associated independently with abnormal eating attitudes. A 2-sided P value of less than .05 was considered statistically significant. The prevalence of abnormal eating attitudes was 16.7%. Multivariate logistic regression analysis demonstrated that females, urban residents, and those with a body mass index outside normal range, a perception of being overweight, body dissatisfaction, and a family member on a diet were independently related to abnormal eating attitudes. The results indicate that a proportion of Greek adolescents report abnormal eating attitudes and suggest that multiple factors contribute to the development of these attitudes. These findings are useful for further research into this topic and would be valuable in designing preventive interventions. Copyright 2010 Society for Nutrition Education. Published by Elsevier Inc. All rights reserved.
A Powerful Test for Comparing Multiple Regression Functions.
Maity, Arnab
2012-09-01
In this article, we address the important problem of comparison of two or more population regression functions. Recently, Pardo-Fernández, Van Keilegom and González-Manteiga (2007) developed test statistics for simple nonparametric regression models, Y_ij = θ_j(Z_ij) + σ_j(Z_ij)ε_ij, based on empirical distributions of the errors in each population j = 1, …, J. In this paper, we propose a test for equality of the θ_j(·) based on the concept of generalized likelihood ratio type statistics. We also generalize our test to other nonparametric regression setups, e.g., nonparametric logistic regression, where the loglikelihood for population j is any general smooth function [Formula: see text]. We describe a resampling procedure to obtain the critical values of the test. In addition, we present a simulation study to evaluate the performance of the proposed test and compare our results to those in Pardo-Fernández et al. (2007).
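A toy illustration of the resampling idea: a boxcar-kernel smoother and a squared-distance statistic stand in for the paper's estimators and likelihood-ratio-type statistic, and permuting the pooled observations approximates the null distribution. All function names and tuning choices here are illustrative.

```python
import random

def local_mean(xs, ys, x0, h=0.6):
    """Boxcar-kernel estimate of the regression function at x0."""
    pts = [y for x, y in zip(xs, ys) if abs(x - x0) <= h]
    return sum(pts) / len(pts) if pts else 0.0

def discrepancy(s1, s2, grid):
    """Squared distance between the two estimated curves over a grid."""
    xs1, ys1 = zip(*s1)
    xs2, ys2 = zip(*s2)
    return sum((local_mean(xs1, ys1, g) - local_mean(xs2, ys2, g)) ** 2
               for g in grid)

def permutation_test(s1, s2, grid, n_perm=200, seed=0):
    """p-value for H0: equal regression functions, by permuting the
    pooled (x, y) observations across the two populations."""
    rng = random.Random(seed)
    obs = discrepancy(s1, s2, grid)
    pooled = list(s1) + list(s2)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if discrepancy(pooled[:len(s1)], pooled[len(s1):], grid) >= obs:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Two samples lying on clearly separated curves yield a small p-value; identical samples yield a p-value near 1.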
Nishii, Takashi; Genkawa, Takuma; Watari, Masahiro; Ozaki, Yukihiro
2012-01-01
A new selection procedure for an informative near-infrared (NIR) region for regression model building is proposed that uses an online NIR/mid-infrared (mid-IR) dual-region spectrometer in conjunction with two-dimensional (2D) NIR/mid-IR heterospectral correlation spectroscopy. In this procedure, both NIR and mid-IR spectra of a liquid sample are acquired sequentially during a reaction process using the NIR/mid-IR dual-region spectrometer; the 2D NIR/mid-IR heterospectral correlation spectrum is subsequently calculated from the obtained spectral data set. From the calculated 2D spectrum, a NIR region is selected that includes bands with high positive correlation intensity with mid-IR bands assigned to the analyte, and this region is used for the construction of a regression model. To evaluate the performance of this procedure, a partial least-squares (PLS) regression model of the ethanol concentration in a fermentation process was constructed. During fermentation, NIR/mid-IR spectra in the 10000-1200 cm⁻¹ region were acquired every 3 min, and a 2D NIR/mid-IR heterospectral correlation spectrum was calculated to investigate the correlation intensity between the NIR and mid-IR bands. NIR regions that include bands at 4343, 4416, 5778, 5904, and 5955 cm⁻¹, which result from the combinations and overtones of the C-H group of ethanol, were selected for use in the PLS regression models, taking the correlation intensity of a mid-IR band at 2985 cm⁻¹, arising from the CH₃ asymmetric stretching vibration mode of ethanol, as a reference. The predicted results indicate that the ethanol concentrations calculated from the PLS regression models fit well with those obtained by high-performance liquid chromatography. Thus, it can be concluded that the selection procedure using the NIR/mid-IR dual-region spectrometer combined with 2D NIR/mid-IR heterospectral correlation spectroscopy is a powerful method for the construction of a reliable regression model.
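The band-selection step can be sketched as follows. This is illustrative only: plain Pearson correlation between intensity time series stands in for the 2D heterospectral correlation intensity, and the band labels are placeholders.

```python
def pearson(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def select_bands(nir_series, ref_series, threshold=0.9):
    """Keep NIR bands whose intensity over the reaction correlates
    strongly and positively with the mid-IR reference band series."""
    return [band for band, series in nir_series.items()
            if pearson(series, ref_series) >= threshold]
```

Bands that track the reference analyte band through the reaction survive the screen; uncorrelated bands are dropped before the PLS fit.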
Rovadoscki, Gregori A; Petrini, Juliana; Ramirez-Diaz, Johanna; Pertile, Simone F N; Pertille, Fábio; Salvian, Mayara; Iung, Laiza H S; Rodriguez, Mary Ana P; Zampar, Aline; Gaya, Leila G; Carvalho, Rachel S B; Coelho, Antonio A D; Savino, Vicente J M; Coutinho, Luiz L; Mourão, Gerson B
2016-09-01
Repeated measures from the same individual have been analyzed by using repeatability and finite-dimension models under univariate or multivariate analyses. However, in the last decade, the use of random regression models for genetic studies with longitudinal data has become more common. Thus, the aim of this research was to estimate genetic parameters for body weight of four experimental chicken lines by using univariate random regression models. Body weight data from hatching to 84 days of age (n = 34,730) from four experimental free-range chicken lines (7P, Caipirão da ESALQ, Caipirinha da ESALQ and Carijó Barbado) were used. The analysis model included the fixed effects of contemporary group (gender and rearing system), fixed regression coefficients for age at measurement, and random regression coefficients for permanent environmental effects and additive genetic effects. Heterogeneous variances for residual effects were considered, and one residual variance was assigned for each of six subclasses of age at measurement. Random regression curves were modeled by using Legendre polynomials of the second and third orders, with the best model chosen based on the Akaike Information Criterion, Bayesian Information Criterion, and restricted maximum likelihood. Multivariate analyses under the same animal mixed model were also performed for validation of the random regression models. The Legendre polynomials of second order were better for describing the growth curves of the lines studied. Moderate to high heritabilities (h² = 0.15 to 0.98) were estimated for body weight between one and 84 days of age, suggesting that body weight at all ages can be used as a selection criterion. Genetic correlations among body weight records obtained through multivariate analyses ranged from 0.18 to 0.96, 0.12 to 0.89, 0.06 to 0.96, and 0.28 to 0.96 in the 7P, Caipirão da ESALQ, Caipirinha da ESALQ, and Carijó Barbado chicken lines, respectively.
Results indicate that genetic gain for body weight can be achieved by selection. Also, selection for body weight at 42 days of age can be maintained as a selection criterion. © 2016 Poultry Science Association Inc.
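The Legendre covariables used in such random regression models can be sketched as follows; the recurrence is the standard one, and the 1-84 day age range comes from the abstract.

```python
def standardize(age, a_min=1, a_max=84):
    """Map age in days onto [-1, 1] for use as a Legendre covariable."""
    return -1 + 2 * (age - a_min) / (a_max - a_min)

def legendre_basis(t, order=2):
    """Legendre polynomials P0..P_order at t via the recurrence
    (n + 1) P_{n+1}(t) = (2n + 1) t P_n(t) - n P_{n-1}(t)."""
    p = [1.0, float(t)]
    for n in range(1, order):
        p.append(((2 * n + 1) * t * p[n] - n * p[n - 1]) / (n + 1))
    return p[:order + 1]
```

A second-order model regresses each animal's weights on `legendre_basis(standardize(age), 2)`, with random coefficients for the genetic and permanent environmental terms.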
Lin, Kao-Chang; Huang, Po-Chang; Yeh, Poh-Shiow; Kuo, Jinn-Rung; Ke, Der-Shin
2010-12-01
Polychlorinated biphenyls (PCB)/polychlorinated dibenzofurans (PCDF) are known to affect central nervous system functioning. In recent studies, elderly patients who have been exposed to these have been noted to have psychological deficits. Little is known about which tests are sensitive to neurotoxins in cognitive evaluation. The objective of the present study was to compare the significance of selected psychological tests in cognitive assessment of PCB-laden elderly. A retrospective PCB/PCDF-exposed cohort was observed. Exposed elderly aged ≥ 60 years and registered with the Central Health Administration were enrolled, and similar age- and sex-matched subjects served as non-exposed controls. The Mini-Mental State Examination (MMSE) and Attention and Digit Span (ADS) were tested in both groups. Student's t-test, χ²-test and linear regression models were used for statistical analysis. A total of 165 exposed patients and 151 controls were analyzed. The exposed group included 49% men, a mean age of 69.3 ± 6.4 years and an education level of 4.0 ± 3.9 years. The controls included 52% men, a mean age of 69.9 ± 5.5 years and an education level of 4.5 ± 3.2 years. There was no statistical difference in MMSE before and after adjusting for the confounding variables of age, sex and education (P = 0.16 vs P = 0.12). However, ADS-forward and ADS-total scores showed a significant decline in the exposed subjects (P = 0.0001 vs P = 0.001). Using linear regression on stratified PCB levels and cognitive functioning (≤30, 31-89, and ≥90 ppb), a dose effect was found at the medium (31-89 ppb) and high (≥90 ppb) exposure levels. Our observations showed attention and short-term memory were impaired in PCB-laden elderly patients. A higher exposure level was associated with lower cognitive functioning on the ADS. The MMSE was insensitive to neurotoxins. The present study shows that the choice of test has a decisive role in toxin-related cognitive assessments. © 2010 The Authors. 
Psychogeriatrics © 2010 Japanese Psychogeriatric Society.
Olson, Scott A.; with a section by Veilleux, Andrea G.
2014-01-01
This report provides estimates of flood discharges at selected annual exceedance probabilities (AEPs) for streamgages in and adjacent to Vermont and equations for estimating flood discharges at AEPs of 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent (recurrence intervals of 2-, 5-, 10-, 25-, 50-, 100-, 200-, and 500-years, respectively) for ungaged, unregulated, rural streams in Vermont. The equations were developed using generalized least-squares regression. Flood-frequency and drainage-basin characteristics from 145 streamgages were used in developing the equations. The drainage-basin characteristics used as explanatory variables in the regression equations include drainage area, percentage of wetland area, and the basin-wide mean of the average annual precipitation. The average standard errors of prediction for estimating the flood discharges at the 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent AEP with these equations are 34.9, 36.0, 38.7, 42.4, 44.9, 47.3, 50.7, and 55.1 percent, respectively. Flood discharges at selected AEPs for streamgages were computed by using the Expected Moments Algorithm. To improve estimates of the flood discharges for given exceedance probabilities at streamgages in Vermont, a new generalized skew coefficient was developed. The new generalized skew for the region is a constant, 0.44. The mean square error of the generalized skew coefficient is 0.078. This report describes a technique for using results from the regression equations to adjust an AEP discharge computed from a streamgage record. This report also describes a technique for using a drainage-area adjustment to estimate flood discharge at a selected AEP for an ungaged site upstream or downstream from a streamgage. The final regression equations and the flood-discharge frequency data used in this study will be available in StreamStats. StreamStats is a World Wide Web application providing automated regression-equation solutions for user-selected sites on streams.
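A hedged sketch of how such regional equations are typically applied: the log-linear form and the drainage-area-ratio adjustment are conventional in flood-frequency reports, but every coefficient and exponent below is a placeholder, not a value from this report.

```python
import math

def regression_discharge(area_sq_mi, wetland_pct, precip_in, coefs):
    """Evaluate a regional regression equation of the usual log-linear
    form log10(Q) = b0 + b1*log10(A) + b2*W + b3*P, where A is drainage
    area, W percent wetland, and P mean annual precipitation.
    The coefficients are illustrative placeholders."""
    b0, b1, b2, b3 = coefs
    return 10 ** (b0 + b1 * math.log10(area_sq_mi)
                  + b2 * wetland_pct + b3 * precip_in)

def area_adjusted(q_gage, a_gage, a_ungaged, exponent=0.8):
    """Drainage-area adjustment: transfer a streamgage AEP discharge to
    an ungaged site upstream or downstream on the same stream by scaling
    with the drainage-area ratio (the exponent is hypothetical)."""
    return q_gage * (a_ungaged / a_gage) ** exponent
```

For instance, with a unit exponent an ungaged site draining half the gaged area is assigned half the gaged discharge.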
Variable Selection for Regression Models of Percentile Flows
NASA Astrophysics Data System (ADS)
Fouad, G.
2017-12-01
Percentile flows describe the flow magnitude equaled or exceeded for a given percent of time, and are widely used in water resource management. However, these statistics are normally unavailable since most basins are ungauged. Percentile flows of ungauged basins are often predicted using regression models based on readily observable basin characteristics, such as mean elevation. The number of these independent variables is too large to evaluate all possible models. A subset of models is typically evaluated using automatic procedures, like stepwise regression. This ignores a large variety of methods from the field of feature (variable) selection and physical understanding of percentile flows. A study of 918 basins in the United States was conducted to compare an automatic regression procedure to the following variable selection methods: (1) principal component analysis, (2) correlation analysis, (3) random forests, (4) genetic programming, (5) Bayesian networks, and (6) physical understanding. The automatic regression procedure only performed better than principal component analysis. Poor performance of the regression procedure was due to a commonly used filter for multicollinearity, which rejected the strongest models because they had cross-correlated independent variables. Multicollinearity did not decrease model performance in validation because of a representative set of calibration basins. Variable selection methods based strictly on predictive power (numbers 2-5 from above) performed similarly, likely indicating a limit to the predictive power of the variables. Similar performance was also reached using variables selected based on physical understanding, a finding that substantiates recent calls to emphasize physical understanding in modeling for predictions in ungauged basins. The strongest variables highlighted the importance of geology and land cover, whereas widely used topographic variables were the weakest predictors. 
Variables suffered from a high degree of multicollinearity, possibly illustrating the co-evolution of climatic and physiographic conditions. Given the ineffectiveness of many variables used here, future work should develop new variables that target specific processes associated with percentile flows.
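The multicollinearity filter discussed above can be sketched as a greedy screen over candidate predictors ordered by univariate strength; the variable names and the 0.7 threshold are illustrative. Such a filter can reject the strongest models precisely because their predictors are cross-correlated.

```python
def correlation_filter(candidates, corr, threshold=0.7):
    """Greedy multicollinearity screen: walk candidates in order of
    univariate strength and drop any predictor whose absolute
    cross-correlation with an already-kept predictor exceeds the cut.
    `corr` maps frozenset pairs of variable names to correlations."""
    kept = []
    for var in candidates:
        if all(abs(corr[frozenset((var, k))]) < threshold for k in kept):
            kept.append(var)
    return kept
```

Here a strong predictor entering later is discarded as soon as it correlates highly with one already kept, regardless of its predictive value.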
Mohammad, Khandoker Akib; Fatima-Tuz-Zahura, Most; Bari, Wasimul
2017-01-28
The cause-specific under-five mortality of Bangladesh has been studied by fitting cumulative incidence function (CIF) based Fine and Gray competing risk regression model (1999). For the purpose of analysis, Bangladesh Demographic and Health Survey (BDHS), 2011 data set was used. Three types of mode of mortality for the under-five children are considered. These are disease, non-disease and other causes. Product-Limit survival probabilities for the under-five child mortality with log-rank test were used to select a set of covariates for the regression model. The covariates found to have significant association in bivariate analysis were only considered in the regression analysis. Potential determinants of under-five child mortality due to disease is size of child at birth, while gender of child, NGO (non-government organization) membership of mother, mother's education level, and size of child at birth are due to non-disease and age of mother at birth, NGO membership of mother, and mother's education level are for the mortality due to other causes. Female participation in the education programs needs to be increased because of the improvement of child health and government should arrange family and social awareness programs as well as health related programs for women so that they are aware of their child health.
Pan, Yue; Liu, Hongmei; Metsch, Lisa R; Feaster, Daniel J
2017-02-01
HIV testing is the foundation for consolidated HIV treatment and prevention. In this study, we aim to discover the most relevant variables for predicting HIV testing uptake among substance users in substance use disorder treatment programs by applying random forest (RF), a robust multivariate statistical learning method. We also provide a descriptive introduction to this method for those who are unfamiliar with it. We used data from the National Institute on Drug Abuse Clinical Trials Network HIV testing and counseling study (CTN-0032). A total of 1281 HIV-negative or status-unknown participants from 12 US community-based substance use disorder treatment programs were included and were randomized into three HIV testing and counseling treatment groups. The a priori primary outcome was self-reported receipt of HIV test results. Classification accuracy of RF was compared to logistic regression, a standard statistical approach for binary outcomes. Variable importance measures for the RF model were used to select the most relevant variables. RF-based models produced much higher classification accuracy than those based on logistic regression. Treatment group is the most important predictor among all covariates, with a variable importance index of 12.9%. RF variable importance revealed that several types of condomless sex behaviors, condom use self-efficacy and attitudes towards condom use, and level of depression are the most important predictors of receipt of HIV testing results. There is a non-linear negative relationship between the count of condomless sex acts and the receipt of HIV testing results. In conclusion, RF seems promising for discovering important factors related to HIV testing uptake among large numbers of predictors and should be encouraged in future HIV prevention and treatment research and intervention program evaluations.
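The intuition behind RF variable importance can be illustrated with permutation importance applied to any fitted classifier. This is a toy sketch, not the CTN-0032 analysis: the model below is a trivial threshold rule, and all names are hypothetical.

```python
import random

def accuracy(predict, X, y):
    """Fraction of rows classified correctly."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(predict, X, y, feature, n_rounds=20, seed=1):
    """Mean drop in accuracy after shuffling one feature's column --
    the idea underlying random forest variable importance: a feature
    the model truly uses hurts accuracy when scrambled."""
    rng = random.Random(seed)
    base = accuracy(predict, X, y)
    col = [row[feature] for row in X]
    drops = []
    for _ in range(n_rounds):
        shuffled = col[:]
        rng.shuffle(shuffled)
        Xp = [list(row) for row in X]
        for row, v in zip(Xp, shuffled):
            row[feature] = v
        drops.append(base - accuracy(predict, Xp, y))
    return sum(drops) / n_rounds
```

A feature the classifier ignores scores exactly zero; a feature it depends on scores positive.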
Shafiq, Ali; Brawner, Clinton A; Aldred, Heather A; Lewis, Barry; Williams, Celeste T; Tita, Christina; Schairer, John R; Ehrman, Jonathan K; Velez, Mauricio; Selektor, Yelena; Lanfear, David E; Keteyian, Steven J
2016-04-01
Although cardiopulmonary exercise (CPX) testing in patients with heart failure and reduced ejection fraction is well established, there are limited data on the value of CPX variables in patients with HF and preserved ejection fraction (HFpEF). We sought to determine the prognostic value of select CPX measures in patients with HFpEF. This was a retrospective analysis of patients with HFpEF (ejection fraction ≥ 50%) who performed a CPX test between 1997 and 2010. Selected CPX variables included peak oxygen uptake (VO2), percent predicted maximum oxygen uptake (ppMVO2), minute ventilation to carbon dioxide production slope (VE/VCO2 slope), and exercise oscillatory ventilation (EOV). Separate Cox regression analyses were performed to assess the relationship between each CPX variable and a composite outcome of all-cause mortality or cardiac transplant. We identified 173 HFpEF patients (45% women, 58% non-white, age 54 ± 14 years) with complete CPX data. During a median follow-up of 5.2 years, there were 42 deaths and 5 cardiac transplants. The 1-, 3-, and 5-year cumulative event-free survival was 96%, 90%, and 82%, respectively. Based on the Wald statistic from the Cox regression analyses adjusted for age, sex, and β-blockade therapy, ppMVO2 was the strongest predictor of the end point (Wald χ² = 15.0, hazard ratio per 10%, P < .001), followed by peak VO2 (Wald χ² = 11.8, P = .001). The VE/VCO2 slope (Wald χ² = 0.4, P = .54) and EOV (Wald χ² = 0.15, P = .70) had no significant association with the composite outcome. These data support the prognostic utility of peak VO2 and ppMVO2 in patients with HFpEF. Additional studies are needed to define optimal cut points to identify low- and high-risk patients. Copyright © 2016 Elsevier Inc. All rights reserved.
Chen, Yan; Xiao, Huangmeng; Zhou, Xieda; Huang, Xiaoyu; Li, Yanbing; Xiao, Haipeng; Cao, Xiaopei
2017-10-01
Various studies have validated plasma free metanephrines (MNs) as biomarkers for pheochromocytoma and paraganglioma (PPGL). This meta-analysis aimed to estimate the overall diagnostic accuracy of this biochemical test for PPGL. We searched the PubMed, the Cochrane Library, Web of Science, Embase, Scopus, OvidSP, and ProQuest Dissertations & Theses databases from January 1, 1995 to December 2, 2016 and selected studies written in English that assessed plasma free MNs in the diagnosis of PPGL. Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) was used to evaluate the quality of the included studies. We calculated pooled sensitivities, specificities, positive and negative likelihood ratios, diagnostic odds ratios (DORs) and areas under curve (AUCs) with their 95% confidence intervals (95% CIs). Heterogeneity was assessed by I². To identify the source of heterogeneity, we evaluated the threshold effect and performed a meta-regression. Deeks' funnel plot was selected for investigating any potential publication bias. Although the combination of metanephrine (MN) and normetanephrine (NMN) carried lower specificity (0.94, 95% CI 0.90-0.97) than NMN (0.97, 95% CI 0.92-0.99), NMN was generally more accurate than individual tests, with the highest AUC (0.99, 95% CI 0.97-0.99), DOR (443.35, 95% CI 216.9-906.23), and pooled sensitivity (0.97, 95% CI 0.94-0.98) values. Threshold effect and meta-regression analyses showed that different cut-offs, blood sampling positions, study types and test methods contributed to heterogeneity. This meta-analysis suggested an effective value for combined plasma free MNs for the diagnosis of PPGL, but testing for MNs requires more standardization using tightly regulated studies. 
AUC = area under curve; CI = confidence interval; DOR = diagnostic odds ratio; EIA = enzyme immunoassay; LC-ECD = liquid chromatography-electrochemical detection; LC-MS/MS = liquid chromatography-tandem mass spectrometry; MN = metanephrine; NMN = normetanephrine; PPGL = pheochromocytoma and paraganglioma; QUADAS-2 = Quality Assessment of Diagnostic Accuracy Studies 2.
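The pooled accuracy summaries are related by standard definitions, which can be computed directly (the numbers used below are illustrative, not the meta-analysis estimates):

```python
def likelihood_ratios(sens, spec):
    """Positive and negative likelihood ratios from sensitivity and
    specificity: LR+ = sens/(1-spec), LR- = (1-sens)/spec."""
    return sens / (1 - spec), (1 - sens) / spec

def diagnostic_odds_ratio(sens, spec):
    """DOR = LR+ / LR-: the odds of a positive result in disease
    divided by the odds of a positive result in non-disease."""
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    return lr_pos / lr_neg
```

For example, a sensitivity and specificity of 0.9 each give LR+ = 9, LR- = 1/9, and DOR = 81.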
NASA Technical Reports Server (NTRS)
Colwell, R. N. (Principal Investigator)
1983-01-01
The geometric quality of the TM and MSS film products was evaluated by making selective photo measurements such as scale, linear, and area determinations, and by measuring the coordinates of known features on both the film products and map products and then relating these paired observations using a standard linear least-squares regression approach. Quantitative interpretation tests are described which evaluate the quality and utility of the TM film products and various band combinations for detecting and identifying important forest and agricultural features.
An Overview of the JPSS Ground Project Algorithm Integration Process
NASA Astrophysics Data System (ADS)
Vicente, G. A.; Williams, R.; Dorman, T. J.; Williamson, R. C.; Shaw, F. J.; Thomas, W. M.; Hung, L.; Griffin, A.; Meade, P.; Steadley, R. S.; Cember, R. P.
2015-12-01
The smooth transition, implementation, and operationalization of scientific software from the National Oceanic and Atmospheric Administration (NOAA) development teams to the Joint Polar Satellite System (JPSS) Ground Segment requires a variety of experiences and expertise. This task has been accomplished by a dedicated group of scientists and engineers working in close collaboration with the NOAA Satellite and Information Service (NESDIS) Center for Satellite Applications and Research (STAR) science teams for the JPSS/Suomi National Polar-orbiting Partnership (S-NPP) Advanced Technology Microwave Sounder (ATMS), Cross-track Infrared Sounder (CrIS), Visible Infrared Imaging Radiometer Suite (VIIRS), and Ozone Mapping and Profiler Suite (OMPS) instruments. The purpose of this presentation is to describe the JPSS project process for algorithm implementation, from the very early delivery stages by the science teams to full operationalization in the Interface Data Processing Segment (IDPS), the processing system that provides Environmental Data Records (EDRs) to NOAA. Special focus is given to the NASA Data Products Engineering and Services (DPES) Algorithm Integration Team (AIT) functional and regression test activities. In the functional testing phase, the AIT uses one or a few specific chunks of data (granules) selected by the NOAA STAR Calibration and Validation (cal/val) teams to demonstrate that a small change in the code performs properly and does not disrupt the rest of the algorithm chain. In the regression testing phase, the modified code is placed into the Government Resources for Algorithm Verification, Integration, Test and Evaluation (GRAVITE) Algorithm Development Area (ADA), a simulated and smaller version of the operational IDPS. Baseline files are swapped out, not edited, and the whole code package runs on one full orbit of Science Data Records (SDRs) using calibration look-up tables (Cal LUTs) for the time of the orbit. 
The purpose of the regression test is to identify unintended outcomes. Overall, the presentation provides a general and easy-to-follow overview of the JPSS Algorithm Change Process (ACP) and is intended to facilitate the audience's understanding of a very extensive and complex process.
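In miniature, a regression check of this kind reduces to comparing a candidate run against baseline outputs granule by granule. This is a sketch only; the granule names, scalar values, and tolerance are illustrative, not GRAVITE interfaces.

```python
def regression_check(baseline, candidate, rtol=1e-6):
    """Compare one full run against the baseline, granule by granule,
    and report anything that drifted beyond a relative tolerance --
    i.e., a candidate list of unintended outcomes."""
    bad = []
    for granule, ref in baseline.items():
        new = candidate.get(granule)
        if new is None or abs(new - ref) > rtol * max(abs(ref), 1.0):
            bad.append(granule)
    return bad
```

An empty return means the code change left the full-orbit outputs within tolerance of the baseline.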
Omnibus Risk Assessment via Accelerated Failure Time Kernel Machine Modeling
Sinnott, Jennifer A.; Cai, Tianxi
2013-01-01
Integrating genomic information with traditional clinical risk factors to improve the prediction of disease outcomes could profoundly change the practice of medicine. However, the large number of potential markers and possible complexity of the relationship between markers and disease make it difficult to construct accurate risk prediction models. Standard approaches for identifying important markers often rely on marginal associations or linearity assumptions and may not capture non-linear or interactive effects. In recent years, much work has been done to group genes into pathways and networks. Integrating such biological knowledge into statistical learning could potentially improve model interpretability and reliability. One effective approach is to employ a kernel machine (KM) framework, which can capture nonlinear effects if nonlinear kernels are used (Schölkopf and Smola, 2002; Liu et al., 2007, 2008). For survival outcomes, KM regression modeling and testing procedures have been derived under a proportional hazards (PH) assumption (Li and Luan, 2003; Cai et al., 2011). In this paper, we derive testing and prediction methods for KM regression under the accelerated failure time model, a useful alternative to the PH model. We approximate the null distribution of our test statistic using resampling procedures. When multiple kernels are of potential interest, it may be unclear in advance which kernel to use for testing and estimation. We propose a robust Omnibus Test that combines information across kernels, and an approach for selecting the best kernel for estimation. The methods are illustrated with an application in breast cancer. PMID:24328713
Power and Sample Size Calculations for Logistic Regression Tests for Differential Item Functioning
ERIC Educational Resources Information Center
Li, Zhushan
2014-01-01
Logistic regression is a popular method for detecting uniform and nonuniform differential item functioning (DIF) effects. Theoretical formulas for the power and sample size calculations are derived for likelihood ratio tests and Wald tests based on the asymptotic distribution of the maximum likelihood estimators for the logistic regression model.…
Testing hypotheses for differences between linear regression lines
Stanley J. Zarnoch
2009-01-01
Five hypotheses are identified for testing differences between simple linear regression lines. The distinctions between these hypotheses are based on a priori assumptions and illustrated with full and reduced models. The contrast approach is presented as an easy and complete method for testing for overall differences between the regressions and for making pairwise...
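The full-versus-reduced-model comparison can be sketched for the simplest case, two groups with H0 of a single common line. This is ordinary least squares and tests coincidence only, not all five hypotheses distinguished in the paper.

```python
def fit_line(pts):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    b = sxy / sxx
    return my - b * mx, b

def rss(pts, a, b):
    """Residual sum of squares around the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in pts)

def f_test_coincident(g1, g2):
    """F statistic comparing the reduced model (one common line) with
    the full model (a separate line per group)."""
    rss_full = rss(g1, *fit_line(g1)) + rss(g2, *fit_line(g2))
    pooled = g1 + g2
    rss_red = rss(pooled, *fit_line(pooled))
    df_diff = 2                  # extra parameters in the full model
    df_full = len(pooled) - 4    # n minus two intercepts and two slopes
    return ((rss_red - rss_full) / df_diff) / (rss_full / df_full)
```

A large F rejects coincidence; identical groups give F near zero.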
Hissbach, Johanna; Feddersen, Lena; Sehner, Susanne; Hampe, Wolfgang
2012-01-01
Aims: Tests with natural-scientific content are predictive of the success in the first semesters of medical studies. Some universities in the German speaking countries use the ‘Test for medical studies’ (TMS) for student selection. One of its test modules, namely “medical and scientific comprehension”, measures the ability for deductive reasoning. In contrast, the Hamburg Assessment Test for Medicine, Natural Sciences (HAM-Nat) evaluates knowledge in natural sciences. In this study the predictive power of the HAM-Nat test will be compared to that of the NatDenk test, which is similar to the TMS module “medical and scientific comprehension” in content and structure. Methods: 162 medical school beginners volunteered to complete either the HAM-Nat (N=77) or the NatDenk test (N=85) in 2007. Until spring 2011, 84.2% of these successfully completed the first part of the medical state examination in Hamburg. Via different logistic regression models we tested the predictive power of high school grade point average (GPA or “Abiturnote”) and the test results (HAM-Nat and NatDenk) with regard to the study success criterion “first part of the medical state examination passed successfully up to the end of the 7th semester” (Success7Sem). The Odds Ratios (OR) for study success are reported. Results: For both test groups a significant correlation existed between test results and study success (HAM-Nat: OR=2.07; NatDenk: OR=2.58). If both admission criteria are estimated in one model, the main effects (GPA: OR=2.45; test: OR=2.32) and their interaction effect (OR=1.80) are significant in the HAM-Nat test group, whereas in the NatDenk test group only the test result (OR=2.21) significantly contributes to the variance explained. Conclusions: On their own both HAM-Nat and NatDenk have predictive power for study success, but only the HAM-Nat explains additional variance if combined with GPA. 
Under the current circumstances of medical school selection (many good applicants and only a limited number of available spaces), selection according to HAM-Nat and GPA has the highest predictive power of all models. PMID:23255967
A Practical Guide to Regression Discontinuity
ERIC Educational Resources Information Center
Jacob, Robin; Zhu, Pei; Somers, Marie-Andrée; Bloom, Howard
2012-01-01
Regression discontinuity (RD) analysis is a rigorous nonexperimental approach that can be used to estimate program impacts in situations in which candidates are selected for treatment based on whether their value for a numeric rating exceeds a designated threshold or cut-point. Over the last two decades, the regression discontinuity approach has…
Precision Efficacy Analysis for Regression.
ERIC Educational Resources Information Center
Brooks, Gordon P.
When multiple linear regression is used to develop a prediction model, sample size must be large enough to ensure stable coefficients. If the derivation sample size is inadequate, the model may not predict well for future subjects. The precision efficacy analysis for regression (PEAR) method uses a cross-validity approach to select sample sizes…
A Ranking Approach to Genomic Selection.
Blondel, Mathieu; Onogi, Akio; Iwata, Hiroyoshi; Ueda, Naonori
2015-01-01
Genomic selection (GS) is a recent selective breeding method which uses predictive models based on whole-genome molecular markers. Until now, existing studies formulated GS as the problem of modeling an individual's breeding value for a particular trait of interest, i.e., as a regression problem. To assess predictive accuracy of the model, the Pearson correlation between observed and predicted trait values was used. In this paper, we propose to formulate GS as the problem of ranking individuals according to their breeding value. Our proposed framework allows us to employ machine learning methods for ranking which had previously not been considered in the GS literature. To assess ranking accuracy of a model, we introduce a new measure originating from the information retrieval literature called normalized discounted cumulative gain (NDCG). NDCG rewards more strongly models which assign a high rank to individuals with high breeding value. Therefore, NDCG reflects a prerequisite objective in selective breeding: accurate selection of individuals with high breeding value. We conducted a comparison of 10 existing regression methods and 3 new ranking methods on 6 datasets, consisting of 4 plant species and 25 traits. Our experimental results suggest that tree-based ensemble methods including McRank, Random Forests and Gradient Boosting Regression Trees achieve excellent ranking accuracy. RKHS regression and RankSVM also achieve good accuracy when used with an RBF kernel. Traditional regression methods such as Bayesian lasso, wBSR and BayesC were found less suitable for ranking. Pearson correlation was found to correlate poorly with NDCG. Our study suggests two important messages. First, ranking methods are a promising research direction in GS. Second, NDCG can be a useful evaluation measure for GS.
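NDCG itself is compact to compute; the sketch below uses the common 2^rel - 1 gain and log2 discount (the paper's exact gain definition may differ).

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of relevances listed in rank order."""
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances))

def ndcg(predicted_scores, true_values, k=None):
    """Rank individuals by predicted score, then compare the DCG of that
    ranking (gains = true breeding values) to the ideal DCG obtained by
    sorting on the true values themselves."""
    order = sorted(range(len(true_values)), key=lambda i: -predicted_scores[i])
    ranked = [true_values[i] for i in order][:k]
    ideal = sorted(true_values, reverse=True)[:k]
    return dcg(ranked) / dcg(ideal)
```

A model that ranks the individuals exactly as their true breeding values would scores 1.0; misranking the best individuals is penalized most heavily, which is the stated prerequisite of selective breeding.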
Discrimination of serum Raman spectroscopy between normal and colorectal cancer
NASA Astrophysics Data System (ADS)
Li, Xiaozhou; Yang, Tianyue; Yu, Ting; Li, Siqi
2011-07-01
Raman spectroscopy of tissues has been widely studied for the diagnosis of various cancers, but biofluids have seldom been used as the analyte because of low analyte concentrations. Here, Raman spectra of serum from 30 normal subjects, 46 colon cancer patients, and 44 rectal cancer patients were measured and analyzed. The information in the Raman peaks (intensity and width) and in the fluorescence background (baseline function coefficients) was selected as parameters for statistical analysis. Principal component regression (PCR) and partial least squares regression (PLSR) were applied to the selected parameters separately to assess their performance; PCR performed better than PLSR on our spectral data. Linear discriminant analysis (LDA) was then applied to the principal components (PCs) from the two regression methods on the selected parameters, and diagnostic accuracies of 88% and 83% were obtained. The conclusion is that the selected features preserve the information of the original spectra well, and Raman spectroscopy of serum has potential for the diagnosis of colorectal cancer.
Optimized multiple linear mappings for single image super-resolution
NASA Astrophysics Data System (ADS)
Zhang, Kaibing; Li, Jie; Xiong, Zenggang; Liu, Xiuping; Gao, Xinbo
2017-12-01
Learning piecewise linear regression has been recognized as an effective approach to example learning-based single image super-resolution (SR) in the literature. In this paper, we employ an expectation-maximization (EM) algorithm to further improve the SR performance of our previous multiple linear mappings (MLM) based SR method. In the training stage, the proposed method starts with a set of linear regressors obtained by the MLM-based method, and then jointly optimizes the clustering results and the low- and high-resolution subdictionary pairs for the regression functions by using the metric of the reconstruction errors. In the test stage, we select the optimal regressor for SR reconstruction by accumulating the reconstruction errors of the m-nearest neighbors in the training set. Thorough experiments carried out on six publicly available datasets demonstrate that the proposed SR method can yield high-quality images with finer details and sharper edges in terms of both quantitative and perceptual image quality assessments.
Use of ocean color scanner data in water quality mapping
NASA Technical Reports Server (NTRS)
Khorram, S.
1981-01-01
Remotely sensed data, in combination with in situ data, are used in assessing water quality parameters within the San Francisco Bay-Delta. The parameters include suspended solids, chlorophyll, and turbidity. Regression models are developed between each of the water quality parameter measurements and the Ocean Color Scanner (OCS) data. The models are then extended to the entire study area for mapping water quality parameters. The results include a series of color-coded maps, each pertaining to one of the water quality parameters, and the statistical analysis of the OCS data and regression models. It is found that concurrently collected OCS data and surface truth measurements are highly useful in mapping the selected water quality parameters and locating areas having relatively high biological activity. In addition, it is found to be virtually impossible, at least within this test site, to locate such areas on U-2 color and color-infrared photography.
Odontological approach to sexual dimorphism in southeastern France.
Lladeres, Emilie; Saliba-Serre, Bérengère; Sastre, Julien; Foti, Bruno; Tardivo, Delphine; Adalian, Pascal
2013-01-01
The aim of this study was to establish a prediction formula to allow for the determination of sex among the southeastern French population using dental measurements. The sample consisted of 105 individuals (57 males and 48 females, aged between 18 and 25 years). Dental measurements were calculated using Euclidean distances, in three-dimensional space, from point coordinates obtained by a Microscribe. A multiple logistic regression analysis was performed to establish the prediction formula. Among 12 selected dental distances, a stepwise logistic regression analysis highlighted the two most significant discriminant predictors of sex: one located at the mandible and the other at the maxilla. A cutpoint was proposed for the prediction of sex. The prediction formula was then tested on a validation sample (20 males and 34 females, aged between 18 and 62 years and with a history of orthodontics or restorative care) to evaluate the accuracy of the method. © 2012 American Academy of Forensic Sciences.
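The general approach, logistic regression on dental distances with a probability cutpoint, can be sketched on invented measurements. The two distances, the effect size, and the 0.5 cutpoint below are placeholders, not the study's values.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
# Stand-in data: two dental distances (mm), one mandibular and one maxillary;
# males are drawn slightly larger on average, mimicking sexual dimorphism.
n = 105
sex = rng.integers(0, 2, n)                    # 0 = female, 1 = male
X = rng.normal(loc=10.0, scale=0.5, size=(n, 2))
X[sex == 1] += 0.6

clf = LogisticRegression().fit(X, sex)
prob_male = clf.predict_proba(X)[:, 1]
cutpoint = 0.5                                 # the study derives its own cutpoint
predicted = (prob_male >= cutpoint).astype(int)
accuracy = (predicted == sex).mean()
```

As in the study, a proper assessment would apply the fitted formula and cutpoint to a separate validation sample rather than the training data.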
NASA Astrophysics Data System (ADS)
Luna, Aderval S.; Gonzaga, Fabiano B.; da Rocha, Werickson F. C.; Lima, Igor C. A.
2018-01-01
Laser-induced breakdown spectroscopy (LIBS) analysis was carried out on eleven steel samples to quantify the concentrations of chromium, nickel, and manganese. LIBS spectral data were correlated to known concentrations of the samples using different strategies in partial least squares (PLS) regression models. For the PLS analysis, one predictive model was separately generated for each element, while different approaches were used for the selection of variables (VIP: variable importance in projection and iPLS: interval partial least squares) in the PLS model to quantify the contents of the elements. The comparison of the performance of the models showed that there was no significant statistical difference using the Wilcoxon signed rank test. The elliptical joint confidence region (EJCR) did not detect systematic errors in these proposed methodologies for each metal.
[Multivariate Adaptive Regression Splines (MARS), an alternative for the analysis of time series].
Vanegas, Jairo; Vásquez, Fabián
Multivariate Adaptive Regression Splines (MARS) is a non-parametric modelling method that extends the linear model, incorporating nonlinearities and interactions between variables. It is a flexible tool that automates the construction of predictive models: selecting relevant variables, transforming the predictor variables, processing missing values, and preventing overfitting by means of a self-test. It is also able to predict, taking into account structural factors that might influence the outcome variable, thereby generating hypothetical models. The end result could identify relevant cut-off points in data series. It is rarely used in health, so it is proposed as a tool for the evaluation of relevant public health indicators. For demonstrative purposes, data series regarding the mortality of children under 5 years of age in Costa Rica were used, comprising the period 1978-2008. Copyright © 2016 SESPAS. Published by Elsevier España, S.L.U. All rights reserved.
Whole-genome regression and prediction methods applied to plant and animal breeding.
de Los Campos, Gustavo; Hickey, John M; Pong-Wong, Ricardo; Daetwyler, Hans D; Calus, Mario P L
2013-02-01
Genomic-enabled prediction is becoming increasingly important in animal and plant breeding and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of markers concurrently. Methods exist that allow implementing these large-p with small-n regressions, and genome-enabled selection (GS) is being implemented in several plant and animal breeding programs. The list of available methods is long, and the relationships between them have not been fully addressed. In this article we provide an overview of available methods for implementing parametric WGR models, discuss selected topics that emerge in applications, and present a general discussion of lessons learned from simulation and empirical data analysis in the last decade.
Yates, Janet; James, David
2013-02-26
The UK Clinical Aptitude Test (UKCAT) was introduced in 2006 as an additional tool for the selection of medical students. It tests mental ability in four distinct domains (Verbal Reasoning, Quantitative Reasoning, Abstract Reasoning, and Decision Analysis), and the results are available to students and admission panels in advance of the selection process. Our first study showed little evidence of any predictive validity for performance in the first two years of the Nottingham undergraduate course. The study objective was to determine whether the UKCAT scores had any predictive value for the later parts of the course, largely delivered via clinical placements. Students entering the course in 2007 and who had taken the UKCAT were asked for permission to use their anonymised data in research. The UKCAT scores were incorporated into a database with routine pre-admission socio-demographics and subsequent course performance data. Correlation analysis was followed by hierarchical multivariate linear regression. The original study group comprised 204/254 (80%) of the full entry cohort. With attrition over the five years of the course this fell to 185 (73%) by Year 5. The Verbal Reasoning score and the UKCAT Total score both demonstrated some univariate correlations with clinical knowledge marks, and slightly less with clinical skills. No parts of the UKCAT proved to be an independent predictor of clinical course marks, whereas prior attainment was a highly significant predictor (p <0.001). This study of one cohort of Nottingham medical students showed that UKCAT scores at admission did not independently predict subsequent performance on the course. Whilst the test adds another dimension to the selection process, its fairness and validity in selecting promising students remains unproven, and requires wider investigation and debate by other schools.
Predicting space telerobotic operator training performance from human spatial ability assessment
NASA Astrophysics Data System (ADS)
Liu, Andrew M.; Oman, Charles M.; Galvan, Raquel; Natapoff, Alan
2013-11-01
Our goal was to determine whether existing tests of spatial ability can predict an astronaut's qualification test performance after robotic training. Because training astronauts to be qualified robotics operators is so long and expensive, NASA is interested in tools that can predict robotics performance before training begins. Currently, the Astronaut Office does not have a validated tool to predict robotics ability as part of its astronaut selection or training process. Commonly used tests of human spatial ability may provide such a tool to predict robotics ability. We tested the spatial ability of 50 active astronauts who had completed at least one robotics training course, then used logistic regression models to analyze the correlation between spatial ability test scores and the astronauts' performance in their evaluation test at the end of the training course. The fit of the logistic function to our data is statistically significant for several spatial tests. However, the prediction performance of the logistic model depends on the criterion threshold assumed. To clarify the critical selection issues, we show how the probability of correct classification vs. misclassification varies as a function of the mental rotation test criterion level. Since the costs of misclassification are low, the logistic models of spatial ability and robotic performance are reliable enough only to be used to customize regular and remedial training. We suggest several changes in tracking performance throughout robotics training that could improve the range and reliability of predictive models.
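The threshold dependence the authors describe, how correct classification trades against misclassification as the criterion level moves, can be illustrated with a toy logistic model. The scores, pass probabilities, and thresholds below are fabricated for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
# Stand-in data: a spatial-ability score and a pass/fail outcome on the
# robotics evaluation; the pass probability rises with the score.
score = rng.normal(50, 10, 200)
passed = (rng.random(200) < 1 / (1 + np.exp(-(score - 50) / 5))).astype(int)

clf = LogisticRegression().fit(score.reshape(-1, 1), passed)
p = clf.predict_proba(score.reshape(-1, 1))[:, 1]

# Sweeping the criterion threshold trades sensitivity against specificity:
# a lenient threshold keeps more qualified trainees but also more who fail.
results = {}
for thr in (0.3, 0.5, 0.7):
    pred = (p >= thr).astype(int)
    sensitivity = (pred[passed == 1] == 1).mean()
    specificity = (pred[passed == 0] == 0).mean()
    results[thr] = (sensitivity, specificity)
```

Because the costs of the two error types differ (as the abstract notes, misclassification here is cheap), the operating threshold is a policy choice rather than a statistical one.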
Development of an automated ultrasonic testing system
NASA Astrophysics Data System (ADS)
Shuxiang, Jiao; Wong, Brian Stephen
2005-04-01
Non-destructive testing is necessary in areas where defects in structures emerge over time due to wear and tear, and where structural integrity must be maintained to preserve usability. Manual testing, however, suffers from several limitations: high training cost, a long training procedure, and, worse, inconsistent test results. A prime objective of this project is to develop an automatic non-destructive testing system for the wheel-axle shaft of a railway carriage. Various methods, such as neural networks, pattern recognition methods, and knowledge-based systems, are used for this artificial intelligence problem. In this paper, a statistical pattern recognition approach, the classification tree, is applied. Before feature selection, a thorough study of the ultrasonic signals produced was carried out. Based on this analysis, three signal processing methods were developed to enhance the ultrasonic signals: cross-correlation, zero-phase filtering, and averaging. The target of this step is to reduce noise and make the signal characteristics more distinguishable. Four features are selected: (1) autoregressive model coefficients, (2) standard deviation, (3) Pearson correlation, and (4) dispersion uniformity degree. A classification tree is then created and applied to recognize the peak positions and amplitudes. A local-maximum search is carried out before feature computation; this procedure greatly reduces computation time in real-time testing. Based on this algorithm, a software package called SOFRA was developed to recognize the peaks, calibrate automatically, and test a simulated shaft automatically. Both the automatic calibration procedure and the automatic shaft testing procedure are developed.
Torija, Antonio J; Ruiz, Diego P
2015-02-01
The prediction of environmental noise in urban environments requires the solution of a complex and non-linear problem, since there are complex relationships among the multitude of variables involved in the characterization and modelling of environmental noise and environmental-noise magnitudes. Moreover, the inclusion of the great spatial heterogeneity characteristic of urban environments seems to be essential in order to achieve an accurate environmental-noise prediction in cities. This problem is addressed in this paper, where a procedure based on feature-selection techniques and machine-learning regression methods is proposed and applied to this environmental problem. Three machine-learning regression methods, which are considered very robust in solving non-linear problems, are used to estimate the energy-equivalent sound-pressure level descriptor (LAeq): (i) multilayer perceptron (MLP), (ii) sequential minimal optimisation (SMO), and (iii) Gaussian processes for regression (GPR). In addition, because the high number of input variables involved in environmental-noise modelling in urban environments makes LAeq prediction models quite complex and costly in terms of time and resources for application to real situations, three different techniques are used for feature selection or data reduction: the feature-selection techniques are (i) correlation-based feature-subset selection (CFS) and (ii) a wrapper for feature-subset selection (WFS), and the data-reduction technique is principal-component analysis (PCA). The subsequent analysis leads to a proposal of different schemes, depending on the needs regarding data collection and accuracy. The use of WFS as the feature-selection technique with the implementation of SMO or GPR as the regression algorithm provides the best LAeq estimation (R² = 0.94 and mean absolute error (MAE) = 1.14-1.16 dB(A)). Copyright © 2014 Elsevier B.V. All rights reserved.
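A wrapper feature-selection scheme of the kind described (WFS around a kernel regressor) might look like the following scikit-learn sketch. SVR stands in for the SMO-trained model, and the data, feature count, and hyperparameters are invented for illustration.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

rng = np.random.default_rng(4)
# Stand-in data: 12 candidate urban variables, only the first two informative
X = rng.normal(size=(150, 12))
laeq = 60 + 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=1.0, size=150)

# Wrapper selection: greedily keep the features that most improve the
# cross-validated score of the regressor itself
svr = SVR(kernel="rbf", C=100.0)
sfs = SequentialFeatureSelector(svr, n_features_to_select=2, cv=3).fit(X, laeq)
selected = np.flatnonzero(sfs.get_support())

r2 = cross_val_score(svr, X[:, selected], laeq, cv=5, scoring="r2").mean()
```

Unlike filter methods such as CFS, a wrapper evaluates candidate subsets with the target model, which tends to find better subsets at a higher computational cost.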
Dong, J Q; Zhang, X Y; Wang, S Z; Jiang, X F; Zhang, K; Ma, G W; Wu, M Q; Li, H; Zhang, H
2018-01-01
Plasma very low-density lipoprotein (VLDL) can be used to select for low body fat or abdominal fat (AF) in broilers, but its correlation with AF is limited. We investigated whether any other biochemical indicator can be used in combination with VLDL for a better selective effect. Nineteen plasma biochemical indicators were measured in male chickens from the Northeast Agricultural University broiler lines divergently selected for AF content (NEAUHLF) in the fed state at 46 and 48 d of age. The average concentration of every parameter for the 2 d was used for statistical analysis. Levels of these 19 plasma biochemical parameters were compared between the lean and fat lines. The phenotypic correlations between these plasma biochemical indicators and AF traits were analyzed. Then, multiple linear regression models were constructed to select the best model for selecting against AF content, and the heritabilities of the plasma indicators contained in the best models were estimated. The results showed that 11 plasma biochemical indicators (triglycerides, total bile acid, total protein, globulin, albumin/globulin, aspartate transaminase, alanine transaminase, gamma-glutamyl transpeptidase, uric acid, creatinine, and VLDL) differed significantly between the lean and fat lines (P < 0.01), and correlated significantly with AF traits (P < 0.05). The best multiple linear regression models, based on albumin/globulin, VLDL, triglycerides, globulin, total bile acid, and uric acid, had a higher R2 (0.73) than the model based only on VLDL (0.21). The plasma parameters included in the best models had moderate heritability estimates (0.21 ≤ h2 ≤ 0.43). These results indicate that these multiple linear regression models can be used to select for lean broiler chickens. © 2017 Poultry Science Association Inc.
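The core comparison here, a single-predictor model versus a multi-indicator linear model judged by R², can be reproduced in miniature on simulated data. The variable names and effect sizes are placeholders, not the study's estimates.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
# Stand-in data: VLDL plus five other plasma indicators predicting abdominal
# fat; coefficients below are illustrative only.
n = 300
vldl = rng.normal(size=(n, 1))
others = rng.normal(size=(n, 5))        # A/G, TG, globulin, bile acid, uric acid
af = (0.5 * vldl[:, 0] + others @ np.array([0.6, 0.5, 0.4, 0.3, 0.3])
      + rng.normal(size=n))

# In-sample R2 of VLDL alone vs. the combined multiple linear regression model
r2_vldl = LinearRegression().fit(vldl, af).score(vldl, af)
X_all = np.hstack([vldl, others])
r2_panel = LinearRegression().fit(X_all, af).score(X_all, af)
```

Because the single-predictor model is nested in the panel model, in-sample R² can only increase when predictors are added; cross-validation or adjusted R² is needed to show the gain is not overfitting.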
NASA Technical Reports Server (NTRS)
Laverghetta, A. V.; Shimizu, T.
1999-01-01
The nucleus rotundus is a large thalamic nucleus in birds and plays a critical role in many visual discrimination tasks. In order to test the hypothesis that there are functionally distinct subdivisions in the nucleus rotundus, effects of selective lesions of the nucleus were studied in pigeons. The birds were trained to discriminate between different types of stationary objects and between different directions of moving objects. Multiple regression analyses revealed that lesions in the anterior, but not posterior, division caused deficits in discrimination of small stationary stimuli. Lesions in neither the anterior nor the posterior division predicted deficits in discrimination of moving stimuli. These results are consistent with a prediction derived from the hypothesis that the nucleus is composed of functional subdivisions.
Attia, Khalid A M; Nassar, Mohammed W I; El-Zeiny, Mohamed B; Serag, Ahmed
2017-01-05
For the first time, a new variable selection method based on swarm intelligence, namely the firefly algorithm, is coupled with three different multivariate calibration models, namely concentration residual augmented classical least squares, artificial neural network, and support vector regression, for UV spectral data. A comparative study between the firefly algorithm and the well-known genetic algorithm was developed. The discussion revealed the superiority of using this new powerful algorithm over the well-known genetic algorithm. Moreover, different statistical tests were performed and no significant differences were found between all the models regarding their predictabilities. This ensures that simpler and faster models were obtained without any deterioration of the quality of the calibration. Copyright © 2016 Elsevier B.V. All rights reserved.
Carrera, D; de la Flor, M; Galera, J; Amillano, K; Gomez, M; Izquierdo, V; Aguilar, E; López, S; Martínez, M; Martínez, S; Serra, J M; Pérez, M; Martin, L
2016-01-01
The aim of our study was to evaluate sentinel lymph node biopsy as a diagnostic test for assessing the presence of residual metastatic axillary lymph nodes after neoadjuvant chemotherapy, replacing the need for a lymphadenectomy in negative selective lymph node biopsy patients. A multicentre, diagnostic validation study was conducted in the province of Tarragona, on women with T1-T3, N1-N2 breast cancer, who presented with a complete axillary response after neoadjuvant chemotherapy. Study procedures consisted of performing a selective lymph node biopsy followed by lymphadenectomy. A total of 53 women were included in the study. The surgical detection rate was 90.5% (no sentinel node found in 5 patients). Histopathological analysis of the lymphadenectomy showed complete disease regression of axillary nodes in 35.4% (17/48) of the patients, and residual axillary node involvement in 64.6% (31/48) of them. In lymphadenectomy positive patients, 28 had a positive selective lymph node biopsy (true positive), while 3 had a negative selective lymph node biopsy (false negative). Of the 28 true selective lymph node biopsy positives, the sentinel node was the only positive node in 10 cases. All lymphadenectomy negative cases were selective lymph node biopsy negative. These data yield a sensitivity of 93.5%, a false negative rate of 9.7%, and a global test efficiency of 93.7%. Selective lymph node biopsy after chemotherapy in patients with a complete axillary response provides valid and reliable information regarding axillary status after neoadjuvant treatment, and might prevent lymphadenectomy in cases with negative selective lymph node biopsy. Copyright © 2016 Elsevier España, S.L.U. and SEMNIM. All rights reserved.
Xie, Heping; Wang, Fuxing; Hao, Yanbin; Chen, Jiaxue; An, Jing; Wang, Yuxin; Liu, Huashan
2017-01-01
Cueing facilitates retention and transfer of multimedia learning. From the perspective of cognitive load theory (CLT), cueing has a positive effect on learning outcomes because of the reduction in total cognitive load and avoidance of cognitive overload. However, this has not been systematically evaluated. Moreover, what remains ambiguous is the direct relationship between the cue-related cognitive load and learning outcomes. A meta-analysis and two subsequent meta-regression analyses were conducted to explore these issues. Subjective total cognitive load (SCL) and scores on a retention test and transfer test were selected as dependent variables. Through a systematic literature search, 32 eligible articles encompassing 3,597 participants were included in the SCL-related meta-analysis. Among them, 25 articles containing 2,910 participants were included in the retention-related meta-analysis and the following retention-related meta-regression, while there were 29 articles containing 3,204 participants included in the transfer-related meta-analysis and the transfer-related meta-regression. The meta-analysis revealed a statistically significant cueing effect on subjective ratings of cognitive load (d = -0.11, 95% CI = [-0.19, -0.02], p < 0.05), retention performance (d = 0.27, 95% CI = [0.08, 0.46], p < 0.01), and transfer performance (d = 0.34, 95% CI = [0.12, 0.56], p < 0.01). The subsequent meta-regression analyses showed that dSCL for cueing significantly predicted dretention for cueing (β = -0.70, 95% CI = [-1.02, -0.38], p < 0.001), as well as dtransfer for cueing (β = -0.60, 95% CI = [-0.92, -0.28], p < 0.001). Thus in line with CLT, adding cues in multimedia materials can indeed reduce SCL and promote learning outcomes, and the more SCL is reduced by cues, the better retention and transfer of multimedia learning.
Teutsch, T; Mesch, M; Giessen, H; Tarin, C
2015-01-01
In this contribution, a method to select discrete wavelengths that allow an accurate estimation of the glucose concentration in a biosensing system based on metamaterials is presented. The sensing concept is adapted to the particular application of ophthalmic glucose sensing by covering the metamaterial with a glucose-sensitive hydrogel, and the sensor readout is performed optically. Because a spectrometer is not suitable in a mobile context, a few discrete wavelengths must be selected to estimate the glucose concentration. The developed selection methods are based on nonlinear support vector regression (SVR) models. Two selection methods are compared, and it is shown that wavelengths selected by a sequential forward feature selection algorithm achieve improved estimation. The presented method can be easily applied to different metamaterial layouts and hydrogel configurations.
Development of LACIE CCEA-1 weather/wheat yield models. [regression analysis
NASA Technical Reports Server (NTRS)
Strommen, N. D.; Sakamoto, C. M.; Leduc, S. K.; Umberger, D. E. (Principal Investigator)
1979-01-01
The advantages and disadvantages of the causal (phenological, dynamic, physiological), statistical regression, and analog approaches to modeling grain yield are examined. Given LACIE's primary goal of estimating wheat production for the large areas of eight major wheat-growing regions, the statistical regression approach of correlating historical yield and climate data offered the Center for Climatic and Environmental Assessment the greatest potential return within the constraints of time and data sources. The basic equation for the first-generation wheat-yield model is given. Topics discussed include truncation, trend variables, selection of weather variables, episodic events, strata selection, operational data flow, weighting, and model results.
Gotvald, Anthony J.
2017-01-13
The U.S. Geological Survey, in cooperation with the Georgia Department of Natural Resources, Environmental Protection Division, developed regional regression equations for estimating selected low-flow frequency and mean annual flow statistics for ungaged streams in north Georgia that are not substantially affected by regulation, diversions, or urbanization. Selected low-flow frequency statistics and basin characteristics for 56 streamgage locations within north Georgia and 75 miles beyond the State’s borders in Alabama, Tennessee, North Carolina, and South Carolina were combined to form the final dataset used in the regional regression analysis. Because some of the streamgages in the study recorded zero flow, the final regression equations were developed using weighted left-censored regression analysis to analyze the flow data in an unbiased manner, with weights based on the number of years of record. The set of equations includes the annual minimum 1- and 7-day average streamflow with the 10-year recurrence interval (referred to as 1Q10 and 7Q10), monthly 7Q10, and mean annual flow. The final regional regression equations are functions of drainage area, mean annual precipitation, and relief ratio for the selected low-flow frequency statistics, and drainage area and mean annual precipitation for mean annual flow. The average standard error of estimate was 13.7 percent for the mean annual flow regression equation and ranged from 26.1 to 91.6 percent for the selected low-flow frequency equations. The equations, which are based on data from streams with little to no flow alterations, can be used to provide estimates of the natural flows for selected ungaged stream locations in the area of Georgia north of the Fall Line. The regression equations are not to be used to estimate flows for streams that have been altered by the effects of major dams, surface-water withdrawals, groundwater withdrawals (pumping wells), diversions, or wastewater discharges.
The regression equations should be used only for ungaged sites with drainage areas between 1.67 and 576 square miles, mean annual precipitation between 47.6 and 81.6 inches, and relief ratios between 0.146 and 0.607; these are the ranges of the explanatory variables used to develop the equations. An attempt was made to develop regional regression equations for the area of Georgia south of the Fall Line by using the same approach used during this study for north Georgia; however, the equations resulted in high average standard errors of estimate and poorly predicted flows below 0.5 cubic foot per second, which may be attributed to the karst topography common in that area. The final regression equations developed from this study are planned to be incorporated into the U.S. Geological Survey StreamStats program. StreamStats is a Web-based geographic information system that provides users with access to an assortment of analytical tools useful for water-resources planning and management, and for engineering design applications, such as the design of bridges. The StreamStats program provides streamflow statistics and basin characteristics for U.S. Geological Survey streamgage locations and ungaged sites of interest. StreamStats also can compute basin characteristics and provide estimates of streamflow statistics for ungaged sites when users select the location of a site along any stream in Georgia.
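Regional regression equations of this kind are typically log-linear in the basin characteristics. The sketch below shows the general form together with the range checks the report prescribes; the coefficients are hypothetical placeholders, NOT the published Georgia values.

```python
def low_flow_estimate(drainage_area_mi2, precip_in, relief_ratio,
                      a=1e-4, b=1.1, c=1.6, d=0.8):
    """Illustrative regional low-flow equation of the usual log-linear form
    Q = a * DA**b * P**c * RR**d, in cubic feet per second.
    The coefficients a, b, c, d are hypothetical placeholders."""
    # Enforce the report's applicability limits before applying the equation
    if not (1.67 <= drainage_area_mi2 <= 576):
        raise ValueError("drainage area outside 1.67-576 sq mi range")
    if not (47.6 <= precip_in <= 81.6):
        raise ValueError("precipitation outside 47.6-81.6 inch range")
    if not (0.146 <= relief_ratio <= 0.607):
        raise ValueError("relief ratio outside 0.146-0.607 range")
    return a * drainage_area_mi2**b * precip_in**c * relief_ratio**d
```

Rejecting inputs outside the calibration ranges, as the report directs, prevents the silent extrapolation errors that regional equations are prone to.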
Schutz, Christine M; Dalton, Leanne; Tepe, Rodger E
2013-01-01
This study was designed to extend research on the relationship between chiropractic students' learning and study strategies and national board examination performance. Sixty-nine first-trimester chiropractic students self-administered the Learning and Study Strategies Inventory (LASSI). Linear trend tests (for continuous variables) and Mantel-Haenszel trend tests (for categorical variables) were used to determine whether the 10 LASSI subtests and 3 factors predicted low, medium, and high levels of National Board of Chiropractic Examiners (NBCE) Part 1 scores. Multiple regression was performed to predict overall mean NBCE examination scores using the 3 LASSI factors as predictor variables. Four LASSI subtests (Anxiety, Concentration, Selecting Main Ideas, Test Strategies) and one factor (Goal Orientation) were significantly associated with NBCE examination levels. One factor (Goal Orientation) was a significant predictor of overall mean NBCE examination performance. Learning and study strategies are predictive of NBCE Part 1 examination performance in chiropractic students. The current study found the LASSI subtests Anxiety, Concentration, Selecting Main Ideas, and Test Strategies, and the Goal Orientation factor, to be significant predictors of NBCE scores. The LASSI may be useful to educators in preparing students for academic success. Further research is warranted to explore the effects of learning and study strategies training on GPA and NBCE performance.
Kothe, Christian; Hissbach, Johanna; Hampe, Wolfgang
2014-01-01
Although some recent studies concluded that dexterity is not a reliable predictor of performance in preclinical laboratory courses in dentistry, they could not disprove earlier findings which confirmed the worth of manual dexterity tests in dental admission. We developed a wire bending test (HAM-Man) which was administered during dental freshmen’s first week in 2008, 2009, and 2010. The purpose of our study was to evaluate if the HAM-Man is a useful selection criterion additional to the high school grade point average (GPA) in dental admission. Regression analysis revealed that GPA only accounted for a maximum of 9% of students’ performance in preclinical laboratory courses; in six out of eight models the explained variance was below 2%. The HAM-Man incrementally explained up to 20.5% of preclinical practical performance over GPA. In line with findings from earlier studies, the HAM-Man test of manual dexterity showed satisfactory incremental validity. While GPA has a focus on cognitive abilities, the HAM-Man reflects learning of unfamiliar psychomotor skills, spatial relationships, and dental techniques needed in preclinical laboratory courses. The wire bending test HAM-Man is a valuable additional selection instrument for applicants of dental schools. PMID:24872857
On the reliable and flexible solution of practical subset regression problems
NASA Technical Reports Server (NTRS)
Verhaegen, M. H.
1987-01-01
A new algorithm for solving subset regression problems is described. The algorithm performs a QR decomposition with a new column-pivoting strategy, which permits subset selection directly from the originally defined regression parameters. This, in combination with a number of extensions of the new technique, makes the method a very flexible tool for analyzing subset regression problems in which the parameters have a physical meaning.
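The column-pivoting idea can be illustrated with a toy sketch: at each step, choose the candidate column with the largest component orthogonal to the columns already selected, which is the standard pivot rule of column-pivoted QR and keeps near-linearly-dependent regressors out of the subset. This is an illustrative version, not the paper's algorithm:

```python
import numpy as np

def qr_subset(A, k):
    """Greedy column-pivoted selection of k regressor columns.

    At each step, pick the column whose component orthogonal to the
    already-selected columns has the largest norm -- the pivot rule of
    column-pivoted QR. (Sketch only, not the paper's algorithm.)
    """
    A = np.asarray(A, dtype=float)
    R = A.copy()                 # residual part of each column
    selected = []
    for _ in range(k):
        norms = np.linalg.norm(R, axis=0)
        norms[selected] = -1.0   # never reselect a column
        j = int(np.argmax(norms))
        selected.append(j)
        q = R[:, j] / np.linalg.norm(R[:, j])
        R = R - np.outer(q, q @ R)   # deflate: remove q's component
    return selected

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
X[:, 3] = X[:, 0] + 1e-6 * rng.normal(size=50)  # near-duplicate column
subset = qr_subset(X, 3)
print(subset)  # at most one of the nearly dependent pair {0, 3} is kept
```

Because the near-duplicate column collapses to almost zero after its twin is selected, the greedy rule never picks both, which is exactly why pivoting helps in subset regression.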
Daily values flow comparison and estimates using program HYDCOMP, version 1.0
Sanders, Curtis L.
2002-01-01
A method used by the U.S. Geological Survey for quality control in computing daily value flow records is to compare hydrographs of computed flows at a station under review to hydrographs of computed flows at a selected index station. The hydrographs are placed on top of each other (as hydrograph overlays) on a light table, compared, and missing daily flow data estimated. This method, however, is subjective and can produce inconsistent results, because hydrographers can differ when calculating acceptable limits of deviation between observed and estimated flows. Selection of appropriate index stations also is judgmental, giving no consideration to the mathematical correlation between the review station and the index station(s). To address the limitations of the hydrograph-overlay method, a set of software programs, written in the SAS macro language, was developed and designated program HYDCOMP. The program automatically selects statistically comparable index stations by correlation and regression, and performs hydrographic comparisons and estimates of missing data by regressing daily mean flows at the review station against -8 to +8 lagged flows at one or two index stations and day-of-week. Another advantage that HYDCOMP has over the graphical method is that estimated flows, the criteria for determining the quality of the data, and the selection of index stations are determined statistically, and are reproducible from one user to another. HYDCOMP will load the most-correlated index stations into another file containing the "best index stations," but will not overwrite stations already in the file. A knowledgeable user should delete unsuitable index stations from this file based on the standard error of estimate, hydrologic similarity of candidate index stations to the review station, and knowledge of the individual station characteristics. Also, the user can add index stations not selected by HYDCOMP, if desired.
Once the file of best-index stations is created, a user may perform hydrographic comparisons and data estimates by entering the number of the review station, selecting an index station, and specifying the periods to be used for regression and plotting. For example, the user can restrict the regression to ice-free periods of the year to exclude flows estimated during iced conditions. However, the regression could still be used to estimate flow during iced conditions. HYDCOMP produces the standard error of estimate as a measure of the central scatter of the regression and R-square (coefficient of determination) for evaluating the accuracy of the regression. Output from HYDCOMP includes plots of percent residuals against (1) time within the regression and plot periods, (2) month and day of the year for evaluating seasonal bias in the regression, and (3) the magnitude of flow. For hydrographic comparisons, it plots 2-month segments of hydrographs over the selected plot period showing the observed flows, the regressed flows, the 95 percent confidence limit flows, flow measurements, and regression limits. If the observed flows at the review station remain outside the 95 percent confidence limits for a prolonged period, there may be some error in the flows at the review station or at the index station(s). In addition, daily minimum and maximum temperatures and daily rainfall are shown on the hydrographs, if available, to help indicate whether an apparent change in flow may result from rainfall or from changes in backwater from melting ice or freezing water. HYDCOMP statistically smooths estimated flows from non-missing flows at the edges of the gaps in data into regressed flows at the center of the gaps using the Kalman smoothing algorithm. Missing flows are automatically estimated by HYDCOMP, but the user can also specify that periods of erroneous but nonmissing flows be estimated by the program.
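HYDCOMP's core estimation step, regressing review-station flows on lagged index-station flows, can be sketched with ordinary least squares. This minimal illustration uses synthetic flows and lags of only -2 to +2 days (the program uses -8 to +8), and omits the day-of-week term and Kalman smoothing:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
# synthetic index-station daily flows (log random walk keeps them positive)
index_flow = np.exp(rng.normal(size=n).cumsum() * 0.05 + 3)
# hypothetical review station: tracks the index station with a one-day delay
review_flow = 0.8 * np.roll(index_flow, 1) + rng.normal(scale=0.5, size=n)

lags = range(-2, 3)   # HYDCOMP uses -8..+8; +/-2 keeps the sketch short
cols = [np.roll(index_flow, -lag) for lag in lags]
X = np.column_stack([np.ones(n)] + cols)[5:-5]   # trim roll wrap-around
y = review_flow[5:-5]

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ beta
r2 = 1 - ((y - pred) ** 2).sum() / ((y - y.mean()) ** 2).sum()
print(f"R-square = {r2:.3f}")   # the accuracy measure HYDCOMP reports
```

Missing days at the review station would then be filled with `X @ beta` evaluated on those days, before any smoothing at the gap edges.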
Chen, Carla Chia-Ming; Schwender, Holger; Keith, Jonathan; Nunkesser, Robin; Mengersen, Kerrie; Macrossan, Paula
2011-01-01
Due to advancements in computational ability, enhanced technology and a reduction in the price of genotyping, more data are being generated for understanding genetic associations with diseases and disorders. However, with the availability of large data sets comes the inherent challenges of new methods of statistical analysis and modeling. Considering a complex phenotype may be the effect of a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects. Among these methods, logic regression (LR) is an intriguing approach incorporating tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, namely Random Forests, and a Bayesian logistic regression with stochastic search variable selection.
A generalized right truncated bivariate Poisson regression model with applications to health data.
Islam, M Ataharul; Chowdhury, Rafiqul I
2017-01-01
A generalized right truncated bivariate Poisson regression model is proposed in this paper. Estimation and tests for goodness of fit and over- or underdispersion are illustrated for both untruncated and right truncated bivariate Poisson regression models using a marginal-conditional approach. Estimation and test procedures are illustrated for bivariate Poisson regression models with applications to Health and Retirement Study data on the number of health conditions and the number of health care services utilized. The proposed test statistics are easy to compute, and it is evident from the results that the models fit the data very well. A comparison between the right truncated and untruncated bivariate Poisson regression models using the test for nonnested models clearly shows that the truncated model performs significantly better than the untruncated model.
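The truncation idea underlying the model can be shown in the univariate case: a right truncated Poisson simply renormalizes the probability mass over counts up to the truncation point. This is a sketch of that mechanism only; the paper's bivariate marginal-conditional estimation is not reproduced, and the rate and cap below are hypothetical:

```python
import math

def truncated_poisson_pmf(y, lam, ymax):
    """P(Y = y | Y <= ymax) for a right-truncated Poisson(lam)."""
    if not 0 <= y <= ymax:
        return 0.0
    pmf = lambda k: math.exp(-lam) * lam**k / math.factorial(k)
    norm = sum(pmf(k) for k in range(ymax + 1))   # mass below the cap
    return pmf(y) / norm

lam, ymax = 2.5, 5   # hypothetical rate; counts capped at 5
probs = [truncated_poisson_pmf(y, lam, ymax) for y in range(ymax + 1)]
print(round(sum(probs), 10))  # truncation renormalizes the mass to 1
```

In the bivariate regression setting, the same renormalization is applied to the marginal and conditional count distributions, with lam modeled via covariates.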
Zachor, Ditza A; Ben-Itzchak, Esther
2016-01-01
Autism spectrum disorder (ASD) is a heterogeneous group of disorders which occurs with numerous medical conditions. In previous research, subtyping in ASD has been based mostly on cognitive ability and ASD symptom severity. The aim of the current study was to investigate whether specific medical conditions in ASD are associated with unique behavioral profiles. The medical conditions included in the study were macrocephaly, microcephaly, developmental regression, food selectivity, and sleep problems. The behavioral profile was composed of cognitive ability, adaptive skills, and autism severity, and was examined in each of the aforementioned medical conditions. The study population included 1224 participants, 1043 males and 181 females (M:F ratio = 5.8:1), with a mean age of 49.9 months (SD = 29.4), diagnosed with ASD using standardized tests. Groups with and without the specific medical conditions were compared on the behavioral measures. Developmental regression was present in 19% of the population and showed a more severe clinical presentation, with lower cognitive abilities, more severe ASD symptoms, and more impaired adaptive functioning. Microcephaly was observed in 6.3% of the population and was characterized by a lower cognitive ability and more impaired adaptive functioning in comparison to the normative head circumference (HC) group. Severe food selectivity was found in 9.8% and severe sleep problems in 5.1% of the ASD population. The food selectivity and sleep problem subgroups both showed more severe autism symptoms only as described by the parents, but not per the professional assessment, and more impaired adaptive skills. Macrocephaly was observed in 7.9% of the ASD population and did not differ from the normative HC group in any of the examined behavioral measures. Based on these findings, two unique medical-behavioral subtypes in ASD that affect inherited traits of cognition and/or autism severity were suggested.
The microcephaly phenotype occurred with more impaired cognition and the developmental regression phenotype with widespread, more severe impairments in cognition and autism severity. In contrast, severe food selectivity and sleep problems represent only comorbidities to ASD that affect functioning. Defining specific subgroups in ASD with a unique biological signature and specific behavioral phenotypes may help future genetic and neuroscience research.
NASA Astrophysics Data System (ADS)
Li, Shaoyuan; Chen, Xiuhua; Ma, Wenhui; Ding, Zhao; Zhang, Cong; Chen, Zhengjie; He, Xiao; Shang, Yudong; Zou, Yuxin
2016-11-01
We developed an innovative “Test Paper” based on virgin nanoporous silicon (NPSi), which shows intense visible emission and excellent fluorescence stability. The visual fluorescence-quenching “Test Paper” was highly selective and sensitive, recognizing Cu2+ at the μmol/L level. Within the concentration range of 5 × 10⁻⁷ to 50 × 10⁻⁷ mol/L, the linear regression equation I_PL = 1226.3 − 13.6[C_Cu2+] (R = 0.99) was established for quantitative Cu2+ detection. Finally, a Cu2+ fluorescence-quenching mechanism of the NPSi probe was proposed by studying the surface chemistry changes of NPSi and of metal-ion-immersed NPSi using XPS characterization. The results indicate that SiHx species contribute substantially to the PL emission of NPSi, and that the introduction of oxidized states and nonradiative recombination centers is responsible for the PL quenching. These results demonstrate how a virgin NPSi wafer can serve as a Cu2+ sensor. This work is of great significance for promoting the development of simple instruments that could realize rapid, visible, and real-time detection of various toxic metal ions.
Shahbazy, Mohammad; Kompany-Zareh, Mohsen; Najafpour, Mohammad Mahdi
2015-11-01
Water oxidation is among the most important reactions in artificial photosynthesis, and nano-sized layered manganese-calcium oxides are efficient catalysts for this reaction. Herein, a quantitative structure-activity relationship (QSAR) model was constructed to predict the catalytic activities of twenty manganese-calcium oxides toward water oxidation, using multiple linear regression (MLR) and a genetic algorithm (GA) for multivariate calibration and feature selection, respectively. Although eight parameters are controlled during synthesis of the desired catalysts, including ripening time, temperature, manganese content, calcium content, potassium content, the calcium:manganese ratio, the average manganese oxidation state, and the surface area of the catalyst, using the GA only three of them (potassium content, the calcium:manganese ratio, and the average manganese oxidation state) were selected as the parameters most effective on the catalytic activities of these compounds. The model's accuracy criteria for predicting the catalytic rate in external test-set experiments, R(2)test and Q(2)test, were equal to 0.941 and 0.906, respectively. The model therefore shows acceptable capability to anticipate the catalytic activity. Copyright © 2015 Elsevier B.V. All rights reserved.
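Because only eight synthesis parameters are candidates, the GA's job of finding an informative three-variable subset can be mimicked exhaustively in a few lines. This sketch uses synthetic data and leave-one-out Q² as the selection criterion; it only loosely parallels the paper's GA-MLR workflow, and the "informative" parameters are hypothetical:

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
n, p = 20, 8   # twenty catalysts, eight synthesis parameters
X = rng.normal(size=(n, p))
# hypothetical: activity driven by parameters 0, 1, 2 only
y = X[:, :3] @ np.array([1.0, -0.7, 0.5]) + rng.normal(scale=0.1, size=n)

def press_r2(Xs, y):
    """Leave-one-out Q^2 for an MLR on the columns Xs."""
    n = len(y)
    press = 0.0
    for i in range(n):
        mask = np.arange(n) != i
        X1 = np.column_stack([np.ones(mask.sum()), Xs[mask]])
        beta, *_ = np.linalg.lstsq(X1, y[mask], rcond=None)
        pred = np.concatenate([[1.0], Xs[i]]) @ beta
        press += (y[i] - pred) ** 2
    return 1 - press / ((y - y.mean()) ** 2).sum()

# exhaustive search over all C(8,3) = 56 subsets stands in for the GA
best = max(itertools.combinations(range(p), 3),
           key=lambda s: press_r2(X[:, s], y))
print(best)  # expected: the informative subset (0, 1, 2)
```

A real GA becomes necessary only when the descriptor pool is too large to enumerate; the fitness function (cross-validated Q²) plays the same role either way.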
Hemmateenejad, Bahram; Yazdani, Mahdieh
2009-02-16
Steroids are widely distributed in nature and are found in abundance in plants, animals, and fungi. A data set consisting of a diverse set of steroids was used to develop quantitative structure-electrochemistry relationship (QSER) models for their half-wave reduction potentials. Modeling was established by means of multiple linear regression (MLR) and principal component regression (PCR) analyses. In the MLR analysis, the QSPR models were constructed either by first grouping descriptors and then stepwise selecting variables from each group (MLR1) or by stepwise selection of predictor variables from the pool of all calculated descriptors (MLR2). A similar procedure was used in the PCR analysis, so that the principal components (or features) were extracted either from different groups of descriptors (PCR1) or from the entire set of descriptors (PCR2). The resulting models were evaluated using cross-validation, chance correlation, prediction of the reduction potentials of test samples, and assessment of the applicability domain. Both MLR approaches produced accurate results; however, the QSPR model found by MLR1 was statistically more significant. The PCR1 approach produced a model as accurate as the MLR approaches, whereas less accurate results were obtained with PCR2. Overall, the cross-validation and prediction correlation coefficients of the QSPR models resulting from the MLR1, MLR2, and PCR1 approaches were higher than 90%, which shows the high ability of the models to predict the reduction potentials of the studied steroids.
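The mechanics of PCR (extract principal components, then regress the response on the retained component scores) can be sketched as follows on synthetic data; the descriptor grouping used in PCR1 is not reproduced, and the sizes below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, k = 60, 10, 3   # samples, descriptors, retained components
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(scale=0.1, size=n)

# principal component regression: project onto top-k PCs, then OLS
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:k].T                        # component scores (n x k)
G = np.column_stack([np.ones(n), scores])
gamma, *_ = np.linalg.lstsq(G, y, rcond=None)
pred = G @ gamma
r2 = 1 - ((y - pred) ** 2).sum() / ((y - y.mean()) ** 2).sum()
print(f"R2 with {k} of {p} components: {r2:.3f}")
```

When the retained components miss directions that actually drive the response, R² drops, which is the kind of accuracy loss the abstract reports for PCR2 relative to MLR.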
Do physical maturity and birth date predict talent in male youth ice hockey players?
Sherar, Lauren B; Baxter-Jones, Adam D G; Faulkner, Robert A; Russell, Keith W
2007-06-01
The aim of this study was to examine the relationships among biological maturity, physical size, relative age (i.e. birth date), and selection into a male Canadian provincial age-banded ice hockey team. In 2003, 619 male ice hockey players aged 14-15 years attended Saskatchewan provincial team selection camps, 281 of whom participated in the present study. Data from 93 age-matched controls were obtained from the Saskatchewan Pediatric Bone Mineral Accrual Study (1991-1997). During the initial selection camps, birth dates, heights, sitting heights, and body masses were recorded. Age at peak height velocity, an indicator of biological maturity, was determined in the controls and predicted in the ice hockey players. Data were analysed using one-way analysis of variance, logistic regression, and a Kolmogorov-Smirnov test. The ice hockey players selected for the final team were taller, heavier, and more mature (P < 0.05) than both the unselected players and the age-matched controls. Furthermore, age at peak height velocity predicted (P < 0.05) being selected at the first and second selection camps. The birth dates of those players selected for the team were positively skewed, with the majority of those selected being born in the months January to June. In conclusion, team selectors appear to preferentially select early maturing male ice hockey players who have birth dates early in the selection year.
Santos, Frédéric; Guyomarc'h, Pierre; Bruzek, Jaroslav
2014-12-01
Accuracy of identification tools in forensic anthropology relies primarily upon the variations inherent in the data upon which they are built. Sex determination methods based on craniometrics are widely used and known to be specific to several factors (e.g. sample distribution, population, age, secular trends, measurement technique, etc.). The goal of this study is to discuss the potential variations linked to the statistical treatment of the data. Traditional craniometrics of four samples extracted from documented osteological collections (from Portugal, France, the U.S.A., and Thailand) were used to test three different classification methods: linear discriminant analysis (LDA), logistic regression (LR), and support vector machines (SVM). The Portuguese sample was set as a training model on which the other samples were applied in order to assess the validity and reliability of the different models. The tests were performed using different parameters: some included the selection of the best predictors; some included a strict decision threshold (sex assessed only if the related posterior probability was high, including the notion of an indeterminate result); and some used an unbalanced sex ratio. Results indicated that LR tends to perform slightly better than the other techniques and offers a better selection of predictors. Also, the use of a decision threshold (i.e. p>0.95) is essential to ensure an acceptable reliability of sex determination methods based on craniometrics. Although the Portuguese, French, and American samples share a similar sexual dimorphism, application of Western models to the Thai sample (which displayed a lower degree of dimorphism) was unsuccessful. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
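The decision-threshold rule (assign sex only when the posterior probability clears p > 0.95, otherwise report an indeterminate result) can be sketched directly; the posterior values below are hypothetical outputs of a logistic-regression model:

```python
import numpy as np

def classify_with_threshold(p_male, threshold=0.95):
    """Assign sex only when the posterior is decisive, else 'indeterminate'."""
    if p_male >= threshold:
        return "male"
    if p_male <= 1 - threshold:
        return "female"
    return "indeterminate"

# hypothetical posterior probabilities from a craniometric model
posteriors = np.array([0.99, 0.97, 0.80, 0.50, 0.10, 0.02])
labels = [classify_with_threshold(p) for p in posteriors]
print(labels)
# ['male', 'male', 'indeterminate', 'indeterminate', 'indeterminate', 'female']
```

The cost of the threshold is a smaller fraction of individuals classified; the benefit is that the classifications actually made stay reliable across populations.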
Regressive Evolution in the Mexican Cave Tetra, Astyanax mexicanus
Protas, Meredith; Conrad, Melissa; Gross, Joshua B.; Tabin, Clifford; Borowsky, Richard
2007-01-01
Summary: Cave-adapted animals generally have reduced pigmentation and eyes, but the evolutionary forces driving the reductions are unknown. Darwin famously questioned the role of natural selection in eye loss in cave fishes: “As it is difficult to imagine that eyes, although useless, could be in any way injurious to animals living in darkness, I attribute their loss wholly to disuse” [1]. We studied the genetic basis of this phenomenon in the Mexican cave tetra, Astyanax mexicanus, by mapping the quantitative trait loci (QTL) determining differences in eye/lens sizes and melanophore number between cave and surface fish. In addition, we mapped QTL for the putatively constructive traits of jaw size, tooth number, and numbers of taste buds. The data suggest that eyes and pigmentation regressed through different mechanisms. Cave alleles at each eye/lens QTL we detected caused size reductions. This uniform negative polarity is consistent with evolution by natural selection and inconsistent with evolution by drift. In contrast, QTL polarities for melanophore number were mixed, consistent with evolution by genetic drift or indirect selection through pleiotropy. Past arguments against a role for selection in regression of cave fish eyes cited the insignificant cost of their development [2, 3], but we argue that the energetic cost of their maintenance is sufficiently high for eyes to be detrimental in the cave environment. Regression, a ubiquitous aspect of all evolutionary change, can be caused either by selection or by genetic drift/pleiotropy. PMID:17306543
Li, Feiming; Gimpel, John R; Arenson, Ethan; Song, Hao; Bates, Bruce P; Ludwin, Fredric
2014-04-01
Few studies have investigated how well scores from the Comprehensive Osteopathic Medical Licensing Examination-USA (COMLEX-USA) series predict resident outcomes, such as performance on board certification examinations. To determine how well COMLEX-USA predicts performance on the American Osteopathic Board of Emergency Medicine (AOBEM) Part I certification examination. The target study population was first-time examinees who took AOBEM Part I in 2011 and 2012 with matched performances on COMLEX-USA Level 1, Level 2-Cognitive Evaluation (CE), and Level 3. Pearson correlations were computed between AOBEM Part I first-attempt scores and COMLEX-USA performances to measure the association between these examinations. Stepwise linear regression analysis was conducted to predict AOBEM Part I scores by the 3 COMLEX-USA scores. An independent t test was conducted to compare mean COMLEX-USA performances between candidates who passed and who failed AOBEM Part I, and a stepwise logistic regression analysis was used to predict the log-odds of passing AOBEM Part I on the basis of COMLEX-USA scores. Scores from AOBEM Part I had the highest correlation with COMLEX-USA Level 3 scores (.57) and slightly lower correlation with COMLEX-USA Level 2-CE scores (.53). The lowest correlation was between AOBEM Part I and COMLEX-USA Level 1 scores (.47). According to the stepwise regression model, COMLEX-USA Level 1 and Level 2-CE scores, which residency programs often use as selection criteria, together explained 30% of variance in AOBEM Part I scores. Adding Level 3 scores explained 37% of variance. The independent t test indicated that the 397 examinees passing AOBEM Part I performed significantly better than the 54 examinees failing AOBEM Part I in all 3 COMLEX-USA levels (P<.001 for all 3 levels). The logistic regression model showed that COMLEX-USA Level 1 and Level 3 scores predicted the log-odds of passing AOBEM Part I (P=.03 and P<.001, respectively). 
The present study empirically supported the predictive and discriminant validities of the COMLEX-USA series in relation to the AOBEM Part I certification examination. Although residency programs may use COMLEX-USA Level 1 and Level 2-CE scores as partial criteria in selecting residents, Level 3 scores, though typically not available at the time of application, are actually the most statistically related to performances on AOBEM Part I.
Bell, Lana M; Byrne, Sue; Thompson, Alisha; Ratnam, Nirubasini; Blair, Eve; Bulsara, Max; Jones, Timothy W; Davis, Elizabeth A
2007-02-01
Overweight/obesity in children is increasing. Incidence data for medical complications use arbitrary cutoff values for categories of overweight and obesity. Continuous relationships are seldom reported. The objective of this study is to report relationships of child body mass index (BMI) z-score as a continuous variable with the medical complications of overweight. This study is a part of the larger, prospective cohort Growth and Development Study. Children were recruited from the community through randomly selected primary schools. Overweight children seeking treatment were recruited through tertiary centers. Children aged 6-13 yr were community-recruited normal weight (n = 73), community-recruited overweight (n = 53), and overweight treatment-seeking (n = 51). Medical history, family history, and symptoms of complications of overweight were collected by interview, and physical examination was performed. Investigations included oral glucose tolerance tests, fasting lipids, and liver function tests. Adjusted regression was used to model each complication of obesity with age- and sex-specific child BMI z-scores entered as a continuous dependent variable. Adjusted logistic regression showed the proportion of children with musculoskeletal pain, obstructive sleep apnea symptoms, headaches, depression, anxiety, bullying, and acanthosis nigricans increased with child BMI z-score. Adjusted linear regression showed BMI z-score was significantly related to systolic and diastolic blood pressure, insulin during oral glucose tolerance test, total cholesterol, high-density lipoprotein, triglycerides, and alanine aminotransferase. Child's BMI z-score is independently related to complications of overweight and obesity in a linear or curvilinear fashion. Children's risks of most complications increase across the entire range of BMI values and are not defined by thresholds.
Futia, Gregory L; Schlaepfer, Isabel R; Qamar, Lubna; Behbakht, Kian; Gibson, Emily A
2017-07-01
Detection of circulating tumor cells (CTCs) in a blood sample is limited by the sensitivity and specificity of the biomarker panel used to identify CTCs among other blood cells. In this work, we present Bayesian theory that shows how test sensitivity and specificity set the rarity of cell that a test can detect. We calculated the sensitivity and specificity of our image cytometry biomarker panel by testing on pure disease-positive (D+) populations (MCF7 cells) and pure disease-negative (D-) populations (leukocytes). In this system, we performed multi-channel confocal fluorescence microscopy to image biomarkers of DNA, lipids, CD45, and cytokeratin. Using custom software, we segmented our confocal images into regions of interest consisting of individual cells and computed the image metrics of total signal, second spatial moment, spatial frequency second moment, and the product of the spatial-spatial frequency moments. We present our analysis of these 16 features. The best performing of the 16 features produced an average separation of three standard deviations between D+ and D- and an average detectable rarity of ∼1 in 200. We performed multivariable regression and feature selection to combine multiple features for increased performance, yielding an average separation of seven standard deviations between the D+ and D- populations and an average detectable rarity of ∼1 in 480. Histograms and receiver operating characteristic (ROC) curves for these features and regressions are presented. We conclude that simple regression analysis holds promise to further improve the separation of rare cells in cytometry applications. © 2017 International Society for Advancement of Cytometry.
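The Bayesian relationship between panel sensitivity/specificity and detectable rarity can be made concrete: for a chosen minimum precision among called positives, Bayes' rule fixes the rarest prevalence the test can usefully probe. The sensitivity and specificity below are hypothetical values chosen to land near the "1 in 200" regime described above, not the paper's measured numbers:

```python
def precision(sens, spec, prevalence):
    """P(true positive | test positive) via Bayes' rule."""
    tp = sens * prevalence
    fp = (1 - spec) * (1 - prevalence)
    return tp / (tp + fp)

def detectable_rarity(sens, spec, min_precision=0.5):
    """Rarest prevalence at which called positives are still mostly real."""
    # solve precision(prev) = min_precision for prev
    r = min_precision / (1 - min_precision)
    fp_rate = 1 - spec
    return r * fp_rate / (sens + r * fp_rate)

# hypothetical panel: 99% sensitive, 99.5% specific
p = detectable_rarity(0.99, 0.995)
print(f"detectable rarity ~ 1 in {round(1 / p)}")
```

The false-positive rate dominates: improving specificity (e.g. by combining features via regression, as above) is what pushes the detectable rarity from hundreds toward thousands.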
Estimation of Flood Discharges at Selected Recurrence Intervals for Streams in New Hampshire
Olson, Scott A.
2009-01-01
This report provides estimates of flood discharges at selected recurrence intervals for streamgages in and adjacent to New Hampshire and equations for estimating flood discharges at recurrence intervals of 2, 5, 10, 25, 50, 100, and 500 years for ungaged, unregulated, rural streams in New Hampshire. The equations were developed using generalized least-squares regression. Flood-frequency and drainage-basin characteristics from 117 streamgages were used in developing the equations. The drainage-basin characteristics used as explanatory variables in the regression equations include drainage area, mean April precipitation, percentage of wetland area, and main channel slope. The average standard errors of prediction for estimating the 2-, 5-, 10-, 25-, 50-, 100-, and 500-year recurrence interval flood discharges with these equations are 30.0, 30.8, 32.0, 34.2, 36.0, 38.1, and 43.4 percent, respectively. Flood discharges at selected recurrence intervals for selected streamgages were computed following the guidelines in Bulletin 17B of the U.S. Interagency Advisory Committee on Water Data. To determine the flood-discharge exceedance probabilities at streamgages in New Hampshire, a new generalized skew coefficient map covering the State was developed. The standard error of the data on the new map is 0.298. To improve estimates of flood discharges at selected recurrence intervals for 20 streamgages with short-term records (10 to 15 years), record extension using the two-station comparison technique was applied. The two-station comparison method uses data from a streamgage with a long-term record to adjust the frequency characteristics at a streamgage with a short-term record. A technique for adjusting a flood-discharge frequency curve computed from a streamgage record with results from the regression equations is described in this report.
Also, a technique is described for estimating flood discharge at a selected recurrence interval for an ungaged site upstream or downstream from a streamgage using a drainage-area adjustment. The final regression equations and the flood-discharge frequency data used in this study will be available in StreamStats. StreamStats is a World Wide Web application providing automated regression-equation solutions for user-selected sites on streams.
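A common form of the drainage-area adjustment mentioned above transfers a flood quantile between nearby sites on the same stream as a power of the drainage-area ratio. The exponent and the discharge and area values below are hypothetical illustrations, not values from this report:

```python
def adjust_flood_discharge(q_gage, area_gage, area_ungaged, exponent=0.8):
    """Transfer a flood quantile from a streamgage to a nearby ungaged site.

    Uses the common drainage-area-ratio form Q_u = Q_g * (A_u / A_g)**b.
    The exponent b = 0.8 is a hypothetical regional value, not the one
    derived in this report.
    """
    return q_gage * (area_ungaged / area_gage) ** exponent

# hypothetical: 100-year flood of 5000 ft3/s at a gage draining 120 mi2;
# the ungaged site upstream drains 80 mi2
q_u = adjust_flood_discharge(5000, 120, 80)
print(round(q_u), "ft3/s")
```

Because the exponent is below 1, the estimate scales less than proportionally with area, reflecting the usual attenuation of unit-area flood yield on larger basins.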
Ghasemi, Jahan B; Safavi-Sohi, Reihaneh; Barbosa, Euzébio G
2012-02-01
A quasi-4D-QSAR study has been carried out on a series of potent Gram-negative LpxC inhibitors. This approach makes use of molecular dynamics (MD) trajectories and topology information retrieved from the GROMACS package. The new methodology is based on the generation of a conformational ensemble profile (CEP) for each compound instead of only one conformation, followed by the calculation of intermolecular interaction energies at each grid point, considering probes and all aligned conformations resulting from the MD simulations. These interaction energies are the independent variables employed in the QSAR analysis. The proposed methodology was compared to the comparative molecular field analysis (CoMFA) formalism. This methodology jointly explores the main features of CoMFA and 4D-QSAR models. Stepwise multiple linear regression was used for the selection of the most informative variables. After variable selection, multiple linear regression (MLR) and partial least squares (PLS) methods were used for building the regression models. Leave-N-out cross-validation (LNO) and Y-randomization were performed to confirm the robustness of the model, in addition to analysis of an independent test set. The best models provided the following statistics: [Formula in text] (PLS) and [Formula in text] (MLR). A docking study was carried out to investigate the major interactions in the protein-ligand complex with the CDOCKER algorithm. Visualization of the descriptors of the best model helps interpret the model from a chemical point of view, supporting the applicability of this new approach in rational drug design.
Determining which phenotypes underlie a pleiotropic signal
Majumdar, Arunabha; Haldar, Tanushree; Witte, John S.
2016-01-01
Discovering pleiotropic loci is important to understand the biological basis of seemingly distinct phenotypes. Most methods for assessing pleiotropy only test for the overall association between genetic variants and multiple phenotypes. To determine which specific traits are pleiotropic, we evaluate, via simulation and application, three different strategies. The first is model selection techniques based on the inverse regression of genotype on phenotypes. The second is a subset-based meta-analysis, ASSET [Bhattacharjee et al., 2012], which provides an optimal subset of non-null traits. The third is a modified Benjamini-Hochberg (B-H) procedure for controlling the expected false discovery rate [Benjamini and Hochberg, 1995] in the framework of a phenome-wide association study. From our simulations we see that an inverse regression based approach, MultiPhen [O’Reilly et al., 2012], is more powerful than ASSET for detecting overall pleiotropic association, except when all the phenotypes are associated and have genetic effects in the same direction. For determining which specific traits are pleiotropic, the modified B-H procedure performs consistently better than the other two methods. The inverse regression based selection methods perform competitively with the modified B-H procedure only when the phenotypes are weakly correlated. The efficiency of ASSET is observed to lie below and in between the efficiency of the other two methods when the traits are weakly and strongly correlated, respectively. In our application to a large GWAS, we find that the modified B-H procedure also performs well, indicating that this may be an optimal approach for determining the traits underlying a pleiotropic signal. PMID:27238845
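The standard (unmodified) Benjamini-Hochberg step-up procedure that the third strategy builds on can be sketched in a few lines; the paper's modification is not reproduced, and the per-trait p-values below are hypothetical:

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean 'discovery' mask controlling the FDR at level alpha (B-H 1995)."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    thresh = alpha * np.arange(1, m + 1) / m      # step-up thresholds
    below = p[order] <= thresh
    # reject the k smallest p-values, where k is the largest index passing
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    mask = np.zeros(m, dtype=bool)
    mask[order[:k]] = True
    return mask

# hypothetical per-trait p-values for one pleiotropic variant
pvals = [0.001, 0.008, 0.039, 0.041, 0.30, 0.74]
print(benjamini_hochberg(pvals))
```

Applied per variant across traits, the discovered set is the estimate of which specific phenotypes underlie the pleiotropic signal.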
Mocellin, Simone; Thompson, John F; Pasquali, Sandro; Montesco, Maria C; Pilati, Pierluigi; Nitti, Donato; Saw, Robyn P; Scolyer, Richard A; Stretch, Jonathan R; Rossi, Carlo R
2009-12-01
To improve selection for sentinel node (SN) biopsy (SNB) in patients with cutaneous melanoma using statistical models predicting SN status. About 80% of patients currently undergoing SNB are node negative. In the absence of conclusive evidence of an SNB-associated survival benefit, these patients may be over-treated. Here, we tested the efficiency of 4 different models in predicting SN status. The clinicopathologic data (age, gender, tumor thickness, Clark level, regression, ulceration, histologic subtype, and mitotic index) of 1132 melanoma patients who had undergone SNB at institutions in Italy and Australia were analyzed. Logistic regression, classification tree, random forest, and support vector machine models were fitted to the data. The predictive models were built with the aim of maximizing the negative predictive value (NPV) and reducing the rate of SNB procedures while minimizing the error rate. After cross-validation, the logistic regression, classification tree, random forest, and support vector machine predictive models achieved clinically relevant NPVs (93.6%, 94.0%, 97.1%, and 93.0%, respectively), SNB reduction (27.5%, 29.8%, 18.2%, and 30.1%, respectively), and error rates (1.8%, 1.8%, 0.5%, and 2.1%, respectively). Using commonly available clinicopathologic variables, predictive models can preoperatively identify a proportion of patients (approximately 25%) who might be spared SNB, with an acceptable (1%-2%) error. If validated in large prospective series, these models might be implemented in the clinical setting for improved patient selection, which ultimately would lead to better quality of life for patients and optimization of resource allocation for the health care system.
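The three figures of merit can be reproduced from a confusion matrix; the counts below are invented for illustration (chosen to total 1132 patients) and are not the study's actual cross-tabulation:

```python
# Hypothetical confusion-matrix counts for a model predicting SN status.
tn, fn = 290, 6          # model predicts "node negative": true/false negatives
tp, fp = 180, 656        # model predicts "node positive": true/false positives
total = tn + fn + tp + fp

npv = tn / (tn + fn)               # P(truly node negative | predicted negative)
snb_reduction = (tn + fn) / total  # fraction of biopsies avoided if negatives are spared
error_rate = fn / total            # node-positive patients the model would wrongly spare
```

With these counts the NPV is about 98%, roughly a quarter of biopsies are avoided, and the error rate stays well under 1%, mirroring the trade-off described in the abstract.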
Total Phosphorus Loads for Selected Tributaries to Sebago Lake, Maine
Hodgkins, Glenn A.
2001-01-01
The streamflow and water-quality data-collection networks of the Portland Water District (PWD) and the U.S. Geological Survey (USGS) as of February 2000 were analyzed in terms of their applicability for estimating total phosphorus loads for selected tributaries to Sebago Lake in southern Maine. The long-term unit-area mean annual flows for the Songo River and for small, ungaged tributaries are similar to the long-term unit-area mean annual flows for the Crooked River and other gaged tributaries to Sebago Lake, based on a regression equation that estimates mean annual streamflows in Maine. Unit-area peak streamflows of Sebago Lake tributaries can be quite different, based on a regression equation that estimates peak streamflows for Maine. Crooked River had a statistically significant positive relation (Kendall's Tau test, p=0.0004) between streamflow and total phosphorus concentration. Panther Run had a statistically significant negative relation (p=0.0015). Significant positive relations may indicate contributions from nonpoint sources or sediment resuspension, whereas significant negative relations may indicate dilution of point sources. Total phosphorus concentrations were significantly larger in the Crooked River than in the Songo River (Wilcoxon rank-sum test, p<0.0001). Evidence was insufficient, however, to indicate that phosphorus concentrations from medium-sized drainage basins, at a significance level of 0.05, were different from each other or that concentrations in small-sized drainage basins were different from each other (Kruskal-Wallis test, p = 0.0980 and 0.1265, respectively). All large- and medium-sized drainage basins were sampled for total phosphorus approximately monthly. Although not all small drainage basins were sampled, they may be well represented by the small drainage basins that were sampled.
If the tributaries gaged by PWD had adequate streamflow data, the current PWD tributary monitoring program would probably produce total phosphorus loading data that would represent all gaged and ungaged tributaries to Sebago Lake. Outside the PWD tributary-monitoring program, the largest ungaged tributary to Sebago Lake contains 1.5 percent of the area draining to the lake. In the absence of unique point or nonpoint sources of phosphorus, ungaged tributaries are unlikely to have total phosphorus concentrations that differ significantly from those in the small tributaries that have concentration data. The regression method, also known as the rating-curve method, was used to estimate the annual total phosphorus load for Crooked River, Northwest River, and Rich Mill Pond Outlet for water years 1996-98. The MOVE.1 method was used to estimate daily streamflows for the regression method at Northwest River and Rich Mill Pond Outlet, where streamflows were not continuously monitored. An averaging method also was used to compute annual loads at the three sites. The difference between the regression estimate and the averaging estimate for each of the three tributaries was consistent with what was expected from previous studies.
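The regression (rating-curve) method described above can be sketched in a few lines: regress log load on log streamflow using the sampled days, then apply the fitted curve to every day's flow and sum. The flows, coefficients, and monthly sampling scheme below are synthetic, and the bias-correction factor applied in practice is omitted:

```python
import numpy as np

# Synthetic daily streamflow for one water year and a true log-log rating curve.
rng = np.random.default_rng(1)
q_daily = rng.lognormal(mean=3.0, sigma=0.5, size=365)
true_b0, true_b1 = -2.0, 1.3

# Roughly monthly sampling: 24 sampled days with noisy measured loads.
sample_idx = rng.choice(365, size=24, replace=False)
q_s = q_daily[sample_idx]
load_s = np.exp(true_b0 + true_b1 * np.log(q_s) + 0.05 * rng.normal(size=24))

# Fit log(load) = b0 + b1*log(flow) on the sampled days.
A = np.column_stack([np.ones(24), np.log(q_s)])
beta, *_ = np.linalg.lstsq(A, np.log(load_s), rcond=None)

# Apply the rating curve to all 365 days and sum to an annual load.
annual_load = np.exp(beta[0] + beta[1] * np.log(q_daily)).sum()
```

The fitted slope recovers the true exponent closely, and the estimated annual load lands near the (known, synthetic) truth.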
Medical school dropout--testing at admission versus selection by highest grades as predictors.
O'Neill, Lotte; Hartvigsen, Jan; Wallstedt, Birgitta; Korsholm, Lars; Eika, Berit
2011-11-01
Very few studies have reported on the effect of admission tests on medical school dropout. The main aim of this study was to evaluate the predictive validity of non-grade-based admission testing versus grade-based admission relative to subsequent dropout. This prospective cohort study followed six cohorts of medical students admitted to the medical school at the University of Southern Denmark during 2002-2007 (n=1544). Half of the students were admitted based on their prior achievement of highest grades (Strategy 1) and the other half took a composite non-grade-based admission test (Strategy 2). Educational as well as social predictor variables (doctor-parent, origin, parenthood, parents living together, parent on benefit, university-educated parents) were also examined. The outcome of interest was students' dropout status at 2 years after admission. Multivariate logistic regression analysis was used to model dropout. Strategy 2 (admission test) students had a lower relative risk for dropping out of medical school within 2 years of admission (odds ratio 0.56, 95% confidence interval 0.39-0.80). Only the admission strategy, the type of qualifying examination and the priority given to the programme on the national application forms contributed significantly to the dropout model. Social variables did not predict dropout and neither did Strategy 2 admission test scores. Selection by admission testing appeared to have an independent, protective effect on dropout in this setting. © Blackwell Publishing Ltd 2011.
Sampson, Maureen L; Gounden, Verena; van Deventer, Hendrik E; Remaley, Alan T
2016-02-01
The main drawback of the periodic analysis of quality control (QC) material is that test performance is not monitored in time periods between QC analyses, potentially leading to the reporting of faulty test results. The objective of this study was to develop a patient-based QC procedure for the more timely detection of test errors. Results from a Chem-14 panel measured on the Beckman LX20 analyzer were used to develop the model. Each test result was predicted from the other 13 members of the panel by multiple regression, which resulted in correlation coefficients between the predicted and measured result of >0.7 for 8 of the 14 tests. A logistic regression model, which utilized the measured test result, the predicted test result, the day of the week and time of day, was then developed for predicting test errors. The output of the logistic regression was tallied by a daily CUSUM approach and used to predict test errors, with a fixed specificity of 90%. The mean average run length (ARL) before error detection by CUSUM-Logistic Regression (CSLR) was 20 with a mean sensitivity of 97%, which was considerably shorter than the mean ARL of 53 (sensitivity 87.5%) for a simple prediction model that only used the measured result for error detection. A CUSUM-Logistic Regression analysis of patient laboratory data can be an effective approach for the rapid and sensitive detection of clinical laboratory errors. Published by Elsevier Inc.
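The core idea, predict each result from its panel-mates and accumulate evidence of a shift, can be sketched as follows. For brevity this simplified version feeds standardized regression residuals directly into a one-sided CUSUM rather than a logistic-regression error score, and all data, coefficients, and thresholds are synthetic:

```python
import numpy as np

# Simulate 200 patient panels: one analyte (y) predictable from three others (x).
rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=(n, 3))
y = x @ np.array([0.6, -0.4, 0.3]) + 0.2 * rng.normal(size=n)
y[150:] += 1.0                       # simulated assay shift starting at sample 150

# Multiple regression fitted on the in-control portion only.
A = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(A[:150], y[:150], rcond=None)
resid = y - A @ beta
z = resid / resid[:150].std()        # standardized residuals

# One-sided CUSUM on the residuals: reference value k, decision limit h.
k, h = 0.5, 8.0
s, alarm = 0.0, None
for i, zi in enumerate(z):
    s = max(0.0, s + zi - k)
    if s > h and alarm is None:
        alarm = i
```

With a fixed seed, the CUSUM stays quiet over the in-control samples and raises an alarm within a few samples of the simulated shift.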
Logistic regression trees for initial selection of interesting loci in case-control studies
Nickolov, Radoslav Z; Milanov, Valentin B
2007-01-01
Modern genetic epidemiology faces the challenge of dealing with hundreds of thousands of genetic markers. The selection of a small initial subset of interesting markers for further investigation can greatly facilitate genetic studies. In this contribution we suggest the use of a logistic regression tree algorithm known as logistic tree with unbiased selection. Using the simulated data provided for Genetic Analysis Workshop 15, we show how this algorithm, with incorporation of multifactor dimensionality reduction method, can reduce an initial large pool of markers to a small set that includes the interesting markers with high probability. PMID:18466557
The ecological drivers of nuptial color evolution in darters (Percidae: Etheostomatinae).
Ciccotto, Patrick J; Mendelson, Tamra C
2016-04-01
Closely related animal lineages often vary in male coloration, and ecological selection is hypothesized to shape this variation. The role of ecological selection in inhibiting male color has been documented extensively at the population level, but relatively few studies have investigated the evolution of male coloration across a clade of closely related species. Darters are a diverse group of fishes that vary in the presence of elaborate male nuptial coloration, with some species exhibiting vivid color patterns and others mostly or entirely achromatic. We used phylogenetic logistic regression to test for correlations between the presence/absence of color traits across darter species and the ecological conditions in which these species occur. Environmental variables were correlated with the presence of nuptial color in darters with colorful species tending to inhabit environments that would support fewer predators and potentially transmit a broader spectrum of natural light compared to species lacking male coloration. We also tested the color preferences of a common darter predator, largemouth bass, and found that it exhibits a strong preference for red, providing further evidence of predation as a source of selection on color evolution in darters. Ecological selection therefore appears to be an important factor in dictating the presence or absence of male coloration in this group of fishes. © 2016 The Author(s). Evolution © 2016 The Society for the Study of Evolution.
Feature Grouping and Selection Over an Undirected Graph.
Yang, Sen; Yuan, Lei; Lai, Ying-Cheng; Shen, Xiaotong; Wonka, Peter; Ye, Jieping
2012-01-01
High-dimensional regression/classification continues to be an important and challenging problem, especially when features are highly correlated. Feature selection, combined with additional structure information on the features has been considered to be promising in promoting regression/classification performance. Graph-guided fused lasso (GFlasso) has recently been proposed to facilitate feature selection and graph structure exploitation, when features exhibit certain graph structures. However, the formulation in GFlasso relies on pairwise sample correlations to perform feature grouping, which could introduce additional estimation bias. In this paper, we propose three new feature grouping and selection methods to resolve this issue. The first method employs a convex function to penalize the pairwise ℓ∞ norm of connected regression/classification coefficients, achieving simultaneous feature grouping and selection. The second method improves the first one by utilizing a non-convex function to reduce the estimation bias. The third one is the extension of the second method using a truncated ℓ1 regularization to further reduce the estimation bias. The proposed methods combine feature grouping and feature selection to enhance estimation accuracy. We employ the alternating direction method of multipliers (ADMM) and difference of convex functions (DC) programming to solve the proposed formulations. Our experimental results on synthetic data and two real datasets demonstrate the effectiveness of the proposed methods.
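As a toy illustration of the convex formulation only (not the ADMM/DC solvers used in the paper), the pairwise ℓ∞ grouping penalty over a feature graph can be written directly; the edge set and coefficient vector here are made up:

```python
import numpy as np

# A small feature graph: edges connect features expected to act together.
edges = [(0, 1), (1, 2), (3, 4)]
beta = np.array([1.0, 1.0, 0.9, 0.0, 0.0])

def grouping_penalty(beta, edges, lam=1.0):
    """Sum over graph edges of the pairwise l-infinity norm max(|b_i|, |b_j|).
    Connected coefficients are pushed toward equal magnitude, and a whole
    connected group can be zeroed out together."""
    return lam * sum(max(abs(beta[i]), abs(beta[j])) for i, j in edges)

pen = grouping_penalty(beta, edges)
```

Note that the zeroed pair (3, 4) contributes nothing, while each edge among the active features contributes its larger coefficient magnitude.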
The need to control for regression to the mean in social psychology studies.
Yu, Rongjun; Chen, Li
2014-01-01
It is common in repeated measurements for extreme values at the first measurement to approach the mean at the subsequent measurement, a phenomenon called regression to the mean (RTM). If RTM is not fully controlled, it will lead to erroneous conclusions. The wide use of repeated measurements in social psychology creates a risk that an RTM effect will influence results. However, insufficient attention is paid to RTM in most social psychological research. Notable cases include studies on the phenomena of social conformity and unrealistic optimism (Klucharev et al., 2009, 2011; Sharot et al., 2011, 2012b; Campbell-Meiklejohn et al., 2012; Kim et al., 2012; Garrett and Sharot, 2014). In Study 1, 13 university students rated and re-rated the facial attractiveness of a series of female faces as a test of the social conformity effect (Klucharev et al., 2009). In Study 2, 15 university students estimated and re-estimated their risk of experiencing a series of adverse life events as a test of the unrealistic optimism effect (Sharot et al., 2011). Although these studies used methodologies similar to those used in earlier research, the social conformity and unrealistic optimism effects were no longer evident after controlling for RTM. Based on these findings we suggest several ways to control for the RTM effect in social psychology studies, such as adding the initial rating as a covariate in regression analysis, selecting a subset of stimuli for which the participants' initial ratings were matched across experimental conditions, and using a control group.
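The first suggested control, adding the initial rating as a covariate, can be demonstrated with simulated test-retest data in which there is no true effect at all: selecting extreme first ratings produces a spurious "change," which the regression on the initial rating absorbs. All numbers are synthetic:

```python
import numpy as np

# Two noisy ratings of the same underlying score; no manipulation between them.
rng = np.random.default_rng(3)
n = 2000
true_score = rng.normal(5.0, 1.0, size=n)
rating1 = true_score + rng.normal(0, 1.0, size=n)
rating2 = true_score + rng.normal(0, 1.0, size=n)

# Select participants/stimuli with extreme initial ratings (the risky design).
extreme = rating1 > 6.5
change = rating2[extreme] - rating1[extreme]
naive_change = change.mean()          # spuriously negative: pure RTM

# Control: regress the change on the initial rating (ANCOVA-style).
A = np.column_stack([np.ones(extreme.sum()), rating1[extreme]])
b, *_ = np.linalg.lstsq(A, change, rcond=None)
slope = b[1]
```

The naive mean change is strongly negative despite there being no effect, and the fitted slope near -0.5 (here, 1 - reliability of a rating with equal true-score and error variance) quantifies the RTM artifact that the covariate soaks up.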
Sun, Yu; Reynolds, Hayley M; Wraith, Darren; Williams, Scott; Finnegan, Mary E; Mitchell, Catherine; Murphy, Declan; Haworth, Annette
2018-04-26
There are currently no methods to estimate cell density in the prostate. This study aimed to develop predictive models to estimate prostate cell density from multiparametric magnetic resonance imaging (mpMRI) data at a voxel level using machine learning techniques. In vivo mpMRI data were collected from 30 patients before radical prostatectomy. Sequences included T2-weighted imaging, diffusion-weighted imaging and dynamic contrast-enhanced imaging. Ground truth cell density maps were computed from histology and co-registered with mpMRI. Feature extraction and selection were performed on mpMRI data. Final models were fitted using three regression algorithms including multivariate adaptive regression spline (MARS), polynomial regression (PR) and generalised additive model (GAM). Model parameters were optimised using leave-one-out cross-validation on the training data and model performance was evaluated on test data using root mean square error (RMSE) measurements. Predictive models to estimate voxel-wise prostate cell density were successfully trained and tested using the three algorithms. The best model (GAM) achieved an RMSE of 1.06 (± 0.06) × 10³ cells/mm² and a relative deviation of 13.3 ± 0.8%. Prostate cell density can be quantitatively estimated non-invasively from mpMRI data using high-quality co-registered data at a voxel level. These cell density predictions could be used for tissue classification, treatment response evaluation and personalised radiotherapy.
QSAR modeling of flotation collectors using principal components extracted from topological indices.
Natarajan, R; Nirdosh, Inderjit; Basak, Subhash C; Mills, Denise R
2002-01-01
Several topological indices were calculated for substituted-cupferrons that were tested as collectors for the froth flotation of uranium. The principal component analysis (PCA) was used for data reduction. Seven principal components (PC) were found to account for 98.6% of the variance among the computed indices. The principal components thus extracted were used in stepwise regression analyses to construct regression models for the prediction of separation efficiencies (Es) of the collectors. A two-parameter model with a correlation coefficient of 0.889 and a three-parameter model with a correlation coefficient of 0.913 were formed. PCs were found to be better than partition coefficient to form regression equations, and inclusion of an electronic parameter such as Hammett sigma or quantum mechanically derived electronic charges on the chelating atoms did not improve the correlation coefficient significantly. The method was extended to model the separation efficiencies of mercaptobenzothiazoles (MBT) and aminothiophenols (ATP) used in the flotation of lead and zinc ores, respectively. Five principal components were found to explain 99% of the data variability in each series. A three-parameter equation with correlation coefficient of 0.985 and a two-parameter equation with correlation coefficient of 0.926 were obtained for MBT and ATP, respectively. The amenability of separation efficiencies of chelating collectors to QSAR modeling using PCs based on topological indices might lead to the selection of collectors for synthesis and testing from a virtual database.
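The PCA-then-regression workflow can be sketched as follows: standardize the descriptor matrix, extract principal components by SVD, and regress the response on the leading PCs. The descriptor matrix below is synthetic, standing in for computed topological indices rather than reproducing them:

```python
import numpy as np

# Synthetic descriptors: 30 collectors, 10 correlated indices driven by 2 factors.
rng = np.random.default_rng(4)
n, p = 30, 10
latent = rng.normal(size=(n, 2))
W = rng.normal(size=(2, p))
X = latent @ W + 0.05 * rng.normal(size=(n, p))
y = 1.5 * latent[:, 0] - latent[:, 1] + 0.1 * rng.normal(size=n)  # "separation efficiency"

# Standardize, then PCA via SVD.
Xs = (X - X.mean(0)) / X.std(0)
U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
explained = s**2 / (s**2).sum()       # variance explained per component
scores = Xs @ Vt.T                    # PC scores

# Regress the response on the leading two PCs.
k = 2
A = np.column_stack([np.ones(n), scores[:, :k]])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
r2 = 1 - ((y - A @ beta)**2).sum() / ((y - y.mean())**2).sum()
```

Because two latent factors generate the correlated indices, a couple of PCs capture nearly all descriptor variance, and a two-PC regression fits the response well, mirroring the data-reduction logic of the abstract.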
NASA Astrophysics Data System (ADS)
de Oliveira, Isadora R. N.; Roque, Jussara V.; Maia, Mariza P.; Stringheta, Paulo C.; Teófilo, Reinaldo F.
2018-04-01
A new method was developed to determine the antioxidant properties of red cabbage extract (Brassica oleracea) by mid (MID) and near (NIR) infrared spectroscopies and partial least squares (PLS) regression. A 70% (v/v) ethanolic extract of red cabbage was concentrated to 9° Brix and further diluted (12 to 100%) in water. The dilutions were used as external standards for building the PLS models; for the first time, this strategy was applied to building multivariate regression models. Reference analyses and spectral data were obtained from the diluted extracts. The properties determined were total and monomeric anthocyanins, total polyphenols and antioxidant capacity by the ABTS (2,2-azino-bis(3-ethyl-benzothiazoline-6-sulfonate)) and DPPH (2,2-diphenyl-1-picrylhydrazyl) methods. Ordered predictors selection (OPS) and genetic algorithm (GA) were used for feature selection before PLS regression (PLS-1). In addition, a PLS-2 regression was applied to all properties simultaneously. PLS-1 models were more predictive than PLS-2 regression. PLS-OPS and PLS-GA models presented excellent prediction results with a correlation coefficient higher than 0.98. However, the best models were obtained using PLS and variable selection with the OPS algorithm, and the models based on NIR spectra were considered more predictive for all properties. These models thus provide a simple, rapid and accurate method for determining the antioxidant properties of red cabbage extract and establishing its suitability for use in the food industry.
L2-Boosting algorithm applied to high-dimensional problems in genomic selection.
González-Recio, Oscar; Weigel, Kent A; Gianola, Daniel; Naya, Hugo; Rosa, Guilherme J M
2010-06-01
The L2-Boosting algorithm is one of the most promising machine-learning techniques that has appeared in recent decades. It may be applied to high-dimensional problems such as whole-genome studies, and it is relatively simple from a computational point of view. In this study, we used this algorithm in a genomic selection context to make predictions of yet-to-be-observed outcomes. Two data sets were used: (1) productive lifetime predicted transmitting abilities from 4702 Holstein sires genotyped for 32 611 single nucleotide polymorphisms (SNPs) derived from the Illumina BovineSNP50 BeadChip, and (2) progeny averages of food conversion rate, pre-corrected by environmental and mate effects, in 394 broilers genotyped for 3481 SNPs. Each of these data sets was split into training and testing sets, the latter comprising dairy or broiler sires whose ancestors were in the training set. Two weak learners, ordinary least squares (OLS) and non-parametric (NP) regression, were used for the L2-Boosting algorithm, to provide a stringent evaluation of the procedure. This algorithm was compared with BL [Bayesian LASSO (least absolute shrinkage and selection operator)] and BayesA regression. Learning tasks were carried out in the training set, whereas validation of the models was performed in the testing set. Pearson correlations between predicted and observed responses in the dairy cattle (broiler) data set were 0.65 (0.33), 0.53 (0.37), 0.66 (0.26) and 0.63 (0.27) for OLS-Boosting, NP-Boosting, BL and BayesA, respectively. The smallest bias and mean-squared errors (MSEs) were obtained with OLS-Boosting in both the dairy cattle (0.08 and 1.08) and broiler (-0.011 and 0.006) data sets. In the dairy cattle data set, the BL was more accurate (bias=0.10 and MSE=1.10) than BayesA (bias=1.26 and MSE=2.81), whereas no differences between these two methods were found in the broiler data set.
L2-Boosting with a suitable learner was found to be a competitive alternative for genomic selection applications, providing high accuracy and low bias in genomic-assisted evaluations with a relatively short computational time.
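A minimal sketch of componentwise L2-Boosting with an OLS weak learner follows: at each iteration, the single predictor that best fits the current residual is chosen and a shrunken step is taken along it. The SNP-like data are synthetic, not the dairy or broiler sets:

```python
import numpy as np

# Synthetic genotype-like data: 200 animals, 50 markers, 2 with real effects.
rng = np.random.default_rng(5)
n, p = 200, 50
X = rng.normal(size=(n, p))
y = 1.0 * X[:, 3] - 0.8 * X[:, 17] + 0.3 * rng.normal(size=n)

nu, n_iter = 0.1, 300                 # learning rate and number of boosting steps
beta = np.zeros(p)
offset = y.mean()
resid = y - offset
for _ in range(n_iter):
    # Per-component least-squares coefficient against the current residual.
    coefs = X.T @ resid / (X**2).sum(axis=0)
    sse = ((resid[:, None] - X * coefs)**2).sum(axis=0)
    j = np.argmin(sse)                # best-fitting single component
    beta[j] += nu * coefs[j]
    resid -= nu * coefs[j] * X[:, j]

pred = offset + X @ beta
corr = np.corrcoef(pred, y)[0, 1]
```

The two simulated marker effects are recovered closely and the fitted values correlate strongly with the response, illustrating why a simple weak learner suffices in this setting.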
DOE Office of Scientific and Technical Information (OSTI.GOV)
Reyhan, M; Yue, N
Purpose: To validate an automated image processing algorithm designed to detect the center of radiochromic film used for in vivo film dosimetry against the current gold standard of manual selection. Methods: An image processing algorithm was developed to automatically select the region of interest (ROI) in *.tiff images that contain multiple pieces of radiochromic film (0.5 × 1.3 cm²). After a user has linked a calibration file to the processing algorithm and selected a *.tiff file for processing, an ROI is automatically detected for all films by a combination of thresholding and erosion, which removes edges and any additional markings for orientation. Calibration is applied to the mean pixel values from the ROIs and a *.tiff image is output displaying the original image with an overlay of the ROIs and the measured doses. Validation of the algorithm was determined by comparing in vivo dose determined using the current gold standard (manually drawn ROIs) versus automated ROIs for n=420 scanned films. Bland-Altman analysis, paired t-test, and linear regression were performed to demonstrate agreement between the processes. Results: The measured doses ranged from 0.2 to 886.6 cGy. Bland-Altman analysis of the two techniques (automatic minus manual) revealed a bias of -0.28 cGy and 95% limits of agreement of -6.1 to 5.5 cGy. These values demonstrate excellent agreement between the two techniques. Paired t-test results showed no statistical differences between the two techniques, p=0.98. Linear regression with a forced zero intercept demonstrated that Automatic = 0.997 × Manual, with a Pearson correlation coefficient of 0.999. The minimal differences between the two techniques may be explained by the fact that the hand-drawn ROIs were not identical to the automatically selected ones. The average processing time was 6.7 seconds in Matlab on an Intel Core 2 Duo processor.
Conclusion: An automated image processing algorithm has been developed and validated, which will help minimize user interaction and processing time of radiochromic film used for in vivo dosimetry.
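The thresholding-and-erosion step can be sketched with plain NumPy on a toy "scan"; the image values, threshold, and film geometry below are invented for illustration, and a 1-pixel 4-neighbour erosion stands in for the real structuring element:

```python
import numpy as np

# Toy scanner image: bright background with two darker film pieces.
img = np.full((20, 40), 250.0)
img[4:12, 5:15] = 100.0               # film piece 1
img[4:12, 22:32] = 60.0               # film piece 2 (darker = more dose)

mask = img < 200                      # threshold: film is darker than background

def erode(m):
    """1-pixel binary erosion with a 4-neighbour structuring element,
    shaving one pixel off every mask boundary (film edges, markings)."""
    out = m.copy()
    out[1:, :] &= m[:-1, :]
    out[:-1, :] &= m[1:, :]
    out[:, 1:] &= m[:, :-1]
    out[:, :-1] &= m[:, 1:]
    out[0, :] = out[-1, :] = False
    out[:, 0] = out[:, -1] = False
    return out

roi = erode(mask)
mean_val = img[roi].mean()            # mean pixel value inside the eroded ROIs
```

Each 8×10-pixel film erodes to a 6×8 interior, so the combined ROI holds 96 pixels and its mean pixel value is exactly midway between the two film values; in the real pipeline a calibration curve would then map such means to dose.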
Development of automatic body condition scoring using a low-cost 3-dimensional Kinect camera.
Spoliansky, Roii; Edan, Yael; Parmet, Yisrael; Halachmi, Ilan
2016-09-01
Body condition scoring (BCS) is a farm-management tool for estimating dairy cows' energy reserves. Today, BCS is performed manually by experts. This paper presents a 3-dimensional algorithm that provides a topographical understanding of the cow's body to estimate BCS. An automatic BCS system consisting of a Kinect camera (Microsoft Corp., Redmond, WA) triggered by a passive infrared motion detector was designed and implemented. Image processing and regression algorithms were developed and included the following steps: (1) image restoration, the removal of noise; (2) object recognition and separation, identification and separation of the cows; (3) movie and image selection, selection of movies and frames that include the relevant data; (4) image rotation, alignment of the cow parallel to the x-axis; and (5) image cropping and normalization, removal of irrelevant data, setting the image size to 150×200 pixels, and normalizing image values. All steps were performed automatically, including image selection and classification. Fourteen individual features per cow, derived from the cows' topography, were automatically extracted from the movies and from the farm's herd-management records. These features appear to be measurable in a commercial farm. Manual BCS was performed by a trained expert and compared with the output of the training set. A regression model was developed, correlating the features with the manual BCS references. Data were acquired for 4 d, resulting in a database of 422 movies of 101 cows. Movies containing cows' back ends were automatically selected (389 movies). The data were divided into a training set of 81 cows and a test set of 20 cows; both sets included the identical full range of BCS classes. Accuracy tests gave a mean absolute error of 0.26, median absolute error of 0.19, and coefficient of determination of 0.75, with 100% correct classification within 1 step and 91% correct classification within a half step for BCS classes. 
Results indicated good repeatability, with all standard deviations under 0.33. The algorithm is independent of the background and requires 10 cows for training with approximately 30 movies of 4 s each. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Genetic analysis of partial egg production records in Japanese quail using random regression models.
Abou Khadiga, G; Mahmoud, B Y F; Farahat, G S; Emam, A M; El-Full, E A
2017-08-01
The main objectives of this study were to detect the most appropriate random regression model (RRM) to fit the data of monthly egg production in 2 lines (selected and control) of Japanese quail and to test the consistency of different criteria of model choice. Data from 1,200 female Japanese quails for the first 5 months of egg production from 4 consecutive generations of an egg line selected for egg production in the first month (EP1) was analyzed. Eight RRMs with different orders of Legendre polynomials were compared to determine the proper model for analysis. All criteria of model choice suggested that the adequate model included the second-order Legendre polynomials for fixed effects, and the third-order for additive genetic effects and permanent environmental effects. Predictive ability of the best model was the highest among all models (ρ = 0.987). According to the best model fitted to the data, estimates of heritability were relatively low to moderate (0.10 to 0.17) and showed a descending pattern from the first to the fifth month of production. A similar pattern was observed for permanent environmental effects, with greater estimates in the first (0.36) and second (0.23) months of production than the heritability estimates. Genetic correlations between separate production periods were higher (0.18 to 0.93) than their phenotypic counterparts (0.15 to 0.87). The superiority of the selected line over the control was observed through significant (P < 0.05) linear contrast estimates. Significant (P < 0.05) estimates of covariate effect (age at sexual maturity) showed a decreased pattern with greater impact on egg production in earlier ages (first and second months) than later ones. A methodology based on random regression animal models can be recommended for genetic evaluation of egg production in Japanese quail. © 2017 Poultry Science Association Inc.
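The Legendre-polynomial covariates underlying such a random regression model can be built as follows: months of lay are mapped to [-1, 1] and expanded in Legendre polynomials (second order here, matching the fixed-effect order favoured by the model-choice criteria). The egg counts are invented, and only the fixed-effect least-squares fit is shown; variance components would require a mixed-model solver:

```python
import numpy as np
from numpy.polynomial import legendre

# Production months 1..5 mapped to the standard Legendre interval [-1, 1].
months = np.arange(1, 6)
t = 2 * (months - months.min()) / (months.max() - months.min()) - 1

# Design matrix of Legendre polynomials P0..P2 evaluated at each month.
order = 2
Phi = np.column_stack([legendre.legval(t, np.eye(order + 1)[k])
                       for k in range(order + 1)])

# Fit a quadratic lactation-like curve to made-up monthly egg counts.
y = np.array([20.0, 26.0, 27.0, 25.0, 21.0])
beta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
fitted = Phi @ beta
```

The same `Phi` columns would serve as covariates for the random (additive genetic and permanent environmental) regressions, just carried to third order as the abstract indicates.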
Correction of Selection Bias in Survey Data: Is the Statistical Cure Worse Than the Bias?
Hanley, James A
2017-04-01
In previous articles in the American Journal of Epidemiology (Am J Epidemiol. 2013;177(5):431-442) and American Journal of Public Health (Am J Public Health. 2013;103(10):1895-1901), Masters et al. reported age-specific hazard ratios for the contrasts in mortality rates between obesity categories. They corrected the observed hazard ratios for selection bias caused by what they postulated was the nonrepresentativeness of the participants in the National Health Interview Study that increased with age, obesity, and ill health. However, it is possible that their regression approach to remove the alleged bias has not produced, and in general cannot produce, sensible hazard ratio estimates. First, we must consider how many nonparticipants there might have been in each category of obesity and of age at entry and how much higher the mortality rates would have to be in nonparticipants than in participants in these same categories. What plausible set of numerical values would convert the ("biased") decreasing-with-age hazard ratios seen in the data into the ("unbiased") increasing-with-age ratios that they computed? Can these values be encapsulated in (and can sensible values be recovered from) one additional internal variable in a regression model? Second, one must examine the age pattern of the hazard ratios that have been adjusted for selection. Without the correction, the hazard ratios are attenuated with increasing age. With it, the hazard ratios at older ages are considerably higher, but those at younger ages are well below one. Third, one must test whether the regression approach suggested by Masters et al. would correct the nonrepresentativeness that increased with age and ill health that I introduced into real and hypothetical data sets. I found that the approach did not recover the hazard ratio patterns present in the unselected data sets: the corrections overshot the target at older ages and undershot it at lower ages.
Zhang, Xuan; Li, Wei; Yin, Bin; Chen, Weizhong; Kelly, Declan P; Wang, Xiaoxin; Zheng, Kaiyi; Du, Yiping
2013-10-01
Coffee is the most heavily consumed beverage in the world after water, and quality is a key consideration in its commercial trade. Caffeine content, which has a significant effect on the final quality of coffee products, therefore needs to be determined rapidly and reliably by new analytical techniques. The main purpose of this work was to establish a powerful and practical analytical method based on near infrared spectroscopy (NIRS) and chemometrics for the quantitative determination of caffeine content in roasted Arabica coffees. Ground coffee samples spanning a wide range of roast levels were analyzed by NIR; in parallel, their caffeine contents were quantitatively determined by the commonly used HPLC-UV method to serve as reference values. Calibration models were then developed based on chemometric analyses of the NIR spectral data and the reference concentrations of the coffee samples. Partial least squares (PLS) regression was used to construct the models. Furthermore, diverse spectral pretreatment and variable selection techniques were applied in order to obtain robust and reliable reduced-spectrum regression models. Comparing the respective quality of the different models constructed, the application of second derivative pretreatment and stability competitive adaptive reweighted sampling (SCARS) variable selection provided a notably improved regression model, with a root mean square error of cross validation (RMSECV) of 0.375 mg/g and correlation coefficient (R) of 0.918 at a PLS factor of 7. An independent test set was used to assess the model, yielding a root mean square error of prediction (RMSEP) of 0.378 mg/g, a mean relative error of 1.976% and a mean relative standard deviation (RSD) of 1.707%. 
Thus, the results provided by the high-quality calibration model demonstrated the feasibility of NIR spectroscopy for at-line prediction of the caffeine content of unknown roasted coffee samples, thanks to an analysis time of a few seconds and the non-destructive nature of NIRS. Copyright © 2013 Elsevier B.V. All rights reserved.
Genetic Programming Transforms in Linear Regression Situations
NASA Astrophysics Data System (ADS)
Castillo, Flor; Kordon, Arthur; Villa, Carlos
The chapter summarizes the use of Genetic Programming (GP) in Multiple Linear Regression (MLR) to address multicollinearity and Lack of Fit (LOF). The basis of the proposed method is applying appropriate input transforms (model respecification) that deal with these issues while preserving the information content of the original variables. The transforms are selected from symbolic regression models with an optimal trade-off between prediction accuracy and expressional complexity, generated by multiobjective Pareto-front GP. The chapter includes a comparative study of the GP-generated transforms with Ridge Regression, a variant of ordinary Multiple Linear Regression, which has been a useful and commonly employed approach for reducing multicollinearity. The advantages of GP-generated model respecification are clearly defined and demonstrated. Some recommendations for transform selection are given as well. The application benefits of the proposed approach are illustrated with a real industrial application in one of the broadest empirical modeling areas in manufacturing - robust inferential sensors. The chapter contributes to increasing awareness of the potential of GP in statistical model building by MLR.
Walker, J.F.
1993-01-01
Selected statistical techniques were applied to three urban watersheds in Texas and Minnesota and three rural watersheds in Illinois. For the urban watersheds, single- and paired-site data-collection strategies were considered. The paired-site strategy was much more effective than the single-site strategy for detecting changes. Analysis of storm-load regression residuals demonstrated the potential utility of regressions for variability reduction. For the rural watersheds, none of the selected techniques were effective at identifying changes, primarily because of a small degree of management-practice implementation, potential errors introduced through the estimation of storm loads, and small sample sizes. A Monte Carlo sensitivity analysis was used to determine the percent change in water chemistry that could be detected for each watershed. In most instances, the use of regressions improved the ability to detect changes.
Zheng, Qi; Peng, Limin
2016-01-01
Quantile regression provides a flexible platform for evaluating covariate effects on different segments of the conditional distribution of the response. As the effects of covariates may change with quantile level, contemporaneously examining a spectrum of quantiles is expected to have a better capacity to identify variables with either partial or full effects on the response distribution, as compared to focusing on a single quantile. Under this motivation, we study a general adaptively weighted LASSO penalization strategy in the quantile regression setting, where a continuum of quantile levels is considered and coefficients are allowed to vary with the quantile level. We establish the oracle properties of the resulting estimator of the coefficient function. Furthermore, we formally investigate a BIC-type uniform tuning parameter selector and show that it can ensure consistent model selection. Our numerical studies confirm the theoretical findings and illustrate an application of the new variable selection procedure. PMID:28008212
Small-Scale Hybrid Rocket Test Stand & Characterization of Swirl Injectors
NASA Astrophysics Data System (ADS)
Summers, Matt H.
Driven by the need to increase testing capabilities of hybrid rocket motor (HRM) propulsion systems for Daedalus Astronautics at Arizona State University, a small-scale motor and test stand were designed and developed to characterize all components of the system. The motor is designed for simple integration and setup, such that both the forward-end enclosure and end cap can be easily removed for rapid integration of components during testing. Each component of the motor is removable, allowing for a broad range of testing capabilities. In examining injectors and their potential, the goal is to obtain the highest possible regression rates and overall motor performance. The oxidizer and fuel are N2O and hydroxyl-terminated polybutadiene (HTPB), respectively, chosen for previous experience and simplicity. The injector designs, selected for the same reasons, vary only in swirl angle. This system provides the platform for characterizing the effects of varying the swirl angle on HRM performance.
Kawasaki Disease Increases the Incidence of Myopia.
Kung, Yung-Jen; Wei, Chang-Ching; Chen, Liuh An; Chen, Jiin Yi; Chang, Ching-Yao; Lin, Chao-Jen; Lim, Yun-Ping; Tien, Peng-Tai; Chen, Hsuan-Ju; Huang, Yong-San; Lin, Hui-Ju; Wan, Lei
2017-01-01
The prevalence of myopia has rapidly increased in recent decades and has become a considerable global public health concern. In this study, we elucidate the relationship between Kawasaki disease (KD) and the incidence of myopia. We used Taiwan's National Health Insurance Research Database to conduct a population-based cohort study. We identified patients diagnosed with KD and individuals without KD, frequency-matched by sex, age, and index year. The Cox proportional hazards regression model was used to estimate hazard ratios and 95% confidence intervals for the comparison of the two cohorts. The log-rank test was used to compare the incidence of myopia between the two cohorts. A total of 532 patients were included in the KD cohort and 2128 in the non-KD cohort. The risk of myopia (hazard ratio, 1.31; 95% confidence interval, 1.08-1.58; P < 0.01) was higher among patients with KD than among those in the non-KD cohort. The Cox proportional hazards regression model showed that irrespective of age, gender, and urbanization, Kawasaki disease was an independent risk factor for myopia. Patients with Kawasaki disease exhibited a substantially higher risk of developing myopia.
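The log-rank comparison of two cohorts used in studies like this one has a simple closed form. Below is a generic two-sample version in NumPy; the toy data in the demonstration (group 1 failing uniformly earlier) are invented, not the study's records:

```python
import numpy as np

def logrank_test(time, event, group):
    """Two-sample log-rank test; returns a chi-squared statistic with 1 df.

    time  : observed follow-up times
    event : 1 if the event occurred, 0 if censored
    group : 0/1 cohort indicator (e.g. non-exposed vs exposed)
    """
    time, event, group = map(np.asarray, (time, event, group))
    o1 = e1 = var = 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n = at_risk.sum()                              # total at risk at t
        n1 = (at_risk & (group == 1)).sum()            # group-1 at risk at t
        d = ((time == t) & (event == 1)).sum()         # events at t
        d1 = ((time == t) & (event == 1) & (group == 1)).sum()
        o1 += d1                                       # observed group-1 events
        e1 += d * n1 / n                               # expected under H0
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return (o1 - e1) ** 2 / var

# Illustrative data: group 1 fails at times 1..10, group 0 at 11..20
stat = logrank_test(
    time=list(range(1, 21)),
    event=[1] * 20,
    group=[1] * 10 + [0] * 10,
)
```

The statistic is compared against the chi-squared distribution with 1 degree of freedom (3.84 at the 0.05 level); estimating the hazard ratio itself requires a Cox model rather than this test.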
A Novel Multiobjective Evolutionary Algorithm Based on Regression Analysis
Song, Zhiming; Wang, Maocai; Dai, Guangming; Vasile, Massimiliano
2015-01-01
As is known, the Pareto set of a continuous multiobjective optimization problem with m objective functions is a piecewise continuous (m − 1)-dimensional manifold in the decision space under some mild conditions. How to exploit this regularity in designing multiobjective optimization algorithms has become a research focus. In this paper, based on this regularity, a model-based multiobjective evolutionary algorithm with regression analysis (MMEA-RA) is put forward to solve continuous multiobjective optimization problems with variable linkages. In the algorithm, the optimization problem is modelled as a promising area in the decision space described by a probability distribution, whose centroid is an (m − 1)-dimensional piecewise continuous manifold. The least squares method is used to construct such a model. A selection strategy based on nondominated sorting is used to choose individuals for the next generation. The new algorithm is tested and compared with NSGA-II and RM-MEDA. The results show that MMEA-RA outperforms RM-MEDA and NSGA-II on the test instances with variable linkages. At the same time, MMEA-RA has higher efficiency than the other two algorithms. A few shortcomings of MMEA-RA have also been identified and discussed in this paper. PMID:25874246
Effects of greening and community reuse of vacant lots on crime
Kondo, Michelle; Hohl, Bernadette; Han, SeungHoon; Branas, Charles
2016-01-01
The Youngstown Neighborhood Development Corporation initiated a ‘Lots of Green’ programme to reuse vacant land in 2010. We performed a difference-in-differences analysis of the effects of this programme on crime in and around newly treated lots, in comparison to crimes in and around randomly selected and matched, untreated vacant lot controls. The effects of two types of vacant lot treatments on crime were tested: a cleaning and greening ‘stabilisation’ treatment and a ‘community reuse’ treatment mostly involving community gardens. The combined effects of both types of vacant lot treatments were also tested. After adjustment for various sociodemographic factors, linear and Poisson regression models demonstrated statistically significant reductions in all crime classes for at least one lot treatment type. Regression models adjusted for spatial autocorrelation found the most consistent significant reductions in burglaries around stabilisation lots, and in assaults around community reuse lots. Spill-over crime reduction effects were found in contiguous areas around newly treated lots. Significant increases in motor vehicle thefts around both types of lots were also found after they had been greened. Community-initiated vacant lot greening may have a greater impact on reducing more serious, violent crimes. PMID:28529389
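The difference-in-differences design used in this evaluation reduces to estimating the interaction term of a two-way regression. A minimal NumPy sketch on simulated counts follows; the effect size, sample size, and coefficients are invented for illustration, not the Youngstown data:

```python
import numpy as np

rng = np.random.default_rng(2)
# Simulated outcomes: treated vs control lots, before vs after greening.
n = 400
treated = rng.integers(0, 2, n)          # 1 = greened lot
post = rng.integers(0, 2, n)             # 1 = after the programme
# True difference-in-differences effect: -2 crimes for treated lots post-treatment
y = 10 - 2 * treated * post + 1 * treated + 0.5 * post + rng.normal(0, 1, n)

# Design matrix: intercept, treated, post, and the interaction (the DiD term)
X = np.column_stack([np.ones(n), treated, post, treated * post])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
did_effect = beta[3]                     # estimate of the treatment effect
```

The interaction coefficient isolates the change in treated lots beyond the shared time trend and the baseline difference, which is exactly what the matched-control comparison in the abstract is after (the published analysis additionally adjusts for sociodemographics and spatial autocorrelation).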
Kim, Minji; Kim, Won-Baek; Koo, Kyoung Yoon; Kim, Bo Ram; Kim, Doohyun; Lee, Seoyoun; Son, Hong Joo; Hwang, Dae Youn; Kim, Dong Seob; Lee, Chung Yeoul; Lee, Heeseob
2017-04-28
This study was conducted to evaluate, through response surface methodology, the hyaluronidase (HAase) inhibition activity of Asparagus cochinchinensis (AC) extracts following fermentation by Weissella cibaria. To optimize the HAase inhibition activity, a central composite design was introduced based on four variables: the concentration of AC extract (X1: 1-5%), amount of starter culture (X2: 1-5%), pH (X3: 4-8), and fermentation time (X4: 0-10 days). The experimental data were fitted to quadratic regression equations, the accuracy of the equations was analyzed by ANOVA, and the regression coefficients for the surface quadratic model of HAase inhibition activity in the fermented AC extract were estimated by the F test and the corresponding p values. The HAase inhibition activity indicated that fermentation time was the most significant of the parameters within the conditions tested. To validate the model, two different conditions among those generated by the Design Expert program were selected. Under both conditions, predicted and experimental data agreed well. Moreover, the content of protodioscin (a well-known compound related to anti-inflammatory activity) was elevated after fermentation of the AC extract under the optimized fermentation conditions.
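The response-surface step, fitting a full second-order model and solving for its stationary point, can be sketched in NumPy. For brevity this uses two coded factors and an invented optimum rather than the study's four-factor central composite design:

```python
import numpy as np

rng = np.random.default_rng(3)
# Two coded factors (think extract concentration and fermentation time)
# with a true quadratic response peaking at x1 = 1, x2 = -0.5.
x1 = rng.uniform(-2, 2, 100)
x2 = rng.uniform(-2, 2, 100)
y = 50 - (x1 - 1) ** 2 - 2 * (x2 + 0.5) ** 2 + rng.normal(0, 0.2, 100)

# Full second-order model: intercept, linear, interaction, quadratic terms
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# Stationary point of the fitted surface: set the gradient to zero
A = np.array([[2 * b[4], b[3]],
              [b[3], 2 * b[5]]])
opt = np.linalg.solve(A, -np.array([b[1], b[2]]))
```

With a negative-definite quadratic part, the stationary point is the fitted optimum; software such as Design Expert automates the same algebra together with the ANOVA-based significance checks the abstract mentions.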
Laurie, Cathy C.; Chasalow, Scott D.; LeDeaux, John R.; McCarroll, Robert; Bush, David; Hauge, Brian; Lai, Chaoqiang; Clark, Darryl; Rocheford, Torbert R.; Dudley, John W.
2004-01-01
In one of the longest-running experiments in biology, researchers at the University of Illinois have selected for altered composition of the maize kernel since 1896. Here we use an association study to infer the genetic basis of dramatic changes that occurred in response to selection for changes in oil concentration. The study population was produced by a cross between the high- and low-selection lines at generation 70, followed by 10 generations of random mating and the derivation of 500 lines by selfing. These lines were genotyped for 488 genetic markers and the oil concentration was evaluated in replicated field trials. Three methods of analysis were tested in simulations for ability to detect quantitative trait loci (QTL). The most effective method was model selection in multiple regression. This method detected ∼50 QTL accounting for ∼50% of the genetic variance, suggesting that >50 QTL are involved. The QTL effect estimates are small and largely additive. About 20% of the QTL have negative effects (i.e., not predicted by the parental difference), which is consistent with hitchhiking and small population size during selection. The large number of QTL detected accounts for the smooth and sustained response to selection throughout the twentieth century. PMID:15611182
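Model selection in multiple regression, the method found most effective for QTL detection here, is commonly implemented as greedy forward selection under an information criterion. A BIC-based sketch follows; the marker matrix and effect sizes are simulated, not the Illinois lines:

```python
import numpy as np

def forward_select_bic(X, y):
    """Greedy forward selection for multiple regression, scored by BIC."""
    n, p = X.shape
    selected, best_bic = [], np.inf
    while True:
        improved = False
        for j in range(p):
            if j in selected:
                continue
            cols = selected + [j]
            Xs = np.column_stack([np.ones(n), X[:, cols]])
            beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = np.sum((y - Xs @ beta) ** 2)
            bic = n * np.log(rss / n) + (len(cols) + 1) * np.log(n)
            if bic < best_bic:            # keep the best candidate this pass
                best_bic, best_j, improved = bic, j, True
        if not improved:                  # no candidate beats the current model
            return sorted(selected)
        selected.append(best_j)

rng = np.random.default_rng(8)
n, p = 200, 10
X = rng.normal(size=(n, p))
# True "QTL" at columns 0, 3, and 7 with decreasing effect sizes
y = 2.0 * X[:, 0] + 1.5 * X[:, 3] + 1.0 * X[:, 7] + 0.5 * rng.normal(size=n)
sel = forward_select_bic(X, y)
```

BIC's log(n) penalty per term makes the procedure conservative about adding markers, which matters when, as in this study, hundreds of markers compete to explain one trait.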
Algorithm For Solution Of Subset-Regression Problems
NASA Technical Reports Server (NTRS)
Verhaegen, Michel
1991-01-01
Reliable and flexible algorithm for solution of subset-regression problem performs QR decomposition with new column-pivoting strategy, enables selection of subset directly from originally defined regression parameters. This feature, in combination with number of extensions, makes algorithm very flexible for use in analysis of subset-regression problems in which parameters have physical meanings. Also extended to enable joint processing of columns contaminated by noise with those free of noise, without using scaling techniques.
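The core operation of such a subset-regression algorithm, QR decomposition with column pivoting, is available in SciPy. A sketch with one nearly collinear column (the data are synthetic, and the pivoting strategy is SciPy's standard one, not necessarily the paper's variant):

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(4)
# Five candidate regressors; column 3 is nearly a copy of column 0,
# so a subset-selection procedure should rank one of the pair last.
X = rng.normal(size=(50, 5))
X[:, 3] = X[:, 0] + 1e-8 * rng.normal(size=50)

# QR with column pivoting orders columns by decreasing "new information":
# piv lists the original column indices in their pivoted order.
Q, R, piv = qr(X, pivoting=True)
```

The diagonal of R decays with the pivoted order, so a near-zero trailing diagonal entry flags the near-linear dependency; selecting the leading pivoted columns gives the well-conditioned subset of regressors.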
Ratcliffe, B; El-Dien, O G; Klápště, J; Porth, I; Chen, C; Jaquish, B; El-Kassaby, Y A
2015-01-01
Genomic selection (GS) potentially offers an unparalleled advantage over traditional pedigree-based selection (TS) methods by reducing the time commitment required to carry out a single cycle of tree improvement. This quality is particularly appealing to tree breeders, where lengthy improvement cycles are the norm. We explored the prospect of implementing GS for interior spruce (Picea engelmannii × glauca) utilizing a genotyped population of 769 trees belonging to 25 open-pollinated families. A series of repeated tree height measurements through ages 3–40 years permitted the testing of GS methods temporally. The genotyping-by-sequencing (GBS) platform was used for single nucleotide polymorphism (SNP) discovery in conjunction with three unordered imputation methods applied to a data set with 60% missing information. Further, three diverse GS models were evaluated based on predictive accuracy (PA), and their marker effects. Moderate levels of PA (0.31–0.55) were observed and were of sufficient capacity to deliver improved selection response over TS. Additionally, PA varied substantially through time accordingly with spatial competition among trees. As expected, temporal PA was well correlated with age-age genetic correlation (r=0.99), and decreased substantially with increasing difference in age between the training and validation populations (0.04–0.47). Moreover, our imputation comparisons indicate that k-nearest neighbor and singular value decomposition yielded a greater number of SNPs and gave higher predictive accuracies than imputing with the mean. Furthermore, the ridge regression (rrBLUP) and BayesCπ (BCπ) models both yielded equal, and better PA than the generalized ridge regression heteroscedastic effect model for the traits evaluated. PMID:26126540
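A closed-form ridge regression ("rrBLUP-style") genomic prediction can be sketched on simulated marker data. The marker counts, effect sizes, and penalty λ below are invented for illustration; real GS pipelines estimate λ from variance components rather than fixing it:

```python
import numpy as np

rng = np.random.default_rng(5)
# Toy genomic-selection setup: 200 trees x 500 SNP markers (0/1/2 counts),
# phenotype driven by 20 causal markers plus environmental noise.
n, p = 200, 500
M = rng.integers(0, 3, size=(n, p)).astype(float)
effects = np.zeros(p)
effects[:20] = rng.normal(0, 0.5, 20)
y = M @ effects + rng.normal(0, 1.0, n)

# Ridge closed form: beta = (M'M + lam I)^-1 M'y on centred markers
train, test = np.arange(150), np.arange(150, 200)
Mc = M - M.mean(axis=0)          # centre marker columns (full-sample mean, for simplicity)
lam = 50.0
beta = np.linalg.solve(Mc[train].T @ Mc[train] + lam * np.eye(p),
                       Mc[train].T @ (y[train] - y[train].mean()))

# Predictive accuracy: correlation of predicted and observed phenotype
pa = np.corrcoef(Mc[test] @ beta, y[test])[0, 1]
```

This correlation between predicted and held-out phenotypes is the "predictive accuracy" the abstract reports; the temporal validation in the study corresponds to choosing train and test sets from measurements at different ages.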
Prado, Elizabeth L; Hartini, Sri; Rahmawati, Atik; Ismayani, Elfa; Hidayati, Astri; Hikmah, Nurul; Muadz, Husni; Apriatni, Mandri S; Ullman, Michael T; Shankar, Anuraj H; Alcock, Katherine J
2010-03-01
Evaluating the impact of nutrition interventions on developmental outcomes in developing countries can be challenging since most assessment tests have been produced in and for developed country settings. Such tests may not be valid measures of children's abilities when used in a new context. We present several principles for the selection, adaptation, and evaluation of tests assessing the developmental outcomes of nutrition interventions in developing countries where standard assessment tests do not exist. We then report the application of these principles for a nutrition trial on the Indonesian island of Lombok. Three hundred children aged 22-55 months in Lombok participated in a series of pilot tests for the purpose of test adaptation and evaluation. Four hundred and eighty-seven 42-month-old children in Lombok were tested on the finalized test battery. The developmental assessment tests were adapted to the local context and evaluated for a number of psychometric properties, including convergent and discriminant validity, which were measured based on multiple regression models with maternal education, depression, and age predicting each test score. The adapted tests demonstrated satisfactory psychometric properties and the expected pattern of relationships with the three maternal variables. Maternal education significantly predicted all scores but one, maternal depression predicted socio-emotional competence, socio-emotional problems, and vocabulary, while maternal age predicted socio-emotional competence only. Following the methodological principles presented here resulted in tests that were appropriate for children in Lombok and informative for evaluating the developmental outcomes of nutritional supplementation in the research context. Following this approach in future studies will help to determine which interventions most effectively improve child development in developing countries.
Martin, N H; Ranieri, M L; Murphy, S C; Ralyea, R D; Wiedmann, M; Boor, K J
2011-03-01
Analytical tools that accurately predict the performance of raw milk following its manufacture into commercial food products are of economic interest to the dairy industry. To evaluate the ability of currently applied raw milk microbiological tests to predict the quality of commercially pasteurized fluid milk products, samples of raw milk and 2% fat pasteurized milk were obtained from 4 New York State fluid milk processors for a 1-yr period. Raw milk samples were examined using a variety of tests commonly applied to raw milk, including somatic cell count, standard plate count, psychrotrophic bacteria count, ropy milk test, coliform count, preliminary incubation count, laboratory pasteurization count, and spore pasteurization count. Differential and selective media were used to identify groups of bacteria present in raw milk. Pasteurized milk samples were held at 6°C for 21 d and evaluated for standard plate count, coliform count, and sensory quality throughout shelf-life. Bacterial isolates from select raw and pasteurized milk tests were identified using 16S ribosomal DNA sequencing. Linear regression analysis of raw milk test results versus results reflecting pasteurized milk quality consistently showed low R² values (<0.45); the majority of R² values were <0.25, indicating a weak relationship between the results from the raw milk tests and results from tests used to evaluate pasteurized milk quality. Our findings suggest the need for new raw milk tests that measure the specific biological barriers that limit shelf-life and quality of fluid milk products. Copyright © 2011 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Wang, D Z; Wang, C; Shen, C F; Zhang, Y; Zhang, H; Song, G D; Xue, X D; Xu, Z L; Zhang, S; Jiang, G H
2017-05-10
We described the time trend of acute myocardial infarction (AMI) incidence in Tianjin from 1999 to 2013 using the Cochran-Armitage trend (CAT) test and linear regression analysis, and compared the results. Based on the actual population, the CAT test had much stronger statistical power than linear regression analysis for both the overall incidence trend and age-specific incidence trends (Cochran-Armitage trend P value
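The Cochran-Armitage trend test compared above has a simple closed form for proportions across ordered groups (here, years). A NumPy/SciPy sketch with equally spaced scores follows; the counts in the demonstration are illustrative, not Tianjin data:

```python
import numpy as np
from scipy.stats import norm

def cochran_armitage(cases, totals, scores=None):
    """Cochran-Armitage test for trend in proportions across ordered groups.

    Returns the Z statistic and a two-sided p-value.
    """
    cases = np.asarray(cases, float)
    totals = np.asarray(totals, float)
    if scores is None:
        scores = np.arange(len(cases), dtype=float)   # equally spaced scores
    pbar = cases.sum() / totals.sum()                 # pooled proportion
    t = np.sum(scores * (cases - totals * pbar))
    var = pbar * (1 - pbar) * (np.sum(totals * scores**2)
                               - np.sum(totals * scores) ** 2 / totals.sum())
    z = t / np.sqrt(var)
    return z, 2 * norm.sf(abs(z))

# Illustrative rising incidence over five periods of 1000 person-years each
z_up, p_up = cochran_armitage([10, 20, 30, 40, 50], [1000] * 5)
```

Because the test pools all the count information into a single one-degree-of-freedom statistic against an ordered alternative, it tends to have more power for monotone trends than fitting a linear regression to a handful of yearly rates, which matches the comparison reported above.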
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, Shujie; Kawamoto, Taisuke; Morita, Osamu
Chemical exposure often results in liver hypertrophy in animal tests, characterized by increased liver weight, hepatocellular hypertrophy, and/or cell proliferation. While most of these changes are considered adaptive responses, there is concern that they may be associated with carcinogenesis. In this study, we have employed a toxicogenomic approach using a logistic ridge regression model to identify genes responsible for liver hypertrophy and hypertrophic hepatocarcinogenesis and to develop a predictive model for assessing hypertrophy-inducing compounds. Logistic regression models have previously been used in the quantification of epidemiological risk factors. DNA microarray data from the Toxicogenomics Project-Genomics Assisted Toxicity Evaluation System were used to identify hypertrophy-related genes that are expressed differently in hypertrophy induced by carcinogens and non-carcinogens. Data were collected for 134 chemicals (72 non-hypertrophy-inducing chemicals, 27 hypertrophy-inducing non-carcinogenic chemicals, and 15 hypertrophy-inducing carcinogenic compounds). After applying logistic ridge regression analysis, 35 genes for liver hypertrophy (e.g., Acot1 and Abcc3) and 13 genes for hypertrophic hepatocarcinogenesis (e.g., Asns and Gpx2) were selected. The predictive models built using these genes were 94.8% and 82.7% accurate, respectively. Pathway analysis of the genes indicates that, aside from a xenobiotic metabolism-related pathway as an adaptive response for liver hypertrophy, amino acid biosynthesis and oxidative responses appear to be involved in hypertrophic hepatocarcinogenesis. Early detection and toxicogenomic characterization of liver hypertrophy using our models may be useful for predicting carcinogenesis. In addition, the identified genes provide novel insight into discrimination between adverse hypertrophy associated with carcinogenesis and adaptive hypertrophy in risk assessment.
- Highlights: • Hypertrophy (H) and hypertrophic carcinogenesis (C) were studied by toxicogenomics. • Important genes for H and C were selected by logistic ridge regression analysis. • Amino acid biosynthesis and oxidative responses may be involved in C. • Predictive models for H and C provided 94.8% and 82.7% accuracy, respectively. • The identified genes could be useful for assessment of liver hypertrophy.
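Logistic ridge regression of the kind used for these predictive models can be sketched with scikit-learn, where the L2 penalty plays the "ridge" role. The gene counts, effect sizes, and penalty strength below are invented stand-ins, and the actual gene-selection pipeline of the study is not reproduced:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
# Stand-in for expression data: 150 compounds x 300 genes, where the first
# 10 genes separate hypertrophy-inducing from non-inducing compounds.
n, p = 150, 300
X = rng.normal(size=(n, p))
w = np.zeros(p)
w[:10] = 1.5
prob = 1 / (1 + np.exp(-(X @ w)))
y = (rng.uniform(size=n) < prob).astype(int)

# L2-penalized (ridge) logistic regression; C is the inverse penalty strength
clf = LogisticRegression(penalty="l2", C=0.1, max_iter=2000).fit(X, y)
acc = clf.score(X, y)
```

The ridge penalty keeps coefficient estimates stable when genes far outnumber compounds; ranking genes by penalized coefficient magnitude is one simple route to the kind of informative-gene shortlist the abstract describes.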
ERIC Educational Resources Information Center
Mitchell, Don C.; Shen, Xingjia; Green, Matthew J.; Hodgson, Timothy L.
2008-01-01
When people read temporarily ambiguous sentences, there is often an increased prevalence of regressive eye-movements launched from the word that resolves the ambiguity. Traditionally, such regressions have been interpreted at least in part as reflecting readers' efforts to re-read and reconfigure earlier material, as exemplified by the Selective…
Investigation of Genetic Variants Associated with Alzheimer Disease in Parkinson Disease Cognition.
Barrett, Matthew J; Koeppel, Alexander F; Flanigan, Joseph L; Turner, Stephen D; Worrall, Bradford B
2016-01-01
Meta-analyses of genome-wide association studies have implicated multiple single nucleotide polymorphisms (SNPs) and associated genes in Alzheimer disease. The role of these SNPs in cognitive impairment in Parkinson disease (PD) remains incompletely evaluated. The objective of this study was to test alleles associated with risk of Alzheimer disease for association with cognitive impairment in Parkinson disease (PD). Two datasets with PD subjects accessed through the NIH database of Genotypes and Phenotypes contained both single nucleotide polymorphism (SNP) arrays and mini-mental state exam (MMSE) scores. Genetic data underwent rigorous quality control, and we selected SNPs for genes associated with AD other than APOE. We constructed logistic regression and ordinal regression models, adjusted for sex, age at MMSE, and duration of PD, to assess the association between selected SNPs and MMSE score. In one dataset, PICALM rs3851179 was associated with cognitive impairment (MMSE < 24) in PD subjects > 70 years old (OR = 2.3; adjusted p-value = 0.017; n = 250) but not in PD subjects ≤ 70 years old. Our finding suggests that PICALM rs3851179 could contribute to cognitive impairment in older patients with PD. It is important that future studies consider the interaction of age and genetic risk factors in the development of cognitive impairment in PD.
Determinants of Prelacteal Feeding in Rural Northern India
Roy, Manas Pratim; Mohan, Uday; Singh, Shivendra Kumar; Singh, Vijay Kumar; Srivastava, Anand Kumar
2014-01-01
Background: Prelacteal feeding is an underestimated problem in a developing country like India, where the infant mortality rate is quite high. The present study tried to find out the factors determining prelacteal feeding in rural areas of north India. Methods: A cross-sectional study was conducted among recently delivered women of rural Uttar Pradesh, India. Multistage random sampling was used for selecting villages. From these villages, 352 recently delivered women were selected as subjects by systematic random sampling. The chi-square test and logistic regression were used to find out the predictors of prelacteal feeding. Results: Overall, 40.1% of mothers gave prelacteal feeds to their newborns. Factors significantly associated with the practice, after simple logistic regression, were age, caste, socioeconomic status, and place of delivery. At the multivariate level, age (odds ratio (OR) = 1.76, 95% confidence interval (CI) = 1.13-2.74), caste, and place of delivery (OR = 2.23, 95% CI = 1.21-4.10) were found to determine prelacteal feeding significantly, indicating that young age, high caste, and home delivery could favour the practice. Conclusions: The problem of prelacteal feeding is still prevalent in rural India. Age, caste, and place of delivery were associated with the problem. For ensuring neonatal health, the problem should be addressed with due gravity, with emphasis on exclusive breast feeding. PMID:24932400
Development and evaluation of an electromagnetic hypersensitivity questionnaire for Japanese people
Tokiya, Mikiko; Mizuki, Masami; Miyata, Mikio; Kanatani, Kumiko T.; Takagi, Airi; Tsurikisawa, Naomi; Kame, Setsuko; Katoh, Takahiko; Tsujiuchi, Takuya; Kumano, Hiroaki
2016-01-01
The purpose of the present study was to evaluate the validity and reliability of a Japanese version of an electromagnetic hypersensitivity (EHS) questionnaire, originally developed by Eltiti et al. in the United Kingdom. Using this Japanese EHS questionnaire, surveys were conducted on 1306 controls and 127 self‐selected EHS subjects in Japan. Principal component analysis of controls revealed eight principal symptom groups, namely, nervous, skin‐related, head‐related, auditory and vestibular, musculoskeletal, allergy‐related, sensory, and heart/chest‐related. The reliability of the Japanese EHS questionnaire was confirmed by high to moderate intraclass correlation coefficients in a test–retest analysis, and high Cronbach's α coefficients (0.853–0.953) from each subscale. A comparison of scores of each subscale between self‐selected EHS subjects and age‐ and sex‐matched controls using bivariate logistic regression analysis, Mann–Whitney U‐ and χ 2 tests, verified the validity of the questionnaire. This study demonstrated that the Japanese EHS questionnaire is reliable and valid, and can be used for surveillance of EHS individuals in Japan. Furthermore, based on multiple logistic regression and receiver operating characteristic analyses, we propose specific preliminary criteria for screening EHS individuals in Japan. Bioelectromagnetics. 37:353–372, 2016. © 2016 The Authors. Bioelectromagnetics Published by Wiley Periodicals, Inc. PMID:27324106
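Cronbach's α, used above to check subscale reliability, has a compact formula: α = k/(k−1) · (1 − Σ item variances / variance of the total score). A NumPy sketch on simulated item scores follows; the respondent count, item count, and loadings are invented, not the questionnaire data:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an n_subjects x n_items score matrix."""
    items = np.asarray(items, float)
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()     # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)      # variance of total score
    return k / (k - 1) * (1 - item_var / total_var)

rng = np.random.default_rng(7)
# Simulated subscale: 200 respondents, 6 items sharing one latent severity
latent = rng.normal(size=200)
consistent = latent[:, None] + 0.5 * rng.normal(size=(200, 6))
# Items with no shared latent structure, for contrast
random_items = rng.normal(size=(200, 6))
```

Items driven by a common latent factor push α toward 1 (values above roughly 0.85, like those reported here, indicate high internal consistency), while unrelated items leave it near zero.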
Ponsoda, Vicente; Martínez, Kenia; Pineda-Pardo, José A; Abad, Francisco J; Olea, Julio; Román, Francisco J; Barbey, Aron K; Colom, Roberto
2017-02-01
Neuroimaging research involves analyses of huge amounts of biological data that might or might not be related with cognition. This relationship is usually approached using univariate methods, and, therefore, correction methods are mandatory for reducing false positives. Nevertheless, the probability of false negatives is also increased. Multivariate frameworks have been proposed for helping to alleviate this balance. Here we apply multivariate distance matrix regression for the simultaneous analysis of biological and cognitive data, namely, structural connections among 82 brain regions and several latent factors estimating cognitive performance. We tested whether cognitive differences predict distances among individuals regarding their connectivity pattern. Beginning with 3,321 connections among regions, the 36 edges best predicted by the individuals' cognitive scores were selected. Cognitive scores were related to connectivity distances in both the full (3,321) and reduced (36) connectivity patterns. The selected edges connect regions distributed across the entire brain, and the network defined by these edges supports high-order cognitive processes such as (a) (fluid) executive control, (b) (crystallized) recognition, learning, and language processing, and (c) visuospatial processing. This multivariate study suggests that a widespread but limited set of regions in the human brain supports high-level cognitive ability differences. Hum Brain Mapp 38:803-816, 2017. © 2016 Wiley Periodicals, Inc.
The use of modelling to evaluate and adapt strategies for animal disease control.
Saegerman, C; Porter, S R; Humblet, M F
2011-08-01
Disease is often associated with debilitating clinical signs, disorders or production losses in animals and/or humans, leading to severe socio-economic repercussions. This explains the high priority that national health authorities and international organisations give to selecting control strategies for and the eradication of specific diseases. When a control strategy is selected and implemented, an effective method of evaluating its efficacy is through modelling. To illustrate the usefulness of models in evaluating control strategies, the authors describe several examples in detail, including three examples of classification and regression tree modelling to evaluate and improve the early detection of disease: West Nile fever in equids, bovine spongiform encephalopathy (BSE) and multifactorial diseases, such as colony collapse disorder (CCD) in the United States. Also examined are regression modelling to evaluate skin test practices and the efficacy of an awareness campaign for bovine tuberculosis (bTB); mechanistic modelling to monitor the progress of a control strategy for BSE; and statistical nationwide modelling to analyse the spatio-temporal dynamics of bTB and search for potential risk factors that could be used to target surveillance measures more effectively. In the accurate application of models, an interdisciplinary rather than a multidisciplinary approach is required, with the fewest assumptions possible.
Naguib, Ibrahim A; Abdelrahman, Maha M; El Ghobashy, Mohamed R; Ali, Nesma A
2016-01-01
Two accurate, sensitive, and selective stability-indicating methods were developed and validated for the simultaneous quantitative determination of agomelatine (AGM) and its forced degradation products (Deg I and Deg II), whether in pure form or in pharmaceutical formulations. Partial least-squares regression (PLSR) and spectral residual augmented classical least-squares (SRACLS) are two chemometric models compared here using UV spectral data in the range 215-350 nm. For proper analysis, a three-factor, four-level experimental design was established, resulting in a training set of 16 mixtures containing different ratios of the interfering species. An independent test set of eight mixtures was used to validate the prediction ability of the suggested models. The results indicate the ability of the mentioned multivariate calibration models to analyze AGM, Deg I, and Deg II with high selectivity and accuracy. The analysis results for the pharmaceutical formulations were statistically compared with the reference HPLC method, with no significant differences observed regarding accuracy and precision. The SRACLS model gives results comparable to the PLSR model; however, it retains the qualitative spectral information of the classical least-squares algorithm for the analyzed components.
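Partial least-squares regression of the kind used here projects the spectra onto a few latent variables that covary with the analyte concentrations. A minimal NIPALS-style PLS2 sketch on simulated three-component mixture spectra (the Gaussian band positions, the 16/8 train/test split, and all concentrations are invented for illustration, not the published calibration):

```python
import numpy as np

def pls_fit(X, Y, n_comp):
    """NIPALS PLS2: returns coefficients B with Y ~= X @ B (X, Y pre-centered)."""
    X, Y = X.copy(), Y.copy()
    W, P, Q = [], [], []
    for _ in range(n_comp):
        u = Y[:, :1].copy()
        for _ in range(200):                      # inner NIPALS iterations
            w = X.T @ u
            w /= np.linalg.norm(w)
            t = X @ w
            q = Y.T @ t / (t.T @ t)
            u_new = Y @ q / (q.T @ q)
            if np.linalg.norm(u_new - u) < 1e-12:
                break
            u = u_new
        p = X.T @ t / (t.T @ t)
        X -= t @ p.T                              # deflate X and Y
        Y -= t @ q.T
        W.append(w.ravel()); P.append(p.ravel()); Q.append(q.ravel())
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q).T
    return W @ np.linalg.solve(P.T @ W, Q.T)

rng = np.random.default_rng(1)
wav = np.linspace(215, 350, 100)
# invented Gaussian absorption bands standing in for AGM, Deg I, Deg II spectra
S = np.stack([np.exp(-((wav - c) / 15.0) ** 2) for c in (240, 270, 310)])
C_train = rng.uniform(0.1, 1.0, size=(16, 3))    # 16 training mixtures
C_test = rng.uniform(0.1, 1.0, size=(8, 3))      # 8 validation mixtures
X_train = C_train @ S + rng.normal(0, 1e-3, (16, 100))
X_test = C_test @ S + rng.normal(0, 1e-3, (8, 100))

B = pls_fit(X_train - X_train.mean(0), C_train - C_train.mean(0), 3)
pred = (X_test - X_train.mean(0)) @ B + C_train.mean(0)
rmsep = np.sqrt(np.mean((pred - C_test) ** 2))   # root mean squared error of prediction
```

Because the simulated mixtures obey an additive (Beer-Lambert-like) model, three latent variables recover the concentrations almost exactly.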
Relation between trinucleotide GAA repeat length and sensory neuropathy in Friedreich's ataxia.
Santoro, L; De Michele, G; Perretti, A; Crisci, C; Cocozza, S; Cavalcanti, F; Ragno, M; Monticelli, A; Filla, A; Caruso, G
1999-01-01
To verify if GAA expansion size in Friedreich's ataxia could account for the severity of sensory neuropathy. Retrospective study of 56 patients with Friedreich's ataxia selected according to homozygosity for GAA expansion and availability of electrophysiological findings. Orthodromic sensory conduction velocity in the median nerve was available in all patients and that of the tibial nerve in 46 of them. Data of sural nerve biopsy and of a morphometric analysis were available in 12 of the selected patients. The sensory action potential amplitude at the wrist (wSAP) and at the medial malleolus (m mal SAP) and the percentage of myelinated fibres with diameter larger than 7, 9, and 11 microm in the sural nerve were correlated with disease duration and GAA expansion size on the shorter (GAA1) and larger (GAA2) expanded allele in each pair. Pearson's correlation test and stepwise multiple regression were used for statistical analysis. A significant inverse correlation between GAA1 size and wSAP, m mal SAP, and percentage of myelinated fibres was found. Stepwise multiple regression showed that GAA1 size significantly affects electrophysiological and morphometric data, whereas duration of disease has no effect. The data suggest that the severity of the sensory neuropathy is probably genetically determined and that it is not progressive.
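Stepwise multiple regression of the sort used in this study adds, at each step, the predictor that most reduces the residual sum of squares, stopping when the partial F-to-enter statistic falls below a threshold. A sketch on simulated data echoing the finding that GAA1 size, but not disease duration, predicts sensory amplitudes (the variables and effect size are hypothetical):

```python
import numpy as np

def forward_stepwise(X, y, f_enter=4.0):
    """Greedy forward selection: add the predictor that most reduces the RSS,
    stopping when its partial F-to-enter statistic drops below f_enter."""
    n, p = X.shape
    selected, remaining = [], list(range(p))

    def rss(cols):
        Z = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        r = y - Z @ beta
        return r @ r

    current = rss([])
    while remaining:
        best_rss, best_c = min((rss(selected + [c]), c) for c in remaining)
        df2 = n - len(selected) - 2            # residual df after adding the term
        F = (current - best_rss) / (best_rss / df2)
        if F < f_enter:
            break
        selected.append(best_c)
        remaining.remove(best_c)
        current = best_rss
    return selected

# hypothetical data: GAA1 size predicts wrist SAP amplitude, duration does not
rng = np.random.default_rng(2)
gaa1 = rng.normal(size=80)
duration = rng.normal(size=80)
wsap = -0.8 * gaa1 + rng.normal(0, 0.5, size=80)
sel = forward_stepwise(np.column_stack([gaa1, duration]), wsap)
```

With this simulation, the GAA1 column (index 0) enters first, mirroring the paper's conclusion that expansion size, not duration, drives the electrophysiological data.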
Igne, Benoît; de Juan, Anna; Jaumot, Joaquim; Lallemand, Jordane; Preys, Sébastien; Drennen, James K; Anderson, Carl A
2014-10-01
The implementation of a blend monitoring and control method based on a process analytical technology such as near infrared spectroscopy requires the selection and optimization of numerous criteria that will affect the monitoring outputs and expected blend end-point. Using a five component formulation, the present article contrasts the modeling strategies and end-point determination of a traditional quantitative method based on the prediction of the blend parameters employing partial least-squares regression with a qualitative strategy based on principal component analysis and Hotelling's T(2) and residual distance to the model, called Prototype. The possibility to monitor and control blend homogeneity with multivariate curve resolution was also assessed. The implementation of the above methods in the presence of designed experiments (with variation of the amount of active ingredient and excipients) and with normal operating condition samples (nominal concentrations of the active ingredient and excipients) was tested. The impact of criteria used to stop the blends (related to precision and/or accuracy) was assessed. Results demonstrated that while all methods showed similarities in their outputs, some approaches were preferred for decision making. The selectivity of regression based methods was also contrasted with the capacity of qualitative methods to determine the homogeneity of the entire formulation. Copyright © 2014. Published by Elsevier B.V.
Predicting summer monsoon of Bhutan based on SST and teleconnection indices
NASA Astrophysics Data System (ADS)
Dorji, Singay; Herath, Srikantha; Mishra, Binaya Kumar; Chophel, Ugyen
2018-02-01
The paper uses a statistical method of predicting the summer monsoon over Bhutan from ocean-atmospheric circulation variables: sea surface temperature (SST), mean sea-level pressure (MSLP), and selected teleconnection indices. The predictors are selected based on correlation: the SST and MSLP of the Bay of Bengal and the Arabian Sea; the MSLP of Bangladesh and northeast India; and the Northern Hemisphere teleconnections of the East Atlantic Pattern (EA), West Pacific Pattern (WP), Pacific/North American Pattern, and East Atlantic/West Russia Pattern (EA/WR). The rainfall station data are grouped into two regions with principal components analysis and Ward's hierarchical clustering algorithm. A support vector regression model is proposed to predict the monsoon. The model shows improved skill over traditional linear regression and was able to predict the summer monsoon for the test data from 2011 to 2015 with a total monthly root mean squared error of 112 mm for region A and 33 mm for region B. The model could also forecast the 2016 monsoon for Bhutan in the South Asia Monsoon Outlook of the World Meteorological Organization (WMO). The reliance on agriculture and a hydropower economy makes the prediction of the summer monsoon highly valuable information for farmers and various other sectors, and the proposed method is suitable for operational forecasting.
A Plate Waste Evaluation of the Farm to School Program.
Kropp, Jaclyn D; Abarca-Orozco, Saul J; Israel, Glenn D; Diehl, David C; Galindo-Gonzalez, Sebastian; Headrick, Lauren B; Shelnutt, Karla P
2018-04-01
To investigate the impacts of the Farm to School (FTS) Program on the selection and consumption of fruits and vegetables. Plate waste data were recorded using the visual inspection method before and after implementation of the program. Six elementary schools in Florida: 3 treatment and 3 control schools. A total of 11,262 meal observations of National School Lunch Program (NSLP) participants in grades 1-5. The FTS Program, specifically local procurement of NSLP offerings, began in treatment schools in November 2015, after the researchers collected preintervention data. The NSLP participants' selection and consumption of fruits and vegetables. Data were analyzed using Mann-Whitney U and proportions tests and difference-in-difference regressions. The NSLP participants at the treatment schools consumed, on average, 0.061 (P = .002) more servings of vegetables and 0.055 (P = .05) more servings of fruit after implementation of the FTS Program. When school-level fixed effects are included, ordinary least squares and tobit regression results indicated that NSLP participants at the treatment schools consumed 0.107 (P < .001) and 0.086 (P < .001) more servings of vegetables, respectively, on average, after implementation of the FTS Program. Local procurement positively affected healthy eating. Copyright © 2017 Society for Nutrition Education and Behavior. Published by Elsevier Inc. All rights reserved.
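The difference-in-difference regressions used above estimate the program effect as the coefficient on a treatment-by-period interaction, netting out both the baseline gap between school groups and the common time trend. A minimal OLS sketch on simulated data (the sample size, baseline levels, and the 0.10-serving effect are invented, not the study's data):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 8000
treat = rng.integers(0, 2, n).astype(float)   # 1 = FTS treatment school
post = rng.integers(0, 2, n).astype(float)    # 1 = after program implementation
# simulated vegetable servings; the true treatment effect is set to 0.10
veg = (0.45 + 0.05 * treat + 0.02 * post + 0.10 * treat * post
       + rng.normal(0, 0.3, n))

# OLS with the interaction term: its coefficient is the DiD estimate
Z = np.column_stack([np.ones(n), treat, post, treat * post])
beta, *_ = np.linalg.lstsq(Z, veg, rcond=None)
did_effect = beta[3]
```

The estimate recovers the simulated effect because the baseline difference and the time trend are absorbed by the treat and post main effects.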
Prevention of motor‐vehicle deaths by changing vehicle factors
Robertson, Leon S
2007-01-01
Objective To estimate the effect of changing vehicle factors to reduce mortality in a comprehensive study. Design/methods Odds of death in the United States during 2000–2005 were analyzed for specific makes and models of 1999–2005 model year cars, minivans, and sport utility vehicles using logistic regression; factors were selected for inclusion by examining least‐squares correlations of vehicle factors to maximize the independence of predictors. Based on the regression coefficients, the percentages of deaths preventable by changes in selected factors were calculated. Correlations of vehicle characteristics with environmental and behavioral risk factors were also examined to assess any potential confounding. Results Deaths in the studied vehicles would have been 42% lower had all been equipped with electronic stability control (ESC) systems. Improved crashworthiness as measured by offset frontal and side crash tests would have produced an additional 28% reduction, and static stability improvement would have reduced the deaths by 11%. Although the weight and power that reduce fuel economy are associated with lower risk to drivers, they increase the risk of death to pedestrians and bicyclists, with an overall effect that is minor compared to the other factors. Conclusion A large majority of motor‐vehicle‐related fatalities could be avoided by universal adoption of the most effective technologies. PMID:17916886
Gamal El-Dien, Omnia; Ratcliffe, Blaise; Klápště, Jaroslav; Chen, Charles; Porth, Ilga; El-Kassaby, Yousry A
2015-05-09
Genomic selection (GS) in forestry can substantially reduce the length of the breeding cycle and increase gain per unit time through early selection and greater selection intensity, particularly for traits of low heritability and late expression. Affordable next-generation sequencing technologies have made it possible to genotype large numbers of trees at a reasonable cost. Genotyping-by-sequencing was used to genotype 1,126 Interior spruce trees representing 25 open-pollinated families planted over three sites in British Columbia, Canada. Four imputation algorithms were compared (mean value (MI), singular value decomposition (SVD), expectation maximization (EM), and a newly derived, family-based k-nearest neighbor (kNN-Fam)). Trees were phenotyped for several yield and wood attributes. Single- and multi-site GS prediction models were developed using the Ridge Regression Best Linear Unbiased Predictor (RR-BLUP) and Generalized Ridge Regression (GRR) to test different assumptions about trait architecture. Finally, using PCA, multi-trait GS prediction models were developed. The EM and kNN-Fam imputation methods were superior for 30 and 60% missing data, respectively. The RR-BLUP GS prediction model produced better accuracies than GRR, indicating that the genetic architecture of these traits is complex. GS prediction accuracies for multi-site models were high and better than those of single sites, while cross-site predictions produced the lowest accuracies, reflecting type-b genetic correlations, and were deemed unreliable. The incorporation of genomic information in quantitative genetics analyses produced more realistic heritability estimates, as the half-sib pedigree tended to inflate the additive genetic variance and subsequently both heritability and gain estimates. Principal component scores as representatives of multi-trait GS prediction models produced surprising results, whereby negatively correlated traits could be concurrently selected for using PCA2 and PCA3.
The application of GS to open-pollinated family testing, the simplest form of tree improvement evaluation, was proven to be effective. Prediction accuracies obtained for all traits strongly support the integration of GS into tree breeding. While the within-site GS prediction accuracies were high, the results clearly indicate that the ability of single-site GS models to predict other sites is unreliable, supporting the use of a multi-site approach. Principal component scores provided an opportunity for the concurrent selection of traits with different phenotypic optima.
NASA Technical Reports Server (NTRS)
Beck, L. R.; Rodriguez, M. H.; Dister, S. W.; Rodriguez, A. D.; Washino, R. K.; Roberts, D. R.; Spanner, M. A.
1997-01-01
A blind test of two remote sensing-based models for predicting adult populations of Anopheles albimanus in villages, an indicator of malaria transmission risk, was conducted in southern Chiapas, Mexico. One model was developed using a discriminant analysis approach, while the other was based on regression analysis. The models were developed in 1992 for an area around Tapachula, Chiapas, using Landsat Thematic Mapper (TM) satellite data and geographic information system functions. Using two remotely sensed landscape elements, the discriminant model was able to successfully distinguish between villages with high and low An. albimanus abundance with an overall accuracy of 90%. To test the predictive capability of the models, multitemporal TM data were used to generate a landscape map of the Huixtla area, northwest of Tapachula, where the models were used to predict risk for 40 villages. The resulting predictions were not disclosed until the end of the test. Independently, An. albimanus abundance data were collected in the 40 randomly selected villages for which the predictions had been made. These data were subsequently used to assess the models' accuracies. The discriminant model accurately predicted 79% of the high-abundance villages and 50% of the low-abundance villages, for an overall accuracy of 70%. The regression model correctly identified seven of the 10 villages with the highest mosquito abundance. This test demonstrated that remote sensing-based models generated for one area can be used successfully in another, comparable area.
Huynh-Tran, V H; Gilbert, H; David, I
2017-11-01
The objective of the present study was to compare a random regression model, usually used in genetic analyses of longitudinal data, with the structured antedependence (SAD) model to study the longitudinal feed conversion ratio (FCR) in growing Large White pigs and to propose criteria for animal selection when used for genetic evaluation. The study was based on data from 11,790 weekly FCR measures collected on 1,186 Large White male growing pigs. Random regression (RR) using orthogonal polynomial Legendre and SAD models was used to estimate genetic parameters and predict FCR-based EBV for each of the 10 wk of the test. The results demonstrated that the best SAD model (1 order of antedependence of degree 2 and a polynomial of degree 2 for the innovation variance for the genetic and permanent environmental effects, i.e., 12 parameters) provided a better fit for the data than RR with a quadratic function for the genetic and permanent environmental effects (13 parameters), with Bayesian information criteria values of -10,060 and -9,838, respectively. Heritabilities with the SAD model were higher than those of RR over the first 7 wk of the test. Genetic correlations between weeks were higher than 0.68 for short intervals between weeks and decreased to 0.08 for the SAD model and -0.39 for RR for the longest intervals. These differences in genetic parameters showed that, contrary to the RR approach, the SAD model does not suffer from border effect problems and can handle genetic correlations that tend to 0. Summarized breeding values were proposed for each approach as linear combinations of the individual weekly EBV weighted by the coefficients of the first or second eigenvector computed from the genetic covariance matrix of the additive genetic effects. These summarized breeding values isolated EBV trajectories over time, capturing either the average general value or the slope of the trajectory. 
Finally, applying the SAD model over a reduced period of time suggested that similar selection choices would result from the use of the records from the first 8 wk of the test. To conclude, the SAD model performed well for the genetic evaluation of longitudinal phenotypes.
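Random regression with orthogonal Legendre polynomials, as in the RR model above, represents each weekly trajectory as a linear combination of Legendre basis functions evaluated at the standardized week. A sketch of building the quadratic basis over the 10 test weeks and fitting one simulated animal's FCR trajectory (the FCR values are invented):

```python
import numpy as np
from numpy.polynomial import legendre

weeks = np.arange(1, 11)                                             # 10 weekly FCR records
t = 2.0 * (weeks - weeks.min()) / (weeks.max() - weeks.min()) - 1.0  # map to [-1, 1]

# Legendre basis P0..P2 evaluated at the standardized weeks (quadratic order)
Phi = np.column_stack([legendre.legval(t, np.eye(3)[k]) for k in range(3)])

# fit one simulated animal's FCR trajectory with the quadratic basis
rng = np.random.default_rng(4)
fcr = 2.5 + 0.05 * t + 0.1 * t ** 2 + rng.normal(0, 0.02, size=10)
coef, *_ = np.linalg.lstsq(Phi, fcr, rcond=None)
fitted = Phi @ coef
```

In a full mixed-model analysis the same basis would multiply random animal-specific coefficients; this sketch only shows how the basis is constructed and used.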
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cardenas, C; The University of Texas Graduate School of Biomedical Sciences, Houston, TX; Wong, A
Purpose: To develop and test population-based machine learning algorithms for delineating high-dose clinical target volumes (CTVs) in H&N tumors. Automating and standardizing the contouring of CTVs can reduce both physician contouring time and inter-physician variability, which is one of the largest sources of uncertainty in H&N radiotherapy. Methods: Twenty-five node-negative patients treated with definitive radiotherapy were selected (6 right base of tongue, 11 left and 9 right tonsil). All patients had GTV and CTVs manually contoured by an experienced radiation oncologist prior to treatment. This contouring process, which is driven by anatomical, pathological, and patient-specific information, typically results in non-uniform margin expansions about the GTV. Therefore, we tested two methods to delineate the high-dose CTV given a manually contoured GTV: (1) regression support vector machines (SVM) and (2) classification SVM. These models were trained and tested on each patient group using leave-one-out cross-validation. The volume difference (VD) and Dice similarity coefficient (DSC) between the manual and auto-contoured CTVs were calculated to evaluate the results. Distances from GTV to CTV were computed about each patient's GTV, and these distances, in addition to distances from the GTV to surrounding anatomy in the expansion direction, were used in the regression-SVM method. The classification-SVM method used categorical voxel information (GTV, selected anatomical structures, else) from a 3×3×3 cm3 ROI centered about each voxel to classify voxels as CTV. Results: Volumes of the auto-contoured CTVs ranged from 17.1 to 149.1 cc and 17.4 to 151.9 cc; the average (range) VD between manual and auto-contoured CTVs was 0.93 (0.48–1.59) and 1.16 (0.48–1.97), while average (range) DSC values were 0.75 (0.59–0.88) and 0.74 (0.59–0.81) for the regression-SVM and classification-SVM methods, respectively.
Conclusion: We developed two novel machine learning methods to delineate the high-dose CTV for H&N patients. Both methods showed promising results that point toward a solution for standardizing the contouring of clinical target volumes. Supported by a Varian Medical Systems grant.
An Update on Statistical Boosting in Biomedicine.
Mayr, Andreas; Hofner, Benjamin; Waldmann, Elisabeth; Hepp, Tobias; Meyer, Sebastian; Gefeller, Olaf
2017-01-01
Statistical boosting algorithms have triggered a lot of research during the last decade. They combine a powerful machine learning approach with classical statistical modelling, offering various practical advantages like automated variable selection and implicit regularization of effect estimates. They are extremely flexible, as the underlying base-learners (regression functions defining the type of effect for the explanatory variables) can be combined with any kind of loss function (target function to be optimized, defining the type of regression setting). In this review article, we highlight the most recent methodological developments on statistical boosting regarding variable selection, functional regression, and advanced time-to-event modelling. Additionally, we provide a short overview on relevant applications of statistical boosting in biomedicine.
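Componentwise L2 boosting illustrates two properties the review highlights, automated variable selection and implicit regularization: only the best-fitting base-learner is updated at each iteration, and each update is shrunken by a learning rate. A sketch with componentwise linear base-learners on simulated data (dimensions and coefficients are invented):

```python
import numpy as np

def l2_boost(X, y, n_iter=200, nu=0.1):
    """Componentwise L2 boosting: at each iteration fit every single-variable
    least-squares base-learner to the current residuals, update only the best
    one, and shrink the update by the learning rate nu."""
    n, p = X.shape
    beta = np.zeros(p)
    offset = y.mean()
    resid = y - offset
    for _ in range(n_iter):
        b = X.T @ resid / (X ** 2).sum(axis=0)        # per-variable LS coefficients
        rss = ((resid[:, None] - X * b) ** 2).sum(axis=0)
        j = int(np.argmin(rss))                       # best-fitting base-learner
        beta[j] += nu * b[j]
        resid -= nu * b[j] * X[:, j]
    return offset, beta

# simulated data: 10 candidate variables, only two with real effects
rng = np.random.default_rng(5)
X = rng.normal(size=(200, 10))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(0, 0.5, size=200)
offset, beta = l2_boost(X, y)
```

Variables that are never selected keep a coefficient of exactly zero or close to it, which is how boosting performs variable selection without an explicit penalty term.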
Marriage Advantages in Perinatal Health: Evidence of Marriage Selection or Marriage Protection?
Kane, Jennifer B.
2015-01-01
Marriage is a social tie associated with health advantages for adults and their children, as lower rates of preterm birth and low birth weight are observed among married women. This study tests two competing hypotheses explaining these marriage advantages—marriage protection versus marriage selection—using a sample of recent births to single, cohabiting, and married women from the National Survey of Family Growth, 2006–10. Propensity score matching and fixed effects regression results demonstrate support for marriage selection, as a rich set of early life selection factors account for all of the cohabiting-married disparity and part of the single-married disparity. Subsequent analyses demonstrate prenatal smoking mediates the adjusted single-married disparity in birth weight, lending some support for the marriage protection perspective. Study findings sharpen our understanding of why and how marriage matters for child well-being, and provide insight into preconception and prenatal factors describing intergenerational transmissions of inequality via birth weight. PMID:26778858
Testing the influenza–tuberculosis selective mortality hypothesis with Union Army data⋆
Noymer, Andrew
2009-01-01
Using Cox regression, this paper shows a weak association between having tuberculosis and dying from influenza among Union Army veterans in late nineteenth-century America. It has been suggested elsewhere [Noymer, A. and M. Garenne (2000). The 1918 influenza epidemic’s effects on sex differentials in mortality in the United States. Population and Development Review 26(3), 565–581.] that the 1918 influenza pandemic accelerated the decline of tuberculosis, by killing many people with tuberculosis. The question remains whether individuals with tuberculosis were at greater risk of influenza death, or if the 1918/post-1918 phenomenon arose from the sheer number of deaths in the influenza pandemic. The present findings, from microdata, cautiously point toward an explanation of Noymer and Garenne’s selection effect in terms of age-overlap of the 1918 pandemic mortality and tuberculosis morbidity, a phenomenon I term “passive selection”. Another way to think of this is selection at the cohort, as opposed to individual, level. PMID:19304361
McLaren, Christine E.; Chen, Wen-Pin; Nie, Ke; Su, Min-Ying
2009-01-01
Rationale and Objectives Dynamic contrast enhanced MRI (DCE-MRI) is a clinical imaging modality for detection and diagnosis of breast lesions. Analytical methods were compared for diagnostic feature selection and performance of lesion classification to differentiate between malignant and benign lesions in patients. Materials and Methods The study included 43 malignant and 28 benign histologically-proven lesions. Eight morphological parameters, ten gray level co-occurrence matrices (GLCM) texture features, and fourteen Laws’ texture features were obtained using automated lesion segmentation and quantitative feature extraction. Artificial neural network (ANN) and logistic regression analysis were compared for selection of the best predictors of malignant lesions among the normalized features. Results Using ANN, the final four selected features were compactness, energy, homogeneity, and Law_LS, with area under the receiver operating characteristic curve (AUC) = 0.82, and accuracy = 0.76. The diagnostic performance of these 4-features computed on the basis of logistic regression yielded AUC = 0.80 (95% CI, 0.688 to 0.905), similar to that of ANN. The analysis also shows that the odds of a malignant lesion decreased by 48% (95% CI, 25% to 92%) for every increase of 1 SD in the Law_LS feature, adjusted for differences in compactness, energy, and homogeneity. Using logistic regression with z-score transformation, a model comprised of compactness, NRL entropy, and gray level sum average was selected, and it had the highest overall accuracy of 0.75 among all models, with AUC = 0.77 (95% CI, 0.660 to 0.880). When logistic modeling of transformations using the Box-Cox method was performed, the most parsimonious model with predictors, compactness and Law_LS, had an AUC of 0.79 (95% CI, 0.672 to 0.898). Conclusion The diagnostic performance of models selected by ANN and logistic regression was similar. 
The analytic methods were found to be roughly equivalent in terms of predictive ability when a small number of variables were chosen. The robust ANN methodology utilizes a sophisticated non-linear model, while logistic regression analysis provides insightful information to enhance interpretation of the model features. PMID:19409817
Chinese time trade-off values for EQ-5D health states.
Liu, Gordon G; Wu, Hongyan; Li, Minghui; Gao, Chen; Luo, Nan
2014-07-01
To generate a Chinese general population-based three-level EuroQol five-dimension (EQ-5D-3L) social value set using the time trade-off method. The study sample was drawn from five cities in China: Beijing, Guangzhou, Shenyang, Chengdu, and Nanjing, using a quota sampling method. Utility values for a subset of 97 health states defined by the EQ-5D-3L descriptive system were directly elicited from the study sample using a modified Measurement and Valuation of Health protocol, with each respondent valuing 13 of the health states. The utility values for all 243 EQ-5D-3L health states were estimated on the basis of econometric models at both the individual and aggregate levels. Various linear regression models using different model specifications were examined to determine the best model using predefined model selection criteria. The N3 model based on ordinary least squares regression at the aggregate level yielded the best model fit, with a mean absolute error of 0.020 and with 7 and 0 states for which prediction errors were greater than 0.05 and 0.10, respectively, in absolute magnitude. This model passed tests for model misspecification (F = 2.7; P = 0.0509, Ramsey Regression Equation Specification Error Test), heteroskedasticity (χ² = 0.97; P = 0.3254, Breusch-Pagan/Cook-Weisberg test), and normality of the residuals (χ² = 1.285; P = 0.5259, Jarque-Bera test). The range of the predicted values (-0.149 to 0.887) was similar to those estimated in other countries. The study successfully developed Chinese utility values for EQ-5D-3L health states using the time trade-off method. It is the first attempt ever to develop a standardized instrument for quantifying quality-adjusted life-years in China. Copyright © 2014 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
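The N3 model referred to above regresses observed TTO disutilities on level-2 and level-3 dummies for the five EQ-5D dimensions plus an "N3" indicator that at least one dimension is at level 3. A sketch of the design matrix and an OLS fit on simulated valuations (the decrement values are invented, not the published Chinese tariff):

```python
import numpy as np

def design_row(state):
    """EQ-5D-3L state such as '21133' -> level-2/level-3 dummies + N3 indicator."""
    levels = [int(c) for c in state]
    row = []
    for lv in levels:
        row += [1 if lv == 2 else 0, 1 if lv == 3 else 0]
    row.append(1 if 3 in levels else 0)   # N3: any dimension at level 3
    return row

# invented decrements (NOT the published Chinese tariff), for illustration only
true = np.array([.05, .10, .04, .08, .03, .09, .06, .12, .04, .10, .07])
rng = np.random.default_rng(7)
states = ["".join(rng.choice(list("123"), 5)) for _ in range(97)]
Z = np.array([design_row(s) for s in states])
tto = 1.0 - Z @ true + rng.normal(0, 0.01, size=len(states))  # simulated valuations

# OLS on disutility (1 - utility), as in additive tariff models
beta, *_ = np.linalg.lstsq(Z, 1.0 - tto, rcond=None)
mae = np.abs(Z @ beta - (1.0 - tto)).mean()
```

Once fitted on the 97 valued states, the same design function scores any of the 243 states, which is how the full value set is extrapolated.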
Empirical evidence of the importance of comparative studies of diagnostic test accuracy.
Takwoingi, Yemisi; Leeflang, Mariska M G; Deeks, Jonathan J
2013-04-02
Systematic reviews that "compare" the accuracy of 2 or more tests often include different sets of studies for each test. To investigate the availability of direct comparative studies of test accuracy and to assess whether summary estimates of accuracy differ between meta-analyses of noncomparative and comparative studies. Systematic reviews in any language from the Database of Abstracts of Reviews of Effects and the Cochrane Database of Systematic Reviews from 1994 to October 2012. 1 of 2 assessors selected reviews that evaluated at least 2 tests and identified meta-analyses that included both noncomparative studies and comparative studies. 1 of 3 assessors extracted data about review and study characteristics and test performance. 248 reviews compared test accuracy; of the 6915 studies, 2113 (31%) were comparative. Thirty-six reviews (with 52 meta-analyses) had adequate studies to compare results of noncomparative and comparative studies by using a hierarchical summary receiver-operating characteristic meta-regression model for each test comparison. In 10 meta-analyses, noncomparative studies ranked tests in the opposite order of comparative studies. A total of 25 meta-analyses showed more than a 2-fold discrepancy in the relative diagnostic odds ratio between noncomparative and comparative studies. Differences in accuracy estimates between noncomparative and comparative studies were greater than expected by chance (P < 0.001). A paucity of comparative studies limited exploration of direction in bias. Evidence derived from noncomparative studies often differs from that derived from comparative studies. Robustly designed studies in which all patients receive all tests or are randomly assigned to receive one or other of the tests should be more routinely undertaken and are preferred for evidence to guide test selection. National Institute for Health Research (United Kingdom).
Burr, Jaime F; Jamnik, Roni K; Baker, Joseph; Macpherson, Alison; Gledhill, Norman; McGuire, E J
2008-09-01
The primary purpose of this study was to determine the fitness variables with the highest capability for predicting hockey playing potential at the elite level as determined by entry draft selection order. We also examined the differences associated with the predictive abilities of the test components among playing positions. The secondary purpose of this study was to update the physiological profile of contemporary hockey players including positional differences. Fitness test results conducted by our laboratory at the National Hockey League Entry Draft combine were compared with draft selection order on a total of 853 players. Regression models revealed peak anaerobic power output to be important for higher draft round selection in all positions; however, the degree of importance of this measurement varied with playing position. The body index, which is a composite score of height, lean mass, and muscular development, was similarly important in all models, with differing influence by position. Removal of the goalies' data increased predictive capacity, suggesting that talent identification using physical fitness testing of this sort may be more appropriate for skating players. Standing long jump was identified as a significant predictor variable for forwards and defense and could be a useful surrogate for assessing overall hockey potential. Significant differences exist between the physiological profiles of current players based on playing position. There are also positional differences in the relative importance of anthropometric and fitness measures of off-ice hockey tests in relation to draft order. Physical fitness measures and anthropometric data are valuable in helping predict hockey playing potential. Emphasis on anthropometry should be used when comparing elite-level forwards, whereas peak anaerobic power and fatigue rate are more useful for differentiating between defense.
NASA Astrophysics Data System (ADS)
Chen, Hui; Tan, Chao; Lin, Zan; Wu, Tong
2018-01-01
Milk is among the most popular nutrient sources worldwide and is of great interest due to its beneficial medicinal properties. The feasibility of classifying milk powder samples with respect to their brands and of determining protein concentration is investigated by NIR spectroscopy along with chemometrics. Two datasets were prepared for the experiment: one contains 179 samples of four brands for classification, and the other contains 30 samples for quantitative analysis. Principal component analysis (PCA) was used for exploratory analysis. Based on an effective model-independent variable selection method, minimal-redundancy maximal-relevance (MRMR), only 18 variables were selected to construct a partial least-squares discriminant analysis (PLS-DA) model. On the test set, the PLS-DA model based on the selected variable set was compared with the full-spectrum PLS-DA model; both achieved 100% accuracy. In quantitative analysis, the partial least-squares regression (PLSR) model constructed from a selected subset of 260 variables significantly outperforms the full-spectrum model. The combination of NIR spectroscopy, MRMR, and PLS-DA or PLSR appears to be a powerful tool for classifying different brands of milk and determining protein content.
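Minimal-redundancy maximal-relevance selection, as applied before the PLS-DA model, greedily picks variables that are highly relevant to the response but minimally redundant with the variables already chosen. A correlation-based sketch (a common simplification of the mutual-information criterion; the data are simulated, not the milk spectra):

```python
import numpy as np

def mrmr(X, y, k):
    """Greedy mRMR: pick the variable maximizing relevance (|corr with y|)
    minus mean redundancy (|corr with already-selected variables|)."""
    p = X.shape[1]
    rel = np.abs(np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)]))
    cor = np.abs(np.corrcoef(X, rowvar=False))
    selected = [int(np.argmax(rel))]
    while len(selected) < k:
        remaining = [j for j in range(p) if j not in selected]
        scores = [rel[j] - cor[j, selected].mean() for j in remaining]
        selected.append(remaining[int(np.argmax(scores))])
    return selected

# simulated "spectral" variables: columns 0-2 are near-duplicates (redundant),
# column 5 carries independent information about the response
rng = np.random.default_rng(6)
n = 100
base = rng.normal(size=n)
x5 = rng.normal(size=n)
X = np.column_stack([
    base,
    base + rng.normal(0, 0.05, size=n),
    base + rng.normal(0, 0.05, size=n),
    rng.normal(size=n),
    rng.normal(size=n),
    x5,
])
protein = base + 0.8 * x5 + rng.normal(0, 0.1, size=n)  # toy response
sel = mrmr(X, protein, 2)
```

The second pick skips the near-duplicate columns despite their high relevance, which is exactly the redundancy penalty that lets MRMR shrink a full spectrum to a small variable subset.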
Omnibus risk assessment via accelerated failure time kernel machine modeling.
Sinnott, Jennifer A; Cai, Tianxi
2013-12-01
Integrating genomic information with traditional clinical risk factors to improve the prediction of disease outcomes could profoundly change the practice of medicine. However, the large number of potential markers and possible complexity of the relationship between markers and disease make it difficult to construct accurate risk prediction models. Standard approaches for identifying important markers often rely on marginal associations or linearity assumptions and may not capture non-linear or interactive effects. In recent years, much work has been done to group genes into pathways and networks. Integrating such biological knowledge into statistical learning could potentially improve model interpretability and reliability. One effective approach is to employ a kernel machine (KM) framework, which can capture nonlinear effects if nonlinear kernels are used (Scholkopf and Smola, 2002; Liu et al., 2007, 2008). For survival outcomes, KM regression modeling and testing procedures have been derived under a proportional hazards (PH) assumption (Li and Luan, 2003; Cai, Tonini, and Lin, 2011). In this article, we derive testing and prediction methods for KM regression under the accelerated failure time (AFT) model, a useful alternative to the PH model. We approximate the null distribution of our test statistic using resampling procedures. When multiple kernels are of potential interest, it may be unclear in advance which kernel to use for testing and estimation. We propose a robust Omnibus Test that combines information across kernels, and an approach for selecting the best kernel for estimation. The methods are illustrated with an application in breast cancer. © 2013, The International Biometric Society.
Development of a miniature solid propellant rocket motor for use in plume simulation studies
NASA Technical Reports Server (NTRS)
Baran, W. J.
1974-01-01
A miniature solid propellant rocket motor has been developed to be used in a program to determine those parameters which must be duplicated in a cold gas flow to produce aerodynamic effects on an experimental model similar to those produced by hot, particle-laden exhaust plumes. Phenomena encountered during the testing of the miniature solid propellant motors included erosive propellant burning caused by high flow velocities parallel to the propellant surface, regressive propellant burning as a result of exposed propellant edges, the deposition of aluminum oxide on the nozzle surfaces sufficient to cause aerodynamic nozzle throat geometry changes, and thermal erosion of the nozzle throat at high chamber pressures. A series of tests was conducted to establish the stability of the rocket chamber pressure and the repeatability of test conditions. Data are presented which define the tests selected to represent the final test matrix. Qualitative observations are also presented concerning the phenomena experienced, based on the results of a large number of rocket tests not directly applicable to the final test matrix.
Vaeth, Michael; Skovlund, Eva
2004-06-15
For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination. Copyright 2004 John Wiley & Sons, Ltd.
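The mapping from a logistic-regression slope to an approximately equivalent two-sample problem can be sketched roughly as follows. The symmetric placement of the two groups around the overall logit and all numeric inputs are illustrative assumptions for this sketch, not the authors' exact recipe:

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_ppf(p, lo=-10.0, hi=10.0):
    # Inverse standard-normal CDF by bisection (adequate for a sketch).
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if norm_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def equivalent_two_sample_n(beta, sd_x, p_overall, alpha=0.05, power=0.80):
    """Approximate total n for testing slope beta in logistic regression
    via an equivalent two-sample problem: two equally sized groups whose
    log-odds differ by delta = beta * 2 * sd_x, placed symmetrically so
    the overall event probability stays near p_overall."""
    delta = beta * 2.0 * sd_x
    logit = math.log(p_overall / (1.0 - p_overall))
    p1 = 1.0 / (1.0 + math.exp(-(logit - delta / 2.0)))
    p2 = 1.0 / (1.0 + math.exp(-(logit + delta / 2.0)))
    pbar = (p1 + p2) / 2.0
    za, zb = norm_ppf(1.0 - alpha / 2.0), norm_ppf(power)
    # Standard two-proportion sample-size formula (normal approximation).
    n_per_group = ((za * math.sqrt(2.0 * pbar * (1.0 - pbar))
                    + zb * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
                   / (p2 - p1) ** 2)
    return 2 * math.ceil(n_per_group)

n_total = equivalent_two_sample_n(beta=0.405, sd_x=1.0, p_overall=0.3)
print(n_total)
```

The appeal of the reduction is exactly this: once the equivalent two-sample problem is identified, the familiar two-proportion formula does all the work.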
Model selection for logistic regression models
NASA Astrophysics Data System (ADS)
Duller, Christine
2012-09-01
Model selection for logistic regression models decides which of some given potential regressors have an effect and hence should be included in the final model. The second question of interest is whether a certain factor is heterogeneous among some subsets, i.e., whether the model should include a random intercept. In this paper these questions are answered with classical as well as Bayesian methods. The applications show some results of recent research projects in medicine and business administration.
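A minimal illustration of the first question (should a regressor enter the final model?) is a BIC comparison between a fitted one-regressor logistic model and the intercept-only null. This pure-Python sketch with invented data shows only one of the classical approaches alluded to; the random-intercept question would additionally require a mixed model:

```python
import math

def fit_logistic(xs, ys, n_iter=25):
    """Newton-Raphson fit of P(y=1) = 1/(1+exp(-(b0 + b1*x)))."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    loglik = sum(y * (b0 + b1 * x) - math.log(1.0 + math.exp(b0 + b1 * x))
                 for x, y in zip(xs, ys))
    return b0, b1, loglik

def bic(loglik, n_params, n_obs):
    return -2.0 * loglik + n_params * math.log(n_obs)

# Hypothetical binary outcome with one candidate regressor.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

b0, b1, ll_x = fit_logistic(xs, ys)
p_hat = sum(ys) / len(ys)                 # intercept-only (null) model
ll_null = len(ys) * (p_hat * math.log(p_hat)
                     + (1 - p_hat) * math.log(1 - p_hat))
bic_x = bic(ll_x, 2, len(ys))
bic_null = bic(ll_null, 1, len(ys))
print(bic_x < bic_null)    # include the regressor if its BIC is lower
```

A Bayesian treatment would replace the BIC comparison with posterior model probabilities or Bayes factors, but the decision structure is the same.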
Cross Validation of Selection of Variables in Multiple Regression.
1979-12-01
[Only OCR fragments of this thesis survive: part of the introduction ("Long term DoD planning goals..."), a table of regression coefficients, a remark that some of the candidate models had adequate predictive capabilities while others did not, and a list of F111D avionics work-unit codes (control, general-purpose computer, converter-multiplexer, stabilizer platform).]
NASA Astrophysics Data System (ADS)
Kolomiets, V. I.
2018-03-01
The combined influence of climatic factors (temperature, humidity) and the electrical operating mode (supply voltage) on the corrosion resistance of the metallization of integrated circuits has been considered. The regression dependence of the average time of trouble-free operation t on these factors has been established in the form of a modified Arrhenius equation that is adequate over a wide range of factor values and is suitable for selecting accelerated test modes. A technique for evaluating the corrosion resistance of the aluminum metallization of depressurized CMOS integrated circuits has been proposed.
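The basic Arrhenius regression underlying such a model can be sketched as follows. The humidity and voltage terms of the modified equation are omitted here, and the failure times, temperatures, and the 50 °C use condition are invented for illustration:

```python
import math

def ols(xs, ys):
    """Closed-form simple linear regression: returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

# Hypothetical accelerated-test data: mean time to failure t (hours)
# at three absolute temperatures T (kelvin). Arrhenius model:
# ln t = a + b / T, so regress ln t on 1/T.
temps = [358.0, 373.0, 388.0]
times = [5000.0, 1500.0, 520.0]

inv_T = [1.0 / T for T in temps]
ln_t = [math.log(t) for t in times]
a, b = ols(inv_T, ln_t)

Ea = b * 8.617e-5                       # activation energy in eV (b * k_B)
t_pred_323 = math.exp(a + b / 323.0)    # extrapolate to ~50 degC use conditions
print(round(Ea, 2), round(t_pred_323))
```

The modified equation in the paper would add further regressors (e.g. humidity and supply voltage) on the right-hand side; the fitting step generalizes to multiple linear regression in the same way.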
NASA Astrophysics Data System (ADS)
Deshmukh, A. A.; Kuthe, S. A.; Palikundwar, U. A.
2018-05-01
In the present paper, the consequences of compositional variation for the electronegativity difference (ΔX), atomic radius difference (δ), and thermal stability (ΔTx) of Mg-Ni-Y bulk metallic glasses (BMGs) are evaluated. To understand the effect of compositional variation on ΔX, δ, and ΔTx, regression analysis is performed on the experimentally available data. A linear correlation between δ and ΔX with regression coefficient 0.93 is observed. Further, compositional variation is performed with δ and then correlated to ΔTx by deriving the corresponding equations. The concentrations of Mg, Ni, and Y are directly proportional to δ with regression coefficients 0.93, 0.93, and 0.50, respectively. The positive slopes for Ni and Y indicate that ΔTx will increase with larger contributions from both Ni and Y; the negative slope for Mg indicates that the Mg content should be selected so that the alloy retains stability with Ni and Y. The results obtained from the mathematical calculations are also tested by regression analysis of ΔTx against the compositions of the individual elements in the alloy. These results indicate a strong dependence of the ΔTx of the alloy on the compositions of its constituent elements.
Zuniga, Jorge M; Housh, Terry J; Camic, Clayton L; Bergstrom, Haley C; Schmidt, Richard J; Johnson, Glen O
2014-09-01
The purpose of this study was to examine the effect of ramp and step incremental cycle ergometer tests on the assessment of the anaerobic threshold (AT) using 3 different computerized regression-based algorithms. Thirteen healthy adults (mean [SD] age = 23.4 [3.3] years and body mass = 71.7 [11.1] kg) visited the laboratory on separate occasions. Two-way repeated measures analyses of variance with appropriate follow-up procedures were used to analyze the data. The step protocol resulted in greater mean values across algorithms than the ramp protocol for the V̇O2 (step = 1.7 [0.6] L·min⁻¹ and ramp = 1.5 [0.4] L·min⁻¹) and heart rate (HR) (step = 133 [21] b·min⁻¹ and ramp = 124 [15] b·min⁻¹) at the AT. There were no significant mean differences, however, in power outputs at the AT between the step (115.2 [44.3] W) and the ramp (112.2 [31.2] W) protocols. Furthermore, there were no significant mean differences for V̇O2, HR, or power output across protocols among the 3 computerized regression-based algorithms used to estimate the AT. The current findings suggest that the protocol selection, but not the regression-based algorithm, can affect the assessment of the V̇O2 and HR at the AT.
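A simple regression-based threshold detector of the general kind mentioned above can be sketched as a two-segment piecewise-linear fit (V-slope-like breakpoint search). The gas-exchange data and the breakpoint location are synthetic, and this is not one of the three specific algorithms compared in the study:

```python
def ols(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def sse(xs, ys, a, b):
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def find_breakpoint(xs, ys):
    """Try every admissible split; return the x at which a two-segment
    piecewise-linear fit has minimal total squared error."""
    best_err, best_x = None, None
    for i in range(3, len(xs) - 2):        # at least 3 points per segment
        a1, b1 = ols(xs[:i], ys[:i])
        a2, b2 = ols(xs[i:], ys[i:])
        err = (sse(xs[:i], ys[:i], a1, b1)
               + sse(xs[i:], ys[i:], a2, b2))
        if best_err is None or err < best_err:
            best_err, best_x = err, xs[i]
    return best_x

# Hypothetical ramp-test gas-exchange data: VCO2 rises linearly with
# VO2, with a steeper slope above a threshold near VO2 = 1.5 L/min.
vo2 = [0.8 + 0.1 * k for k in range(17)]                 # 0.8 .. 2.4 L/min
vco2 = [v if v <= 1.5 else 1.5 + 1.8 * (v - 1.5) for v in vo2]
at = find_breakpoint(vo2, vco2)
print(round(at, 2))
```

Real algorithms differ in how they smooth the data and penalize the split, which is precisely why protocol and algorithm choices are worth comparing.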
Beltrami, V; Buonsanto, A; Di Nuzzo, D; Lattanzio, R
1995-01-01
A correlation between the personality profile and the clinical history of lung cancer patients was studied. Case selection admitted to the sample only surgical patients with a medium educational level and a tested capability to understand a specific questionnaire. One hundred and seventy patients were selected, and the so-called C.R.I.C.S. (Clinical-Rated Inventory of Character Style) was applied. Score variations were recorded after curative resection as well as during relapse. Changes in the character profile pattern were found in all subjects who experienced the disease and its surgical treatment. These changes occurred either as "regression", with an increase of schizoid, narcissistic or hysterical aspects, or as a "positive evolution", with a decrease of paranoid traits and a shift into a depressive position. The two groups of responses occurred in similar percentages.
AHADI, H.; JOMEHRI, F.; RAHGOZAR, M.
2013-01-01
Summary Objectives. Despite advances in screening and treatment during the past several decades, cervical cancer remains a major health problem for Iranian women. Recent research has focused on factors related to the development of health behavior in an effort to design effective early interventions. The current study aimed to investigate the role of attachment styles in cervical cancer screening barriers among women of Bandar Abbas, Iran. Methods. In an analytic cross-sectional study, 681 women aged 21-65 referring to health centers were selected randomly and, after completing written informed consent, were assessed with the Revised Adult Attachment Scale (RAAS) (Collins and Read), a Pap smear screening barriers questionnaire, and a demographic data questionnaire. The data were analyzed by Pearson correlation coefficient, linear regression, and the chi-square test. Results. The results showed a significant association between attachment styles and screening barriers: a negative significant relation between secure attachment style and screening barriers, and a positive significant association between insecure attachment styles (anxious and avoidant) and screening barriers. The regression analysis indicated that the insecure (avoidant) attachment style was a predictor of barriers to the Pap smear screening test. There was also a significant association between age, residential area, and participation in Pap smear testing. Conclusions. Insecure attachment style is associated with hazardous risk behaviors; these results can be useful to health service providers in the preventive planning of screening, in the identification of people susceptible to risk, and in the design of interventions. PMID:24779284
Jacob, Robin; Somers, Marie-Andree; Zhu, Pei; Bloom, Howard
2016-06-01
In this article, we examine whether a well-executed comparative interrupted time series (CITS) design can produce valid inferences about the effectiveness of a school-level intervention. This article also explores the trade-off between bias reduction and precision loss across different methods of selecting comparison groups for the CITS design and assesses whether choosing matched comparison schools based only on preintervention test scores is sufficient to produce internally valid impact estimates. We conduct a validation study of the CITS design based on the federal Reading First program as implemented in one state using results from a regression discontinuity design as a causal benchmark. Our results contribute to the growing base of evidence regarding the validity of nonexperimental designs. We demonstrate that the CITS design can, in our example, produce internally valid estimates of program impacts when multiple years of preintervention outcome data (test scores in the present case) are available and when a set of reasonable criteria are used to select comparison organizations (schools in the present case). © The Author(s) 2016.
Go for broke: The role of somatic states when asked to lose in the Iowa Gambling Task.
Wright, Rebecca J; Rakow, Tim; Russo, Riccardo
2017-02-01
The Somatic Marker Hypothesis (SMH) posits that somatic states develop and guide advantageous decision making by "marking" disadvantageous options (i.e., arousal increases when poor options are considered). This assumption was tested using the standard Iowa Gambling Task (IGT) in which participants win/lose money by selecting among four decks of cards, and an alternative version, identical in both structure and payoffs, but with the aim changed to lose as much money as possible. This "lose" version of the IGT reverses which decks are advantageous/disadvantageous; and so reverses which decks should be marked by somatic responses - which we assessed via skin conductance (SC). Participants learned to pick advantageously in the original (Win) IGT and in the (new) Lose IGT. Using multilevel regression, some variability in anticipatory SC across blocks was found but no consistent effect of anticipatory SC on disadvantageous deck selections. Thus, while we successfully developed a new way to test the central claims of the SMH, we did not find consistent support for the SMH. Copyright © 2016 The Author(s). Published by Elsevier B.V. All rights reserved.
Accounting for measurement error in log regression models with applications to accelerated testing.
Richardson, Robert; Tolley, H Dennis; Evenson, William E; Lunt, Barry M
2018-01-01
In regression settings, parameter estimates will be biased when the explanatory variables are measured with error. This bias can significantly affect modeling goals. In particular, accelerated lifetime testing involves an extrapolation of the fitted model, and a small amount of bias in parameter estimates may result in a significant increase in the bias of the extrapolated predictions. Additionally, bias may arise when the stochastic component of a log regression model is assumed to be multiplicative when the actual underlying stochastic component is additive. To account for these possible sources of bias, a log regression model with measurement error and additive error is approximated by a weighted regression model which can be estimated using Iteratively Re-weighted Least Squares. Using the reduced Eyring equation in an accelerated testing setting, the model is compared to previously accepted approaches to modeling accelerated testing data with both simulations and real data.
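The IRLS machinery referred to above can be sketched generically. The particular weight function below (downweighting observations with small fitted values, as additive raw-scale error would suggest on the log scale) and the toy data are illustrative assumptions for this sketch, not the authors' derived weights:

```python
import math

def wls(xs, ys, ws):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    b = (sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
         / sum(w * (x - mx) ** 2 for w, x in zip(ws, xs)))
    return my - b * mx, b

def irls(xs, ys, weight_fn, n_iter=20):
    """Generic IRLS skeleton: start from OLS (unit weights), then
    alternate between recomputing weights from the current fitted
    values and re-solving the weighted problem."""
    ws = [1.0] * len(xs)
    for _ in range(n_iter):
        a, b = wls(xs, ys, ws)
        ws = [weight_fn(a + b * x) for x in xs]
    return a, b

# Hypothetical log-lifetime data, roughly y = x on the log scale.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.9, 2.2, 2.9, 4.1, 5.1]

# Assumed variance model: log-scale variance grows as additive raw-scale
# error (sigma_a = 1) is divided by the fitted mean exp(m).
a, b = irls(xs, ys,
            weight_fn=lambda m: math.exp(m) ** 2 / (math.exp(m) ** 2 + 1.0))
print(round(a, 2), round(b, 2))
```

The measurement-error correction in the paper enters through how the weights (and the design) are constructed; the iterative re-solving loop itself is exactly this skeleton.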
Gall, Stefanie; Müller, Ivan; Walter, Cheryl; Seelig, Harald; Steenkamp, Liana; Pühse, Uwe; du Randt, Rosa; Smith, Danielle; Adams, Larissa; Nqweniso, Siphesihle; Yap, Peiling; Ludyga, Sebastian; Steinmann, Peter; Utzinger, Jürg; Gerber, Markus
2017-05-01
Socioeconomically deprived children are at increased risk of ill-health associated with sedentary behavior, malnutrition, and helminth infection. The resulting reduced physical fitness, growth retardation, and impaired cognitive abilities may impede children's capacity to pay attention. The present study examines how socioeconomic status (SES), parasitic worm infections, stunting, food insecurity, and physical fitness are associated with selective attention and academic achievement in school-aged children. The study cohort included 835 children, aged 8-12 years, from eight primary schools in socioeconomically disadvantaged neighborhoods of Port Elizabeth, South Africa. The d2-test was utilized to assess selective attention. This is a paper and pencil letter-cancellation test consisting of randomly mixed letters d and p with one to four single and/or double quotation marks either over and/or under each letter. Children were invited to mark only the letters d that have double quotation marks. Cardiorespiratory fitness was assessed via the 20 m shuttle run test and muscle strength using the grip strength test. The Kato-Katz thick smear technique was employed to detect helminth eggs in stool samples. SES and food insecurity were determined with a pre-tested questionnaire, while end of year school results were used as an indicator of academic achievement. Children infected with soil-transmitted helminths had lower selective attention, lower school grades (academic achievement scores), and lower grip strength (all p<0.05). In a multiple regression model, low selective attention was associated with soil-transmitted helminth infection (p<0.05) and low shuttle run performance (p<0.001), whereas higher academic achievement was observed in children without soil-transmitted helminth infection (p<0.001) and with higher shuttle run performance (p<0.05). 
Soil-transmitted helminth infections and low physical fitness appear to hamper children's capacity to pay attention and thereby impede their academic performance. Poor academic achievement will make it difficult for children to realize their full potential, perpetuating a vicious cycle of poverty and poor health. ClinicalTrials.gov ISRCTN68411960.
Fenlon, Caroline; O'Grady, Luke; Butler, Stephen; Doherty, Michael L; Dunnion, John
2017-01-01
Herd fertility in pasture-based dairy farms is a key driver of farm economics. Models for predicting nulliparous reproductive outcomes are rare, but age, genetics, weight, and BCS have been identified as factors influencing heifer conception. The aim of this study was to create a simulation model of heifer conception to service with thorough evaluation. Artificial Insemination service records from two research herds and ten commercial herds were provided to build and evaluate the models. All were managed as spring-calving pasture-based systems. The factors studied were related to age, genetics, and time of service. The data were split into training and testing sets and bootstrapping was used to train the models. Logistic regression (with and without random effects) and generalised additive modelling were selected as the model-building techniques. Two types of evaluation were used to test the predictive ability of the models: discrimination and calibration. Discrimination, which includes sensitivity, specificity, accuracy and ROC analysis, measures a model's ability to distinguish between positive and negative outcomes. Calibration measures the accuracy of the predicted probabilities with the Hosmer-Lemeshow goodness-of-fit, calibration plot and calibration error. After data cleaning and the removal of services with missing values, 1396 services remained to train the models and 597 were left for testing. Age, breed, genetic predicted transmitting ability for calving interval, month and year were significant in the multivariate models. The regression models also included an interaction between age and month. Year within herd was a random effect in the mixed regression model. Overall prediction accuracy was between 77.1% and 78.9%. All three models had very high sensitivity, but low specificity. The two regression models were very well-calibrated. The mean absolute calibration errors were all below 4%. 
Because the models were not adept at identifying unsuccessful services, they are not suggested for use in predicting the outcome of individual heifer services. Instead, they are useful for the comparison of services with different covariate values or as sub-models in whole-farm simulations. The mixed regression model was identified as the best model for prediction, as the random effects can be ignored and the other variables can be easily obtained or simulated.
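The two evaluation ideas used here, discrimination and calibration, can be sketched with plain Python on invented predictions. The binning scheme below is a simplified Hosmer-Lemeshow-style grouping, not the exact goodness-of-fit statistic:

```python
def discrimination(y_true, p_pred, threshold=0.5):
    """Sensitivity, specificity, and accuracy at a probability cutoff."""
    tp = sum(1 for y, p in zip(y_true, p_pred) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, p_pred) if y == 1 and p < threshold)
    tn = sum(1 for y, p in zip(y_true, p_pred) if y == 0 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, p_pred) if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(y_true)

def calibration_error(y_true, p_pred, n_bins=5):
    """Mean absolute gap between predicted and observed event rates
    within equal-width probability bins."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, p_pred):
        bins[min(int(p * n_bins), n_bins - 1)].append((y, p))
    gaps = [abs(sum(y for y, _ in b) / len(b)
                - sum(p for _, p in b) / len(b))
            for b in bins if b]
    return sum(gaps) / len(gaps)

# Hypothetical conception outcomes and model-predicted probabilities.
y = [0, 0, 0, 1, 0, 1, 1, 1]
p = [0.10, 0.20, 0.30, 0.40, 0.45, 0.60, 0.80, 0.90]
sens, spec, acc = discrimination(y, p)
print(sens, spec, acc, round(calibration_error(y, p), 3))
```

A model can score well on one axis and poorly on the other, which is why the study reports both: the heifer models discriminated positives well (high sensitivity) yet were judged mainly on their excellent calibration.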
An Empirical Study of Eight Nonparametric Tests in Hierarchical Regression.
ERIC Educational Resources Information Center
Harwell, Michael; Serlin, Ronald C.
When normality does not hold, nonparametric tests represent an important data-analytic alternative to parametric tests. However, the use of nonparametric tests in educational research has been limited by the absence of easily performed tests for complex experimental designs and analyses, such as factorial designs and multiple regression analyses,…
Chaurasia, Ashok; Harel, Ofer
2015-02-10
Tests for regression coefficients such as global, local, and partial F-tests are common in applied research. In the framework of multiple imputation, there are several papers addressing tests for regression coefficients. However, for simultaneous hypothesis testing, the existing methods are computationally intensive because they involve calculation with vectors and (inversion of) matrices. In this paper, we propose a simple method based on the scalar entity, coefficient of determination, to perform (global, local, and partial) F-tests with multiply imputed data. The proposed method is evaluated using simulated data and applied to suicide prevention data. Copyright © 2014 John Wiley & Sons, Ltd.
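The scalar idea is easy to illustrate: both the global F and a partial F statistic can be written purely in terms of coefficients of determination, with no vectors or matrix inversion. The numeric inputs below are invented:

```python
def global_f(r2, n, k):
    """Global F statistic for H0: all k slopes are zero, from R^2 alone.
    Compare with an F distribution on (k, n - k - 1) df."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

def partial_f(r2_full, r2_reduced, n, k_full, k_reduced):
    """Partial F for the extra (k_full - k_reduced) terms."""
    num = (r2_full - r2_reduced) / (k_full - k_reduced)
    den = (1.0 - r2_full) / (n - k_full - 1)
    return num / den

# Hypothetical fit: R^2 = 0.40 from k = 3 predictors on n = 50 cases,
# versus R^2 = 0.30 from a reduced model with 1 predictor.
print(round(global_f(0.40, 50, 3), 2))
print(round(partial_f(0.40, 0.30, 50, 3, 1), 2))
```

In the multiple-imputation setting of the paper, one such scalar is computed per imputed data set and then pooled, which is what keeps the procedure computationally light.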
Dipnall, Joanna F.
2016-01-01
Background Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study. Methods The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted regression algorithm and logistic regression, to identify key biomarkers associated with depression in the National Health and Nutrition Examination Study (2009–2010). Depression was measured using the Patient Health Questionnaire-9 and 67 biomarkers were analysed. Covariates in this study included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed weighted multiple logistic regression model included possible confounders and moderators. Results After the creation of 20 imputation data sets from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, a final set of three biomarkers were selected. The final three biomarkers from the novel hybrid variable selection methodology were red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin with Mexican American/Hispanic group (p = 0.016), and current smokers (p<0.001). 
Conclusion The systematic use of a hybrid methodology for variable selection, fusing data mining techniques using a machine learning algorithm with traditional statistical modelling, accounted for missing data and complex survey sampling methodology and was demonstrated to be a useful tool for detecting three biomarkers associated with depression for future hypothesis generation: red cell distribution width, serum glucose and total bilirubin. PMID:26848571
Dipnall, Joanna F; Pasco, Julie A; Berk, Michael; Williams, Lana J; Dodd, Seetal; Jacka, Felice N; Meyer, Denny
2016-01-01
Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study. The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted regression algorithm and logistic regression, to identify key biomarkers associated with depression in the National Health and Nutrition Examination Study (2009-2010). Depression was measured using the Patient Health Questionnaire-9 and 67 biomarkers were analysed. Covariates in this study included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed weighted multiple logistic regression model included possible confounders and moderators. After the creation of 20 imputation data sets from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, a final set of three biomarkers were selected. The final three biomarkers from the novel hybrid variable selection methodology were red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin with Mexican American/Hispanic group (p = 0.016), and current smokers (p<0.001). 
The systematic use of a hybrid methodology for variable selection, fusing data mining techniques using a machine learning algorithm with traditional statistical modelling, accounted for missing data and complex survey sampling methodology and was demonstrated to be a useful tool for detecting three biomarkers associated with depression for future hypothesis generation: red cell distribution width, serum glucose and total bilirubin.
Stone, Wesley W.; Gilliom, Robert J.; Crawford, Charles G.
2008-01-01
Regression models were developed for predicting annual maximum and selected annual maximum moving-average concentrations of atrazine in streams using the Watershed Regressions for Pesticides (WARP) methodology developed by the National Water-Quality Assessment Program (NAWQA) of the U.S. Geological Survey (USGS). The current effort builds on the original WARP models, which were based on the annual mean and selected percentiles of the annual frequency distribution of atrazine concentrations. Estimates of annual maximum and annual maximum moving-average concentrations for selected durations are needed to characterize the levels of atrazine and other pesticides for comparison to specific water-quality benchmarks for evaluation of potential concerns regarding human health or aquatic life. Separate regression models were derived for the annual maximum and annual maximum 21-day, 60-day, and 90-day moving-average concentrations. Development of the regression models used the same explanatory variables, transformations, model development data, model validation data, and regression methods as those used in the original development of WARP. The models accounted for 72 to 75 percent of the variability in the concentration statistics among the 112 sampling sites used for model development. Predicted concentration statistics from the four models were within a factor of 10 of the observed concentration statistics for most of the model development and validation sites. Overall, performance of the models for the development and validation sites supports the application of the WARP models for predicting annual maximum and selected annual maximum moving-average atrazine concentration in streams and provides a framework to interpret the predictions in terms of uncertainty. 
For streams with inadequate direct measurements of atrazine concentrations, the WARP model predictions for the annual maximum and the annual maximum moving-average atrazine concentrations can be used to characterize the probable levels of atrazine for comparison to specific water-quality benchmarks. Sites with a high probability of exceeding a benchmark for human health or aquatic life can be prioritized for monitoring.
An improved portmanteau test for autocorrelated errors in interrupted time-series regression models.
Huitema, Bradley E; McKean, Joseph W
2007-08-01
A new portmanteau test for autocorrelation among the errors of interrupted time-series regression models is proposed. Simulation results demonstrate that the inferential properties of the proposed Q(H-M) test statistic are considerably more satisfactory than those of the well-known Ljung-Box test and moderately better than those of the Box-Pierce test. These conclusions generally hold for a wide variety of autoregressive (AR), moving average (MA), and ARMA error processes that are associated with time-series regression models of the form described in Huitema and McKean (2000a, 2000b).
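For reference, the classical Ljung-Box statistic against which the proposed Q(H-M) test is benchmarked can be computed directly from residual autocorrelations. The sinusoidal "residuals" below are a synthetic stand-in for strongly autocorrelated regression errors:

```python
import math

def autocorr(resid, lag):
    """Sample autocorrelation of a residual series at a given lag."""
    n = len(resid)
    m = sum(resid) / n
    d = [r - m for r in resid]
    return (sum(d[t] * d[t - lag] for t in range(lag, n))
            / sum(x * x for x in d))

def ljung_box_q(resid, max_lag):
    """Ljung-Box portmanteau statistic: under the null of white-noise
    errors it is approximately chi-square with max_lag df, so a large
    Q signals autocorrelated errors."""
    n = len(resid)
    return n * (n + 2) * sum(autocorr(resid, k) ** 2 / (n - k)
                             for k in range(1, max_lag + 1))

# Hypothetical regression residuals with obvious serial structure.
resid = [math.sin(t / 3.0) for t in range(40)]
q = ljung_box_q(resid, max_lag=3)
print(round(q, 1))   # far above the 5% chi-square cutoff of 7.81 for 3 df
```

The Box-Pierce statistic is the same sum without the n(n+2)/(n-k) small-sample correction; the paper's Q(H-M) variant modifies this family further for the interrupted time-series design.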
Use and interpretation of logistic regression in habitat-selection studies
Keating, Kim A.; Cherry, Steve
2004-01-01
Logistic regression is an important tool for wildlife habitat-selection studies, but the method frequently has been misapplied due to an inadequate understanding of the logistic model, its interpretation, and the influence of sampling design. To promote better use of this method, we review its application and interpretation under 3 sampling designs: random, case-control, and use-availability. Logistic regression is appropriate for habitat use-nonuse studies employing random sampling and can be used to directly model the conditional probability of use in such cases. Logistic regression also is appropriate for studies employing case-control sampling designs, but careful attention is required to interpret results correctly. Unless bias can be estimated or probability of use is small for all habitats, results of case-control studies should be interpreted as odds ratios, rather than probability of use or relative probability of use. When data are gathered under a use-availability design, logistic regression can be used to estimate approximate odds ratios if probability of use is small, at least on average. More generally, however, logistic regression is inappropriate for modeling habitat selection in use-availability studies. In particular, using logistic regression to fit the exponential model of Manly et al. (2002:100) does not guarantee maximum-likelihood estimates, valid probabilities, or valid likelihoods. We show that the resource selection function (RSF) commonly used for the exponential model is proportional to a logistic discriminant function. Thus, it may be used to rank habitats with respect to probability of use and to identify important habitat characteristics or their surrogates, but it is not guaranteed to be proportional to probability of use. Other problems associated with the exponential model also are discussed. 
We describe an alternative model based on Lancaster and Imbens (1996) that offers a method for estimating conditional probability of use in use-availability studies. Although promising, this model fails to converge to a unique solution in some important situations. Further work is needed to obtain a robust method that is broadly applicable to use-availability studies.
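The case-control caveat above is concrete: a fitted logistic coefficient is safely reported as an odds ratio rather than a probability of use. A minimal sketch (the coefficient value is hypothetical):

```python
import math

def odds_ratio(beta, delta=1.0):
    """Multiplicative change in the odds of use per `delta`-unit covariate increase."""
    return math.exp(beta * delta)

# Hypothetical fitted logistic coefficient for, say, canopy cover:
print(round(odds_ratio(0.7), 2))  # odds of use roughly double per unit increase
```

Only when the probability of use is small for all habitats does the odds ratio approximate a relative probability of use, which is the condition the authors emphasize.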
Continuum radiation from active galactic nuclei: A statistical study
NASA Technical Reports Server (NTRS)
Isobe, T.; Feigelson, E. D.; Singh, K. P.; Kembhavi, A.
1986-01-01
The physics of the continuum spectrum of active galactic nuclei (AGNs) was examined using a large data set and rigorous statistical methods. A database was constructed for 469 objects, including radio-selected quasars, optically selected quasars, X-ray-selected AGNs, BL Lac objects, and optically unidentified compact radio sources. Each object has measurements of its radio, optical, and X-ray core continuum luminosities, though many of them are upper limits. Since many radio sources have extended components, the core component was carefully separated from the total radio luminosity. With survival analysis statistical methods, which can treat upper limits correctly, these data can yield better statistical results than those previously obtained. A variety of statistical tests are performed, such as the comparison of the luminosity functions in different subsamples, and linear regressions of luminosities in different bands. Interpretation of the results leads to the following tentative conclusions: the main emission mechanism of optically selected quasars and X-ray-selected AGNs is thermal, while that of BL Lac objects is synchrotron; radio-selected quasars may have two different emission mechanisms in the X-ray band; BL Lac objects appear to be special cases of the radio-selected quasars; some compact radio sources show the possibility of synchrotron self-Compton (SSC) emission in the optical band; and the spectral index between the optical and the X-ray bands depends on the optical luminosity.
Fatemi, Mohammad Hossein; Ghorbanzad'e, Mehdi
2009-11-01
Quantitative structure-property relationship models for the prediction of the nematic transition temperature (T(N)) were developed by using multilinear regression analysis and a feedforward artificial neural network (ANN). A collection of 42 thermotropic liquid crystals was chosen as the data set. The data set was divided into three sets: a training set, an internal test set, and an external test set. The training and internal test sets were used for ANN model development, and the external test set was used for evaluation of the predictive power of the model. In order to build the models, a set of six descriptors was selected by the best multilinear regression procedure of the CODESSA program. These descriptors were: atomic charge weighted partial negatively charged surface area, relative negative charged surface area, polarity parameter/square distance, minimum most negative atomic partial charge, molecular volume, and the A component of the moment of inertia, which encode geometrical and electronic characteristics of molecules. These descriptors were used as inputs to the ANN. The optimized ANN model had a 6:6:1 topology. The standard errors in the calculation of T(N) for the training, internal, and external test sets using the ANN model were 1.012, 4.910, and 4.070, respectively. To further evaluate the ANN model, a cross-validation test was performed, which produced the statistic Q(2) = 0.9796 and a standard deviation of 2.67 based on the predicted residual sum of squares. Also, a diversity test was performed to ensure the model's stability and prove its predictive capability. The obtained results reveal the suitability of ANN for the prediction of T(N) for liquid crystals using molecular structural descriptors.
Sala, Isabel; Illán-Gala, Ignacio; Alcolea, Daniel; Sánchez-Saudinós, Ma Belén; Salgado, Sergio Andrés; Morenas-Rodríguez, Estrella; Subirana, Andrea; Videla, Laura; Clarimón, Jordi; Carmona-Iragui, María; Ribosa-Nogué, Roser; Blesa, Rafael; Fortea, Juan; Lleó, Alberto
2017-01-01
Episodic memory impairment is the core feature of typical Alzheimer's disease. To evaluate the performance of two commonly used verbal memory tests to detect mild cognitive impairment due to Alzheimer's disease (MCI-AD) and to predict progression to Alzheimer's disease dementia (AD-d). Prospective study of MCI patients in a tertiary memory disorder unit. Patients underwent an extensive neuropsychological battery including two tests of declarative verbal memory: The Free and Cued Selective Reminding Test (FCSRT) and the word list learning task from the Consortium to Establish a Registry for Alzheimer's disease (CERAD-WL). Cerebrospinal fluid (CSF) was obtained from all patients and MCI-AD was defined by means of the t-Tau/Aβ1-42 ratio. Logistic regression analyses tested whether the combination of FCSRT and CERAD-WL measures significantly improved the prediction of MCI-AD. Progression to AD-d was analyzed in a Cox regression model. A total of 202 MCI patients with a mean follow-up of 34.2±24.2 months were included and 98 (48.5%) met the criteria for MCI-AD. The combination of FCSRT and CERAD-WL measures improved MCI-AD classification accuracy based on CSF biomarkers. Both tests yielded similar global predictive values (59.9-65.3% and 59.4-62.8% for FCSRT and CERAD-WL, respectively). MCI-AD patients with deficits in both FCSRT and CERAD-WL had a faster progression to AD-d than patients with deficits in only one test. The combination of FCSRT and CERAD-WL improves the classification of MCI-AD and defines different prognostic profiles. These findings have important implications for clinical practice and the design of clinical trials.
Asquith, William H.; Slade, R.M.
1999-01-01
The U.S. Geological Survey, in cooperation with the Texas Department of Transportation, has developed a computer program to estimate peak-streamflow frequency for ungaged sites in natural basins in Texas. Peak-streamflow frequency refers to the peak streamflows for recurrence intervals of 2, 5, 10, 25, 50, and 100 years. Peak-streamflow frequency estimates are needed by planners, managers, and design engineers for flood-plain management; for objective assessment of flood risk; for cost-effective design of roads and bridges; and for the design of culverts, dams, levees, and other flood-control structures. The program estimates peak-streamflow frequency using a site-specific approach and a multivariate generalized least-squares linear regression. A site-specific approach differs from a traditional regional regression approach by developing unique equations to estimate peak-streamflow frequency specifically for the ungaged site. The stations included in the regression are selected using an informal cluster analysis that compares the basin characteristics of the ungaged site to the basin characteristics of all the stations in the database. The program provides several choices for selecting the stations. Selecting the stations using cluster analysis ensures that the stations included in the regression will have the most pertinent information about flooding characteristics of the ungaged site and therefore provide the basis for potentially improved peak-streamflow frequency estimation. An evaluation of the site-specific approach in estimating peak-streamflow frequency for gaged sites indicates that the site-specific approach is at least as accurate as a traditional regional regression approach.
Determinants of customer satisfaction with hospitals: a managerial model.
Andaleeb, S S
1998-01-01
States that rapid changes in the environment have exerted significant pressures on hospitals to incorporate patient satisfaction in their strategic stance and quest for market share and long-term viability. This study proposes and tests a five-factor model that explains considerable variation in customer satisfaction with hospitals. These factors include communication with patients, competence of the staff, their demeanour, quality of the facilities, and perceived costs; they also represent strategic concepts that managers can address in their bid to remain competitive. A probability sample was selected and a multiple regression model used to test the hypotheses. The results indicate that all five variables were significant in the model and explained 62 per cent of the variation in the dependent variable. Managerial implications of the proposed model are discussed.
Random Bits Forest: a Strong Classifier/Regressor for Big Data
NASA Astrophysics Data System (ADS)
Wang, Yi; Li, Yi; Pu, Weilin; Wen, Kathryn; Shugart, Yin Yao; Xiong, Momiao; Jin, Li
2016-07-01
Efficiency, memory consumption, and robustness are common problems with many popular methods for data analysis. As a solution, we present Random Bits Forest (RBF), a classification and regression algorithm that integrates neural networks (for depth), boosting (for width), and random forests (for prediction accuracy). Through a gradient boosting scheme, it first generates and selects ~10,000 small, 3-layer random neural networks. These networks are then fed into a modified random forest algorithm to obtain predictions. Testing with datasets from the UCI (University of California, Irvine) Machine Learning Repository shows that RBF outperforms other popular methods in both accuracy and robustness, especially with large datasets (N > 1000). The algorithm also performed well in testing with an independent data set, a real psoriasis genome-wide association study (GWAS).
Zabelina, Darya L; O'Leary, Daniel; Pornpattananangkul, Narun; Nusslock, Robin; Beeman, Mark
2015-03-01
Creativity has previously been linked with atypical attention, but it is not clear what aspects of attention, or what types of creativity are associated. Here we investigated specific neural markers of a very early form of attention, namely sensory gating, indexed by the P50 ERP, and how it relates to two measures of creativity: divergent thinking and real-world creative achievement. Data from 84 participants revealed that divergent thinking (assessed with the Torrance Test of Creative Thinking) was associated with selective sensory gating, whereas real-world creative achievement was associated with "leaky" sensory gating, both in zero-order correlations and when controlling for academic test scores in a regression. Thus both creativity measures related to sensory gating, but in opposite directions. Additionally, divergent thinking and real-world creative achievement did not interact in predicting P50 sensory gating, suggesting that these two creativity measures orthogonally relate to P50 sensory gating. Finally, the ERP effect was specific to the P50 - neither divergent thinking nor creative achievement were related to later components, such as the N100 and P200. Overall results suggest that leaky sensory gating may help people integrate ideas that are outside of focus of attention, leading to creativity in the real world; whereas divergent thinking, measured by divergent thinking tests which emphasize numerous responses within a limited time, may require selective sensory processing more than previously thought. Copyright © 2015 Elsevier Ltd. All rights reserved.
Schörgendorfer, Angela; Branscum, Adam J; Hanson, Timothy E
2013-06-01
Logistic regression is a popular tool for risk analysis in medical and population health science. With continuous response data, it is common to create a dichotomous outcome for logistic regression analysis by specifying a threshold for positivity. Fitting a linear regression to the nondichotomized response variable assuming a logistic sampling model for the data has been empirically shown to yield more efficient estimates of odds ratios than ordinary logistic regression of the dichotomized endpoint. We illustrate that risk inference is not robust to departures from the parametric logistic distribution. Moreover, the model assumption of proportional odds is generally not satisfied when the condition of a logistic distribution for the data is violated, leading to biased inference from a parametric logistic analysis. We develop novel Bayesian semiparametric methodology for testing goodness of fit of parametric logistic regression with continuous measurement data. The testing procedures hold for any cutoff threshold and our approach simultaneously provides the ability to perform semiparametric risk estimation. Bayes factors are calculated using the Savage-Dickey ratio for testing the null hypothesis of logistic regression versus a semiparametric generalization. We propose a fully Bayesian and a computationally efficient empirical Bayesian approach to testing, and we present methods for semiparametric estimation of risks, relative risks, and odds ratios when parametric logistic regression fails. Theoretical results establish the consistency of the empirical Bayes test. Results from simulated data show that the proposed approach provides accurate inference irrespective of whether parametric assumptions hold or not. Evaluation of risk factors for obesity shows that different inferences are derived from an analysis of a real data set when deviations from a logistic distribution are permissible in a flexible semiparametric framework. 
© 2013, The International Biometric Society.
Farhate, Camila Viana Vieira; Souza, Zigomar Menezes de; Oliveira, Stanley Robson de Medeiros; Tavares, Rose Luiza Moraes; Carvalho, João Luís Nunes
2018-01-01
Soil CO2 emissions are regarded as one of the largest flows of the global carbon cycle, and small changes in their magnitude can have a large effect on the CO2 concentration in the atmosphere. Thus, a better understanding of this attribute would enable the identification of promoters and the development of strategies to mitigate the risks of climate change. Therefore, our study aimed at using data mining techniques to predict the soil CO2 emission induced by crop management in sugarcane areas in Brazil. To do so, we used different variable selection methods (correlation, chi-square, wrapper) and classification methods (decision tree, Bayesian models, neural networks, support vector machine, bagging with logistic regression), and finally we tested the efficiency of the different approaches through the Receiver Operating Characteristic (ROC) curve. The original dataset consisted of 19 variables (18 independent variables and one dependent (or response) variable). The combination of cover crops and minimum tillage is an effective strategy to promote the mitigation of soil CO2 emissions, with average CO2 emissions of 63 kg ha-1 day-1. The variables soil moisture, soil temperature (Ts), rainfall, pH, and organic carbon were most frequently selected for soil CO2 emission classification using the different methods for attribute selection. According to the results of the ROC curve, the best approaches for soil CO2 emission classification were the following: (I) the Multilayer Perceptron classifier with attribute selection through the wrapper method, which presented a false positive rate of 13.50%, a true positive rate of 94.20%, and an area under the curve (AUC) of 89.90%; and (II) the Bagging classifier with logistic regression with attribute selection through the chi-square method, which presented a false positive rate of 13.50%, a true positive rate of 94.20%, and an AUC of 89.90%.
However, approach (I) stands out relative to approach (II) for its higher positive-class accuracy (high CO2 emission) and lower computational cost.
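The AUC figures reported above have a simple probabilistic reading: the AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. An illustrative sketch (not the authors' pipeline):

```python
def auc(scores, labels):
    """AUC as P(random positive outscores random negative); ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Perfect separation of high- (1) vs low- (0) emission classes:
print(auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]))  # → 1.0
```

An AUC of 0.899, as for both classifiers here, means a randomly drawn high-emission sample outranks a randomly drawn low-emission sample about 90% of the time.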
Application-level regression testing framework using Jenkins
Budiardja, Reuben; Bouvet, Timothy; Arnold, Galen
2017-09-26
Monitoring and testing for regression of large-scale systems such as the NCSA's Blue Waters supercomputer are challenging tasks. In this paper, we describe the solution we came up with to perform those tasks. The goal was to find an automated solution for running user-level regression tests to evaluate system usability and performance. Jenkins, an automation server software, was chosen for its versatility, large user base, and multitude of plugins, including plugins for collecting data and plotting test results over time. We also describe our Jenkins deployment to launch and monitor jobs on remote HPC systems, perform authentication with one-time passwords, and integrate with our LDAP server for authorization. We show some use cases and describe our best practices for successfully using Jenkins as a user-level, system-wide regression testing and monitoring framework for large supercomputer systems.
A probabilistic and multi-objective analysis of lexicase selection and ε-lexicase selection.
Cava, William La; Helmuth, Thomas; Spector, Lee; Moore, Jason H
2018-05-10
Lexicase selection is a parent selection method that considers training cases individually, rather than in aggregate, when performing parent selection. Whereas previous work has demonstrated the ability of lexicase selection to solve difficult problems in program synthesis and symbolic regression, the central goal of this paper is to develop the theoretical underpinnings that explain its performance. To this end, we derive an analytical formula that gives the expected probabilities of selection under lexicase selection, given a population and its behavior. In addition, we expand upon the relation of lexicase selection to many-objective optimization methods to describe the behavior of lexicase selection, which is to select individuals on the boundaries of Pareto fronts in high-dimensional space. We show analytically why lexicase selection performs more poorly for certain sizes of population and training cases, and show why it has been shown to perform more poorly in continuous error spaces. To address this last concern, we propose new variants of ε-lexicase selection, a method that modifies the pass condition in lexicase selection to allow near-elite individuals to pass cases, thereby improving selection performance with continuous errors. We show that ε-lexicase outperforms several diversity-maintenance strategies on a number of real-world and synthetic regression problems.
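A minimal operational sketch of lexicase selection as described above (variable names are ours; the paper's analytical selection-probability formula is not reproduced):

```python
import random

def lexicase_select(population, errors, rng):
    """errors[i][j]: error of individual i on training case j; returns one parent."""
    candidates = list(range(len(population)))
    cases = list(range(len(errors[0])))
    rng.shuffle(cases)  # cases are considered individually, in random order
    for c in cases:
        # keep only individuals elite on this case
        best = min(errors[i][c] for i in candidates)
        candidates = [i for i in candidates if errors[i][c] == best]
        if len(candidates) == 1:
            break
    return population[rng.choice(candidates)]

pop = ["a", "b", "c"]
errs = [[0, 1], [1, 0], [1, 1]]  # "c" is elite on no case, so it is never selected
print(lexicase_select(pop, errs, random.Random(0)))
```

The ε-lexicase variants discussed in the paper relax the elitism filter, keeping individuals whose error is within ε of the best on each case, which is what makes the method usable with continuous errors.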
A Method for Calculating the Probability of Successfully Completing a Rocket Propulsion Ground Test
NASA Technical Reports Server (NTRS)
Messer, Bradley
2007-01-01
Propulsion ground test facilities face the daily challenge of scheduling multiple customers into limited facility space and successfully completing their propulsion test projects. Over the last decade NASA's propulsion test facilities have performed hundreds of tests, collected thousands of seconds of test data, and exceeded the capabilities of numerous test facility and test article components. A logistic regression mathematical modeling technique has been developed to predict the probability of successfully completing a rocket propulsion test. A logistic regression model is a mathematical modeling approach that can be used to describe the relationship of several independent predictor variables X1, X2, ..., Xk to a binary or dichotomous dependent variable Y, where Y can only be one of two possible outcomes, in this case success or failure of accomplishing a full-duration test. The use of logistic regression modeling is not new; however, modeling propulsion ground test facilities using logistic regression is both a new and unique application of the statistical technique. Results from this type of model provide project managers with insight and confidence into the effectiveness of rocket propulsion ground testing.
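The form of such a model is standard: predictors enter a linear combination that is mapped through the logistic function to a probability. A sketch with entirely hypothetical predictors and coefficients (the paper's fitted model is not reproduced here):

```python
import math

def success_probability(x, intercept, coefs):
    """P(Y = success) under a logistic model with predictor values x."""
    z = intercept + sum(b * v for b, v in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical predictors: planned test duration (s) and active facility components
p = success_probability([300, 12], intercept=4.0, coefs=[-0.005, -0.1])
print(round(p, 3))  # a probability strictly between 0 and 1
```

The output is the quantity a project manager would consult before committing a facility to a full-duration test.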
Optimizing methods for linking cinematic features to fMRI data.
Kauttonen, Janne; Hlushchuk, Yevhen; Tikka, Pia
2015-04-15
One of the challenges of naturalistic neurosciences using movie-viewing experiments is how to interpret observed brain activations in relation to the multiplicity of time-locked stimulus features. As previous studies have shown less inter-subject synchronization across viewers of random video footage than story-driven films, new methods need to be developed for analysis of less story-driven contents. To optimize the linkage between our fMRI data collected during viewing of a deliberately non-narrative silent film 'At Land' by Maya Deren (1944) and its annotated content, we combined the method of elastic-net regularization with model-driven linear regression and the well-established data-driven independent component analysis (ICA) and inter-subject correlation (ISC) methods. In the linear regression analysis, both IC and region-of-interest (ROI) time-series were fitted with time-series of a total of 36 binary-valued and one real-valued tactile annotation of film features. Elastic-net regularization and cross-validation were applied in the ordinary least-squares linear regression in order to avoid over-fitting due to the multicollinearity of regressors; the results were compared against both partial least-squares (PLS) regression and un-regularized full-model regression. A non-parametric permutation testing scheme was applied to evaluate the statistical significance of the regression. We found statistically significant correlation between the annotation model and 9 of the 40 ICs. The regression analysis was also repeated for a large set of cubic ROIs covering the grey matter. Both IC- and ROI-based regression analyses revealed activations in parietal and occipital regions, with additional smaller clusters in the frontal lobe. Furthermore, we found elastic-net-based regression more sensitive than PLS and un-regularized regression, since it detected a larger number of significant ICs and ROIs.
Along with the ISC ranking methods, our regression analysis proved a feasible method for ordering the ICs based on their functional relevance to the annotated cinematic features. The novelty of our method is - in comparison to the hypothesis-driven manual pre-selection and observation of some individual regressors biased by choice - in applying data-driven approach to all content features simultaneously. We found especially the combination of regularized regression and ICA useful when analyzing fMRI data obtained using non-narrative movie stimulus with a large set of complex and correlated features. Copyright © 2015. Published by Elsevier Inc.
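The elastic-net penalty that guards against regressor multicollinearity can be written down compactly. A sketch of the penalized least-squares objective (scikit-learn-style parameterization assumed; not the authors' analysis code):

```python
def elastic_net_loss(beta, X, y, alpha, l1_ratio):
    """Mean squared error plus the elastic-net penalty on the coefficients."""
    n = len(y)
    resid = [y[i] - sum(b * x for b, x in zip(beta, X[i])) for i in range(n)]
    sse = sum(r * r for r in resid) / (2 * n)
    l1 = sum(abs(b) for b in beta)        # lasso part: drives coefficients to zero
    l2 = sum(b * b for b in beta) / 2     # ridge part: shrinks correlated groups
    return sse + alpha * (l1_ratio * l1 + (1 - l1_ratio) * l2)

X, y = [[1.0], [2.0]], [1.0, 2.0]
print(elastic_net_loss([1.0], X, y, alpha=0.0, l1_ratio=0.5))  # → 0.0 (no penalty)
print(elastic_net_loss([1.0], X, y, alpha=1.0, l1_ratio=0.5))  # penalty added
```

With many correlated annotation regressors, minimizing this objective (with alpha chosen by cross-validation) trades a little bias for much lower variance, which is the over-fitting protection described above.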
[Cervical screening: toward a new paradigm?].
Lavoué, V; Bergeron, C; Riethmuller, D; Daraï, E; Mergui, J-L; Baldauf, J-J; Gondry, J; Douvier, S; Lopès, P; de Reilhac, P; Quéreux, C; Letombe, B; Marchetta, J; Boulanger, J-C; Levêque, J
2010-04-01
Analysis of the trials comparing virologic testing (HPV testing) and cytology in cervical screening. The MedLine database was consulted using the keywords: "cervical screening", "pap smear", "liquid based cytology", "HPV testing", "adults", "adolescents", "cervical intraepithelial neoplasia (CIN)", "uterine cervix cancer". Articles were selected according to their relevance to the debate on uterine cervix cancer screening in France. HPV testing seems interesting in that it shortens the delay to diagnosis of CIN (more diagnoses of CIN2+ in the first round and fewer during the second one). However, when the two rounds are added together, the numbers of CIN2+ are identical in the two arms (cytology and HPV testing) in all the trials (except the Italian NTCC trial). A negative HPV test protects women much longer than cytology can: a delay of five years between two rounds seems ideal. HPV testing alone increases the detection rate of cervical lesions that could regress spontaneously, and may induce overtreatment, especially in the youngest population: a triage is necessary, and cytology appears to be the best way to select candidates for colposcopy in case of positive HPV testing and cytology. HPV infection presents some particularities in adolescent females: for this reason, HPV testing should not be used in this special population. In vaccinated women, a consensus on screening is necessary. Health care providers in France have to understand the characteristics of HPV testing: its advantages over cytologic screening are evident only if screening is organized in France, and even in Europe. (c) 2010 Elsevier Masson SAS. All rights reserved.
Importance of spatial autocorrelation in modeling bird distributions at a continental scale
Bahn, V.; O'Connor, R.J.; Krohn, W.B.
2006-01-01
Spatial autocorrelation in species' distributions has been recognized as inflating the probability of a type I error in hypothesis tests, causing biases in variable selection, and violating the assumption of independence of error terms in models such as correlation or regression. However, it remains unclear whether these problems occur at all spatial resolutions and extents, and under which conditions spatially explicit modeling techniques are superior. Our goal was to determine whether spatial models were superior at large extents and across many different species. In addition, we investigated the importance of purely spatial effects in distribution patterns relative to the variation that could be explained through environmental conditions. We studied distribution patterns of 108 bird species in the conterminous United States using ten years of data from the Breeding Bird Survey. We compared the performance of spatially explicit regression models with non-spatial regression models using Akaike's information criterion. In addition, we partitioned the variance in species distributions into an environmental, a purely spatial, and a shared component. The spatially explicit conditional autoregressive regression models strongly outperformed the ordinary least squares regression models. In addition, partialling out the spatial component underlying the species' distributions showed that an average of 17% of the explained variation could be attributed to purely spatial effects independent of the spatial autocorrelation induced by the underlying environmental variables. We concluded that location in the range and neighborhood play an important role in the distribution of species. Spatially explicit models are expected to yield better predictions, especially for mobile species such as birds, even in coarse-grained models with a large extent. © Ecography.
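Spatial autocorrelation of the kind at issue here is commonly quantified with Moran's I (a standard diagnostic; the abstract does not state that the authors used it). A self-contained sketch:

```python
def morans_i(values, weights):
    """Moran's I for values at n sites given an n-by-n spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    w_sum = sum(sum(row) for row in weights)
    denom = sum(d * d for d in dev)
    return (n / w_sum) * (num / denom)

# Four sites in a line, rook adjacency; a smooth gradient is positively autocorrelated:
w = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
print(morans_i([1.0, 2.0, 3.0, 4.0], w))  # ≈ 0.333 (positive autocorrelation)
```

Values near zero indicate spatial independence; strongly positive values signal the residual dependence that motivates conditional autoregressive models over ordinary least squares.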
Vindimian, Éric; Garric, Jeanne; Flammarion, Patrick; Thybaud, Éric; Babut, Marc
1999-10-01
The evaluation of the ecotoxicity of effluents requires a battery of biological tests on several species. In order to derive a summary parameter from such a battery, a single endpoint was calculated for all the tests: the EC10, obtained by nonlinear regression, with bootstrap evaluation of the confidence intervals. Principal component analysis was used to characterize and visualize the correlation between the tests. The table of the toxicity of the effluents was then submitted to a panel of experts, who classified the effluents according to the test results. Partial least squares (PLS) regression was used to fit the average value of the experts' judgements to the toxicity data, using a simple equation. Furthermore, PLS regression on partial data sets and other considerations resulted in an optimum battery, with two chronic tests and one acute test. The index is intended to be used for the classification of effluents based on their toxicity to aquatic species. Copyright © 1999 SETAC.
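For concreteness, the EC10 endpoint can be derived from an assumed two-parameter log-logistic concentration-response model, with a percentile-bootstrap confidence interval of the general kind described above (the model form, parameterization, and names are our assumptions, not the authors' code):

```python
import random

def ec_p(ec50, slope, p):
    """Concentration giving p% effect when effect(c) = 1 - 1/(1 + (c/ec50)**slope)."""
    return ec50 * (p / (100.0 - p)) ** (1.0 / slope)

def bootstrap_ci(samples, stat, n_boot=2000, alpha=0.10, seed=1):
    """Percentile bootstrap confidence interval for stat(samples)."""
    rng = random.Random(seed)
    reps = sorted(stat([rng.choice(samples) for _ in samples])
                  for _ in range(n_boot))
    return reps[int(n_boot * alpha / 2)], reps[int(n_boot * (1 - alpha / 2))]

print(ec_p(5.0, 2.0, 50))  # → 5.0 (the EC50 itself is recovered)
print(ec_p(5.0, 2.0, 10))  # the EC10 lies below the EC50
```

Using a common low-effect endpoint such as the EC10 across all tests in the battery is what makes the results comparable enough to feed a single PLS-derived index.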