NASA Astrophysics Data System (ADS)
Das, Siddhartha; Siopsis, George; Weedbrook, Christian
2018-02-01
With the significant advancement in quantum computation during the past couple of decades, the exploration of machine-learning subroutines using quantum strategies has become increasingly popular. Gaussian process regression is a widely used technique in supervised classical machine learning. Here we introduce an algorithm for Gaussian process regression using continuous-variable quantum systems that can be realized with technology based on photonic quantum computers, under certain assumptions regarding the distribution of the data and the availability of efficient quantum access. Our algorithm shows that by using a continuous-variable quantum computer a dramatic speedup in computing Gaussian process regression can be achieved, i.e., the time to compute can potentially be reduced exponentially. Furthermore, our results also include a continuous-variable quantum-assisted singular value decomposition method for nonsparse low-rank matrices, which forms an important subroutine in our Gaussian process regression algorithm.
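For orientation, the classical computation that the quantum algorithm targets is the kernel-matrix linear solve at the heart of Gaussian process regression. A minimal classical sketch (NumPy only; kernel choice, hyperparameters, and data are illustrative assumptions, not the paper's method) is:

```python
# Minimal classical Gaussian process regression sketch (NumPy only).
# The O(n^3) linear solve below is the step the quantum algorithm aims
# to accelerate; kernel and hyperparameters here are illustrative.
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale**2)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))           # training inputs
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
Xs = np.linspace(-3, 3, 200)[:, None]          # test inputs

sigma2 = 0.1**2                                # noise variance
K = rbf_kernel(X, X) + sigma2 * np.eye(len(X))
alpha = np.linalg.solve(K, y)                  # dominant cost: O(n^3)
Ks = rbf_kernel(Xs, X)
mean = Ks @ alpha                              # posterior predictive mean
cov = rbf_kernel(Xs, Xs) - Ks @ np.linalg.solve(K, Ks.T)
```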
Interpreting Bivariate Regression Coefficients: Going beyond the Average
ERIC Educational Resources Information Center
Halcoussis, Dennis; Phillips, G. Michael
2010-01-01
Statistics, econometrics, investment analysis, and data analysis classes often review the calculation of several types of averages, including the arithmetic mean, geometric mean, harmonic mean, and various weighted averages. This note shows how each of these can be computed using a basic regression framework. By recognizing when a regression model…
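Since the abstract is truncated, the exact construction in the note is not reproduced here; one standard way the idea works is that an intercept-only regression recovers the arithmetic mean, the same fit on transformed data recovers the geometric and harmonic means, and a weighted least-squares intercept recovers a weighted average. A sketch under those assumptions:

```python
# Intercept-only regressions recovering common averages (one plausible
# construction; the note's exact framework may differ).
import numpy as np

x = np.array([2.0, 4.0, 8.0])
w = np.array([1.0, 2.0, 3.0])                       # example weights
ones = np.ones_like(x)

beta_ols = np.linalg.lstsq(ones[:, None], x, rcond=None)[0][0]
print(beta_ols, x.mean())                           # arithmetic mean

log_fit = np.linalg.lstsq(ones[:, None], np.log(x), rcond=None)[0][0]
print(np.exp(log_fit))                              # geometric mean

inv_fit = np.linalg.lstsq(ones[:, None], 1.0 / x, rcond=None)[0][0]
print(1.0 / inv_fit)                                # harmonic mean

# Weighted least squares with only an intercept -> weighted mean.
W = np.sqrt(w)
beta_wls = np.linalg.lstsq((W * ones)[:, None], W * x, rcond=None)[0][0]
print(beta_wls, np.average(x, weights=w))
```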
An Effect Size for Regression Predictors in Meta-Analysis
ERIC Educational Resources Information Center
Aloe, Ariel M.; Becker, Betsy Jane
2012-01-01
A new effect size representing the predictive power of an independent variable from a multiple regression model is presented. The index, denoted as r[subscript sp], is the semipartial correlation of the predictor with the outcome of interest. This effect size can be computed when multiple predictor variables are included in the regression model…
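A common way to compute the semipartial correlation is from the increment in R-squared when the focal predictor is added to the model containing the other predictors, with the sign taken from its regression coefficient. A brief sketch with simulated, illustrative data (not the article's notation):

```python
# Semipartial (part) correlation of one predictor with the outcome,
# computed from the increment in R^2 when that predictor is added
# (a standard identity; the data here are illustrative).
import numpy as np

def r_squared(X, y):
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

rng = np.random.default_rng(1)
n = 200
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
y = 0.5 * x1 + 0.3 * x2 + rng.standard_normal(n)

X_full = np.column_stack([x1, x2])
r2_full = r_squared(X_full, y)
r2_reduced = r_squared(x2[:, None], y)              # model without x1
b1 = np.linalg.lstsq(np.column_stack([np.ones(n), X_full]), y, rcond=None)[0][1]
r_sp = np.sign(b1) * np.sqrt(r2_full - r2_reduced)  # semipartial r for x1
```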
NASA Technical Reports Server (NTRS)
Boyce, Lola; Bast, Callie C.
1992-01-01
The research included ongoing development of methodology that provides probabilistic lifetime strength of aerospace materials via computational simulation. A probabilistic material strength degradation model, in the form of a randomized multifactor interaction equation, is postulated for strength degradation of structural components of aerospace propulsion systems subjected to a number of effects or primitive variables. These primitive variables may include high temperature, fatigue, or creep. In most cases, strength is reduced as a result of the action of a variable. This multifactor interaction strength degradation equation has been randomized and is included in the computer program, PROMISS. Also included in the research is the development of methodology to calibrate the above-described constitutive equation using actual experimental materials data together with linear regression of that data, thereby predicting values for the empirical material constants for each effect or primitive variable. This regression methodology is included in the computer program, PROMISC. Actual experimental materials data were obtained from the open literature for materials typically of interest to those studying aerospace propulsion system components. Material data for Inconel 718 were analyzed using the developed methodology.
Multilayer perceptron for robust nonlinear interval regression analysis using genetic algorithms.
Hu, Yi-Chung
2014-01-01
On the basis of fuzzy regression, computational intelligence models such as neural networks can be applied to nonlinear interval regression analysis for dealing with uncertain and imprecise data. When training data are not contaminated by outliers, such models perform well by including almost all given training data in the data interval. Nevertheless, since training data are often corrupted by outliers, robust learning algorithms that resist outliers in interval regression analysis have been an interesting area of research. Several computational-intelligence approaches are effective for resisting outliers, but their required parameters depend on whether the collected data contain outliers or not. Since it seems difficult to prespecify the degree of contamination beforehand, this paper uses a multilayer perceptron to construct a robust nonlinear interval regression model using a genetic algorithm. Outliers beyond or beneath the data interval have only a slight effect on the determination of the data interval. Simulation results demonstrate that the proposed method performs well for contaminated datasets.
An Interactive Version of MULR04 With Enhanced Graphic Capability
ERIC Educational Resources Information Center
Burkholder, Joel H.
1978-01-01
An existing computer program for computing multiple regression analyses is made interactive in order to alleviate core storage requirements. Also, some improvements in the graphics aspects of the program are included. (JKS)
Linear regression metamodeling as a tool to summarize and present simulation model results.
Jalal, Hawre; Dowd, Bryan; Sainfort, François; Kuntz, Karen M
2013-10-01
Modelers lack a tool to systematically and clearly present complex model results, including those from sensitivity analyses. The objective was to propose linear regression metamodeling as a tool to increase transparency of decision analytic models and better communicate their results. We used a simplified cancer cure model to demonstrate our approach. The model computed the lifetime cost and benefit of 3 treatment options for cancer patients. We simulated 10,000 cohorts in a probabilistic sensitivity analysis (PSA) and regressed the model outcomes on the standardized input parameter values in a set of regression analyses. We used the regression coefficients to describe measures of sensitivity analyses, including threshold and parameter sensitivity analyses. We also compared the results of the PSA to deterministic full-factorial and one-factor-at-a-time designs. The regression intercept represented the estimated base-case outcome, and the other coefficients described the relative parameter uncertainty in the model. We defined simple relationships that compute the average and incremental net benefit of each intervention. Metamodeling produced outputs similar to traditional deterministic 1-way or 2-way sensitivity analyses but was more reliable since it used all parameter values. Linear regression metamodeling is a simple, yet powerful, tool that can assist modelers in communicating model characteristics and sensitivity analyses.
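The core of the approach is to regress the simulated outcomes on the standardized input parameters, so that the intercept approximates the expected (base-case) outcome and the coefficients rank parameter influence. A toy sketch (not the paper's cancer cure model; all parameter names and distributions are invented for illustration):

```python
# Linear regression metamodel of a probabilistic sensitivity analysis
# (toy decision model, not the paper's cancer cure model).
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
p_cure = rng.beta(20, 80, n)             # uncertain inputs sampled for the PSA
cost_tx = rng.gamma(100, 50, n)
utility = rng.normal(0.8, 0.05, n)
net_benefit = 50_000 * p_cure * utility - cost_tx   # model outcome

Z = np.column_stack([(v - v.mean()) / v.std()        # standardized inputs
                     for v in (p_cure, cost_tx, utility)])
X = np.column_stack([np.ones(n), Z])
coefs, *_ = np.linalg.lstsq(X, net_benefit, rcond=None)

# coefs[0] ~ expected (base-case) net benefit; the magnitudes of
# coefs[1:] rank the relative influence of each parameter, much like a
# one-way sensitivity (tornado) display.
```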
Principal component regression analysis with SPSS.
Liu, R X; Kuang, J; Gong, Q; Hou, X L
2003-06-01
The paper introduces the indices used for multicollinearity diagnosis, the basic principle of principal component regression, and a method for determining the 'best' equation. The paper uses an example to describe how to do principal component regression analysis with SPSS 10.0, including all calculation steps of the principal component regression and all operations of the linear regression, factor analysis, descriptives, compute variable, and bivariate correlations procedures in SPSS 10.0. Principal component regression analysis can be used to overcome the disturbance caused by multicollinearity, and performing it with SPSS makes the analysis simpler, faster, and accurate.
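The SPSS menu sequence cannot be reproduced here; an equivalent principal component regression can be sketched in Python with scikit-learn (an assumption of this note, not the paper's SPSS workflow):

```python
# Principal component regression equivalent to the SPSS workflow
# described above, sketched with scikit-learn (not SPSS itself).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 100
x1 = rng.standard_normal(n)
x2 = x1 + 0.01 * rng.standard_normal(n)        # nearly collinear with x1
x3 = rng.standard_normal(n)
X = np.column_stack([x1, x2, x3])
y = 2 * x1 + 3 * x3 + rng.standard_normal(n)

pcr = make_pipeline(StandardScaler(), PCA(n_components=2), LinearRegression())
pcr.fit(X, y)
# Dropping the near-zero-variance component removes the instability that
# multicollinearity causes in ordinary least squares.
```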
Regression modeling of ground-water flow
Cooley, R.L.; Naff, R.L.
1985-01-01
Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)
An improved multiple linear regression and data analysis computer program package
NASA Technical Reports Server (NTRS)
Sidik, S. M.
1972-01-01
NEWRAP, an improved version of a previous multiple linear regression program called RAPIER, CREDUC, and CRSPLT, allows for a complete regression analysis including cross plots of the independent and dependent variables, correlation coefficients, regression coefficients, analysis of variance tables, t-statistics and their probability levels, rejection of independent variables, plots of residuals against the independent and dependent variables, and a canonical reduction of quadratic response functions useful in optimum seeking experimentation. A major improvement over RAPIER is that all regression calculations are done in double precision arithmetic.
Computational tools for exact conditional logistic regression.
Corcoran, C; Mehta, C; Patel, N; Senchaudhuri, P
Logistic regression analyses are often challenged by the inability of unconditional likelihood-based approximations to yield consistent, valid estimates and p-values for model parameters. This can be due to sparseness or separability in the data. Conditional logistic regression, though useful in such situations, can also be computationally unfeasible when the sample size or number of explanatory covariates is large. We review recent developments that allow efficient approximate conditional inference, including Monte Carlo sampling and saddlepoint approximations. We demonstrate through real examples that these methods enable the analysis of significantly larger and more complex data sets. We find in this investigation that for these moderately large data sets Monte Carlo seems a better alternative, as it provides unbiased estimates of the exact results and can be executed in less CPU time than can the single saddlepoint approximation. Moreover, the double saddlepoint approximation, while computationally the easiest to obtain, offers little practical advantage. It produces unreliable results and cannot be computed when a maximum likelihood solution does not exist. Copyright 2001 John Wiley & Sons, Ltd.
NASA Technical Reports Server (NTRS)
Boyce, Lola; Bast, Callie C.; Trimble, Greg A.
1992-01-01
This report presents the results of a fourth year effort of a research program, conducted for NASA-LeRC by the University of Texas at San Antonio (UTSA). The research included on-going development of methodology that provides probabilistic lifetime strength of aerospace materials via computational simulation. A probabilistic material strength degradation model, in the form of a randomized multifactor interaction equation, is postulated for strength degradation of structural components of aerospace propulsion systems subject to a number of effects or primitive variables. These primitive variables may include high temperature, fatigue or creep. In most cases, strength is reduced as a result of the action of a variable. This multifactor interaction strength degradation equation has been randomized and is included in the computer program, PROMISS. Also included in the research is the development of methodology to calibrate the above-described constitutive equation using actual experimental materials data together with regression analysis of that data, thereby predicting values for the empirical material constants for each effect or primitive variable. This regression methodology is included in the computer program, PROMISC. Actual experimental materials data were obtained from industry and the open literature for materials typical of applications in aerospace propulsion system components. Material data for Inconel 718 have been analyzed using the developed methodology.
NASA Technical Reports Server (NTRS)
Boyce, Lola; Bast, Callie C.; Trimble, Greg A.
1992-01-01
The results of a fourth year effort of a research program conducted for NASA-LeRC by The University of Texas at San Antonio (UTSA) are presented. The research included on-going development of methodology that provides probabilistic lifetime strength of aerospace materials via computational simulation. A probabilistic material strength degradation model, in the form of a randomized multifactor interaction equation, is postulated for strength degradation of structural components of aerospace propulsion systems subjected to a number of effects or primitive variables. These primitive variables may include high temperature, fatigue, or creep. In most cases, strength is reduced as a result of the action of a variable. This multifactor interaction strength degradation equation was randomized and is included in the computer program, PROMISS. Also included in the research is the development of methodology to calibrate the above-described constitutive equation using actual experimental materials data together with regression analysis of that data, thereby predicting values for the empirical material constants for each effect or primitive variable. This regression methodology is included in the computer program, PROMISC. Actual experimental materials data were obtained from industry and the open literature for materials typical of applications in aerospace propulsion system components. Material data for Inconel 718 were analyzed using the developed methodology.
Flood-frequency prediction methods for unregulated streams of Tennessee, 2000
Law, George S.; Tasker, Gary D.
2003-01-01
Up-to-date flood-frequency prediction methods for unregulated, ungaged rivers and streams of Tennessee have been developed. Prediction methods include the regional-regression method and the newer region-of-influence method. The prediction methods were developed using stream-gage records from unregulated streams draining basins having from 1 percent to about 30 percent total impervious area. These methods, however, should not be used in heavily developed or storm-sewered basins with impervious areas greater than 10 percent. The methods can be used to estimate 2-, 5-, 10-, 25-, 50-, 100-, and 500-year recurrence-interval floods of most unregulated rural streams in Tennessee. A computer application was developed that automates the calculation of flood frequency for unregulated, ungaged rivers and streams of Tennessee. Regional-regression equations were derived by using both single-variable and multivariable regional-regression analysis. Contributing drainage area is the explanatory variable used in the single-variable equations. Contributing drainage area, main-channel slope, and a climate factor are the explanatory variables used in the multivariable equations. Deleted-residual standard error for the single-variable equations ranged from 32 to 65 percent. Deleted-residual standard error for the multivariable equations ranged from 31 to 63 percent. These equations are included in the computer application to allow easy comparison of results produced by the different methods. The region-of-influence method calculates multivariable regression equations for each ungaged site and recurrence interval using basin characteristics from 60 similar sites selected from the study area. Explanatory variables that may be used in regression equations computed by the region-of-influence method include contributing drainage area, main-channel slope, a climate factor, and a physiographic-region factor. Deleted-residual standard error for the region-of-influence method tended to be only slightly smaller than those for the regional-regression method and ranged from 27 to 62 percent.
ERIC Educational Resources Information Center
Deignan, Gerard M.; And Others
This report contains a comparative analysis of the differential effectiveness of computer-assisted instruction (CAI), programmed instructional text (PIT), and lecture methods of instruction in three medical courses--Medical Laboratory, Radiology, and Dental. The summative evaluation includes (1) multiple regression analyses conducted to predict…
Multiple regression technique for Pth degree polynomials with and without linear cross products
NASA Technical Reports Server (NTRS)
Davis, J. W.
1973-01-01
A multiple regression technique was developed by which the nonlinear behavior of specified independent variables can be related to a given dependent variable. The polynomial expression can be of Pth degree and can incorporate N independent variables. Two cases are treated such that mathematical models can be studied both with and without linear cross products. The resulting surface fits can be used to summarize trends for a given phenomenon and provide a mathematical relationship for subsequent analysis. To implement this technique, separate computer programs were developed for the case without linear cross products and for the case incorporating such cross products, which evaluate the various constants in the model regression equation. In addition, the significance of the estimated regression equation is considered, and the standard deviation, the F statistic, the maximum absolute percent error, and the average of the absolute values of the percent error are evaluated. The computer programs and their manner of utilization are described. Sample problems are included to illustrate the use and capability of the technique; they show the output formats and typical plots comparing computer results to each set of input data.
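The original FORTRAN programs are not reproduced here; a minimal NumPy sketch of the two design matrices (pure powers only versus powers plus pairwise linear cross products), with invented example data, is:

```python
# Pth-degree polynomial regression design matrices, with and without
# linear cross products (a NumPy sketch, not the original FORTRAN codes).
import numpy as np
from itertools import combinations

def design(X, degree, cross_products=False):
    cols = [np.ones(len(X))]
    for j in range(X.shape[1]):                 # pure powers x_j^p
        for p in range(1, degree + 1):
            cols.append(X[:, j] ** p)
    if cross_products:                          # linear cross products x_i*x_j
        for i, j in combinations(range(X.shape[1]), 2):
            cols.append(X[:, i] * X[:, j])
    return np.column_stack(cols)

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(60, 2))
y = 1 + 2*X[:, 0] - X[:, 1]**2 + 0.5*X[:, 0]*X[:, 1] + 0.05*rng.standard_normal(60)

for flag in (False, True):
    A = design(X, degree=2, cross_products=flag)
    beta, res, *_ = np.linalg.lstsq(A, y, rcond=None)
    print(flag, res)   # compare residual sums of squares across the two cases
```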
Spectroscopic analysis and control
Tate, James D.; Reed, Christopher J.; Domke, Christopher H.; Le, Linh; Seasholtz, Mary Beth; Weber, Andy; Lipp, Charles
2017-04-18
Apparatus for spectroscopic analysis which includes a tunable diode laser spectrometer having a digital output signal and a digital computer for receiving the digital output signal from the spectrometer, the digital computer programmed to process the digital output signal using a multivariate regression algorithm. In addition, a spectroscopic method of analysis using such apparatus. Finally, a method for controlling an ethylene cracker hydrogenator.
Probabilistic lifetime strength of aerospace materials via computational simulation
NASA Technical Reports Server (NTRS)
Boyce, Lola; Keating, Jerome P.; Lovelace, Thomas B.; Bast, Callie C.
1991-01-01
The results of a second year effort of a research program are presented. The research included development of methodology that provides probabilistic lifetime strength of aerospace materials via computational simulation. A probabilistic phenomenological constitutive relationship, in the form of a randomized multifactor interaction equation, is postulated for strength degradation of structural components of aerospace propulsion systems subjected to a number of effects or primitive variables. These primitive variables often originate in the environment and may include stress from loading, temperature, chemical, or radiation attack. This multifactor interaction constitutive equation is included in the computer program, PROMISS. Also included in the research is the development of methodology to calibrate the constitutive equation using actual experimental materials data together with the multiple linear regression of that data.
Rasmussen, Patrick P.; Gray, John R.; Glysson, G. Douglas; Ziegler, Andrew C.
2009-01-01
In-stream continuous turbidity and streamflow data, calibrated with measured suspended-sediment concentration data, can be used to compute a time series of suspended-sediment concentration and load at a stream site. Development of a simple linear (ordinary least squares) regression model for computing suspended-sediment concentrations from instantaneous turbidity data is the first step in the computation process. If the model standard percentage error (MSPE) of the simple linear regression model meets a minimum criterion, this model should be used to compute a time series of suspended-sediment concentrations. Otherwise, a multiple linear regression model using paired instantaneous turbidity and streamflow data is developed and compared to the simple regression model. If the inclusion of the streamflow variable proves to be statistically significant and the uncertainty associated with the multiple regression model results in an improvement over that for the simple linear model, the turbidity-streamflow multiple linear regression model should be used to compute a suspended-sediment concentration time series. The computed concentration time series is subsequently used with its paired streamflow time series to compute suspended-sediment loads by standard U.S. Geological Survey techniques. Once an acceptable regression model is developed, it can be used to compute suspended-sediment concentration beyond the period of record used in model development with proper ongoing collection and analysis of calibration samples. Regression models to compute suspended-sediment concentrations are generally site specific and should never be considered static, but they represent a set period in a continually dynamic system in which additional data will help verify any change in sediment load, type, and source.
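A greatly simplified sketch of the model-selection step described above (synthetic data; the USGS procedure additionally uses log transforms, bias correction, and formal MSPE criteria not reproduced here):

```python
# Simplified sketch of choosing between a turbidity-only and a
# turbidity-plus-streamflow regression for suspended-sediment
# concentration (illustrative data and criteria only).
import numpy as np

rng = np.random.default_rng(7)
n = 120
turb = rng.lognormal(3, 0.6, n)                 # turbidity
flow = rng.lognormal(5, 0.5, n)                 # streamflow
ssc = np.exp(0.8*np.log(turb) + 0.2*np.log(flow) + 0.1*rng.standard_normal(n))

def fit(X, y):
    X1 = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ b
    se = np.sqrt(resid @ resid / (len(y) - X1.shape[1]))
    return b, se

_, se_simple = fit(np.log(turb)[:, None], np.log(ssc))
_, se_multi = fit(np.column_stack([np.log(turb), np.log(flow)]), np.log(ssc))
# Keep the simple model unless adding streamflow meaningfully lowers the
# standard error (and its coefficient is statistically significant).
```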
NiftyNet: a deep-learning platform for medical imaging.
Gibson, Eli; Li, Wenqi; Sudre, Carole; Fidon, Lucas; Shakir, Dzhoshkun I; Wang, Guotai; Eaton-Rosen, Zach; Gray, Robert; Doel, Tom; Hu, Yipeng; Whyntie, Tom; Nachev, Parashkev; Modat, Marc; Barratt, Dean C; Ourselin, Sébastien; Cardoso, M Jorge; Vercauteren, Tom
2018-05-01
Medical image analysis and computer-assisted intervention problems are increasingly being addressed with deep-learning-based solutions. Established deep-learning platforms are flexible but do not provide specific functionality for medical image analysis and adapting them for this domain of application requires substantial implementation effort. Consequently, there has been substantial duplication of effort and incompatible infrastructure developed across many research groups. This work presents the open-source NiftyNet platform for deep learning in medical imaging. The ambition of NiftyNet is to accelerate and simplify the development of these solutions, and to provide a common mechanism for disseminating research outputs for the community to use, adapt and build upon. The NiftyNet infrastructure provides a modular deep-learning pipeline for a range of medical imaging applications including segmentation, regression, image generation and representation learning applications. Components of the NiftyNet pipeline including data loading, data augmentation, network architectures, loss functions and evaluation metrics are tailored to, and take advantage of, the idiosyncrasies of medical image analysis and computer-assisted intervention. NiftyNet is built on the TensorFlow framework and supports features such as TensorBoard visualization of 2D and 3D images and computational graphs by default. We present three illustrative medical image analysis applications built using NiftyNet infrastructure: (1) segmentation of multiple abdominal organs from computed tomography; (2) image regression to predict computed tomography attenuation maps from brain magnetic resonance images; and (3) generation of simulated ultrasound images for specified anatomical poses. The NiftyNet infrastructure enables researchers to rapidly develop and distribute deep learning solutions for segmentation, regression, image generation and representation learning applications, or extend the platform to new applications. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.
Fernandez-Lozano, Carlos; Gestal, Marcos; Munteanu, Cristian R; Dorado, Julian; Pazos, Alejandro
2016-01-01
The design of experiments and the validation of the results achieved with them are vital in any research study. This paper focuses on the use of different Machine Learning approaches for regression tasks in the field of Computational Intelligence and especially on a correct comparison between the different results provided for different methods, as those techniques are complex systems that require further study to be fully understood. A methodology commonly accepted in Computational intelligence is implemented in an R package called RRegrs. This package includes ten simple and complex regression models to carry out predictive modeling using Machine Learning and well-known regression algorithms. The framework for experimental design presented herein is evaluated and validated against RRegrs. Our results are different for three out of five state-of-the-art simple datasets and it can be stated that the selection of the best model according to our proposal is statistically significant and relevant. It is of relevance to use a statistical approach to indicate whether the differences are statistically significant using this kind of algorithms. Furthermore, our results with three real complex datasets report different best models than with the previously published methodology. Our final goal is to provide a complete methodology for the use of different steps in order to compare the results obtained in Computational Intelligence problems, as well as from other fields, such as for bioinformatics, cheminformatics, etc., given that our proposal is open and modifiable.
Ridge: a computer program for calculating ridge regression estimates
Donald E. Hilt; Donald W. Seegrist
1977-01-01
Least-squares coefficients for multiple-regression models may be unstable when the independent variables are highly correlated. Ridge regression is a biased estimation procedure that produces stable estimates of the coefficients. Ridge regression is discussed, and a computer program for calculating the ridge coefficients is presented.
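The cited FORTRAN program is not reproduced here; a minimal NumPy sketch of the ridge estimator on standardized predictors, evaluated over a grid of biasing constants (the data and grid are illustrative), is:

```python
# Ridge regression estimates over a grid of biasing constants k
# (a NumPy sketch of the estimator, not the cited FORTRAN program).
import numpy as np

rng = np.random.default_rng(5)
n = 80
x1 = rng.standard_normal(n)
x2 = x1 + 0.05 * rng.standard_normal(n)          # highly correlated pair
X = np.column_stack([x1, x2])
y = 1.0 * x1 + 1.0 * x2 + rng.standard_normal(n)

Z = (X - X.mean(0)) / X.std(0)                   # standardized predictors
yc = y - y.mean()
p = Z.shape[1]

for k in (0.0, 0.01, 0.1, 1.0):
    beta = np.linalg.solve(Z.T @ Z + k * np.eye(p), Z.T @ yc)
    print(k, beta)     # coefficients stabilize as k grows (the ridge trace)
```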
Study of Personnel Attrition and Revocation within U.S. Marine Corps Air Traffic Control Specialties
2012-03-01
Entrance Processing Stations (MEPS) and recruit depots, to include non-cognitive testing, such as Navy Computer Adaptive Personality Scales (NCAPS), during recruitment. It is also recommended that an economic analysis be conducted comparing the… Subject terms: Revocation, Selection, MOS, Regression, Probit, dProbit, STATA, Statistics, Marginal Effects, ASVAB, AFQT, Composite Scores, Screening, NCAPS.
A primer for biomedical scientists on how to execute model II linear regression analysis.
Ludbrook, John
2012-04-01
1. There are two very different ways of executing linear regression analysis. One is Model I, when the x-values are fixed by the experimenter. The other is Model II, in which the x-values are free to vary and are subject to error. 2. I have received numerous complaints from biomedical scientists that they have great difficulty in executing Model II linear regression analysis. This may explain the results of a Google Scholar search, which showed that the authors of articles in journals of physiology, pharmacology and biochemistry rarely use Model II regression analysis. 3. I repeat my previous arguments in favour of using least products linear regression analysis for Model II regressions. I review three methods for executing ordinary least products (OLP) and weighted least products (WLP) regression analysis: (i) scientific calculator and/or computer spreadsheet; (ii) specific purpose computer programs; and (iii) general purpose computer programs. 4. Using a scientific calculator and/or computer spreadsheet, it is easy to obtain correct values for OLP slope and intercept, but the corresponding 95% confidence intervals (CI) are inaccurate. 5. Using specific purpose computer programs, the freeware computer program smatr gives the correct OLP regression coefficients and obtains 95% CI by bootstrapping. In addition, smatr can be used to compare the slopes of OLP lines. 6. When using general purpose computer programs, I recommend the commercial programs systat and Statistica for those who regularly undertake linear regression analysis and I give step-by-step instructions in the Supplementary Information as to how to use loss functions. © 2011 The Author. Clinical and Experimental Pharmacology and Physiology. © 2011 Blackwell Publishing Asia Pty Ltd.
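For readers without the recommended software, the OLP point estimates themselves follow directly from sample statistics: the slope is the sign of the correlation times the ratio of standard deviations, and the intercept passes the line through the means. A small sketch with invented data (confidence intervals, which the article obtains by bootstrapping, are omitted):

```python
# Ordinary least products (Model II) slope and intercept from sample
# statistics; CIs (e.g., by bootstrapping, as recommended above) omitted.
import numpy as np

rng = np.random.default_rng(11)
true_x = np.linspace(0, 10, 50)
x = true_x + rng.normal(0, 0.5, 50)      # x measured with error (Model II)
y = 2.0 * true_x + 1.0 + rng.normal(0, 1.0, 50)

r = np.corrcoef(x, y)[0, 1]
slope_olp = np.sign(r) * y.std(ddof=1) / x.std(ddof=1)
intercept_olp = y.mean() - slope_olp * x.mean()

slope_ols = r * y.std(ddof=1) / x.std(ddof=1)   # Model I (OLS) for contrast
```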
Jennings, M.E.; Thomas, W.O.; Riggs, H.C.
1994-01-01
For many years, the U.S. Geological Survey (USGS) has been involved in the development of regional regression equations for estimating flood magnitude and frequency at ungaged sites. These regression equations are used to transfer flood characteristics from gaged to ungaged sites through the use of watershed and climatic characteristics as explanatory or predictor variables. Generally these equations have been developed on a statewide or metropolitan area basis as part of cooperative study programs with specific State Departments of Transportation or specific cities. The USGS, in cooperation with the Federal Highway Administration and the Federal Emergency Management Agency, has compiled all the current (as of September 1993) statewide and metropolitan area regression equations into a micro-computer program titled the National Flood Frequency Program. This program includes regression equations for estimating flood-peak discharges and techniques for estimating a typical flood hydrograph for a given recurrence interval peak discharge for unregulated rural and urban watersheds. These techniques should be useful to engineers and hydrologists for planning and design applications. This report summarizes the statewide regression equations for rural watersheds in each State, summarizes the applicable metropolitan area or statewide regression equations for urban watersheds, describes the National Flood Frequency Program for making these computations, and provides much of the reference information on the extrapolation variables needed to run the program.
REVEAL: An Extensible Reduced Order Model Builder for Simulation and Modeling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Agarwal, Khushbu; Sharma, Poorva; Ma, Jinliang
2013-04-30
Many science domains need to build computationally efficient and accurate representations of high fidelity, computationally expensive simulations. These computationally efficient versions are known as reduced-order models. This paper presents the design and implementation of a novel reduced-order model (ROM) builder, the REVEAL toolset. This toolset generates ROMs based on science- and engineering-domain specific simulations executed on high performance computing (HPC) platforms. The toolset encompasses a range of sampling and regression methods that can be used to generate a ROM, automatically quantifies the ROM accuracy, and provides support for an iterative approach to improve ROM accuracy. REVEAL is designed to be extensible in order to utilize the core functionality with any simulator that has published input and output formats. It also defines programmatic interfaces to include new sampling and regression techniques so that users can 'mix and match' mathematical techniques to best suit the characteristics of their model. In this paper, we describe the architecture of REVEAL and demonstrate its usage with a computational fluid dynamics model used in carbon capture.
NASA Technical Reports Server (NTRS)
Gaston, S.; Wertheim, M.; Orourke, J. A.
1973-01-01
Summary, consolidation and analysis of specifications, manufacturing process and test controls, and performance results for OAO-2 and OAO-3 lot 20 Amp-Hr sealed nickel cadmium cells and batteries are reported. Correlation of improvements in control requirements with performance is a key feature. Updates for a cell/battery computer model to improve performance prediction capability are included. Applicability of regression analysis computer techniques to relate process controls to performance is checked.
STATLIB: NSWC Library of Statistical Programs and Subroutines
1989-08-01
…Uncorrelated Weighted Polynomial Regression; WEPORC, Correlated Weighted Polynomial Regression; MROP, Multiple Regression Using Orthogonal Polynomials… could not and should not be converted to the new general purpose computer (the current CDC 995). Some were designed to compute… personal computers. They are referred to as SPSSPC+, BMDPC, and SASPC and in general are less comprehensive than their mainframe counterparts. The basic…
Fungible weights in logistic regression.
Jones, Jeff A; Waller, Niels G
2016-06-01
In this article we develop methods for assessing parameter sensitivity in logistic regression models. To set the stage for this work, we first review Waller's (2008) equations for computing fungible weights in linear regression. Next, we describe 2 methods for computing fungible weights in logistic regression. To demonstrate the utility of these methods, we compute fungible logistic regression weights using data from the Centers for Disease Control and Prevention's (2010) Youth Risk Behavior Surveillance Survey, and we illustrate how these alternate weights can be used to evaluate parameter sensitivity. To make our work accessible to the research community, we provide R code (R Core Team, 2015) that will generate both kinds of fungible logistic regression weights. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Burns, A.W.
1988-01-01
This report describes an interactive-accounting model used to simulate streamflow, chemical-constituent concentrations and loads, and water-supply operations in a river basin. The model uses regression equations to compute flow from incremental (internode) drainage areas. Conservative chemical constituents (typically dissolved solids) also are computed from regression equations. Both flow and water quality loads are accumulated downstream. Optionally, the model simulates the water use and the simplified groundwater systems of a basin. Water users include agricultural, municipal, industrial, and in-stream users , and reservoir operators. Water users list their potential water sources, including direct diversions, groundwater pumpage, interbasin imports, or reservoir releases, in the order in which they will be used. Direct diversions conform to basinwide water law priorities. The model is interactive, and although the input data exist in files, the user can modify them interactively. A major feature of the model is its color-graphic-output options. This report includes a description of the model, organizational charts of subroutines, and examples of the graphics. Detailed format instructions for the input data, example files of input data, definitions of program variables, and listing of the FORTRAN source code are Attachments to the report. (USGS)
Bootstrap Methods: A Very Leisurely Look.
ERIC Educational Resources Information Center
Hinkle, Dennis E.; Winstead, Wayland H.
The Bootstrap method, a computer-intensive statistical method of estimation, is illustrated using a simple and efficient Statistical Analysis System (SAS) routine. The utility of the method for generating unknown parameters, including standard errors for simple statistics, regression coefficients, discriminant function coefficients, and factor…
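The SAS routine referenced above is not reproduced here; the same idea for one of the listed quantities, the standard error of a regression coefficient, can be sketched in a few lines of Python (data and replicate count are illustrative):

```python
# Bootstrap standard error of a regression slope (Python sketch; the
# report's SAS routine is not reproduced here).
import numpy as np

rng = np.random.default_rng(2)
n = 60
x = rng.uniform(0, 10, n)
y = 1.5 * x + rng.normal(0, 2, n)

def slope(xs, ys):
    X = np.column_stack([np.ones(len(xs)), xs])
    return np.linalg.lstsq(X, ys, rcond=None)[0][1]

boot = np.empty(2000)
for b in range(2000):
    idx = rng.integers(0, n, n)          # resample cases with replacement
    boot[b] = slope(x[idx], y[idx])

se_boot = boot.std(ddof=1)               # bootstrap SE of the slope
ci = np.percentile(boot, [2.5, 97.5])    # simple percentile interval
```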
SPSS macros to compare any two fitted values from a regression model.
Weaver, Bruce; Dubois, Sacha
2012-12-01
In regression models with first-order terms only, the coefficient for a given variable is typically interpreted as the change in the fitted value of Y for a one-unit increase in that variable, with all other variables held constant. Therefore, each regression coefficient represents the difference between two fitted values of Y. But the coefficients represent only a fraction of the possible fitted value comparisons that might be of interest to researchers. For many fitted value comparisons that are not captured by any of the regression coefficients, common statistical software packages do not provide the standard errors needed to compute confidence intervals or carry out statistical tests-particularly in more complex models that include interactions, polynomial terms, or regression splines. We describe two SPSS macros that implement a matrix algebra method for comparing any two fitted values from a regression model. The !OLScomp and !MLEcomp macros are for use with models fitted via ordinary least squares and maximum likelihood estimation, respectively. The output from the macros includes the standard error of the difference between the two fitted values, a 95% confidence interval for the difference, and a corresponding statistical test with its p-value.
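The matrix algebra underlying the macros can be illustrated outside SPSS: the difference between two fitted values is a linear contrast c = x1 - x2 applied to the coefficient vector, with variance c' Cov(b) c. A Python sketch (this is the general method, not the !OLScomp/!MLEcomp macros themselves; the model and covariate values are invented):

```python
# Comparing two fitted values from a regression model via a linear
# contrast (sketch of the matrix algebra; not the SPSS macros).
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
n = 100
age = rng.uniform(20, 70, n)
y = 5 + 0.3*age - 0.002*age**2 + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), age, age**2])    # model with a squared term
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
df = n - X.shape[1]
sigma2 = resid @ resid / df
cov_beta = sigma2 * np.linalg.inv(X.T @ X)

x1 = np.array([1.0, 40.0, 40.0**2])               # fitted value at age 40
x2 = np.array([1.0, 60.0, 60.0**2])               # fitted value at age 60
c = x1 - x2
diff = c @ beta                                   # difference of fitted values
se = np.sqrt(c @ cov_beta @ c)
t = diff / se
ci = diff + np.array([-1, 1]) * stats.t.ppf(0.975, df) * se
```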
Southard, Rodney E.
2013-01-01
The weather and precipitation patterns in Missouri vary considerably from year to year. In 2008, the statewide average rainfall was 57.34 inches and in 2012, the statewide average rainfall was 30.64 inches. This variability in precipitation and resulting streamflow in Missouri underlies the necessity for water managers and users to have reliable streamflow statistics and a means to compute select statistics at ungaged locations for a better understanding of water availability. Knowledge of surface-water availability is dependent on the streamflow data that have been collected and analyzed by the U.S. Geological Survey for more than 100 years at approximately 350 streamgages throughout Missouri. The U.S. Geological Survey, in cooperation with the Missouri Department of Natural Resources, computed streamflow statistics at streamgages through the 2010 water year, defined periods of drought and defined methods to estimate streamflow statistics at ungaged locations, and developed regional regression equations to compute selected streamflow statistics at ungaged locations. Streamflow statistics and flow durations were computed for 532 streamgages in Missouri and in neighboring States of Missouri. For streamgages with more than 10 years of record, Kendall’s tau was computed to evaluate for trends in streamflow data. If trends were detected, the variable length method was used to define the period of no trend. Water years were removed from the dataset from the beginning of the record for a streamgage until no trend was detected. Low-flow frequency statistics were then computed for the entire period of record and for the period of no trend if 10 or more years of record were available for each analysis. Three methods are presented for computing selected streamflow statistics at ungaged locations. The first method uses power curve equations developed for 28 selected streams in Missouri and neighboring States that have multiple streamgages on the same streams. Statistical estimates on one of these streams can be calculated at an ungaged location that has a drainage area that is between 40 percent of the drainage area of the farthest upstream streamgage and within 150 percent of the drainage area of the farthest downstream streamgage along the stream of interest. The second method may be used on any stream with a streamgage that has operated for 10 years or longer and for which anthropogenic effects have not changed the low-flow characteristics at the ungaged location since collection of the streamflow data. A ratio of drainage area of the stream at the ungaged location to the drainage area of the stream at the streamgage was computed to estimate the statistic at the ungaged location. The range of applicability is between 40- and 150-percent of the drainage area of the streamgage, and the ungaged location must be located on the same stream as the streamgage. The third method uses regional regression equations to estimate selected low-flow frequency statistics for unregulated streams in Missouri. This report presents regression equations to estimate frequency statistics for the 10-year recurrence interval and for the N-day durations of 1, 2, 3, 7, 10, 30, and 60 days. Basin and climatic characteristics were computed using geographic information system software and digital geospatial data. A total of 35 characteristics were computed for use in preliminary statewide and regional regression analyses based on existing digital geospatial data and previous studies. 
Spatial analyses for geographical bias in the predictive accuracy of the regional regression equations defined three low-flow regions with the State representing the three major physiographic provinces in Missouri. Region 1 includes the Central Lowlands, Region 2 includes the Ozark Plateaus, and Region 3 includes the Mississippi Alluvial Plain. A total of 207 streamgages were used in the regression analyses for the regional equations. Of the 207 U.S. Geological Survey streamgages, 77 were located in Region 1, 120 were located in Region 2, and 10 were located in Region 3. Streamgages located outside of Missouri were selected to extend the range of data used for the independent variables in the regression analyses. Streamgages included in the regression analyses had 10 or more years of record and were considered to be affected minimally by anthropogenic activities or trends. Regional regression analyses identified three characteristics as statistically significant for the development of regional equations. For Region 1, drainage area, longest flow path, and streamflow-variability index were statistically significant. The range in the standard error of estimate for Region 1 is 79.6 to 94.2 percent. For Region 2, drainage area and streamflow variability index were statistically significant, and the range in the standard error of estimate is 48.2 to 72.1 percent. For Region 3, drainage area and streamflow-variability index also were statistically significant with a range in the standard error of estimate of 48.1 to 96.2 percent. Limitations on the use of estimating low-flow frequency statistics at ungaged locations are dependent on the method used. The first method outlined for use in Missouri, power curve equations, were developed to estimate the selected statistics for ungaged locations on 28 selected streams with multiple streamgages located on the same stream. A second method uses a drainage-area ratio to compute statistics at an ungaged location using data from a single streamgage on the same stream with 10 or more years of record. Ungaged locations on these streams may use the ratio of the drainage area at an ungaged location to the drainage area at a streamgage location to scale the selected statistic value from the streamgage location to the ungaged location. This method can be used if the drainage area of the ungaged location is within 40 to 150 percent of the streamgage drainage area. The third method is the use of the regional regression equations. The limits for the use of these equations are based on the ranges of the characteristics used as independent variables and that streams must be affected minimally by anthropogenic activities.
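Of the three methods, the drainage-area-ratio transfer is simple enough to sketch directly; the values below are hypothetical, and the report's power-curve and regional regression equations are not reproduced:

```python
# Drainage-area-ratio transfer of a low-flow statistic from a streamgage
# to an ungaged site on the same stream (hypothetical values; regional
# regression equations from the report are not reproduced here).
def transfer_statistic(q_gage, area_gage, area_ungaged):
    ratio = area_ungaged / area_gage
    if not 0.40 <= ratio <= 1.50:
        raise ValueError("outside the 40- to 150-percent applicability range")
    return q_gage * ratio

q_ungaged = transfer_statistic(q_gage=12.0,       # statistic at the streamgage
                               area_gage=250.0,   # drainage area, mi^2
                               area_ungaged=180.0)
```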
Semisupervised Clustering by Iterative Partition and Regression with Neuroscience Applications
Qian, Guoqi; Wu, Yuehua; Ferrari, Davide; Qiao, Puxue; Hollande, Frédéric
2016-01-01
Regression clustering is a mixture of unsupervised and supervised statistical learning and data mining method which is found in a wide range of applications including artificial intelligence and neuroscience. It performs unsupervised learning when it clusters the data according to their respective unobserved regression hyperplanes. The method also performs supervised learning when it fits regression hyperplanes to the corresponding data clusters. Applying regression clustering in practice requires means of determining the underlying number of clusters in the data, finding the cluster label of each data point, and estimating the regression coefficients of the model. In this paper, we review the estimation and selection issues in regression clustering with regard to the least squares and robust statistical methods. We also provide a model selection based technique to determine the number of regression clusters underlying the data. We further develop a computing procedure for regression clustering estimation and selection. Finally, simulation studies are presented for assessing the procedure, together with analyzing a real data set on RGB cell marking in neuroscience to illustrate and interpret the method. PMID:27212939
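A minimal iterative partition-and-regression loop conveys the basic mechanism (this is only the plain least-squares version; the paper's robust estimation and model-selection rule for choosing the number of clusters are not included):

```python
# Minimal regression clustering: alternate between fitting k regression
# lines and reassigning each point to the line with the smallest residual.
import numpy as np

def regression_cluster(x, y, k=2, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=len(x))
    X = np.column_stack([np.ones(len(x)), x])
    for _ in range(iters):
        betas = []
        for j in range(k):                      # fit a line to each cluster
            m = labels == j
            if m.sum() < 2:
                betas.append(np.zeros(2))
                continue
            betas.append(np.linalg.lstsq(X[m], y[m], rcond=None)[0])
        resid2 = np.column_stack([(y - X @ b) ** 2 for b in betas])
        new_labels = resid2.argmin(axis=1)      # reassign by smallest residual
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, betas
```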
A simulation study on Bayesian Ridge regression models for several collinearity levels
NASA Astrophysics Data System (ADS)
Efendi, Achmad; Effrihan
2017-12-01
When analyzing data with a multiple regression model, if collinearity is present then one or several predictor variables are usually omitted from the model. However, there are sometimes reasons, for instance medical or economic ones, why all of the predictors are important and should be included in the model. The ridge regression model is not uncommonly used in research to cope with collinearity. In this modeling approach, weights for the predictor variables are used in estimating the parameters. The estimation can follow the concept of likelihood; alternatively, a Bayesian version can be used. The Bayesian method has not matched the likelihood approach in popularity because of some difficulties, such as computation. Nevertheless, with recent improvements in computational methodology, this caveat should no longer be a problem. This paper discusses a simulation process for evaluating the characteristics of Bayesian ridge regression parameter estimates. There are several simulation settings based on a variety of collinearity levels and sample sizes. The results show that the Bayesian method gives better performance for relatively small sample sizes, and for the other settings the method performs similarly to the likelihood method.
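One replicate of the kind of comparison described above can be sketched with scikit-learn's BayesianRidge (this is an assumption of this note, not the paper's estimator or simulation design):

```python
# Bayesian ridge versus ordinary least squares on one collinear,
# small-sample dataset (scikit-learn sketch; not the paper's design).
import numpy as np
from sklearn.linear_model import BayesianRidge, LinearRegression

rng = np.random.default_rng(4)
n = 30                                          # relatively small sample
x1 = rng.standard_normal(n)
x2 = 0.95 * x1 + 0.05 * rng.standard_normal(n)  # strong collinearity
X = np.column_stack([x1, x2])
y = 1.0 * x1 + 1.0 * x2 + rng.standard_normal(n)

print(LinearRegression().fit(X, y).coef_)       # can be wildly unstable
print(BayesianRidge().fit(X, y).coef_)          # shrunk, more stable estimates
```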
Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors.
Woodard, Dawn B; Crainiceanu, Ciprian; Ruppert, David
2013-01-01
We propose a new method for regression using a parsimonious and scientifically interpretable representation of functional predictors. Our approach is designed for data that exhibit features such as spikes, dips, and plateaus whose frequency, location, size, and shape vary stochastically across subjects. We propose Bayesian inference of the joint functional and exposure models, and give a method for efficient computation. We contrast our approach with existing state-of-the-art methods for regression with functional predictors, and show that our method is more effective and efficient for data that include features occurring at varying locations. We apply our methodology to a large and complex dataset from the Sleep Heart Health Study, to quantify the association between sleep characteristics and health outcomes. Software and technical appendices are provided in online supplemental materials.
Cantekin, Kenan; Sekerci, Ahmet Ercan; Buyuk, Suleyman Kutalmis
2013-12-01
Computed tomography (CT) is capable of providing accurate and measurable 3-dimensional images of the third molar. The aims of this study were to analyze the development of the mandibular third molar and its relation to chronological age and to create new reference data for a group of Turkish participants aged 9 to 25 years on the basis of cone-beam CT images. All data were obtained from the patients' records, including medical, social, and dental anamnesis, and cone-beam CT images of 752 patients. Linear regression analysis was performed to obtain regression formulas for dental age calculation with chronological age and to determine the coefficient of determination (r2) for each sex. Statistical analysis showed a strong correlation between age and third-molar development for the males (r2 = 0.80) and the females (r2 = 0.78). Computed tomographic images are clinically useful for accurate and reliable estimation of dental ages of children and youth.
Lee, Eunjee; Zhu, Hongtu; Kong, Dehan; Wang, Yalin; Giovanello, Kelly Sullivan; Ibrahim, Joseph G
2015-01-01
The aim of this paper is to develop a Bayesian functional linear Cox regression model (BFLCRM) with both functional and scalar covariates. This new development is motivated by establishing the likelihood of conversion to Alzheimer’s disease (AD) in 346 patients with mild cognitive impairment (MCI) enrolled in the Alzheimer’s Disease Neuroimaging Initiative 1 (ADNI-1) and the early markers of conversion. These 346 MCI patients were followed over 48 months, with 161 MCI participants progressing to AD at 48 months. The functional linear Cox regression model was used to establish that functional covariates including hippocampus surface morphology and scalar covariates including brain MRI volumes, cognitive performance (ADAS-Cog), and APOE status can accurately predict time to onset of AD. Posterior computation proceeds via an efficient Markov chain Monte Carlo algorithm. A simulation study is performed to evaluate the finite sample performance of BFLCRM. PMID:26900412
Combustion Processes in Hybrid Rocket Engines
NASA Technical Reports Server (NTRS)
Venkateswaran, S.; Merkle, C. L.
1996-01-01
In recent years, there has been a resurgence of interest in the development of hybrid rocket engines for advanced launch vehicle applications. Hybrid propulsion systems use a solid fuel such as hydroxyl-terminated polybutadiene (HTPB) along with a gaseous/liquid oxidizer. The performance of hybrid combustors depends on the convective and radiative heat fluxes to the fuel surface, the rate of pyrolysis in the solid phase, and the turbulent combustion processes in the gaseous phases. These processes in combination specify the regression rates of the fuel surface and thereby the utilization efficiency of the fuel. In this paper, we employ computational fluid dynamics (CFD) techniques in order to gain a quantitative understanding of the physical trends in hybrid rocket combustors. The computational modeling is tailored to ongoing experiments at Penn State that employ a two dimensional slab burner configuration. The coordinated computational/experimental effort enables model validation while providing an understanding of the experimental observations. Computations to date have included the full length geometry with and without the aft nozzle section as well as shorter length domains for extensive parametric characterization. HTPB is used as the fuel with 1,3 butadiene being taken as the gaseous product of the pyrolysis. Pure gaseous oxygen is taken as the oxidizer. The fuel regression rate is specified using an Arrhenius rate reaction, while the fuel surface temperature is given by an energy balance involving gas-phase convection and radiation as well as thermal conduction in the solid phase. For the gas-phase combustion, a two step global reaction is used. The standard kappa-epsilon model is used for turbulence closure. Radiation is presently treated using a simple diffusion approximation which is valid for large optical path lengths, representative of radiation from soot particles. Computational results are obtained to determine the trends in the fuel burning or regression rates as a function of the head-end oxidizer mass flux, G=rho(e)U(e), and the chamber pressure. Furthermore, computation of the full slab burner configuration has also been obtained for various stages of the burn. Comparisons with available experimental data from small scale tests conducted by General Dynamics-Thiokol-Rocketdyne suggest reasonable agreement in the predicted regression rates. Future work will include: (1) a model for soot generation in the flame for more quantitative radiative transfer modelling, (2) a parametric study of combustion efficiency, and (3) transient calculations to help determine the possible mechanisms responsible for combustion instability in hybrid rocket motors.
An analysis of ratings: A guide to RMRATE
Thomas C. Brown; Terry C. Daniel; Herbert W. Schroeder; Glen E. Brink
1990-01-01
This report describes RMRATE, a computer program for analyzing rating judgments. RMRATE scales ratings using several scaling procedures, and compares the resulting scale values. The scaling procedures include the median and simple mean, standardized values, scale values based on Thurstone's Law of Categorical Judgment, and regression-based values. RMRATE also...
Computationally efficient algorithm for Gaussian Process regression in case of structured samples
NASA Astrophysics Data System (ADS)
Belyaev, M.; Burnaev, E.; Kapushev, Y.
2016-04-01
Surrogate modeling is widely used in many engineering problems. Data sets often have a Cartesian product structure (for instance, a factorial design of experiments with missing points). In such cases the size of the data set can be very large. Therefore, one of the most popular algorithms for approximation, Gaussian Process regression, can hardly be applied due to its computational complexity. In this paper a computationally efficient approach for constructing Gaussian Process regression in the case of data sets with Cartesian product structure is presented. Efficiency is achieved by using the special structure of the data set and operations with tensors. The proposed algorithm has low computational as well as memory complexity compared to existing algorithms. In this work we also introduce a regularization procedure allowing us to take into account anisotropy of the data set and avoid degeneracy of the regression model.
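The key computational trick for a full factorial grid can be sketched compactly: the kernel matrix factors as a Kronecker product, so the training solve only needs the small per-factor eigendecompositions. The sketch below shows that trick only; the paper's regularization for anisotropy and its handling of missing grid points are not included, and the kernel and grid are invented for illustration.

```python
# Gaussian process training solve exploiting Cartesian product
# (Kronecker) structure: only the small per-factor kernel matrices are
# eigendecomposed, never the full kernel matrix.
import numpy as np

def rbf(a, b, ls=1.0):
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ls**2)

rng = np.random.default_rng(6)
x1 = np.linspace(0, 1, 30)                    # factor 1 of the grid
x2 = np.linspace(0, 1, 40)                    # factor 2 of the grid
Y = np.sin(2*np.pi*x1)[:, None] * np.cos(2*np.pi*x2)[None, :]
Y = Y + 0.05 * rng.standard_normal(Y.shape)   # noisy responses on the grid
noise = 0.05**2

K1, K2 = rbf(x1, x1), rbf(x2, x2)
d1, V1 = np.linalg.eigh(K1)
d2, V2 = np.linalg.eigh(K2)

# (K1 kron K2 + noise*I)^(-1) vec(Y) without forming the full matrix:
S = V1.T @ Y @ V2                             # rotate into the eigenbasis
S = S / (np.outer(d1, d2) + noise)            # divide by eigenvalues + noise
Alpha = V1 @ S @ V2.T                         # rotate back: alpha on the grid

# Check against the naive dense solve (feasible only for small grids).
K = np.kron(K1, K2) + noise * np.eye(K1.shape[0] * K2.shape[0])
assert np.allclose(Alpha.reshape(-1), np.linalg.solve(K, Y.reshape(-1)))
```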
Deletion Diagnostics for Alternating Logistic Regressions
Preisser, John S.; By, Kunthel; Perin, Jamie; Qaqish, Bahjat F.
2013-01-01
Deletion diagnostics are introduced for the regression analysis of clustered binary outcomes estimated with alternating logistic regressions, an implementation of generalized estimating equations (GEE) that estimates regression coefficients in a marginal mean model and in a model for the intracluster association given by the log odds ratio. The diagnostics are developed within an estimating equations framework that recasts the estimating functions for association parameters based upon conditional residuals into equivalent functions based upon marginal residuals. Extensions of earlier work on GEE diagnostics follow directly, including computational formulae for one-step deletion diagnostics that measure the influence of a cluster of observations on the estimated regression parameters and on the overall marginal mean or association model fit. The diagnostic formulae are evaluated with simulation studies and with an application concerning an assessment of factors associated with health maintenance visits in primary care medical practices. The application and the simulations demonstrate that the proposed cluster-deletion diagnostics for alternating logistic regressions are good approximations of their exact fully iterated counterparts. PMID:22777960
A method for nonlinear exponential regression analysis
NASA Technical Reports Server (NTRS)
Junkin, B. G.
1971-01-01
A computer-oriented technique is presented for performing a nonlinear exponential regression analysis on decay-type experimental data. The technique involves a least squares procedure wherein the nonlinear problem is linearized by expansion in a Taylor series. A linear curve fitting procedure for determining the initial nominal estimates for the unknown exponential model parameters is included as an integral part of the technique. A correction matrix is derived and then applied to the nominal estimates to produce an improved set of model parameters. The solution cycle is repeated until some predetermined criterion is satisfied.
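A minimal sketch of the same idea (not the original program): obtain initial nominal estimates from a linear fit of the logged data, then repeatedly apply the Gauss-Newton correction obtained from the Taylor-series linearization.

```python
import numpy as np

def fit_exponential(x, y, tol=1e-10, max_iter=50):
    """Fit y ~ a * exp(-b * x) by linearizing in a Taylor series (Gauss-Newton).
    Initial nominal estimates come from a linear fit of ln(y) on x."""
    c1, c0 = np.polyfit(x, np.log(y), 1)      # ln(y) = ln(a) - b*x
    a, b = np.exp(c0), -c1
    for _ in range(max_iter):
        model = a * np.exp(-b * x)
        residual = y - model
        # Partial derivatives of the model with respect to (a, b).
        J = np.column_stack([np.exp(-b * x), -a * x * np.exp(-b * x)])
        # Correction applied to the nominal estimates.
        delta = np.linalg.solve(J.T @ J, J.T @ residual)
        a, b = a + delta[0], b + delta[1]
        if np.max(np.abs(delta)) < tol:
            break
    return a, b

# Example on simulated decay-type data.
x = np.linspace(0, 3, 30)
rng = np.random.default_rng(1)
y = 5.0 * np.exp(-1.3 * x) + rng.normal(scale=0.02, size=x.size)
print(fit_exponential(x, y))
```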
UCODE, a computer code for universal inverse modeling
Poeter, E.P.; Hill, M.C.
1999-01-01
This article presents the US Geological Survey computer program UCODE, which was developed in collaboration with the US Army Corps of Engineers Waterways Experiment Station and the International Ground Water Modeling Center of the Colorado School of Mines. UCODE performs inverse modeling, posed as a parameter-estimation problem, using nonlinear regression. Any application model or set of models can be used; the only requirement is that they have numerical (ASCII or text only) input and output files and that the numbers in these files have sufficient significant digits. Application models can include preprocessors and postprocessors as well as models related to the processes of interest (physical, chemical, and so on), making UCODE extremely powerful for model calibration. Estimated parameters can be defined flexibly with user-specified functions. Observations to be matched in the regression can be any quantity for which a simulated equivalent value can be produced; simulated equivalent values are calculated using values that appear in the application model output files and can be manipulated with additive and multiplicative functions, if necessary. Prior, or direct, information on estimated parameters also can be included in the regression. The nonlinear regression problem is solved by minimizing a weighted least-squares objective function with respect to the parameter values using a modified Gauss-Newton method. Sensitivities needed for the method are calculated approximately by forward or central differences, and problems and solutions related to this approximation are discussed. Statistics are calculated and printed for use in (1) diagnosing inadequate data or identifying parameters that probably cannot be estimated with the available data, (2) evaluating estimated parameter values, (3) evaluating the model representation of the actual processes, and (4) quantifying the uncertainty of model simulated values. UCODE is intended for use on any computer operating system: it consists of algorithms programmed in Perl, a freeware language designed for text manipulation, and in Fortran 90, which efficiently performs numerical calculations.
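A rough sketch of that estimation loop (a generic illustration under my own assumptions, not UCODE itself): treat the application model as a black box, approximate sensitivities by forward differences, and update parameters with a Gauss-Newton step on the weighted least-squares objective.

```python
import numpy as np

def forward_diff_jacobian(model, params, eps=1e-6):
    """Approximate sensitivities d(simulated)/d(parameter) by forward differences,
    treating the application model as a black box."""
    base = model(params)
    J = np.empty((base.size, params.size))
    for j in range(params.size):
        p = params.copy()
        h = eps * max(abs(p[j]), 1.0)
        p[j] += h
        J[:, j] = (model(p) - base) / h
    return base, J

def gauss_newton_wls(model, obs, weights, params, n_iter=20):
    """Minimize sum(w_i * (obs_i - sim_i)^2) with a Gauss-Newton iteration
    (a small ridge term added for numerical stability)."""
    W = np.diag(weights)
    for _ in range(n_iter):
        sim, J = forward_diff_jacobian(model, params)
        r = obs - sim
        A = J.T @ W @ J + 1e-10 * np.eye(params.size)
        params = params + np.linalg.solve(A, J.T @ W @ r)
    return params

# Toy "application model" with two parameters (a hypothetical stand-in).
def toy_model(p):
    x = np.linspace(0.0, 5.0, 20)
    return p[0] * np.exp(-p[1] * x)

rng = np.random.default_rng(0)
obs = toy_model(np.array([3.0, 0.7])) + rng.normal(scale=0.05, size=20)
print(gauss_newton_wls(toy_model, obs, np.ones(20), np.array([1.0, 0.3])))
```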
5 CFR 591.219 - How does OPM compute shelter price indexes?
Code of Federal Regulations, 2014 CFR
2014-01-01
... estimates in hedonic regressions (a type of multiple regression) to compute for each COLA survey area the price index for rental and/or rental equivalent units of comparable quality and size between the COLA...
5 CFR 591.219 - How does OPM compute shelter price indexes?
Code of Federal Regulations, 2011 CFR
2011-01-01
... estimates in hedonic regressions (a type of multiple regression) to compute for each COLA survey area the price index for rental and/or rental equivalent units of comparable quality and size between the COLA...
5 CFR 591.219 - How does OPM compute shelter price indexes?
Code of Federal Regulations, 2013 CFR
2013-01-01
... estimates in hedonic regressions (a type of multiple regression) to compute for each COLA survey area the price index for rental and/or rental equivalent units of comparable quality and size between the COLA...
5 CFR 591.219 - How does OPM compute shelter price indexes?
Code of Federal Regulations, 2012 CFR
2012-01-01
... estimates in hedonic regressions (a type of multiple regression) to compute for each COLA survey area the price index for rental and/or rental equivalent units of comparable quality and size between the COLA...
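A hedonic shelter index of the general kind described in these entries can be illustrated with an assumption-laden sketch (not OPM's actual procedure): regress log rent on quality and size characteristics plus an area indicator; the exponentiated area coefficient is then a quality-adjusted price relative between the survey area and the reference area.

```python
import numpy as np

# Hypothetical data: rooms, square feet (hundreds), and an indicator equal to 1
# for the COLA survey area and 0 for the reference area.
rng = np.random.default_rng(2)
n = 200
rooms = rng.integers(1, 6, n)
sqft = rng.uniform(4, 20, n)
area = rng.integers(0, 2, n)
log_rent = 6.0 + 0.10 * rooms + 0.03 * sqft + 0.25 * area + rng.normal(0, 0.1, n)

# Hedonic regression: log(rent) = b0 + b1*rooms + b2*sqft + b3*area.
X = np.column_stack([np.ones(n), rooms, sqft, area])
beta, *_ = np.linalg.lstsq(X, log_rent, rcond=None)

# Quality-adjusted shelter price index (survey area relative to reference = 100).
index = np.exp(beta[3]) * 100
print(round(index, 1))
```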
Byun, Bo-Ram; Kim, Yong-Il; Yamaguchi, Tetsutaro; Maki, Koutaro; Son, Woo-Sung
2015-01-01
This study aimed to examine the correlation between skeletal maturation status and parameters from the odontoid process/body of the second vertebra and the bodies of the third and fourth cervical vertebrae, and simultaneously to build multiple regression models able to estimate skeletal maturation status in Korean girls. Hand-wrist radiographs and cone beam computed tomography (CBCT) images were obtained from 74 Korean girls (6-18 years of age). CBCT-generated cervical vertebral maturation (CVM) was used to demarcate the odontoid process and the body of the second cervical vertebra, based on the dentocentral synchondrosis. Correlation coefficient analysis and multiple linear regression analysis were used for each parameter of the cervical vertebrae (P < 0.05). Forty-seven of 64 parameters from CBCT-generated CVM (independent variables) exhibited statistically significant correlations (P < 0.05). The multiple regression model with the greatest R² had six parameters (PH2/W2, UW2/W2, (OH+AH2)/LW2, UW3/LW3, D3, and H4/W4) as independent variables with a variance inflation factor (VIF) of <2. CBCT-generated CVM was able to include parameters from the second cervical vertebral body and odontoid process for the multiple regression models. This suggests that quantitative analysis might be used to estimate skeletal maturation status.
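The variance inflation factor used above to screen collinear predictors is itself computed from auxiliary regressions; a minimal, generic sketch (not the authors' code) is:

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X:
    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on the remaining columns (with an intercept)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2 = 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
        out[j] = 1.0 / (1.0 - r2)
    return out

# Example: keep only predictors with VIF < 2, as in the model described above.
rng = np.random.default_rng(3)
X = rng.normal(size=(74, 6))
X[:, 5] = 0.9 * X[:, 0] + 0.1 * rng.normal(size=74)   # a nearly collinear column
print(vif(X))
```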
Calibration Experiments for a Computer Vision Oyster Volume Estimation System
ERIC Educational Resources Information Center
Chang, G. Andy; Kerns, G. Jay; Lee, D. J.; Stanek, Gary L.
2009-01-01
Calibration is a technique that is commonly used in science and engineering research that requires calibrating measurement tools for obtaining more accurate measurements. It is an important technique in various industries. In many situations, calibration is an application of linear regression, and is a good topic to be included when explaining and…
NASA Astrophysics Data System (ADS)
Mahaboob, B.; Venkateswarlu, B.; Sankar, J. Ravi; Balasiddamuni, P.
2017-11-01
This paper uses matrix calculus techniques to obtain the Nonlinear Least Squares Estimator (NLSE), the Maximum Likelihood Estimator (MLE), and a linear pseudo model for the nonlinear regression model. David Pollard and Peter Radchenko [1] explained analytic techniques to compute the NLSE. However, the present research paper introduces an innovative method to compute the NLSE using principles of multivariate calculus. This study is concerned with very new optimization techniques used to compute the MLE and NLSE. Anh [2] derived the NLSE and MLE of a heteroscedastic regression model. Lemcoff [3] discussed a procedure to obtain a linear pseudo model for a nonlinear regression model. In this research article a new technique is developed to obtain the linear pseudo model for the nonlinear regression model using multivariate calculus. The linear pseudo model of Edmond Malinvaud [4] has been explained in a very different way in this paper. David Pollard et al. used empirical process techniques to study the asymptotics of the LSE (least-squares estimator) for fitting nonlinear regression functions in 2006. Jae Myung [13] provided a conceptual introduction to maximum likelihood estimation in his work "Tutorial on maximum likelihood estimation."
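For reference, the NLSE discussed in this abstract is defined by the least-squares objective and its first-order condition (a standard statement, independent of the paper's particular derivation):

$$\hat\beta \;=\; \arg\min_{\beta}\,\big(y - f(X,\beta)\big)^{\top}\big(y - f(X,\beta)\big), \qquad J(\hat\beta)^{\top}\big(y - f(X,\hat\beta)\big) \;=\; 0, \qquad J(\beta) \;=\; \frac{\partial f(X,\beta)}{\partial \beta^{\top}},$$

where \(J(\beta)\) is the Jacobian of the regression function with respect to the parameters.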
Lee, Casey J.; Murphy, Jennifer C.; Crawford, Charles G.; Deacon, Jeffrey R.
2017-10-24
The U.S. Geological Survey publishes information on concentrations and loads of water-quality constituents at 111 sites across the United States as part of the U.S. Geological Survey National Water Quality Network (NWQN). This report details historical and updated methods for computing water-quality loads at NWQN sites. The primary updates to historical load estimation methods include (1) an adaptation to methods for computing loads to the Gulf of Mexico; (2) the inclusion of loads computed using the Weighted Regressions on Time, Discharge, and Season (WRTDS) method; and (3) the inclusion of loads computed using continuous water-quality data. Loads computed using WRTDS and continuous water-quality data are provided along with those computed using historical methods. Various aspects of method updates are evaluated in this report to help users of water-quality loading data determine which estimation methods best suit their particular application.
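As a schematic illustration of regression-based load computation (a simplified rating-curve sketch of my own, not the WRTDS method or the USGS software), concentration can be regressed on log discharge, decimal time, and seasonal terms, and daily loads computed as concentration times discharge:

```python
import numpy as np

def design(t_years, q):
    """Design matrix: intercept, ln(Q), decimal time, and annual seasonal terms."""
    return np.column_stack([np.ones_like(t_years), np.log(q), t_years,
                            np.sin(2 * np.pi * t_years), np.cos(2 * np.pi * t_years)])

def fit_concentration_model(t_years, q, conc):
    """Fit ln(C) = b0 + b1*ln(Q) + b2*t + b3*sin(2*pi*t) + b4*cos(2*pi*t)."""
    beta, *_ = np.linalg.lstsq(design(t_years, q), np.log(conc), rcond=None)
    return beta

def daily_load(beta, t_years, q, unit_conv=1.0):
    """Load = predicted concentration * discharge * unit conversion factor.
    (Ignores the retransformation-bias correction operational methods apply.)"""
    conc = np.exp(design(t_years, q) @ beta)
    return conc * q * unit_conv

# Synthetic illustration: 3 years of daily discharge, monthly concentration samples.
rng = np.random.default_rng(11)
t = np.arange(0, 3, 1 / 365)
q = np.exp(1.0 + 0.5 * np.sin(2 * np.pi * t) + rng.normal(0, 0.3, t.size))
conc = np.exp(0.2 + 0.4 * np.log(q) - 0.05 * t + rng.normal(0, 0.1, t.size))
beta = fit_concentration_model(t[::30], q[::30], conc[::30])
annual_load = daily_load(beta, t, q).reshape(3, 365).sum(axis=1)
print(annual_load)
```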
Eash, David A.; Barnes, Kimberlee K.; O'Shea, Padraic S.
2016-09-19
A statewide study was conducted to develop regression equations for estimating three selected spring and three selected fall low-flow frequency statistics for ungaged stream sites in Iowa. The estimation equations developed for the six low-flow frequency statistics include spring (April through June) 1-, 7-, and 30-day mean low flows for a recurrence interval of 10 years and fall (October through December) 1-, 7-, and 30-day mean low flows for a recurrence interval of 10 years. Estimates of the three selected spring statistics are provided for 241 U.S. Geological Survey continuous-record streamgages, and estimates of the three selected fall statistics are provided for 238 of these streamgages, using data through June 2014. Because only 9 years of fall streamflow record were available, three streamgages included in the development of the spring regression equations were not included in the development of the fall regression equations. Because of regulation, diversion, or urbanization, 30 of the 241 streamgages were not included in the development of the regression equations. The study area includes Iowa and adjacent areas within 50 miles of the Iowa border. Because trend analyses indicated statistically significant positive trends when considering the period of record for most of the streamgages, the longest, most recent period of record without a significant trend was determined for each streamgage for use in the study. Geographic information system software was used to measure 63 selected basin characteristics for each of the 211 streamgages used to develop the regional regression equations. The study area was divided into three low-flow regions that were defined in a previous study for the development of regional regression equations. Because several streamgages included in the development of regional regression equations have estimates of zero flow calculated from observed streamflow for selected spring and fall low-flow frequency statistics, the final equations for the three low-flow regions were developed using two types of regression analyses—left-censored and generalized-least-squares regression analyses. A total of 211 streamgages were included in the development of nine spring regression equations—three equations for each of the three low-flow regions. A total of 208 streamgages were included in the development of nine fall regression equations—three equations for each of the three low-flow regions. A censoring threshold was used to develop 15 left-censored regression equations to estimate the three fall low-flow frequency statistics for each of the three low-flow regions and to estimate the three spring low-flow frequency statistics for the southern and northwest regions. For the northeast region, generalized-least-squares regression was used to develop three equations to estimate the three spring low-flow frequency statistics. For the northeast region, average standard errors of prediction range from 32.4 to 48.4 percent for the spring equations and average standard errors of estimate range from 56.4 to 73.8 percent for the fall equations. For the northwest region, average standard errors of estimate range from 58.9 to 62.1 percent for the spring equations and from 83.2 to 109.4 percent for the fall equations.
For the southern region, average standard errors of estimate range from 43.2 to 64.0 percent for the spring equations and from 78.1 to 78.7 percent for the fall equations.The regression equations are applicable only to stream sites in Iowa with low flows not substantially affected by regulation, diversion, or urbanization and with basin characteristics within the range of those used to develop the equations. The regression equations will be implemented within the U.S. Geological Survey StreamStats Web-based geographic information system application. StreamStats allows users to click on any ungaged stream site and compute estimates of the six selected spring and fall low-flow statistics; in addition, 90-percent prediction intervals and the measured basin characteristics for the ungaged site are provided. StreamStats also allows users to click on any Iowa streamgage to obtain computed estimates for the six selected spring and fall low-flow statistics.
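The left-censored analyses mentioned above address streamgages whose computed low-flow statistic is zero; the general idea can be sketched with a Tobit-type maximum-likelihood fit (a generic illustration, not the USGS procedure or its weighting):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def fit_left_censored(X, y, threshold):
    """Left-censored linear regression: y* = X b + e, e ~ N(0, s^2), and we
    observe max(y*, threshold). Censored observations contribute
    P(y* <= threshold) to the likelihood."""
    cens = y <= threshold
    def negloglik(params):
        b, s = params[:-1], np.exp(params[-1])
        mu = X @ b
        ll = norm.logpdf(y[~cens], mu[~cens], s).sum()
        ll += norm.logcdf((threshold - mu[cens]) / s).sum()
        return -ll
    start = np.zeros(X.shape[1] + 1)
    res = minimize(negloglik, start, method="Nelder-Mead",
                   options={"maxiter": 20000, "xatol": 1e-8, "fatol": 1e-8})
    return res.x

# Hypothetical example: log low-flow statistic vs. one log basin characteristic.
rng = np.random.default_rng(4)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y_star = X @ np.array([0.5, 1.2]) + rng.normal(0, 0.8, n)
y = np.maximum(y_star, -1.0)                 # censoring at a threshold of -1
print(fit_left_censored(X, y, threshold=-1.0))
```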
ERIC Educational Resources Information Center
Kozbelt, Aaron; Dexter, Scott; Dolese, Melissa; Meredith, Daniel; Ostrofsky, Justin
2015-01-01
We applied computer-based text analyses of regressive imagery to verbal protocols of individuals engaged in creative problem-solving in two domains: visual art (23 experts, 23 novices) and computer programming (14 experts, 14 novices). Percentages of words involving primary process and secondary process thought, plus emotion-related words, were…
SCI model structure determination program (OSR) user's guide. [optimal subset regression]
NASA Technical Reports Server (NTRS)
1979-01-01
The computer program, OSR (Optimal Subset Regression) which estimates models for rotorcraft body and rotor force and moment coefficients is described. The technique used is based on the subset regression algorithm. Given time histories of aerodynamic coefficients, aerodynamic variables, and control inputs, the program computes correlation between various time histories. The model structure determination is based on these correlations. Inputs and outputs of the program are given.
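To make the subset-regression idea concrete (a generic sketch, not the OSR program itself), one can enumerate candidate predictor subsets and keep the one that maximizes a fit criterion such as adjusted R²:

```python
import numpy as np
from itertools import combinations

def best_subset(X, y, max_size=3):
    """Exhaustive subset regression: return the predictor subset (up to
    max_size columns of X) with the highest adjusted R^2."""
    n, p = X.shape
    tss = np.sum((y - y.mean()) ** 2)
    best = (None, -np.inf)
    for k in range(1, max_size + 1):
        for cols in combinations(range(p), k):
            Z = np.column_stack([np.ones(n), X[:, cols]])
            beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
            rss = np.sum((y - Z @ beta) ** 2)
            adj_r2 = 1 - (rss / (n - k - 1)) / (tss / (n - 1))
            if adj_r2 > best[1]:
                best = (cols, adj_r2)
    return best

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 6))            # candidate explanatory time histories
y = 2.0 * X[:, 1] - 1.5 * X[:, 4] + rng.normal(0, 0.5, 200)
print(best_subset(X, y))
```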
Ferrell, Gloria M.
2001-01-01
Transport rates for total solids, total nitrogen, total phosphorus, biochemical oxygen demand, chromium, copper, lead, nickel, and zinc during 1994–98 were computed for six stormwater-monitoring sites in Mecklenburg County, North Carolina. These six stormwater-monitoring sites were operated by the Mecklenburg County Department of Environmental Protection, in cooperation with the City of Charlotte, and are located near the mouths of major streams. Constituent transport at the six study sites generally was dominated by nonpoint sources, except for nitrogen and phosphorus at two sites located downstream from the outfalls of major municipal wastewater-treatment plants.To relate land use to constituent transport, regression equations to predict constituent yield were developed by using water-quality data from a previous study of nine stormwater-monitoring sites on small streams in Mecklenburg County. The drainage basins of these nine stormwater sites have relatively homogeneous land-use characteristics compared to the six study sites. Mean annual construction activity, based on building permit files, was estimated for all stormwater-monitoring sites and included as an explanatory variable in the regression equations. These regression equations were used to predict constituent yield for the six study sites. Predicted yields generally were in agreement with computed yields. In addition, yields were predicted by using regression equations derived from a national urban water-quality database. Yields predicted from the regional regression equations generally were about an order of magnitude lower than computed yields.Regression analysis indicated that construction activity was a major contributor to transport of the constituents evaluated in this study except for total nitrogen and biochemical oxygen demand. Transport of total nitrogen and biochemical oxygen demand was dominated by point-source contributions. The two study basins that had the largest amounts of construction activity also had the highest total solids yields (1,300 and 1,500 tons per square mile per year). The highest total phosphorus yields (3.2 and 1.7 tons per square mile per year) attributable to nonpoint sources also occurred in these basins. Concentrations of chromium, copper, lead, nickel, and zinc were positively correlated with total solids concentrations at most of the study sites (Pearson product-moment correlation >0.50). The site having the highest median concentrations of chromium, copper, and nickel also was the site having the highest computed yield for total solids.
Rasmussen, Patrick P.; Eslick, Patrick J.; Ziegler, Andrew C.
2016-08-11
Water from the Little Arkansas River is used as source water for artificial recharge of the Equus Beds aquifer, one of the primary water-supply sources for the city of Wichita, Kansas. The U.S. Geological Survey has operated two continuous real-time water-quality monitoring stations since 1995 on the Little Arkansas River in Kansas. Regression models were developed to establish relations between discretely sampled constituent concentrations and continuously measured physical properties to compute concentrations of those constituents of interest. Site-specific regression models were originally published in 2000 for the near Halstead and near Sedgwick U.S. Geological Survey streamgaging stations and the site-specific regression models were then updated in 2003. This report updates those regression models using discrete and continuous data collected during May 1998 through August 2014. In addition to the constituents listed in the 2003 update, new regression models were developed for total organic carbon. The real-time computations of water-quality concentrations and loads are available at http://nrtwq.usgs.gov. The water-quality information in this report is important to the city of Wichita because water-quality information allows for real-time quantification and characterization of chemicals of concern (including chloride), in addition to nutrients, sediment, bacteria, and atrazine transported in the Little Arkansas River. The water-quality information in this report aids in the decision making for water treatment before artificial recharge.
Gorban, A N; Mirkes, E M; Zinovyev, A
2016-12-01
Most machine learning approaches stem from the principle of minimizing the mean squared distance, which leads to computationally efficient quadratic optimization methods. However, when faced with high-dimensional and noisy data, quadratic error functionals demonstrate many weaknesses, including high sensitivity to contaminating factors and the curse of dimensionality. Therefore, many recent applications in machine learning have exploited properties of non-quadratic error functionals based on the L1 norm or even sub-linear potentials corresponding to quasinorms Lp (0
ERIC Educational Resources Information Center
Chakrabarti, Rajashri
2013-01-01
Florida's 1999 A-plus program was a consequential accountability program that embedded vouchers in an accountability regime. Under Florida rules, scores of students in several special education (ESE) and limited English proficient (LEP) categories were not included in the computation of school grades. One might expect these rules to induce F…
ERIC Educational Resources Information Center
Jeske, Debora; Roßnagell, Christian Stamov; Backhaus, Joy
2014-01-01
We examined the role of learner characteristics as predictors of four aspects of e-learning performance, including knowledge test performance, learning confidence, learning efficiency, and navigational effectiveness. We used both self reports and log file records to compute the relevant statistics. Regression analyses showed that both need for…
Boosting structured additive quantile regression for longitudinal childhood obesity data.
Fenske, Nora; Fahrmeir, Ludwig; Hothorn, Torsten; Rzehak, Peter; Höhle, Michael
2013-07-25
Childhood obesity and the investigation of its risk factors has become an important public health issue. Our work is based on and motivated by a German longitudinal study including 2,226 children with up to ten measurements on their body mass index (BMI) and risk factors from birth to the age of 10 years. We introduce boosting of structured additive quantile regression as a novel distribution-free approach for longitudinal quantile regression. The quantile-specific predictors of our model include conventional linear population effects, smooth nonlinear functional effects, varying-coefficient terms, and individual-specific effects, such as intercepts and slopes. Estimation is based on boosting, a computer-intensive inference method for highly complex models. We propose a component-wise functional gradient descent boosting algorithm that allows for penalized estimation of the large variety of different effects, particularly leading to individual-specific effects shrunken toward zero. This concept allows us to flexibly estimate the nonlinear age curves of upper quantiles of the BMI distribution, both at the population and at the individual-specific level, adjusted for further risk factors, and to detect age-varying effects of categorical risk factors. Our model approach can be regarded as the quantile regression analog of Gaussian additive mixed models (or structured additive mean regression models), and we compare both model classes with respect to our obesity data.
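The component-wise functional gradient descent step can be illustrated for simple linear base learners as follows (a toy sketch of the boosting idea, not the authors' implementation): the negative gradient of the check loss at quantile tau is tau - 1(y < f), and each iteration updates only the single best-fitting covariate by a small step nu.

```python
import numpy as np

def boost_quantile(X, y, tau=0.9, nu=0.1, n_iter=500):
    """Component-wise boosting on the negative gradient of the check loss.
    Each iteration fits every centered covariate to the gradient by simple
    least squares and updates only the best one, shrunk by nu."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    intercept = np.quantile(y, tau)          # offset: empirical tau-quantile
    beta = np.zeros(p)
    for _ in range(n_iter):
        f = intercept + Xc @ beta
        u = np.where(y < f, tau - 1.0, tau)  # negative gradient of check loss
        coefs = (Xc * u[:, None]).sum(axis=0) / (Xc ** 2).sum(axis=0)
        sse = ((u[:, None] - Xc * coefs[None, :]) ** 2).sum(axis=0)
        j = np.argmin(sse)                   # best-fitting single covariate
        beta[j] += nu * coefs[j]
    return intercept, beta

rng = np.random.default_rng(6)
X = rng.normal(size=(400, 5))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(0, 1 + 0.5 * np.abs(X[:, 0]), 400)
print(boost_quantile(X, y, tau=0.9))
```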
Optoelectronic instrumentation enhancement using data mining feedback for a 3D measurement system
NASA Astrophysics Data System (ADS)
Flores-Fuentes, Wendy; Sergiyenko, Oleg; Gonzalez-Navarro, Félix F.; Rivas-López, Moisés; Hernandez-Balbuena, Daniel; Rodríguez-Quiñonez, Julio C.; Tyrsa, Vera; Lindner, Lars
2016-12-01
3D measurement by a cyber-physical system based on optoelectronic scanning instrumentation has been enhanced by outlier and regression data mining feedback. The prototype has applications in (1) industrial manufacturing systems that include robotic machinery, embedded vision, and motion control, (2) health care systems for measurement scanning, and (3) infrastructure, by providing structural health monitoring. This paper presents new research performed in data processing of a 3D measurement vision sensing database. Outliers in the multivariate data have been detected and removed to improve the results of the artificial intelligence regression algorithm. Regression on physical measurement error data has been used for 3D measurement error correction. We conclude that joining physical phenomena, measurement, and computation is an effective approach for feedback loops in the control of industrial, medical, and civil tasks.
MULGRES: a computer program for stepwise multiple regression analysis
A. Jeff Martin
1971-01-01
MULGRES is a computer program source deck that is designed for multiple regression analysis employing the technique of stepwise deletion in the search for most significant variables. The features of the program, along with inputs and outputs, are briefly described, with a note on machine compatibility.
Method for nonlinear exponential regression analysis
NASA Technical Reports Server (NTRS)
Junkin, B. G.
1972-01-01
Two computer programs developed according to two general types of exponential models for conducting nonlinear exponential regression analysis are described. Least squares procedure is used in which the nonlinear problem is linearized by expanding in a Taylor series. Program is written in FORTRAN 5 for the Univac 1108 computer.
Estimation of standard liver volume in Chinese adult living donors.
Fu-Gui, L; Lu-Nan, Y; Bo, L; Yong, Z; Tian-Fu, W; Ming-Qing, X; Wen-Tao, W; Zhe-Yu, C
2009-12-01
To determine a formula predicting the standard liver volume based on body surface area (BSA) or body weight in Chinese adults. A total of 115 consecutive right-lobe living donors, whose grafts did not include the middle hepatic vein, underwent right hemi-hepatectomy. No organs were used from prisoners, and no subjects were prisoners. Donor anthropometric data including age, gender, body weight, and body height were recorded prospectively. The weights and volumes of the right lobe liver grafts were measured at the back table. Liver weights and volumes were calculated from the right lobe graft weight and volume obtained at the back table, divided by the proportion of the right lobe on computed tomography. By simple linear regression analysis and stepwise multiple linear regression analysis, we correlated calculated liver volume and body height, body weight, or body surface area. The subjects had a mean age of 35.97 +/- 9.6 years, and a female-to-male ratio of 60:55. The mean volume of the right lobe was 727.47 +/- 136.17 mL, occupying 55.59% +/- 6.70% of the whole liver by computed tomography. The volume of the right lobe was 581.73 +/- 96.137 mL, and the estimated liver volume was 1053.08 +/- 167.56 mL. Females of the same body weight showed a slightly lower liver weight. By simple linear regression analysis and stepwise multiple linear regression analysis, a formula was derived based on body weight. All formulae except the Hong Kong formula overestimated liver volume compared to this formula. The formula of standard liver volume, SLV (mL) = 11.508 x body weight (kg) + 334.024, may be applied to estimate liver volumes in Chinese adults.
Magnitude and Frequency of Floods for Urban and Small Rural Streams in Georgia, 2008
Gotvald, Anthony J.; Knaak, Andrew E.
2011-01-01
A study was conducted that updated methods for estimating the magnitude and frequency of floods in ungaged urban basins in Georgia that are not substantially affected by regulation or tidal fluctuations. Annual peak-flow data for urban streams through September 2008 were analyzed for 50 streamgaging stations (streamgages) in Georgia and 6 streamgages on adjacent urban streams in Florida and South Carolina having 10 or more years of data. Flood-frequency estimates were computed for the 56 urban streamgages by fitting logarithms of annual peak flows for each streamgage to a Pearson Type III distribution. Additionally, basin characteristics for the streamgages were computed by using a geographical information system and computer algorithms. Regional regression analysis, using generalized least-squares regression, was used to develop a set of equations for estimating flows with 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent annual exceedance probabilities for ungaged urban basins in Georgia. In addition to the 56 urban streamgages, 171 rural streamgages were included in the regression analysis to maintain continuity between flood estimates for urban and rural basins as the basin characteristics pertaining to urbanization approach zero. Because 21 of the rural streamgages have drainage areas less than 1 square mile, the set of equations developed for this study can also be used for estimating small ungaged rural streams in Georgia. Flood-frequency estimates and basin characteristics for 227 streamgages were combined to form the final database used in the regional regression analysis. Four hydrologic regions were developed for Georgia. The final equations are functions of drainage area and percentage of impervious area for three of the regions and drainage area, percentage of developed land, and mean basin slope for the fourth region. Average standard errors of prediction for these regression equations range from 20.0 to 74.5 percent.
Black, Nicola; Mullan, Barbara; Sharpe, Louise
2016-09-01
The current aim was to examine the effectiveness of behaviour change techniques (BCTs), theory and other characteristics in increasing the effectiveness of computer-delivered interventions (CDIs) to reduce alcohol consumption. Included were randomised studies with a primary aim of reducing alcohol consumption, which compared self-directed CDIs to assessment-only control groups. CDIs were coded for the use of 42 BCTs from an alcohol-specific taxonomy, the use of theory according to a theory coding scheme and general characteristics such as length of the CDI. Effectiveness of CDIs was assessed using random-effects meta-analysis and the association between the moderators and effect size was assessed using univariate and multivariate meta-regression. Ninety-three CDIs were included in at least one analysis and produced small, significant effects on five outcomes (d+ = 0.07-0.15). Larger effects occurred with some personal contact, provision of normative information or feedback on performance, prompting commitment or goal review, the social norms approach and in samples with more women. Smaller effects occurred when information on the consequences of alcohol consumption was provided. These findings can be used to inform both intervention and theory development. Intervention developers should focus on including specific, effective techniques rather than on many techniques or more-elaborate approaches.
Spectral Regression Based Fault Feature Extraction for Bearing Accelerometer Sensor Signals
Xia, Zhanguo; Xia, Shixiong; Wan, Ling; Cai, Shiyu
2012-01-01
Bearings are not only the most important element but also a common source of failures in rotary machinery. Bearing fault prognosis technology has been receiving more and more attention recently, in particular because it plays an increasingly important role in avoiding the occurrence of accidents. Therein, fault feature extraction (FFE) of bearing accelerometer sensor signals is essential to highlight representative features of bearing conditions for machinery fault diagnosis and prognosis. This paper proposes a spectral regression (SR)-based approach for fault feature extraction from original features including time, frequency and time-frequency domain features of bearing accelerometer sensor signals. SR is a novel regression framework for efficient regularized subspace learning and feature extraction technology, and it uses the least squares method to obtain the best projection direction, rather than computing the density matrix of features, so it also has the advantage in dimensionality reduction. The effectiveness of the SR-based method is validated experimentally by applying the acquired vibration signals data to bearings. The experimental results indicate that SR can reduce the computation cost and preserve more structure information about different bearing faults and severities, and it is demonstrated that the proposed feature extraction scheme has an advantage over other similar approaches. PMID:23202017
Supporting Regularized Logistic Regression Privately and Efficiently
Li, Wenfa; Liu, Hongzhe; Yang, Peng; Xie, Wei
2016-01-01
As one of the most popular statistical and machine learning models, logistic regression with regularization has found wide adoption in biomedicine, social sciences, information technology, and so on. These domains often involve data of human subjects that are contingent upon strict privacy regulations. Concerns over data privacy make it increasingly difficult to coordinate and conduct large-scale collaborative studies, which typically rely on cross-institution data sharing and joint analysis. Our work here focuses on safeguarding regularized logistic regression, a widely used statistical model that has not yet been investigated from a data security and privacy perspective. We consider a common use scenario of multi-institution collaborative studies, such as in the form of research consortia or networks as widely seen in genetics, epidemiology, social sciences, etc. To make our privacy-enhancing solution practical, we demonstrate a non-conventional and computationally efficient method leveraging distributed computing and strong cryptography to provide comprehensive protection over individual-level and summary data. Extensive empirical evaluations on several studies validate the privacy guarantee, efficiency and scalability of our proposal. We also discuss the practical implications of our solution for large-scale studies and applications from various disciplines, including genetic and biomedical studies, smart grid, network analysis, etc. PMID:27271738
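For orientation, the non-private baseline model being safeguarded, L2-regularized logistic regression, can be fit centrally in a few lines (a plain sketch with synthetic data; the paper's contribution is the distributed, cryptographically protected protocol, which is not shown here):

```python
import numpy as np

def fit_l2_logistic(X, y, lam=1.0, n_iter=200, lr=0.1):
    """L2-regularized logistic regression by gradient descent on the
    penalized negative log-likelihood (intercept not penalized)."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad_w = X.T @ (prob - y) / n + lam * w
        grad_b = np.mean(prob - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

rng = np.random.default_rng(7)
X = rng.normal(size=(500, 4))
y = (X @ np.array([1.5, -1.0, 0.0, 0.5]) + rng.normal(0, 1, 500) > 0).astype(float)
print(fit_l2_logistic(X, y, lam=0.1))
```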
Byun, Bo-Ram; Kim, Yong-Il; Maki, Koutaro; Son, Woo-Sung
2015-01-01
This study aimed to examine the correlation between skeletal maturation status and parameters from the odontoid process/body of the second vertebra and the bodies of the third and fourth cervical vertebrae, and simultaneously to build multiple regression models able to estimate skeletal maturation status in Korean girls. Hand-wrist radiographs and cone beam computed tomography (CBCT) images were obtained from 74 Korean girls (6–18 years of age). CBCT-generated cervical vertebral maturation (CVM) was used to demarcate the odontoid process and the body of the second cervical vertebra, based on the dentocentral synchondrosis. Correlation coefficient analysis and multiple linear regression analysis were used for each parameter of the cervical vertebrae (P < 0.05). Forty-seven of 64 parameters from CBCT-generated CVM (independent variables) exhibited statistically significant correlations (P < 0.05). The multiple regression model with the greatest R² had six parameters (PH2/W2, UW2/W2, (OH+AH2)/LW2, UW3/LW3, D3, and H4/W4) as independent variables with a variance inflation factor (VIF) of <2. CBCT-generated CVM was able to include parameters from the second cervical vertebral body and odontoid process for the multiple regression models. This suggests that quantitative analysis might be used to estimate skeletal maturation status. PMID:25878721
NASA Astrophysics Data System (ADS)
Nooruddin, Hasan A.; Anifowose, Fatai; Abdulraheem, Abdulazeez
2014-03-01
Soft computing techniques have recently become very popular in the oil industry. A number of computational intelligence-based predictive methods have been widely applied in the industry with high prediction capabilities. Some of the popular methods include feed-forward neural networks, radial basis function networks, generalized regression neural networks, functional networks, support vector regression, and adaptive network fuzzy inference systems. A comparative study among the most popular soft computing techniques is presented using a large dataset published in the literature describing multimodal pore systems in the Arab D formation. The inputs to the models are air porosity, grain density, and Thomeer parameters obtained using mercury injection capillary pressure profiles. Corrected air permeability is the target variable. Applying the developed permeability models in recent reservoir characterization workflows ensures consistency between micro- and macro-scale information, represented mainly by Thomeer parameters and absolute permeability. The dataset was divided into two parts, with 80% of the data used for training and 20% for testing. The target permeability variable was transformed to the logarithmic scale as a pre-processing step and to show better correlations with the input variables. Statistical and graphical analyses of the results, including permeability cross-plots and detailed error measures, were carried out. In general, the comparative study showed very close results among the developed models. The feed-forward neural network permeability model showed the lowest average relative error, average absolute relative error, standard deviation of error, and root mean square error, making it the best model for such problems. The adaptive network fuzzy inference system also showed very good results.
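A minimal version of such a comparative workflow, using scikit-learn stand-ins for two of the model families named above and synthetic data in place of the Arab D dataset (both assumptions of this sketch), might look like:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for the inputs; the target is log-scale permeability,
# mirroring the logarithmic transformation described above.
rng = np.random.default_rng(8)
X = rng.normal(size=(400, 5))
log_k = 1.5 * X[:, 0] - 0.8 * X[:, 2] + 0.3 * X[:, 0] * X[:, 1] + rng.normal(0, 0.2, 400)

# 80/20 train/test split, as in the study design.
X_tr, X_te, y_tr, y_te = train_test_split(X, log_k, test_size=0.2, random_state=0)

models = {
    "feed-forward NN": MLPRegressor(hidden_layer_sizes=(20,), max_iter=5000, random_state=0),
    "support vector regression": SVR(C=10.0, epsilon=0.05),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    rmse = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
    print(f"{name}: test RMSE = {rmse:.3f}")
```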
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tharrington, Arnold N.
2015-09-09
The NCCS Regression Test Harness is a software package that provides a framework to perform regression and acceptance testing on NCCS High Performance Computers. The package is written in Python and has only the dependency of a Subversion repository to store the regression tests.
Automating approximate Bayesian computation by local linear regression.
Thornton, Kevin R
2009-07-07
In several biological contexts, parameter inference often relies on computationally intensive techniques. "Approximate Bayesian Computation", or ABC, methods based on summary statistics have become increasingly popular. A particular flavor of ABC based on using a linear regression to approximate the posterior distribution of the parameters, conditional on the summary statistics, is computationally appealing, yet no standalone tool exists to automate the procedure. Here, I describe a program to implement the method. The software package ABCreg implements the local linear-regression approach to ABC. The advantages are: 1. The code is standalone and fully documented. 2. The program will automatically process multiple data sets, and create unique output files for each (which may be processed immediately in R), facilitating the testing of inference procedures on simulated data, or the analysis of multiple data sets. 3. The program implements two different transformation methods for the regression step. 4. Analysis options are controlled on the command line by the user, and the program is designed to output warnings for cases where the regression fails. 5. The program does not depend on any particular simulation machinery (coalescent, forward-time, etc.), and therefore is a general tool for processing the results from any simulation. 6. The code is open-source and modular. Examples of applying the software to empirical data from Drosophila melanogaster, and testing the procedure on simulated data, are shown. In practice, ABCreg simplifies implementing ABC based on local-linear regression.
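The local linear-regression adjustment implemented by ABCreg (in the style of Beaumont et al.) can be sketched as follows; this is a generic illustration, not the ABCreg code:

```python
import numpy as np

def abc_reg(params, summaries, observed, accept_frac=0.1):
    """Rejection ABC followed by local linear-regression adjustment: keep the
    simulations whose summaries are closest to the observed ones, regress the
    accepted parameters on (summaries - observed) with Epanechnikov weights,
    and shift them to the observed point."""
    s = (summaries - summaries.mean(0)) / summaries.std(0)
    s_obs = (observed - summaries.mean(0)) / summaries.std(0)
    dist = np.sqrt(((s - s_obs) ** 2).sum(axis=1))
    eps = np.quantile(dist, accept_frac)
    keep = dist <= eps
    w = 1 - (dist[keep] / eps) ** 2               # Epanechnikov kernel weights
    D = np.column_stack([np.ones(keep.sum()), s[keep] - s_obs])
    W = np.diag(w)
    coef = np.linalg.solve(D.T @ W @ D, D.T @ W @ params[keep])
    adjusted = params[keep] - (s[keep] - s_obs) @ coef[1:]
    return adjusted, w

# Toy example: infer the mean of a normal from the sample mean and variance.
rng = np.random.default_rng(9)
theta = rng.uniform(-5, 5, 20000)                 # prior draws
data_sims = rng.normal(theta[:, None], 1.0, (20000, 30))
summ = np.column_stack([data_sims.mean(1), data_sims.var(1)])
post, w = abc_reg(theta, summ, np.array([1.3, 1.0]))
print(post.mean(), post.std())
```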
Burke, Adam; Peper, Erik
2002-01-01
Cumulative trauma disorder is a major health problem for adults. Despite a growing understanding of adult cumulative trauma disorder, however, little is known about the risks for younger populations. This investigation examined issues related to child/adolescent computer product use and upper body physical discomfort. A convenience sample of 212 students, grades 1-12, was interviewed at their homes by a college-age sibling or relative. One of the child's parents was also interviewed. A 22-item questionnaire was used for data-gathering. Questionnaire items included frequency and duration of use, type of computer products/games and input devices used, presence of physical discomfort, and parental concerns related to the child's computer use. Many students experienced physical discomfort attributed to computer use, such as wrist pain (30%) and back pain (15%). Specific computer activities, such as using a joystick or playing noneducational games, were significantly predictive of physical discomfort using logistic multiple regression. Many parents reported difficulty getting their children off the computer (46%) and that their children spent less time outdoors (35%). Computer product use within this cohort was associated with self-reported physical discomfort. Results suggest a need for more extensive study, including multiyear longitudinal surveys.
Feenstra, Heleen Em; Vermeulen, Ivar E; Murre, Jaap Mj; Schagen, Sanne B
2018-05-30
Online tests enable efficient self-administered assessments and consequently facilitate large-scale data collection for many fields of research. The Amsterdam Cognition Scan is a new online neuropsychological test battery that measures a broad variety of cognitive functions. The aims of this study were to evaluate the psychometric properties of the Amsterdam Cognition Scan and to establish regression-based normative data. The Amsterdam Cognition Scan was self-administered twice from home, with an interval of 6 weeks, by 248 healthy Dutch-speaking adults aged 18 to 81 years. Test-retest reliability was moderate to high and comparable with that of equivalent traditional tests (intraclass correlation coefficients: .45 to .80; .83 for the Amsterdam Cognition Scan total score). Multiple regression analyses indicated that (1) participants' age negatively influenced all (12) cognitive measures, (2) gender was associated with performance on six measures, and (3) education level was positively associated with performance on four measures. In addition, we observed influences of tested computer skills and of self-reported amount of computer use on cognitive performance. Demographic characteristics that proved to influence Amsterdam Cognition Scan test performance were included in regression-based predictive formulas to establish demographically adjusted normative data. Initial results from a healthy adult sample indicate that the Amsterdam Cognition Scan has high usability and can give reliable measures of various generic cognitive ability areas. For future use, the influence of computer skills and experience should be further studied, and for repeated measurements, computer configuration should be consistent. The reported normative data allow for initial interpretation of Amsterdam Cognition Scan performances. ©Heleen EM Feenstra, Ivar E Vermeulen, Jaap MJ Murre, Sanne B Schagen. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 30.05.2018.
Information-Decay Pursuit of Dynamic Parameters in Student Models
1994-04-01
simple worked-through example). Commercially available computer programs for structuring and using Bayesian inference include ERGO (Noetic Systems, Inc., 1991)...
ERIC Educational Resources Information Center
Culpepper, Steven Andrew
2012-01-01
The study of prediction bias is important and the last five decades include research studies that examined whether test scores differentially predict academic or employment performance. Previous studies used ordinary least squares (OLS) to assess whether groups differ in intercepts and slopes. This study shows that OLS yields inaccurate inferences…
A Selective Review of Group Selection in High-Dimensional Models
Huang, Jian; Breheny, Patrick; Ma, Shuangge
2013-01-01
Grouping structures arise naturally in many statistical modeling problems. Several methods have been proposed for variable selection that respect grouping structure in variables. Examples include the group LASSO and several concave group selection methods. In this article, we give a selective review of group selection concerning methodological developments, theoretical properties and computational algorithms. We pay particular attention to group selection methods involving concave penalties. We address both group selection and bi-level selection methods. We describe several applications of these methods in nonparametric additive models, semiparametric regression, seemingly unrelated regressions, genomic data analysis and genome wide association studies. We also highlight some issues that require further study. PMID:24174707
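As a concrete instance of the group selection methods reviewed here, the group LASSO can be computed with a proximal-gradient loop whose key step is group-wise soft-thresholding (a compact sketch, not tied to any particular package):

```python
import numpy as np

def group_lasso(X, y, groups, lam, n_iter=2000):
    """Proximal gradient (ISTA) for 0.5/n * ||y - Xb||^2 + lam * sum_g ||b_g||_2.
    `groups` is a list of index arrays defining non-overlapping groups."""
    n, p = X.shape
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n)   # 1 / Lipschitz constant
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        z = beta - step * grad
        for g in groups:                           # group-wise soft-thresholding
            norm_g = np.linalg.norm(z[g])
            shrink = max(0.0, 1.0 - step * lam / norm_g) if norm_g > 0 else 0.0
            beta[g] = shrink * z[g]
    return beta

rng = np.random.default_rng(10)
X = rng.normal(size=(200, 9))
groups = [np.arange(0, 3), np.arange(3, 6), np.arange(6, 9)]
y = X[:, 0:3] @ np.array([2.0, -1.0, 1.5]) + rng.normal(0, 0.5, 200)
print(np.round(group_lasso(X, y, groups, lam=0.1), 2))   # only the first group survives
```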
Krasikova, Dina V; Le, Huy; Bachura, Eric
2018-06-01
To address a long-standing concern regarding a gap between organizational science and practice, scholars called for more intuitive and meaningful ways of communicating research results to users of academic research. In this article, we develop a common language effect size index (CLβ) that can help translate research results to practice. We demonstrate how CLβ can be computed and used to interpret the effects of continuous and categorical predictors in multiple linear regression models. We also elaborate on how the proposed CLβ index is computed and used to interpret interactions and nonlinear effects in regression models. In addition, we test the robustness of the proposed index to violations of normality and provide means for computing standard errors and constructing confidence intervals around its estimates. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Feaster, Toby D.; Tasker, Gary D.
2002-01-01
Data from 167 streamflow-gaging stations in or near South Carolina with 10 or more years of record through September 30, 1999, were used to develop two methods for estimating the magnitude and frequency of floods in South Carolina for rural ungaged basins that are not significantly affected by regulation. Flood frequency estimates for 54 gaged sites in South Carolina were computed by fitting the water-year peak flows for each site to a log-Pearson Type III distribution. As part of the computation of flood-frequency estimates for gaged sites, new values for generalized skew coefficients were developed. Flood-frequency analyses also were made for gaging stations that drain basins from more than one physiographic province. The U.S. Geological Survey, in cooperation with the South Carolina Department of Transportation, updated these data from previous flood-frequency reports to aid officials who are active in floodplain management as well as those who design bridges, culverts, and levees, or other structures near streams where flooding is likely to occur. Regional regression analysis, using generalized least squares regression, was used to develop a set of predictive equations that can be used to estimate the 2-, 5-, 10-, 25-, 50-, 100-, 200-, and 500-year recurrence-interval flows for rural ungaged basins in the Blue Ridge, Piedmont, upper Coastal Plain, and lower Coastal Plain physiographic provinces of South Carolina. The predictive equations are all functions of drainage area. Average errors of prediction for these regression equations ranged from -16 to 19 percent for the 2-year recurrence-interval flow in the upper Coastal Plain to -34 to 52 percent for the 500-year recurrence interval flow in the lower Coastal Plain. A region-of-influence method also was developed that interactively estimates recurrence- interval flows for rural ungaged basins in the Blue Ridge of South Carolina. The region-of-influence method uses regression techniques to develop a unique relation between flow and basin characteristics for an individual watershed. This, then, can be used to estimate flows at ungaged sites. Because the computations required for this method are somewhat complex, a computer application was developed that performs the computations and compares the predictive errors for this method. The computer application includes the option of using the region-of-influence method, or the generalized least squares regression equations from this report to compute estimated flows and errors of prediction specific to each ungaged site. From a comparison of predictive errors using the region-of-influence method with those computed using the regional regression method, the region-of-influence method performed systematically better only in the Blue Ridge and is, therefore, not recommended for use in the other physiographic provinces. Peak-flow data for the South Carolina stations used in the regionalization study are provided in appendix A, which contains gaging station information, log-Pearson Type III statistics, information on stage-flow relations, and water-year peak stages and flows. For informational purposes, water-year peak-flow data for stations on regulated streams in South Carolina also are provided in appendix D. Other information pertaining to the regulated streams is provided in the text of the report.
Perri, Romina; Huta, Veronika; Pinchuk, Leonard; Pinchuk, Cindy; Ostry, David J; Lund, James P
2008-09-01
To determine if temporomandibular joint disorders (TMDs) are associated with extended computer use. People with chronic pain and extensive computer use were recruited by means of a newspaper advertisement. Those who responded to the ad were asked to complete an online survey, which included questions on computer use, medical history, pain symptoms, lifestyle and mood. Ninety-two people completed the online survey, but none of them responded to all questions in the survey. Of the 88 respondents who reported their sex, 49 (56%) were female. Most of the respondents had used computers for more than 5 hours per day for more than 5 years, and most believed that their pain was linked to computer use. The great majority had pain in the neck (73/89 [82%]) or shoulder (67/89 [75%]), but many (40/91 [44%]) also had symptoms of TMD. About half of the participants reported poor sleep and fatigue, and many linked their pain to negative effects on lifestyle and poor quality of life. Two multiple regressions, with duration of pain as the dependent variable, were carried out, one using the entire sample of respondents who had completed the necessary sections of the survey (n = 91) and the other using the subset of people with symptoms suggestive of TMD (n = 40). Duration of computer use was associated with duration of pain in both analyses, but 6 other independent variables (injury or arthritis, hours of daily computer use, stress, position of computer screen relative to the eyes, sex, and age) were without effect. In these regression analyses, the intercept was close to 0 years, which suggests that the pain began at about the same time as computer use. This web-based survey provides the first evidence that chronic pain in jaw muscles and other symptoms of TMD are associated with long-term, heavy use of computers. However, the great majority of people with these symptoms probably also suffer from pain in the shoulder and neck.
Majorization Minimization by Coordinate Descent for Concave Penalized Generalized Linear Models
Jiang, Dingfeng; Huang, Jian
2013-01-01
Recent studies have demonstrated theoretical attractiveness of a class of concave penalties in variable selection, including the smoothly clipped absolute deviation and minimax concave penalties. The computation of the concave penalized solutions in high-dimensional models, however, is a difficult task. We propose a majorization minimization by coordinate descent (MMCD) algorithm for computing the concave penalized solutions in generalized linear models. In contrast to the existing algorithms that use local quadratic or local linear approximation to the penalty function, the MMCD seeks to majorize the negative log-likelihood by a quadratic loss, but does not use any approximation to the penalty. This strategy makes it possible to avoid the computation of a scaling factor in each update of the solutions, which improves the efficiency of coordinate descent. Under certain regularity conditions, we establish theoretical convergence property of the MMCD. We implement this algorithm for a penalized logistic regression model using the SCAD and MCP penalties. Simulation studies and a data example demonstrate that the MMCD works sufficiently fast for the penalized logistic regression in high-dimensional settings where the number of covariates is much larger than the sample size. PMID:25309048
Computational approaches for predicting biomedical research collaborations.
Zhang, Qing; Yu, Hong
2014-01-01
Biomedical research is increasingly collaborative, and successful collaborations often produce high impact work. Computational approaches can be developed for automatically predicting biomedical research collaborations. Previous works of collaboration prediction mainly explored the topological structures of research collaboration networks, leaving out rich semantic information from the publications themselves. In this paper, we propose supervised machine learning approaches to predict research collaborations in the biomedical field. We explored both the semantic features extracted from author research interest profile and the author network topological features. We found that the most informative semantic features for author collaborations are related to research interest, including similarity of out-citing citations, similarity of abstracts. Of the four supervised machine learning models (naïve Bayes, naïve Bayes multinomial, SVMs, and logistic regression), the best performing model is logistic regression with an ROC ranging from 0.766 to 0.980 on different datasets. To our knowledge we are the first to study in depth how research interest and productivities can be used for collaboration prediction. Our approach is computationally efficient, scalable and yet simple to implement. The datasets of this study are available at https://github.com/qingzhanggithub/medline-collaboration-datasets.
Genome-wide regression and prediction with the BGLR statistical package.
Pérez, Paulino; de los Campos, Gustavo
2014-10-01
Many modern genomic data analyses require implementing regressions where the number of parameters (p, e.g., the number of marker effects) exceeds sample size (n). Implementing these large-p-with-small-n regressions poses several statistical and computational challenges, some of which can be confronted using Bayesian methods. This approach allows integrating various parametric and nonparametric shrinkage and variable selection procedures in a unified and consistent manner. The BGLR R-package implements a large collection of Bayesian regression models, including parametric variable selection and shrinkage methods and semiparametric procedures (Bayesian reproducing kernel Hilbert spaces regressions, RKHS). The software was originally developed for genomic applications; however, the methods implemented are useful for many nongenomic applications as well. The response can be continuous (censored or not) or categorical (either binary or ordinal). The algorithm is based on a Gibbs sampler with scalar updates and the implementation takes advantage of efficient compiled C and Fortran routines. In this article we describe the methods implemented in BGLR, present examples of the use of the package, and discuss practical issues emerging in real-data analysis. Copyright © 2014 by the Genetics Society of America.
The Role of Parents and Related Factors on Adolescent Computer Use
Epstein, Jennifer A.
2012-01-01
Background: Research has suggested that parents influence their adolescents' computer activity. Spending too much time on the computer for recreational purposes in particular has been found to be related to areas of public health concern in children/adolescents, including obesity and substance use. Design and Methods: The goal of the research was to determine the association between recreational computer use and potentially linked factors (parental monitoring, social influences to use computers including parents, age of first computer use, self-control, and particular internet activities). Participants (aged 13-17 years and residing in the United States) were recruited via the Internet to complete an anonymous survey online using a survey tool. The target sample of 200 participants who completed the survey was achieved. The sample's average age was 16 years, and 63% were girls. Results: A set of regressions with recreational computer use as the dependent variable was run. Conclusions: Less parental monitoring, younger age at first computer use, listening to or downloading music from the internet more frequently, using the internet for educational purposes less frequently, and parents' use of the computer for pleasure were related to spending a greater percentage of time on non-school computer use. These findings suggest the importance of parental monitoring and parental computer use on their children's own computer use, and the influence of some internet activities on adolescent computer use. Finally, programs aimed at parents to help them increase the age when their children start using computers and learn how to place limits on recreational computer use are needed. PMID:25170449
Testing Different Model Building Procedures Using Multiple Regression.
ERIC Educational Resources Information Center
Thayer, Jerome D.
The stepwise regression method of selecting predictors for computer assisted multiple regression analysis was compared with forward, backward, and best subsets regression, using 16 data sets. The results indicated the stepwise method was preferred because of its practical nature, when the models chosen by different selection methods were similar…
5 CFR 591.219 - How does OPM compute shelter price indexes?
Code of Federal Regulations, 2010 CFR
2010-01-01
... estimates in hedonic regressions (a type of multiple regression) to compute for each COLA survey area the... and rental equivalence prices and/or estimates, OPM obtains for each unit surveyed information about... survey area and the Washington, DC, area. [67 FR 22340, May 3, 2002, as amended at 69 FR 59763, Oct. 6...
STX--Fortran-4 program for estimates of tree populations from 3P sample-tree-measurements
L. R. Grosenbaugh
1967-01-01
Describes how to use an improved and greatly expanded version of an earlier computer program (1964) that converts dendrometer measurements of 3P-sample trees to population values in terms of whatever units the user desires. Many new options are available, including that of obtaining a product-yield and appraisal report based on regression coefficients supplied by the user....
Crawford, John R; Garthwaite, Paul H; Denham, Annie K; Chelune, Gordon J
2012-12-01
Regression equations have many useful roles in psychological assessment. Moreover, there is a large reservoir of published data that could be used to build regression equations; these equations could then be employed to test a wide variety of hypotheses concerning the functioning of individual cases. This resource is currently underused because (a) not all psychologists are aware that regression equations can be built not only from raw data but also using only basic summary data for a sample, and (b) the computations involved are tedious and prone to error. In an attempt to overcome these barriers, Crawford and Garthwaite (2007) provided methods to build and apply simple linear regression models using summary statistics as data. In the present study, we extend this work to set out the steps required to build multiple regression models from sample summary statistics and the further steps required to compute the associated statistics for drawing inferences concerning an individual case. We also develop, describe, and make available a computer program that implements these methods. Although there are caveats associated with the use of the methods, these need to be balanced against pragmatic considerations and against the alternative of either entirely ignoring a pertinent data set or using it informally to provide a clinical "guesstimate." Upgraded versions of earlier programs for regression in the single case are also provided; these add the point and interval estimates of effect size developed in the present article.
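For the simple (one-predictor) case that underlies this line of work, a regression and a test on an individual case can be built from nothing more than the means, standard deviations, correlation, and n. The sketch below is a minimal Python illustration of that logic, assuming the standard prediction-interval formulas; it is not the authors' program, and the example numbers are made up.

```python
import numpy as np
from scipy import stats

def regression_from_summary(mean_x, sd_x, mean_y, sd_y, r, n):
    """Simple linear regression built only from sample summary statistics."""
    b = r * sd_y / sd_x                                     # slope
    a = mean_y - b * mean_x                                 # intercept
    s_yx = sd_y * np.sqrt((1 - r**2) * (n - 1) / (n - 2))   # residual standard deviation
    return a, b, s_yx

def test_individual_case(x0, y0, mean_x, sd_x, n, a, b, s_yx):
    """Compare an individual's observed score with the normatively predicted score."""
    y_hat = a + b * x0
    se = s_yx * np.sqrt(1 + 1 / n + (x0 - mean_x) ** 2 / ((n - 1) * sd_x**2))
    t = (y0 - y_hat) / se                                   # t on n - 2 degrees of freedom
    return y_hat, t, 2 * stats.t.sf(abs(t), df=n - 2)

# Hypothetical normative summary data: predictor mean 100 (SD 15), outcome mean 50 (SD 10), r = .6, n = 60
a, b, s_yx = regression_from_summary(100, 15, 50, 10, 0.6, 60)
print(test_individual_case(x0=85, y0=30, mean_x=100, sd_x=15, n=60, a=a, b=b, s_yx=s_yx))
```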
Khan, Asaduzzaman; Western, Mark
The purpose of this study was to explore factors that facilitate or hinder effective use of computers in Australian general medical practice. This study is based on data extracted from a national telephone survey of 480 general practitioners (GPs) across Australia. Clinical functions performed by GPs using computers were examined using zero-inflated Poisson (ZIP) regression modelling. About 17% of GPs were not using a computer for any clinical function, while 18% reported using computers for all clinical functions. The ZIP model showed that computer anxiety was negatively associated with effective computer use, while practitioners' beliefs about the usefulness of computers were positively associated with effective computer use. Being a female GP or working in partnership or group practice increased the odds of effectively using computers for clinical functions. To fully capitalise on the benefits of computer technology, GPs need to be convinced that this technology is useful and can make a difference.
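A minimal sketch of a ZIP fit of this kind, assuming a recent statsmodels that provides ZeroInflatedPoisson; the data are simulated and the variable names are illustrative, not the survey's actual items.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(1)
n = 480
anxiety = rng.normal(size=n)
usefulness = rng.normal(size=n)

# Simulated count of clinical functions performed on a computer, with excess zeros
p_structural_zero = 1 / (1 + np.exp(-(-1.0 + 1.2 * anxiety)))
lam = np.exp(1.0 + 0.5 * usefulness)
counts = np.where(rng.uniform(size=n) < p_structural_zero, 0, rng.poisson(lam))

X = sm.add_constant(pd.DataFrame({"anxiety": anxiety, "usefulness": usefulness}))
zip_model = ZeroInflatedPoisson(counts, X, exog_infl=X, inflation="logit")
result = zip_model.fit(method="bfgs", maxiter=500, disp=False)
print(result.summary())
```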
SOCR Analyses - an Instructional Java Web-based Statistical Analysis Toolkit.
Chu, Annie; Cui, Jenny; Dinov, Ivo D
2009-03-01
The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses, such as linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as the t-test in the parametric category, and the Wilcoxon rank sum test, Kruskal-Wallis test, and Friedman's test in the non-parametric category. SOCR Analyses also includes several hypothesis test models, such as contingency tables, Friedman's test, and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), in the hope of contributing to the efforts of the statistical computing community. The code includes functionality for each specific analysis model, and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with an API (Application Programming Interface) have been implemented for statistical summaries, least squares solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is ongoing and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for the most updated information and newly added models.
NASA Astrophysics Data System (ADS)
Basak, Subhash C.; Mills, Denise; Hawkins, Douglas M.
2008-06-01
A hierarchical classification study was carried out based on a set of 70 chemicals—35 which produce allergic contact dermatitis (ACD) and 35 which do not. This approach was implemented using a regular ridge regression computer code, followed by conversion of regression output to binary data values. The hierarchical descriptor classes used in the modeling include topostructural (TS), topochemical (TC), and quantum chemical (QC), all of which are based solely on chemical structure. The concordance, sensitivity, and specificity are reported. The model based on the TC descriptors was found to be the best, while the TS model was extremely poor.
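A minimal sketch of the ridge-then-binarize strategy described here, using scikit-learn with random stand-in descriptors; the 0.5 cut-off and the in-sample evaluation are simplifying assumptions rather than the authors' protocol.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(70, 40))          # stand-in for TS/TC/QC structural descriptors
y = (X[:, :3].sum(axis=1) + rng.normal(scale=0.5, size=70) > 0).astype(float)  # 1 = ACD, 0 = not

X_std = StandardScaler().fit_transform(X)
ridge = RidgeCV(alphas=np.logspace(-3, 3, 25)).fit(X_std, y)
pred = (ridge.predict(X_std) >= 0.5).astype(int)   # convert continuous ridge output to binary calls

tp = np.sum((pred == 1) & (y == 1)); tn = np.sum((pred == 0) & (y == 0))
fp = np.sum((pred == 1) & (y == 0)); fn = np.sum((pred == 0) & (y == 1))
print("concordance:", (tp + tn) / len(y))
print("sensitivity:", tp / (tp + fn), "specificity:", tn / (tn + fp))
```

In-sample figures like these are optimistic; a cross-validated version would be needed for honest estimates.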
Heimann, David C.; Rasmussen, Patrick P.; Cline, Teri L.; Pigue, Lori M.; Wagner, Holly R.
2010-01-01
Suspended-sediment data from 18 selected surface-water monitoring stations in the lower Missouri River Basin downstream from Gavins Point Dam were used in the computation of annual suspended-sediment and suspended-sand loads for 1976 through 2008. Three methods of suspended-sediment load determination were utilized and these included the subdivision method, regression of instantaneous turbidity with suspended-sediment concentrations at selected stations, and regression techniques using the Load Estimator (LOADEST) software. Characteristics of the suspended-sediment and streamflow data collected at the 18 monitoring stations and the tabulated annual suspended-sediment and suspended-sand loads and yields are presented.
Hart, Carl R; Reznicek, Nathan J; Wilson, D Keith; Pettit, Chris L; Nykaza, Edward T
2016-05-01
Many outdoor sound propagation models exist, ranging from highly complex physics-based simulations to simplified engineering calculations, and more recently, highly flexible statistical learning methods. Several engineering and statistical learning models are evaluated by using a particular physics-based model, namely, a Crank-Nicholson parabolic equation (CNPE), as a benchmark. Narrowband transmission loss values predicted with the CNPE, based upon a simulated data set of meteorological, boundary, and source conditions, act as simulated observations. In the simulated data set sound propagation conditions span from downward refracting to upward refracting, for acoustically hard and soft boundaries, and low frequencies. Engineering models used in the comparisons include the ISO 9613-2 method, Harmonoise, and Nord2000 propagation models. Statistical learning methods used in the comparisons include bagged decision tree regression, random forest regression, boosting regression, and artificial neural network models. Computed skill scores are relative to sound propagation in a homogeneous atmosphere over a rigid ground. Overall skill scores for the engineering noise models are 0.6%, -7.1%, and 83.8% for the ISO 9613-2, Harmonoise, and Nord2000 models, respectively. Overall skill scores for the statistical learning models are 99.5%, 99.5%, 99.6%, and 99.6% for bagged decision tree, random forest, boosting, and artificial neural network regression models, respectively.
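The statistical-learning half of such a comparison can be sketched with scikit-learn as below. The data are synthetic stand-ins, and the skill score is assumed here to be the fractional reduction in mean squared error relative to a constant baseline, which may differ from the authors' exact definition.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor, GradientBoostingRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(2000, 5))          # stand-in meteorological/boundary/source inputs
y = 20 * X[:, 0] * X[:, 1] + 10 * np.sin(3 * X[:, 2]) + rng.normal(scale=1.0, size=2000)  # transmission-loss proxy

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
baseline_mse = mean_squared_error(y_te, np.full_like(y_te, y_tr.mean()))  # stand-in reference prediction

models = {
    "bagged trees": BaggingRegressor(random_state=0),
    "random forest": RandomForestRegressor(random_state=0),
    "boosting": GradientBoostingRegressor(random_state=0),
    "neural network": MLPRegressor(hidden_layer_sizes=(50, 50), max_iter=2000, random_state=0),
}
for name, m in models.items():
    mse = mean_squared_error(y_te, m.fit(X_tr, y_tr).predict(X_te))
    print(f"{name}: skill score = {1 - mse / baseline_mse:.3f}")   # 1 = perfect, 0 = no better than baseline
```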
Ries(compiler), Kernell G.; With sections by Atkins, J. B.; Hummel, P.R.; Gray, Matthew J.; Dusenbury, R.; Jennings, M.E.; Kirby, W.H.; Riggs, H.C.; Sauer, V.B.; Thomas, W.O.
2007-01-01
The National Streamflow Statistics (NSS) Program is a computer program that should be useful to engineers, hydrologists, and others for planning, management, and design applications. NSS compiles all current U.S. Geological Survey (USGS) regional regression equations for estimating streamflow statistics at ungaged sites in an easy-to-use interface that operates on computers with Microsoft Windows operating systems. NSS expands on the functionality of the USGS National Flood Frequency Program, and replaces it. The regression equations included in NSS are used to transfer streamflow statistics from gaged to ungaged sites through the use of watershed and climatic characteristics as explanatory or predictor variables. Generally, the equations were developed on a statewide or metropolitan-area basis as part of cooperative study programs. Equations are available for estimating rural and urban flood-frequency statistics, such as the 100-year flood, for every state, for Puerto Rico, and for the island of Tutuila, American Samoa. Equations are available for estimating other statistics, such as the mean annual flow, monthly mean flows, flow-duration percentiles, and low-flow frequencies (such as the 7-day, 10-year low flow) for less than half of the states. All equations available for estimating streamflow statistics other than flood-frequency statistics assume rural (non-regulated, non-urbanized) conditions. The NSS output provides indicators of the accuracy of the estimated streamflow statistics. The indicators may include any combination of the standard error of estimate, the standard error of prediction, the equivalent years of record, or 90 percent prediction intervals, depending on what was provided by the authors of the equations. The program includes several other features that can be used only for flood-frequency estimation. These include the ability to generate flood-frequency plots and plots of typical flood hydrographs for selected recurrence intervals, estimates of the probable maximum flood, extrapolation of the 500-year flood when an equation for estimating it is not available, and weighting techniques to improve flood-frequency estimates for gaging stations and ungaged sites on gaged streams. This report describes the regionalization techniques used to develop the equations in NSS and provides guidance on the applicability and limitations of the techniques. The report also includes a users manual and a summary of equations available for estimating basin lagtime, which is needed by the program to generate flood hydrographs. The NSS software and accompanying database, and the documentation for the regression equations included in NSS, are available on the Web at http://water.usgs.gov/software/.
Eash, David A.; Barnes, Kimberlee K.; Veilleux, Andrea G.
2013-01-01
A statewide study was performed to develop regional regression equations for estimating selected annual exceedance-probability statistics for ungaged stream sites in Iowa. The study area comprises streamgages located within Iowa and 50 miles beyond the State’s borders. Annual exceedance-probability estimates were computed for 518 streamgages by using the expected moments algorithm to fit a Pearson Type III distribution to the logarithms of annual peak discharges for each streamgage using annual peak-discharge data through 2010. The estimation of the selected statistics included a Bayesian weighted least-squares/generalized least-squares regression analysis to update regional skew coefficients for the 518 streamgages. Low-outlier and historic information were incorporated into the annual exceedance-probability analyses, and a generalized Grubbs-Beck test was used to detect multiple potentially influential low flows. Also, geographic information system software was used to measure 59 selected basin characteristics for each streamgage. Regional regression analysis, using generalized least-squares regression, was used to develop a set of equations for each flood region in Iowa for estimating discharges for ungaged stream sites with 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent annual exceedance probabilities, which are equivalent to annual flood-frequency recurrence intervals of 2, 5, 10, 25, 50, 100, 200, and 500 years, respectively. A total of 394 streamgages were included in the development of regional regression equations for three flood regions (regions 1, 2, and 3) that were defined for Iowa based on landform regions and soil regions. Average standard errors of prediction range from 31.8 to 45.2 percent for flood region 1, 19.4 to 46.8 percent for flood region 2, and 26.5 to 43.1 percent for flood region 3. The pseudo coefficients of determination for the generalized least-squares equations range from 90.8 to 96.2 percent for flood region 1, 91.5 to 97.9 percent for flood region 2, and 92.4 to 96.0 percent for flood region 3. The regression equations are applicable only to stream sites in Iowa with flows not significantly affected by regulation, diversion, channelization, backwater, or urbanization and with basin characteristics within the range of those used to develop the equations. These regression equations will be implemented within the U.S. Geological Survey StreamStats Web-based geographic information system tool. StreamStats allows users to click on any ungaged site on a river and compute estimates of the eight selected statistics; in addition, 90-percent prediction intervals and the measured basin characteristics for the ungaged sites also are provided by the Web-based tool. StreamStats also allows users to click on any streamgage in Iowa and estimates computed for these eight selected statistics are provided for the streamgage.
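For orientation only, a method-of-moments log-Pearson Type III fit (without the expected moments algorithm, regional skew weighting, or low-outlier tests used in studies like this one) can be sketched as follows; the peak values are made up.

```python
import numpy as np
from scipy import stats

def flood_quantiles(annual_peaks_cfs, aep=(0.5, 0.2, 0.1, 0.04, 0.02, 0.01, 0.005, 0.002)):
    """Fit a Pearson Type III distribution to log10 annual peaks (method of moments)
    and return discharges for the requested annual exceedance probabilities."""
    logq = np.log10(np.asarray(annual_peaks_cfs, dtype=float))
    mean, sd = logq.mean(), logq.std(ddof=1)
    skew = stats.skew(logq, bias=False)          # station skew only; no regional weighting here
    probs = 1.0 - np.asarray(aep)                # non-exceedance probabilities
    return 10 ** stats.pearson3.ppf(probs, skew, loc=mean, scale=sd)

peaks = [3200, 4100, 2900, 5300, 6100, 2500, 7200, 3900, 4800, 10200,
         3600, 4400, 5800, 3100, 6900, 2800, 5100, 4300, 9400, 3700]
print(np.round(flood_quantiles(peaks)))
```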
Ries, Kernell G.; Crouse, Michele Y.
2002-01-01
For many years, the U.S. Geological Survey (USGS) has been developing regional regression equations for estimating flood magnitude and frequency at ungaged sites. These regression equations are used to transfer flood characteristics from gaged to ungaged sites through the use of watershed and climatic characteristics as explanatory or predictor variables. Generally, these equations have been developed on a Statewide or metropolitan-area basis as part of cooperative study programs with specific State Departments of Transportation. In 1994, the USGS released a computer program titled the National Flood Frequency Program (NFF), which compiled all the USGS available regression equations for estimating the magnitude and frequency of floods in the United States and Puerto Rico. NFF was developed in cooperation with the Federal Highway Administration and the Federal Emergency Management Agency. Since the initial release of NFF, the USGS has produced new equations for many areas of the Nation. A new version of NFF has been developed that incorporates these new equations and provides additional functionality and ease of use. NFF version 3 provides regression-equation estimates of flood-peak discharges for unregulated rural and urban watersheds, flood-frequency plots, and plots of typical flood hydrographs for selected recurrence intervals. The Program also provides weighting techniques to improve estimates of flood-peak discharges for gaging stations and ungaged sites. The information provided by NFF should be useful to engineers and hydrologists for planning and design applications. This report describes the flood-regionalization techniques used in NFF and provides guidance on the applicability and limitations of the techniques. The NFF software and the documentation for the regression equations included in NFF are available at http://water.usgs.gov/software/nff.html.
Bisese, James A.
1995-01-01
Methods are presented for estimating the peak discharges of rural, unregulated streams in Virginia. A Pearson Type III distribution is fitted to the logarithms of the unregulated annual peak-discharge records from 363 stream-gaging stations in Virginia to estimate the peak discharge at these stations for recurrence intervals of 2 to 500 years. Peak-discharge characteristics for 284 unregulated stations are divided into eight regions based on physiographic province, and regressed on basin characteristics, including drainage area, main channel length, main channel slope, mean basin elevation, percentage of forest cover, mean annual precipitation, and maximum rainfall intensity. Regression equations for each region are computed by use of the generalized least-squares method, which accounts for spatial and temporal correlation between nearby gaging stations. This regression technique weights the significance of each station to the regional equation based on the length of records collected at each station, the correlation between annual peak discharges among the stations, and the standard deviation of the annual peak discharge for each station. Drainage area proved to be the only significant explanatory variable in four regions, while other regions have as many as three significant variables. Standard errors of the regression equations range from 30 to 80 percent. Alternate equations using drainage area only are provided for the five regions with more than one significant explanatory variable. Methods and sample computations are provided to estimate peak discharges at gaged and ungaged sites in Virginia for recurrence intervals of 2, 5, 10, 25, 50, 100, 200, and 500 years, and to adjust the regression estimates for sites on gaged streams where nearby gaging-station records are available.
Brenn, T; Arnesen, E
1985-01-01
For comparative evaluation, discriminant analysis, logistic regression and Cox's model were used to select risk factors for total and coronary deaths among 6595 men aged 20-49 followed for 9 years. Groups with mortality between 5 and 93 per 1000 were considered. Discriminant analysis selected variable sets only marginally different from the logistic and Cox methods which always selected the same sets. A time-saving option, offered for both the logistic and Cox selection, showed no advantage compared with discriminant analysis. Analysing more than 3800 subjects, the logistic and Cox methods consumed, respectively, 80 and 10 times more computer time than discriminant analysis. When including the same set of variables in non-stepwise analyses, all methods estimated coefficients that in most cases were almost identical. In conclusion, discriminant analysis is advocated for preliminary or stepwise analysis, otherwise Cox's method should be used.
Regression relation for pure quantum states and its implications for efficient computing.
Elsayed, Tarek A; Fine, Boris V
2013-02-15
We obtain a modified version of the Onsager regression relation for the expectation values of quantum-mechanical operators in pure quantum states of isolated many-body quantum systems. We use the insights gained from this relation to show that high-temperature time correlation functions in many-body quantum systems can be controllably computed without complete diagonalization of the Hamiltonians, using instead the direct integration of the Schrödinger equation for randomly sampled pure states. This method is also applicable to quantum quenches and other situations describable by time-dependent many-body Hamiltonians. The method implies exponential reduction of the computer memory requirement in comparison with the complete diagonalization. We illustrate the method by numerically computing infinite-temperature correlation functions for translationally invariant Heisenberg chains of up to 29 spins 1/2. Thereby, we also test the spin diffusion hypothesis and find it in a satisfactory agreement with the numerical results. Both the derivation of the modified regression relation and the justification of the computational method are based on the notion of quantum typicality.
Space shuttle propulsion parameter estimation using optimal estimation techniques
NASA Technical Reports Server (NTRS)
1983-01-01
A regression analysis was performed on tabular aerodynamic data to provide a representative aerodynamic model for coefficient estimation; this also reduced the storage requirements for the 'normal' model used to check out the estimation algorithms. The results of the regression analyses are presented. The computer routines for the filter portion of the estimation algorithm were developed, and the SRB predictive program was brought up on the computer. For the filter program, approximately 54 routines were developed. The routines were highly subsegmented to facilitate overlaying program segments within the partitioned storage space on the computer.
Asquith, William H.; Slade, R.M.
1999-01-01
The U.S. Geological Survey, in cooperation with the Texas Department of Transportation, has developed a computer program to estimate peak-streamflow frequency for ungaged sites in natural basins in Texas. Peak-streamflow frequency refers to the peak streamflows for recurrence intervals of 2, 5, 10, 25, 50, and 100 years. Peak-streamflow frequency estimates are needed by planners, managers, and design engineers for flood-plain management; for objective assessment of flood risk; for cost-effective design of roads and bridges; and for the design of culverts, dams, levees, and other flood-control structures. The program estimates peak-streamflow frequency using a site-specific approach and a multivariate generalized least-squares linear regression. A site-specific approach differs from a traditional regional regression approach by developing unique equations to estimate peak-streamflow frequency specifically for the ungaged site. The stations included in the regression are selected using an informal cluster analysis that compares the basin characteristics of the ungaged site to the basin characteristics of all the stations in the database. The program provides several choices for selecting the stations. Selecting the stations using cluster analysis ensures that the stations included in the regression will have the most pertinent information about flooding characteristics of the ungaged site and therefore provide the basis for potentially improved peak-streamflow frequency estimation. An evaluation of the site-specific approach in estimating peak-streamflow frequency for gaged sites indicates that the site-specific approach is at least as accurate as a traditional regional regression approach.
ERIC Educational Resources Information Center
Molenaar, Ivo W.
The technical problems involved in obtaining Bayesian model estimates for the regression parameters in m similar groups are studied. The available computer programs, BPREP (BASIC), and BAYREG, both written in FORTRAN, require an amount of computer processing that does not encourage regular use. These programs are analyzed so that the performance…
ERIC Educational Resources Information Center
Kitsantas, Anastasia; Kitsantas, Panagiota; Kitsantas, Thomas
2012-01-01
The purpose of this exploratory study was to assess the relative importance of a number of variables in predicting students' interest in math and/or computer science. Classification and regression trees (CART) were employed in the analysis of survey data collected from 276 college students enrolled in two U.S. and Greek universities. The results…
Staley, James R; Jones, Edmund; Kaptoge, Stephen; Butterworth, Adam S; Sweeting, Michael J; Wood, Angela M; Howson, Joanna M M
2017-06-01
Logistic regression is often used instead of Cox regression to analyse genome-wide association studies (GWAS) of single-nucleotide polymorphisms (SNPs) and disease outcomes with cohort and case-cohort designs, as it is less computationally expensive. Although Cox and logistic regression models have been compared previously in cohort studies, this work does not completely cover the GWAS setting nor extend to the case-cohort study design. Here, we evaluated Cox and logistic regression applied to cohort and case-cohort genetic association studies using simulated data and genetic data from the EPIC-CVD study. In the cohort setting, there was a modest improvement in power to detect SNP-disease associations using Cox regression compared with logistic regression, which increased as the disease incidence increased. In contrast, logistic regression had more power than (Prentice weighted) Cox regression in the case-cohort setting. Logistic regression yielded inflated effect estimates (assuming the hazard ratio is the underlying measure of association) for both study designs, especially for SNPs with greater effect on disease. Given logistic regression is substantially more computationally efficient than Cox regression in both settings, we propose a two-step approach to GWAS in cohort and case-cohort studies. First to analyse all SNPs with logistic regression to identify associated variants below a pre-defined P-value threshold, and second to fit Cox regression (appropriately weighted in case-cohort studies) to those identified SNPs to ensure accurate estimation of association with disease.
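A schematic of the proposed two-step approach, assuming the lifelines package for the Cox step and omitting the Prentice weighting needed for case-cohort designs; the function name, threshold, and column names are illustrative.

```python
import pandas as pd
import statsmodels.api as sm
from lifelines import CoxPHFitter

def two_step_gwas(snps, time, event, covars, p_threshold=5e-8):
    """Step 1: screen every SNP with logistic regression on the event indicator.
    Step 2: refit the SNPs passing the threshold with Cox regression."""
    hits = []
    for name in snps.columns:
        X = sm.add_constant(pd.concat([snps[[name]], covars], axis=1))
        if sm.Logit(event, X).fit(disp=False).pvalues[name] < p_threshold:
            hits.append(name)
    results = {}
    for name in hits:
        df = pd.concat([snps[[name]], covars], axis=1)
        df["time"], df["event"] = time, event
        cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
        results[name] = cph.summary.loc[name, ["coef", "p"]]
    return results
```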
REGRESSION ANALYSIS OF SEA-SURFACE-TEMPERATURE PATTERNS FOR THE NORTH PACIFIC OCEAN.
Keywords: sea water, surface temperature, oceanographic data, Pacific Ocean, regression analysis, statistical analysis, underwater equipment, detection, underwater communications, distribution, thermal properties, computers.
Olson, Scott A.; with a section by Veilleux, Andrea G.
2014-01-01
This report provides estimates of flood discharges at selected annual exceedance probabilities (AEPs) for streamgages in and adjacent to Vermont and equations for estimating flood discharges at AEPs of 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent (recurrence intervals of 2-, 5-, 10-, 25-, 50-, 100-, 200-, and 500-years, respectively) for ungaged, unregulated, rural streams in Vermont. The equations were developed using generalized least-squares regression. Flood-frequency and drainage-basin characteristics from 145 streamgages were used in developing the equations. The drainage-basin characteristics used as explanatory variables in the regression equations include drainage area, percentage of wetland area, and the basin-wide mean of the average annual precipitation. The average standard errors of prediction for estimating the flood discharges at the 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent AEP with these equations are 34.9, 36.0, 38.7, 42.4, 44.9, 47.3, 50.7, and 55.1 percent, respectively. Flood discharges at selected AEPs for streamgages were computed by using the Expected Moments Algorithm. To improve estimates of the flood discharges for given exceedance probabilities at streamgages in Vermont, a new generalized skew coefficient was developed. The new generalized skew for the region is a constant, 0.44. The mean square error of the generalized skew coefficient is 0.078. This report describes a technique for using results from the regression equations to adjust an AEP discharge computed from a streamgage record. This report also describes a technique for using a drainage-area adjustment to estimate flood discharge at a selected AEP for an ungaged site upstream or downstream from a streamgage. The final regression equations and the flood-discharge frequency data used in this study will be available in StreamStats. StreamStats is a World Wide Web application providing automated regression-equation solutions for user-selected sites on streams.
Unified Computational Methods for Regression Analysis of Zero-Inflated and Bound-Inflated Data
Yang, Yan; Simpson, Douglas
2010-01-01
Bounded data with excess observations at the boundary are common in many areas of application. Various individual cases of inflated mixture models have been studied in the literature for bound-inflated data, yet the computational methods have been developed separately for each type of model. In this article we use a common framework for computing these models, and expand the range of models for both discrete and semi-continuous data with point inflation at the lower boundary. The quasi-Newton and EM algorithms are adapted and compared for estimation of model parameters. The numerical Hessian and generalized Louis method are investigated as means for computing standard errors after optimization. Correlated data are included in this framework via generalized estimating equations. The estimation of parameters and effectiveness of standard errors are demonstrated through simulation and in the analysis of data from an ultrasound bioeffect study. The unified approach enables reliable computation for a wide class of inflated mixture models and comparison of competing models. PMID:20228950
Biondi-Zoccai, Giuseppe; Mastrangeli, Simona; Romagnoli, Enrico; Peruzzi, Mariangela; Frati, Giacomo; Roever, Leonardo; Giordano, Arturo
2018-01-17
Atherosclerosis has major morbidity and mortality implications globally. While it has often been considered an irreversible degenerative process, recent evidence provides compelling proof that atherosclerosis can be reversed. Plaque regression is however difficult to appraise and quantify, with competing diagnostic methods available. Given the potential of evidence synthesis to provide clinical guidance, we aimed to review recent meta-analyses on diagnostic methods for atherosclerotic plaque regression. We identified 8 meta-analyses published between 2015 and 2017, including 79 studies and 14,442 patients, followed for a median of 12 months. They reported on atherosclerotic plaque regression appraised with carotid duplex ultrasound, coronary computed tomography, carotid magnetic resonance, coronary intravascular ultrasound, and coronary optical coherence tomography. Overall, all meta-analyses showed significant atherosclerotic plaque regression with lipid-lowering therapy, with the most notable effects on echogenicity, lipid-rich necrotic core volume, wall/plaque volume, dense calcium volume, and fibrous cap thickness. Significant interactions were found with concomitant changes in low density lipoprotein cholesterol, high density lipoprotein cholesterol, and C-reactive protein levels, and with ethnicity. Atherosclerotic plaque regression and conversion to a stable phenotype is possible with intensive medical therapy and can be demonstrated in patients using a variety of non-invasive and invasive imaging modalities.
High correlations between MRI brain volume measurements based on NeuroQuant® and FreeSurfer.
Ross, David E; Ochs, Alfred L; Tate, David F; Tokac, Umit; Seabaugh, John; Abildskov, Tracy J; Bigler, Erin D
2018-05-30
NeuroQuant ® (NQ) and FreeSurfer (FS) are commonly used computer-automated programs for measuring MRI brain volume. Previously they were reported to have high intermethod reliabilities but often large intermethod effect size differences. We hypothesized that linear transformations could be used to reduce the large effect sizes. This study was an extension of our previously reported study. We performed NQ and FS brain volume measurements on 60 subjects (including normal controls, patients with traumatic brain injury, and patients with Alzheimer's disease). We used two statistical approaches in parallel to develop methods for transforming FS volumes into NQ volumes: traditional linear regression, and Bayesian linear regression. For both methods, we used regression analyses to develop linear transformations of the FS volumes to make them more similar to the NQ volumes. The FS-to-NQ transformations based on traditional linear regression resulted in effect sizes which were small to moderate. The transformations based on Bayesian linear regression resulted in all effect sizes being trivially small. To our knowledge, this is the first report describing a method for transforming FS to NQ data so as to achieve high reliability and low effect size differences. Machine learning methods like Bayesian regression may be more useful than traditional methods. Copyright © 2018 Elsevier B.V. All rights reserved.
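A minimal sketch of the transformation idea, with simulated volumes standing in for the FS and NQ measurements and a paired Cohen's d as the effect size (my assumption about how the effect sizes were computed, not the authors' exact procedure).

```python
import numpy as np
from sklearn.linear_model import LinearRegression, BayesianRidge

def cohens_d(a, b):
    """Effect size for the paired difference between two measurement methods."""
    diff = a - b
    return diff.mean() / diff.std(ddof=1)

rng = np.random.default_rng(4)
nq = rng.normal(50, 8, size=60)                       # stand-in NQ volumes (e.g., cm^3)
fs = 1.15 * nq + 4 + rng.normal(0, 1.5, size=60)      # FS volumes: offset and rescaled but highly correlated

for model in (LinearRegression(), BayesianRidge()):   # "traditional" vs Bayesian linear regression
    fs_to_nq = model.fit(fs.reshape(-1, 1), nq).predict(fs.reshape(-1, 1))
    print(type(model).__name__,
          "d before:", round(cohens_d(fs, nq), 2),
          "d after:", round(cohens_d(fs_to_nq, nq), 2))
```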
Modeling Longitudinal Data Containing Non-Normal Within Subject Errors
NASA Technical Reports Server (NTRS)
Feiveson, Alan; Glenn, Nancy L.
2013-01-01
The mission of the National Aeronautics and Space Administration's (NASA) human research program is to advance safe human spaceflight. This involves conducting experiments, collecting data, and analyzing data. The data are longitudinal and result from relatively few subjects, typically 10-20. A longitudinal study refers to an investigation where participant outcomes and possibly treatments are collected at multiple follow-up times. Standard statistical designs such as mean regression with random effects and mixed-effects regression are inadequate for such data because the population is typically not approximately normally distributed. Hence, more advanced data analysis methods are necessary. This research focuses on four such methods for longitudinal data analysis: the recently proposed linear quantile mixed models (lqmm) by Geraci and Bottai (2013), quantile regression, multilevel mixed-effects linear regression, and robust regression. This research also provides computational algorithms for longitudinal data that scientists can directly use for human spaceflight and other longitudinal data applications, then presents statistical evidence that verifies which method is best for specific situations. This advances the study of longitudinal data in a broad range of applications including applications in the sciences, technology, engineering and mathematics fields.
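For a flavor of how such comparisons can be run in Python, the sketch below contrasts a mixed-effects (mean) fit with median regression on simulated skewed longitudinal data using statsmodels; linear quantile mixed models (lqmm) are an R package and are not reproduced here.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
subjects, times = 15, 6
df = pd.DataFrame({
    "subject": np.repeat(np.arange(subjects), times),
    "day": np.tile(np.arange(times), subjects),
})
subj_effect = rng.normal(0, 2, size=subjects)[df["subject"]]
# Skewed (non-normal) within-subject errors, as described for small-sample spaceflight data
df["y"] = 10 + 0.8 * df["day"] + subj_effect + rng.exponential(2, size=len(df))

mixed = smf.mixedlm("y ~ day", df, groups=df["subject"]).fit()   # mean regression with random intercepts
median = smf.quantreg("y ~ day", df).fit(q=0.5)                  # median regression, more robust to skew
print("mixed-model slope:", mixed.params["day"], "median-regression slope:", median.params["day"])
```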
Linear and nonlinear regression techniques for simultaneous and proportional myoelectric control.
Hahne, J M; Biessmann, F; Jiang, N; Rehbaum, H; Farina, D; Meinecke, F C; Muller, K-R; Parra, L C
2014-03-01
In recent years the number of active controllable joints in electrically powered hand-prostheses has increased significantly. However, the control strategies for these devices in current clinical use are inadequate as they require separate and sequential control of each degree-of-freedom (DoF). In this study we systematically compare linear and nonlinear regression techniques for an independent, simultaneous and proportional myoelectric control of wrist movements with two DoF. These techniques include linear regression, mixture of linear experts (ME), multilayer-perceptron, and kernel ridge regression (KRR). They are investigated offline with electro-myographic signals acquired from ten able-bodied subjects and one person with congenital upper limb deficiency. The control accuracy is reported as a function of the number of electrodes and the amount and diversity of training data providing guidance for the requirements in clinical practice. The results showed that KRR, a nonparametric statistical learning method, outperformed the other methods. However, simple transformations in the feature space could linearize the problem, so that linear models could achieve similar performance as KRR at much lower computational costs. Especially ME, a physiologically inspired extension of linear regression represents a promising candidate for the next generation of prosthetic devices.
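A reduced illustration of the central comparison, using scikit-learn with synthetic stand-in features; it also shows the abstract's point that a simple feature-space transformation can let a linear model approach KRR performance.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
X = np.abs(rng.normal(size=(600, 8)))                                   # stand-in EMG amplitude features
y = np.sqrt(X[:, 0]) - np.sqrt(X[:, 1]) + 0.05 * rng.normal(size=600)   # mildly nonlinear wrist-angle target

lin = LinearRegression()
krr = KernelRidge(kernel="rbf", alpha=1e-2, gamma=0.5)
print("linear R^2:", cross_val_score(lin, X, y, cv=5).mean())
print("KRR R^2:   ", cross_val_score(krr, X, y, cv=5).mean())

# A simple transformation (square root) linearizes this particular problem,
# letting the cheaper linear model approach KRR performance:
print("linear on sqrt features R^2:", cross_val_score(lin, np.sqrt(X), y, cv=5).mean())
```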
SPSS and SAS programming for the testing of mediation models.
Dudley, William N; Benuzillo, Jose G; Carrico, Mineh S
2004-01-01
Mediation modeling can explain the nature of the relation among three or more variables. In addition, it can be used to show how a variable mediates the relation between levels of intervention and outcome. The Sobel test, developed in 1990, provides a statistical method for determining the influence of a mediator on an intervention or outcome. Although interactive Web-based and stand-alone methods exist for computing the Sobel test, SPSS and SAS programs that automatically run the required regression analyses and computations increase the accessibility of mediation modeling to nursing researchers. The purpose of this article is to illustrate the utility of the Sobel test and to make this programming available to the Nursing Research audience in both SAS and SPSS. The history, logic, and technical aspects of mediation testing are introduced. The syntax files sobel.sps and sobel.sas, created to automate the computation of the regression analysis and test statistic, are available from the corresponding author. The reported programming allows the user to complete mediation testing with the user's own data in a single-step fashion. A technical manual included with the programming provides instruction on program use and interpretation of the output. Mediation modeling is a useful tool for describing the relation between three or more variables. Programming and manuals for using this model are made available.
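The Sobel statistic itself is easy to reproduce outside SPSS/SAS; the sketch below, which is not the authors' syntax, computes it from the two OLS regressions on simulated data.

```python
import numpy as np
from scipy import stats
import statsmodels.api as sm

def sobel_test(x, mediator, y):
    """Sobel test: z statistic for the indirect effect a*b, from two OLS regressions."""
    path_a = sm.OLS(mediator, sm.add_constant(x)).fit()                        # x -> mediator
    path_b = sm.OLS(y, sm.add_constant(np.column_stack([mediator, x]))).fit()  # mediator -> y, controlling for x
    a, se_a = path_a.params[1], path_a.bse[1]
    b, se_b = path_b.params[1], path_b.bse[1]
    z = a * b / np.sqrt(b**2 * se_a**2 + a**2 * se_b**2)
    return z, 2 * stats.norm.sf(abs(z))

# Simulated example: x influences y partly through the mediator
rng = np.random.default_rng(7)
x = rng.normal(size=300)
mediator = 0.5 * x + rng.normal(size=300)
y = 0.4 * mediator + 0.2 * x + rng.normal(size=300)
print(sobel_test(x, mediator, y))
```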
Anderson, Carl A; McRae, Allan F; Visscher, Peter M
2006-07-01
Standard quantitative trait loci (QTL) mapping techniques commonly assume that the trait is both fully observed and normally distributed. When considering survival or age-at-onset traits these assumptions are often incorrect. Methods have been developed to map QTL for survival traits; however, they are both computationally intensive and not available in standard genome analysis software packages. We propose a grouped linear regression method for the analysis of continuous survival data. Using simulation we compare this method to both the Cox and Weibull proportional hazards models and a standard linear regression method that ignores censoring. The grouped linear regression method is of equivalent power to both the Cox and Weibull proportional hazards methods and is significantly better than the standard linear regression method when censored observations are present. The method is also robust to the proportion of censored individuals and the underlying distribution of the trait. On the basis of linear regression methodology, the grouped linear regression model is computationally simple and fast and can be implemented readily in freely available statistical software.
Lee, Donggil; Lee, Kyounghoon; Kim, Seonghun; Yang, Yongsu
2015-04-01
An automatic abalone grading algorithm that estimates abalone weights on the basis of computer vision using 2D images is developed and tested. The algorithm overcomes the problems experienced by conventional abalone grading methods that utilize manual sorting and mechanical automatic grading. To design an optimal algorithm, a regression formula and R² value were investigated by performing a regression analysis for each of total length, body width, thickness, view area, and actual volume against abalone weights. The R² value between the actual volume and abalone weight was 0.999, showing a relatively high correlation. As a result, to easily estimate the actual volumes of abalones based on computer vision, the volumes were calculated under the assumption that abalone shapes are half-oblate ellipsoids, and a regression formula was derived to estimate the volumes of abalones through linear regression analysis between the calculated and actual volumes. The final automatic abalone grading algorithm is designed using the abalone volume estimation regression formula derived from test results, and the actual volumes and abalone weights regression formula. For abalones weighing from 16.51 to 128.01 g, cross-validation of the algorithm's performance indicates root mean square and worst-case prediction errors of 2.8 g and ±8 g, respectively. © 2015 Institute of Food Technologists®
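A rough sketch of the volume-to-weight regression described here; the half-oblate-ellipsoid formula below is my reading of the geometric assumption (semi-axes L/2 and W/2 in the plane, thickness as the vertical semi-axis), and the data are synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def half_oblate_volume(length, width, thickness):
    """Approximate abalone volume as half of an oblate ellipsoid (assumed geometry)."""
    return 0.5 * (4.0 / 3.0) * np.pi * (length / 2) * (width / 2) * thickness

rng = np.random.default_rng(8)
L = rng.uniform(60, 120, 200)                       # total length, mm (synthetic)
W = 0.7 * L + rng.normal(0, 3, 200)                 # body width, mm
T = 0.25 * L + rng.normal(0, 2, 200)                # thickness, mm
weight = 0.0011 * half_oblate_volume(L, W, T) + rng.normal(0, 2, 200)   # grams, synthetic

vol = half_oblate_volume(L, W, T).reshape(-1, 1)
reg = LinearRegression().fit(vol, weight)
pred = reg.predict(vol)
print("RMSE (g):", np.sqrt(np.mean((pred - weight) ** 2)))
```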
Regression Analysis of Top of Descent Location for Idle-thrust Descents
NASA Technical Reports Server (NTRS)
Stell, Laurel; Bronsvoort, Jesper; McDonald, Greg
2013-01-01
In this paper, multiple regression analysis is used to model the top of descent (TOD) location of user-preferred descent trajectories computed by the flight management system (FMS) on over 1000 commercial flights into Melbourne, Australia. The independent variables cruise altitude, final altitude, cruise Mach, descent speed, wind, and engine type were also recorded or computed post-operations. Both first-order and second-order models are considered, where cross-validation, hypothesis testing, and additional analysis are used to compare models. This identifies the models that should give the smallest errors if used to predict TOD location for new data in the future. A model that is linear in TOD altitude, final altitude, descent speed, and wind gives an estimated standard deviation of 3.9 nmi for TOD location given the trajectory parameters, which means about 80% of predictions would have error less than 5 nmi in absolute value. This accuracy is better than demonstrated by other ground automation predictions using kinetic models. Furthermore, this approach would enable online learning of the model. Additional data or further knowledge of algorithms is necessary to conclude definitively that no second-order terms are appropriate. Possible applications of the linear model are described, including enabling arriving aircraft to fly optimized descents computed by the FMS even in congested airspace. In particular, a model for TOD location that is linear in the independent variables would enable decision support tool human-machine interfaces for which a kinetic approach would be computationally too slow.
NASA Technical Reports Server (NTRS)
Hohenemser, K. H.; Crews, S. T.
1972-01-01
A two bladed 16-inch hingeless rotor model was built and tested outside and inside a 24 by 24 inch wind tunnel test section at collective pitch settings up to 5 deg and rotor advance ratios up to .4. The rotor model has a simple eccentric mechanism to provide progressing or regressing cyclic pitch excitation. The flapping responses were compared to analytically determined responses which included flap-bending elasticity but excluded rotor wake effects. Substantial systematic deviations of the measured responses from the computed responses were found, which were interpreted as the effects of interaction of the blades with a rotating asymmetrical wake.
A Permutation Approach for Selecting the Penalty Parameter in Penalized Model Selection
Sabourin, Jeremy A; Valdar, William; Nobel, Andrew B
2015-01-01
Summary: We describe a simple, computationally efficient, permutation-based procedure for selecting the penalty parameter in LASSO penalized regression. The procedure, permutation selection, is intended for applications where variable selection is the primary focus, and can be applied in a variety of structural settings, including that of generalized linear models. We briefly discuss connections between permutation selection and existing theory for the LASSO. In addition, we present a simulation study and an analysis of real biomedical data sets in which permutation selection is compared with selection based on the following: cross-validation (CV), the Bayesian information criterion (BIC), Scaled Sparse Linear Regression, and a selection method based on recently developed testing procedures for the LASSO. PMID:26243050
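A minimal sketch of the permutation idea for the LASSO, under scikit-learn's penalty parameterization: choose the smallest penalty that, for most permuted responses, selects nothing. The quantile, the centering, and the simulated data are assumptions, not the authors' exact settings.

```python
import numpy as np
from sklearn.linear_model import Lasso

def permutation_penalty(X, y, n_perm=100, quantile=0.95, rng=None):
    """Penalty such that, for ~`quantile` of permuted responses, the LASSO selects nothing.
    Under sklearn's objective (1/(2n))||y - Xb||^2 + alpha*||b||_1, the smallest alpha that
    zeroes every coefficient (with centered data) is max|X^T y| / n."""
    rng = rng or np.random.default_rng(0)
    n = len(y)
    Xc = X - X.mean(axis=0)
    alphas = []
    for _ in range(n_perm):
        y_perm = rng.permutation(y)
        alphas.append(np.max(np.abs(Xc.T @ (y_perm - y_perm.mean()))) / n)
    return np.quantile(alphas, quantile)

rng = np.random.default_rng(9)
X = rng.normal(size=(100, 500))
beta = np.zeros(500); beta[:5] = 2.0
y = X @ beta + rng.normal(size=100)

alpha = permutation_penalty(X, y)
selected = np.flatnonzero(Lasso(alpha=alpha, max_iter=50000).fit(X, y).coef_)
print("alpha:", round(alpha, 3), "selected features:", selected)
```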
Imaging genetics approach to predict progression of Parkinson's diseases.
Mansu Kim; Seong-Jin Son; Hyunjin Park
2017-07-01
Imaging genetics is a tool to extract genetic variants associated with both clinical phenotypes and imaging information. The approach can extract additional genetic variants compared to conventional approaches to better investigate various diseased conditions. Here, we applied imaging genetics to study Parkinson's disease (PD). We aimed to extract significant features derived from imaging genetics and neuroimaging. We built a regression model based on extracted significant features combining genetics and neuroimaging to better predict clinical scores of PD progression (i.e., MDS-UPDRS). Our model yielded high correlation (r = 0.697, p < 0.001) and low root mean squared error (8.36) between predicted and actual MDS-UPDRS scores. Neuroimaging predictors of the regression model (from 123I-ioflupane SPECT) were computed using an independent component analysis approach. Genetic features were computed using an imaging genetics approach based on the identified neuroimaging features as intermediate phenotypes. Joint modeling of neuroimaging and genetics could provide complementary information and thus has the potential to provide further insight into the pathophysiology of PD. Our model included newly found neuroimaging features and genetic variants which need further investigation.
Online EEG artifact removal for BCI applications by adaptive spatial filtering.
Guarnieri, Roberto; Marino, Marco; Barban, Federico; Ganzetti, Marco; Mantini, Dante
2018-06-28
The performance of brain computer interfaces (BCIs) based on electroencephalography (EEG) data strongly depends on the effective attenuation of artifacts that are mixed in the recordings. To address this problem, we have developed a novel online EEG artifact removal method for BCI applications, which combines blind source separation (BSS) and regression (REG) analysis. The BSS-REG method relies on the availability of a calibration dataset of limited duration for the initialization of a spatial filter using BSS. Online artifact removal is implemented by dynamically adjusting the spatial filter in the actual experiment, based on a linear regression technique. Our results showed that the BSS-REG method is capable of attenuating different kinds of artifacts, including ocular and muscular, while preserving true neural activity. Thanks to its low computational requirements, BSS-REG can be applied to low-density as well as high-density EEG data. We argue that BSS-REG may enable the development of novel BCI applications requiring high-density recordings, such as source-based neurofeedback and closed-loop neuromodulation. © 2018 IOP Publishing Ltd.
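The regression (REG) half of such a pipeline can be sketched as follows: fit least-squares weights from reference (e.g., EOG) channels to each EEG channel on a calibration segment, then subtract the predicted artifact contribution from the ongoing stream. The BSS initialization and the dynamic adjustment of the spatial filter are not shown, and the function names are mine.

```python
import numpy as np

def fit_artifact_weights(eeg_calib, ref_calib):
    """Least-squares weights mapping reference channels to each EEG channel.
    eeg_calib: (n_samples, n_eeg), ref_calib: (n_samples, n_ref)."""
    W, *_ = np.linalg.lstsq(ref_calib, eeg_calib, rcond=None)
    return W                                          # shape (n_ref, n_eeg)

def clean_online(eeg_chunk, ref_chunk, W):
    """Subtract the artifact contribution predicted from the reference channels."""
    return eeg_chunk - ref_chunk @ W

# Toy example: 2 EOG reference channels bleeding into 4 EEG channels
rng = np.random.default_rng(10)
ref = rng.normal(size=(5000, 2))
neural = rng.normal(scale=0.3, size=(5000, 4))
mixing = rng.normal(size=(2, 4))
eeg = neural + ref @ mixing

W = fit_artifact_weights(eeg[:2000], ref[:2000])      # calibration segment
cleaned = clean_online(eeg[2000:], ref[2000:], W)     # "online" segment
print("residual artifact power:", np.mean((cleaned - neural[2000:]) ** 2))
```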
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shirkhoda, A; Mauro, M.A.; Staab, E.V.
Fifty-four hemophiliac patients underwent a total of 94 studies using computed tomography (CT), ultrasound, or both. Not only common bleeding sites such as the iliopsoas muscles but also several unusual sites were encountered: these included the iliac bone, bowel wall, mesentery, rectus abdominis muscle, retroperitoneum, bladder wall, and scrotum. Both modalities gave comparable results, and each was helpful in (a) establishing the diagnosis, (b) evaluating the extent of bleeding and its effect on adjacent organs, and (c) demonstrating regression after treatment.
CADDIS Volume 4. Data Analysis: PECBO Appendix - R Scripts for Non-Parametric Regressions
Script for computing nonparametric regression analyses, with an overview of using scripts to infer environmental conditions from biological observations and to statistically estimate species-environment relationships.
Kim, Yoonsang; Choi, Young-Ku; Emery, Sherry
2013-08-01
Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods' performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages-SAS GLIMMIX Laplace and SuperMix Gaussian quadrature-perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes.
Real-time model learning using Incremental Sparse Spectrum Gaussian Process Regression.
Gijsberts, Arjan; Metta, Giorgio
2013-05-01
Novel applications in unstructured and non-stationary human environments require robots that learn from experience and adapt autonomously to changing conditions. Predictive models therefore not only need to be accurate, but should also be updated incrementally in real-time and require minimal human intervention. Incremental Sparse Spectrum Gaussian Process Regression is an algorithm that is targeted specifically for use in this context. Rather than developing a novel algorithm from the ground up, the method is based on the thoroughly studied Gaussian Process Regression algorithm, therefore ensuring a solid theoretical foundation. Non-linearity and a bounded update complexity are achieved simultaneously by means of a finite dimensional random feature mapping that approximates a kernel function. As a result, the computational cost for each update remains constant over time. Finally, algorithmic simplicity and support for automated hyperparameter optimization ensures convenience when employed in practice. Empirical validation on a number of synthetic and real-life learning problems confirms that the performance of Incremental Sparse Spectrum Gaussian Process Regression is superior with respect to the popular Locally Weighted Projection Regression, while computational requirements are found to be significantly lower. The method is therefore particularly suited for learning with real-time constraints or when computational resources are limited. Copyright © 2012 Elsevier Ltd. All rights reserved.
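A stripped-down sketch of the core idea, assuming an RBF kernel approximated by random Fourier features and a plain incremental ridge solution; the actual algorithm would use rank-one Cholesky updates and hyperparameter optimization, which are omitted here, and the class name is mine.

```python
import numpy as np

class IncrementalRFFRegressor:
    """GP-regression-like model via random Fourier features plus incremental ridge."""
    def __init__(self, dim_in, n_features=200, lengthscale=1.0, noise=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=1.0 / lengthscale, size=(dim_in, n_features))
        self.b = rng.uniform(0, 2 * np.pi, size=n_features)
        self.scale = np.sqrt(2.0 / n_features)
        self.A = noise**2 * np.eye(n_features)   # accumulates Z^T Z + sigma^2 I
        self.c = np.zeros(n_features)            # accumulates Z^T y

    def _features(self, x):
        return self.scale * np.cos(x @ self.W + self.b)

    def update(self, x, y):
        """Constant-cost update with one (x, y) pair."""
        z = self._features(x)
        self.A += np.outer(z, z)
        self.c += y * z

    def predict(self, x):
        w = np.linalg.solve(self.A, self.c)      # could be replaced by an incremental Cholesky
        return self._features(x) @ w

# Learn y = sin(3x) online, one sample at a time
model = IncrementalRFFRegressor(dim_in=1)
rng = np.random.default_rng(11)
for _ in range(500):
    x = rng.uniform(-2, 2, size=1)
    model.update(x, np.sin(3 * x[0]) + 0.05 * rng.normal())
print(model.predict(np.array([[0.5], [1.0]])), np.sin([1.5, 3.0]))
```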
Users manual for flight control design programs
NASA Technical Reports Server (NTRS)
Nalbandian, J. Y.
1975-01-01
Computer programs for the design of analog and digital flight control systems are documented. The program DIGADAPT uses linear-quadratic-gaussian synthesis algorithms in the design of command response controllers and state estimators, and it applies covariance propagation analysis to the selection of sampling intervals for digital systems. Program SCHED executes correlation and regression analyses for the development of gain and trim schedules to be used in open-loop explicit-adaptive control laws. A linear-time-varying simulation of aircraft motions is provided by the program TVHIS, which includes guidance and control logic, as well as models for control actuator dynamics. The programs are coded in FORTRAN and are compiled and executed on both IBM and CDC computers.
Probabilistic Fiber Composite Micromechanics
NASA Technical Reports Server (NTRS)
Stock, Thomas A.
1996-01-01
Probabilistic composite micromechanics methods are developed that simulate expected uncertainties in unidirectional fiber composite properties. These methods are in the form of computational procedures using Monte Carlo simulation. The variables in which uncertainties are accounted for include constituent and void volume ratios, constituent elastic properties and strengths, and fiber misalignment. A graphite/epoxy unidirectional composite (ply) is studied to demonstrate fiber composite material property variations induced by random changes expected at the material micro level. Regression results are presented to show the relative correlation between predictor and response variables in the study. These computational procedures make possible a formal description of anticipated random processes at the intra-ply level, and the related effects of these on composite properties.
Prediction of Patient-Controlled Analgesic Consumption: A Multimodel Regression Tree Approach.
Hu, Yuh-Jyh; Ku, Tien-Hsiung; Yang, Yu-Hung; Shen, Jia-Ying
2018-01-01
Several factors contribute to individual variability in postoperative pain; therefore, individuals consume postoperative analgesics at different rates. Although many statistical studies have analyzed postoperative pain and analgesic consumption, most have identified only correlations and have not subjected the statistical model to further tests in order to evaluate its predictive accuracy. In this study involving 3052 patients, a multistrategy computational approach was developed for analgesic consumption prediction. This approach uses data on patient-controlled analgesia demand behavior over time and combines clustering, classification, and regression to mitigate the limitations of current statistical models. Cross-validation results indicated that the proposed approach significantly outperforms various existing regression methods. Moreover, a comparison between the predictions by anesthesiologists and medical specialists and those of the computational approach for an independent test data set of 60 patients further evidenced the superiority of the computational approach in predicting analgesic consumption, because it produced markedly lower root mean squared errors.
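A hedged sketch of a cluster-then-regress strategy in the spirit of the description above, not the authors' pipeline: patients are grouped by their patient-controlled analgesia demand trajectories, and a regression tree is fitted per cluster. All data, cluster counts, and tree settings are synthetic placeholders.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n_patients, n_hours = 300, 24
rates = rng.uniform(0.5, 3.0, size=(n_patients, 1))
demand = rng.poisson(lam=rates, size=(n_patients, n_hours))      # PCA demands per hour
covars = rng.normal(size=(n_patients, 5))                        # age, weight, etc. (synthetic)
consumption = demand.sum(axis=1) * 0.9 + 2.0 * covars[:, 0] + rng.normal(0, 2, n_patients)

# Step 1: cluster patients on their demand-over-time behavior.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(demand)

# Step 2: fit one regression tree per cluster to predict total consumption.
trees = {}
for c in range(3):
    idx = km.labels_ == c
    X_c = np.hstack([demand[idx], covars[idx]])
    trees[c] = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_c, consumption[idx])

# Prediction for a new patient: assign to the nearest demand cluster, use its tree.
new_demand = rng.poisson(lam=1.5, size=(1, n_hours))
new_covars = rng.normal(size=(1, 5))
c_new = km.predict(new_demand)[0]
print(trees[c_new].predict(np.hstack([new_demand, new_covars])))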
GWAS with longitudinal phenotypes: performance of approximate procedures
Sikorska, Karolina; Montazeri, Nahid Mostafavi; Uitterlinden, André; Rivadeneira, Fernando; Eilers, Paul HC; Lesaffre, Emmanuel
2015-01-01
Analysis of genome-wide association studies with longitudinal data using standard procedures, such as linear mixed model (LMM) fitting, leads to discouragingly long computation times. There is a need to speed up the computations significantly. In our previous work (Sikorska et al: Fast linear mixed model computations for genome-wide association studies with longitudinal data. Stat Med 2012; 32.1: 165–180), we proposed the conditional two-step (CTS) approach as a fast method providing an approximation to the P-value for the longitudinal single-nucleotide polymorphism (SNP) effect. In the first step a reduced conditional LMM is fit, omitting all the SNP terms. In the second step, the estimated random slopes are regressed on SNPs. The CTS has been applied to the bone mineral density data from the Rotterdam Study and proved to work very well even in unbalanced situations. In another article (Sikorska et al: GWAS on your notebook: fast semi-parallel linear and logistic regression for genome-wide association studies. BMC Bioinformatics 2013; 14: 166), we suggested semi-parallel computations, greatly speeding up fitting many linear regressions. Combining CTS with fast linear regression reduces the computation time from several weeks to a few minutes on a single computer. Here, we explore further the properties of the CTS both analytically and by simulations. We investigate the performance of our proposal in comparison with a related but different approach, the two-step procedure. It is analytically shown that for the balanced case, under mild assumptions, the P-value provided by the CTS is the same as from the LMM. For unbalanced data and in realistic situations, simulations show that the CTS method does not inflate the type I error rate and implies only a minimal loss of power. PMID:25712081
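A simplified, hedged sketch of the two-step idea: each subject's trajectory is summarized by a slope (an ordinary per-subject least-squares slope stands in here for the conditional LMM random slopes), and the slopes are then regressed on every SNP at once with vectorized, "semi-parallel" arithmetic. The data and estimator details are illustrative rather than the authors' exact procedure.

import numpy as np

rng = np.random.default_rng(0)
n_subj, n_visits, n_snps = 1000, 5, 5000
time = np.tile(np.arange(n_visits), (n_subj, 1))
G = rng.binomial(2, 0.3, size=(n_subj, n_snps)).astype(float)   # genotypes 0/1/2
true_effect = np.zeros(n_snps)
true_effect[0] = 0.05                                           # one causal SNP
slopes_true = 0.1 + G @ true_effect + rng.normal(0, 0.05, n_subj)
y = 1.0 + slopes_true[:, None] * time + rng.normal(0, 0.2, size=(n_subj, n_visits))

# Step 1: per-subject slope of phenotype on time (stand-in for LMM random slopes).
t_c = time - time.mean(axis=1, keepdims=True)
s = (t_c * (y - y.mean(axis=1, keepdims=True))).sum(axis=1) / (t_c ** 2).sum(axis=1)

# Step 2: semi-parallel simple regression of the slopes on each SNP column.
Gc = G - G.mean(axis=0)
sc = s - s.mean()
sxx = (Gc ** 2).sum(axis=0)
beta = Gc.T @ sc / sxx
resid_var = ((sc ** 2).sum() - beta ** 2 * sxx) / (n_subj - 2)
z = beta / np.sqrt(resid_var / sxx)
print("top SNP index:", int(np.argmax(np.abs(z))), " |z| =", round(float(np.abs(z).max()), 1))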
Fisher, Charles K; Mehta, Pankaj
2015-06-01
Feature selection, identifying a subset of variables that are relevant for predicting a response, is an important and challenging component of many methods in statistics and machine learning. Feature selection is especially difficult and computationally intensive when the number of variables approaches or exceeds the number of samples, as is often the case for many genomic datasets. Here, we introduce a new approach, the Bayesian Ising Approximation (BIA), to rapidly calculate posterior probabilities for feature relevance in L2 penalized linear regression. In the regime where the regression problem is strongly regularized by the prior, we show that computing the marginal posterior probabilities for features is equivalent to computing the magnetizations of an Ising model with weak couplings. Using a mean field approximation, we show it is possible to rapidly compute the feature selection path described by the posterior probabilities as a function of the L2 penalty. We present simulations and analytical results illustrating the accuracy of the BIA on some simple regression problems. Finally, we demonstrate the applicability of the BIA to high-dimensional regression by analyzing a gene expression dataset with nearly 30 000 features. These results also highlight the impact of correlations between features on Bayesian feature selection. An implementation of the BIA in C++, along with data for reproducing our gene expression analyses, is freely available at http://physics.bu.edu/~pankajm/BIACode. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
NASA Astrophysics Data System (ADS)
Dorband, J. E.; Tilak, N.; Radov, A.
2016-12-01
In this paper, a classical computer implementation of a restricted Boltzmann machine (RBM) is compared to a quantum-annealing-based RBM running on a D-Wave 2X (an adiabatic quantum computer). The codes for both are essentially identical; only a flag is set to change the activation function from a classically computed logistic function to the D-Wave. To obtain a greater understanding of the behavior of the D-Wave, a study of the stochastic properties of a virtual qubit (a 12-qubit chain) and a cell of qubits (an 8-qubit cell) was performed. We will present the results of comparing the D-Wave implementation with a theoretically errorless adiabatic quantum computer. The main purpose of this study is to develop a generic RBM regression tool in order to infer CO2 fluxes from the NASA satellite OCO-2 observed CO2 concentrations and predicted atmospheric states using regression models. The carbon fluxes will then be assimilated into a land surface model to predict the Net Ecosystem Exchange at globally distributed regional sites.
Jiang, Feng; Han, Ji-zhong
2018-01-01
Cross-domain collaborative filtering (CDCF) solves the sparsity problem by transferring rating knowledge from auxiliary domains. Obviously, different auxiliary domains have different importance to the target domain. However, previous works cannot effectively evaluate the significance of different auxiliary domains. To overcome this drawback, we propose a cross-domain collaborative filtering algorithm based on Feature Construction and Locally Weighted Linear Regression (FCLWLR). We first construct features in different domains and use these features to represent different auxiliary domains. Thus the weight computation across different domains can be converted into a weight computation across different features. Then we combine the features in the target domain and in the auxiliary domains together and convert the cross-domain recommendation problem into a regression problem. Finally, we employ a Locally Weighted Linear Regression (LWLR) model to solve the regression problem. As LWLR is a nonparametric regression method, it can effectively avoid the underfitting or overfitting problems that occur in parametric regression methods. We conduct extensive experiments to show that the proposed FCLWLR algorithm is effective in addressing the data sparsity problem by transferring useful knowledge from the auxiliary domains, as compared to many state-of-the-art single-domain or cross-domain CF methods. PMID:29623088
Yu, Xu; Lin, Jun-Yu; Jiang, Feng; Du, Jun-Wei; Han, Ji-Zhong
2018-01-01
Cross-domain collaborative filtering (CDCF) solves the sparsity problem by transferring rating knowledge from auxiliary domains. Obviously, different auxiliary domains have different importance to the target domain. However, previous works cannot effectively evaluate the significance of different auxiliary domains. To overcome this drawback, we propose a cross-domain collaborative filtering algorithm based on Feature Construction and Locally Weighted Linear Regression (FCLWLR). We first construct features in different domains and use these features to represent different auxiliary domains. Thus the weight computation across different domains can be converted into a weight computation across different features. Then we combine the features in the target domain and in the auxiliary domains together and convert the cross-domain recommendation problem into a regression problem. Finally, we employ a Locally Weighted Linear Regression (LWLR) model to solve the regression problem. As LWLR is a nonparametric regression method, it can effectively avoid the underfitting or overfitting problems that occur in parametric regression methods. We conduct extensive experiments to show that the proposed FCLWLR algorithm is effective in addressing the data sparsity problem by transferring useful knowledge from the auxiliary domains, as compared to many state-of-the-art single-domain or cross-domain CF methods.
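A minimal sketch of the locally weighted linear regression step used above; the cross-domain feature construction is not reproduced, and the kernel bandwidth and ridge term are illustrative choices.

import numpy as np

def lwlr_predict(X, y, x_query, tau=1.0, ridge=1e-6):
    """Predict y at x_query with Gaussian-kernel locally weighted least squares."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])          # add intercept column
    xq = np.hstack([1.0, x_query])
    w = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2 * tau ** 2))
    A = Xb.T @ (w[:, None] * Xb) + ridge * np.eye(Xb.shape[1])
    theta = np.linalg.solve(A, Xb.T @ (w * y))
    return xq @ theta

# Toy usage: a nonlinear target is captured by local fits without a global parametric form.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
print(lwlr_predict(X, y, np.array([1.0]), tau=0.5))   # close to sin(1.0)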
Batchelor, Connor; Pordeli, Pooneh; d'Esterre, Christopher D; Najm, Mohamed; Al-Ajlan, Fahad S; Boesen, Mari E; McDougall, Connor; Hur, Lisa; Fainardi, Enrico; Shankar, Jai Jai Shiva; Rubiera, Marta; Khaw, Alexander V; Hill, Michael D; Demchuk, Andrew M; Sajobi, Tolulope T; Goyal, Mayank; Lee, Ting-Yim; Aviv, Richard I; Menon, Bijoy K
2017-06-01
Intracerebral hemorrhage is a feared complication of intravenous alteplase therapy in patients with acute ischemic stroke. We explore the use of multimodal computed tomography in predicting this complication. All patients were administered intravenous alteplase with/without intra-arterial therapy. An age- and sex-matched case-control design with classic and conditional logistic regression techniques was chosen for analyses. Outcome was parenchymal hemorrhage on 24- to 48-hour imaging. Exposure variables were imaging (noncontrast computed tomography hypoattenuation degree, relative volume of very low cerebral blood volume, relative volume of cerebral blood flow ≤7 mL/min per 100 g, relative volume of Tmax ≥16 s with all volumes standardized to z-axis coverage, mean permeability surface area product values within the Tmax ≥8 s volume, and mean permeability surface area product values within the ipsilesional hemisphere) and clinical variables (NIHSS [National Institutes of Health Stroke Scale], onset to imaging time, baseline systolic blood pressure, blood glucose, serum creatinine, treatment type, and reperfusion status). One hundred eighteen subjects (22 patients with parenchymal hemorrhage versus 96 without, median baseline NIHSS score of 15) were included in the final analysis. In multivariable regression, noncontrast computed tomography hypoattenuation grade (P<0.006) and computerized tomography perfusion white matter relative volume of very low cerebral blood volume (P=0.04) were the only significant variables associated with parenchymal hemorrhage on follow-up imaging (area under the curve, 0.73; 95% confidence interval, 0.63-0.83). Interrater reliability for noncontrast computed tomography hypoattenuation grade was moderate (κ=0.6). Baseline hypoattenuation on noncontrast computed tomography and very low cerebral blood volume on computerized tomography perfusion are associated with development of parenchymal hemorrhage in patients with acute ischemic stroke receiving intravenous alteplase. © 2017 American Heart Association, Inc.
Experimental and computational prediction of glass transition temperature of drugs.
Alzghoul, Ahmad; Alhalaweh, Amjad; Mahlin, Denny; Bergström, Christel A S
2014-12-22
Glass transition temperature (Tg) is an important inherent property of an amorphous solid material that is usually determined experimentally. In this study, the relation between Tg and melting temperature (Tm) was evaluated using a data set of 71 structurally diverse druglike compounds. Further, in silico models for prediction of Tg were developed based on calculated molecular descriptors and linear (multilinear regression, partial least-squares, principal component regression) and nonlinear (neural network, support vector regression) modeling techniques. The models based on Tm predicted Tg with an RMSE of 19.5 K for the test set. Among the five computational models developed herein, the support vector regression gave the best result, with an RMSE of 18.7 K for the test set using only four chemical descriptors. Hence, two different models that predict Tg of drug-like molecules with high accuracy were developed. If Tm is available, a simple linear regression can be used to predict Tg. However, the results also suggest that support vector regression and calculated molecular descriptors can predict Tg with equal accuracy, even before compound synthesis.
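A hedged sketch of the modeling strategy (support vector regression on a few calculated descriptors); the descriptors, data, and hyperparameters below are synthetic placeholders rather than the published model.

import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n = 71
X = rng.normal(size=(n, 4))                                       # four calculated descriptors
Tg = 300 + 25 * X[:, 0] - 10 * X[:, 1] + rng.normal(0, 15, n)     # synthetic Tg in Kelvin

X_tr, X_te, y_tr, y_te = train_test_split(X, Tg, test_size=0.3, random_state=0)
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0, epsilon=5.0))
model.fit(X_tr, y_tr)
rmse = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
print(f"test RMSE: {rmse:.1f} K")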
Adolescent Sedentary Behaviors: Correlates Differ for Television Viewing and Computer Use
Babey, Susan H.; Hastert, Theresa A.; Wolstein, Joelle
2013-01-01
Purpose Sedentary behavior is associated with obesity in youth. Understanding correlates of specific sedentary behaviors can inform the development of interventions to reduce sedentary time. The current research examines correlates of leisure computer use and television viewing among California adolescents. Methods Using data from the 2005 California Health Interview Survey (CHIS), we examined individual, family and environmental correlates of two sedentary behaviors among 4,029 adolescents: leisure computer use and television watching. Results Linear regression analyses adjusting for a range of factors indicated several differences in the correlates of television watching and computer use. Correlates of additional time spent watching television included male sex, American Indian and African American race, lower household income, lower levels of physical activity, lower parent educational attainment, and additional hours worked by parents. Correlates of a greater amount of time spent using the computer for fun included older age, Asian race, higher household income, lower levels of physical activity, less parental knowledge of free time activities, and living in neighborhoods with higher proportions of non-white residents and higher proportions of low-income residents. Only physical activity was associated similarly with both watching television and computer use. Conclusions These results suggest that correlates of time spent on television watching and leisure computer use are different. Reducing screen time is a potentially successful strategy in combating childhood obesity, and understanding differences in the correlates of different screen time behaviors can inform the development of more effective interventions to reduce sedentary time. PMID:23260837
Smith, S. Jerrod; Esralew, Rachel A.
2010-01-01
The USGS Streamflow Statistics (StreamStats) Program was created to make geographic information systems-based estimation of streamflow statistics easier, faster, and more consistent than previously used manual techniques. The StreamStats user interface is a map-based internet application that allows users to easily obtain streamflow statistics, basin characteristics, and other information for user-selected U.S. Geological Survey data-collection stations and ungaged sites of interest. The application relies on the data collected at U.S. Geological Survey streamflow-gaging stations, computer aided computations of drainage-basin characteristics, and published regression equations for several geographic regions comprising the United States. The StreamStats application interface allows the user to (1) obtain information on features in selected map layers, (2) delineate drainage basins for ungaged sites, (3) download drainage-basin polygons to a shapefile, (4) compute selected basin characteristics for delineated drainage basins, (5) estimate selected streamflow statistics for ungaged points on a stream, (6) print map views, (7) retrieve information for U.S. Geological Survey streamflow-gaging stations, and (8) get help on using StreamStats. StreamStats was designed for national application, with each state, territory, or group of states responsible for creating unique geospatial datasets and regression equations to compute selected streamflow statistics. With the cooperation of the Oklahoma Department of Transportation, StreamStats has been implemented for Oklahoma and is available at http://water.usgs.gov/osw/streamstats/. The Oklahoma StreamStats application covers 69 processed hydrologic units and most of the state of Oklahoma. Basin characteristics available for computation include contributing drainage area, contributing drainage area that is unregulated by Natural Resources Conservation Service floodwater retarding structures, mean-annual precipitation at the drainage-basin outlet for the period 1961-1990, 10-85 channel slope (slope between points located at 10 percent and 85 percent of the longest flow-path length upstream from the outlet), and percent impervious area. The Oklahoma StreamStats application interacts with the National Streamflow Statistics database, which contains the peak-flow regression equations in a previously published report. Fourteen peak-flow (flood) frequency statistics are available for computation in the Oklahoma StreamStats application. These statistics include the peak flow at 2-, 5-, 10-, 25-, 50-, 100-, and 500-year recurrence intervals for rural, unregulated streams; and the peak flow at 2-, 5-, 10-, 25-, 50-, 100-, and 500-year recurrence intervals for rural streams that are regulated by Natural Resources Conservation Service floodwater retarding structures. Basin characteristics and streamflow statistics cannot be computed for locations in playa basins (mostly in the Oklahoma Panhandle) and along main stems of the largest river systems in the state, namely the Arkansas, Canadian, Cimarron, Neosho, Red, and Verdigris Rivers, because parts of the drainage areas extend outside of the processed hydrologic units.
RRegrs: an R package for computer-aided model selection with multiple regression models.
Tsiliki, Georgia; Munteanu, Cristian R; Seoane, Jose A; Fernandez-Lozano, Carlos; Sarimveis, Haralambos; Willighagen, Egon L
2015-01-01
Predictive regression models can be created with many different modelling approaches. Choices need to be made for data set splitting, cross-validation methods, specific regression parameters and best-model criteria, as they all affect the accuracy and efficiency of the produced predictive models and therefore raise model reproducibility and comparison issues. Cheminformatics and bioinformatics make extensive use of predictive modelling and exhibit a need for standardization of these methodologies in order to assist model selection and speed up the process of predictive model development. A tool accessible to all users, irrespective of their statistical knowledge, would be valuable if it tests several simple and complex regression models and validation schemes, produces unified reports, and offers the option to be integrated into more extensive studies. Additionally, such a methodology should be implemented as a free programming package, in order to be continuously adapted and redistributed by others. We propose an integrated framework for creating multiple regression models, called RRegrs. The tool offers the option of ten simple and complex regression methods combined with repeated 10-fold and leave-one-out cross-validation. Methods include Multiple Linear regression, Generalized Linear Model with Stepwise Feature Selection, Partial Least Squares regression, Lasso regression, and Support Vector Machines Recursive Feature Elimination. The new framework is an automated, fully validated procedure which produces standardized reports to quickly oversee the impact of choices in modelling algorithms and assess the model and cross-validation results. The methodology was implemented as an open source R package, available at https://www.github.com/enanomapper/RRegrs, by reusing and extending the caret package. The universality of the new methodology is demonstrated using five standard data sets from different scientific fields. Its efficiency in cheminformatics and QSAR modelling is shown with three use cases: proteomics data for surface-modified gold nanoparticles, nano-metal oxides descriptor data, and molecular descriptors for acute aquatic toxicity data. The results show that for all data sets RRegrs reports models with equal or better performance for both training and test sets than those reported in the original publications. Its good performance as well as its adaptability in terms of parameter optimization could make RRegrs a popular framework to assist the initial exploration of predictive models, and with that, the design of more comprehensive in silico screening applications. Graphical abstract: RRegrs is a computer-aided model selection framework for R multiple regression models; it is a fully validated procedure with application to QSAR modelling.
Promoting Colorectal Cancer Screening Discussion
Christy, Shannon M.; Perkins, Susan M.; Tong, Yan; Krier, Connie; Champion, Victoria L.; Skinner, Celette Sugg; Springston, Jeffrey K.; Imperiale, Thomas F.; Rawl, Susan M.
2013-01-01
Background Provider recommendation is a predictor of colorectal cancer (CRC) screening. Purpose To compare the effects of two clinic-based interventions on patient–provider discussions about CRC screening. Design Two-group RCT with data collected at baseline and 1 week post-intervention. Participants/setting African-American patients who were non-adherent to CRC screening recommendations (n=693) with a primary care visit between 2008 and 2010 in one of 11 urban primary care clinics. Intervention Participants received either a computer-delivered tailored CRC screening intervention or a nontailored informational brochure about CRC screening immediately prior to their primary care visit. Main outcome measures Between-group differences in the odds of having had a CRC screening discussion about a colon test were examined using logistic regression, with and without adjusting for demographic, clinic, health literacy, health belief, and social support variables. Intervention effects on CRC screening test orders by PCPs were examined using logistic regression. Analyses were conducted in 2011 and 2012. Results Compared to the brochure group, a greater proportion of those in the computer-delivered tailored intervention group reported having had a discussion with their provider about CRC screening (63% vs 48%, OR=1.81, p<0.001). Predictors of a discussion about CRC screening included computer group participation, younger age, reason for visit, being unmarried, colonoscopy self-efficacy, and family member/friend recommendation (all p-values <0.05). Conclusions The computer-delivered tailored intervention was more effective than a nontailored brochure at stimulating patient–provider discussions about CRC screening. Those who received the computer-delivered intervention also were more likely to have a CRC screening test (fecal occult blood test or colonoscopy) ordered by their PCP. Trial registration This study is registered at www.clinicaltrials.gov NCT00672828. PMID:23498096
Heath, Anna; Manolopoulou, Ioanna; Baio, Gianluca
2016-10-15
The Expected Value of Perfect Partial Information (EVPPI) is a decision-theoretic measure of the 'cost' of parametric uncertainty in decision making used principally in health economic decision making. Despite this decision-theoretic grounding, the uptake of EVPPI calculations in practice has been slow. This is in part due to the prohibitive computational time required to estimate the EVPPI via Monte Carlo simulations. However, recent developments have demonstrated that the EVPPI can be estimated by non-parametric regression methods, which have significantly decreased the computation time required to approximate the EVPPI. Under certain circumstances, high-dimensional Gaussian Process (GP) regression is suggested, but this can still be prohibitively expensive. Applying fast computation methods developed in spatial statistics using Integrated Nested Laplace Approximations (INLA) and projecting from a high-dimensional into a low-dimensional input space allows us to decrease the computation time for fitting these high-dimensional GP, often substantially. We demonstrate that the EVPPI calculated using our method for GP regression is in line with the standard GP regression method and that despite the apparent methodological complexity of this new method, R functions are available in the package BCEA to implement it simply and efficiently. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
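A hedged sketch of the generic regression-based EVPPI estimator that such methods accelerate, with an off-the-shelf smoother standing in for the INLA/GP fit; the function names, decision problem, and parameter samples below are illustrative and synthetic.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def evppi_regression(nb, phi):
    """nb: net-benefit samples, one column per decision option; phi: samples of the
    parameter(s) of interest. Returns the regression-based EVPPI estimate."""
    fitted = np.column_stack([
        GradientBoostingRegressor(random_state=0).fit(phi, nb[:, d]).predict(phi)
        for d in range(nb.shape[1])
    ])
    return fitted.max(axis=1).mean() - nb.mean(axis=0).max()

# Toy probabilistic analysis: two options whose optimal choice depends on an uncertain parameter.
rng = np.random.default_rng(0)
phi = rng.normal(size=(2000, 1))
nb = np.column_stack([1000 + 500 * phi[:, 0] + rng.normal(0, 300, 2000),
                      1100 + rng.normal(0, 300, 2000)])
print(f"EVPPI estimate: {evppi_regression(nb, phi):.0f}")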
Recruiting Older Youths: Insights from a New Survey of Army Recruits
2014-01-01
remaining in the service at the time to be considered for promotion 8. the unconditional probability of achieving the military grade of E-5 at four years...of service 9. the unconditional probability of achieving the military grade of E-5 at six years of service. We examined both the total effects of...career outcomes for Army enlistees. These effects are computed from separate linear probability regression models that include only dummy variables
Fast function-on-scalar regression with penalized basis expansions.
Reiss, Philip T; Huang, Lei; Mennes, Maarten
2010-01-01
Regression models for functional responses and scalar predictors are often fitted by means of basis functions, with quadratic roughness penalties applied to avoid overfitting. The fitting approach described by Ramsay and Silverman in the 1990s amounts to a penalized ordinary least squares (P-OLS) estimator of the coefficient functions. We recast this estimator as a generalized ridge regression estimator, and present a penalized generalized least squares (P-GLS) alternative. We describe algorithms by which both estimators can be implemented, with automatic selection of optimal smoothing parameters, in a more computationally efficient manner than has heretofore been available. We discuss pointwise confidence intervals for the coefficient functions, simultaneous inference by permutation tests, and model selection, including a novel notion of pointwise model selection. P-OLS and P-GLS are compared in a simulation study. Our methods are illustrated with an analysis of age effects in a functional magnetic resonance imaging data set, as well as a reanalysis of a now-classic Canadian weather data set. An R package implementing the methods is publicly available.
On the use of log-transformation vs. nonlinear regression for analyzing biological power laws.
Xiao, Xiao; White, Ethan P; Hooten, Mevin B; Durham, Susan L
2011-10-01
Power-law relationships are among the most well-studied functional relationships in biology. Recently the common practice of fitting power laws using linear regression (LR) on log-transformed data has been criticized, calling into question the conclusions of hundreds of studies. It has been suggested that nonlinear regression (NLR) is preferable, but no rigorous comparison of these two methods has been conducted. Using Monte Carlo simulations, we demonstrate that the error distribution determines which method performs better, with NLR better characterizing data with additive, homoscedastic, normal error and LR better characterizing data with multiplicative, heteroscedastic, lognormal error. Analysis of 471 biological power laws shows that both forms of error occur in nature. While previous analyses based on log-transformation appear to be generally valid, future analyses should choose methods based on a combination of biological plausibility and analysis of the error distribution. We provide detailed guidelines and associated computer code for doing so, including a model averaging approach for cases where the error structure is uncertain.
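A brief sketch of the two fitting strategies being compared for y = a * x^b: linear regression on log-transformed data versus nonlinear least squares on the original scale. The simulated error structure here is multiplicative and lognormal, which is the case that favors the log-transform approach.

import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
x = rng.uniform(1, 100, 200)
y = 2.5 * x ** 0.75 * np.exp(rng.normal(0, 0.2, 200))   # multiplicative, lognormal error

# Linear regression on log-log scale (assumes multiplicative, lognormal error).
b_lr, log_a_lr = np.polyfit(np.log(x), np.log(y), 1)
a_lr = np.exp(log_a_lr)

# Nonlinear regression on the original scale (assumes additive, normal error).
(a_nlr, b_nlr), _ = curve_fit(lambda x, a, b: a * x ** b, x, y, p0=(1.0, 1.0))

print(f"LR : a={a_lr:.2f}, b={b_lr:.2f}")
print(f"NLR: a={a_nlr:.2f}, b={b_nlr:.2f}")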
Lin, Feng-Chang; Zhu, Jun
2012-01-01
We develop continuous-time models for the analysis of environmental or ecological monitoring data such that subjects are observed at multiple monitoring time points across space. Of particular interest are additive hazards regression models where the baseline hazard function can take on flexible forms. We consider time-varying covariates and take into account spatial dependence via autoregression in space and time. We develop statistical inference for the regression coefficients via partial likelihood. Asymptotic properties, including consistency and asymptotic normality, are established for parameter estimates under suitable regularity conditions. Feasible algorithms utilizing existing statistical software packages are developed for computation. We also consider a simpler additive hazards model with homogeneous baseline hazard and develop hypothesis testing for homogeneity. A simulation study demonstrates that the statistical inference using partial likelihood has sound finite-sample properties and offers a viable alternative to maximum likelihood estimation. For illustration, we analyze data from an ecological study that monitors bark beetle colonization of red pines in a plantation of Wisconsin.
Applications of modern statistical methods to analysis of data in physical science
NASA Astrophysics Data System (ADS)
Wicker, James Eric
Modern methods of statistical and computational analysis offer solutions to dilemmas confronting researchers in physical science. Although the ideas behind modern statistical and computational analysis methods were originally introduced in the 1970's, most scientists still rely on methods written during the early era of computing. These researchers, who analyze increasingly voluminous and multivariate data sets, need modern analysis methods to extract the best results from their studies. The first section of this work showcases applications of modern linear regression. Since the 1960's, many researchers in spectroscopy have used classical stepwise regression techniques to derive molecular constants. However, problems with thresholds of entry and exit for model variables plague this analysis method. Other criticisms of this kind of stepwise procedure include its inefficient searching method, the order in which variables enter or leave the model, and problems with overfitting data. We implement an information scoring technique that overcomes the assumptions inherent in the stepwise regression process to calculate molecular model parameters. We believe that this kind of information-based model evaluation can be applied to more general analysis situations in physical science. The second section proposes new methods of multivariate cluster analysis. The K-means algorithm and the EM algorithm, introduced in the 1960's and 1970's respectively, formed the basis of multivariate cluster analysis methodology for many years. However, these methods have several shortcomings, including strong dependence on initial seed values and inaccurate results when the data seriously depart from hypersphericity. We propose new cluster analysis methods based on genetic algorithms that overcome the strong dependence on initial seed values. In addition, we propose a generalization of the Genetic K-means algorithm which can accurately identify clusters with complex hyperellipsoidal covariance structures. We then use this new algorithm in a genetic algorithm based Expectation-Maximization process that can accurately calculate parameters describing complex clusters in a mixture model routine. Using the accuracy of this GEM algorithm, we assign information scores to cluster calculations in order to best identify the number of mixture components in a multivariate data set. We will showcase how these algorithms can be used to process multivariate data from astronomical observations.
The cross-validated AUC for MCP-logistic regression with high-dimensional data.
Jiang, Dingfeng; Huang, Jian; Zhang, Ying
2013-10-01
We propose a cross-validated area under the receiver operating characteristic (ROC) curve (CV-AUC) criterion for tuning parameter selection for penalized methods in sparse, high-dimensional logistic regression models. We use this criterion in combination with the minimax concave penalty (MCP) method for variable selection. The CV-AUC criterion is specifically designed for optimizing the classification performance for binary outcome data. To implement the proposed approach, we derive an efficient coordinate descent algorithm to compute the MCP-logistic regression solution surface. Simulation studies are conducted to evaluate the finite sample performance of the proposed method and its comparison with the existing methods including the Akaike information criterion (AIC), Bayesian information criterion (BIC) or Extended BIC (EBIC). The model selected based on the CV-AUC criterion tends to have a larger predictive AUC and smaller classification error than those with tuning parameters selected using the AIC, BIC or EBIC. We illustrate the application of the MCP-logistic regression with the CV-AUC criterion on three microarray datasets from the studies that attempt to identify genes related to cancers. Our simulation studies and data examples demonstrate that the CV-AUC is an attractive method for tuning parameter selection for penalized methods in high-dimensional logistic regression models.
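A hedged sketch of tuning-parameter selection by cross-validated AUC; an L1-penalized logistic regression is used as a stand-in because the MCP penalty is not available in scikit-learn, but the grid-search-by-CV-AUC logic is the same.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic high-dimensional binary-outcome data with a few informative features.
X, y = make_classification(n_samples=200, n_features=500, n_informative=5, random_state=0)

best = None
for C in np.logspace(-2, 1, 10):                      # grid over the penalty parameter
    model = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    if best is None or auc > best[0]:
        best = (auc, C)
print(f"selected C = {best[1]:.3f} with CV-AUC = {best[0]:.3f}")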
NASA Astrophysics Data System (ADS)
Coffman, Mitchell Ward
The purpose of this dissertation was to examine the relationship between student access to a computer at home and academic achievement. The 2009 National Assessment of Educational Progress (NAEP) dataset was probed using the National Data Explorer (NDE) to investigate correlations in the subsets of SES, Parental Education, Race, and Gender as they relate to access to a home computer and improved performance scores for U.S. public school grade 12 science students. A causal-comparative approach was employed seeking clarity on the relationship between home access and performance scores. The influence of home access cannot overcome the challenges students of lower SES face. The achievement gap, or a second digital divide, for underprivileged classes of students, including minorities, does not appear to contract via student access to a home computer. Nonetheless, in tests for significance, statistically significant improvement in science performance scores was reported for those having access to a computer at home compared to those not having access. Additionally, regression models reported evidence of correlations between and among subsets of controls for the demographic factors gender, race, and socioeconomic status. Variability in these correlations was high, suggesting that unobserved factors may have more impact on the dependent variable. Having access to a computer at home increases performance scores for grade 12 general science students of all races, genders, and socioeconomic levels. However, the performance gap is roughly equivalent to the existing performance gap of the national average for science scores, suggesting little influence from access to a computer on academic achievement. The variability of scores reported in the regression analysis models reflects a moderate to low effect, suggesting an absence of causation. These statistical results are consistent with the literature review: access to a computer at home and the predictor variables were found to have a significant impact on performance scores, although the data presented suggest that computer access at home is less influential on performance scores than poverty and its correlates.
Sparse Regression as a Sparse Eigenvalue Problem
NASA Technical Reports Server (NTRS)
Moghaddam, Baback; Gruber, Amit; Weiss, Yair; Avidan, Shai
2008-01-01
We extend the l0-norm "subspectral" algorithms for sparse-LDA [5] and sparse-PCA [6] to general quadratic costs such as MSE in linear (kernel) regression. The resulting "Sparse Least Squares" (SLS) problem is also NP-hard, by way of its equivalence to a rank-1 sparse eigenvalue problem (e.g., binary sparse-LDA [7]). Specifically, for a general quadratic cost we use a highly-efficient technique for direct eigenvalue computation using partitioned matrix inverses which leads to dramatic ×10³ speed-ups over standard eigenvalue decomposition. This increased efficiency mitigates the O(n⁴) scaling behaviour that up to now has limited the previous algorithms' utility for high-dimensional learning problems. Moreover, the new computation prioritizes the role of the less-myopic backward elimination stage which becomes more efficient than forward selection. Similarly, branch-and-bound search for Exact Sparse Least Squares (ESLS) also benefits from partitioned matrix inverse techniques. Our Greedy Sparse Least Squares (GSLS) generalizes Natarajan's algorithm [9] also known as Order-Recursive Matching Pursuit (ORMP). Specifically, the forward half of GSLS is exactly equivalent to ORMP but more efficient. By including the backward pass, which only doubles the computation, we can achieve lower MSE than ORMP. Experimental comparisons to the state-of-the-art LARS algorithm [3] show forward-GSLS is faster, more accurate and more flexible in terms of choice of regularization.
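A hedged sketch of greedy forward selection for sparse least squares, the spirit of the forward half of GSLS/ORMP; the partitioned-inverse updates and backward-elimination pass that provide the paper's efficiency gains are omitted.

import numpy as np

def greedy_sparse_ls(X, y, k):
    """Select k columns of X greedily, refitting least squares on the chosen support."""
    selected, residual, beta = [], y.copy(), None
    for _ in range(k):
        scores = np.abs(X.T @ residual) / np.linalg.norm(X, axis=0)
        scores[selected] = -np.inf                      # do not reselect a column
        selected.append(int(np.argmax(scores)))
        Xs = X[:, selected]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)   # refit on the current support
        residual = y - Xs @ beta
    return selected, beta

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))
y = 3 * X[:, 4] - 2 * X[:, 17] + 0.1 * rng.normal(size=100)
print(greedy_sparse_ls(X, y, k=2)[0])   # expected support: columns 4 and 17 (either order)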
DOE Office of Scientific and Technical Information (OSTI.GOV)
Iavarone, Salvatore; Smith, Sean T.; Smith, Philip J.
Oxy-coal combustion is an emerging low-cost “clean coal” technology for emissions reduction and Carbon Capture and Sequestration (CCS). The use of Computational Fluid Dynamics (CFD) tools is crucial for the development of cost-effective oxy-fuel technologies and the minimization of environmental concerns at industrial scale. The coupling of detailed chemistry models and CFD simulations is still challenging, especially for large-scale plants, because of the high computational efforts required. The development of scale-bridging models is therefore necessary, to find a good compromise between computational efforts and the physical-chemical modeling precision. This paper presents a procedure for scale-bridging modeling of coal devolatilization, in the presence of experimental error, that puts emphasis on the thermodynamic aspect of devolatilization, namely the final volatile yield of coal, rather than kinetics. The procedure consists of an engineering approach based on dataset consistency and Bayesian methodology including Gaussian-Process Regression (GPR). Experimental data from devolatilization tests carried out in an oxy-coal entrained flow reactor were considered and CFD simulations of the reactor were performed. Jointly evaluating experiments and simulations, a novel yield model was validated against the data via consistency analysis. In parallel, a Gaussian-Process Regression was performed, to improve the understanding of the uncertainty associated to the devolatilization, based on the experimental measurements. Potential model forms that could predict yield during devolatilization were obtained. The set of model forms obtained via GPR includes the yield model that was proven to be consistent with the data. Finally, the overall procedure has resulted in a novel yield model for coal devolatilization and in a valuable evaluation of uncertainty in the data, in the model form, and in the model parameters.
Iavarone, Salvatore; Smith, Sean T.; Smith, Philip J.; ...
2017-06-03
Oxy-coal combustion is an emerging low-cost “clean coal” technology for emissions reduction and Carbon Capture and Sequestration (CCS). The use of Computational Fluid Dynamics (CFD) tools is crucial for the development of cost-effective oxy-fuel technologies and the minimization of environmental concerns at industrial scale. The coupling of detailed chemistry models and CFD simulations is still challenging, especially for large-scale plants, because of the high computational efforts required. The development of scale-bridging models is therefore necessary, to find a good compromise between computational efforts and the physical-chemical modeling precision. This paper presents a procedure for scale-bridging modeling of coal devolatilization, in the presence of experimental error, that puts emphasis on the thermodynamic aspect of devolatilization, namely the final volatile yield of coal, rather than kinetics. The procedure consists of an engineering approach based on dataset consistency and Bayesian methodology including Gaussian-Process Regression (GPR). Experimental data from devolatilization tests carried out in an oxy-coal entrained flow reactor were considered and CFD simulations of the reactor were performed. Jointly evaluating experiments and simulations, a novel yield model was validated against the data via consistency analysis. In parallel, a Gaussian-Process Regression was performed, to improve the understanding of the uncertainty associated to the devolatilization, based on the experimental measurements. Potential model forms that could predict yield during devolatilization were obtained. The set of model forms obtained via GPR includes the yield model that was proven to be consistent with the data. Finally, the overall procedure has resulted in a novel yield model for coal devolatilization and in a valuable evaluation of uncertainty in the data, in the model form, and in the model parameters.
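A hedged sketch of Gaussian-process regression applied to a yield-versus-temperature relationship with noisy measurements; the data and kernel choices are illustrative and unrelated to the entrained-flow-reactor dataset.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
T = rng.uniform(900, 1600, 25)[:, None]                            # particle temperature (K)
yield_obs = 0.55 + 0.25 * np.tanh((T[:, 0] - 1200) / 150) + rng.normal(0, 0.03, 25)

# RBF kernel for the smooth trend plus a white-noise term for measurement error.
kernel = 1.0 * RBF(length_scale=200.0) + WhiteKernel(noise_level=1e-3)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(T, yield_obs)

T_grid = np.linspace(900, 1600, 8)[:, None]
mean, std = gpr.predict(T_grid, return_std=True)                   # posterior mean and uncertainty
for t, m, s in zip(T_grid[:, 0], mean, std):
    print(f"T={t:6.0f} K  yield={m:.3f} +/- {2 * s:.3f}")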
Efficient least angle regression for identification of linear-in-the-parameters models
Beach, Thomas H.; Rezgui, Yacine
2017-01-01
Least angle regression, as a promising model selection method, differentiates itself from conventional stepwise and stagewise methods in that it is neither too greedy nor too slow. It is closely related to L1-norm optimization, which achieves low prediction variance by sacrificing some model bias in order to enhance model generalization capability. In this paper, we propose an efficient least angle regression algorithm for model selection for a large class of linear-in-the-parameters models, with the purpose of accelerating the model selection process. The entire algorithm works completely in a recursive manner, where the correlations between model terms and residuals, the evolving directions and other pertinent variables are derived explicitly and updated successively at every subset selection step. The model coefficients are only computed when the algorithm finishes. Direct matrix inversions are thereby avoided. A detailed computational complexity analysis indicates that the proposed algorithm possesses significant computational efficiency, compared with the original approach in which the well-known efficient Cholesky decomposition is involved in solving least angle regression. Three artificial and real-world examples are employed to demonstrate the effectiveness, efficiency and numerical stability of the proposed algorithm. PMID:28293140
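A hedged usage sketch of least angle regression on a linear-in-the-parameters model built from a fixed dictionary of nonlinear terms; the recursive algorithm proposed in the paper is not reproduced, only the standard LARS path it accelerates.

import numpy as np
from sklearn.linear_model import Lars, lars_path

rng = np.random.default_rng(0)
u = rng.uniform(-1, 1, size=(200, 2))
# Candidate model terms (the dictionary of a linear-in-the-parameters model).
Phi = np.column_stack([u[:, 0], u[:, 1], u[:, 0] ** 2, u[:, 1] ** 2,
                       u[:, 0] * u[:, 1], np.sin(u[:, 0])])
y = 2.0 * Phi[:, 2] - 1.5 * Phi[:, 4] + 0.05 * rng.normal(size=200)

alphas, active, coefs = lars_path(Phi, y, method="lar")
print("order in which terms enter the model:", active)

model = Lars(n_nonzero_coefs=2).fit(Phi, y)
print("selected coefficients:", np.round(model.coef_, 2))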
Mastin, Mark C.; Konrad, Christopher P.; Veilleux, Andrea G.; Tecca, Alison E.
2016-09-20
An investigation into the magnitude and frequency of floods in Washington State computed the annual exceedance probability (AEP) statistics for 648 U.S. Geological Survey unregulated streamgages in and near the borders of Washington using the recorded annual peak flows through water year 2014. This report updates a previous report, published in 1998, that used annual peak flows through water year 1996. New in this report, a regional skew coefficient was developed for the Pacific Northwest region that includes areas in Oregon, Washington, Idaho, and western Montana within the Columbia River drainage basin south of the United States-Canada border, the coastal areas of Oregon and western Washington, and watersheds draining into Puget Sound, Washington. The skew coefficient is an important term in the log-Pearson Type III equation used to define the distribution of the log-transformed annual peaks. The Expected Moments Algorithm was used to fit historical and censored peak-flow data to the log-Pearson Type III distribution. A multiple Grubbs-Beck test was employed to censor low outliers of annual peak flows to improve the fit of the frequency distribution. This investigation also includes a section on observed trends in annual peak flows, which showed significant trends (p-value < 0.05) at 21 of 83 long-term sites, but with small-magnitude Kendall tau values suggesting a limited monotonic trend in the time series of annual peaks. Most of the sites with a significant trend in western Washington had positive trends, and all the sites with significant trends (three sites) in eastern Washington had negative trends. Multivariate regression analysis with measured basin characteristics and the AEP statistics at long-term, unregulated, and un-urbanized (defined as drainage basins with less than 5 percent impervious land cover for this investigation) streamgages within Washington, and some in Idaho and Oregon that are near the Washington border, was used to develop equations to estimate AEP statistics at ungaged basins. Washington was divided into four regions to improve the accuracy of the regression equations, and a set of equations for eight selected AEPs was constructed for each region. Selected AEP statistics included annual peak flows with exceedance probabilities of 50, 20, 10, 4, 2, 1, 0.5, and 0.2 percent, equivalent to peak flows with 2-, 5-, 10-, 25-, 50-, 100-, 200-, and 500-year recurrence intervals, respectively. Annual precipitation and drainage area were the significant basin characteristics in the regression equations for all four regression regions in Washington, and forest cover was significant for the two regression regions in eastern Washington. Average standard error of prediction for the regional regression equations ranged from 70.19 to 125.72 percent for Regression Regions 1 and 2 on the eastern side of the Cascade Mountains and from 43.22 to 58.04 percent for Regression Regions 3 and 4 on the western side of the Cascade Mountains. The pseudo coefficient of determination (where a value of 100 signifies a perfect regression model) ranged from 68.39 to 90.68 for Regression Regions 1 and 2, and from 92.35 to 95.44 for Regression Regions 3 and 4. The calculated AEP statistics for the streamgages and the regional regression equations are expected to be incorporated into StreamStats after the publication of this report. StreamStats is the interactive Web-based map tool created by the U.S.
Geological Survey to allow the user to choose a streamgage and obtain published statistics or choose ungaged locations where the program automatically applies the regional regression equations and computes the estimates of the AEP statistics.
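A hedged illustration of how a regional peak-flow regression equation of the typical log-linear form is applied at an ungaged site; the coefficients and basin characteristics below are invented for illustration, and the actual Washington equations are those in the report and in StreamStats.

# Hypothetical regional regression equation of the common form
# Q = 10**a * A**b * P**c, where A is drainage area and P is annual precipitation.
def peak_flow_estimate(drainage_area_mi2, precip_in, a=0.5, b=0.85, c=1.2):
    """Return an estimated peak flow (ft^3/s) from basin characteristics."""
    return 10 ** a * drainage_area_mi2 ** b * precip_in ** c

# Example ungaged basin: 42 square miles, 55 inches of annual precipitation.
q = peak_flow_estimate(drainage_area_mi2=42.0, precip_in=55.0)
print(f"hypothetical 1-percent-AEP estimate: {q:,.0f} ft^3/s")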
Jin, Yonghong; Zhang, Qi; Shan, Lifei; Li, Sai-Ping
2015-01-01
Financial networks have been extensively studied as examples of real world complex networks. In this paper, we establish and study the network of venture capital (VC) firms in China. We compute and analyze the statistical properties of the network, including parameters such as degrees, mean lengths of the shortest paths, clustering coefficient and robustness. We further study the topology of the network and find that it has small-world behavior. A multiple linear regression model is introduced to study the relation between network parameters and major regional economic indices in China. From the result of regression, we find that, economic aggregate (including the total GDP, investment, consumption and net export), upgrade of industrial structure, employment and remuneration of a region are all positively correlated with the degree and the clustering coefficient of the VC sub-network of the region, which suggests that the development of the VC industry has substantial effects on regional economy in China. PMID:26340555
Overhead longwave infrared hyperspectral material identification using radiometric models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zelinski, M. E.
Material detection algorithms used in hyperspectral data processing are computationally efficient but can produce relatively high numbers of false positives. Material identification performed as a secondary processing step on detected pixels can help separate true and false positives. This paper presents a material identification processing chain for longwave infrared hyperspectral data of solid materials collected from airborne platforms. The algorithms utilize unwhitened radiance data and an iterative algorithm that determines the temperature, humidity, and ozone of the atmospheric profile. Pixel unmixing is done using constrained linear regression and Bayesian Information Criteria for model selection. The resulting product includes an optimal atmospheric profile and full radiance material model that includes material temperature, abundance values, and several fit statistics. A logistic regression method utilizing all model parameters to improve identification is also presented. This paper details the processing chain and provides justification for the algorithms used. Several examples are provided using modeled data at different noise levels.
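A hedged sketch of the unmixing-plus-model-selection idea: nonnegativity-constrained least squares of a pixel spectrum against candidate endmember subsets, scored by BIC. The spectra are synthetic, and the paper's full radiance model (temperature, atmosphere) is not included.

import numpy as np
from itertools import combinations
from scipy.optimize import nnls

rng = np.random.default_rng(0)
n_bands = 64
library = rng.uniform(0.2, 1.0, size=(n_bands, 5))                 # 5 candidate materials
pixel = 0.7 * library[:, 1] + 0.3 * library[:, 3] + rng.normal(0, 0.01, n_bands)

best = None
for k in (1, 2, 3):                                                 # try 1- to 3-material mixtures
    for subset in combinations(range(library.shape[1]), k):
        abund, rnorm = nnls(library[:, list(subset)], pixel)        # constrained regression
        bic = n_bands * np.log(rnorm ** 2 / n_bands) + k * np.log(n_bands)
        if best is None or bic < best[0]:
            best = (bic, subset, abund)
print("selected materials:", best[1], "abundances:", np.round(best[2], 2))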
SOCR Analyses – an Instructional Java Web-based Statistical Analysis Toolkit
Chu, Annie; Cui, Jenny; Dinov, Ivo D.
2011-01-01
The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as t-test in the parametric category; and Wilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as Contingency tables, Friedman's test and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least square solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for most updated information and newly added models. PMID:21546994
System and Method for Monitoring Distributed Asset Data
NASA Technical Reports Server (NTRS)
Gorinevsky, Dimitry (Inventor)
2015-01-01
A computer-based monitoring system and monitoring method implemented in computer software for detecting, estimating, and reporting the condition states, their changes, and anomalies for many assets. The assets are of the same type, are operated over a period of time, and are outfitted with data collection systems. The proposed monitoring method accounts for variability of working conditions for each asset by using a regression model that characterizes asset performance. The assets are of the same type but not identical. The proposed monitoring method accounts for asset-to-asset variability; it also accounts for drifts and trends in the asset condition and data. The proposed monitoring system can perform distributed processing of massive amounts of historical data without discarding any useful information, in cases where moving all the asset data into one central computing system might be infeasible. The overall processing includes distributed preprocessing of data records from each asset to produce compressed data.
The use of gas chromatographic-mass spectrometric-computer systems in pharmacokinetic studies.
Horning, M G; Nowlin, J; Stafford, M; Lertratanangkoon, K; Sommer, K R; Hill, R M; Stillwell, R N
1975-10-29
Pharmacokinetic studies involving plasma, urine, breast milk, saliva and liver homogenates have been carried out by selective ion detection with a gas chromatographic-mass spectrometric-computer system operated in the chemical ionization mode. Stable isotope labeled drugs were used as internal standards for quantification. The half-lives, the concentration at zero time, the slope (regression coefficient), the maximum velocity of the reaction and the apparent Michaelis constant of the reaction were determined by regression analysis, and also by graphic means.
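A hedged sketch of the regression step for first-order elimination: log plasma concentration is linear in time, so least squares yields the slope (elimination rate constant), the zero-time concentration, and hence the half-life. The concentrations below are synthetic rather than GC-MS data.

import numpy as np

t = np.array([0.5, 1, 2, 4, 6, 8, 12], dtype=float)          # hours after dosing
conc = np.array([18.0, 16.5, 13.4, 9.1, 6.2, 4.1, 1.9])      # plasma concentration (ng/mL)

# Linear regression of log concentration on time.
slope, intercept = np.polyfit(t, np.log(conc), 1)
k_el = -slope                                                 # elimination rate constant (1/h)
print(f"C0 estimate: {np.exp(intercept):.1f} ng/mL")
print(f"half-life estimate: {np.log(2) / k_el:.1f} h")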
Factors influencing use of an e-health website in a community sample of older adults.
Czaja, Sara J; Sharit, Joseph; Lee, Chin Chin; Nair, Sankaran N; Hernández, Mario A; Arana, Neysarí; Fu, Shih Hua
2013-01-01
The use of the internet as a source of health information and link to healthcare services has raised concerns about the ability of consumers, especially vulnerable populations such as older adults, to access these applications. This study examined the influence of training on the ability of adults (aged 45+ years) to use the Medicare.gov website to solve problems related to health management. The influence of computer experience and cognitive abilities on performance was also examined. Seventy-one participants, aged 47-92, were randomized into a Multimedia training, Unimodal training, or Cold Start condition and completed three healthcare management problems. MEASUREMENT AND ANALYSES: Computer/internet experience was measured via questionnaire, and cognitive abilities were assessed using standard neuropsychological tests. Performance metrics included measures of navigation, accuracy and efficiency. Data were analyzed using analysis of variance, χ(2) and regression techniques. The data indicate that there was no difference among the three conditions on measures of accuracy, efficiency, or navigation. However, results of the regression analyses showed that, overall, people who received training performed better on the tasks, as evidenced by greater accuracy and efficiency. Performance was also significantly influenced by prior computer experience and cognitive abilities. Participants with more computer experience and higher cognitive abilities performed better. The findings indicate that training, experience, and abilities are important when using complex health websites. However, training alone is not sufficient. The complexity of web content needs to be considered to ensure successful use of these websites by those with lower abilities.
Factors influencing use of an e-health website in a community sample of older adults
Sharit, Joseph; Lee, Chin Chin; Nair, Sankaran N; Hernández, Mario A; Arana, Neysarí; Fu, Shih Hua
2013-01-01
Objective The use of the internet as a source of health information and link to healthcare services has raised concerns about the ability of consumers, especially vulnerable populations such as older adults, to access these applications. This study examined the influence of training on the ability of adults (aged 45+ years) to use the Medicare.gov website to solve problems related to health management. The influence of computer experience and cognitive abilities on performance was also examined. Design Seventy-one participants, aged 47–92, were randomized into a Multimedia training, Unimodal training, or Cold Start condition and completed three healthcare management problems. Measurement and analyses Computer/internet experience was measured via questionnaire, and cognitive abilities were assessed using standard neuropsychological tests. Performance metrics included measures of navigation, accuracy and efficiency. Data were analyzed using analysis of variance, χ2 and regression techniques. Results The data indicate that there was no difference among the three conditions on measures of accuracy, efficiency, or navigation. However, results of the regression analyses showed that, overall, people who received training performed better on the tasks, as evidenced by greater accuracy and efficiency. Performance was also significantly influenced by prior computer experience and cognitive abilities. Participants with more computer experience and higher cognitive abilities performed better. Conclusions The findings indicate that training, experience, and abilities are important when using complex health websites. However, training alone is not sufficient. The complexity of web content needs to be considered to ensure successful use of these websites by those with lower abilities. PMID:22802269
Correlation and prediction of dynamic human isolated joint strength from lean body mass
NASA Technical Reports Server (NTRS)
Pandya, Abhilash K.; Hasson, Scott M.; Aldridge, Ann M.; Maida, James C.; Woolford, Barbara J.
1992-01-01
A relationship between a person's lean body mass and the amount of maximum torque that can be produced with each isolated joint of the upper extremity was investigated. The maximum dynamic isolated joint torque (upper extremity) on 14 subjects was collected using a dynamometer multi-joint testing unit. These data were reduced to a table of coefficients of second degree polynomials, computed using a least squares regression method. All the coefficients were then organized into look-up tables, a compact and convenient storage/retrieval mechanism for the data set. Data from each joint, direction and velocity, were normalized with respect to that joint's average and merged into files (one for each curve for a particular joint). Regression was performed on each one of these files to derive a table of normalized population curve coefficients for each joint axis, direction, and velocity. In addition, a regression table which included all upper extremity joints was built which related average torque to lean body mass for an individual. These two tables are the basis of the regression model which allows the prediction of dynamic isolated joint torques from an individual's lean body mass.
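The fitting pipeline described above (normalize each joint's torques by the joint average, fit second-degree polynomials by least squares, and store the coefficients in lookup tables keyed by joint, direction, and velocity) can be illustrated with a short sketch. The angle and torque values and the lookup-table key below are invented for illustration only; just the overall procedure follows the abstract.

```python
import numpy as np

# Hypothetical torque-angle measurements for one joint/direction/velocity;
# names and values are illustrative, not the study's data.
joint_angle_deg = np.array([10, 20, 30, 40, 50, 60, 70, 80])
torque_nm = np.array([18.0, 24.5, 29.0, 31.5, 32.0, 30.5, 27.0, 22.0])

# Normalize torques by the joint's average, as described, so curves from
# different subjects can be merged before regression.
normalized = torque_nm / torque_nm.mean()

# Second-degree polynomial least-squares fit (highest power first).
coeffs = np.polyfit(joint_angle_deg, normalized, deg=2)

# Store coefficients in a lookup table keyed by joint, direction, velocity.
lookup = {("elbow", "flexion", 60): coeffs}

# Predict: scale the normalized curve by an individual's average torque, which
# the study in turn relates to lean body mass via a separate regression.
predicted = np.polyval(lookup[("elbow", "flexion", 60)], 45) * torque_nm.mean()
print(coeffs, predicted)
```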
Designing for deeper learning in a blended computer science course for middle school students
NASA Astrophysics Data System (ADS)
Grover, Shuchi; Pea, Roy; Cooper, Stephen
2015-04-01
The focus of this research was to create and test an introductory computer science course for middle school. Titled "Foundations for Advancing Computational Thinking" (FACT), the course aims to prepare and motivate middle school learners for future engagement with algorithmic problem solving. FACT was also piloted as a seven-week course on Stanford's OpenEdX MOOC platform for blended in-class learning. Unique aspects of FACT include balanced pedagogical designs that address the cognitive, interpersonal, and intrapersonal aspects of "deeper learning"; a focus on pedagogical strategies for mediating and assessing for transfer from block-based to text-based programming; curricular materials for remedying misperceptions of computing; and "systems of assessments" (including formative and summative quizzes and tests, directed as well as open-ended programming assignments, and a transfer test) to get a comprehensive picture of students' deeper computational learning. Empirical investigations, accomplished over two iterations of a design-based research effort with students (aged 11-14 years) in a public school, sought to examine student understanding of algorithmic constructs, and how well students transferred this learning from Scratch to text-based languages. Changes in student perceptions of computing as a discipline were measured. Results and mixed-method analyses revealed that students in both studies (1) achieved substantial learning gains in algorithmic thinking skills, (2) were able to transfer their learning from Scratch to a text-based programming context, and (3) achieved significant growth toward a more mature understanding of computing as a discipline. Factor analyses of prior computing experience, multivariate regression analyses, and qualitative analyses of student projects and artifact-based interviews were conducted to better understand the factors affecting learning outcomes. Prior computing experiences (as measured by a pretest) and math ability were found to be strong predictors of learning outcomes.
Duncan, Dustin T; Kawachi, Ichiro; White, Kellee; Williams, David R
2013-08-01
The geography of recreational open space might be inequitable in terms of minority neighborhood racial/ethnic composition and neighborhood poverty, perhaps due in part to residential segregation. This study evaluated the association between minority neighborhood racial/ethnic composition, neighborhood poverty, and recreational open space in Boston, Massachusetts (US). Across Boston census tracts, we computed percent non-Hispanic Black, percent Hispanic, and percent families in poverty as well as recreational open space density. We evaluated spatial autocorrelation in study variables and in the ordinary least squares (OLS) regression residuals via the Global Moran's I. We then computed Spearman correlations between the census tract socio-demographic characteristics and recreational open space density, including correlations adjusted for spatial autocorrelation. After this, we computed OLS regressions or spatial regressions as appropriate. Significant positive spatial autocorrelation was found for neighborhood socio-demographic characteristics (all p values = 0.001). We found marginally significant positive spatial autocorrelation in recreational open space (Global Moran's I = 0.082; p value = 0.053). However, we found no spatial autocorrelation in the OLS regression residuals, which indicated that spatial models were not appropriate. There was a negative correlation between census tract percent non-Hispanic Black and recreational open space density (rS = -0.22; conventional p value = 0.005; spatially adjusted p value = 0.019), as well as a negative correlation between predominantly non-Hispanic Black census tracts (>60% non-Hispanic Black in a census tract) and recreational open space density (rS = -0.23; conventional p value = 0.003; spatially adjusted p value = 0.007). In bivariate and multivariate OLS models, percent non-Hispanic Black in a census tract and predominantly Black census tracts were associated with decreased density of recreational open space (p value < 0.001). Consistent with several previous studies in other geographic locales, we found that Black neighborhoods in Boston were less likely to have recreational open spaces, indicating the need for policy interventions promoting equitable access. Such interventions may contribute to reductions in obesity and in obesity disparities.
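For readers unfamiliar with the spatial statistic used above, the following minimal sketch computes a Global Moran's I from a user-supplied spatial weight matrix. The weight matrix and density values are toy inputs, not the Boston data, and significance would normally be assessed by permutation or analytical inference rather than inspected by eye.

```python
import numpy as np

def morans_i(x, w):
    """Global Moran's I for values x and an n-by-n spatial weight matrix w."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    s0 = w.sum()
    n = len(x)
    return (n / s0) * (z @ w @ z) / (z @ z)

# Toy example: 4 areal units on a line with rook-contiguity weights (illustrative).
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
open_space_density = np.array([0.8, 0.7, 0.2, 0.1])
print(morans_i(open_space_density, w))  # positive value suggests spatial clustering
```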
Engagement, Persistence, and Gender in Computer Science: Results of a Smartphone ESM Study.
Milesi, Carolina; Perez-Felkner, Lara; Brown, Kevin; Schneider, Barbara
2017-01-01
While the underrepresentation of women in the fast-growing STEM field of computer science (CS) has been much studied, no consensus exists on the key factors influencing this widening gender gap. Possible suspects include gender differences in aptitude, interest, and academic environment. Our study contributes to this literature by applying student engagement research to study the experiences of college students studying CS, to assess the degree to which differences in men and women's engagement may help account for gender inequity in the field. Specifically, we use the Experience Sampling Method (ESM) to evaluate in real-time the engagement of college students during varied activities and environments. Over the course of a full week in fall semester and a full week in spring semester, 165 students majoring in CS at two Research I universities were "beeped" several times a day via a smartphone app prompting them to fill out a short questionnaire including open-ended and scaled items. These responses were paired with administrative and over 2 years of transcript data provided by their institutions. We used mean comparisons and logistic regression analysis to compare enrollment and persistence patterns among CS men and women. Results suggest that despite the obstacles associated with women's underrepresentation in computer science, women are more likely to continue taking computer science courses when they felt challenged and skilled in their initial computer science classes. We discuss implications for further research.
Traffic flow forecasting using approximate nearest neighbor nonparametric regression
DOT National Transportation Integrated Search
2000-12-01
The purpose of this research is to enhance nonparametric regression (NPR) for use in real-time systems by first reducing execution time using advanced data structures and imprecise computations and then developing a methodology for applying NPR. Due ...
The UK Military Experience of Thoracic Injury in the Wars in Iraq and Afghanistan
2013-01-01
…investigations including computed tomography (CT), laboratory and blood bank. A Role 4 hospital is a fixed capability in the home nation capable of providing full… not an independent predictor of mortality in our model. Goodness of the logistic regression model fit was demonstrated using a Hosmer and Lemeshow test… of good practice and ethical care; thus we believe the hidden mortality is minimal. It is possible that in some circumstances, the desire to do…
Computing Science and Statistics: Volume 24. Graphics and Visualization
1993-03-20
NASA Astrophysics Data System (ADS)
Fei, Cheng-Wei; Bai, Guang-Chen
2014-12-01
To improve the computational precision and efficiency of probabilistic design for mechanical dynamic assemblies such as the blade-tip radial running clearance (BTRRC) of a gas turbine, a distribution collaborative probabilistic design method based on support vector machine of regression (SR) (called DCSRM) is proposed by integrating the distribution collaborative response surface method and a support vector machine regression model. The mathematical model of DCSRM is established and the probabilistic design idea of DCSRM is introduced. The dynamic assembly probabilistic design of the aeroengine high-pressure turbine (HPT) BTRRC is carried out to verify the proposed DCSRM. The analysis results reveal that the optimal static blade-tip clearance of the HPT is obtained for designing the BTRRC and for improving the performance and reliability of the aeroengine. The comparison of methods shows that DCSRM has high computational accuracy and high computational efficiency in BTRRC probabilistic analysis. The present research offers an effective way for the reliability design of mechanical dynamic assemblies and enriches mechanical reliability theory and methods.
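As a rough illustration of using support vector regression as a cheap surrogate for an expensive response (not a reproduction of DCSRM's collaborative response surfaces), the sketch below trains scikit-learn's SVR on a small sample of a stand-in function and then runs Monte Carlo on the surrogate. The function, sample sizes, and failure threshold are all assumptions.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Stand-in for an expensive deterministic analysis (e.g., a clearance response).
def expensive_model(x):
    return 0.5 * x[:, 0] ** 2 + np.sin(x[:, 1]) + 0.1 * x[:, 2]

# Small design-of-experiments sample to train the surrogate.
X_train = rng.normal(size=(80, 3))
y_train = expensive_model(X_train)

surrogate = SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(X_train, y_train)

# Probabilistic analysis: cheap Monte Carlo on the surrogate instead of the model.
X_mc = rng.normal(size=(100_000, 3))
y_mc = surrogate.predict(X_mc)
print("P(response > 1.5):", np.mean(y_mc > 1.5))
```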
New insights into faster computation of uncertainties
NASA Astrophysics Data System (ADS)
Bhattacharya, Atreyee
2012-11-01
Heavy computation power, lengthy simulations, and an exhaustive number of model runs—often these seem like the only statistical tools that scientists have at their disposal when computing uncertainties associated with predictions, particularly in cases of environmental processes such as groundwater movement. However, calculation of uncertainties need not be as lengthy, a new study shows. Comparing two approaches—the classical Bayesian “credible interval” and a less commonly used regression-based “confidence interval” method—Lu et al. show that for many practical purposes both methods provide similar estimates of uncertainties. The advantage of the regression method is that it demands 10-1000 model runs, whereas the classical Bayesian approach requires 10,000 to millions of model runs.
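A minimal example of the regression-based (linearized) interval idea, assuming a toy exponential model rather than a groundwater simulator: nonlinear least squares needs only a handful of model evaluations, and the parameter covariance returned by scipy's curve_fit yields approximate confidence intervals.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy import stats

rng = np.random.default_rng(1)

# Toy nonlinear model standing in for an environmental simulator (illustrative).
def model(t, a, k):
    return a * np.exp(-k * t)

t = np.linspace(0, 10, 25)
y = model(t, 2.0, 0.3) + rng.normal(scale=0.05, size=t.size)

# Nonlinear least squares: only tens of model evaluations are needed, versus the
# thousands-to-millions of runs typical of MCMC-based credible intervals.
popt, pcov = curve_fit(model, t, y, p0=[1.0, 0.1])
se = np.sqrt(np.diag(pcov))
tval = stats.t.ppf(0.975, df=t.size - len(popt))
for name, p, s in zip(["a", "k"], popt, se):
    print(f"{name}: {p:.3f} +/- {tval * s:.3f} (95% linearized CI)")
```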
Thieler, E. Robert; Himmelstoss, Emily A.; Zichichi, Jessica L.; Ergul, Ayhan
2009-01-01
The Digital Shoreline Analysis System (DSAS) version 4.0 is a software extension to ESRI ArcGIS v.9.2 and above that enables a user to calculate shoreline rate-of-change statistics from multiple historic shoreline positions. A user-friendly interface of simple buttons and menus guides the user through the major steps of shoreline change analysis. Components of the extension and user guide include (1) instruction on the proper way to define a reference baseline for measurements, (2) automated and manual generation of measurement transects and metadata based on user-specified parameters, and (3) output of calculated rates of shoreline change and other statistical information. DSAS computes shoreline rates of change using four different methods: (1) endpoint rate, (2) simple linear regression, (3) weighted linear regression, and (4) least median of squares. The standard error, correlation coefficient, and confidence interval are also computed for the simple and weighted linear-regression methods. The results of all rate calculations are output to a table that can be linked to the transect file by a common attribute field. DSAS is intended to facilitate the shoreline change-calculation process and to provide rate-of-change information and the statistical data necessary to establish the reliability of the calculated results. The software is also suitable for any generic application that calculates positional change over time, such as assessing rates of change of glacier limits in sequential aerial photos, river edge boundaries, land-cover changes, and so on.
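A compact sketch of two of the four DSAS rate methods (end point rate and simple linear regression) for a single transect. The survey dates and shoreline positions are invented, and the weighted-regression and least-median-of-squares variants are omitted.

```python
import numpy as np
from scipy import stats

# Shoreline positions (m, along one transect) at survey dates in decimal years;
# values are invented for illustration (negative = landward retreat).
years = np.array([1930.0, 1955.5, 1978.2, 1994.7, 2006.3])
positions_m = np.array([0.0, -12.4, -21.8, -30.1, -36.5])

# End point rate: net change divided by elapsed time (first vs last survey only).
epr = (positions_m[-1] - positions_m[0]) / (years[-1] - years[0])

# Simple linear regression rate: slope of position vs time using all surveys,
# plus the standard error and correlation that DSAS reports for this method.
res = stats.linregress(years, positions_m)
print(f"EPR = {epr:.2f} m/yr, LRR = {res.slope:.2f} m/yr "
      f"(SE {res.stderr:.2f}, r = {res.rvalue:.2f})")
```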
Age estimation using pulp/tooth area ratio in maxillary canines-A digital image analysis.
Juneja, Manjushree; Devi, Yashoda B K; Rakesh, N; Juneja, Saurabh
2014-09-01
Determination of age of a subject is one of the most important aspects of medico-legal cases and anthropological research. Radiographs can be used to indirectly measure the rate of secondary dentine deposition which is depicted by reduction in the pulp area. In this study, 200 patients of Karnataka aged between 18-72 years were selected for the study. Panoramic radiographs were made and indirectly digitized. Radiographic images of maxillary canines (RIC) were processed using a computer-aided drafting program (ImageJ). The variables pulp/root length (p), pulp/tooth length (r), pulp/root width at enamel-cementum junction (ECJ) level (a), pulp/root width at mid-root level (c), pulp/root width at midpoint level between ECJ level and mid-root level (b) and pulp/tooth area ratio (AR) were recorded. All the morphological variables including gender were statistically analyzed to derive regression equation for estimation of age. It was observed that 2 variables 'AR' and 'b' contributed significantly to the fit and were included in the regression model, yielding the formula: Age = 87.305-480.455(AR)+48.108(b). Statistical analysis indicated that the regression equation with selected variables explained 96% of total variance with the median of the residuals of 0.1614 years and standard error of estimate of 3.0186 years. There is significant correlation between age and morphological variables 'AR' and 'b' and the derived population specific regression equation can be potentially used for estimation of chronological age of individuals of Karnataka origin.
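The reported equation can be applied directly, as in the small sketch below; the input measurements are hypothetical, and the roughly 3-year standard error of estimate quoted above applies only to the population the equation was derived from.

```python
def estimate_age(pulp_tooth_area_ratio, pulp_root_width_b):
    """Age estimate from the regression reported in the abstract:
    Age = 87.305 - 480.455*AR + 48.108*b, where AR is the pulp/tooth area ratio
    and b is the pulp/root width at the midpoint level."""
    return 87.305 - 480.455 * pulp_tooth_area_ratio + 48.108 * pulp_root_width_b

# Illustrative measurements (not from the study).
print(round(estimate_age(0.14, 0.35), 1))  # about 36.9 years
```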
Short-term outcome of 1,465 computer-navigated primary total knee replacements 2005-2008.
Gøthesen, Oystein; Espehaug, Birgitte; Havelin, Leif; Petursson, Gunnar; Furnes, Ove
2011-06-01
Background and purpose Improvement of positioning and alignment by the use of computer-assisted surgery (CAS) might improve longevity and function in total knee replacements, but there is little evidence. In this study, we evaluated the short-term results of computer-navigated knee replacements based on data from the Norwegian Arthroplasty Register. Primary total knee replacements without patella resurfacing, reported to the Norwegian Arthroplasty Register during the years 2005-2008, were evaluated. The 5 most common implants and the 3 most common navigation systems were selected. Cemented, uncemented, and hybrid knees were included. With the risk of revision for any cause as the primary endpoint and intraoperative complications and operating time as secondary outcomes, 1,465 computer-navigated knee replacements (CAS) and 8,214 conventionally operated knee replacements (CON) were compared. Kaplan-Meier survival analysis and Cox regression analysis with adjustment for age, sex, prosthesis brand, fixation method, previous knee surgery, preoperative diagnosis, and ASA category were used. Kaplan-Meier estimated survival at 2 years was 98% (95% CI: 97.5-98.3) in the CON group and 96% (95% CI: 95.0-97.8) in the CAS group. The adjusted Cox regression analysis showed a higher risk of revision in the CAS group (RR = 1.7, 95% CI: 1.1-2.5; p = 0.02). The LCS Complete knee had a higher risk of revision with CAS than with CON (RR = 2.1, 95% CI: 1.3-3.4; p = 0.004). The differences were not statistically significant for the other prosthesis brands. Mean operating time was 15 min longer in the CAS group. With the introduction of computer-navigated knee replacement surgery in Norway, the short-term risk of revision has increased for computer-navigated replacement with the LCS Complete. The mechanisms of failure of these implantations should be explored in greater depth, and in this study we have not been able to draw conclusions regarding causation.
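For readers who want to reproduce this style of analysis on their own data, the sketch below fits a Kaplan-Meier curve and an adjusted Cox model with the lifelines Python package (assumed installed). The synthetic data, covariates, and effect sizes are illustrative and unrelated to the Norwegian register.

```python
import numpy as np
import pandas as pd
from lifelines import KaplanMeierFitter, CoxPHFitter  # assumes lifelines is installed

rng = np.random.default_rng(2)
n = 500

# Synthetic register-style data: a navigation flag plus adjustment covariates.
df = pd.DataFrame({
    "cas": rng.integers(0, 2, n),
    "age": rng.normal(68, 9, n),
    "female": rng.integers(0, 2, n),
})
hazard = 0.1 * np.exp(0.5 * df["cas"] - 0.01 * (df["age"] - 68))
time = rng.exponential(1 / hazard)
df["years"] = np.minimum(time, 4.0)           # administrative censoring at 4 years
df["revised"] = (time <= 4.0).astype(int)     # revision indicator

kmf = KaplanMeierFitter().fit(df["years"], event_observed=df["revised"])
print(kmf.survival_function_.tail(1))         # estimated implant survival

cph = CoxPHFitter().fit(df, duration_col="years", event_col="revised")
print(cph.hazard_ratios_)                     # adjusted revision-risk ratios
```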
Some comparisons of complexity in dictionary-based and linear computational models.
Gnecco, Giorgio; Kůrková, Věra; Sanguineti, Marcello
2011-03-01
Neural networks provide a more flexible approximation of functions than traditional linear regression. In the latter, one can only adjust the coefficients in linear combinations of fixed sets of functions, such as orthogonal polynomials or Hermite functions, while for neural networks, one may also adjust the parameters of the functions which are being combined. However, some useful properties of linear approximators (such as uniqueness, homogeneity, and continuity of best approximation operators) are not satisfied by neural networks. Moreover, optimization of parameters in neural networks becomes more difficult than in linear regression. Experimental results suggest that these drawbacks of neural networks are offset by substantially lower model complexity, allowing accuracy of approximation even in high-dimensional cases. We give some theoretical results comparing requirements on model complexity for two types of approximators, the traditional linear ones and so called variable-basis types, which include neural networks, radial, and kernel models. We compare upper bounds on worst-case errors in variable-basis approximation with lower bounds on such errors for any linear approximator. Using methods from nonlinear approximation and integral representations tailored to computational units, we describe some cases where neural networks outperform any linear approximator. Copyright © 2010 Elsevier Ltd. All rights reserved.
Perry, Charles A.; Wolock, David M.; Artman, Joshua C.
2004-01-01
Streamflow statistics of flow duration and peak-discharge frequency were estimated for 4,771 individual locations on streams listed on the 1999 Kansas Surface Water Register. These statistics included the flow-duration values of 90, 75, 50, 25, and 10 percent, as well as the mean flow value. Peak-discharge frequency values were estimated for the 2-, 5-, 10-, 25-, 50-, and 100-year floods. Least-squares multiple regression techniques were used, along with Tobit analyses, to develop equations for estimating flow-duration values of 90, 75, 50, 25, and 10 percent and the mean flow for uncontrolled flow stream locations. The contributing-drainage areas of 149 U.S. Geological Survey streamflow-gaging stations in Kansas and parts of surrounding States that had flow uncontrolled by Federal reservoirs and used in the regression analyses ranged from 2.06 to 12,004 square miles. Logarithmic transformations of climatic and basin data were performed to yield the best linear relation for developing equations to compute flow durations and mean flow. In the regression analyses, the significant climatic and basin characteristics, in order of importance, were contributing-drainage area, mean annual precipitation, mean basin permeability, and mean basin slope. The analyses yielded a model standard error of prediction range of 0.43 logarithmic units for the 90-percent duration analysis to 0.15 logarithmic units for the 10-percent duration analysis. The model standard error of prediction was 0.14 logarithmic units for the mean flow. Regression equations used to estimate peak-discharge frequency values were obtained from a previous report, and estimates for the 2-, 5-, 10-, 25-, 50-, and 100-year floods were determined for this report. The regression equations and an interpolation procedure were used to compute flow durations, mean flow, and estimates of peak-discharge frequency for locations along uncontrolled flow streams on the 1999 Kansas Surface Water Register. Flow durations, mean flow, and peak-discharge frequency values determined at available gaging stations were used to interpolate the regression-estimated flows for the stream locations where available. Streamflow statistics for locations that had uncontrolled flow were interpolated using data from gaging stations weighted according to the drainage area and the bias between the regression-estimated and gaged flow information. On controlled reaches of Kansas streams, the streamflow statistics were interpolated between gaging stations using only gaged data weighted by drainage area.
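A minimal sketch of the log-transform-then-regress approach for flow statistics, using synthetic basin characteristics loosely named after the study's predictors; the Tobit handling of censored low flows and the interpolation against gaged records are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 149

# Synthetic basin characteristics (illustrative only): drainage area (mi^2),
# mean annual precipitation (in), permeability (in/hr), and basin slope (%).
area = rng.uniform(2, 12000, n)
precip = rng.uniform(15, 45, n)
perm = rng.uniform(0.1, 10, n)
slope = rng.uniform(0.5, 8, n)
mean_flow = 0.05 * area**0.95 * (precip / 30) ** 2.5 * np.exp(rng.normal(0, 0.3, n))

# Log-transform both sides so the power-law relation becomes linear, then fit
# by ordinary least squares.
X = np.column_stack([np.ones(n), np.log10(area), np.log10(precip),
                     np.log10(perm), np.log10(slope)])
y = np.log10(mean_flow)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
print("coefficients:", np.round(beta, 3))
print("standard error (log10 units):", round(resid.std(ddof=X.shape[1]), 3))
```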
NASA Technical Reports Server (NTRS)
Rummler, D. R.
1976-01-01
The results are presented of investigations to apply regression techniques to the development of methodology for creep-rupture data analysis. Regression analysis techniques are applied to the explicit description of the creep behavior of materials for space shuttle thermal protection systems. A regression analysis technique is compared with five parametric methods for analyzing three simulated and twenty real data sets, and a computer program for the evaluation of creep-rupture data is presented.
Effect of contact lens use on Computer Vision Syndrome.
Tauste, Ana; Ronda, Elena; Molina, María-José; Seguí, Mar
2016-03-01
To analyse the relationship between Computer Vision Syndrome (CVS) in computer workers and contact lens use, according to lens materials. Cross-sectional study. The study included 426 civil-service office workers, of whom 22% were contact lens wearers. Workers completed the Computer Vision Syndrome Questionnaire (CVS-Q) and provided information on their contact lenses and exposure to video display terminals (VDT) at work. CVS was defined as a CVS-Q score of 6 or more. The covariates were age and sex. Logistic regression was used to calculate the association (crude and adjusted for age and sex) between CVS and individual and work-related factors, and between CVS and contact lens type. Contact lens wearers are more likely to suffer CVS than non-lens wearers, with a prevalence of 65% vs 50%. Workers who wear contact lenses and are exposed to the computer for more than 6 h per day are more likely to suffer CVS than non-lens wearers working at the computer for the same amount of time (aOR = 4.85; 95% CI, 1.25-18.80; p = 0.02). Regular contact lens use increases CVS after 6 h of computer work. © 2016 The Authors Ophthalmic & Physiological Optics © 2016 The College of Optometrists.
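A hedged sketch of how adjusted odds ratios of this kind are typically obtained with logistic regression in Python (statsmodels); the simulated worker data and the combined exposure variable are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 426

# Synthetic worker-level data: CVS case status, contact lens use combined with
# >6 h/day of VDT exposure, plus age and sex as adjustment covariates.
df = pd.DataFrame({
    "lens_and_long_vdt": rng.integers(0, 2, n),
    "age": rng.normal(45, 9, n),
    "female": rng.integers(0, 2, n),
})
logit_p = -0.3 + 1.2 * df["lens_and_long_vdt"] + 0.02 * (df["age"] - 45)
df["cvs"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

X = sm.add_constant(df[["lens_and_long_vdt", "age", "female"]])
fit = sm.Logit(df["cvs"], X).fit(disp=0)

odds_ratios = np.exp(fit.params)          # exponentiate coefficients to get aORs
ci = np.exp(fit.conf_int())               # and their 95% confidence intervals
print(pd.concat([odds_ratios.rename("aOR"), ci], axis=1))
```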
Eash, David A.; Barnes, Kimberlee K.
2017-01-01
A statewide study was conducted to develop regression equations for estimating six selected low-flow frequency statistics and harmonic mean flows for ungaged stream sites in Iowa. The estimation equations developed for the six low-flow frequency statistics include: the annual 1-, 7-, and 30-day mean low flows for a recurrence interval of 10 years, the annual 30-day mean low flow for a recurrence interval of 5 years, and the seasonal (October 1 through December 31) 1- and 7-day mean low flows for a recurrence interval of 10 years. Estimation equations also were developed for the harmonic-mean-flow statistic. Estimates of these seven selected statistics are provided for 208 U.S. Geological Survey continuous-record streamgages using data through September 30, 2006. The study area comprises streamgages located within Iowa and 50 miles beyond the State's borders. Because trend analyses indicated statistically significant positive trends when considering the entire period of record for the majority of the streamgages, the longest, most recent period of record without a significant trend was determined for each streamgage for use in the study. The median number of years of record used to compute each of these seven selected statistics was 35. Geographic information system software was used to measure 54 selected basin characteristics for each streamgage. Following the removal of two streamgages from the initial data set, data collected for 206 streamgages were compiled to investigate three approaches for regionalization of the seven selected statistics. Regionalization, a process using statistical regression analysis, provides a relation for efficiently transferring information from a group of streamgages in a region to ungaged sites in the region. The three regionalization approaches tested included statewide, regional, and region-of-influence regressions. For the regional regression, the study area was divided into three low-flow regions on the basis of hydrologic characteristics, landform regions, and soil regions. A comparison of root mean square errors and average standard errors of prediction for the statewide, regional, and region-of-influence regressions determined that the regional regression provided the best estimates of the seven selected statistics at ungaged sites in Iowa. Because a significant number of streams in Iowa reach zero flow as their minimum flow during low-flow years, four different types of regression analyses were used: left-censored, logistic, generalized-least-squares, and weighted-least-squares regression. A total of 192 streamgages were included in the development of 27 regression equations for the three low-flow regions. For the northeast and northwest regions, a censoring threshold was used to develop 12 left-censored regression equations to estimate the 6 low-flow frequency statistics for each region. For the southern region a total of 12 regression equations were developed; 6 logistic regression equations were developed to estimate the probability of zero flow for the 6 low-flow frequency statistics and 6 generalized least-squares regression equations were developed to estimate the 6 low-flow frequency statistics, if nonzero flow is estimated first by use of the logistic equations. A weighted-least-squares regression equation was developed for each region to estimate the harmonic-mean-flow statistic. 
Average standard errors of estimate for the left-censored equations for the northeast region range from 64.7 to 88.1 percent and for the northwest region range from 85.8 to 111.8 percent. Misclassification percentages for the logistic equations for the southern region range from 5.6 to 14.0 percent. Average standard errors of prediction for generalized least-squares equations for the southern region range from 71.7 to 98.9 percent and pseudo coefficients of determination for the generalized-least-squares equations range from 87.7 to 91.8 percent. Average standard errors of prediction for weighted-least-squares equations developed for estimating the harmonic-mean-flow statistic for each of the three regions range from 66.4 to 80.4 percent. The regression equations are applicable only to stream sites in Iowa with low flows not significantly affected by regulation, diversion, or urbanization and with basin characteristics within the range of those used to develop the equations. If the equations are used at ungaged sites on regulated streams, or on streams affected by water-supply and agricultural withdrawals, then the estimates will need to be adjusted by the amount of regulation or withdrawal to estimate the actual flow conditions if that is of interest. Caution is advised when applying the equations for basins with characteristics near the applicable limits of the equations and for basins located in karst topography. A test of two drainage-area ratio methods using 31 pairs of streamgages, for the annual 7-day mean low-flow statistic for a recurrence interval of 10 years, indicates a weighted drainage-area ratio method provides better estimates than regional regression equations for an ungaged site on a gaged stream in Iowa when the drainage-area ratio is between 0.5 and 1.4. These regression equations will be implemented within the U.S. Geological Survey StreamStats web-based geographic-information-system tool. StreamStats allows users to click on any ungaged site on a river and compute estimates of the seven selected statistics; in addition, 90-percent prediction intervals and the measured basin characteristics for the ungaged sites also are provided. StreamStats also allows users to click on any streamgage in Iowa and estimates computed for these seven selected statistics are provided for the streamgage.
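The left-censored (Tobit-type) regression used for the northern regions can be illustrated generically by maximizing a censored-normal likelihood. The sketch below uses one synthetic predictor and a censoring threshold of zero in log units; it is a generic illustration, not the USGS estimation procedure.

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(5)
n, c = 200, 0.0   # c = censoring threshold (illustrative, in log10 units)

# Synthetic log10 low-flow statistic vs one log10 basin characteristic.
x = rng.uniform(0, 3, n)
y_latent = -1.0 + 1.2 * x + rng.normal(0, 0.6, n)
y = np.maximum(y_latent, c)            # values at or below c are reported as c
censored = y_latent <= c

def negloglik(theta):
    b0, b1, log_sigma = theta
    sigma = np.exp(log_sigma)
    mu = b0 + b1 * x
    ll_obs = stats.norm.logpdf(y[~censored], mu[~censored], sigma)
    ll_cen = stats.norm.logcdf((c - mu[censored]) / sigma)  # P(Y <= c)
    return -(ll_obs.sum() + ll_cen.sum())

res = optimize.minimize(negloglik, x0=[0.0, 1.0, 0.0], method="Nelder-Mead")
print("intercept, slope, sigma:",
      round(res.x[0], 3), round(res.x[1], 3), round(np.exp(res.x[2]), 3))
```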
Application of linear regression analysis in accuracy assessment of rolling force calculations
NASA Astrophysics Data System (ADS)
Poliak, E. I.; Shim, M. K.; Kim, G. S.; Choo, W. Y.
1998-10-01
Efficient operation of the computational models employed in process control systems requires periodic assessment of the accuracy of their predictions. Linear regression is proposed as a tool that allows separating systematic and random prediction errors from those related to measurement. A quantitative characteristic of the model's predictive ability is introduced in addition to standard statistical tests for model adequacy. Rolling force calculations are considered as an example application. However, the outlined approach can be used to assess the performance of any computational model.
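A minimal sketch of the accuracy-assessment idea: regress measured values on model predictions, where a slope near one and an intercept near zero indicate the absence of systematic error and the residual scatter estimates the random error. The rolling-force numbers are simulated, not taken from the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

# Illustrative data: rolling forces predicted by a process model vs measured
# values, with a small systematic offset and random scatter added.
predicted = rng.uniform(8, 20, 60)                       # MN, model output
measured = 1.03 * predicted - 0.2 + rng.normal(0, 0.4, 60)

res = stats.linregress(predicted, measured)
print(f"slope = {res.slope:.3f}, intercept = {res.intercept:.3f}, "
      f"r^2 = {res.rvalue**2:.3f}")
print("residual std (random error):",
      round(np.std(measured - (res.intercept + res.slope * predicted), ddof=2), 3))
```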
Estimating Flow-Duration and Low-Flow Frequency Statistics for Unregulated Streams in Oregon
Risley, John; Stonewall, Adam J.; Haluska, Tana
2008-01-01
Flow statistical datasets, basin-characteristic datasets, and regression equations were developed to provide decision makers with surface-water information needed for activities such as water-quality regulation, water-rights adjudication, biological habitat assessment, infrastructure design, and water-supply planning and management. The flow statistics, which included annual and monthly period of record flow durations (5th, 10th, 25th, 50th, and 95th percent exceedances) and annual and monthly 7-day, 10-year (7Q10) and 7-day, 2-year (7Q2) low flows, were computed at 466 streamflow-gaging stations at sites with unregulated flow conditions throughout Oregon and adjacent areas of neighboring States. Regression equations, created from the flow statistics and basin characteristics of the stations, can be used to estimate flow statistics at ungaged stream sites in Oregon. The study area was divided into 10 regression modeling regions based on ecological, topographic, geologic, hydrologic, and climatic criteria. In total, 910 annual and monthly regression equations were created to predict the 7 flow statistics in the 10 regions. Equations to predict the five flow-duration exceedance percentages and the two low-flow frequency statistics were created with Ordinary Least Squares and Generalized Least Squares regression, respectively. The standard errors of estimate of the equations created to predict the 5th and 95th percent exceedances had medians of 42.4 and 64.4 percent, respectively. The standard errors of prediction of the equations created to predict the 7Q2 and 7Q10 low-flow statistics had medians of 51.7 and 61.2 percent, respectively. Standard errors for regression equations for sites in western Oregon were smaller than those in eastern Oregon partly because of a greater density of available streamflow-gaging stations in western Oregon than eastern Oregon. High-flow regression equations (such as the 5th and 10th percent exceedances) also generally were more accurate than the low-flow regression equations (such as the 95th percent exceedance and 7Q10 low-flow statistic). The regression equations predict unregulated flow conditions in Oregon. Flow estimates need to be adjusted if they are used at ungaged sites that are regulated by reservoirs or affected by water-supply and agricultural withdrawals if actual flow conditions are of interest. The regression equations are installed in the USGS StreamStats Web-based tool (http://water.usgs.gov/osw/streamstats/index.html, accessed July 16, 2008). StreamStats provides users with a set of annual and monthly flow-duration and low-flow frequency estimates for ungaged sites in Oregon in addition to the basin characteristics for the sites. Prediction intervals at the 90-percent confidence level also are automatically computed.
Sitting Time in Adults 65 Years and Over: Behavior, Knowledge, and Intentions to Change.
Alley, Stephanie; van Uffelen, Jannique G Z; Duncan, Mitch J; De Cocker, Katrien; Schoeppe, Stephanie; Rebar, Amanda L; Vandelanotte, Corneel
2018-04-01
This study examined sitting time, knowledge, and intentions to change sitting time in older adults. An online survey was completed by 494 Australians aged 65+. Average daily sitting was high (9.0 hr). Daily sitting time was highest for TV viewing (3.3 hr), computer use (2.1 hr), and leisure activities (1.7 hr). A regression analysis demonstrated that women were more knowledgeable about the health risks of sitting compared to men. The percentages of older adults intending to sit less were highest for TV (24%), leisure (24%), and computer (19%) sitting time. Regression analyses demonstrated that intentions varied by gender (for TV sitting), education (leisure and work sitting), body mass index (computer, leisure, and transport sitting), and physical activity (TV, computer, and leisure sitting). Interventions should target older adults' TV, computer, and leisure time sitting, with a focus on intentions in older males and older adults with low education, those who are active, and those with a normal weight.
NASA Astrophysics Data System (ADS)
Aygunes, Gunes
2017-07-01
The objective of this paper is to survey and determine the macroeconomic factors affecting the level of venture capital (VC) investments in a country. The literature relates the quality of venture capitalists to countries' venture capital investments. The aim of this paper is to characterize the relationship between venture capital investment and macroeconomic variables via a statistical computation method. We investigate the countries and macroeconomic variables and derive correlations between venture capital investments and the macroeconomic variables. According to a logistic regression (logit) model, the macroeconomic variables are correlated with each other in three groups. Venture capitalists can regard these correlations as an indicator. Finally, we give the correlation matrix of our results.
ERIC Educational Resources Information Center
Tuncer, Murat
2013-01-01
Present research investigates reciprocal relations amidst computer self-efficacy, scientific research and information literacy self-efficacy. Research findings have demonstrated that according to standardized regression coefficients, computer self-efficacy has a positive effect on information literacy self-efficacy. Likewise it has been detected…
Verifying the Simulation Hypothesis via Infinite Nested Universe Simulacrum Loops
NASA Astrophysics Data System (ADS)
Sharma, Vikrant
2017-01-01
The simulation hypothesis proposes that local reality exists as a simulacrum within a hypothetical computer's dimension. More specifically, Bostrom's trilemma proposes that the number of simulations an advanced 'posthuman' civilization could produce makes the proposition very likely. In this paper a hypothetical method to verify the simulation hypothesis is discussed using infinite regression applied to a new type of infinite loop. Assign dimension n to any computer in our present reality, where dimension signifies the hierarchical level in nested simulations our reality exists in. A computer simulating known reality would be dimension (n-1), and likewise a computer simulating an artificial reality, such as a video game, would be dimension (n+1). In this method, among others, four key assumptions are made about the nature of the original computer dimension n. Summations show that regressing such a reality infinitely will create convergence, implying that the verification of whether local reality is a grand simulation is feasible to detect with adequate compute capability. The action of reaching said convergence point halts the simulation of local reality. Sensitivities to the four assumptions and implications are discussed.
Edelson, Lisa R; Mathias, Kevin C; Fulgoni, Victor L; Karagounis, Leonidas G
2016-02-04
Physical strength is associated with improved health outcomes in children. Heavier children tend to have lower functional strength and mobility. Physical activity can increase children's strength, but it is unknown how different types of electronic media use impact physical strength. Data from the NHANES National Youth Fitness Survey (NNYFS) from children ages 6-15 were analyzed in this study. Regression models were conducted to determine if screen-based sedentary behaviors (television viewing time, computer/video game time) were associated with strength measures (grip, leg extensions, modified pull-ups, plank) while controlling for potential confounders including child age, sex, BMI z-score, and days per week with 60+ minutes of physical activity. Grip strength and leg extensions divided by body weight were analyzed to provide measures of relative strength together with pull-ups and plank, which require lifting the body. The results from the regression models showed the hypothesized inverse association between TV time and all strength measures. Computer time was only significantly inversely associated with the ability to do one or more pull-ups. This study shows that television viewing, but not computer/videogames, is inversely associated with measures of child strength while controlling for child characteristics and physical activity. These findings suggest that "screen time" may not be a unified construct with respect to strength outcomes and that further exploration of the potential benefits of reducing television time on children's strength and related mobility is needed.
Liu, Fang; Eugenio, Evercita C
2018-04-01
Beta regression is an increasingly popular statistical technique in medical research for modeling of outcomes that assume values in (0, 1), such as proportions and patient reported outcomes. When outcomes take values in the intervals [0,1), (0,1], or [0,1], zero-or-one-inflated beta (zoib) regression can be used. We provide a thorough review on beta regression and zoib regression in the modeling, inferential, and computational aspects via the likelihood-based and Bayesian approaches. We demonstrate the statistical and practical importance of correctly modeling the inflation at zero/one rather than ad hoc replacing them with values close to zero/one via simulation studies; the latter approach can lead to biased estimates and invalid inferences. We show via simulation studies that the likelihood-based approach is computationally faster in general than MCMC algorithms used in the Bayesian inferences, but runs the risk of non-convergence, large biases, and sensitivity to starting values in the optimization algorithm especially with clustered/correlated data, data with sparse inflation at zero and one, and data that warrant regularization of the likelihood. The disadvantages of the regular likelihood-based approach make the Bayesian approach an attractive alternative in these cases. Software packages and tools for fitting beta and zoib regressions in both the likelihood-based and Bayesian frameworks are also reviewed.
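A bare-bones likelihood-based beta regression (without the zero/one inflation components reviewed above) can be written directly against scipy: parameterize the beta density by a logit-linked mean and a log-linked precision and minimize the negative log-likelihood. The data and starting values below are illustrative assumptions.

```python
import numpy as np
from scipy import optimize, stats, special

rng = np.random.default_rng(7)
n = 300

# Synthetic (0,1) outcome with one covariate; illustrative only.
x = rng.normal(size=n)
mu_true = special.expit(0.5 + 0.8 * x)
phi_true = 20.0
y = rng.beta(mu_true * phi_true, (1 - mu_true) * phi_true)

def negloglik(theta):
    b0, b1, log_phi = theta
    mu = special.expit(b0 + b1 * x)        # logit link for the mean
    phi = np.exp(log_phi)                  # log link for the precision
    return -stats.beta.logpdf(y, mu * phi, (1 - mu) * phi).sum()

res = optimize.minimize(negloglik, x0=[0.0, 0.0, 1.0], method="BFGS")
b0, b1, log_phi = res.x
print("mean-model coefficients:", round(b0, 2), round(b1, 2),
      "precision:", round(np.exp(log_phi), 1))
```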
Development of surrogate models for the prediction of the flow around an aircraft propeller
NASA Astrophysics Data System (ADS)
Salpigidou, Christina; Misirlis, Dimitris; Vlahostergios, Zinon; Yakinthos, Kyros
2018-05-01
In the present work, the derivation of two surrogate models (SMs) for modelling the flow around a propeller for small aircrafts is presented. Both methodologies use derived functions based on computations with the detailed propeller geometry. The computations were performed using k-ω shear stress transport for modelling turbulence. In the SMs, the modelling of the propeller was performed in a computational domain of disk-like geometry, where source terms were introduced in the momentum equations. In the first SM, the source terms were polynomial functions of swirl and thrust, mainly related to the propeller radius. In the second SM, regression analysis was used to correlate the source terms with the velocity distribution through the propeller. The proposed SMs achieved faster convergence, in relation to the detail model, by providing also results closer to the available operational data. The regression-based model was the most accurate and required less computational time for convergence.
Toy, Brian C; Krishnadev, Nupura; Indaram, Maanasa; Cunningham, Denise; Cukras, Catherine A; Chew, Emily Y; Wong, Wai T
2013-09-01
To investigate the association of spontaneous drusen regression in intermediate age-related macular degeneration (AMD) with changes on fundus photography and fundus autofluorescence (FAF) imaging. Prospective observational case series. Fundus images from 58 eyes (in 58 patients) with intermediate AMD and large drusen were assessed over 2 years for areas of drusen regression that exceeded the area of circle C1 (diameter 125 μm; Age-Related Eye Disease Study grading protocol). Manual segmentation and computer-based image analysis were used to detect and delineate areas of drusen regression. Delineated regions were graded as to their appearance on fundus photographs and FAF images, and changes in FAF signal were graded manually and quantitated using automated image analysis. Drusen regression was detected in approximately half of study eyes using manual (48%) and computer-assisted (50%) techniques. At year-2, the clinical appearance of areas of drusen regression on fundus photography was mostly unremarkable, with a majority of eyes (71%) demonstrating no detectable clinical abnormalities, and the remainder (29%) showing minor pigmentary changes. However, drusen regression areas were associated with local changes in FAF that were significantly more prominent than changes on fundus photography. A majority of eyes (64%-66%) demonstrated a predominant decrease in overall FAF signal, while 14%-21% of eyes demonstrated a predominant increase in overall FAF signal. FAF imaging demonstrated that drusen regression in intermediate AMD was often accompanied by changes in local autofluorescence signal. Drusen regression may be associated with concurrent structural and physiologic changes in the outer retina. Published by Elsevier Inc.
Campos-Filho, N; Franco, E L
1989-02-01
A frequent procedure in matched case-control studies is to report results from the multivariate unmatched analyses if they do not differ substantially from the ones obtained after conditioning on the matching variables. Although conceptually simple, this rule requires that an extensive series of logistic regression models be evaluated by both the conditional and unconditional maximum likelihood methods. Most computer programs for logistic regression employ only one maximum likelihood method, which requires that the analyses be performed in separate steps. This paper describes a Pascal microcomputer (IBM PC) program that performs multiple logistic regression by both maximum likelihood estimation methods, which obviates the need for switching between programs to obtain relative risk estimates from both matched and unmatched analyses. The program calculates most standard statistics and allows factoring of categorical or continuous variables by two distinct methods of contrast. A built-in, descriptive statistics option allows the user to inspect the distribution of cases and controls across categories of any given variable.
Radmanesh, Farid; Falcone, Guido J; Anderson, Christopher D; Battey, Thomas W K; Ayres, Alison M; Vashkevich, Anastasia; McNamara, Kristen A; Schwab, Kristin; Romero, Javier M; Viswanathan, Anand; Greenberg, Steven M; Goldstein, Joshua N; Rosand, Jonathan; Brouwers, H Bart
2014-06-01
Patients with intracerebral hemorrhage (ICH) who present with a spot sign on computed tomography angiography are at increased risk of hematoma expansion and poor outcome. Because primary ICH is the acute manifestation of chronic cerebral small vessel disease, we investigated whether different clinical or imaging characteristics predict spot sign presence, using ICH location as a surrogate for arteriolosclerosis- and cerebral amyloid angiopathy-related ICH. Patients with primary ICH and available computed tomography angiography at presentation were included. Predictors of spot sign were assessed using uni- and multivariable regression, stratified by ICH location. Seven hundred forty-one patients were eligible, 335 (45%) deep and 406 (55%) lobar ICH. At least one spot sign was present in 76 (23%) deep and 102 (25%) lobar ICH patients. In multivariable regression, warfarin (odds ratio [OR], 2.42; 95% confidence interval [CI], 1.01-5.71; P=0.04), baseline ICH volume (OR, 1.20; 95% CI, 1.09-1.33, per 10 mL increase; P<0.001), and time from symptom onset to computed tomography angiography (OR, 0.89; 95% CI, 0.80-0.96, per hour; P=0.009) were associated with the spot sign in deep ICH. Predictors of spot sign in lobar ICH were warfarin (OR, 3.95; 95% CI, 1.87-8.51; P<0.001) and baseline ICH volume (OR, 1.20; 95% CI, 1.10-1.31, per 10 mL increase; P<0.001). The most potent associations with spot sign are shared between deep and lobar ICH, suggesting that the acute bleeding process that arises in the setting of different chronic small vessel diseases shares commonalities. © 2014 American Heart Association, Inc.
Lifestyle and health-related quality of life: a cross-sectional study among civil servants in China.
Xu, Jun; Qiu, Jincai; Chen, Jie; Zou, Liai; Feng, Liyi; Lu, Yan; Wei, Qian; Zhang, Jinhua
2012-05-04
Health-related quality of life (HRQoL) has been increasingly acknowledged as a valid and appropriate indicator of public health and chronic morbidity. However, limited research was conducted among Chinese civil servants owing to the different lifestyle. The aim of the study was to evaluate the HRQoL among Chinese civil servants and to identify factors might be associated with their HRQoL. A cross-sectional study was conducted to investigate HRQoL of 15,000 civil servants in China using stratified random sampling methods. Independent-Samples t-Test, one-way ANOVA, and multiple stepwise regression were used to analyse the influencing factors and the HRQoL of the civil servants. A univariate analysis showed that there were significant differences among physical component summary (PCS), mental component summary (MCS), and TS between lifestyle factors, such as smoking, drinking alcohol, having breakfast, sleep time, physical exercise, work time, operating computers, and sedentariness (P < 0.05). Multiple stepwise regressions showed that there were significant differences among TS between lifestyle factors, such as breakfast, sleep time, physical exercise, operating computers, sedentariness, work time, and drinking (P < 0.05). In this study, using Short Form 36 items (SF-36), we assessed the association of HRQoL with lifestyle factors, including smoking, drinking alcohol, having breakfast, sleep time, physical exercise, work time, operating computers, and sedentariness in China. The performance of the questionnaire in the large-scale survey is satisfactory and provides a large picture of the HRQoL status in Chinese civil servants. Our results indicate that lifestyle factors such as smoking, drinking alcohol, having breakfast, sleep time, physical exercise, work time, operating computers, and sedentariness affect the HRQoL of civil servants in China.
NASA Technical Reports Server (NTRS)
Hollyday, E. F. (Principal Investigator)
1975-01-01
The author has identified the following significant results. Streamflow characteristics in the Delmarva Peninsula derived from the records of daily discharge of 20 gaged basins are representative of the full range in flow conditions and include all of those commonly used for design or planning purposes. They include annual flood peaks with recurrence intervals of 2, 5, 10, 25, and 50 years, mean annual discharge, standard deviation of the mean annual discharge, mean monthly discharges, standard deviation of the mean monthly discharges, low-flow characteristics, flood volume characteristics, and the discharge equalled or exceeded 50 percent of the time. Streamflow and basin characteristics were related by a technique of multiple regression using a digital computer. A control group of equations was computed using basin characteristics derived from maps and climatological records. An experimental group of equations was computed using basin characteristics derived from LANDSAT imagery as well as from maps and climatological records. Based on a reduction in standard error of estimate equal to or greater than 10 percent, the equations for 12 stream flow characteristics were substantially improved by adding to the analyses basin characteristics derived from LANDSAT imagery.
Shinozuka, Jun; Awaguni, Hitoshi; Tanaka, Shin-Ichiro; Makino, Shigeru; Maruyama, Rikken; Inaba, Tohru; Imashuku, Shinsaku
2016-07-01
Pulmonary nodules associated with Epstein-Barr virus (EBV)-related atypical infectious mononucleosis have rarely been described. In a 12-year-old Japanese boy, computed tomography on admission revealed multiple small round nodules (a total of 7 nodules, 4 to 8 mm in size) in the lungs. The hemorrhagic pharyngeal tonsils, which showed hot signals on 18F-fluorodeoxyglucose-positron emission tomography-computed tomography, were biopsied, revealing the presence of EBV-encoded small nuclear RNA (EBER)-positive cells; however, no lymphoma was noted. The patient was diagnosed as having atypical EBV infectious mononucleosis associated with primary EBV infection. The pulmonary nodules spontaneously and markedly decreased in number and size over a 2-year period. The differential diagnosis of pulmonary nodules in childhood should include atypical EBV infection.
A FORTRAN program for multivariate survival analysis on the personal computer.
Mulder, P G
1988-01-01
In this paper a FORTRAN program is presented for multivariate survival or life table regression analysis in a competing risks' situation. The relevant failure rate (for example, a particular disease or mortality rate) is modelled as a log-linear function of a vector of (possibly time-dependent) explanatory variables. The explanatory variables may also include the variable time itself, which is useful for parameterizing piecewise exponential time-to-failure distributions in a Gompertz-like or Weibull-like way as a more efficient alternative to Cox's proportional hazards model. Maximum likelihood estimates of the coefficients of the log-linear relationship are obtained from the iterative Newton-Raphson method. The program runs on a personal computer under DOS; running time is quite acceptable, even for large samples.
New machine-learning algorithms for prediction of Parkinson's disease
NASA Astrophysics Data System (ADS)
Mandal, Indrajit; Sairam, N.
2014-03-01
This article presents enhanced prediction accuracy for the diagnosis of Parkinson's disease (PD), aiming to prevent delayed diagnosis and misdiagnosis of patients using the proposed robust inference system. New machine-learning methods are proposed, and performance comparisons are based on specificity, sensitivity, accuracy, and other measurable parameters. The robust methods applied to PD prediction include sparse multinomial logistic regression, a rotation forest ensemble with support vector machines and principal components analysis, artificial neural networks, and boosting methods. A new ensemble method, comprising a Bayesian network optimised by a Tabu search algorithm as classifier and Haar wavelets as projection filter, is used for relevant feature selection and ranking. The highest accuracy obtained by linear logistic regression and sparse multinomial logistic regression is 100%, with sensitivity and specificity of 0.983 and 0.996, respectively. All experiments are conducted at 95% and 99% confidence levels and the results are established with corrected t-tests. This work shows a high degree of advancement in the reliability and quality of the computer-aided diagnosis system and experimentally shows best results with supportive statistical inference.
On the use of log-transformation vs. nonlinear regression for analyzing biological power laws
Xiao, X.; White, E.P.; Hooten, M.B.; Durham, S.L.
2011-01-01
Power-law relationships are among the most well-studied functional relationships in biology. Recently the common practice of fitting power laws using linear regression (LR) on log-transformed data has been criticized, calling into question the conclusions of hundreds of studies. It has been suggested that nonlinear regression (NLR) is preferable, but no rigorous comparison of these two methods has been conducted. Using Monte Carlo simulations, we demonstrate that the error distribution determines which method performs better, with NLR better characterizing data with additive, homoscedastic, normal error and LR better characterizing data with multiplicative, heteroscedastic, lognormal error. Analysis of 471 biological power laws shows that both forms of error occur in nature. While previous analyses based on log-transformation appear to be generally valid, future analyses should choose methods based on a combination of biological plausibility and analysis of the error distribution. We provide detailed guidelines and associated computer code for doing so, including a model averaging approach for cases where the error structure is uncertain. © 2011 by the Ecological Society of America.
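The comparison can be reproduced in a few lines: fit the same simulated power law once by linear regression on log-log axes and once by nonlinear least squares. With the multiplicative lognormal error assumed here, the log-log fit recovers the exponent well, consistent with the abstract's guidance; switching to additive normal error would favor NLR instead.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(8)
x = np.linspace(1, 100, 200)

# Multiplicative lognormal error: the case where LR on log-transformed data
# is expected to perform better.
y = 2.0 * x**0.75 * np.exp(rng.normal(0, 0.3, x.size))

# LR: straight-line fit on log-log axes.
slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
a_lr, b_lr = np.exp(intercept), slope

# NLR: nonlinear least squares on the original scale.
(a_nlr, b_nlr), _ = curve_fit(lambda x, a, b: a * x**b, x, y, p0=[1.0, 1.0])

print(f"LR : a = {a_lr:.2f}, b = {b_lr:.2f}")
print(f"NLR: a = {a_nlr:.2f}, b = {b_nlr:.2f}")
```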
2014-01-01
Background Meta-regression is becoming increasingly used to model study level covariate effects. However this type of statistical analysis presents many difficulties and challenges. Here two methods for calculating confidence intervals for the magnitude of the residual between-study variance in random effects meta-regression models are developed. A further suggestion for calculating credible intervals using informative prior distributions for the residual between-study variance is presented. Methods Two recently proposed and, under the assumptions of the random effects model, exact methods for constructing confidence intervals for the between-study variance in random effects meta-analyses are extended to the meta-regression setting. The use of Generalised Cochran heterogeneity statistics is extended to the meta-regression setting and a Newton-Raphson procedure is developed to implement the Q profile method for meta-analysis and meta-regression. WinBUGS is used to implement informative priors for the residual between-study variance in the context of Bayesian meta-regressions. Results Results are obtained for two contrasting examples, where the first example involves a binary covariate and the second involves a continuous covariate. Intervals for the residual between-study variance are wide for both examples. Conclusions Statistical methods, and R computer software, are available to compute exact confidence intervals for the residual between-study variance under the random effects model for meta-regression. These frequentist methods are almost as easily implemented as their established counterparts for meta-analysis. Bayesian meta-regressions are also easily performed by analysts who are comfortable using WinBUGS. Estimates of the residual between-study variance in random effects meta-regressions should be routinely reported and accompanied by some measure of their uncertainty. Confidence and/or credible intervals are well-suited to this purpose. PMID:25196829
Santolaria, P; Vicente-Fiel, S; Palacín, I; Fantova, E; Blasco, M E; Silvestre, M A; Yániz, J L
2015-12-01
This study was designed to evaluate the relevance of several sperm quality parameters and sperm population structure on the reproductive performance after cervical artificial insemination (AI) in sheep. One hundred and thirty-nine ejaculates from 56 adult rams were collected using an artificial vagina, processed for sperm quality assessment and used to perform 1319 AI. Analyses of sperm motility by computer-assisted sperm analysis (CASA), sperm nuclear morphometry by computer-assisted sperm morphometry analysis (CASMA), membrane integrity by acridine orange-propidium iodide combination and sperm DNA fragmentation using the sperm chromatin dispersion test (SCD) were performed. Clustering procedures using the sperm kinematic and morphometric data resulted in the classification of spermatozoa into three kinematic and three morphometric sperm subpopulations. Logistic regression procedures were used, including fertility at AI as the dependent variable (measured by lambing, 0 or 1) and farm, year, month of AI, female parity, female lambing-treatment interval, ram, AI technician and sperm quality parameters (including sperm subpopulations) as independent factors. Sperm quality variables remaining in the logistic regression model were viability and VCL. Fertility increased for each one-unit increase in viability (by a factor of 1.01) and in VCL (by a factor of 1.02). Multiple linear regression analyses were also performed to analyze the factors possibly influencing ejaculate fertility (N=139). The analysis yielded a significant (P<0.05) relationship between sperm viability and ejaculate fertility. The discriminant ability of the different semen variables to predict field fertility was analyzed using receiver operating characteristic (ROC) curve analysis. Sperm viability and VCL showed significant, albeit limited, predictive capacity on field fertility (0.57 and 0.54 Area Under Curve, respectively). The distribution of spermatozoa in the different subpopulations was not related to fertility. Copyright © 2015 Elsevier B.V. All rights reserved.
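The analysis pattern described above (logistic regression of a binary lambing outcome on viability and VCL, followed by ROC analysis) can be sketched as follows; the data, coefficients, and column names are simulated placeholders, not the study's values.

```python
# Sketch of the analysis pattern above: logistic regression of a binary fertility
# outcome (lambing 0/1) on sperm viability and VCL, then ROC analysis.
# All data and effect sizes here are simulated, not the study's values.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 1319
viability = rng.normal(75, 10, n)           # % viable spermatozoa (hypothetical)
vcl = rng.normal(140, 25, n)                # curvilinear velocity, um/s (hypothetical)
logit = -4.0 + 0.01 * viability + 0.02 * vcl
lambing = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = np.column_stack([viability, vcl])
model = LogisticRegression().fit(X, lambing)
odds_ratios = np.exp(model.coef_[0])        # per one-unit increase in each predictor
auc = roc_auc_score(lambing, model.predict_proba(X)[:, 1])
print("odds ratios:", odds_ratios.round(3), "AUC: %.2f" % auc)
```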
graphkernels: R and Python packages for graph comparison
Ghisu, M Elisabetta; Llinares-López, Felipe; Borgwardt, Karsten
2018-01-01
Summary: Measuring the similarity of graphs is a fundamental step in the analysis of graph-structured data, which is omnipresent in computational biology. Graph kernels have been proposed as a powerful and efficient approach to this problem of graph comparison. Here we provide graphkernels, the first R and Python graph kernel libraries including baseline kernels such as label histogram based kernels, classic graph kernels such as random walk based kernels, and the state-of-the-art Weisfeiler-Lehman graph kernel. The core of all graph kernels is implemented in C++ for efficiency. Using the kernel matrices computed by the package, we can easily perform tasks such as classification, regression and clustering on graph-structured samples. Availability and implementation: The R and Python packages including source code are available at https://CRAN.R-project.org/package=graphkernels and https://pypi.python.org/pypi/graphkernels. Contact: mahito@nii.ac.jp or elisabetta.ghisu@bsse.ethz.ch. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:29028902
graphkernels: R and Python packages for graph comparison.
Sugiyama, Mahito; Ghisu, M Elisabetta; Llinares-López, Felipe; Borgwardt, Karsten
2018-02-01
Measuring the similarity of graphs is a fundamental step in the analysis of graph-structured data, which is omnipresent in computational biology. Graph kernels have been proposed as a powerful and efficient approach to this problem of graph comparison. Here we provide graphkernels, the first R and Python graph kernel libraries including baseline kernels such as label histogram based kernels, classic graph kernels such as random walk based kernels, and the state-of-the-art Weisfeiler-Lehman graph kernel. The core of all graph kernels is implemented in C++ for efficiency. Using the kernel matrices computed by the package, we can easily perform tasks such as classification, regression and clustering on graph-structured samples. The R and Python packages including source code are available at https://CRAN.R-project.org/package=graphkernels and https://pypi.python.org/pypi/graphkernels. mahito@nii.ac.jp or elisabetta.ghisu@bsse.ethz.ch. Supplementary data are available online at Bioinformatics. © The Author(s) 2017. Published by Oxford University Press.
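To illustrate the downstream use of a kernel matrix such as graphkernels produces, the sketch below feeds a precomputed kernel into scikit-learn's SVM; the kernel matrix and labels are synthetic stand-ins, and the graphkernels call itself is omitted rather than guessed at.

```python
# Downstream use of a graph kernel matrix: once graphkernels has produced an
# n x n kernel matrix K, classification can be done with a precomputed-kernel SVM.
# K below is synthetic stand-in data; the graphkernels call itself is omitted here.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
F = rng.normal(size=(60, 5))                 # stand-in graph feature embedding
K = F @ F.T                                  # symmetric positive semidefinite kernel matrix
y = (F[:, 0] > 0).astype(int)                # stand-in graph labels

idx_train, idx_test = train_test_split(np.arange(60), test_size=0.25, random_state=0)
clf = SVC(kernel="precomputed").fit(K[np.ix_(idx_train, idx_train)], y[idx_train])
accuracy = clf.score(K[np.ix_(idx_test, idx_train)], y[idx_test])
print("test accuracy: %.2f" % accuracy)
```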
Shan, Zhi; Deng, Guoying; Li, Jipeng; Li, Yangyang; Zhang, Yongxing; Zhao, Qinghua
2013-01-01
This study investigates neck/shoulder pain (NSP) and low back pain (LBP) among current high school students in Shanghai and explores the relationship between these pains and their possible influences, including digital products, physical activity, and psychological status. An anonymous self-assessment was administered to 3,600 students across 30 high schools in Shanghai. This questionnaire examined the prevalence of NSP and LBP and the level of physical activity as well as the use of mobile phones, personal computers (PC) and tablet computers (Tablet). The CES-D (Center for Epidemiological Studies Depression) scale was also included in the survey. The survey data were analyzed using the chi-square test, univariate logistic analyses and a multivariate logistic regression model. Three thousand sixteen valid questionnaires were received, including 1,460 (48.41%) from male respondents and 1,556 (51.59%) from female respondents. The high school students in this study showed NSP and LBP rates of 40.8% and 33.1%, respectively, and the prevalence of both was influenced by the students' grade, use of digital products, and mental status; these factors affected the rates of NSP and LBP to varying degrees. The multivariate logistic regression analysis revealed that gender, grade, soreness after exercise, PC use habits, tablet use, sitting time after school and academic stress entered the final model of NSP, while the final model of LBP consisted of gender, grade, soreness after exercise, PC use habits, mobile phone use, sitting time after school, academic stress and CES-D score. High school students in Shanghai showed high prevalences of NSP and LBP that were closely related to multiple factors. Appropriate interventions should be implemented to reduce the occurrences of NSP and LBP.
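A multivariate logistic model reporting adjusted odds ratios with 95% confidence intervals, of the kind used for the NSP and LBP models above, can be sketched with statsmodels; the variables and effect sizes below are simulated placeholders.

```python
# Sketch of a multivariate logistic regression producing adjusted odds ratios
# with 95% confidence intervals, as used for the NSP/LBP models above.
# All variables and effect sizes are simulated placeholders.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 3016
df = pd.DataFrame({
    "tablet_use": rng.binomial(1, 0.4, n),
    "sitting_hours": rng.normal(3, 1, n),
    "academic_stress": rng.integers(1, 6, n),
})
logit = -2.0 + 0.5 * df["tablet_use"] + 0.2 * df["sitting_hours"] + 0.3 * df["academic_stress"]
df["nsp"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = sm.add_constant(df[["tablet_use", "sitting_hours", "academic_stress"]])
fit = sm.Logit(df["nsp"], X).fit(disp=False)
ci = np.exp(fit.conf_int())
summary = pd.DataFrame({"AOR": np.exp(fit.params), "2.5%": ci[0], "97.5%": ci[1]})
print(summary.round(2))
```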
Kernel Regression Estimation of Fiber Orientation Mixtures in Diffusion MRI
Cabeen, Ryan P.; Bastin, Mark E.; Laidlaw, David H.
2016-01-01
We present and evaluate a method for kernel regression estimation of fiber orientations and associated volume fractions for diffusion MR tractography and population-based atlas construction in clinical imaging studies of brain white matter. This is a model-based image processing technique in which representative fiber models are estimated from collections of component fiber models in model-valued image data. This extends prior work in nonparametric image processing and multi-compartment processing to provide computational tools for image interpolation, smoothing, and fusion with fiber orientation mixtures. In contrast to related work on multi-compartment processing, this approach is based on directional measures of divergence and includes data-adaptive extensions for model selection and bilateral filtering. This is useful for reconstructing complex anatomical features in clinical datasets analyzed with the ball-and-sticks model, and our framework’s data-adaptive extensions are potentially useful for general multi-compartment image processing. We experimentally evaluate our approach with both synthetic data from computational phantoms and in vivo clinical data from human subjects. With synthetic data experiments, we evaluate performance based on errors in fiber orientation, volume fraction, compartment count, and tractography-based connectivity. With in vivo data experiments, we first show improved scan-rescan reproducibility and reliability of quantitative fiber bundle metrics, including mean length, volume, streamline count, and mean volume fraction. We then demonstrate the creation of a multi-fiber tractography atlas from a population of 80 human subjects. In comparison to single tensor atlasing, our multi-fiber atlas shows more complete features of known fiber bundles and includes reconstructions of the lateral projections of the corpus callosum and complex fronto-parietal connections of the superior longitudinal fasciculus I, II, and III. PMID:26691524
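The paper applies kernel regression to fiber-orientation mixture models using directional divergences; the sketch below illustrates only the underlying Nadaraya-Watson estimator on scalar data, as a reference point for the model-valued version described above.

```python
# The underlying estimator: Nadaraya-Watson kernel regression. The paper extends
# this idea to fiber-orientation mixtures with directional divergences; this
# sketch illustrates only the scalar version of the estimator.
import numpy as np

def kernel_regression(x_query, x_data, y_data, bandwidth=0.5):
    """Gaussian-kernel weighted average of y_data evaluated at each query point."""
    d = x_query[:, None] - x_data[None, :]
    w = np.exp(-0.5 * (d / bandwidth) ** 2)
    return (w @ y_data) / w.sum(axis=1)

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = np.sin(x) + rng.normal(0, 0.3, x.size)
xq = np.linspace(0, 10, 50)
yq = kernel_regression(xq, x, y)
print(np.round(yq[:5], 3))
```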
Schuster, Alexander K; Tesarz, Jonas; Rezapour, Jasmin; Beutel, Manfred E; Bertram, Bernd; Pfeiffer, Norbert
2018-01-01
Visual impairment (VI) is associated with a variety of comorbidities including physical and mental health in industrial countries. Our aim is to examine associations between self-reported impairment and depressive symptoms in the German population. The point prevalence of self-reported VI in Germany was computed using data from the German Health Interview and Examination Survey for adults from 2008 to 2011 ( N = 7.783, 50.5% female, age range 18-79 years). VI was surveyed by two questions, one for seeing faces at a distance of 4 m and one for reading newspapers. Depressive symptoms were evaluated with the Patient Health Questionnaire-9 questionnaire and 2-week prevalence was computed with weighted data. Depressive symptoms were defined by a value of ≥10. Logistic regression analysis was performed to analyze an association between self-reported VI and depressive symptoms. Multivariable analysis including adjustment for age, gender, socioeconomic status, and chronic diseases were carried out with weighted data. The 2-week prevalence of depressive symptoms was 20.8% (95% CI: 16.6-25.7%) for some difficulties in distance vision and 14.4% (95% CI: 7.5-25.9%) for severe difficulties in distance vision, while 17.0% (95% CI: 13.3-21.4%), respectively, 16.7% (95% CI: 10.7-25.1%) for near vision. Analysis revealed that depressive symptoms were associated with self-reported VI for reading, respectively, with low VI for distance vision. Multivariable regression analysis including potential confounders confirmed these findings. Depressive symptoms are a frequent finding in subjects with difficulties in distance and near vision with a prevalence of up to 24%. Depressive comorbidity should therefore be evaluated in subjects reporting VI.
Caballero, Daniel; Antequera, Teresa; Caro, Andrés; Ávila, María Del Mar; G Rodríguez, Pablo; Perez-Palacios, Trinidad
2017-07-01
Magnetic resonance imaging (MRI) combined with computer vision techniques has been proposed as an alternative or complementary technique to determine the quality parameters of food in a non-destructive way. The aim of this work was to analyze the sensory attributes of dry-cured loins using this technique. For that, different MRI acquisition sequences (spin echo, gradient echo and turbo 3D), algorithms for MRI analysis (GLCM, NGLDM, GLRLM and GLCM-NGLDM-GLRLM) and predictive data mining techniques (multiple linear regression and isotonic regression) were tested. The correlation coefficient (R) and mean absolute error (MAE) were used to validate the prediction results. The combination of spin echo, GLCM and isotonic regression produced the most accurate results. In addition, the MRI data from dry-cured loins seems to be more suitable than the data from fresh loins. The application of predictive data mining techniques on computational texture features from the MRI data of loins enables the determination of the sensory traits of dry-cured loins in a non-destructive way. © 2016 Society of Chemical Industry.
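The validation step above (fit multiple linear regression and isotonic regression, then report R and MAE) can be sketched as follows; the texture features and sensory scores are simulated, not the study's MRI data.

```python
# Sketch of the validation step above: fit multiple linear regression and isotonic
# regression to (simulated) texture features / sensory scores and report R and MAE.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(3)
texture = rng.normal(size=(139, 4))                      # hypothetical GLCM features
sensory = texture @ [0.8, -0.4, 0.3, 0.1] + rng.normal(0, 0.3, 139)

mlr = LinearRegression().fit(texture, sensory)
pred_mlr = mlr.predict(texture)

# Isotonic regression is univariate, so here it is applied to a single predictor.
iso = IsotonicRegression(out_of_bounds="clip").fit(texture[:, 0], sensory)
pred_iso = iso.predict(texture[:, 0])

for name, pred in [("MLR", pred_mlr), ("Isotonic", pred_iso)]:
    r = np.corrcoef(sensory, pred)[0, 1]
    print(f"{name}: R={r:.3f}, MAE={mean_absolute_error(sensory, pred):.3f}")
```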
Engagement, Persistence, and Gender in Computer Science: Results of a Smartphone ESM Study
Milesi, Carolina; Perez-Felkner, Lara; Brown, Kevin; Schneider, Barbara
2017-01-01
While the underrepresentation of women in the fast-growing STEM field of computer science (CS) has been much studied, no consensus exists on the key factors influencing this widening gender gap. Possible suspects include gender differences in aptitude, interest, and academic environment. Our study contributes to this literature by applying student engagement research to study the experiences of college students studying CS, to assess the degree to which differences in men and women's engagement may help account for gender inequity in the field. Specifically, we use the Experience Sampling Method (ESM) to evaluate in real-time the engagement of college students during varied activities and environments. Over the course of a full week in fall semester and a full week in spring semester, 165 students majoring in CS at two Research I universities were “beeped” several times a day via a smartphone app prompting them to fill out a short questionnaire including open-ended and scaled items. These responses were paired with administrative and over 2 years of transcript data provided by their institutions. We used mean comparisons and logistic regression analysis to compare enrollment and persistence patterns among CS men and women. Results suggest that despite the obstacles associated with women's underrepresentation in computer science, women are more likely to continue taking computer science courses when they felt challenged and skilled in their initial computer science classes. We discuss implications for further research. PMID:28487664
Liang, Yuzhen; Xiong, Ruichang; Sandler, Stanley I; Di Toro, Dominic M
2017-09-05
Polyparameter Linear Free Energy Relationships (pp-LFERs), also called Linear Solvation Energy Relationships (LSERs), are used to predict many environmentally significant properties of chemicals. A method is presented for computing the necessary chemical parameters, the Abraham parameters (AP), used by many pp-LFERs. It employs quantum chemical calculations and uses only the chemical's molecular structure. The method computes the Abraham E parameter using the density functional theory computed molecular polarizability and the Clausius-Mossotti equation relating the index of refraction to the molecular polarizability, estimates the Abraham V as the COSMO calculated molecular volume, and computes the remaining AP S, A, and B jointly with a multiple linear regression using sixty-five solvent-water partition coefficients computed using the quantum mechanical COSMO-SAC solvation model. These solute parameters, referred to as Quantum Chemically estimated Abraham Parameters (QCAP), are further adjusted by fitting to experimentally based APs using QCAP parameters as the independent variables so that they are compatible with existing Abraham pp-LFERs. QCAP and adjusted QCAP for 1827 neutral chemicals are included. For 24 solvent-water systems including octanol-water, predicted log solvent-water partition coefficients using adjusted QCAP have the smallest root-mean-square errors (RMSEs, 0.314-0.602) compared to predictions made using APs estimated using the molecular fragment based method ABSOLV (0.45-0.716). For munition and munition-like compounds, adjusted QCAP has much lower RMSE (0.860) than does ABSOLV (4.45), which essentially fails for these compounds.
Sando, Roy; Sando, Steven K.; McCarthy, Peter M.; Dutton, DeAnn M.
2016-04-05
The U.S. Geological Survey (USGS), in cooperation with the Montana Department of Natural Resources and Conservation, completed a study to update methods for estimating peak-flow frequencies at ungaged sites in Montana based on peak-flow data at streamflow-gaging stations through water year 2011. The methods allow estimation of peak-flow frequencies (that is, peak-flow magnitudes, in cubic feet per second, associated with annual exceedance probabilities of 66.7, 50, 42.9, 20, 10, 4, 2, 1, 0.5, and 0.2 percent) at ungaged sites. The annual exceedance probabilities correspond to 1.5-, 2-, 2.33-, 5-, 10-, 25-, 50-, 100-, 200-, and 500-year recurrence intervals, respectively. Regional regression analysis is a primary focus of Chapter F of this Scientific Investigations Report, and regression equations for estimating peak-flow frequencies at ungaged sites in eight hydrologic regions in Montana are presented. The regression equations are based on analysis of peak-flow frequencies and basin characteristics at 537 streamflow-gaging stations in or near Montana and were developed using generalized least squares regression or weighted least squares regression. All of the data used in calculating basin characteristics that were included as explanatory variables in the regression equations were developed for and are available through the USGS StreamStats application (http://water.usgs.gov/osw/streamstats/) for Montana. StreamStats is a Web-based geographic information system application that was created by the USGS to provide users with access to an assortment of analytical tools that are useful for water-resource planning and management. The primary purpose of the Montana StreamStats application is to provide estimates of basin characteristics and streamflow characteristics for user-selected ungaged sites on Montana streams. The regional regression equations presented in this report chapter can be conveniently solved using the Montana StreamStats application. Selected results from this study were compared with results of previous studies. For most hydrologic regions, the regression equations reported for this study had lower mean standard errors of prediction (in percent) than the previously reported regression equations for Montana. The equations presented for this study are considered to be an improvement on the previously reported equations primarily because this study (1) included 13 more years of peak-flow data; (2) included 35 more streamflow-gaging stations than previous studies; (3) used a detailed geographic information system (GIS)-based definition of the regulation status of streamflow-gaging stations, which allowed better determination of the unregulated peak-flow records that are appropriate for use in the regional regression analysis; (4) included advancements in GIS and remote-sensing technologies, which allowed more convenient calculation of basin characteristics and investigation of many more candidate basin characteristics; and (5) included advancements in computational and analytical methods, which allowed more thorough and consistent data analysis. This report chapter also presents other methods for estimating peak-flow frequencies at ungaged sites. Two methods for estimating peak-flow frequencies at ungaged sites located on the same streams as streamflow-gaging stations are described.
Additionally, envelope curves relating maximum recorded annual peak flows to contributing drainage area for each of the eight hydrologic regions in Montana are presented and compared to a national envelope curve. In addition to providing general information on characteristics of large peak flows, the regional envelope curves can be used to assess the reasonableness of peak-flow frequency estimates determined using the regression equations.
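Regional peak-flow regressions of this kind are commonly fit on log-transformed discharge with basin characteristics as explanatory variables. The sketch below shows a generic weighted-least-squares version with statsmodels; the data, weights, and coefficients are made up, and the USGS study used its own generalized/weighted least squares procedures.

```python
# Generic weighted-least-squares form of a regional peak-flow regression,
# log10(Q100) = b0 + b1*log10(area) + b2*precip. All data and weights are made up;
# the USGS study used generalized/weighted least squares with its own procedures.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 60
drainage_area = rng.lognormal(4, 1, n)                   # square miles (hypothetical)
mean_precip = rng.normal(15, 3, n)                       # inches (hypothetical)
log_q100 = 1.5 + 0.8 * np.log10(drainage_area) + 0.04 * mean_precip + rng.normal(0, 0.15, n)

X = sm.add_constant(np.column_stack([np.log10(drainage_area), mean_precip]))
weights = rng.uniform(10, 40, n)                         # e.g., years of record at each gage
fit = sm.WLS(log_q100, X, weights=weights).fit()
print(fit.params.round(3), "SE of regression: %.3f" % np.sqrt(fit.scale))
```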
Daily values flow comparison and estimates using program HYDCOMP, version 1.0
Sanders, Curtis L.
2002-01-01
A method used by the U.S. Geological Survey for quality control in computing daily value flow records is to compare hydrographs of computed flows at a station under review to hydrographs of computed flows at a selected index station. The hydrographs are placed on top of each other (as hydrograph overlays) on a light table, compared, and missing daily flow data estimated. This method, however, is subjective and can produce inconsistent results, because hydrographers can differ when calculating acceptable limits of deviation between observed and estimated flows. Selection of appropriate index stations also is judgemental, giving no consideration to the mathematical correlation between the review station and the index station(s). To address the limitation of the hydrograph overlay method, a set of software programs, written in the SAS macro language, was developed and designated Program HYDCOMP. The program automatically selects statistically comparable index stations by correlation and regression, and performs hydrographic comparisons and estimates of missing data by regressing daily mean flows at the review station against -8 to +8 lagged flows at one or two index stations and day-of-week. Another advantage that HYDCOMP has over the graphical method is that estimated flows, the criteria for determining the quality of the data, and the selection of index stations are determined statistically, and are reproducible from one user to another. HYDCOMP will load the most-correlated index stations into another file containing the "best index stations," but will not overwrite stations already in the file. A knowledgeable user should delete unsuitable index stations from this file based on standard error of estimate, hydrologic similarity of candidate index stations to the review station, and knowledge of the individual station characteristics. Also, the user can add index stations not selected by HYDCOMP, if desired. Once the file of best-index stations is created, a user may do hydrographic comparison and data estimates by entering the number of the review station, selecting an index station, and specifying the periods to be used for regression and plotting. For example, the user can restrict the regression to ice-free periods of the year to exclude flows estimated during iced conditions. However, the regression could still be used to estimate flow during iced conditions. HYDCOMP produces the standard error of estimate as a measure of the central scatter of the regression and R-square (coefficient of determination) for evaluating the accuracy of the regression. Output from HYDCOMP includes plots of percent residuals against (1) time within the regression and plot periods, (2) month and day of the year for evaluating seasonal bias in the regression, and (3) the magnitude of flow. For hydrographic comparisons, it plots 2-month segments of hydrographs over the selected plot period showing the observed flows, the regressed flows, the 95 percent confidence limit flows, flow measurements, and regression limits. If the observed flows at the review station remain outside the 95 percent confidence limits for a prolonged period, there may be some error in the flows at the review station or at the index station(s). In addition, daily minimum and maximum temperatures and daily rainfall are shown on the hydrographs, if available, to help indicate whether an apparent change in flow may result from rainfall or from changes in backwater from melting ice or freezing water.
HYDCOMP statistically smooths estimated flows from non-missing flows at the edges of the gaps in data into regressed flows at the center of the gaps using the Kalman smoothing algorithm. Missing flows are automatically estimated by HYDCOMP, but the user can also specify that periods of erroneous but nonmissing flows be estimated by the program.
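The core gap-filling idea, regressing review-station daily flows on lagged index-station flows and predicting the missing days, can be sketched as follows; this is not HYDCOMP's SAS implementation (which uses -8 to +8 lags, day-of-week terms, and Kalman smoothing), and all flow series are simulated.

```python
# Conceptual sketch of the gap-filling regression: regress review-station daily flows
# on -1..+1 day lags of an index station and predict missing days. HYDCOMP itself uses
# wider lags, day-of-week, and Kalman smoothing; this shows only the core idea.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
days = pd.date_range("2001-01-01", periods=365)
index_flow = pd.Series(50 + 20 * np.sin(np.arange(365) / 20) + rng.normal(0, 2, 365), index=days)
review_flow = 0.8 * index_flow.shift(1).bfill() + rng.normal(0, 2, 365)
review_flow.iloc[100:110] = np.nan                       # a gap to be estimated

lag_cols = {f"lag_{lag}": index_flow.shift(lag) for lag in (-1, 0, 1)}
data = pd.DataFrame({"review": review_flow, **lag_cols}).dropna(subset=list(lag_cols))
known = data.dropna(subset=["review"])

model = LinearRegression().fit(known[list(lag_cols)], known["review"])
gap = data[data["review"].isna()]
estimates = model.predict(gap[list(lag_cols)])
print(np.round(estimates[:5], 1))
```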
Cross-validation pitfalls when selecting and assessing regression and classification models.
Krstajic, Damjan; Buturovic, Ljubomir J; Leahy, David E; Thomas, Simon
2014-03-29
We address the problem of selecting and assessing classification and regression models using cross-validation. Current state-of-the-art methods can yield models with high variance, rendering them unsuitable for a number of practical applications including QSAR. In this paper we describe and evaluate best practices which improve reliability and increase confidence in selected models. A key operational component of the proposed methods is cloud computing which enables routine use of previously infeasible approaches. We describe in detail an algorithm for repeated grid-search V-fold cross-validation for parameter tuning in classification and regression, and we define a repeated nested cross-validation algorithm for model assessment. As regards variable selection and parameter tuning we define two algorithms (repeated grid-search cross-validation and double cross-validation), and provide arguments for using the repeated grid-search in the general case. We show results of our algorithms on seven QSAR datasets. The variation of the prediction performance, which is the result of choosing different splits of the dataset in V-fold cross-validation, needs to be taken into account when selecting and assessing classification and regression models. We demonstrate the importance of repeating cross-validation when selecting an optimal model, as well as the importance of repeating nested cross-validation when assessing a prediction error.
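A compact scikit-learn sketch of the two ideas above, repeated V-fold grid search for tuning and nested cross-validation for assessment, is shown below; it is not the authors' exact algorithms or cloud implementation, and the estimator, grid, and dataset are illustrative choices.

```python
# Compact scikit-learn sketch: repeated V-fold grid search for parameter tuning,
# and nested cross-validation for assessing the tuned model.
from sklearn.datasets import make_regression
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV, RepeatedKFold, KFold, cross_val_score

X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)
param_grid = {"C": [0.1, 1, 10], "epsilon": [0.01, 0.1, 1.0]}

# Repeated grid-search V-fold cross-validation for tuning (5 folds, repeated 10 times).
inner_cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)
search = GridSearchCV(SVR(), param_grid, cv=inner_cv, scoring="neg_mean_squared_error")

# Nested cross-validation: the whole tuning procedure sits inside an outer loop,
# so the outer score estimates the prediction error of the model-selection process.
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)
nested_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="neg_mean_squared_error")
print("nested CV MSE: %.1f" % -nested_scores.mean())
```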
Chan, Siew Foong; Deeks, Jonathan J; Macaskill, Petra; Irwig, Les
2008-01-01
To compare three predictive models based on logistic regression to estimate adjusted likelihood ratios allowing for interdependency between diagnostic variables (tests). This study was a review of the theoretical basis, assumptions, and limitations of published models, and a statistical extension of methods and application to a case study of the diagnosis of obstructive airways disease based on history and clinical examination. Albert's method includes an offset term to estimate an adjusted likelihood ratio for combinations of tests. The Spiegelhalter and Knill-Jones method uses the unadjusted likelihood ratio for each test as a predictor and computes shrinkage factors to allow for interdependence. Knottnerus' method differs from the other methods because it requires sequencing of tests, which limits its application to situations where there are few tests and substantial data. Although parameter estimates differed between the models, predicted "posttest" probabilities were generally similar. Construction of predictive models using logistic regression is preferred to the independence Bayes' approach when it is important to adjust for dependency of test errors. Methods to estimate adjusted likelihood ratios from predictive models should be considered in preference to a standard logistic regression model to facilitate ease of interpretation and application. Albert's method provides the most straightforward approach.
Static and moving solid/gas interface modeling in a hybrid rocket engine
NASA Astrophysics Data System (ADS)
Mangeot, Alexandre; William-Louis, Mame; Gillard, Philippe
2018-07-01
A numerical model was developed with CFD-ACE software to study the working conditions of an oxygen-nitrogen/polyethylene hybrid rocket combustor. As a first approach, a simplified numerical model is presented. It includes a compressible transient gas phase in which a two-step combustion mechanism is implemented, coupled to a radiative model. The solid phase from the fuel grain is a semi-opaque material whose degradation process is modeled by an Arrhenius-type law. Two versions of the model were tested. The first considers the solid/gas interface with a static grid, while the second uses grid deformation during the computation to follow the asymmetrical regression. The numerical results are obtained with two different regression kinetics, originating from thermogravimetric analysis and from test bench results. In each case, the fuel surface temperature is retrieved to within 5% error. However, good results are only found using the kinetics from the test bench. The regression rate is found to within 0.03 mm s⁻¹, and the average combustor pressure and its variation over time are of the same magnitude as the measurements conducted on the test bench. The simulation that uses grid deformation to follow the regression shows good stability over a 10 s simulated time.
Prediction of Mass Spectral Response Factors from Predicted Chemometric Data for Druglike Molecules
NASA Astrophysics Data System (ADS)
Cramer, Christopher J.; Johnson, Joshua L.; Kamel, Amin M.
2017-02-01
A method is developed for the prediction of mass spectral ion counts of drug-like molecules using in silico calculated chemometric data. Various chemometric data, including polar and molecular surface areas, aqueous solvation free energies, and gas-phase and aqueous proton affinities were computed, and a statistically significant relationship between measured mass spectral ion counts and the combination of aqueous proton affinity and total molecular surface area was identified. In particular, through multilinear regression of ion counts on predicted chemometric data, we find that log₁₀(MS ion counts) = -4.824 + c₁·PA + c₂·SA, where PA is the aqueous proton affinity of the molecule computed at the SMD(aq)/M06-L/MIDI!//M06-L/MIDI! level of electronic structure theory, SA is the total surface area of the molecule in its conjugate base form, and c₁ and c₂ have values of -3.912 × 10⁻² mol kcal⁻¹ and 3.682 × 10⁻³ Å⁻². On a 66-molecule training set, this regression exhibits a multiple R value of 0.791 with p values for the intercept, c₁, and c₂ of 1.4 × 10⁻³, 4.3 × 10⁻¹⁰, and 2.5 × 10⁻⁶, respectively. Application of this regression to an 11-molecule test set provides a good correlation of prediction with experiment (R = 0.905), albeit with a systematic underestimation of about 0.2 log units. This method may prove useful for semiquantitative analysis of drug metabolites for which MS response factors or authentic standards are not readily available.
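As a worked application of the reported regression, the PA and SA values below are invented for illustration only; the coefficients are those quoted in the abstract.

```python
# Worked application of the reported regression with hypothetical inputs: the PA and
# SA values below are invented for illustration, not taken from the paper's datasets.
c1, c2 = -3.912e-2, 3.682e-3           # reported coefficients (kcal^-1 mol, Angstrom^-2)
PA = -250.0                            # hypothetical aqueous proton affinity (kcal/mol)
SA = 420.0                             # hypothetical total surface area (Angstrom^2)
log_counts = -4.824 + c1 * PA + c2 * SA
print("predicted MS ion counts: %.2e" % 10**log_counts)
```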
Computation of Effect Size for Moderating Effects of Categorical Variables in Multiple Regression
ERIC Educational Resources Information Center
Aguinis, Herman; Pierce, Charles A.
2006-01-01
The computation and reporting of effect size estimates is becoming the norm in many journals in psychology and related disciplines. Despite the increased importance of effect sizes, researchers may not report them or may report inaccurate values because of a lack of appropriate computational tools. For instance, Pierce, Block, and Aguinis (2004)…
ERIC Educational Resources Information Center
Punch, Raymond J.
2012-01-01
The purpose of the quantitative regression study was to explore and identify relationships between college instructors' attitudes toward the use, and perceptions of the value, of computer-based simulation programs. A relationship has been reported between attitudes toward use and perceptions of the value of…
Cervical Vertebral Body's Volume as a New Parameter for Predicting the Skeletal Maturation Stages.
Choi, Youn-Kyung; Kim, Jinmi; Yamaguchi, Tetsutaro; Maki, Koutaro; Ko, Ching-Chang; Kim, Yong-Il
2016-01-01
This study aimed to determine the correlation between the volumetric parameters derived from the images of the second, third, and fourth cervical vertebrae by using cone beam computed tomography with skeletal maturation stages and to propose a new formula for predicting skeletal maturation by using regression analysis. We obtained the estimation of skeletal maturation levels from hand-wrist radiographs and volume parameters derived from the second, third, and fourth cervical vertebrae bodies from 102 Japanese patients (54 women and 48 men, 5-18 years of age). We performed Pearson's correlation coefficient analysis and simple regression analysis. All volume parameters derived from the second, third, and fourth cervical vertebrae exhibited statistically significant correlations (P < 0.05). The simple regression model with the greatest R-square indicated the fourth-cervical-vertebra volume as an independent variable with a variance inflation factor less than ten. The explanatory power was 81.76%. Volumetric parameters of cervical vertebrae using cone beam computed tomography are useful in regression models. The derived regression model has the potential for clinical application as it enables a simple and quantitative analysis to evaluate skeletal maturation level.
Cervical Vertebral Body's Volume as a New Parameter for Predicting the Skeletal Maturation Stages
Choi, Youn-Kyung; Kim, Jinmi; Maki, Koutaro; Ko, Ching-Chang
2016-01-01
This study aimed to determine the correlation between the volumetric parameters derived from the images of the second, third, and fourth cervical vertebrae by using cone beam computed tomography with skeletal maturation stages and to propose a new formula for predicting skeletal maturation by using regression analysis. We obtained the estimation of skeletal maturation levels from hand-wrist radiographs and volume parameters derived from the second, third, and fourth cervical vertebrae bodies from 102 Japanese patients (54 women and 48 men, 5–18 years of age). We performed Pearson's correlation coefficient analysis and simple regression analysis. All volume parameters derived from the second, third, and fourth cervical vertebrae exhibited statistically significant correlations (P < 0.05). The simple regression model with the greatest R-square indicated the fourth-cervical-vertebra volume as an independent variable with a variance inflation factor less than ten. The explanatory power was 81.76%. Volumetric parameters of cervical vertebrae using cone beam computed tomography are useful in regression models. The derived regression model has the potential for clinical application as it enables a simple and quantitative analysis to evaluate skeletal maturation level. PMID:27340668
Computer-aided Classification of Mammographic Masses Using Visually Sensitive Image Features
Wang, Yunzhi; Aghaei, Faranak; Zarafshani, Ali; Qiu, Yuchen; Qian, Wei; Zheng, Bin
2017-01-01
Purpose To develop a new computer-aided diagnosis (CAD) scheme that computes visually sensitive image features routinely used by radiologists to develop a machine learning classifier and distinguish between the malignant and benign breast masses detected from digital mammograms. Methods An image dataset including 301 breast masses was retrospectively selected. From each segmented mass region, we computed image features that mimic five categories of visually sensitive features routinely used by radiologists in reading mammograms. We then selected five optimal features in the five feature categories and applied logistic regression models for classification. A new CAD interface was also designed to show lesion segmentation, computed feature values and classification score. Results Areas under the ROC curve (AUC) were 0.786±0.026 and 0.758±0.027 when classifying mass regions depicted on the two view images, respectively. By fusing classification scores computed from the two regions, the AUC increased to 0.806±0.025. Conclusion This study demonstrated a new approach to developing a CAD scheme based on five visually sensitive image features. Combined with a “visual aid” interface, CAD results may be much more easily explainable to observers and may increase their confidence in considering CAD-generated classification results, compared with other conventional CAD approaches that involve many complicated and visually insensitive texture features. PMID:27911353
Using modified fruit fly optimisation algorithm to perform the function test and case studies
NASA Astrophysics Data System (ADS)
Pan, Wen-Tsao
2013-06-01
Evolutionary computation is a computational paradigm established by simulating natural evolutionary processes based on Darwinian theory, and it is a common research method. The main contribution of this paper was to strengthen the fruit fly optimization algorithm's (FOA) search for the optimised solution, in order to avoid becoming trapped in local extrema. Evolutionary computation has grown to include the concepts of animal foraging behaviour and group behaviour. This study discussed three common evolutionary computation methods and compared them with the modified fruit fly optimization algorithm (MFOA). It further investigated the algorithms' ability to compute the extreme values of three mathematical functions, as well as the algorithm execution speed and the forecast ability of the forecasting model built using the optimised general regression neural network (GRNN) parameters. The findings indicated that there was no obvious difference between particle swarm optimization and the MFOA with regard to the ability to compute extreme values; however, they were both better than the artificial fish swarm algorithm and FOA. In addition, the MFOA performed better than particle swarm optimization with regard to algorithm execution speed, and the forecast ability of the forecasting model built using the MFOA's GRNN parameters was better than that of the other three forecasting models.
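For orientation, a minimal generic fruit fly optimization loop is sketched below for minimizing a simple test function; it shows only the basic FOA (random search around a swarm location, smell concentration taken as the reciprocal of distance), not the paper's MFOA modifications or its coupling with GRNN parameter tuning, and the test function and step size are arbitrary choices.

```python
# Minimal generic fruit fly optimization (FOA) loop minimizing f(s) = (s - 5)^2.
# Basic algorithm only, not the paper's modified MFOA or its GRNN coupling.
import numpy as np

def foa_minimize(f, n_flies=30, n_iter=200, step=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x_axis, y_axis = rng.uniform(0, 1, 2)           # swarm location
    best_s, best_val = None, np.inf
    for _ in range(n_iter):
        # each fly searches randomly around the current swarm location
        xs = x_axis + step * rng.uniform(-1, 1, n_flies)
        ys = y_axis + step * rng.uniform(-1, 1, n_flies)
        dist = np.sqrt(xs**2 + ys**2)
        s = 1.0 / dist                               # smell concentration judgment value
        vals = f(s)
        i = np.argmin(vals)
        if vals[i] < best_val:                       # move the swarm toward the best fly
            best_val, best_s = vals[i], s[i]
            x_axis, y_axis = xs[i], ys[i]
    return best_s, best_val

s_opt, v_opt = foa_minimize(lambda s: (s - 5.0) ** 2)
print("s* = %.4f, f(s*) = %.3e" % (s_opt, v_opt))
```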
Blind source computer device identification from recorded VoIP calls for forensic investigation.
Jahanirad, Mehdi; Anuar, Nor Badrul; Wahab, Ainuddin Wahid Abdul
2017-03-01
The VoIP services provide fertile ground for criminal activity; thus, identifying the transmitting computer devices from a recorded VoIP call may help the forensic investigator to reveal useful information. It also proves the authenticity of the call recording submitted to the court as evidence. This paper extended a previous study on the use of recorded VoIP calls for blind source computer device identification. Although the initial results were promising, the theoretical reasoning for them is yet to be established. The study suggested computing the entropy of mel-frequency cepstrum coefficients (entropy-MFCC) from near-silent segments as an intrinsic feature set that captures the device response function due to the tolerances in the electronic components of individual computer devices. By applying the supervised learning techniques of naïve Bayesian, linear logistic regression, neural networks and support vector machines to the entropy-MFCC features, state-of-the-art identification accuracy of near 99.9% has been achieved on different sets of computer devices for both call recording and microphone recording scenarios. Furthermore, unsupervised learning techniques, including simple k-means, expectation-maximization and density-based spatial clustering of applications with noise (DBSCAN), provided promising results for the call recording dataset by assigning the majority of instances to their correct clusters. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.
Linear regression models for solvent accessibility prediction in proteins.
Wagner, Michael; Adamczak, Rafał; Porollo, Aleksey; Meller, Jarosław
2005-04-01
The relative solvent accessibility (RSA) of an amino acid residue in a protein structure is a real number that represents the solvent exposed surface area of this residue in relative terms. The problem of predicting the RSA from the primary amino acid sequence can therefore be cast as a regression problem. Nevertheless, RSA prediction has so far typically been cast as a classification problem. Consequently, various machine learning techniques have been used within the classification framework to predict whether a given amino acid exceeds some (arbitrary) RSA threshold and would thus be predicted to be "exposed," as opposed to "buried." We have recently developed novel methods for RSA prediction using nonlinear regression techniques which provide accurate estimates of the real-valued RSA and outperform classification-based approaches with respect to commonly used two-class projections. However, while their performance seems to provide a significant improvement over previously published approaches, these Neural Network (NN) based methods are computationally expensive to train and involve several thousand parameters. In this work, we develop alternative regression models for RSA prediction which are computationally much less expensive, involve orders-of-magnitude fewer parameters, and are still competitive in terms of prediction quality. In particular, we investigate several regression models for RSA prediction using linear L1-support vector regression (SVR) approaches as well as standard linear least squares (LS) regression. Using rigorously derived validation sets of protein structures and extensive cross-validation analysis, we compare the performance of the SVR with that of LS regression and NN-based methods. In particular, we show that the flexibility of the SVR (as encoded by metaparameters such as the error insensitivity and the error penalization terms) can be very beneficial to optimize the prediction accuracy for buried residues. We conclude that the simple and computationally much more efficient linear SVR performs comparably to nonlinear models and thus can be used in order to facilitate further attempts to design more accurate RSA prediction methods, with applications to fold recognition and de novo protein structure prediction methods.
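The linear epsilon-insensitive SVR idea described above can be sketched in a few lines; the random features stand in for windowed sequence-derived inputs, and the comparison with least squares is illustrative, not a reproduction of the paper's validation protocol.

```python
# Sketch of linear epsilon-insensitive (L1-loss) SVR for real-valued RSA prediction,
# with random features standing in for sequence-derived inputs.
import numpy as np
from sklearn.svm import LinearSVR
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(6)
X = rng.normal(size=(1000, 40))                  # stand-in for windowed sequence features
true_w = rng.normal(size=40)
rsa = np.clip(X @ true_w * 0.05 + 0.3 + rng.normal(0, 0.1, 1000), 0, 1)

svr = LinearSVR(epsilon=0.1, C=1.0, max_iter=10000).fit(X, rsa)
ls = LinearRegression().fit(X, rsa)
for name, model in [("L1-loss SVR", svr), ("Least squares", ls)]:
    mae = np.mean(np.abs(model.predict(X) - rsa))
    print(f"{name}: MAE = {mae:.3f}")
```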
Dazard, Jean-Eudes; Choe, Michael; LeBlanc, Michael; Rao, J. Sunil
2015-01-01
PRIMsrc is a novel implementation of a non-parametric bump hunting procedure, based on the Patient Rule Induction Method (PRIM), offering a unified treatment of outcome variables, including censored time-to-event (Survival), continuous (Regression) and discrete (Classification) responses. To fit the model, it uses a recursive peeling procedure with specific peeling criteria and stopping rules depending on the response. To validate the model, it provides an objective function based on prediction-error or other specific statistic, as well as two alternative cross-validation techniques, adapted to the task of decision-rule making and estimation in the three types of settings. PRIMsrc comes as an open source R package, including at this point: (i) a main function for fitting a Survival Bump Hunting model with various options allowing cross-validated model selection to control model size (#covariates) and model complexity (#peeling steps) and generation of cross-validated end-point estimates; (ii) parallel computing; (iii) various S3-generic and specific plotting functions for data visualization, diagnostic, prediction, summary and display of results. It is available on CRAN and GitHub. PMID:26798326
Solid harmonic wavelet scattering for predictions of molecule properties
NASA Astrophysics Data System (ADS)
Eickenberg, Michael; Exarchakis, Georgios; Hirn, Matthew; Mallat, Stéphane; Thiry, Louis
2018-06-01
We present a machine learning algorithm for the prediction of molecule properties inspired by ideas from density functional theory (DFT). Using Gaussian-type orbital functions, we create surrogate electronic densities of the molecule from which we compute invariant "solid harmonic scattering coefficients" that account for different types of interactions at different scales. Multilinear regressions of various physical properties of molecules are computed from these invariant coefficients. Numerical experiments show that these regressions have near state-of-the-art performance, even with relatively few training examples. Predictions over small sets of scattering coefficients can reach a DFT precision while being interpretable.
PREDICTORS OF COMPUTER USE IN COMMUNITY-DWELLING ETHNICALLY DIVERSE OLDER ADULTS
Werner, Julie M.; Carlson, Mike; Jordan-Marsh, Maryalice; Clark, Florence
2011-01-01
Objective In this study we analyzed self-reported computer use, demographic variables, psychosocial variables, and health and well-being variables collected from 460 ethnically diverse, community-dwelling elders in order to investigate the relationship computer use has with demographics, well-being and other key psychosocial variables in older adults. Background Although younger elders with more education, those who employ active coping strategies, or those who are low in anxiety levels are thought to use computers at higher rates than others, previous research has produced mixed or inconclusive results regarding ethnic, gender, and psychological factors, or has concentrated on computer-specific psychological factors only (e.g., computer anxiety). Few such studies have employed large sample sizes or have focused on ethnically diverse populations of community-dwelling elders. Method With a large number of overlapping predictors, zero-order analysis alone is poorly equipped to identify variables that are independently associated with computer use. Accordingly, both zero-order and stepwise logistic regression analyses were conducted to determine the correlates of two types of computer use: email and general computer use. Results Results indicate that younger age, greater level of education, non-Hispanic ethnicity, behaviorally active coping style, general physical health, and role-related emotional health each independently predicted computer usage. Conclusion Study findings highlight differences in computer usage, especially in regard to Hispanic ethnicity and specific health and well-being factors. Application Potential applications of this research include future intervention studies, individualized computer-based activity programming, or customizable software and user interface design for older adults responsive to a variety of personal characteristics and capabilities. PMID:22046718
Predictors of computer use in community-dwelling, ethnically diverse older adults.
Werner, Julie M; Carlson, Mike; Jordan-Marsh, Maryalice; Clark, Florence
2011-10-01
In this study, we analyzed self-reported computer use, demographic variables, psychosocial variables, and health and well-being variables collected from 460 ethnically diverse, community-dwelling elders to investigate the relationship computer use has with demographics, well-being, and other key psychosocial variables in older adults. Although younger elders with more education, those who employ active coping strategies, or those who are low in anxiety levels are thought to use computers at higher rates than do others, previous research has produced mixed or inconclusive results regarding ethnic, gender, and psychological factors or has concentrated on computer-specific psychological factors only (e.g., computer anxiety). Few such studies have employed large sample sizes or have focused on ethnically diverse populations of community-dwelling elders. With a large number of overlapping predictors, zero-order analysis alone is poorly equipped to identify variables that are independently associated with computer use. Accordingly, both zero-order and stepwise logistic regression analyses were conducted to determine the correlates of two types of computer use: e-mail and general computer use. Results indicate that younger age, greater level of education, non-Hispanic ethnicity, behaviorally active coping style, general physical health, and role-related emotional health each independently predicted computer usage. Study findings highlight differences in computer usage, especially in regard to Hispanic ethnicity and specific health and well-being factors. Potential applications of this research include future intervention studies, individualized computer-based activity programming, or customizable software and user interface design for older adults responsive to a variety of personal characteristics and capabilities.
Effect of Contact Damage on the Strength of Ceramic Materials.
1982-10-01
The recovered text of this report is fragmentary. A dimensional analysis identifies the variables that are important to erosion, and a multivariate linear regression analysis is used to fit the data to the dimensional analysis; the exponents of Equations 7 and 8 are determined by a multivariable regression analysis of room-temperature data, with regression standard errors and a computed coefficient of determination reported. The remainder of the recovered text consists of fragmented reference citations.
Wrong Signs in Regression Coefficients
NASA Technical Reports Server (NTRS)
McGee, Holly
1999-01-01
When using parametric cost estimation, it is important to note the possibility of the regression coefficients having the wrong sign. A wrong sign is defined as a sign on the regression coefficient opposite to the researcher's intuition and experience. Some possible causes for the wrong sign discussed in this paper are a small range of x's, leverage points, missing variables, multicollinearity, and computational error. Additionally, techniques for determining the cause of the wrong sign are given.
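A small demonstration of one of the causes listed above, multicollinearity, is sketched below with simulated data; the coefficients and noise level are arbitrary.

```python
# Demonstration of multicollinearity as a cause of wrong-sign coefficients:
# two nearly collinear predictors, both with true positive effects.
import numpy as np

rng = np.random.default_rng(7)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(0, 0.05, n)                 # nearly collinear with x1
y = 1.0 * x1 + 1.0 * x2 + rng.normal(0, 1.0, n)  # both true coefficients are +1

X = np.column_stack([np.ones(n), x1, x2])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
print("fitted coefficients (intercept, x1, x2):", beta.round(2))
# With near-collinear predictors the individual estimates are unstable and one can
# easily come out negative, even though their sum stays near the true total of +2.
```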
Mean centering, multicollinearity, and moderators in multiple regression: The reconciliation redux.
Iacobucci, Dawn; Schneider, Matthew J; Popovich, Deidre L; Bakamitsos, Georgios A
2017-02-01
In this article, we attempt to clarify our statements regarding the effects of mean centering. In a multiple regression with predictors A, B, and A × B (where A × B serves as an interaction term), mean centering A and B prior to computing the product term can clarify the regression coefficients (which is good) and the overall model fit R² will remain undisturbed (which is also good).
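The claim that R² is unchanged by mean centering can be checked numerically; the data-generating coefficients below are arbitrary.

```python
# Numerical check of the claim above: mean-centering A and B before forming the
# A x B product changes the coefficients' interpretation but leaves R^2 unchanged.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(8)
n = 500
A, B = rng.normal(2, 1, n), rng.normal(-1, 1, n)
y = 1.0 + 0.5 * A - 0.3 * B + 0.8 * A * B + rng.normal(0, 1, n)

def r2(a, b):
    X = np.column_stack([a, b, a * b])
    return LinearRegression().fit(X, y).score(X, y)

Ac, Bc = A - A.mean(), B - B.mean()
print("R^2 raw:      %.6f" % r2(A, B))
print("R^2 centered: %.6f" % r2(Ac, Bc))   # identical up to floating-point error
```

The two fits span the same column space, since the centered product is a linear combination of the original predictors and the raw product, which is why the fit statistics coincide.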
Media Use and Health Outcomes in Adolescents: Findings from a Nationally Representative Survey
Casiano, Hygiea; Kinley, D. Jolene; Katz, Laurence Y.; Chartier, Mariette J.; Sareen, Jitender
2012-01-01
Objective: Examine the association between quantity of media use and health outcomes in adolescents. Method: Multiple logistic regression analyses were conducted with the Canadian Community Health Survey 1.1 (youth aged 12–19 (n=9137)) to determine the association between hours of use of television/videos, video games, and computers/Internet, and health outcomes including depression, alcohol dependence, binge drinking, suicidal ideation, help-seeking behaviour, risky sexual activity, and obesity. Results: Obesity was associated with frequent television/video use (Adjusted Odds Ratio (AOR) 1.10). Depression and risky sexual behaviour were less likely in frequent video game users (AOR 0.87 and 0.73). Binge drinking was less likely in frequent users of video games (AOR 0.92) and computers/Internet (AOR 0.90). Alcohol dependence was less likely in frequent computer/Internet users (AOR 0.89). Conclusions: Most health outcomes, except for obesity, were not associated with using media in youth. Further research into the appropriate role of media will help harness its full potential. PMID:23133464
Estimating population diversity with CatchAll
Bunge, John; Woodard, Linda; Böhning, Dankmar; Foster, James A.; Connolly, Sean; Allen, Heather K.
2012-01-01
Motivation: The massive data produced by next-generation sequencing require advanced statistical tools. We address estimating the total diversity or species richness in a population. To date, only relatively simple methods have been implemented in available software. There is a need for software employing modern, computationally intensive statistical analyses including error, goodness-of-fit and robustness assessments. Results: We present CatchAll, a fast, easy-to-use, platform-independent program that computes maximum likelihood estimates for finite-mixture models, weighted linear regression-based analyses and coverage-based non-parametric methods, along with outlier diagnostics. Given sample ‘frequency count’ data, CatchAll computes 12 different diversity estimates and applies a model-selection algorithm. CatchAll also derives discounted diversity estimates to adjust for possibly uncertain low-frequency counts. It is accompanied by an Excel-based graphics program. Availability: Free executable downloads for Linux, Windows and Mac OS, with manual and source code, at www.northeastern.edu/catchall. Contact: jab18@cornell.edu PMID:22333246
Foglia, L.; Hill, Mary C.; Mehl, Steffen W.; Burlando, P.
2009-01-01
We evaluate the utility of three interrelated means of using data to calibrate the fully distributed rainfall‐runoff model TOPKAPI as applied to the Maggia Valley drainage area in Switzerland. The use of error‐based weighting of observation and prior information data, local sensitivity analysis, and single‐objective function nonlinear regression provides quantitative evaluation of sensitivity of the 35 model parameters to the data, identification of data types most important to the calibration, and identification of correlations among parameters that contribute to nonuniqueness. Sensitivity analysis required only 71 model runs, and regression required about 50 model runs. The approach presented appears to be ideal for evaluation of models with long run times or as a preliminary step to more computationally demanding methods. The statistics used include composite scaled sensitivities, parameter correlation coefficients, leverage, Cook's D, and DFBETAS. Tests suggest predictive ability of the calibrated model typical of hydrologic models.
Estimation of Flood Discharges at Selected Recurrence Intervals for Streams in New Hampshire
Olson, Scott A.
2009-01-01
This report provides estimates of flood discharges at selected recurrence intervals for streamgages in and adjacent to New Hampshire and equations for estimating flood discharges at recurrence intervals of 2, 5, 10, 25, 50, 100, and 500 years for ungaged, unregulated, rural streams in New Hampshire. The equations were developed using generalized least-squares regression. Flood-frequency and drainage-basin characteristics from 117 streamgages were used in developing the equations. The drainage-basin characteristics used as explanatory variables in the regression equations include drainage area, mean April precipitation, percentage of wetland area, and main channel slope. The average standard errors of prediction for estimating the 2-, 5-, 10-, 25-, 50-, 100-, and 500-year recurrence interval flood discharges with these equations are 30.0, 30.8, 32.0, 34.2, 36.0, 38.1, and 43.4 percent, respectively. Flood discharges at selected recurrence intervals for selected streamgages were computed following the guidelines in Bulletin 17B of the U.S. Interagency Advisory Committee on Water Data. To determine the flood-discharge exceedence probabilities at streamgages in New Hampshire, a new generalized skew coefficient map covering the State was developed. The standard error of the data on the new map is 0.298. To improve estimates of flood discharges at selected recurrence intervals for 20 streamgages with short-term records (10 to 15 years), record extension using the two-station comparison technique was applied. The two-station comparison method uses data from a streamgage with a long-term record to adjust the frequency characteristics at a streamgage with a short-term record. A technique for adjusting a flood-discharge frequency curve computed from a streamgage record with results from the regression equations is described in this report. Also, a technique is described for estimating flood discharge at a selected recurrence interval for an ungaged site upstream or downstream from a streamgage using a drainage-area adjustment. The final regression equations and the flood-discharge frequency data used in this study will be available in StreamStats. StreamStats is a World Wide Web application providing automated regression-equation solutions for user-selected sites on streams.
Sharma, Ashok K; Srivastava, Gopal N; Roy, Ankita; Sharma, Vineet K
2017-01-01
The experimental methods for the prediction of molecular toxicity are tedious and time-consuming tasks. Thus, the computational approaches could be used to develop alternative methods for toxicity prediction. We have developed a tool for the prediction of molecular toxicity along with the aqueous solubility and permeability of any molecule/metabolite. Using a comprehensive and curated set of toxin molecules as a training set, the different chemical and structural based features such as descriptors and fingerprints were exploited for feature selection, optimization and development of machine learning based classification and regression models. The compositional differences in the distribution of atoms were apparent between toxins and non-toxins, and hence, the molecular features were used for the classification and regression. On 10-fold cross-validation, the descriptor-based, fingerprint-based and hybrid-based classification models showed similar accuracy (93%) and Matthews's correlation coefficient (0.84). The performances of all the three models were comparable (Matthews's correlation coefficient = 0.84-0.87) on the blind dataset. In addition, the regression-based models using descriptors as input features were also compared and evaluated on the blind dataset. Random forest based regression model for the prediction of solubility performed better (R² = 0.84) than the multi-linear regression (MLR) and partial least square regression (PLSR) models, whereas, the partial least squares based regression model for the prediction of permeability (caco-2) performed better (R² = 0.68) in comparison to the random forest and MLR based regression models. The performance of final classification and regression models was evaluated using the two validation datasets including the known toxins and commonly used constituents of health products, which attests to its accuracy. The ToxiM web server would be a highly useful and reliable tool for the prediction of toxicity, solubility, and permeability of small molecules.
Analytical learning and term-rewriting systems
NASA Technical Reports Server (NTRS)
Laird, Philip; Gamble, Evan
1990-01-01
Analytical learning is a set of machine learning techniques for revising the representation of a theory based on a small set of examples of that theory. When the representation of the theory is correct and complete but perhaps inefficient, an important objective of such analysis is to improve the computational efficiency of the representation. Several algorithms with this purpose have been suggested, most of which are closely tied to a first order logical language and are variants of goal regression, such as the familiar explanation based generalization (EBG) procedure. But because predicate calculus is a poor representation for some domains, these learning algorithms are extended to apply to other computational models. It is shown that the goal regression technique applies to a large family of programming languages, all based on a kind of term rewriting system. Included in this family are three language families of importance to artificial intelligence: logic programming, such as Prolog; lambda calculus, such as LISP; and combinatorial based languages, such as FP. A new analytical learning algorithm, AL-2, is exhibited that learns from success but is otherwise quite different from EBG. These results suggest that term rewriting systems are a good framework for analytical learning research in general, and that further research should be directed toward developing new techniques.
Olson, Scott A.; Tasker, Gary D.; Johnston, Craig M.
2003-01-01
Estimates of the magnitude and frequency of streamflow are needed to safely and economically design bridges, culverts, and other structures in or near streams. These estimates also are used for managing floodplains, identifying flood-hazard areas, and establishing flood-insurance rates, but may be required at ungaged sites where no observed flood data are available for streamflow-frequency analysis. This report describes equations for estimating flow-frequency characteristics at ungaged, unregulated streams in Vermont. In the past, regression equations developed to estimate streamflow statistics required users to spend hours manually measuring basin characteristics for the stream site of interest. This report also describes the accompanying customized geographic information system (GIS) tool that automates the measurement of basin characteristics and calculation of corresponding flow statistics. The tool includes software that computes the accuracy of the results and adjustments for expected probability and for streamflow data of a nearby stream-gaging station that is either upstream or downstream and within 50 percent of the drainage area of the site where the flow-frequency characteristics are being estimated. The custom GIS can be linked to the National Flood Frequency program, adding the ability to plot peak-flow-frequency curves and synthetic hydrographs and to compute adjustments for urbanization.
Liu, Yang; Chiaromonte, Francesca; Li, Bing
2017-06-01
In many scientific and engineering fields, advanced experimental and computing technologies are producing data that are not just high dimensional, but also internally structured. For instance, statistical units may have heterogeneous origins from distinct studies or subpopulations, and features may be naturally partitioned based on experimental platforms generating them, or on information available about their roles in a given phenomenon. In a regression analysis, exploiting this known structure in the predictor dimension reduction stage that precedes modeling can be an effective way to integrate diverse data. To pursue this, we propose a novel Sufficient Dimension Reduction (SDR) approach that we call structured Ordinary Least Squares (sOLS). This combines ideas from existing SDR literature to merge reductions performed within groups of samples and/or predictors. In particular, it leads to a version of OLS for grouped predictors that requires far less computation than recently proposed groupwise SDR procedures, and provides an informal yet effective variable selection tool in these settings. We demonstrate the performance of sOLS by simulation and present a first application to genomic data. The R package "sSDR," publicly available on CRAN, includes all procedures necessary to implement the sOLS approach. © 2016, The International Biometric Society.
Management of health care expenditure by soft computing methodology
NASA Astrophysics Data System (ADS)
Maksimović, Goran; Jović, Srđan; Jovanović, Radomir; Aničić, Obrad
2017-01-01
In this study, health care expenditure was managed using a soft computing methodology. The main goal was to predict gross domestic product (GDP) from several health care expenditure factors. Soft computing methodologies were applied since GDP prediction is a very complex task. The performances of the proposed predictors were confirmed by the simulation results. According to the results, support vector regression (SVR) has better prediction accuracy than the other soft computing methodologies. The soft computing methods benefit from global optimization capabilities that help avoid local-minimum issues.
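A minimal sketch of the kind of support vector regression the abstract reports is given below, using scikit-learn with synthetic stand-in data for the health care expenditure factors and GDP; the features, kernel, and hyperparameters are assumptions made only for illustration.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

# Synthetic stand-in: rows = country-years, columns = health expenditure factors
# (e.g., public/private expenditure shares, per-capita spending); target = GDP.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] + np.sin(X[:, 1]) + 0.2 * rng.normal(size=200)

svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
scores = cross_val_score(svr, X, y, cv=5, scoring="r2")
print("mean CV R2:", scores.mean().round(3))
```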
Value of Information Analysis for Time-lapse Seismic Data by Simulation-Regression
NASA Astrophysics Data System (ADS)
Dutta, G.; Mukerji, T.; Eidsvik, J.
2016-12-01
A novel method to estimate the Value of Information (VOI) of time-lapse seismic data in the context of reservoir development is proposed. VOI is a decision analytic metric quantifying the incremental value that would be created by collecting information prior to making a decision under uncertainty. The VOI has to be computed before collecting the information and can be used to justify its collection. Previous work on estimating the VOI of geophysical data has involved explicit approximation of the posterior distribution of reservoir properties given the data and then evaluating the prospect values for that posterior distribution of reservoir properties. Here, we propose to directly estimate the prospect values given the data by building a statistical relationship between them using regression. Various regression techniques such as Partial Least Squares Regression (PLSR), Multivariate Adaptive Regression Splines (MARS) and k-Nearest Neighbors (k-NN) are used to estimate the VOI, and the results compared. For a univariate Gaussian case, the VOI obtained from simulation-regression has been shown to be close to the analytical solution. Estimating VOI by simulation-regression is much less computationally expensive since the posterior distribution of reservoir properties given each possible dataset need not be modeled and the prospect values need not be evaluated for each such posterior distribution of reservoir properties. This method is flexible, since it does not require rigid model specification of posterior but rather fits conditional expectations non-parametrically from samples of values and data.
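The simulation-regression idea can be illustrated with a toy develop/do-not-develop decision: simulate prospect values and noisy data from the prior, regress value on data (here with k-NN, one of the regressors mentioned above), and compare the expected value of the informed decision with the value of the prior decision. All distributions and parameters below are invented for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Monte Carlo VOI for a single develop / don't-develop decision.
rng = np.random.default_rng(2)
n = 20000
value = rng.normal(loc=-0.2, scale=1.0, size=n)            # prior samples of prospect value
data = value + rng.normal(scale=0.5, size=n)               # simulated noisy seismic attribute

prior_value = max(0.0, value.mean())                        # best decision without information

# Regress value on data, then average the value of the optimal decision given each dataset.
knn = KNeighborsRegressor(n_neighbors=200).fit(data.reshape(-1, 1), value)
cond_mean = knn.predict(data.reshape(-1, 1))                # E[value | data], fit non-parametrically
posterior_value = np.maximum(cond_mean, 0.0).mean()

print("VOI estimate:", round(posterior_value - prior_value, 3))
```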
Computer Simulation of Human Service Program Evaluations.
ERIC Educational Resources Information Center
Trochim, William M. K.; Davis, James E.
1985-01-01
Describes uses of computer simulations for the context of human service program evaluation. Presents simple mathematical models for most commonly used human service outcome evaluation designs (pretest-posttest randomized experiment, pretest-posttest nonequivalent groups design, and regression-discontinuity design). Translates models into single…
Hakala, Paula T; Saarni, Lea A; Ketola, Ritva L; Rahkola, Erja T; Salminen, Jouko J; Rimpelä, Arja H
2010-01-11
The use of computers has increased among adolescents, as have musculoskeletal symptoms. There is evidence that these symptoms can be reduced through an ergonomics approach and through education. The purpose of this study was to examine where adolescents had received ergonomic instructions related to computer use, and whether receiving these instructions was associated with a reduced prevalence of computer-associated health complaints. Mailed survey with nationally representative sample of 12 to 18-year-old Finns in 2001 (n = 7292, response rate 70%). In total, 6961 youths reported using a computer. We tested the associations of computer use time and received ergonomic instructions (predictor variables) with computer-associated health complaints (outcome variables) using logistic regression analysis. To prevent computer-associated complaints, 61.2% reported having been instructed to arrange their desk/chair/screen in the right position, 71.5% to take rest breaks. The older age group (16-18 years) reported receiving instructions or being self-instructed more often than the 12- to 14-year-olds (p < 0.001). Among both age groups the sources of instructions included school (33.1%), family (28.6%), self (self-instructed) (12.5%), ICT-related (8.6%), friends (1.5%) and health professionals (0.8%). Receiving instructions was not related to lower prevalence of computer-associated health complaints. This report shows that ergonomic instructions on how to prevent computer-related musculoskeletal problems fail to reach a substantial number of children. Furthermore, the reported sources of instructions vary greatly in terms of reliability.
ERIC Educational Resources Information Center
Lee, Young-Jin
2015-01-01
This study investigates whether information saved in the log files of a computer-based tutor can be used to predict the problem solving performance of students. The log files of a computer-based physics tutoring environment called Andes Physics Tutor was analyzed to build a logistic regression model that predicted success and failure of students'…
Home Computer Use and the Development of Human Capital. NBER Working Paper No. 15814
ERIC Educational Resources Information Center
Malamud, Ofer; Pop-Eleches, Cristian
2010-01-01
This paper uses a regression discontinuity design to estimate the effect of home computers on child and adolescent outcomes. We collected survey data from households who participated in a unique government program in Romania which allocated vouchers for the purchase of a home computer to low-income children based on a simple ranking of family…
Van Epps, J Scott; Chew, Douglas W; Vorp, David A
2009-10-01
Certain arteries (e.g., coronary, femoral, etc.) are exposed to cyclic flexure due to their tethering to surrounding tissue beds. It is believed that such stimuli result in a spatially variable biomechanical stress distribution, which has been implicated as a key modulator of remodeling associated with atherosclerotic lesion localization. In this study we utilized a combined ex vivo experimental/computational methodology to address the hypothesis that local variations in shear and mural stress associated with cyclic flexure influence the distribution of early markers of atherogenesis. Bilateral porcine femoral arteries were surgically harvested and perfused ex vivo under pulsatile arterial conditions. One of the paired vessels was exposed to cyclic flexure (0-0.7 cm⁻¹) at 1 Hz for 12 h. During the last hour, the perfusate was supplemented with Evans blue dye-labeled albumin. A custom tissue processing protocol was used to determine the spatial distribution of endothelial permeability, apoptosis, and proliferation. Finite element and computational fluid dynamics techniques were used to determine the mural and shear stress distributions, respectively, for each perfused segment. Biological data obtained experimentally and mechanical stress data estimated computationally were combined in an experiment-specific manner using multiple linear regression analyses. Arterial segments exposed to cyclic flexure had significant increases in intimal and medial apoptosis (3.42 ± 1.02-fold, p = 0.029) with concomitant increases in permeability (1.14 ± 0.04-fold, p = 0.026). Regression analyses revealed that specific mural stress measures, including circumferential stress at systole and longitudinal pulse stress, were quantitatively correlated with the distribution of permeability and apoptosis. The results demonstrated that local variation in mechanical stress in arterial segments subjected to cyclic flexure indeed influences the extent and spatial distribution of the early atherogenic markers. In addition, the importance of including mural stresses in the investigation of vascular mechanopathobiology was highlighted. Specific example results were used to describe a potential mechanism by which systemic risk factors can lead to a heterogeneous disease.
Logistic regression for circular data
NASA Astrophysics Data System (ADS)
Al-Daffaie, Kadhem; Khan, Shahjahan
2017-05-01
This paper considers the relationship between a binary response and a circular predictor. It develops the logistic regression model by employing the linear-circular regression approach. The maximum likelihood method is used to estimate the parameters. The Newton-Raphson numerical method is used to find the estimated values of the parameters. A data set from weather records of Toowoomba city is analysed by the proposed methods. Moreover, a simulation study is considered. The R software is used for all computations and simulations.
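A minimal sketch of the linear-circular approach is to enter the circular predictor through its cosine and sine and fit an ordinary logistic regression by maximum likelihood with a Newton-type solver, as in the paper. The simulated angles and binary response below are stand-ins for the Toowoomba weather data.

```python
import numpy as np
import statsmodels.api as sm

# Simulated example: binary response (e.g., rain / no rain) and a circular predictor
# (e.g., wind direction in radians), entered via cos(theta) and sin(theta).
rng = np.random.default_rng(3)
theta = rng.uniform(0, 2 * np.pi, size=500)
eta = -0.5 + 1.2 * np.cos(theta) + 0.8 * np.sin(theta)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))

X = sm.add_constant(np.column_stack([np.cos(theta), np.sin(theta)]))
fit = sm.Logit(y, X).fit(method="newton", disp=False)   # maximum likelihood via Newton-Raphson
print(fit.params)                                        # intercept, cosine and sine coefficients
```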
Magnitude and frequency of floods in small drainage basins in Idaho
Thomas, C.A.; Harenberg, W.A.; Anderson, J.M.
1973-01-01
A method is presented in this report for determining magnitude and frequency of floods on streams with drainage areas between 0.5 and 200 square miles. The method relates basin characteristics, including drainage area, percentage of forest cover, percentage of water area, latitude, and longitude, with peak flow characteristics. Regression equations for each of eight regions are presented for determination of Q10, the peak discharge which, on the average, will be exceeded once in 10 years. Peak flows, Q25 and Q50, can then be estimated from Q25/Q10 and Q50/Q10 ratios developed for each region. Nomographs are included which solve the equations for basins between 1 and 50 square miles. The regional regression equations were developed using multiple regression techniques. Annual peaks for 303 sites were analyzed in the study. These included all records on unregulated streams with drainage areas less than about 500 square miles with 10 years or more of record or which could readily be extended to 10 years on the basis of nearby streams. The log-Pearson Type III method as modified and a digital computer were employed to estimate magnitude and frequency of floods for each of the 303 gaged sites. A large number of physical and climatic basin characteristics were determined for each of the gaged sites. The multiple regression method was then applied to determine the equations relating the floodflows and the most significant basin characteristics. For convenience of the users, several equations were simplified and some complex characteristics were deleted at the sacrifice of some increase in the standard error. Standard errors of estimate and many other statistical data were computed in the analysis process and are available in the Boise district office files. The analysis showed that Q10 was the best defined and most practical index flood for determination of the Q25 and Q50 flood estimates. Regression equations are not developed because of poor definition for areas which total about 20,000 square miles, most of which are in southern Idaho. These areas are described in the report to prevent use of regression equations where they do not apply. They include urbanized areas, streams affected by regulation or diversion by works of man, unforested areas, streams with gaining or losing reaches, streams draining alluvial valleys and the Snake Plain, intense thunderstorm areas, and scattered areas where records indicate recurring floods which depart from the regional equations. Maximum flows of record and basin locations are summarized in tables and maps. The analysis indicates deficiencies in data exist. To improve knowledge regarding flood characteristics in poorly defined areas, the following data-collection programs are recommended. Gages should be operated on a few selected small streams for an extended period to define floods at long recurrence intervals. Crest-stage gages should be operated in representative basins in urbanized areas, newly developed irrigated areas and grasslands, and in unforested areas. Unusual floods should continue to be measured at miscellaneous sites on regulated streams and in intense thunderstorm-prone areas. The relationship between channel geometry and floodflow characteristics should be investigated as an alternative or supplement to operation of gaging stations. Documentation of historic flood data from newspapers and other sources would improve the basic flood-data base.
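The at-site frequency analysis used above (log-Pearson Type III fitted to annual peaks) can be sketched in a few lines. The example below fits the distribution by the method of moments on log10-transformed peaks and returns the 10-year flood, Q10; it omits Bulletin 17-style skew weighting and outlier adjustments, and the peak-flow record is hypothetical.

```python
import numpy as np
from scipy import stats

def log_pearson3_quantile(annual_peaks_cfs, annual_exceedance_prob):
    """Fit a log-Pearson Type III distribution to annual peaks by the method of moments
    on log10 flows and return the flood quantile for a given annual exceedance probability
    (e.g., 0.10 for the 10-year flood). No skew weighting or outlier tests are applied."""
    logq = np.log10(np.asarray(annual_peaks_cfs, dtype=float))
    mean, std = logq.mean(), logq.std(ddof=1)
    skew = stats.skew(logq, bias=False)
    # scipy's pearson3 takes the skew as its shape parameter, with loc/scale = mean/std of log10 Q.
    log_quantile = stats.pearson3.ppf(1.0 - annual_exceedance_prob, skew, loc=mean, scale=std)
    return 10.0 ** log_quantile

peaks = [820, 1130, 640, 2400, 980, 1500, 760, 1900, 1100, 870, 1300, 2100]  # hypothetical record, ft^3/s
print(round(log_pearson3_quantile(peaks, 0.10)))   # approximate 10-year flood, Q10
```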
Kohn, Michael S.; Stevens, Michael R.; Harden, Tessa M.; Godaire, Jeanne E.; Klinger, Ralph E.; Mommandi, Amanullah
2016-09-09
The U.S. Geological Survey (USGS), in cooperation with the Colorado Department of Transportation, developed regional-regression equations for estimating the 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent annual exceedance-probability discharge (AEPD) for natural streamflow in eastern Colorado. A total of 188 streamgages, consisting of 6,536 years of record and a mean of approximately 35 years of record per streamgage, were used to develop the peak-streamflow regional-regression equations. The estimated AEPDs for each streamgage were computed using the USGS software program PeakFQ. The AEPDs were determined using systematic data through water year 2013. Based on previous studies conducted in Colorado and neighboring States and on the availability of data, 72 characteristics (57 basin and 15 climatic characteristics) were evaluated as candidate explanatory variables in the regression analysis. Paleoflood and non-exceedance bound ages were established based on reconnaissance-level methods. Multiple lines of evidence were used at each streamgage to arrive at a conclusion (age estimate) to add a higher degree of certainty to reconnaissance-level estimates. Paleoflood or non-exceedance bound evidence was documented at 41 streamgages, and 3 streamgages had previously collected paleoflood data. To determine the peak discharge of a paleoflood or non-exceedance bound, two different hydraulic models were used. The mean standard error of prediction (SEP) for all 8 AEPDs was reduced approximately 25 percent compared to the previous flood-frequency study. For paleoflood data to be effective in reducing the SEP in eastern Colorado, a larger proportion than 44 of 188 (23 percent) streamgages would need paleoflood data, and that paleoflood data would need to increase the record length by more than 25 years for the 1-percent AEPD. The greatest reduction in SEP for the peak-streamflow regional-regression equations was observed when additional new basin characteristics were included in the peak-streamflow regional-regression equations and when eastern Colorado was divided into two separate hydrologic regions. To make further reductions in the uncertainties of the peak-streamflow regional-regression equations in the Foothills and Plains hydrologic regions, additional streamgages or crest-stage gages are needed to collect peak-streamflow data on natural streams in eastern Colorado. Generalized least-squares regression was used to compute the final peak-streamflow regional-regression equations. Dividing eastern Colorado into two new individual regions at –104° longitude resulted in peak-streamflow regional-regression equations with the smallest SEP. The new hydrologic region located between –104° longitude and the Kansas-Nebraska State line will be designated the Plains hydrologic region, and the hydrologic region comprising the rest of eastern Colorado located west of –104° longitude and east of the Rocky Mountains and below 7,500 feet in the South Platte River Basin and below 9,000 feet in the Arkansas River Basin will be designated the Foothills hydrologic region.
Extrinsic local regression on manifold-valued data
Lin, Lizhen; St Thomas, Brian; Zhu, Hongtu; Dunson, David B.
2017-01-01
We propose an extrinsic regression framework for modeling data with manifold valued responses and Euclidean predictors. Regression with manifold responses has wide applications in shape analysis, neuroscience, medical imaging and many other areas. Our approach embeds the manifold where the responses lie onto a higher dimensional Euclidean space, obtains a local regression estimate in that space, and then projects this estimate back onto the image of the manifold. Outside the regression setting both intrinsic and extrinsic approaches have been proposed for modeling i.i.d manifold-valued data. However, to our knowledge our work is the first to take an extrinsic approach to the regression problem. The proposed extrinsic regression framework is general, computationally efficient and theoretically appealing. Asymptotic distributions and convergence rates of the extrinsic regression estimates are derived and a large class of examples are considered indicating the wide applicability of our approach. PMID:29225385
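As a toy illustration of the embed-fit-project recipe described above, the sketch below performs a kernel-weighted local constant fit for responses on the unit sphere embedded in R^3 and projects the Euclidean estimate back onto the sphere by normalisation; the data, kernel, and bandwidth are all invented for illustration and are not the authors' construction.

```python
import numpy as np

def extrinsic_local_regression_sphere(X_train, Y_train, x0, bandwidth=0.5):
    """Toy extrinsic local regression for sphere-valued responses:
    (1) treat the sphere points as vectors in ambient R^3 (the embedding),
    (2) form a kernel-weighted local average at the query point x0 (local constant fit),
    (3) project the Euclidean estimate back onto the sphere by normalising it."""
    d = np.linalg.norm(X_train - x0, axis=1)
    w = np.exp(-0.5 * (d / bandwidth) ** 2)                  # Gaussian kernel weights
    euclid_est = (w[:, None] * Y_train).sum(axis=0) / w.sum()
    return euclid_est / np.linalg.norm(euclid_est)           # projection back onto the manifold

# Hypothetical data: Euclidean predictors in R^2, responses on the unit sphere S^2.
rng = np.random.default_rng(4)
X = rng.normal(size=(300, 2))
angles = 0.5 * X[:, 0]
Y = np.column_stack([np.cos(angles), np.sin(angles), np.zeros(300)])

print(extrinsic_local_regression_sphere(X, Y, x0=np.array([1.0, 0.0])))
```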
Short-term outcome of 1,465 computer-navigated primary total knee replacements 2005–2008
2011-01-01
Background and purpose Improvement of positioning and alignment by the use of computer-assisted surgery (CAS) might improve longevity and function in total knee replacements, but there is little evidence. In this study, we evaluated the short-term results of computer-navigated knee replacements based on data from the Norwegian Arthroplasty Register. Patients and methods Primary total knee replacements without patella resurfacing, reported to the Norwegian Arthroplasty Register during the years 2005–2008, were evaluated. The 5 most common implants and the 3 most common navigation systems were selected. Cemented, uncemented, and hybrid knees were included. With the risk of revision for any cause as the primary endpoint and intraoperative complications and operating time as secondary outcomes, 1,465 computer-navigated knee replacements (CAS) and 8,214 conventionally operated knee replacements (CON) were compared. Kaplan-Meier survival analysis and Cox regression analysis with adjustment for age, sex, prosthesis brand, fixation method, previous knee surgery, preoperative diagnosis, and ASA category were used. Results Kaplan-Meier estimated survival at 2 years was 98% (95% CI: 97.5–98.3) in the CON group and 96% (95% CI: 95.0–97.8) in the CAS group. The adjusted Cox regression analysis showed a higher risk of revision in the CAS group (RR = 1.7, 95% CI: 1.1–2.5; p = 0.02). The LCS Complete knee had a higher risk of revision with CAS than with CON (RR = 2.1, 95% CI: 1.3–3.4; p = 0.004)). The differences were not statistically significant for the other prosthesis brands. Mean operating time was 15 min longer in the CAS group. Interpretation With the introduction of computer-navigated knee replacement surgery in Norway, the short-term risk of revision has increased for computer-navigated replacement with the LCS Complete. The mechanisms of failure of these implantations should be explored in greater depth, and in this study we have not been able to draw conclusions regarding causation. PMID:21504309
Computation of major solute concentrations and loads in German rivers using regression analysis.
Steele, T.D.
1980-01-01
Regression functions between concentrations of several inorganic solutes and specific conductance and between specific conductance and stream discharge were derived from intermittent samples collected for 2 rivers in West Germany. These functions, in conjunction with daily records of streamflow, were used to determine monthly and annual solute loadings. -from Author
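The two-step regression chain described above is easy to sketch: regress concentration on specific conductance, regress conductance on discharge, then apply both relations to the daily streamflow record and accumulate loads. The numbers below are hypothetical samples, not the West German data, and simple linear fits are assumed.

```python
import numpy as np

# Hypothetical intermittent samples: specific conductance (uS/cm), chloride (mg/L),
# and daily mean discharge (m^3/s) on the sampled days.
cond = np.array([310., 280., 450., 390., 220., 500., 350.])
chloride = np.array([18., 15., 29., 24., 11., 33., 21.])
discharge_sampled = np.array([42., 55., 20., 28., 80., 15., 35.])

# Step 1: concentration as a linear function of specific conductance.
b1, a1 = np.polyfit(cond, chloride, 1)
# Step 2: specific conductance as a function of log10(discharge) (dilution relation).
b2, a2 = np.polyfit(np.log10(discharge_sampled), cond, 1)

# Apply the chained relations to a daily discharge record to build daily loads.
daily_q = np.array([40., 38., 60., 120., 90., 33., 25.])                 # m^3/s, one week shown
daily_conc = a1 + b1 * (a2 + b2 * np.log10(daily_q))                      # mg/L
daily_load_tonnes = daily_conc * daily_q * 1e3 * 86400 * 1e-9             # mg/L x m^3/s x L/m^3 x s/d x t/mg
print(daily_load_tonnes.round(2))
```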
Bayesian Asymmetric Regression as a Means to Estimate and Evaluate Oral Reading Fluency Slopes
ERIC Educational Resources Information Center
Solomon, Benjamin G.; Forsberg, Ole J.
2017-01-01
Bayesian techniques have become increasingly present in the social sciences, fueled by advances in computer speed and the development of user-friendly software. In this paper, we forward the use of Bayesian Asymmetric Regression (BAR) to monitor intervention responsiveness when using Curriculum-Based Measurement (CBM) to assess oral reading…
Logarithmic Transformations in Regression: Do You Transform Back Correctly?
ERIC Educational Resources Information Center
Dambolena, Ismael G.; Eriksen, Steven E.; Kopcso, David P.
2009-01-01
The logarithmic transformation is often used in regression analysis for a variety of purposes such as the linearization of a nonlinear relationship between two or more variables. We have noticed that when this transformation is applied to the response variable, the computation of the point estimate of the conditional mean of the original response…
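The issue raised above is that exponentiating a fitted log-scale prediction estimates the conditional median, not the conditional mean, of the original response. A small sketch on simulated data comparing the naive back-transform with a normal-theory correction and Duan's smearing estimator (assumed here as standard remedies, not necessarily the authors' prescription) follows.

```python
import numpy as np

# Regression of ln(y) on x, then back-transformation to the original scale.
rng = np.random.default_rng(5)
x = rng.uniform(0, 10, size=2000)
y = np.exp(1.0 + 0.3 * x + rng.normal(scale=0.6, size=2000))   # lognormal errors

b, a = np.polyfit(x, np.log(y), 1)
resid = np.log(y) - (a + b * x)
s2 = resid.var(ddof=2)

x_new = 5.0
naive = np.exp(a + b * x_new)                 # estimates the conditional *median*; biased low for the mean
normal_corr = naive * np.exp(s2 / 2.0)        # normal-theory correction for the conditional mean
smearing = naive * np.mean(np.exp(resid))     # Duan's smearing estimator (distribution-free)
print(round(naive, 1), round(normal_corr, 1), round(smearing, 1))
```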
Section 3. The SPARROW Surface Water-Quality Model: Theory, Application and User Documentation
Schwarz, G.E.; Hoos, A.B.; Alexander, R.B.; Smith, R.A.
2006-01-01
SPARROW (SPAtially Referenced Regressions On Watershed attributes) is a watershed modeling technique for relating water-quality measurements made at a network of monitoring stations to attributes of the watersheds containing the stations. The core of the model consists of a nonlinear regression equation describing the non-conservative transport of contaminants from point and diffuse sources on land to rivers and through the stream and river network. The model predicts contaminant flux, concentration, and yield in streams and has been used to evaluate alternative hypotheses about the important contaminant sources and watershed properties that control transport over large spatial scales. This report provides documentation for the SPARROW modeling technique and computer software to guide users in constructing and applying basic SPARROW models. The documentation gives details of the SPARROW software, including the input data and installation requirements, and guidance in the specification, calibration, and application of basic SPARROW models, as well as descriptions of the model output and its interpretation. The documentation is intended for both researchers and water-resource managers with interest in using the results of existing models and developing and applying new SPARROW models. The documentation of the model is presented in two parts. Part 1 provides a theoretical and practical introduction to SPARROW modeling techniques, which includes a discussion of the objectives, conceptual attributes, and model infrastructure of SPARROW. Part 1 also includes background on the commonly used model specifications and the methods for estimating and evaluating parameters, evaluating model fit, and generating water-quality predictions and measures of uncertainty. Part 2 provides a user's guide to SPARROW, which includes a discussion of the software architecture and details of the model input requirements and output files, graphs, and maps. The text documentation and computer software are available on the Web at http://usgs.er.gov/sparrow/sparrow-mod/.
A Fast Gradient Method for Nonnegative Sparse Regression With Self-Dictionary
NASA Astrophysics Data System (ADS)
Gillis, Nicolas; Luce, Robert
2018-01-01
A nonnegative matrix factorization (NMF) can be computed efficiently under the separability assumption, which asserts that all the columns of the given input data matrix belong to the cone generated by a (small) subset of them. The provably most robust methods to identify these conic basis columns are based on nonnegative sparse regression and self dictionaries, and require the solution of large-scale convex optimization problems. In this paper we study a particular nonnegative sparse regression model with self dictionary. As opposed to previously proposed models, this model yields a smooth optimization problem where the sparsity is enforced through linear constraints. We show that the Euclidean projection on the polyhedron defined by these constraints can be computed efficiently, and propose a fast gradient method to solve our model. We compare our algorithm with several state-of-the-art methods on synthetic data sets and real-world hyperspectral images.
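For readers who want a concrete baseline, the sketch below recovers the conic basis columns of a synthetic separable matrix with the successive projection algorithm (SPA), a simple greedy alternative to, and not the same as, the convex nonnegative sparse regression model with self-dictionary studied in the paper; the data generation and the choice of SPA are illustrative assumptions.

```python
import numpy as np

def successive_projection(M, r):
    """Successive Projection Algorithm (SPA) for separable NMF: greedily pick the column
    with the largest residual norm, then project all columns onto the orthogonal
    complement of the picked column, and repeat r times."""
    R = M.astype(float)
    picked = []
    for _ in range(r):
        j = int(np.argmax(np.linalg.norm(R, axis=0)))
        picked.append(j)
        u = R[:, j] / np.linalg.norm(R[:, j])
        R = R - np.outer(u, u @ R)            # project out the chosen direction
    return picked

# Synthetic separable data: 4 basis columns mixed nonnegatively into 26 extra columns,
# with mixing weights summing to at most 1 so the basis columns are the extreme rays.
rng = np.random.default_rng(6)
basis = rng.random((50, 4)) + 0.5
H = rng.dirichlet(np.ones(4), size=26).T * 0.8
M = np.hstack([basis, basis @ H])
print("selected columns:", successive_projection(M, 4))    # expected: indices 0-3 in some order
```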
Borzekowski, Dina L G; Robinson, Thomas N
2005-07-01
Media can influence aspects of a child's physical, social, and cognitive development; however, the associations between a child's household media environment, media use, and academic achievement have yet to be determined. To examine relationships among a child's household media environment, media use, and academic achievement. During a single academic year, data were collected through classroom surveys and telephone interviews from an ethnically diverse sample of third grade students and their parents from 6 northern California public elementary schools. The majority of our analyses derive from spring 2000 data, including academic achievement assessed through the mathematics, reading, and language arts sections of the Stanford Achievement Test. We fit linear regression models to determine the associations between variations in household media and performance on the standardized tests, adjusting for demographic and media use variables. The household media environment is significantly associated with students' performance on the standardized tests. It was found that having a bedroom television set was significantly and negatively associated with students' test scores, while home computer access and use were positively associated with the scores. Regression models significantly predicted up to 24% of the variation in the scores. Absence of a bedroom television combined with access to a home computer was consistently associated with the highest standardized test scores. This study adds to the growing literature reporting that having a bedroom television set may be detrimental to young elementary school children. It also suggests that having and using a home computer may be associated with better academic achievement.
Glassman, E Katelyn; Hughes, Michelle L
2013-01-01
Current cochlear implants (CIs) have telemetry capabilities for measuring the electrically evoked compound action potential (ECAP). Neural Response Telemetry (Cochlear) and Neural Response Imaging (Advanced Bionics [AB]) can measure ECAP responses across a range of stimulus levels to obtain an amplitude growth function. Software-specific algorithms automatically mark the leading negative peak, N1, and the following positive peak/plateau, P2, and apply linear regression to estimate ECAP threshold. Alternatively, clinicians may apply expert judgments to modify the peak markers placed by the software algorithms, or use visual detection to identify the lowest level yielding a measurable ECAP response. The goals of this study were to: (1) assess the variability between human and computer decisions for (a) marking N1 and P2 and (b) determining linear-regression threshold (LRT) and visual-detection threshold (VDT); and (2) compare LRT and VDT methods within and across human- and computer-decision methods. ECAP amplitude-growth functions were measured for three electrodes in each of 20 ears (10 Cochlear Nucleus® 24RE/CI512, and 10 AB CII/90K). LRT, defined as the current level yielding an ECAP with zero amplitude, was calculated for both computer- (C-LRT) and human-picked peaks (H-LRT). VDT, defined as the lowest level resulting in a measurable ECAP response, was also calculated for both computer- (C-VDT) and human-picked peaks (H-VDT). Because Neural Response Imaging assigns peak markers to all waveforms but does not include waveforms with amplitudes less than 20 μV in its regression calculation, C-VDT for AB subjects was defined as the lowest current level yielding an amplitude of 20 μV or more. Overall, there were significant correlations between human and computer decisions for peak-marker placement, LRT, and VDT for both manufacturers (r = 0.78-1.00, p < 0.001). For Cochlear devices, LRT and VDT correlated equally well for both computer- and human-picked peaks (r = 0.98-0.99, p < 0.001), which likely reflects the well-defined Neural Response Telemetry algorithm and the lower noise floor in the 24RE and CI512 devices. For AB devices, correlations between LRT and VDT for both peak-picker methods were weaker than for Cochlear devices (r = 0.69-0.85, p < 0.001), which likely reflect the higher noise floor of the system. Disagreement between computer and human decisions regarding the presence of an ECAP response occurred for 5 % of traces for Cochlear devices and 2.1 % of traces for AB devices. Results indicate that human and computer peak-picking methods can be used with similar accuracy for both Cochlear and AB devices. Either C-VDT or C-LRT can be used with equal confidence for Cochlear 24RE and CI512 recipients because both methods are strongly correlated with human decisions. However, for AB devices, greater variability exists between different threshold-determination methods. This finding should be considered in the context of using ECAP measures to assist with programming CIs.
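The two threshold definitions compared in the study can be illustrated on a made-up amplitude growth function: the visual-detection-style threshold is the lowest level whose response clears an assumed noise floor, while the linear-regression threshold extrapolates the suprathreshold growth function to zero amplitude. The levels, amplitudes, and 20 µV floor below are hypothetical values chosen only for illustration.

```python
import numpy as np

# Hypothetical amplitude growth function: stimulus current levels (device units)
# and measured ECAP amplitudes (uV); responses below the noise floor are near zero.
levels = np.array([150, 155, 160, 165, 170, 175, 180])
amplitudes = np.array([2., 4., 21., 58., 102., 148., 190.])
noise_floor_uv = 20.0

# Visual-detection-style threshold (VDT): lowest level whose amplitude clears the noise floor.
vdt = levels[np.argmax(amplitudes >= noise_floor_uv)]

# Linear-regression threshold (LRT): extrapolate the growth function to zero amplitude,
# using only the points that are above the noise floor.
mask = amplitudes >= noise_floor_uv
slope, intercept = np.polyfit(levels[mask], amplitudes[mask], 1)
lrt = -intercept / slope

print("VDT:", vdt, "LRT:", round(lrt, 1))
```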
NASA Astrophysics Data System (ADS)
Wang, Lunche; Kisi, Ozgur; Zounemat-Kermani, Mohammad; Li, Hui
2017-01-01
Pan evaporation (Ep) plays important roles in agricultural water resources management. One of the basic challenges is modeling Ep using limited climatic parameters because there are a number of factors affecting the evaporation rate. This study investigated the abilities of six different soft computing methods, multi-layer perceptron (MLP), generalized regression neural network (GRNN), fuzzy genetic (FG), least square support vector machine (LSSVM), multivariate adaptive regression spline (MARS), and adaptive neuro-fuzzy inference systems with grid partition (ANFIS-GP), and two regression methods, multiple linear regression (MLR) and the Stephens and Stewart model (SS), in predicting monthly Ep. Long-term climatic data at various sites covering a wide range of climates during 1961-2000 were used for model development and validation. The results showed that the models have different accuracies in different climates, and the MLP model performed better than the other models in predicting monthly Ep at most stations using local input combinations (for example, the mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R2) are 0.314 mm/day, 0.405 mm/day, and 0.988, respectively, for the HEB station), while the GRNN model performed better on the Tibetan Plateau (MAE, RMSE, and R2 of 0.459 mm/day, 0.592 mm/day, and 0.932, respectively). The accuracies of the above models ranked as follows: MLP, GRNN, LSSVM, FG, ANFIS-GP, MARS, and MLR. The overall results indicated that the soft computing techniques generally performed better than the regression methods, but the MLR and SS models may be preferred over complex nonlinear models in some climatic zones, for example, at the BJ (Beijing), CQ (Chongqing), and HK (Haikou) stations. Therefore, it can be concluded that Ep could be successfully predicted using the above models in hydrological modeling studies.
Regression-based adaptive sparse polynomial dimensional decomposition for sensitivity analysis
NASA Astrophysics Data System (ADS)
Tang, Kunkun; Congedo, Pietro; Abgrall, Remi
2014-11-01
Polynomial dimensional decomposition (PDD) is employed in this work for global sensitivity analysis and uncertainty quantification of stochastic systems subject to a large number of random input variables. Due to the intimate connection between PDD and analysis of variance, PDD is able to provide a simpler and more direct evaluation of the Sobol' sensitivity indices than polynomial chaos (PC). Unfortunately, the number of PDD terms grows exponentially with the size of the input random vector, which makes the computational cost of the standard method unaffordable for real engineering applications. To address this curse-of-dimensionality problem, this work proposes a variance-based adaptive strategy aiming to build a cheap metamodel by sparse PDD with the PDD coefficients computed by regression. During this adaptive procedure, the PDD representation contains only a few terms, so the cost of repeatedly solving the linear system of the least-squares regression problem is negligible. The size of the final sparse-PDD representation is much smaller than the full PDD, since only significant terms are eventually retained. Consequently, far fewer calls to the deterministic model are required to compute the final PDD coefficients.
NASA Astrophysics Data System (ADS)
Song, Lu-Kai; Wen, Jie; Fei, Cheng-Wei; Bai, Guang-Chen
2018-05-01
To improve the computing efficiency and precision of probabilistic design for multi-failure structures, a distributed collaborative probabilistic design method based on a fuzzy neural network of regression (FR), called DCFRM, is proposed by integrating the distributed collaborative response surface method with a fuzzy neural network regression model. The mathematical model of DCFRM is established and the probabilistic design idea behind DCFRM is introduced. The probabilistic analysis of a turbine blisk involving multiple failure modes (deformation failure, stress failure, and strain failure) was investigated with the proposed method, considering fluid-structure interaction. The distribution characteristics, reliability degree, and sensitivity degree of each failure mode and of the overall failure mode of the turbine blisk are obtained, which provides a useful reference for improving the performance and reliability of aeroengines. The comparison of methods shows that DCFRM reshapes the probabilistic analysis of multi-failure structures and improves the computing efficiency while keeping acceptable computational precision. Moreover, the proposed method offers useful insight for reliability-based design optimization of multi-failure structures and thereby also enriches the theory and methods of mechanical reliability design.
Peak-flow characteristics of Virginia streams
Austin, Samuel H.; Krstolic, Jennifer L.; Wiegand, Ute
2011-01-01
Peak-flow annual exceedance probabilities, also called probability-percent chance flow estimates, and regional regression equations are provided describing the peak-flow characteristics of Virginia streams. Statistical methods are used to evaluate peak-flow data. Analysis of Virginia peak-flow data collected from 1895 through 2007 is summarized. Methods are provided for estimating unregulated peak flow of gaged and ungaged streams. Station peak-flow characteristics identified by fitting the logarithms of annual peak flows to a Log Pearson Type III frequency distribution yield annual exceedance probabilities of 0.5, 0.4292, 0.2, 0.1, 0.04, 0.02, 0.01, 0.005, and 0.002 for 476 streamgaging stations. Stream basin characteristics computed using spatial data and a geographic information system are used as explanatory variables in regional regression model equations for six physiographic regions to estimate regional annual exceedance probabilities at gaged and ungaged sites. Weighted peak-flow values that combine annual exceedance probabilities computed from gaging station data and from regional regression equations provide improved peak-flow estimates. Text, figures, and lists are provided summarizing selected peak-flow sites, delineated physiographic regions, peak-flow estimates, basin characteristics, regional regression model equations, error estimates, definitions, data sources, and candidate regression model equations. This study supersedes previous studies of peak flows in Virginia.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brink, Carsten, E-mail: carsten.brink@rsyd.dk; Laboratory of Radiation Physics, Odense University Hospital; Bernchou, Uffe
2014-07-15
Purpose: Large interindividual variations in volume regression of non-small cell lung cancer (NSCLC) are observable on standard cone beam computed tomography (CBCT) during fractionated radiation therapy. Here, a method for automated assessment of tumor volume regression is presented and its potential use in response adapted personalized radiation therapy is evaluated empirically. Methods and Materials: Automated deformable registration with calculation of the Jacobian determinant was applied to serial CBCT scans in a series of 99 patients with NSCLC. Tumor volume at the end of treatment was estimated on the basis of the first one third and two thirds of the scans. The concordance between estimated and actual relative volume at the end of radiation therapy was quantified by Pearson's correlation coefficient. On the basis of the estimated relative volume, the patients were stratified into 2 groups having volume regressions below or above the population median value. Kaplan-Meier plots of locoregional disease-free rate and overall survival in the 2 groups were used to evaluate the predictive value of tumor regression during treatment. Cox proportional hazards model was used to adjust for other clinical characteristics. Results: Automatic measurement of the tumor regression from standard CBCT images was feasible. Pearson's correlation coefficient between manual and automatic measurement was 0.86 in a sample of 9 patients. Most patients experienced tumor volume regression, and this could be quantified early into the treatment course. Interestingly, patients with pronounced volume regression had worse locoregional tumor control and overall survival. This was significant in patients with non-adenocarcinoma histology. Conclusions: Evaluation of routinely acquired CBCT images during radiation therapy provides biological information on the specific tumor. This could potentially form the basis for personalized response adaptive therapy.
Ashrafi, Mahnaz; Bahmanabadi, Akram; Akhond, Mohammad Reza; Arabipoor, Arezoo
2015-11-01
To evaluate the demographic, medical history, and clinical cycle characteristics of infertile non-polycystic ovary syndrome (NPCOS) women and to investigate their associations with the prevalence of moderate-to-severe ovarian hyperstimulation syndrome (OHSS). In this retrospective study, among 7073 in vitro fertilization and/or intracytoplasmic sperm injection (IVF/ICSI) cycles, 86 NPCOS patients who developed moderate-to-severe OHSS while being treated with IVF/ICSI cycles were analyzed during the period of January 2008 to December 2010 at Royan Institute. To review the OHSS risk factors, 172 NPCOS patients who did not develop OHSS and were treated during the same period were randomly selected by computer as the control group. We used backward multiple logistic regression to build a prediction model. The regression analysis revealed that the variables age [odds ratio (OR) 0.9, confidence interval (CI) 0.81-0.99], antral follicle count (OR 4.3, CI 2.7-6.9), infertility cause (tubal factor, OR 11.5, CI 1.1-51.3), hypothyroidism (OR 3.8, CI 1.5-9.4), and a positive history of ovarian surgery (OR 0.2, CI 0.05-0.9) were the most important predictors of OHSS. The regression model had an area under the curve of 0.94, a discriminative performance equal to that of two strong predictive variables, the number of follicles and the serum estradiol level on the day of human chorionic gonadotropin administration. The predictive regression model based on primary characteristics of NPCOS patients had equal specificity compared with the two aforementioned strong predictive variables. Therefore, it may be beneficial to apply this model before the beginning of the ovarian stimulation protocol.
NASA Astrophysics Data System (ADS)
Saputro, Dewi Retno Sari; Widyaningsih, Purnami
2017-08-01
In general, parameter estimation for the GWOLR model uses the maximum likelihood method, but this leads to a system of nonlinear equations that is difficult to solve, so an approximate numerical solution is needed. There are two popular numerical methods: Newton's method and quasi-Newton (QN) methods. Newton's method requires considerable computation time because it involves the Jacobian matrix (derivatives). QN methods overcome this drawback of Newton's method by replacing derivative computations with direct function evaluations. The QN approach uses a Hessian-matrix approximation that includes the Davidon-Fletcher-Powell (DFP) formula. The Broyden-Fletcher-Goldfarb-Shanno (BFGS) method is a QN method that shares the DFP property of maintaining a positive definite Hessian approximation. The BFGS method requires large memory when executing the program, so an algorithm that decreases memory usage is needed, namely limited-memory BFGS (LBFGS). The purpose of this research is to assess the efficiency of the LBFGS method in the iterative and recursive computation of the Hessian matrix and its inverse for GWOLR parameter estimation. Based on the research findings, the BFGS and LBFGS methods have arithmetic operation counts of O(n2) and O(nm), respectively.
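As a stand-in for the GWOLR estimation problem (whose likelihood is not reproduced in the abstract), the sketch below maximizes an ordinary binary logistic log-likelihood with SciPy's L-BFGS-B routine; the option maxcor is the number of stored correction pairs, i.e. the m that appears in the O(nm) cost noted above. The model, data, and settings are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

# Stand-in example: maximum-likelihood estimation of an ordinary (not geographically
# weighted) binary logistic model with a limited-memory quasi-Newton method.
rng = np.random.default_rng(7)
X = np.column_stack([np.ones(400), rng.normal(size=(400, 2))])
true_beta = np.array([-0.3, 1.0, -0.7])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta)))

def neg_loglik(beta):
    eta = X @ beta
    # log-likelihood of the logistic model: sum[y*eta - log(1 + exp(eta))]
    return -(y @ eta - np.logaddexp(0.0, eta).sum())

def gradient(beta):
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return -(X.T @ (y - p))                  # analytic gradient; no Hessian is ever formed

res = minimize(neg_loglik, x0=np.zeros(3), jac=gradient, method="L-BFGS-B",
               options={"maxcor": 10})       # maxcor = number of stored correction pairs (m)
print(res.x.round(2))
```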
Lee, Michael T.; Asquith, William H.; Oden, Timothy D.
2012-01-01
In December 2005, the U.S. Geological Survey (USGS), in cooperation with the City of Houston, Texas, began collecting discrete water-quality samples for nutrients, total organic carbon, bacteria (Escherichia coli and total coliform), atrazine, and suspended sediment at two USGS streamflow-gaging stations that represent watersheds contributing to Lake Houston (08068500 Spring Creek near Spring, Tex., and 08070200 East Fork San Jacinto River near New Caney, Tex.). Data from the discrete water-quality samples collected during 2005–9, in conjunction with continuously monitored real-time data that included streamflow and other physical water-quality properties (specific conductance, pH, water temperature, turbidity, and dissolved oxygen), were used to develop regression models for the estimation of concentrations of water-quality constituents of substantial source watersheds to Lake Houston. The potential explanatory variables included discharge (streamflow), specific conductance, pH, water temperature, turbidity, dissolved oxygen, and time (to account for seasonal variations inherent in some water-quality data). The response variables (the selected constituents) at each site were nitrite plus nitrate nitrogen, total phosphorus, total organic carbon, E. coli, atrazine, and suspended sediment. The explanatory variables provide easily measured quantities to serve as potential surrogate variables to estimate concentrations of the selected constituents through statistical regression. Statistical regression also facilitates accompanying estimates of uncertainty in the form of prediction intervals. Each regression model potentially can be used to estimate concentrations of a given constituent in real time. Among other regression diagnostics, the diagnostics used as indicators of general model reliability and reported herein include the adjusted R-squared, the residual standard error, residual plots, and p-values. Adjusted R-squared values for the Spring Creek models ranged from .582–.922 (dimensionless). The residual standard errors ranged from .073–.447 (base-10 logarithm). Adjusted R-squared values for the East Fork San Jacinto River models ranged from .253–.853 (dimensionless). The residual standard errors ranged from .076–.388 (base-10 logarithm). In conjunction with estimated concentrations, constituent loads can be estimated by multiplying the estimated concentration by the corresponding streamflow and by applying the appropriate conversion factor. The regression models presented in this report are site specific, that is, they are specific to the Spring Creek and East Fork San Jacinto River streamflow-gaging stations; however, the general methods that were developed and documented could be applied to most perennial streams for the purpose of estimating real-time water quality data.
Data-driven discovery of partial differential equations.
Rudy, Samuel H; Brunton, Steven L; Proctor, Joshua L; Kutz, J Nathan
2017-04-01
We propose a sparse regression method capable of discovering the governing partial differential equation(s) of a given system by time series measurements in the spatial domain. The regression framework relies on sparsity-promoting techniques to select the nonlinear and partial derivative terms of the governing equations that most accurately represent the data, bypassing a combinatorially large search through all possible candidate models. The method balances model complexity and regression accuracy by selecting a parsimonious model via Pareto analysis. Time series measurements can be made in an Eulerian framework, where the sensors are fixed spatially, or in a Lagrangian framework, where the sensors move with the dynamics. The method is computationally efficient, robust, and demonstrated to work on a variety of canonical problems spanning a number of scientific domains including Navier-Stokes, the quantum harmonic oscillator, and the diffusion equation. Moreover, the method is capable of disambiguating between potentially nonunique dynamical terms by using multiple time series taken with different initial data. Thus, for a traveling wave, the method can distinguish between a linear wave equation and the Korteweg-de Vries equation, for instance. The method provides a promising new technique for discovering governing equations and physical laws in parameterized spatiotemporal systems, where first-principles derivations are intractable.
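The core sparse-regression step, sequentially thresholded least squares over a library of candidate terms, can be sketched as follows. Here the derivative columns are random stand-ins rather than finite differences of an actual solution field, so only the term-selection step is illustrated, and the Burgers-like coefficients are invented.

```python
import numpy as np

def stlsq(Theta, ut, threshold=0.05, n_iter=10):
    """Sequentially thresholded least squares: the sparse-regression step used to
    select active terms from a library of candidate PDE terms."""
    xi = np.linalg.lstsq(Theta, ut, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        big = ~small
        if big.any():
            xi[big] = np.linalg.lstsq(Theta[:, big], ut, rcond=None)[0]
    return xi

# Toy library built from assumed precomputed derivatives of u on a space-time grid:
# columns are candidate terms [1, u, u_x, u_xx, u*u_x, u^2]; the "true" dynamics here
# are u_t = 0.1*u_xx - 1.0*u*u_x (Burgers-like), plus measurement noise.
rng = np.random.default_rng(8)
n = 2000
u, ux, uxx = rng.normal(size=(3, n))
Theta = np.column_stack([np.ones(n), u, ux, uxx, u * ux, u ** 2])
ut = 0.1 * uxx - 1.0 * u * ux + 0.01 * rng.normal(size=n)

names = ["1", "u", "u_x", "u_xx", "u*u_x", "u^2"]
for name, c in zip(names, stlsq(Theta, ut)):
    if c != 0.0:
        print(f"{name}: {c:+.3f}")
```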
Garmy, Pernilla; Clausson, Eva K; Nyberg, Per; Jakobsson, Ulf
2014-06-01
The aim of this cross-sectional study was to investigate the prevalence of overweight and obesity in children and adolescents (6-16 years), and the relationships between being overweight and sleep, experience of fatigue, enjoyment of school, and time spent watching television and sitting at the computer. Trained school nurses measured the weight and height of 2891 children aged 6, 7, 10, 14, and 16, and distributed a questionnaire to them regarding television and computer habits, sleep, and enjoyment of school. Overweight, obesity included, was present in 16.1% of the study population. Relationships between lifestyle factors and overweight were studied using multivariate logistic regression analysis. Having a bedroom television and spending more than 2 h a day watching television were found to be associated with overweight (OR 1.26 and 1.55, respectively). No association was found between overweight and time spent at the computer, short sleep duration, enjoyment of school, tiredness at school, or difficulties in sleeping and waking up. It is recommended that the school health service discuss media habits with pupils so as to promote their maintaining a healthy lifestyle. © 2013 Wiley Publishing Asia Pty Ltd.
Mapping Bone Mineral Density Obtained by Quantitative Computed Tomography to Bone Volume Fraction
NASA Technical Reports Server (NTRS)
Pennline, James A.; Mulugeta, Lealem
2017-01-01
Methods for relating or mapping estimates of volumetric Bone Mineral Density (vBMD) obtained by Quantitative Computed Tomography to Bone Volume Fraction (BVF) are outlined mathematically. The methods are based on definitions of bone properties, cited experimental studies, and regression relations derived from them for trabecular bone in the proximal femur. Using an experimental range of values in the intertrochanteric region obtained from male and female human subjects, age 18 to 49, the BVF values calculated from four different methods were compared to the experimental average and numerical range. The BVF values computed from the conversion method used data from two sources. One source provided pre-bed-rest vBMD values in the intertrochanteric region from 24 bed rest subjects who participated in a 70-day study. Another source contained preflight vBMD values from 18 astronauts who spent 4 to 6 months on the ISS. To aid the use of a mapping from BMD to BVF, the discussion includes how to formulate the mappings for the purpose of computational modeling. One application of the conversions is to aid modeling of time-varying changes in vBMD as they relate to changes in BVF via bone remodeling and/or modeling.
USDA-ARS's Scientific Manuscript database
Data on individual daily feed intake, bi-weekly BW, and carcass composition were obtained on 1,212 crossbred steers, in Cycle VII of the Germplasm Evaluation Project at the U.S. Meat Animal Research Center. Within animal regressions of cumulative feed intake and BW on linear and quadratic days on fe...
Liley, Helen; Zhang, Ju; Firth, Elwyn; Fernandez, Justin; Besier, Thor
2017-11-01
Population variance in bone shape is an important consideration when applying the results of subject-specific computational models to a population. In this letter, we demonstrate the ability of partial least squares regression to provide an improved shape prediction of the equine third metacarpal epiphysis, using two easily obtained measurements.
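A minimal sketch of the partial least squares regression idea is shown below, assuming synthetic data: two easily obtained scalar measurements per bone are used to predict a small set of shape-model weights. The variable names and dimensions are illustrative only and do not reproduce the letter's actual shape model.

    # Sketch of PLS regression predicting shape-model weights from two measurements.
    # Data are synthetic; names and dimensions are illustrative assumptions.
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    rng = np.random.default_rng(0)
    n_bones = 40
    measurements = rng.normal(size=(n_bones, 2))           # e.g., bone length and width
    true_map = rng.normal(size=(2, 5))
    shape_weights = measurements @ true_map + 0.1 * rng.normal(size=(n_bones, 5))

    pls = PLSRegression(n_components=2)
    pls.fit(measurements, shape_weights)                   # learn measurement -> shape map
    predicted = pls.predict(measurements[:1])              # predict shape weights for one bone
    print(predicted.round(3))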
ERIC Educational Resources Information Center
Beauducel, Andre
2007-01-01
It was investigated whether commonly used factor score estimates lead to the same reproduced covariance matrix of observed variables. This was achieved by means of Schonemann and Steiger's (1976) regression component analysis, since it is possible to compute the reproduced covariance matrices of the regression components corresponding to different…
ERIC Educational Resources Information Center
Preacher, Kristopher J.; Curran, Patrick J.; Bauer, Daniel J.
2006-01-01
Simple slopes, regions of significance, and confidence bands are commonly used to evaluate interactions in multiple linear regression (MLR) models, and the use of these techniques has recently been extended to multilevel or hierarchical linear modeling (HLM) and latent curve analysis (LCA). However, conducting these tests and plotting the…
Generating patient specific pseudo-CT of the head from MR using atlas-based regression
NASA Astrophysics Data System (ADS)
Sjölund, J.; Forsberg, D.; Andersson, M.; Knutsson, H.
2015-01-01
Radiotherapy planning and attenuation correction of PET images require simulation of radiation transport. The necessary physical properties are typically derived from computed tomography (CT) images, but in some cases, including stereotactic neurosurgery and combined PET/MR imaging, only magnetic resonance (MR) images are available. With these applications in mind, we describe how a realistic, patient-specific, pseudo-CT of the head can be derived from anatomical MR images. We refer to the method as atlas-based regression, because of its similarity to atlas-based segmentation. Given a target MR and an atlas database comprising MR and CT pairs, atlas-based regression works by registering each atlas MR to the target MR, applying the resulting displacement fields to the corresponding atlas CTs and, finally, fusing the deformed atlas CTs into a single pseudo-CT. We use a deformable registration algorithm known as the Morphon and augment it with a certainty mask that allows a tailoring of the influence certain regions are allowed to have on the registration. Moreover, we propose a novel method of fusion, wherein the collection of deformed CTs is iteratively registered to their joint mean and find that the resulting mean CT becomes more similar to the target CT. However, the voxelwise median provided even better results; at least as good as earlier work that required special MR imaging techniques. This makes atlas-based regression a good candidate for clinical use.
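The fusion step lends itself to a short sketch: once the atlas CTs have been deformed into the target space, the pseudo-CT can be formed by a voxelwise statistic. The snippet below, using synthetic arrays as stand-ins for registered CT volumes, contrasts the voxelwise median (the variant reported above to work at least as well as earlier methods) with a simple mean; the registration itself is assumed to have been done elsewhere.

    # Sketch of the fusion step of atlas-based regression: combine already-deformed
    # atlas CTs into a pseudo-CT by taking the voxelwise median.
    import numpy as np

    rng = np.random.default_rng(1)
    n_atlases, shape = 8, (16, 16, 16)
    deformed_cts = rng.normal(40.0, 10.0, size=(n_atlases, *shape))  # Hounsfield-like values

    pseudo_ct_median = np.median(deformed_cts, axis=0)   # voxelwise median fusion
    pseudo_ct_mean = np.mean(deformed_cts, axis=0)       # simple mean fusion, for comparison
    print(pseudo_ct_median.shape, float(pseudo_ct_median.mean()), float(pseudo_ct_mean.mean()))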
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, I-Ming; Chen, Po-Lin; Huang, Chun-Yang
Purpose: The purpose of this study was to determine factors associated with entire aortic remodeling after thoracic endovascular aortic repair (TEVAR) in patients with type B dissection. Materials and Methods: Patients with type B (IIIb) dissections who underwent TEVAR from 2006 to 2013 with a minimum of 2 years of follow-up computed tomography data were retrospectively reviewed. Based on the status of false lumen remodeling of the entire aorta, patients were divided into three groups: complete regression, total thrombosis, and inadequate regression with a patent abdominal false lumen. Results: A total of 90 patients (72 males, 18 females; mean age 56.6 ± 16.4 years) were included and divided into the complete regression (n = 22), total thrombosis (n = 18), and inadequate regression (n = 50) groups. Multivariate logistic regression analysis indicated that dissection extension to the iliac arteries, an increased preoperative number of dissection tears over the abdominal aorta, and a decreased preoperative abdominal aorta bifurcation true lumen ratio, as compared between the inadequate and complete regression groups, were associated with a persistent false lumen (odds ratio = 33.33, 2.304, and 0.021; all p ≤ 0.012). Comparison of 6-, 12-, and 24-month postoperative data revealed no significant differences at any level, suggesting that the true lumen area ratio might not change after 6 months postoperatively. Conclusions: An increased preoperative number of dissection tears around the abdominal visceral branches, dissection extension to the iliac arteries, and a decreased preoperative true lumen area ratio of the abdominal aorta are predictive of entire aortic remodeling after TEVAR in patients with type B dissection. Level of Evidence: III.
Pareto fronts for multiobjective optimization design on materials data
NASA Astrophysics Data System (ADS)
Gopakumar, Abhijith; Balachandran, Prasanna; Gubernatis, James E.; Lookman, Turab
Optimizing multiple properties simultaneously is vital in materials design. Here we apply information driven, statistical optimization strategies blended with machine learning methods, to address multi-objective optimization tasks on materials data. These strategies aim to find the Pareto front consisting of non-dominated data points from a set of candidate compounds with known characteristics. The objective is to find the Pareto front in as few additional measurements or calculations as possible. We show how exploration of the data space to find the front is achieved by using uncertainties in predictions from regression models. We test our proposed design strategies on multiple, independent data sets including those from computations as well as experiments. These include data sets for Max phases, piezoelectrics and multicomponent alloys.
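The core of the task above is identifying the non-dominated set. Below is a small, self-contained sketch of Pareto-front extraction for two properties that are both to be maximized, using synthetic candidate data; the uncertainty-driven selection of new measurements described in the abstract is not reproduced here.

    # Sketch of extracting the Pareto front (non-dominated set) from candidate
    # materials with two properties to maximize. Data are synthetic placeholders.
    import numpy as np

    def pareto_front(values: np.ndarray) -> np.ndarray:
        """Return a boolean mask of non-dominated rows (all objectives maximized)."""
        n = values.shape[0]
        mask = np.ones(n, dtype=bool)
        for i in range(n):
            if not mask[i]:
                continue
            # Point i is dominated if some other point is >= in every objective
            # and strictly > in at least one.
            dominates = np.all(values >= values[i], axis=1) & np.any(values > values[i], axis=1)
            if dominates.any():
                mask[i] = False
        return mask

    rng = np.random.default_rng(2)
    props = rng.random((50, 2))            # e.g., two target properties per compound
    front = props[pareto_front(props)]
    print(f"{front.shape[0]} non-dominated candidates out of {props.shape[0]}")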
A Computer Program for Preliminary Data Analysis
Dennis L. Schweitzer
1967-01-01
ABSTRACT. -- A computer program written in FORTRAN has been designed to summarize data. Class frequencies, means, and standard deviations are printed for as many as 100 independent variables. Cross-classifications of an observed dependent variable and of a dependent variable predicted by a multiple regression equation can also be generated.
An Adaptive Cross-Architecture Combination Method for Graph Traversal
DOE Office of Scientific and Technical Information (OSTI.GOV)
You, Yang; Song, Shuaiwen; Kerbyson, Darren J.
2014-06-18
Breadth-First Search (BFS) is widely used in many real-world applications including computational biology, social networks, and electronic design automation. The combination method, using both top-down and bottom-up techniques, is the most effective BFS approach. However, current combination methods rely on trial-and-error and exhaustive search to locate the optimal switching point, which may cause significant runtime overhead. To solve this problem, we design an adaptive method based on regression analysis to predict an optimal switching point for the combination method at runtime within less than 0.1% of the BFS execution time.
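The general idea of a regression-based switching predictor can be sketched as follows, under the assumption of hypothetical per-graph features and offline-measured "best" switching points; this is a generic illustration, not the feature set or model used in the paper.

    # Generic sketch: fit a regression from graph features to an observed best
    # top-down/bottom-up switching frontier fraction, then predict at runtime.
    # Features and training data are synthetic placeholders.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(3)
    # Hypothetical per-graph features: log(#vertices), log(#edges), average degree.
    features = rng.normal(size=(100, 3))
    # Hypothetical best switching frontier fraction measured offline per graph.
    best_switch_fraction = 0.05 + 0.02 * features[:, 2] + 0.005 * rng.normal(size=100)

    model = LinearRegression().fit(features, best_switch_fraction)

    def predicted_switch_threshold(graph_features, n_vertices):
        """Frontier size at which to switch from top-down to bottom-up BFS."""
        frac = float(model.predict(np.asarray(graph_features).reshape(1, -1))[0])
        return max(1, int(frac * n_vertices))

    print(predicted_switch_threshold([10.0, 12.0, 4.0], n_vertices=1_000_000))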
Factors influencing exemplary science teachers' levels of computer use
NASA Astrophysics Data System (ADS)
Hakverdi, Meral
This study examines exemplary science teachers' use of technology in science instruction, factors influencing their level of computer use, their level of knowledge/skills in using specific computer applications for science instruction, their use of computer-related applications/tools during their instruction, and their students' use of computer applications/tools in or for their science class. After a relevant review of the literature, certain variables were selected for analysis. These variables included personal self-efficacy in teaching with computers, outcome expectancy, pupil-control ideology, level of computer use, age, gender, teaching experience, personal computer use, professional computer use and science teachers' level of knowledge/skills in using specific computer applications for science instruction. The sample for this study includes middle and high school science teachers who received the Presidential Award for Excellence in Science Teaching (sponsored by the White House and the National Science Foundation) between the years 1997 and 2003 from all 50 states and U.S. territories. Award-winning science teachers were contacted about the survey via e-mail or letter with an enclosed return envelope. Of the 334 award-winning science teachers, usable responses were received from 92 science teachers, a response rate of 27.5%. Analysis of the survey responses indicated that exemplary science teachers have a variety of knowledge/skills in using computer-related applications/tools. The most commonly used computer applications/tools are information retrieval via the Internet, presentation tools, online communication, digital cameras, and data collection probes. Results of the study revealed that students' use of technology in their science classroom is highly correlated with the frequency of their science teachers' use of computer applications/tools. The results of the multiple regression analysis revealed that personal self-efficacy was related to the exemplary science teachers' level of computer use, suggesting that computer use depends on perceived ability to use computers. The teachers' use of computer-related applications/tools during class and their personal self-efficacy, age, and gender were highly related to their level of knowledge/skills in using specific computer applications for science instruction. The teachers' level of knowledge/skills in using specific computer applications for science instruction and gender were related to their use of computer-related applications/tools during class and to the students' use of computer-related applications/tools in or for their science class. In conclusion, exemplary science teachers need assistance in learning and using computer-related applications/tools in their science classes.
Soft computing techniques toward modeling the water supplies of Cyprus.
Iliadis, L; Maris, F; Tachos, S
2011-10-01
This research effort aims at the application of soft computing techniques toward water resources management. More specifically, the target is the development of reliable soft computing models capable of estimating the water supply for the case of the "Germasogeia" mountainous watersheds in Cyprus. Initially, ε-Regression Support Vector Machines (ε-RSVM) and fuzzy weighted ε-RSVMR models have been developed that accept five input parameters. At the same time, reliable artificial neural networks have been developed to perform the same job. The 5-fold cross-validation approach has been employed in order to eliminate bad local behaviors and to produce a more representative training data set. Thus, the fuzzy weighted Support Vector Regression (SVR) combined with the fuzzy partition has been employed in an effort to enhance the quality of the results. Several rational and reliable models have been produced that can enhance the efficiency of water policy designers. Copyright © 2011 Elsevier Ltd. All rights reserved.
Mosbrucker, Adam; Spicer, Kurt R.; Christianson, Tami; Uhrich, Mark A.
2015-01-01
Fluvial sediment, a vital surface water resource, is hazardous in excess. Suspended sediment, the most prevalent source of impairment of river systems, can adversely affect flood control, navigation, fisheries and aquatic ecosystems, recreation, and water supply (e.g., Rasmussen et al., 2009; Qu, 2014). Monitoring programs typically focus on suspended-sediment concentration (SSC) and discharge (SSQ). These time-series data are used to study changes to basin hydrology, geomorphology, and ecology caused by disturbances. The U.S. Geological Survey (USGS) has traditionally used physical sediment sample-based methods (Edwards and Glysson, 1999; Nolan et al., 2005; Gray et al., 2008) to compute SSC and SSQ from continuous streamflow data using a sediment transport-curve (e.g., Walling, 1977) or hydrologic interpretation (Porterfield, 1972). Accuracy of these data is typically constrained by the resources required to collect and analyze intermittent physical samples. Quantifying SSC using continuous instream turbidity is rapidly becoming common practice among sediment monitoring programs. Estimations of SSC and SSQ are modeled from linear regression analysis of concurrent turbidity and physical samples. Sediment-surrogate technologies such as turbidity promise near real-time information, increased accuracy, and reduced cost compared to traditional physical sample-based methods (Walling, 1977; Uhrich and Bragg, 2003; Gray and Gartner, 2009; Rasmussen et al., 2009; Landers et al., 2012; Landers and Sturm, 2013; Uhrich et al., 2014). Statistical comparisons among SSQ computation methods show that turbidity-SSC regression models can have much less uncertainty than streamflow-based sediment transport-curves or hydrologic interpretation (Walling, 1977; Lewis, 1996; Glysson et al., 2001; Lee et al., 2008). However, computation of SSC and SSQ records from continuous instream turbidity data is not without challenges; some of these include environmental fouling, calibration, and data range among sensors. Of greatest interest to many programs is a hysteresis in the relationship between turbidity and SSC, attributed to temporal variation of particle size distribution (Landers and Sturm, 2013; Uhrich et al., 2014). This phenomenon causes increased uncertainty in regression-estimated values of SSC, due to changes in nephelometric reflectance off the varying grain sizes in suspension (Uhrich et al., 2014). Here, we assess the feasibility and application of close-range remote sensing to quantify SSC and particle size distribution of a disturbed, and highly-turbid, river system. We use a consumer-grade digital camera to acquire imagery of the river surface and a depth-integrating sampler to collect concurrent suspended-sediment samples. We then develop two empirical linear regression models to relate image spectral information to concentrations of fine sediment (clay to silt) and total suspended sediment. Before presenting our regression model development, we briefly summarize each data-acquisition method.
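The kind of empirical linear regression described above can be sketched briefly: image spectral statistics are regressed against suspended-sediment concentrations from concurrent physical samples. The snippet below uses synthetic band means and SSC values as placeholders; the actual predictors and calibration data from the study are not reproduced.

    # Sketch of an empirical linear regression relating image spectral information
    # to suspended-sediment concentration (SSC). All data are synthetic placeholders.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(4)
    n_samples = 30
    # Hypothetical predictors: mean red, green, blue intensity of each river-surface image.
    band_means = rng.uniform(50, 200, size=(n_samples, 3))
    # Hypothetical SSC (mg/L) loosely tied to the red band, for illustration only.
    ssc_mg_l = 5.0 * band_means[:, 0] - 200.0 + rng.normal(0, 30, size=n_samples)

    model = LinearRegression().fit(band_means, ssc_mg_l)
    r_squared = model.score(band_means, ssc_mg_l)
    print("coefficients:", model.coef_.round(2), "R^2:", round(r_squared, 3))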
Sun, Yu; Reynolds, Hayley M; Wraith, Darren; Williams, Scott; Finnegan, Mary E; Mitchell, Catherine; Murphy, Declan; Haworth, Annette
2018-04-26
There are currently no methods to estimate cell density in the prostate. This study aimed to develop predictive models to estimate prostate cell density from multiparametric magnetic resonance imaging (mpMRI) data at a voxel level using machine learning techniques. In vivo mpMRI data were collected from 30 patients before radical prostatectomy. Sequences included T2-weighted imaging, diffusion-weighted imaging and dynamic contrast-enhanced imaging. Ground truth cell density maps were computed from histology and co-registered with mpMRI. Feature extraction and selection were performed on mpMRI data. Final models were fitted using three regression algorithms including multivariate adaptive regression spline (MARS), polynomial regression (PR) and generalised additive model (GAM). Model parameters were optimised using leave-one-out cross-validation on the training data and model performance was evaluated on test data using root mean square error (RMSE) measurements. Predictive models to estimate voxel-wise prostate cell density were successfully trained and tested using the three algorithms. The best model (GAM) achieved an RMSE of 1.06 (± 0.06) × 10³ cells/mm² and a relative deviation of 13.3 ± 0.8%. Prostate cell density can be quantitatively estimated non-invasively from mpMRI data using high-quality co-registered data at a voxel level. These cell density predictions could be used for tissue classification, treatment response evaluation and personalised radiotherapy.
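A minimal sketch of the evaluation loop is shown below: one of the named algorithms (polynomial regression) is fitted with leave-one-out cross-validation and scored by RMSE. The mpMRI features and cell density values here are synthetic placeholders, not the study's co-registered data.

    # Sketch: polynomial regression with leave-one-out cross-validation and RMSE.
    # Features and targets are synthetic stand-ins for mpMRI data and cell density.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import LeaveOneOut
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.default_rng(5)
    X = rng.normal(size=(30, 4))                    # e.g., T2-, DWI-, DCE-derived features
    y = 1.0 + X[:, 0] - 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=30)  # cell density proxy

    model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
    errors = []
    for train_idx, test_idx in LeaveOneOut().split(X):
        model.fit(X[train_idx], y[train_idx])
        errors.append(float(model.predict(X[test_idx])[0] - y[test_idx][0]))
    rmse = float(np.sqrt(np.mean(np.square(errors))))
    print("leave-one-out RMSE:", round(rmse, 3))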
Mullaney, John R.; Schwarz, Gregory E.
2013-01-01
The total nitrogen load to Long Island Sound from Connecticut and contributing areas to the north was estimated for October 1998 to September 2009. Discrete measurements of total nitrogen concentrations and continuous flow data from 37 water-quality monitoring stations in the Long Island Sound watershed were used to compute total annual nitrogen yields and loads. Total annual computed yields and basin characteristics were used to develop a generalized-least squares regression model for use in estimating the total nitrogen yields from unmonitored areas in coastal and central Connecticut. Significant variables in the regression included the percentage of developed land, percentage of row crops, point-source nitrogen yields from wastewater-treatment facilities, and annual mean streamflow. Computed annual median total nitrogen yields at individual monitoring stations ranged from less than 2,000 pounds per square mile in mostly forested basins (typically less than 10 percent developed land) to more than 13,000 pounds per square mile in urban basins (greater than 40 percent developed) with wastewater-treatment facilities and in one agricultural basin. Medians of computed total annual nitrogen yields for water years 1999–2009 at most stations were similar to those previously computed for water years 1988–98. However, computed medians of annual yields at several stations, including the Naugatuck River, Quinnipiac River, and Hockanum River, were lower than during 1988–98. Nitrogen yields estimated for 26 unmonitored areas downstream from monitoring stations ranged from less than 2,000 pounds per square mile to 34,000 pounds per square mile. Computed annual total nitrogen loads at the farthest downstream monitoring stations were combined with the corresponding estimates for the downstream unmonitored areas for a combined estimate of the total nitrogen load from the entire study area. Resulting combined total nitrogen loads ranged from 38 to 68 million pounds per year during water years 1999–2009. Total annual loads from the monitored basins represent 63 to 74 percent of the total load. Computed annual nitrogen loads from four stations near the Massachusetts border with Connecticut represent 52 to 54 percent of the total nitrogen load during water years 2008–9, the only years with data for all the border sites. During the latter part of the 1999–2009 study period, total nitrogen loads to Long Island Sound from the study area appeared to increase slightly. The apparent increase in loads may be due to higher than normal streamflows, which consequently increased nonpoint nitrogen loads during the study, offsetting major reductions of nitrogen from wastewater-treatment facilities. Nitrogen loads from wastewater treatment facilities declined as much as 2.3 million pounds per year in areas of Connecticut upstream from the monitoring stations and as much as 5.8 million pounds per year in unmonitored areas downstream in coastal and central Connecticut.
Li, Richard Y.; Di Felice, Rosa; Rohs, Remo; Lidar, Daniel A.
2018-01-01
Transcription factors regulate gene expression, but how these proteins recognize and specifically bind to their DNA targets is still debated. Machine learning models are effective means to reveal interaction mechanisms. Here we studied the ability of a quantum machine learning approach to predict binding specificity. Using simplified datasets of a small number of DNA sequences derived from actual binding affinity experiments, we trained a commercially available quantum annealer to classify and rank transcription factor binding. The results were compared to state-of-the-art classical approaches for the same simplified datasets, including simulated annealing, simulated quantum annealing, multiple linear regression, LASSO, and extreme gradient boosting. Despite technological limitations, we find a slight advantage in classification performance and nearly equal ranking performance using the quantum annealer for these fairly small training data sets. Thus, we propose that quantum annealing might be an effective method to implement machine learning for certain computational biology problems. PMID:29652405
System Identification Applied to Dynamic CFD Simulation and Wind Tunnel Data
NASA Technical Reports Server (NTRS)
Murphy, Patrick C.; Klein, Vladislav; Frink, Neal T.; Vicroy, Dan D.
2011-01-01
Demanding aerodynamic modeling requirements for military and civilian aircraft have provided impetus for researchers to improve computational and experimental techniques. Model validation is a key component for these research endeavors, so this study is an initial effort to extend conventional time history comparisons by comparing model parameter estimates and their standard errors using system identification methods. An aerodynamic model of an aircraft performing one-degree-of-freedom roll oscillatory motion about its body axes is developed. The model includes linear aerodynamics and deficiency function parameters characterizing an unsteady effect. For estimation of unknown parameters, two techniques, harmonic analysis and two-step linear regression, were applied to roll-oscillatory wind tunnel data and to computational fluid dynamics (CFD) simulated data. The model used for this study is a highly swept wing unmanned aerial combat vehicle. Differences in response prediction, parameter estimates, and standard errors are compared and discussed.
[Theory, method and application of method R on estimation of (co)variance components].
Liu, Wen-Zhong
2004-07-01
The theory, method and application of Method R for the estimation of (co)variance components were reviewed in order for the method to be used appropriately. Estimation requires R values, which are regressions of predicted random effects that are calculated using the complete dataset on predicted random effects that are calculated using random subsets of the same data. By using a multivariate iteration algorithm based on a transformation matrix, and combining it with the preconditioned conjugate gradient method to solve the mixed model equations, the computational efficiency of Method R is much improved. Method R is computationally inexpensive, and the sampling errors and approximate credible intervals of estimates can be obtained. Disadvantages of Method R include a larger sampling variance than other methods for the same data, and biased estimates in small datasets. As an alternative method, Method R can be used in larger datasets. It is necessary to study its theoretical properties and broaden its application range further.
Ching, Travers; Zhu, Xun; Garmire, Lana X
2018-04-01
Artificial neural networks (ANN) are computing architectures with many interconnections of simple neural-inspired computing elements, and have been applied to biomedical fields such as imaging analysis and diagnosis. We have developed a new ANN framework called Cox-nnet to predict patient prognosis from high throughput transcriptomics data. In 10 TCGA RNA-Seq data sets, Cox-nnet achieves the same or better predictive accuracy compared to other methods, including Cox-proportional hazards regression (with LASSO, ridge, and minimax concave penalty), Random Forests Survival and CoxBoost. Cox-nnet also reveals richer biological information, at both the pathway and gene levels. The outputs from the hidden layer node provide an alternative approach for survival-sensitive dimension reduction. In summary, we have developed a new method for accurate and efficient prognosis prediction on high throughput data, with functional biological insights. The source code is freely available at https://github.com/lanagarmire/cox-nnet.
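To make the underlying objective concrete, the sketch below evaluates the Cox partial log-likelihood for a plain linear predictor; Cox-nnet-style models optimize essentially this quantity with a neural network in place of the linear term. The data are synthetic and this is not the Cox-nnet implementation itself (ties are ignored for simplicity).

    # Conceptual sketch of the Cox partial log-likelihood with a linear predictor.
    # Synthetic data; illustrative only.
    import numpy as np

    rng = np.random.default_rng(6)
    n, p = 50, 5
    X = rng.normal(size=(n, p))            # expression-like features
    beta = rng.normal(size=p)              # linear "network" weights, for illustration
    times = rng.exponential(scale=1.0, size=n)
    events = rng.integers(0, 2, size=n)    # 1 = event observed, 0 = censored

    def cox_partial_log_likelihood(beta, X, times, events):
        eta = X @ beta
        ll = 0.0
        for i in np.where(events == 1)[0]:
            at_risk = times >= times[i]            # risk set at the i-th event time
            ll += eta[i] - np.log(np.sum(np.exp(eta[at_risk])))
        return ll

    print(round(cox_partial_log_likelihood(beta, X, times, events), 3))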
Curran, Janet H.; Barth, Nancy A.; Veilleux, Andrea G.; Ourso, Robert T.
2016-03-16
Estimates of the magnitude and frequency of floods are needed across Alaska for engineering design of transportation and water-conveyance structures, flood-insurance studies, flood-plain management, and other water-resource purposes. This report updates methods for estimating flood magnitude and frequency in Alaska and conterminous basins in Canada. Annual peak-flow data through water year 2012 were compiled from 387 streamgages on unregulated streams with at least 10 years of record. Flood-frequency estimates were computed for each streamgage using the Expected Moments Algorithm to fit a Pearson Type III distribution to the logarithms of annual peak flows. A multiple Grubbs-Beck test was used to identify potentially influential low floods in the time series of peak flows for censoring in the flood frequency analysis. For two new regional skew areas, flood-frequency estimates using station skew were computed for stations with at least 25 years of record for use in a Bayesian least-squares regression analysis to determine a regional skew value. The consideration of basin characteristics as explanatory variables for regional skew resulted in improvements in precision too small to warrant the additional model complexity, and a constant model was adopted. Regional Skew Area 1 in eastern-central Alaska had a regional skew of 0.54 and an average variance of prediction of 0.45, corresponding to an effective record length of 22 years. Regional Skew Area 2, encompassing coastal areas bordering the Gulf of Alaska, had a regional skew of 0.18 and an average variance of prediction of 0.12, corresponding to an effective record length of 59 years. Station flood-frequency estimates for study sites in regional skew areas were then recomputed using a weighted skew incorporating the station skew and regional skew. In a new regional skew exclusion area outside the regional skew areas, the density of long-record streamgages was too sparse for regional analysis and station skew was used for all estimates. Final station flood frequency estimates for all study streamgages are presented for the 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent annual exceedance probabilities. Regional multiple-regression analysis was used to produce equations for estimating flood frequency statistics from explanatory basin characteristics. Basin characteristics, including physical and climatic variables, were updated for all study streamgages using a geographical information system and geospatial source data. Screening for similar-sized nested basins eliminated hydrologically redundant sites, and screening for eligibility for analysis of explanatory variables eliminated regulated peaks, outburst peaks, and sites with indeterminate basin characteristics. An ordinary least-squares regression used flood-frequency statistics and basin characteristics for 341 streamgages (284 in Alaska and 57 in Canada) to determine the most suitable combination of basin characteristics for a flood-frequency regression model and to explore regional grouping of streamgages for explaining variability in flood-frequency statistics across the study area. The most suitable model for explaining flood frequency used drainage area and mean annual precipitation as explanatory variables for the entire study area as a region. Final regression equations for estimating the 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent annual exceedance probability discharge in Alaska and conterminous basins in Canada were developed using a generalized least-squares regression.
The average standard error of prediction for the regression equations for the various annual exceedance probabilities ranged from 69 to 82 percent, and the pseudo-coefficient of determination (pseudo-R²) ranged from 85 to 91 percent. The regional regression equations from this study were incorporated into the U.S. Geological Survey StreamStats program for a limited area of the State—the Cook Inlet Basin. StreamStats is a national web-based geographic information system application that facilitates retrieval of streamflow statistics and associated information. StreamStats retrieves published data for gaged sites and, for user-selected ungaged sites, delineates drainage areas from topographic and hydrographic data, computes basin characteristics, and computes flood frequency estimates using the regional regression equations.
SEMIPARAMETRIC QUANTILE REGRESSION WITH HIGH-DIMENSIONAL COVARIATES
Zhu, Liping; Huang, Mian; Li, Runze
2012-01-01
This paper is concerned with quantile regression for a semiparametric regression model, in which both the conditional mean and conditional variance function of the response given the covariates admit a single-index structure. This semiparametric regression model enables us to reduce the dimension of the covariates and simultaneously retains the flexibility of nonparametric regression. Under mild conditions, we show that the simple linear quantile regression offers a consistent estimate of the index parameter vector. This is a surprising and interesting result because the single-index model is possibly misspecified under the linear quantile regression. With a root-n consistent estimate of the index vector, one may employ a local polynomial regression technique to estimate the conditional quantile function. This procedure is computationally efficient, which is very appealing in high-dimensional data analysis. We show that the resulting estimator of the quantile function performs asymptotically as efficiently as if the true value of the index vector were known. The methodologies are demonstrated through comprehensive simulation studies and an application to a real dataset. PMID:24501536
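The first stage described above can be illustrated with a short sketch: a simple linear quantile (median) regression of the response on the covariates, whose coefficient direction serves as an estimate of the index vector up to scale. The data are synthetic and the subsequent local polynomial stage is omitted; statsmodels' quantile regression is used for convenience.

    # Sketch: linear median regression as a consistent estimate of the index direction.
    # Synthetic single-index data; second-stage local polynomial regression omitted.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(7)
    n = 500
    x1, x2 = rng.normal(size=n), rng.normal(size=n)
    index = 0.8 * x1 + 0.6 * x2                       # true single index
    y = np.sin(index) + (0.5 + 0.2 * np.abs(index)) * rng.normal(size=n)

    df = pd.DataFrame({"y": y, "x1": x1, "x2": x2})
    fit = smf.quantreg("y ~ x1 + x2", df).fit(q=0.5)  # median regression
    coefs = fit.params[["x1", "x2"]].to_numpy()
    direction = coefs / np.linalg.norm(coefs)         # estimated index direction (up to scale)
    print("estimated index direction:", direction.round(3))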
Zhang, J; Feng, J-Y; Ni, Y-L; Wen, Y-J; Niu, Y; Tamba, C L; Yue, C; Song, Q; Zhang, Y-M
2017-06-01
Multilocus genome-wide association studies (GWAS) have become the state-of-the-art procedure to identify quantitative trait nucleotides (QTNs) associated with complex traits. However, implementation of multilocus models in GWAS is still difficult. In this study, we integrated least angle regression with empirical Bayes to perform multilocus GWAS under polygenic background control. We used an algorithm of model transformation that whitened the covariance matrix of the polygenic matrix K and environmental noise. Markers on one chromosome were included simultaneously in a multilocus model and least angle regression was used to select the most potentially associated single-nucleotide polymorphisms (SNPs), whereas the markers on the other chromosomes were used to calculate the kinship matrix as polygenic background control. The selected SNPs in the multilocus model were further tested for their association with the trait by empirical Bayes and a likelihood ratio test. We herein refer to this method as pLARmEB (polygenic-background-control-based least angle regression plus empirical Bayes). Results from simulation studies showed that pLARmEB was more powerful in QTN detection and more accurate in QTN effect estimation, had a lower false positive rate and required less computing time than the Bayesian hierarchical generalized linear model, efficient mixed model association (EMMA) and least angle regression plus empirical Bayes. pLARmEB, multilocus random-SNP-effect mixed linear model and fast multilocus random-SNP-effect EMMA methods had almost equal power of QTN detection in simulation experiments. However, only pLARmEB identified 48 previously reported genes for 7 flowering time-related traits in Arabidopsis thaliana.
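The variable-selection idea can be sketched in a few lines: within one chromosome, a least angle regression path (here scikit-learn's LassoLars) selects a small set of candidate SNPs that would then be passed to the empirical Bayes step. Genotypes and phenotypes below are synthetic, and the polygenic-background transformation of the real method is omitted.

    # Sketch of LARS/lasso-based SNP pre-selection on one chromosome.
    # Synthetic genotypes/phenotypes; polygenic background control omitted.
    import numpy as np
    from sklearn.linear_model import LassoLars

    rng = np.random.default_rng(8)
    n_individuals, n_snps = 200, 500
    genotypes = rng.integers(0, 3, size=(n_individuals, n_snps)).astype(float)  # 0/1/2 coding
    causal = [10, 250, 400]
    phenotype = genotypes[:, causal] @ np.array([0.8, -0.6, 0.5]) + rng.normal(size=n_individuals)

    lars = LassoLars(alpha=0.05)
    lars.fit(genotypes, phenotype)
    selected = np.flatnonzero(lars.coef_)        # SNPs kept for the downstream testing step
    print("selected SNP indices:", selected)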
NASA Astrophysics Data System (ADS)
Mirniaharikandehei, Seyedehnafiseh; Hollingsworth, Alan B.; Patel, Bhavika; Heidari, Morteza; Liu, Hong; Zheng, Bin
2018-05-01
This study aims to investigate the feasibility of identifying a new quantitative imaging marker based on false-positives generated by a computer-aided detection (CAD) scheme to help predict short-term breast cancer risk. An image dataset including four view mammograms acquired from 1044 women was retrospectively assembled. All mammograms were originally interpreted as negative by radiologists. In the next subsequent mammography screening, 402 women were diagnosed with breast cancer and 642 remained negative. An existing CAD scheme was applied ‘as is’ to process each image. From CAD-generated results, four detection features including the total number of (1) initial detection seeds and (2) the final detected false-positive regions, (3) average and (4) sum of detection scores, were computed from each image. Then, by combining the features computed from two bilateral images of left and right breasts from either craniocaudal or mediolateral oblique view, two logistic regression models were trained and tested using a leave-one-case-out cross-validation method to predict the likelihood of each testing case being positive in the next subsequent screening. The new prediction model yielded the maximum prediction accuracy with an area under a ROC curve of AUC = 0.65 ± 0.017 and the maximum adjusted odds ratio of 4.49 with a 95% confidence interval of (2.95, 6.83). The results also showed an increasing trend in the adjusted odds ratio and risk prediction scores (p < 0.01). Thus, this study demonstrated that CAD-generated false-positives might include valuable information, which needs to be further explored for identifying and/or developing more effective imaging markers for predicting short-term breast cancer risk.
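The prediction step described above lends itself to a short sketch: CAD-derived case features feed a logistic regression model evaluated with leave-one-case-out cross-validation and ROC AUC. The features and labels below are synthetic placeholders, not the study's mammography data.

    # Sketch: logistic regression on CAD-derived features with leave-one-case-out CV.
    # Synthetic data; illustrative only.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import LeaveOneOut
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(9)
    n_cases = 120
    features = rng.normal(size=(n_cases, 4))        # e.g., seed count, FP count, mean/sum score
    labels = (features[:, 1] + 0.5 * rng.normal(size=n_cases) > 0).astype(int)

    scores = np.zeros(n_cases)
    for train_idx, test_idx in LeaveOneOut().split(features):
        clf = LogisticRegression(max_iter=1000).fit(features[train_idx], labels[train_idx])
        scores[test_idx] = clf.predict_proba(features[test_idx])[:, 1]

    print("leave-one-case-out AUC:", round(roc_auc_score(labels, scores), 3))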
Determining degree of optic nerve edema from color fundus photography
NASA Astrophysics Data System (ADS)
Agne, Jason; Wang, Jui-Kai; Kardon, Randy H.; Garvin, Mona K.
2015-03-01
Swelling of the optic nerve head (ONH) is subjectively assessed by clinicians using the Frisén scale. It is believed that a direct measurement of the ONH volume would serve as a better representation of the swelling. However, a direct measurement requires optic nerve imaging with spectral domain optical coherence tomography (SD-OCT) and 3D segmentation of the resulting images, which is not always available during clinical evaluation. Furthermore, telemedical imaging of the eye at remote locations is more feasible with non-mydriatic fundus cameras which are less costly than OCT imagers. Therefore, there is a critical need to develop a more quantitative analysis of optic nerve swelling on a continuous scale, similar to SD-OCT. Here, we select features from more commonly available 2D fundus images and use them to predict ONH volume. Twenty-six features were extracted from each of 48 color fundus images. The features include attributes of the blood vessels, optic nerve head, and peripapillary retina areas. These features were used in a regression analysis to predict ONH volume, as computed by a segmentation of the SD-OCT image. The results of the regression analysis yielded a mean square error of 2.43 mm³ and a correlation coefficient between computed and predicted volumes of R = 0.771, which suggests that ONH volume may be predicted from fundus features alone.
Tachinami, H; Tomihara, K; Fujiwara, K; Nakamori, K; Noguchi, M
2017-11-01
A retrospective cohort study was performed to assess the clinical usefulness of combination assessment using computed tomography (CT) images in patients undergoing third molar extraction. This study included 85 patients (124 extraction sites). The relationship between cortication status, buccolingual position, and shape of the inferior alveolar canal (IAC) on CT images and the incidence of inferior alveolar nerve (IAN) injury after third molar extraction was evaluated. IAN injury was observed at eight of the 124 sites (6.5%), and in five of 19 sites (26.3%) in which cortication was absent, the IAC had a lingual position, and the IAC had a dumbbell shape. Significant relationships were found between IAN injury and the three IAC factors (cortication status, IAC position, and IAC shape; P=0.0001). In patients with the three IAC factors, logistic regression analysis indicated a strong association between these factors and IAN injury (P=0.007). An absence of cortication, a lingually positioned IAC, and a dumbbell-shaped IAC are considered to indicate a high risk of IAN injury according to the logistic regression analysis (P=0.007). These results suggest that a combined assessment of these three IAC factors could be useful for the improved prediction of IAN injury. Copyright © 2017 International Association of Oral and Maxillofacial Surgeons. Published by Elsevier Ltd. All rights reserved.
ERIC Educational Resources Information Center
Maggin, Daniel M.; Swaminathan, Hariharan; Rogers, Helen J.; O'Keeffe, Breda V.; Sugai, George; Horner, Robert H.
2011-01-01
A new method for deriving effect sizes from single-case designs is proposed. The strategy is applicable to small-sample time-series data with autoregressive errors. The method uses Generalized Least Squares (GLS) to model the autocorrelation of the data and estimate regression parameters to produce an effect size that represents the magnitude of…
Predictive and mechanistic multivariate linear regression models for reaction development
Santiago, Celine B.; Guo, Jing-Yao
2018-01-01
Multivariate Linear Regression (MLR) models utilizing computationally-derived and empirically-derived physical organic molecular descriptors are described in this review. Several reports demonstrating the effectiveness of this methodological approach towards reaction optimization and mechanistic interrogation are discussed. A detailed protocol to access quantitative and predictive MLR models is provided as a guide for model development and parameter analysis. PMID:29719711
NASA Technical Reports Server (NTRS)
Rogers, David
1991-01-01
G/SPLINES are a hybrid of Friedman's Multivariate Adaptive Regression Splines (MARS) algorithm with Holland's Genetic Algorithm. In this hybrid, the incremental search is replaced by a genetic search. The G/SPLINE algorithm exhibits performance comparable to that of the MARS algorithm, requires fewer least squares computations, and allows significantly larger problems to be considered.
Validation of Core Temperature Estimation Algorithm
2016-01-29
Excerpts of figure captions: (a) a plot of observed versus estimated core temperature (and of observed versus estimated PSI) with the line of identity (dashed), the least squares regression line (solid), and the line equation in the top left corner; (b) a Bland-Altman plot for comparison. The root mean squared error (RMSE) was also computed, as given by Equation 2.
Lu, Lee-Jane W.; Nishino, Thomas K.; Khamapirad, Tuenchit; Grady, James J; Leonard, Morton H.; Brunder, Donald G.
2009-01-01
Breast density (the percentage of fibroglandular tissue in the breast) has been suggested to be a useful surrogate marker for breast cancer risk. It is conventionally measured using screen-film mammographic images by a labor intensive histogram segmentation method (HSM). We have adapted and modified the HSM for measuring breast density from raw digital mammograms acquired by full-field digital mammography. Multiple regression model analyses showed that many of the instrument parameters for acquiring the screening mammograms (e.g. breast compression thickness, radiological thickness, radiation dose, compression force, etc) and image pixel intensity statistics of the imaged breasts were strong predictors of the observed threshold values (model R2=0.93) and %density (R2=0.84). The intra-class correlation coefficient of the %-density for duplicate images was estimated to be 0.80, using the regression model-derived threshold values, and 0.94 if estimated directly from the parameter estimates of the %-density prediction regression model. Therefore, with additional research, these mathematical models could be used to compute breast density objectively, automatically bypassing the HSM step, and could greatly facilitate breast cancer research studies. PMID:17671343
Higher-order Multivariable Polynomial Regression to Estimate Human Affective States
NASA Astrophysics Data System (ADS)
Wei, Jie; Chen, Tong; Liu, Guangyuan; Yang, Jiemin
2016-03-01
From direct observations, facial, vocal, gestural, physiological, and central nervous signals, estimating human affective states through computational models such as multivariate linear-regression analysis, support vector regression, and artificial neural networks has been proposed in the past decade. In these models, linear models generally lack precision because they ignore the intrinsic nonlinearities of complex psychophysiological processes, whereas nonlinear models commonly adopt complicated algorithms. To improve accuracy and simplify the model, we introduce a new computational modeling method named higher-order multivariable polynomial regression to estimate human affective states. The study employs standardized pictures from the International Affective Picture System to induce thirty subjects' affective states, and obtains pure affective patterns of skin conductance as input variables to the higher-order multivariable polynomial model for predicting affective valence and arousal. Experimental results show that our method is able to obtain efficient correlation coefficients of 0.98 and 0.96 for estimation of affective valence and arousal, respectively. Moreover, the method may provide certain indirect evidence that valence and arousal have their origins in the brain's motivational circuits. Thus, the proposed method can serve as a novel approach for efficiently estimating human affective states.
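A minimal sketch of a higher-order multivariable polynomial regression in this sense is shown below: skin-conductance-derived features are expanded into polynomial terms and fitted with a linear model for valence (arousal would be handled the same way). The features, coefficients, and polynomial degree are illustrative assumptions, not the study's model.

    # Sketch: higher-order multivariable polynomial regression for affective valence.
    # Synthetic skin-conductance features; degree and targets are assumptions.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.default_rng(10)
    n_trials = 90
    sc_features = rng.normal(size=(n_trials, 3))     # e.g., SCR amplitude, rise time, tonic level
    valence = (1.5 * sc_features[:, 0] - 0.8 * sc_features[:, 1] ** 2
               + 0.4 * sc_features[:, 0] * sc_features[:, 2] + 0.2 * rng.normal(size=n_trials))

    model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
    model.fit(sc_features, valence)
    predicted = model.predict(sc_features)
    r = float(np.corrcoef(valence, predicted)[0, 1])
    print("in-sample correlation coefficient:", round(r, 3))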
Acute imaging does not improve ASTRAL score's accuracy despite having a prognostic value.
Ntaios, George; Papavasileiou, Vasileios; Faouzi, Mohamed; Vanacker, Peter; Wintermark, Max; Michel, Patrik
2014-10-01
The ASTRAL score was recently shown to reliably predict three-month functional outcome in patients with acute ischemic stroke. The study aims to investigate whether information from multimodal imaging increases ASTRAL score's accuracy. All patients registered in the ASTRAL registry until March 2011 were included. In multivariate logistic-regression analyses, we added covariates derived from parenchymal, vascular, and perfusion imaging to the 6-parameter model of the ASTRAL score. If a specific imaging covariate remained an independent predictor of three-month modified Rankin score>2, the area-under-the-curve (AUC) of this new model was calculated and compared with ASTRAL score's AUC. We also performed similar logistic regression analyses in arbitrarily chosen patient subgroups. When added to the ASTRAL score, the following covariates on admission computed tomography/magnetic resonance imaging-based multimodal imaging were not significant predictors of outcome: any stroke-related acute lesion, any nonstroke-related lesions, chronic/subacute stroke, leukoaraiosis, significant arterial pathology in ischemic territory on computed tomography angiography/magnetic resonance angiography/Doppler, significant intracranial arterial pathology in ischemic territory, and focal hypoperfusion on perfusion-computed tomography. The Alberta Stroke Program Early CT score on plain imaging and any significant extracranial arterial pathology on computed tomography angiography/magnetic resonance angiography/Doppler were independent predictors of outcome (odds ratio: 0·93, 95% CI: 0·87-0·99 and odds ratio: 1·49, 95% CI: 1·08-2·05, respectively) but did not increase ASTRAL score's AUC (0·849 vs. 0·850, and 0·8563 vs. 0·8564, respectively). In exploratory analyses in subgroups of different prognosis, age or stroke severity, no covariate was found to increase ASTRAL score's AUC, either. The addition of information derived from multimodal imaging does not increase ASTRAL score's accuracy to predict functional outcome despite having an independent prognostic value. More selected radiological parameters applied in specific subgroups of stroke patients may add prognostic value of multimodal imaging. © 2014 World Stroke Organization.
Ida, Hirofumi; Fukuhara, Kazunobu; Sawada, Misako; Ishii, Motonobu
2011-01-01
The purpose of this study was to determine the quantitative relationships between the server's motion and the receiver's anticipation using a computer graphic animation of tennis serves. The test motions were determined by capturing the motion of a model player and estimating the computational perturbations caused by modulating the rotation of the player's elbow and forearm joints. Eight experienced and eight novice players rated their anticipation of the speed, direction, and spin of the ball on a visual analogue scale. The experienced players significantly altered some of their anticipatory judgment depending on the percentage of both the forearm and elbow modulations, while the novice players indicated no significant changes. Multiple regression analyses, including that of the racket's kinematic parameters immediately before racket-ball impact as independent variables, showed that the experienced players demonstrated a higher coefficient of determination than the novice players in their anticipatory judgment of the ball direction. The results have implications on the understanding of the functional relation between a player's motion and the opponent's anticipatory judgment during real play.
Krug, Rodrigo de Rosso; Silva, Anna Quialheiro Abreu da; Schneider, Ione Jayce Ceola; Ramos, Luiz Roberto; d'Orsi, Eleonora; Xavier, André Junqueira
2017-04-01
To estimate the effect of participating in cognitive cooperation groups, mediated by computers and the internet, on the Mini-Mental State Examination (MMSE) percent variation of outpatients with memory complaints attending two memory clinics. A prospective controlled intervention study carried out from 2006 to 2013 with 293 elders. The intervention group (n = 160) attended a cognitive cooperation group (20 sessions of 1.5 hours each). The control group (n = 133) received routine medical care. Outcome was the percent variation in the MMSE. Control variables included gender, age, marital status, schooling, hypertension, diabetes, dyslipidaemia, hypothyroidism, depression, vascular diseases, polymedication, use of benzodiazepines, exposure to tobacco, sedentary lifestyle, obesity and functional capacity. The final model was obtained by multivariate linear regression. The intervention group obtained an independent positive variation of 24.39% (CI 95% = 14.86/33.91) in the MMSE compared to the control group. The results suggested that cognitive cooperation groups, mediated by computers and the internet, are associated with cognitive status improvement of older adults in memory clinics.
Estimation of Unsteady Aerodynamic Models from Dynamic Wind Tunnel Data
NASA Technical Reports Server (NTRS)
Murphy, Patrick; Klein, Vladislav
2011-01-01
Demanding aerodynamic modelling requirements for military and civilian aircraft have motivated researchers to improve computational and experimental techniques and to pursue closer collaboration in these areas. Model identification and validation techniques are key components for this research. This paper presents mathematical model structures and identification techniques that have been used successfully to model more general aerodynamic behaviours in single-degree-of-freedom dynamic testing. Model parameters, characterizing aerodynamic properties, are estimated using linear and nonlinear regression methods in both time and frequency domains. Steps in identification including model structure determination, parameter estimation, and model validation, are addressed in this paper with examples using data from one-degree-of-freedom dynamic wind tunnel and water tunnel experiments. These techniques offer a methodology for expanding the utility of computational methods in application to flight dynamics, stability, and control problems. Since flight test is not always an option for early model validation, time history comparisons are commonly made between computational and experimental results and model adequacy is inferred by corroborating results. An extension is offered to this conventional approach where more general model parameter estimates and their standard errors are compared.
Wu, Haifeng; Sun, Tao; Wang, Jingjing; Li, Xia; Wang, Wei; Huo, Da; Lv, Pingxin; He, Wen; Wang, Keyang; Guo, Xiuhua
2013-08-01
The objective of this study was to investigate a method combining radiological and textural features for the differentiation of malignant from benign solitary pulmonary nodules on computed tomography. Features including 13 gray level co-occurrence matrix textural features and 12 radiological features were extracted from 2,117 CT slices, which came from 202 (116 malignant and 86 benign) patients. Lasso-type regularization of a nonlinear regression model was applied to select predictive features, and a BP artificial neural network was used to build the diagnostic model. Eight radiological and two textural features were obtained after the Lasso-type regularization procedure. Twelve radiological features alone could reach an area under the ROC curve (AUC) of 0.84 in differentiating between malignant and benign lesions. The 10 selected features improved the AUC to 0.91. The evaluation results showed that the method of selecting radiological and textural features appears to be more effective in distinguishing malignant from benign solitary pulmonary nodules by computed tomography.
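The two-stage idea above can be sketched briefly: an L1 (lasso-type) penalty selects a subset of features, which then feed a small feed-forward neural network classifier scored by AUC. The features and labels are synthetic placeholders, and a plain L1-penalized logistic model stands in for the lasso-type regularization of the paper's nonlinear model.

    # Sketch: L1 feature selection followed by a small neural network classifier.
    # Synthetic radiological/textural features; illustrative only.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.neural_network import MLPClassifier
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(11)
    n_nodules, n_features = 200, 25                  # e.g., 12 radiological + 13 textural
    X = rng.normal(size=(n_nodules, n_features))
    y = (X[:, 0] - X[:, 3] + 0.5 * rng.normal(size=n_nodules) > 0).astype(int)

    selector = LogisticRegression(penalty="l1", solver="liblinear", C=0.2).fit(X, y)
    keep = np.flatnonzero(selector.coef_[0])         # lasso-selected feature indices
    clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
    clf.fit(X[:, keep], y)
    auc = roc_auc_score(y, clf.predict_proba(X[:, keep])[:, 1])
    print("selected features:", keep, "in-sample AUC:", round(auc, 3))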
Stone, Mandy L.; Graham, Jennifer L.; Gatotho, Jackline W.
2013-01-01
Cheney Reservoir, located in south-central Kansas, is one of the primary water supplies for the city of Wichita, Kansas. The U.S. Geological Survey has operated a continuous real-time water-quality monitoring station in Cheney Reservoir since 2001; continuously measured physicochemical properties include specific conductance, pH, water temperature, dissolved oxygen, turbidity, fluorescence (wavelength range 650 to 700 nanometers; estimate of total chlorophyll), and reservoir elevation. Discrete water-quality samples were collected during 2001 through 2009 and analyzed for sediment, nutrients, taste-and-odor compounds, cyanotoxins, phytoplankton community composition, actinomycetes bacteria, and other water-quality measures. Regression models were developed to establish relations between discretely sampled constituent concentrations and continuously measured physicochemical properties to compute concentrations of constituents that are not easily measured in real time. The water-quality information in this report is important to the city of Wichita because it allows quantification and characterization of potential constituents of concern in Cheney Reservoir. This report updates linear regression models published in 2006 that were based on data collected during 2001 through 2003. The update uses discrete and continuous data collected during May 2001 through December 2009. Updated models to compute dissolved solids, sodium, chloride, and suspended solids were similar to previously published models. However, several other updated models changed substantially from previously published models. In addition to updating relations that were previously developed, models also were developed for four new constituents, including magnesium, dissolved phosphorus, actinomycetes bacteria, and the cyanotoxin microcystin. In addition, a conversion factor of 0.74 was established to convert the Yellow Springs Instruments (YSI) model 6026 turbidity sensor measurements to the newer YSI model 6136 sensor at the Cheney Reservoir site. Because a high percentage of geosmin and microcystin data were below analytical detection thresholds (censored data), multiple logistic regression was used to develop models that best explained the probability of geosmin and microcystin concentrations exceeding relevant thresholds. The geosmin and microcystin models are particularly important because geosmin is a taste-and-odor compound and microcystin is a cyanotoxin.
Time Advice and Learning Questions in Computer Simulations
ERIC Educational Resources Information Center
Rey, Gunter Daniel
2011-01-01
Students (N = 101) used an introductory text and a computer simulation to learn fundamental concepts about statistical analyses (e.g., analysis of variance, regression analysis and General Linear Model). Each learner was randomly assigned to one cell of a 2 (with or without time advice) x 3 (with learning questions and corrective feedback, with…
ERIC Educational Resources Information Center
Lee, Young-Jin
2017-01-01
Purpose: The purpose of this paper is to develop a quantitative model of problem solving performance of students in the computer-based mathematics learning environment. Design/methodology/approach: Regularized logistic regression was used to create a quantitative model of problem solving performance of students that predicts whether students can…
ERIC Educational Resources Information Center
Ibrahim, Sara
2017-01-01
The insider security threat causes new and dangerous dimensions in cloud computing. Those internal threats are originated from contractors or the business partners' input that have access to the systems. A study of trustworthiness and transparency might assist the organizations to monitor employees' activity more cautiously on cloud technologies…
Zakerian, SA; Subramaniam, ID
2011-01-01
Background: With computers rapidly carving a niche in virtually every nook and crevice of today's fast-paced society, musculoskeletal disorders are becoming more prevalent among computer users, who comprise a wide spectrum of the Malaysian population, including office workers. While extant literature depicts extensive research on musculoskeletal disorders in general, the five dimensions of psychosocial work factors (job demands, job contentment, job control, computer-related problems and social interaction) attributed to work-related musculoskeletal disorders have been neglected. This study examines the aforementioned elements in detail, pertaining to their relationship with musculoskeletal disorders, focusing, in particular, on 120 office workers at Malaysian public sector organizations whose jobs require intensive computer usage. Methods: Research was conducted between March and July 2009 in public service organizations in Malaysia. The study was conducted via a survey utilizing self-complete questionnaires and a diary. The relationship between psychosocial work factors and musculoskeletal discomfort was ascertained through regression analyses, which revealed that some factors were more important than others. Results: The results indicate a significant relationship between psychosocial work factors and musculoskeletal discomfort among computer users. Several of these factors, such as job control, computer-related problems and social interaction, were found to be more important than others in relation to musculoskeletal discomfort. Conclusion: With computer usage on the rise among users, the prevalence of musculoskeletal discomfort could lead to unnecessary disabilities; hence the vital need for greater attention to be given to this aspect in the workplace, to alleviate, to some extent, potential problems in the future. PMID:23113058
Jackman, Patrick; Sun, Da-Wen; Elmasry, Gamal
2012-08-01
A new algorithm for the conversion of device-dependent RGB colour data into device-independent L*a*b* colour data without introducing noticeable error has been developed. By combining a linear colour space transform and advanced multiple regression methodologies it was possible to predict L*a*b* colour data with less than 2.2 colour units of error (CIE 1976). By transforming the red, green and blue colour components into new variables that better reflect the structure of the L*a*b* colour space, a low colour calibration error was immediately achieved (ΔE(CAL) = 14.1). Application of a range of regression models on the data further reduced the colour calibration error substantially (multilinear regression ΔE(CAL) = 5.4; response surface ΔE(CAL) = 2.9; PLSR ΔE(CAL) = 2.6; LASSO regression ΔE(CAL) = 2.1). Only the PLSR models deteriorated substantially under cross validation. The algorithm is adaptable and can be easily recalibrated to any working computer vision system. The algorithm was tested on a typical working laboratory computer vision system and delivered only a very marginal loss of colour information (ΔE(CAL) = 2.35). Colour features derived on this system were able to safely discriminate between three classes of ham with 100% correct classification whereas colour features measured on a conventional colourimeter were not. Copyright © 2012 Elsevier Ltd. All rights reserved.
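The calibration idea can be sketched as a multi-output regression from RGB readings to L*a*b* targets, with the mean Euclidean distance serving as the CIE 1976 colour-difference error. The colour data below are synthetic stand-ins for camera and colourimeter readings, and the simple multilinear model is only one of the regression variants discussed above.

    # Sketch: multilinear regression calibrating device RGB to device-independent Lab.
    # Synthetic patch data; illustrative only.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(12)
    n_patches = 60
    rgb = rng.uniform(0, 255, size=(n_patches, 3))
    # Hypothetical "true" Lab values for the patches (stand-ins for colourimeter data).
    lab = np.column_stack([
        0.35 * rgb.mean(axis=1),                       # L*-like term
        0.4 * (rgb[:, 0] - rgb[:, 1]),                 # a*-like term
        0.4 * (rgb[:, 1] - rgb[:, 2]),                 # b*-like term
    ]) + rng.normal(0, 1.0, size=(n_patches, 3))

    model = LinearRegression().fit(rgb, lab)           # device RGB -> Lab mapping
    pred = model.predict(rgb)
    delta_e = np.sqrt(np.sum((pred - lab) ** 2, axis=1))   # CIE 1976 colour difference
    print("mean calibration error (Delta E):", round(float(delta_e.mean()), 2))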
Incremental online learning in high dimensions.
Vijayakumar, Sethu; D'Souza, Aaron; Schaal, Stefan
2005-12-01
Locally weighted projection regression (LWPR) is a new algorithm for incremental nonlinear function approximation in high-dimensional spaces with redundant and irrelevant input dimensions. At its core, it employs nonparametric regression with locally linear models. In order to stay computationally efficient and numerically robust, each local model performs the regression analysis with a small number of univariate regressions in selected directions in input space, in the spirit of partial least squares regression. We discuss when and how local learning techniques can successfully work in high-dimensional spaces and review the various techniques for local dimensionality reduction before finally deriving the LWPR algorithm. The properties of LWPR are that it (1) learns rapidly with second-order learning methods based on incremental training, (2) uses statistically sound stochastic leave-one-out cross validation for learning without the need to memorize training data, (3) adjusts its weighting kernels based on only local information in order to minimize the danger of negative interference of incremental learning, (4) has a computational complexity that is linear in the number of inputs, and (5) can deal with a large number of possibly redundant inputs, as shown in various empirical evaluations with up to 90-dimensional data sets. For a probabilistic interpretation, predictive variance and confidence intervals are derived. To our knowledge, LWPR is the first truly incremental spatially localized learning method that can successfully and efficiently operate in very high-dimensional spaces.
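The following is a minimal sketch of the locally weighted linear regression core that LWPR builds on, assuming a Gaussian kernel and a batch, single-query setting; it deliberately omits LWPR's incremental updates, partial-least-squares projections and kernel adaptation.

```python
# Hedged sketch of locally weighted linear regression (not the full LWPR algorithm).
import numpy as np

def lwr_predict(x_query, X, y, bandwidth=0.3):
    """Predict y at x_query with a Gaussian-weighted local linear fit."""
    d2 = np.sum((X - x_query) ** 2, axis=1)
    w = np.exp(-0.5 * d2 / bandwidth ** 2)                # locality weights
    Xb = np.hstack([X, np.ones((len(X), 1))])             # add intercept column
    W = np.diag(w)
    beta = np.linalg.solve(Xb.T @ W @ Xb + 1e-8 * np.eye(Xb.shape[1]),
                           Xb.T @ W @ y)                  # weighted least squares
    return np.append(x_query, 1.0) @ beta

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(500, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)
print(lwr_predict(np.array([0.5, 0.0]), X, y))
```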
Use of partial least squares regression to impute SNP genotypes in Italian cattle breeds.
Dimauro, Corrado; Cellesi, Massimo; Gaspa, Giustino; Ajmone-Marsan, Paolo; Steri, Roberto; Marras, Gabriele; Macciotta, Nicolò P P
2013-06-05
The objective of the present study was to test the ability of the partial least squares regression technique to impute genotypes from low-density single nucleotide polymorphism (SNP) panels, i.e. 3K or 7K, to a high-density panel with 50K SNPs. No pedigree information was used. Data consisted of 2093 Holstein, 749 Brown Swiss and 479 Simmental bulls genotyped with the Illumina 50K Beadchip. First, a single-breed approach was applied by using only data from Holstein animals. Then, to enlarge the training population, data from the three breeds were combined and a multi-breed analysis was performed. Accuracies of genotypes imputed using the partial least squares regression method were compared with those obtained by using the Beagle software. The impact of genotype imputation on breeding value prediction was evaluated for milk yield, fat content and protein content. In the single-breed approach, the accuracy of imputation using partial least squares regression was around 90 and 94% for the 3K and 7K platforms, respectively; corresponding accuracies obtained with Beagle were around 85% and 90%. Moreover, computing time required by the partial least squares regression method was on average around 10 times lower than computing time required by Beagle. Using the partial least squares regression method in the multi-breed approach resulted in lower imputation accuracies than using single-breed data. The impact of the SNP-genotype imputation on the accuracy of direct genomic breeding values was small. The correlation between estimates of genetic merit obtained by using imputed versus actual genotypes was around 0.96 for the 7K chip. Results of the present work suggested that the partial least squares regression imputation method could be useful to impute SNP genotypes when pedigree information is not available.
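A hedged sketch of the imputation idea, assuming synthetic genotypes and scikit-learn's PLSRegression rather than the software used in the study: the low-density panel predicts the high-density panel, and predictions are rounded back to genotype codes.

```python
# Hedged sketch of SNP imputation with partial least squares regression (synthetic data;
# panel sizes are placeholders, not the 3K/7K/50K chips of the study).
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(2)
n_train, n_test, n_low, n_high = 400, 100, 60, 300
train_high = rng.integers(0, 3, size=(n_train, n_high)).astype(float)  # 0/1/2 genotypes
train_low = train_high[:, :n_low]                                      # low-density subset
test_low = rng.integers(0, 3, size=(n_test, n_low)).astype(float)

pls = PLSRegression(n_components=20)
pls.fit(train_low, train_high)                            # learn low -> high mapping
imputed = np.clip(np.rint(pls.predict(test_low)), 0, 2)   # round back to genotype codes
print(imputed.shape)
```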
Xuan, Min; Zhou, Fengsheng; Ding, Yan; Zhu, Qiaoying; Dong, Ji; Zhou, Hao; Cheng, Jun; Jiang, Xiao; Wu, Pengxi
2018-04-01
To review the diagnostic accuracy of contrast-enhanced ultrasound (CEUS) used to detect residual or recurrent liver tumors after radiofrequency ablation (RFA), with contrast-enhanced computed tomography and/or contrast-enhanced magnetic resonance imaging as the reference standard. MEDLINE, EMBASE, and COCHRANE were systematically searched for all potentially eligible studies comparing CEUS with the reference standard after RFA. Risk of bias and applicability concerns were addressed by adopting the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool. Pooled point estimates for sensitivity, specificity, positive and negative likelihood ratios, and diagnostic odds ratios (DOR) with 95% CI were computed before plotting the sROC (summary receiver operating characteristic) curve. Meta-regression and subgroup analysis were used to identify the source of the heterogeneity that was detected. Publication bias was evaluated using Deeks' funnel plot asymmetry test. Ten eligible studies (2001 to 2016) covering 1162 lesions were included in the final analysis. The quality of the included studies assessed by the QUADAS-2 tool was considered reasonable. The pooled sensitivity and specificity of CEUS in detecting residual or recurrent liver tumors were 0.90 (95% CI 0.85-0.94) and 1.00 (95% CI 0.99-1.00), respectively. Overall DOR was 420.10 (95% CI 142.30-1240.20). The sources of heterogeneity could not be precisely identified by meta-regression or subgroup analysis. No evidence of publication bias was found. This study confirmed that CEUS exhibits high sensitivity and specificity in assessing therapeutic responses to RFA for liver tumors.
Griffee, Karen; Swindell, Sam; O'Keefe, Stephen L; Stroebel, Sandra S; Beard, Keith W; Kuo, Shih-Ya; Stroupe, Walter
2016-10-01
Retrospective data from 1,821 women and 1,064 men with one or more siblings, provided anonymously using a computer-assisted self-interview, were used to identify risk factors for sibling incest (SI); 137 were participants in SI. In order of decreasing predictive power, the risk factors identified by the multiple logistic regression analysis included ever having shared a bed for sleeping with a sibling, parent-child incest (PCI), family nudity, low levels of maternal affection, and ever having shared a tub bath with a sibling. The results were consistent with the idea that SI in many families was the cumulative result of four types of parental behaviors: (a) factors that lower external barriers to sexual behavior (e.g., permitting co-sleeping or co-bathing of sibling dyads), (b) factors that encourage nudity of children within the nuclear family and permit children to see the parent's genitals, (c) factors that lead to the siblings relying on one another for affection (e.g., diminished maternal affection), and (d) factors that eroticize young children (e.g., child sexual abuse [CSA] by a parent). Thirty-eight of the 137 SI participants were participants in coerced sibling incest (CSI). In order of decreasing predictive power, risk factors for CSI identified by multiple logistic regression analysis included ever having shared a bed for sleeping with a brother, PCI, witnessing parental physical fighting, and family nudity. SI was more likely to have been reported as CSI if the sibling had touched the reporting sibling's genitals, and less likely to have been reported as CSI if the siblings had shared a bed. © The Author(s) 2014.
Osawa, Kazuhiro; Nakanishi, Rine; Miyoshi, Toru; Rahmani, Sina; Ceponiene, Indre; Nezarat, Negin; Kanisawa, Mitsuru; Qi, Hong; Jayawardena, Eranthi; Kim, Nicholas; Ito, Hiroshi; Budoff, Matthew J
2018-04-26
Increased arterial stiffness is reportedly associated with cardiac remodelling, including remodelling of the left atrium and left ventricle, in middle-aged and older adults. However, little is known about this association in young adults. In total, 73 patients (44 (60%) men) aged 25 to 45 years with suspected coronary artery disease were included in the analysis. The left atrial volume index (LAVI), left ventricular volume index (LVVI), and left ventricular mass index (LVMI) were measured using coronary computed tomography angiography (CCTA). Arterial stiffness was assessed with the cardio-ankle vascular index (CAVI). An abnormally high CAVI was defined as one above the age- and sex-specific cut-off points of the CAVI. Compared with patients with a normal CAVI, those with an abnormally high CAVI were older and had a greater prevalence of diabetes mellitus, higher diastolic blood pressure, greater coronary artery calcification score, and a greater LAVI (33.5±10.3 vs. 43.0±10.3 mL/m², p<0.01). In contrast, there were no significant differences in the LVVI or LVMI between the subgroups with a normal CAVI and an abnormally high CAVI. Multivariate linear regression analysis showed that the LAVI was significantly associated with an abnormally high CAVI (standardised regression coefficient=0.283, p=0.03). The present study demonstrated that increased arterial stiffness is associated with the LAVI, which reflects the early stages of cardiac remodelling, independent of various comorbidity factors in young adults with suspected coronary artery disease. Copyright © 2018 Australian and New Zealand Society of Cardiac and Thoracic Surgeons (ANZSCTS) and the Cardiac Society of Australia and New Zealand (CSANZ). Published by Elsevier B.V. All rights reserved.
Messay, Temesguen; Hardie, Russell C; Tuinstra, Timothy R
2015-05-01
We present new pulmonary nodule segmentation algorithms for computed tomography (CT). These include a fully-automated (FA) system, a semi-automated (SA) system, and a hybrid system. Like most traditional systems, the new FA system requires only a single user-supplied cue point. On the other hand, the SA system represents a new algorithm class requiring 8 user-supplied control points. This does increase the burden on the user, but we show that the resulting system is highly robust and can handle a variety of challenging cases. The proposed hybrid system starts with the FA system. If improved segmentation results are needed, the SA system is then deployed. The FA segmentation engine has 2 free parameters, and the SA system has 3. These parameters are adaptively determined for each nodule in a search process guided by a regression neural network (RNN). The RNN uses a number of features computed for each candidate segmentation. We train and test our systems using the new Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) data. To the best of our knowledge, this is one of the first nodule-specific performance benchmarks using the new LIDC-IDRI dataset. We also compare the performance of the proposed methods with several previously reported results on the same data used by those other methods. Our results suggest that the proposed FA system improves upon the state-of-the-art, and the SA system offers a considerable boost over the FA system. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.
A Streamflow Statistics (StreamStats) Web Application for Ohio
Koltun, G.F.; Kula, Stephanie P.; Puskas, Barry M.
2006-01-01
A StreamStats Web application was developed for Ohio that implements equations for estimating a variety of streamflow statistics including the 2-, 5-, 10-, 25-, 50-, 100-, and 500-year peak streamflows, mean annual streamflow, mean monthly streamflows, harmonic mean streamflow, and 25th-, 50th-, and 75th-percentile streamflows. StreamStats is a Web-based geographic information system application designed to facilitate the estimation of streamflow statistics at ungaged locations on streams. StreamStats can also serve precomputed streamflow statistics determined from streamflow-gaging station data. The basic structure, use, and limitations of StreamStats are described in this report. To facilitate the level of automation required for Ohio's StreamStats application, the technique used by Koltun (2003) for computing main-channel slope was replaced with a new computationally robust technique. The new channel-slope characteristic, referred to as SL10-85, differed from the National Hydrography Data based channel slope values (SL) reported by Koltun (2003) by an average of -28.3 percent, with the median change being -13.2 percent. In spite of the differences, the two slope measures are strongly correlated. The change in channel slope values resulting from the change in computational method necessitated revision of the full-model equations for flood-peak discharges originally presented by Koltun (2003). Average standard errors of prediction for the revised full-model equations presented in this report increased by a small amount over those reported by Koltun (2003), with increases ranging from 0.7 to 0.9 percent. Mean percentage changes in the revised regression and weighted flood-frequency estimates relative to regression and weighted estimates reported by Koltun (2003) were small, ranging from -0.72 to -0.25 percent and -0.22 to 0.07 percent, respectively.
VoxelStats: A MATLAB Package for Multi-Modal Voxel-Wise Brain Image Analysis.
Mathotaarachchi, Sulantha; Wang, Seqian; Shin, Monica; Pascoal, Tharick A; Benedet, Andrea L; Kang, Min Su; Beaudry, Thomas; Fonov, Vladimir S; Gauthier, Serge; Labbe, Aurélie; Rosa-Neto, Pedro
2016-01-01
In healthy individuals, behavioral outcomes are highly associated with the variability on brain regional structure or neurochemical phenotypes. Similarly, in the context of neurodegenerative conditions, neuroimaging reveals that cognitive decline is linked to the magnitude of atrophy, neurochemical declines, or concentrations of abnormal protein aggregates across brain regions. However, modeling the effects of multiple regional abnormalities as determinants of cognitive decline at the voxel level remains largely unexplored by multimodal imaging research, given the high computational cost of estimating regression models for every single voxel from various imaging modalities. VoxelStats is a voxel-wise computational framework to overcome these computational limitations and to perform statistical operations on multiple scalar variables and imaging modalities at the voxel level. VoxelStats package has been developed in Matlab(®) and supports imaging formats such as Nifti-1, ANALYZE, and MINC v2. Prebuilt functions in VoxelStats enable the user to perform voxel-wise general and generalized linear models and mixed effect models with multiple volumetric covariates. Importantly, VoxelStats can recognize scalar values or image volumes as response variables and can accommodate volumetric statistical covariates as well as their interaction effects with other variables. Furthermore, this package includes built-in functionality to perform voxel-wise receiver operating characteristic analysis and paired and unpaired group contrast analysis. Validation of VoxelStats was conducted by comparing the linear regression functionality with existing toolboxes such as glim_image and RMINC. The validation results were identical to existing methods and the additional functionality was demonstrated by generating feature case assessments (t-statistics, odds ratio, and true positive rate maps). In summary, VoxelStats expands the current methods for multimodal imaging analysis by allowing the estimation of advanced regional association metrics at the voxel level.
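Although VoxelStats itself is a MATLAB package, the voxel-wise general linear model it implements can be sketched in a few lines of NumPy when the design matrix is shared across voxels; the data and covariates below are placeholders.

```python
# Hedged sketch of a voxel-wise general linear model with a shared design matrix,
# so every voxel is fit in a single least-squares call (synthetic data).
import numpy as np

rng = np.random.default_rng(3)
n_subjects, n_voxels = 50, 10000
age = rng.uniform(60, 85, n_subjects)
group = rng.integers(0, 2, n_subjects)                       # e.g. diagnostic group
X = np.column_stack([np.ones(n_subjects), age, group])       # intercept, age, group
Y = rng.normal(size=(n_subjects, n_voxels))                  # voxel values per subject (placeholder)

beta, *_ = np.linalg.lstsq(X, Y, rcond=None)                 # one coefficient set per voxel
resid = Y - X @ beta
dof = n_subjects - X.shape[1]
sigma2 = (resid ** 2).sum(axis=0) / dof
se_group = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[2, 2])
t_map = beta[2] / se_group                                   # voxel-wise t-statistics for the group effect
print(t_map.shape)
```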
Rosenkrantz, Andrew B; Doshi, Ankur M; Ginocchio, Luke A; Aphinyanaphongs, Yindalon
2016-12-01
This study aimed to assess the performance of a text classification machine-learning model in predicting highly cited articles within the recent radiological literature and to identify the model's most influential article features. We downloaded from PubMed the title, abstract, and medical subject heading terms for 10,065 articles published in 25 general radiology journals in 2012 and 2013. Three machine-learning models were applied to predict the top 10% of included articles in terms of the number of citations to the article in 2014 (reflecting the 2-year time window in conventional impact factor calculations). The model having the highest area under the curve was selected to derive a list of article features (words) predicting high citation volume, which was iteratively reduced to identify the smallest possible core feature list maintaining predictive power. Overall themes were qualitatively assigned to the core features. The regularized logistic regression (Bayesian binary regression) model had highest performance, achieving an area under the curve of 0.814 in predicting articles in the top 10% of citation volume. We reduced the initial 14,083 features to 210 features that maintain predictivity. These features corresponded with topics relating to various imaging techniques (eg, diffusion-weighted magnetic resonance imaging, hyperpolarized magnetic resonance imaging, dual-energy computed tomography, computed tomography reconstruction algorithms, tomosynthesis, elastography, and computer-aided diagnosis), particular pathologies (prostate cancer; thyroid nodules; hepatic adenoma, hepatocellular carcinoma, non-alcoholic fatty liver disease), and other topics (radiation dose, electroporation, education, general oncology, gadolinium, statistics). Machine learning can be successfully applied to create specific feature-based models for predicting articles likely to achieve high influence within the radiological literature. Copyright © 2016 The Association of University Radiologists. Published by Elsevier Inc. All rights reserved.
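A hedged stand-in for the modelling step: scikit-learn's L2-penalized logistic regression on TF-IDF features of title/abstract text plays the role of the regularized (Bayesian binary) regression used in the study; the two example records and labels are invented.

```python
# Hedged sketch of predicting highly cited articles from text features (invented records).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["dual-energy computed tomography of prostate cancer pilot study",
         "survey of residency education practices in general radiology"]   # title+abstract strings
top10 = [1, 0]                                                              # 1 = top 10% by citations

clf = make_pipeline(TfidfVectorizer(stop_words="english"),
                    LogisticRegression(penalty="l2", C=1.0, max_iter=1000))
clf.fit(texts, top10)
print(clf.predict_proba(["hyperpolarized magnetic resonance imaging of the prostate"]))
```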
Isotani, Shuji; Shimoyama, Hirofumi; Yokota, Isao; Noma, Yasuhiro; Kitamura, Kousuke; China, Toshiyuki; Saito, Keisuke; Hisasue, Shin-ichi; Ide, Hisamitsu; Muto, Satoru; Yamaguchi, Raizo; Ukimura, Osamu; Gill, Inderbir S; Horie, Shigeo
2015-10-01
A predictive model of postoperative renal function may affect the planning of nephrectomy. The aims were to develop a novel predictive model combining clinical indices with computer volumetry to measure the preserved renal cortex volume (RCV) using multidetector computed tomography (MDCT), and to prospectively validate the performance of the model. A total of 60 patients undergoing radical nephrectomy from 2011 to 2013 participated, including a development cohort of 39 patients and an external validation cohort of 21 patients. RCV was calculated by voxel count using software (Vincent, FUJIFILM). Renal function before and after radical nephrectomy was assessed via the estimated glomerular filtration rate (eGFR). Factors affecting postoperative eGFR were examined by regression analysis to develop the novel model for predicting postoperative eGFR with a backward elimination method. The predictive model was externally validated and the performance of the model was compared with that of previously reported models. The postoperative eGFR value was associated with age, preoperative eGFR, preserved renal parenchymal volume (RPV), preserved RCV, % of RPV alteration, and % of RCV alteration (p < 0.01). The variables significantly correlated with %eGFR alteration were %RCV preservation (r = 0.58, p < 0.01) and %RPV preservation (r = 0.54, p < 0.01). We developed our regression model as follows: postoperative eGFR = 57.87 - 0.55(age) - 15.01(body surface area) + 0.30(preoperative eGFR) + 52.92(%RCV preservation). Strong correlation was seen between postoperative eGFR and the calculated estimation model (r = 0.83; p < 0.001). In the external validation cohort (n = 21), our model outperformed previously reported models. Combining MDCT renal volumetry and clinical indices might yield an important tool for predicting postoperative renal function.
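The reported regression equation can be applied directly; the sketch below implements it, under the assumption that %RCV preservation enters as a fraction between 0 and 1, with purely illustrative input values.

```python
# Direct implementation of the regression equation reported in the abstract for predicting
# postoperative eGFR after radical nephrectomy. Assumption: %RCV preservation is expressed
# as a fraction (0-1); the example inputs are illustrative only.
def predicted_postop_egfr(age_years, body_surface_area_m2, preop_egfr, rcv_preservation_fraction):
    """postoperative eGFR = 57.87 - 0.55*age - 15.01*BSA + 0.30*preoperative eGFR
    + 52.92*(%RCV preservation)."""
    return (57.87
            - 0.55 * age_years
            - 15.01 * body_surface_area_m2
            + 0.30 * preop_egfr
            + 52.92 * rcv_preservation_fraction)

# Example: 60-year-old, BSA 1.7 m^2, preoperative eGFR 80, 55% of renal cortex preserved.
print(predicted_postop_egfr(60, 1.7, 80, 0.55))
```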
A generalized right truncated bivariate Poisson regression model with applications to health data.
Islam, M Ataharul; Chowdhury, Rafiqul I
2017-01-01
A generalized right truncated bivariate Poisson regression model is proposed in this paper. Estimation and tests for goodness of fit and over- or under-dispersion are illustrated for both untruncated and right truncated bivariate Poisson regression models using a marginal-conditional approach. Estimation and test procedures are illustrated for bivariate Poisson regression models with applications to Health and Retirement Study data on the number of health conditions and the number of health care services utilized. The proposed test statistics are easy to compute, and it is evident from the results that the models fit the data very well. A comparison between the right truncated and untruncated bivariate Poisson regression models using the test for nonnested models clearly shows that the truncated model performs significantly better than the untruncated model.
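A hedged sketch of the marginal-conditional idea for bivariate counts, fitting a Poisson GLM for the first outcome and a Poisson GLM for the second outcome conditional on the first; the right truncation and the generalized structure of the proposed model are not reproduced, and the data are synthetic.

```python
# Hedged sketch of a marginal-conditional bivariate count regression (truncation omitted).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 500
x = rng.normal(size=n)
y1 = rng.poisson(np.exp(0.3 + 0.5 * x))                        # e.g. number of health conditions
y2 = rng.poisson(np.exp(0.1 + 0.2 * x + 0.3 * np.log1p(y1)))   # e.g. services utilized

X1 = sm.add_constant(x)
marginal = sm.GLM(y1, X1, family=sm.families.Poisson()).fit()  # marginal model for y1

X2 = sm.add_constant(np.column_stack([x, y1]))
conditional = sm.GLM(y2, X2, family=sm.families.Poisson()).fit()  # conditional model for y2 | y1
print(marginal.params, conditional.params)
```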
NASA Technical Reports Server (NTRS)
Hopkins, Dale A.; Patnaik, Surya N.
2000-01-01
A preliminary aircraft engine design methodology is being developed that utilizes a cascade optimization strategy together with neural network and regression approximation methods. The cascade strategy employs different optimization algorithms in a specified sequence. The neural network and regression methods are used to approximate solutions obtained from the NASA Engine Performance Program (NEPP), which implements engine thermodynamic cycle and performance analysis models. The new methodology is proving to be more robust and computationally efficient than the conventional optimization approach of using a single optimization algorithm with direct reanalysis. The methodology has been demonstrated on a preliminary design problem for a novel subsonic turbofan engine concept that incorporates a wave rotor as a cycle-topping device. Computations of maximum thrust were obtained for a specific design point in the engine mission profile. The results (depicted in the figure) show a significant improvement in the maximum thrust obtained using the new methodology in comparison to benchmark solutions obtained using NEPP in a manual design mode.
NASA Astrophysics Data System (ADS)
Reis, D. S.; Stedinger, J. R.; Martins, E. S.
2005-10-01
This paper develops a Bayesian approach to analysis of a generalized least squares (GLS) regression model for regional analyses of hydrologic data. The new approach allows computation of the posterior distributions of the parameters and the model error variance using a quasi-analytic approach. Two regional skew estimation studies illustrate the value of the Bayesian GLS approach for regional statistical analysis of a shape parameter and demonstrate that regional skew models can be relatively precise with effective record lengths in excess of 60 years. With Bayesian GLS the marginal posterior distribution of the model error variance and the corresponding mean and variance of the parameters can be computed directly, thereby providing a simple but important extension of the regional GLS regression procedures popularized by Tasker and Stedinger (1989), which is sensitive to the likely values of the model error variance when it is small relative to the sampling error in the at-site estimator.
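A hedged sketch of the classical GLS step that the Bayesian approach builds on, assuming a known diagonal error covariance (model error plus sampling error) and synthetic basin data; the quasi-analytic Bayesian treatment of the model error variance is not reproduced.

```python
# Hedged sketch of a generalized least squares (GLS) regional regression fit (synthetic data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n_sites = 40
basin_area = rng.uniform(1, 500, n_sites)
X = sm.add_constant(np.log(basin_area))
true_skew = 0.3 - 0.05 * np.log(basin_area)

# Error covariance = model error variance + sampling error variance (diagonal here;
# cross-correlation between sites would add off-diagonal terms).
sampling_var = rng.uniform(0.05, 0.3, n_sites)
model_error_var = 0.1
sigma = np.diag(model_error_var + sampling_var)
y = true_skew + rng.multivariate_normal(np.zeros(n_sites), sigma)

gls = sm.GLS(y, X, sigma=sigma).fit()
print(gls.params, gls.bse)
```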
A baseline-free procedure for transformation models under interval censorship.
Gu, Ming Gao; Sun, Liuquan; Zuo, Guoxin
2005-12-01
An important property of the Cox regression model is that the estimation of regression parameters using the partial likelihood procedure does not depend on its baseline survival function. We call such a procedure baseline-free. Using marginal likelihood, we show that a baseline-free procedure can be derived for a class of general transformation models under the interval censoring framework. The baseline-free procedure results in a simplified and stable computation algorithm for some complicated and important semiparametric models, such as frailty models and heteroscedastic hazard/rank regression models, where the estimation procedures so far available involve estimation of the infinite-dimensional baseline function. A detailed computational algorithm using Markov chain Monte Carlo stochastic approximation is presented. The proposed procedure is demonstrated through extensive simulation studies, showing the validity of asymptotic consistency and normality. We also illustrate the procedure with a real data set from a study of breast cancer. A heuristic argument showing that the score function is a mean-zero martingale is provided.
Probabilistic forecasting for extreme NO2 pollution episodes.
Aznarte, José L
2017-10-01
In this study, we investigate the suitability of quantile regression for predicting extreme concentrations of NO2. In contrast to the usual point forecasting, where a single value is forecast for each horizon, probabilistic forecasting through quantile regression allows the full probability distribution to be predicted, which in turn allows models to be built specifically for the tails of this distribution. Using data from the city of Madrid, including NO2 concentrations as well as meteorological measures, we build models that predict extreme NO2 concentrations, outperforming point-forecasting alternatives, and we show that the predictions are accurate, reliable and sharp. In addition, we study the relative importance of the independent variables involved, and show how the variables important for the median quantile differ from those important for the upper quantiles. Furthermore, we present a method to compute the probability of exceedance of thresholds, which is a simple and comprehensible way to present probabilistic forecasts while maximizing their usefulness. Copyright © 2017 Elsevier Ltd. All rights reserved.
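A hedged sketch of quantile regression for upper-tail prediction, using statsmodels on synthetic stand-ins for NO2 and meteorological predictors; the paper's full probabilistic forecasting and exceedance-probability machinery is not reproduced.

```python
# Hedged sketch of quantile regression for extreme pollutant concentrations (synthetic data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n = 1000
df = pd.DataFrame({
    "wind": rng.uniform(0, 10, n),
    "temp": rng.uniform(-5, 35, n),
})
df["no2"] = 60 - 4 * df["wind"] + 0.5 * df["temp"] + rng.gumbel(0, 8, n)

model = smf.quantreg("no2 ~ wind + temp", df)
for q in (0.5, 0.9, 0.99):                 # median and upper-tail quantiles
    fit = model.fit(q=q)
    print(q, fit.params.values)
```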
An analysis of the magnitude and frequency of floods on Oahu, Hawaii
Nakahara, R.H.
1980-01-01
An analysis of available peak-flow data for the island of Oahu, Hawaii, was made by using multiple regression techniques which related flood-frequency data to basin and climatic characteristics for 74 gaging stations on Oahu. In the analysis, several different groupings of stations were investigated, including divisions by geographic location and size of drainage area. The grouping consisting of two leeward divisions and one windward division produced the best results. Drainage basins ranged in area from 0.03 to 45.7 square miles. Equations relating flood magnitudes of selected frequencies to basin characteristics were developed for the three divisions of Oahu. These equations can be used to estimate the magnitude and frequency of floods for any site, gaged or ungaged, for any desired recurrence interval from 2 to 100 years. Data on basin characteristics, flood magnitudes for various recurrence intervals from individual station-frequency curves, and computed flood magnitudes by use of the regression equation are tabulated to provide the needed data. (USGS)
A simplified computational fluid-dynamic approach to the oxidizer injector design in hybrid rockets
NASA Astrophysics Data System (ADS)
Di Martino, Giuseppe D.; Malgieri, Paolo; Carmicino, Carmine; Savino, Raffaele
2016-12-01
Fuel regression rate in hybrid rockets is non-negligibly affected by the oxidizer injection pattern. In this paper a simplified computational approach, developed in an attempt to optimize the oxidizer injector design, is discussed. Numerical simulations of the thermo-fluid-dynamic field in a hybrid rocket are carried out with a commercial solver to investigate several injection configurations, with the aim of increasing the fuel regression rate and minimizing the consumption unevenness while still favoring the establishment of flow recirculation at the motor head end; such recirculation is generated with an axial nozzle injector and has been demonstrated to promote combustion stability as well as larger efficiency and regression rate. All the computations have been performed on the configuration of a lab-scale hybrid rocket motor available at the propulsion laboratory of the University of Naples with typical operating conditions. After a preliminary comparison between the two baseline limiting cases of an axial subsonic nozzle injector and a uniform injection through the prechamber, a parametric analysis has been carried out by varying the oxidizer jet flow divergence angle, as well as the grain port diameter and the oxidizer mass flux, to study the effect of the flow divergence on the heat transfer distribution over the fuel surface. Some experimental firing test data are presented, and, under the hypothesis that fuel regression rate and surface heat flux are proportional, the measured fuel consumption axial profiles are compared with the predicted surface heat flux, showing fairly good agreement, which allowed the employed design approach to be validated. Finally, an optimized injector design is proposed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tang, Kunkun, E-mail: ktg@illinois.edu; Inria Bordeaux – Sud-Ouest, Team Cardamom, 200 avenue de la Vieille Tour, 33405 Talence; Congedo, Pietro M.
The Polynomial Dimensional Decomposition (PDD) is employed in this work for the global sensitivity analysis and uncertainty quantification (UQ) of stochastic systems subject to a moderate to large number of input random variables. Due to the intimate connection between the PDD and the Analysis of Variance (ANOVA) approaches, PDD is able to provide a simpler and more direct evaluation of the Sobol' sensitivity indices, when compared to the Polynomial Chaos expansion (PC). Unfortunately, the number of PDD terms grows exponentially with respect to the size of the input random vector, which makes the computational cost of standard methods unaffordable for real engineering applications. In order to address the problem of the curse of dimensionality, this work proposes essentially variance-based adaptive strategies aiming to build a cheap meta-model (i.e. surrogate model) by employing the sparse PDD approach with its coefficients computed by regression. Three levels of adaptivity are carried out in this paper: 1) the truncated dimensionality for ANOVA component functions, 2) the active dimension technique especially for second- and higher-order parameter interactions, and 3) the stepwise regression approach designed to retain only the most influential polynomials in the PDD expansion. During this adaptive procedure featuring stepwise regressions, the surrogate model representation keeps containing few terms, so that the cost to resolve repeatedly the linear systems of the least-squares regression problem is negligible. The size of the finally obtained sparse PDD representation is much smaller than the one of the full expansion, since only significant terms are eventually retained. Consequently, a much smaller number of calls to the deterministic model is required to compute the final PDD coefficients.
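A hedged sketch of the stepwise-regression idea used to retain only influential terms of a polynomial surrogate: plain forward selection over monomial features by correlation with the residual, not the full adaptive sparse PDD machinery.

```python
# Hedged sketch of forward stepwise selection of polynomial surrogate terms (synthetic data).
import numpy as np
from itertools import combinations_with_replacement

rng = np.random.default_rng(7)
n, d = 200, 5
X = rng.uniform(-1, 1, size=(n, d))
y = 1.0 + 2 * X[:, 0] - 3 * X[:, 1] * X[:, 2] + 0.05 * rng.normal(size=n)

def basis(X):
    """Candidate basis: constant, linear and second-order monomials."""
    cols = [np.ones(len(X))] + [X[:, i] for i in range(d)]
    cols += [X[:, i] * X[:, j] for i, j in combinations_with_replacement(range(d), 2)]
    return np.column_stack(cols)

Phi = basis(X)
selected, resid = [], y.copy()
for _ in range(5):                                     # keep at most 5 terms
    scores = [abs(Phi[:, k] @ resid) / np.linalg.norm(Phi[:, k]) for k in range(Phi.shape[1])]
    k_best = int(np.argmax(scores))
    if k_best in selected:
        break
    selected.append(k_best)
    coef, *_ = np.linalg.lstsq(Phi[:, selected], y, rcond=None)   # refit with selected terms
    resid = y - Phi[:, selected] @ coef
print("selected terms:", selected, "final RMSE:", np.sqrt(np.mean(resid ** 2)))
```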
Deep learning for biomarker regression: application to osteoporosis and emphysema on chest CT scans
NASA Astrophysics Data System (ADS)
González, Germán.; Washko, George R.; San José Estépar, Raúl
2018-03-01
Introduction: Biomarker computation using deep learning often relies on a two-step process, where the deep learning algorithm segments the region of interest and then the biomarker is measured. We propose an alternative paradigm, where the biomarker is estimated directly using a regression network. We showcase this image-to-biomarker paradigm using two biomarkers: the estimation of bone mineral density (BMD) and the estimation of lung percentage of emphysema from CT scans. Materials and methods: We use a large database of 9,925 CT scans to train, validate and test the network, for which reference standard BMD and percentage emphysema have already been computed. First, the 3D dataset is reduced to a set of canonical 2D slices where the organ of interest is visible (either the spine for BMD or the lungs for emphysema). This data reduction is performed using an automatic object detector. Second, the regression neural network is composed of three convolutional layers, followed by a fully connected and an output layer. The network is optimized using a momentum optimizer with an exponential decay rate, using the root mean squared error as the cost function. Results: The Pearson correlation coefficients obtained against the reference standards are r = 0.940 (p < 0.00001) and r = 0.976 (p < 0.00001) for BMD and percentage emphysema, respectively. Conclusions: The deep-learning regression architecture can learn biomarkers from images directly, without indicating the structures of interest. This approach simplifies the development of biomarker extraction algorithms. The proposed data reduction based on object detectors conveys enough information to compute the biomarkers of interest.
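A hedged sketch of the regression network described above (three convolutional layers, a fully connected layer and a scalar output, momentum optimizer with exponential decay, RMSE cost); channel counts, kernel sizes, the 128x128 input and the decay rate are assumptions rather than the authors' configuration.

```python
# Hedged sketch of an image-to-biomarker regression CNN (assumed layer sizes, placeholder data).
import torch
import torch.nn as nn

class BiomarkerRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
        )
        # 64 x 14 x 14 is the feature-map size for a 128x128 input with the strides above.
        self.fc = nn.Sequential(nn.Flatten(), nn.Linear(64 * 14 * 14, 256), nn.ReLU(),
                                nn.Linear(256, 1))

    def forward(self, x):
        return self.fc(self.features(x)).squeeze(1)

model = BiomarkerRegressor()
opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)       # momentum optimizer
sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.99)        # exponential decay

slices = torch.randn(8, 1, 128, 128)        # canonical 2D slices (placeholder batch)
target = torch.randn(8)                     # reference BMD or %emphysema (placeholder)
loss = torch.sqrt(nn.functional.mse_loss(model(slices), target))       # RMSE cost
loss.backward()
opt.step()
sched.step()
```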
Casero-Alonso, V; López-Fidalgo, J; Torsney, B
2017-01-01
Binary response models are used in many real applications. For these models the Fisher information matrix (FIM) is proportional to the FIM of a weighted simple linear regression model. The same is also true when the weight function has a finite integral. Thus, optimal designs for one binary model are also optimal for the corresponding weighted linear regression model. The main objective of this paper is to provide a tool for the construction of MV-optimal designs, minimizing the maximum of the variances of the estimates, for a general design space. MV-optimality is a potentially difficult criterion because of its nondifferentiability at equal variance designs. A methodology for obtaining MV-optimal designs where the design space is a compact interval [a, b] will be given for several standard weight functions. The methodology will allow us to build a user-friendly computer tool based on Mathematica to compute MV-optimal designs. Some illustrative examples will show a representation of MV-optimal designs in the Euclidean plane, taking a and b as the axes. The applet will be explained using two relevant models. In the first one the case of a weighted linear regression model is considered, where the weight function is directly chosen from a typical family. In the second example a binary response model is assumed, where the probability of the outcome is given by a typical probability distribution. Practitioners can use the provided applet to identify the solution and to know the exact support points and design weights. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
A secure distributed logistic regression protocol for the detection of rare adverse drug events
El Emam, Khaled; Samet, Saeed; Arbuckle, Luk; Tamblyn, Robyn; Earle, Craig; Kantarcioglu, Murat
2013-01-01
Background There is limited capacity to assess the comparative risks of medications after they enter the market. For rare adverse events, the pooling of data from multiple sources is necessary to have the power and sufficient population heterogeneity to detect differences in safety and effectiveness in genetic, ethnic and clinically defined subpopulations. However, combining datasets from different data custodians or jurisdictions to perform an analysis on the pooled data creates significant privacy concerns that would need to be addressed. Existing protocols for addressing these concerns can result in reduced analysis accuracy and can allow sensitive information to leak. Objective To develop a secure distributed multi-party computation protocol for logistic regression that provides strong privacy guarantees. Methods We developed a secure distributed logistic regression protocol using a single analysis center with multiple sites providing data. A theoretical security analysis demonstrates that the protocol is robust to plausible collusion attacks and does not allow the parties to gain new information from the data that are exchanged among them. The computational performance and accuracy of the protocol were evaluated on simulated datasets. Results The computational performance scales linearly as the dataset sizes increase. The addition of sites results in an exponential growth in computation time. However, for up to five sites, the time is still short and would not affect practical applications. The model parameters are the same as the results on pooled raw data analyzed in SAS, demonstrating high model accuracy. Conclusion The proposed protocol and prototype system would allow the development of logistic regression models in a secure manner without requiring the sharing of personal health information. This can alleviate one of the key barriers to the establishment of large-scale post-marketing surveillance programs. We extended the secure protocol to account for correlations among patients within sites through generalized estimating equations, and to accommodate other link functions by extending it to generalized linear models. PMID:22871397
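A hedged toy simulation of the distributed setting: each site computes gradient contributions on its own data and only these aggregates reach the analysis center. The actual secure multi-party computation protocol and its privacy guarantees are not implemented; data are synthetic.

```python
# Hedged toy simulation of distributed logistic regression with a single analysis center.
import numpy as np

rng = np.random.default_rng(8)

def make_site(n):                      # one site's private data (synthetic)
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    p_true = 1 / (1 + np.exp(-(X @ np.array([-2.0, 1.0, 0.5]))))
    y = (rng.random(n) < p_true).astype(float)
    return X, y

sites = [make_site(n) for n in (300, 500, 400)]
beta = np.zeros(3)
for _ in range(200):                   # gradient ascent on the pooled log-likelihood
    grad = np.zeros(3)
    for X, y in sites:                 # each site contributes only its summed gradient
        p = 1 / (1 + np.exp(-(X @ beta)))
        grad += X.T @ (y - p)
    beta += 0.001 * grad
print("estimated coefficients:", beta)
```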
BIOSSES: a semantic sentence similarity estimation system for the biomedical domain.
Sogancioglu, Gizem; Öztürk, Hakime; Özgür, Arzucan
2017-07-15
The amount of information available in textual format is rapidly increasing in the biomedical domain. Therefore, natural language processing (NLP) applications are becoming increasingly important to facilitate the retrieval and analysis of these data. Computing the semantic similarity between sentences is an important component in many NLP tasks including text retrieval and summarization. A number of approaches have been proposed for semantic sentence similarity estimation for generic English. However, our experiments showed that such approaches do not effectively cover biomedical knowledge and produce poor results for biomedical text. We propose several approaches for sentence-level semantic similarity computation in the biomedical domain, including string similarity measures and measures based on the distributed vector representations of sentences learned in an unsupervised manner from a large biomedical corpus. In addition, ontology-based approaches are presented that utilize general and domain-specific ontologies. Finally, a supervised regression based model is developed that effectively combines the different similarity computation metrics. A benchmark data set consisting of 100 sentence pairs from the biomedical literature is manually annotated by five human experts and used for evaluating the proposed methods. The experiments showed that the supervised semantic sentence similarity computation approach obtained the best performance (0.836 correlation with gold standard human annotations) and improved over the state-of-the-art domain-independent systems up to 42.6% in terms of the Pearson correlation metric. A web-based system for biomedical semantic sentence similarity computation, the source code, and the annotated benchmark data set are available at: http://tabilab.cmpe.boun.edu.tr/BIOSSES/ . gizemsogancioglu@gmail.com or arzucan.ozgur@boun.edu.tr. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
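A hedged sketch of the supervised combination step: a linear regression trained on human similarity judgements combines a few simple similarity features of a sentence pair; the features, pairs and gold scores are placeholders, far simpler than the string, vector and ontology measures combined in BIOSSES.

```python
# Hedged sketch of combining sentence-similarity features with a supervised regression.
import numpy as np
from difflib import SequenceMatcher
from sklearn.linear_model import LinearRegression

def features(s1, s2):
    """Two toy similarity measures: token Jaccard overlap and character-level ratio."""
    t1, t2 = set(s1.lower().split()), set(s2.lower().split())
    jaccard = len(t1 & t2) / len(t1 | t2)
    ratio = SequenceMatcher(None, s1.lower(), s2.lower()).ratio()
    return [jaccard, ratio]

pairs = [("the drug inhibits tumor growth", "the compound suppresses tumor growth"),
         ("gene expression was measured", "patients were enrolled in the trial")]
gold = [0.8, 0.1]                                  # human-annotated similarity (placeholder)

X = np.array([features(a, b) for a, b in pairs])
reg = LinearRegression().fit(X, gold)
print(reg.predict([features("the drug reduces tumor growth",
                            "the compound suppresses tumor growth")]))
```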
Massachusetts Shoreline Change Mapping and Analysis Project, 2013 Update
Thieler, E. Robert; Smith, Theresa L.; Knisel, Julia M.; Sampson, Daniel W.
2013-01-01
Information on rates and trends of shoreline change can be used to improve the understanding of the underlying causes and potential effects of coastal erosion on coastal populations and infrastructure and can support informed coastal management decisions. In this report, we summarize the changes in the historical positions of the shoreline of the Massachusetts coast for the 165 years from 1844 through 2009. The study area includes the Massachusetts coastal region from Salisbury to Westport, including Cape Cod, as well as Martha’s Vineyard, Nantucket, and the Elizabeth Islands. New statewide shoreline data were developed for approximately 1,804 kilometers (1,121 miles) of shoreline using color aerial orthoimagery from 2008 and 2009 and topographic lidar from 2007. The shoreline data were integrated with existing historical shoreline data from the U.S. Geological Survey (USGS) and Massachusetts Office of Coastal Zone Management (CZM) to compute long- (about 150 years) and short-term (about 30 years) rates of shoreline change. A linear regression method was used to calculate long- and short-term rates of shoreline change at 26,510 transects along the Massachusetts coast. In locations where shoreline data were insufficient to use the linear regression method, short-term rates were calculated using an end-point method. Long-term rates of shoreline change are calculated with (LTw) and without (LTwo) shorelines from the 1970s and 1994 to examine the effect of removing these data on measured rates of change. Regionally averaged rates are used to assess the general characteristics of the two-rate computations, and we find that (1) the rates of change for both LTw and LTwo are essentially the same; (2) including more data slightly reduces the uncertainty of the rate, which is expected as the number of shorelines increases; and (3) the data for the shorelines from the 1970s and 1994 are not outliers with respect to the long-term trend. These findings are true for regional averages, but may not hold at specific transects.
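A hedged sketch of the two rate computations mentioned above for a single transect, with invented survey years and shoreline positions: a linear-regression rate over all shorelines and an end-point rate from the first and last surveys.

```python
# Hedged sketch of shoreline-change rates at one transect (invented years and positions).
import numpy as np

years = np.array([1844, 1890, 1933, 1978, 1994, 2009], dtype=float)
position_m = np.array([0.0, -8.0, -15.0, -26.0, -30.0, -38.0])   # shoreline position along transect

slope, intercept = np.polyfit(years, position_m, 1)               # linear-regression rate
print(f"linear-regression rate: {slope:.3f} m/yr")

end_point_rate = (position_m[-1] - position_m[0]) / (years[-1] - years[0])
print(f"end-point rate: {end_point_rate:.3f} m/yr")
```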
Li, Jipeng; Li, Yangyang; Zhang, Yongxing; Zhao, Qinghua
2013-01-01
Purpose This study investigates neck/shoulder pain (NSP) and low back pain (LBP) among current high school students in Shanghai and explores the relationship between these pains and their possible influencing factors, including digital products, physical activity, and psychological status. Methods An anonymous self-assessment was administered to 3,600 students across 30 high schools in Shanghai. This questionnaire examined the prevalence of NSP and LBP and the level of physical activity as well as the use of mobile phones, personal computers (PC) and tablet computers (Tablet). The CES-D (Center for Epidemiological Studies Depression) scale was also included in the survey. The survey data were analyzed using the chi-square test, univariate logistic analyses and a multivariate logistic regression model. Results Three thousand and sixteen valid questionnaires were received, including 1,460 (48.41%) from male respondents and 1,556 (51.59%) from female respondents. The high school students in this study showed NSP and LBP rates of 40.8% and 33.1%, respectively, and the prevalence of both was influenced by the students’ grade, use of digital products, and mental status; these factors affected the rates of NSP and LBP to varying degrees. The multivariate logistic regression analysis revealed that gender, grade, soreness after exercise, PC use habits, Tablet use, sitting time after school and academic stress entered the final model for NSP, while the final model for LBP consisted of gender, grade, soreness after exercise, PC use habits, mobile phone use, sitting time after school, academic stress and CES-D score. Conclusions High school students in Shanghai showed a high prevalence of NSP and LBP that was closely related to multiple factors. Appropriate interventions should be implemented to reduce the occurrences of NSP and LBP. PMID:24147114
Plan View Pattern Control for Steel Plates through Constrained Locally Weighted Regression
NASA Astrophysics Data System (ADS)
Shigemori, Hiroyasu; Nambu, Koji; Nagao, Ryo; Araki, Tadashi; Mizushima, Narihito; Kano, Manabu; Hasebe, Shinji
A technique for performing parameter identification in a locally weighted regression model using foresight information on the physical properties of the object of interest as constraints was proposed. This method was applied to plan view pattern control of steel plates, and a reduction of shape nonconformity (crop) at the plate head end was confirmed by computer simulation based on real operation data.
Low-flow characteristics of Virginia streams
Austin, Samuel H.; Krstolic, Jennifer L.; Wiegand, Ute
2011-01-01
Low-flow annual non-exceedance probabilities (ANEP), called probability-percent chance (P-percent chance) flow estimates, regional regression equations, and transfer methods are provided describing the low-flow characteristics of Virginia streams. Statistical methods are used to evaluate streamflow data. Analysis of Virginia streamflow data collected from 1895 through 2007 is summarized. Methods are provided for estimating low-flow characteristics of gaged and ungaged streams. The 1-, 4-, 7-, and 30-day average streamgaging station low-flow characteristics for 290 long-term, continuous-record, streamgaging stations are determined, adjusted for instances of zero flow using a conditional probability adjustment method, and presented for non-exceedance probabilities of 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01, and 0.005. Stream basin characteristics computed using spatial data and a geographic information system are used as explanatory variables in regional regression equations to estimate annual non-exceedance probabilities at gaged and ungaged sites and are summarized for 290 long-term, continuous-record streamgaging stations, 136 short-term, continuous-record streamgaging stations, and 613 partial-record streamgaging stations. Regional regression equations for six physiographic regions use basin characteristics to estimate 1-, 4-, 7-, and 30-day average low-flow annual non-exceedance probabilities at gaged and ungaged sites. Weighted low-flow values that combine computed streamgaging station low-flow characteristics and annual non-exceedance probabilities from regional regression equations provide improved low-flow estimates. Regression equations developed using the Maintenance of Variance with Extension (MOVE.1) method describe the line of organic correlation (LOC) with an appropriate index site for low-flow characteristics at 136 short-term, continuous-record streamgaging stations and 613 partial-record streamgaging stations. Monthly streamflow statistics computed on the individual daily mean streamflows of selected continuous-record streamgaging stations and curves describing flow-duration are presented. Text, figures, and lists are provided summarizing low-flow estimates, selected low-flow sites, delineated physiographic regions, basin characteristics, regression equations, error estimates, definitions, and data sources. This study supersedes previous studies of low flows in Virginia.
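A hedged sketch of a MOVE.1 (line of organic correlation) transfer between a short-record site and an index station, assuming synthetic flows and a log-space fit; the slope preserves the variance ratio rather than minimizing squared error as ordinary least squares would.

```python
# Hedged sketch of a MOVE.1 / line-of-organic-correlation record transfer (synthetic data).
import numpy as np

rng = np.random.default_rng(9)
index_flow = np.exp(rng.normal(2.0, 0.6, 30))          # concurrent flows at the index station
short_flow = np.exp(0.3 + 0.9 * np.log(index_flow) + rng.normal(0, 0.2, 30))

x, y = np.log(index_flow), np.log(short_flow)          # fit in log space, as is common for low flows
r = np.corrcoef(x, y)[0, 1]
slope = np.sign(r) * y.std(ddof=1) / x.std(ddof=1)     # MOVE.1 slope preserves the variance
intercept = y.mean() - slope * x.mean()

long_record_x = rng.normal(2.0, 0.6, 200)              # longer index-site record (log space)
estimated_short = np.exp(intercept + slope * long_record_x)   # transferred flow estimates
print(slope, intercept)
```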
Factors that Influence the Success of Male and Female Computer Programming Students in College
NASA Astrophysics Data System (ADS)
Clinkenbeard, Drew A.
As the demand for a technologically skilled work force grows, experience and skill in computer science have become increasingly valuable for college students. However, the number of students graduating with computer science degrees is not growing proportional to this need. Traditionally several groups are underrepresented in this field, notably women and students of color. This study investigated elements of computer science education that influence academic achievement in beginning computer programming courses. The goal of the study was to identify elements that increase success in computer programming courses. A 38-item questionnaire was developed and administered during the Spring 2016 semester at California State University Fullerton (CSUF). CSUF is an urban public university comprised of about 40,000 students. Data were collected from three beginning programming classes offered at CSUF. In total 411 questionnaires were collected resulting in a response rate of 58.63%. Data for the study were grouped into three broad categories of variables. These included academic and background variables; affective variables; and peer, mentor, and role-model variables. A conceptual model was developed to investigate how these variables might predict final course grade. Data were analyzed using statistical techniques such as linear regression, factor analysis, and path analysis. Ultimately this study found that peer interactions, comfort with computers, computer self-efficacy, self-concept, and perception of achievement were the best predictors of final course grade. In addition, the analyses showed that male students exhibited higher levels of computer self-efficacy and self-concept compared to female students, even when they achieved comparable course grades. Implications and explanations of these findings are explored, and potential policy changes are offered.
Verloigne, Maïté; Van Lippevelde, Wendy; Bere, Elling; Manios, Yannis; Kovács, Éva; Grillenberger, Monika; Maes, Lea; Brug, Johannes; De Bourdeaudhuij, Ilse
2015-09-18
The aim was to investigate which individual and family environmental factors are related to television and computer time separately in 10- to 12-year-old children within and across five European countries (Belgium, Germany, Greece, Hungary, Norway). Data were used from the ENERGY-project. Children and one of their parents completed a questionnaire, including questions on screen time behaviours and related individual and family environmental factors. Family environmental factors included social, political, economic and physical environmental factors. Complete data were obtained from 2022 child-parent dyads (53.8 % girls, mean child age 11.2 ± 0.8 years; mean parental age 40.5 ± 5.1 years). To examine the association between individual and family environmental factors (i.e. independent variables) and television/computer time (i.e. dependent variables) in each country, multilevel regression analyses were performed using MLwiN 2.22, adjusting for children's sex and age. In all countries, children reported more television and/or computer time if children and their parents thought that the maximum recommended level for watching television and/or using the computer was higher, and if children had a higher preference for television watching and/or computer use and a lower self-efficacy to control television watching and/or computer use. Most physical and economic environmental variables were not significantly associated with television or computer time. Slightly more individual factors were related to children's computer time and more parental social environmental factors to children's television time. We also found different correlates across countries: parental co-participation in television watching was significantly positively associated with children's television time in all countries except for Greece. A higher level of parental television and computer time was associated with a higher level of children's television and computer time only in Hungary. Having rules regarding children's television time was related to less television time in all countries except for Belgium and Norway. Most evidence was found for an association between screen time and individual and parental social environmental factors, which means that future interventions aiming to reduce screen time should focus on children's individual beliefs and habits as well as on parental social factors. As we identified some different correlates for television and computer time and across countries, cross-European interventions could make small adaptations for each specific screen time activity and place different emphases in each country.
Stability-Constrained Aerodynamic Shape Optimization with Applications to Flying Wings
NASA Astrophysics Data System (ADS)
Mader, Charles Alexander
A set of techniques is developed that allows the incorporation of flight dynamics metrics as an additional discipline in a high-fidelity aerodynamic optimization. Specifically, techniques for including static stability constraints and handling qualities constraints in a high-fidelity aerodynamic optimization are demonstrated. These constraints are developed from stability derivative information calculated using high-fidelity computational fluid dynamics (CFD). Two techniques are explored for computing the stability derivatives from CFD. One technique uses an automatic differentiation adjoint technique (ADjoint) to efficiently and accurately compute a full set of static and dynamic stability derivatives from a single steady solution. The other technique uses a linear regression method to compute the stability derivatives from a quasi-unsteady time-spectral CFD solution, allowing for the computation of static, dynamic and transient stability derivatives. Based on the characteristics of the two methods, the time-spectral technique is selected for further development, incorporated into an optimization framework, and used to conduct stability-constrained aerodynamic optimization. This stability-constrained optimization framework is then used to conduct an optimization study of a flying wing configuration. This study shows that stability constraints have a significant impact on the optimal design of flying wings and that, while static stability constraints can often be satisfied by modifying the airfoil profiles of the wing, dynamic stability constraints can require a significant change in the planform of the aircraft in order for the constraints to be satisfied.
Instructional Advice, Time Advice and Learning Questions in Computer Simulations
ERIC Educational Resources Information Center
Rey, Gunter Daniel
2010-01-01
Undergraduate students (N = 97) used an introductory text and a computer simulation to learn fundamental concepts about statistical analyses (e.g., analysis of variance, regression analysis and General Linear Model). Each learner was randomly assigned to one cell of a 2 (with or without instructional advice) x 2 (with or without time advice) x 2…
Robert Zahner; Albert R. Stage
1966-01-01
A method is described for computing daily values of moisture stress on forest vegetation, or water deficits, based on the differences between Thornthwaite's potential evapotranspiration and computed soil-moisture depletion. More realistic functions are used for soil-moisture depletion on specific soil types than have been customary. These functions relate daily...
Teychenne, Megan; Hinkley, Trina
2016-01-01
Objectives Anxiety is a serious illness and women (including mothers with young children) are at particular risk. Although physical activity (PA) may reduce anxiety risk, little research has investigated the link between sedentary behaviour and anxiety risk. The aim of this study was to examine the association between screen-based sedentary behaviour and anxiety symptoms, independent of PA, amongst mothers with young children. Methods During 2013–2014, 528 mothers with children aged 2–5 years completed self-report measures of recreational screen-based sedentary behaviour (TV/DVD/video viewing, computer/e-games/hand held device use) and anxiety symptoms (using the Hospital Anxiety and Depression Scale, HADS-A). Linear regression analyses examined the cross-sectional association between screen-based sedentary behaviour and anxiety symptoms. Results In models that adjusted for key demographic and behavioural covariates (including moderate- to vigorous-intensity PA, MVPA), computer/device use (B = 0.212; 95% CI = 0.048, 0.377) and total screen time (B = 0.109; 95% CI = 0.014, 0.205) were positively associated with heightened anxiety symptoms. TV viewing was not associated with anxiety symptoms in either model. Conclusions Higher levels of recreational computer or handheld device use and overall screen time may be linked to higher risk of anxiety symptoms in mothers with young children, independent of MVPA. Further longitudinal and intervention research is required to determine temporal associations. PMID:27191953
Risser, Dennis W.; Thompson, Ronald E.; Stuckey, Marla H.
2008-01-01
A method was developed for making estimates of long-term, mean annual ground-water recharge from streamflow data at 80 streamflow-gaging stations in Pennsylvania. The method relates mean annual base-flow yield derived from the streamflow data (as a proxy for recharge) to the climatic, geologic, hydrologic, and physiographic characteristics of the basins (basin characteristics) by use of a regression equation. Base-flow yield is the base flow of a stream divided by the drainage area of the basin, expressed in inches of water basinwide. Mean annual base-flow yield was computed for the period of available streamflow record at continuous streamflow-gaging stations by use of the computer program PART, which separates base flow from direct runoff on the streamflow hydrograph. Base flow provides a reasonable estimate of recharge for basins where streamflow is mostly unaffected by upstream regulation, diversion, or mining. Twenty-eight basin characteristics were included in the exploratory regression analysis as possible predictors of base-flow yield. Basin characteristics found to be statistically significant predictors of mean annual base-flow yield during 1971-2000 at the 95-percent confidence level were (1) mean annual precipitation, (2) average maximum daily temperature, (3) percentage of sand in the soil, (4) percentage of carbonate bedrock in the basin, and (5) stream channel slope. The equation for predicting recharge was developed using ordinary least-squares regression. The standard error of prediction for the equation on log-transformed data was 9.7 percent, and the coefficient of determination was 0.80. The equation can be used to predict long-term, mean annual recharge rates for ungaged basins, providing that the explanatory basin characteristics can be determined and that the underlying assumption is accepted that base-flow yield derived from PART is a reasonable estimate of ground-water recharge rates. For example, application of the equation for 370 hydrologic units in Pennsylvania predicted a range of ground-water recharge from about 6.0 to 22 inches per year. A map of the predicted recharge illustrates the general magnitude and variability of recharge throughout Pennsylvania.
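The regression step described above can be illustrated with a minimal ordinary least-squares fit of log-transformed base-flow yield on the five significant basin characteristics. The data below are synthetic placeholders, not the Pennsylvania streamgage data, and the coefficients are invented for illustration only.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 80  # one row per gaged basin

# Synthetic stand-ins for the five significant basin characteristics.
precip = rng.uniform(35, 50, n)   # mean annual precipitation, inches
tmax   = rng.uniform(55, 65, n)   # average maximum daily temperature, deg F
sand   = rng.uniform(10, 60, n)   # percentage of sand in the soil
carb   = rng.uniform(0, 80, n)    # percentage of carbonate bedrock
slope  = rng.uniform(5, 80, n)    # stream channel slope, ft/mi

# Invented "true" relation for log-transformed base-flow yield (inches), plus noise.
log_yield = (0.03 * precip - 0.01 * tmax + 0.004 * sand
             + 0.002 * carb + 0.001 * slope + rng.normal(0, 0.05, n))

X = sm.add_constant(np.column_stack([precip, tmax, sand, carb, slope]))
fit = sm.OLS(log_yield, X).fit()   # ordinary least squares on the log-transformed response
print(fit.params)                  # regression coefficients
print("R^2:", round(fit.rsquared, 3))
```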
Computational neurorehabilitation: modeling plasticity and learning to predict recovery.
Reinkensmeyer, David J; Burdet, Etienne; Casadio, Maura; Krakauer, John W; Kwakkel, Gert; Lang, Catherine E; Swinnen, Stephan P; Ward, Nick S; Schweighofer, Nicolas
2016-04-30
Despite progress in using computational approaches to inform medicine and neuroscience in the last 30 years, there have been few attempts to model the mechanisms underlying sensorimotor rehabilitation. We argue that a fundamental understanding of neurologic recovery, and as a result accurate predictions at the individual level, will be facilitated by developing computational models of the salient neural processes, including plasticity and learning systems of the brain, and integrating them into a context specific to rehabilitation. Here, we therefore discuss Computational Neurorehabilitation, a newly emerging field aimed at modeling plasticity and motor learning to understand and improve movement recovery of individuals with neurologic impairment. We first explain how the emergence of robotics and wearable sensors for rehabilitation is providing data that make development and testing of such models increasingly feasible. We then review key aspects of plasticity and motor learning that such models will incorporate. We proceed by discussing how computational neurorehabilitation models relate to the current benchmark in rehabilitation modeling - regression-based, prognostic modeling. We then critically discuss the first computational neurorehabilitation models, which have primarily focused on modeling rehabilitation of the upper extremity after stroke, and show how even simple models have produced novel ideas for future investigation. Finally, we conclude with key directions for future research, anticipating that soon we will see the emergence of mechanistic models of motor recovery that are informed by clinical imaging results and driven by the actual movement content of rehabilitation therapy as well as wearable sensor-based records of daily activity.
TV Time but Not Computer Time Is Associated with Cardiometabolic Risk in Dutch Young Adults
Altenburg, Teatske M.; de Kroon, Marlou L. A.; Renders, Carry M.; HiraSing, Remy; Chinapaw, Mai J. M.
2013-01-01
Background TV time and total sedentary time have been positively related to biomarkers of cardiometabolic risk in adults. We aim to examine the association of TV time and computer time separately with cardiometabolic biomarkers in young adults. Additionally, the mediating role of waist circumference (WC) is studied. Methods and Findings Data of 634 Dutch young adults (18–28 years; 39% male) were used. Cardiometabolic biomarkers included indicators of overweight, blood pressure, blood levels of fasting plasma insulin, cholesterol, glucose, triglycerides and a clustered cardiometabolic risk score. Linear regression analyses were used to assess the cross-sectional association of self-reported TV and computer time with cardiometabolic biomarkers, adjusting for demographic and lifestyle factors. Mediation by WC was checked using the product-of-coefficient method. TV time was significantly associated with triglycerides (B = 0.004; CI = [0.001;0.05]) and insulin (B = 0.10; CI = [0.01;0.20]). Computer time was not significantly associated with any of the cardiometabolic biomarkers. We found no evidence for WC to mediate the association of TV time or computer time with cardiometabolic biomarkers. Conclusions We found a significantly positive association of TV time with cardiometabolic biomarkers. In addition, we found no evidence for WC as a mediator of this association. Our findings suggest a need to distinguish between TV time and computer time within future guidelines for screen time. PMID:23460900
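The product-of-coefficient mediation check mentioned above can be sketched as follows; the data are simulated stand-ins (TV time, waist circumference, triglycerides), not the study's cohort, and the Sobel standard error is one common way to quantify the indirect effect.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 634
tv = rng.gamma(2.0, 1.0, n)                                 # TV time, hours/day
wc = 75 + 2.0 * tv + rng.normal(0, 8, n)                    # waist circumference (candidate mediator)
tg = 1.0 + 0.004 * tv + 0.01 * wc + rng.normal(0, 0.3, n)   # triglycerides (outcome)

# a-path: mediator regressed on exposure; b-path: outcome regressed on mediator, adjusted for exposure.
a_fit = sm.OLS(wc, sm.add_constant(tv)).fit()
b_fit = sm.OLS(tg, sm.add_constant(np.column_stack([tv, wc]))).fit()
a, sa = a_fit.params[1], a_fit.bse[1]
b, sb = b_fit.params[2], b_fit.bse[2]

indirect = a * b                                   # product-of-coefficients estimate
sobel_se = np.sqrt(a**2 * sb**2 + b**2 * sa**2)    # Sobel standard error
print(f"indirect (mediated) effect = {indirect:.4f}, Sobel SE = {sobel_se:.4f}")
```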
Feature selection using probabilistic prediction of support vector regression.
Yang, Jian-Bo; Ong, Chong-Jin
2011-06-01
This paper presents a new wrapper-based feature selection method for support vector regression (SVR) using its probabilistic predictions. The method computes the importance of a feature by aggregating the difference, over the feature space, of the conditional density functions of the SVR prediction with and without the feature. As the exact computation of this importance measure is expensive, two approximations are proposed. The effectiveness of the measure using these approximations, in comparison to several other existing feature selection methods for SVR, is evaluated on both artificial and real-world problems. The results of the experiments show that the proposed method generally performs better than, or at least as well as, the existing methods, with a notable advantage when the dataset is sparse.
Machine learning action parameters in lattice quantum chromodynamics
NASA Astrophysics Data System (ADS)
Shanahan, Phiala E.; Trewartha, Daniel; Detmold, William
2018-05-01
Numerical lattice quantum chromodynamics studies of the strong interaction are important in many aspects of particle and nuclear physics. Such studies require significant computing resources to undertake. A number of proposed methods promise improved efficiency of lattice calculations, and access to regions of parameter space that are currently computationally intractable, via multi-scale action-matching approaches that necessitate parametric regression of generated lattice datasets. The applicability of machine learning to this regression task is investigated, with deep neural networks found to provide an efficient solution even in cases where approaches such as principal component analysis fail. The high information content and complex symmetries inherent in lattice QCD datasets require custom neural network layers to be introduced and present opportunities for further development.
A new model for estimating total body water from bioelectrical resistance
NASA Technical Reports Server (NTRS)
Siconolfi, S. F.; Kear, K. T.
1992-01-01
Estimation of total body water (T) from bioelectrical resistance (R) is commonly done by stepwise regression models with height squared over R, H(exp 2)/R, age, sex, and weight (W). Polynomials of H(exp 2)/R have not been included in these models. We examined the validity of a model with third-order polynomials and W. Methods: T was measured with oxygen-18-labeled water in 27 subjects. R at 50 kHz was obtained from electrodes placed on the hand and foot while subjects were in the supine position. A stepwise regression equation was developed with 13 subjects (age 31.5 plus or minus 6.2 years, T 38.2 plus or minus 6.6 L, W 65.2 plus or minus 12.0 kg). Correlations, standard errors of estimate, and mean differences were computed between T and estimated T's from the new (N) model and other models. Evaluations were completed with the remaining 14 subjects (age 32.4 plus or minus 6.3 years, T 40.3 plus or minus 8 L, W 70.2 plus or minus 12.3 kg) and two of its subgroups (high and low). Results: A regression equation was developed from the model. The only significant mean difference was between T and one of the earlier models. Conclusion: Third-order polynomials in regression models may increase the accuracy of estimating total body water. Evaluating the model with a larger population is needed.
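A minimal sketch of the proposed model, third-order polynomials of H²/R plus weight fitted by ordinary least squares, is shown below. The 13 "calibration subjects" are randomly generated placeholders, not the study's measurements.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 13                                    # calibration subgroup size, as in the abstract
height = rng.uniform(155, 190, n)         # cm (hypothetical)
R      = rng.uniform(400, 700, n)         # ohms at 50 kHz (hypothetical)
W      = rng.uniform(50, 90, n)           # body weight, kg (hypothetical)

h2r = height**2 / R
T = 0.55 * h2r + 0.10 * W + rng.normal(0, 1.0, n)   # invented "measured" total body water, L

# Center H^2/R before building polynomial terms to tame collinearity.
hc = h2r - h2r.mean()
X = sm.add_constant(np.column_stack([hc, hc**2, hc**3, W]))
fit = sm.OLS(T, X).fit()
print(fit.params)                          # intercept, 1st-3rd order H^2/R terms, weight
print("SEE:", round(np.sqrt(fit.mse_resid), 2), "L")
```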
Solving large test-day models by iteration on data and preconditioned conjugate gradient.
Lidauer, M; Strandén, I; Mäntysaari, E A; Pösö, J; Kettunen, A
1999-12-01
A preconditioned conjugate gradient method was implemented into an iteration-on-data program for the estimation of breeding values, and its convergence characteristics were studied. An algorithm was used as a reference in which one fixed effect was solved by the Gauss-Seidel method, and other effects were solved by a second-order Jacobi method. Implementation of the preconditioned conjugate gradient required storing four vectors (size equal to the number of unknowns in the mixed model equations) in random access memory and reading the data at each round of iteration. The preconditioner comprised diagonal blocks of the coefficient matrix. Comparison of algorithms was based on solutions of mixed model equations obtained by a single-trait animal model and a single-trait, random regression test-day model. Data sets for both models used milk yield records of primiparous Finnish dairy cows. The animal model data comprised 665,629 lactation milk yields, and the random regression test-day model data comprised 6,732,765 test-day milk yields. Both models included pedigree information of 1,099,622 animals. The animal model [random regression test-day model] required 122 [305] rounds of iteration to converge with the reference algorithm, but only 88 [149] were required with the preconditioned conjugate gradient. To solve the random regression test-day model with the preconditioned conjugate gradient required 237 megabytes of random access memory and took 14% of the computation time needed by the reference algorithm.
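The core solver described above can be sketched as a textbook preconditioned conjugate gradient, here with a simple diagonal preconditioner standing in for the block-diagonal preconditioner of the mixed-model equations; the small SPD test system is synthetic and the iteration-on-data file handling is omitted.

```python
import numpy as np

def pcg(A, b, M_inv_diag, tol=1e-8, max_iter=1000):
    """Preconditioned conjugate gradient for a symmetric positive definite A,
    with a diagonal preconditioner supplied as the diagonal of M^-1."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv_diag * r          # apply the preconditioner
    p = z.copy()
    rz = r @ z
    for k in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            return x, k + 1
        z = M_inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

# Tiny SPD system standing in for the mixed-model equations.
rng = np.random.default_rng(0)
B = rng.standard_normal((50, 50))
A = B @ B.T + 50 * np.eye(50)
b = rng.standard_normal(50)
x, iters = pcg(A, b, 1.0 / np.diag(A))
print("converged in", iters, "iterations; residual:", np.linalg.norm(b - A @ x))
```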
Computer use, symptoms, and quality of life.
Hayes, John R; Sheedy, James E; Stelmack, Joan A; Heaney, Catherine A
2007-08-01
To model the effects of computer use on reported visual and physical symptoms and to measure the effects upon quality of life measures. A survey of 1000 university employees (70.5% adjusted response rate) assessed visual and physical symptoms, job, physical and mental demands, ability to control/influence work, amount of work at a computer, computer work environment, relations with others at work, life and job satisfaction, and quality of life. Data were analyzed to determine whether self-reported eye symptoms are associated with perceived quality of life. The study also explored the factors that are associated with eye symptoms. Structural equation modeling and multiple regression analyses were used to assess the hypotheses. Seventy percent of the employees used some form of vision correction during computer use, 2.9% used glasses specifically prescribed for computer use, and 8% had had refractive surgery. Employees spent an average of 6 h per day at the computer. In a multiple regression framework, the latent variable eye symptoms was significantly associated with a composite quality of life variable (p = 0.02) after adjusting for job quality, job satisfaction, supervisor relations, co-worker relations, mental and physical load of the job, and job demand. Age and gender were not significantly associated with symptoms. After adjusting for age, gender, ergonomics, hours at the computer, and exercise, eye symptoms were significantly associated with physical symptoms (p < 0.001) accounting for 48% of the variance. Environmental variability at work was associated with eye symptoms and eye symptoms demonstrated a significant impact on quality of life and physical symptoms.
HOME COMPUTER USE AND THE DEVELOPMENT OF HUMAN CAPITAL*
Malamud, Ofer; Pop-Eleches, Cristian
2012-01-01
This paper uses a regression discontinuity design to estimate the effect of home computers on child and adolescent outcomes by exploiting a voucher program in Romania. Our main results indicate that home computers have both positive and negative effects on the development of human capital. Children who won a voucher to purchase a computer had significantly lower school grades but show improved computer skills. There is also some evidence that winning a voucher increased cognitive skills, as measured by Raven’s Progressive Matrices. We do not find much evidence for an effect on non-cognitive outcomes. Parental rules regarding homework and computer use attenuate the effects of computer ownership, suggesting that parental monitoring and supervision may be important mediating factors. PMID:22719135
Content analysis of antiretroviral adherence enhancing interview reports.
Kamal, Susan; Nulty, Paul; Bugnon, Olivier; Cavassini, Matthias; Schneider, Marie P
2018-05-17
To identify factors associated with low or high antiretroviral (ARV) adherence through computational text analysis of interview reports from an adherence-enhancing programme. Using text from 8428 interviews with 522 patients, we constructed a term-frequency matrix for each patient, retaining words that occurred at least ten times overall and were used in at least six interviews with six different patients. The text included both the pharmacist's and the patient's verbalizations. We investigated their association with an adherence threshold (above or below 90%) using a regularized logistic regression model. In addition to this data-driven approach, we studied the contexts of words with a focus group. Analysis resulted in 7608 terms associated with low or high adherence. Terms associated with low adherence included disruption in daily schedule, side effects, socio-economic factors, stigma, cognitive factors and smoking. Terms associated with high adherence included fixed medication intake timing, no side effects and positive psychological state. Computational text analysis helps to analyze a large corpus of adherence-enhancing interviews. It confirms the main known themes affecting ARV adherence and sheds light on new emerging themes. Health care providers should be aware of factors that are associated with low or high adherence. This knowledge should be used to reinforce the supporting factors and to resolve the barriers together with the patient. Copyright © 2018 Elsevier B.V. All rights reserved.
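A minimal sketch of the regularized logistic regression step on a term-frequency matrix is shown below, using a handful of invented interview snippets in place of the 8428 real reports and an L1 penalty as one possible regularizer.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Invented interview snippets, one per patient; 1 = adherence >= 90%, 0 = below.
texts = [
    "no side effects intake every evening fixed routine",
    "forgot doses travel disrupted schedule side effects nausea",
    "stable routine positive mood pillbox reminder",
    "stress stigma skipped doses financial problems",
]
labels = np.array([1, 0, 1, 0])

vec = CountVectorizer()                         # term-frequency matrix (the study kept words
X = vec.fit_transform(texts)                    # with >=10 occurrences in >=6 interviews)

clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0).fit(X, labels)

terms = np.array(vec.get_feature_names_out())
order = np.argsort(clf.coef_[0])
print("terms pointing toward low adherence:", terms[order[:3]])
print("terms pointing toward high adherence:", terms[order[-3:]])
```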
Tan, Germaine Xin Yi; Jamil, Muhammad; Tee, Nicole Gui Zhen; Zhong, Liang; Yap, Choon Hwai
2015-11-01
Recent animal studies have provided evidence that prenatal blood flow fluid mechanics may play a role in the pathogenesis of congenital cardiovascular malformations. To further this research, it is important to have an imaging technique for small animal embryos with sufficient resolution to support computational fluid dynamics studies, and that is also non-invasive and non-destructive to allow for subject-specific, longitudinal studies. In the current study, we developed such a technique, based on ultrasound biomicroscopy scans of chick embryos. Our technique included a motion cancelation algorithm to negate embryonic body motion, a temporal averaging algorithm to differentiate blood spaces from tissue spaces, and 3D reconstruction of blood volumes in the embryo. The accuracy of the reconstructed models was validated with direct stereoscopic measurements. A computational fluid dynamics simulation was performed to model fluid flow in the generated construct of a Hamburger-Hamilton (HH) stage 27 embryo. Simulation results showed that there were divergent streamlines and a low shear region at the carotid duct, which may be linked to the carotid duct's eventual regression and disappearance by HH stage 34. We show that our technique has sufficient resolution to produce accurate geometries for computational fluid dynamics simulations to quantify embryonic cardiovascular fluid mechanics.
Data-driven discovery of partial differential equations
Rudy, Samuel H.; Brunton, Steven L.; Proctor, Joshua L.; Kutz, J. Nathan
2017-01-01
We propose a sparse regression method capable of discovering the governing partial differential equation(s) of a given system by time series measurements in the spatial domain. The regression framework relies on sparsity-promoting techniques to select the nonlinear and partial derivative terms of the governing equations that most accurately represent the data, bypassing a combinatorially large search through all possible candidate models. The method balances model complexity and regression accuracy by selecting a parsimonious model via Pareto analysis. Time series measurements can be made in an Eulerian framework, where the sensors are fixed spatially, or in a Lagrangian framework, where the sensors move with the dynamics. The method is computationally efficient, robust, and demonstrated to work on a variety of canonical problems spanning a number of scientific domains including Navier-Stokes, the quantum harmonic oscillator, and the diffusion equation. Moreover, the method is capable of disambiguating between potentially nonunique dynamical terms by using multiple time series taken with different initial data. Thus, for a traveling wave, the method can distinguish between a linear wave equation and the Korteweg–de Vries equation, for instance. The method provides a promising new technique for discovering governing equations and physical laws in parameterized spatiotemporal systems, where first-principles derivations are intractable. PMID:28508044
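In the spirit of the sparse regression described above, the toy sketch below builds a small library of candidate terms from simulated heat-equation data and uses a Lasso fit to pick out the governing term. The authors' method uses a sequential-thresholding variant and Pareto analysis, so this is only a simplified stand-in.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Simulate data from the heat equation u_t = nu * u_xx (two modes, so the
# candidate terms below are not collinear), then try to rediscover it.
nu = 0.1
x = np.linspace(0, 2 * np.pi, 128, endpoint=False)
t = np.linspace(0, 1, 200)
dx, dt = x[1] - x[0], t[1] - t[0]
u = (np.exp(-nu * t)[:, None] * np.sin(x)[None, :]
     + 0.5 * np.exp(-4 * nu * t)[:, None] * np.sin(2 * x)[None, :])

u_t = np.gradient(u, dt, axis=0)
u_x = np.gradient(u, dx, axis=1)
u_xx = np.gradient(u_x, dx, axis=1)

# Candidate library: columns are possible right-hand-side terms.
names = ["u", "u_x", "u_xx", "u*u_x"]
Theta = np.column_stack([c.ravel() for c in (u, u_x, u_xx, u * u_x)])

model = Lasso(alpha=1e-4, fit_intercept=False, max_iter=10000).fit(Theta, u_t.ravel())
for name, coef in zip(names, model.coef_):
    if abs(coef) > 1e-3:
        print(f"u_t ≈ {coef:+.3f} * {name}")   # expected: roughly +0.100 * u_xx
```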
Zhao, Yu Xi; Xie, Ping; Sang, Yan Fang; Wu, Zi Yi
2018-04-01
Hydrological process evaluation is temporally dependent. Hydrological time series including dependence components do not meet the data consistency assumption for hydrological computation. Both of these factors cause great difficulty for water research. Given the existence of hydrological dependence variability, we proposed a correlation-coefficient-based method for significance evaluation of hydrological dependence based on an auto-regression model. By calculating the correlation coefficient between the original series and its dependence component and selecting reasonable thresholds of the correlation coefficient, this method divided the significance degree of dependence into no variability, weak variability, mid variability, strong variability, and drastic variability. By deducing the relationship between the correlation coefficient and the auto-correlation coefficient at each order of the series, we found that the correlation coefficient was mainly determined by the magnitudes of the auto-correlation coefficients from the first order to the p-th order, which clarified the theoretical basis of this method. With the first-order and second-order auto-regression models as examples, the reasonability of the deduced formula was verified through Monte-Carlo experiments to classify the relationship between the correlation coefficient and the auto-correlation coefficient. This method was used to analyze three observed hydrological time series. The results indicated the coexistence of stochastic and dependence characteristics in hydrological processes.
Bayesian isotonic density regression
Wang, Lianming; Dunson, David B.
2011-01-01
Density regression models allow the conditional distribution of the response given predictors to change flexibly over the predictor space. Such models are much more flexible than nonparametric mean regression models with nonparametric residual distributions, and are well supported in many applications. A rich variety of Bayesian methods have been proposed for density regression, but it is not clear whether such priors have full support so that any true data-generating model can be accurately approximated. This article develops a new class of density regression models that incorporate stochastic-ordering constraints which are natural when a response tends to increase or decrease monotonely with a predictor. Theory is developed showing large support. Methods are developed for hypothesis testing, with posterior computation relying on a simple Gibbs sampler. Frequentist properties are illustrated in a simulation study, and an epidemiology application is considered. PMID:22822259
Prediction of Baseflow Index of Catchments using Machine Learning Algorithms
NASA Astrophysics Data System (ADS)
Yadav, B.; Hatfield, K.
2017-12-01
We present the results of eight machine learning techniques for predicting the baseflow index (BFI) of ungauged basins using a surrogate of catchment scale climate and physiographic data. The tested algorithms include ordinary least squares, ridge regression, least absolute shrinkage and selection operator (lasso), elasticnet, support vector machine, gradient boosted regression trees, random forests, and extremely randomized trees. Our work seeks to identify the dominant controls of BFI that can be readily obtained from ancillary geospatial databases and remote sensing measurements, such that the developed techniques can be extended to ungauged catchments. More than 800 gauged catchments spanning the continental United States were selected to develop the general methodology. The BFI calculation was based on the baseflow separated from the daily streamflow hydrograph using the HYSEP filter. The surrogate catchment attributes were compiled from multiple sources, including digital elevation models, soil, land use, and climate data, and other publicly available ancillary and geospatial data. Eighty percent of the catchments were used to train the ML algorithms, and the remaining 20% of the catchments were used as an independent test set to measure the generalization performance of the fitted models. A k-fold cross-validation using exhaustive grid search was used to tune the hyperparameters of each model. Initial model development was based on 19 independent variables, but after variable selection and feature ranking, we generated revised sparse models of BFI prediction that are based on only six catchment attributes. These key predictive variables, selected after careful evaluation of the bias-variance tradeoff, include average catchment elevation, slope, fraction of sand, permeability, temperature, and precipitation. The most promising algorithms exceeding an accuracy score (r-square) of 0.7 on test data include support vector machine, gradient boosted regression trees, random forests, and extremely randomized trees. Considering both the accuracy and the computational complexity of these algorithms, we identify the extremely randomized trees as the best performing algorithm for BFI prediction in ungauged basins.
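The best-performing model can be sketched with scikit-learn's extremely randomized trees tuned by exhaustive grid search with k-fold cross-validation; the six "catchment attributes" below are random placeholders rather than the real predictor set.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(42)
# Random placeholders for the six selected attributes: elevation, slope,
# sand fraction, permeability, temperature, precipitation (all rescaled to [0, 1]).
X = rng.random((800, 6))
bfi = np.clip(0.3 + 0.4 * X[:, 3] - 0.2 * X[:, 1] + 0.05 * rng.standard_normal(800), 0, 1)

X_train, X_test, y_train, y_test = train_test_split(X, bfi, test_size=0.2, random_state=0)

search = GridSearchCV(
    ExtraTreesRegressor(random_state=0),
    param_grid={"n_estimators": [200, 500], "max_features": [2, 4, 6]},
    cv=5, scoring="r2",                  # k-fold cross-validation with exhaustive grid search
)
search.fit(X_train, y_train)
print("best params:", search.best_params_)
print("independent test R^2:", round(search.score(X_test, y_test), 3))
```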
Ding, Xiuhua; Su, Shaoyong; Nandakumar, Kannabiran; Wang, Xiaoling; Fardo, David W
2014-01-01
Large-scale genetic studies are often composed of related participants, and utilizing familial relationships can be cumbersome and computationally challenging. We present an approach to efficiently handle sequencing data from complex pedigrees that incorporates information from rare variants as well as common variants. Our method employs a 2-step procedure that sequentially regresses out correlation from familial relatedness and then uses the resulting phenotypic residuals in a penalized regression framework to test for associations with variants within genetic units. The operating characteristics of this approach are detailed using simulation data based on a large, multigenerational cohort.
Correlation and simple linear regression.
Zou, Kelly H; Tuncali, Kemal; Silverman, Stuart G
2003-06-01
In this tutorial article, the concepts of correlation and regression are reviewed and demonstrated. The authors review and compare two correlation coefficients, the Pearson correlation coefficient and the Spearman rho, for measuring linear and nonlinear relationships between two continuous variables. In the case of measuring the linear relationship between a predictor and an outcome variable, simple linear regression analysis is conducted. These statistical concepts are illustrated by using a data set from published literature to assess a computed tomography-guided interventional technique. These statistical methods are important for exploring the relationships between variables and can be applied to many radiologic studies.
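The two correlation coefficients and the simple linear regression discussed in the tutorial can be computed in a few lines; the paired measurements below are hypothetical, not the CT-guided data set from the article.

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements of a predictor x and an outcome y.
x = np.array([1.2, 2.4, 3.1, 4.8, 5.5, 6.9, 8.0, 9.3])
y = np.array([2.0, 2.9, 3.8, 5.1, 5.4, 7.2, 8.1, 9.0])

r, p_r = stats.pearsonr(x, y)        # linear association
rho, p_rho = stats.spearmanr(x, y)   # monotonic (rank-based) association
fit = stats.linregress(x, y)         # simple linear regression y = a + b*x

print(f"Pearson r = {r:.3f} (p = {p_r:.3g}), Spearman rho = {rho:.3f} (p = {p_rho:.3g})")
print(f"y = {fit.intercept:.2f} + {fit.slope:.2f}*x, R^2 = {fit.rvalue**2:.3f}")
```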
Predicting pork loin intramuscular fat using computer vision system.
Liu, J-H; Sun, X; Young, J M; Bachmeier, L A; Newman, D J
2018-09-01
The objective of this study was to investigate the ability of a computer vision system to predict pork intramuscular fat percentage (IMF%). Center-cut loin samples (n = 85) were trimmed of subcutaneous fat and connective tissue. Images were acquired and pixels were segregated to estimate image IMF% and 18 image color features for each image. Subjective IMF% was determined by a trained grader. Ether extract IMF% was calculated using the ether extract method. Image color features and image IMF% were used as predictors for stepwise regression and support vector machine models. Results showed that subjective IMF% had a correlation of 0.81 with ether extract IMF%, while the image IMF% had a 0.66 correlation with ether extract IMF%. Accuracy rates for regression models were 0.63 for stepwise and 0.75 for support vector machine. Although subjective IMF% was shown to provide better prediction, results from the computer vision system demonstrate its potential as a tool for predicting pork IMF% in the future. Copyright © 2018 Elsevier Ltd. All rights reserved.
Techniques for estimating flood-peak discharges of rural, unregulated streams in Ohio
Koltun, G.F.; Roberts, J.W.
1990-01-01
Multiple-regression equations are presented for estimating flood-peak discharges having recurrence intervals of 2, 5, 10, 25, 50, and 100 years at ungaged sites on rural, unregulated streams in Ohio. The average standard errors of prediction for the equations range from 33.4% to 41.4%. Peak discharge estimates determined by log-Pearson Type III analysis using data collected through the 1987 water year are reported for 275 streamflow-gaging stations. Ordinary least-squares multiple-regression techniques were used to divide the State into three regions and to identify a set of basin characteristics that help explain station-to-station variation in the log-Pearson estimates. Contributing drainage area, main-channel slope, and storage area were identified as suitable explanatory variables. Generalized least-square procedures, which include historical flow data and account for differences in the variance of flows at different gaging stations, spatial correlation among gaging station records, and variable lengths of station record were used to estimate the regression parameters. Weighted peak-discharge estimates computed as a function of the log-Pearson Type III and regression estimates are reported for each station. A method is provided to adjust regression estimates for ungaged sites by use of weighted and regression estimates for a gaged site located on the same stream. Limitations and shortcomings cited in an earlier report on the magnitude and frequency of floods in Ohio are addressed in this study. Geographic bias is no longer evident for the Maumee River basin of northwestern Ohio. No bias is found to be associated with the forested-area characteristic for the range used in the regression analysis (0.0 to 99.0%), nor is this characteristic significant in explaining peak discharges. Surface-mined area likewise is not significant in explaining peak discharges, and the regression equations are not biased when applied to basins having approximately 30% or less surface-mined area. Analyses of residuals indicate that the equations tend to overestimate flood-peak discharges for basins having approximately 30% or more surface-mined area. (USGS)
Schuurmann, Richte C L; van Noort, Kim; Overeem, Simon P; Ouriel, Kenneth; Jordan, William D; Muhs, Bart E; 't Mannetje, Yannick; Reijnen, Michel; Fioole, Bram; Ünlü, Çağdaş; Brummel, Peter; de Vries, Jean-Paul P M
2017-06-01
To evaluate the association between aortic curvature and other preoperative anatomical characteristics and late (>1 year) type Ia endoleak and endograft migration in endovascular aneurysm repair (EVAR) patients. Eight high-volume EVAR centers contributed 116 EVAR patients (mean age 81±7 years; 103 men) to the study: 36 patients (mean age 82±7 years; 31 men) with endograft migration and/or type Ia endoleak diagnosed >1 year after the initial EVAR and 80 controls without early or late complications. Aortic curvature was calculated from the preoperative computed tomography scan as the maximum and average curvature over 5 predefined aortic segments: the entire infrarenal aortic neck, aneurysm sac, and the suprarenal, juxtarenal, and infrarenal aorta. Other morphological characteristics included neck length, neck diameter, mural neck calcification and thrombus, suprarenal and infrarenal angulation, and largest aneurysm sac diameter. Independent risk factors were identified using backward stepwise logistic regression. Relevant cutoff values for each of the variables in the final regression model were determined with the receiver operator characteristic curve. Logistic regression identified maximum curvature over the length of the aneurysm sac (>47 m⁻¹; p=0.023), largest aneurysm sac diameter (>56 mm; p=0.028), and mural neck thrombus (>11° circumference; p<0.001) as independent predictors of late migration and type Ia endoleak. Aortic curvature is a predictor for late type Ia endoleak and endograft migration after EVAR. These findings suggest that aortic curvature is a better parameter than angulation to predict post-EVAR failure and should be included as a hostile neck parameter.
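A simplified sketch of the modeling steps, a logistic regression on the anatomical predictors followed by an ROC-based cutoff, is given below. The cohort is simulated with invented effect sizes, and the backward stepwise selection of the original analysis is omitted.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(1)
n = 116
# Hypothetical predictors: max sac curvature (1/m), max sac diameter (mm), neck thrombus (deg).
X = np.column_stack([rng.normal(45, 10, n), rng.normal(55, 8, n), rng.normal(10, 6, n)])
logit = -9.5 + 0.08 * X[:, 0] + 0.07 * X[:, 1] + 0.12 * X[:, 2]   # invented effect sizes
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)        # 1 = late endoleak/migration

model = LogisticRegression(max_iter=1000).fit(X, y)
print("model AUC:", round(roc_auc_score(y, model.predict_proba(X)[:, 1]), 3))

# ROC-based cutoff for a single variable (curvature), using Youden's J.
fpr, tpr, thr = roc_curve(y, X[:, 0])
best = np.argmax(tpr - fpr)
print("curvature cutoff:", round(thr[best], 1), "1/m")
```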
James R. Wallis
1965-01-01
Written in Fortran IV and MAP, this computer program can handle up to 120 variables, and retain 40 principal components. It can perform simultaneous regression of up to 40 criterion variables upon the varimax rotated factor weight matrix. The columns and rows of all output matrices are labeled by six-character alphanumeric names. Data input can be from punch cards or...
NASA Astrophysics Data System (ADS)
Tang, Kunkun; Congedo, Pietro M.; Abgrall, Rémi
2016-06-01
The Polynomial Dimensional Decomposition (PDD) is employed in this work for the global sensitivity analysis and uncertainty quantification (UQ) of stochastic systems subject to a moderate to large number of input random variables. Due to the intimate connection between the PDD and the Analysis of Variance (ANOVA) approaches, PDD is able to provide a simpler and more direct evaluation of the Sobol' sensitivity indices, when compared to the Polynomial Chaos expansion (PC). Unfortunately, the number of PDD terms grows exponentially with respect to the size of the input random vector, which makes the computational cost of standard methods unaffordable for real engineering applications. In order to address the problem of the curse of dimensionality, this work proposes essentially variance-based adaptive strategies aiming to build a cheap meta-model (i.e. surrogate model) by employing the sparse PDD approach with its coefficients computed by regression. Three levels of adaptivity are carried out in this paper: 1) the truncated dimensionality for ANOVA component functions, 2) the active dimension technique especially for second- and higher-order parameter interactions, and 3) the stepwise regression approach designed to retain only the most influential polynomials in the PDD expansion. During this adaptive procedure featuring stepwise regressions, the surrogate model representation keeps containing few terms, so that the cost to resolve repeatedly the linear systems of the least-squares regression problem is negligible. The size of the finally obtained sparse PDD representation is much smaller than the one of the full expansion, since only significant terms are eventually retained. Consequently, a much smaller number of calls to the deterministic model is required to compute the final PDD coefficients.
Kenney, Erica L; Gortmaker, Steven L
2017-03-01
To quantify the relationships between youth use of television (TV) and other screen devices, including smartphones and tablets, and obesity risk factors. TV and other screen device use, including smartphones, tablets, computers, and/or videogames, was self-reported by a nationally representative, cross-sectional sample of 24 800 US high school students (2013-2015 Youth Risk Behavior Surveys). Students also reported on health behaviors including sugar-sweetened beverage (SSB) intake, physical activity, sleep, and weight and height. Sex-stratified logistic regression models, adjusting for the sampling design, estimated associations between TV and other screen device use and SSB intake, physical activity, sleep, and obesity. Approximately 20% of participants used other screen devices for ≥5 hours daily. Watching TV ≥5 hours daily was associated with daily SSB consumption (aOR = 2.72, 95% CI: 2.23, 3.32) and obesity (aOR = 1.78, 95% CI: 1.40, 2.27). Using other screen devices ≥5 hours daily was associated with daily SSB consumption (aOR = 1.98, 95% CI: 1.69, 2.32), inadequate physical activity (aOR = 1.94, 95% CI: 1.69, 2.25), and inadequate sleep (aOR = 1.79, 95% CI: 1.54, 2.08). Using smartphones, tablets, computers, and videogames is associated with several obesity risk factors. Although further study is needed, families should be encouraged to limit both TV viewing and newer screen devices. Copyright © 2016 Elsevier Inc. All rights reserved.
McCarthy, Peter M.
2006-01-01
The Yellowstone River is very important in a variety of ways to the residents of southeastern Montana; however, it is especially vulnerable to spilled contaminants. In 2004, the U.S. Geological Survey, in cooperation with Montana Department of Environmental Quality, initiated a study to develop a computer program to rapidly estimate instream travel times and concentrations of a potential contaminant in the Yellowstone River using regression equations developed in 1999 by the U.S. Geological Survey. The purpose of this report is to describe these equations and their limitations, describe the development of a computer program to apply the equations to the Yellowstone River, and provide detailed instructions on how to use the program. This program is available online at [http://pubs.water.usgs.gov/sir2006-5057/includes/ytot.xls]. The regression equations provide estimates of instream travel times and concentrations in rivers where little or no contaminant-transport data are available. Equations were developed and presented for the most probable flow velocity and the maximum probable flow velocity. These velocity estimates can then be used to calculate instream travel times and concentrations of a potential contaminant. The computer program was developed so estimation equations for instream travel times and concentrations can be solved quickly for sites along the Yellowstone River between Corwin Springs and Sidney, Montana. The basic types of data needed to run the program are spill data, streamflow data, and data for locations of interest along the Yellowstone River. Data output from the program includes spill location, river mileage at specified locations, instantaneous discharge, mean-annual discharge, drainage area, and channel slope. Travel times and concentrations are provided for estimates of the most probable velocity of the peak concentration and the maximum probable velocity of the peak concentration. Verification of estimates of instream travel times and concentrations for the Yellowstone River requires information about the flow velocity throughout the 520 mi of river in the study area. Dye-tracer studies would provide the best data about flow velocities and would provide the best verification of instream travel times and concentrations estimated from this computer program; however, data from such studies does not currently (2006) exist and new studies would be expensive and time-consuming. An alternative approach used in this study for verification of instream travel times is based on the use of flood-wave velocities determined from recorded streamflow hydrographs at selected mainstem streamflow-gaging stations along the Yellowstone River. The ratios of flood-wave velocity to the most probable velocity for the base flow estimated from the computer program are within the accepted range of 2.5 to 4.0 and indicate that flow velocities estimated from the computer program are reasonable for the Yellowstone River. The ratios of flood-wave velocity to the maximum probable velocity are within a range of 1.9 to 2.8 and indicate that the maximum probable flow velocities estimated from the computer program, which corresponds to the shortest travel times and maximum probable concentrations, are conservative and reasonable for the Yellowstone River.
Cost-of-illness studies based on massive data: a prevalence-based, top-down regression approach.
Stollenwerk, Björn; Welchowski, Thomas; Vogl, Matthias; Stock, Stephanie
2016-04-01
Despite the increasing availability of routine data, no analysis method has yet been presented for cost-of-illness (COI) studies based on massive data. We aim, first, to present such a method and, second, to assess the relevance of the associated gain in numerical efficiency. We propose a prevalence-based, top-down regression approach consisting of five steps: aggregating the data; fitting a generalized additive model (GAM); predicting costs via the fitted GAM; comparing predicted costs between prevalent and non-prevalent subjects; and quantifying the stochastic uncertainty via error propagation. To demonstrate the method, it was applied, in the context of chronic lung disease, to aggregated German sickness fund data (from 1999) covering over 7.3 million insured individuals. To assess the gain in numerical efficiency, the computational time of the innovative approach was compared with corresponding GAMs applied to simulated individual-level data. Furthermore, the probability of model failure was modeled via logistic regression. Applying the innovative method was reasonably fast (19 min). In contrast, with patient-level data, computational time increased disproportionately with sample size. Furthermore, using patient-level data was accompanied by a substantial risk of model failure (about 80% for 6 million subjects). The gain in computational efficiency of the innovative COI method seems to be of practical relevance. Furthermore, it may yield more precise cost estimates.
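Steps 2-4 of the proposed approach (fitting a GAM, predicting costs, and comparing prevalent with non-prevalent subjects) might look roughly like the sketch below, assuming the Python pygam package as a stand-in for whatever GAM software the authors used; the cost data are simulated and the error-propagation step is omitted.

```python
import numpy as np
from pygam import LinearGAM, f, s

rng = np.random.default_rng(0)
n = 500
age = rng.integers(18, 90, n)
sex = rng.integers(0, 2, n)
prevalent = rng.integers(0, 2, n)                                 # chronic lung disease yes/no
costs = 800 + 15 * age + 900 * prevalent + rng.gamma(2, 200, n)   # simulated annual costs

X = np.column_stack([age, sex, prevalent])
gam = LinearGAM(s(0) + f(1) + f(2)).fit(X, costs)   # smooth in age, factors for sex/prevalence

# Compare predicted costs with prevalence switched on vs. off (step 4 of the approach).
X1, X0 = X.copy(), X.copy()
X1[:, 2], X0[:, 2] = 1, 0
excess = np.mean(gam.predict(X1) - gam.predict(X0))
print(f"estimated excess cost per prevalent subject: {excess:.0f}")
```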
ERIC Educational Resources Information Center
Wendt, Jillian L.; Nisbet, Deanna L.
2017-01-01
This study examined the predictive relationship among international students' sense of community, perceived learning, and end-of-course grades in computer-mediated, U.S. graduate-level courses. The community of inquiry (CoI) framework served as the theoretical foundation for the study. Step-wise hierarchical multiple regression showed no…
Approximate l-fold cross-validation with Least Squares SVM and Kernel Ridge Regression
DOE Office of Scientific and Technical Information (OSTI.GOV)
Edwards, Richard E; Zhang, Hao; Parker, Lynne Edwards
2013-01-01
Kernel methods have difficulties scaling to large modern data sets. The scalability issues are based on computational and memory requirements for working with a large matrix. These requirements have been addressed over the years by using low-rank kernel approximations or by improving the solvers' scalability. However, Least Squares Support Vector Machines (LS-SVM), a popular SVM variant, and Kernel Ridge Regression still have several scalability issues. In particular, the O(n^3) computational complexity for solving a single model, and the overall computational complexity associated with tuning hyperparameters, are still major problems. We address these problems by introducing an O(n log n) approximate l-fold cross-validation method that uses a multi-level circulant matrix to approximate the kernel. In addition, we prove our algorithm's computational complexity and present empirical runtimes on data sets with approximately 1 million data points. We also validate our approximate method's effectiveness at selecting hyperparameters on real-world and standard benchmark data sets. Lastly, we provide experimental results on using a multi-level circulant kernel approximation to solve LS-SVM problems with hyperparameters selected using our method.
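For contrast with the approximate method, the exact l-fold cross-validation baseline for kernel ridge regression that the paper aims to speed up can be written directly with scikit-learn; the data here are a small synthetic example, far below the million-point scale discussed above.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (2000, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(2000)

# Exact l-fold cross-validation over a hyperparameter grid: each fit costs O(n^3),
# which is precisely the expense the circulant-matrix approximation targets.
search = GridSearchCV(
    KernelRidge(kernel="rbf"),
    param_grid={"alpha": [1e-3, 1e-2, 1e-1], "gamma": [0.1, 1.0, 10.0]},
    cv=10,
)
search.fit(X, y)
print("selected hyperparameters:", search.best_params_)
print("CV R^2:", round(search.best_score_, 3))
```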
Häberle, Lothar; Hack, Carolin C; Heusinger, Katharina; Wagner, Florian; Jud, Sebastian M; Uder, Michael; Beckmann, Matthias W; Schulz-Wendtland, Rüdiger; Wittenberg, Thomas; Fasching, Peter A
2017-08-30
Tumors in radiologically dense breast were overlooked on mammograms more often than tumors in low-density breasts. A fast reproducible and automated method of assessing percentage mammographic density (PMD) would be desirable to support decisions whether ultrasonography should be provided for women in addition to mammography in diagnostic mammography units. PMD assessment has still not been included in clinical routine work, as there are issues of interobserver variability and the procedure is quite time consuming. This study investigated whether fully automatically generated texture features of mammograms can replace time-consuming semi-automatic PMD assessment to predict a patient's risk of having an invasive breast tumor that is visible on ultrasound but masked on mammography (mammography failure). This observational study included 1334 women with invasive breast cancer treated at a hospital-based diagnostic mammography unit. Ultrasound was available for the entire cohort as part of routine diagnosis. Computer-based threshold PMD assessments ("observed PMD") were carried out and 363 texture features were obtained from each mammogram. Several variable selection and regression techniques (univariate selection, lasso, boosting, random forest) were applied to predict PMD from the texture features. The predicted PMD values were each used as new predictor for masking in logistic regression models together with clinical predictors. These four logistic regression models with predicted PMD were compared among themselves and with a logistic regression model with observed PMD. The most accurate masking prediction was determined by cross-validation. About 120 of the 363 texture features were selected for predicting PMD. Density predictions with boosting were the best substitute for observed PMD to predict masking. Overall, the corresponding logistic regression model performed better (cross-validated AUC, 0.747) than one without mammographic density (0.734), but less well than the one with the observed PMD (0.753). However, in patients with an assigned mammography failure risk >10%, covering about half of all masked tumors, the boosting-based model performed at least as accurately as the original PMD model. Automatically generated texture features can replace semi-automatically determined PMD in a prediction model for mammography failure, such that more than 50% of masked tumors could be discovered.
Schenk, Liam N.; Anderson, Chauncey W.; Diaz, Paul; Stewart, Marc A.
2016-12-22
Executive Summary: Suspended-sediment and total phosphorus loads were computed for two sites in the Upper Klamath Basin on the Wood and Williamson Rivers, the two main tributaries to Upper Klamath Lake. High temporal resolution turbidity and acoustic backscatter data were used to develop surrogate regression models to compute instantaneous concentrations and loads on these rivers. Regression models for the Williamson River site showed strong correlations of turbidity with total phosphorus and suspended-sediment concentrations (adjusted coefficients of determination [Adj R2]=0.73 and 0.95, respectively). Regression models for the Wood River site had relatively poor, although statistically significant, relations of turbidity with total phosphorus, and turbidity and acoustic backscatter with suspended sediment concentration, with high prediction uncertainty. Total phosphorus loads for the partial 2014 water year (excluding October and November 2013) were 39 and 28 metric tons for the Williamson and Wood Rivers, respectively. These values are within the low range of phosphorus loads computed for these rivers from prior studies using water-quality data collected by the Klamath Tribes. The 2014 partial year total phosphorus loads on the Williamson and Wood Rivers are assumed to be biased low because of the absence of data from the first 2 months of water year 2014, and the drought conditions that were prevalent during that water year. Therefore, total phosphorus and suspended-sediment loads in this report should be considered as representative of a low-water year for the two study sites. Comparing loads from the Williamson and Wood River monitoring sites for November 2013–September 2014 shows that the Williamson and Sprague Rivers combined, as measured at the Williamson River site, contributed substantially more suspended sediment to Upper Klamath Lake than the Wood River, with 4,360 and 1,450 metric tons measured, respectively. Surrogate techniques have proven useful at the two study sites, particularly in using turbidity to compute suspended-sediment concentrations in the Williamson River. This proof-of-concept effort for computing total phosphorus concentrations using turbidity at the Williamson and Wood River sites also has shown that with additional samples over a wide range of flow regimes, high-temporal-resolution total phosphorus loads can be estimated on a daily, monthly, and annual basis, along with uncertainties for total phosphorus and suspended-sediment concentrations computed using regression models. Sediment-corrected backscatter at the Wood River has potential for estimating suspended-sediment loads from the Wood River Valley as well, with additional analysis of the variable streamflow measured at that site. Suspended-sediment and total phosphorus loads with a high level of temporal resolution will be useful to water managers, restoration practitioners, and scientists in the Upper Klamath Basin working toward the common goal of decreasing nutrient and sediment loads in Upper Klamath Lake.
Gotvald, Anthony J.; Barth, Nancy A.; Veilleux, Andrea G.; Parrett, Charles
2012-01-01
Methods for estimating the magnitude and frequency of floods in California that are not substantially affected by regulation or diversions have been updated. Annual peak-flow data through water year 2006 were analyzed for 771 streamflow-gaging stations (streamgages) in California having 10 or more years of data. Flood-frequency estimates were computed for the streamgages by using the expected moments algorithm to fit a Pearson Type III distribution to logarithms of annual peak flows for each streamgage. Low-outlier and historic information were incorporated into the flood-frequency analysis, and a generalized Grubbs-Beck test was used to detect multiple potentially influential low outliers. Special methods for fitting the distribution were developed for streamgages in the desert region in southeastern California. Additionally, basin characteristics for the streamgages were computed by using a geographical information system. Regional regression analysis, using generalized least squares regression, was used to develop a set of equations for estimating flows with 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent annual exceedance probabilities for ungaged basins in California that are outside of the southeastern desert region. Flood-frequency estimates and basin characteristics for 630 streamgages were combined to form the final database used in the regional regression analysis. Five hydrologic regions were developed for the area of California outside of the desert region. The final regional regression equations are functions of drainage area and mean annual precipitation for four of the five regions. In one region, the Sierra Nevada region, the final equations are functions of drainage area, mean basin elevation, and mean annual precipitation. Average standard errors of prediction for the regression equations in all five regions range from 42.7 to 161.9 percent. For the desert region of California, an analysis of 33 streamgages was used to develop regional estimates of all three parameters (mean, standard deviation, and skew) of the log-Pearson Type III distribution. The regional estimates were then used to develop a set of equations for estimating flows with 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent annual exceedance probabilities for ungaged basins. The final regional regression equations are functions of drainage area. Average standard errors of prediction for these regression equations range from 214.2 to 856.2 percent. Annual peak-flow data through water year 2006 were analyzed for eight streamgages in California having 10 or more years of data considered to be affected by urbanization. Flood-frequency estimates were computed for the urban streamgages by fitting a Pearson Type III distribution to logarithms of annual peak flows for each streamgage. Regression analysis could not be used to develop flood-frequency estimation equations for urban streams because of the limited number of sites. Flood-frequency estimates for the eight urban sites were graphically compared to flood-frequency estimates for 630 non-urban sites. The regression equations developed from this study will be incorporated into the U.S. Geological Survey (USGS) StreamStats program. The StreamStats program is a Web-based application that provides streamflow statistics and basin characteristics for USGS streamgages and ungaged sites of interest. 
StreamStats can also compute basin characteristics and provide estimates of streamflow statistics for ungaged sites when users select the location of a site along any stream in California.
Ranasinghe, Priyanga; Wickramasinghe, Sashimali A; Pieris, Wa Rasanga; Karunathilake, Indika; Constantine, Godwin R
2012-09-14
The use of computer-assisted learning (CAL) has enhanced undergraduate medical education. CAL improves performance at examinations, develops problem-solving skills and increases student satisfaction. The study evaluates computer literacy among first-year medical students in Sri Lanka. The study was conducted at the Faculty of Medicine, University of Colombo, Sri Lanka, between August and September 2008. First-year medical students (n = 190) were invited for the study. Data on computer literacy and associated factors were collected by an expert-validated, pre-tested, self-administered questionnaire. Computer literacy was evaluated by testing knowledge in 6 domains: common software packages, operating systems, database management, and the usage of the internet and e-mail. A linear regression was conducted using the total score for computer literacy as the continuous dependent variable and other independent covariates. The sample size was 181 (response rate 95.3%); 49.7% were males. The majority of the students (77.3%) owned a computer (males 74.4%, females 80.2%). Students had gained their present computer knowledge through a formal training programme (64.1%), self-learning (63.0%) or peer learning (49.2%). The students used computers predominantly for word processing (95.6%), entertainment (95.0%), web browsing (80.1%) and preparing presentations (76.8%). The majority of the students (75.7%) expressed their willingness for a formal computer training programme at the faculty. The mean score for the computer literacy questionnaire was 48.4 ± 20.3, with no significant gender difference (males 47.8 ± 21.1, females 48.9 ± 19.6). Overall, 47.9% of students scored less than 50% on the computer literacy questionnaire. Students from Colombo district or Western Province and students owning a computer had significantly higher mean scores than other students (p < 0.001). In the linear regression analysis, formal computer training was the strongest predictor of computer literacy (β = 13.034), followed by using an internet facility, being from Western Province, using computers for web browsing and computer programming, computer ownership, and having done IT (Information Technology) as a subject in the GCE (A/L) examination. Sri Lankan medical undergraduates had a low-intermediate level of computer literacy. There is a need to improve computer literacy by increasing computer training in schools or by introducing computer training in the initial stages of the undergraduate programme. These two options require improvement in infrastructure and other resources.
Inferring epidemiological parameters from phylogenies using regression-ABC: A comparative study
Gascuel, Olivier
2017-01-01
Inferring epidemiological parameters such as the R0 from time-scaled phylogenies is a timely challenge. Most current approaches rely on likelihood functions, which raise specific issues that range from computing these functions to finding their maxima numerically. Here, we present a new regression-based Approximate Bayesian Computation (ABC) approach, which we base on a large variety of summary statistics intended to capture the information contained in the phylogeny and its corresponding lineage-through-time plot. The regression step involves the Least Absolute Shrinkage and Selection Operator (LASSO) method, which is a robust machine learning technique. It allows us to readily deal with the large number of summary statistics, while avoiding resorting to Markov Chain Monte Carlo (MCMC) techniques. To compare our approach to existing ones, we simulated target trees under a variety of epidemiological models and settings, and inferred parameters of interest using the same priors. We found that, for large phylogenies, the accuracy of our regression-ABC is comparable to that of likelihood-based approaches involving birth-death processes implemented in BEAST2. Our approach even outperformed these when inferring the host population size with a Susceptible-Infected-Removed epidemiological model. It also clearly outperformed a recent kernel-ABC approach when assuming a Susceptible-Infected epidemiological model with two host types. Lastly, by re-analyzing data from the early stages of the recent Ebola epidemic in Sierra Leone, we showed that regression-ABC provides more realistic estimates for the duration parameters (latency and infectiousness) than the likelihood-based method. Overall, ABC based on a large variety of summary statistics and a regression method able to perform variable selection and avoid overfitting is a promising approach to analyze large phylogenies. PMID:28263987
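A toy version of regression-ABC might look like the sketch below: simulate summary statistics under the prior, run a LASSO regression of the parameter on the summaries, and apply the fitted regression to the observed summaries. The simulator and statistics are invented stand-ins for phylogeny-derived summaries, and the rejection/adjustment refinements of a full ABC analysis are omitted.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)

def simulate_summaries(r0, n_stats=20):
    """Stand-in simulator: noisy summary statistics whose values depend on R0."""
    base = np.linspace(0.1, 2.0, n_stats)
    return r0 * base + rng.normal(0, 0.3, n_stats)

# 1) Draw parameters from the prior and simulate the corresponding summaries.
r0_prior = rng.uniform(1.0, 5.0, 5000)
S = np.array([simulate_summaries(r0) for r0 in r0_prior])

# 2) Regression step: learn R0 as a function of the summaries; the LASSO penalty
#    performs variable selection across the many candidate statistics.
reg = LassoCV(cv=5).fit(S, r0_prior)

# 3) Apply the fitted regression to the "observed" summaries.
s_obs = simulate_summaries(2.5)
print("point estimate of R0:", round(float(reg.predict(s_obs[None, :])[0]), 2))
```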
Methodology for Estimation of Flood Magnitude and Frequency for New Jersey Streams
Watson, Kara M.; Schopp, Robert D.
2009-01-01
Methodologies were developed for estimating flood magnitudes at the 2-, 5-, 10-, 25-, 50-, 100-, and 500-year recurrence intervals for unregulated or slightly regulated streams in New Jersey. Regression equations that incorporate basin characteristics were developed to estimate flood magnitude and frequency for streams throughout the State by use of a generalized least squares regression analysis. Relations between flood-frequency estimates based on streamflow-gaging-station discharge and basin characteristics were determined by multiple regression analysis, and weighted by effective years of record. The State was divided into five hydrologically similar regions to refine the regression equations. The regression analysis indicated that flood discharge, as determined by the streamflow-gaging-station annual peak flows, is related to the drainage area, main channel slope, percentage of lake and wetland areas in the basin, population density, and the flood-frequency region, at the 95-percent confidence level. The standard errors of estimate for the various recurrence-interval floods ranged from 48.1 to 62.7 percent. Annual-maximum peak flows observed at streamflow-gaging stations through water year 2007 and basin characteristics determined using geographic information system techniques for 254 streamflow-gaging stations were used for the regression analysis. Drainage areas of the streamflow-gaging stations range from 0.18 to 779 mi2. Peak-flow data and basin characteristics for 191 streamflow-gaging stations located in New Jersey were used, along with peak-flow data for stations located in adjoining States, including 25 stations in Pennsylvania, 17 stations in New York, 16 stations in Delaware, and 5 stations in Maryland. Streamflow records for selected stations outside of New Jersey were included in the present study because hydrologic, physiographic, and geologic boundaries commonly extend beyond political boundaries. The StreamStats web application was developed cooperatively by the U.S. Geological Survey and the Environmental Systems Research Institute, Inc., and was designed for national implementation. This web application has been recently implemented for use in New Jersey. This program used in conjunction with a geographic information system provides the computation of values for selected basin characteristics, estimates of flood magnitudes and frequencies, and statistics for stream locations in New Jersey chosen by the user, whether the site is gaged or ungaged.
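A heavily simplified version of the regional regression described above, regressing a log-transformed peak-flow statistic on log-transformed basin characteristics and weighting stations by effective years of record, is sketched below. The synthetic data, the chosen predictors, and the use of weighted least squares in place of the report's generalized least squares procedure are all assumptions.

```python
# Illustrative log-space regression of a 100-year peak-flow statistic on basin
# characteristics, weighted by record length (synthetic data only).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 254                                          # stations used in the report
drainage_area = rng.lognormal(2.0, 1.2, n)       # mi^2
channel_slope = rng.lognormal(1.0, 0.6, n)       # ft/mi
pct_lake_wetland = rng.uniform(0, 20, n)         # percent of basin
years_of_record = rng.integers(10, 80, n)        # effective years of record

log_q100 = (2.0 + 0.8 * np.log10(drainage_area) + 0.3 * np.log10(channel_slope)
            - 0.02 * pct_lake_wetland + rng.normal(0, 0.15, n))

X = sm.add_constant(np.column_stack([np.log10(drainage_area),
                                     np.log10(channel_slope),
                                     pct_lake_wetland]))
fit = sm.WLS(log_q100, X, weights=years_of_record).fit()
print(fit.params)   # coefficients of the regional regression equation
```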
Repeated Kicking Actions in Karate: Effect on Technical Execution in Elite Practitioners.
Quinzi, Federico; Camomilla, Valentina; Di Mario, Alberto; Felici, Francesco; Sbriccoli, Paola
2016-04-01
Training in martial arts is commonly performed by repeating a technical action continuously for a given number of times. This study aimed to investigate whether repetition of the task alters proper technical execution, limiting the training efficacy for the technical evaluation during competition. This aim was pursued by analyzing lower-limb kinematics and muscle activation during repeated roundhouse kicks. Six junior karate practitioners performed 20 consecutive repetitions of the kick. Hip and knee kinematics and sEMG of the vastus lateralis, biceps femoris (BF), and rectus femoris were recorded. For each repetition, hip abduction-adduction and flexion-extension and knee flexion-extension peak angular displacements and velocities, as well as agonist and antagonist muscle activation, were computed. Moreover, to monitor for the presence of myoelectric fatigue, if any, the median frequency of the sEMG was computed. All variables were normalized with respect to their individual maximum observed during the sequence of kicks. Linear regressions were fitted to each normalized parameter to test its relationship with the repetition number. Linear-regression analysis showed that, during the sequence, the athletes modified their technique: knee flexion, BF median frequency, hip abduction, knee-extension angular velocity, and BF antagonist activation significantly decreased. Conversely, hip flexion increased significantly. Since karate combat competitions require proper technical execution, training protocols combining severe fatigue and technical actions should be proposed with caution because of technique adaptations. Moreover, trainers and karate masters should consider including specific strength exercises for the BF and, more generally, for the knee flexors.
Measurement of left ventricular mass in vivo using gated nuclear magnetic resonance imaging.
Florentine, M S; Grosskreutz, C L; Chang, W; Hartnett, J A; Dunn, V D; Ehrhardt, J C; Fleagle, S R; Collins, S M; Marcus, M L; Skorton, D J
1986-07-01
Alterations of left ventricular mass occur in a variety of congenital and acquired heart diseases. In vivo determination of left ventricular mass, using several different techniques, has been previously reported. Problems inherent in some previous methods include the use of ionizing radiation, complicated geometric assumptions and invasive techniques. We tested the ability of gated nuclear magnetic resonance imaging to determine in vivo left ventricular mass in animals. By studying both dogs (n = 9) and cats (n = 2) of various sizes, a broad range of left ventricular mass (7 to 133 g) was examined. With a 0.5 tesla superconducting nuclear magnetic resonance imaging system the left ventricle was imaged in the transaxial plane and multiple adjacent 10 mm thick slices were obtained. Endocardial and epicardial edges were manually traced in each computer-displayed image. The wall area of each image was determined by computer and the areas were summed and multiplied by the slice thickness and the specific gravity of muscle, providing calculated left ventricular mass. Calculated left ventricular mass was compared with actual postmortem left ventricular mass using linear regression analysis. An excellent relation between calculated and actual mass was found (r = 0.95; SEE = 13.1 g; regression equation: magnetic resonance mass = 0.95 X actual mass + 14.8 g). Intraobserver and interobserver reproducibility were also excellent (r = 0.99). Thus, gated nuclear magnetic resonance imaging can accurately determine in vivo left ventricular mass in anesthetized animals.
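The mass computation described above reduces to summing traced wall areas, multiplying by slice thickness and muscle density, and regressing the result against postmortem mass. The sketch below assumes made-up areas, masses, and a nominal myocardial specific gravity; it is not the study's data.

```python
# Slice-summation mass estimate plus calibration regression (synthetic numbers).
import numpy as np
from scipy import stats

SLICE_THICKNESS_CM = 1.0           # 10 mm slices
MUSCLE_DENSITY_G_PER_CM3 = 1.05    # assumed specific gravity of myocardium

def lv_mass_from_slices(wall_areas_cm2):
    """Sum traced wall areas, then scale by slice thickness and density."""
    return np.sum(wall_areas_cm2) * SLICE_THICKNESS_CM * MUSCLE_DENSITY_G_PER_CM3

# One animal: epicardial-minus-endocardial wall area per slice (cm^2).
print("calculated mass (g):", lv_mass_from_slices(np.array([8.2, 11.5, 12.1, 10.4, 6.3])))

# Across animals: compare calculated with actual postmortem mass.
calculated = np.array([52.0, 88.0, 17.5, 120.0, 64.0])
actual = np.array([50.0, 84.0, 15.0, 118.0, 60.0])
slope, intercept, r, _, _ = stats.linregress(actual, calculated)
print(f"MRI mass = {slope:.2f} x actual + {intercept:.1f} g (r = {r:.2f})")
```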
An efficient surrogate-based simulation-optimization method for calibrating a regional MODFLOW model
NASA Astrophysics Data System (ADS)
Chen, Mingjie; Izady, Azizallah; Abdalla, Osman A.
2017-01-01
The simulation-optimization method entails a large number of model simulations, which is computationally intensive or even prohibitive if the model simulation is extremely time-consuming. Statistical models have been examined as surrogates of the high-fidelity physical model during the simulation-optimization process to tackle this problem. Among them, Multivariate Adaptive Regression Splines (MARS), a non-parametric adaptive regression method, is superior in overcoming problems of high dimensionality and discontinuities in the data. Furthermore, the stability and accuracy of a MARS model can be improved by bootstrap aggregating, namely bagging. In this paper, the Bagging MARS (BMARS) method is integrated into a surrogate-based simulation-optimization framework to calibrate a three-dimensional MODFLOW model, which is developed to simulate groundwater flow in an arid hardrock-alluvium region in northwestern Oman. The physical MODFLOW model is surrogated by a statistical model developed using the BMARS algorithm. The surrogate model, which is fitted and validated using a training dataset generated by the physical model, can approximate solutions rapidly. An efficient Sobol' method is employed to calculate global sensitivities of head outputs to input parameters, which are used to analyze their importance for the model outputs spatiotemporally. Only sensitive parameters are included in the calibration process to further improve computational efficiency. The normalized root mean square error (NRMSE) between measured and simulated heads at observation wells is used as the objective function to be minimized during optimization. The reasonable history match between the simulated and observed heads demonstrates the feasibility of this highly efficient calibration framework.
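A bagged-MARS surrogate of an expensive simulator can be assembled from off-the-shelf pieces, as in the sketch below. It assumes the third-party py-earth package supplies a MARS estimator (`pyearth.Earth`) and stands in a cheap analytic function for the MODFLOW run; neither the parameters nor the objective correspond to the actual calibration.

```python
# Bagging a MARS base learner (BMARS-style surrogate) over runs of a stand-in
# "physical model". Assumes py-earth is installed; older scikit-learn versions
# use base_estimator= instead of estimator= in BaggingRegressor.
import numpy as np
from sklearn.ensemble import BaggingRegressor
from pyearth import Earth   # MARS implementation (assumed available)

rng = np.random.default_rng(2)

def expensive_model(params):
    """Stand-in for one simulator run: parameters -> head-error objective."""
    k, recharge, storativity = params
    return (np.log(k) - 1.0) ** 2 + 5 * (recharge - 0.3) ** 2 + storativity

# Training set from a modest number of "physical model" runs.
X = rng.uniform([0.1, 0.0, 0.0], [10.0, 1.0, 0.1], size=(200, 3))
y = np.array([expensive_model(p) for p in X])

surrogate = BaggingRegressor(estimator=Earth(), n_estimators=25).fit(X, y)
candidate = np.array([[2.5, 0.32, 0.01]])
print("surrogate objective:", surrogate.predict(candidate)[0])
```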
Online Statistical Modeling (Regression Analysis) for Independent Responses
NASA Astrophysics Data System (ADS)
Made Tirta, I.; Anggraeni, Dian; Pandutama, Martinus
2017-06-01
Regression analysis (statistical modelling) is among the statistical methods most frequently needed for analyzing quantitative data, especially for modelling the relationship between response and explanatory variables. Statistical models have been developed in various directions to handle diverse types of data and complex relationships. A rich variety of advanced and recent statistical modelling techniques is available in open-source software, one of which is R. However, these advanced modelling tools are not very friendly to novice R users, since they rely on programming scripts or a command-line interface. Our research aims to develop a web interface (based on R and Shiny) so that the most recent and advanced statistical modelling techniques are readily available, accessible, and applicable on the web. We previously built interfaces in the form of e-tutorials for several modern and advanced statistical models in R, especially for independent responses (including linear models/LM, generalized linear models/GLM, generalized additive models/GAM, and generalized additive models for location, scale and shape/GAMLSS). In this research we unified them into a data-analysis workflow, including models using computer-intensive statistics (bootstrap and Markov Chain Monte Carlo/MCMC). All are readily accessible on our online Virtual Statistics Laboratory. The web interface makes statistical modelling easier to apply and makes models easier to compare in order to find the most appropriate model for the data.
A proposed method to detect kinematic differences between and within individuals.
Frost, David M; Beach, Tyson A C; McGill, Stuart M; Callaghan, Jack P
2015-06-01
The primary objective was to examine the utility of a novel method of detecting "actual" kinematic changes using the within-subject variation. Twenty firefighters were assigned to one of two groups (lifting or firefighting). Participants performed 25 repetitions of two lifting or firefighting tasks, in three sessions. The magnitude and within-subject variation of several discrete kinematic measures were computed. Sequential averages of each variable were used to derive a cubic, quadratic and linear regression equation. The efficacy of each equation was examined by contrasting participants' sequential means to their 25-trial mean±1SD and 2SD. The magnitude and within-subject variation of each dependent measure was repeatable for all tasks; however, each participant did not exhibit the same movement patterns as the group. The number of instances across all variables, tasks and testing sessions whereby the 25-trial mean±1SD was contained within the boundaries established by the regression equations increased as the aggregate scores included more trials. Each equation achieved success in at least 88% of all instances when three trials were included in the sequential mean (95% with five trials). The within-subject variation may offer a means to examine participant-specific changes without having to collect a large number of trials. Copyright © 2015 Elsevier Ltd. All rights reserved.
Rapp, Jennifer L.; Reilly, Pamela A.
2017-11-14
Background The U.S. Geological Survey (USGS), in cooperation with the Virginia Department of Environmental Quality (DEQ), reviewed a previously compiled set of linear regression models to assess their utility in defining the response of the aquatic biological community to streamflow depletion. As part of the 2012 Virginia Healthy Watersheds Initiative (HWI) study conducted by Tetra Tech, Inc., for the U.S. Environmental Protection Agency (EPA) and Virginia DEQ, a database with computed values of 72 hydrologic metrics, or indicators of hydrologic alteration (IHA), 37 fish metrics, and 64 benthic invertebrate metrics was compiled and quality assured. Hydrologic alteration was represented by simulation of streamflow record for a pre-water-withdrawal condition (baseline) without dams or developed land, compared to the simulated recent-flow condition (2008 withdrawal simulation) including dams and altered landscape to calculate a percent alteration of flow. Biological samples representing the existing populations represent a range of alteration in the biological community today. For this study, all 72 IHA metrics, which included more than 7,272 linear regression models, were considered. This extensive dataset provided the opportunity for hypothesis testing and prioritization of flow-ecology relations that have the potential to explain the effect(s) of hydrologic alteration on biological metrics in Virginia streams.
Random forest regression for magnetic resonance image synthesis.
Jog, Amod; Carass, Aaron; Roy, Snehashis; Pham, Dzung L; Prince, Jerry L
2017-01-01
By choosing different pulse sequences and their parameters, magnetic resonance imaging (MRI) can generate a large variety of tissue contrasts. This very flexibility, however, can yield inconsistencies with MRI acquisitions across datasets or scanning sessions that can in turn cause inconsistent automated image analysis. Although image synthesis of MR images has been shown to be helpful in addressing this problem, an inability to synthesize both T2-weighted brain images that include the skull and FLuid Attenuated Inversion Recovery (FLAIR) images has been reported. The method described herein, called REPLICA, addresses these limitations. REPLICA is a supervised random forest image synthesis approach that learns a nonlinear regression to predict intensities of alternate tissue contrasts given specific input tissue contrasts. Experimental results include direct image comparisons between synthetic and real images, results from image analysis tasks on both synthetic and real images, and comparison against other state-of-the-art image synthesis methods. REPLICA is computationally fast, and is shown to be comparable to other methods on tasks they are able to perform. Additionally REPLICA has the capability to synthesize both T2-weighted images of the full head and FLAIR images, and perform intensity standardization between different imaging datasets. Copyright © 2016 Elsevier B.V. All rights reserved.
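The core of such a synthesis approach, a nonlinear regression from local patches of the input contrast to voxel intensities of the target contrast, can be imitated on toy 1-D "images" as below. The patch features and random forest settings are simplifications; REPLICA's multi-resolution 3-D features are not reproduced.

```python
# Toy patch-to-voxel random forest regression between two synthetic "contrasts".
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)

def extract_patches(image, radius=2):
    """Sliding 1-D patches as feature vectors for each interior voxel."""
    idx = np.arange(radius, image.size - radius)
    return np.array([image[i - radius:i + radius + 1] for i in idx]), idx

source = rng.normal(size=5000)                                    # e.g. T1-like
target = np.tanh(source) + 0.05 * rng.normal(size=source.size)    # e.g. FLAIR-like

X, idx = extract_patches(source)
forest = RandomForestRegressor(n_estimators=50).fit(X, target[idx])

# Synthesize the target contrast for a new "subject".
new_source = rng.normal(size=1000)
X_new, _ = extract_patches(new_source)
print(forest.predict(X_new)[:5])
```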
Grimby-Ekman, Anna; Andersson, Eva M; Hagberg, Mats
2009-06-19
In the literature there are discussions on the choice of outcome and the need for more longitudinal studies of musculoskeletal disorders. The general aim of this longitudinal study was to analyze musculoskeletal neck pain in a group of young adults. Specific aims were to determine whether psychosocial factors, computer use, high work/study demands, and lifestyle are long-term or short-term factors for musculoskeletal neck pain, and whether these factors are important for developing or ongoing musculoskeletal neck pain. Three regression models were used to analyze the different outcomes. Pain at present was analyzed with a marginal logistic model, number of years with pain was analyzed with a Poisson regression model, and developing and ongoing pain were analyzed with a logistic model. Presented results are odds ratios and proportion ratios (logistic models) and rate ratios (Poisson model). The material consisted of web-based questionnaires answered by 1204 Swedish university students from a prospective cohort recruited in 2002. Perceived stress was a risk factor for pain at present (PR = 1.6), for developing pain (PR = 1.7), and for number of years with pain (RR = 1.3). High work/study demands were associated with pain at present (PR = 1.6), and with number of years with pain when the demands negatively affect home life (RR = 1.3). Computer use pattern (number of times per week with a computer session ≥ 4 h without a break) was a risk factor for developing pain (PR = 1.7), but was also associated with pain at present (PR = 1.4) and number of years with pain (RR = 1.2). Among lifestyle factors, smoking (PR = 1.8) was found to be associated with pain at present. The difference between men and women in the prevalence of musculoskeletal pain was confirmed in this study. It was smallest for the outcome ongoing pain (PR = 1.4) compared to pain at present (PR = 2.4) and developing pain (PR = 2.5). By using different regression models, different aspects of the neck pain pattern could be addressed and the risk factors' impact on the pain pattern identified. Short-term risk factors were perceived stress, high work/study demands, and computer use pattern (break pattern). These were also long-term risk factors. For developing pain, perceived stress and computer use pattern were risk factors.
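One of the three models above, the Poisson regression of the number of years with pain on exposure indicators, might be set up as in the following sketch. The covariates, coefficients, and data are invented for illustration, and the marginal-model machinery used for the other outcomes is not shown.

```python
# Poisson regression of years-with-pain counts on binary exposures (synthetic data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 1204
df = pd.DataFrame({
    "stress": rng.integers(0, 2, n),         # perceived stress (0/1)
    "long_sessions": rng.integers(0, 2, n),  # sessions of 4 h or more without a break (0/1)
    "female": rng.integers(0, 2, n),
})
rate = np.exp(0.2 + 0.26 * df["stress"] + 0.18 * df["long_sessions"] + 0.3 * df["female"])
df["years_with_pain"] = rng.poisson(rate)

fit = smf.poisson("years_with_pain ~ stress + long_sessions + female", data=df).fit()
print(np.exp(fit.params))   # rate ratios, analogous to the RRs reported above
```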
Grimby-Ekman, Anna; Andersson, Eva M; Hagberg, Mats
2009-01-01
Background In the literature there are discussions on the choice of outcome and the need for more longitudinal studies of musculoskeletal disorders. The general aim of this longitudinal study was to analyze musculoskeletal neck pain in a group of young adults. Specific aims were to determine whether psychosocial factors, computer use, high work/study demands, and lifestyle are long-term or short-term factors for musculoskeletal neck pain, and whether these factors are important for developing or ongoing musculoskeletal neck pain. Methods Three regression models were used to analyze the different outcomes. Pain at present was analyzed with a marginal logistic model, number of years with pain was analyzed with a Poisson regression model, and developing and ongoing pain were analyzed with a logistic model. Presented results are odds ratios and proportion ratios (logistic models) and rate ratios (Poisson model). The material consisted of web-based questionnaires answered by 1204 Swedish university students from a prospective cohort recruited in 2002. Results Perceived stress was a risk factor for pain at present (PR = 1.6), for developing pain (PR = 1.7), and for number of years with pain (RR = 1.3). High work/study demands were associated with pain at present (PR = 1.6), and with number of years with pain when the demands negatively affect home life (RR = 1.3). Computer use pattern (number of times per week with a computer session ≥ 4 h without a break) was a risk factor for developing pain (PR = 1.7), but was also associated with pain at present (PR = 1.4) and number of years with pain (RR = 1.2). Among lifestyle factors, smoking (PR = 1.8) was found to be associated with pain at present. The difference between men and women in the prevalence of musculoskeletal pain was confirmed in this study. It was smallest for the outcome ongoing pain (PR = 1.4) compared to pain at present (PR = 2.4) and developing pain (PR = 2.5). Conclusion By using different regression models, different aspects of the neck pain pattern could be addressed and the risk factors' impact on the pain pattern identified. Short-term risk factors were perceived stress, high work/study demands, and computer use pattern (break pattern). These were also long-term risk factors. For developing pain, perceived stress and computer use pattern were risk factors. PMID:19545386
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Wei-Chen; Maitra, Ranjan
2011-01-01
We propose a model-based approach for clustering time series regression data in an unsupervised machine learning framework to identify groups under the assumption that each mixture component follows a Gaussian autoregressive regression model of order p. Given the number of groups, the traditional maximum likelihood approach of estimating the parameters using the expectation-maximization (EM) algorithm can be employed, although it is computationally demanding. The somewhat faster tune to the EM folk song provided by the Alternating Expectation Conditional Maximization (AECM) algorithm can alleviate the problem to some extent. In this article, we develop an alternative partial expectation conditional maximization algorithm (APECM) that uses an additional data augmentation storage step to efficiently implement AECM for finite mixture models. Results on our simulation experiments show improved performance in terms of both the number of iterations and computation time. The methodology is applied to the problem of clustering mutual funds data on the basis of their average annual per cent returns and in the presence of economic indicators.
Clifford support vector machines for classification, regression, and recurrence.
Bayro-Corrochano, Eduardo Jose; Arana-Daniel, Nancy
2010-11-01
This paper introduces the Clifford support vector machines (CSVM) as a generalization of the real and complex-valued support vector machines using the Clifford geometric algebra. In this framework, we handle the design of kernels involving the Clifford or geometric product. In this approach, one redefines the optimization variables as multivectors. This allows us to have a multivector as output. Therefore, we can represent multiple classes according to the dimension of the geometric algebra in which we work. We show that one can apply CSVM for classification and regression and also to build a recurrent CSVM. The CSVM is an attractive approach for the multiple input multiple output processing of high-dimensional geometric entities. We carried out comparisons between CSVM and the current approaches to solve multiclass classification and regression. We also study the performance of the recurrent CSVM with experiments involving time series. The authors believe that this paper can be of great use for researchers and practitioners interested in multiclass hypercomplex computing, particularly for applications in complex and quaternion signal and image processing, satellite control, neurocomputation, pattern recognition, computer vision, augmented virtual reality, robotics, and humanoids.
Wang, Shuang; Zhang, Yuchen; Dai, Wenrui; Lauter, Kristin; Kim, Miran; Tang, Yuzhe; Xiong, Hongkai; Jiang, Xiaoqian
2016-01-01
Motivation: Genome-wide association studies (GWAS) have been widely used in discovering associations between genotypes and phenotypes. Human genome data contain valuable but highly sensitive information. Unprotected disclosure of such information might put individuals' privacy at risk. It is important to protect human genome data. Exact logistic regression is a bias-reduction method based on a penalized likelihood to discover rare variants that are associated with disease susceptibility. We propose the HEALER framework to facilitate secure rare variants analysis with a small sample size. Results: We target the algorithm design, aiming at reducing the computational and storage costs to learn a homomorphic exact logistic regression model (i.e. evaluate P-values of coefficients), where the circuit depth is proportional to the logarithmic scale of the data size. We evaluate the algorithm performance using rare Kawasaki Disease datasets. Availability and implementation: Download HEALER at http://research.ucsd-dbmi.org/HEALER/ Contact: shw070@ucsd.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26446135
Machine learning action parameters in lattice quantum chromodynamics
Shanahan, Phiala; Trewartha, Daneil; Detmold, William
2018-05-16
Numerical lattice quantum chromodynamics studies of the strong interaction underpin theoretical understanding of many aspects of particle and nuclear physics. Such studies require significant computing resources to undertake. A number of proposed methods promise improved efficiency of lattice calculations, and access to regions of parameter space that are currently computationally intractable, via multi-scale action-matching approaches that necessitate parametric regression of generated lattice datasets. The applicability of machine learning to this regression task is investigated, with deep neural networks found to provide an efficient solution even in cases where approaches such as principal component analysis fail. Finally, the high information content and complex symmetries inherent in lattice QCD datasets require custom neural network layers to be introduced and present opportunities for further development.
Pease, J M; Morselli, M F
1987-01-01
This paper deals with a computer program adapted to a statistical method for analyzing an unlimited quantity of binary-recorded data on an independent circular variable (e.g. wind direction) and a linear variable (e.g. maple sap flow volume). Circular variables cannot be statistically analyzed with linear methods unless they have been transformed. The program calculates a critical quantity, the acrophase angle (PHI, φ0). The technique is adapted from original mathematics [1] and is written in Fortran 77 for easier conversion between computer networks. Correlation analysis can be performed following the program, or regression, which, because of the circular nature of the independent variable, becomes periodic regression. The technique was tested on a file of approximately 4050 data pairs.
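The "periodic regression" mentioned above can be written compactly by letting the circular predictor enter through its sine and cosine and recovering the acrophase angle from the two fitted coefficients. The data below are synthetic, and this Python sketch is not the Fortran 77 program described in the abstract.

```python
# Periodic regression on a circular predictor, with the acrophase recovered
# from the cosine and sine coefficients.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
direction_deg = rng.uniform(0, 360, 400)        # e.g. wind direction
theta = np.deg2rad(direction_deg)

true_phase = np.deg2rad(225.0)                  # acrophase to recover
sap_flow = 10 + 3 * np.cos(theta - true_phase) + rng.normal(0, 0.5, 400)

X = sm.add_constant(np.column_stack([np.cos(theta), np.sin(theta)]))
fit = sm.OLS(sap_flow, X).fit()
b_cos, b_sin = fit.params[1], fit.params[2]
phi = np.rad2deg(np.arctan2(b_sin, b_cos)) % 360   # estimated acrophase angle
print(f"estimated acrophase: {phi:.1f} degrees")
```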
Quantile regression for the statistical analysis of immunological data with many non-detects.
Eilers, Paul H C; Röder, Esther; Savelkoul, Huub F J; van Wijk, Roy Gerth
2012-07-07
Immunological parameters are hard to measure. A well-known problem is the occurrence of values below the detection limit, the non-detects. Non-detects are a nuisance, because classical statistical analyses, like ANOVA and regression, cannot be applied. The more advanced statistical techniques currently available for the analysis of datasets with non-detects can only be used if a small percentage of the data are non-detects. Quantile regression, a generalization of percentiles to regression models, models the median or higher percentiles and tolerates very high numbers of non-detects. We present a non-technical introduction and illustrate it with an implementation to real data from a clinical trial. We show that by using quantile regression, groups can be compared and that meaningful linear trends can be computed, even if more than half of the data consists of non-detects. Quantile regression is a valuable addition to the statistical methods that can be used for the analysis of immunological datasets with non-detects.
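The key practical point, that the fitted median is unaffected by how values below the detection limit are coded as long as censoring stays below the modelled quantile, can be demonstrated with a short sketch; the data, detection limit, and grouping are invented.

```python
# Median (quantile) regression with substituted non-detects (synthetic data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n = 300
group = rng.integers(0, 2, n)                        # e.g. treatment vs control
true = np.exp(1.0 + 0.8 * group + rng.normal(0, 1.0, n))

LOD = 1.5                                            # detection limit
observed = np.where(true < LOD, LOD / 2, true)       # any value below LOD works

df = pd.DataFrame({"y": observed, "group": group})
median_fit = smf.quantreg("y ~ group", df).fit(q=0.5)
print(median_fit.params)                             # median difference between groups
```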
Prediction of monthly rainfall in Victoria, Australia: Clusterwise linear regression approach
NASA Astrophysics Data System (ADS)
Bagirov, Adil M.; Mahmood, Arshad; Barton, Andrew
2017-05-01
This paper develops the Clusterwise Linear Regression (CLR) technique for the prediction of monthly rainfall. CLR is a combination of clustering and regression techniques. It is formulated as an optimization problem, and an incremental algorithm is designed to solve it. The algorithm is applied to predict monthly rainfall in Victoria, Australia, using rainfall data with five input meteorological variables over the period 1889-2014 from eight geographically diverse weather stations. The prediction performance of the CLR method is evaluated by comparing observed and predicted rainfall values using four measures of forecast accuracy. Based on computational results, the proposed method is also compared with CLR fitted in the maximum likelihood framework via the expectation-maximization algorithm, and with multiple linear regression, artificial neural networks, and support vector machines for regression. The results demonstrate that the proposed algorithm outperforms the other methods in most locations.
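Clusterwise linear regression can be conveyed with a simple alternating scheme: assign each observation to the regression line that currently fits it best, then refit each line on its members. This k-means-style sketch is only a stand-in for the paper's incremental nonsmooth-optimization algorithm, and the two synthetic rainfall "regimes" are invented.

```python
# Alternating clusterwise linear regression on synthetic two-regime data.
import numpy as np

rng = np.random.default_rng(7)

def clusterwise_linreg(X, y, k=2, n_iter=20):
    n, p = X.shape
    Xb = np.column_stack([np.ones(n), X])
    labels = rng.integers(0, k, n)
    coefs = np.zeros((k, p + 1))
    for _ in range(n_iter):
        for j in range(k):
            members = labels == j
            if members.sum() > p:                        # refit line j on its members
                coefs[j], *_ = np.linalg.lstsq(Xb[members], y[members], rcond=None)
        residuals = (y[:, None] - Xb @ coefs.T) ** 2     # n x k squared errors
        labels = residuals.argmin(axis=1)                # reassign observations
    return coefs, labels

# Two synthetic rainfall "regimes" with different linear responses to a predictor.
x = rng.uniform(0, 1, 400)
regime = rng.integers(0, 2, 400)
rain = np.where(regime == 0, 20 + 50 * x, 80 - 30 * x) + rng.normal(0, 3, 400)
coefs, labels = clusterwise_linreg(x.reshape(-1, 1), rain, k=2)
print(coefs)   # one intercept/slope pair per cluster
```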
A computational approach to compare regression modelling strategies in prediction research.
Pajouheshnia, Romin; Pestman, Wiebe R; Teerenstra, Steven; Groenwold, Rolf H H
2016-08-25
It is often unclear which approach to fit, assess and adjust a model will yield the most accurate prediction model. We present an extension of an approach for comparing modelling strategies in linear regression to the setting of logistic regression and demonstrate its application in clinical prediction research. A framework for comparing logistic regression modelling strategies by their likelihoods was formulated using a wrapper approach. Five different strategies for modelling, including simple shrinkage methods, were compared in four empirical data sets to illustrate the concept of a priori strategy comparison. Simulations were performed in both randomly generated data and empirical data to investigate the influence of data characteristics on strategy performance. We applied the comparison framework in a case study setting. Optimal strategies were selected based on the results of a priori comparisons in a clinical data set and the performance of models built according to each strategy was assessed using the Brier score and calibration plots. The performance of modelling strategies was highly dependent on the characteristics of the development data in both linear and logistic regression settings. A priori comparisons in four empirical data sets found that no strategy consistently outperformed the others. The percentage of times that a model adjustment strategy outperformed a logistic model ranged from 3.9 to 94.9 %, depending on the strategy and data set. However, in our case study setting the a priori selection of optimal methods did not result in detectable improvement in model performance when assessed in an external data set. The performance of prediction modelling strategies is a data-dependent process and can be highly variable between data sets within the same clinical domain. A priori strategy comparison can be used to determine an optimal logistic regression modelling strategy for a given data set before selecting a final modelling approach.
Thompson, Ronald E.; Hoffman, Scott A.
2006-01-01
A suite of 28 streamflow statistics, ranging from extreme low to high flows, was computed for 17 continuous-record streamflow-gaging stations and predicted for 20 partial-record stations in Monroe County and contiguous counties in north-eastern Pennsylvania. The predicted statistics for the partial-record stations were based on regression analyses relating intermittent flow measurements made at the partial-record stations indexed to concurrent daily mean flows at continuous-record stations during base-flow conditions. The same statistics also were predicted for 134 ungaged stream locations in Monroe County on the basis of regression analyses relating the statistics to GIS-determined basin characteristics for the continuous-record station drainage areas. The prediction methodology for developing the regression equations used to estimate statistics was developed for estimating low-flow frequencies. This study and a companion study found that the methodology also has application potential for predicting intermediate- and high-flow statistics. The statistics included mean monthly flows, mean annual flow, 7-day low flows for three recurrence intervals, nine flow durations, mean annual base flow, and annual mean base flows for two recurrence intervals. Low standard errors of prediction and high coefficients of determination (R2) indicated good results in using the regression equations to predict the statistics. Regression equations for the larger flow statistics tended to have lower standard errors of prediction and higher coefficients of determination (R2) than equations for the smaller flow statistics. The report discusses the methodologies used in determining the statistics and the limitations of the statistics and the equations used to predict the statistics. Caution is indicated in using the predicted statistics for small drainage area situations. Study results constitute input needed by water-resource managers in Monroe County for planning purposes and evaluation of water-resources availability.
An Examination and Comparison of Airline and Navy Pilot Career Earnings
1986-03-01
[Excerpt from the report's list of tables: 16. Airline Pilot Probationary Wages; 17. 1985 FAPA Maximum Pilot Wage Estimates; 18. 1983 Airline Pilot Wages Regression Equations; 19. Average 1983 Pilot Wages Computed from Regression Analysis; 20. FAPA Maximum ...] Western N/A 1,200 1,500 Source: FAPA. This establishes a wage "base" for pilots. In addition, a pilot who flies more than average in one month may "bank
Negative correlates of computer game play in adolescents.
Colwell, J; Payne, J
2000-08-01
There is some concern that playing computer games may be associated with social isolation, lowered self-esteem, and aggression among adolescents. Measures of these variables were included in a questionnaire completed by 204 year eight students at a North London comprehensive school. Principal components analysis of a scale to assess needs fulfilled by game play provided some support for the notion of 'electronic friendship' among boys, but there was no evidence that game play leads to social isolation. Play was not linked to self-esteem in girls, but a negative relationship was obtained between self-esteem and frequency of play in boys. However, self-esteem was not associated with total exposure to game play. Aggression scores were not related to the number of games with aggressive content named among three favourite games, but they were positively correlated with total exposure to game play. A multiple regression analysis revealed that sex and total game play exposure each accounted for a significant but small amount of the variance in aggression scores. The positive correlation between playing computer games and aggression provides some justification for further investigation of the causal hypothesis, and possible methodologies are discussed.
Welter, Michael; Rieger, Heiko
2016-01-01
Tumor vasculature, the blood vessel network supplying a growing tumor with nutrients such as oxygen or glucose, is in many respects different from the hierarchically organized arterio-venous blood vessel network in normal tissues. Angiogenesis (the formation of new blood vessels), vessel cooption (the integration of existing blood vessels into the tumor vasculature), and vessel regression remodel the healthy vascular network into a tumor-specific vasculature. Integrative models, based on detailed experimental data and physical laws, implement, in silico, the complex interplay of molecular pathways, cell proliferation, migration, and death, tissue microenvironment, mechanical and hydrodynamic forces, and the fine structure of the host tissue vasculature. With the help of computer simulations high-precision information about blood flow patterns, interstitial fluid flow, drug distribution, oxygen and nutrient distribution can be obtained and a plethora of therapeutic protocols can be tested before clinical trials. This chapter provides an overview over the current status of computer simulations of vascular remodeling during tumor growth including interstitial fluid flow, drug delivery, and oxygen supply within the tumor. The model predictions are compared with experimental and clinical data and a number of longstanding physiological paradigms about tumor vasculature and intratumoral solute transport are critically scrutinized.
Nugis, V Yu; Khvostunov, I K; Goloub, E V; Kozlova, M G; Nadejinal, N M; Galstian, I A
2015-01-01
A method for retrospective dose assessment, based on analyzing the distribution of cells by the number of dicentrics and unstable aberrations with a special computer program, was developed earlier using data on persons irradiated in the accident at the Chernobyl nuclear power plant. Here, the method was applied to data from repeated cytogenetic studies of patients exposed to γ, γ-β, or γ-neutron radiation in various situations. As a whole, this group was followed up over longer periods after exposure (17-50 years) than the Chernobyl patients (up to 25 years). Applying the multiple regression equations obtained for the Chernobyl cohort to retrospective dose assessment showed that the equation that includes the computer-recovered dose estimate and the time elapsed after irradiation was generally unsatisfactory (r = 0.069 at p = 0.599). Similar equations using the recovered dose estimate and the frequency of abnormal chromosomes in the distant period, or all three parameters as variables, gave better results (r = 0.686 at p = 0.000000001 and r = 0.542 at p = 0.000008, respectively).
Evaluation of Regression Models of Balance Calibration Data Using an Empirical Criterion
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert; Volden, Thomas R.
2012-01-01
An empirical criterion for assessing the significance of individual terms of regression models of wind tunnel strain gage balance outputs is evaluated. The criterion is based on the percent contribution of a regression model term. It considers a term to be significant if its percent contribution exceeds the empirical threshold of 0.05%. The criterion has the advantage that it can easily be computed using the regression coefficients of the gage outputs and the load capacities of the balance. First, a definition of the empirical criterion is provided. Then, it is compared with an alternate statistical criterion that is widely used in regression analysis. Finally, calibration data sets from a variety of balances are used to illustrate the connection between the empirical and the statistical criterion. A review of these results indicated that the empirical criterion seems to be suitable for a crude assessment of the significance of a regression model term as the boundary between a significant and an insignificant term cannot be defined very well. Therefore, regression model term reduction should only be performed by using the more universally applicable statistical criterion.
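A rough rendering of the empirical criterion is to evaluate each regression-model term at the balance load capacities, express the result as a percentage of the gage output at capacity, and flag terms below 0.05 %. The coefficients, capacities, term list, and normalization below are invented assumptions rather than the paper's exact definition.

```python
# Percent-contribution screening of regression-model terms (illustrative values).
CAPACITY = {"N1": 1000.0, "N2": 800.0}   # hypothetical load capacities
OUTPUT_AT_CAPACITY = 2.0                 # hypothetical gage output span

# term expression -> fitted regression coefficient (all invented)
terms = {
    "N1":    1.8e-3,
    "N2":    0.9e-3,
    "N1*N2": 2.0e-9,
    "N1**2": 4.0e-10,
}

def term_value_at_capacity(expr):
    """Evaluate a term expression with every load set to its capacity."""
    return eval(expr, {"__builtins__": {}}, CAPACITY)

for expr, coef in terms.items():
    contribution = abs(coef * term_value_at_capacity(expr)) / OUTPUT_AT_CAPACITY * 100
    verdict = "keep" if contribution > 0.05 else "drop"
    print(f"{expr:8s} {contribution:8.4f} %  {verdict}")
```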
Westreich, Daniel; Lessler, Justin; Funk, Michele Jonsson
2010-01-01
Summary Objective Propensity scores for the analysis of observational data are typically estimated using logistic regression. Our objective in this Review was to assess machine learning alternatives to logistic regression which may accomplish the same goals but with fewer assumptions or greater accuracy. Study Design and Setting We identified alternative methods for propensity score estimation and/or classification from the public health, biostatistics, discrete mathematics, and computer science literature, and evaluated these algorithms for applicability to the problem of propensity score estimation, potential advantages over logistic regression, and ease of use. Results We identified four techniques as alternatives to logistic regression: neural networks, support vector machines, decision trees (CART), and meta-classifiers (in particular, boosting). Conclusion While the assumptions of logistic regression are well understood, those assumptions are frequently ignored. All four alternatives have advantages and disadvantages compared with logistic regression. Boosting (meta-classifiers) and to a lesser extent decision trees (particularly CART) appear to be most promising for use in the context of propensity score analysis, but extensive simulation studies are needed to establish their utility in practice. PMID:20630332
Computer Use and Computer Anxiety in Older Korean Americans.
Yoon, Hyunwoo; Jang, Yuri; Xie, Bo
2016-09-01
Responding to the limited literature on computer use in ethnic minority older populations, the present study examined predictors of computer use and computer anxiety in older Korean Americans. Separate regression models were estimated for computer use and computer anxiety with the common sets of predictors: (a) demographic variables (age, gender, marital status, and education), (b) physical health indicators (chronic conditions, functional disability, and self-rated health), and (c) sociocultural factors (acculturation and attitudes toward aging). Approximately 60% of the participants were computer-users, and they had significantly lower levels of computer anxiety than non-users. A higher likelihood of computer use and lower levels of computer anxiety were commonly observed among individuals with younger age, male gender, advanced education, more positive ratings of health, and higher levels of acculturation. In addition, positive attitudes toward aging were found to reduce computer anxiety. Findings provide implications for developing computer training and education programs for the target population. © The Author(s) 2015.
Pedraza-Flechas, Ana María; Lope, Virginia; Moreo, Pilar; Ascunce, Nieves; Miranda-García, Josefa; Vidal, Carmen; Sánchez-Contador, Carmen; Santamariña, Carmen; Pedraz-Pingarrón, Carmen; Llobet, Rafael; Aragonés, Nuria; Salas-Trejo, Dolores; Pollán, Marina; Pérez-Gómez, Beatriz
2017-05-01
We explored the relationship between sleep patterns and sleep disorders and mammographic density (MD), a marker of breast cancer risk. Participants in the DDM-Spain/var-DDM study, which included 2878 middle-aged Spanish women, were interviewed via telephone and asked questions on sleep characteristics. Two radiologists assessed MD in their left cranio-caudal mammogram, assisted by a validated semiautomatic computer tool (DM-scan). We used log-transformed percentage MD as the dependent variable and fitted mixed linear regression models, including known confounding variables. Our results showed that neither sleeping patterns nor sleep disorders were associated with MD. However, women with frequent changes in their bedtime due to anxiety or depression had higher MD (e^β = 1.53; 95% CI: 1.04-2.26). Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Technical Reports Server (NTRS)
Hoisington, C. M.
1984-01-01
A position estimation algorithm was developed to track a humpback whale tagged with an ARGOS platform after a transmitter deployment failure and the whale's diving behavior precluded standard methods. The algorithm is especially useful where a transmitter location program exists; it determines the classical Keplerian elements from the ARGOS spacecraft position vectors included with the probationary file messages. A minimum of three distinct messages is required. Once the spacecraft orbit is determined, the whale is located using standard least-squares regression techniques. Experience suggests that in instances where circumstances inherent in the experiment yield message data unsuitable for the standard ARGOS reduction (the message data may be too sparse, span an insufficient period, or include variable-length messages), System ARGOS can still provide much valuable location information if the user is willing to accept the increased location uncertainties.
Modeling the Learner in Computer-Assisted Instruction
ERIC Educational Resources Information Center
Fletcher, J. D.
1975-01-01
This paper briefly reviews relevant work in four areas: 1) quantitative models of memory; 2) regression models of performance; 3) automation models of performance; and 4) artificial intelligence. (Author/HB)
Further investigations of the W-test for pairwise epistasis testing.
Howey, Richard; Cordell, Heather J
2017-01-01
Background: In a recent paper, a novel W-test for pairwise epistasis testing was proposed that appeared, in computer simulations, to have higher power than competing alternatives. Application to genome-wide bipolar data detected significant epistasis between SNPs in genes of relevant biological function. Network analysis indicated that the implicated genes formed two separate interaction networks, each containing genes highly related to autism and neurodegenerative disorders. Methods: Here we investigate further the properties and performance of the W-test via theoretical evaluation, computer simulations and application to real data. Results: We demonstrate that, for common variants, the W-test is closely related to several existing tests of association allowing for interaction, including logistic regression on 8 degrees of freedom, although logistic regression can show inflated type I error for low minor allele frequencies, whereas the W-test shows good/conservative type I error control. Although in some situations the W-test can show higher power, logistic regression is not limited to tests on 8 degrees of freedom but can instead be tailored to impose greater structure on the assumed alternative hypothesis, offering a power advantage when the imposed structure matches the true structure. Conclusions: The W-test is a potentially useful method for testing for association - without necessarily implying interaction - between genetic variants and disease, particularly when one or more of the genetic variants are rare. For common variants, the advantages of the W-test are less clear, and, indeed, there are situations where existing methods perform better. In our investigations, we further uncover a number of problems with the practical implementation and application of the W-test (to bipolar disorder) previously described, apparently due to inadequate use of standard data quality-control procedures. This observation leads us to urge caution in interpretation of the previously presented results, most of which we consider are highly likely to be artefacts.
Comparison of statistical tests for association between rare variants and binary traits.
Bacanu, Silviu-Alin; Nelson, Matthew R; Whittaker, John C
2012-01-01
Genome-wide association studies have found thousands of common genetic variants associated with a wide variety of diseases and other complex traits. However, a large portion of the predicted genetic contribution to many traits remains unknown. One plausible explanation is that some of the missing variation is due to the effects of rare variants. Nonetheless, the statistical analysis of rare variants is challenging. A commonly used method is to contrast, within the same region (gene), the frequency of minor alleles at rare variants between cases and controls. However, this strategy is most useful under the assumption that the tested variants have similar effects. We previously proposed a method that can accommodate heterogeneous effects in the analysis of quantitative traits. Here we extend this method to binary traits and accommodate covariates. We use simulations for a variety of causal and covariate impact scenarios to compare the performance of the proposed method to standard logistic regression, C-alpha, SKAT, and EREC. We found that (i) logistic regression methods perform well when the heterogeneity of the effects is not extreme and (ii) SKAT and EREC have good performance under all tested scenarios, but they can be computationally intensive. Consequently, it would be more computationally desirable to use a two-step strategy of (i) selecting promising genes with faster methods and (ii) analyzing the selected genes using SKAT/EREC. To select promising genes, one can use (1) regression methods when effect heterogeneity is assumed to be low and the covariates explain a non-negligible part of trait variability, (2) C-alpha when heterogeneity is assumed to be large and covariates explain a small fraction of the trait's variability, and (3) the proposed trend and heterogeneity test when the heterogeneity is assumed to be non-trivial and the covariates explain a large fraction of trait variability.
Jiang, Xiaoqian; Aziz, Md Momin Al; Wang, Shuang; Mohammed, Noman
2018-01-01
Background Machine learning is an effective data-driven tool that is being widely used to extract valuable patterns and insights from data. Specifically, predictive machine learning models are very important in health care for clinical data analysis. The machine learning algorithms that generate predictive models often require pooling data from different sources to discover statistical patterns or correlations among different attributes of the input data. The primary challenge is to fulfill one major objective: preserving the privacy of individuals while discovering knowledge from data. Objective Our objective was to develop a hybrid cryptographic framework for performing regression analysis over distributed data in a secure and efficient way. Methods Existing secure computation schemes are not suitable for processing the large-scale data that are used in cutting-edge machine learning applications. We designed, developed, and evaluated a hybrid cryptographic framework, which can securely perform regression analysis, a fundamental machine learning algorithm using somewhat homomorphic encryption and a newly introduced secure hardware component of Intel Software Guard Extensions (Intel SGX) to ensure both privacy and efficiency at the same time. Results Experimental results demonstrate that our proposed method provides a better trade-off in terms of security and efficiency than solely secure hardware-based methods. Besides, there is no approximation error. Computed model parameters are exactly similar to plaintext results. Conclusions To the best of our knowledge, this kind of secure computation model using a hybrid cryptographic framework, which leverages both somewhat homomorphic encryption and Intel SGX, is not proposed or evaluated to this date. Our proposed framework ensures data security and computational efficiency at the same time. PMID:29506966
Sadat, Md Nazmus; Jiang, Xiaoqian; Aziz, Md Momin Al; Wang, Shuang; Mohammed, Noman
2018-03-05
Machine learning is an effective data-driven tool that is being widely used to extract valuable patterns and insights from data. Specifically, predictive machine learning models are very important in health care for clinical data analysis. The machine learning algorithms that generate predictive models often require pooling data from different sources to discover statistical patterns or correlations among different attributes of the input data. The primary challenge is to fulfill one major objective: preserving the privacy of individuals while discovering knowledge from data. Our objective was to develop a hybrid cryptographic framework for performing regression analysis over distributed data in a secure and efficient way. Existing secure computation schemes are not suitable for processing the large-scale data that are used in cutting-edge machine learning applications. We designed, developed, and evaluated a hybrid cryptographic framework, which can securely perform regression analysis, a fundamental machine learning algorithm using somewhat homomorphic encryption and a newly introduced secure hardware component of Intel Software Guard Extensions (Intel SGX) to ensure both privacy and efficiency at the same time. Experimental results demonstrate that our proposed method provides a better trade-off in terms of security and efficiency than solely secure hardware-based methods. Besides, there is no approximation error. Computed model parameters are exactly similar to plaintext results. To the best of our knowledge, this kind of secure computation model using a hybrid cryptographic framework, which leverages both somewhat homomorphic encryption and Intel SGX, is not proposed or evaluated to this date. Our proposed framework ensures data security and computational efficiency at the same time. ©Md Nazmus Sadat, Xiaoqian Jiang, Md Momin Al Aziz, Shuang Wang, Noman Mohammed. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 05.03.2018.
NASA Astrophysics Data System (ADS)
Li, Richard Y.; Di Felice, Rosa; Rohs, Remo; Lidar, Daniel A.
2018-03-01
Transcription factors regulate gene expression, but how these proteins recognize and specifically bind to their DNA targets is still debated. Machine learning models are effective means to reveal interaction mechanisms. Here we studied the ability of a quantum machine learning approach to classify and rank binding affinities. Using simplified data sets of a small number of DNA sequences derived from actual binding affinity experiments, we trained a commercially available quantum annealer to classify and rank transcription factor binding. The results were compared to state-of-the-art classical approaches for the same simplified data sets, including simulated annealing, simulated quantum annealing, multiple linear regression, LASSO, and extreme gradient boosting. Despite technological limitations, we find a slight advantage in classification performance and nearly equal ranking performance using the quantum annealer for these fairly small training data sets. Thus, we propose that quantum annealing might be an effective method to implement machine learning for certain computational biology problems.
Cox-nnet: An artificial neural network method for prognosis prediction of high-throughput omics data
Ching, Travers; Zhu, Xun
2018-01-01
Artificial neural networks (ANN) are computing architectures with many interconnections of simple neural-inspired computing elements, and have been applied to biomedical fields such as imaging analysis and diagnosis. We have developed a new ANN framework called Cox-nnet to predict patient prognosis from high throughput transcriptomics data. In 10 TCGA RNA-Seq data sets, Cox-nnet achieves the same or better predictive accuracy compared to other methods, including Cox-proportional hazards regression (with LASSO, ridge, and minimax concave penalty), Random Forests Survival and CoxBoost. Cox-nnet also reveals richer biological information, at both the pathway and gene levels. The outputs from the hidden layer node provide an alternative approach for survival-sensitive dimension reduction. In summary, we have developed a new method for accurate and efficient prognosis prediction on high throughput data, with functional biological insights. The source code is freely available at https://github.com/lanagarmire/cox-nnet. PMID:29634719
Zimmerman, Tammy M.
2006-01-01
The Lake Erie shoreline in Pennsylvania spans nearly 40 miles and is a valuable recreational resource for Erie County. Nearly 7 miles of the Lake Erie shoreline lies within Presque Isle State Park in Erie, Pa. Concentrations of Escherichia coli (E. coli) bacteria at permitted Presque Isle beaches occasionally exceed the single-sample bathing-water standard, resulting in unsafe swimming conditions and closure of the beaches. E. coli concentrations and other water-quality and environmental data collected at Presque Isle Beach 2 during the 2004 and 2005 recreational seasons were used to develop models using tobit regression analyses to predict E. coli concentrations. All variables statistically related to E. coli concentrations were included in the initial regression analyses, and after several iterations, only those explanatory variables that made the models significantly better at predicting E. coli concentrations were included in the final models. Regression models were developed using data from 2004, 2005, and the combined 2-year dataset. Variables in the 2004 model and the combined 2004-2005 model were log10 turbidity, rain weight, wave height (calculated), and wind direction. Variables in the 2005 model were log10 turbidity and wind direction. Explanatory variables not included in the final models were water temperature, streamflow, wind speed, and current speed; model results indicated these variables did not meet significance criteria at the 95-percent confidence level (probabilities were greater than 0.05). The predicted E. coli concentrations produced by the models were used to develop probabilities that concentrations would exceed the single-sample bathing-water standard for E. coli of 235 colonies per 100 milliliters. Analysis of the exceedence probabilities helped determine a threshold probability for each model, chosen such that the correct number of exceedences and nonexceedences was maximized and the number of false positives and false negatives was minimized. Future samples with computed exceedence probabilities higher than the selected threshold probability, as determined by the model, will likely exceed the E. coli standard and a beach advisory or closing may need to be issued; computed exceedence probabilities lower than the threshold probability will likely indicate the standard will not be exceeded. Additional data collected each year can be used to test and possibly improve the model. This study will aid beach managers in more rapidly determining when waters are not safe for recreational use and, subsequently, when to issue beach advisories or closings.
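Once a censored-normal (tobit) regression of log10 E. coli on the explanatory variables has been fitted, the exceedence probability for the 235 col/100 mL standard follows from the normal error model, and the advisory decision is a comparison against the chosen threshold probability. The coefficients, residual scale, predictor values, and threshold below are placeholders, not the study's fitted model.

```python
# Exceedence probability from a hypothetical fitted tobit model of log10(E. coli).
import numpy as np
from scipy.stats import norm

STANDARD = np.log10(235.0)                 # single-sample bathing-water standard

beta = np.array([1.2, 0.9, 0.6])           # const, log10(turbidity), wave height
sigma = 0.45                               # residual scale from the censored fit
threshold_probability = 0.30               # chosen to balance false +/- rates

def exceedence_probability(log10_turbidity, wave_height_ft):
    x = np.array([1.0, log10_turbidity, wave_height_ft])
    mu = x @ beta
    return 1.0 - norm.cdf((STANDARD - mu) / sigma)

p = exceedence_probability(log10_turbidity=1.1, wave_height_ft=1.5)
print(f"P(exceeds standard) = {p:.2f} -> advisory: {p > threshold_probability}")
```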
Application of XGBoost algorithm in hourly PM2.5 concentration prediction
NASA Astrophysics Data System (ADS)
Pan, Bingyue
2018-02-01
In view of prediction techniques for hourly PM2.5 concentration in China, this paper applied the XGBoost (Extreme Gradient Boosting) algorithm to predict hourly PM2.5 concentrations. Air-quality monitoring data from the city of Tianjin were analyzed using the XGBoost algorithm. The prediction performance of the XGBoost method is evaluated by comparing observed and predicted PM2.5 concentrations using three measures of forecast accuracy. The XGBoost method is also compared with random forest, multiple linear regression, decision tree regression, and support vector regression models. The results demonstrate that the XGBoost algorithm outperforms the other data mining methods.
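A hedged sketch of this kind of comparison, using the XGBoost scikit-learn wrapper against two baselines on simulated hourly data standing in for the Tianjin monitoring record; the features, hyperparameters, and data-generating process are assumptions, not the paper's setup.

```python
# Illustrative comparison of XGBoost with two baselines on simulated hourly data.
import numpy as np
import pandas as pd
from xgboost import XGBRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error

rng = np.random.default_rng(0)
n = 3000                                     # hourly records (simulated stand-in)
df = pd.DataFrame({
    "PM10": rng.gamma(4, 20, n),
    "NO2": rng.gamma(3, 15, n),
    "temperature": rng.normal(15, 10, n),
    "humidity": rng.uniform(20, 100, n),
    "wind_speed": rng.gamma(2, 1.5, n),
})
df["PM2.5"] = (0.6 * df["PM10"] + 0.4 * df["NO2"] - 0.5 * df["wind_speed"] ** 2
               + 0.1 * df["humidity"] + rng.normal(0, 10, n))

split = int(0.8 * n)                         # keep time order: train on the past
X_train, X_test = df.drop(columns="PM2.5")[:split], df.drop(columns="PM2.5")[split:]
y_train, y_test = df["PM2.5"][:split], df["PM2.5"][split:]

models = {
    "XGBoost": XGBRegressor(n_estimators=400, learning_rate=0.05, max_depth=6),
    "random forest": RandomForestRegressor(n_estimators=200),
    "linear regression": LinearRegression(),
}
for name, model in models.items():
    pred = model.fit(X_train, y_train).predict(X_test)
    print(f"{name:18s} RMSE={np.sqrt(mean_squared_error(y_test, pred)):6.2f} "
          f"MAE={mean_absolute_error(y_test, pred):6.2f}")
```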
Quantile regression in the presence of monotone missingness with sensitivity analysis
Liu, Minzhao; Daniels, Michael J.; Perri, Michael G.
2016-01-01
In this paper, we develop methods for longitudinal quantile regression when there is monotone missingness. In particular, we propose pattern mixture models with a constraint that provides a straightforward interpretation of the marginal quantile regression parameters. Our approach allows sensitivity analysis, which is an essential component of inference for incomplete data. To facilitate computation of the likelihood, we propose a novel way to obtain analytic forms for the required integrals. We conduct simulations to examine the robustness of our approach to modeling assumptions and compare its performance to competing approaches. The model is applied to data from a recent clinical trial on weight management. PMID:26041008
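For readers unfamiliar with the quantity being modeled, the sketch below fits basic complete-case quantile regressions with statsmodels. It is only a toy illustration of marginal quantile regression and does not implement the paper's pattern-mixture or sensitivity-analysis machinery; the variable names and simulated data are assumptions.

```python
# Basic complete-case quantile regression, shown only to make the marginal
# quantile-regression parameters concrete.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({"month": rng.integers(0, 7, n)})
df["weight_change"] = -0.8 * df["month"] + rng.normal(scale=2 + 0.3 * df["month"])

# Median (tau = 0.5) and upper-quartile (tau = 0.75) regressions of outcome on time.
for tau in (0.5, 0.75):
    fit = smf.quantreg("weight_change ~ month", df).fit(q=tau)
    print(f"tau={tau}: intercept={fit.params['Intercept']:.2f}, "
          f"slope={fit.params['month']:.2f}")
```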
Chaurasia, Ashok; Harel, Ofer
2015-02-10
Tests for regression coefficients such as global, local, and partial F-tests are common in applied research. In the framework of multiple imputation, there are several papers addressing tests for regression coefficients. However, for simultaneous hypothesis testing, the existing methods are computationally intensive because they involve calculation with vectors and (inversion of) matrices. In this paper, we propose a simple method based on the scalar entity, coefficient of determination, to perform (global, local, and partial) F-tests with multiply imputed data. The proposed method is evaluated using simulated data and applied to suicide prevention data. Copyright © 2014 John Wiley & Sons, Ltd.
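A minimal sketch of the scalar route from the coefficient of determination to a global F-test, applied separately to each imputed data set; the simple averaging at the end is only a placeholder and not the paper's pooling rule, and the R² values, n, and k are made up.

```python
# Sketch of the R^2-based global F-test for each imputed data set; the naive
# averaging at the end is a placeholder, not the proposed pooling procedure.
import numpy as np
from scipy import stats

def global_f_from_r2(r2, n, k):
    """Global F-test of H0: all k slope coefficients are zero."""
    f = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    p = stats.f.sf(f, k, n - k - 1)
    return f, p

n, k = 200, 3                                     # observations, predictors
r2_imputations = [0.21, 0.19, 0.24, 0.22, 0.20]   # R^2 from m imputed data sets

per_imp = [global_f_from_r2(r2, n, k) for r2 in r2_imputations]
for m, (f, p) in enumerate(per_imp, 1):
    print(f"imputation {m}: F={f:.2f}, p={p:.4f}")
print("naive average F across imputations:",
      round(np.mean([f for f, _ in per_imp]), 2))
```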
A Regression Analysis of Elementary Students' ICT Usage vis-à-vis Access to Technology in Singapore
ERIC Educational Resources Information Center
Tay, Lee Yong; Nair, Shanthi Suraj; Lim, Cher Ping
2017-01-01
This paper explores the relationship among ICT infrastructure (i.e., computing devices and Internet), one-to-one computing program and student ICT activities in school. It also looks into the differences of how ICT is being used in the teaching of English, mathematics and science at the elementary school level in relation to the availability of…
Ranasinghe, Priyanga; Perera, Yashasvi S; Lamabadusuriya, Dilusha A; Kulatunga, Supun; Jayawardana, Naveen; Rajapakse, Senaka; Katulanda, Prasad
2011-08-04
Complaints of the arm, neck and shoulders (CANS) are common among computer office workers. We evaluated an aetiological model with physical/psychosocial risk factors. We invited 2,500 computer office workers for the study. Data on the prevalence and risk factors of CANS were collected with the validated Maastricht Upper Extremity Questionnaire. Workstations were evaluated with the Occupational Safety and Health Administration (OSHA) Visual Display Terminal workstation checklist. Participants' knowledge and awareness were evaluated by a set of expert-validated questions. A binary logistic regression analysis investigated relationships/correlations between risk factors and symptoms. The sample size was 2,210; mean age was 30.8 ± 8.1 years, and 50.8% were male. The 1-year prevalence of CANS was 56.9%; the commonest region of complaint was the forearm/hand (42.6%), followed by the neck (36.7%) and shoulder/arm (32.0%). Of those with CANS, 22.7% had sought treatment from a health care professional, and an occupation-related injury had been suspected or diagnosed in only 1.1% of those seeking medical advice. In addition, 9.3% reported CANS-related absenteeism from work, while 15.4% reported CANS causing disruption of normal activities. A majority of workstations, both among all participants (88.4%) and among those with CANS (91.9%), were OSHA non-compliant. In the binary logistic regression analyses, female gender, daily computer usage, incorrect body posture, bad work habits, work overload, poor social support and poor ergonomic knowledge were associated with CANS and its severity. In a multiple logistic regression analysis controlling for age, gender and duration of occupation, incorrect body posture, bad work habits and daily computer usage were significant independent predictors of CANS. The prevalence of work-related CANS among computer office workers in Sri Lanka, a developing South Asian country, is high and comparable to the prevalence in developed countries. Work-related physical factors, psychosocial factors and lack of awareness were all important correlates of CANS, and effective preventive strategies need to address all three areas.
NASA Astrophysics Data System (ADS)
Anderson, Delia Marie Castro
Computer literacy and use have become commonplace in our colleges and universities. In an environment that demands the use of technology, educators should be knowledgeable of the components that make up the overall computer attitude of students and be willing to investigate the processes and techniques of effective teaching and learning that can take place with computer technology. The purpose of this study is twofold. First, it investigates the relationship between computer attitudes and gender, ethnicity, and computer experience. Second, it addresses the question of whether, and to what extent, students' attitudes toward computers change over a 16-week period in an undergraduate microbiology course that supplements the traditional lecture with computer-driven assignments. Multiple regression analyses, using data from the Computer Attitudes Scale (Loyd & Loyd, 1985), showed that, in the experimental group, no significant relationships were found between computer anxiety and gender or ethnicity or between computer confidence and gender or ethnicity. However, students who used computers the longest (p = .001) and who were self-taught (p = .046) had the lowest computer anxiety levels. Likewise, students who used computers the longest (p = .001) and who were self-taught (p = .041) had the highest confidence levels. No significant relationships between computer liking, usefulness, or the use of Internet resources and gender, ethnicity, or computer experience were found. Dependent t-tests were performed to determine whether computer attitude scores (pretest and posttest) increased over a 16-week period for students who had been exposed to computer-driven assignments and other Internet resources. Results showed that students in the experimental group were less anxious about working with computers and considered computers to be more useful. In the control group, no significant changes in computer anxiety, confidence, liking, or usefulness were noted. Overall, students in the experimental group who responded to the Use of Internet Resources Survey were positive (mean of 3.4 on the 4-point scale) toward their use of Internet resources, which included the online courseware developed by the researcher. Findings from this study suggest that (1) the digital divide with respect to gender and ethnicity may be narrowing, and (2) students who are exposed to a course that augments traditional teaching methods with computer-driven courseware appear to have less anxiety, have a clearer perception of computer usefulness, and feel that online resources enhance their learning.
Westreich, Daniel; Lessler, Justin; Funk, Michele Jonsson
2010-08-01
Propensity scores for the analysis of observational data are typically estimated using logistic regression. Our objective in this review was to assess machine learning alternatives to logistic regression, which may accomplish the same goals but with fewer assumptions or greater accuracy. We identified alternative methods for propensity score estimation and/or classification from the public health, biostatistics, discrete mathematics, and computer science literature, and evaluated these algorithms for applicability to the problem of propensity score estimation, potential advantages over logistic regression, and ease of use. We identified four techniques as alternatives to logistic regression: neural networks, support vector machines, decision trees (classification and regression trees [CART]), and meta-classifiers (in particular, boosting). Although the assumptions of logistic regression are well understood, those assumptions are frequently ignored. All four alternatives have advantages and disadvantages compared with logistic regression. Boosting (meta-classifiers) and, to a lesser extent, decision trees (particularly CART), appear to be most promising for use in the context of propensity score analysis, but extensive simulation studies are needed to establish their utility in practice. Copyright (c) 2010 Elsevier Inc. All rights reserved.
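A sketch of the comparison in spirit: estimating propensity scores with logistic regression and with gradient boosting (standing in for the boosting meta-classifiers discussed) on simulated data with a nonlinear confounder effect, then forming simple inverse-probability weights. The data-generating process and model settings are assumptions, not drawn from the review.

```python
# Comparing two propensity-score estimators on simulated data and forming
# inverse-probability-of-treatment weights (IPW).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(2)
n = 2000
X = rng.normal(size=(n, 4))                          # measured confounders
true_ps = 1 / (1 + np.exp(-(0.5 * X[:, 0] - 0.8 * X[:, 1] ** 2 + 0.3 * X[:, 2])))
treated = rng.binomial(1, true_ps)

for name, model in [("logistic", LogisticRegression(max_iter=1000)),
                    ("boosting", GradientBoostingClassifier())]:
    ps = model.fit(X, treated).predict_proba(X)[:, 1]
    ipw = np.where(treated == 1, 1 / ps, 1 / (1 - ps))
    print(f"{name:9s} mean |ps - true_ps| = {np.mean(np.abs(ps - true_ps)):.3f}, "
          f"max weight = {ipw.max():.1f}")
```

Because the simulated treatment mechanism includes a squared confounder term, the main-effects-only logistic model is misspecified, which is exactly the situation in which the flexible alternatives discussed above can help.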
Single Image Super-Resolution Using Global Regression Based on Multiple Local Linear Mappings.
Choi, Jae-Seok; Kim, Munchurl
2017-03-01
Super-resolution (SR) has become more vital because of its capability to generate high-quality ultra-high-definition (UHD) high-resolution (HR) images from low-resolution (LR) input images. Conventional SR methods entail high computational complexity, which makes them difficult to implement for up-scaling full-high-definition input images into UHD-resolution images. Nevertheless, our previous super-interpolation (SI) method showed a good compromise between peak signal-to-noise ratio (PSNR) performance and computational complexity. However, since SI utilizes only simple linear mappings, it may fail to precisely reconstruct HR patches with complex texture. In this paper, we present a novel SR method, which inherits the large-to-small patch conversion scheme from SI but uses global regression based on local linear mappings (GLM). Thus, our new SR method is called GLM-SI. In GLM-SI, each LR input patch is divided into 25 overlapped subpatches. Next, based on the local properties of these subpatches, 25 different local linear mappings are applied to the current LR input patch to generate 25 HR patch candidates, which are then regressed into one final HR patch using a global regressor. The local linear mappings are learned cluster-wise in our off-line training phase. The main contribution of this paper is as follows: previously, linear-mapping-based conventional SR methods, including SI, applied only one simple yet coarse linear mapping to each patch to reconstruct its HR version. By contrast, for each LR input patch, our GLM-SI is the first to apply a combination of multiple local linear mappings, where each local linear mapping is found according to the local properties of the current LR patch. Therefore, it can better approximate nonlinear LR-to-HR mappings for HR patches with complex texture. Experimental results show that the proposed GLM-SI method outperforms most of the state-of-the-art methods, and shows comparable PSNR performance with much lower computational complexity when compared with a super-resolution method based on convolutional neural nets (SRCNN15). Compared with the previous SI method, which is limited to a scale factor of 2, GLM-SI shows superior performance, with an average PSNR gain of 0.79 dB, and can be used for scale factors of 3 or higher.
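A schematic numpy sketch of the combination step only: several local linear mappings each propose an HR patch and a global regressor blends the candidates. The mappings and regressor below are random placeholders rather than the cluster-wise learned ones described, and the patch sizes are arbitrary.

```python
# Schematic sketch of the candidate-then-blend structure of GLM-SI (placeholders,
# not trained mappings).
import numpy as np

rng = np.random.default_rng(3)
lr_dim, hr_dim, n_local = 5 * 5, 10 * 10, 25      # LR patch size, HR patch size

local_maps = [rng.normal(scale=0.1, size=(hr_dim, lr_dim)) for _ in range(n_local)]
global_regressor = rng.normal(scale=1.0 / n_local, size=(hr_dim, n_local * hr_dim))

lr_patch = rng.normal(size=lr_dim)                # one vectorized LR input patch
candidates = np.concatenate([W @ lr_patch for W in local_maps])  # 25 HR candidates
hr_patch = global_regressor @ candidates          # final regressed HR patch
print(hr_patch.shape)                             # (100,), i.e. a 10x10 patch
```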
Nicolakakis, Nektaria; Stock, Susan R; Abrahamowicz, Michal; Kline, Rex; Messing, Karen
2017-11-01
Computer work has been identified as a risk factor for upper extremity musculoskeletal problems (UEMSP). But few studies have investigated how psychosocial and organizational work factors affect this relation. Nor have gender differences in the relation between UEMSP and these work factors been studied. We sought to estimate: (1) the association between UEMSP and a range of physical, psychosocial and organizational work exposures, including the duration of computer work, and (2) the moderating effect of psychosocial work exposures on the relation between computer work and UEMSP. Using 2007-2008 Québec survey data on 2478 workers, we carried out gender-stratified multivariable logistic regression modeling and two-way interaction analyses. In both genders, odds of UEMSP were higher with exposure to high physical work demands and emotionally demanding work. Additionally among women, UEMSP were associated with duration of occupational computer exposure, sexual harassment, tense situations when dealing with clients, high quantitative demands and lack of prospects for promotion, and among men, with low coworker support, episodes of unemployment, low job security and contradictory work demands. Among women, the effect of computer work on UEMSP was considerably increased in the presence of emotionally demanding work, and may also be moderated by low recognition at work, contradictory work demands, and low supervisor support. These results suggest that the relations between UEMSP and computer work are moderated by psychosocial work exposures and that the relations between working conditions and UEMSP are somewhat different for each gender, highlighting the complexity of these relations and the importance of considering gender.
Computer-aided US diagnosis of breast lesions by using cell-based contour grouping.
Cheng, Jie-Zhi; Chou, Yi-Hong; Huang, Chiun-Sheng; Chang, Yeun-Chung; Tiu, Chui-Mei; Chen, Kuei-Wu; Chen, Chung-Ming
2010-06-01
To develop a computer-aided diagnostic algorithm with automatic boundary delineation for differential diagnosis of benign and malignant breast lesions at ultrasonography (US) and investigate the effect of boundary quality on the performance of a computer-aided diagnostic algorithm. This was an institutional review board-approved retrospective study with waiver of informed consent. A cell-based contour grouping (CBCG) segmentation algorithm was used to delineate the lesion boundaries automatically. Seven morphologic features were extracted. The classifier was a logistic regression function. Five hundred twenty breast US scans were obtained from 520 subjects (age range, 15-89 years), including 275 benign (mean size, 15 mm; range, 5-35 mm) and 245 malignant (mean size, 18 mm; range, 8-29 mm) lesions. The newly developed computer-aided diagnostic algorithm was evaluated on the basis of boundary quality and differentiation performance. The segmentation algorithms and features in two conventional computer-aided diagnostic algorithms were used for comparative study. The CBCG-generated boundaries were shown to be comparable with the manually delineated boundaries. The area under the receiver operating characteristic curve (AUC) and differentiation accuracy were 0.968 +/- 0.010 and 93.1% +/- 0.7, respectively, for all 520 breast lesions. At the 5% significance level, the newly developed algorithm was shown to be superior to the use of the boundaries and features of the two conventional computer-aided diagnostic algorithms in terms of AUC (0.974 +/- 0.007 versus 0.890 +/- 0.008 and 0.788 +/- 0.024, respectively). The newly developed computer-aided diagnostic algorithm that used a CBCG segmentation method to measure boundaries achieved a high differentiation performance. Copyright RSNA, 2010
A new method of real-time detection of changes in periodic data stream
NASA Astrophysics Data System (ADS)
Lyu, Chen; Lu, Guoliang; Cheng, Bin; Zheng, Xiangwei
2017-07-01
Change point detection in periodic time series is highly desirable in many practical applications. We present a novel algorithm for this task, which includes two phases: 1) anomaly measurement: on the basis of a typical regression model, we propose a new computation method to measure anomalies in time series that does not require any reference data from other measurements; 2) change detection: we introduce a new martingale test for detection that can be operated in an unsupervised and nonparametric way. We have conducted extensive experiments to systematically test our algorithm. The results suggest that our algorithm can be directly applied in many real-world change-point-detection applications.
Perlman, Sharon; Raviv-Zilka, Lisa; Levinsky, Denis; Gidron, Ayelet; Achiron, Reuven; Gilboa, Yinon; Kivilevitch, Zvi
2018-04-22
Assessment of pelvic configuration is an important factor in the prediction of a successful vaginal birth. However, manual evaluation of the pelvis is practically a vanishing art, and imaging techniques are not available as a real-time bedside tool. Unlike the obstetrical conjugate diameter (OC) and interspinous diameter (ISD), the pubic arch angle (PAA) can be easily measured by transperineal ultrasound. Three-dimensional computed tomography bone reconstructions were used to measure the three main birth canal diameters, evaluate the correlation between them, and establish the normal reference range for the inlet, mid-pelvis, and pelvic outlet. Measurements of the PAA, OC, and ISD were performed offline using three-dimensional post-processing bone-algorithm reconstruction of the pelvis on examinations performed for suspected renal colic in nonpregnant women of reproductive age. The mean of two measurements was used for statistical analysis, which included reproducibility of measurements, regression curve estimation between PAA, OC, and ISD, and calculation of the respective reference range centiles for each degree of PAA. Two hundred ninety-eight women comprised the study group. The mean ± SD of the PAA, ISD, and OC were 104.9° (±7.4), 103.8 mm (±7.3), and 129.9 mm (±8.3), respectively. The intra- and interobserver agreement defined by the intraclass correlation coefficient (ICC) was excellent for all parameters (range 0.905-0.993). A significant positive correlation was found between PAA and ISD and between PAA and OC (Pearson's correlation = 0.373, p < .001, and 0.163, p = .022, respectively). The best regression formula for the interspinous diameter was quadratic, ISD = 34.122778 + 0.962182*PAA - 0.002830*PAA², and for the obstetric conjugate it was linear, OC = 110.638397 + 0.183156*PAA. Modeled mean, SD, and reference centiles of the ISD and OC were calculated using the above regression models as a function of the PAA. We report a significant correlation between the three pelvic landmarks with the greatest impact on the prediction of a successful vaginal delivery: the PAA, which is easily measured sonographically, and the ISD and OC, which are not measurable by ultrasound. This correlation may serve as a basis for future studies to assess its utility and prognostic value for a safe vaginal delivery.
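Applying the two regression formulas reported above to a measured pubic arch angle gives predicted ISD and OC values directly; the example angle below is arbitrary, chosen near the reported mean.

```python
# Predicting the interspinous diameter (ISD) and obstetric conjugate (OC) from a
# pubic arch angle (PAA), using the regression formulas reported in the abstract.
def predict_isd(paa_deg: float) -> float:
    """Quadratic regression of ISD (mm) on PAA (degrees)."""
    return 34.122778 + 0.962182 * paa_deg - 0.002830 * paa_deg ** 2

def predict_oc(paa_deg: float) -> float:
    """Linear regression of OC (mm) on PAA (degrees)."""
    return 110.638397 + 0.183156 * paa_deg

paa = 105.0   # degrees, close to the reported mean of 104.9
print(f"predicted ISD = {predict_isd(paa):.1f} mm")   # about 104 mm
print(f"predicted OC  = {predict_oc(paa):.1f} mm")    # about 130 mm
```

At the mean angle the predictions land near the reported mean ISD (103.8 mm) and OC (129.9 mm), as expected for regression lines passing through the sample means.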
Kaladji, Adrien; Cardon, Alain; Abouliatim, Issam; Campillo-Gimenez, Boris; Heautot, Jean François; Verhoye, Jean-Philippe
2012-05-01
Aneurysmal regression is a reliable marker for long-lasting success after endovascular aneurysm repair (EVAR). The aim of this study was to identify the preoperative factors that can predictably lead to aneurysmal sac regression after EVAR, according to the reporting standards of the Society for Vascular Surgery and the International Society of Cardiovascular Surgery (SVS/ISCVS). From 199 patients treated by EVAR between 2000 and 2009, 164 completed computed tomography angiographies and duplex scan follow-up images were available. All computed tomography angiographies for enrolled patients in this retrospective study were analyzed with Endosize software (Therenva, Rennes, France) to provide spatially correct 3-dimensional data in accordance with SVS/ISCVS recommendations. Anatomic parameters were graded according to the relevant severity grades. A severity score was calculated at the aortic neck, the abdominal aortic aneurysm, and the iliac arteries. Clinical and demographic factors were studied. Patients with aneurysmal regression >5 mm were assigned to group A (mean age, 71.4 ± 8.9 years) and the others to group B (76.3 ± 8.3 years). Aneurysmal regression occurred in 66 patients (40.2%; group A). Univariate analyses showed smaller severity scores at the aortic neck (P = .02) and the iliac arteries (P = .002) in group A and calcifications and thrombus were less significant at the aortic neck (P = .003 and P = .02) and at the iliac arteries (P = .001 and P = .02), and inferior mesenteric artery patency was less frequent (68.2% vs 82.7%, P = .04). Two multivariate analyses were done: one considered the scores and the other the variables included in the scores. In the first, the patients of group A were younger (P = .002) and aortic neck calcifications were less significant (P = .007). In the second, group A patients were younger (P < .001) and the aortic neck scores were smaller (P = .04). There was no difference between the two groups in the type of implanted endoprosthesis or in the follow-up (group A: 46.4 ± 24 months; group B: 47.2 ± 22 months; P = .35). In this study, the young age of the patients and their aortic neck quality, in particular the absence of neck calcification, appear to have been the main factors affecting aneurysm shrinkage, such that they represent a target population for the improvement of EVAR results. Copyright © 2012 Society for Vascular Surgery. Published by Mosby, Inc. All rights reserved.
MIXREG: a computer program for mixed-effects regression analysis with autocorrelated errors.
Hedeker, D; Gibbons, R D
1996-05-01
MIXREG is a program that provides estimates for a mixed-effects regression model (MRM) for normally-distributed response data including autocorrelated errors. This model can be used for analysis of unbalanced longitudinal data, where individuals may be measured at a different number of timepoints, or even at different timepoints. Autocorrelated errors of a general form or following an AR(1), MA(1), or ARMA(1,1) form are allowable. This model can also be used for analysis of clustered data, where the mixed-effects model assumes data within clusters are dependent. The degree of dependency is estimated jointly with estimates of the usual model parameters, thus adjusting for clustering. MIXREG uses maximum marginal likelihood estimation, utilizing both the EM algorithm and a Fisher-scoring solution. For the scoring solution, the covariance matrix of the random effects is expressed in its Gaussian decomposition, and the diagonal matrix reparameterized using the exponential transformation. Estimation of the individual random effects is accomplished using an empirical Bayes approach. Examples illustrating usage and features of MIXREG are provided.
Lee, L.; Helsel, D.
2005-01-01
Trace contaminants in water, including metals and organics, often are measured at sufficiently low concentrations to be reported only as values below the instrument detection limit. Interpretation of these "less thans" is complicated when multiple detection limits occur. Statistical methods for multiply censored, or multiple-detection limit, datasets have been developed for medical and industrial statistics, and can be employed to estimate summary statistics or model the distributions of trace-level environmental data. We describe S-language-based software tools that perform robust linear regression on order statistics (ROS). The ROS method has been evaluated as one of the most reliable procedures for developing summary statistics of multiply censored data. It is applicable to any dataset that has 0 to 80% of its values censored. These tools are a part of a software library, or add-on package, for the R environment for statistical computing. This library can be used to generate ROS models and associated summary statistics, plot modeled distributions, and predict exceedance probabilities of water-quality standards. © 2005 Elsevier Ltd. All rights reserved.
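A deliberately simplified, single-detection-limit sketch of the ROS idea in Python: regress the logs of detected values on normal scores and impute the censored observations from the fitted line. The R tools described above handle multiple detection limits with proper plotting positions; the data here are simulated.

```python
# Simplified single-detection-limit ROS: fit a line to detected values on a
# normal-probability scale, then impute the below-limit observations from it.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
true = rng.lognormal(mean=0.0, sigma=1.0, size=60)   # simulated concentrations
dl = 0.5                                             # detection limit
detected = true[true >= dl]
n_cens = np.sum(true < dl)
n = true.size

# Plotting positions for all n ranks; detected values occupy the upper ranks.
pp = (np.arange(1, n + 1) - 0.5) / n
z = stats.norm.ppf(pp)

slope, intercept, *_ = stats.linregress(z[n_cens:], np.log(np.sort(detected)))
imputed = np.exp(intercept + slope * z[:n_cens])     # below-limit estimates

combined = np.concatenate([imputed, np.sort(detected)])
print(f"ROS mean = {combined.mean():.2f}, true mean = {true.mean():.2f}")
```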
Life stress and atherosclerosis: a pathway through unhealthy lifestyle.
Mainous, Arch G; Everett, Charles J; Diaz, Vanessa A; Player, Marty S; Gebregziabher, Mulugeta; Smith, Daniel W
2010-01-01
To examine the relationship between a general measure of chronic life stress and atherosclerosis among middle-aged adults without clinical cardiovascular disease via pathways through unhealthy lifestyle characteristics. We conducted an analysis of the Multi-Ethnic Study of Atherosclerosis (MESA). The MESA data, collected in 2000, include 5,773 participants aged 45-84. We used standard regression techniques to examine the relationship between life stress and atherosclerosis, as well as path analysis with hypothesized paths from stress to atherosclerosis through unhealthy lifestyle. Our outcome was sub-clinical atherosclerosis measured as the presence of coronary artery calcification (CAC). A logistic regression adjusted for potential confounding variables along with the unhealthy lifestyle characteristics of smoking, excessive alcohol use, high caloric intake, sedentary lifestyle, and obesity yielded no significant relationship between chronic life stress (OR 0.93, 95% CI 0.80-1.08) and CAC. However, significant indirect pathways between chronic life stress and CAC were found through smoking (p = .007), and through sedentary lifestyle (p = .03) and caloric intake (p = .002) acting through obesity. These results suggest that life stress is related to atherosclerosis once paths of unhealthy coping behaviors are considered.
VARIABLE SELECTION FOR REGRESSION MODELS WITH MISSING DATA
Garcia, Ramon I.; Ibrahim, Joseph G.; Zhu, Hongtu
2009-01-01
We consider the variable selection problem for a class of statistical models with missing data, including missing covariate and/or response data. We investigate the smoothly clipped absolute deviation penalty (SCAD) and adaptive LASSO and propose a unified model selection and estimation procedure for use in the presence of missing data. We develop a computationally attractive algorithm for simultaneously optimizing the penalized likelihood function and estimating the penalty parameters. Particularly, we propose to use a model selection criterion, called the ICQ statistic, for selecting the penalty parameters. We show that the variable selection procedure based on ICQ automatically and consistently selects the important covariates and leads to efficient estimates with oracle properties. The methodology is very general and can be applied to numerous situations involving missing data, from covariates missing at random in arbitrary regression models to nonignorably missing longitudinal responses and/or covariates. Simulations are given to demonstrate the methodology and examine the finite sample performance of the variable selection procedures. Melanoma data from a cancer clinical trial is presented to illustrate the proposed methodology. PMID:20336190
Duncan, Dustin T; Kawachi, Ichiro; Kum, Susan; Aldstadt, Jared; Piras, Gianfranco; Matthews, Stephen A; Arbia, Giuseppe; Castro, Marcia C; White, Kellee; Williams, David R
2014-04-01
The racial/ethnic and income composition of neighborhoods often influences local amenities, including the potential spatial distribution of trees, which are important for population health and community wellbeing, particularly in urban areas. This ecological study used spatial analytical methods to assess the relationship between neighborhood socio-demographic characteristics (i.e. minority racial/ethnic composition and poverty) and tree density at the census tract level in Boston, Massachusetts (US). We examined spatial autocorrelation with the Global Moran's I for all study variables and in the ordinary least squares (OLS) regression residuals, as well as computed Spearman correlations non-adjusted and adjusted for spatial autocorrelation between socio-demographic characteristics and tree density. Next, we fit traditional regressions (i.e. OLS regression models) and spatial regressions (i.e. spatial simultaneous autoregressive models), as appropriate. We found significant positive spatial autocorrelation for all neighborhood socio-demographic characteristics (Global Moran's I range from 0.24 to 0.86, all P = 0.001), for tree density (Global Moran's I = 0.452, P = 0.001), and in the OLS regression residuals (Global Moran's I range from 0.32 to 0.38, all P < 0.001). Therefore, we fit the spatial simultaneous autoregressive models. There was a negative correlation between neighborhood percent non-Hispanic Black and tree density (rS = -0.19; conventional P-value = 0.016; spatially adjusted P-value = 0.299) as well as a negative correlation between predominantly non-Hispanic Black (over 60% Black) neighborhoods and tree density (rS = -0.18; conventional P-value = 0.019; spatially adjusted P-value = 0.180). While the conventional OLS regression model found a marginally significant inverse relationship between Black neighborhoods and tree density, we found no statistically significant relationship between neighborhood socio-demographic composition and tree density in the spatial regression models. Methodologically, our study suggests the need to take into account spatial autocorrelation as findings/conclusions can change when the spatial autocorrelation is ignored. Substantively, our findings suggest no need for policy intervention vis-à-vis trees in Boston, though we hasten to add that replication studies, and more nuanced data on tree quality, age and diversity are needed.
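For concreteness, a minimal sketch of the global Moran's I statistic given a row-standardized spatial weights matrix; the tract coordinates, tree densities, and inverse-distance weighting below are simulated assumptions, not the Boston data.

```python
# Global Moran's I for a variable x given a row-standardized weights matrix W.
import numpy as np

rng = np.random.default_rng(5)
n = 50
coords = rng.uniform(size=(n, 2))                    # tract centroids
tree_density = rng.normal(size=n)

# Simple inverse-distance weights with a zero diagonal, row-standardized.
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
W = 1.0 / (d + np.eye(n))    # identity added only to avoid division by zero
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)

def morans_i(x, W):
    z = x - x.mean()
    return (len(x) / W.sum()) * (z @ W @ z) / (z @ z)

print(f"Global Moran's I = {morans_i(tree_density, W):.3f}")
```

Values near zero indicate no spatial autocorrelation, while values approaching 1 (as reported above for several variables) signal strong clustering and motivate the spatial autoregressive models.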
Cross Validation of Selection of Variables in Multiple Regression.
1979-12-01
Gebremariam, Mekdes K; Totland, Torunn H; Andersen, Lene F; Bergh, Ingunn H; Bjelland, Mona; Grydeland, May; Ommundsen, Yngvar; Lien, Nanna
2012-02-06
In order to inform interventions to prevent sedentariness, more longitudinal studies are needed focusing on stability and change over time in multiple sedentary behaviours. This paper investigates patterns of stability and change in TV/DVD use, computer/electronic game use and total screen time (TST) and factors associated with these patterns among Norwegian children in the transition between childhood and adolescence. The baseline of this longitudinal study took place in September 2007 and included 975 students from 25 control schools of an intervention study, the HEalth In Adolescents (HEIA) study. The first follow-up took place in May 2008 and the second follow-up in May 2009, with 885 students participating at all time points (average age at baseline = 11.2, standard deviation ± 0.3). Time used for/spent on TV/DVD and computer/electronic games was self-reported, and a TST variable (hours/week) was computed. Tracking analyses based on absolute and rank measures, as well as regression analyses to assess factors associated with change in TST and with tracking high TST were conducted. Time spent on all sedentary behaviours investigated increased in both genders. Findings based on absolute and rank measures revealed a fair to moderate level of tracking over the 2 year period. High parental education was inversely related to an increase in TST among females. In males, self-efficacy related to barriers to physical activity and living with married or cohabitating parents were inversely related to an increase in TST. Factors associated with tracking high vs. low TST in the multinomial regression analyses were low self-efficacy and being of an ethnic minority background among females, and low self-efficacy, being overweight/obese and not living with married or cohabitating parents among males. Use of TV/DVD and computer/electronic games increased with age and tracked over time in this group of 11-13 year old Norwegian children. Interventions targeting these sedentary behaviours should thus be introduced early. The identified modifiable and non-modifiable factors associated with change in TST and tracking of high TST should be taken into consideration when planning such interventions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Blais, AR; Dekaban, M; Lee, T-Y
2014-08-15
Quantitative analysis of dynamic positron emission tomography (PET) data usually involves minimizing a cost function with nonlinear regression, wherein the choice of starting parameter values and the presence of local minima affect the bias and variability of the estimated kinetic parameters. These nonlinear methods can also require lengthy computation time, making them unsuitable for use in clinical settings. Kinetic modeling of PET aims to estimate the rate parameter k3, which is the binding affinity of the tracer to a biological process of interest and is highly susceptible to noise inherent in PET image acquisition. We have developed linearized kinetic models for kinetic analysis of dynamic contrast enhanced computed tomography (DCE-CT)/PET imaging, including a 2-compartment model for DCE-CT and a 3-compartment model for PET. Use of kinetic parameters estimated from DCE-CT can stabilize the kinetic analysis of dynamic PET data, allowing for more robust estimation of k3. Furthermore, these linearized models are solved with a non-negative least squares algorithm and together they provide other advantages including: 1) only one possible solution and they do not require a choice of starting parameter values, 2) parameter estimates are comparable in accuracy to those from nonlinear models, 3) significantly reduced computational time. Our simulated data show that when blood volume and permeability are estimated with DCE-CT, the bias of k3 estimation with our linearized model is 1.97 ± 38.5% for 1,000 runs with a signal-to-noise ratio of 10. In summary, we have developed a computationally efficient technique for accurate estimation of k3 from noisy dynamic PET data.
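A toy illustration of the non-negative least squares step used to solve a linearized kinetic model, via scipy.optimize.nnls; the design matrix built from an arterial input function and its integrals is a generic placeholder, not the specific DCE-CT/PET formulation of the abstract.

```python
# Toy linearized kinetic fit: tissue curve regressed on an input function and
# its running integrals, with non-negativity enforced by NNLS.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(6)
t = np.linspace(0, 60, 120)                       # minutes
aif = t * np.exp(-t / 8.0)                        # toy arterial input function

cum1 = np.cumsum(aif) * (t[1] - t[0])             # running integral of AIF
cum2 = np.cumsum(cum1) * (t[1] - t[0])            # double integral
A = np.column_stack([aif, cum1, cum2])            # linearized design matrix

true_coef = np.array([0.05, 0.02, 0.001])
tac = A @ true_coef + rng.normal(scale=0.02, size=t.size)   # noisy tissue curve

coef, residual_norm = nnls(A, tac)                # unique, non-negative solution
print("estimated coefficients:", np.round(coef, 4))
```

Because the problem is a constrained linear least squares fit, there is a single solution and no starting values are needed, which is the computational advantage the abstract emphasizes.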
Tzeng, Jung-Ying; Zhang, Daowen; Pongpanich, Monnat; Smith, Chris; McCarthy, Mark I.; Sale, Michèle M.; Worrall, Bradford B.; Hsu, Fang-Chi; Thomas, Duncan C.; Sullivan, Patrick F.
2011-01-01
Genomic association analyses of complex traits demand statistical tools that are capable of detecting small effects of common and rare variants and modeling complex interaction effects and yet are computationally feasible. In this work, we introduce a similarity-based regression method for assessing the main genetic and interaction effects of a group of markers on quantitative traits. The method uses genetic similarity to aggregate information from multiple polymorphic sites and integrates adaptive weights that depend on allele frequencies to accommodate common and uncommon variants. Collapsing information at the similarity level instead of the genotype level avoids canceling signals that have the opposite etiological effects and is applicable to any class of genetic variants without the need for dichotomizing the allele types. To assess gene-trait associations, we regress trait similarities for pairs of unrelated individuals on their genetic similarities and assess association by using a score test whose limiting distribution is derived in this work. The proposed regression framework allows for covariates, has the capacity to model both main and interaction effects, can be applied to a mixture of different polymorphism types, and is computationally efficient. These features make it an ideal tool for evaluating associations between phenotype and marker sets defined by linkage disequilibrium (LD) blocks, genes, or pathways in whole-genome analysis. PMID:21835306
Natural language processing in an intelligent writing strategy tutoring system.
McNamara, Danielle S; Crossley, Scott A; Roscoe, Rod
2013-06-01
The Writing Pal is an intelligent tutoring system that provides writing strategy training. A large part of its artificial intelligence resides in the natural language processing algorithms to assess essay quality and guide feedback to students. Because writing is often highly nuanced and subjective, the development of these algorithms must consider a broad array of linguistic, rhetorical, and contextual features. This study assesses the potential for computational indices to predict human ratings of essay quality. Past studies have demonstrated that linguistic indices related to lexical diversity, word frequency, and syntactic complexity are significant predictors of human judgments of essay quality but that indices of cohesion are not. The present study extends prior work by including a larger data sample and an expanded set of indices to assess new lexical, syntactic, cohesion, rhetorical, and reading ease indices. Three models were assessed. The model reported by McNamara, Crossley, and McCarthy (Written Communication 27:57-86, 2010) including three indices of lexical diversity, word frequency, and syntactic complexity accounted for only 6% of the variance in the larger data set. A regression model including the full set of indices examined in prior studies of writing predicted 38% of the variance in human scores of essay quality with 91% adjacent accuracy (i.e., within 1 point). A regression model that also included new indices related to rhetoric and cohesion predicted 44% of the variance with 94% adjacent accuracy. The new indices increased accuracy but, more importantly, afford the means to provide more meaningful feedback in the context of a writing tutoring system.
Estimating ground-water inflow to lakes in central Florida using the isotope mass-balance approach
Sacks, Laura A.
2002-01-01
The isotope mass-balance approach was used to estimate ground-water inflow to 81 lakes in the central highlands and coastal lowlands of central Florida. The study area is characterized by a subtropical climate and numerous lakes in a mantled karst terrain. Ground-water inflow was computed using both steady-state and transient formulations of the isotope mass-balance equation. More detailed data were collected from two study lakes, including climatic, hydrologic, and isotopic (hydrogen and oxygen isotope ratio) data. For one of these lakes (Lake Starr), ground-water inflow was independently computed from a water-budget study. Climatic and isotopic data collected from the two lakes were similar even though they were in different physiographic settings about 60 miles apart. Isotopic data from all of the study lakes plotted on an evaporation trend line, which had a very similar slope to the theoretical slope computed for Lake Starr. These similarities suggest that data collected from the detailed study lakes can be extrapolated to the rest of the study area. Ground-water inflow computed using the isotope mass-balance approach ranged from 0 to more than 260 inches per year (or 0 to more than 80 percent of total inflows). Steady-state and transient estimates of ground-water inflow were very similar. Computed ground-water inflow was most sensitive to uncertainty in variables used to calculate the isotopic composition of lake evaporate (isotopic compositions of lake water and atmospheric moisture and climatic variables). Transient results were particularly sensitive to changes in the isotopic composition of lake water. Uncertainty in ground-water inflow results is considerably less for lakes with higher ground-water inflow than for lakes with lower ground-water inflow. Because of these uncertainties, the isotope mass-balance approach is better used to distinguish whether ground-water inflow quantities fall within certain ranges of values, rather than for precise quantification. The lakes fit into three categories based on their range of ground-water inflow: low (less than 25 percent of total inflows), medium (25-50 percent of inflows), and high (greater than 50 percent of inflows). The majority of lakes in the coastal lowlands had low ground-water inflow, whereas the majority of lakes in the central highlands had medium to high ground-water inflow. Multiple linear regression models were used to predict ground-water inflow to lakes. These models help identify basin characteristics that are important in controlling ground-water inflow to Florida lakes. Significant explanatory variables include: ratio of basin area to lake surface area, depth to the Upper Floridan aquifer, maximum lake depth, and fraction of wetlands in the basin. Models were improved when lake water-quality data (nitrate, sodium, and iron concentrations) were included, illustrating the link between ground-water geochemistry and lake chemistry. Regression models that considered lakes within specific geographic areas were generally poorer than models for the entire study area. Regression results illustrate how more simplified models based on basin and lake characteristics can be used to estimate ground-water inflow. Although the uncertainty in the amount of ground-water inflow to individual lakes is high, the isotope mass-balance approach was useful in comparing the range of ground-water inflow for numerous Florida lakes. 
Results were also helpful in understanding differences in the geographic distribution of ground-water inflow between the coastal lowlands and central highlands. In order to use the isotope mass-balance approach to estimate inflow for multiple lakes, it is essential that all the lakes are sampled during the same time period and that detailed isotopic, hydrologic, and climatic data are collected over this same period of time. Isotopic data for Florida lakes can change over time, both seasonally and interannually, primarily because of differ
Watson, Kara M.; McHugh, Amy R.
2014-01-01
Regional regression equations were developed for estimating monthly flow-duration and monthly low-flow frequency statistics for ungaged streams in Coastal Plain and non-coastal regions of New Jersey for baseline and current land- and water-use conditions. The equations were developed to estimate 87 different streamflow statistics, which include the monthly 99-, 90-, 85-, 75-, 50-, and 25-percentile flow-durations of the minimum 1-day daily flow; the August–September 99-, 90-, and 75-percentile minimum 1-day daily flow; and the monthly 7-day, 10-year (M7D10Y) low-flow frequency. These 87 streamflow statistics were computed for 41 continuous-record streamflow-gaging stations (streamgages) with 20 or more years of record and 167 low-flow partial-record stations in New Jersey with 10 or more streamflow measurements. The regression analyses used to develop equations to estimate selected streamflow statistics were performed by testing the relation between flow-duration statistics and low-flow frequency statistics for 32 basin characteristics (physical characteristics, land use, surficial geology, and climate) at the 41 streamgages and 167 low-flow partial-record stations. The regression analyses determined drainage area, soil permeability, average April precipitation, average June precipitation, and percent storage (water bodies and wetlands) were the significant explanatory variables for estimating the selected flow-duration and low-flow frequency statistics. Streamflow estimates were computed for two land- and water-use conditions in New Jersey—land- and water-use during the baseline period of record (defined as the years a streamgage had little to no change in development and water use) and current land- and water-use conditions (1989–2008)—for each selected station using data collected through water year 2008. The baseline period of record is representative of a period when the basin was unaffected by change in development. The current period is representative of the increased development of the last 20 years (1989–2008). The two different land- and water-use conditions were used as surrogates for development to determine whether there have been changes in low-flow statistics as a result of changes in development over time. The State was divided into two low-flow regression regions, the Coastal Plain and the non-coastal region, in order to improve the accuracy of the regression equations. The left-censored parametric survival regression method was used for the analyses to account for streamgages and partial-record stations that had zero flow values for some of the statistics. The average standard error of estimate for the 348 regression equations ranged from 16 to 340 percent. These regression equations and basin characteristics are presented in the U.S. Geological Survey (USGS) StreamStats Web-based geographic information system application. This tool allows users to click on an ungaged site on a stream in New Jersey and get the estimated flow-duration and low-flow frequency statistics. Additionally, the user can click on a streamgage or partial-record station and get the “at-site” streamflow statistics. The low-flow characteristics of a stream ultimately affect the use of the stream by humans. Specific information on the low-flow characteristics of streams is essential to water managers who deal with problems related to municipal and industrial water supply, fish and wildlife conservation, and dilution of wastewater.
Enhancing hyperspectral spatial resolution using multispectral image fusion: A wavelet approach
NASA Astrophysics Data System (ADS)
Jazaeri, Amin
High spectral and spatial resolution images have a significant impact in remote sensing applications. Because both spatial and spectral resolutions of spaceborne sensors are fixed by design and it is not possible to further increase the spatial or spectral resolution, techniques such as image fusion must be applied to achieve such goals. This dissertation introduces the concept of wavelet fusion between hyperspectral and multispectral sensors in order to enhance the spectral and spatial resolution of a hyperspectral image. To test the robustness of this concept, images from Hyperion (hyperspectral sensor) and Advanced Land Imager (multispectral sensor) were first co-registered and then fused using different wavelet algorithms. A regression-based fusion algorithm was also implemented for comparison purposes. The results show that the fused images using a combined bi-linear wavelet-regression algorithm have less error than other methods when compared to the ground truth. In addition, a combined regression-wavelet algorithm shows more immunity to misalignment of the pixels due to the lack of proper registration. The quantitative measures of average mean square error show that the performance of wavelet-based methods degrades when the spatial resolution of hyperspectral images becomes eight times less than its corresponding multispectral image. Regardless of what method of fusion is utilized, the main challenge in image fusion is image registration, which is also a very time intensive process. Because the combined regression wavelet technique is computationally expensive, a hybrid technique based on regression and wavelet methods was also implemented to decrease computational overhead. However, the gain in faster computation was offset by the introduction of more error in the outcome. The secondary objective of this dissertation is to examine the feasibility and sensor requirements for image fusion for future NASA missions in order to be able to perform onboard image fusion. In this process, the main challenge of image registration was resolved by registering the input images using transformation matrices of previously acquired data. The composite image resulted from the fusion process remarkably matched the ground truth, indicating the possibility of real time onboard fusion processing.
Gregoretti, Francesco; Belcastro, Vincenzo; di Bernardo, Diego; Oliva, Gennaro
2010-04-21
The reverse engineering of gene regulatory networks using gene expression profile data has become crucial to gain novel biological knowledge. Large amounts of data that need to be analyzed are currently being produced due to advances in microarray technologies. Using current reverse engineering algorithms to analyze large data sets can be very computationally intensive. These emerging computational requirements can be met using parallel computing techniques. It has been shown that the Network Identification by multiple Regression (NIR) algorithm performs better than the other ready-to-use reverse engineering software. However it cannot be used with large networks with thousands of nodes--as is the case in biological networks--due to the high time and space complexity. In this work we overcome this limitation by designing and developing a parallel version of the NIR algorithm. The new implementation of the algorithm reaches a very good accuracy even for large gene networks, improving our understanding of the gene regulatory networks that is crucial for a wide range of biomedical applications.
Conditional Monte Carlo randomization tests for regression models.
Parhat, Parwen; Rosenberger, William F; Diao, Guoqing
2014-08-15
We discuss the computation of randomization tests for clinical trials of two treatments when the primary outcome is based on a regression model. We begin by revisiting the seminal paper of Gail, Tan, and Piantadosi (1988), and then describe a method based on Monte Carlo generation of randomization sequences. The tests based on this Monte Carlo procedure are design based, in that they incorporate the particular randomization procedure used. We discuss permuted block designs, complete randomization, and biased coin designs. We also use a new technique by Plamadeala and Rosenberger (2012) for simple computation of conditional randomization tests. Like Gail, Tan, and Piantadosi, we focus on residuals from generalized linear models and martingale residuals from survival models. Such techniques do not apply to longitudinal data analysis, and we introduce a method for computation of randomization tests based on the predicted rate of change from a generalized linear mixed model when outcomes are longitudinal. We show, by simulation, that these randomization tests preserve the size and power well under model misspecification. Copyright © 2014 John Wiley & Sons, Ltd.
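A minimal sketch of a design-based Monte Carlo randomization test: re-generate permuted-block treatment sequences, recompute a residual-based statistic under each, and compare with the observed value. This is an unconditional toy version on simulated data, not the conditional procedure of the cited papers.

```python
# Monte Carlo randomization test with permuted-block re-randomization and a
# residual-based test statistic.
import numpy as np

rng = np.random.default_rng(7)
n, block = 60, 4

def permuted_block_sequence():
    seq = []
    for _ in range(n // block):
        b = np.array([0, 0, 1, 1])
        rng.shuffle(b)
        seq.extend(b)
    return np.array(seq)

treatment = permuted_block_sequence()
age = rng.normal(50, 10, n)                                # covariate
outcome = 0.1 * age + 1.5 * treatment + rng.normal(size=n)

# Residuals from the null model (outcome adjusted for age only); the statistic
# is the sum of residuals in the treated group.
resid = outcome - np.polyval(np.polyfit(age, outcome, 1), age)
observed = resid[treatment == 1].sum()

null = np.array([resid[permuted_block_sequence() == 1].sum() for _ in range(5000)])
p_value = np.mean(np.abs(null) >= np.abs(observed))
print(f"two-sided randomization p-value = {p_value:.4f}")
```

Because the re-randomization step mimics the permuted-block design actually used, the reference distribution is design based rather than model based.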
Assessing product image quality for online shopping
NASA Astrophysics Data System (ADS)
Goswami, Anjan; Chung, Sung H.; Chittar, Naren; Islam, Atiq
2012-01-01
Assessing product-image quality is important in the context of online shopping. A high quality image that conveys more information about a product can boost the buyer's confidence and can get more attention. However, the notion of image quality for product-images is not the same as that in other domains. The perception of quality of product-images depends not only on various photographic quality features but also on various high level features such as clarity of the foreground or goodness of the background etc. In this paper, we define a notion of product-image quality based on various such features. We conduct a crowd-sourced experiment to collect user judgments on thousands of eBay's images. We formulate a multi-class classification problem for modeling image quality by classifying images into good, fair and poor quality based on the guided perceptual notions from the judges. We also conduct experiments with regression using average crowd-sourced human judgments as target. We compute a pseudo-regression score with expected average of predicted classes and also compute a score from the regression technique. We design many experiments with various sampling and voting schemes with crowd-sourced data and construct various experimental image quality models. Most of our models have reasonable accuracies (greater or equal to 70%) on test data set. We observe that our computed image quality score has a high (0.66) rank correlation with average votes from the crowd sourced human judgments.
A robust and efficient stepwise regression method for building sparse polynomial chaos expansions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Abraham, Simon, E-mail: Simon.Abraham@ulb.ac.be; Raisee, Mehrdad; Ghorbaniasl, Ghader
2017-03-01
Polynomial Chaos (PC) expansions are widely used in various engineering fields for quantifying uncertainties arising from uncertain parameters. The computational cost of classical PC solution schemes is unaffordable as the number of deterministic simulations to be calculated grows dramatically with the number of stochastic dimensions. This considerably restricts the practical use of PC at the industrial level. A common approach to address such problems is to make use of sparse PC expansions. This paper presents a non-intrusive regression-based method for building sparse PC expansions. The most important PC contributions are detected sequentially through an automatic search procedure. The variable selection criterion is based on efficient tools relevant to probabilistic methods. Two benchmark analytical functions are used to validate the proposed algorithm. The computational efficiency of the method is then illustrated by a more realistic CFD application, consisting of the non-deterministic flow around a transonic airfoil subject to geometrical uncertainties. To assess the performance of the developed methodology, a detailed comparison is made with the well-established LAR-based selection technique. The results show that the developed sparse regression technique is able to identify the most significant PC contributions describing the problem. Moreover, the most important stochastic features are captured at a reduced computational cost compared to the LAR method. The results also demonstrate the superior robustness of the method by repeating the analyses using random experimental designs.
Molnos, Sophie; Baumbach, Clemens; Wahl, Simone; Müller-Nurasyid, Martina; Strauch, Konstantin; Wang-Sattler, Rui; Waldenberger, Melanie; Meitinger, Thomas; Adamski, Jerzy; Kastenmüller, Gabi; Suhre, Karsten; Peters, Annette; Grallert, Harald; Theis, Fabian J; Gieger, Christian
2017-09-29
Genome-wide association studies allow us to understand the genetics of complex diseases. Human metabolism provides information about the disease-causing mechanisms, so it is usual to investigate the associations between genetic variants and metabolite levels. However, only considering genetic variants and their effects on one trait ignores the possible interplay between different "omics" layers. Existing tools only consider single-nucleotide polymorphism (SNP)-SNP interactions, and no practical tool is available for large-scale investigations of the interactions between pairs of arbitrary quantitative variables. We developed an R package called pulver to compute p-values for the interaction term in a very large number of linear regression models. Comparisons based on simulated data showed that pulver is much faster than the existing tools. This is achieved by using the correlation coefficient to test the null-hypothesis, which avoids the costly computation of inversions. Additional tricks are a rearrangement of the order, when iterating through the different "omics" layers, and implementing this algorithm in the fast programming language C++. Furthermore, we applied our algorithm to data from the German KORA study to investigate a real-world problem involving the interplay among DNA methylation, genetic variants, and metabolite levels. The pulver package is a convenient and rapid tool for screening huge numbers of linear regression models for significant interaction terms in arbitrary pairs of quantitative variables. pulver is written in R and C++, and can be downloaded freely from CRAN at https://cran.r-project.org/web/packages/pulver/ .
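The sketch below is in Python rather than the pulver R API; it illustrates the kind of shortcut such tools exploit: the t-test of the interaction coefficient in y ~ x + z + x*z is equivalent to testing the correlation between y and x*z after both are residualized on (1, x, z), so no full model refit or matrix inversion is needed per pair. All data and variable names are simulated assumptions.

```python
# Fast interaction screening via residualized correlation (conceptual sketch,
# not the pulver package).
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n = 300
x = rng.normal(size=n)          # e.g. a SNP dosage
z = rng.normal(size=n)          # e.g. a methylation level
y = 0.3 * x + 0.2 * z + 0.25 * x * z + rng.normal(size=n)   # e.g. a metabolite

C = np.column_stack([np.ones(n), x, z])          # nuisance design (1, x, z)
def residualize(v):
    return v - C @ np.linalg.lstsq(C, v, rcond=None)[0]

r, _ = stats.pearsonr(residualize(y), residualize(x * z))
t = r * np.sqrt((n - 4) / (1 - r ** 2))          # 4 parameters in the full model
p = 2 * stats.t.sf(abs(t), df=n - 4)
print(f"interaction t = {t:.2f}, p = {p:.2e}")
```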
NASA Astrophysics Data System (ADS)
Gulliver, John; de Hoogh, Kees; Fecht, Daniela; Vienneau, Danielle; Briggs, David
2011-12-01
The development of geographical information system techniques has opened up a wide array of methods for air pollution exposure assessment. The extent to which these provide reliable estimates of air pollution concentrations is nevertheless not clearly established. Nor is it clear which methods or metrics should be preferred in epidemiological studies. This paper compares the performance of ten different methods and metrics in terms of their ability to predict mean annual PM10 concentrations across 52 monitoring sites in London, UK. Metrics analysed include indicators (distance to nearest road, traffic volume on nearest road, heavy duty vehicle (HDV) volume on nearest road, road density within 150 m, traffic volume within 150 m and HDV volume within 150 m) and four modelling approaches: based on the nearest monitoring site, kriging, dispersion modelling and land use regression (LUR). Measures were computed in a GIS, and resulting metrics calibrated and validated against monitoring data using a form of grouped jack-knife analysis. The results show that PM10 concentrations across London show little spatial variation. As a consequence, most methods can predict the average without serious bias. Few of the approaches, however, show good correlations with monitored PM10 concentrations, and most predict no better than a simple classification based on site type. Only land use regression reaches acceptable levels of correlation (R2 = 0.47), though this can be improved by also including information on site type. This might therefore be taken as a recommended approach in many studies, though care is needed in developing meaningful land use regression models, and like any method they need to be validated against local data before their application as part of epidemiological studies.
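A sketch of a basic land use regression workflow with leave-one-out validation across monitoring sites; the predictor names mirror the metrics listed above, but the simulated data, coefficients, and simple validation scheme are assumptions rather than the study's grouped jack-knife analysis.

```python
# Basic land use regression (LUR) with leave-one-out cross-validation across sites.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import r2_score

rng = np.random.default_rng(9)
n_sites = 52
X = np.column_stack([
    rng.uniform(0, 20000, n_sites),    # traffic volume within 150 m
    rng.uniform(0, 2000, n_sites),     # HDV volume within 150 m
    rng.uniform(0, 5, n_sites),        # road density within 150 m
])
pm10 = (22 + 2e-4 * X[:, 0] + 1e-3 * X[:, 1] + 0.8 * X[:, 2]
        + rng.normal(scale=2, size=n_sites))     # annual mean PM10 at each site

loo_pred = cross_val_predict(LinearRegression(), X, pm10, cv=LeaveOneOut())
print(f"leave-one-out R^2 = {r2_score(pm10, loo_pred):.2f}")
```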
Maxillary arch dimensions associated with acoustic parameters in prepubertal children.
Hamdan, Abdul-Latif; Khandakji, Mohannad; Macari, Anthony Tannous
2018-04-18
To evaluate the association between maxillary arch dimensions and fundamental frequency and formants of voice in prepubertal subjects. Thirty-five consecutive prepubertal patients seeking orthodontic treatment were recruited (mean age = 11.41 ± 1.46 years; range, 8 to 13.7 years). Participants with a history of respiratory infection, laryngeal manipulation, dysphonia, congenital facial malformations, or history of orthodontic treatment were excluded. Dental measurements included maxillary arch length, perimeter, depth, and width. Voice parameters comprising fundamental frequency (f0_sustained), Habitual pitch (f0_count), Jitter, Shimmer, and different formant frequencies (F1, F2, F3, and F4) were measured using acoustic analysis prior to initiation of any orthodontic treatment. Pearson's correlation coefficients were used to measure the strength of associations between different dental and voice parameters. Multiple linear regressions were computed for the predictions of different dental measurements. Arch width and arch depth had moderate significant negative correlations with f0 (r = -0.52; P = .001 and r = -0.39; P = .022, respectively) and with habitual frequency (r = -0.51; P = .0014 and r = -0.34; P = .04, respectively). Arch depth and arch length were significantly correlated with formant F3 and formant F4, respectively. Predictors of arch depth included frequencies of F3 vowels, with a significant regression equation (P < .001; R2 = 0.49). Similarly, fundamental frequency f0 and frequencies of formant F3 vowels were predictors of arch width, with a significant regression equation (P < .001; R2 = 0.37). There is a significant association between arch dimensions, particularly arch length and depth, and voice parameters. The formant most predictive of arch depth and width is the third formant, along with the fundamental frequency of voice.
Assessing the potential for improving S2S forecast skill through multimodel ensembling
NASA Astrophysics Data System (ADS)
Vigaud, N.; Robertson, A. W.; Tippett, M. K.; Wang, L.; Bell, M. J.
2016-12-01
Non-linear logistic regression is well suited to probability forecasting and has been successfully applied in the past to ensemble weather and climate predictions, providing access to the full probability distribution without any Gaussian assumption. However, little work has been done at sub-monthly lead times, where relatively small re-forecast ensemble sizes and record lengths present new challenges for which post-processing avenues have yet to be investigated. A promising approach consists of extending the definition of non-linear logistic regression by including the quantile of the forecast distribution as one of the predictors. So-called Extended Logistic Regression (ELR), which yields mutually consistent individual threshold probabilities, is here applied to ECMWF, CFSv2 and CMA re-forecasts from the S2S database in order to produce rainfall probabilities at weekly resolution. The ELR model is trained on seasonally varying tercile categories computed for lead times of 1 to 4 weeks. It is then tested in a cross-validated manner, i.e. allowing real-time predictability applications, to produce rainfall tercile probabilities from individual weekly hindcasts that are finally combined by equal pooling. Results will be discussed over a broad North American region, where individual and MME forecasts generated out to 4 weeks lead are characterized by good probabilistic reliability but low sharpness, exhibiting systematically more skill in winter than in summer.
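A minimal sketch of the ELR idea described above, assuming synthetic forecast-observation pairs (all names are hypothetical): the training data are stacked over the tercile thresholds and sqrt(threshold) is added as a predictor, so a single logistic fit yields mutually consistent threshold probabilities. This is not the processing chain applied to the S2S re-forecasts.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
ens_mean = rng.gamma(2.0, 5.0, size=400)               # ensemble-mean rainfall (mm)
obs = ens_mean * rng.lognormal(0.0, 0.5, size=400)     # matching "observed" rainfall
thresholds = np.percentile(obs, [100 / 3, 200 / 3])    # tercile boundaries

# Stack one row per (case, threshold); response = 1 if the observation fell at or below it.
X = np.concatenate([np.column_stack([ens_mean, np.full_like(ens_mean, np.sqrt(q))])
                    for q in thresholds])
y = np.concatenate([(obs <= q).astype(int) for q in thresholds])
elr = LogisticRegression(max_iter=1000).fit(X, y)

# Tercile probabilities for a new forecast with ensemble mean 12 mm.
p_le = [elr.predict_proba([[12.0, np.sqrt(q)]])[0, 1] for q in thresholds]
probs = [p_le[0], p_le[1] - p_le[0], 1.0 - p_le[1]]    # below / near / above normal
print(np.round(probs, 3))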
A review of small canned computer programs for survey research and demographic analysis.
Sinquefield, J C
1976-12-01
A variety of small canned computer programs for survey research and demographic analysis appropriate for use in developing countries are reviewed in this article. The programs discussed are SPSS (Statistical Package for the Social Sciences); CENTS, CO-CENTS, CENTS-AID, CENTS-AIE II; MINI-TAB EDIT, FREQUENCIES, TABLES, REGRESSION, CLIENT RECORD, DATES, MULT, LIFE, and PREGNANCY HISTORY; FIVFIV and SINSIN; DCL (Demographic Computer Library); MINI-TAB Population Projection, Functional Population Projection, and Family Planning Target Projection. For each program, a description and evaluation of its uses, instruction manuals, computer requirements, and procedures for obtaining the manuals and programs are provided. This information is intended to facilitate and encourage the use of the computer by data processors in developing countries.
ERIC Educational Resources Information Center
Zewude, Bereket Tessema; Ashine, Kidus Meskele
2016-01-01
An attempt has been made to assess and identify the major variables that influence student academic achievement at college of natural and computational science of Wolaita Sodo University in Ethiopia. Study time, peer influence, securing first choice of department, arranging study time outside class, amount of money received from family, good life…
Eye movement analysis of reading from computer displays, eReaders and printed books.
Zambarbieri, Daniela; Carniglia, Elena
2012-09-01
To compare eye movements during silent reading of three eBooks and a printed book. The three different eReading tools were a desktop PC, iPad tablet and Kindle eReader. Video-oculographic technology was used for recording eye movements. In the case of reading from the computer display the recordings were made by a video camera placed below the computer screen, whereas for reading from the iPad tablet, eReader and printed book the recording system was worn by the subject and had two cameras: one for recording the movement of the eyes and the other for recording the scene in front of the subject. Data analysis provided quantitative information in terms of number of fixations, their duration, and the direction of the movement, the latter to distinguish between fixations and regressions. Mean fixation duration was different only in reading from the computer display, and was similar for the Tablet, eReader and printed book. The percentage of regressions with respect to the total amount of fixations was comparable for eReading tools and the printed book. The analysis of eye movements during reading an eBook from different eReading tools suggests that subjects' reading behaviour is similar to reading from a printed book. © 2012 The College of Optometrists.
NASA Technical Reports Server (NTRS)
Batterson, J. G.
1986-01-01
The successful parametric modeling of the aerodynamics for an airplane operating at high angles of attack or sideslip is performed in two phases. First the aerodynamic model structure must be determined and second the associated aerodynamic parameters (stability and control derivatives) must be estimated for that model. The purpose of this paper is to document two versions of a stepwise regression computer program which were developed for the determination of airplane aerodynamic model structure and to provide two examples of their use on computer-generated data. References are provided for the application of the programs to real flight data. The two computer programs that are the subject of this report, STEP and STEPSPL, are written in FORTRAN IV (ANSI 1966) compatible with a CDC FTN4 compiler. Both programs are adaptations of a standard forward stepwise regression algorithm. The purpose of the adaptation is to facilitate the selection of an adequate mathematical model of the aerodynamic force and moment coefficients of an airplane from flight test data. The major difference between STEP and STEPSPL is in the basis for the model. The basis for the model in STEP is the standard polynomial Taylor's series expansion of the aerodynamic function about some steady-state trim condition. Program STEPSPL utilizes a set of spline basis functions.
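An illustrative forward stepwise selection over polynomial candidate terms, in the spirit of STEP but not the FORTRAN program itself; the candidate regressors, the F-to-enter cutoff, and the synthetic flight data are assumptions made for this example.

import numpy as np

def forward_stepwise(X, y, names, f_enter=4.0):
    """Greedily add the candidate column giving the largest partial F statistic."""
    n = len(y)
    selected = [0]                                   # always keep the intercept column
    remaining = list(range(1, X.shape[1]))
    while remaining:
        beta = np.linalg.lstsq(X[:, selected], y, rcond=None)[0]
        rss_cur = np.sum((y - X[:, selected] @ beta) ** 2)
        best = None
        for j in remaining:
            cols = selected + [j]
            b = np.linalg.lstsq(X[:, cols], y, rcond=None)[0]
            rss = np.sum((y - X[:, cols] @ b) ** 2)
            F = (rss_cur - rss) / (rss / (n - len(cols)))
            if best is None or F > best[1]:
                best = (j, F)
        if best[1] < f_enter:
            break
        selected.append(best[0])
        remaining.remove(best[0])
    return [names[j] for j in selected]

# Synthetic "flight data": pitching-moment coefficient vs. angle of attack and elevator.
rng = np.random.default_rng(2)
alpha, de = rng.uniform(-0.3, 0.3, 200), rng.uniform(-0.2, 0.2, 200)
Cm = 0.02 - 0.8 * alpha - 1.1 * de + 2.0 * alpha ** 2 + 0.01 * rng.normal(size=200)
X = np.column_stack([np.ones_like(alpha), alpha, de, alpha ** 2, alpha * de, de ** 2])
print(forward_stepwise(X, Cm, ["1", "alpha", "de", "alpha^2", "alpha*de", "de^2"]))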
Empirical Assessment of Spatial Prediction Methods for Location Cost Adjustment Factors
Migliaccio, Giovanni C.; Guindani, Michele; D'Incognito, Maria; Zhang, Linlin
2014-01-01
In the feasibility stage, the correct prediction of construction costs ensures that budget requirements are met from the start of a project's lifecycle. A very common approach for performing quick order-of-magnitude estimates is based on using Location Cost Adjustment Factors (LCAFs) that compute historically based costs by project location. Nowadays, numerous LCAF datasets are commercially available in North America, but, obviously, they do not include all locations. Hence, LCAFs for un-sampled locations need to be inferred through spatial interpolation or prediction methods. Currently, practitioners tend to select the value for a location using only one variable, namely the nearest linear distance between two sites. However, construction costs could be affected by socio-economic variables, as suggested by macroeconomic theories. Using a commonly used set of LCAFs, the City Cost Indexes (CCI) by RSMeans, and the socio-economic variables included in the ESRI Community Sourcebook, this article provides several contributions to the body of knowledge. First, the accuracy of various spatial prediction methods in estimating LCAF values for un-sampled locations was evaluated and assessed with respect to spatial interpolation methods. Two regression-based prediction models were selected: a global regression analysis and a geographically weighted regression (GWR) analysis. Once these models were compared against interpolation methods, the results showed that GWR is the most appropriate way to model CCI as a function of multiple covariates. The outcome of GWR, for each covariate, was studied for all 48 states in the contiguous US. As a direct consequence of spatial non-stationarity, it was possible to discuss the influence of each single covariate differently from state to state. In addition, the article includes a first attempt to determine whether the observed variability in cost index values could be, at least partially, explained by independent socio-economic variables. PMID:25018582
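A minimal geographically weighted regression sketch matching the idea above: at each target location a weighted least-squares fit is computed with Gaussian distance-decay weights. The bandwidth, coordinates, and the single covariate are synthetic stand-ins, not the CCI or Community Sourcebook data.

import numpy as np

rng = np.random.default_rng(11)
n = 200
coords = rng.uniform(0, 10, size=(n, 2))               # site locations
income = rng.normal(50, 10, n)                         # one socio-economic covariate
# Spatially varying effect: the income coefficient drifts from west to east.
cci = 80 + (0.2 + 0.05 * coords[:, 0]) * income + rng.normal(0, 2, n)

def gwr_coefficients(target_xy, bandwidth=2.0):
    """Local weighted least squares with a Gaussian distance-decay kernel."""
    d = np.linalg.norm(coords - target_xy, axis=1)
    w = np.exp(-0.5 * (d / bandwidth) ** 2)
    X = np.column_stack([np.ones(n), income])
    XtW = X.T * w
    return np.linalg.solve(XtW @ X, XtW @ cci)         # local intercept and slope

print(np.round(gwr_coefficients(np.array([1.0, 5.0])), 3))   # western site: smaller income effect
print(np.round(gwr_coefficients(np.array([9.0, 5.0])), 3))   # eastern site: larger income effect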
Loeve, Martine; Hop, Wim C J; de Bruijne, Marleen; van Hal, Peter T W; Robinson, Phil; Aitken, Moira L; Dodd, Jonathan D; Tiddens, Harm A W M
2012-05-15
Up to one-third of patients with cystic fibrosis (CF) awaiting lung transplantation (LTX) die while waiting. Inclusion of computed tomography (CT) scores may improve survival prediction models such as the lung allocation score (LAS). This study investigated the association between CT and survival in patients with CF screened for LTX. Clinical data and chest CTs of 411 patients with CF screened for LTX between 1990 and 2005 were collected from 17 centers. CTs were scored with the Severe Advanced Lung Disease (SALD) four-category scoring system, including the components infection/inflammation (INF), air trapping/hypoperfusion (AT), normal/hyperperfusion (NOR), and bulla/cysts (BUL). The volume of each component was computed using semiautomated software. Survival analysis included Kaplan-Meier curves and Cox regression models. Three hundred and sixty-six (186 males) of 411 patients entered the waiting list (median age, 23 yr; range, 5-58 yr). Subsequently, 67 of 366 (18%) died while waiting, 263 of 366 (72%) underwent LTX, and 36 of 366 (10%) were awaiting LTX at the census date. INF and LAS were significantly associated with waiting list mortality in univariate analyses. The multivariate Cox model including INF and LAS grouped in tertiles, and comparing tertiles 2 and 3 with tertile 1, showed waiting list mortality hazard ratios of 1.62 (95% confidence interval [95% CI], 0.78-3.36; P = 0.19) and 2.65 (95% CI, 1.35-5.20; P = 0.005) for INF, and 1.42 (95% CI, 0.63-3.24; P = 0.40), and 2.32 (95% CI, 1.17-4.60; P = 0.016) for LAS, respectively. These results indicated that INF and LAS had significant, independent predictive value for survival. CT score INF correlates with survival, and adds to the predictive value of LAS.
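A hedged sketch of the kind of multivariable Cox model described above, using the lifelines package on simulated data; the tertile indices are treated as ordinal covariates here rather than the tertile-2/3-versus-1 contrasts reported, and nothing below reproduces the 17-center cohort.

import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(12)
n = 366
inf = rng.uniform(0, 100, n)                        # SALD infection/inflammation volume
las = rng.uniform(30, 90, n)                        # lung allocation score
hazard = np.exp(0.01 * inf + 0.02 * las)
death_time = rng.exponential(scale=2000.0 / hazard) # days until death while waiting
censor_time = rng.uniform(100, 800, n)              # transplant or census date

df = pd.DataFrame({
    "time": np.minimum(death_time, censor_time),
    "event": (death_time <= censor_time).astype(int),   # 1 = died while waiting
    "inf_tertile": pd.qcut(inf, 3, labels=False),        # 0, 1, 2 (ordinal here)
    "las_tertile": pd.qcut(las, 3, labels=False),
})
cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
print(cph.hazard_ratios_)                            # per-tertile-step hazard ratios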
Computer literacy among first year medical students in a developing country: A cross sectional study
2012-01-01
Background The use of computer assisted learning (CAL) has enhanced undergraduate medical education. CAL improves performance at examinations, develops problem solving skills and increases student satisfaction. This study evaluates computer literacy among first year medical students in Sri Lanka. Methods The study was conducted at the Faculty of Medicine, University of Colombo, Sri Lanka between August and September 2008. First year medical students (n = 190) were invited for the study. Data on computer literacy and associated factors were collected by an expert-validated, pre-tested, self-administered questionnaire. Computer literacy was evaluated by testing knowledge in 6 domains: common software packages, operating systems, database management, and the use of the internet and e-mail. A linear regression was conducted using the total score for computer literacy as the continuous dependent variable and the other independent covariates. Results The sample size was 181 (response rate 95.3%); 49.7% were males. The majority of the students (77.3%) owned a computer (males 74.4%, females 80.2%). Students had gained their present computer knowledge through a formal training programme (64.1%), self learning (63.0%) or peer learning (49.2%). The students used computers predominantly for word processing (95.6%), entertainment (95.0%), web browsing (80.1%) and preparing presentations (76.8%). The majority of the students (75.7%) expressed their willingness for a formal computer training programme at the faculty. The mean score for the computer literacy questionnaire was 48.4 ± 20.3, with no significant gender difference (males 47.8 ± 21.1, females 48.9 ± 19.6). A score below 50% on the computer literacy questionnaire was obtained by 47.9% of students. Students from Colombo district, Western Province, and students owning a computer had a significantly higher mean score than other students (p < 0.001). In the linear regression analysis, formal computer training was the strongest predictor of computer literacy (β = 13.034), followed by use of an internet facility, being from Western Province, using computers for web browsing and computer programming, computer ownership, and having taken IT (Information Technology) as a subject in the GCE (A/L) examination. Conclusion Sri Lankan medical undergraduates had a low-intermediate level of computer literacy. There is a need to improve computer literacy, either by increasing computer training in schools or by introducing computer training in the initial stages of the undergraduate programme. Both options require improvement in infrastructure and other resources. PMID:22980096
NASA Astrophysics Data System (ADS)
Li, Tao
2018-06-01
The complexity of the aluminum electrolysis process makes the temperature of aluminum reduction cells hard to measure directly, yet temperature is a central control variable in aluminum production. To address this problem, drawing on operational data from an aluminum plant, this paper presents a soft-sensing model of temperature for the aluminum electrolysis process based on Improved Twin Support Vector Regression (ITSVR). ITSVR avoids the slow learning speed of Support Vector Regression (SVR) and the over-fitting risk of Twin Support Vector Regression (TSVR) by introducing a regularization term into the objective function of TSVR, which enforces the structural risk minimization principle at lower computational complexity. Finally, using several process parameters as auxiliary variables, the model predicts the temperature with ITSVR. The simulation results show that the ITSVR-based soft-sensing model is fast and generalizes better.
Over, Thomas M.; Saito, Riki J.; Veilleux, Andrea G.; Sharpe, Jennifer B.; Soong, David T.; Ishii, Audrey L.
2016-06-28
This report provides two sets of equations for estimating peak discharge quantiles at annual exceedance probabilities (AEPs) of 0.50, 0.20, 0.10, 0.04, 0.02, 0.01, 0.005, and 0.002 (recurrence intervals of 2, 5, 10, 25, 50, 100, 200, and 500 years, respectively) for watersheds in Illinois based on annual maximum peak discharge data from 117 watersheds in and near northeastern Illinois. One set of equations was developed through a temporal analysis with a two-step least squares-quantile regression technique that measures the average effect of changes in the urbanization of the watersheds used in the study. The resulting equations can be used to adjust rural peak discharge quantiles for the effect of urbanization, and in this study the equations also were used to adjust the annual maximum peak discharges from the study watersheds to 2010 urbanization conditions. The other set of equations was developed by a spatial analysis. This analysis used generalized least-squares regression to fit the peak discharge quantiles computed from the urbanization-adjusted annual maximum peak discharges from the study watersheds to drainage-basin characteristics. The peak discharge quantiles were computed by using the Expected Moments Algorithm following the removal of potentially influential low floods defined by a multiple Grubbs-Beck test. To improve the quantile estimates, regional skew coefficients were obtained from a newly developed regional skew model in which the skew increases with the urbanized land use fraction. The drainage-basin characteristics used as explanatory variables in the spatial analysis include drainage area, the fraction of developed land, the fraction of land with poorly drained soils or likely water, and the basin slope estimated as the ratio of the basin relief to basin perimeter. This report also provides the following: (1) examples to illustrate the use of the spatial and urbanization-adjustment equations for estimating peak discharge quantiles at ungaged sites and to improve flood-quantile estimates at and near a gaged site; (2) the urbanization-adjusted annual maximum peak discharges and peak discharge quantile estimates at streamgages from 181 watersheds including the 117 study watersheds and 64 additional watersheds in the study region that were originally considered for use in the study but later deemed to be redundant. The urbanization-adjustment equations, spatial regression equations, and peak discharge quantile estimates developed in this study will be made available in the web application StreamStats, which provides automated regression-equation solutions for user-selected stream locations. Figures and tables comparing the observed and urbanization-adjusted annual maximum peak discharge records by streamgage are provided at https://doi.org/10.3133/sir20165050 for download.
Differences in head impulse test results due to analysis techniques.
Cleworth, Taylor W; Carpenter, Mark G; Honegger, Flurin; Allum, John H J
2017-01-01
Different analysis techniques are used to define vestibulo-ocular reflex (VOR) gain between eye and head angular velocity during the video head impulse test (vHIT). Comparisons would aid selection of gain techniques best related to head impulse characteristics and promote standardisation. Compare and contrast known methods of calculating vHIT VOR gain. We examined lateral canal vHIT responses recorded from 20 patients twice within 13 weeks of acute unilateral peripheral vestibular deficit onset. Ten patients were tested with an ICS Impulse system (GN Otometrics) and 10 with an EyeSeeCam (ESC) system (Interacoustics). Mean gain and variance were computed with area, average sample gain, and regression techniques over specific head angular velocity (HV) and acceleration (HA) intervals. Results for the same gain technique were not different between measurement systems. Area and average sample gain yielded equally lower variances than regression techniques. Gains computed over the whole impulse duration were larger than those computed for increasing HV. Gain over decreasing HV was associated with larger variances. Gains computed around peak HV were smaller than those computed around peak HA. The median gain over 50-70 ms was not different from gain around peak HV. However, depending on technique used, the gain over increasing HV was different from gain around peak HA. Conversion equations between gains obtained with standard ICS and ESC methods were computed. For low gains, the conversion was dominated by a constant that needed to be added to ESC gains to equal ICS gains. We recommend manufacturers standardize vHIT gain calculations using 2 techniques: area gain around peak HA and peak HV.
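A sketch of two of the gain calculations compared above, applied to a synthetic head-impulse trace: an area gain as the ratio of integrated eye to head velocity over a window, and a regression gain as the slope of eye on head velocity. The waveform, noise, and window limits are illustrative assumptions, not a manufacturer's algorithm.

import numpy as np

fs = 250.0                                            # sampling rate, Hz
t = np.arange(0.0, 0.3, 1.0 / fs)
head = 200.0 * np.sin(np.pi * t / 0.15) * (t < 0.15)  # head angular velocity, deg/s
eye = -0.7 * head + np.random.default_rng(3).normal(0, 3, t.size)  # deficient VOR

win = (t > 0.02) & (t < 0.10)                         # window around peak head velocity
area_gain = -eye[win].sum() / head[win].sum()         # ratio of integrated velocities
regression_gain = np.polyfit(head[win], -eye[win], 1)[0]   # slope of eye on head velocity
print(round(float(area_gain), 2), round(float(regression_gain), 2))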
Defining Nitrogen Kinetics for Air Break in Prebreath
NASA Technical Reports Server (NTRS)
Conkin, Johnny
2010-01-01
Actual tissue nitrogen (N2) kinetics are complex; the uptake and elimination is often approximated with a single half-time compartment in statistical descriptions of denitrogenation [prebreathe (PB)] protocols. Air breaks during PB complicate N2 kinetics. A comparison of symmetrical versus asymmetrical N2 kinetics was performed using the time to onset of hypobaric decompression sickness (DCS) as a surrogate for actual venous N2 tension. METHODS: Published results of 12 tests involving 179 hypobaric exposures in altitude chambers after PB, with and without air breaks, provide the complex protocols from which to model N2 kinetics. DCS survival time for combined control and air breaks were described with an accelerated log logistic model where N2 uptake and elimination before, during, and after the air break was computed with a simple exponential function or a function that changed half-time depending on ambient N2 partial pressure. P1N2 - P2 = ΔP defined decompression dose for each altitude exposure, where P2 was the test altitude and P1N2 was the computed N2 pressure at the beginning of the altitude exposure. RESULTS: The log likelihood (LL) without decompression dose (null model) was -155.6, and improved (best-fit) to -97.2 when dose was defined with a 240 min half-time for both N2 elimination and uptake during the PB. The description of DCS survival time was less precise with asymmetrical N2 kinetics, for example, LL was -98.9 with 240 min half-time elimination and 120 min half-time uptake. CONCLUSION: The statistical regression described survival time mechanistically linked to symmetrical N2 kinetics during PBs that also included air breaks. The results are data-specific, and additional data may change the conclusion. The regression is useful to compute additional PB time to compensate for an air break in PB within the narrow range of tested conditions.
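A minimal single-compartment sketch of the symmetrical kinetics described above: tissue N2 tension relaxes exponentially toward the inspired N2 partial pressure with a 240-minute half-time through prebreathe, air break, and resumed prebreathe. The pressures and schedule below are illustrative only.

import math

def update_ptn2(p_tissue, p_inspired_n2, minutes, half_time=240.0):
    """Exponential approach of tissue N2 tension toward the inspired N2 pressure."""
    k = math.log(2.0) / half_time
    return p_inspired_n2 + (p_tissue - p_inspired_n2) * math.exp(-k * minutes)

p = 0.79 * 14.7                       # start saturated on air at sea level (psia of N2)
p = update_ptn2(p, 0.0, 60)           # 60 min of 100% O2 prebreathe (elimination)
p = update_ptn2(p, 0.79 * 14.7, 10)   # 10-min air break (uptake resumes, same half-time)
p = update_ptn2(p, 0.0, 30)           # 30 more min of O2 prebreathe
print(round(p, 2))                    # P1N2 entering the altitude exposure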
Defining Nitrogen Kinetics for Air Break in Prebreathe
NASA Technical Reports Server (NTRS)
Conkin, Johnny
2009-01-01
Actual tissue nitrogen (N2) kinetics are complex; the uptake and elimination is often approximated with a single half-time compartment in statistical descriptions of denitrogenation [prebreathe (PB)] protocols. Air breaks during PB complicate N2 kinetics. A comparison of symmetrical versus asymmetrical N2 kinetics was performed using the time to onset of hypobaric decompression sickness (DCS) as a surrogate for actual venous N2 tension. Published results of 12 tests involving 179 hypobaric exposures in altitude chambers after PB, with and without air breaks, provide the complex protocols from which to model N2 kinetics. DCS survival time for combined control and air breaks were described with an accelerated log logistic model where N2 uptake and elimination before, during, and after the air break was computed with a simple exponential function or a function that changed half-time depending on ambient N2 partial pressure. P1N2-P2 = delta P defined DCS dose for each altitude exposure, where P2 was the test altitude and P1N2 was computed N2 pressure at the beginning of the altitude exposure. The log likelihood (LL) without DCS dose (null model) was -155.6, and improved (best-fit) to -97.2 when dose was defined with a 240 min half-time for both N2 elimination and uptake during the PB. The description of DCS survival time was less precise with asymmetrical N2 kinetics, for example, LL was -98.9 with 240 min half-time elimination and 120 min half-time uptake. The statistical regression described survival time mechanistically linked to symmetrical N2 kinetics during PBs that also included air breaks. The results are data-specific, and additional data may change the conclusion. The regression is useful to compute additional PB time to compensate for an air break in PB within the narrow range of tested conditions.
NASA Astrophysics Data System (ADS)
Wibowo, Wahyu; Wene, Chatrien; Budiantara, I. Nyoman; Permatasari, Erma Oktania
2017-03-01
Multiresponse semiparametric regression is a simultaneous-equation regression model that fuses parametric and nonparametric components. The regression model comprises several equations, and each equation has two components, one parametric and one nonparametric. The model used here has a linear function as the parametric component and a truncated polynomial spline as the nonparametric component. The model can handle both linear and nonlinear relationships between the responses and the sets of predictor variables. The aim of this paper is to demonstrate the application of the regression model to modeling the effect of regional socio-economic conditions on the use of information technology. More specifically, the response variables are the percentage of households with internet access and the percentage of households with a personal computer, and the predictor variables are the literacy rate, the electrification rate, and the economic growth rate. Based on identification of the relationship between the responses and the predictor variables, economic growth is treated as the nonparametric predictor and the others as parametric predictors. The results show that the multiresponse semiparametric regression can be applied well, as indicated by the high coefficient of determination of 90 percent.
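A single-response sketch of the parametric-plus-truncated-spline structure described above, fitted by least squares; the variable names, knot locations, and data are hypothetical, and the multiresponse (simultaneous-equation) machinery is omitted.

import numpy as np

def truncated_power_basis(x, knots, degree=1):
    """Columns x..x^degree plus (x - k)_+^degree for each knot."""
    cols = [x ** d for d in range(1, degree + 1)]
    cols += [np.clip(x - k, 0.0, None) ** degree for k in knots]
    return np.column_stack(cols)

rng = np.random.default_rng(4)
literacy = rng.uniform(80, 100, 100)
electrification = rng.uniform(60, 100, 100)
growth = rng.uniform(2, 8, 100)
internet = (0.4 * literacy + 0.2 * electrification
            + 3.0 * np.clip(growth - 5.0, 0.0, None) + rng.normal(0, 2, 100))

X = np.column_stack([np.ones(100), literacy, electrification,           # parametric part
                     truncated_power_basis(growth, knots=[4.0, 6.0])])  # spline part
beta, *_ = np.linalg.lstsq(X, internet, rcond=None)
fitted = X @ beta
r2 = 1 - np.sum((internet - fitted) ** 2) / np.sum((internet - internet.mean()) ** 2)
print(round(float(r2), 2))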
Chen, Carla Chia-Ming; Schwender, Holger; Keith, Jonathan; Nunkesser, Robin; Mengersen, Kerrie; Macrossan, Paula
2011-01-01
Due to advancements in computational ability, enhanced technology and a reduction in the price of genotyping, more data are being generated for understanding genetic associations with diseases and disorders. However, with the availability of large data sets comes the inherent challenge of new methods of statistical analysis and modeling. Considering that a complex phenotype may be the effect of a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects. Among these methods, logic regression (LR) is an intriguing approach incorporating tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, namely Random Forests, and with a Bayesian logistic regression with stochastic search variable selection.
2015-05-20
original variable. Residual risk can be exemplified as a quantification of the improved situation faced by a hedging investor compared to that of a...distributional information about Yx for every x as well as the computational cost of evaluating R(Yx) for numerous x, for example within an optimization...Still, when g is costly to evaluate, it might be desirable to develop an approximation of R(Yx), x ∈ IR^n through regression based on observations {xj
Friedman tongue position and cone beam computed tomography in patients with obstructive sleep apnea
O'Brien, Louise; Aronovich, Sharon; Shelgikar, Anita; Hoff, Paul; Palmisano, John; Stanley, Jeffrey
2017-01-01
Objective Evaluate the correlation between Friedman Tongue Position (FTP) and airway cephalometrics in patients with obstructive sleep apnea (OSA). Study Design Retrospective review of adult patients with OSA undergoing Cone Beam Computed Tomography (CBCT). Methods Collected data included age, sex, body mass index, apnea hypopnea index, FTP, and airway cephalometric parameters. Data analyses were performed using ANOVA, dichotomous t‐testing, and linear regression. Results 203 patients (M:F 132:71) were included in the analysis. The mean posterior airway space (PAS) was inversely correlated (p = 0.001, r = .119) with higher FTP grades with means of 12.3 mm, 7.9 mm, 6.6 mm, and 4.3 mm for grades I‐IV, respectively. Minimal cross‐sectional area for patients with FTP I‐IV was 245.7, 179.8, 137.6, and 74.2 mm², respectively (p = 0.002, r = .095). Mean hyoid‐mandibular plane (H‐MP) for FTP I‐IV was 20.6 mm, 20.4 mm, 24.7 mm, and 28.9 mm, respectively. There was no statistically significant difference between H‐MP values when comparing patients with FTP I or II (p = 0.22). There were statistically significant differences when these two groups were individually compared to FTP III and IV (p = 0.002). Linear regression analysis confirmed an independent association between FTP and PAS (β = −2.06, p < 0.001), minimal cross‐sectional area (β = −45.07, p = 0.02), and H‐MP (β = 3.03, p = 0.01) controlling for BMI, age, AHI, and sex. Conclusions Use of FTP is supported by objective CBCT cephalometric results, in particular the PAS, minimal cross‐sectional area, and H‐MP. Understanding the correlation between objective measurements of retroglossal collapse should allow Otolaryngologists to more confidently select patients who may require surgery to address the retroglossal area, particularly when the ability to perform cephalometric analysis is not possible. Level of Evidence 4. PMID:29094076
Friedman tongue position and cone beam computed tomography in patients with obstructive sleep apnea.
Harvey, Rebecca; O'Brien, Louise; Aronovich, Sharon; Shelgikar, Anita; Hoff, Paul; Palmisano, John; Stanley, Jeffrey
2017-10-01
Evaluate the correlation between Friedman Tongue Position (FTP) and airway cephalometrics in patients with obstructive sleep apnea (OSA). Retrospective review of adult patients with OSA undergoing Cone Beam Computed Tomography (CBCT). Collected data included age, sex, body mass index, apnea hypopnea index, FTP, and airway cephalometric parameters. Data analyses were performed using ANOVA, dichotomous t-testing, and linear regression. 203 patients (M:F 132:71) were included in the analysis. The mean posterior airway space (PAS) was inversely correlated (p = 0.001, r = .119) with higher FTP grades with means of 12.3 mm, 7.9 mm, 6.6 mm, and 4.3 mm for grades I-IV, respectively. Minimal cross-sectional area for patients with FTP I-IV was 245.7, 179.8, 137.6, and 74.2 mm², respectively (p = 0.002, r = .095). Mean hyoid-mandibular plane (H-MP) for FTP I-IV was 20.6 mm, 20.4 mm, 24.7 mm, and 28.9 mm, respectively. There was no statistically significant difference between H-MP values when comparing patients with FTP I or II (p = 0.22). There were statistically significant differences when these two groups were individually compared to FTP III and IV (p = 0.002). Linear regression analysis confirmed an independent association between FTP and PAS (β = -2.06, p < 0.001), minimal cross-sectional area (β = -45.07, p = 0.02), and H-MP (β = 3.03, p = 0.01) controlling for BMI, age, AHI, and sex. Use of FTP is supported by objective CBCT cephalometric results, in particular the PAS, minimal cross-sectional area, and H-MP. Understanding the correlation between objective measurements of retroglossal collapse should allow Otolaryngologists to more confidently select patients who may require surgery to address the retroglossal area, particularly when the ability to perform cephalometric analysis is not possible. Level of Evidence 4.
Li, Qi; Zhang, Gang; Huang, Yuan-Jun; Dong, Mei-Xue; Lv, Fa-Jin; Wei, Xiao; Chen, Jian-Jun; Zhang, Li-Juan; Qin, Xin-Yue; Xie, Peng
2015-08-01
Early hematoma growth is not uncommon in patients with intracerebral hemorrhage and is an independent predictor of poor functional outcome. The purpose of our study was to report and validate the use of our newly identified computed tomographic (CT) blend sign in predicting early hematoma growth. Patients with intracerebral hemorrhage who underwent baseline CT scan within 6 hours after onset of symptoms were included. The follow-up CT scan was performed within 24 hours after the baseline CT scan. Significant hematoma growth was defined as an increase in hematoma volume of >33% or an absolute increase of hematoma volume of >12.5 mL. The blend sign on admission nonenhanced CT was defined as blending of hypoattenuating area and hyperattenuating region with a well-defined margin. Univariate and multivariable logistic regression analyses were performed to assess the relationship between the presence of the blend sign on nonenhanced admission CT and early hematoma growth. A total of 172 patients were included in our study. Blend sign was observed in 29 of 172 (16.9%) patients with intracerebral hemorrhage on baseline nonenhanced CT scan. Of the 61 patients with hematoma growth, 24 (39.3%) had blend sign on admission CT scan. Interobserver agreement for identifying blend sign was excellent between the 2 readers (κ=0.957). The multivariate logistic regression analysis demonstrated that the time to baseline CT scan, initial hematoma volume, and presence of blend sign on baseline CT scan to be independent predictors of early hematoma growth. The sensitivity, specificity, positive and negative predictive values of blend sign for predicting hematoma growth were 39.3%, 95.5%, 82.7%, and 74.1%, respectively. The CT blend sign could be easily identified on regular nonenhanced CT and is highly specific for predicting hematoma growth. © 2015 American Heart Association, Inc.
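The diagnostic indices quoted above follow directly from the 2x2 table implied by the abstract (61 of 172 patients with growth; 24 of the 29 blend-sign patients with growth), as the short check below shows.

growth_blend, growth_noblend = 24, 61 - 24
nogrowth_blend, nogrowth_noblend = 29 - 24, (172 - 61) - (29 - 24)

sensitivity = growth_blend / (growth_blend + growth_noblend)          # 24/61
specificity = nogrowth_noblend / (nogrowth_noblend + nogrowth_blend)  # 106/111
ppv = growth_blend / (growth_blend + nogrowth_blend)                  # 24/29
npv = nogrowth_noblend / (nogrowth_noblend + growth_noblend)          # 106/143
# Prints approximately [39.3, 95.5, 82.8, 74.1]; the reported 82.7% PPV differs only by rounding.
print([round(100 * v, 1) for v in (sensitivity, specificity, ppv, npv)])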
NASA Astrophysics Data System (ADS)
Gao, Xiangyun; An, Haizhong; Fang, Wei; Huang, Xuan; Li, Huajiao; Zhong, Weiqiong; Ding, Yinghui
2014-07-01
The linear regression parameters between two time series can be different under different lengths of observation period. If we study the whole period through a sliding window of a short period, the change of the linear regression parameters is a process of dynamic transmission over time. We present a simple and efficient computational scheme: a linear regression patterns transmission algorithm, which transforms linear regression patterns into directed and weighted networks. The linear regression patterns (nodes) are defined by the combination of intervals of the linear regression parameters and the results of the significance testing under different sizes of the sliding window. The transmissions between adjacent patterns are defined as edges, and the weights of the edges are the frequencies of the transmissions. The major patterns, the distance, and the medium in the process of the transmission can be captured. The statistical results of weighted out-degree and betweenness centrality are mapped on timelines, which shows the features of the distribution of the results. Many measurements in different areas that involve two related time series variables could take advantage of this algorithm to characterize the dynamic relationships between the time series from a new perspective.
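A compact sketch of the pattern-transmission idea on synthetic series: fit a per-window OLS slope, bin each (slope, significance) result into a pattern label, and record transitions between consecutive windows as a weighted directed graph (here with networkx). The window size, slope bins, and data are illustrative assumptions, not the published parameter intervals.

import numpy as np
import networkx as nx
from scipy import stats

rng = np.random.default_rng(5)
x = np.cumsum(rng.normal(size=600))
y = 0.6 * x + np.cumsum(rng.normal(scale=0.5, size=600))

def pattern(xw, yw):
    slope, _, _, p, _ = stats.linregress(xw, yw)
    band = "neg" if slope < -0.2 else ("pos" if slope > 0.2 else "flat")
    return f"{band}|{'sig' if p < 0.05 else 'ns'}"

window = 50
labels = [pattern(x[i:i + window], y[i:i + window]) for i in range(len(x) - window)]

G = nx.DiGraph()
for a, b in zip(labels[:-1], labels[1:]):
    if a != b:                                        # count each change of pattern
        G.add_edge(a, b, weight=G.get_edge_data(a, b, {"weight": 0})["weight"] + 1)

print(sorted(G.edges(data="weight"), key=lambda e: -e[2])[:3])   # dominant transmissions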
2014-01-01
Background Support vector regression (SVR) and Gaussian process regression (GPR) were used for the analysis of electroanalytical experimental data to estimate diffusion coefficients. Results For simulated cyclic voltammograms based on the EC, Eqr, and EqrC mechanisms these regression algorithms in combination with nonlinear kernel/covariance functions yielded diffusion coefficients with higher accuracy as compared to the standard approach of calculating diffusion coefficients relying on the Nicholson-Shain equation. The level of accuracy achieved by SVR and GPR is virtually independent of the rate constants governing the respective reaction steps. Further, the reduction of high-dimensional voltammetric signals by manual selection of typical voltammetric peak features decreased the performance of both regression algorithms compared to a reduction by downsampling or principal component analysis. After training on simulated data sets, diffusion coefficients were estimated by the regression algorithms for experimental data comprising voltammetric signals for three organometallic complexes. Conclusions Estimated diffusion coefficients closely matched the values determined by the parameter fitting method, but reduced the required computational time considerably for one of the reaction mechanisms. The automated processing of voltammograms according to the regression algorithms yields better results than the conventional analysis of peak-related data. PMID:24987463
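An illustrative use of the two regression algorithms named above, SVR and GPR with nonlinear kernels, to map downsampled voltammogram-like signals to a target coefficient; the synthetic signal generator merely stands in for the simulated cyclic voltammograms and is not an electrochemical model.

import numpy as np
from sklearn.svm import SVR
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(6)
potential = np.linspace(-0.5, 0.5, 100)

def fake_voltammogram(D):
    """Peak-shaped stand-in signal whose height grows with sqrt(D), plus noise."""
    peak = 1e3 * np.sqrt(D) * np.exp(-((potential - 0.1) ** 2) / 0.01)
    return peak + rng.normal(0, 0.01, potential.size)

D_train = rng.uniform(1e-6, 1e-5, 200)
X_train = np.array([fake_voltammogram(D) for D in D_train])[:, ::5]   # downsample to 20 points
y_train = np.log10(D_train)

svr = SVR(kernel="rbf", C=10.0).fit(X_train, y_train)
gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True).fit(X_train, y_train)

x_test = fake_voltammogram(5e-6)[::5].reshape(1, -1)
print(10 ** svr.predict(x_test)[0], 10 ** gpr.predict(x_test)[0])     # both should be near 5e-6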
Partitioning heritability by functional annotation using genome-wide association summary statistics.
Finucane, Hilary K; Bulik-Sullivan, Brendan; Gusev, Alexander; Trynka, Gosia; Reshef, Yakir; Loh, Po-Ru; Anttila, Verneri; Xu, Han; Zang, Chongzhi; Farh, Kyle; Ripke, Stephan; Day, Felix R; Purcell, Shaun; Stahl, Eli; Lindstrom, Sara; Perry, John R B; Okada, Yukinori; Raychaudhuri, Soumya; Daly, Mark J; Patterson, Nick; Neale, Benjamin M; Price, Alkes L
2015-11-01
Recent work has demonstrated that some functional categories of the genome contribute disproportionately to the heritability of complex diseases. Here we analyze a broad set of functional elements, including cell type-specific elements, to estimate their polygenic contributions to heritability in genome-wide association studies (GWAS) of 17 complex diseases and traits with an average sample size of 73,599. To enable this analysis, we introduce a new method, stratified LD score regression, for partitioning heritability from GWAS summary statistics while accounting for linked markers. This new method is computationally tractable at very large sample sizes and leverages genome-wide information. Our findings include a large enrichment of heritability in conserved regions across many traits, a very large immunological disease-specific enrichment of heritability in FANTOM5 enhancers and many cell type-specific enrichments, including significant enrichment of central nervous system cell types in the heritability of body mass index, age at menarche, educational attainment and smoking behavior.
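A toy illustration of the core LD score regression relation, E[chi^2_j] ≈ N(h2/M)l_j + intercept: regressing per-SNP chi-square statistics on LD scores recovers a heritability estimate. Everything below is simulated, and the stratified version additionally partitions l_j across functional annotations.

import numpy as np

rng = np.random.default_rng(7)
M, N, h2_true = 50_000, 100_000, 0.4
ld_scores = rng.gamma(shape=4.0, scale=20.0, size=M)       # per-SNP LD scores l_j
expected = N * h2_true / M * ld_scores + 1.0
chi2 = expected * rng.chisquare(1, size=M)                 # noisy chi-square statistics

X = np.column_stack([np.ones(M), ld_scores])
intercept, slope = np.linalg.lstsq(X, chi2, rcond=None)[0]
print(round(slope * M / N, 3), round(intercept, 2))        # heritability and intercept estimates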
Learning to Predict Combinatorial Structures
NASA Astrophysics Data System (ADS)
Vembu, Shankar
2009-12-01
The major challenge in designing a discriminative learning algorithm for predicting structured data is to address the computational issues arising from the exponential size of the output space. Existing algorithms make different assumptions to ensure efficient, polynomial time estimation of model parameters. For several combinatorial structures, including cycles, partially ordered sets, permutations and other graph classes, these assumptions do not hold. In this thesis, we address the problem of designing learning algorithms for predicting combinatorial structures by introducing two new assumptions: (i) The first assumption is that a particular counting problem can be solved efficiently. The consequence is a generalisation of the classical ridge regression for structured prediction. (ii) The second assumption is that a particular sampling problem can be solved efficiently. The consequence is a new technique for designing and analysing probabilistic structured prediction models. These results can be applied to solve several complex learning problems including but not limited to multi-label classification, multi-category hierarchical classification, and label ranking.
NASA Astrophysics Data System (ADS)
Yang, Hongbin; Sun, Lixia; Li, Weihua; Liu, Guixia; Tang, Yun
2018-02-01
For a drug, safety is always the most important issue, including a variety of toxicities and adverse drug effects, which should be evaluated in preclinical and clinical trial phases. This review article at first simply introduced the computational methods used in prediction of chemical toxicity for drug design, including machine learning methods and structural alerts. Machine learning methods have been widely applied in qualitative classification and quantitative regression studies, while structural alerts can be regarded as a complementary tool for lead optimization. The emphasis of this article was put on the recent progress of predictive models built for various toxicities. Available databases and web servers were also provided. Though the methods and models are very helpful for drug design, there are still some challenges and limitations to be improved for drug safety assessment in the future.
Yang, Hongbin; Sun, Lixia; Li, Weihua; Liu, Guixia; Tang, Yun
2018-01-01
During drug development, safety is always the most important issue, including a variety of toxicities and adverse drug effects, which should be evaluated in preclinical and clinical trial phases. This review article at first simply introduced the computational methods used in prediction of chemical toxicity for drug design, including machine learning methods and structural alerts. Machine learning methods have been widely applied in qualitative classification and quantitative regression studies, while structural alerts can be regarded as a complementary tool for lead optimization. The emphasis of this article was put on the recent progress of predictive models built for various toxicities. Available databases and web servers were also provided. Though the methods and models are very helpful for drug design, there are still some challenges and limitations to be improved for drug safety assessment in the future. PMID:29515993
Field applications of stand-off sensing using visible/NIR multivariate optical computing
NASA Astrophysics Data System (ADS)
Eastwood, DeLyle; Soyemi, Olusola O.; Karunamuni, Jeevanandra; Zhang, Lixia; Li, Hongli; Myrick, Michael L.
2001-02-01
A novel multivariate visible/NIR optical computing approach applicable to standoff sensing will be demonstrated with porphyrin mixtures as examples. The ultimate goal is to develop environmental or counter-terrorism sensors for chemicals such as organophosphorus (OP) pesticides or chemical warfare simulants in the near infrared spectral region. The mathematical operation that characterizes prediction of properties via regression from optical spectra is a calculation of inner products between the spectrum and the pre-determined regression vector. The result is scaled appropriately and offset to correspond to the basis from which the regression vector is derived. The process involves collecting spectroscopic data and synthesizing a multivariate vector using a pattern recognition method. Then, an interference coating is designed that reproduces the pattern of the multivariate vector in its transmission or reflection spectrum, and appropriate interference filters are fabricated. High and low refractive index materials such as Nb2O5 and SiO2 are excellent choices for the visible and near infrared regions. The proof of concept has now been established for this system in the visible and will later be extended to chemicals such as OP compounds in the near and mid-infrared.
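A sketch of the prediction step described above: the property estimate is a scaled and offset inner product of the measured spectrum with a stored regression vector, here obtained from a simple least-squares calibration on synthetic two-component mixture spectra rather than the published filter design.

import numpy as np

rng = np.random.default_rng(8)
wavelengths = np.linspace(400, 700, 150)
comp_a = np.exp(-((wavelengths - 500) ** 2) / 400.0)       # spectra of two components
comp_b = np.exp(-((wavelengths - 580) ** 2) / 900.0)

conc = rng.uniform(0, 1, 300)                              # property of interest (fraction of A)
spectra = (np.outer(conc, comp_a) + np.outer(1 - conc, comp_b)
           + rng.normal(0, 0.01, (300, wavelengths.size)))

# Calibrate a regression vector b and offset b0 so that conc ≈ spectra @ b + b0.
A = np.column_stack([spectra, np.ones(300)])
coef, *_ = np.linalg.lstsq(A, conc, rcond=None)
b, b0 = coef[:-1], coef[-1]

new_spectrum = 0.3 * comp_a + 0.7 * comp_b                 # a fresh, noise-free mixture
print(round(float(new_spectrum @ b + b0), 2))              # inner product + offset ≈ 0.3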
Stamey, Timothy C.
1998-01-01
Simple and reliable methods for estimating hourly streamflow are needed for the calibration and verification of a Chattahoochee River basin model between Buford Dam and Franklin, Ga. The river basin model is being developed by Georgia Department of Natural Resources, Environmental Protection Division, as part of their Chattahoochee River Modeling Project. Concurrent streamflow data collected at 19 continuous-record, and 31 partial-record streamflow stations, were used in ordinary least-squares linear regression analyses to define estimating equations, and in verifying drainage-area prorations. The resulting regression or drainage-area ratio estimating equations were used to compute hourly streamflow at the partial-record stations. The coefficients of determination (r-squared values) for the regression estimating equations ranged from 0.90 to 0.99. Observed and estimated hourly and daily streamflow data were computed for May 1, 1995, through October 31, 1995. Comparisons of observed and estimated daily streamflow data for 12 continuous-record tributary stations, that had available streamflow data for all or part of the period from May 1, 1995, to October 31, 1995, indicate that the mean error of estimate for the daily streamflow was about 25 percent.
Calibrated Multivariate Regression with Application to Neural Semantic Basis Discovery.
Liu, Han; Wang, Lie; Zhao, Tuo
2015-08-01
We propose a calibrated multivariate regression method named CMR for fitting high dimensional multivariate regression models. Compared with existing methods, CMR calibrates regularization for each regression task with respect to its noise level so that it simultaneously attains improved finite-sample performance and tuning insensitiveness. Theoretically, we provide sufficient conditions under which CMR achieves the optimal rate of convergence in parameter estimation. Computationally, we propose an efficient smoothed proximal gradient algorithm with a worst-case numerical rate of convergence O(1/ϵ), where ϵ is a pre-specified accuracy of the objective function value. We conduct thorough numerical simulations to illustrate that CMR consistently outperforms other high dimensional multivariate regression methods. We also apply CMR to solve a brain activity prediction problem and find that it is as competitive as a handcrafted model created by human experts. The R package camel implementing the proposed method is available on the Comprehensive R Archive Network http://cran.r-project.org/web/packages/camel/.
Ultrasound based computer-aided-diagnosis of kidneys for pediatric hydronephrosis
NASA Astrophysics Data System (ADS)
Cerrolaza, Juan J.; Peters, Craig A.; Martin, Aaron D.; Myers, Emmarie; Safdar, Nabile; Linguraru, Marius G.
2014-03-01
Ultrasound is the mainstay of imaging for pediatric hydronephrosis, though its potential as a diagnostic tool is limited by its subjective assessment and lack of correlation with renal function. Therefore, all cases showing signs of hydronephrosis undergo further invasive studies, such as diuretic renography, in order to assess the actual renal function. Under the hypothesis that renal morphology is correlated with renal function, a new ultrasound-based computer-aided diagnosis (CAD) tool for pediatric hydronephrosis is presented. From 2D ultrasound, a novel set of morphological features of the renal collecting systems and the parenchyma is automatically extracted using image analysis techniques. From the original set of features, including size, geometric and curvature descriptors, a subset of ten features is selected as predictive variables, combining a feature selection technique and area-under-the-curve filtering. Using the washout half time (T1/2) as indicative of renal obstruction, two groups are defined. Those cases whose T1/2 is above 30 minutes are considered to be severe, while the rest are in the safety zone, where diuretic renography could be avoided. Two different classification techniques are evaluated (logistic regression and support vector machines). Adjusting the probability decision thresholds to operate at the point of maximum sensitivity, i.e., preventing any severe case from being misclassified, specificities of 53% and 75% are achieved for the logistic regression and the support vector machine classifier, respectively. The proposed CAD system makes it possible to establish a link between non-invasive, non-ionizing imaging techniques and renal function, limiting the need for invasive and ionizing diuretic renography.
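A sketch of the operating-point choice described above: train a classifier on morphological features, then lower the probability threshold until no severe case is missed and report the remaining specificity. The features and labels are simulated (and evaluated in-sample for brevity); this is not the published CAD pipeline.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(9)
n = 300
X = rng.normal(size=(n, 10))                               # ten morphological features
severe = (X[:, 0] + 0.8 * X[:, 1] + rng.normal(0, 1.0, n) > 1.0).astype(int)  # T1/2 > 30 min

clf = LogisticRegression().fit(X, severe)
prob = clf.predict_proba(X)[:, 1]

threshold = prob[severe == 1].min()                        # no severe case falls below this
pred = (prob >= threshold).astype(int)
specificity = float(np.mean(pred[severe == 0] == 0))
print(round(float(threshold), 3), round(specificity, 2))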
Wang, Dongmiao; He, Xiaotong; Wang, Yanling; Li, Zhongwu; Zhu, Yumin; Sun, Chao; Ye, Jinhai; Jiang, Hongbing; Cheng, Jie
2017-05-01
The aim of the present study was to assess the incidence and risk factors of ERR in second molars with mesially and horizontally impacted mandibular third molars using cone beam computed tomography (CBCT) images from patients in a Chinese tertiary referral hospital. A total of 216 patients with 362 mesially and horizontally impacted mandibular third molars who were treated at our institution from 2014 to 2015 were retrospectively included. ERR in the second molars was identified on CBCT multiplanar images. The associations between the incidence of ERR and multiple clinical parameters were statistically analyzed by the Chi-square test. Moreover, the risk factors for ERR in second molars were further assessed by multivariate regression analysis. The overall incidence of ERR in second molars was 20.17% (73/362) as detected on CBCT images. The presence of ERR was significantly associated with patient age and impaction depth of the mandibular third molars. However, no significant relationship was found between ERR severity and impaction depth or ERR location. Multivariate regression analyses further revealed age over 35 years and impaction depth as important risk factors affecting the incidence of ERR caused by mesial and horizontal impaction of the mandibular third molar. ERR in second molars resulting from mesially and horizontally impacted mandibular third molars is not very rare and can be reliably identified on CBCT scans. Given the possibility of ERR associated with third molar impaction, prophylactic removal of these impacted teeth could be considered, especially for patients older than 35 years with mesially and horizontally impacted teeth.
Psychological Predictors of Visual and Auditory P300 Brain-Computer Interface Performance
Hammer, Eva M.; Halder, Sebastian; Kleih, Sonja C.; Kübler, Andrea
2018-01-01
Brain-Computer Interfaces (BCIs) provide communication channels independent from muscular control. In the current study we used two versions of the P300-BCI: one based on visual the other on auditory stimulation. Up to now, data on the impact of psychological variables on P300-BCI control are scarce. Hence, our goal was to identify new predictors with a comprehensive psychological test-battery. A total of N = 40 healthy BCI novices took part in a visual and an auditory BCI session. Psychological variables were measured with an electronic test-battery including clinical, personality, and performance tests. The personality factor “emotional stability” was negatively correlated (Spearman's rho = −0.416; p < 0.01) and an output variable of the non-verbal learning test (NVLT), which can be interpreted as ability to learn, correlated positively (Spearman's rho = 0.412; p < 0.01) with visual P300-BCI performance. In a linear regression analysis both independent variables explained 24% of the variance. “Emotional stability” was also negatively related to auditory P300-BCI performance (Spearman's rho = −0.377; p < 0.05), but failed significance in the regression analysis. Psychological parameters seem to play a moderate role in visual P300-BCI performance. “Emotional stability” was identified as a new predictor, indicating that BCI users who characterize themselves as calm and rational showed worse BCI performance. The positive relation of the ability to learn and BCI performance corroborates the notion that also for P300 based BCIs learning may constitute an important factor. Further studies are needed to consolidate or reject the presented predictors. PMID:29867319
Ground temperature measurement by PRT-5 for maps experiment
NASA Technical Reports Server (NTRS)
Gupta, S. K.; Tiwari, S. N.
1978-01-01
A simple algorithm and computer program were developed for determining the actual surface temperature from the effective brightness temperature as measured remotely by a radiation thermometer called PRT-5. This procedure allows the computation of atmospheric correction to the effective brightness temperature without performing detailed radiative transfer calculations. Model radiative transfer calculations were performed to compute atmospheric corrections for several values of the surface and atmospheric parameters individually and in combination. Polynomial regressions were performed between the magnitudes or deviations of these parameters and the corresponding computed corrections to establish simple analytical relations between them. Analytical relations were also developed to represent combined correction for simultaneous variation of parameters in terms of their individual corrections.
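A sketch of the correction scheme described above: fit a low-order polynomial relating a parameter deviation (here, a hypothetical column water vapor amount) to the brightness-temperature correction computed from model runs, then apply the polynomial analytically. The numbers below are synthetic placeholders, not the PRT-5 calibration values.

import numpy as np

water_vapor = np.linspace(0.5, 4.0, 8)                            # g/cm^2, model-run inputs
correction = np.array([0.4, 0.9, 1.5, 2.2, 3.0, 3.9, 4.9, 6.0])   # K, from radiative-transfer runs

coeffs = np.polyfit(water_vapor, correction, deg=2)               # simple analytical relation

def surface_temperature(brightness_temp_k, water_vapor_gcm2):
    """Apply the fitted polynomial correction instead of rerunning the transfer model."""
    return brightness_temp_k + np.polyval(coeffs, water_vapor_gcm2)

print(round(surface_temperature(288.0, 2.5), 2))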
Science literacy by technology by country: USA, Finland and Mexico. making sense of it all
NASA Astrophysics Data System (ADS)
Papanastasiou, Elena C.
2003-02-01
The purpose of this study was to examine how variables related to computer availability, computer comfort and educational software are associated with higher or lower levels of science literacy in the USA, Finland and Mexico, after controlling for the socio-economic status of the students. The analyses for this study were based on a series of multivariate regression models. The data were obtained from the Program for International Student Assessment. The results of this study showed that it was not computer use itself that had a positive or negative effect on the science achievement of the students, but the way in which the computers were used within the context of each country.
2016-11-22
structure of the graph, we replace the ℓ1-norm by the nonconvex Capped-ℓ1 norm, and obtain the Generalized Capped-ℓ1 regularized logistic regression...X. M. Yuan. Linearized augmented Lagrangian and alternating direction methods for nuclear norm minimization. Mathematics of Computation, 82(281):301...better approximations of the ℓ0-norm theoretically and computationally beyond the ℓ1-norm, for example, the compressive sensing (Xiao et al., 2011).
Cost Estimation Techniques for C3I System Software.
1984-07-01
opment manmonth have been determined for maxi, midi, and mini type computers. Small to median size timeshared developments used 0.2 to 1.5 hours...development schedule 1.23 1.00 1.10 2.1.3 Detailed Model The final codification of the COCOMO regressions was the development of separate effort...regardless of the software structure level being estimated: D8VC -- the expected development computer (maxi, midi, mini, micro) MODE -- the expected
Predictability of Top of Descent Location for Operational Idle-Thrust Descents
NASA Technical Reports Server (NTRS)
Stell, Laurel L.
2010-01-01
To enable arriving aircraft to fly optimized descents computed by the flight management system (FMS) in congested airspace, ground automation must accurately predict descent trajectories. To support development of the trajectory predictor and its uncertainty models, commercial flights executed idle-thrust descents at a specified descent speed, and the recorded data included the specified descent speed profile, aircraft weight, and the winds entered into the FMS as well as the radar data. The FMS computed the intended descent path assuming idle thrust after top of descent (TOD), and the controllers and pilots then endeavored to allow the FMS to fly the descent to the meter fix with minimal human intervention. The horizontal flight path, cruise and meter fix altitudes, and actual TOD location were extracted from the radar data. Using approximately 70 descents each in Boeing 757 and Airbus 319/320 aircraft, multiple regression estimated TOD location as a linear function of the available predictive factors. The cruise and meter fix altitudes, descent speed, and wind clearly improve goodness of fit. The aircraft weight improves fit for the Airbus descents but not for the B757. Except for a few statistical outliers, the residuals have absolute value less than 5 nmi. Thus, these predictive factors adequately explain the TOD location, which indicates the data do not include excessive noise.
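A sketch of the regression described above: estimate top-of-descent distance as a linear function of cruise altitude, meter fix altitude, descent speed, and wind. The coefficients and data below are synthetic assumptions for illustration, not the B757/A319-320 results.

import numpy as np

rng = np.random.default_rng(10)
n = 70
cruise_alt = rng.uniform(30_000, 39_000, n)        # ft
fix_alt = rng.uniform(10_000, 12_000, n)           # ft
speed = rng.uniform(260, 300, n)                   # kt, descent CAS
wind = rng.uniform(-60, 60, n)                     # kt, along-track wind
tod = (0.003 * (cruise_alt - fix_alt) + 0.05 * speed + 0.1 * wind
       + rng.normal(0, 2, n))                      # nmi from the meter fix

X = np.column_stack([np.ones(n), cruise_alt, fix_alt, speed, wind])
beta, *_ = np.linalg.lstsq(X, tod, rcond=None)
resid = tod - X @ beta
print(np.round(beta, 4), round(float(np.abs(resid).max()), 1))   # coefficients, worst residual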
Brain Positron Emission Tomography-Computed Tomography Gender Differences in Tinnitus Patients.
Shlamkovich, Nathan; Gavriel, Haim; Eviatar, Ephraim; Lorberboym, Mordechay; Aviram, Eliad
2016-10-01
Increased metabolism in the left auditory cortex has been reported in tinnitus patients. However, gender differences have not been addressed. The objective was to assess the differences in positron emission tomography-computed tomography (PET-CT) results between the genders in tinnitus patients. Retrospective cohort. Included were patients referred to our clinic between January 2011 and August 2013 who complained of tinnitus and underwent fluorodeoxyglucose (FDG)-PET to assess brain metabolism. Univariate and multivariate nominal logistic regressions were used to evaluate the association between the upper temporal gyrus (UTG; right and left) and gender. Included were 140 patients (87 males) with an average age of 52.5 yr (median = 53.1). Bilateral tinnitus was found in 85 patients (60.7%), left sided in 30 (21.4%), and right sided in 21 (15%). Increased uptake in the UTG was found in 60% of the patients on either side. Males had statistically significantly increased uptake in the UTG, both among those with unilateral tinnitus and in the entire population. We present the largest study reported so far on tinnitus patients who have undergone FDG-PET-CT. We found a statistically significant difference between the genders in FDG uptake by the UTG. Further investigations should be undertaken to reveal the etiologies for these differences and to assess different therapeutic protocols according to gender. American Academy of Audiology
Huang, Yu-Sen; Hsu, Hsao-Hsun; Chen, Jo-Yu; Tai, Mei-Hwa; Jaw, Fu-Shan; Chang, Yeun-Chung
2014-01-01
This study aimed to evaluate the relationship between the degree of pulmonary emphysema and cardiac ventricular function in chronic obstructive pulmonary disease (COPD) patients with pulmonary hypertension (PH) using electrocardiographic-gated multidetector computed tomography (CT). Lung transplantation candidates with a diagnosis of COPD and PH were chosen as the study population, and a total of 15 patients were included. The extent of emphysema was defined as the percentage of voxels below -910 Hounsfield units in the lung windows of whole-lung CT without intravenous contrast. Heart function parameters were measured by electrocardiographic-gated CT angiography. Linear regression analysis was conducted to examine the associations between percent emphysema and heart function indicators. Significant correlations were found between percent emphysema and right ventricular (RV) measurements, including RV end-diastolic volume (R(2) = 0.340, p = 0.023), RV stroke volume (R(2) = 0.406, p = 0.011), and RV cardiac output (R(2) = 0.382, p = 0.014); no corresponding correlations were observed between percent emphysema and left ventricular function indicators. The study revealed that percent emphysema is correlated with RV dysfunction among COPD patients with PH. Based on our findings, percent emphysema may serve as an indicator of the severity of right ventricular dysfunction among COPD patients.
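The two computational steps in this design, thresholding lung voxels at -910 HU and regressing an RV index on the resulting percent emphysema, can be sketched as follows; the -910 HU threshold is from the abstract, while the synthetic data and variable names are assumptions.

import numpy as np
from scipy import stats

def percent_emphysema(ct_hu, lung_mask, threshold=-910):
    """Percentage of lung voxels with attenuation below the HU threshold."""
    return 100.0 * np.mean(ct_hu[lung_mask] < threshold)

# Hypothetical per-patient values standing in for the 15-patient cohort.
rng = np.random.default_rng(0)
pct_emph = rng.uniform(20.0, 60.0, size=15)
rv_stroke_volume = 90.0 - 0.8 * pct_emph + rng.normal(0.0, 6.0, size=15)

fit = stats.linregress(pct_emph, rv_stroke_volume)
print(fit.rvalue**2, fit.pvalue)   # analogous to the reported R^2 and p values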
Multi-scale computational study of the mechanical regulation of cell mitotic rounding in epithelia
Xu, Zhiliang; Zartman, Jeremiah J.; Alber, Mark
2017-01-01
Mitotic rounding during cell division is critical for preventing daughter cells from inheriting an abnormal number of chromosomes, a condition that occurs frequently in cancer cells. Cells must significantly expand their apical area and transition from a polygonal to a circular apical shape to achieve robust mitotic rounding in epithelial tissues, which is where most cancers initiate. However, how cells mechanically regulate robust mitotic rounding within packed tissues is unknown. Here, we analyze mitotic rounding using a newly developed multi-scale subcellular element computational model that is calibrated using experimental data. Novel biologically relevant features of the model include separate representations of subcellular components, namely the apical membrane and cytoplasm of each cell at the tissue scale, as well as a detailed description of cell properties during mitotic rounding. Regression analysis of predictive model simulation results reveals the relative contributions of osmotic pressure, cell-cell adhesion, and cortical stiffness to mitotic rounding. Mitotic area expansion is largely driven by regulation of cytoplasmic pressure. Surprisingly, mitotic shape roundness within physiological ranges is most sensitive to variation in cell-cell adhesivity and stiffness. An understanding of how perturbed mechanical properties impact mitotic rounding has important potential implications for, among other things, how tumors progressively become more genetically unstable due to increased chromosomal aneuploidy and more aggressive. PMID:28531187
Wild, P; Gonzalez, M; Bourgkard, E; Courouble, N; Clément-Duchêne, C; Martinet, Y; Févotte, J; Paris, C
2012-03-27
The aim of this study was to compute attributable fractions (AFs) for occupational factors in an area of north-eastern France with high lung cancer rates and a history of mining and steel industry. A population-based case-control study among males aged 40-79 was conducted, including confirmed primary lung cancer cases from all hospitals of the study region. Controls were stratified by broad age classes, district, and socioeconomic class. Detailed occupational and personal risk factors were obtained in face-to-face interviews. Cumulative occupational exposure indices were derived from the questionnaires. Attributable fractions were computed from multiple unconditional logistic regression models. A total of 246 cases and 531 controls were included. The odds ratios (ORs), adjusted for cumulative smoking and family history of lung cancer, increased significantly with the cumulative exposure indices to asbestos, polycyclic aromatic hydrocarbons, and crystalline silica, and with exposure to diesel motor exhaust. The AF for occupational factors exceeded 50%, the most important contributors being crystalline silica and asbestos. These AFs are higher than most published figures, which may be because of the highly industrialised study area or the methods used for exposure assessment. Occupational factors are important risk factors and should not be forgotten when defining high-risk lung cancer populations.
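One common way to obtain attributable fractions from case-control odds ratios is Miettinen's case-based formula, sketched below; the exposure categories, proportions, and odds ratios here are made up for illustration, and the study's exact computation is not detailed in the abstract.

def attributable_fraction(case_proportions, odds_ratios):
    # AF = sum_i p_i * (OR_i - 1) / OR_i, where p_i is the proportion of cases
    # in exposure level i and OR_i is the adjusted odds ratio for that level.
    return sum(p * (o - 1.0) / o for p, o in zip(case_proportions, odds_ratios))

# Hypothetical cumulative-exposure categories for a single agent.
p_cases = [0.25, 0.15, 0.10]       # share of cases in low/medium/high exposure
adjusted_ors = [1.4, 1.9, 2.6]
print(attributable_fraction(p_cases, adjusted_ors))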
Continuous monitoring of sediment and nutrients in the Illinois River at Florence, Illinois, 2012-13
Terrio, Paul J.; Straub, Timothy D.; Domanski, Marian M.; Siudyla, Nicholas A.
2015-01-01
The Illinois River is the largest river in Illinois and is the primary contributing watershed for nitrogen, phosphorus, and suspended-sediment loading to the upper Mississippi River from Illinois. In addition to streamflow, the following water-quality constituents were monitored at the Illinois River at Florence, Illinois (U.S. Geological Survey station number 05586300), during May 2012–October 2013: phosphate, nitrate, turbidity, temperature, specific conductance, pH, and dissolved oxygen. The objectives of this monitoring were to (1) determine performance capabilities of the in-situ instruments; (2) collect continuous data that would provide an improved understanding of constituent characteristics during normal, low-, and high-flow periods and during different climatic and land-use seasons; (3) evaluate the ability to use continuous turbidity as a surrogate constituent to determine suspended-sediment concentrations; and (4) evaluate the ability to develop a regression model for total phosphorus using phosphate, turbidity, and other measured parameters. Reliable data collection was achieved, following some initial periods of instrument and data-communication difficulties. The resulting regression models for suspended sediment had coefficient of determination (R2) values of about 0.9. Nitrate plus nitrite loads computed using continuous data were found to be approximately 8 percent larger than loads computed using traditional discrete-sampling based models. A regression model for total phosphorus was developed by using historic orthophosphate data (important during periods of low flow and low concentrations) and historic suspended-sediment data (important during periods of high flow and higher concentrations). The R2 of the total phosphorus regression model using orthophosphorus and suspended sediment was 0.8. Data collection and refinement of the regression models are ongoing.
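Surrogate regressions of this kind are often fit in log space with a bias correction applied when transforming back to concentration; the sketch below illustrates that general approach with made-up turbidity and suspended-sediment values, not the Florence station's actual model.

import numpy as np

turbidity = np.array([12, 25, 40, 80, 150, 300], dtype=float)   # FNU (hypothetical)
ssc = np.array([18, 35, 60, 130, 260, 520], dtype=float)        # mg/L (hypothetical)

# Fit log10(SSC) as a linear function of log10(turbidity).
slope, intercept = np.polyfit(np.log10(turbidity), np.log10(ssc), deg=1)

# Duan smearing bias correction factor for the back-transformation.
residuals = np.log10(ssc) - (intercept + slope * np.log10(turbidity))
bcf = np.mean(10 ** residuals)

def ssc_from_turbidity(t):
    return bcf * 10 ** (intercept + slope * np.log10(t))

print(ssc_from_turbidity(100.0))   # estimated SSC (mg/L) at 100 FNU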
Internal Structure of Kidney Calculi as a Predictor for Shockwave Lithotripsy Success.
Christiansen, Frederikke Eichner; Andreassen, Kim Hovgaard; Osther, Susanne Sloth; Osther, Palle Joern Sloth
2016-03-01
The internal structure of renal calculi can be determined on CT using bone windows and may be classified as homogeneous or inhomogeneous with void regions. In vitro studies have shown homogeneous stones to be less responsive to extracorporeal shockwave lithotripsy (SWL). The objective was to evaluate whether the internal morphology of calculi defined by CT bone window influences SWL outcome in vivo. One hundred eleven patients with solitary renal calculi treated with SWL were included. Treatment data were registered prospectively and follow-up data were collected retrospectively. All patients had noncontrast computed tomography (NCCT) performed before SWL and at 3-month follow-up. The stones were categorized as homogeneous or inhomogeneous. At follow-up, the patient's stone status was registered. Stone-free status was defined as no evidence of calculi on NCCT. Treatment was considered successful if the patient was either stone free or had clinically insignificant residual fragments. Using simple logistic regression, the odds for being stone free 3 months post-SWL were significantly reduced in the patients with inhomogeneous stones compared with patients with homogeneous stones (odds ratio 0.43 [95% confidence interval 0.20, 0.92; p < 0.05]). However, when adjusting for stone size by multiple logistic regression, including stone size (area) as a covariate, this difference became insignificant. The internal structure of kidney stones did not predict the outcome of SWL in vivo.
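The unadjusted versus adjusted comparison described here can be sketched with two logistic regressions, one using stone structure alone and one adding stone area as a covariate; the file name and column names below are assumptions.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data set: stone_free (0/1), inhomogeneous (0/1), area_mm2.
stones = pd.read_csv("swl_outcomes.csv")

simple = smf.logit("stone_free ~ inhomogeneous", data=stones).fit()
adjusted = smf.logit("stone_free ~ inhomogeneous + area_mm2", data=stones).fit()

print(np.exp(simple.params["inhomogeneous"]))     # unadjusted odds ratio
print(np.exp(adjusted.params["inhomogeneous"]))   # odds ratio after adjusting for stone size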
Detecting Nano-Scale Vibrations in Rotating Devices by Using Advanced Computational Methods
del Toro, Raúl M.; Haber, Rodolfo E.; Schmittdiel, Michael C.
2010-01-01
This paper presents a computational method for detecting vibrations related to eccentricity in ultra precision rotation devices used for nano-scale manufacturing. The vibration is indirectly measured via a frequency domain analysis of the signal from a piezoelectric sensor attached to the stationary component of the rotating device. The algorithm searches for particular harmonic sequences associated with the eccentricity of the device rotation axis. The detected sequence is quantified and serves as input to a regression model that estimates the eccentricity. A case study presents the application of the computational algorithm during precision manufacturing processes. PMID:22399918
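The processing chain described, an FFT of the sensor signal, a feature summing energy near harmonics of the rotation frequency, and a calibrated linear model mapping that feature to eccentricity, might look roughly like the sketch below; the sampling rate, rotation frequency, synthetic signal, and regression coefficients are all assumptions.

import numpy as np

fs = 50_000.0        # sampling rate, Hz (assumed)
f_rot = 25.0         # spindle rotation frequency, Hz (assumed)

# Synthetic stand-in for the piezoelectric signal: harmonics of f_rot plus noise.
rng = np.random.default_rng(0)
t = np.arange(2**16) / fs
signal = (0.5 * np.sin(2 * np.pi * f_rot * t)
          + 0.2 * np.sin(2 * np.pi * 2 * f_rot * t)
          + rng.normal(0.0, 0.1, t.size))

spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)

def harmonic_feature(n_harmonics=8, bandwidth_hz=1.0):
    """Sum of spectral amplitude in narrow bands around the first N harmonics of f_rot."""
    total = 0.0
    for k in range(1, n_harmonics + 1):
        band = np.abs(freqs - k * f_rot) < bandwidth_hz
        total += spectrum[band].sum()
    return total

# Placeholder calibration: a regression fitted offline would supply a and b.
a, b = 0.0, 1.0e-3
print(a + b * harmonic_feature())   # estimated eccentricity (arbitrary units)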
Techniques for estimating flood-peak discharges of rural, unregulated streams in Ohio
Koltun, G.F.
2003-01-01
Regional equations for estimating 2-, 5-, 10-, 25-, 50-, 100-, and 500-year flood-peak discharges at ungaged sites on rural, unregulated streams in Ohio were developed by means of ordinary and generalized least-squares (GLS) regression techniques. One-variable, simple equations and three-variable, full-model equations were developed on the basis of selected basin characteristics and flood-frequency estimates determined for 305 streamflow-gaging stations in Ohio and adjacent states. The average standard errors of prediction ranged from about 39 to 49 percent for the simple equations, and from about 34 to 41 percent for the full-model equations. Flood-frequency estimates determined by means of log-Pearson Type III analyses are reported along with weighted flood-frequency estimates, computed as a function of the log-Pearson Type III estimates and the regression estimates. Values of explanatory variables used in the regression models were determined from digital spatial data sets by means of a geographic information system (GIS), with the exception of drainage area, which was determined by digitizing the area within basin boundaries manually delineated on topographic maps. Use of GIS-based explanatory variables represents a major departure in methodology from that described in previous reports on estimating flood-frequency characteristics of Ohio streams. Examples are presented illustrating application of the regression equations to ungaged sites on ungaged and gaged streams. A method is provided to adjust regression estimates for ungaged sites by use of weighted and regression estimates for a gaged site on the same stream. A region-of-influence method, which employs a computer program to estimate flood-frequency characteristics for ungaged sites based on data from gaged sites with similar characteristics, was also tested and compared to the GLS full-model equations. For all recurrence intervals, the GLS full-model equations had superior prediction accuracy relative to the simple equations and therefore are recommended for use.
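Regression equations of this kind are typically fit in log10 space, so a peak-flow estimate at an ungaged site is 10 raised to a linear combination of log-transformed basin characteristics. The sketch below shows only the general form; the explanatory variables and coefficients are placeholders, not the report's values.

import math

def q100_estimate(drainage_area_mi2, main_channel_slope, storage_pct,
                  a=2.0, b=0.75, c=0.30, d=-0.10):
    """Hypothetical three-variable full-model equation for the 100-year peak (ft^3/s)."""
    log_q = (a + b * math.log10(drainage_area_mi2)
               + c * math.log10(main_channel_slope)
               + d * math.log10(storage_pct + 1.0))
    return 10 ** log_q

print(q100_estimate(drainage_area_mi2=52.0, main_channel_slope=12.0, storage_pct=2.5))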
Cawyer, Chase R; Anderson, Sarah B; Szychowski, Jeff M; Neely, Cherry; Owen, John
2018-03-01
To compare the accuracy of a new regression-derived formula developed from the National Fetal Growth Studies data to the common alternative method that uses the average of the gestational ages (GAs) calculated for each fetal biometric measurement (biparietal diameter, head circumference, abdominal circumference, and femur length). This retrospective cross-sectional study identified nonanomalous singleton pregnancies that had a crown-rump length plus at least 1 additional sonographic examination with complete fetal biometric measurements. With the use of the crown-rump length to establish the referent estimated date of delivery, each method's error (National Institute of Child Health and Human Development regression versus Hadlock average [Radiology 1984; 152:497-501]) at every examination was computed. Error, defined as the difference between the crown-rump length-derived GA and each method's predicted GA (weeks), was compared in 3 GA intervals: 1 (14 weeks-20 weeks 6 days), 2 (21 weeks-28 weeks 6 days), and 3 (≥29 weeks). In addition, the proportion of each method's examinations that had errors outside prespecified (±) day ranges was compared by using odds ratios. A total of 16,904 sonograms were identified. The overall and prespecified GA range subset mean errors were significantly smaller for the regression compared to the average (P < .01), and the regression had significantly lower odds of observing examinations outside the specified range of error in GA intervals 2 (odds ratio, 1.15; 95% confidence interval, 1.01-1.31) and 3 (odds ratio, 1.24; 95% confidence interval, 1.17-1.32) than the average method. In a contemporary unselected population of women dated by a crown-rump length-derived GA, the National Institute of Child Health and Human Development regression formula produced fewer estimates outside a prespecified margin of error than the commonly used Hadlock average; the differences were most pronounced for GA estimates at 29 weeks and later. © 2017 by the American Institute of Ultrasound in Medicine.
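The two estimation strategies being compared can be sketched as follows; the functions below are placeholders that show only the structure of the comparison and are not the published Hadlock or NICHD equations.

def ga_by_average(bpd_ga, hc_ga, ac_ga, fl_ga):
    """Average the gestational ages (weeks) estimated separately from each measurement."""
    return (bpd_ga + hc_ga + ac_ga + fl_ga) / 4.0

def ga_by_regression(bpd, hc, ac, fl, coef=(8.0, 0.05, 0.03, 0.02, 0.04)):
    """Single regression on the raw biometric measurements (mm); coefficients are made up."""
    b0, b1, b2, b3, b4 = coef
    return b0 + b1 * bpd + b2 * hc + b3 * ac + b4 * fl

def error_weeks(predicted_ga, crl_ga):
    """Error relative to the crown-rump length-derived reference GA, as in the study design."""
    return crl_ga - predicted_ga

print(error_weeks(ga_by_average(22.1, 22.4, 21.8, 22.0), crl_ga=22.3))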
Alcohol Interventions Among Underage Drinkers in the ED: A Randomized Controlled Trial.
Cunningham, Rebecca M; Chermack, Stephen T; Ehrlich, Peter F; Carter, Patrick M; Booth, Brenda M; Blow, Frederic C; Barry, Kristen L; Walton, Maureen A
2015-10-01
This study examined the efficacy of emergency department (ED)-based brief interventions (BIs), delivered by a computer or therapist, with and without a post-ED session, on alcohol consumption and consequences over 12 months. Patients (ages 14-20 years) screening positive for risky drinking were randomized to: computer BI (n = 277), therapist BI (n = 278), or control (n = 281). After the 3-month follow-up, participants were randomized to receive a post-ED BI session or control. Incorporating motivational interviewing, the BIs addressed alcohol consumption and consequences, including driving under the influence (DUI) and alcohol-related injury, as well as other concomitant drug use. The computer BI was an offline, Facebook-styled program. Among 4389 patients screened, 1054 patients reported risky drinking and 836 were enrolled in the randomized controlled trial. Regression models examined the main effects of the intervention conditions (versus control) and the interaction effects (ED condition × post-ED condition) on primary outcomes. The therapist and computer BIs significantly reduced consumption at 3 months, consequences at 3 and 12 months, and prescription drug use at 12 months; the computer BI reduced the frequency of DUI at 12 months; and the therapist BI reduced the frequency of alcohol-related injury at 12 months. The post-ED session reduced alcohol consequences at 6 months, benefiting those who had not received a BI in the ED. A single-session BI, delivered by a computer or therapist in the ED, shows promise for underage drinkers. Findings for the fully automated stand-alone computer BI are particularly appealing given the ease of future implementation. Copyright © 2015 by the American Academy of Pediatrics.
HIV-related ocular microangiopathic syndrome and color contrast sensitivity.
Geier, S A; Hammel, G; Bogner, J R; Kronawitter, U; Berninger, T; Goebel, F D
1994-06-01
Color vision deficits in patients with acquired immunodeficiency syndrome (AIDS) or human immunodeficiency virus (HIV) disease have been reported, and a retinal pathogenic mechanism has been proposed. The purpose of this study was to evaluate the association of color vision deficits with HIV-related retinal microangiopathy. A computer graphics system was used to measure protan, deutan, and tritan color contrast sensitivity (CCS) thresholds in 60 HIV-infected patients. Retinal microangiopathy was assessed by counting the number of cotton-wool spots, and conjunctival blood-flow sludging was determined. Additional predictors were CD4+ count, age, time on aerosolized pentamidine, time on zidovudine, and Walter Reed staging. The relative influence of each predictor was calculated by stepwise multiple regression analysis (inclusion criterion: incremental P value < 0.05) using data for the right eyes (RE). The results were validated by using data for the left eyes (LE) and both eyes (BE). The only predictors included in the multiple regression analyses for the RE were the number of cotton-wool spots (tritan: R = .70; deutan: R = .46; and protan: R = .58; P < .0001 for all axes) and age (tritan: increment of R [Ri] = .05, P = .002; deutan: Ri = .10, P = .004; and protan: Ri = .05, P = .002). The predictors time on zidovudine (Ri = .05, P = .002) and Walter Reed staging (Ri = .03, P = .01) were additionally included in the multiple regression analysis for tritan LE. The results for deutan LE were comparable to those for the RE. In the analysis for protan LE, the only included predictor was the number of cotton-wool spots. In the analyses for BE, no further predictors were included. The predictors Walter Reed staging and CD4+ count showed a significant association with all three criteria in univariate analysis. Additionally, tritan CCS was significantly associated with conjunctival blood-flow sludging. CCS deficits in patients with HIV disease are primarily associated with the number of cotton-wool spots. The results of this study are in accordance with the hypothesis that CCS deficits are in relevant part caused by neuroretinal damage secondary to HIV-related microangiopathy.
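A forward stepwise procedure with an incremental P value criterion of 0.05, of the kind described, might be sketched as follows; statsmodels is used here and the data frame and column names are assumptions.

import pandas as pd
import statsmodels.api as sm

def forward_stepwise(df, outcome, candidates, alpha=0.05):
    """Add predictors one at a time, keeping the best addition if its P value is below alpha."""
    selected = []
    remaining = list(candidates)
    while remaining:
        pvals = {}
        for var in remaining:
            X = sm.add_constant(df[selected + [var]])
            pvals[var] = sm.OLS(df[outcome], X).fit().pvalues[var]
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha:
            break
        selected.append(best)
        remaining.remove(best)
    return selected

# e.g. forward_stepwise(eyes_df, "tritan_ccs",
#                       ["cotton_wool_spots", "age", "cd4_count",
#                        "time_on_zidovudine", "walter_reed_stage"])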
Exploration, Sampling, And Reconstruction of Free Energy Surfaces with Gaussian Process Regression.
Mones, Letif; Bernstein, Noam; Csányi, Gábor
2016-10-11
Practical free energy reconstruction algorithms involve three separate tasks: biasing, measuring some observable, and finally reconstructing the free energy surface from those measurements. In more than one dimension, adaptive schemes make it possible to explore only relatively low lying regions of the landscape by progressively building up the bias toward the negative of the free energy surface so that free energy barriers are eliminated. Most schemes use the final bias as their best estimate of the free energy surface. We show that large gains in computational efficiency, as measured by the reduction of time to solution, can be obtained by separating the bias used for dynamics from the final free energy reconstruction itself. We find that biasing with metadynamics, measuring a free energy gradient estimator, and reconstructing using Gaussian process regression can give an order of magnitude reduction in computational cost.
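A minimal reconstruction sketch using Gaussian process regression on a one-dimensional collective variable; for simplicity it fits noisy free-energy values directly rather than the free energy gradient estimator used in the paper, and all data here are synthetic.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)
cv = rng.uniform(-np.pi, np.pi, size=40).reshape(-1, 1)       # sampled CV values
fes = np.cos(cv).ravel() + rng.normal(0, 0.05, cv.shape[0])   # noisy free-energy samples (kT)

kernel = 1.0 * RBF(length_scale=0.5) + WhiteKernel(noise_level=0.01)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(cv, fes)

grid = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
mean, std = gp.predict(grid, return_std=True)   # reconstructed surface with uncertainty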
Prediction of elemental creep. [steady state and cyclic data from regression analysis]
NASA Technical Reports Server (NTRS)
Davis, J. W.; Rummler, D. R.
1975-01-01
Cyclic and steady-state creep tests were performed to provide data that were used to develop predictive equations. These equations, describing creep as a function of stress, temperature, and time, were developed through the use of a least-squares regression analysis computer program for both the steady-state and cyclic data sets. Comparison of the data from the two types of tests revealed that there was no significant difference between the cyclic and steady-state creep strains for the L-605 sheet under the experimental conditions investigated (for the same total time at load). Attempts to develop a single linear equation describing the combined steady-state and cyclic creep data resulted in standard errors of estimate higher than those obtained for the individual data sets. A proposed approach to predicting elemental creep in metals uses the cyclic creep equation and a computer program that applies strain-hardening and time-hardening theories of creep accumulation.
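A least-squares fit of the general form described, creep strain as a function of stress, temperature, and time in log space, is sketched below on synthetic data; the functional form and coefficients are assumptions, not the report's equations for L-605.

import numpy as np

rng = np.random.default_rng(2)
n = 60
stress_ksi = rng.uniform(20, 60, n)
temp_r = rng.uniform(1500, 2100, n)          # absolute temperature, Rankine
time_hr = rng.uniform(1, 1000, n)

# Synthetic observations following log(strain) = a + b*log(stress) + c/T + d*log(t) plus noise.
log_strain = (-6.0 + 2.5 * np.log10(stress_ksi) - 4000.0 / temp_r
              + 0.4 * np.log10(time_hr) + rng.normal(0, 0.05, n))

# Least-squares estimate of the coefficients a, b, c, d.
X = np.column_stack([np.ones(n), np.log10(stress_ksi), 1.0 / temp_r, np.log10(time_hr)])
coef, residuals, rank, singular_values = np.linalg.lstsq(X, log_strain, rcond=None)
print(coef)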