Logistic regression applied to natural hazards: rare event logistic regression with replications
NASA Astrophysics Data System (ADS)
Guns, M.; Vanacker, V.
2012-06-01
Statistical analysis of natural hazards needs particular attention, as most of these phenomena are rare events. This study shows that ordinary rare event logistic regression, as it is now commonly used in geomorphologic studies, does not always lead to a robust detection of controlling factors, as the results can be strongly sample-dependent. In this paper, we introduce some concepts of Monte Carlo simulation into rare event logistic regression. This technique, called rare event logistic regression with replications, combines the strengths of probabilistic and statistical methods and overcomes some of the limitations of previous developments through robust variable selection. The technique was developed here for the analysis of landslide controlling factors, but the concept is widely applicable to statistical analyses of natural hazards.
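A minimal sketch of the replication idea, under assumed details (balanced resampling of the abundant non-event class, a p-value screen, and a retention threshold) that are illustrative rather than the authors' exact algorithm:

```python
# Sketch: repeat a rare-event logistic fit over many random replications of the
# non-event sample and keep only predictors that are significant in most of them.
import numpy as np
import statsmodels.api as sm

def replicated_rare_event_logit(X, y, n_rep=200, alpha=0.05, keep_frac=0.9, rng=None):
    """X: (n, p) predictor array; y: 0/1 outcome with few 1s (e.g. landslide cells)."""
    rng = np.random.default_rng(rng)
    events = np.flatnonzero(y == 1)
    non_events = np.flatnonzero(y == 0)
    hits = np.zeros(X.shape[1])
    for _ in range(n_rep):
        # balanced replication: all events plus an equally sized random non-event sample
        sample = np.concatenate([events,
                                 rng.choice(non_events, size=len(events), replace=False)])
        Xd = sm.add_constant(X[sample])
        fit = sm.Logit(y[sample], Xd).fit(disp=0)
        hits += (fit.pvalues[1:] < alpha)          # skip the intercept
    selection_rate = hits / n_rep
    return selection_rate >= keep_frac, selection_rate
```

Predictors flagged in nearly every replication can be read as sample-independent controlling factors; those that appear only occasionally are the sample-dependent ones the abstract warns about.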
NASA Astrophysics Data System (ADS)
Zahari, Siti Meriam; Ramli, Norazan Mohamed; Moktar, Balkiah; Zainol, Mohammad Said
2014-09-01
In the presence of multicollinearity and multiple outliers, statistical inference for the linear regression model using ordinary least squares (OLS) estimators is severely affected and produces misleading results. To overcome this, many approaches have been investigated. These include robust methods, which have been reported to be less sensitive to the presence of outliers. In addition, the ridge regression technique has been employed to tackle the multicollinearity problem. In order to mitigate both problems, a combination of ridge regression and robust methods was discussed in this study. The superiority of this approach was examined under the simultaneous presence of multicollinearity and multiple outliers in multiple linear regression. This study aimed to look at the performance of several well-known robust estimators (M, MM, and RIDGE) and robust ridge regression estimators, namely the Weighted Ridge M-estimator (WRM), Weighted Ridge MM (WRMM), and Ridge MM (RMM), in such a situation. Results of the study showed that in the presence of simultaneous multicollinearity and multiple outliers (in both the x- and y-directions), RMM and RIDGE are more or less similar in terms of superiority over the other estimators, regardless of the number of observations, level of collinearity, and percentage of outliers used. However, when outliers occurred in only a single direction (the y-direction), the WRMM estimator was the most superior among the robust ridge regression estimators, producing the least variance. In conclusion, robust ridge regression is the best alternative to robust and conventional least squares estimators when dealing with the simultaneous presence of multicollinearity and outliers.
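As one hedged illustration of how a ridge penalty and an M-type weighting can be combined, the sketch below runs an iteratively reweighted ridge fit with Huber weights; it is a generic robust-ridge flavor, not the exact WRM, WRMM, or RMM formulations compared in the study.

```python
# Iteratively reweighted ridge regression with Huber weights on the residuals.
import numpy as np

def huber_weights(r, c=1.345):
    s = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12   # robust scale (MAD)
    u = np.abs(r / s)
    return np.where(u <= c, 1.0, c / u)

def robust_ridge(X, y, lam=1.0, n_iter=50, tol=1e-8):
    n, p = X.shape
    w = np.ones(n)
    beta = np.zeros(p)
    for _ in range(n_iter):
        W = np.diag(w)
        beta_new = np.linalg.solve(X.T @ W @ X + lam * np.eye(p), X.T @ W @ y)
        if np.linalg.norm(beta_new - beta) < tol:
            return beta_new
        beta = beta_new
        w = huber_weights(y - X @ beta)     # down-weight large residuals, refit
    return beta
```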
Enhancement of partial robust M-regression (PRM) performance using Bisquare weight function
NASA Astrophysics Data System (ADS)
Mohamad, Mazni; Ramli, Norazan Mohamed; Ghani@Mamat, Nor Azura Md; Ahmad, Sanizah
2014-09-01
Partial Least Squares (PLS) regression is a popular regression technique for handling multicollinearity in low- and high-dimensional data, which fits a linear relationship between sets of explanatory and response variables. Several robust PLS methods have been proposed to remedy the classical PLS algorithms, which are easily affected by the presence of outliers. The most recent of these is partial robust M-regression (PRM). Unfortunately, the monotone weighting function used in the PRM algorithm fails to assign appropriate weights to large outliers according to their severity. Thus, in this paper, a modified partial robust M-regression is introduced to enhance the performance of the original PRM. A re-descending weight function, known as the Bisquare weight function, is recommended to replace the Fair function in the PRM. A simulation study is conducted to assess the performance of the modified PRM, and its efficiency is also tested on both contaminated and uncontaminated simulated data under various percentages of outliers, sample sizes, and numbers of predictors.
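For reference, the two weight functions discussed above can be written as follows; the tuning constants are common defaults from the M-estimation literature, not necessarily those used in the paper.

```python
import numpy as np

def fair_weight(r, c=1.4):
    # monotone: weights decay slowly, so gross outliers keep non-negligible weight
    return 1.0 / (1.0 + np.abs(r) / c)

def bisquare_weight(r, c=4.685):
    # re-descending: observations with |r| > c receive zero weight
    u = r / c
    return np.where(np.abs(u) < 1, (1 - u**2) ** 2, 0.0)
```

The Fair weights never reach zero, so gross outliers always retain some influence, whereas the re-descending Bisquare weights vanish beyond the cutoff.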
Model Robust Calibration: Method and Application to Electronically-Scanned Pressure Transducers
NASA Technical Reports Server (NTRS)
Walker, Eric L.; Starnes, B. Alden; Birch, Jeffery B.; Mays, James E.
2010-01-01
This article presents the application of a recently developed statistical regression method to the controlled instrument calibration problem. The statistical method of Model Robust Regression (MRR), developed by Mays, Birch, and Starnes, is shown to improve instrument calibration by reducing the reliance of the calibration on a predetermined parametric (e.g. polynomial, exponential, logarithmic) model. This is accomplished by allowing fits from the predetermined parametric model to be augmented by a certain portion of a fit to the residuals from the initial regression using a nonparametric (locally parametric) regression technique. The method is demonstrated for the absolute scale calibration of silicon-based pressure transducers.
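A hedged, one-dimensional sketch of the MRR idea; the polynomial degree, kernel smoother, and mixing fraction lam are illustrative choices, not the exact Mays-Birch-Starnes estimator.

```python
# Parametric calibration fit augmented by a fraction lam of a nonparametric
# (kernel-smoothed) fit to its residuals.
import numpy as np

def gaussian_smooth(x_train, r_train, x_eval, bandwidth):
    # Nadaraya-Watson kernel smoother of the residuals
    k = np.exp(-0.5 * ((x_eval[:, None] - x_train[None, :]) / bandwidth) ** 2)
    return (k @ r_train) / k.sum(axis=1)

def model_robust_fit(x, y, degree=2, lam=0.5, bandwidth=None):
    bandwidth = bandwidth or (x.max() - x.min()) / 10
    coeffs = np.polyfit(x, y, degree)             # parametric stage
    parametric = np.polyval(coeffs, x)
    residual_fit = gaussian_smooth(x, y - parametric, x, bandwidth)
    return parametric + lam * residual_fit        # augmented calibration curve
```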
Notes on power of normality tests of error terms in regression models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Střelec, Luboš
2015-03-10
Normality is one of the basic assumptions in applying statistical procedures. For example, in linear regression most of the inferential procedures are based on the assumption of normality, i.e. the disturbance vector is assumed to be normally distributed. Failure to assess non-normality of the error terms may lead to incorrect results of usual statistical inference techniques such as the t-test or F-test. Thus, error terms should be normally distributed in order to allow us to make exact inferences. As a consequence, normally distributed stochastic errors are necessary in order to make inferences that are not misleading, which explains the necessity and importance of robust tests of normality. Therefore, the aim of this contribution is to discuss normality testing of error terms in regression models. In this contribution, we introduce the general RT class of robust tests for normality, and present and discuss the trade-off between power and robustness of selected classical and robust normality tests of error terms in regression models.
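A short example of the classical side of this comparison (the robust RT-class tests introduced in the contribution are not reproduced here): testing the normality of regression error terms through the OLS residuals.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(200, 2)))
y = X @ np.array([1.0, 2.0, -0.5]) + rng.standard_t(df=3, size=200)  # heavy-tailed errors

resid = sm.OLS(y, X).fit().resid
print("Shapiro-Wilk p-value:", stats.shapiro(resid).pvalue)
print("Jarque-Bera p-value :", stats.jarque_bera(resid).pvalue)
```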
A Weighted Least Squares Approach To Robustify Least Squares Estimates.
ERIC Educational Resources Information Center
Lin, Chowhong; Davenport, Ernest C., Jr.
This study developed a robust linear regression technique based on the idea of weighted least squares. In this technique, a subsample of the full data of interest is drawn, based on a measure of distance, and an initial set of regression coefficients is calculated. The rest of the data points are then taken into the subsample, one after another,…
A diagnostic analysis of the VVP single-doppler retrieval technique
NASA Technical Reports Server (NTRS)
Boccippio, Dennis J.
1995-01-01
A diagnostic analysis of the VVP (volume velocity processing) retrieval method is presented, with emphasis on understanding the technique as a linear, multivariate regression. Similarities and differences to the velocity-azimuth display and extended velocity-azimuth display retrieval techniques are discussed, using this framework. Conventional regression diagnostics are then employed to quantitatively determine situations in which the VVP technique is likely to fail. An algorithm for preparation and analysis of a robust VVP retrieval is developed and applied to synthetic and actual datasets with high temporal and spatial resolution. A fundamental (but quantifiable) limitation to some forms of VVP analysis is inadequate sampling dispersion in the n space of the multivariate regression, manifest as a collinearity between the basis functions of some fitted parameters. Such collinearity may be present either in the definition of these basis functions or in their realization in a given sampling configuration. This nonorthogonality may cause numerical instability, variance inflation (decrease in robustness), and increased sensitivity to bias from neglected wind components. It is shown that these effects prevent the application of VVP to small azimuthal sectors of data. The behavior of the VVP regression is further diagnosed over a wide range of sampling constraints, and reasonable sector limits are established.
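The collinearity and variance-inflation diagnostics mentioned above are standard regression tools; a generic sketch applied to an arbitrary design matrix of basis functions (the VVP basis itself is not constructed here) might look like this.

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

def collinearity_report(X):
    """X: (n, p) design matrix of basis functions evaluated at the sample points."""
    vifs = [variance_inflation_factor(X, j) for j in range(X.shape[1])]
    cond = np.linalg.cond(X)    # large condition numbers signal near-collinear columns
    return vifs, cond
```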
An application of robust ridge regression model in the presence of outliers to real data problem
NASA Astrophysics Data System (ADS)
Shariff, N. S. Md.; Ferdaos, N. A.
2017-09-01
Multicollinearity and outliers often lead to inconsistent and unreliable parameter estimates in regression analysis. The well-known procedure that is robust to the multicollinearity problem is the ridge regression method. This method, however, is believed to be affected by the presence of outliers. The combination of GM-estimation and the ridge parameter, which is robust towards both problems, is of interest in this study. As such, both techniques are employed to investigate the relationship between stock market price and macroeconomic variables in Malaysia, since the data set is suspected to involve both multicollinearity and outlier problems. There are four macroeconomic factors selected for this study, which are the Consumer Price Index (CPI), Gross Domestic Product (GDP), Base Lending Rate (BLR) and Money Supply (M1). The results demonstrate that the proposed procedure is able to produce reliable results in the presence of multicollinearity and outliers in the real data.
Graphical Evaluation of the Ridge-Type Robust Regression Estimators in Mixture Experiments
Erkoc, Ali; Emiroglu, Esra; Akay, Kadri Ulas
2014-01-01
In mixture experiments, estimation of the parameters is generally based on ordinary least squares (OLS). However, in the presence of multicollinearity and outliers, OLS can result in very poor estimates. In this case, effects due to the combined outlier-multicollinearity problem can be reduced to a certain extent by using alternative approaches. One of these approaches is to use biased-robust regression techniques for the estimation of parameters. In this paper, we evaluate various ridge-type robust estimators in cases where there are multicollinearity and outliers during the analysis of mixture experiments. Also, for selection of the biasing parameter, we use fraction of design space plots to evaluate the effect of the ridge-type robust estimators with respect to the scaled mean squared error of prediction. The suggested graphical approach is illustrated on the Hald cement data set. PMID:25202738
Robust regression on noisy data for fusion scaling laws
DOE Office of Scientific and Technical Information (OSTI.GOV)
Verdoolaege, Geert, E-mail: geert.verdoolaege@ugent.be; Laboratoire de Physique des Plasmas de l'ERM - Laboratorium voor Plasmafysica van de KMS
2014-11-15
We introduce the method of geodesic least squares (GLS) regression for estimating fusion scaling laws. Based on straightforward principles, the method is easily implemented, yet it clearly outperforms established regression techniques, particularly in cases of significant uncertainty on both the response and predictor variables. We apply GLS for estimating the scaling of the L-H power threshold, resulting in estimates for ITER that are somewhat higher than predicted earlier.
Passing the Test: Ecological Regression Analysis in the Los Angeles County Case and Beyond.
ERIC Educational Resources Information Center
Lichtman, Allan J.
1991-01-01
Statistical analysis of racially polarized voting prepared for the Garza v County of Los Angeles (California) (1990) voting rights case is reviewed to demonstrate that ecological regression is a flexible, robust technique that illuminates the reality of ethnic voting and is superior to the neighborhood model supported by the defendants. (SLD)
A robust background regression based score estimation algorithm for hyperspectral anomaly detection
NASA Astrophysics Data System (ADS)
Zhao, Rui; Du, Bo; Zhang, Liangpei; Zhang, Lefei
2016-12-01
Anomaly detection has become a hot topic in the hyperspectral image analysis and processing fields in recent years. The most important issue for hyperspectral anomaly detection is background estimation and suppression. Unreasonable or non-robust background estimation usually leads to unsatisfactory anomaly detection results. Furthermore, the inherent nonlinearity of hyperspectral images may cover up the intrinsic data structure in the anomaly detection. In order to implement robust background estimation, as well as to explore the intrinsic data structure of the hyperspectral image, we propose a robust background regression based score estimation algorithm (RBRSE) for hyperspectral anomaly detection. The Robust Background Regression (RBR) is actually a label assignment procedure which segments the hyperspectral data into a robust background dataset and a potential anomaly dataset with an intersection boundary. In the RBR, a kernel expansion technique, which explores the nonlinear structure of the hyperspectral data in a reproducing kernel Hilbert space, is utilized to formulate the data as a density feature representation. A minimum squared loss relationship is constructed between the data density feature and the corresponding assigned labels of the hyperspectral data, to formulate the foundation of the regression. Furthermore, a manifold regularization term, which explores the manifold smoothness of the hyperspectral data, and a maximization term of the robust background average density, which suppresses the bias caused by the potential anomalies, are jointly appended in the RBR procedure. After this, a paired-dataset based k-nn score estimation method is undertaken on the robust background and potential anomaly datasets to implement the detection output. The experimental results show that RBRSE achieves better ROC curves, AUC values, and background-anomaly separation than some of the other state-of-the-art anomaly detection methods, and is easy to implement in practice.
Kesselmeier, Miriam; Lorenzo Bermejo, Justo
2017-11-01
Logistic regression is the most common technique used for genetic case-control association studies. A disadvantage of standard maximum likelihood estimators of the genotype relative risk (GRR) is their strong dependence on outlier subjects, for example, patients diagnosed at unusually young age. Robust methods are available to constrain outlier influence, but they are scarcely used in genetic studies. This article provides a non-intimidating introduction to robust logistic regression, and investigates its benefits and limitations in genetic association studies. We applied the bounded Huber function and extended the R package 'robustbase' with the re-descending Hampel function to down-weight outlier influence. Computer simulations were carried out to assess the type I error rate, mean squared error (MSE) and statistical power according to major characteristics of the genetic study and investigated markers. Simulations were complemented with the analysis of real data. Both standard and robust estimation controlled type I error rates. Standard logistic regression showed the highest power but standard GRR estimates also showed the largest bias and MSE, in particular for associated rare and recessive variants. For illustration, a recessive variant with a true GRR=6.32 and a minor allele frequency=0.05 investigated in a 1000 case/1000 control study by standard logistic regression resulted in power=0.60 and MSE=16.5. The corresponding figures for Huber-based estimation were power=0.51 and MSE=0.53. Overall, Hampel- and Huber-based GRR estimates did not differ much. Robust logistic regression may represent a valuable alternative to standard maximum likelihood estimation when the focus lies on risk prediction rather than identification of susceptibility variants.
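The study uses the R package 'robustbase'; as a rough illustration of the down-weighting idea only, the sketch below applies Huber-type weights to Pearson residuals inside an iteratively reweighted logistic fit. It is not the glmrob estimator used in the article.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def robust_logistic(X, y, c=1.345, n_iter=10):
    """X: (n, p) genotypes/covariates; y: 0/1 case-control status."""
    w = np.ones(len(y))
    model = None
    for _ in range(n_iter):
        model = LogisticRegression(max_iter=1000).fit(X, y, sample_weight=w)
        p = model.predict_proba(X)[:, 1]
        r = (y - p) / np.sqrt(p * (1 - p) + 1e-12)          # Pearson residuals
        w = np.where(np.abs(r) <= c, 1.0, c / np.abs(r))    # Huber-type weights
    return model, w
```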
Robust estimation approach for blind denoising.
Rabie, Tamer
2005-11-01
This work develops a new robust statistical framework for blind image denoising. Robust statistics addresses the problem of estimation when the idealized assumptions about a system are occasionally violated. The contaminating noise in an image is considered as a violation of the assumption of spatial coherence of the image intensities and is treated as an outlier random variable. A denoised image is estimated by fitting a spatially coherent stationary image model to the available noisy data using a robust estimator-based regression method within an optimal-size adaptive window. The robust formulation aims at eliminating the noise outliers while preserving the edge structures in the restored image. Several examples demonstrating the effectiveness of this robust denoising technique are reported and a comparison with other standard denoising filters is presented.
Doran, Kara S.; Howd, Peter A.; Sallenger, Asbury H.
2016-01-04
Recent studies, and most of their predecessors, use tide gage data to quantify SL acceleration, ASL(t). In the current study, three techniques were used to calculate acceleration from tide gage data, and of those examined, it was determined that the two techniques based on sliding a regression window through the time series are more robust compared to the technique that fits a single quadratic form to the entire time series, particularly if there is temporal variation in the magnitude of the acceleration. The single-fit quadratic regression method has been the most commonly used technique in determining acceleration in tide gage data. The inability of the single-fit method to account for time-varying acceleration may explain some of the inconsistent findings between investigators. Properly quantifying ASL(t) from field measurements is of particular importance in evaluating numerical models of past, present, and future SLR resulting from anticipated climate change.
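A minimal sketch of the sliding-window regression approach favored above; the window length and minimum sample count are arbitrary illustrative choices.

```python
# Fit a quadratic inside a moving window and report twice the quadratic
# coefficient as the local acceleration estimate ASL(t).
import numpy as np

def sliding_acceleration(t, sl, window_years=30.0):
    """t: decimal years; sl: sea level; returns window centers and ASL(t)."""
    centers, accel = [], []
    half = window_years / 2.0
    for tc in t:
        mask = np.abs(t - tc) <= half
        if mask.sum() < 10:
            continue
        a, b, c = np.polyfit(t[mask] - tc, sl[mask], 2)   # sl ~ a*tau^2 + b*tau + c
        centers.append(tc)
        accel.append(2.0 * a)                             # d^2(sl)/dt^2 at the window center
    return np.array(centers), np.array(accel)
```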
Robust Head-Pose Estimation Based on Partially-Latent Mixture of Linear Regressions.
Drouard, Vincent; Horaud, Radu; Deleforge, Antoine; Ba, Sileye; Evangelidis, Georgios
2017-03-01
Head-pose estimation has many applications, such as social event analysis, human-robot and human-computer interaction, driving assistance, and so forth. Head-pose estimation is challenging, because it must cope with changing illumination conditions, variabilities in face orientation and in appearance, partial occlusions of facial landmarks, as well as bounding-box-to-face alignment errors. We propose to use a mixture of linear regressions with partially-latent output. This regression method learns to map high-dimensional feature vectors (extracted from bounding boxes of faces) onto the joint space of head-pose angles and bounding-box shifts, such that they are robustly predicted in the presence of unobservable phenomena. We describe in detail the mapping method that combines the merits of unsupervised manifold learning techniques and of mixtures of regressions. We validate our method with three publicly available data sets and we thoroughly benchmark four variants of the proposed algorithm with several state-of-the-art head-pose estimation methods.
NASA Astrophysics Data System (ADS)
Polat, Esra; Gunay, Suleyman
2013-10-01
One of the problems encountered in Multiple Linear Regression (MLR) is multicollinearity, which causes the overestimation of the regression parameters and increases the variance of these parameters. Hence, when multicollinearity is present, biased estimation procedures such as classical Principal Component Regression (CPCR) and Partial Least Squares Regression (PLSR) are performed. The SIMPLS algorithm is the leading PLSR algorithm because of its speed, its efficiency, and results that are easier to interpret. However, both CPCR and SIMPLS yield very unreliable results when the data set contains outlying observations. Therefore, Hubert and Vanden Branden (2003) presented a robust PCR (RPCR) method and a robust PLSR (RPLSR) method called RSIMPLS. In RPCR, a robust Principal Component Analysis (PCA) method for high-dimensional data is first applied to the independent variables; then, the dependent variables are regressed on the scores using a robust regression method. RSIMPLS is constructed from a robust covariance matrix for high-dimensional data and robust linear regression. The purpose of this study is to show the usage of the RPCR and RSIMPLS methods on an econometric data set, hence making a comparison of the two methods on an inflation model of Turkey. The considered methods are compared in terms of predictive ability and goodness of fit by using a robust Root Mean Squared Error of Cross-validation (R-RMSECV), a robust R2 value and the Robust Component Selection (RCS) statistic.
Robust biological parametric mapping: an improved technique for multimodal brain image analysis
NASA Astrophysics Data System (ADS)
Yang, Xue; Beason-Held, Lori; Resnick, Susan M.; Landman, Bennett A.
2011-03-01
Mapping the quantitative relationship between structure and function in the human brain is an important and challenging problem. Numerous volumetric, surface, region of interest and voxelwise image processing techniques have been developed to statistically assess potential correlations between imaging and non-imaging metrics. Recently, biological parametric mapping has extended the widely popular statistical parametric approach to enable application of the general linear model to multiple image modalities (both for regressors and regressands) along with scalar valued observations. This approach offers great promise for direct, voxelwise assessment of structural and functional relationships with multiple imaging modalities. However, as presented, the biological parametric mapping approach is not robust to outliers and may lead to invalid inferences (e.g., artifactual low p-values) due to slight mis-registration or variation in anatomy between subjects. To enable widespread application of this approach, we introduce robust regression and robust inference in the neuroimaging context of application of the general linear model. Through simulation and empirical studies, we demonstrate that our robust approach reduces sensitivity to outliers without substantial degradation in power. The robust approach and associated software package provides a reliable way to quantitatively assess voxelwise correlations between structural and functional neuroimaging modalities.
USDA-ARS's Scientific Manuscript database
Spectral scattering is useful for nondestructive sensing of fruit firmness. Prediction models, however, are typically built using multivariate statistical methods such as partial least squares regression (PLSR), whose performance generally depends on the characteristics of the data. The aim of this ...
Cole-Cole, linear and multivariate modeling of capacitance data for on-line monitoring of biomass.
Dabros, Michal; Dennewald, Danielle; Currie, David J; Lee, Mark H; Todd, Robert W; Marison, Ian W; von Stockar, Urs
2009-02-01
This work evaluates three techniques of calibrating capacitance (dielectric) spectrometers used for on-line monitoring of biomass: modeling of cell properties using the theoretical Cole-Cole equation, linear regression of dual-frequency capacitance measurements on biomass concentration, and multivariate (PLS) modeling of scanning dielectric spectra. The performance and robustness of each technique is assessed during a sequence of validation batches in two experimental settings of differing signal noise. In more noisy conditions, the Cole-Cole model had significantly higher biomass concentration prediction errors than the linear and multivariate models. The PLS model was the most robust in handling signal noise. In less noisy conditions, the three models performed similarly. Estimates of the mean cell size were done additionally using the Cole-Cole and PLS models, the latter technique giving more satisfactory results.
Neural network uncertainty assessment using Bayesian statistics: a remote sensing application
NASA Technical Reports Server (NTRS)
Aires, F.; Prigent, C.; Rossow, W. B.
2004-01-01
Neural network (NN) techniques have proved successful for many regression problems, in particular for remote sensing; however, uncertainty estimates are rarely provided. In this article, a Bayesian technique to evaluate uncertainties of the NN parameters (i.e., synaptic weights) is first presented. In contrast to more traditional approaches based on point estimation of the NN weights, we assess uncertainties on such estimates to monitor the robustness of the NN model. These theoretical developments are illustrated by applying them to the problem of retrieving surface skin temperature, microwave surface emissivities, and integrated water vapor content from a combined analysis of satellite microwave and infrared observations over land. The weight uncertainty estimates are then used to compute analytically the uncertainties in the network outputs (i.e., error bars and correlation structure of these errors). Such quantities are very important for evaluating any application of an NN model. The uncertainties on the NN Jacobians are then considered in the third part of this article. Used for regression fitting, NN models can be used effectively to represent highly nonlinear, multivariate functions. In this situation, most emphasis is put on estimating the output errors, but almost no attention has been given to errors associated with the internal structure of the regression model. The complex structure of dependency inside the NN is the essence of the model, and assessing its quality, coherency, and physical character makes all the difference between a blackbox model with small output errors and a reliable, robust, and physically coherent model. Such dependency structures are described to the first order by the NN Jacobians: they indicate the sensitivity of one output with respect to the inputs of the model for given input data. We use a Monte Carlo integration procedure to estimate the robustness of the NN Jacobians. A regularization strategy based on principal component analysis is proposed to suppress the multicollinearities in order to make these Jacobians robust and physically meaningful.
Global image registration using a symmetric block-matching approach
Modat, Marc; Cash, David M.; Daga, Pankaj; Winston, Gavin P.; Duncan, John S.; Ourselin, Sébastien
2014-01-01
Most medical image registration algorithms suffer from a directionality bias that has been shown to largely impact subsequent analyses. Several approaches have been proposed in the literature to address this bias in the context of nonlinear registration, but little work has been done for global registration. We propose a symmetric approach based on a block-matching technique and least-trimmed square regression. The proposed method is suitable for multimodal registration and is robust to outliers in the input images. The symmetric framework is compared with the original asymmetric block-matching technique and is shown to outperform it in terms of accuracy and robustness. The methodology presented in this article has been made available to the community as part of the NiftyReg open-source package. PMID:26158035
A Regression Design Approach to Optimal and Robust Spacing Selection.
1981-07-01
Hassanein (1968, 1969a, 1969b, 1971, 1972, 1977), Kulldorf (1963), Kulldorf and Vannman (1973), Rhodin (1976), Sarhan and Greenberg (1958, 1962) and ... of d0 and Q0^{-1}d0 are in the reproducing kernel Hilbert space (RKHS) generated by R, the techniques developed by Parzen (1961a, 1961b) may be ... Greenberg, B.G. (1958). Estimation problems in the exponential distribution using order statistics. Proceedings of the Statistical Techniques in Missile ...
Balabin, Roman M; Lomakina, Ekaterina I
2011-04-21
In this study, we make a general comparison of the accuracy and robustness of five multivariate calibration models: partial least squares (PLS) regression or projection to latent structures, polynomial partial least squares (Poly-PLS) regression, artificial neural networks (ANNs), and two novel techniques based on support vector machines (SVMs) for multivariate data analysis: support vector regression (SVR) and least-squares support vector machines (LS-SVMs). The comparison is based on fourteen (14) different datasets: seven sets of gasoline data (density, benzene content, and fractional composition/boiling points), two sets of ethanol gasoline fuel data (density and ethanol content), one set of diesel fuel data (total sulfur content), three sets of petroleum (crude oil) macromolecules data (weight percentages of asphaltenes, resins, and paraffins), and one set of petroleum resins data (resins content). Vibrational (near-infrared, NIR) spectroscopic data are used to predict the properties and quality coefficients of gasoline, biofuel/biodiesel, diesel fuel, and other samples of interest. The four systems presented here range greatly in composition, properties, strength of intermolecular interactions (e.g., van der Waals forces, H-bonds), colloid structure, and phase behavior. Due to the high diversity of chemical systems studied, general conclusions about SVM regression methods can be made. We try to answer the following question: to what extent can SVM-based techniques replace ANN-based approaches in real-world (industrial/scientific) applications? The results show that both SVR and LS-SVM methods are comparable to ANNs in accuracy. Due to the much higher robustness of the former, the SVM-based approaches are recommended for practical (industrial) application. This has been shown to be especially true for complicated, highly nonlinear objects.
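A schematic of the kind of comparison performed in the study, on synthetic stand-in data rather than the NIR spectra actually used; KernelRidge appears only as a rough stand-in for LS-SVM.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.svm import SVR
from sklearn.kernel_ridge import KernelRidge          # close in spirit to LS-SVM
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 50))                         # stand-in for NIR spectra
y = np.sin(X[:, 0]) + 0.3 * X[:, 1] + 0.1 * rng.normal(size=150)

for name, model in [("PLS", PLSRegression(n_components=5)),
                    ("SVR", SVR(kernel="rbf", C=10.0, epsilon=0.01)),
                    ("KRR", KernelRidge(kernel="rbf", alpha=0.1))]:
    rmse = -cross_val_score(model, X, y, cv=5,
                            scoring="neg_root_mean_squared_error").mean()
    print(f"{name}: CV RMSE = {rmse:.3f}")
```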
Robust inference under the beta regression model with application to health care studies.
Ghosh, Abhik
2017-01-01
Data on rates, percentages, or proportions arise frequently in many different applied disciplines like medical biology, health care, psychology, and several others. In this paper, we develop a robust inference procedure for the beta regression model, which is used to describe such response variables taking values in (0, 1) through some related explanatory variables. In relation to the beta regression model, the issue of robustness has been largely ignored in the literature so far. The existing maximum likelihood-based inference has serious lack of robustness against outliers in data and generate drastically different (erroneous) inference in the presence of data contamination. Here, we develop the robust minimum density power divergence estimator and a class of robust Wald-type tests for the beta regression model along with several applications. We derive their asymptotic properties and describe their robustness theoretically through the influence function analyses. Finite sample performances of the proposed estimators and tests are examined through suitable simulation studies and real data applications in the context of health care and psychology. Although we primarily focus on the beta regression models with a fixed dispersion parameter, some indications are also provided for extension to the variable dispersion beta regression models with an application.
2016-09-15
... under the context of robust parameter design for simulation. Bellucci's technique is used in this research, primarily because the interior-point ... (table-of-contents fragments: Fundamentals of Radial Basis Neural Network (RBNN); Design of Experiments with Neural Nets; Factorial Design)
León, Larry F; Cai, Tianxi
2012-04-01
In this paper we develop model checking techniques for assessing functional form specifications of covariates in censored linear regression models. These procedures are based on a censored data analog to taking cumulative sums of "robust" residuals over the space of the covariate under investigation. These cumulative sums are formed by integrating certain Kaplan-Meier estimators and may be viewed as "robust" censored data analogs to the processes considered by Lin, Wei & Ying (2002). The null distributions of these stochastic processes can be approximated by the distributions of certain zero-mean Gaussian processes whose realizations can be generated by computer simulation. Each observed process can then be graphically compared with a few realizations from the Gaussian process. We also develop formal test statistics for numerical comparison. Such comparisons enable one to assess objectively whether an apparent trend seen in a residual plot reflects model misspecification or natural variation. We illustrate the methods with a well known dataset. In addition, we examine the finite sample performance of the proposed test statistics in simulation experiments. In our simulation experiments, the proposed test statistics have good power of detecting misspecification while at the same time controlling the size of the test.
Robust scaling laws for energy confinement time, including radiated fraction, in Tokamaks
NASA Astrophysics Data System (ADS)
Murari, A.; Peluso, E.; Gaudio, P.; Gelfusa, M.
2017-12-01
In recent years, the limitations of scalings in power-law form that are obtained from traditional log regression have become increasingly evident in many fields of research. Given the wide gap in operational space between present-day and next-generation devices, robustness of the obtained models in guaranteeing reasonable extrapolability is a major issue. In this paper, a new technique, called symbolic regression, is reviewed, refined, and applied to the ITPA database for extracting scaling laws of the energy-confinement time at different radiated fraction levels. The main advantage of this new methodology is its ability to determine the most appropriate mathematical form of the scaling laws to model the available databases without the restriction of their having to be power laws. In a completely new development, this technique is combined with the concept of geodesic distance on Gaussian manifolds so as to take into account the error bars in the measurements and provide more reliable models. Robust scaling laws, including radiated fractions as regressor, have been found; they are not in power-law form, and are significantly better than the traditional scalings. These scaling laws, including radiated fractions, extrapolate quite differently to ITER, and therefore they require serious consideration. On the other hand, given the limitations of the existing databases, dedicated experimental investigations will have to be carried out to fully understand the impact of radiated fractions on the confinement in metallic machines and in the next generation of devices.
NASA Astrophysics Data System (ADS)
Bellugi, D. G.; Tennant, C.; Larsen, L.
2016-12-01
Catchment and climate heterogeneity complicate prediction of runoff across time and space, and resulting parameter uncertainty can lead to large accumulated errors in hydrologic models, particularly in ungauged basins. Recently, data-driven modeling approaches have been shown to avoid the accumulated uncertainty associated with many physically-based models, providing an appealing alternative for hydrologic prediction. However, the effectiveness of different methods in hydrologically and geomorphically distinct catchments, and the robustness of these methods to changing climate and changing hydrologic processes remain to be tested. Here, we evaluate the use of machine learning techniques to predict daily runoff across time and space using only essential climatic forcing (e.g. precipitation, temperature, and potential evapotranspiration) time series as model input. Model training and testing was done using a high quality dataset of daily runoff and climate forcing data for 25+ years for 600+ minimally-disturbed catchments (drainage area range 5-25,000 km2, median size 336 km2) that cover a wide range of climatic and physical characteristics. Preliminary results using Support Vector Regression (SVR) suggest that in some catchments this nonlinear-based regression technique can accurately predict daily runoff, while the same approach fails in other catchments, indicating that the representation of climate inputs and/or catchment filter characteristics in the model structure need further refinement to increase performance. We bolster this analysis by using Sparse Identification of Nonlinear Dynamics (a sparse symbolic regression technique) to uncover the governing equations that describe runoff processes in catchments where SVR performed well and for ones where it performed poorly, thereby enabling inference about governing processes. This provides a robust means of examining how catchment complexity influences runoff prediction skill, and represents a contribution towards the integration of data-driven inference and physically-based models.
Revisiting the southern pine growth decline: Where are we 10 years later?
Gary L. Gadbury; Michael S. Williams; Hans T. Schreuder
2004-01-01
This paper evaluates changes in growth of pine stands in the state of Georgia, U.S.A., using USDA Forest Service Forest Inventory and Analysis (FIA) data. In particular, data representing an additional 10-year growth cy-cle has been added to previously published results from two earlier growth cycles. A robust regression procedure is combined with a bootstrap technique...
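A generic sketch of pairing a robust regression with a bootstrap, in the spirit of the procedure mentioned above; Huber M-estimation and a simple case-resampling bootstrap are assumptions here, and the FIA analysis itself is not reproduced.

```python
import numpy as np
import statsmodels.api as sm

def robust_slope_ci(x, y, n_boot=2000, alpha=0.05, rng=None):
    """Huber M-estimate of the trend slope plus a bootstrap confidence interval."""
    rng = np.random.default_rng(rng)
    X = sm.add_constant(x)
    slope = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit().params[1]
    boot = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), size=len(y))      # resample cases with replacement
        boot.append(sm.RLM(y[idx], X[idx], M=sm.robust.norms.HuberT()).fit().params[1])
    lo, hi = np.percentile(boot, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return slope, (lo, hi)
```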
A robust and efficient stepwise regression method for building sparse polynomial chaos expansions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Abraham, Simon, E-mail: Simon.Abraham@ulb.ac.be; Raisee, Mehrdad; Ghorbaniasl, Ghader
2017-03-01
Polynomial Chaos (PC) expansions are widely used in various engineering fields for quantifying uncertainties arising from uncertain parameters. The computational cost of classical PC solution schemes is unaffordable as the number of deterministic simulations to be calculated grows dramatically with the number of stochastic dimensions. This considerably restricts the practical use of PC at the industrial level. A common approach to address such problems is to make use of sparse PC expansions. This paper presents a non-intrusive regression-based method for building sparse PC expansions. The most important PC contributions are detected sequentially through an automatic search procedure. The variable selection criterion is based on efficient tools relevant to probabilistic methods. Two benchmark analytical functions are used to validate the proposed algorithm. The computational efficiency of the method is then illustrated by a more realistic CFD application, consisting of the non-deterministic flow around a transonic airfoil subject to geometrical uncertainties. To assess the performance of the developed methodology, a detailed comparison is made with the well established LAR-based selection technique. The results show that the developed sparse regression technique is able to identify the most significant PC contributions describing the problem. Moreover, the most important stochastic features are captured at a reduced computational cost compared to the LAR method. The results also demonstrate the superior robustness of the method by repeating the analyses using random experimental designs.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jang, Dae-Heung; Anderson-Cook, Christine Michaela
2016-11-22
With many predictors in regression, fitting the full model can induce multicollinearity problems. The Least Absolute Shrinkage and Selection Operator (LASSO) is useful when the effects of many explanatory variables are sparse in a high-dimensional dataset. Influential points can have a disproportionate impact on the estimated values of model parameters. This paper describes a new influence plot that can be used to increase understanding of the contributions of individual observations and the robustness of results. This can serve as a complement to other regression diagnostics techniques in the LASSO regression setting. Using this influence plot, we can find influential points and their impact on shrinkage of model parameters and model selection. Lastly, we provide two examples to illustrate the methods.
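A much-simplified case-deletion sketch of measuring observation influence in a LASSO fit; the paper's influence plot is more elaborate, and the penalty level here is fixed rather than chosen by cross-validation.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_influence(X, y, alpha=0.1):
    """Refit with each observation left out; record coefficient shift and
    whether the selected variable set changes."""
    full = Lasso(alpha=alpha, max_iter=10000).fit(X, y)
    shifts, set_changed = [], []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        fit_i = Lasso(alpha=alpha, max_iter=10000).fit(X[keep], y[keep])
        shifts.append(np.linalg.norm(fit_i.coef_ - full.coef_))
        set_changed.append(np.any((fit_i.coef_ != 0) != (full.coef_ != 0)))
    return np.array(shifts), np.array(set_changed)
```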
Robust support vector regression networks for function approximation with outliers.
Chuang, Chen-Chia; Su, Shun-Feng; Jeng, Jin-Tsong; Hsiao, Chih-Ching
2002-01-01
Support vector regression (SVR) employs the support vector machine (SVM) to tackle problems of function approximation and regression estimation. SVR has been shown to have good robustness properties against noise. When the parameters used in SVR are improperly selected, overfitting phenomena may still occur. However, the selection of the various parameters is not straightforward. Besides, in SVR, outliers may also possibly be taken as support vectors. Such an inclusion of outliers in the support vectors may lead to serious overfitting phenomena. In this paper, a novel regression approach, termed the robust support vector regression (RSVR) network, is proposed to enhance the robustness of SVR. In the approach, traditional robust learning approaches are employed to improve the learning performance for any selected parameters. From the simulation results, our RSVR can always improve the performance of the learned systems for all cases. Besides, it can be found that even when the training lasted for a long period, the testing errors would not go up. In other words, the overfitting phenomenon is indeed suppressed.
Semisupervised Clustering by Iterative Partition and Regression with Neuroscience Applications
Qian, Guoqi; Wu, Yuehua; Ferrari, Davide; Qiao, Puxue; Hollande, Frédéric
2016-01-01
Regression clustering is a mixture of unsupervised and supervised statistical learning and data mining method which is found in a wide range of applications including artificial intelligence and neuroscience. It performs unsupervised learning when it clusters the data according to their respective unobserved regression hyperplanes. The method also performs supervised learning when it fits regression hyperplanes to the corresponding data clusters. Applying regression clustering in practice requires means of determining the underlying number of clusters in the data, finding the cluster label of each data point, and estimating the regression coefficients of the model. In this paper, we review the estimation and selection issues in regression clustering with regard to the least squares and robust statistical methods. We also provide a model selection based technique to determine the number of regression clusters underlying the data. We further develop a computing procedure for regression clustering estimation and selection. Finally, simulation studies are presented for assessing the procedure, together with analyzing a real data set on RGB cell marking in neuroscience to illustrate and interpret the method. PMID:27212939
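A minimal sketch of regression clustering by iterative partition and regression, with the number of clusters fixed in advance; the paper additionally covers selecting the number of clusters and robust estimation, which are not shown here.

```python
import numpy as np

def regression_clustering(X, y, k=2, n_iter=50, rng=None):
    """Alternate between fitting one regression hyperplane per cluster and
    reassigning each point to the hyperplane with the smallest squared residual."""
    rng = np.random.default_rng(rng)
    n = len(y)
    labels = rng.integers(0, k, size=n)
    Xd = np.column_stack([np.ones(n), X])
    betas = [np.zeros(Xd.shape[1]) for _ in range(k)]
    for _ in range(n_iter):
        for j in range(k):
            m = labels == j
            if m.sum() >= Xd.shape[1]:                # skip degenerate/empty clusters
                betas[j] = np.linalg.lstsq(Xd[m], y[m], rcond=None)[0]
        resid = np.column_stack([(y - Xd @ b) ** 2 for b in betas])
        new_labels = resid.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, betas
```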
Wilcox, Rand; Carlson, Mike; Azen, Stan; Clark, Florence
2013-03-01
Recently, there have been major advances in statistical techniques for assessing central tendency and measures of association. The practical utility of modern methods has been documented extensively in the statistics literature, but they remain underused and relatively unknown in clinical trials. Our objective was to address this issue. STUDY DESIGN AND PURPOSE: The first purpose was to review common problems associated with standard methodologies (low power, lack of control over type I errors, and incorrect assessments of the strength of the association). The second purpose was to summarize some modern methods that can be used to circumvent such problems. The third purpose was to illustrate the practical utility of modern robust methods using data from the Well Elderly 2 randomized controlled trial. In multiple instances, robust methods uncovered differences among groups and associations among variables that were not detected by classic techniques. In particular, the results demonstrated that details of the nature and strength of the association were sometimes overlooked when using ordinary least squares regression and Pearson correlation. Modern robust methods can make a practical difference in detecting and describing differences between groups and associations between variables. Such procedures should be applied more frequently when analyzing trial-based data.
Evaluation of laser cutting process with auxiliary gas pressure by soft computing approach
NASA Astrophysics Data System (ADS)
Lazov, Lyubomir; Nikolić, Vlastimir; Jovic, Srdjan; Milovančević, Miloš; Deneva, Heristina; Teirumenieka, Erika; Arsic, Nebojsa
2018-06-01
Evaluation of the optimal laser cutting parameters is very important for high cut quality. This is a highly nonlinear process with different parameters, which is the main challenge in the optimization process. Data mining is one of the most versatile methodologies that can be used for laser cutting process optimization. The support vector regression (SVR) procedure is implemented since it is a versatile and robust technique for very nonlinear data regression. The goal in this study was to determine the optimal laser cutting parameters to ensure robust conditions for minimization of the average surface roughness. Three cutting parameters, the cutting speed, the laser power, and the assist gas pressure, were used in the investigation. A TruLaser 1030 technological system was used as the laser. Nitrogen was used as the assist gas in the laser cutting process. Support vector regression was used as the data mining method. The data mining prediction accuracy was very high according to the coefficient of determination (R2) and root mean square error (RMSE): R2 = 0.9975 and RMSE = 0.0337. Therefore the data mining approach could be used effectively for determination of the optimal conditions of the laser cutting process.
Pearl, D L; Louie, M; Chui, L; Doré, K; Grimsrud, K M; Martin, S W; Michel, P; Svenson, L W; McEwen, S A
2008-04-01
Using multivariable models, we assessed whether there were significant differences between reported outbreak and sporadic cases in terms of their sex, age, and mode and site of disease transmission. We also determined the potential role of administrative, temporal, and spatial factors within these models. We compared a variety of approaches to account for the clustering of cases in outbreaks, including weighted logistic regression, random effects models, generalized estimating equations, robust variance estimates, and the random selection of one case from each outbreak. Age and mode of transmission were the only epidemiologically and statistically significant covariates in our final models using the above approaches. Weighting observations in a logistic regression model by the inverse of their outbreak size appeared to be a relatively robust and valid means of modelling these data. Some analytical techniques designed to account for clustering had difficulty converging or producing realistic measures of association.
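A short sketch of the weighting scheme the authors found robust and valid: each case enters the logistic regression with weight equal to the inverse of its outbreak size, so an entire outbreak carries roughly the weight of one sporadic case. The helper below is illustrative, not the authors' code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def outbreak_weighted_logit(X, y, outbreak_id):
    """outbreak_id: cluster label per case; sporadic cases get their own unique label."""
    _, inverse, counts = np.unique(outbreak_id, return_inverse=True, return_counts=True)
    weights = 1.0 / counts[inverse]                 # weight = 1 / outbreak size
    return LogisticRegression(max_iter=1000).fit(X, y, sample_weight=weights)
```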
Adjustment of geochemical background by robust multivariate statistics
Zhou, D.
1985-01-01
Conventional analyses of exploration geochemical data assume that the background is a constant or slowly changing value, equivalent to a plane or a smoothly curved surface. However, it is better to regard the geochemical background as a rugged surface, varying with changes in geology and environment. This rugged surface can be estimated from observed geological, geochemical and environmental properties by using multivariate statistics. A method of background adjustment was developed and applied to groundwater and stream sediment reconnaissance data collected from the Hot Springs Quadrangle, South Dakota, as part of the National Uranium Resource Evaluation (NURE) program. Source-rock lithology appears to be a dominant factor controlling the chemical composition of groundwater or stream sediments. The most efficacious adjustment procedure is to regress uranium concentration on selected geochemical and environmental variables for each lithologic unit, and then to delineate anomalies by a common threshold set as a multiple of the standard deviation of the combined residuals. Robust versions of regression and RQ-mode principal components analysis techniques were used rather than ordinary techniques to guard against distortion caused by outliers. Anomalies delineated by this background adjustment procedure correspond with uranium prospects much better than do anomalies delineated by conventional procedures. The procedure should be applicable to geochemical exploration at different scales for other metals.
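A hedged sketch of the background-adjustment recipe described above, using ordinary least squares per lithologic unit and a common residual threshold; the paper itself uses robust regression and robust RQ-mode principal components, which are not reproduced here.

```python
import numpy as np

def lithology_adjusted_anomalies(u, covariates, lith, k=2.5):
    """u: uranium concentration; covariates: (n, p) predictors; lith: lithologic unit labels."""
    resid = np.empty_like(u, dtype=float)
    for unit in np.unique(lith):
        m = lith == unit
        Xd = np.column_stack([np.ones(m.sum()), covariates[m]])
        beta, *_ = np.linalg.lstsq(Xd, u[m], rcond=None)
        resid[m] = u[m] - Xd @ beta                 # background-adjusted residuals per unit
    threshold = k * resid.std()                     # common threshold on combined residuals
    return resid, resid > threshold                 # anomaly flags
```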
NASA Technical Reports Server (NTRS)
Smith, Kelly M.; Gay, Robert S.; Stachowiak, Susan J.
2013-01-01
In late 2014, NASA will fly the Orion capsule on a Delta IV-Heavy rocket for the Exploration Flight Test-1 (EFT-1) mission. For EFT-1, the Orion capsule will be flying with a new GPS receiver and new navigation software. Given the experimental nature of the flight, the flight software must be robust to the loss of GPS measurements. Once the high-speed entry is complete, the drogue parachutes must be deployed within the proper conditions to stabilize the vehicle prior to deploying the main parachutes. When GPS is available in nominal operations, the vehicle will deploy the drogue parachutes based on an altitude trigger. However, when GPS is unavailable, the navigated altitude errors become excessively large, driving the need for a backup barometric altimeter to improve altitude knowledge. In order to increase overall robustness, the vehicle also has an alternate method of triggering the parachute deployment sequence based on planet-relative velocity if both the GPS and the barometric altimeter fail. However, this backup trigger results in large altitude errors relative to the targeted altitude. Motivated by this challenge, this paper demonstrates how logistic regression may be employed to semi-automatically generate robust triggers based on statistical analysis. Logistic regression is used as a ground processor pre-flight to develop a statistical classifier. The classifier would then be implemented in flight software and executed in real-time. This technique offers improved performance even in the face of highly inaccurate measurements. Although the logistic regression-based trigger approach will not be implemented within EFT-1 flight software, the methodology can be carried forward for future missions and vehicles.
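A conceptual sketch only (not EFT-1 flight software): a logistic-regression classifier is trained pre-flight on simulated entry dispersions, and the resulting linear decision rule is cheap enough to evaluate in real time. The feature choices and probability threshold below are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# offline: features might be filtered velocity, drag acceleration, elapsed time, ...
# labels mark simulated samples where deployment conditions are satisfied (from truth data)
def train_trigger(features, in_deploy_box):
    return LogisticRegression(max_iter=1000).fit(features, in_deploy_box)

def trigger_fired(model, state_vector, p_threshold=0.999):
    # the real-time side reduces to a dot product, a sigmoid, and a threshold
    return model.predict_proba(state_vector.reshape(1, -1))[0, 1] >= p_threshold
```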
NASA Astrophysics Data System (ADS)
Shariff, Nurul Sima Mohamad; Ferdaos, Nur Aqilah
2017-08-01
Multicollinearity often leads to inconsistent and unreliable parameter estimates in regression analysis. This situation is more severe in the presence of outliers, which cause fatter tails in the error distributions than the normal distribution. The well-known procedure that is robust to the multicollinearity problem is the ridge regression method. This method, however, is expected to be affected by the presence of outliers due to some assumptions imposed in the modeling procedure. Thus, a robust version of the existing ridge method, with some modification in the inverse matrix and the estimated response value, is introduced. The performance of the proposed method is discussed and comparisons are made with several existing estimators, namely Ordinary Least Squares (OLS), ridge regression and robust ridge regression based on GM-estimates. The finding of this study is that the proposed method is able to produce reliable parameter estimates in the presence of both multicollinearity and outliers in the data.
Robust mislabel logistic regression without modeling mislabel probabilities.
Hung, Hung; Jou, Zhi-Yu; Huang, Su-Yun
2018-03-01
Logistic regression is among the most widely used statistical methods for linear discriminant analysis. In many applications, we only observe possibly mislabeled responses. Fitting a conventional logistic regression can then lead to biased estimation. One common resolution is to fit a mislabel logistic regression model, which takes mislabeled responses into consideration. Another common method is to adopt robust M-estimation by down-weighting suspected instances. In this work, we propose a new robust mislabel logistic regression based on γ-divergence. Our proposal possesses two advantageous features: (1) It does not need to model the mislabel probabilities. (2) The minimum γ-divergence estimation leads to a weighted estimating equation without the need for any bias correction term; that is, it is automatically bias-corrected. These features make the proposed γ-logistic regression more robust in model fitting and more intuitive for model interpretation through a simple weighting scheme. Our method is also easy to implement, and two types of algorithms are included. Simulation studies and the Pima data application are presented to demonstrate the performance of γ-logistic regression. © 2017, The International Biometric Society.
Robust Variable Selection with Exponential Squared Loss.
Wang, Xueqin; Jiang, Yunlu; Huang, Mian; Zhang, Heping
2013-04-01
Robust variable selection procedures through penalized regression have been gaining increased attention in the literature. They can be used to perform variable selection and are expected to yield robust estimates. However, to the best of our knowledge, the robustness of those penalized regression procedures has not been well characterized. In this paper, we propose a class of penalized robust regression estimators based on exponential squared loss. The motivation for this new procedure is that it enables us to characterize its robustness that has not been done for the existing procedures, while its performance is near optimal and superior to some recently developed methods. Specifically, under defined regularity conditions, our estimators are n-consistent and possess the oracle property. Importantly, we show that our estimators can achieve the highest asymptotic breakdown point of 1/2 and that their influence functions are bounded with respect to the outliers in either the response or the covariate domain. We performed simulation studies to compare our proposed method with some recent methods, using the oracle method as the benchmark. We consider common sources of influential points. Our simulation studies reveal that our proposed method performs similarly to the oracle method in terms of the model error and the positive selection rate even in the presence of influential points. In contrast, other existing procedures have a much lower non-causal selection rate. Furthermore, we re-analyze the Boston Housing Price Dataset and the Plasma Beta-Carotene Level Dataset that are commonly used examples for regression diagnostics of influential points. Our analysis unravels the discrepancies of using our robust method versus the other penalized regression method, underscoring the importance of developing and applying robust penalized regression methods.
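To make the loss function concrete, the sketch below fits a linear model by minimizing the exponential squared loss, sum over i of (1 - exp(-r_i^2 / gamma)). It is a hedged toy version only: the penalty term, the data-driven choice of gamma, and the selection machinery described in the abstract are all omitted.

```python
import numpy as np
from scipy.optimize import minimize

def exp_squared_fit(X, y, gamma=1.0):
    """Unpenalized exponential-squared-loss regression (illustrative only)."""
    def loss(beta):
        r = y - X @ beta
        return np.sum(1.0 - np.exp(-r**2 / gamma))
    beta0 = np.linalg.lstsq(X, y, rcond=None)[0]   # start from the OLS solution
    return minimize(loss, beta0, method="Nelder-Mead").x

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(150), rng.normal(size=150)])
y = X @ np.array([0.5, 2.0]) + rng.normal(scale=0.3, size=150)
y[:10] += 15                                       # contaminate with outliers
print("robust fit:", exp_squared_fit(X, y, gamma=0.5))
print("OLS fit:   ", np.linalg.lstsq(X, y, rcond=None)[0])
```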
Robust Variable Selection with Exponential Squared Loss
Wang, Xueqin; Jiang, Yunlu; Huang, Mian; Zhang, Heping
2013-01-01
Robust variable selection procedures through penalized regression have been gaining increased attention in the literature. They can be used to perform variable selection and are expected to yield robust estimates. However, to the best of our knowledge, the robustness of those penalized regression procedures has not been well characterized. In this paper, we propose a class of penalized robust regression estimators based on exponential squared loss. The motivation for this new procedure is that it enables us to characterize its robustness that has not been done for the existing procedures, while its performance is near optimal and superior to some recently developed methods. Specifically, under defined regularity conditions, our estimators are n-consistent and possess the oracle property. Importantly, we show that our estimators can achieve the highest asymptotic breakdown point of 1/2 and that their influence functions are bounded with respect to the outliers in either the response or the covariate domain. We performed simulation studies to compare our proposed method with some recent methods, using the oracle method as the benchmark. We consider common sources of influential points. Our simulation studies reveal that our proposed method performs similarly to the oracle method in terms of the model error and the positive selection rate even in the presence of influential points. In contrast, other existing procedures have a much lower non-causal selection rate. Furthermore, we re-analyze the Boston Housing Price Dataset and the Plasma Beta-Carotene Level Dataset that are commonly used examples for regression diagnostics of influential points. Our analysis unravels the discrepancies of using our robust method versus the other penalized regression method, underscoring the importance of developing and applying robust penalized regression methods. PMID:23913996
Robust regression for large-scale neuroimaging studies.
Fritsch, Virgile; Da Mota, Benoit; Loth, Eva; Varoquaux, Gaël; Banaschewski, Tobias; Barker, Gareth J; Bokde, Arun L W; Brühl, Rüdiger; Butzek, Brigitte; Conrod, Patricia; Flor, Herta; Garavan, Hugh; Lemaitre, Hervé; Mann, Karl; Nees, Frauke; Paus, Tomas; Schad, Daniel J; Schümann, Gunter; Frouin, Vincent; Poline, Jean-Baptiste; Thirion, Bertrand
2015-05-01
Multi-subject datasets used in neuroimaging group studies have a complex structure, as they exhibit non-stationary statistical properties across regions and display various artifacts. While studies with small sample sizes can rarely be shown to deviate from standard hypotheses (such as the normality of the residuals) due to the poor sensitivity of normality tests with low degrees of freedom, large-scale studies (e.g. >100 subjects) exhibit more obvious deviations from these hypotheses and call for more refined models for statistical inference. Here, we demonstrate the benefits of robust regression as a tool for analyzing large neuroimaging cohorts. First, we use an analytic test based on robust parameter estimates; based on simulations, this procedure is shown to provide an accurate statistical control without resorting to permutations. Second, we show that robust regression yields more detections than standard algorithms using as an example an imaging genetics study with 392 subjects. Third, we show that robust regression can avoid false positives in a large-scale analysis of brain-behavior relationships with over 1500 subjects. Finally we embed robust regression in the Randomized Parcellation Based Inference (RPBI) method and demonstrate that this combination further improves the sensitivity of tests carried out across the whole brain. Altogether, our results show that robust procedures provide important advantages in large-scale neuroimaging group studies. Copyright © 2015 Elsevier Inc. All rights reserved.
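A minimal way to see the kind of robust fit the study advocates is Huber M-estimation as implemented in statsmodels, contrasted with OLS on heavy-tailed data. The covariates and effect sizes below are simulated stand-ins; the paper's analytic test, RPBI combination, and real imaging-genetics data are not reproduced here.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 300
genotype = rng.integers(0, 3, n).astype(float)     # e.g. allele count 0/1/2
age = rng.normal(30, 8, n)
signal = 0.4 * genotype + 0.02 * age + rng.standard_t(df=3, size=n)  # heavy-tailed noise

X = sm.add_constant(np.column_stack([genotype, age]))
ols = sm.OLS(signal, X).fit()
rlm = sm.RLM(signal, X, M=sm.robust.norms.HuberT()).fit()   # robust regression
print("OLS coefficients:", np.round(ols.params, 3))
print("RLM coefficients:", np.round(rlm.params, 3))
```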
Motulsky, Harvey J; Brown, Ronald E
2006-01-01
Background Nonlinear regression, like linear regression, assumes that the scatter of data around the ideal curve follows a Gaussian or normal distribution. This assumption leads to the familiar goal of regression: to minimize the sum of the squares of the vertical or Y-value distances between the points and the curve. Outliers can dominate the sum-of-the-squares calculation, and lead to misleading results. However, we know of no practical method for routinely identifying outliers when fitting curves with nonlinear regression. Results We describe a new method for identifying outliers when fitting data with nonlinear regression. We first fit the data using a robust form of nonlinear regression, based on the assumption that scatter follows a Lorentzian distribution. We devised a new adaptive method that gradually becomes more robust as the method proceeds. To define outliers, we adapted the false discovery rate approach to handling multiple comparisons. We then remove the outliers, and analyze the data using ordinary least-squares regression. Because the method combines robust regression and outlier removal, we call it the ROUT method. When analyzing simulated data, where all scatter is Gaussian, our method detects (falsely) one or more outlier in only about 1–3% of experiments. When analyzing data contaminated with one or several outliers, the ROUT method performs well at outlier identification, with an average False Discovery Rate less than 1%. Conclusion Our method, which combines a new method of robust nonlinear regression with a new method of outlier identification, identifies outliers from nonlinear curve fits with reasonable power and few false positives. PMID:16526949
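The sketch below mimics the spirit of the ROUT workflow on a toy decay curve: a robust nonlinear fit with a Lorentzian-like (Cauchy) loss, a crude MAD-based cutoff standing in for the paper's false-discovery-rate step, and an ordinary least-squares refit on the retained points. The model, thresholds, and data are assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import least_squares

def model(p, x):
    return p[0] * np.exp(-p[1] * x)               # example decay curve

def residuals(p, x, y):
    return y - model(p, x)

rng = np.random.default_rng(4)
x = np.linspace(0, 5, 80)
y = model([10.0, 0.8], x) + rng.normal(0, 0.3, x.size)
y[::15] += 5                                      # inject outliers

p0 = [5.0, 0.5]
robust = least_squares(residuals, p0, loss="cauchy", f_scale=0.3, args=(x, y))

r = residuals(robust.x, x, y)
mad = 1.4826 * np.median(np.abs(r - np.median(r)))
keep = np.abs(r) < 3 * mad                        # crude stand-in for the FDR rule

refit = least_squares(residuals, robust.x, args=(x[keep], y[keep]))
print("robust estimate:", robust.x, " refit after outlier removal:", refit.x)
```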
Anderson, Carl A; McRae, Allan F; Visscher, Peter M
2006-07-01
Standard quantitative trait loci (QTL) mapping techniques commonly assume that the trait is both fully observed and normally distributed. When considering survival or age-at-onset traits these assumptions are often incorrect. Methods have been developed to map QTL for survival traits; however, they are both computationally intensive and not available in standard genome analysis software packages. We propose a grouped linear regression method for the analysis of continuous survival data. Using simulation we compare this method to both the Cox and Weibull proportional hazards models and a standard linear regression method that ignores censoring. The grouped linear regression method is of equivalent power to both the Cox and Weibull proportional hazards methods and is significantly better than the standard linear regression method when censored observations are present. The method is also robust to the proportion of censored individuals and the underlying distribution of the trait. On the basis of linear regression methodology, the grouped linear regression model is computationally simple and fast and can be implemented readily in freely available statistical software.
Esserman, Denise A.; Moore, Charity G.; Roth, Mary T.
2009-01-01
Older community dwelling adults often take multiple medications for numerous chronic diseases. Non-adherence to these medications can have a large public health impact. Therefore, the measurement and modeling of medication adherence in the setting of polypharmacy is an important area of research. We apply a variety of different modeling techniques (standard linear regression; weighted linear regression; adjusted linear regression; naïve logistic regression; beta-binomial (BB) regression; generalized estimating equations (GEE)) to binary medication adherence data from a study in a North Carolina based population of older adults, where each medication an individual was taking was classified as adherent or non-adherent. In addition, through simulation we compare these different methods based on Type I error rates, bias, power, empirical 95% coverage, and goodness of fit. We find that estimation and inference using GEE is robust to a wide variety of scenarios and we recommend using this in the setting of polypharmacy when adherence is dichotomously measured for multiple medications per person. PMID:20414358
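Because the recommended approach here is GEE for clustered binary adherence outcomes (several medications per person), a short statsmodels sketch follows. The covariate, cluster sizes, and effect sizes are simulated assumptions, not the North Carolina study data.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(5)
rows = []
for pid in range(200):                             # 200 simulated older adults
    n_meds = rng.integers(2, 9)                    # medications per person
    frailty = rng.normal()                         # person-level effect
    for _ in range(n_meds):
        cost = rng.uniform(0, 1)
        p = 1 / (1 + np.exp(-(1.0 - 1.5 * cost + frailty)))
        rows.append({"pid": pid, "cost": cost, "adherent": rng.binomial(1, p)})
df = pd.DataFrame(rows)

X = sm.add_constant(df[["cost"]])
gee = sm.GEE(df["adherent"], X, groups=df["pid"],
             family=sm.families.Binomial(),
             cov_struct=sm.cov_struct.Exchangeable()).fit()
print(gee.params)                                  # population-averaged estimates
```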
Robust geographically weighted regression of modeling the Air Polluter Standard Index (APSI)
NASA Astrophysics Data System (ADS)
Warsito, Budi; Yasin, Hasbi; Ispriyanti, Dwi; Hoyyi, Abdul
2018-05-01
The Geographically Weighted Regression (GWR) model has been widely applied in many practical fields for exploring the spatial heterogeneity of a regression model. However, this method is inherently not robust to outliers. Outliers commonly exist in data sets and may lead to a distorted estimate of the underlying regression model. One solution for handling outliers in the regression model is to use robust models; the resulting model is called Robust Geographically Weighted Regression (RGWR). This research aims to aid the government in the policy-making process related to air pollution mitigation by developing a standard index model for air pollution (Air Polluter Standard Index - APSI) based on the RGWR approach. We also consider seven variables that are directly related to the air pollution level: the traffic velocity, the population density, the business center aspect, the air humidity, the wind velocity, the air temperature, and the area size of the urban forest. The best model is determined by the smallest AIC value. There are significant differences between ordinary regression and RGWR in this case, but the basic GWR with a Gaussian kernel is the best model for APSI because it has the smallest AIC.
Robust Eye Center Localization through Face Alignment and Invariant Isocentric Patterns
Teng, Dongdong; Chen, Dihu; Tan, Hongzhou
2015-01-01
The localization of eye centers is a very useful cue for numerous applications like face recognition, facial expression recognition, and the early screening of neurological pathologies. Several methods relying on available light for accurate eye-center localization have been exploited. However, despite the considerable improvements that eye-center localization systems have undergone in recent years, only few of these developments deal with the challenges posed by the profile (non-frontal face). In this paper, we first use the explicit shape regression method to obtain the rough location of the eye centers. Because this method extracts global information from the human face, it is robust against any changes in the eye region. We exploit this robustness and utilize it as a constraint. To locate the eye centers accurately, we employ isophote curvature features, the accuracy of which has been demonstrated in a previous study. By applying these features, we obtain a series of eye-center locations which are candidates for the actual position of the eye-center. Among these locations, the estimated locations which minimize the reconstruction error between the two methods mentioned above are taken as the closest approximation for the eye centers locations. Therefore, we combine explicit shape regression and isophote curvature feature analysis to achieve robustness and accuracy, respectively. In practical experiments, we use BioID and FERET datasets to test our approach to obtaining an accurate eye-center location while retaining robustness against changes in scale and pose. In addition, we apply our method to non-frontal faces to test its robustness and accuracy, which are essential in gaze estimation but have seldom been mentioned in previous works. Through extensive experimentation, we show that the proposed method can achieve a significant improvement in accuracy and robustness over state-of-the-art techniques, with our method ranking second in terms of accuracy. According to our implementation on a PC with a Xeon 2.5Ghz CPU, the frame rate of the eye tracking process can achieve 38 Hz. PMID:26426929
Torija, Antonio J; Ruiz, Diego P
2015-02-01
The prediction of environmental noise in urban environments requires the solution of a complex and non-linear problem, since there are complex relationships among the multitude of variables involved in the characterization and modelling of environmental noise and environmental-noise magnitudes. Moreover, the inclusion of the great spatial heterogeneity characteristic of urban environments seems to be essential in order to achieve an accurate environmental-noise prediction in cities. This problem is addressed in this paper, where a procedure based on feature-selection techniques and machine-learning regression methods is proposed and applied to this environmental problem. Three machine-learning regression methods, which are considered very robust in solving non-linear problems, are used to estimate the energy-equivalent sound-pressure level descriptor (LAeq). These three methods are: (i) multilayer perceptron (MLP), (ii) sequential minimal optimisation (SMO), and (iii) Gaussian processes for regression (GPR). In addition, because of the high number of input variables involved in environmental-noise modelling and estimation in urban environments, which make LAeq prediction models quite complex and costly in terms of time and resources for application to real situations, three different techniques are used to approach feature selection or data reduction. The feature-selection techniques used are: (i) correlation-based feature-subset selection (CFS), (ii) wrapper for feature-subset selection (WFS), and the data reduction technique is principal-component analysis (PCA). The subsequent analysis leads to a proposal of different schemes, depending on the needs regarding data collection and accuracy. The use of WFS as the feature-selection technique with the implementation of SMO or GPR as regression algorithm provides the best LAeq estimation (R(2)=0.94 and mean absolute error (MAE)=1.14-1.16 dB(A)). Copyright © 2014 Elsevier B.V. All rights reserved.
Liu, Rong; Li, Xi; Zhang, Wei; Zhou, Hong-Hao
2015-01-01
Objective Multiple linear regression (MLR) and machine learning techniques in pharmacogenetic algorithm-based warfarin dosing have been reported. However, performances of these algorithms in racially diverse groups have never been objectively evaluated and compared. In this literature-based study, we compared the performances of eight machine learning techniques with those of MLR in a large, racially-diverse cohort. Methods MLR, artificial neural network (ANN), regression tree (RT), multivariate adaptive regression splines (MARS), boosted regression tree (BRT), support vector regression (SVR), random forest regression (RFR), lasso regression (LAR) and Bayesian additive regression trees (BART) were applied in warfarin dose algorithms in a cohort from the International Warfarin Pharmacogenetics Consortium database. Covariates obtained by stepwise regression from 80% of randomly selected patients were used to develop algorithms. To compare the performances of these algorithms, the mean percentage of patients whose predicted dose fell within 20% of the actual dose (mean percentage within 20%) and the mean absolute error (MAE) were calculated in the remaining 20% of patients. The performances of these techniques in different races, as well as the dose ranges of therapeutic warfarin, were compared. Robust results were obtained after 100 rounds of resampling. Results BART, MARS and SVR were statistically indistinguishable and significantly outperformed all the other approaches in the whole cohort (MAE: 8.84–8.96 mg/week, mean percentage within 20%: 45.88%–46.35%). In the White population, MARS and BART showed higher mean percentage within 20% and lower mean MAE than those of MLR (all p values < 0.05). In the Asian population, SVR, BART, MARS and LAR performed the same as MLR. MLR and LAR performed best in the Black population. When patients were grouped in terms of warfarin dose range, all machine learning techniques except ANN and LAR showed significantly higher mean percentage within 20%, and lower MAE (all p values < 0.05) than MLR in the low- and high-dose ranges. Conclusion Overall, the machine learning-based techniques BART, MARS and SVR performed better than MLR in warfarin pharmacogenetic dosing. Differences in algorithm performance exist among the races. Moreover, machine learning-based algorithms tended to perform better in the low- and high-dose ranges than MLR. PMID:26305568
Estimating monotonic rates from biological data using local linear regression.
Olito, Colin; White, Craig R; Marshall, Dustin J; Barneche, Diego R
2017-03-01
Accessing many fundamental questions in biology begins with empirical estimation of simple monotonic rates of underlying biological processes. Across a variety of disciplines, ranging from physiology to biogeochemistry, these rates are routinely estimated from non-linear and noisy time series data using linear regression and ad hoc manual truncation of non-linearities. Here, we introduce the R package LoLinR, a flexible toolkit to implement local linear regression techniques to objectively and reproducibly estimate monotonic biological rates from non-linear time series data, and demonstrate possible applications using metabolic rate data. LoLinR provides methods to easily and reliably estimate monotonic rates from time series data in a way that is statistically robust, facilitates reproducible research and is applicable to a wide variety of research disciplines in the biological sciences. © 2017. Published by The Company of Biologists Ltd.
Korany, Mohamed A; Gazy, Azza A; Khamis, Essam F; Ragab, Marwa A A; Kamal, Miranda F
2018-06-01
This study outlines two robust regression approaches, namely least median of squares (LMS) and iteratively re-weighted least squares (IRLS) to investigate their application in instrument analysis of nutraceuticals (that is, fluorescence quenching of merbromin reagent upon lipoic acid addition). These robust regression methods were used to calculate calibration data from the fluorescence quenching reaction (∆F and F-ratio) under ideal or non-ideal linearity conditions. For each condition, data were treated using three regression fittings: Ordinary Least Squares (OLS), LMS and IRLS. Assessment of linearity, limits of detection (LOD) and quantitation (LOQ), accuracy and precision were carefully studied for each condition. LMS and IRLS regression line fittings showed significant improvement in correlation coefficients and all regression parameters for both methods and both conditions. In the ideal linearity condition, the intercept and slope changed insignificantly, but a dramatic change was observed for the non-ideal condition and linearity intercept. Under both linearity conditions, LOD and LOQ values after the robust regression line fitting of data were lower than those obtained before data treatment. The results obtained after statistical treatment indicated that the linearity ranges for drug determination could be expanded to lower limits of quantitation by enhancing the regression equation parameters after data treatment. Analysis results for lipoic acid in capsules, using both fluorimetric methods, treated by parametric OLS and after treatment by robust LMS and IRLS were compared for both linearity conditions. Copyright © 2018 John Wiley & Sons, Ltd.
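Of the two robust fits discussed above, iteratively re-weighted least squares is the easier to sketch. Below is a generic IRLS loop with Tukey bisquare weights for a straight-line calibration fit; the tuning constant, noise, and "aberrant" point are arbitrary assumptions and do not reproduce the cited fluorimetric data or the LMS estimator.

```python
import numpy as np

def irls_bisquare_line(x, y, c=4.685, n_iter=30):
    """Straight-line fit by IRLS with Tukey bisquare weights (sketch)."""
    X = np.column_stack([np.ones_like(x), x])
    w = np.ones_like(y)
    for _ in range(n_iter):
        W = np.diag(w)
        beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
        r = y - X @ beta
        s = 1.4826 * np.median(np.abs(r - np.median(r))) + 1e-12   # MAD scale
        u = r / (c * s)
        w = np.where(np.abs(u) < 1, (1 - u**2) ** 2, 0.0)          # bisquare weights
    return beta                                                     # [intercept, slope]

rng = np.random.default_rng(14)
conc = np.array([1, 2, 4, 6, 8, 10, 12, 15], dtype=float)           # analyte level
signal = 3.0 + 5.0 * conc + rng.normal(0, 0.2, conc.size)
signal[5] += 20                                                     # one aberrant point
print("intercept, slope:", irls_bisquare_line(conc, signal))
```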
New robust statistical procedures for the polytomous logistic regression models.
Castilla, Elena; Ghosh, Abhik; Martin, Nirian; Pardo, Leandro
2018-05-17
This article derives a new family of estimators, namely the minimum density power divergence estimators, as a robust generalization of the maximum likelihood estimator for the polytomous logistic regression model. Based on these estimators, a family of Wald-type test statistics for linear hypotheses is introduced. Robustness properties of both the proposed estimators and the test statistics are theoretically studied through the classical influence function analysis. Appropriate real life examples are presented to justify the requirement of suitable robust statistical procedures in place of the likelihood based inference for the polytomous logistic regression model. The validity of the theoretical results established in the article are further confirmed empirically through suitable simulation studies. Finally, an approach for the data-driven selection of the robustness tuning parameter is proposed with empirical justifications. © 2018, The International Biometric Society.
Rank-preserving regression: a more robust rank regression model against outliers.
Chen, Tian; Kowalski, Jeanne; Chen, Rui; Wu, Pan; Zhang, Hui; Feng, Changyong; Tu, Xin M
2016-08-30
Mean-based semi-parametric regression models such as the popular generalized estimating equations are widely used to improve robustness of inference over parametric models. Unfortunately, such models are quite sensitive to outlying observations. The Wilcoxon-score-based rank regression (RR) provides more robust estimates over generalized estimating equations against outliers. However, the RR and its extensions do not sufficiently address missing data arising in longitudinal studies. In this paper, we propose a new approach to address outliers under a different framework based on the functional response models. This functional-response-model-based alternative not only addresses limitations of the RR and its extensions for longitudinal data, but, with its rank-preserving property, even provides more robust estimates than these alternatives. The proposed approach is illustrated with both real and simulated data. Copyright © 2016 John Wiley & Sons, Ltd.
Regression estimators for generic health-related quality of life and quality-adjusted life years.
Basu, Anirban; Manca, Andrea
2012-01-01
To develop regression models for outcomes with truncated supports, such as health-related quality of life (HRQoL) data, and account for features typical of such data such as a skewed distribution, spikes at 1 or 0, and heteroskedasticity. Regression estimators based on features of the Beta distribution. First, both a single equation and a 2-part model are presented, along with estimation algorithms based on maximum-likelihood, quasi-likelihood, and Bayesian Markov-chain Monte Carlo methods. A novel Bayesian quasi-likelihood estimator is proposed. Second, a simulation exercise is presented to assess the performance of the proposed estimators against ordinary least squares (OLS) regression for a variety of HRQoL distributions that are encountered in practice. Finally, the performance of the proposed estimators is assessed by using them to quantify the treatment effect on QALYs in the EVALUATE hysterectomy trial. Overall model fit is studied using several goodness-of-fit tests such as Pearson's correlation test, link and reset tests, and a modified Hosmer-Lemeshow test. The simulation results indicate that the proposed methods are more robust in estimating covariate effects than OLS, especially when the effects are large or the HRQoL distribution has a large spike at 1. Quasi-likelihood techniques are more robust than maximum likelihood estimators. When applied to the EVALUATE trial, all but the maximum likelihood estimators produce unbiased estimates of the treatment effect. One and 2-part Beta regression models provide flexible approaches to regress the outcomes with truncated supports, such as HRQoL, on covariates, after accounting for many idiosyncratic features of the outcomes distribution. This work will provide applied researchers with a practical set of tools to model outcomes in cost-effectiveness analysis.
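For readers unfamiliar with Beta regression, the following sketch fits a single-equation model by maximum likelihood with a logit mean link, which is the basic building block the abstract starts from. It is a simplified stand-in: responses must lie strictly inside (0, 1), and the 2-part model, quasi-likelihood, and Bayesian MCMC estimators discussed above are not shown.

```python
import numpy as np
from scipy import optimize, stats

def beta_reg_nll(params, X, y):
    """Negative log-likelihood of a Beta regression with logit mean link."""
    beta, log_phi = params[:-1], params[-1]
    mu = 1 / (1 + np.exp(-X @ beta))               # mean in (0, 1)
    phi = np.exp(log_phi)                          # precision parameter
    return -np.sum(stats.beta.logpdf(y, mu * phi, (1 - mu) * phi))

rng = np.random.default_rng(6)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n)])
mu_true = 1 / (1 + np.exp(-(1.0 + 0.8 * X[:, 1])))
y = rng.beta(mu_true * 20, (1 - mu_true) * 20)      # synthetic HRQoL-like scores

start = np.zeros(X.shape[1] + 1)
fit = optimize.minimize(beta_reg_nll, start, args=(X, y), method="BFGS")
print("coefficients:", fit.x[:-1], " precision:", np.exp(fit.x[-1]))
```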
Incremental online learning in high dimensions.
Vijayakumar, Sethu; D'Souza, Aaron; Schaal, Stefan
2005-12-01
Locally weighted projection regression (LWPR) is a new algorithm for incremental nonlinear function approximation in high-dimensional spaces with redundant and irrelevant input dimensions. At its core, it employs nonparametric regression with locally linear models. In order to stay computationally efficient and numerically robust, each local model performs the regression analysis with a small number of univariate regressions in selected directions in input space in the spirit of partial least squares regression. We discuss when and how local learning techniques can successfully work in high-dimensional spaces and review the various techniques for local dimensionality reduction before finally deriving the LWPR algorithm. The properties of LWPR are that it (1) learns rapidly with second-order learning methods based on incremental training, (2) uses statistically sound stochastic leave-one-out cross validation for learning without the need to memorize training data, (3) adjusts its weighting kernels based on only local information in order to minimize the danger of negative interference of incremental learning, (4) has a computational complexity that is linear in the number of inputs, and (5) can deal with a large number of-possibly redundant-inputs, as shown in various empirical evaluations with up to 90 dimensional data sets. For a probabilistic interpretation, predictive variance and confidence intervals are derived. To our knowledge, LWPR is the first truly incremental spatially localized learning method that can successfully and efficiently operate in very high-dimensional spaces.
Efficient Robust Regression via Two-Stage Generalized Empirical Likelihood
Bondell, Howard D.; Stefanski, Leonard A.
2013-01-01
Large- and finite-sample efficiency and resistance to outliers are the key goals of robust statistics. Although often not simultaneously attainable, we develop and study a linear regression estimator that comes close. Efficiency obtains from the estimator’s close connection to generalized empirical likelihood, and its favorable robustness properties are obtained by constraining the associated sum of (weighted) squared residuals. We prove maximum attainable finite-sample replacement breakdown point, and full asymptotic efficiency for normal errors. Simulation evidence shows that compared to existing robust regression estimators, the new estimator has relatively high efficiency for small sample sizes, and comparable outlier resistance. The estimator is further illustrated and compared to existing methods via application to a real data set with purported outliers. PMID:23976805
Taylor, Jeremy M G; Cheng, Wenting; Foster, Jared C
2015-03-01
A recent article (Zhang et al., 2012, Biometrics 168, 1010-1018) compares regression based and inverse probability based methods of estimating an optimal treatment regime and shows for a small number of covariates that inverse probability weighted methods are more robust to model misspecification than regression methods. We demonstrate that using models that fit the data better reduces the concern about non-robustness for the regression methods. We extend the simulation study of Zhang et al. (2012, Biometrics 168, 1010-1018), also considering the situation of a larger number of covariates, and show that incorporating random forests into both regression and inverse probability weighted based methods improves their properties. © 2014, The International Biometric Society.
Piovesan, Davide; Pierobon, Alberto; DiZio, Paul; Lackner, James R
2012-01-01
This study presents and validates a Time-Frequency technique for measuring 2-dimensional multijoint arm stiffness throughout a single planar movement as well as during static posture. It is proposed as an alternative to current regressive methods which require numerous repetitions to obtain average stiffness on a small segment of the hand trajectory. The method is based on the analysis of the reassigned spectrogram of the arm's response to impulsive perturbations and can estimate arm stiffness on a trial-by-trial basis. Analytic and empirical methods are first derived and tested through modal analysis on synthetic data. The technique's accuracy and robustness are assessed by modeling the estimation of stiffness time profiles changing at different rates and affected by different noise levels. Our method obtains results comparable with two well-known regressive techniques. We also test how the technique can identify the viscoelastic component of non-linear and higher than second order systems with a non-parametrical approach. The technique proposed here is very impervious to noise and can be used easily for both postural and movement tasks. Estimations of stiffness profiles are possible with only one perturbation, making our method a useful tool for estimating limb stiffness during motor learning and adaptation tasks, and for understanding the modulation of stiffness in individuals with neurodegenerative diseases.
Calibration of remotely sensed, coarse resolution NDVI to CO2 fluxes in a sagebrush–steppe ecosystem
Wylie, Bruce K.; Johnson, Douglas A.; Laca, Emilio; Saliendra, Nicanor Z.; Gilmanov, Tagir G.; Reed, Bradley C.; Tieszen, Larry L.; Worstell, Bruce B.
2003-01-01
The net ecosystem exchange (NEE) of carbon flux can be partitioned into gross primary productivity (GPP) and respiration (R). The contribution of remote sensing and modeling holds the potential to predict these components and map them spatially and temporally. This has obvious utility to quantify carbon sink and source relationships and to identify improved land management strategies for optimizing carbon sequestration. The objective of our study was to evaluate prediction of 14-day average daytime CO2 fluxes (Fday) and nighttime CO2 fluxes (Rn) using remote sensing and other data. Fday and Rnwere measured with a Bowen ratio–energy balance (BREB) technique in a sagebrush (Artemisia spp.)–steppe ecosystem in northeast Idaho, USA, during 1996–1999. Micrometeorological variables aggregated across 14-day periods and time-integrated Advanced Very High Resolution Radiometer (AVHRR) Normalized Difference Vegetation Index (iNDVI) were determined during four growing seasons (1996–1999) and used to predict Fday and Rn. We found that iNDVI was a strong predictor of Fday(R2=0.79, n=66, P<0.0001). Inclusion of evapotranspiration in the predictive equation led to improved predictions of Fday (R2=0.82, n=66, P<0.0001). Crossvalidation indicated that regression tree predictions of Fday were prone to overfitting and that linear regression models were more robust. Multiple regression and regression tree models predicted Rn quite well (R2=0.75–0.77, n=66) with the regression tree model being slightly more robust in crossvalidation. Temporal mapping of Fday and Rn is possible with these techniques and would allow the assessment of NEE in sagebrush–steppe ecosystems. Simulations of periodic Fday measurements, as might be provided by a mobile flux tower, indicated that such measurements could be used in combination with iNDVI to accurately predict Fday. These periodic measurements could maximize the utility of expensive flux towers for evaluating various carbon management strategies, carbon certification, and validation and calibration of carbon flux models.
Calibration of remotely sensed, coarse resolution NDVI to CO2 fluxes in a sagebrush-steppe ecosystem
Wylie, B.K.; Johnson, D.A.; Laca, Emilio; Saliendra, Nicanor Z.; Gilmanov, T.G.; Reed, B.C.; Tieszen, L.L.; Worstell, B.B.
2003-01-01
The net ecosystem exchange (NEE) of carbon flux can be partitioned into gross primary productivity (GPP) and respiration (R). The contribution of remote sensing and modeling holds the potential to predict these components and map them spatially and temporally. This has obvious utility to quantify carbon sink and source relationships and to identify improved land management strategies for optimizing carbon sequestration. The objective of our study was to evaluate prediction of 14-day average daytime CO2 fluxes (Fday) and nighttime CO2 fluxes (Rn) using remote sensing and other data. Fday and Rn were measured with a Bowen ratio-energy balance (BREB) technique in a sagebrush (Artemisia spp.)-steppe ecosystem in northeast Idaho, USA, during 1996-1999. Micrometeorological variables aggregated across 14-day periods and time-integrated Advanced Very High Resolution Radiometer (AVHRR) Normalized Difference Vegetation Index (iNDVI) were determined during four growing seasons (1996-1999) and used to predict Fday and Rn. We found that iNDVI was a strong predictor of Fday (R2 = 0.79, n = 66, P < 0.0001). Inclusion of evapotranspiration in the predictive equation led to improved predictions of Fday (R2= 0.82, n = 66, P < 0.0001). Crossvalidation indicated that regression tree predictions of Fday were prone to overfitting and that linear regression models were more robust. Multiple regression and regression tree models predicted Rn quite well (R2 = 0.75-0.77, n = 66) with the regression tree model being slightly more robust in crossvalidation. Temporal mapping of Fday and Rn is possible with these techniques and would allow the assessment of NEE in sagebrush-steppe ecosystems. Simulations of periodic Fday measurements, as might be provided by a mobile flux tower, indicated that such measurements could be used in combination with iNDVI to accurately predict Fday. These periodic measurements could maximize the utility of expensive flux towers for evaluating various carbon management strategies, carbon certification, and validation and calibration of carbon flux models. ?? 2003 Elsevier Science Inc. All rights reserved.
A quantile regression model for failure-time data with time-dependent covariates
Gorfine, Malka; Goldberg, Yair; Ritov, Ya’acov
2017-01-01
Since survival data occur over time, important covariates that we wish to consider often also change over time. Such covariates are referred to as time-dependent covariates. Quantile regression offers flexible modeling of survival data by allowing the covariates to vary with quantiles. This article provides a novel quantile regression model accommodating time-dependent covariates, for analyzing survival data subject to right censoring. Our simple estimation technique assumes the existence of instrumental variables. In addition, we present a doubly-robust estimator in the sense of Robins and Rotnitzky (1992, Recovery of information and adjustment for dependent censoring using surrogate markers. In: Jewell, N. P., Dietz, K. and Farewell, V. T. (editors), AIDS Epidemiology. Boston: Birkhäuser, pp. 297–331.). The asymptotic properties of the estimators are rigorously studied. Finite-sample properties are demonstrated by a simulation study. The utility of the proposed methodology is demonstrated using the Stanford heart transplant dataset. PMID:27485534
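For orientation, the basic (uncensored, baseline-covariate) quantile regression that the paper builds on can be fit directly with statsmodels; the censoring adjustment, instrumental variables, and time-dependent covariates of the proposed estimator are beyond this sketch, and the data below are simulated.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 500
age = rng.uniform(20, 70, n)
log_time = 5.0 - 0.03 * age + rng.gumbel(size=n)   # skewed log survival times

X = sm.add_constant(age)
for q in (0.25, 0.5, 0.75):
    res = sm.QuantReg(log_time, X).fit(q=q)        # covariate effect varies by quantile
    print(f"q={q}: intercept={res.params[0]:.2f}, age slope={res.params[1]:.3f}")
```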
Multivariate Analysis of Seismic Field Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Alam, M. Kathleen
1999-06-01
This report includes the details of the model building procedure and prediction of seismic field data. Principal Components Regression, a multivariate analysis technique, was used to model seismic data collected as two pieces of equipment were cycled on and off. Models built that included only the two pieces of equipment of interest had trouble predicting data containing signals not included in the model. Evidence for poor predictions came from the prediction curves as well as spectral F-ratio plots. Once the extraneous signals were included in the model, predictions improved dramatically. While Principal Components Regression performed well for the present data sets, the present data analysis suggests further work will be needed to develop more robust modeling methods as the data become more complex.
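Principal Components Regression itself is straightforward to demonstrate: project the (possibly collinear, high-dimensional) signals onto a few principal components and regress on the scores. The sketch below uses synthetic two-source data in place of the seismic records, which are of course not reproduced here.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(8)
n, p = 120, 40
latent = rng.normal(size=(n, 2))                    # two underlying "sources"
X = latent @ rng.normal(size=(2, p)) + 0.1 * rng.normal(size=(n, p))
y = latent[:, 0] - 0.5 * latent[:, 1] + 0.05 * rng.normal(size=n)

pcr = make_pipeline(StandardScaler(), PCA(n_components=2), LinearRegression())
pcr.fit(X, y)
print("training R^2:", round(pcr.score(X, y), 3))
```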
Macrocell path loss prediction using artificial intelligence techniques
NASA Astrophysics Data System (ADS)
Usman, Abraham U.; Okereke, Okpo U.; Omizegba, Elijah E.
2014-04-01
The prediction of propagation loss is a practical non-linear function approximation problem that linear regression or auto-regression models are limited in their ability to handle. However, some computational intelligence techniques such as artificial neural networks (ANNs) and adaptive neuro-fuzzy inference systems (ANFISs) have been shown to have great ability to handle non-linear function approximation and prediction problems. In this study, the multiple layer perceptron neural network (MLP-NN), radial basis function neural network (RBF-NN) and an ANFIS network were trained using actual signal strength measurements taken in certain suburban areas of Bauchi metropolis, Nigeria. The trained networks were then used to predict propagation losses at the stated areas under differing conditions. The predictions were compared with the prediction accuracy of the popular Hata model. It was observed that the ANFIS model gave a better fit in all cases, with higher R2 values in each case, and on average was more robust than the MLP and RBF models, as it generalises better to different data.
Data-driven discovery of partial differential equations.
Rudy, Samuel H; Brunton, Steven L; Proctor, Joshua L; Kutz, J Nathan
2017-04-01
We propose a sparse regression method capable of discovering the governing partial differential equation(s) of a given system by time series measurements in the spatial domain. The regression framework relies on sparsity-promoting techniques to select the nonlinear and partial derivative terms of the governing equations that most accurately represent the data, bypassing a combinatorially large search through all possible candidate models. The method balances model complexity and regression accuracy by selecting a parsimonious model via Pareto analysis. Time series measurements can be made in an Eulerian framework, where the sensors are fixed spatially, or in a Lagrangian framework, where the sensors move with the dynamics. The method is computationally efficient, robust, and demonstrated to work on a variety of canonical problems spanning a number of scientific domains including Navier-Stokes, the quantum harmonic oscillator, and the diffusion equation. Moreover, the method is capable of disambiguating between potentially nonunique dynamical terms by using multiple time series taken with different initial data. Thus, for a traveling wave, the method can distinguish between a linear wave equation and the Korteweg-de Vries equation, for instance. The method provides a promising new technique for discovering governing equations and physical laws in parameterized spatiotemporal systems, where first-principles derivations are intractable.
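A toy version of the sparse-regression step helps fix ideas: build a library of candidate terms and apply sequentially thresholded least squares so that only a few terms survive. The real method additionally uses ridge regularization, numerical differentiation of measured fields, and Pareto-based model selection, none of which appear in this hedged sketch; the "derivative" arrays below are purely synthetic.

```python
import numpy as np

def sequential_threshold(Theta, b, thresh=0.1, n_iter=10):
    """Sequentially thresholded least squares on a candidate library (sketch)."""
    xi = np.linalg.lstsq(Theta, b, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < thresh
        xi[small] = 0.0
        big = ~small
        if big.any():
            xi[big] = np.linalg.lstsq(Theta[:, big], b, rcond=None)[0]
    return xi

rng = np.random.default_rng(9)
u = rng.normal(size=400)
u_x, u_xx = rng.normal(size=400), rng.normal(size=400)
# Synthetic "time derivative" generated by u_t = 0.5*u_xx - u*u_x plus noise.
u_t = 0.5 * u_xx - u * u_x + 0.01 * rng.normal(size=400)

library = np.column_stack([np.ones_like(u), u, u_x, u_xx, u * u_x, u * u_xx])
names = ["1", "u", "u_x", "u_xx", "u*u_x", "u*u_xx"]
xi = sequential_threshold(library, u_t)
print(dict(zip(names, np.round(xi, 3))))   # only the true terms should remain
```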
Extending the Distributed Lag Model framework to handle chemical mixtures.
Bello, Ghalib A; Arora, Manish; Austin, Christine; Horton, Megan K; Wright, Robert O; Gennings, Chris
2017-07-01
Distributed Lag Models (DLMs) are used in environmental health studies to analyze the time-delayed effect of an exposure on an outcome of interest. Given the increasing need for analytical tools for evaluation of the effects of exposure to multi-pollutant mixtures, this study attempts to extend the classical DLM framework to accommodate and evaluate multiple longitudinally observed exposures. We introduce 2 techniques for quantifying the time-varying mixture effect of multiple exposures on an outcome of interest. Lagged WQS, the first technique, is based on Weighted Quantile Sum (WQS) regression, a penalized regression method that estimates mixture effects using a weighted index. We also introduce Tree-based DLMs, a nonparametric alternative for assessment of lagged mixture effects. This technique is based on the Random Forest (RF) algorithm, a nonparametric, tree-based estimation technique that has shown excellent performance in a wide variety of domains. In a simulation study, we tested the feasibility of these techniques and evaluated their performance in comparison to standard methodology. Both methods exhibited relatively robust performance, accurately capturing pre-defined non-linear functional relationships in different simulation settings. Further, we applied these techniques to data on perinatal exposure to environmental metal toxicants, with the goal of evaluating the effects of exposure on neurodevelopment. Our methods identified critical neurodevelopmental windows showing significant sensitivity to metal mixtures. Copyright © 2017 Elsevier Inc. All rights reserved.
Efficient robust doubly adaptive regularized regression with applications.
Karunamuni, Rohana J; Kong, Linglong; Tu, Wei
2018-01-01
We consider the problem of estimation and variable selection for general linear regression models. Regularized regression procedures have been widely used for variable selection, but most existing methods perform poorly in the presence of outliers. We construct a new penalized procedure that simultaneously attains full efficiency and maximum robustness. Furthermore, the proposed procedure satisfies the oracle properties. The new procedure is designed to achieve sparse and robust solutions by imposing adaptive weights on both the decision loss and the penalty function. The proposed method of estimation and variable selection attains full efficiency when the model is correct and, at the same time, achieves maximum robustness when outliers are present. We examine the robustness properties using the finite-sample breakdown point and an influence function. We show that the proposed estimator attains the maximum breakdown point. Furthermore, there is no loss in efficiency when there are no outliers or the error distribution is normal. For practical implementation of the proposed method, we present a computational algorithm. We examine the finite-sample and robustness properties using Monte Carlo studies. Two datasets are also analyzed.
NASA Astrophysics Data System (ADS)
Ahmed, Shamim; Miorelli, Roberto; Calmon, Pierre; Anselmi, Nicola; Salucci, Marco
2018-04-01
This paper describes a Learning-By-Examples (LBE) technique for performing quasi real time flaw localization and characterization within a conductive tube based on Eddy Current Testing (ECT) signals. Within the framework of LBE, the combination of full-factorial (i.e., GRID) sampling and Partial Least Squares (PLS) feature extraction (i.e., GRID-PLS) techniques is applied for generating a suitable training set in the offline phase. Support Vector Regression (SVR) is utilized for model development and inversion during the offline and online phases, respectively. The performance and robustness of the proposed GRID-PLS/SVR strategy on a noisy test set are evaluated and compared with the standard GRID/SVR approach.
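The PLS-plus-SVR pipeline is easy to sketch with scikit-learn: extract a few PLS components from the signals offline, train an SVR on the scores, and then apply both transforms to new signals as the "online" inversion step. The signal model, flaw parameter, and hyperparameters below are invented for illustration and are not the paper's ECT setup.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.svm import SVR

rng = np.random.default_rng(10)
n, p = 300, 60
depth = rng.uniform(0.1, 2.0, n)                           # hypothetical flaw depth (target)
basis = np.sin(np.outer(np.ones(n), np.linspace(0, np.pi, p)))
signals = depth[:, None] * basis + 0.05 * rng.normal(size=(n, p))

pls = PLSRegression(n_components=4).fit(signals, depth)    # offline feature extraction
scores = pls.transform(signals)

svr = SVR(C=10.0, epsilon=0.01).fit(scores, depth)         # offline model training
new_scores = pls.transform(signals[:5])                    # "online" inversion step
print("predicted depths:", np.round(svr.predict(new_scores), 2))
print("true depths:     ", np.round(depth[:5], 2))
```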
Using Robust Variance Estimation to Combine Multiple Regression Estimates with Meta-Analysis
ERIC Educational Resources Information Center
Williams, Ryan
2013-01-01
The purpose of this study was to explore the use of robust variance estimation for combining commonly specified multiple regression models and for combining sample-dependent focal slope estimates from diversely specified models. The proposed estimator obviates traditionally required information about the covariance structure of the dependent…
Measurement Consistency from Magnetic Resonance Images
Chung, Dongjun; Chung, Moo K.; Durtschi, Reid B.; Lindell, R. Gentry; Vorperian, Houri K.
2010-01-01
Rationale and Objectives In quantifying medical images, length-based measurements are still obtained manually. Due to possible human error, a measurement protocol is required to guarantee the consistency of measurements. In this paper, we review various statistical techniques that can be used in determining measurement consistency. The focus is on detecting a possible measurement bias and determining the robustness of the procedures to outliers. Materials and Methods We review correlation analysis, linear regression, Bland-Altman method, paired t-test, and analysis of variance (ANOVA). These techniques were applied to measurements, obtained by two raters, of head and neck structures from magnetic resonance images (MRI). Results The correlation analysis and the linear regression were shown to be insufficient for detecting measurement inconsistency. They are also very sensitive to outliers. The widely used Bland-Altman method is a visualization technique so it lacks the numerical quantification. The paired t-test tends to be sensitive to small measurement bias. On the other hand, ANOVA performs well even under small measurement bias. Conclusion In almost all cases, using only one method is insufficient and it is recommended to use several methods simultaneously. In general, ANOVA performs the best. PMID:18790405
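Two of the agreement checks reviewed above, Bland-Altman limits of agreement and the paired t-test, are simple enough to show directly. The two-rater measurements below are simulated, with a small bias deliberately built into the second rater.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
truth = rng.uniform(10, 40, 50)                    # true structure lengths (mm)
rater1 = truth + rng.normal(0.0, 0.5, 50)
rater2 = truth + 0.3 + rng.normal(0.0, 0.5, 50)    # rater 2 has a small systematic bias

diff = rater1 - rater2
bias = diff.mean()
loa = 1.96 * diff.std(ddof=1)                      # Bland-Altman limits of agreement
print(f"Bland-Altman bias = {bias:.2f} mm, limits of agreement = ±{loa:.2f} mm")

t_stat, p_val = stats.ttest_rel(rater1, rater2)    # paired t-test for measurement bias
print(f"paired t-test: t = {t_stat:.2f}, p = {p_val:.4f}")
```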
Robust extraction of baseline signal of atmospheric trace species using local regression
NASA Astrophysics Data System (ADS)
Ruckstuhl, A. F.; Henne, S.; Reimann, S.; Steinbacher, M.; Vollmer, M. K.; O'Doherty, S.; Buchmann, B.; Hueglin, C.
2012-11-01
The identification of atmospheric trace species measurements that are representative of well-mixed background air masses is required for monitoring atmospheric composition change at background sites. We present a statistical method based on robust local regression that is well suited for the selection of background measurements and the estimation of associated baseline curves. The bootstrap technique is applied to calculate the uncertainty in the resulting baseline curve. The non-parametric nature of the proposed approach makes it a very flexible data filtering method. Application to carbon monoxide (CO) measured from 1996 to 2009 at the high-alpine site Jungfraujoch (Switzerland, 3580 m a.s.l.), and to measurements of 1,1-difluoroethane (HFC-152a) from Jungfraujoch (2000 to 2009) and Mace Head (Ireland, 1995 to 2009) demonstrates the feasibility and usefulness of the proposed approach. The determined average annual change of CO at Jungfraujoch for the 1996 to 2009 period as estimated from filtered annual mean CO concentrations is -2.2 ± 1.1 ppb yr-1. For comparison, the linear trend of unfiltered CO measurements at Jungfraujoch for this time period is -2.9 ± 1.3 ppb yr-1.
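As a greatly simplified stand-in for the robust baseline idea, the sketch below iterates a LOWESS smooth and discards points lying well above the current curve (pollution events), keeping the remainder as "background". This is not the REBS implementation or the bootstrap uncertainty estimate used in the paper; the time series, window fraction, and cutoff are invented.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(12)
t = np.linspace(0, 10, 600)
baseline_true = 120 + 10 * np.sin(t)                     # slowly varying background
co = baseline_true + rng.normal(0, 2, t.size)
co[rng.choice(t.size, 60, replace=False)] += rng.uniform(10, 60, 60)  # pollution spikes

keep = np.ones(t.size, dtype=bool)
for _ in range(5):
    smooth = lowess(co[keep], t[keep], frac=0.3, return_sorted=False)
    resid = co[keep] - smooth
    sigma = 1.4826 * np.median(np.abs(resid - np.median(resid)))
    keep[np.where(keep)[0]] = resid < 2 * sigma          # drop upward outliers only
baseline = lowess(co[keep], t[keep], frac=0.3, return_sorted=False)
print("kept", keep.sum(), "of", t.size, "points as background")
```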
Robust Regression for Slope Estimation in Curriculum-Based Measurement Progress Monitoring
ERIC Educational Resources Information Center
Mercer, Sterett H.; Lyons, Alina F.; Johnston, Lauren E.; Millhoff, Courtney L.
2015-01-01
Although ordinary least-squares (OLS) regression has been identified as a preferred method to calculate rates of improvement for individual students during curriculum-based measurement (CBM) progress monitoring, OLS slope estimates are sensitive to the presence of extreme values. Robust estimators have been developed that are less biased by…
A note on variance estimation in random effects meta-regression.
Sidik, Kurex; Jonkman, Jeffrey N
2005-01-01
For random effects meta-regression inference, variance estimation for the parameter estimates is discussed. Because estimated weights are used for meta-regression analysis in practice, the assumed or estimated covariance matrix used in meta-regression is not strictly correct, due to possible errors in estimating the weights. Therefore, this note investigates the use of a robust variance estimation approach for obtaining variances of the parameter estimates in random effects meta-regression inference. This method treats the assumed covariance matrix of the effect measure variables as a working covariance matrix. Using an example of meta-analysis data from clinical trials of a vaccine, the robust variance estimation approach is illustrated in comparison with two other methods of variance estimation. A simulation study is presented, comparing the three methods of variance estimation in terms of bias and coverage probability. We find that, despite the seeming suitability of the robust estimator for random effects meta-regression, the improved variance estimator of Knapp and Hartung (2003) yields the best performance among the three estimators, and thus may provide the best protection against errors in the estimated weights.
Multilayer Perceptron for Robust Nonlinear Interval Regression Analysis Using Genetic Algorithms
2014-01-01
On the basis of fuzzy regression, computational models in intelligence such as neural networks have the capability to be applied to nonlinear interval regression analysis for dealing with uncertain and imprecise data. When training data are not contaminated by outliers, computational models perform well by including almost all given training data in the data interval. Nevertheless, since training data are often corrupted by outliers, robust learning algorithms employed to resist outliers for interval regression analysis have been an interesting area of research. Several approaches involving computational intelligence are effective for resisting outliers, but the required parameters for these approaches are related to whether the collected data contain outliers or not. Since it seems difficult to prespecify the degree of contamination beforehand, this paper uses multilayer perceptron to construct the robust nonlinear interval regression model using the genetic algorithm. Outliers beyond or beneath the data interval will impose slight effect on the determination of data interval. Simulation results demonstrate that the proposed method performs well for contaminated datasets. PMID:25110755
Multilayer perceptron for robust nonlinear interval regression analysis using genetic algorithms.
Hu, Yi-Chung
2014-01-01
On the basis of fuzzy regression, computational models in intelligence such as neural networks have the capability to be applied to nonlinear interval regression analysis for dealing with uncertain and imprecise data. When training data are not contaminated by outliers, computational models perform well by including almost all given training data in the data interval. Nevertheless, since training data are often corrupted by outliers, robust learning algorithms employed to resist outliers for interval regression analysis have been an interesting area of research. Several approaches involving computational intelligence are effective for resisting outliers, but the required parameters for these approaches are related to whether the collected data contain outliers or not. Since it seems difficult to prespecify the degree of contamination beforehand, this paper uses multilayer perceptron to construct the robust nonlinear interval regression model using the genetic algorithm. Outliers beyond or beneath the data interval will impose slight effect on the determination of data interval. Simulation results demonstrate that the proposed method performs well for contaminated datasets.
Piovesan, Davide; Pierobon, Alberto; DiZio, Paul; Lackner, James R.
2012-01-01
This study presents and validates a Time-Frequency technique for measuring 2-dimensional multijoint arm stiffness throughout a single planar movement as well as during static posture. It is proposed as an alternative to current regressive methods which require numerous repetitions to obtain average stiffness on a small segment of the hand trajectory. The method is based on the analysis of the reassigned spectrogram of the arm's response to impulsive perturbations and can estimate arm stiffness on a trial-by-trial basis. Analytic and empirical methods are first derived and tested through modal analysis on synthetic data. The technique's accuracy and robustness are assessed by modeling the estimation of stiffness time profiles changing at different rates and affected by different noise levels. Our method obtains results comparable with two well-known regressive techniques. We also test how the technique can identify the viscoelastic component of non-linear and higher than second order systems with a non-parametrical approach. The technique proposed here is very impervious to noise and can be used easily for both postural and movement tasks. Estimations of stiffness profiles are possible with only one perturbation, making our method a useful tool for estimating limb stiffness during motor learning and adaptation tasks, and for understanding the modulation of stiffness in individuals with neurodegenerative diseases. PMID:22448233
Granato, Gregory E.
2006-01-01
The Kendall-Theil Robust Line software (KTRLine-version 1.0) is a Visual Basic program that may be used with the Microsoft Windows operating system to calculate parameters for robust, nonparametric estimates of linear-regression coefficients between two continuous variables. The KTRLine software was developed by the U.S. Geological Survey, in cooperation with the Federal Highway Administration, for use in stochastic data modeling with local, regional, and national hydrologic data sets to develop planning-level estimates of potential effects of highway runoff on the quality of receiving waters. The Kendall-Theil robust line was selected because this robust nonparametric method is resistant to the effects of outliers and nonnormality in residuals that commonly characterize hydrologic data sets. The slope of the line is calculated as the median of all possible pairwise slopes between points. The intercept is calculated so that the line will run through the median of input data. A single-line model or a multisegment model may be specified. The program was developed to provide regression equations with an error component for stochastic data generation because nonparametric multisegment regression tools are not available with the software that is commonly used to develop regression models. The Kendall-Theil robust line is a median line and, therefore, may underestimate total mass, volume, or loads unless the error component or a bias correction factor is incorporated into the estimate. Regression statistics such as the median error, the median absolute deviation, the prediction error sum of squares, the root mean square error, the confidence interval for the slope, and the bias correction factor for median estimates are calculated by use of nonparametric methods. These statistics, however, may be used to formulate estimates of mass, volume, or total loads. The program is used to read a two- or three-column tab-delimited input file with variable names in the first row and data in subsequent rows. The user may choose the columns that contain the independent (X) and dependent (Y) variable. A third column, if present, may contain metadata such as the sample-collection location and date. The program screens the input files and plots the data. The KTRLine software is a graphical tool that facilitates development of regression models by use of graphs of the regression line with data, the regression residuals (with X or Y), and percentile plots of the cumulative frequency of the X variable, Y variable, and the regression residuals. The user may individually transform the independent and dependent variables to reduce heteroscedasticity and to linearize data. The program plots the data and the regression line. The program also prints model specifications and regression statistics to the screen. The user may save and print the regression results. The program can accept data sets that contain up to about 15,000 XY data points, but because the program must sort the array of all pairwise slopes, the program may be perceptibly slow with data sets that contain more than about 1,000 points.
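The core of the Kendall-Theil line described above (slope as the median of all pairwise slopes, intercept chosen so the line passes through the medians) can be written in a few lines; scipy.stats.theilslopes provides an equivalent ready-made version with confidence limits. This sketch only reproduces the single-line estimator, not the multisegment modeling, transformations, or bias-correction statistics of the KTRLine program.

```python
import numpy as np

def kendall_theil_line(x, y):
    """Median-of-pairwise-slopes line (Kendall-Theil / Theil-Sen sketch)."""
    slopes = []
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            if x[j] != x[i]:
                slopes.append((y[j] - y[i]) / (x[j] - x[i]))
    slope = np.median(slopes)
    intercept = np.median(y) - slope * np.median(x)   # line runs through the medians
    return slope, intercept

rng = np.random.default_rng(13)
x = rng.uniform(0, 10, 40)
y = 2.0 + 1.5 * x + rng.normal(0, 0.5, 40)
y[:3] += 20                                           # outliers barely move the fit
print("slope, intercept:", kendall_theil_line(x, y))
```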
Robust Methods for Moderation Analysis with a Two-Level Regression Model.
Yang, Miao; Yuan, Ke-Hai
2016-01-01
Moderation analysis has many applications in social sciences. Most widely used estimation methods for moderation analysis assume that errors are normally distributed and homoscedastic. When these assumptions are not met, the results from a classical moderation analysis can be misleading. For more reliable moderation analysis, this article proposes two robust methods with a two-level regression model when the predictors do not contain measurement error. One method is based on maximum likelihood with Student's t distribution and the other is based on M-estimators with Huber-type weights. An algorithm for obtaining the robust estimators is developed. Consistent estimates of standard errors of the robust estimators are provided. The robust approaches are compared against normal-distribution-based maximum likelihood (NML) with respect to power and accuracy of parameter estimates through a simulation study. Results show that the robust approaches outperform NML under various distributional conditions. Application of the robust methods is illustrated through a real data example. An R program is developed and documented to facilitate the application of the robust methods.
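A hedged sketch of the M-estimation idea (not the authors' two-level model): a single-level moderation model y ~ x + m + x*m fitted with a Huber-type M-estimator via statsmodels, showing how Huber weights temper heavy-tailed errors relative to OLS.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 300
x = rng.normal(size=n)
m = rng.normal(size=n)
err = rng.standard_t(df=3, size=n)            # heavy-tailed errors
y = 0.5 + 0.4 * x + 0.3 * m + 0.25 * x * m + err

X = sm.add_constant(np.column_stack([x, m, x * m]))
ols = sm.OLS(y, X).fit()
huber = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()  # Huber-type weights
print("OLS interaction:   %.3f (se %.3f)" % (ols.params[3], ols.bse[3]))
print("Huber interaction: %.3f (se %.3f)" % (huber.params[3], huber.bse[3]))
```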
Robust neural network with applications to credit portfolio data analysis.
Feng, Yijia; Li, Runze; Sudjianto, Agus; Zhang, Yiyun
2010-01-01
In this article, we study nonparametric conditional quantile estimation via a neural network structure. We propose an estimation method that combines quantile regression and neural networks (robust neural network, RNN). It provides good smoothing performance in the presence of outliers and can be used to construct prediction bands. A Majorization-Minimization (MM) algorithm is developed for optimization. A Monte Carlo simulation study is conducted to assess the performance of RNN. Comparison with other nonparametric regression methods (e.g., local linear regression and regression splines) in a real data application demonstrates the advantage of the newly proposed procedure.
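The check (pinball) loss underlying conditional quantile estimation can be sketched as follows; gradient boosting stands in for the proposed neural network here, purely to illustrate quantile-based prediction bands on synthetic, outlier-prone data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def pinball_loss(y_true, y_pred, tau):
    """Quantile (check) loss averaged over the sample."""
    r = y_true - y_pred
    return np.mean(np.maximum(tau * r, (tau - 1) * r))

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 6, 400))[:, None]
y = np.sin(x[:, 0]) + rng.standard_t(df=2, size=400) * 0.3   # heavy-tailed noise

lo = GradientBoostingRegressor(loss="quantile", alpha=0.1).fit(x, y)
hi = GradientBoostingRegressor(loss="quantile", alpha=0.9).fit(x, y)
med = GradientBoostingRegressor(loss="quantile", alpha=0.5).fit(x, y)
print("median-model pinball loss:", pinball_loss(y, med.predict(x), 0.5))
print("10%-90% band coverage:", np.mean((y >= lo.predict(x)) & (y <= hi.predict(x))))
```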
Robust analysis of trends in noisy tokamak confinement data using geodesic least squares regression
DOE Office of Scientific and Technical Information (OSTI.GOV)
Verdoolaege, G., E-mail: geert.verdoolaege@ugent.be; Laboratory for Plasma Physics, Royal Military Academy, B-1000 Brussels; Shabbir, A.
Regression analysis is a very common activity in fusion science for unveiling trends and parametric dependencies, but it can be a difficult matter. We have recently developed the method of geodesic least squares (GLS) regression that is able to handle errors in all variables, is robust against data outliers and uncertainty in the regression model, and can be used with arbitrary distribution models and regression functions. We here report on first results of application of GLS to estimation of the multi-machine scaling law for the energy confinement time in tokamaks, demonstrating improved consistency of the GLS results compared to standard least squares.
NASA Astrophysics Data System (ADS)
Adhikari, Nilanjan; Amin, Sk. Abdul; Saha, Achintya; Jha, Tarun
2018-03-01
Matrix metalloproteinase-2 (MMP-2) is a promising pharmacological target for designing potential anticancer drugs. MMP-2 plays critical roles in apoptosis by cleaving the DNA repair enzyme poly (ADP-ribose) polymerase (PARP). Moreover, MMP-2 expression triggers the vascular endothelial growth factor (VEGF), which has a positive influence on tumor size, invasion, and angiogenesis. There is therefore an urgent need to develop potential MMP-2 inhibitors with minimal toxicity and better pharmacokinetic properties. In this article, robust validated multi-quantitative structure-activity relationship (QSAR) modeling approaches were applied to a dataset of 222 MMP-2 inhibitors to explore the important structural and pharmacophoric requirements for higher MMP-2 inhibition. Different validated regression and classification-based QSARs, pharmacophore mapping and 3D-QSAR techniques were performed. These results were then challenged and subjected to further validation by using them to explain 24 in-house MMP-2 inhibitors, to further judge the reliability of the models. All models were individually validated internally as well as externally, and supported and validated one another. The results were further justified by molecular docking analysis. The modeling techniques adopted here help not only to explore the necessary structural and pharmacophoric requirements but also to validate and refine the design of potential MMP-2 inhibitors.
NASA Astrophysics Data System (ADS)
Mulyani, Sri; Andriyana, Yudhie; Sudartianto
2017-03-01
Mean regression is a statistical method that explains the relationship between the response variable and the predictor variable through the central tendency (mean) of the response variable. Parameter estimation in mean regression (with Ordinary Least Squares, OLS) becomes problematic when applied to data that are asymmetric, fat-tailed, or contain outliers. Hence, an alternative method is needed for that kind of data, for example quantile regression. Quantile regression is robust to outliers. It can explain the relationship between the response variable and the predictor variable not only at the central tendency of the data (the median) but also at various quantiles, in order to obtain more complete information about that relationship. In this study, quantile regression is developed with a nonparametric approach, namely smoothing splines. A nonparametric approach is used when the model is difficult to prespecify, that is, when the relation between the two variables follows an unknown function. We apply the proposed method to poverty data, estimating the Percentage of Poor People as the response variable with the Human Development Index (HDI) as the predictor variable.
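A hedged sketch along these lines: a B-spline basis (via patsy) combined with statsmodels' QuantReg, fitted at several quantiles. The variable names and the data-generating process are illustrative only, not the study's poverty dataset.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
hdi = rng.uniform(0.4, 0.8, 300)                     # predictor (assumed range)
poor = 60 - 55 * hdi + rng.gamma(2.0, 2.0, 300)      # skewed errors (assumed)
df = pd.DataFrame({"poor": poor, "hdi": hdi})

model = smf.quantreg("poor ~ bs(hdi, df=5)", data=df)   # spline basis + quantile fit
fits = {q: model.fit(q=q) for q in (0.25, 0.5, 0.75)}
for q, res in fits.items():
    pred = float(res.predict(pd.DataFrame({"hdi": [0.6]})))
    print(f"tau={q}: predicted % poor at HDI=0.6 -> {pred:.2f}")
```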
Data-driven discovery of partial differential equations
Rudy, Samuel H.; Brunton, Steven L.; Proctor, Joshua L.; Kutz, J. Nathan
2017-01-01
We propose a sparse regression method capable of discovering the governing partial differential equation(s) of a given system by time series measurements in the spatial domain. The regression framework relies on sparsity-promoting techniques to select the nonlinear and partial derivative terms of the governing equations that most accurately represent the data, bypassing a combinatorially large search through all possible candidate models. The method balances model complexity and regression accuracy by selecting a parsimonious model via Pareto analysis. Time series measurements can be made in an Eulerian framework, where the sensors are fixed spatially, or in a Lagrangian framework, where the sensors move with the dynamics. The method is computationally efficient, robust, and demonstrated to work on a variety of canonical problems spanning a number of scientific domains including Navier-Stokes, the quantum harmonic oscillator, and the diffusion equation. Moreover, the method is capable of disambiguating between potentially nonunique dynamical terms by using multiple time series taken with different initial data. Thus, for a traveling wave, the method can distinguish between a linear wave equation and the Korteweg–de Vries equation, for instance. The method provides a promising new technique for discovering governing equations and physical laws in parameterized spatiotemporal systems, where first-principles derivations are intractable. PMID:28508044
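A rough sketch of the sparse-regression idea (sequentially thresholded least squares over a library of candidate terms), recovering u_t = D u_xx from a simulated diffusion solution. Grid sizes, the candidate library, and the threshold are illustrative choices, not the paper's exact algorithm.

```python
import numpy as np

D, nx, nt, dx, dt = 0.5, 128, 400, 0.1, 0.004
x = np.arange(nx) * dx
u = np.exp(-(x - nx * dx / 2) ** 2)             # initial condition
U = [u.copy()]
for _ in range(nt):                             # explicit diffusion solver
    u = u + D * dt / dx**2 * (np.roll(u, 1) - 2 * u + np.roll(u, -1))
    U.append(u.copy())
U = np.array(U)

u_t = np.gradient(U, dt, axis=0).ravel()
u_x = np.gradient(U, dx, axis=1)
u_xx = np.gradient(u_x, dx, axis=1)
library = np.column_stack([U.ravel(), u_x.ravel(), u_xx.ravel(),
                           (U * u_x).ravel()])  # candidate terms
names = ["u", "u_x", "u_xx", "u*u_x"]

xi = np.linalg.lstsq(library, u_t, rcond=None)[0]
for _ in range(10):                             # hard-threshold small coefficients
    small = np.abs(xi) < 0.05
    xi[small] = 0.0
    big = ~small
    if big.any():
        xi[big] = np.linalg.lstsq(library[:, big], u_t, rcond=None)[0]
print({n: round(float(c), 3) for n, c in zip(names, xi) if c != 0.0})
```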
Robust functional regression model for marginal mean and subject-specific inferences.
Cao, Chunzheng; Shi, Jian Qing; Lee, Youngjo
2017-01-01
We introduce flexible robust functional regression models, using various heavy-tailed processes, including a Student t-process. We propose efficient algorithms in estimating parameters for the marginal mean inferences and in predicting conditional means as well as interpolation and extrapolation for the subject-specific inferences. We develop bootstrap prediction intervals (PIs) for conditional mean curves. Numerical studies show that the proposed model provides a robust approach against data contamination or distribution misspecification, and the proposed PIs maintain the nominal confidence levels. A real data application is presented as an illustrative example.
Income elasticity of health expenditures in Iran.
Zare, Hossein; Trujillo, Antonio J; Leidman, Eva; Buttorff, Christine
2013-09-01
Because of its policy implications, the income elasticity of health care expenditures is a subject of much debate. Governments may have an interest in subsidizing the care of those with low income. Using more than two decades of data from the Iran Household Expenditure and Income Survey, this article investigates the relationship between income and health care expenditure in urban and rural areas in Iran, a resource rich, upper-middle-income country. We implemented spline and quantile regression techniques to obtain a more robust description of the relationship of interest. This study finds non-uniform effects of income on health expenditures. Although the results show that health care is a necessity for all income brackets, spline regression estimates indicate that the income elasticity is lowest for the poorest Iranians in urban and rural areas. This suggests that they will show low flexibility in medical expenses as income fluctuates. Further, a quantile regression model assessing the effect of income at different level of medical expenditure suggests that households with lower medical expenses are less elastic.
Inferring epidemiological parameters from phylogenies using regression-ABC: A comparative study
Gascuel, Olivier
2017-01-01
Inferring epidemiological parameters such as the R0 from time-scaled phylogenies is a timely challenge. Most current approaches rely on likelihood functions, which raise specific issues that range from computing these functions to finding their maxima numerically. Here, we present a new regression-based Approximate Bayesian Computation (ABC) approach, which we base on a large variety of summary statistics intended to capture the information contained in the phylogeny and its corresponding lineage-through-time plot. The regression step involves the Least Absolute Shrinkage and Selection Operator (LASSO) method, which is a robust machine learning technique. It allows us to readily deal with the large number of summary statistics, while avoiding resorting to Markov Chain Monte Carlo (MCMC) techniques. To compare our approach to existing ones, we simulated target trees under a variety of epidemiological models and settings, and inferred parameters of interest using the same priors. We found that, for large phylogenies, the accuracy of our regression-ABC is comparable to that of likelihood-based approaches involving birth-death processes implemented in BEAST2. Our approach even outperformed these when inferring the host population size with a Susceptible-Infected-Removed epidemiological model. It also clearly outperformed a recent kernel-ABC approach when assuming a Susceptible-Infected epidemiological model with two host types. Lastly, by re-analyzing data from the early stages of the recent Ebola epidemic in Sierra Leone, we showed that regression-ABC provides more realistic estimates for the duration parameters (latency and infectiousness) than the likelihood-based method. Overall, ABC based on a large variety of summary statistics and a regression method able to perform variable selection and avoid overfitting is a promising approach to analyze large phylogenies. PMID:28263987
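A hedged toy sketch of the regression-ABC step: draw parameters from the prior, simulate, compute many summary statistics, regress parameters on summaries with LASSO, and apply the fitted regression to the observed summaries. The growth-curve simulator below is a stand-in, not the birth-death tree models used in the study.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(4)

def simulate(r0, n=200):
    """Toy exponential-growth 'epidemic' with multiplicative noise (illustrative)."""
    t = np.arange(n)
    return np.exp((r0 - 1.0) * 0.05 * t) * rng.lognormal(0, 0.2, n)

def summaries(y):
    return np.r_[np.log(y + 1).mean(), np.log(y + 1).std(),
                 np.polyfit(np.arange(y.size), np.log(y + 1), 1),
                 np.percentile(y, [10, 25, 50, 75, 90])]

theta = rng.uniform(1.0, 5.0, 3000)                  # prior draws of R0
S = np.array([summaries(simulate(r)) for r in theta])

reg = LassoCV(cv=5).fit(S, theta)                    # regression step of ABC
obs = summaries(simulate(2.5))                       # pseudo-observed data
print("estimated R0 ~", round(float(reg.predict(obs[None, :])[0]), 2))
```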
Robust nonlinear system identification: Bayesian mixture of experts using the t-distribution
NASA Astrophysics Data System (ADS)
Baldacchino, Tara; Worden, Keith; Rowson, Jennifer
2017-02-01
A novel variational Bayesian mixture of experts model for robust regression of bifurcating and piece-wise continuous processes is introduced. The mixture of experts model is a powerful model which probabilistically splits the input space, allowing different models to operate in the separate regions. However, current methods have no fail-safe against outliers. In this paper, a robust mixture of experts model is proposed which consists of Student-t mixture models at the gates and Student-t distributed experts, trained via Bayesian inference. The Student-t distribution has heavier tails than the Gaussian distribution, and so it is more robust to outliers, noise and non-normality in the data. Using both simulated data and real data obtained from the Z24 bridge, this robust mixture of experts performs better than its Gaussian counterpart when outliers are present. In particular, it provides robustness to outliers in two forms: unbiased estimates of the regression model parameters, and robustness to overfitting and overly complex models.
A Novel Continuous Blood Pressure Estimation Approach Based on Data Mining Techniques.
Miao, Fen; Fu, Nan; Zhang, Yuan-Ting; Ding, Xiao-Rong; Hong, Xi; He, Qingyun; Li, Ye
2017-11-01
Continuous blood pressure (BP) estimation using pulse transit time (PTT) is a promising method for unobtrusive BP measurement. However, the accuracy of this approach must be improved for it to be viable for a wide range of applications. This study proposes a novel continuous BP estimation approach that combines data mining techniques with a traditional mechanism-driven model. First, 14 features derived from simultaneous electrocardiogram and photoplethysmogram signals were extracted for beat-to-beat BP estimation. A genetic algorithm-based feature selection method was then used to select BP indicators for each subject. Multivariate linear regression and support vector regression were employed to develop the BP model. The accuracy and robustness of the proposed approach were validated for static, dynamic, and follow-up performance. Experimental results based on 73 subjects showed that the proposed approach exhibited excellent accuracy in static BP estimation, with a correlation coefficient and mean error of 0.852 and -0.001 ± 3.102 mmHg for systolic BP, and 0.790 and -0.004 ± 2.199 mmHg for diastolic BP. Similar performance was observed for dynamic BP estimation. The robustness results indicated that the estimation accuracy decreased somewhat one day after model construction but was relatively stable from one day to six months after construction. The proposed approach is superior to the state-of-the-art PTT-based model, with an approximately 2-mmHg reduction in the standard deviation at different time intervals, thus providing potentially novel insights for cuffless BP estimation.
Robust Face Recognition via Multi-Scale Patch-Based Matrix Regression.
Gao, Guangwei; Yang, Jian; Jing, Xiaoyuan; Huang, Pu; Hua, Juliang; Yue, Dong
2016-01-01
In many real-world applications such as smart card solutions, law enforcement, surveillance and access control, the limited training sample size is the most fundamental problem. By making use of the low-rank structural information of the reconstructed error image, the so-called nuclear norm-based matrix regression has been demonstrated to be effective for robust face recognition with continuous occlusions. However, the recognition performance of nuclear norm-based matrix regression degrades greatly in the face of the small sample size problem. An alternative solution to tackle this problem is performing matrix regression on each patch and then integrating the outputs from all patches. However, it is difficult to set an optimal patch size across different databases. To fully utilize the complementary information from different patch scales for the final decision, we propose a multi-scale patch-based matrix regression scheme based on which the ensemble of multi-scale outputs can be achieved optimally. Extensive experiments on benchmark face databases validate the effectiveness and robustness of our method, which outperforms several state-of-the-art patch-based face recognition algorithms.
NASA Astrophysics Data System (ADS)
Hart, Brian K.; Griffiths, Peter R.
1998-06-01
Partial least squares (PLS) regression has been evaluated as a robust calibration technique for over 100 hazardous air pollutants (HAPs) measured by open path Fourier transform infrared (OP/FT-IR) spectrometry. PLS has the advantage over the current recommended calibration method of classical least squares (CLS), in that it can look at the whole useable spectrum (700-1300 cm-1, 2000-2150 cm-1, and 2400-3000 cm-1), and detect several analytes simultaneously. Up to one hundred HAPs synthetically added to OP/FT-IR backgrounds have been simultaneously calibrated and detected using PLS. PLS also has the advantage in requiring less preprocessing of spectra than that which is required in CLS calibration schemes, allowing PLS to provide user independent real-time analysis of OP/FT-IR spectra.
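A minimal sketch of the multicomponent PLS calibration idea: simulated mixture spectra built from overlapping Gaussian bands plus noise, calibrated with scikit-learn's PLSRegression. Band positions, concentrations, and the number of latent variables are invented for illustration.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
wn = np.linspace(700, 1300, 300)                       # wavenumber grid (cm-1)
bands = [np.exp(-0.5 * ((wn - c) / 20) ** 2) for c in (800, 950, 1100)]
C = rng.uniform(0, 1, (120, 3))                        # concentrations of 3 analytes
X = C @ np.array(bands) + rng.normal(0, 0.01, (120, wn.size))

Xtr, Xte, Ctr, Cte = train_test_split(X, C, random_state=0)
pls = PLSRegression(n_components=6).fit(Xtr, Ctr)      # simultaneous calibration
rmse = np.sqrt(np.mean((pls.predict(Xte) - Cte) ** 2))
print("test RMSE over 3 analytes:", round(float(rmse), 4))
```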
Li, Yankun; Shao, Xueguang; Cai, Wensheng
2007-04-15
Consensus modeling of combining the results of multiple independent models to produce a single prediction avoids the instability of single model. Based on the principle of consensus modeling, a consensus least squares support vector regression (LS-SVR) method for calibrating the near-infrared (NIR) spectra was proposed. In the proposed approach, NIR spectra of plant samples were firstly preprocessed using discrete wavelet transform (DWT) for filtering the spectral background and noise, then, consensus LS-SVR technique was used for building the calibration model. With an optimization of the parameters involved in the modeling, a satisfied model was achieved for predicting the content of reducing sugar in plant samples. The predicted results show that consensus LS-SVR model is more robust and reliable than the conventional partial least squares (PLS) and LS-SVR methods.
NASA Astrophysics Data System (ADS)
Czernecki, Bartosz; Nowosad, Jakub; Jabłońska, Katarzyna
2018-04-01
Changes in the timing of plant phenological phases are important proxies in contemporary climate research. However, most of the commonly used traditional phenological observations do not give any coherent spatial information. While consistent spatial data can be obtained from airborne sensors and preprocessed gridded meteorological data, not many studies robustly benefit from these data sources. Therefore, the main aim of this study is to create and evaluate different statistical models for reconstructing, predicting, and improving the quality of phenological phase monitoring with the use of satellite and meteorological products. A quality-controlled dataset of the 13 BBCH plant phenophases in Poland was collected for the period 2007-2014. For each phenophase, statistical models were built using the most commonly applied regression-based machine learning techniques, such as multiple linear regression, lasso, principal component regression, generalized boosted models, and random forest. The quality of the models was estimated using k-fold cross-validation. The obtained results showed varying potential for coupling meteorologically derived indices with remote sensing products in terms of phenological modeling; however, applying both data sources improves model accuracy by 0.6 to 4.6 days in terms of RMSE. It is shown that a robust prediction of early phenological phases is mostly related to meteorological indices, whereas for autumn phenophases there is a stronger information signal provided by satellite-derived vegetation metrics. Choosing a specific set of predictors and applying robust preprocessing procedures is more important for the final results than the selection of a particular statistical model. The average RMSE for the best models across all phenophases is 6.3 days, while individual RMSEs vary seasonally from 3.5 to 10 days. The models give a reliable proxy for ground observations, with RMSE below 5 days for early spring and late spring phenophases. For other phenophases, RMSEs are higher, rising to 9-10 days in the case of the earliest spring phenophases.
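A hedged sketch of the model-comparison step: several regression learners scored with k-fold cross-validated RMSE on synthetic data. The predictors and response are simulated stand-ins for the meteorological and satellite indices used in the study.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LassoCV
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score, KFold

rng = np.random.default_rng(6)
n = 250
X = rng.normal(size=(n, 10))                    # e.g. degree-days, NDVI metrics (assumed)
doy = 120 + 4 * X[:, 0] - 3 * X[:, 1] + 2 * X[:, 0] * X[:, 2] + rng.normal(0, 4, n)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
for name, model in [("linear", LinearRegression()),
                    ("lasso", LassoCV(cv=5)),
                    ("random forest", RandomForestRegressor(n_estimators=300,
                                                            random_state=0))]:
    scores = cross_val_score(model, X, doy, cv=cv,
                             scoring="neg_root_mean_squared_error")
    print(f"{name:14s} CV RMSE = {-scores.mean():.2f} days")
```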
Variation in reaction norms: Statistical considerations and biological interpretation.
Morrissey, Michael B; Liefting, Maartje
2016-09-01
Analysis of reaction norms, the functions by which the phenotype produced by a given genotype depends on the environment, is critical to studying many aspects of phenotypic evolution. Different techniques are available for quantifying different aspects of reaction norm variation. We examine what biological inferences can be drawn from some of the more readily applicable analyses for studying reaction norms. We adopt a strongly biologically motivated view, but draw on statistical theory to highlight strengths and drawbacks of different techniques. In particular, consideration of some formal statistical theory leads to revision of some recently, and forcefully, advocated opinions on reaction norm analysis. We clarify what simple analysis of the slope between mean phenotype in two environments can tell us about reaction norms, explore the conditions under which polynomial regression can provide robust inferences about reaction norm shape, and explore how different existing approaches may be used to draw inferences about variation in reaction norm shape. We show how mixed model-based approaches can provide more robust inferences than more commonly used multistep statistical approaches, and derive new metrics of the relative importance of variation in reaction norm intercepts, slopes, and curvatures. © 2016 The Author(s). Evolution © 2016 The Society for the Study of Evolution.
Nagel, O G; Molina, M P; Basílico, J C; Zapata, M L; Althaus, R L
2009-06-01
To use experimental design techniques and a multiple logistic regression model to optimize a microbiological inhibition test with dichotomous response for the detection of Penicillin G in milk. A 2³ × 2² robust experimental design with two replications was used. The effects of three control factors (V: culture medium volume, S: spore concentration of Geobacillus stearothermophilus, I: indicator concentration), two noise factors (Dt: diffusion time, Ip: incubation period) and their interactions were studied. The V, S, Dt, Ip factors and V × S, V × Ip, S × Ip interactions showed significant effects. The use of 100 µl culture medium volume, 2 × 10⁵ spores ml⁻¹, 60 min diffusion time and 3 h incubation period is recommended. Under these conditions, the penicillin detection limit was 3.9 µg l⁻¹, similar to the maximum residue limit (MRL). Of the two noise factors studied, the incubation period can be controlled by means of the culture medium volume and spore concentration. We were able to optimize bioassays of dichotomous response using an experimental design and logistic regression model for the detection of residues at the level of the MRL, aiding in the avoidance of health problems in the consumer.
Estimation of bias and variance of measurements made from tomography scans
NASA Astrophysics Data System (ADS)
Bradley, Robert S.
2016-09-01
Tomographic imaging modalities are being increasingly used to quantify internal characteristics of objects for a wide range of applications, from medical imaging to materials science research. However, such measurements are typically presented without an assessment being made of their associated variance or confidence interval. In particular, noise in raw scan data places a fundamental lower limit on the variance and bias of measurements made on the reconstructed 3D volumes. In this paper, the simulation-extrapolation technique, which was originally developed for statistical regression, is adapted to estimate the bias and variance for measurements made from a single scan. The application to x-ray tomography is considered in detail and it is demonstrated that the technique can also allow the robustness of automatic segmentation strategies to be compared.
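A minimal sketch of the simulation-extrapolation (SIMEX) idea: a measurement (here the variance of a noisy set of values) is recomputed after adding extra noise at several known levels, a quadratic trend is fitted versus the added-noise level, and extrapolating to "minus one unit of noise" estimates the noise-free value. The data and noise level are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
true_signal = rng.normal(0, 2.0, 5000)            # "true" feature values
sigma_noise = 1.0                                 # known measurement-noise level
observed = true_signal + rng.normal(0, sigma_noise, true_signal.size)

lambdas = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
stats = []
for lam in lambdas:                               # add extra noise, re-measure
    extra = rng.normal(0, np.sqrt(lam) * sigma_noise, (50, observed.size))
    stats.append(np.mean(np.var(observed + extra, axis=1)))

coef = np.polyfit(lambdas, stats, 2)              # quadratic extrapolant
simex_estimate = np.polyval(coef, -1.0)           # extrapolate to lambda = -1
print("naive variance:", round(float(np.var(observed)), 3),
      " SIMEX-corrected:", round(float(simex_estimate), 3),
      " true:", round(float(np.var(true_signal)), 3))
```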
Practical aspects of estimating energy components in rodents
van Klinken, Jan B.; van den Berg, Sjoerd A. A.; van Dijk, Ko Willems
2013-01-01
Recently there has been an increasing interest in exploiting computational and statistical techniques for the purpose of component analysis of indirect calorimetry data. Using these methods it becomes possible to dissect daily energy expenditure into its components and to assess the dynamic response of the resting metabolic rate (RMR) to nutritional and pharmacological manipulations. To perform robust component analysis, however, is not straightforward and typically requires the tuning of parameters and the preprocessing of data. Moreover the degree of accuracy that can be attained by these methods depends on the configuration of the system, which must be properly taken into account when setting up experimental studies. Here, we review the methods of Kalman filtering, linear, and penalized spline regression, and minimal energy expenditure estimation in the context of component analysis and discuss their results on high resolution datasets from mice and rats. In addition, we investigate the effect of the sample time, the accuracy of the activity sensor, and the washout time of the chamber on the estimation accuracy. We found that on the high resolution data there was a strong correlation between the results of Kalman filtering and penalized spline (P-spline) regression, except for the activity respiratory quotient (RQ). For low resolution data the basal metabolic rate (BMR) and resting RQ could still be estimated accurately with P-spline regression, having a strong correlation with the high resolution estimate (R2 > 0.997; sample time of 9 min). In contrast, the thermic effect of food (TEF) and activity related energy expenditure (AEE) were more sensitive to a reduction in the sample rate (R2 > 0.97). In conclusion, for component analysis on data generated by single channel systems with continuous data acquisition both Kalman filtering and P-spline regression can be used, while for low resolution data from multichannel systems P-spline regression gives more robust results. PMID:23641217
Gan, Yanjun; Duan, Qingyun; Gong, Wei; ...
2014-01-01
Sensitivity analysis (SA) is a commonly used approach for identifying important parameters that dominate model behaviors. We use a newly developed software package, a Problem Solving environment for Uncertainty Analysis and Design Exploration (PSUADE), to evaluate the effectiveness and efficiency of ten widely used SA methods, including seven qualitative and three quantitative ones. All SA methods are tested using a variety of sampling techniques to screen out the most sensitive (i.e., important) parameters from the insensitive ones. The Sacramento Soil Moisture Accounting (SAC-SMA) model, which has thirteen tunable parameters, is used for illustration. The South Branch Potomac River basin near Springfield, West Virginia in the U.S. is chosen as the study area. The key findings from this study are: (1) For qualitative SA methods, Correlation Analysis (CA), Regression Analysis (RA), and Gaussian Process (GP) screening methods are shown to be not effective in this example. Morris One-At-a-Time (MOAT) screening is the most efficient, needing only 280 samples to identify the most important parameters, but it is the least robust method. Multivariate Adaptive Regression Splines (MARS), Delta Test (DT) and Sum-Of-Trees (SOT) screening methods need about 400–600 samples for the same purpose. Monte Carlo (MC), Orthogonal Array (OA) and Orthogonal Array based Latin Hypercube (OALH) are appropriate sampling techniques for them; (2) For quantitative SA methods, at least 2777 samples are needed for the Fourier Amplitude Sensitivity Test (FAST) to identify parameter main effects. The McKay method needs about 360 samples to evaluate the main effect, and more than 1000 samples to assess the two-way interaction effect. OALH and LPτ (LPTAU) sampling techniques are more appropriate for the McKay method. For the Sobol' method, the minimum samples needed are 1050 to compute the first-order and total sensitivity indices correctly. These comparisons show that qualitative SA methods are more efficient but less accurate and robust than quantitative ones.
NASA Astrophysics Data System (ADS)
Gusriani, N.; Firdaniza
2018-03-01
The existence of outliers in multiple linear regression analysis causes the Gaussian assumption to be violated. If the least squares method is nevertheless applied to such data, it will produce a model that cannot represent most of the data. We therefore need a regression method that is robust against outliers. This paper compares the Minimum Covariance Determinant (MCD) method and the TELBS method on secondary data on phytoplankton productivity, which contain outliers. Based on the robust coefficient of determination, the MCD method produces a better model than the TELBS method.
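A hedged sketch of MCD-based robust regression: the Minimum Covariance Determinant estimate of the joint (X, y) location and scatter is partitioned to obtain regression coefficients resistant to the outliers that distort OLS. The data are synthetic; this is not the TELBS comparison itself.

```python
import numpy as np
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(8)
n, p = 200, 2
X = rng.normal(size=(n, p))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(0, 0.3, n)
y[:15] += 25.0                                   # contaminate 7.5% of responses

Z = np.column_stack([X, y])
mcd = MinCovDet(random_state=0).fit(Z)           # robust location/scatter of (X, y)
S, mu = mcd.covariance_, mcd.location_
beta = np.linalg.solve(S[:p, :p], S[:p, p])      # Sxx^{-1} Sxy
intercept = mu[p] - mu[:p] @ beta
ols_beta = np.linalg.lstsq(np.column_stack([np.ones(n), X]), y, rcond=None)[0]
print("MCD-based slopes:", np.round(beta, 2), " intercept:", round(float(intercept), 2))
print("OLS slopes:      ", np.round(ols_beta[1:], 2))
```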
Igne, Benoît; Drennen, James K; Anderson, Carl A
2014-01-01
Changes in raw materials and process wear and tear can have significant effects on the prediction error of near-infrared calibration models. When the variability that is present during routine manufacturing is not included in the calibration, test, and validation sets, the long-term performance and robustness of the model will be limited. Nonlinearity is a major source of interference. In near-infrared spectroscopy, nonlinearity can arise from light path-length differences that can come from differences in particle size or density. The usefulness of support vector machine (SVM) regression to handle nonlinearity and improve the robustness of calibration models in scenarios where the calibration set did not include all the variability present in test was evaluated. Compared to partial least squares (PLS) regression, SVM regression was less affected by physical (particle size) and chemical (moisture) differences. The linearity of the SVM predicted values was also improved. Nevertheless, although visualization and interpretation tools have been developed to enhance the usability of SVM-based methods, work is yet to be done to provide chemometricians in the pharmaceutical industry with a regression method that can supplement PLS-based methods.
NASA Astrophysics Data System (ADS)
Eyarkai Nambi, Vijayaram; Thangavel, Kuladaisamy; Manickavasagan, Annamalai; Shahir, Sultan
2017-01-01
Prediction of ripeness level in climacteric fruits is essential for post-harvest handling. An index capable of predicting ripening level with minimal inputs would be highly beneficial to handlers, processors and researchers in the fruit industry. A study was conducted with Indian mango cultivars to develop a ripeness index and an associated model. Changes in physicochemical, colour and textural properties were measured throughout the ripening period, and the period was classified into five stages (unripe, early ripe, partially ripe, ripe and over ripe). Multivariate regression techniques such as partial least squares regression, principal component regression and multiple linear regression were compared and evaluated for ripeness prediction. A multiple linear regression model with 12 parameters was found to be more suitable for ripening prediction. A variable reduction method was adopted to simplify the developed model, and better prediction was achieved with either 2 or 3 variables (total soluble solids, colour and acidity). Cross-validation was done to increase robustness, and it was found that the proposed ripening index was effective in predicting ripening stages. The three-variable model would be suitable for commercial applications where reasonable accuracies are sufficient, whereas the 12-variable model can be used to obtain more precise results in research and development applications.
The Influential Effect of Blending, Bump, Changing Period, and Eclipsing Cepheids on the Leavitt Law
NASA Astrophysics Data System (ADS)
García-Varela, A.; Muñoz, J. R.; Sabogal, B. E.; Vargas Domínguez, S.; Martínez, J.
2016-06-01
The investigation of the nonlinearity of the Leavitt law (LL) is a topic that began more than seven decades ago, when some of the studies in this field found that the LL has a break at about 10 days. The goal of this work is to investigate a possible statistical cause of this nonlinearity. By applying linear regressions to OGLE-II and OGLE-IV data, we find that to obtain the LL by using linear regression, robust techniques to deal with influential points and/or outliers are needed instead of the ordinary least-squares regression traditionally used. In particular, by using M- and MM-regressions we establish firmly and without doubt the linearity of the LL in the Large Magellanic Cloud, without rejecting or excluding Cepheid data from the analysis. This implies that light curves of Cepheids suggesting blending, bumps, eclipses, or period changes do not affect the LL for this galaxy. For the Small Magellanic Cloud, when including Cepheids of this kind, it is not possible to find an adequate model, probably because of the geometry of the galaxy. In that case, a possible influence of these stars could exist.
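A hedged sketch of the robust period-luminosity fit: an M-type regression with Huber and Tukey-bisquare weights via statsmodels RLM on synthetic Cepheid-like data with a few outliers. Full MM-estimation (as in R's robustbase::lmrob) is not reproduced here; the bisquare M-fit stands in for it.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
log_p = rng.uniform(0.3, 1.8, 500)                     # log10 period (days)
mag = 17.0 - 2.8 * log_p + rng.normal(0, 0.15, 500)    # illustrative LL slope
mag[:20] += rng.uniform(1.0, 3.0, 20)                  # blended/eclipsing-like outliers

X = sm.add_constant(log_p)
ols = sm.OLS(mag, X).fit()
hub = sm.RLM(mag, X, M=sm.robust.norms.HuberT()).fit()
bis = sm.RLM(mag, X, M=sm.robust.norms.TukeyBiweight()).fit()
for name, res in [("OLS", ols), ("Huber M", hub), ("bisquare M", bis)]:
    print(f"{name:11s} slope = {res.params[1]:+.3f}")
```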
Forecasting space weather: Can new econometric methods improve accuracy?
NASA Astrophysics Data System (ADS)
Reikard, Gordon
2011-06-01
Space weather forecasts are currently used in areas ranging from navigation and communication to electric power system operations. The relevant forecast horizons can range from as little as 24 h to several days. This paper analyzes the predictability of two major space weather measures using new time series methods, many of them derived from econometrics. The data sets are the Ap geomagnetic index and the solar radio flux at 10.7 cm. The methods tested include nonlinear regressions, neural networks, frequency domain algorithms, GARCH models (which utilize the residual variance), state transition models, and models that combine elements of several techniques. While combined models are complex, they can be programmed using modern statistical software. The data frequency is daily, and forecasting experiments are run over horizons ranging from 1 to 7 days. Two major conclusions stand out. First, the frequency domain method forecasts the Ap index more accurately than any time domain model, including both regressions and neural networks. This finding is very robust, and holds for all forecast horizons. Combining the frequency domain method with other techniques yields a further small improvement in accuracy. Second, the neural network forecasts the solar flux more accurately than any other method, although at short horizons (2 days or less) the regression and net yield similar results. The neural net does best when it includes measures of the long-term component in the data.
NASA Astrophysics Data System (ADS)
Sánchez, Clara I.; Hornero, Roberto; Mayo, Agustín; García, María
2009-02-01
Diabetic retinopathy is one of the leading causes of blindness and vision defects in developed countries. Early detection and diagnosis are crucial to avoid visual complications. Microaneurysms are the first ocular signs of this disease, and their detection is of paramount importance for the development of a computer-aided diagnosis technique that permits a prompt diagnosis. However, the detection of microaneurysms in retinal images is a difficult task due to the wide variability that these images usually present in screening programs. We propose a statistical approach based on mixture model-based clustering and logistic regression which is robust to changes in the appearance of retinal fundus images. The method is evaluated on the public database proposed by the Retinal Online Challenge in order to obtain an objective performance measure and to allow a comparative study with other proposed algorithms.
Qu, Mingkai; Wang, Yan; Huang, Biao; Zhao, Yongcun
2018-06-01
Traditional source apportionment models, such as absolute principal component scores-multiple linear regression (APCS-MLR), are usually susceptible to outliers, which may be widely present in regional geochemical datasets. Furthermore, these models are built merely on variable space instead of geographical space and thus cannot effectively capture the local spatial characteristics of each source contribution. To overcome these limitations, a new receptor model, robust absolute principal component scores-robust geographically weighted regression (RAPCS-RGWR), was proposed based on the traditional APCS-MLR model. The new method was then applied to the source apportionment of soil metal elements in a region of Wuhan City, China as a case study. Evaluations revealed that: (i) the RAPCS-RGWR model had better performance than the APCS-MLR model in the identification of the major sources of soil metal elements, and (ii) source contributions estimated by the RAPCS-RGWR model were closer to the true soil metal concentrations than those estimated by the APCS-MLR model. It is shown that the proposed RAPCS-RGWR model is a more effective source apportionment method than APCS-MLR (i.e., a non-robust, global model) in dealing with regional geochemical datasets. Copyright © 2018 Elsevier B.V. All rights reserved.
Rafindadi, Abdulkadir Abdulrashid; Yusof, Zarinah; Zaman, Khalid; Kyophilavong, Phouphet; Akhmat, Ghulam
2014-10-01
The objective of the study is to examine the relationship between air pollution, fossil fuel energy consumption, water resources, and natural resource rents in a panel of selected Asia-Pacific countries over the period 1975-2012. The study includes a number of variables in the model for robust analysis. The results of the cross-sectional analysis show that there is a significant relationship between air pollution, energy consumption, and water productivity in the individual countries of Asia-Pacific. However, the results for each country vary according to time-invariant shocks. For this purpose, the study employed panel least squares techniques, including panel least squares regression, panel fixed effects regression, and panel two-stage least squares regression. In general, all the panel tests indicate that there is a significant and positive relationship between air pollution, energy consumption, and water resources in the region. Fossil fuel energy consumption has a dominant impact on changes in air pollution in the region.
Lin, Zhaozhou; Zhang, Qiao; Liu, Ruixin; Gao, Xiaojie; Zhang, Lu; Kang, Bingya; Shi, Junhan; Wu, Zidan; Gui, Xinjing; Li, Xuelin
2016-01-25
To accurately, safely, and efficiently evaluate the bitterness of Traditional Chinese Medicines (TCMs), a robust predictor was developed using the robust partial least squares (RPLS) regression method based on data obtained from an electronic tongue (e-tongue) system. The data quality was verified by Grubbs' test. Moreover, potential outliers were detected based on both the standardized residual and the score distance calculated for each sample. The performance of RPLS on the dataset before and after outlier detection was compared to other state-of-the-art methods including multivariate linear regression, least squares support vector machine, and plain partial least squares regression. Both R² and the root-mean-square error (RMSE) of cross-validation (CV) were recorded for each model. With four latent variables, a robust RMSECV value of 0.3916, with bitterness values ranging from 0.63 to 4.78, was obtained for the RPLS model constructed on the dataset including outliers. Meanwhile, the RMSECV calculated for the models constructed by the other methods was larger than that of the RPLS model. After six outliers were excluded, the performance of all benchmark methods markedly improved, but the difference between the RPLS models constructed before and after outlier exclusion was negligible. In conclusion, the bitterness of TCM decoctions can be accurately evaluated with the RPLS model constructed using e-tongue data.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Teng, C; Ainsley, C; Teo, B
Purpose: In light of tumor regression and normal tissue changes, dose distributions can deviate undesirably from what was planned. As a consequence, replanning is sometimes necessary during treatment to ensure continued tumor coverage or to avoid overdosing organs at risk (OARs). Proton plans are generally thought to be less robust than photon plans because of the proton beam’s higher sensitivity to changes in tissue composition, suggesting also a higher likely replanning rate due to tumor regression. The purpose of this study is to compare dosimetric deviations of forward-calculated double scattering (DS) proton plans with those of IMRT plans upon tumor regression, and to assess their impact on clinical replanning decisions. Methods: Ten consecutive locally advanced NSCLC patients whose tumors shrank > 50% in volume and who received four or more CT scans during radiotherapy were analyzed. All the patients received proton radiotherapy (6660 cGy, 180 cGy/fx). Dosimetric robustness during therapy was characterized by changes in the planning objective metrics as well as by point-by-point root-mean-squared differences for the entire PTV, ITV, and OAR (heart, cord, esophagus, brachial plexus and lungs) DVHs. Results: Sixty-four pairs of DVHs were reviewed by three clinicians, who requested a replanning rate of 16.7% and 18.6% for DS and IMRT plans, respectively, with high agreement between providers. Robustness of clinical indicators was found to depend on the beam orientation and the dose level on the DVH curve. Proton dose increased most in OARs distal to the PTV along the beam path, but these changes were primarily in the mid to low dose levels. In contrast, the variation in IMRT plans occurred primarily in the high dose region. Conclusion: Robustness of clinical indicators depends on where on the DVH curves comparisons are made. Similar replanning rates were observed for DS and IMRT plans upon large tumor regression.
Gaussian process regression for tool wear prediction
NASA Astrophysics Data System (ADS)
Kong, Dongdong; Chen, Yongjie; Li, Ning
2018-05-01
To realize and accelerate the pace of intelligent manufacturing, this paper presents a novel tool wear assessment technique based on integrated radial basis function based kernel principal component analysis (KPCA_IRBF) and Gaussian process regression (GPR) for accurately monitoring the in-process tool wear parameter (flank wear width) in real time. KPCA_IRBF is a new nonlinear dimension-increment technique, first proposed here for feature fusion. The tool wear prediction and the corresponding confidence interval are both provided by the GPR model. Moreover, GPR performs better than artificial neural networks (ANN) and support vector machines (SVM) in prediction accuracy, since Gaussian noise can be modeled quantitatively in the GPR model. However, the presence of noise seriously affects the stability of the confidence interval. In this work, the proposed KPCA_IRBF technique helps to remove the noise and weaken its negative effects, so that the confidence interval is greatly compressed and smoothed, which is conducive to monitoring the tool wear accurately. Moreover, the kernel parameter in KPCA_IRBF can be selected from a much larger region than with the conventional KPCA_RBF technique, which helps to improve the efficiency of model construction. Ten sets of cutting tests are conducted to validate the effectiveness of the presented tool wear assessment technique. The experimental results show that the in-process flank wear width of tool inserts can be monitored accurately by the presented technique, which is robust under a variety of cutting conditions. This study lays the foundation for tool wear monitoring in real industrial settings.
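A minimal sketch of the GPR part of such a pipeline: a Gaussian process with an RBF plus white-noise kernel predicting a wear-like response together with a confidence interval. The KPCA_IRBF feature-fusion step is not reproduced; data and kernel settings are illustrative.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(10)
t = np.linspace(0, 1, 60)[:, None]                  # normalized cutting time
wear = 0.05 + 0.25 * t[:, 0] ** 1.5 + rng.normal(0, 0.01, 60)   # flank wear (mm)

kernel = 1.0 * RBF(length_scale=0.2) + WhiteKernel(noise_level=1e-4)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(t, wear)
t_new = np.linspace(0, 1.1, 5)[:, None]
mean, std = gpr.predict(t_new, return_std=True)     # mean and predictive std
for ti, m, s in zip(t_new[:, 0], mean, std):
    print(f"t={ti:.2f}: wear {m:.3f} mm, 95% CI +/- {1.96 * s:.3f}")
```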
NASA Astrophysics Data System (ADS)
Lockwood, M.; Owens, M. J.; Barnard, L.; Usoskin, I. G.
2016-11-01
We use sunspot-group observations from the Royal Greenwich Observatory (RGO) to investigate the effects of intercalibrating data from observers with different visual acuities. The tests are made by counting the number of groups [RB] above a variable cut-off threshold of observed total whole spot area (uncorrected for foreshortening) to simulate what a lower-acuity observer would have seen. The synthesised annual means of RB are then re-scaled to the full observed RGO group number [RA] using a variety of regression techniques. It is found that a very high correlation between RA and RB (r_{AB} > 0.98) does not prevent large errors in the intercalibration (for example sunspot-maximum values can be over 30 % too large even for such levels of r_{AB}). In generating the backbone sunspot number [R_{BB}], Svalgaard and Schatten ( Solar Phys., 2016) force regression fits to pass through the scatter-plot origin, which generates unreliable fits (the residuals do not form a normal distribution) and causes sunspot-cycle amplitudes to be exaggerated in the intercalibrated data. It is demonstrated that the use of Quantile-Quantile ("Q-Q") plots to test for a normal distribution is a useful indicator of erroneous and misleading regression fits. Ordinary least-squares linear fits, not forced to pass through the origin, are sometimes reliable (although the optimum method used is shown to be different when matching peak and average sunspot-group numbers). However, other fits are only reliable if non-linear regression is used. From these results it is entirely possible that the inflation of solar-cycle amplitudes in the backbone group sunspot number as one goes back in time, relative to related solar-terrestrial parameters, is entirely caused by the use of inappropriate and non-robust regression techniques to calibrate the sunspot data.
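The calibration pitfall discussed above can be sketched on synthetic data (not the RGO sunspot-group counts): forcing a fit through the origin when the true relation has an offset biases the slope, and a residual normality check, a numerical stand-in for inspecting a Q-Q plot, flags the misfit even though the correlation between the two series remains very high.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
r_b = rng.uniform(1, 12, 200)                      # lower-acuity observer counts
r_a = 4.0 + 1.1 * r_b + rng.normal(0, 0.4, 200)    # true relation has an offset

slope_origin = np.sum(r_a * r_b) / np.sum(r_b ** 2)   # fit forced through origin
slope, intercept = np.polyfit(r_b, r_a, 1)            # ordinary LS with intercept
res_origin = r_a - slope_origin * r_b
res_free = r_a - (slope * r_b + intercept)
for name, res in [("through origin", res_origin), ("with intercept", res_free)]:
    stat, p = stats.normaltest(res)                    # Q-Q-style normality check
    print(f"{name:15s} residual normality p = {p:.3g}")
print("origin-forced slope:", round(slope_origin, 3), " free slope:", round(slope, 3))
```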
Ahmadi, Mehdi; Shahlaei, Mohsen
2015-01-01
P2X7 antagonist activity for a set of 49 molecules of the P2X7 receptor antagonists, derivatives of purine, was modeled with the aid of chemometric and artificial intelligence techniques. The activity of these compounds was estimated by means of combination of principal component analysis (PCA), as a well-known data reduction method, genetic algorithm (GA), as a variable selection technique, and artificial neural network (ANN), as a non-linear modeling method. First, a linear regression, combined with PCA, (principal component regression) was operated to model the structure–activity relationships, and afterwards a combination of PCA and ANN algorithm was employed to accurately predict the biological activity of the P2X7 antagonist. PCA preserves as much of the information as possible contained in the original data set. Seven most important PC's to the studied activity were selected as the inputs of ANN box by an efficient variable selection method, GA. The best computational neural network model was a fully-connected, feed-forward model with 7−7−1 architecture. The developed ANN model was fully evaluated by different validation techniques, including internal and external validation, and chemical applicability domain. All validations showed that the constructed quantitative structure–activity relationship model suggested is robust and satisfactory. PMID:26600858
Chen, Wansu; Shi, Jiaxiao; Qian, Lei; Azen, Stanley P
2014-06-26
To estimate relative risks or risk ratios for common binary outcomes, the most popular model-based methods are the robust (also known as modified) Poisson and the log-binomial regression. Of the two methods, it is believed that the log-binomial regression yields more efficient estimators because it is maximum likelihood based, while the robust Poisson model may be less affected by outliers. Evidence to support the robustness of robust Poisson models in comparison with log-binomial models is very limited. In this study a simulation was conducted to evaluate the performance of the two methods in several scenarios where outliers existed. The findings indicate that for data coming from a population where the relationship between the outcome and the covariate was in a simple form (e.g. log-linear), the two models yielded comparable biases and mean square errors. However, if the true relationship contained a higher order term, the robust Poisson models consistently outperformed the log-binomial models even when the level of contamination is low. The robust Poisson models are more robust (or less sensitive) to outliers compared to the log-binomial models when estimating relative risks or risk ratios for common binary outcomes. Users should be aware of the limitations when choosing appropriate models to estimate relative risks or risk ratios.
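A hedged sketch of the two estimators compared above: a log-binomial GLM and a "robust" (modified) Poisson GLM with a sandwich covariance, both estimating a risk ratio for a common binary outcome on simulated data. The link class name assumes a recent statsmodels release, and log-binomial fits can fail to converge on some datasets.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(12)
n = 2000
x = rng.binomial(1, 0.5, n)                        # exposure
p = 0.15 * np.where(x == 1, 2.0, 1.0)              # true risk ratio = 2
y = rng.binomial(1, p)
X = sm.add_constant(x)

log_binomial = sm.GLM(y, X, family=sm.families.Binomial(
    link=sm.families.links.Log())).fit()            # may not converge in general
robust_poisson = sm.GLM(y, X, family=sm.families.Poisson()).fit(cov_type="HC0")
print("log-binomial RR:  ", round(float(np.exp(log_binomial.params[1])), 3))
print("robust Poisson RR:", round(float(np.exp(robust_poisson.params[1])), 3))
```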
NASA Astrophysics Data System (ADS)
Chartosias, Marios
Acceptance of Carbon Fiber Reinforced Polymer (CFRP) structures requires a robust surface preparation method with improved process controls capable of ensuring high bond quality. Surface preparation in a production clean room environment prior to applying adhesive for bonding would minimize risk of contamination and reduce cost. Plasma treatment is a robust surface preparation process capable of being applied in a production clean room environment with process parameters that are easily controlled and documented. Repeatable and consistent processing is enabled through the development of a process parameter window utilizing techniques such as Design of Experiments (DOE) tailored to specific adhesive and substrate bonding applications. Insight from respective plasma treatment Original Equipment Manufacturers (OEMs) and screening tests determined critical process factors from non-factors and set the associated factor levels prior to execution of the DOE. Results from mode I Double Cantilever Beam (DCB) testing per ASTM D 5528 [1] standard and DOE statistical analysis software are used to produce a regression model and determine appropriate optimum settings for each factor.
Gradient descent for robust kernel-based regression
NASA Astrophysics Data System (ADS)
Guo, Zheng-Chu; Hu, Ting; Shi, Lei
2018-06-01
In this paper, we study the gradient descent algorithm generated by a robust loss function over a reproducing kernel Hilbert space (RKHS). The loss function is defined by a windowing function G and a scale parameter σ, and can include a wide range of commonly used robust losses for regression. There is still a gap between the theoretical analysis and the optimization process of empirical risk minimization based on this loss: the estimator needs to be globally optimal in the theoretical analysis, while the optimization method cannot ensure the global optimality of its solutions. In this paper, we aim to fill this gap by developing a novel theoretical analysis of the performance of estimators generated by the gradient descent algorithm. We demonstrate that with an appropriately chosen scale parameter σ, the gradient update with early stopping rules can approximate the regression function. Our error analysis leads to convergence in the standard L2 norm and the strong RKHS norm, both of which are optimal in the minimax sense. We show that the scale parameter σ plays an important role in providing robustness as well as fast convergence. Numerical experiments on synthetic examples and a real data set also support our theoretical results.
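A hedged illustration of kernel-based gradient descent with a robust, windowed loss: the residual is passed through a Gaussian window of scale sigma, so large outliers contribute almost no gradient, and early stopping plays the role of regularization. This is a sketch under assumed settings, not the paper's exact algorithm or parameter choices.

```python
import numpy as np

rng = np.random.default_rng(13)
n = 150
x = np.sort(rng.uniform(-3, 3, n))
y = np.sinc(x) + rng.normal(0, 0.05, n)
y[::15] += 2.0                                     # sparse large outliers

def rbf_kernel(a, b, gamma=2.0):
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

K = rbf_kernel(x, x)
alpha = np.zeros(n)
sigma, eta, T = 0.5, 1.0, 1000                     # window scale, step size, iterations
for _ in range(T):                                 # functional gradient descent in the RKHS
    r = K @ alpha - y
    w = np.exp(-r ** 2 / (2 * sigma ** 2))         # windowing function G downweights outliers
    alpha -= (eta / n) * (w * r)                   # stopping at T acts as regularization

clean = np.ones(n, dtype=bool)
clean[::15] = False
fit = K @ alpha
rmse = np.sqrt(np.mean((fit - np.sinc(x))[clean] ** 2))
print("RMSE against the true function on non-outlier points:", round(float(rmse), 4))
```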
The Evaluation on the Cadmium Net Concentration for Soil Ecosystems
Yao, Yu; Wang, Pei-Fang; Wang, Chao; Hou, Jun; Miao, Ling-Zhan
2017-01-01
Yixing, known as the “City of Ceramics”, is facing a new dilemma: a raw material crisis. Cadmium (Cd) exists in extremely high concentrations in soil due to the considerable input of industrial wastewater into the soil ecosystem. The in situ technique of diffusive gradients in thin film (DGT), the ex situ static equilibrium approach (HAc, EDTA and CaCl2), and the dissolved concentration in soil solution, as well as microwave digestion, were applied to predict the Cd bioavailability of soil, aiming to provide a robust and accurate method for Cd bioavailability evaluation in Yixing. Moreover, the typical local cash crops—paddy and zizania aquatica—were selected for Cd accumulation, aiming to select the ideal plants with tolerance to the soil Cd contamination. The results indicated that the biomasses of the two applied plants were sufficiently sensitive to reflect the stark regional differences of different sampling sites. The zizania aquatica could effectively reduce the total Cd concentration, as indicated by the high accumulation coefficients. However, the fact that the zizania aquatica has extremely high transfer coefficients, and its stem, as the edible part, might accumulate large amounts of Cd, led to the conclusion that zizania aquatica was not an ideal cash crop in Yixing. Furthermore, the labile Cd concentrations which were obtained by the DGT technique and dissolved in the soil solution showed a significant correlation with the Cd concentrations of the biota accumulation. However, the ex situ methods and the microwave digestion-obtained Cd concentrations showed a poor correlation with the accumulated Cd concentration in plant tissue. Correspondingly, the multiple linear regression models were built for fundamental analysis of the performance of different methods available for Cd bioavailability evaluation. The correlation coefficients of DGT obtained by the improved multiple linear regression model have not significantly improved compared to the coefficients obtained by the simple linear regression model. The results revealed that DGT was a robust measurement, which could obtain the labile Cd concentrations independent of the physicochemical features’ variation in the soil ecosystem. Consequently, these findings provide stronger evidence that DGT is an effective and ideal tool for labile Cd evaluation in Yixing. PMID:28287500
Mauer, Michael; Caramori, Maria Luiza; Fioretto, Paola; Najafian, Behzad
2015-06-01
Studies of structural-functional relationships have improved understanding of the natural history of diabetic nephropathy (DN). However, in order to consider structural end points for clinical trials, the robustness of the resultant models needs to be verified. This study examined whether structural-functional relationship models derived from a large cohort of type 1 diabetic (T1D) patients with a wide range of renal function are robust. The predictability of models derived from multiple regression analysis and piecewise linear regression analysis was also compared. T1D patients (n = 161) with research renal biopsies were divided into two equal groups matched for albumin excretion rate (AER). Models to explain AER and glomerular filtration rate (GFR) by classical DN lesions in one group (T1D-model, or T1D-M) were applied to the other group (T1D-test, or T1D-T) and regression analyses were performed. T1D-M-derived models explained 70 and 63% of AER variance and 32 and 21% of GFR variance in T1D-M and T1D-T, respectively, supporting the substantial robustness of the models. Piecewise linear regression analyses substantially improved predictability of the models with 83% of AER variance and 66% of GFR variance explained by classical DN glomerular lesions alone. These studies demonstrate that DN structural-functional relationship models are robust, and if appropriate models are used, glomerular lesions alone explain a major proportion of AER and GFR variance in T1D patients. © The Author 2014. Published by Oxford University Press on behalf of ERA-EDTA. All rights reserved.
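A minimal sketch of the piecewise (single-breakpoint) linear regression idea contrasted with an ordinary linear fit; the lesion score, slopes and breakpoint below are invented for illustration and do not reproduce the study's models.

```python
import numpy as np
from scipy.optimize import curve_fit

def hinge_model(x, b0, b1, b2, x0):
    """Linear below the breakpoint x0, with a different slope above it."""
    return b0 + b1 * x + b2 * np.maximum(x - x0, 0.0)

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 10, 120))            # hypothetical glomerular lesion score
y = np.where(x < 6, 0.3 * x, 0.3 * 6 + 1.5 * (x - 6)) + rng.normal(0, 0.3, x.size)

lin = np.polyfit(x, y, 1)                       # ordinary linear fit
popt, _ = curve_fit(hinge_model, x, y, p0=[0.0, 0.3, 1.0, 5.0])

def r_squared(y, yhat):
    return 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

print("linear R2:   ", round(r_squared(y, np.polyval(lin, x)), 3))
print("piecewise R2:", round(r_squared(y, hinge_model(x, *popt)), 3))
print("estimated breakpoint:", round(popt[3], 2))
```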
Robust Mediation Analysis Based on Median Regression
Yuan, Ying; MacKinnon, David P.
2014-01-01
Mediation analysis has many applications in psychology and the social sciences. The most prevalent methods typically assume that the error distribution is normal and homoscedastic. However, this assumption may rarely be met in practice, which can affect the validity of the mediation analysis. To address this problem, we propose robust mediation analysis based on median regression. Our approach is robust to various departures from the assumption of homoscedasticity and normality, including heavy-tailed, skewed, contaminated, and heteroscedastic distributions. Simulation studies show that under these circumstances, the proposed method is more efficient and powerful than standard mediation analysis. We further extend the proposed robust method to multilevel mediation analysis, and demonstrate through simulation studies that the new approach outperforms the standard multilevel mediation analysis. We illustrate the proposed method using data from a program designed to increase reemployment and enhance mental health of job seekers. PMID:24079925
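The following sketch illustrates the core idea, assuming a single mediator and simulated heavy-tailed data: both the a-path and the b-path are fit by median (0.5-quantile) regression and the indirect effect is taken as the product a·b. It uses statsmodels' QuantReg and is not the authors' implementation; bootstrap confidence intervals are omitted.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)                                  # predictor / treatment
m = 0.5 * x + rng.standard_t(df=3, size=n)              # mediator, heavy-tailed errors
y = 0.4 * m + 0.2 * x + rng.standard_t(df=3, size=n)    # outcome

# a-path: median regression of M on X
a_fit = sm.QuantReg(m, sm.add_constant(x)).fit(q=0.5)
# b-path: median regression of Y on X and M
b_fit = sm.QuantReg(y, sm.add_constant(np.column_stack([x, m]))).fit(q=0.5)

a = a_fit.params[1]                                     # effect of x on m
b = b_fit.params[2]                                     # effect of m on y given x
print(f"a = {a:.3f}, b = {b:.3f}, indirect effect a*b = {a * b:.3f}")
```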
NASA Astrophysics Data System (ADS)
Verrelst, Jochem; Malenovský, Zbyněk; Van der Tol, Christiaan; Camps-Valls, Gustau; Gastellu-Etchegorry, Jean-Philippe; Lewis, Philip; North, Peter; Moreno, Jose
2018-06-01
An unprecedented spectroscopic data stream will soon become available with forthcoming Earth-observing satellite missions equipped with imaging spectroradiometers. This data stream will open up a vast array of opportunities to quantify a diversity of biochemical and structural vegetation properties. The processing requirements for such large data streams require reliable retrieval techniques enabling the spatiotemporally explicit quantification of biophysical variables. With the aim of preparing for this new era of Earth observation, this review summarizes the state-of-the-art retrieval methods that have been applied in experimental imaging spectroscopy studies inferring all kinds of vegetation biophysical variables. Identified retrieval methods are categorized into: (1) parametric regression, including vegetation indices, shape indices and spectral transformations; (2) nonparametric regression, including linear and nonlinear machine learning regression algorithms; (3) physically based, including inversion of radiative transfer models (RTMs) using numerical optimization and look-up table approaches; and (4) hybrid regression methods, which combine RTM simulations with machine learning regression methods. For each of these categories, an overview of widely applied methods with application to mapping vegetation properties is given. In view of processing imaging spectroscopy data, a critical aspect involves the challenge of dealing with spectral multicollinearity. The ability to provide robust estimates, retrieval uncertainties and acceptable retrieval processing speed are other important aspects in view of operational processing. Recommendations towards new-generation spectroscopy-based processing chains for operational production of biophysical variables are given.
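As a toy illustration of the hybrid category (4), the sketch below trains a machine-learning regressor on a simulated look-up table and applies it to noisy spectra. The two-parameter reflectance generator is a crude stand-in for a real radiative transfer model such as those cited in the review.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
wavelengths = np.linspace(400, 2400, 200)

def toy_rtm(lai, chlorophyll):
    """Crude reflectance generator standing in for a radiative transfer model."""
    red_edge = 1.0 / (1.0 + np.exp(-(wavelengths - 720) / 40))
    return 0.4 * red_edge * (1 - np.exp(-0.5 * lai)) + 0.001 * chlorophyll

# offline step: build the simulated training set (the look-up table)
lai_train = rng.uniform(0.1, 7.0, 500)
chl_train = rng.uniform(10, 80, 500)
spectra_train = np.array([toy_rtm(l, c) for l, c in zip(lai_train, chl_train)])
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(spectra_train, lai_train)

# retrieval step: apply to "measured" spectra (here, noisy simulations)
lai_true = rng.uniform(0.5, 6.0, 50)
spectra_meas = np.array([toy_rtm(l, 40.0) for l in lai_true])
spectra_meas += rng.normal(0, 0.005, spectra_meas.shape)
rmse = np.sqrt(np.mean((model.predict(spectra_meas) - lai_true) ** 2))
print("retrieval RMSE on noisy spectra:", round(rmse, 3))
```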
Enhancing hyperspectral spatial resolution using multispectral image fusion: A wavelet approach
NASA Astrophysics Data System (ADS)
Jazaeri, Amin
High spectral and spatial resolution images have a significant impact in remote sensing applications. Because both spatial and spectral resolutions of spaceborne sensors are fixed by design and it is not possible to further increase the spatial or spectral resolution, techniques such as image fusion must be applied to achieve such goals. This dissertation introduces the concept of wavelet fusion between hyperspectral and multispectral sensors in order to enhance the spectral and spatial resolution of a hyperspectral image. To test the robustness of this concept, images from Hyperion (hyperspectral sensor) and Advanced Land Imager (multispectral sensor) were first co-registered and then fused using different wavelet algorithms. A regression-based fusion algorithm was also implemented for comparison purposes. The results show that the fused images using a combined bi-linear wavelet-regression algorithm have less error than other methods when compared to the ground truth. In addition, a combined regression-wavelet algorithm shows more immunity to misalignment of the pixels due to the lack of proper registration. The quantitative measures of average mean square error show that the performance of wavelet-based methods degrades when the spatial resolution of hyperspectral images becomes eight times less than its corresponding multispectral image. Regardless of what method of fusion is utilized, the main challenge in image fusion is image registration, which is also a very time intensive process. Because the combined regression wavelet technique is computationally expensive, a hybrid technique based on regression and wavelet methods was also implemented to decrease computational overhead. However, the gain in faster computation was offset by the introduction of more error in the outcome. The secondary objective of this dissertation is to examine the feasibility and sensor requirements for image fusion for future NASA missions in order to be able to perform onboard image fusion. In this process, the main challenge of image registration was resolved by registering the input images using transformation matrices of previously acquired data. The composite image resulted from the fusion process remarkably matched the ground truth, indicating the possibility of real time onboard fusion processing.
Robust inference in the negative binomial regression model with an application to falls data.
Aeberhard, William H; Cantoni, Eva; Heritier, Stephane
2014-12-01
A popular way to model overdispersed count data, such as the number of falls reported during intervention studies, is by means of the negative binomial (NB) distribution. Classical estimating methods are well-known to be sensitive to model misspecifications, taking the form of patients falling much more than expected in such intervention studies where the NB regression model is used. We extend in this article two approaches for building robust M-estimators of the regression parameters in the class of generalized linear models to the NB distribution. The first approach achieves robustness in the response by applying a bounded function on the Pearson residuals arising in the maximum likelihood estimating equations, while the second approach achieves robustness by bounding the unscaled deviance components. For both approaches, we explore different choices for the bounding functions. Through a unified notation, we show how close these approaches may actually be as long as the bounding functions are chosen and tuned appropriately, and provide the asymptotic distributions of the resulting estimators. Moreover, we introduce a robust weighted maximum likelihood estimator for the overdispersion parameter, specific to the NB distribution. Simulations under various settings show that redescending bounding functions yield estimates with smaller biases under contamination while keeping high efficiency at the assumed model, and this for both approaches. We present an application to a recent randomized controlled trial measuring the effectiveness of an exercise program at reducing the number of falls among people suffering from Parkinsons disease to illustrate the diagnostic use of such robust procedures and their need for reliable inference. © 2014, The International Biometric Society.
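A heavily simplified sketch of the first approach, assuming the bounding function is the Huber psi applied to Pearson residuals: it omits the Fisher-consistency correction term, the covariate weights and the robust estimation of the overdispersion parameter described above, and treats the dispersion as known. The data are simulated for illustration only.

```python
import numpy as np
from scipy.optimize import root

rng = np.random.default_rng(4)
n, alpha = 300, 0.5                                   # alpha: NB overdispersion (assumed known)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
mu_true = np.exp(X @ np.array([1.0, 0.7]))
y = rng.poisson(mu_true * rng.gamma(shape=1 / alpha, scale=alpha, size=n)).astype(float)
y[:5] = 80.0                                          # a few subjects "falling much more than expected"

def huber_psi(r, c):
    return np.clip(r, -c, c)

def estimating_equations(beta, c):
    mu = np.exp(X @ beta)
    var = mu + alpha * mu ** 2
    r = (y - mu) / np.sqrt(var)                       # Pearson residuals
    # robust score: psi(r) * (dmu/dbeta) / sqrt(V(mu)), summed over subjects
    return X.T @ (huber_psi(r, c) * mu / np.sqrt(var))

start = np.linalg.lstsq(X, np.log(y + 1), rcond=None)[0]
beta_classical = root(estimating_equations, start, args=(1e6,)).x   # unbounded psi ~ quasi-likelihood
beta_robust = root(estimating_equations, start, args=(1.345,)).x    # Huber-bounded residuals
print("quasi-likelihood fit:", np.round(beta_classical, 3))
print("robust fit (c=1.345):", np.round(beta_robust, 3))
```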
SMOS salinity retrieval by using Support Vector Regression (SVR)
NASA Astrophysics Data System (ADS)
Katagis, Thomas; Fernández-Prieto, Diego; Marconcini, Mattia; Sabia, Roberto; Martinez, Justino
2013-04-01
The Soil Moisture and Ocean Salinity (SMOS) mission was launched in November 2009 within the framework of the European Space Agency (ESA) Living Planet programme. Over the oceans, it aims at providing Sea Surface Salinity (SSS) maps with spatial and temporal coverage adequate for large-scale oceanography. A comprehensive inversion scheme has been defined and implemented in the operational retrieval chain to allow proper SSS estimates in a single satellite overpass (L2 product) from the multi-angular brightness temperatures (TBs) measured by SMOS. The SMOS operational L2 salinity processor minimizes the difference between the measured and modeled TBs, including additional constraints on Sea Surface Temperature (SST) and wind speed auxiliary fields. In particular, by adopting a maximum-likelihood Bayesian approach, the inversion scheme retrieves salinity under an iterative convergence loop. However, although the implemented iterative technique is well established and robust, it is still prone to limitations; for instance, the presence of local minima in the cost function cannot be excluded. Moreover, previous studies have demonstrated that the background and observational terms of the cost function are not properly balanced, and this is likely to introduce errors in the retrieval procedure. In order to overcome such potential drawbacks, this study proposes a novel approach for SSS estimation based on the ɛ-insensitive Support Vector Regression (SVR), where both SMOS L1 measurements and auxiliary parameters are used as input. The SVR technique has already proved capable of high generalization and robustness in a variety of different applications, with a limited complexity in handling the learning phase. Notably, instead of minimizing the observed training error, it attempts to minimize the generalization error bound so as to achieve generalized performance. For this purpose, the original input domain is mapped into a higher-dimensional space (where the function underlying the data is supposed to have increased flatness) and linear regression is performed. The SVR training is performed using suitable in situ SSS data (i.e., ARGO buoy data) collected in a representative region of the ocean. So far, in situ data coming from a match-up ARGO database in November 2010 over the South Pacific constitute the preliminary benchmark of the study. Ongoing activities aim at extending this spatial and temporal frame to assess the robustness of the method. The in situ data have been collocated with SMOS TB measurements and additional parameters (e.g., SST and wind speed) in the learning phase of the SVR under various training/testing configurations. Afterwards, the SSS regression has been performed from the SMOS TBs or emissivities. The estimated SVR salinity fields are in general (very) well correlated with ARGO data. The analysis of the impact of the various features has been performed after applying a rigorous data filtering/flagging, and misfit (SSS_SVR - SSS_ARGO) statistics have been computed. For assessing the effectiveness of the proposed method, final results will be compared to those obtained using the official SMOS SSS retrieval algorithm.
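A minimal sketch of the ε-insensitive SVR setup described above, with simulated placeholders for the multi-angular TBs, auxiliary SST and wind speed, and the ARGO salinities.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(5)
n = 800
tb = rng.normal(100, 5, size=(n, 6))        # hypothetical TBs at six incidence angles
sst = rng.uniform(10, 30, size=(n, 1))      # hypothetical SST (deg C)
wind = rng.uniform(0, 15, size=(n, 1))      # hypothetical wind speed (m/s)
X = np.hstack([tb, sst, wind])
# hypothetical salinity stands in for ARGO match-ups
sss = 35 + 0.05 * (tb.mean(axis=1) - 100) - 0.02 * wind.ravel() + rng.normal(0, 0.2, n)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
model.fit(X[:600], sss[:600])
misfit = model.predict(X[600:]) - sss[600:]
print(f"misfit mean = {misfit.mean():.3f}, std = {misfit.std():.3f}")
```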
Lin, Zhaozhou; Zhang, Qiao; Liu, Ruixin; Gao, Xiaojie; Zhang, Lu; Kang, Bingya; Shi, Junhan; Wu, Zidan; Gui, Xinjing; Li, Xuelin
2016-01-01
To accurately, safely, and efficiently evaluate the bitterness of Traditional Chinese Medicines (TCMs), a robust predictor was developed using robust partial least squares (RPLS) regression method based on data obtained from an electronic tongue (e-tongue) system. The data quality was verified by the Grubb’s test. Moreover, potential outliers were detected based on both the standardized residual and score distance calculated for each sample. The performance of RPLS on the dataset before and after outlier detection was compared to other state-of-the-art methods including multivariate linear regression, least squares support vector machine, and the plain partial least squares regression. Both R2 and root-mean-squares error (RMSE) of cross-validation (CV) were recorded for each model. With four latent variables, a robust RMSECV value of 0.3916 with bitterness values ranging from 0.63 to 4.78 were obtained for the RPLS model that was constructed based on the dataset including outliers. Meanwhile, the RMSECV, which was calculated using the models constructed by other methods, was larger than that of the RPLS model. After six outliers were excluded, the performance of all benchmark methods markedly improved, but the difference between the RPLS model constructed before and after outlier exclusion was negligible. In conclusion, the bitterness of TCM decoctions can be accurately evaluated with the RPLS model constructed using e-tongue data. PMID:26821026
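The sketch below shows the two diagnostics mentioned in the abstract, standardized residuals and score distances, computed from an ordinary PLS fit on simulated e-tongue-like data; it is a screening illustration, not the full RPLS algorithm.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(6)
n, p = 60, 7                                     # e.g. seven e-tongue sensor channels
X = rng.normal(size=(n, p))
bitterness = X @ rng.normal(size=p) + rng.normal(0, 0.3, n)
bitterness[:3] += 4.0                            # a few aberrant samples

pls = PLSRegression(n_components=4).fit(X, bitterness)
resid = bitterness - pls.predict(X).ravel()
std_resid = (resid - resid.mean()) / resid.std()

scores = pls.transform(X)                        # latent-variable scores
score_dist = np.sqrt(((scores / scores.std(axis=0)) ** 2).sum(axis=1))

flagged = np.where((np.abs(std_resid) > 2.5) | (score_dist > 3.0))[0]
print("samples flagged as potential outliers:", flagged)
```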
Dynamic multifactor clustering of financial networks
NASA Astrophysics Data System (ADS)
Ross, Gordon J.
2014-02-01
We investigate the tendency for financial instruments to form clusters when there are multiple factors influencing the correlation structure. Specifically, we consider a stock portfolio which contains companies from different industrial sectors, located in several different countries. Both sector membership and geography combine to create a complex clustering structure where companies seem to first be divided based on sector, with geographical subclusters emerging within each industrial sector. We argue that standard techniques for detecting overlapping clusters and communities are not able to capture this type of structure and show how robust regression techniques can instead be used to remove the influence of both sector and geography from the correlation matrix separately. Our analysis reveals that prior to the 2008 financial crisis, companies did not tend to form clusters based on geography. This changed immediately following the crisis, with geography becoming a more important determinant of clustering structure.
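A small sketch of the idea, assuming Huber M-regression (statsmodels RLM) as the robust regression step: a common sector factor is regressed out of each simulated return series, and the correlations of the residuals are then inspected.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n_days, n_stocks = 500, 8
sector = rng.normal(size=n_days)                       # common sector driver
geography = rng.normal(size=n_days)                    # geographic driver for half the stocks
returns = np.empty((n_days, n_stocks))
for j in range(n_stocks):
    geo_load = 0.6 if j < 4 else 0.0
    returns[:, j] = 0.8 * sector + geo_load * geography + rng.normal(0, 0.5, n_days)

X = sm.add_constant(sector)
residuals = np.empty_like(returns)
for j in range(n_stocks):
    fit = sm.RLM(returns[:, j], X, M=sm.robust.norms.HuberT()).fit()
    residuals[:, j] = returns[:, j] - fit.fittedvalues

print("raw correlation, stocks 0 and 5:     ", round(np.corrcoef(returns[:, 0], returns[:, 5])[0, 1], 2))
print("residual correlation, stocks 0 and 5:", round(np.corrcoef(residuals[:, 0], residuals[:, 5])[0, 1], 2))
# with the sector factor removed, the remaining correlation reflects the
# geographic subclusters rather than the shared sector
```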
NASA Astrophysics Data System (ADS)
Rocha, Alby D.; Groen, Thomas A.; Skidmore, Andrew K.; Darvishzadeh, Roshanak; Willemen, Louise
2017-11-01
The growing number of narrow spectral bands in hyperspectral remote sensing improves the capacity to describe and predict biological processes in ecosystems. But it also poses a challenge to fit empirical models based on such high dimensional data, which often contain correlated and noisy predictors. As sample sizes, to train and validate empirical models, seem not to be increasing at the same rate, overfitting has become a serious concern. Overly complex models lead to overfitting by capturing more than the underlying relationship, and also through fitting random noise in the data. Many regression techniques claim to overcome these problems by using different strategies to constrain complexity, such as limiting the number of terms in the model, by creating latent variables or by shrinking parameter coefficients. This paper is proposing a new method, named Naïve Overfitting Index Selection (NOIS), which makes use of artificially generated spectra, to quantify the relative model overfitting and to select an optimal model complexity supported by the data. The robustness of this new method is assessed by comparing it to a traditional model selection based on cross-validation. The optimal model complexity is determined for seven different regression techniques, such as partial least squares regression, support vector machine, artificial neural network and tree-based regressions using five hyperspectral datasets. The NOIS method selects less complex models, which present accuracies similar to the cross-validation method. The NOIS method reduces the chance of overfitting, thereby avoiding models that present accurate predictions that are only valid for the data used, and too complex to make inferences about the underlying process.
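The sketch below is loosely inspired by the idea of using artificially generated spectra, though it does not reproduce the authors' NOIS index: the apparent fit a model of a given complexity achieves on pure-noise spectra indicates how much of the real-data fit could be overfitting at that complexity.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(8)
n, p = 50, 300                                     # few samples, many spectral bands
X_real = rng.normal(size=(n, p))
y = X_real[:, :5] @ np.ones(5) + rng.normal(0, 1.0, n)

for n_comp in (1, 2, 5, 10, 20):
    fit_real = PLSRegression(n_components=n_comp).fit(X_real, y)
    X_fake = rng.normal(size=(n, p))               # artificial spectra, unrelated to y
    fit_fake = PLSRegression(n_components=n_comp).fit(X_fake, y)
    r2_real = r2_score(y, fit_real.predict(X_real).ravel())
    r2_fake = r2_score(y, fit_fake.predict(X_fake).ravel())
    print(f"{n_comp:2d} components: apparent R2 real = {r2_real:.2f}, on artificial spectra = {r2_fake:.2f}")
# a complexity at which artificial spectra already "explain" much of y is a
# complexity the data cannot support
```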
Advanced signal processing based on support vector regression for lidar applications
NASA Astrophysics Data System (ADS)
Gelfusa, M.; Murari, A.; Malizia, A.; Lungaroni, M.; Peluso, E.; Parracino, S.; Talebzadeh, S.; Vega, J.; Gaudio, P.
2015-10-01
The LIDAR technique has recently found many applications in atmospheric physics and remote sensing. One of the main issues, in the deployment of systems based on LIDAR, is the filtering of the backscattered signal to alleviate the problems generated by noise. Improvement in the signal to noise ratio is typically achieved by averaging a quite large number (of the order of hundreds) of successive laser pulses. This approach can be effective but presents significant limitations. First of all, it implies a great stress on the laser source, particularly in the case of systems for automatic monitoring of large areas for long periods. Secondly, this solution can become difficult to implement in applications characterised by rapid variations of the atmosphere, for example in the case of pollutant emissions, or by abrupt changes in the noise. In this contribution, a new method for the software filtering and denoising of LIDAR signals is presented. The technique is based on support vector regression. The proposed new method is insensitive to the statistics of the noise and is therefore fully general and quite robust. The developed numerical tool has been systematically compared with the most powerful techniques available, using both synthetic and experimental data. Its performances have been tested for various statistical distributions of the noise and also for other disturbances of the acquired signal such as outliers. The competitive advantages of the proposed method are fully documented. The potential of the proposed approach to widen the capability of the LIDAR technique, particularly in the detection of widespread smoke, is discussed in detail.
Vector autoregressive models: A Gini approach
NASA Astrophysics Data System (ADS)
Mussard, Stéphane; Ndiaye, Oumar Hamady
2018-02-01
In this paper, it is proven that the usual VAR models may be performed in the Gini sense, that is, on a ℓ1 metric space. The Gini regression is robust to outliers. As a consequence, when data are contaminated by extreme values, we show that semi-parametric VAR-Gini regressions may be used to obtain robust estimators. The inference about the estimators is made with the ℓ1 norm. Also, impulse response functions and Gini decompositions for prevision errors are introduced. Finally, Granger's causality tests are properly derived based on U-statistics.
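As a minimal illustration of a Gini-type regression, the sketch below computes the simple-regression Gini slope, cov(y, rank(x)) / cov(x, rank(x)), on contaminated data and compares it with OLS; the full VAR-Gini machinery (impulse responses, Gini decompositions, causality tests) is not attempted here.

```python
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(9)
n = 300
x = rng.normal(size=n)
y = 1.5 * x + rng.normal(0, 1, n)
x[:5] += 15.0                                    # contaminate x with extreme values

def gini_slope(x, y):
    r = rankdata(x)
    return np.cov(y, r)[0, 1] / np.cov(x, r)[0, 1]

ols_slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
print(f"OLS slope:  {ols_slope:.3f}")
print(f"Gini slope: {gini_slope(x, y):.3f}   (slope before contamination: 1.5)")
```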
Robust boosting via convex optimization
NASA Astrophysics Data System (ADS)
Rätsch, Gunnar
2001-12-01
In this work we consider statistical learning problems. A learning machine aims to extract information from a set of training examples such that it is able to predict the associated label on unseen examples. We consider the case where the resulting classification or regression rule is a combination of simple rules - also called base hypotheses. The so-called boosting algorithms iteratively find a weighted linear combination of base hypotheses that predict well on unseen data. We address the following issues: o The statistical learning theory framework for analyzing boosting methods. We study learning theoretic guarantees on the prediction performance on unseen examples. Recently, large margin classification techniques emerged as a practical result of the theory of generalization, in particular Boosting and Support Vector Machines. A large margin implies a good generalization performance. Hence, we analyze how large the margins in boosting are and find an improved algorithm that is able to generate the maximum margin solution. o How can boosting methods be related to mathematical optimization techniques? To analyze the properties of the resulting classification or regression rule, it is of high importance to understand whether and under which conditions boosting converges. We show that boosting can be used to solve large scale constrained optimization problems, whose solutions are well characterizable. To show this, we relate boosting methods to methods known from mathematical optimization, and derive convergence guarantees for a quite general family of boosting algorithms. o How to make Boosting noise robust? One of the problems of current boosting techniques is that they are sensitive to noise in the training sample. In order to make boosting robust, we transfer the soft margin idea from support vector learning to boosting. We develop theoretically motivated regularized algorithms that exhibit a high noise robustness. o How to adapt boosting to regression problems? Boosting methods are originally designed for classification problems. To extend the boosting idea to regression problems, we use the previous convergence results and relations to semi-infinite programming to design boosting-like algorithms for regression problems. We show that these leveraging algorithms have desirable theoretical and practical properties. o Can boosting techniques be useful in practice? The presented theoretical results are guided by simulation results either to illustrate properties of the proposed algorithms or to show that they work well in practice. We report on successful applications in a non-intrusive power monitoring system, chaotic time series analysis and a drug discovery process. --- Note: The author is the recipient of the Michelson Prize, awarded by the Faculty of Mathematics and Natural Sciences of the University of Potsdam for the best doctoral dissertation of 2001/2002.
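As a generic, present-day illustration of the noise-robustness theme (not the thesis' own algorithms), the sketch below compares a boosting regressor trained with a robust Huber loss against one trained with squared-error loss on data containing gross label outliers.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(10)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, 400)
y[:20] += rng.choice([-5.0, 5.0], size=20)       # gross outliers in the training labels

X_test = np.linspace(-3, 3, 200).reshape(-1, 1)
y_test = np.sin(X_test).ravel()

# note: "squared_error" is the loss name in scikit-learn >= 1.0 (older versions used "ls")
for loss in ("squared_error", "huber"):
    gbr = GradientBoostingRegressor(loss=loss, n_estimators=300, max_depth=2,
                                    learning_rate=0.05, random_state=0).fit(X, y)
    print(f"{loss:>13s} loss: MAE on clean test grid = "
          f"{mean_absolute_error(y_test, gbr.predict(X_test)):.3f}")
```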
Structured Kernel Subspace Learning for Autonomous Robot Navigation.
Kim, Eunwoo; Choi, Sungjoon; Oh, Songhwai
2018-02-14
This paper considers two important problems for autonomous robot navigation in a dynamic environment, where the goal is to predict pedestrian motion and control a robot with the prediction for safe navigation. While there are several methods for predicting the motion of a pedestrian and controlling a robot to avoid incoming pedestrians, it is still difficult to safely navigate in a dynamic environment due to challenges, such as the varying quality and complexity of training data with unwanted noises. This paper addresses these challenges simultaneously by proposing a robust kernel subspace learning algorithm based on the recent advances in nuclear-norm and l 1 -norm minimization. We model the motion of a pedestrian and the robot controller using Gaussian processes. The proposed method efficiently approximates a kernel matrix used in Gaussian process regression by learning low-rank structured matrix (with symmetric positive semi-definiteness) to find an orthogonal basis, which eliminates the effects of erroneous and inconsistent data. Based on structured kernel subspace learning, we propose a robust motion model and motion controller for safe navigation in dynamic environments. We evaluate the proposed robust kernel learning in various tasks, including regression, motion prediction, and motion control problems, and demonstrate that the proposed learning-based systems are robust against outliers and outperform existing regression and navigation methods.
NASA Astrophysics Data System (ADS)
Ahmed, S.; Salucci, M.; Miorelli, R.; Anselmi, N.; Oliveri, G.; Calmon, P.; Reboud, C.; Massa, A.
2017-10-01
A quasi real-time inversion strategy is presented for groove characterization of a conductive non-ferromagnetic tube structure by exploiting the eddy current testing (ECT) signal. The inversion problem has been formulated with a non-iterative Learning-by-Examples (LBE) strategy. Within the framework of LBE, an efficient training strategy has been adopted that combines feature extraction with a customized version of output space filling (OSF) adaptive sampling in order to obtain an optimal training set during the offline phase. Partial Least Squares (PLS) and Support Vector Regression (SVR) have been exploited for feature extraction and prediction, respectively, to achieve robust and accurate real-time inversion during the online phase.
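A compact sketch of the offline/online split using standard tools, with synthetic signals standing in for the ECT data: PLS acts purely as a feature extractor and SVR maps the extracted features to a hypothetical groove-depth parameter.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.svm import SVR

rng = np.random.default_rng(11)
n, p = 200, 64                                   # 200 training signals, 64 time samples each
depth = rng.uniform(0.1, 2.0, n)                 # hypothetical groove depth (mm)
t = np.linspace(0, 1, p)
signals = depth[:, None] * np.exp(-5 * t)[None, :] + rng.normal(0, 0.02, (n, p))

# offline phase: learn the feature extractor and the regressor
pls = PLSRegression(n_components=3).fit(signals, depth)
svr = SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(pls.transform(signals), depth)

# online phase: quasi real-time inversion of a new measurement
new_signal = 1.2 * np.exp(-5 * t) + rng.normal(0, 0.02, p)
pred = svr.predict(pls.transform(new_signal.reshape(1, -1)))[0]
print(f"predicted groove depth: {pred:.2f} (true value 1.20)")
```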
Bounded-Influence Inference in Regression.
1984-02-01
be viewed as a generalization of the classical F-test. By means of the influence function, their robustness properties are investigated, and optimally robust tests that maximize the asymptotic power within each class, under the side condition of a bounded influence function, are constructed. Finally, an
Noninvasive and fast measurement of blood glucose in vivo by near infrared (NIR) spectroscopy
NASA Astrophysics Data System (ADS)
Jintao, Xue; Liming, Ye; Yufei, Liu; Chunyan, Li; Han, Chen
2017-05-01
This research aimed to develop a method for noninvasive and fast blood glucose assay in vivo. Near-infrared (NIR) spectroscopy, a more promising technique compared to other methods, was investigated in rats with diabetes and in normal rats. Calibration models were generated by two different multivariate strategies: partial least squares (PLS) as a linear regression method and artificial neural networks (ANN) as a non-linear regression method. The PLS model was optimized individually by considering the spectral range, spectral pretreatment methods and the number of model factors, while the ANN model was studied individually by selecting spectral pretreatment methods, parameters of the network topology, the number of hidden neurons, and the number of training epochs. The results of the validation showed that the two models were robust, accurate and repeatable. Compared to the ANN model, the performance of the PLS model was much better, with a lower root mean square error of prediction (RMSEP) of 0.419 and a higher correlation coefficient (R) of 96.22%.
Genotype-phenotype association study via new multi-task learning model
Huo, Zhouyuan; Shen, Dinggang
2018-01-01
Research on the associations between genetic variations and imaging phenotypes is developing with the advance in high-throughput genotype and brain image techniques. Regression analysis of single nucleotide polymorphisms (SNPs) and imaging measures as quantitative traits (QTs) has been proposed to identify the quantitative trait loci (QTL) via multi-task learning models. Recent studies consider the interlinked structures within SNPs and imaging QTs through group lasso, e.g. ℓ2,1-norm, leading to better predictive results and insights of SNPs. However, group sparsity is not enough for representing the correlation between multiple tasks and ℓ2,1-norm regularization is not robust either. In this paper, we propose a new multi-task learning model to analyze the associations between SNPs and QTs. We suppose that low-rank structure is also beneficial to uncover the correlation between genetic variations and imaging phenotypes. Finally, we conduct regression analysis of SNPs and QTs. Experimental results show that our model is more accurate in prediction than compared methods and presents new insights of SNPs. PMID:29218896
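A minimal sketch of the ℓ2,1-regularized multi-task regression that the proposed model builds on, using sklearn's MultiTaskLasso on simulated SNP/QT data; the additional low-rank structure introduced in the paper is not included here.

```python
import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(12)
n, n_snps, n_qts = 200, 50, 6
snps = rng.integers(0, 3, size=(n, n_snps)).astype(float)   # 0/1/2 minor-allele counts
W_true = np.zeros((n_snps, n_qts))
W_true[:4] = rng.normal(0, 0.8, size=(4, n_qts))            # only four SNPs are causal
qts = snps @ W_true + rng.normal(0, 1.0, size=(n, n_qts))   # imaging quantitative traits

model = MultiTaskLasso(alpha=0.5).fit(snps, qts)
# the l2,1 penalty zeroes whole rows: a SNP is kept or dropped for all QTs jointly
row_norms = np.linalg.norm(model.coef_.T, axis=1)
print("SNPs selected across all QTs:", np.where(row_norms > 1e-8)[0])
```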
YoTube: Searching Action Proposal Via Recurrent and Static Regression Networks
NASA Astrophysics Data System (ADS)
Zhu, Hongyuan; Vial, Romain; Lu, Shijian; Peng, Xi; Fu, Huazhu; Tian, Yonghong; Cao, Xianbin
2018-06-01
In this paper, we present YoTube-a novel network fusion framework for searching action proposals in untrimmed videos, where each action proposal corresponds to a spatialtemporal video tube that potentially locates one human action. Our method consists of a recurrent YoTube detector and a static YoTube detector, where the recurrent YoTube explores the regression capability of RNN for candidate bounding boxes predictions using learnt temporal dynamics and the static YoTube produces the bounding boxes using rich appearance cues in a single frame. Both networks are trained using rgb and optical flow in order to fully exploit the rich appearance, motion and temporal context, and their outputs are fused to produce accurate and robust proposal boxes. Action proposals are finally constructed by linking these boxes using dynamic programming with a novel trimming method to handle the untrimmed video effectively and efficiently. Extensive experiments on the challenging UCF-101 and UCF-Sports datasets show that our proposed technique obtains superior performance compared with the state-of-the-art.
Mammalian cell culture monitoring using in situ spectroscopy: Is your method really optimised?
André, Silvère; Lagresle, Sylvain; Hannas, Zahia; Calvosa, Éric; Duponchel, Ludovic
2017-03-01
In recent years, as a result of the process analytical technology initiative of the US Food and Drug Administration, many different works have been carried out on direct and in situ monitoring of critical parameters for mammalian cell cultures by Raman spectroscopy and multivariate regression techniques. However, despite interesting results, it cannot be said that the proposed monitoring strategies are really optimized in ways that would reduce the errors of the regression models and thus the confidence limits of the predictions. Hence, the aim of this article is to optimize some critical steps of spectroscopic acquisition and data treatment in order to reach a higher level of accuracy and robustness of bioprocess monitoring. We first propose an original strategy to assess the most suitable Raman acquisition time for the processes involved. In a second part, we demonstrate the influence of interbatch variability on the accuracy of the predictive models, with a particular focus on the adjustment of the optical probes. Finally, we propose a methodology for optimizing the selection of spectral variables in order to decrease the prediction errors of multivariate regressions. © 2017 American Institute of Chemical Engineers. Biotechnol. Prog., 33:308-316, 2017.
MicroCT angiography detects vascular formation and regression in skin wound healing
Urao, Norifumi; Okonkwo, Uzoagu A.; Fang, Milie M.; Zhuang, Zhen W.; Koh, Timothy J.; DiPietro, Luisa A.
2016-01-01
Properly regulated angiogenesis and arteriogenesis are essential for effective wound healing. Tissue injury induces robust new vessel formation and subsequent vessel maturation, which involves vessel regression and remodeling. Although formation of functional vasculature is essential for healing, alterations in vascular structure over the time course of skin wound healing are not well understood. Here, using high-resolution ex vivo X-ray micro-computed tomography (microCT), we describe the vascular network during healing of skin excisional wounds with highly detailed three-dimensional (3D) reconstructed images and associated quantitative analysis. We found that relative vessel volume, surface area and branching number are significantly decreased in wounds from day 7 to day 14 and 21. Segmentation and skeletonization analysis of selected branches from high-resolution images as small as 2.5 μm voxel size show that branching orders are decreased in the wound vessels during healing. In histological analysis, we found that the contrast agent fills mainly arterioles, but not small capillaries nor large veins. In summary, high-resolution microCT revealed dynamic alterations of vessel structures during wound healing. This technique may be useful as a key tool in the study of the formation and regression of wound vessels. PMID:27009591
Doubly Robust Additive Hazards Models to Estimate Effects of a Continuous Exposure on Survival.
Wang, Yan; Lee, Mihye; Liu, Pengfei; Shi, Liuhua; Yu, Zhi; Abu Awad, Yara; Zanobetti, Antonella; Schwartz, Joel D
2017-11-01
The effect of an exposure on survival can be biased when the regression model is misspecified. Hazard difference is easier to use in risk assessment than hazard ratio and has a clearer interpretation in the assessment of effect modifications. We proposed two doubly robust additive hazards models to estimate the causal hazard difference of a continuous exposure on survival. The first model is an inverse probability-weighted additive hazards regression. The second model is an extension of the doubly robust estimator for binary exposures by categorizing the continuous exposure. We compared these with the marginal structural model and outcome regression with correct and incorrect model specifications using simulations. We applied doubly robust additive hazard models to the estimation of hazard difference of long-term exposure to PM2.5 (particulate matter with an aerodynamic diameter less than or equal to 2.5 microns) on survival using a large cohort of 13 million older adults residing in seven states of the Southeastern United States. We showed that the proposed approaches are doubly robust. We found that each 1 μg m increase in annual PM2.5 exposure was associated with a causal hazard difference in mortality of 8.0 × 10 (95% confidence interval 7.4 × 10, 8.7 × 10), which was modified by age, medical history, socioeconomic status, and urbanicity. The overall hazard difference translates to approximately 5.5 (5.1, 6.0) thousand deaths per year in the study population. The proposed approaches improve the robustness of the additive hazards model and produce a novel additive causal estimate of PM2.5 on survival and several additive effect modifications, including social inequality.
Chang, Yeong-Chan
2005-12-01
This paper addresses the problem of designing adaptive fuzzy-based (or neural network-based) robust controls for a large class of uncertain nonlinear time-varying systems. This class of systems can be perturbed by plant uncertainties, unmodeled perturbations, and external disturbances. A nonlinear H(infinity) control technique, combined with adaptive control and variable structure control (VSC) techniques, is employed to construct the intelligent robust stabilization controller such that H(infinity) control is achieved. The problem of robust tracking control design for uncertain robotic systems is used to demonstrate the effectiveness of the developed robust stabilization control scheme. Therefore, an intelligent robust tracking controller for uncertain robotic systems in the presence of high-degree uncertainties can easily be implemented. Its solution requires only solving a linear algebraic matrix inequality, and satisfactory transient and asymptotic tracking performance is guaranteed. A simulation example is given to confirm the performance of the developed control algorithms.
Robust, Adaptive Functional Regression in Functional Mixed Model Framework.
Zhu, Hongxiao; Brown, Philip J; Morris, Jeffrey S
2011-09-01
Functional data are increasingly encountered in scientific studies, and their high dimensionality and complexity lead to many analytical challenges. Various methods for functional data analysis have been developed, including functional response regression methods that involve regression of a functional response on univariate/multivariate predictors with nonparametrically represented functional coefficients. In existing methods, however, the functional regression can be sensitive to outlying curves and outlying regions of curves, so is not robust. In this paper, we introduce a new Bayesian method, robust functional mixed models (R-FMM), for performing robust functional regression within the general functional mixed model framework, which includes multiple continuous or categorical predictors and random effect functions accommodating potential between-function correlation induced by the experimental design. The underlying model involves a hierarchical scale mixture model for the fixed effects, random effect and residual error functions. These modeling assumptions across curves result in robust nonparametric estimators of the fixed and random effect functions which down-weight outlying curves and regions of curves, and produce statistics that can be used to flag global and local outliers. These assumptions also lead to distributions across wavelet coefficients that have outstanding sparsity and adaptive shrinkage properties, with great flexibility for the data to determine the sparsity and the heaviness of the tails. Together with the down-weighting of outliers, these within-curve properties lead to fixed and random effect function estimates that appear in our simulations to be remarkably adaptive in their ability to remove spurious features yet retain true features of the functions. We have developed general code to implement this fully Bayesian method that is automatic, requiring the user to only provide the functional data and design matrices. It is efficient enough to handle large data sets, and yields posterior samples of all model parameters that can be used to perform desired Bayesian estimation and inference. Although we present details for a specific implementation of the R-FMM using specific distributional choices in the hierarchical model, 1D functions, and wavelet transforms, the method can be applied more generally using other heavy-tailed distributions, higher dimensional functions (e.g. images), and using other invertible transformations as alternatives to wavelets.
Skeletal Correlates for Body Mass Estimation in Modern and Fossil Flying Birds
Field, Daniel J.; Lynner, Colton; Brown, Christian; Darroch, Simon A. F.
2013-01-01
Scaling relationships between skeletal dimensions and body mass in extant birds are often used to estimate body mass in fossil crown-group birds, as well as in stem-group avialans. However, useful statistical measurements for constraining the precision and accuracy of fossil mass estimates are rarely provided, which prevents the quantification of robust upper and lower bound body mass estimates for fossils. Here, we generate thirteen body mass correlations and associated measures of statistical robustness using a sample of 863 extant flying birds. By providing robust body mass regressions with upper- and lower-bound prediction intervals for individual skeletal elements, we address the longstanding problem of body mass estimation for highly fragmentary fossil birds. We demonstrate that the most precise proxy for estimating body mass in the overall dataset, measured both as coefficient determination of ordinary least squares regression and percent prediction error, is the maximum diameter of the coracoid’s humeral articulation facet (the glenoid). We further demonstrate that this result is consistent among the majority of investigated avian orders (10 out of 18). As a result, we suggest that, in the majority of cases, this proxy may provide the most accurate estimates of body mass for volant fossil birds. Additionally, by presenting statistical measurements of body mass prediction error for thirteen different body mass regressions, this study provides a much-needed quantitative framework for the accurate estimation of body mass and associated ecological correlates in fossil birds. The application of these regressions will enhance the precision and robustness of many mass-based inferences in future paleornithological studies. PMID:24312392
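The sketch below illustrates the kind of regression and interval reporting described above: an OLS fit of log body mass on a log skeletal dimension, with a percent prediction error and a 95% prediction interval for a new specimen. All measurements are simulated stand-ins, not the paper's 863-bird dataset.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(13)
n = 200
log_glenoid = rng.normal(np.log(8.0), 0.5, n)                 # log glenoid diameter (mm)
log_mass = 2.1 * log_glenoid + 0.5 + rng.normal(0, 0.15, n)   # log body mass (g)

fit = sm.OLS(log_mass, sm.add_constant(log_glenoid)).fit()

# percent prediction error on the back-transformed scale
pred_mass = np.exp(fit.fittedvalues)
ppe = 100 * np.mean(np.abs(pred_mass - np.exp(log_mass)) / np.exp(log_mass))
print(f"R2 = {fit.rsquared:.3f}, mean %PE = {ppe:.1f}%")

# 95% prediction interval for a specimen with a 10 mm glenoid facet
new = sm.add_constant(np.array([np.log(10.0)]), has_constant="add")
frame = fit.get_prediction(new).summary_frame(alpha=0.05)
lo, hi = np.exp(frame["obs_ci_lower"][0]), np.exp(frame["obs_ci_upper"][0])
print(f"estimated mass: {np.exp(frame['mean'][0]):.0f} g (95% PI {lo:.0f}-{hi:.0f} g)")
```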
A Robust Shape Reconstruction Method for Facial Feature Point Detection.
Tan, Shuqiu; Chen, Dongyi; Guo, Chenggang; Huang, Zhiqi
2017-01-01
Facial feature point detection has been receiving great research advances in recent years. Numerous methods have been developed and applied in practical face analysis systems. However, it is still a quite challenging task because of the large variability in expression and gestures and the existence of occlusions in real-world photo shoot. In this paper, we present a robust sparse reconstruction method for the face alignment problems. Instead of a direct regression between the feature space and the shape space, the concept of shape increment reconstruction is introduced. Moreover, a set of coupled overcomplete dictionaries termed the shape increment dictionary and the local appearance dictionary are learned in a regressive manner to select robust features and fit shape increments. Additionally, to make the learned model more generalized, we select the best matched parameter set through extensive validation tests. Experimental results on three public datasets demonstrate that the proposed method achieves a better robustness over the state-of-the-art methods.
NASA Technical Reports Server (NTRS)
Newsom, J. R.; Mukhopadhyay, V.
1983-01-01
A method for designing robust feedback controllers for multiloop systems is presented. Robustness is characterized in terms of the minimum singular value of the system return difference matrix at the plant input. Analytical gradients of the singular values with respect to design variables in the controller are derived. A cumulative measure of the singular values and their gradients with respect to the design variables is used with a numerical optimization technique to increase the system's robustness. Both unconstrained and constrained optimization techniques are evaluated. Numerical results are presented for a two-input/two-output drone flight control system.
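A small numerical sketch of the robustness measure used above, assuming a toy 2x2 plant and a diagonal constant-gain controller rather than the paper's drone model: the minimum singular value of the return difference matrix at the plant input is evaluated over a frequency grid.

```python
import numpy as np

def plant(s):
    """Toy 2x2 transfer matrix G(s)."""
    return np.array([[1.0 / (s + 1.0), 0.5 / (s + 2.0)],
                     [0.2 / (s + 1.0), 1.0 / (s + 3.0)]], dtype=complex)

K = np.diag([2.0, 1.5])                          # constant-gain controller (illustrative)

freqs = np.logspace(-2, 2, 400)                  # rad/s
min_sv = np.empty(freqs.size)
for i, w in enumerate(freqs):
    return_difference = np.eye(2) + K @ plant(1j * w)   # I + K(jw)G(jw) at the plant input
    min_sv[i] = np.linalg.svd(return_difference, compute_uv=False).min()

worst = np.argmin(min_sv)
print(f"worst-case minimum singular value {min_sv[worst]:.3f} at {freqs[worst]:.3f} rad/s")
# raising this worst-case value by tuning the controller is what the
# gradient-based optimization in the abstract aims to do
```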
Anesthesia Technique and Outcomes of Mechanical Thrombectomy in Patients With Acute Ischemic Stroke.
Bekelis, Kimon; Missios, Symeon; MacKenzie, Todd A; Tjoumakaris, Stavropoula; Jabbour, Pascal
2017-02-01
The impact of anesthesia technique on the outcomes of mechanical thrombectomy for acute ischemic stroke remains an issue of debate. We investigated the association of general anesthesia with outcomes in patients undergoing mechanical thrombectomy for ischemic stroke. We performed a cohort study involving patients undergoing mechanical thrombectomy for ischemic stroke from 2009 to 2013, who were registered in the New York Statewide Planning and Research Cooperative System database. An instrumental variable (hospital rate of general anesthesia) analysis was used to simulate the effects of randomization and investigate the association of anesthesia technique with case-fatality and length of stay. Among 1174 patients, 441 (37.6%) underwent general anesthesia and 733 (62.4%) underwent conscious sedation. Using an instrumental variable analysis, we identified that general anesthesia was associated with a 6.4% increased case-fatality (95% confidence interval, 1.9%-11.0%) and 8.4 days longer length of stay (95% confidence interval, 2.9-14.0) in comparison to conscious sedation. This corresponded to 15 patients needing to be treated with conscious sedation to prevent 1 death. Our results were robust in sensitivity analysis with mixed effects regression and propensity score-adjusted regression models. Using a comprehensive all-payer cohort of acute ischemic stroke patients undergoing mechanical thrombectomy in New York State, we identified an association of general anesthesia with increased case-fatality and length of stay. These considerations should be taken into account when standardizing acute stroke care. © 2017 American Heart Association, Inc.
Using Robust Standard Errors to Combine Multiple Regression Estimates with Meta-Analysis
ERIC Educational Resources Information Center
Williams, Ryan T.
2012-01-01
Combining multiple regression estimates with meta-analysis has continued to be a difficult task. A variety of methods have been proposed and used to combine multiple regression slope estimates with meta-analysis, however, most of these methods have serious methodological and practical limitations. The purpose of this study was to explore the use…
Are Assumptions of Well-Known Statistical Techniques Checked, and Why (Not)?
Hoekstra, Rink; Kiers, Henk A. L.; Johnson, Addie
2012-01-01
A valid interpretation of most statistical techniques requires that one or more assumptions be met. In published articles, however, little information tends to be reported on whether the data satisfy the assumptions underlying the statistical techniques used. This could be due to self-selection: Only manuscripts with data fulfilling the assumptions are submitted. Another explanation could be that violations of assumptions are rarely checked for in the first place. We studied whether and how 30 researchers checked fictitious data for violations of assumptions in their own working environment. Participants were asked to analyze the data as they would their own data, for which often used and well-known techniques such as the t-procedure, ANOVA and regression (or non-parametric alternatives) were required. It was found that the assumptions of the techniques were rarely checked, and that if they were, it was regularly by means of a statistical test. Interviews afterward revealed a general lack of knowledge about assumptions, the robustness of the techniques with regards to the assumptions, and how (or whether) assumptions should be checked. These data suggest that checking for violations of assumptions is not a well-considered choice, and that the use of statistics can be described as opportunistic. PMID:22593746
Testing Gene-Gene Interactions in the Case-Parents Design
Yu, Zhaoxia
2011-01-01
The case-parents design has been widely used to detect genetic associations as it can prevent spurious association that could occur in population-based designs. When examining the effect of an individual genetic locus on a disease, logistic regressions developed by conditioning on parental genotypes provide complete protection from spurious association caused by population stratification. However, when testing gene-gene interactions, it is unknown whether conditional logistic regressions are still robust. Here we evaluate the robustness and efficiency of several gene-gene interaction tests that are derived from conditional logistic regressions. We found that in the presence of SNP genotype correlation due to population stratification or linkage disequilibrium, tests with incorrectly specified main-genetic-effect models can lead to inflated type I error rates. We also found that a test with fully flexible main genetic effects always maintains correct test size and its robustness can be achieved with negligible sacrifice of its power. When testing gene-gene interactions is the focus, the test allowing fully flexible main effects is recommended to be used. PMID:21778736
A subagging regression method for estimating the qualitative and quantitative state of groundwater
NASA Astrophysics Data System (ADS)
Jeong, Jina; Park, Eungyu; Han, Weon Shik; Kim, Kue-Young
2017-08-01
A subsample aggregating (subagging) regression (SBR) method for the analysis of groundwater data pertaining to trend-estimation-associated uncertainty is proposed. The SBR method is validated against synthetic data competitively with other conventional robust and non-robust methods. From the results, it is verified that the estimation accuracies of the SBR method are consistent and superior to those of other methods, and the uncertainties are reasonably estimated; the others have no uncertainty analysis option. To validate further, actual groundwater data are employed and analyzed comparatively with Gaussian process regression (GPR). For all cases, the trend and the associated uncertainties are reasonably estimated by both SBR and GPR regardless of Gaussian or non-Gaussian skewed data. However, it is expected that GPR has a limitation in applications to severely corrupted data by outliers owing to its non-robustness. From the implementations, it is determined that the SBR method has the potential to be further developed as an effective tool of anomaly detection or outlier identification in groundwater state data such as the groundwater level and contaminant concentration.
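A minimal sketch of subagging for trend estimation with an uncertainty band, assuming a synthetic groundwater-level series and a regression tree as the base learner: many fits on random subsamples drawn without replacement are aggregated, and the spread of their predictions gives the uncertainty.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(14)
t = np.linspace(0, 10, 300)                                             # time (years)
level = 0.3 * t + np.sin(2 * np.pi * t) + rng.normal(0, 0.4, t.size)    # hypothetical groundwater level
level[::40] += 3.0                                                      # occasional outliers

n_replicates, frac = 200, 0.5
preds = np.empty((n_replicates, t.size))
for b in range(n_replicates):
    idx = rng.choice(t.size, size=int(frac * t.size), replace=False)    # subsample without replacement
    tree = DecisionTreeRegressor(max_depth=4).fit(t[idx, None], level[idx])
    preds[b] = tree.predict(t[:, None])

trend = preds.mean(axis=0)                                              # subagging estimate
lower, upper = np.percentile(preds, [2.5, 97.5], axis=0)                # spread as an uncertainty band
i5 = np.argmin(np.abs(t - 5.0))
print(f"estimated level at t=5: {trend[i5]:.2f}  (95% band width {(upper - lower)[i5]:.2f})")
```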
Robust control synthesis for uncertain dynamical systems
NASA Technical Reports Server (NTRS)
Byun, Kuk-Whan; Wie, Bong; Sunkel, John
1989-01-01
This paper presents robust control synthesis techniques for uncertain dynamical systems subject to structured parameter perturbation. Both QFT (quantitative feedback theory) and H-infinity control synthesis techniques are investigated. Although most H-infinity-related control techniques are not concerned with the structured parameter perturbation, a new way of incorporating the parameter uncertainty in the robust H-infinity control design is presented. A generic model of uncertain dynamical systems is used to illustrate the design methodologies investigated in this paper. It is shown that, for a certain noncolocated structural control problem, use of both techniques results in nonminimum phase compensation.
Improving power and robustness for detecting genetic association with extreme-value sampling design.
Chen, Hua Yun; Li, Mingyao
2011-12-01
Extreme-value sampling design that samples subjects with extremely large or small quantitative trait values is commonly used in genetic association studies. Samples in such designs are often treated as "cases" and "controls" and analyzed using logistic regression. Such a case-control analysis ignores the potential dose-response relationship between the quantitative trait and the underlying trait locus and thus may lead to loss of power in detecting genetic association. An alternative approach to analyzing such data is to model the dose-response relationship by a linear regression model. However, parameter estimation from this model can be biased, which may lead to inflated type I errors. We propose a robust and efficient approach that takes into consideration of both the biased sampling design and the potential dose-response relationship. Extensive simulations demonstrate that the proposed method is more powerful than the traditional logistic regression analysis and is more robust than the linear regression analysis. We applied our method to the analysis of a candidate gene association study on high-density lipoprotein cholesterol (HDL-C) which includes study subjects with extremely high or low HDL-C levels. Using our method, we identified several SNPs showing a stronger evidence of association with HDL-C than the traditional case-control logistic regression analysis. Our results suggest that it is important to appropriately model the quantitative traits and to adjust for the biased sampling when dose-response relationship exists in extreme-value sampling designs. © 2011 Wiley Periodicals, Inc.
Monitoring TASCC Injections Using A Field-Ready Wet Chemistry Nutrient Autoanalyzer
NASA Astrophysics Data System (ADS)
Snyder, L. E.; Herstand, M. R.; Bowden, W. B.
2011-12-01
Quantification of nutrient cycling and transport (spiraling) in stream systems is a fundamental component of stream ecology. Additions of isotopic tracers and bulk inorganic nutrients to streams have frequently been used to evaluate nutrient transfer between ecosystem compartments and to estimate nutrient uptake, respectively. The Tracer Addition for Spiraling Curve Characterization (TASCC) methodology of Covino et al. (2010) instantaneously and simultaneously adds conservative and biologically active tracers to a stream system to quantify nutrient uptake metrics. In this method, the ratio of nutrient mass to conservative-solute mass recovered in each sample throughout a breakthrough curve is compared with that of the injectate, yielding a distribution of spiraling metrics across a range of nutrient concentrations. This distribution across concentrations allows both a robust estimation of ambient spiraling parameters by regression techniques and comparison with uptake kinetic models. We tested a unique sampling strategy for TASCC injections in which samples were taken manually throughout the nutrient breakthrough curves while a field-ready wet chemistry autoanalyzer monitored continuously and simultaneously. The autoanalyzer was programmed to measure concentrations of nitrate, phosphate and ammonium at a rate of one measurement per second throughout each experiment. Utilization of an autoanalyzer in the field during the experiment returns several thousand additional nutrient data points compared with manual sampling. This technique therefore allows for a deeper understanding and a more statistically robust estimation of stream nutrient spiraling parameters.
Robust and efficient estimation with weighted composite quantile regression
NASA Astrophysics Data System (ADS)
Jiang, Xuejun; Li, Jingzhi; Xia, Tian; Yan, Wanfeng
2016-09-01
In this paper we introduce a weighted composite quantile regression (CQR) estimation approach and study its application in nonlinear models such as exponential models and ARCH-type models. The weighted CQR is augmented by using a data-driven weighting scheme. With the error distribution unspecified, the proposed estimators share robustness from quantile regression and achieve nearly the same efficiency as the oracle maximum likelihood estimator (MLE) for a variety of error distributions including the normal, mixed-normal, Student's t, Cauchy distributions, etc. We also suggest an algorithm for the fast implementation of the proposed methodology. Simulations are carried out to compare the performance of different estimators, and the proposed approach is used to analyze the daily S&P 500 Composite index, which verifies the effectiveness and efficiency of our theoretical results.
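The following sketch, under the assumption of a plain (unweighted) composite quantile regression with a common slope and quantile-specific intercepts, illustrates the estimator the weighted CQR builds on; the simulated heavy-tailed data and the optimizer settings are illustrative only.

    # Illustrative composite quantile regression: common slope, per-quantile intercepts.
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(1)
    x = rng.normal(size=(300, 2))
    y = x @ np.array([1.5, -2.0]) + rng.standard_t(df=3, size=300)   # heavy-tailed errors

    taus = np.arange(1, 10) / 10.0                 # 9 equally spaced quantile levels

    def check_loss(u, tau):
        return np.mean(u * (tau - (u < 0)))        # quantile "check" loss

    def cqr_objective(params, x, y, taus):
        k = len(taus)
        intercepts, beta = params[:k], params[k:]
        return sum(check_loss(y - b - x @ beta, tau) for b, tau in zip(intercepts, taus))

    init = np.zeros(len(taus) + x.shape[1])
    res = minimize(cqr_objective, init, args=(x, y, taus), method="Nelder-Mead",
                   options={"maxiter": 20000, "xatol": 1e-6, "fatol": 1e-8})
    print("CQR slope estimate:", res.x[len(taus):])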
NASA Astrophysics Data System (ADS)
Mahmood, Ehab A.; Rana, Sohel; Hussin, Abdul Ghapor; Midi, Habshah
2016-06-01
The circular regression model may contain one or more data points which appear to be peculiar or inconsistent with the main part of the model. This may occur due to recording errors, sudden short events, sampling under abnormal conditions, etc. The existence of these data points, "outliers", in the data set causes many problems for the research results and conclusions. Therefore, we should identify them before applying statistical analysis. In this article, we aim to propose a statistic to identify outliers in both the response and explanatory variables of the simple circular regression model. Our proposed statistic is the robust circular distance RCDxy, and it is justified using three robust measures: the proportion of detected outliers, and the masking and swamping rates.
Li, Haojie; Graham, Daniel J
2016-08-01
This paper estimates the causal effect of 20mph zones on road casualties in London. Potential confounders in the key relationship of interest are included within outcome regression and propensity score models, and the models are then combined to form a doubly robust estimator. A total of 234 treated zones and 2844 potential control zones are included in the data sample. The propensity score model is used to select a viable control group which has common support in the covariate distributions. We compare the doubly robust estimates with those obtained using three other methods: inverse probability weighting, regression adjustment, and propensity score matching. The results indicate that 20mph zones have had a significant causal impact on road casualty reduction in both absolute and proportional terms. Copyright © 2016 Elsevier Ltd. All rights reserved.
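A hedged sketch of a doubly robust (augmented inverse-probability-weighting) estimator of the kind formed here by combining an outcome regression with a propensity score model; the simulated covariates, treatment indicator and casualty outcome are hypothetical stand-ins for the London zone data.

    # Illustrative doubly robust (AIPW) estimate of an average treatment effect.
    import numpy as np
    from sklearn.linear_model import LinearRegression, LogisticRegression

    rng = np.random.default_rng(2)
    n = 2000
    x = rng.normal(size=(n, 3))                          # zone-level covariates
    p_treat = 1 / (1 + np.exp(-(x[:, 0] - 0.5 * x[:, 1])))
    t = rng.binomial(1, p_treat)                         # 1 = 20mph zone, 0 = control
    y = 10 - 2.0 * t + x @ np.array([1.0, 0.5, -0.3]) + rng.normal(0, 1, n)  # casualties

    ps = LogisticRegression(max_iter=1000).fit(x, t).predict_proba(x)[:, 1]
    m1 = LinearRegression().fit(x[t == 1], y[t == 1]).predict(x)   # outcome model, treated
    m0 = LinearRegression().fit(x[t == 0], y[t == 0]).predict(x)   # outcome model, control

    # AIPW is consistent if either the outcome model or the propensity model is correct.
    aipw = np.mean(t * (y - m1) / ps + m1) - np.mean((1 - t) * (y - m0) / (1 - ps) + m0)
    print("Doubly robust ATE estimate:", aipw)           # true simulated effect is -2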
Sumatran tiger survival threatened by deforestation despite increasing densities in parks.
Luskin, Matthew Scott; Albert, Wido Rizki; Tobler, Mathias W
2017-12-05
The continuing development of improved capture-recapture (CR) modeling techniques used to study apex predators has also limited robust temporal and cross-site analyses, because different methods have been employed across studies. We develop an approach to standardize older non-spatial CR and newer spatial CR density estimates and examine trends for critically endangered Sumatran tigers (Panthera tigris sumatrae) using a meta-regression of 17 existing densities and new estimates from our own fieldwork. We find that tiger densities were 47% higher in primary versus degraded forests and, unexpectedly, increased 4.9% per yr from 1996 to 2014, likely indicating a recovery from earlier poaching. However, while tiger numbers may have temporarily risen, the total potential island-wide population declined by 16.6% from 2000 to 2012 due to forest loss and degradation, and subpopulations are significantly more fragmented. Thus, despite increasing densities in smaller parks, we conclude that there are only two robust populations left with >30 breeding females, indicating Sumatran tigers still face a high risk of extinction unless deforestation can be controlled.
Benign-malignant mass classification in mammogram using edge weighted local texture features
NASA Astrophysics Data System (ADS)
Rabidas, Rinku; Midya, Abhishek; Sadhu, Anup; Chakraborty, Jayasree
2016-03-01
This paper introduces novel Discriminative Robust Local Binary Pattern (DRLBP) and Discriminative Robust Local Ternary Pattern (DRLTP) features for the classification of mammographic masses as benign or malignant. Masses are among the common, yet challenging, signs of breast cancer in mammography, and their diagnosis is a difficult task. DRLBP and DRLTP overcome the drawbacks of the Local Binary Pattern (LBP) and Local Ternary Pattern (LTP) by discriminating a brighter object against a dark background and vice versa, and by preserving edge information along with texture information; in this study, several edge-preserving texture features are therefore extracted from DRLBP and DRLTP. Finally, a Fisher Linear Discriminant Analysis method is applied to the discriminating features, selected by a stepwise logistic regression method, for the classification of benign and malignant masses. The performance of the DRLBP and DRLTP features is evaluated using ten-fold cross-validation with 58 masses from the mini-MIAS database, and the best result is observed with DRLBP, having an area under the receiver operating characteristic curve of 0.982.
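DRLBP and DRLTP are not available in standard libraries, so the sketch below shows only the plain LBP base descriptor feeding a linear discriminant classifier, assuming scikit-image and scikit-learn; the random ROIs and labels are placeholders, not mini-MIAS data.

    # Illustrative pipeline: LBP texture histograms per ROI, then linear discriminant analysis.
    import numpy as np
    from skimage.feature import local_binary_pattern
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    def lbp_histogram(roi, p=8, r=1.0):
        """Uniform-LBP histogram for one region of interest (2-D array)."""
        codes = local_binary_pattern(roi, P=p, R=r, method="uniform")
        hist, _ = np.histogram(codes, bins=p + 2, range=(0, p + 2), density=True)
        return hist

    rng = np.random.default_rng(3)
    rois = rng.integers(0, 256, size=(58, 64, 64)).astype(np.uint8)  # stand-in ROIs
    labels = rng.integers(0, 2, size=58)                             # 0 = benign, 1 = malignant

    features = np.array([lbp_histogram(roi) for roi in rois])
    clf = LinearDiscriminantAnalysis().fit(features, labels)
    print("Training accuracy (illustrative only):", clf.score(features, labels))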
Robustness of meta-analyses in finding gene × environment interactions
Shi, Gang; Nehorai, Arye
2017-01-01
Meta-analyses that synthesize statistical evidence across studies have become important analytical tools for genetic studies. Inspired by the success of genome-wide association studies of the genetic main effect, researchers are searching for gene × environment interactions. Confounders are routinely included in the genome-wide gene × environment interaction analysis as covariates; however, this does not control for any confounding effects on the results if covariate × environment interactions are present. We carried out simulation studies to evaluate the robustness to the covariate × environment confounder for meta-regression and joint meta-analysis, which are two commonly used meta-analysis methods for testing the gene × environment interaction or the genetic main effect and interaction jointly. Here we show that meta-regression is robust to the covariate × environment confounder while joint meta-analysis is subject to the confounding effect with inflated type I error rates. Given vast sample sizes employed in genome-wide gene × environment interaction studies, non-significant covariate × environment interactions at the study level could substantially elevate the type I error rate at the consortium level. When covariate × environment confounders are present, type I errors can be controlled in joint meta-analysis by including the covariate × environment terms in the analysis at the study level. Alternatively, meta-regression can be applied, which is robust to potential covariate × environment confounders. PMID:28362796
TU-AB-BRB-01: Coverage Evaluation and Probabilistic Treatment Planning as a Margin Alternative
DOE Office of Scientific and Technical Information (OSTI.GOV)
Siebers, J.
The accepted clinical method to accommodate targeting uncertainties inherent in fractionated external beam radiation therapy is to utilize GTV-to-CTV and CTV-to-PTV margins during the planning process to design a PTV-conformal static dose distribution on the planning image set. Ideally, margins are selected to ensure a high (e.g. >95%) target coverage probability (CP) in spite of inherent inter- and intra-fractional positional variations, tissue motions, and initial contouring uncertainties. Robust optimization techniques, also known as probabilistic treatment planning techniques, explicitly incorporate the dosimetric consequences of targeting uncertainties by including CP evaluation into the planning optimization process along with coverage-based planning objectives. The treatment planner no longer needs to use PTV and/or PRV margins; instead robust optimization utilizes probability distributions of the underlying uncertainties in conjunction with CP-evaluation for the underlying CTVs and OARs to design an optimal treated volume. This symposium will describe CP-evaluation methods as well as various robust planning techniques including use of probability-weighted dose distributions, probability-weighted objective functions, and coverage optimized planning. Methods to compute and display the effect of uncertainties on dose distributions will be presented. The use of robust planning to accommodate inter-fractional setup uncertainties, organ deformation, and contouring uncertainties will be examined as will its use to accommodate intra-fractional organ motion. Clinical examples will be used to inter-compare robust and margin-based planning, highlighting advantages of robust-plans in terms of target and normal tissue coverage. Robust-planning limitations as uncertainties approach zero and as the number of treatment fractions becomes small will be presented, as well as the factors limiting clinical implementation of robust planning. Learning Objectives: To understand robust-planning as a clinical alternative to using margin-based planning. To understand conceptual differences between uncertainty and predictable motion. To understand fundamental limitations of the PTV concept that probabilistic planning can overcome. To understand the major contributing factors to target and normal tissue coverage probability. To understand the similarities and differences of various robust planning techniques. To understand the benefits and limitations of robust planning techniques.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xu, H.
TU-AB-BRB-02: Stochastic Programming Methods for Handling Uncertainty and Motion in IMRT Planning
DOE Office of Scientific and Technical Information (OSTI.GOV)
Unkelbach, J.
TU-AB-BRB-00: New Methods to Ensure Target Coverage
DOE Office of Scientific and Technical Information (OSTI.GOV)
NONE
2015-06-15
Jha, Abhinav K; Caffo, Brian; Frey, Eric C
2016-01-01
The objective optimization and evaluation of nuclear-medicine quantitative imaging methods using patient data is highly desirable but often hindered by the lack of a gold standard. Previously, a regression-without-truth (RWT) approach has been proposed for evaluating quantitative imaging methods in the absence of a gold standard, but this approach implicitly assumes that bounds on the distribution of true values are known. Several quantitative imaging methods in nuclear-medicine imaging measure parameters where these bounds are not known, such as the activity concentration in an organ or the volume of a tumor. We extended upon the RWT approach to develop a no-gold-standard (NGS) technique for objectively evaluating such quantitative nuclear-medicine imaging methods with patient data in the absence of any ground truth. Using the parameters estimated with the NGS technique, a figure of merit, the noise-to-slope ratio (NSR), can be computed, which can rank the methods on the basis of precision. An issue with NGS evaluation techniques is the requirement of a large number of patient studies. To reduce this requirement, the proposed method explored the use of multiple quantitative measurements from the same patient, such as the activity concentration values from different organs in the same patient. The proposed technique was evaluated using rigorous numerical experiments and using data from realistic simulation studies. The numerical experiments demonstrated that the NSR was estimated accurately using the proposed NGS technique when the bounds on the distribution of true values were not precisely known, thus serving as a very reliable metric for ranking the methods on the basis of precision. In the realistic simulation study, the NGS technique was used to rank reconstruction methods for quantitative single-photon emission computed tomography (SPECT) based on their performance on the task of estimating the mean activity concentration within a known volume of interest. Results showed that the proposed technique provided accurate ranking of the reconstruction methods for 97.5% of the 50 noise realizations. Further, the technique was robust to the choice of evaluated reconstruction methods. The simulation study pointed to possible violations of the assumptions made in the NGS technique under clinical scenarios. However, numerical experiments indicated that the NGS technique was robust in ranking methods even when there was some degree of such violation. PMID:26982626
Jha, Abhinav K; Caffo, Brian; Frey, Eric C
2016-04-07
REGRESSION MODELS OF RESIDENTIAL EXPOSURE TO CHLORPYRIFOS AND DIAZINON
This study examines the ability of regression models to predict residential exposures to chlorpyrifos and diazinon, based on the information from the NHEXAS-AZ database. The robust method was used to generate "fill-in" values for samples that are below the detection l...
Esteki, M; Nouroozi, S; Shahsavari, Z
2016-02-01
To develop a simple and efficient spectrophotometric technique combined with chemometrics for the simultaneous determination of methyl paraben (MP) and hydroquinone (HQ) in cosmetic products, and specifically, to: (i) evaluate the potential use of the successive projections algorithm (SPA) on derivative spectrophotometric data in order to provide sufficient accuracy and model robustness and (ii) determine MP and HQ concentrations in cosmetics without tedious pre-treatments such as derivatization or extraction techniques, which are time-consuming and require hazardous solvents. The absorption spectra were measured in the wavelength range of 200-350 nm. Prior to building the chemometric models, the original and first-derivative absorption spectra of binary mixtures were used as calibration matrices. Variables selected by the successive projections algorithm were used to obtain multiple linear regression (MLR) models based on a small subset of wavelengths. The number of wavelengths and the starting vector were optimized, and comparison of the root mean square errors of calibration (RMSEC) and cross-validation (RMSECV) was applied to select effective wavelengths with the least collinearity and redundancy. Principal component regression (PCR) and partial least squares (PLS) models were also developed for comparison. The concentrations of the calibration matrix ranged from 0.1 to 20 μg mL(-1) for MP, and from 0.1 to 25 μg mL(-1) for HQ. The constructed models were tested on an external validation data set and finally on cosmetic samples. The results indicated that successive projections algorithm-multiple linear regression (SPA-MLR), applied to the first-derivative spectra, achieved the optimal performance for the two compounds when compared with the full-spectrum PCR and PLS. The root mean square error of prediction (RMSEP) was 0.083 and 0.314 for MP and HQ, respectively. To verify the accuracy of the proposed method, a recovery study on real cosmetic samples was carried out with satisfactory results (84-112%). The proposed method, which is an environmentally friendly approach using a minimum amount of solvent, is a simple, fast and low-cost analysis method that can provide high accuracy and robust models. The suggested method does not need any complex extraction procedure, which is time-consuming and requires hazardous solvents. © 2015 Society of Cosmetic Scientists and the Société Française de Cosmétologie.
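A rough sketch of the few-wavelength MLR versus full-spectrum PLS comparison, assuming scikit-learn; the greedy forward search below stands in for the successive projections algorithm, and the simulated spectra and concentrations are placeholders.

    # Illustrative comparison: few-wavelength MLR vs full-spectrum PLS.
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(4)
    n_samples, n_wavelengths = 60, 151                 # e.g. 200-350 nm in 1 nm steps
    spectra = rng.random((n_samples, n_wavelengths))
    conc = spectra[:, 30] * 5 + spectra[:, 90] * 3 + rng.normal(0, 0.1, n_samples)

    def greedy_select(x, y, n_vars=3):
        selected = []
        for _ in range(n_vars):
            best = max((j for j in range(x.shape[1]) if j not in selected),
                       key=lambda j: cross_val_score(LinearRegression(),
                                                     x[:, selected + [j]], y, cv=5).mean())
            selected.append(best)
        return selected

    chosen = greedy_select(spectra, conc)
    mlr_cv = cross_val_score(LinearRegression(), spectra[:, chosen], conc, cv=5).mean()
    pls_cv = cross_val_score(PLSRegression(n_components=5), spectra, conc, cv=5).mean()
    print("selected wavelength indices:", chosen)
    print("MLR CV R2:", round(mlr_cv, 3), "| PLS CV R2:", round(pls_cv, 3))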
A Robust Zero-Watermarking Algorithm for Audio
NASA Astrophysics Data System (ADS)
Chen, Ning; Zhu, Jie
2007-12-01
In traditional watermarking algorithms, the insertion of watermark into the host signal inevitably introduces some perceptible quality degradation. Another problem is the inherent conflict between imperceptibility and robustness. Zero-watermarking technique can solve these problems successfully. Instead of embedding watermark, the zero-watermarking technique extracts some essential characteristics from the host signal and uses them for watermark detection. However, most of the available zero-watermarking schemes are designed for still image and their robustness is not satisfactory. In this paper, an efficient and robust zero-watermarking technique for audio signal is presented. The multiresolution characteristic of discrete wavelet transform (DWT), the energy compression characteristic of discrete cosine transform (DCT), and the Gaussian noise suppression property of higher-order cumulant are combined to extract essential features from the host audio signal and they are then used for watermark recovery. Simulation results demonstrate the effectiveness of our scheme in terms of inaudibility, detection reliability, and robustness.
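A hedged sketch of the feature-extraction chain described above (DWT approximation band, DCT energy compaction, higher-order statistic), assuming PyWavelets and SciPy; the frame length, wavelet choice and thresholding rule are illustrative, not the paper's exact scheme.

    # Illustrative zero-watermarking feature extraction: DWT + DCT + kurtosis.
    import numpy as np
    import pywt
    from scipy.fft import dct
    from scipy.stats import kurtosis

    rng = np.random.default_rng(5)
    audio = rng.normal(size=8192)                      # stand-in host audio frame

    # 1) Multiresolution decomposition: keep the low-frequency approximation band.
    approx = pywt.wavedec(audio, "db4", level=4)[0]

    # 2) Energy compaction: low-order DCT coefficients of the approximation band.
    dct_coeffs = dct(approx, norm="ortho")[:64]

    # 3) Higher-order statistic, relatively insensitive to additive Gaussian noise.
    feature = kurtosis(dct_coeffs)

    # A binary "zero-watermark" pattern derived by thresholding, for later verification.
    zero_watermark = (dct_coeffs > np.median(dct_coeffs)).astype(int)
    print("cumulant feature:", round(float(feature), 4), "| pattern bits:", zero_watermark[:16])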
ERIC Educational Resources Information Center
Hollingsworth, Holly H.
This study shows that the test statistic for Analysis of Covariance (ANCOVA) has a noncentral F-distribution with noncentrality parameter equal to zero if and only if the regression planes are homogeneous and/or the vector of overall covariate means is the null vector. The effect of heterogeneous regression slope parameters is to either increase…
Standard and Robust Methods in Regression Imputation
ERIC Educational Resources Information Center
Moraveji, Behjat; Jafarian, Koorosh
2014-01-01
The aim of this paper is to provide an introduction to new imputation algorithms for estimating missing values in larger data sets from official statistics during data pre-processing, including in the presence of outliers. The goal is to propose a new algorithm called IRMI (iterative robust model-based imputation). This algorithm is able to deal with all challenges like…
NASA Astrophysics Data System (ADS)
Achiman, Ori; Mekhmandarov, Yonatan; Pirkner, Moran; Tanny, Josef
2016-04-01
Previous studies have established that the eddy covariance (EC) technique is reliable for whole canopy flux measurements in agricultural crops covered by porous screens, i.e., screenhouses. Nevertheless, the eddy covariance technique remains difficult to apply on the farm due to costs, operational complexity, and post-processing of data - thereby inviting alternative techniques to be developed. The subject of this research was estimating the sensible heat flux by two turbulent transport techniques, namely, Flux-Variance (FV) and Half-order Time Derivative (HTD), whose instrumentation needs and operational demands are not as elaborate as those of the EC. The FV is based on the standard deviation of high frequency temperature measurements and a similarity constant CT. The HTD method requires mean air temperature and air velocity data. Measurements were carried out in two types of screenhouses: (i) a banana plantation in a light shading (8%) screenhouse; (ii) a pepper crop in a dense insect-proof (50-mesh) screenhouse. In each screenhouse an EC system was deployed for reference and high frequency air temperature measurements were conducted using miniature thermocouples installed at several levels to identify the optimal measurement height. Quality control analysis showed that turbulence development and flow stationarity conditions in the two structures were suitable for flux measurements by the EC technique. Energy balance closure slopes in the two screenhouses were larger than 0.71, in agreement with results for open fields. Regressions between sensible heat flux measured by EC and estimated by FV resulted in CT values that were usually larger than 1, the typical open-field value. In both shading and insect-proof screenhouses the CT value generally increased with height. The optimal measurement height, defined as the height with maximum R2 of the regression between EC and FV sensible heat fluxes, was just above the screen. The CT value at the optimal height was 2.64 and 1.52 for the shading and insect-proof screenhouses, respectively, with R2 = 0.73 in both types of structures. FV analysis of the temperature signal at frequencies lower than 10 Hz showed that the R2 of these regressions was insensitive to the data analysis frequency up to 0.5 Hz. This suggests that turbulent transport in the screenhouses was governed by large scale vortices. Regressions between EC and HTD sensible heat fluxes resulted in R2 values that slightly decreased with height, ranging between 0.3 and 0.4 for both screenhouses. The regression slopes also decreased with height and had values between 0.4 and 0.6. We conclude that in screenhouses the FV technique provides a more reliable estimate of the sensible heat flux than the HTD; however, the latter is simpler and more robust in terms of equipment, operation and data analysis and hence may be more attainable for day-to-day use by the growers.
NASA Astrophysics Data System (ADS)
Girinoto, Sadik, Kusman; Indahwati
2017-03-01
The National Socio-Economic Survey samples are designed to produce estimates of parameters for planned domains (provinces and districts). Estimation for unplanned domains (sub-districts and villages) is limited in its ability to provide reliable direct estimates. One possible solution to this problem is to employ small area estimation techniques. The popular choices for small area estimation are based on linear mixed models. However, such models require strong distributional assumptions and do not easily allow for outlier-robust estimation. As an alternative, the M-quantile regression approach to small area estimation is based on modeling specific M-quantile coefficients of the conditional distribution of the study variable given auxiliary covariates. It provides outlier-robust estimation through an M-estimator-type influence function and does not require strong distributional assumptions. In this paper, the aim of the study is to estimate poverty indicators at the sub-district level in Bogor District, West Java, using M-quantile models for small area estimation. Using data taken from the National Socioeconomic Survey and Village Potential Statistics, the results provide a detailed description of the pattern of incidence and intensity of poverty within Bogor district. We also compare the results with direct estimates. The results show that the framework may be preferable when the direct estimate indicates no incidence of poverty at all in a small area.
NASA Astrophysics Data System (ADS)
Brümmer, C.; Moffat, A. M.; Huth, V.; Augustin, J.; Herbst, M.; Kutsch, W. L.
2016-12-01
Manual carbon dioxide flux measurements with closed chambers at scheduled campaigns are a versatile method to study management effects at small scales in multiple-plot experiments. The eddy covariance technique has the advantage of quasi-continuous measurements but requires large homogeneous areas of a few hectares. To evaluate the uncertainties associated with interpolating from individual campaigns to the whole vegetation period, we installed both techniques at an agricultural site in Northern Germany. The presented comparison covers two cropping seasons, winter oilseed rape in 2012/13 and winter wheat in 2013/14. Modeling half-hourly carbon fluxes from campaigns is commonly performed based on non-linear regressions for the light response and respiration. The daily averages of net CO2 modeled from chamber data deviated from eddy covariance measurements in the range of ± 5 g C m-2 day-1. To understand the observed differences and to disentangle the effects, we performed four additional setups (expert versus default settings of the non-linear regressions based algorithm, purely empirical modeling with artificial neural networks versus non-linear regressions, cross-validating using eddy covariance measurements as campaign fluxes, weekly versus monthly scheduling of campaigns) to model the half-hourly carbon fluxes for the whole vegetation period. The good agreement of the seasonal course of net CO2 at plot and field scale for our agricultural site demonstrates that both techniques are robust and yield consistent results at seasonal time scale even for a managed ecosystem with high temporal dynamics in the fluxes. This allows combining the respective advantages of factorial experiments at plot scale with dense time series data at field scale. Furthermore, the information from the quasi-continuous eddy covariance measurements can be used to derive vegetation proxies to support the interpolation of carbon fluxes in-between the manual chamber campaigns.
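As an illustration of the non-linear light-response regression mentioned above, the sketch below fits a rectangular-hyperbola NEE model with dark respiration using SciPy; the parameter values and simulated fluxes are placeholders, not the study's calibration.

    # Illustrative non-linear light-response fit for half-hourly net CO2 exchange.
    import numpy as np
    from scipy.optimize import curve_fit

    def nee_model(par, alpha, gpmax, reco):
        """NEE = -GPP(PAR) + Reco, with a Michaelis-Menten light response."""
        return -(alpha * gpmax * par) / (alpha * par + gpmax) + reco

    rng = np.random.default_rng(6)
    par = np.linspace(0, 1800, 80)                               # photosynthetic radiation
    nee = nee_model(par, 0.02, 25.0, 4.0) + rng.normal(0, 1.0, par.size)

    popt, pcov = curve_fit(nee_model, par, nee, p0=[0.01, 20.0, 3.0])
    alpha, gpmax, reco = popt
    print(f"alpha={alpha:.3f}, GPmax={gpmax:.1f}, Reco={reco:.1f}")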
Goldstein, Benjamin A.; Navar, Ann Marie; Carter, Rickey E.
2017-01-01
Risk prediction plays an important role in clinical cardiology research. Traditionally, most risk models have been based on regression models. While useful and robust, these statistical methods are limited to using a small number of predictors which operate in the same way on everyone, and uniformly throughout their range. The purpose of this review is to illustrate the use of machine-learning methods for development of risk prediction models. Typically presented as black box approaches, most machine-learning methods are aimed at solving particular challenges that arise in data analysis that are not well addressed by typical regression approaches. To illustrate these challenges, as well as how different methods can address them, we consider trying to predict mortality after diagnosis of acute myocardial infarction. We use data derived from our institution's electronic health record and abstract data on 13 regularly measured laboratory markers. We walk through different challenges that arise in modelling these data and then introduce different machine-learning approaches. Finally, we discuss general issues in the application of machine-learning methods including tuning parameters, loss functions, variable importance, and missing data. Overall, this review serves as an introduction for those working on risk modelling to approach the diffuse field of machine learning. PMID:27436868
MicroCT angiography detects vascular formation and regression in skin wound healing.
Urao, Norifumi; Okonkwo, Uzoagu A; Fang, Milie M; Zhuang, Zhen W; Koh, Timothy J; DiPietro, Luisa A
2016-07-01
Properly regulated angiogenesis and arteriogenesis are essential for effective wound healing. Tissue injury induces robust new vessel formation and subsequent vessel maturation, which involves vessel regression and remodeling. Although formation of functional vasculature is essential for healing, alterations in vascular structure over the time course of skin wound healing are not well understood. Here, using high-resolution ex vivo X-ray micro-computed tomography (microCT), we describe the vascular network during healing of skin excisional wounds with highly detailed three-dimensional (3D) reconstructed images and associated quantitative analysis. We found that relative vessel volume, surface area and branching number are significantly decreased in wounds from day 7 to days 14 and 21. Segmentation and skeletonization analysis of selected branches from high-resolution images as small as 2.5μm voxel size show that branching orders are decreased in the wound vessels during healing. In histological analysis, we found that the contrast agent fills mainly arterioles, but not small capillaries nor large veins. In summary, high-resolution microCT revealed dynamic alterations of vessel structures during wound healing. This technique may be useful as a key tool in the study of the formation and regression of wound vessels. Copyright © 2016 Elsevier Inc. All rights reserved.
Developing global regression models for metabolite concentration prediction regardless of cell line.
André, Silvère; Lagresle, Sylvain; Da Sliva, Anthony; Heimendinger, Pierre; Hannas, Zahia; Calvosa, Éric; Duponchel, Ludovic
2017-11-01
Following the Process Analytical Technology (PAT) initiative of the Food and Drug Administration (FDA), drug manufacturers are encouraged to develop innovative techniques in order to monitor and understand their processes in a better way. Within this framework, it has been demonstrated that Raman spectroscopy coupled with chemometric tools allows critical parameters of mammalian cell cultures to be predicted in-line and in real time. However, the development of robust and predictive regression models clearly requires many batches in order to take into account inter-batch variability and enhance model accuracy. Nevertheless, this heavy procedure has to be repeated for every new cell culture line, involving many resources. This is why we propose in this paper to develop global regression models taking into account different cell lines. Such models are finally transferred to any culture of the cells involved. This article first demonstrates the feasibility of developing regression models, not only for mammalian cell lines (CHO and HeLa cell cultures), but also for insect cell lines (Sf9 cell cultures). Then global regression models are generated, based on CHO cells, HeLa cells, and Sf9 cells. Finally, these models are evaluated considering a fourth cell line (HEK cells). In addition to suitable predictions of glucose and lactate concentration in HEK cell cultures, we show that by adding a single HEK-cell culture to the calibration set, the predictive ability of the regression models is substantially increased. In this way, we demonstrate that using global models, it is not necessary to consider many cultures of a new cell line in order to obtain accurate models. Biotechnol. Bioeng. 2017;114: 2550-2559. © 2017 Wiley Periodicals, Inc.
Martin, R C; Sawrie, S M; Roth, D L; Gilliam, F G; Faught, E; Morawetz, R B; Kuzniecky, R
1998-10-01
To characterize patterns of base rate change on measures of verbal and visual memory after anterior temporal lobectomy (ATL) using a newly developed regression-based outcome methodology that accounts for effects of practice and regression towards the mean, and to comment on the predictive utility of baseline memory measures on postoperative memory outcome. Memory change was operationalized using regression-based change norms in a group of left (n = 53) and right (n = 48) ATL patients. All patients were administered tests of episodic verbal (prose recall, list learning) and visual (figure reproduction) memory, and semantic memory before and after ATL. ATL patients displayed a wide range of memory outcome across verbal and visual memory domains. Significant performance declines were noted for 25-50% of left ATL patients on verbal semantic and episodic memory tasks, while one-third of right ATL patients displayed significant declines in immediate and delayed episodic prose recall. Significant performance improvement was noted in an additional one-third of right ATL patients on delayed prose recall. Base rate change was similar between the two ATL groups across immediate and delayed visual memory. Approximately one-fourth of all patients displayed clinically meaningful losses on the visual memory task following surgery. Robust relationships between preoperative memory measures and nonstandardized change scores were attenuated or reversed using standardized memory outcome techniques. Our results demonstrated substantial group variability in memory outcome for ATL patients. These results extend previous research by incorporating known effects of practice and regression to the mean when addressing meaningful neuropsychological change following epilepsy surgery. Our findings also suggest that future neuropsychological outcome studies should take steps towards controlling for regression-to-the-mean before drawing predictive conclusions.
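A minimal sketch of a regression-based change norm of the kind described above: post-operative scores are regressed on pre-operative scores and each patient's change is expressed as a standardized residual; the simulated scores and the 1.64 cut-off are illustrative assumptions.

    # Illustrative regression-based change norm (standardized residual change score).
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(7)
    pre = rng.normal(50, 10, 120)                      # baseline memory score
    post = 5 + 0.9 * pre + rng.normal(0, 6, 120)       # retest, with practice effect built in

    X = sm.add_constant(pre)
    fit = sm.OLS(post, X).fit()
    predicted_post = fit.predict(X)
    sd_resid = np.sqrt(fit.scale)                      # residual standard deviation

    # Standardized regression-based change; |z| > 1.64 flagged as meaningful change here.
    z_change = (post - predicted_post) / sd_resid
    print("proportion flagged as reliably declined:", np.mean(z_change < -1.64))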
2011-01-01
Background Previous research addressed the development of a classification scheme for quality improvement systems in European hospitals. In this study we explore associations between the 'maturity' of the hospitals' quality improvement system and clinical outcomes. Methods The maturity classification scheme was developed based on survey results from 389 hospitals in eight European countries. We matched the hospitals from the Spanish sample (113 hospitals) with those hospitals participating in a nation-wide, voluntary hospital performance initiative. We then compared sample distributions and explored associations between the 'maturity' of the hospitals' quality improvement system and a range of composite outcomes measures, such as adjusted hospital-wide mortality, -readmission, -complication and -length of stay indices. Statistical analysis includes bivariate correlations for parametrically and non-parametrically distributed data, multiple robust regression models and bootstrapping techniques to obtain confidence-intervals for the correlation and regression estimates. Results Overall, 43 hospitals were included. Compared to the original sample of 113, this sample was characterized by a higher representation of university hospitals. Maturity of the quality improvement system was similar, although the matched sample showed less variability. Analysis of associations between the quality improvement system and hospital-wide outcomes suggests significant correlations for the indicator adjusted hospital complications, borderline significance for adjusted hospital readmissions and non-significance for the adjusted hospital mortality and length of stay indicators. These results are confirmed by the bootstrap estimates of the robust regression model after adjusting for hospital characteristics. Conclusions We assessed associations between hospitals' quality improvement systems and clinical outcomes. From this data it seems that having a more developed quality improvement system is associated with lower rates of adjusted hospital complications. A number of methodological and logistic hurdles remain to link hospital quality improvement systems to outcomes. Further research should aim at identifying the latent dimensions of quality improvement systems that predict quality and safety outcomes. Such research would add pertinent knowledge regarding the implementation of organizational strategies related with quality of care outcomes. PMID:22185479
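A hedged sketch of the analysis strategy described above, pairing a Huber robust regression with a non-parametric bootstrap confidence interval in statsmodels; the maturity scores and complication indices are simulated placeholders.

    # Illustrative robust regression with a bootstrap CI for the slope.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(8)
    maturity = rng.uniform(1, 5, 43)                              # QI-system maturity
    complications = 1.2 - 0.15 * maturity + rng.normal(0, 0.2, 43)
    complications[:3] += 0.8                                      # a few outlying hospitals

    X = sm.add_constant(maturity)
    rlm_fit = sm.RLM(complications, X, M=sm.robust.norms.HuberT()).fit()
    print("robust slope:", rlm_fit.params[1])

    # Non-parametric bootstrap for the robust slope.
    boot_slopes = []
    for _ in range(2000):
        idx = rng.integers(0, 43, 43)
        boot_fit = sm.RLM(complications[idx], X[idx], M=sm.robust.norms.HuberT()).fit()
        boot_slopes.append(boot_fit.params[1])
    print("95% bootstrap CI:", np.percentile(boot_slopes, [2.5, 97.5]))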
Luque-Fernandez, Miguel Angel; Belot, Aurélien; Quaresma, Manuela; Maringe, Camille; Coleman, Michel P; Rachet, Bernard
2016-10-01
In population-based cancer research, piecewise exponential regression models are used to derive adjusted estimates of excess mortality due to cancer using the Poisson generalized linear modelling framework. However, the assumption that the conditional mean and variance of the rate parameter given the set of covariates x_i are equal is strong and may fail to account for overdispersion given the variability of the rate parameter (the variance exceeds the mean). Using an empirical example, we aimed to describe simple methods to test and correct for overdispersion. We used a regression-based score test for overdispersion under the relative survival framework and proposed different approaches to correct for overdispersion including a quasi-likelihood, robust standard errors estimation, negative binomial regression and flexible piecewise modelling. All piecewise exponential regression models showed the presence of significant inherent overdispersion (p-value <0.001). However, the flexible piecewise exponential model showed the smallest overdispersion parameter (3.2, versus 21.3 for the non-flexible piecewise exponential models). We showed that there were no major differences between methods. However, using flexible piecewise regression modelling, with either a quasi-likelihood or robust standard errors, was the best approach as it deals with both overdispersion due to model misspecification and true or inherent overdispersion.
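The sketch below illustrates the general workflow discussed above, testing for overdispersion with a simple auxiliary regression and then correcting with sandwich standard errors or a negative binomial model in statsmodels; it is a generic overdispersed-count example, not the excess-mortality model itself.

    # Illustrative overdispersion check and corrections for a Poisson-type model.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(9)
    n = 500
    x = rng.normal(size=n)
    mu = np.exp(0.5 + 0.8 * x)
    y = rng.negative_binomial(n=2, p=2 / (2 + mu))     # overdispersed counts with mean mu

    X = sm.add_constant(x)
    poisson_fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()

    # Auxiliary (Cameron-Trivedi style) regression-based test: regress a dispersion
    # statistic on the fitted mean, with no intercept.
    mu_hat = poisson_fit.mu
    aux = ((y - mu_hat) ** 2 - y) / mu_hat
    aux_fit = sm.OLS(aux, mu_hat).fit()
    print("overdispersion test: alpha =", aux_fit.params[0], ", p =", aux_fit.pvalues[0])

    # Corrections: robust (sandwich) standard errors, or a negative binomial model.
    robust_fit = sm.GLM(y, X, family=sm.families.Poisson()).fit(cov_type="HC0")
    nb_fit = sm.GLM(y, X, family=sm.families.NegativeBinomial(alpha=1.0)).fit()
    print(robust_fit.bse, nb_fit.params)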
Statistical procedures for evaluating daily and monthly hydrologic model predictions
Coffey, M.E.; Workman, S.R.; Taraba, J.L.; Fogle, A.W.
2004-01-01
The overall study objective was to evaluate the applicability of different qualitative and quantitative methods for comparing daily and monthly SWAT computer model hydrologic streamflow predictions to observed data, and to recommend statistical methods for use in future model evaluations. Statistical methods were tested using daily streamflows and monthly equivalent runoff depths. The statistical techniques included linear regression, Nash-Sutcliffe efficiency, nonparametric tests, t-test, objective functions, autocorrelation, and cross-correlation. None of the methods specifically applied to the non-normal distribution and dependence between data points for the daily predicted and observed data. Of the tested methods, median objective functions, sign test, autocorrelation, and cross-correlation were most applicable for the daily data. The robust coefficient of determination (CD*) and robust modeling efficiency (EF*) objective functions were the preferred methods for daily model results due to the ease of comparing these values with a fixed ideal reference value of one. Predicted and observed monthly totals were more normally distributed, and there was less dependence between individual monthly totals than was observed for the corresponding predicted and observed daily values. More statistical methods were available for comparing SWAT model-predicted and observed monthly totals. The 1995 monthly SWAT model predictions and observed data had a regression R2 of 0.70, a Nash-Sutcliffe efficiency of 0.41, and the t-test failed to reject the equal data means hypothesis. The Nash-Sutcliffe coefficient and the R2 coefficient were the preferred methods for monthly results due to the ability to compare these coefficients to a set ideal value of one.
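For reference, the two preferred monthly statistics can be computed directly; the sketch below, with placeholder runoff values, shows the Nash-Sutcliffe efficiency and the (non-robust) coefficient of determination.

    # Illustrative goodness-of-fit measures for paired observed and simulated flows.
    import numpy as np

    def nash_sutcliffe(obs, sim):
        """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2); 1 is a perfect fit."""
        obs, sim = np.asarray(obs, float), np.asarray(sim, float)
        return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

    def r_squared(obs, sim):
        """Squared Pearson correlation between observed and simulated series."""
        return np.corrcoef(obs, sim)[0, 1] ** 2

    observed = np.array([1.2, 3.4, 2.8, 5.1, 4.0, 2.2])   # monthly runoff depth (mm)
    predicted = np.array([1.0, 3.0, 3.1, 4.6, 4.4, 2.0])

    print("NSE:", round(nash_sutcliffe(observed, predicted), 3))
    print("R2 :", round(r_squared(observed, predicted), 3))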
Bayesian nonparametric regression with varying residual density
Pati, Debdeep; Dunson, David B.
2013-01-01
We consider the problem of robust Bayesian inference on the mean regression function allowing the residual density to change flexibly with predictors. The proposed class of models is based on a Gaussian process prior for the mean regression function and mixtures of Gaussians for the collection of residual densities indexed by predictors. Initially considering the homoscedastic case, we propose priors for the residual density based on probit stick-breaking (PSB) scale mixtures and symmetrized PSB (sPSB) location-scale mixtures. Both priors restrict the residual density to be symmetric about zero, with the sPSB prior more flexible in allowing multimodal densities. We provide sufficient conditions to ensure strong posterior consistency in estimating the regression function under the sPSB prior, generalizing existing theory focused on parametric residual distributions. The PSB and sPSB priors are generalized to allow residual densities to change nonparametrically with predictors through incorporating Gaussian processes in the stick-breaking components. This leads to a robust Bayesian regression procedure that automatically down-weights outliers and influential observations in a locally-adaptive manner. Posterior computation relies on an efficient data augmentation exact block Gibbs sampler. The methods are illustrated using simulated and real data applications. PMID:24465053
Lim, Changwon
2015-03-30
Nonlinear regression is often used to evaluate the toxicity of a chemical or a drug by fitting data from a dose-response study. Toxicologists and pharmacologists may draw a conclusion about whether a chemical is toxic by testing the significance of the estimated parameters. However, sometimes the null hypothesis cannot be rejected even though the fit is quite good. One possible reason for such cases is that the estimated standard errors of the parameter estimates are extremely large. In this paper, we propose robust ridge regression estimation procedures for nonlinear models to solve this problem. The asymptotic properties of the proposed estimators are investigated; in particular, their mean squared errors are derived. The performances of the proposed estimators are compared with several standard estimators using simulation studies. The proposed methodology is also illustrated using high throughput screening assay data obtained from the National Toxicology Program. Copyright © 2014 John Wiley & Sons, Ltd.
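A hedged sketch of one way to combine robustness and ridge-type shrinkage in a non-linear dose-response fit, using a Huber loss in scipy.optimize.least_squares with a penalty appended as extra residuals; the Hill model, penalty weight and simulated assay data are assumptions, not the authors' estimator.

    # Illustrative robust ridge fit for a non-linear (Hill) dose-response model.
    import numpy as np
    from scipy.optimize import least_squares

    def hill(dose, top, ec50, slope):
        return top / (1.0 + (ec50 / np.maximum(dose, 1e-12)) ** slope)

    rng = np.random.default_rng(10)
    dose = np.logspace(-3, 2, 40)
    resp = hill(dose, 100.0, 1.0, 1.2) + rng.normal(0, 5, dose.size)
    resp[-1] += 60                                      # a gross outlier

    LAMBDA = 1e-3                                       # ridge penalty weight (assumed)

    def residuals(theta):
        fit_part = resp - hill(dose, *theta)
        ridge_part = np.sqrt(LAMBDA) * theta            # shrinks parameters toward zero
        return np.concatenate([fit_part, ridge_part])

    sol = least_squares(residuals, x0=[80.0, 0.5, 1.0], loss="huber", f_scale=10.0)
    print("robust ridge estimates (top, EC50, slope):", sol.x)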
NASA Astrophysics Data System (ADS)
Zhai, Mengting; Chen, Yan; Li, Jing; Zhou, Jun
2017-12-01
The molecular electronegativity distance vector (MEDV-13) was used in this paper to describe the molecular structure of benzyl ether diamidine derivatives. Based on MEDV-13, a three-parameter (M3, M15, M47) QSAR model of insecticidal activity (pIC50) for 60 benzyl ether diamidine derivatives was constructed by leaps-and-bounds regression (LBR). The traditional correlation coefficient (R) and the cross-validation correlation coefficient (RCV) were 0.975 and 0.971, respectively. The robustness of the regression model was validated by the jackknife method; the correlation coefficients R were between 0.971 and 0.983. Meanwhile, the independent variables in the model were tested and showed no autocorrelation. The regression results indicate that the model has good robustness and predictive capability. The research provides theoretical guidance for the development of a new generation of efficient, low-toxicity drugs against African trypanosomiasis.
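A rough sketch of a three-descriptor subset search with leave-one-out cross-validation, standing in for leaps-and-bounds regression; the random descriptor matrix is a placeholder for MEDV-13, and the exhaustive enumeration is an assumption rather than the LBR algorithm itself.

    # Illustrative best-subset (three-descriptor) MLR search for a QSAR-style model.
    import itertools
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import LeaveOneOut, cross_val_score

    rng = np.random.default_rng(11)
    n_compounds, n_descriptors = 60, 13
    X = rng.normal(size=(n_compounds, n_descriptors))                 # MEDV-13 stand-in
    pIC50 = X[:, 2] * 0.8 + X[:, 7] * 0.5 - X[:, 11] * 0.6 + rng.normal(0, 0.2, n_compounds)

    best_subset, best_score = None, -np.inf
    for subset in itertools.combinations(range(n_descriptors), 3):
        score = cross_val_score(LinearRegression(), X[:, subset], pIC50,
                                cv=LeaveOneOut(), scoring="neg_mean_squared_error").mean()
        if score > best_score:
            best_subset, best_score = subset, score

    model = LinearRegression().fit(X[:, best_subset], pIC50)
    print("selected descriptors:", best_subset,
          "| training R2:", model.score(X[:, best_subset], pIC50))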
A subagging regression method for estimating the qualitative and quantitative state of groundwater
NASA Astrophysics Data System (ADS)
Jeong, J.; Park, E.; Choi, J.; Han, W. S.; Yun, S. T.
2016-12-01
A subagging regression (SBR) method for the analysis of groundwater data pertaining to the estimation of trend and the associated uncertainty is proposed. The SBR method is validated against synthetic data competitively with other conventional robust and non-robust methods. From the results, it is verified that the estimation accuracies of the SBR method are consistent and superior to those of the other methods and the uncertainties are reasonably estimated where the others have no uncertainty analysis option. To validate further, real quantitative and qualitative data are employed and analyzed comparatively with Gaussian process regression (GPR). For all cases, the trend and the associated uncertainties are reasonably estimated by SBR, whereas the GPR has limitations in representing the variability of non-Gaussian skewed data. From the implementations, it is determined that the SBR method has potential to be further developed as an effective tool of anomaly detection or outlier identification in groundwater state data.
Real-Time Tracking of Selective Auditory Attention From M/EEG: A Bayesian Filtering Approach
Miran, Sina; Akram, Sahar; Sheikhattar, Alireza; Simon, Jonathan Z.; Zhang, Tao; Babadi, Behtash
2018-01-01
Humans are able to identify and track a target speaker amid a cacophony of acoustic interference, an ability which is often referred to as the cocktail party phenomenon. Results from several decades of studying this phenomenon have culminated in recent years in various promising attempts to decode the attentional state of a listener in a competing-speaker environment from non-invasive neuroimaging recordings such as magnetoencephalography (MEG) and electroencephalography (EEG). To this end, most existing approaches compute correlation-based measures by either regressing the features of each speech stream to the M/EEG channels (the decoding approach) or vice versa (the encoding approach). To produce robust results, these procedures require multiple trials for training purposes. Also, their decoding accuracy drops significantly when operating at high temporal resolutions. Thus, they are not well-suited for emerging real-time applications such as smart hearing aid devices or brain-computer interface systems, where training data might be limited and high temporal resolutions are desired. In this paper, we close this gap by developing an algorithmic pipeline for real-time decoding of the attentional state. Our proposed framework consists of three main modules: (1) Real-time and robust estimation of encoding or decoding coefficients, achieved by sparse adaptive filtering, (2) Extracting reliable markers of the attentional state, and thereby generalizing the widely-used correlation-based measures thereof, and (3) Devising a near real-time state-space estimator that translates the noisy and variable attention markers to robust and statistically interpretable estimates of the attentional state with minimal delay. Our proposed algorithms integrate various techniques including forgetting factor-based adaptive filtering, ℓ1-regularization, forward-backward splitting algorithms, fixed-lag smoothing, and Expectation Maximization. We validate the performance of our proposed framework using comprehensive simulations as well as application to experimentally acquired M/EEG data. Our results reveal that the proposed real-time algorithms perform nearly as accurately as the existing state-of-the-art offline techniques, while providing a significant degree of adaptivity, statistical robustness, and computational savings. PMID:29765298
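A minimal sketch of a forgetting-factor recursive least squares update, the kind of adaptive estimator that module (1) above refers to for tracking decoding coefficients in real time; the dimensions, forgetting factor and simulated envelope/EEG signals are illustrative assumptions.

    # Illustrative forgetting-factor RLS tracking of time-varying decoding coefficients.
    import numpy as np

    def rls_track(features, target, lam=0.98, delta=100.0):
        """Track time-varying coefficients w_t such that target ~= features @ w_t."""
        n_t, n_f = features.shape
        w = np.zeros(n_f)
        P = delta * np.eye(n_f)                    # inverse correlation matrix estimate
        history = np.zeros((n_t, n_f))
        for t in range(n_t):
            x = features[t]
            k = P @ x / (lam + x @ P @ x)          # gain vector
            e = target[t] - w @ x                  # a priori prediction error
            w = w + k * e
            P = (P - np.outer(k, x @ P)) / lam
            history[t] = w
        return history

    rng = np.random.default_rng(12)
    T, F = 2000, 8
    envelopes = rng.normal(size=(T, F))            # speech-envelope features
    true_w = np.where(np.arange(T)[:, None] < T // 2, 1.0, -1.0) * np.eye(F)[0]
    eeg = np.sum(envelopes * true_w, axis=1) + rng.normal(0, 0.5, T)

    coeff_track = rls_track(envelopes, eeg)
    print("coefficient on feature 0, early vs late:", coeff_track[300, 0], coeff_track[-1, 0])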
Mumford, Jeanette A.
2017-01-01
Even after thorough preprocessing and a careful time series analysis of functional magnetic resonance imaging (fMRI) data, artifact and other issues can lead to violations of the assumption that the variance is constant across subjects in the group level model. This is especially concerning when modeling a continuous covariate at the group level, as the slope is easily biased by outliers. Various models have been proposed to deal with outliers including models that use the first level variance or that use the group level residual magnitude to differentially weight subjects. The most typically used robust regression, implementing a robust estimator of the regression slope, has been previously studied in the context of fMRI studies and was found to perform well in some scenarios, but a loss of Type I error control can occur for some outlier settings. A second type of robust regression using a heteroscedastic autocorrelation consistent (HAC) estimator, which produces robust slope and variance estimates has been shown to perform well, with better Type I error control, but with large sample sizes (500–1000 subjects). The Type I error control with smaller sample sizes has not been studied in this model and has not been compared to other modeling approaches that handle outliers such as FSL’s Flame 1 and FSL’s outlier de-weighting. Focusing on group level inference with a continuous covariate over a range of sample sizes and degree of heteroscedasticity, which can be driven either by the within- or between-subject variability, both styles of robust regression are compared to ordinary least squares (OLS), FSL’s Flame 1, Flame 1 with outlier de-weighting algorithm and Kendall’s Tau. Additionally, subject omission using the Cook’s Distance measure with OLS and nonparametric inference with the OLS statistic are studied. Pros and cons of these models as well as general strategies for detecting outliers in data and taking precaution to avoid inflated Type I error rates are discussed. PMID:28030782
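As a rough illustration of the contrast discussed above, the sketch below compares plain OLS, OLS with a heteroscedasticity-consistent (sandwich) covariance, and a Huber M-estimator of a group-level slope; it uses statsmodels on simulated data and is only an illustration of the model classes, not the evaluation pipeline of the study, and the outlier pattern is an assumption.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 40
    x = rng.normal(size=n)                      # group-level continuous covariate
    y = 0.5 * x + rng.normal(scale=1.0, size=n)
    y[:3] += rng.normal(scale=6.0, size=3)      # a few high-variance "outlier" subjects

    X = sm.add_constant(x)
    ols = sm.OLS(y, X).fit()                               # classical OLS
    ols_hc = sm.OLS(y, X).fit(cov_type="HC3")              # sandwich (robust) standard errors
    rlm = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()   # robust slope estimator

    for name, res in [("OLS", ols), ("OLS+HC3", ols_hc), ("Huber RLM", rlm)]:
        print(f"{name:>9}: slope = {res.params[1]:.3f}, se = {res.bse[1]:.3f}")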
Analytical robustness of quantitative NIR chemical imaging for Islamic paper characterization
NASA Astrophysics Data System (ADS)
Mahgoub, Hend; Gilchrist, John R.; Fearn, Thomas; Strlič, Matija
2017-07-01
Recently, spectral imaging techniques such as Multispectral (MSI) and Hyperspectral Imaging (HSI) have gained importance in the field of heritage conservation. This paper explores the analytical robustness of quantitative chemical imaging for Islamic paper characterization by focusing on the effect of different measurement and processing parameters, i.e. acquisition conditions and calibration, on the accuracy of the collected spectral data. This will provide a better understanding of the technique that can provide a measure of change in collections through imaging. For the quantitative model, a special calibration target was devised using 105 samples from a well-characterized reference Islamic paper collection. Two material properties were of interest: starch sizing and cellulose degree of polymerization (DP). Multivariate data analysis methods were used to develop discrimination and regression models which were used as an evaluation methodology for the metrology of quantitative NIR chemical imaging. Spectral data were collected using a pushbroom HSI scanner (Gilden Photonics Ltd) in the 1000-2500 nm range with a spectral resolution of 6.3 nm using a mirror scanning setup and halogen illumination. Data were acquired at different measurement conditions and acquisition parameters. Preliminary results showed the potential of the evaluation methodology, indicating that measurement parameters such as the use of different lenses and different scanning backgrounds may not have a great influence on the quantitative results. Moreover, the evaluation methodology allowed for the selection of the best pre-treatment method to be applied to the data.
Nagwani, Naresh Kumar; Deo, Shirish V
2014-01-01
Understanding the compressive strength of concrete is important for activities like construction arrangement, prestressing operations, proportioning new mixtures, and quality assurance. Regression techniques are widely used for prediction tasks, where the relationship between the independent variables and the dependent (prediction) variable is identified. The accuracy of regression techniques for prediction can be improved if clustering is used along with regression. Clustering along with regression ensures a more accurate curve fit between the dependent and independent variables. In this work a cluster regression technique is applied for estimating the compressive strength of concrete, and a novel approach is proposed for predicting the concrete compressive strength. The objective of this work is to demonstrate that clustering along with regression yields fewer prediction errors when estimating the concrete compressive strength. The proposed technique consists of two major stages: in the first stage, clustering is used to group concrete data with similar characteristics, and then in the second stage regression techniques are applied over these clusters (groups) to predict the compressive strength from individual clusters. It is found from experiments that clustering along with regression gives minimum errors for predicting the compressive strength of concrete; the fuzzy C-means clustering algorithm also performs better than the K-means algorithm.
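A minimal cluster-then-regress sketch of the two-stage idea follows; it uses k-means and ordinary linear regression from scikit-learn as stand-ins (the study also reports results with fuzzy C-means, which would require a separate package), and the synthetic data are purely illustrative.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X = rng.uniform(size=(300, 4))              # stand-in mix-proportion features
    y = 20 + 30 * X[:, 0] - 10 * X[:, 1] + (X[:, 2] > 0.5) * 15 + rng.normal(0, 2, 300)

    k = 3
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)

    preds = np.empty_like(y)
    for c in range(k):
        m = labels == c
        model = LinearRegression().fit(X[m], y[m])   # one regression per cluster
        preds[m] = model.predict(X[m])

    rmse = np.sqrt(np.mean((y - preds) ** 2))
    print(f"in-sample RMSE with cluster-wise regression: {rmse:.2f}")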
Robust Feature Selection Technique using Rank Aggregation.
Sarkar, Chandrima; Cooley, Sarah; Srivastava, Jaideep
2014-01-01
Although feature selection is a well-developed research area, there is an ongoing need to develop methods to make classifiers more efficient. One important challenge is the lack of a universal feature selection technique which produces similar outcomes with all types of classifiers. This is because all feature selection techniques have individual statistical biases while classifiers exploit different statistical properties of data for evaluation. In numerous situations this can put researchers in a dilemma as to which feature selection method and which classifier to choose from a vast range of options. In this paper, we propose a technique that aggregates the consensus properties of various feature selection methods to develop an improved solution. The ensemble nature of our technique makes it more robust across various classifiers. In other words, it is stable towards achieving similar and ideally higher classification accuracy across a wide variety of classifiers. We quantify this concept of robustness with a measure known as the Robustness Index (RI). We perform an extensive empirical evaluation of our technique on eight data sets with different dimensions, including Arrhythmia, Lung Cancer, Madelon, mfeat-fourier, internet-ads, Leukemia-3c and Embryonal Tumor, and a real-world data set, namely Acute Myeloid Leukemia (AML). We demonstrate not only that our algorithm is more robust, but also that compared to other techniques our algorithm improves the classification accuracy by approximately 3-4% (in data sets with fewer than 500 features) and by more than 5% (in data sets with more than 500 features), across a wide range of classifiers.
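The rank-aggregation idea, combining selectors with different statistical biases into one consensus ranking, can be sketched as follows; the three selectors, the mean-rank aggregation rule, and the synthetic data are illustrative assumptions rather than the authors' exact procedure.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import f_classif, mutual_info_classif

    X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)

    # scores from three selectors with different statistical biases
    f_scores, _ = f_classif(X, y)
    mi_scores = mutual_info_classif(X, y, random_state=0)
    rf_scores = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y).feature_importances_

    def to_ranks(scores):
        # rank 1 = best (highest score)
        order = np.argsort(-scores)
        ranks = np.empty_like(order)
        ranks[order] = np.arange(1, len(scores) + 1)
        return ranks

    mean_rank = np.mean([to_ranks(s) for s in (f_scores, mi_scores, rf_scores)], axis=0)
    selected = np.argsort(mean_rank)[:5]        # consensus top-5 features
    print("consensus features:", selected)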
Optimisation in the Design of Environmental Sensor Networks with Robustness Consideration
Budi, Setia; de Souza, Paulo; Timms, Greg; Malhotra, Vishv; Turner, Paul
2015-01-01
This work proposes the design of Environmental Sensor Networks (ESN) through balancing robustness and redundancy. An Evolutionary Algorithm (EA) is employed to find the optimal placement of sensor nodes in the Region of Interest (RoI). Data quality issues are introduced to simulate their impact on the performance of the ESN. Spatial Regression Test (SRT) is also utilised to promote robustness in data quality of the designed ESN. The proposed method provides high network representativeness (fit for purpose) with minimum sensor redundancy (cost), and ensures robustness by enabling the network to continue to achieve its objectives when some sensors fail. PMID:26633392
Efficient Robust Optimization of Metal Forming Processes using a Sequential Metamodel Based Strategy
NASA Astrophysics Data System (ADS)
Wiebenga, J. H.; Klaseboer, G.; van den Boogaard, A. H.
2011-08-01
The coupling of Finite Element (FE) simulations to mathematical optimization techniques has contributed significantly to product improvements and cost reductions in the metal forming industries. The next challenge is to bridge the gap between deterministic optimization techniques and the industrial need for robustness. This paper introduces a new and generally applicable structured methodology for modeling and solving robust optimization problems. Stochastic design variables or noise variables are taken into account explicitly in the optimization procedure. The metamodel-based strategy is combined with a sequential improvement algorithm to efficiently increase the accuracy of the objective function prediction. This is only done at regions of interest containing the optimal robust design. Application of the methodology to an industrial V-bending process resulted in valuable process insights and an improved robust process design. Moreover, a significant improvement of the robustness (>2σ) was obtained by minimizing the deteriorating effects of several noise variables. The robust optimization results demonstrate the general applicability of the robust optimization strategy and underline the importance of including uncertainty and robustness explicitly in the numerical optimization procedure.
NASA Astrophysics Data System (ADS)
Khan, Firdos; Pilz, Jürgen
2016-04-01
South Asia is under the severe impacts of changing climate and global warming. The last two decades showed that climate change or global warming is happening, and the first decade of the 21st century is considered the warmest decade on record over Pakistan, where the temperature reached 53 °C in 2010. Consequently, the spatio-temporal distribution and intensity of precipitation are badly affected, causing floods, cyclones and hurricanes in the region, which further have impacts on agriculture, water, health, etc. To cope with the situation, it is important to conduct impact assessment studies and take adaptation and mitigation measures. For impact assessment studies, we need climate variables at higher resolution. Downscaling techniques are used to produce climate variables at higher resolution; these techniques are broadly divided into two types, statistical downscaling and dynamical downscaling. The target location of this study is the monsoon-dominated region of Pakistan. One reason for choosing this area is that the contribution of monsoon rains in this area is more than 80 % of the total rainfall. This study evaluates a statistical downscaling technique which can then be used for downscaling climatic variables. Two statistical techniques, i.e. quantile regression and copula modeling, are combined in order to produce realistic results for climate variables in the area under study. To reduce the dimension of the input data and deal with multicollinearity problems, empirical orthogonal functions will be used. Advantages of this new method are: (1) it is more robust to outliers as compared to ordinary least squares estimates and other estimation methods based on central tendency and dispersion measures; (2) it preserves the dependence among variables and among sites; and (3) it can be used to combine different types of distributions. This is important in our case because we are dealing with climatic variables having different distributions over different meteorological stations. The proposed model will be fitted using the NCEP/NCAR (National Centers for Environmental Prediction / National Center for Atmospheric Research) predictors for the period 1960-1990 and validated for 1990-2000. To investigate the efficiency of the proposed model, it will be compared with the multivariate multiple regression model and with dynamical downscaling climate models by using different climate indices that describe the frequency, intensity and duration of the variables of interest. KEY WORDS: Climate change, Copula, Monsoon, Quantile regression, Spatio-temporal distribution.
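As a small illustration of the quantile regression component (the copula coupling between sites and variables is beyond this sketch), the statsmodels example below fits several conditional quantiles of a skewed response on a few predictors; the predictors, coefficients, and gamma noise are assumptions.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n = 500
    X = rng.normal(size=(n, 3))                 # e.g. leading EOFs of large-scale predictors
    y = 1.0 + X @ np.array([0.8, -0.4, 0.2]) + rng.gamma(2.0, 1.0, n)  # skewed "rainfall"

    Xc = sm.add_constant(X)
    for q in (0.25, 0.5, 0.9):
        res = sm.QuantReg(y, Xc).fit(q=q)       # conditional quantile fit, robust to outliers
        print(f"q = {q}: coefficients = {np.round(res.params, 3)}")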
How Robust Is Linear Regression with Dummy Variables?
ERIC Educational Resources Information Center
Blankmeyer, Eric
2006-01-01
Researchers in education and the social sciences make extensive use of linear regression models in which the dependent variable is continuous-valued while the explanatory variables are a combination of continuous-valued regressors and dummy variables. The dummies partition the sample into groups, some of which may contain only a few observations.…
Robust global image registration based on a hybrid algorithm combining Fourier and spatial domain techniques
Crabtree, Peter N.; Seanor, Collin; ...
2012-09-01
... demonstrate performance of a hybrid algorithm. These results are from analysis of a set of images of an ISO 12233 [12] resolution chart captured in the ...
Lee, Seokho; Shin, Hyejin; Lee, Sang Han
2016-12-01
Alzheimer's disease (AD) is usually diagnosed by clinicians through cognitive and functional performance tests, with a potential risk of misdiagnosis. Since the progression of AD is known to cause structural changes in the corpus callosum (CC), the CC thickness can be used as a functional covariate in the AD classification problem for a diagnosis. However, misclassified class labels negatively impact the classification performance. Motivated by AD-CC association studies, we propose a logistic regression for functional data classification that is robust to misdiagnosis or label noise. Specifically, our logistic regression model is constructed by adding individual intercepts to the functional logistic regression model. This approach makes it possible to indicate which observations are possibly mislabeled and also leads to a robust and efficient classifier. An effective MM algorithm provides simple closed-form update formulas. We test our method using synthetic datasets to demonstrate its superiority over an existing method, and apply it to differentiating patients with AD from healthy normals based on CC thickness from MRI. © 2016, The International Biometric Society.
Image Alignment for Multiple Camera High Dynamic Range Microscopy.
Eastwood, Brian S; Childs, Elisabeth C
2012-01-09
This paper investigates the problem of image alignment for multiple camera high dynamic range (HDR) imaging. HDR imaging combines information from images taken with different exposure settings. Combining information from multiple cameras requires an alignment process that is robust to the intensity differences in the images. HDR applications that use a limited number of component images require an alignment technique that is robust to large exposure differences. We evaluate the suitability for HDR alignment of three exposure-robust techniques. We conclude that image alignment based on matching feature descriptors extracted from radiant power images from calibrated cameras yields the most accurate and robust solution. We demonstrate the use of this alignment technique in a high dynamic range video microscope that enables live specimen imaging with a greater level of detail than can be captured with a single camera.
Schwibbe, Anja; Kothe, Christian; Hampe, Wolfgang; Konradt, Udo
2016-10-01
Sixty years of research have not added up to a concordant evaluation of the influence of spatial and manual abilities on dental skill acquisition. We used Ackerman's theory of ability determinants of skill acquisition to explain the influence of spatial visualization and manual dexterity on the task performance of dental students in two consecutive preclinical technique courses. We measured spatial and manual abilities of applicants to Hamburg Dental School by means of a multiple choice test on Technical Aptitude and a wire-bending test, respectively. Preclinical dental technique tasks were categorized as consistent-simple and inconsistent-complex based on their contents. For analysis, we used robust regression to circumvent typical limitations in dental studies like small sample size and non-normal residual distributions. We found that manual, but not spatial ability exhibited a moderate influence on the performance in consistent-simple tasks during dental skill acquisition in preclinical dentistry. Both abilities revealed a moderate relation with the performance in inconsistent-complex tasks. These findings support the hypotheses which we had postulated on the basis of Ackerman's work. Therefore, spatial as well as manual ability are required for the acquisition of dental skills in preclinical technique courses. These results support the view that both abilities should be addressed in dental admission procedures in addition to cognitive measures.
Different techniques of multispectral data analysis for vegetation fraction retrieval
NASA Astrophysics Data System (ADS)
Kancheva, Rumiana; Georgiev, Georgi
2012-07-01
Vegetation monitoring is one of the most important applications of remote sensing technologies. With respect to farmlands, the assessment of crop condition constitutes the basis for monitoring growth, development, and yield processes. Plant condition is defined by a set of biometric variables, such as density, height, biomass amount, leaf area index, etc. The canopy cover fraction is closely related to these variables, and is indicative of the state of the growth process. At the same time it is a defining factor of the spectral signatures of the soil-vegetation system. That is why spectral mixture decomposition is a primary objective in remotely sensed data processing and interpretation, specifically in agricultural applications. The actual usefulness of the applied methods depends on their prediction reliability. The goal of this paper is to present and compare different techniques for quantitative endmember extraction from the reflectance of soil-crop patterns. These techniques include: linear spectral unmixing, two-dimensional spectra analysis, spectral ratio analysis (vegetation indices), spectral derivative analysis (red edge position), and colorimetric analysis (tristimulus values sum, chromaticity coordinates and dominant wavelength). The objective is to reveal their potential, accuracy and robustness for plant fraction estimation from multispectral data. Regression relationships have been established between crop canopy cover and various spectral estimators.
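Of the techniques listed, linear spectral unmixing is the most direct to sketch: given known endmember spectra, per-pixel abundances can be estimated by nonnegative least squares and normalized to sum to one; the band values below are invented for illustration.

    import numpy as np
    from scipy.optimize import nnls

    # endmember spectra (rows: bands) for bare soil and green vegetation -- illustrative values
    E = np.array([[0.12, 0.05],
                  [0.18, 0.08],
                  [0.22, 0.45],
                  [0.25, 0.50]])          # 4 bands x 2 endmembers

    pixel = 0.4 * E[:, 0] + 0.6 * E[:, 1] + np.random.default_rng(0).normal(0, 0.005, 4)

    frac, resid = nnls(E, pixel)          # nonnegative abundance estimates
    frac = frac / frac.sum()              # enforce sum-to-one a posteriori
    print("soil, vegetation fractions:", np.round(frac, 3))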
A robust empirical seasonal prediction of winter NAO and surface climate.
Wang, L; Ting, M; Kushner, P J
2017-03-21
A key determinant of winter weather and climate in Europe and North America is the North Atlantic Oscillation (NAO), the dominant mode of atmospheric variability in the Atlantic domain. Skilful seasonal forecasting of the surface climate in both Europe and North America is reflected largely in how accurately models can predict the NAO. Most dynamical models, however, have limited skill in seasonal forecasts of the winter NAO. A new empirical model is proposed for the seasonal forecast of the winter NAO that exhibits higher skill than current dynamical models. The empirical model provides robust and skilful prediction of the December-January-February (DJF) mean NAO index using a multiple linear regression (MLR) technique with autumn conditions of sea-ice concentration, stratospheric circulation, and sea-surface temperature. The predictability is, for the most part, derived from the relatively long persistence of sea ice in the autumn. The lower stratospheric circulation and sea-surface temperature appear to play more indirect roles through a series of feedbacks among systems driving NAO evolution. This MLR model also provides skilful seasonal outlooks of winter surface temperature and precipitation over many regions of Eurasia and eastern North America.
NASA Astrophysics Data System (ADS)
Li, Yi; Abdel-Monem, Mohamed; Gopalakrishnan, Rahul; Berecibar, Maitane; Nanini-Maury, Elise; Omar, Noshin; van den Bossche, Peter; Van Mierlo, Joeri
2018-01-01
This paper proposes an advanced state of health (SoH) estimation method for high energy NMC lithium-ion batteries based on incremental capacity (IC) analysis. IC curves are used due to their ability to detect and quantify battery degradation mechanisms. A simple and robust smoothing method based on a Gaussian filter is proposed to reduce the noise on IC curves, so that the signatures associated with battery ageing can be accurately identified. A linear regression relationship is found between the battery capacity and the positions of features of interest (FOIs) on IC curves. Results show that the SoH estimation function developed from one single battery cell is able to evaluate the SoH of other batteries cycled under different cycling depths with less than 2.5% maximum error, which proves the robustness of the proposed method for SoH estimation. With this technique, partial charging voltage curves can be used for SoH estimation and the testing time can therefore be greatly reduced. This method shows great potential to be applied in practice, as it only requires static charging curves and can be easily implemented in a battery management system (BMS).
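A highly simplified sketch of the processing chain, Gaussian smoothing of the incremental-capacity curve, extraction of a feature-of-interest (FOI) position, and a linear regression of capacity on that position, is shown below; the voltages, capacities, and smoothing width are invented, and the actual method works on measured charging curves.

    import numpy as np
    from scipy.ndimage import gaussian_filter1d
    from scipy.stats import linregress

    def foi_position(voltage, capacity, sigma=5):
        """Voltage of the main incremental-capacity (dQ/dV) peak after Gaussian smoothing."""
        ic = gaussian_filter1d(np.gradient(capacity, voltage), sigma)  # smoothed IC curve
        return voltage[np.argmax(ic)]

    # toy charging curve with a single plateau near 3.7 V (illustrative)
    v = np.linspace(3.0, 4.2, 500)
    q = 3.0 / (1.0 + np.exp(-(v - 3.7) / 0.05))
    print(f"FOI position on toy curve: {foi_position(v, q):.3f} V")

    # linear regression of measured capacity on the FOI position over ageing checkups (invented data)
    foi = np.array([3.72, 3.71, 3.70, 3.68, 3.67, 3.65])      # V
    cap = np.array([2.95, 2.91, 2.88, 2.83, 2.80, 2.75])      # Ah
    fit = linregress(foi, cap)
    print(f"capacity = {fit.slope:.2f} * FOI + {fit.intercept:.2f},  R^2 = {fit.rvalue**2:.3f}")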
Robust Joint Graph Sparse Coding for Unsupervised Spectral Feature Selection.
Zhu, Xiaofeng; Li, Xuelong; Zhang, Shichao; Ju, Chunhua; Wu, Xindong
2017-06-01
In this paper, we propose a new unsupervised spectral feature selection model by embedding a graph regularizer into the framework of joint sparse regression for preserving the local structures of data. To do this, we first extract the bases of training data by previous dictionary learning methods and, then, map original data into the basis space to generate their new representations, by proposing a novel joint graph sparse coding (JGSC) model. In JGSC, we first formulate its objective function by simultaneously taking subspace learning and joint sparse regression into account, then, design a new optimization solution to solve the resulting objective function, and further prove the convergence of the proposed solution. Furthermore, we extend JGSC to a robust JGSC (RJGSC) via replacing the least square loss function with a robust loss function, for achieving the same goals and also avoiding the impact of outliers. Finally, experimental results on real data sets showed that both JGSC and RJGSC outperformed the state-of-the-art algorithms in terms of k-nearest neighbor classification performance.
Sehgal, Muhammad Shoaib B; Gondal, Iqbal; Dooley, Laurence S
2005-05-15
Microarray data are used in a range of application areas in biology, although often it contains considerable numbers of missing values. These missing values can significantly affect subsequent statistical analysis and machine learning algorithms so there is a strong motivation to estimate these values as accurately as possible before using these algorithms. While many imputation algorithms have been proposed, more robust techniques need to be developed so that further analysis of biological data can be accurately undertaken. In this paper, an innovative missing value imputation algorithm called collateral missing value estimation (CMVE) is presented which uses multiple covariance-based imputation matrices for the final prediction of missing values. The matrices are computed and optimized using least square regression and linear programming methods. The new CMVE algorithm has been compared with existing estimation techniques including Bayesian principal component analysis imputation (BPCA), least square impute (LSImpute) and K-nearest neighbour (KNN). All these methods were rigorously tested to estimate missing values in three separate non-time series (ovarian cancer based) and one time series (yeast sporulation) dataset. Each method was quantitatively analyzed using the normalized root mean square (NRMS) error measure, covering a wide range of randomly introduced missing value probabilities from 0.01 to 0.2. Experiments were also undertaken on the yeast dataset, which comprised 1.7% actual missing values, to test the hypothesis that CMVE performed better not only for randomly occurring but also for a real distribution of missing values. The results confirmed that CMVE consistently demonstrated superior and robust estimation capability of missing values compared with other methods for both series types of data, for the same order of computational complexity. A concise theoretical framework has also been formulated to validate the improved performance of the CMVE algorithm. The CMVE software is available upon request from the authors.
NASA Astrophysics Data System (ADS)
Rogers, A. M.; Harmsen, S. C.; Herrmann, R. B.; Meremonte, M. E.
1987-04-01
As a first step in the assessment of the earthquake hazard in the southern Great Basin of Nevada-California, this study evaluates the attenuation of peak vertical ground motions using a number of different regression models applied to unfiltered and band-pass-filtered ground motion data. These data are concentrated in the distance range 10-250 km. The regression models include parameters to account for geometric spreading, anelastic attenuation with a power law frequency dependence, source size, and station site effects. We find that the data are most consistent with an essentially frequency-independent Q and a geometric spreading coefficient less than 1.0. Regressions are also performed on vertical component peak amplitudes reexpressed as pseudo-Wood-Anderson peak amplitude estimates (PWA), permitting comparison with earlier work that used Wood-Anderson (WA) data from California. Both of these results show that Q values in this region are high relative to California, having values in the range 700-900 over the frequency band 1-10 Hz. Comparison of ML magnitudes from stations BRK and PAS for earthquakes in the southern Great Basin shows that these two stations report magnitudes with differences that are distance dependent. This bias suggests that the Richter log A0 curve appropriate to California is too steep for earthquakes occurring in southern Nevada, a result implicitly supporting our finding that Q values are higher than those in California. The PWA attenuation functions derived from our data also indicate that local magnitudes reported by California observatories for earthquakes in this region may be overestimated by as much as 0.8 magnitude units in some cases. Both of these results will have an effect on the assessment of the earthquake hazard in this region. The robustness of our regression technique to extract the correct geometric spreading coefficient n and anelastic attenuation Q is tested by applying the technique to simulated data computed with given n and Q values. Using a stochastic modeling technique, we generate suites of seismograms for the distance range 10-200 km and for both WA and short-period vertical component seismometers. Regressions on the peak amplitudes from these records show that our regression model extracts values of n and Q approximately equal to the input values for either low-Q California attenuation or high-Q southern Nevada attenuation. Regressions on stochastically modeled WA and PWA amplitudes also provide a method of evaluating differences in magnitudes from WA and PWA amplitudes due to recording instrument response characteristics alone. These results indicate a difference between ML(WA) and ML(PWA) equal to 0.15 magnitude units, which we term the residual instrument correction. In contrast to the peak amplitude results, coda Q determinations using the single scatterer theory indicate that Qc values are dependent on source type and are proportional to f^p, where p = 0.8 to 1.0. This result suggests that a difference exists between attenuation mechanisms for direct waves and backscattered waves in this region.
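The peak-amplitude regression can be illustrated by fitting the geometric spreading coefficient n and anelastic Q from synthetic data; the functional form log10 A = c - n log10 R - (pi f log10 e / (Q beta)) R and the frequency and shear-velocity values are assumptions for the sketch, and the study's full model also carries source and site terms.

    import numpy as np

    rng = np.random.default_rng(3)
    f, beta = 5.0, 3.5                                  # Hz, km/s (assumed values)
    n_true, Q_true = 0.9, 800.0

    R = rng.uniform(10, 250, 200)                       # hypocentral distances (km)
    b_true = np.pi * f * np.log10(np.e) / (Q_true * beta)
    logA = 2.0 - n_true * np.log10(R) - b_true * R + rng.normal(0, 0.1, R.size)

    # linear model: logA = c - n*log10(R) - b*R, solved by least squares
    G = np.column_stack([np.ones_like(R), -np.log10(R), -R])
    c, n_est, b_est = np.linalg.lstsq(G, logA, rcond=None)[0]
    Q_est = np.pi * f * np.log10(np.e) / (b_est * beta)
    print(f"n = {n_est:.2f},  Q = {Q_est:.0f}")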
NASA Astrophysics Data System (ADS)
Chiron, L.; Oger, G.; de Leffe, M.; Le Touzé, D.
2018-02-01
While smoothed-particle hydrodynamics (SPH) simulations are usually performed using uniform particle distributions, local particle refinement techniques have been developed to concentrate fine spatial resolutions in identified areas of interest. Although the formalism of this method is relatively easy to implement, its robustness at coarse/fine interfaces can be problematic. Analysis performed in [16] shows that the radius of refined particles should be greater than half the radius of unrefined particles to ensure robustness. In this article, the basics of an Adaptive Particle Refinement (APR) technique, inspired by AMR in mesh-based methods, are presented. This approach ensures robustness with alleviated constraints. Simulations applying the new formalism proposed achieve accuracy comparable to fully refined spatial resolutions, together with robustness, low CPU times and maintained parallel efficiency.
Comparing Four Instructional Techniques for Promoting Robust Knowledge
ERIC Educational Resources Information Center
Richey, J. Elizabeth; Nokes-Malach, Timothy J.
2015-01-01
Robust knowledge serves as a common instructional target in academic settings. Past research identifying characteristics of experts' knowledge across many domains can help clarify the features of robust knowledge as well as ways of assessing it. We review the expertise literature and identify three key features of robust knowledge (deep,…
NASA Technical Reports Server (NTRS)
Rummler, D. R.
1976-01-01
The results are presented of investigations to apply regression techniques to the development of methodology for creep-rupture data analysis. Regression analysis techniques are applied to the explicit description of the creep behavior of materials for space shuttle thermal protection systems. A regression analysis technique is compared with five parametric methods for analyzing three simulated and twenty real data sets, and a computer program for the evaluation of creep-rupture data is presented.
Tanner-Smith, Emily E; Tipton, Elizabeth
2014-03-01
Methodologists have recently proposed robust variance estimation as one way to handle dependent effect sizes in meta-analysis. Software macros for robust variance estimation in meta-analysis are currently available for Stata (StataCorp LP, College Station, TX, USA) and spss (IBM, Armonk, NY, USA), yet there is little guidance for authors regarding the practical application and implementation of those macros. This paper provides a brief tutorial on the implementation of the Stata and spss macros and discusses practical issues meta-analysts should consider when estimating meta-regression models with robust variance estimates. Two example databases are used in the tutorial to illustrate the use of meta-analysis with robust variance estimates. Copyright © 2013 John Wiley & Sons, Ltd.
Multi-Target Regression via Robust Low-Rank Learning.
Zhen, Xiantong; Yu, Mengyang; He, Xiaofei; Li, Shuo
2018-02-01
Multi-target regression has recently regained great popularity due to its capability of simultaneously learning multiple relevant regression tasks and its wide applications in data mining, computer vision and medical image analysis, while great challenges arise from jointly handling inter-target correlations and input-output relationships. In this paper, we propose Multi-layer Multi-target Regression (MMR) which enables simultaneously modeling intrinsic inter-target correlations and nonlinear input-output relationships in a general framework via robust low-rank learning. Specifically, the MMR can explicitly encode inter-target correlations in a structure matrix by matrix elastic nets (MEN); the MMR can work in conjunction with the kernel trick to effectively disentangle highly complex nonlinear input-output relationships; the MMR can be efficiently solved by a new alternating optimization algorithm with guaranteed convergence. The MMR leverages the strength of kernel methods for nonlinear feature learning and the structural advantage of multi-layer learning architectures for inter-target correlation modeling. More importantly, it offers a new multi-layer learning paradigm for multi-target regression which is endowed with high generality, flexibility and expressive ability. Extensive experimental evaluation on 18 diverse real-world datasets demonstrates that our MMR can achieve consistently high performance and outperforms representative state-of-the-art algorithms, which shows its great effectiveness and generality for multivariate prediction.
The Outlier Detection for Ordinal Data Using Scaling Technique of Regression Coefficients
NASA Astrophysics Data System (ADS)
Adnan, Arisman; Sugiarto, Sigit
2017-06-01
The aim of this study is to detect outliers by using the coefficients of Ordinal Logistic Regression (OLR) for the case of k-category responses, where scores range from 1 (the best) to 8 (the worst). We detect them by using the sum of moduli of the ordinal regression coefficients calculated by the jackknife technique. This technique is improved by scaling the regression coefficients to their means. The R language has been used on a set of ordinal data from a reference distribution. Furthermore, we compare this approach with studentised residual plots of the jackknife technique for ANOVA (Analysis of Variance) and OLR. This study shows that the jackknifing technique, along with proper scaling, can reveal outliers in ordinal regression reasonably well.
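The jackknife statistic can be sketched generically: refit the model with each observation deleted, scale the coefficients to their means across the leave-one-out fits, and inspect the sum of moduli. The sketch below uses an ordinary linear model as a stand-in (the study uses ordinal logistic regression, which would require a dedicated ordinal model), and the data and flagging rule are assumptions.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    n, p = 60, 3
    X = sm.add_constant(rng.normal(size=(n, p)))
    y = X @ np.array([1.0, 0.5, -0.3, 0.2]) + rng.normal(0, 0.5, n)
    y[7] += 6.0                                    # plant one outlier

    # leave-one-out (jackknife) fits with a linear stand-in model
    coefs = np.array([sm.OLS(np.delete(y, i), np.delete(X, i, axis=0)).fit().params
                      for i in range(n)])
    scaled = np.abs(coefs / coefs.mean(axis=0))    # scale coefficients to their means
    stat = scaled.sum(axis=1)                      # sum of moduli per deleted observation
    print("most influential observation:", int(np.argmax(np.abs(stat - stat.mean()))))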
Neural robust stabilization via event-triggering mechanism and adaptive learning technique.
Wang, Ding; Liu, Derong
2018-06-01
The robust control synthesis of continuous-time nonlinear systems with uncertain term is investigated via event-triggering mechanism and adaptive critic learning technique. We mainly focus on combining the event-triggering mechanism with adaptive critic designs, so as to solve the nonlinear robust control problem. This can not only make better use of computation and communication resources, but also conduct controller design from the view of intelligent optimization. Through theoretical analysis, the nonlinear robust stabilization can be achieved by obtaining an event-triggered optimal control law of the nominal system with a newly defined cost function and a certain triggering condition. The adaptive critic technique is employed to facilitate the event-triggered control design, where a neural network is introduced as an approximator of the learning phase. The performance of the event-triggered robust control scheme is validated via simulation studies and comparisons. The present method extends the application domain of both event-triggered control and adaptive critic control to nonlinear systems possessing dynamical uncertainties. Copyright © 2018 Elsevier Ltd. All rights reserved.
Goldstein, Benjamin A; Navar, Ann Marie; Carter, Rickey E
2017-06-14
Risk prediction plays an important role in clinical cardiology research. Traditionally, most risk models have been based on regression models. While useful and robust, these statistical methods are limited to using a small number of predictors which operate in the same way on everyone, and uniformly throughout their range. The purpose of this review is to illustrate the use of machine-learning methods for development of risk prediction models. Typically presented as black box approaches, most machine-learning methods are aimed at solving particular challenges that arise in data analysis that are not well addressed by typical regression approaches. To illustrate these challenges, as well as how different methods can address them, we consider trying to predict mortality after diagnosis of acute myocardial infarction. We use data derived from our institution's electronic health record and abstract data on 13 regularly measured laboratory markers. We walk through different challenges that arise in modelling these data and then introduce different machine-learning approaches. Finally, we discuss general issues in the application of machine-learning methods including tuning parameters, loss functions, variable importance, and missing data. Overall, this review serves as an introduction for those working on risk modelling to approach the diffuse field of machine learning. © The Author 2016. Published by Oxford University Press on behalf of the European Society of Cardiology.
Ha, Hojin; Lantz, Jonas; Ziegler, Magnus; Casas, Belen; Karlsson, Matts; Dyverfeldt, Petter; Ebbers, Tino
2017-01-01
The pressure drop across a stenotic vessel is an important parameter in medicine, providing a commonly used and intuitive metric for evaluating the severity of the stenosis. However, non-invasive estimation of the pressure drop under pathological conditions has remained difficult. This study demonstrates a novel method to quantify the irreversible pressure drop across a stenosis using 4D Flow MRI by calculating the total turbulence production of the flow. Simulation MRI acquisitions showed that the energy lost to turbulence production can be accurately quantified with 4D Flow MRI within a range of practical spatial resolutions (1–3 mm; regression slope = 0.91, R2 = 0.96). The quantification of the turbulence production was not substantially influenced by the signal-to-noise ratio (SNR), resulting in less than 2% mean bias at SNR > 10. Pressure drop estimation based on turbulence production robustly predicted the irreversible pressure drop, regardless of the stenosis severity and post-stenosis dilatation (regression slope = 0.956, R2 = 0.96). In vitro validation of the technique in a 75% stenosis channel confirmed that pressure drop prediction based on the turbulence production agreed with the measured pressure drop (regression slope = 1.15, R2 = 0.999, Bland-Altman agreement = 0.75 ± 3.93 mmHg). PMID:28425452
NASA Astrophysics Data System (ADS)
Luna, A. S.; Paredes, M. L. L.; de Oliveira, G. C. G.; Corrêa, S. M.
2014-12-01
It is well known that air quality is a complex function of emissions, meteorology and topography, and statistical tools provide a sound framework for relating these variables. The observed data were contents of nitrogen dioxide (NO2), nitrogen monoxide (NO), nitrogen oxides (NOx), carbon monoxide (CO), ozone (O3), scalar wind speed (SWS), global solar radiation (GSR), temperature (TEM), and moisture content in the air (HUM), collected by a mobile automatic monitoring station at Rio de Janeiro City in two places of the metropolitan area during 2011 and 2012. The aims of this study were: (1) to analyze the behavior of the variables, using the method of PCA for exploratory data analysis; and (2) to propose forecasts of O3 levels from primary pollutants and meteorological factors, using nonlinear regression methods like ANN and SVM. The PCA technique showed that, for the first dataset, the variables NO, NOx and SWS have a greater impact on the concentration of O3, while the other dataset had TEM and GSR as the most influential variables. The results obtained from the nonlinear regression techniques ANN and SVM were remarkably close and acceptable for one dataset, presenting coefficients of determination for validation of 0.9122 and 0.9152, and root mean square errors of 7.66 and 7.85, respectively. For these datasets, PCA, SVM and ANN demonstrated their robustness as useful tools for the evaluation and forecasting of air quality scenarios.
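A compact scikit-learn sketch combining dimension reduction with a nonlinear regressor is given below; note that the study used PCA for exploratory analysis rather than as a preprocessing step, and the predictor set, SVR hyperparameters, and synthetic relationship are assumptions.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.metrics import mean_squared_error, r2_score
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVR

    rng = np.random.default_rng(5)
    n = 600
    X = rng.normal(size=(n, 8))                 # stand-ins for NO, NO2, NOx, CO, SWS, GSR, TEM, HUM
    o3 = 40 + 8 * X[:, 6] + 5 * X[:, 5] - 6 * X[:, 0] + rng.normal(0, 5, n)

    X_tr, X_te, y_tr, y_te = train_test_split(X, o3, test_size=0.3, random_state=0)
    model = make_pipeline(StandardScaler(), PCA(n_components=5), SVR(C=10.0, epsilon=0.5))
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    print(f"R2 = {r2_score(y_te, pred):.3f},  RMSE = {np.sqrt(mean_squared_error(y_te, pred)):.2f}")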
NASA Astrophysics Data System (ADS)
Creaco, E.; Berardi, L.; Sun, Siao; Giustolisi, O.; Savic, D.
2016-04-01
The growing availability of field data, from information and communication technologies (ICTs) in "smart" urban infrastructures, allows data modeling to understand complex phenomena and to support management decisions. Among the analyzed phenomena, those related to storm water quality modeling have recently been gaining interest in the scientific literature. Nonetheless, the large amount of available data poses the problem of selecting relevant variables to describe a phenomenon and enable robust data modeling. This paper presents a procedure for the selection of relevant input variables using the multiobjective evolutionary polynomial regression (EPR-MOGA) paradigm. The procedure is based on scrutinizing the explanatory variables that appear inside the set of EPR-MOGA symbolic model expressions of increasing complexity and goodness of fit to target output. The strategy also enables the selection to be validated by engineering judgement. In such context, the multiple case study extension of EPR-MOGA, called MCS-EPR-MOGA, is adopted. The application of the proposed procedure to modeling storm water quality parameters in two French catchments shows that it was able to significantly reduce the number of explanatory variables for successive analyses. Finally, the EPR-MOGA models obtained after the input selection are compared with those obtained by using the same technique without benefitting from input selection and with those obtained in previous works where other data-modeling techniques were used on the same data. The comparison highlights the effectiveness of both EPR-MOGA and the input selection procedure.
Yu, Xiaojin; Liu, Pei; Min, Jie; Chen, Qiguang
2009-01-01
To explore the application of regression on order statistics (ROS) in estimating nondetects for food exposure assessment. Regression on order statistics was adopted in the analysis of a cadmium residual data set from global food contaminant monitoring; the mean residual was estimated using SAS programming and compared with the results from substitution methods. The results show that the ROS method clearly performs better than substitution methods, being robust and convenient for subsequent analysis. Regression on order statistics is worth adopting, but more effort should be made on the details of applying this method.
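A simplified ROS sketch for a single detection limit is shown below: detected values are regressed on normal scores of their plotting positions (a lognormal assumption), the fitted line imputes the censored observations, and summary statistics are computed on the completed sample; the cadmium values, detection limit, and Blom plotting positions are illustrative assumptions.

    import numpy as np
    from scipy import stats

    # measured residuals; values below the detection limit (DL) are censored (illustrative data)
    dl = 0.05
    obs = np.array([0.21, 0.09, 0.33, 0.07, 0.15, 0.26, 0.11, 0.48])   # detects
    n_nd = 5                                                           # number of nondetects
    n = len(obs) + n_nd

    # plotting positions for the full ranked sample (Blom), lognormal assumption, single DL
    pp = (np.arange(1, n + 1) - 0.375) / (n + 0.25)
    z = stats.norm.ppf(pp)

    # regress log(detects) on the normal scores of the upper ranks
    slope, intercept, *_ = stats.linregress(z[n_nd:], np.log(np.sort(obs)))
    imputed = np.exp(intercept + slope * z[:n_nd])      # fill in the censored part
    mean_ros = np.concatenate([imputed, obs]).mean()
    mean_sub = np.concatenate([np.full(n_nd, dl / 2), obs]).mean()
    print(f"ROS mean = {mean_ros:.3f}, DL/2 substitution mean = {mean_sub:.3f}")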
Solution to the Problem of Calibration of Low-Cost Air Quality Measurement Sensors in Networks.
Miskell, Georgia; Salmond, Jennifer A; Williams, David E
2018-04-27
We provide a simple, remote, continuous calibration technique suitable for application in a hierarchical network featuring a few well-maintained, high-quality instruments ("proxies") and a larger number of low-cost devices. The ideas are grounded in a clear definition of the purpose of a low-cost network, defined here as providing reliable information on air quality at small spatiotemporal scales. The technique assumes linearity of the sensor signal. It derives running slope and offset estimates by matching mean and standard deviations of the sensor data to values derived from proxies over the same time. The idea is extremely simple: choose an appropriate proxy and an averaging-time that is sufficiently long to remove the influence of short-term fluctuations but sufficiently short that it preserves the regular diurnal variations. The use of running statistical measures rather than cross-correlation of sites means that the method is robust against periods of missing data. Ideas are first developed using simulated data and then demonstrated using field data, at hourly and 1 min time-scales, from a real network of low-cost semiconductor-based sensors. Despite the almost naïve simplicity of the method, it was robust for both drift detection and calibration correction applications. We discuss the use of generally available geographic and environmental data as well as microscale land-use regression as means to enhance the proxy estimates and to generalize the ideas to other pollutants with high spatial variability, such as nitrogen dioxide and particulates. These improvements can also be used to minimize the required number of proxy sites.
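The gain and offset recovery from running statistics is simple enough to sketch directly; the window length, the simulated drift, and the diurnal signal below are assumptions, and a real deployment would use data from a nearby proxy instrument rather than a synthetic truth series.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(6)
    t = pd.date_range("2018-01-01", periods=24 * 14, freq="h")
    truth = 20 + 10 * np.sin(2 * np.pi * np.arange(t.size) / 24)              # diurnal cycle
    proxy = pd.Series(truth + rng.normal(0, 1, t.size), index=t)              # well-maintained proxy
    sensor = pd.Series(0.7 * truth + 5 + rng.normal(0, 1, t.size), index=t)   # drifted low-cost unit

    win = "72h"    # a few days: averages out short-term noise but retains diurnal variability
    gain = proxy.rolling(win).std() / sensor.rolling(win).std()
    offset = proxy.rolling(win).mean() - gain * sensor.rolling(win).mean()
    corrected = gain * sensor + offset
    print(f"median recovered gain = {gain.median():.2f}, offset = {offset.median():.2f}")
    print(f"RMSE vs proxy after correction: {np.sqrt(((corrected - proxy) ** 2).mean()):.2f}")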
GenInfoGuard--a robust and distortion-free watermarking technique for genetic data.
Iftikhar, Saman; Khan, Sharifullah; Anwar, Zahid; Kamran, Muhammad
2015-01-01
Genetic data, in digital format, is used in different biological phenomena such as DNA translation, mRNA transcription and protein synthesis. The accuracy of these biological phenomena depends on genetic codes and all subsequent processes. To computerize the biological procedures, different domain experts are provided with authorized access to the genetic codes; as a consequence, the ownership protection of such data is essential. For this purpose, watermarks serve as the proof of ownership of data. While protecting data, embedded hidden messages (watermarks) influence the genetic data; therefore, the accurate execution of the relevant processes and the overall result becomes questionable. Most of the DNA based watermarking techniques modify the genetic data and are therefore vulnerable to information loss. Distortion-free techniques make sure that no modifications occur during watermarking; however, they are fragile to malicious attacks and therefore cannot be used for ownership protection (particularly in the presence of a threat model). Therefore, there is a need for a technique that is robust and also prevents unwanted modifications. In this spirit, a watermarking technique with the aforementioned characteristics has been proposed in this paper. The proposed technique makes sure that: (i) the ownership rights are protected by means of a robust watermark; and (ii) the integrity of genetic data is preserved. The proposed technique, GenInfoGuard, ensures its robustness through the "watermark encoding" in permuted values, and exhibits high decoding accuracy against various malicious attacks.
ERIC Educational Resources Information Center
Wilcox, Rand R.; Serang, Sarfaraz
2017-01-01
The article provides perspectives on p values, null hypothesis testing, and alternative techniques in light of modern robust statistical methods. Null hypothesis testing and "p" values can provide useful information provided they are interpreted in a sound manner, which includes taking into account insights and advances that have…
Application of stepwise multiple regression techniques to inversion of Nimbus 'IRIS' observations.
NASA Technical Reports Server (NTRS)
Ohring, G.
1972-01-01
Exploratory studies with Nimbus-3 infrared interferometer-spectrometer (IRIS) data indicate that, in addition to temperature, such meteorological parameters as geopotential heights of pressure surfaces, tropopause pressure, and tropopause temperature can be inferred from the observed spectra with the use of simple regression equations. The technique of screening the IRIS spectral data by means of stepwise regression to obtain the best radiation predictors of meteorological parameters is validated. The simplicity of application of the technique and the simplicity of the derived linear regression equations - which contain only a few terms - suggest usefulness for this approach. Based upon the results obtained, suggestions are made for further development and exploitation of the stepwise regression analysis technique.
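A generic forward stepwise screening loop of the kind described can be sketched as follows; the synthetic predictors standing in for spectral radiances, the p-value entry criterion, and the 0.05 threshold are assumptions.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(7)
    n, p = 200, 12
    X = rng.normal(size=(n, p))                    # stand-ins for candidate spectral radiances
    y = 250 + 3 * X[:, 2] - 2 * X[:, 7] + 1.5 * X[:, 9] + rng.normal(0, 1, n)

    selected, remaining = [], list(range(p))
    while remaining:
        pvals = {}
        for j in remaining:
            cols = selected + [j]
            res = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
            pvals[j] = res.pvalues[-1]             # p-value of the newly entered predictor
        best = min(pvals, key=pvals.get)
        if pvals[best] > 0.05:                     # stop when no remaining predictor is significant
            break
        selected.append(best)
        remaining.remove(best)

    print("predictors retained by forward stepwise regression:", selected)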
Card, Alan J; Simsekler, Mecit Can Emre; Clark, Michael; Ward, James R; Clarkson, P John
2014-01-01
Risk assessment is widely used to improve patient safety, but healthcare workers are not trained to design robust solutions to the risks they uncover. This leads to an overreliance on the weakest category of risk control recommendations: administrative controls. Increasing the proportion of non-administrative risk control options (NARCOs) generated would enable (though not ensure) the adoption of more robust solutions. This study experimentally assessed a method for generating stronger risk controls: the Generating Options for Active Risk Control (GO-ARC) Technique. Participants generated risk control options in response to two patient safety scenarios. Scenario 1 (baseline): All participants used current practice (unstructured brainstorming). Scenario 2: Control group used current practice; intervention group used the GO-ARC Technique. To control for individual differences between participants, analysis focused on the change in the proportion of NARCOs for each group. Control group: the proportion of NARCOs decreased from 0.18 at baseline to 0.12. Intervention group: the proportion increased from 0.10 at baseline to 0.29 using the GO-ARC Technique. Results were statistically significant. There was no decrease in the number of administrative controls generated by the intervention group. The Generating Options for Active Risk Control (GO-ARC) Technique appears to lead to more robust risk control options.
Kauhanen, Heikki; Komi, Paavo V; Häkkinen, Keijo
2002-02-01
The problems in comparing the performances of Olympic weightlifters arise from the fact that the relationship between body weight and weightlifting results is not linear. In the present study, this relationship was examined by using a nonparametric curve fitting technique of robust locally weighted regression (LOWESS) on relatively large data sets of the weightlifting results made in top international competitions. Power function formulas were derived from the fitted LOWESS values to represent the relationship between the 2 variables in a way that directly compares the snatch, clean-and-jerk, and total weightlifting results of a given athlete with those of the world-class weightlifters (golden standards). A residual analysis of several other parametric models derived from the initial results showed that they all experience inconsistencies, yielding either underestimation or overestimation of certain body weights. In addition, the existing handicapping formulas commonly used in normalizing the performances of Olympic weightlifters did not yield satisfactory results when applied to the present data. It was concluded that the devised formulas may provide objective means for the evaluation of the performances of male weightlifters, regardless of their body weights, ages, or performance levels.
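The two-step idea, a robust LOWESS fit of results against body weight followed by a power-function formula derived from the fitted values, can be sketched as below; the simulated results, smoothing fraction, and log-log fit of the power function are assumptions rather than the study's exact procedure or coefficients.

    import numpy as np
    from statsmodels.nonparametric.smoothers_lowess import lowess

    rng = np.random.default_rng(8)
    bw = rng.uniform(55, 110, 300)                                   # body weight (kg), illustrative
    total = 190 * (bw / 75) ** 0.45 + rng.normal(0, 12, bw.size)     # toy total results (kg)

    smoothed = lowess(total, bw, frac=0.4)                           # robust local regression
    x_s, y_s = smoothed[:, 0], smoothed[:, 1]

    # power-function formula a * bw^b fitted to the LOWESS values (log-log least squares)
    b, log_a = np.polyfit(np.log(x_s), np.log(y_s), 1)
    print(f"reference curve: total = {np.exp(log_a):.1f} * bw^{b:.2f}")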
T-wave end detection using neural networks and Support Vector Machines.
Suárez-León, Alexander Alexeis; Varon, Carolina; Willems, Rik; Van Huffel, Sabine; Vázquez-Seisdedos, Carlos Román
2018-05-01
In this paper we propose a new approach for detecting the end of the T-wave in the electrocardiogram (ECG) using Neural Networks and Support Vector Machines. Both, Multilayer Perceptron (MLP) neural networks and Fixed-Size Least-Squares Support Vector Machines (FS-LSSVM) were used as regression algorithms to determine the end of the T-wave. Different strategies for selecting the training set such as random selection, k-means, robust clustering and maximum quadratic (Rényi) entropy were evaluated. Individual parameters were tuned for each method during training and the results are given for the evaluation set. A comparison between MLP and FS-LSSVM approaches was performed. Finally, a fair comparison of the FS-LSSVM method with other state-of-the-art algorithms for detecting the end of the T-wave was included. The experimental results show that FS-LSSVM approaches are more suitable as regression algorithms than MLP neural networks. Despite the small training sets used, the FS-LSSVM methods outperformed the state-of-the-art techniques. FS-LSSVM can be successfully used as a T-wave end detection algorithm in ECG even with small training set sizes. Copyright © 2018 Elsevier Ltd. All rights reserved.
Can Selforganizing Maps Accurately Predict Photometric Redshifts?
NASA Technical Reports Server (NTRS)
Way, Michael J.; Klose, Christian
2012-01-01
We present an unsupervised machine-learning approach that can be employed for estimating photometric redshifts. The proposed method is based on a vector quantization called the self-organizing-map (SOM) approach. A variety of photometrically derived input values were utilized from the Sloan Digital Sky Survey's main galaxy sample, luminous red galaxy, and quasar samples, along with the PHAT0 data set from the Photo-z Accuracy Testing project. Regression results obtained with this new approach were evaluated in terms of root-mean-square error (RMSE) to estimate the accuracy of the photometric redshift estimates. The results demonstrate competitive RMSE and outlier percentages when compared with several other popular approaches, such as artificial neural networks and Gaussian process regression. SOM RMSE results (using Δz = z_phot - z_spec) are 0.023 for the main galaxy sample, 0.027 for the luminous red galaxy sample, 0.418 for quasars, and 0.022 for PHAT0 synthetic data. The results demonstrate that there are nonunique solutions for estimating SOM RMSEs. Further research is needed in order to find more robust estimation techniques using SOMs, but the results herein are a positive indication of their capabilities when compared with other well-known methods.
Cox regression analysis with missing covariates via nonparametric multiple imputation.
Hsu, Chiu-Hsieh; Yu, Mandi
2018-01-01
We consider the situation of estimating Cox regression in which some covariates are subject to missingness, and there exists additional information (including observed event time, censoring indicator and fully observed covariates) which may be predictive of the missing covariates. We propose to use two working regression models: one for predicting the missing covariates and the other for predicting the missing probabilities. For each missing covariate observation, these two working models are used to define a nearest neighbor imputing set. This set is then used to non-parametrically impute covariate values for the missing observation. Upon the completion of imputation, Cox regression is performed on the multiply imputed datasets to estimate the regression coefficients. In a simulation study, we compare the nonparametric multiple imputation approach with the augmented inverse probability weighted (AIPW) method, which directly incorporates the two working models into estimation of Cox regression, and the predictive mean matching imputation (PMM) method. We show that all approaches can reduce bias due to a non-ignorable missing-data mechanism. The proposed nonparametric imputation method is robust to mis-specification of either one of the two working models and robust to mis-specification of the link function of the two working models. In contrast, the PMM method is sensitive to misspecification of the covariates included in imputation. The AIPW method is sensitive to the selection probability. We apply the approaches to a breast cancer dataset from the Surveillance, Epidemiology and End Results (SEER) Program.
Robust model predictive control for multi-step short range spacecraft rendezvous
NASA Astrophysics Data System (ADS)
Zhu, Shuyi; Sun, Ran; Wang, Jiaolong; Wang, Jihe; Shao, Xiaowei
2018-07-01
This work presents a robust model predictive control (MPC) approach for the multi-step short range spacecraft rendezvous problem. During the specific short range phase concerned, the chaser is supposed to be initially outside the line-of-sight (LOS) cone. Therefore, the rendezvous process naturally includes two steps: the first step is to transfer the chaser into the LOS cone and the second step is to transfer the chaser into the aimed region with its motion confined within the LOS cone. A novel MPC framework named Mixed MPC (M-MPC) is proposed, which is the combination of the Variable-Horizon MPC (VH-MPC) framework and the Fixed-Instant MPC (FI-MPC) framework. The M-MPC framework enables the optimization for the two steps to be implemented jointly rather than artificially separated, and its computation workload is acceptable for the usually low-power processors onboard spacecraft. Then, considering that disturbances including modeling error, sensor noise and thrust uncertainty may induce undesired constraint violations, a robust technique is developed and attached to the above M-MPC framework to form a robust M-MPC approach. The robust technique is based on the chance-constrained idea, which ensures that constraints can be satisfied with a prescribed probability. It improves on the robust technique proposed by Gavilan et al., because it eliminates unnecessary conservativeness by explicitly incorporating known statistical properties of the navigation uncertainty. The efficacy of the robust M-MPC approach is shown in a simulation study.
Workers' compensation costs among construction workers: a robust regression analysis.
Friedman, Lee S; Forst, Linda S
2009-11-01
Workers' compensation data are an important source for evaluating costs associated with construction injuries. We describe the characteristics of injured construction workers filing claims in Illinois between 2000 and 2005 and the factors associated with compensation costs using a robust regression model. In the final multivariable model, the cumulative percent temporary and permanent disability-measures of severity of injury-explained 38.7% of the variance of cost. Attorney costs explained only 0.3% of the variance of the dependent variable. The model used in this study clearly indicated that percent disability was the most important determinant of cost, although the method and uniformity of percent impairment allocation could be better elucidated. There is a need to integrate analytical methods that are suitable for skewed data when analyzing claim costs.
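A minimal sketch of a robust (M-estimation) regression of claim cost on percent disability and other covariates, using statsmodels' RLM with a Huber loss; the variables and skewed data are simulated stand-ins for the Illinois claims data.

```python
# Sketch: robust regression of claim cost, resistant to the heavy right tail
# produced by a few extreme claims. Synthetic data, illustrative coefficients.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 1000
pct_disability = rng.uniform(0, 60, n)
attorney_cost = rng.exponential(500, n)
age = rng.uniform(20, 65, n)
cost = 2000 + 900 * pct_disability + 0.5 * attorney_cost + rng.standard_t(3, n) * 4000
cost[rng.choice(n, 20, replace=False)] *= 10          # a few extreme (skewed) claims

X = sm.add_constant(np.column_stack([pct_disability, attorney_cost, age]))
robust_fit = sm.RLM(cost, X, M=sm.robust.norms.HuberT()).fit()
print(robust_fit.params)              # coefficients resistant to the extreme claims
print(sm.OLS(cost, X).fit().params)   # for comparison: OLS pulled by the outliers
```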
Using dynamic mode decomposition for real-time background/foreground separation in video
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kutz, Jose Nathan; Grosek, Jacob; Brunton, Steven
The technique of dynamic mode decomposition (DMD) is disclosed herein for the purpose of robustly separating video frames into background (low-rank) and foreground (sparse) components in real-time. Foreground/background separation is achieved at the computational cost of just one singular value decomposition (SVD) and one linear equation solve, thus producing results orders of magnitude faster than robust principal component analysis (RPCA). Additional techniques, including techniques for analyzing the video for multi-resolution time-scale components, and techniques for reusing computations to allow processing of streaming video in real time, are also described herein.
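A compact numpy sketch of the separation step, under the assumption of a rank-truncated exact DMD: one SVD and one eigen-decomposition yield the dynamic modes, the near-zero-frequency modes reconstruct the low-rank background, and the residual is the sparse foreground. The "video" here is synthetic.

```python
# Sketch of DMD-based background/foreground separation on a synthetic "video":
# quasi-static (near-zero-frequency) modes form the background, the residual
# the foreground.
import numpy as np

rng = np.random.default_rng(4)
n_pixels, n_frames = 400, 60
background = np.outer(rng.uniform(0, 1, n_pixels), np.ones(n_frames))
foreground = np.zeros((n_pixels, n_frames))
foreground[50:70, 20:40] = 1.0                        # a transient bright object
V = background + foreground + 0.01 * rng.normal(size=(n_pixels, n_frames))

X1, X2 = V[:, :-1], V[:, 1:]                          # snapshot pairs
U, s, Vh = np.linalg.svd(X1, full_matrices=False)
r = 10                                                # truncation rank
U, s, Vh = U[:, :r], s[:r], Vh[:r]
A_tilde = U.conj().T @ X2 @ Vh.conj().T @ np.diag(1.0 / s)
eigvals, W = np.linalg.eig(A_tilde)
Phi = X2 @ Vh.conj().T @ np.diag(1.0 / s) @ W         # DMD modes
omega = np.log(eigvals.astype(complex))               # frequencies (dt = 1)

bg_idx = np.abs(omega) < 1e-2                         # quasi-static modes
b = np.linalg.lstsq(Phi, V[:, 0].astype(complex), rcond=None)[0]
time_dyn = np.array([b[bg_idx] * eigvals[bg_idx] ** t for t in range(n_frames)]).T
low_rank = np.real(Phi[:, bg_idx] @ time_dyn)         # background estimate
sparse = V - low_rank                                 # foreground estimate
print("mean |foreground| inside the object region:",
      float(np.abs(sparse[50:70, 20:40]).mean()))
```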
A robust nonparametric framework for reconstruction of stochastic differential equation models
NASA Astrophysics Data System (ADS)
Rajabzadeh, Yalda; Rezaie, Amir Hossein; Amindavar, Hamidreza
2016-05-01
In this paper, we employ a nonparametric framework to robustly estimate the functional forms of drift and diffusion terms from discrete stationary time series. The proposed method significantly improves the accuracy of the parameter estimation. In this framework, drift and diffusion coefficients are modeled through orthogonal Legendre polynomials. We employ the least squares regression approach along with the Euler-Maruyama approximation method to learn the coefficients of the stochastic model. Next, a numerical discrete construction of the mean squared prediction error (MSPE) is established to select the order of the Legendre polynomials in the drift and diffusion terms. We show numerically that the new method is robust against variation in sample size and sampling rate. The performance of our method in comparison with the kernel-based regression (KBR) method is demonstrated through simulation and real data. In the case of real data, we test our method for discriminating healthy electroencephalogram (EEG) signals from epileptic ones. We also demonstrate the efficiency of the method through prediction in financial data. In both simulation and real data, our algorithm outperforms the KBR method.
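A minimal sketch of the regression step on a simulated diffusion follows; the polynomial orders are fixed rather than selected by the MSPE criterion described above, and the simulated process is only an illustrative stand-in.

```python
# Sketch: approximate drift and squared diffusion by Legendre-polynomial
# expansions fitted by least squares to the Euler-Maruyama conditional
# moments of the increments.
import numpy as np
from numpy.polynomial import legendre as L

rng = np.random.default_rng(5)
dt, n = 0.01, 50_000
x = np.empty(n)
x[0] = 0.0
for k in range(n - 1):      # simulate dX = -X dt + sqrt(1 + X^2/4) dW
    x[k + 1] = x[k] - x[k] * dt + np.sqrt((1 + 0.25 * x[k] ** 2) * dt) * rng.normal()

dx, xs = np.diff(x), x[:-1]
u = 2 * (xs - xs.min()) / (xs.max() - xs.min()) - 1   # rescale states to [-1, 1]

def legendre_ls(u, target, order):
    """Least-squares Legendre coefficients for E[target | x]."""
    return np.linalg.lstsq(L.legvander(u, order), target, rcond=None)[0]

drift_coef = legendre_ls(u, dx / dt, order=3)          # E[dX|x]/dt ~ drift(x)
diff2_coef = legendre_ls(u, dx ** 2 / dt, order=4)     # E[dX^2|x]/dt ~ diffusion^2(x)

grid = np.linspace(-1, 1, 5)                           # grid in rescaled units
print("estimated drift:      ", np.round(L.legval(grid, drift_coef), 2))
print("estimated diffusion^2:", np.round(L.legval(grid, diff2_coef), 2))
```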
Image interpolation via regularized local linear regression.
Liu, Xianming; Zhao, Debin; Xiong, Ruiqin; Ma, Siwei; Gao, Wen; Sun, Huifang
2011-12-01
The linear regression model is a very attractive tool to design effective image interpolation schemes. Some regression-based image interpolation algorithms have been proposed in the literature, in which the objective functions are optimized by ordinary least squares (OLS). However, it is shown that interpolation with OLS may have some undesirable properties from a robustness point of view: even small amounts of outliers can dramatically affect the estimates. To address these issues, in this paper we propose a novel image interpolation algorithm based on regularized local linear regression (RLLR). Starting from the linear regression model, we replace the OLS error norm with the moving least squares (MLS) error norm, which leads to a robust estimator of local image structure. To keep the solution stable and avoid overfitting, we incorporate the l2-norm as the estimator complexity penalty. Moreover, motivated by recent progress on manifold-based semi-supervised learning, we explicitly consider the intrinsic manifold structure by making use of both measured and unmeasured data points. Specifically, our framework incorporates the geometric structure of the marginal probability distribution induced by unmeasured samples as an additional local smoothness preserving constraint. The optimal model parameters can be obtained with a closed-form solution by solving a convex optimization problem. Experimental results on benchmark test images demonstrate that the proposed method achieves very competitive performance with the state-of-the-art interpolation algorithms, especially in image edge structure preservation. © 2011 IEEE
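The following sketch shows the core estimator on synthetic data: a distance-weighted (moving-least-squares-style) local linear fit with an l2 (ridge) penalty that predicts an unmeasured pixel from its measured neighbours. The manifold-regularisation term of the full RLLR formulation is omitted, and the bandwidth and penalty values are illustrative.

```python
# Sketch: ridge-regularised, distance-weighted local linear regression used
# to interpolate an unmeasured pixel from measured neighbours.
import numpy as np

def local_linear_predict(coords, values, query, h=1.5, lam=0.1):
    """Weighted ridge regression of values on coordinates around `query`."""
    d2 = np.sum((coords - query) ** 2, axis=1)
    w = np.exp(-d2 / (2 * h ** 2))                   # MLS-style locality weights
    X = np.hstack([np.ones((len(coords), 1)), coords - query])
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X + lam * np.eye(X.shape[1]), X.T @ W @ values)
    return beta[0]                                   # intercept = estimate at query

rng = np.random.default_rng(6)
yy, xx = np.mgrid[0:16, 0:16]
image = np.sin(xx / 3.0) + np.cos(yy / 4.0)          # smooth test "image"
measured = rng.random((16, 16)) < 0.5                # keep roughly half the pixels
coords = np.column_stack([yy[measured], xx[measured]]).astype(float)
values = image[measured]

query = np.array([7.0, 7.0])                         # an unmeasured location
estimate = local_linear_predict(coords, values, query)
print(f"true {image[7, 7]:.3f}  interpolated {estimate:.3f}")
```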
ERIC Educational Resources Information Center
Tipton, Elizabeth; Pustejovsky, James E.
2015-01-01
Randomized experiments are commonly used to evaluate the effectiveness of educational interventions. The goal of the present investigation is to develop small-sample corrections for multiple contrast hypothesis tests (i.e., F-tests) such as the omnibus test of meta-regression fit or a test for equality of three or more levels of a categorical…
Development of Optimal Stressor Scenarios for New Operational Energy Systems
2017-12-01
Analyzing the previous model using a design of experiments (DOE) and regression analysis provides critical information about the associated operational...from experimentation. The resulting system requirements can be used to revisit the design requirements and develop a more robust system. This process...stressor scenarios for acceptance testing.
London Measure of Unplanned Pregnancy: guidance for its use as an outcome measure
Hall, Jennifer A; Barrett, Geraldine; Copas, Andrew; Stephenson, Judith
2017-01-01
Background The London Measure of Unplanned Pregnancy (LMUP) is a psychometrically validated measure of the degree of intention of a current or recent pregnancy. The LMUP is increasingly being used worldwide, and can be used to evaluate family planning or preconception care programs. However, beyond recommending the use of the full LMUP scale, there is no published guidance on how to use the LMUP as an outcome measure. Ordinal logistic regression has been recommended informally, but studies published to date have all used binary logistic regression and dichotomized the scale at different cut points. There is thus a need for evidence-based guidance to provide a standardized methodology for multivariate analysis and to enable comparison of results. This paper makes recommendations for the regression method for analysis of the LMUP as an outcome measure. Materials and methods Data collected from 4,244 pregnant women in Malawi were used to compare five regression methods: linear, logistic with two cut points, and ordinal logistic with either the full or grouped LMUP score. The recommendations were then tested on the original UK LMUP data. Results There were small but no important differences in the findings across the regression models. Logistic regression resulted in the largest loss of information, and assumptions were violated for the linear and ordinal logistic regression. Consequently, robust standard errors were used for linear regression and a partial proportional odds ordinal logistic regression model attempted. The latter could only be fitted for grouped LMUP score. Conclusion We recommend the linear regression model with robust standard errors to make full use of the LMUP score when analyzed as an outcome measure. Ordinal logistic regression could be considered, but a partial proportional odds model with grouped LMUP score may be required. Logistic regression is the least-favored option, due to the loss of information. For logistic regression, the cut point for un/planned pregnancy should be between nine and ten. These recommendations will standardize the analysis of LMUP data and enhance comparability of results across studies. PMID:28435343
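A minimal sketch of the recommended primary analysis, assuming simulated data in place of the Malawi sample: linear regression of the full LMUP score on a binary exposure with heteroskedasticity-robust (sandwich) standard errors via statsmodels.

```python
# Sketch: linear regression of the full LMUP score with robust standard errors.
# The exposure variable and data are simulated placeholders.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 4000
exposure = rng.integers(0, 2, n)                     # e.g. received preconception care
lmup = np.clip(np.round(6 + 1.5 * exposure + rng.normal(0, 3, n)), 0, 12)

X = sm.add_constant(exposure)
fit = sm.OLS(lmup, X).fit(cov_type="HC3")            # robust (sandwich) SEs
print(fit.summary().tables[1])
```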
Use of Thematic Mapper for water quality assessment
NASA Technical Reports Server (NTRS)
Horn, E. M.; Morrissey, L. A.
1984-01-01
The evaluation of simulated TM data obtained on an ER-2 aircraft at twenty-five predesignated sample sites for mapping water quality factors such as conductivity, pH, suspended solids, turbidity, temperature, and depth is discussed. Using a multiple regression on the seven TM bands, an equation is developed for suspended solids. TM bands 1, 2, 3, 4, and 6 are used with the logarithm of conductivity in a multiple regression. The assessment of the regression equations for a high coefficient of determination (R-squared) and statistical significance is considered. Confidence intervals about the mean regression point are calculated in order to assess the robustness of the regressions used for mapping conductivity, turbidity, and suspended solids, and cross-validation is conducted by regressing random subsamples of sites and comparing the resulting range of R-squared values.
Schmidt, Paul; Schmid, Volker J; Gaser, Christian; Buck, Dorothea; Bührlen, Susanne; Förschler, Annette; Mühlau, Mark
2013-01-01
Aiming at iron-related T2-hypointensity, which is related to normal aging and neurodegenerative processes, we here present two practicable approaches, based on Bayesian inference, for preprocessing and statistical analysis of a complex set of structural MRI data. In particular, Markov Chain Monte Carlo methods were used to simulate posterior distributions. First, we rendered a segmentation algorithm that uses outlier detection based on model checking techniques within a Bayesian mixture model. Second, we rendered an analytical tool comprising a Bayesian regression model with smoothness priors (in the form of Gaussian Markov random fields) mitigating the necessity to smooth data prior to statistical analysis. For validation, we used simulated data and MRI data of 27 healthy controls (age: [Formula: see text]; range, [Formula: see text]). We first observed robust segmentation of both simulated T2-hypointensities and gray-matter regions known to be T2-hypointense. Second, simulated data and images of segmented T2-hypointensity were analyzed. We found not only robust identification of simulated effects but also a biologically plausible age-related increase of T2-hypointensity primarily within the dentate nucleus but also within the globus pallidus, substantia nigra, and red nucleus. Our results indicate that fully Bayesian inference can successfully be applied for preprocessing and statistical analysis of structural MRI data.
NASA Astrophysics Data System (ADS)
Bergeron, Charles; Labelle, Hubert; Ronsky, Janet; Zernicke, Ronald
2005-04-01
Spinal curvature progression in scoliosis patients is monitored from X-rays, and this serial exposure to harmful radiation increases the incidence of developing cancer. With the aim of reducing the invasiveness of follow-up, this study seeks to relate the three-dimensional external surface to the internal geometry, on the assumption that the physiological links between these are sufficiently regular across patients. A database of 194 quasi-simultaneous acquisitions of two X-rays and a 3D laser scan of the entire trunk was used. Data were processed to sets of datapoints representing the trunk surface and spinal curve. Functional data analyses were performed using generalized Fourier series with a Haar basis and functional minimum noise fractions. The resulting coefficients became inputs and outputs, respectively, to an array of support vector regression (SVR) machines. SVR parameters were set based on theoretical results, and cross-validation increased confidence in the system's performance. Predicted lateral and frontal views of the spinal curve from the back surface demonstrated average L2-errors of 6.13 and 4.38 millimetres, respectively, across the test set; these compared favourably with the measurement error in the data. This constitutes a first robust prediction of the 3D spinal curve from external data using learning techniques.
Robust estimation for partially linear models with large-dimensional covariates
Zhu, LiPing; Li, RunZe; Cui, HengJian
2014-01-01
We are concerned with robust estimation procedures to estimate the parameters in partially linear models with large-dimensional covariates. To enhance the interpretability, we suggest implementing a nonconcave regularization method in the robust estimation procedure to select important covariates from the linear component. We establish the consistency for both the linear and the nonlinear components when the covariate dimension diverges at the rate of o(n), where n is the sample size. We show that the robust estimate of linear component performs asymptotically as well as its oracle counterpart which assumes the baseline function and the unimportant covariates were known a priori. With a consistent estimator of the linear component, we estimate the nonparametric component by a robust local linear regression. It is proved that the robust estimate of nonlinear component performs asymptotically as well as if the linear component were known in advance. Comprehensive simulation studies are carried out and an application is presented to examine the finite-sample performance of the proposed procedures. PMID:24955087
NASA Astrophysics Data System (ADS)
Lombardo, L.; Cama, M.; Maerker, M.; Parisi, L.; Rotigliano, E.
2014-12-01
This study aims at comparing the performances of Binary Logistic Regression (BLR) and Boosted Regression Trees (BRT) methods in assessing landslide susceptibility for multiple-occurrence regional landslide events within the Mediterranean region. A test area was selected in the north-eastern sector of Sicily (southern Italy), corresponding to the catchments of the Briga and the Giampilieri streams, both stretching for a few kilometres from the Peloritan ridge (eastern Sicily, Italy) to the Ionian sea. This area was struck on the 1st October 2009 by an extreme climatic event resulting in thousands of rapid shallow landslides, mainly debris flows and debris avalanches, involving the weathered layer of a low- to high-grade metamorphic bedrock. Exploiting the same set of predictors and the 2009 landslide archive, BLR- and BRT-based susceptibility models were obtained for the two catchments separately, adopting a random partition (RP) technique for validation; in addition, the models trained in one of the two catchments (Briga) were tested in predicting the landslide distribution in the other (Giampilieri), adopting a spatial partition (SP) based validation procedure. All the validation procedures were based on multi-fold tests so as to evaluate and compare the reliability of the fitting, the prediction skill, the coherence in the predictor selection and the precision of the susceptibility estimates. All the obtained models for the two methods produced very high predictive performances, with a general congruence between BLR and BRT in the predictor importance. In particular, the research highlighted that BRT models reached a higher prediction performance than BLR models for RP-based modelling, whilst for the SP-based models the difference in predictive skills between the two methods dropped drastically, converging to an analogous excellent performance. However, when looking at the precision of the probability estimates, BLR was shown to produce more robust models in terms of selected predictors and coefficients, as well as the dispersion of the estimated probabilities around the mean value for each mapped pixel. The difference in behaviour could be interpreted as the result of overfitting effects, which affect decision tree classification more heavily than logistic regression techniques.
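The sketch below illustrates the comparison on synthetic presence/absence data rather than the Sicilian archive: binary logistic regression versus a boosted-trees classifier, evaluated with both a random partition and a crude spatial partition (train on one half of the area, test on the other).

```python
# Sketch: BLR vs. boosted trees for susceptibility-style classification,
# with random-partition and spatial-partition validation. Synthetic data.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(8)
n = 4000
slope = rng.uniform(0, 45, n)
curvature = rng.normal(0, 1, n)
easting = rng.uniform(0, 10, n)                       # pseudo-coordinate
p = 1 / (1 + np.exp(-(0.08 * slope + 0.8 * curvature - 3)))
landslide = rng.random(n) < p
X = np.column_stack([slope, curvature])

blr = LogisticRegression(max_iter=1000)
brt = GradientBoostingClassifier(n_estimators=200, learning_rate=0.05, max_depth=3)

# Random-partition validation.
for name, model in [("BLR", blr), ("BRT", brt)]:
    auc = cross_val_score(model, X, landslide, cv=5, scoring="roc_auc")
    print(f"{name}  random-partition AUC = {auc.mean():.3f}")

# Spatial-partition validation: train on the west half, predict on the east.
west, east = easting < 5, easting >= 5
for name, model in [("BLR", blr), ("BRT", brt)]:
    model.fit(X[west], landslide[west])
    auc = roc_auc_score(landslide[east], model.predict_proba(X[east])[:, 1])
    print(f"{name}  spatial-partition AUC = {auc:.3f}")
```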
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, H; Liang, X; Kalbasi, A
2014-06-01
Purpose: Advanced radiotherapy (RT) techniques such as proton pencil beam scanning (PBS) and photon-based volumetric modulated arc therapy (VMAT) have dosimetric advantages in the treatment of head and neck malignancies. However, anatomic or alignment changes during treatment may limit the robustness of PBS and VMAT plans. We assess the feasibility of automated deformable registration tools for robustness evaluation in adaptive PBS and VMAT RT of oropharyngeal cancer (OPC). Methods: We treated 10 patients with bilateral OPC with advanced RT techniques and obtained verification CT scans with physician-reviewed target and OAR contours. We generated 3 advanced RT plans for each patient: a proton PBS plan using 2 posterior oblique fields (2F), a proton PBS plan using an additional third low-anterior field (3F), and a photon VMAT plan using 2 arcs (Arc). For each of the planning techniques, we forward calculated initial (Ini) plans on the verification scans to create verification (V) plans. We extracted DVH indicators based on physician-generated contours for 2 target and 14 OAR structures to investigate the feasibility of two automated tools (contour propagation (CP) and dose deformation (DD)) as surrogates for routine clinical plan robustness evaluation. For each verification scan, we compared DVH indicators of V, CP and DD plans in a head-to-head fashion using Student's t-test. Results: We performed 39 verification scans; each patient underwent 3 to 6 verification scans. We found no differences in doses to target or OAR structures between V and CP, V and DD, and CP and DD plans across all patients (p > 0.05). Conclusions: Automated robustness evaluation tools, CP and DD, accurately predicted dose distributions of verification (V) plans using physician-generated contours. These tools may be further developed as a potential robustness screening tool in the workflow for adaptive treatment of OPC using advanced RT techniques, reducing the need for physician-generated contours.
NASA Astrophysics Data System (ADS)
Brown, M. G. L.; He, T.; Liang, S.
2016-12-01
Satellite-derived estimates of incident photosynthetically active radiation (PAR) can be used to monitor global change, are required by most terrestrial ecosystem models, and can be used to estimate primary production according to the theory of light use efficiency. Compared with parametric approaches, non-parametric techniques that include an artificial neural network (ANN), support vector machine regression (SVM), an artificial bee colony (ABC), and a look-up table (LUT) do not require many ancillary data as inputs for the estimation of PAR from satellite data. In this study, a selection of machine learning methods to estimate PAR from MODIS top of atmosphere (TOA) radiances are compared to a LUT approach to determine which techniques might best handle the nonlinear relationship between TOA radiance and incident PAR. Evaluation of these methods (ANN, SVM, and LUT) is performed with ground measurements at seven SURFRAD sites. Due to the design of the ANN, it can handle the nonlinear relationship between TOA radiance and PAR better than linearly interpolating between the values in the LUT; however, training the ANN has to be carried out on an angular-bin basis, which results in a LUT of ANNs. The SVM model may be better for incorporating multiple viewing angles than the ANN; however, both techniques require a large amount of training data, which may introduce a regional bias based on where the most training and validation data are available. Based on the literature, the ABC is a promising alternative to an ANN, SVM regression and a LUT, but further development for this application is required before concrete conclusions can be drawn. For now, the LUT method outperforms the machine-learning techniques, but future work should be directed at developing and testing the ABC method. A simple, robust method to estimate direct and diffuse incident PAR, with minimal inputs and a priori knowledge, would be very useful for monitoring global change of primary production, particularly of pastures and rangeland, which have implications for livestock and food security. Future work will delve deeper into the utility of satellite-derived PAR estimation for monitoring primary production in pasture and rangelands.
Regression Verification Using Impact Summaries
NASA Technical Reports Server (NTRS)
Backes, John; Person, Suzette J.; Rungta, Neha; Thachuk, Oksana
2013-01-01
Regression verification techniques are used to prove equivalence of syntactically similar programs. Checking equivalence of large programs, however, can be computationally expensive. Existing regression verification techniques rely on abstraction and decomposition techniques to reduce the computational effort of checking equivalence of the entire program. These techniques are sound but not complete. In this work, we propose a novel approach to improve scalability of regression verification by classifying the program behaviors generated during symbolic execution as either impacted or unimpacted. Our technique uses a combination of static analysis and symbolic execution to generate summaries of impacted program behaviors. The impact summaries are then checked for equivalence using an off-the-shelf decision procedure. We prove that our approach is both sound and complete for sequential programs, with respect to the depth bound of symbolic execution. Our evaluation on a set of sequential C artifacts shows that reducing the size of the summaries can help reduce the cost of software equivalence checking. Various reduction, abstraction, and compositional techniques have been developed to help scale software verification techniques to industrial-sized systems. Although such techniques have greatly increased the size and complexity of systems that can be checked, analysis of large software systems remains costly. Regression analysis techniques, e.g., regression testing [16], regression model checking [22], and regression verification [19], restrict the scope of the analysis by leveraging the differences between program versions. These techniques are based on the idea that if code is checked early in development, then subsequent versions can be checked against a prior (checked) version, leveraging the results of the previous analysis to reduce analysis cost of the current version. Regression verification addresses the problem of proving equivalence of closely related program versions [19]. These techniques compare two programs with a large degree of syntactic similarity to prove that portions of one program version are equivalent to the other. Regression verification can be used for guaranteeing backward compatibility, and for showing behavioral equivalence in programs with syntactic differences, e.g., when a program is refactored to improve its performance, maintainability, or readability. Existing regression verification techniques leverage similarities between program versions by using abstraction and decomposition techniques to improve scalability of the analysis [10, 12, 19]. The abstractions and decomposition in these techniques, e.g., summaries of unchanged code [12] or semantically equivalent methods [19], compute an over-approximation of the program behaviors. The equivalence checking results of these techniques are sound but not complete: they may characterize programs as not functionally equivalent when, in fact, they are equivalent. In this work we describe a novel approach that leverages the impact of the differences between two programs for scaling regression verification. We partition program behaviors of each version into (a) behaviors impacted by the changes and (b) behaviors not impacted (unimpacted) by the changes. Only the impacted program behaviors are used during equivalence checking. We then prove that checking equivalence of the impacted program behaviors is equivalent to checking equivalence of all program behaviors for a given depth bound.
In this work we use symbolic execution to generate the program behaviors and leverage control- and data-dependence information to facilitate the partitioning of program behaviors. The impacted program behaviors are termed impact summaries. The dependence analyses that facilitate the generation of the impact summaries, we believe, could be used in conjunction with other abstraction and decomposition based approaches [10, 12] as a complementary reduction technique. An evaluation of our regression verification technique shows that our approach is capable of leveraging similarities between program versions to reduce the size of the queries and the time required to check for logical equivalence. The main contributions of this work are: - A regression verification technique to generate impact summaries that can be checked for functional equivalence using an off-the-shelf decision procedure. - A proof that our approach is sound and complete with respect to the depth bound of symbolic execution. - An implementation of our technique using the LLVM compiler infrastructure, the klee Symbolic Virtual Machine [4], and a variety of Satisfiability Modulo Theory (SMT) solvers, e.g., STP [7] and Z3 [6]. - An empirical evaluation on a set of C artifacts which shows that the use of impact summaries can reduce the cost of regression verification.
Synthesis Methods for Robust Passification and Control
NASA Technical Reports Server (NTRS)
Kelkar, Atul G.; Joshi, Suresh M. (Technical Monitor)
2000-01-01
The research effort under this cooperative agreement has been essentially the continuation of the work from previous grants. The ongoing work has primarily focused on developing passivity-based control techniques for Linear Time-Invariant (LTI) systems. During this period, there has been significant progress in the area of passivity-based control of LTI systems, and some preliminary results have also been obtained for nonlinear systems. The prior work has addressed optimal control design for inherently passive as well as non-passive linear systems. To exploit the robustness characteristics of passivity-based controllers, the passification methodology was developed for LTI systems that are not inherently passive. Various methods of passification were first proposed and further developed. The robustness of passification was addressed for multi-input multi-output (MIMO) systems for certain classes of uncertainties using frequency-domain methods. For MIMO systems, a state-space approach using a Linear Matrix Inequality (LMI)-based formulation was presented for passification of non-passive LTI systems. An LMI-based robust passification technique was presented for systems with redundant actuators and sensors. The redundancy in actuators and sensors was used effectively for robust passification using the LMI formulation. The passification was designed to be robust to interval-type uncertainties in system parameters. The passification techniques were used to design a robust controller for the Benchmark Active Control Technology wing under parametric uncertainties. The results on passive nonlinear systems, however, are very limited to date. Our recent work in this area was presented, wherein some stability results were obtained for passive nonlinear systems that are affine in control.
New machine-learning algorithms for prediction of Parkinson's disease
NASA Astrophysics Data System (ADS)
Mandal, Indrajit; Sairam, N.
2014-03-01
This article presents enhanced prediction accuracy for the diagnosis of Parkinson's disease (PD) to prevent the delay and misdiagnosis of patients using the proposed robust inference system. New machine-learning methods are proposed and performance comparisons are based on specificity, sensitivity, accuracy and other measurable parameters. The robust methods for diagnosing PD include sparse multinomial logistic regression, rotation forest ensemble with support vector machines and principal components analysis, artificial neural networks, and boosting methods. A new ensemble method comprising a Bayesian network optimised by a Tabu search algorithm as the classifier and Haar wavelets as the projection filter is used for relevant feature selection and ranking. The highest accuracy obtained by linear logistic regression and sparse multinomial logistic regression is 100%, with sensitivity and specificity of 0.983 and 0.996, respectively. All the experiments are conducted over 95% and 99% confidence levels and establish the results with corrected t-tests. This work shows a high degree of advancement in software reliability and quality of the computer-aided diagnosis system and experimentally shows best results with supportive statistical inference.
Use of the forced-oscillation technique to estimate spirometry values.
Yamamoto, Shoichiro; Miyoshi, Seigo; Katayama, Hitoshi; Okazaki, Mikio; Shigematsu, Hisayuki; Sano, Yoshifumi; Matsubara, Minoru; Hamaguchi, Naohiko; Okura, Takafumi; Higaki, Jitsuo
2017-01-01
Spirometry is sometimes difficult to perform in elderly patients and in those with severe respiratory distress. The forced-oscillation technique (FOT) is a simple and noninvasive method of measuring respiratory impedance. The aim of this study was to determine if FOT data reflect spirometric indices. Patients underwent both FOT and spirometry procedures prior to inclusion in development (n=1,089) and validation (n=552) studies. Multivariate linear regression analysis was performed to identify FOT parameters predictive of vital capacity (VC), forced VC (FVC), and forced expiratory volume in 1 second (FEV1). A regression equation was used to calculate estimated VC, FVC, and FEV1. We then determined whether the estimated data reflected spirometric indices. Agreement between actual and estimated spirometry data was assessed by Bland-Altman analysis. Significant correlations were observed between actual and estimated VC, FVC, and FEV1 values (all r > 0.8 and P < 0.001). These results were deemed robust by a separate validation study (all r > 0.8 and P < 0.001). Bias between the actual data and estimated data for VC, FVC, and FEV1 in the development study was 0.007 L (95% limits of agreement [LOA] 0.907 and -0.893 L), -0.064 L (95% LOA 0.843 and -0.971 L), and -0.039 L (95% LOA 0.735 and -0.814 L), respectively. On the other hand, bias between the actual data and estimated data for VC, FVC, and FEV1 in the validation study was -0.201 L (95% LOA 0.62 and -1.022 L), -0.262 L (95% LOA 0.582 and -1.106 L), and -0.174 L (95% LOA 0.576 and -0.923 L), respectively, suggesting that the estimated data in the validation study did not have high accuracy. Further studies are needed to generate more accurate regression equations for spirometric indices based on FOT measurements.
Extensions and applications of ensemble-of-trees methods in machine learning
NASA Astrophysics Data System (ADS)
Bleich, Justin
Ensemble-of-trees algorithms have emerged to the forefront of machine learning due to their ability to generate high forecasting accuracy for a wide array of regression and classification problems. Classic ensemble methodologies such as random forests (RF) and stochastic gradient boosting (SGB) rely on algorithmic procedures to generate fits to data. In contrast, more recent ensemble techniques such as Bayesian Additive Regression Trees (BART) and Dynamic Trees (DT) focus on an underlying Bayesian probability model to generate the fits. These new probability model-based approaches show much promise versus their algorithmic counterparts, but also offer substantial room for improvement. The first part of this thesis focuses on methodological advances for ensemble-of-trees techniques with an emphasis on the more recent Bayesian approaches. In particular, we focus on extensions of BART in four distinct ways. First, we develop a more robust implementation of BART for both research and application. We then develop a principled approach to variable selection for BART as well as the ability to naturally incorporate prior information on important covariates into the algorithm. Next, we propose a method for handling missing data that relies on the recursive structure of decision trees and does not require imputation. Last, we relax the assumption of homoskedasticity in the BART model to allow for parametric modeling of heteroskedasticity. The second part of this thesis returns to the classic algorithmic approaches in the context of classification problems with asymmetric costs of forecasting errors. First we consider the performance of RF and SGB more broadly and demonstrate its superiority to logistic regression for applications in criminology with asymmetric costs. Next, we use RF to forecast unplanned hospital readmissions upon patient discharge with asymmetric costs taken into account. Finally, we explore the construction of stable decision trees for forecasts of violence during probation hearings in court systems.
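As a hedged illustration of the asymmetric-cost theme in the second part, the sketch below re-weights classes in a random forest so that missing a positive case is treated as several times costlier than a false alarm; the data and the 10:1 cost ratio are purely illustrative.

```python
# Sketch: encoding asymmetric forecasting costs in a random forest via
# class weights, and inspecting how the error trade-off shifts.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(9)
n = 5000
X = rng.normal(size=(n, 6))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, n) > 1.5).astype(int)  # rare event
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for weights in [None, {0: 1, 1: 10}]:                # 10:1 cost of missing a "1"
    rf = RandomForestClassifier(n_estimators=300, class_weight=weights, random_state=0)
    rf.fit(X_tr, y_tr)
    print(weights, confusion_matrix(y_te, rf.predict(X_te)))
```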
Zhang, Xuan; Li, Wei; Yin, Bin; Chen, Weizhong; Kelly, Declan P; Wang, Xiaoxin; Zheng, Kaiyi; Du, Yiping
2013-10-01
Coffee is the most heavily consumed beverage in the world after water, for which quality is a key consideration in commercial trade. Therefore, caffeine content, which has a significant effect on the final quality of coffee products, needs to be determined rapidly and reliably by new analytical techniques. The main purpose of this work was to establish a powerful and practical analytical method based on near infrared spectroscopy (NIRS) and chemometrics for quantitative determination of caffeine content in roasted Arabica coffees. Ground coffee samples covering a wide range of roasting levels were analyzed by NIR, while the caffeine contents were quantitatively determined by the most commonly used HPLC-UV method to serve as reference values. Calibration models based on chemometric analyses of the NIR spectral data and reference concentrations of the coffee samples were then developed. Partial least squares (PLS) regression was used to construct the models. Furthermore, diverse spectra pretreatment and variable selection techniques were applied in order to obtain robust and reliable reduced-spectrum regression models. Comparing the respective quality of the different models constructed, the application of second derivative pretreatment and stability competitive adaptive reweighted sampling (SCARS) variable selection provided a notably improved regression model, with root mean square error of cross validation (RMSECV) of 0.375 mg/g and correlation coefficient (R) of 0.918 at a PLS factor of 7. An independent test set was used to assess the model, with root mean square error of prediction (RMSEP) of 0.378 mg/g, mean relative error of 1.976% and mean relative standard deviation (RSD) of 1.707%. Thus, the results provided by the high-quality calibration model revealed the feasibility of NIR spectroscopy for at-line application to predict the caffeine content of unknown roasted coffee samples, thanks to the short analysis time of a few seconds and the non-destructive nature of NIRS. Copyright © 2013 Elsevier B.V. All rights reserved.
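A minimal sketch of the calibration pipeline on simulated spectra: Savitzky-Golay second-derivative pretreatment followed by PLS regression, with RMSECV obtained by cross-validation. The SCARS variable-selection step is not reproduced, and all data and settings are illustrative.

```python
# Sketch: second-derivative pretreatment + PLS regression with RMSECV.
# Spectra and caffeine values are simulated, not the study's data.
import numpy as np
from scipy.signal import savgol_filter
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(10)
n_samples, n_wavelengths = 120, 400
caffeine = rng.uniform(8, 14, n_samples)                        # mg/g
peak = np.exp(-0.5 * ((np.arange(n_wavelengths) - 180) / 12) ** 2)
spectra = (caffeine[:, None] * peak[None, :]
           + rng.normal(0, 0.3, (n_samples, n_wavelengths))
           + rng.normal(0, 2.0, (n_samples, 1)))                # baseline shifts

# Second-derivative pretreatment largely removes the additive baseline.
d2 = savgol_filter(spectra, window_length=15, polyorder=3, deriv=2, axis=1)

pls = PLSRegression(n_components=7)
pred = cross_val_predict(pls, d2, caffeine, cv=10).ravel()
rmsecv = np.sqrt(np.mean((pred - caffeine) ** 2))
print(f"RMSECV = {rmsecv:.3f} mg/g,  R = {np.corrcoef(pred, caffeine)[0, 1]:.3f}")
```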
Robust dynamical decoupling for quantum computing and quantum memory.
Souza, Alexandre M; Alvarez, Gonzalo A; Suter, Dieter
2011-06-17
Dynamical decoupling (DD) is a popular technique for protecting qubits from the environment. However, unless special care is taken, experimental errors in the control pulses used in this technique can destroy the quantum information instead of preserving it. Here, we investigate techniques for making DD sequences robust against different types of experimental errors while retaining good decoupling efficiency in a fluctuating environment. We present experimental data from solid-state nuclear spin qubits and introduce a new DD sequence that is suitable for quantum computing and quantum memory.
Parametric robust control and system identification: Unified approach
NASA Technical Reports Server (NTRS)
Keel, Leehyun
1994-01-01
Despite significant advancement in the area of robust parametric control, synthesizing such a controller remains a wide open problem. Thus, we attempt to give a solution to this important problem. Our approach captures the parametric uncertainty as an H∞ unstructured uncertainty so that H∞ synthesis techniques are applicable. Although the techniques cannot cope with the exact parametric uncertainty, they give a reasonable guideline for modeling the unstructured uncertainty that contains the parametric uncertainty. An additional loop shaping technique is also introduced to reduce the conservatism.
Liu, Wei; Li, Yupeng; Li, Xiaoqiang; Cao, Wenhua; Zhang, Xiaodong
2012-01-01
Purpose: The distal edge tracking (DET) technique in intensity-modulated proton therapy (IMPT) allows for high energy efficiency, fast and simple delivery, and simple inverse treatment planning; however, it is highly sensitive to uncertainties. In this study, the authors explored the application of DET in IMPT (IMPT-DET) and conducted robust optimization of IMPT-DET to see if the planning technique's sensitivity to uncertainties was reduced. They also compared conventional and robust optimization of IMPT-DET with three-dimensional IMPT (IMPT-3D) to gain understanding about how plan robustness is achieved. Methods: They compared the robustness of IMPT-DET and IMPT-3D plans to uncertainties by analyzing plans created for a typical prostate cancer case and a base of skull (BOS) cancer case (using data for patients who had undergone proton therapy at our institution). Spots with the highest and second highest energy layers were chosen so that the Bragg peak would be at the distal edge of the targets in IMPT-DET using 36 equally spaced angle beams; in IMPT-3D, 3 beams with angles chosen by a beam angle optimization algorithm were planned. Dose contributions for a number of range and setup uncertainties were calculated, and a worst-case robust optimization was performed. A robust quantification technique was used to evaluate the plans' sensitivity to uncertainties. Results: When uncertainties were not accounted for in optimization, the DET method was less robust to uncertainties than the 3D method but offered better normal tissue protection. Robust optimization accounting for range and setup uncertainties can improve the robustness of IMPT plans; however, our findings show that the extent of improvement varies. Conclusions: IMPT's sensitivity to uncertainties can be reduced by using robust optimization. They found two possible mechanisms that made improvements possible: (1) a localized single-field uniform dose distribution (LSFUD) mechanism, in which the optimization algorithm attempts to produce a single-field uniform dose distribution while minimizing the patching field as much as possible; and (2) perturbed dose distribution, which follows the change in anatomical geometry. Multiple-instance optimization has more knowledge of the influence matrices; this greater knowledge improves IMPT plans' ability to retain robustness despite the presence of uncertainties. PMID:22755694
Goodness-of-fit tests and model diagnostics for negative binomial regression of RNA sequencing data.
Mi, Gu; Di, Yanming; Schafer, Daniel W
2015-01-01
This work is about assessing model adequacy for negative binomial (NB) regression, particularly (1) assessing the adequacy of the NB assumption, and (2) assessing the appropriateness of models for NB dispersion parameters. Tools for the first are appropriate for NB regression generally; those for the second are primarily intended for RNA sequencing (RNA-Seq) data analysis. The typically small number of biological samples and large number of genes in RNA-Seq analysis motivate us to address the trade-offs between robustness and statistical power using NB regression models. One widely-used power-saving strategy, for example, is to assume some commonalities of NB dispersion parameters across genes via simple models relating them to mean expression rates, and many such models have been proposed. As RNA-Seq analysis is becoming ever more popular, it is appropriate to make more thorough investigations into power and robustness of the resulting methods, and into practical tools for model assessment. In this article, we propose simulation-based statistical tests and diagnostic graphics to address model adequacy. We provide simulated and real data examples to illustrate that our proposed methods are effective for detecting the misspecification of the NB mean-variance relationship as well as judging the adequacy of fit of several NB dispersion models.
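The sketch below illustrates one simulation-based adequacy check of the kind proposed, under stated assumptions: an NB2 regression is fitted with statsmodels, replicate counts are simulated from the fitted mean-dispersion relationship, and a Pearson-type discrepancy is compared with its simulated distribution. It is not the authors' exact test statistic.

```python
# Sketch: simulation-based adequacy check for an NB2 regression.
# Var(Y) = mu + alpha * mu^2; numpy's negative_binomial is parameterized so
# that n = 1/alpha and p = 1/(1 + alpha*mu) gives mean mu.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n = 500
x = rng.normal(size=n)
mu_true = np.exp(1.0 + 0.6 * x)
alpha_true = 0.4
y = rng.negative_binomial(1 / alpha_true, 1 / (1 + alpha_true * mu_true))

X = sm.add_constant(x)
fit = sm.NegativeBinomial(y, X).fit(disp=False)      # estimates betas and alpha
mu_hat = np.exp(X @ fit.params[:-1])
alpha_hat = fit.params[-1]

def discrepancy(counts, mu, alpha):
    return np.sum((counts - mu) ** 2 / (mu + alpha * mu ** 2))  # Pearson-type

obs = discrepancy(y, mu_hat, alpha_hat)
sims = np.array([
    discrepancy(rng.negative_binomial(1 / alpha_hat, 1 / (1 + alpha_hat * mu_hat)),
                mu_hat, alpha_hat)
    for _ in range(500)
])
print(f"simulation-based p-value: {np.mean(sims >= obs):.3f}")
```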
NASA Astrophysics Data System (ADS)
Cao, Faxian; Yang, Zhijing; Ren, Jinchang; Ling, Wing-Kuen; Zhao, Huimin; Marshall, Stephen
2017-12-01
Although the sparse multinomial logistic regression (SMLR) has provided a useful tool for sparse classification, it suffers from inefficacy in dealing with high dimensional features and manually set initial regressor values. This has significantly constrained its applications for hyperspectral image (HSI) classification. In order to tackle these two drawbacks, an extreme sparse multinomial logistic regression (ESMLR) is proposed for effective classification of HSI. First, the HSI dataset is projected to a new feature space with randomly generated weight and bias. Second, an optimization model is established by the Lagrange multiplier method and the dual principle to automatically determine a good initial regressor for SMLR via minimizing the training error and the regressor value. Furthermore, the extended multi-attribute profiles (EMAPs) are utilized for extracting both the spectral and spatial features. A combinational linear multiple features learning (MFL) method is proposed to further enhance the features extracted by ESMLR and EMAPs. Finally, the logistic regression via the variable splitting and the augmented Lagrangian (LORSAL) is adopted in the proposed framework for reducing the computational time. Experiments are conducted on two well-known HSI datasets, namely the Indian Pines dataset and the Pavia University dataset, which have shown the fast and robust performance of the proposed ESMLR framework.
Tchetgen Tchetgen, Eric
2011-03-01
This article considers the detection and evaluation of genetic effects incorporating gene-environment interaction and independence. Whereas ordinary logistic regression cannot exploit the assumption of gene-environment independence, the proposed approach makes explicit use of the independence assumption to improve estimation efficiency. This method, which uses both cases and controls, fits a constrained retrospective regression in which the genetic variant plays the role of the response variable, and the disease indicator and the environmental exposure are the independent variables. The regression model constrains the association of the environmental exposure with the genetic variant among the controls to be null, thus explicitly encoding the gene-environment independence assumption, which yields substantial gain in accuracy in the evaluation of genetic effects. The proposed retrospective regression approach has several advantages. It is easy to implement with standard software, and it readily accounts for multiple environmental exposures of a polytomous or of a continuous nature, while easily incorporating extraneous covariates. Unlike the profile likelihood approach of Chatterjee and Carroll (Biometrika. 2005;92:399-418), the proposed method does not require a model for the association of a polytomous or continuous exposure with the disease outcome, and, therefore, it is agnostic to the functional form of such a model and completely robust to its possible misspecification.
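A minimal sketch of how the independence constraint can be encoded in a retrospective regression, on simulated data: the genetic variant is regressed on disease status and the disease-by-exposure interaction only, so that the exposure has no association with the variant among controls. This illustrates the constraint rather than the paper's full estimator.

```python
# Sketch: constrained retrospective regression encoding gene-environment
# independence. Among controls (D = 0), E has no association with G because
# E enters only through the D*E interaction.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(12)
n = 6000
G = rng.binomial(1, 0.3, n)                          # variant, independent of E
E = rng.normal(size=n)                               # environmental exposure
logit_D = -2.0 + 0.5 * G + 0.4 * E + 0.3 * G * E     # true disease model
D = rng.random(n) < 1 / (1 + np.exp(-logit_D))

# Retrospective model: logit P(G=1) = b0 + b1*D + b2*(D*E).
X = sm.add_constant(np.column_stack([D, D * E]))
fit = sm.Logit(G, X).fit(disp=False)
print(fit.params)       # b1, b2 carry the genetic effect and the G-E interaction
```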
Abdelkarim, Noha; Mohamed, Amr E; El-Garhy, Ahmed M; Dorrah, Hassen T
2016-01-01
The two-coupled distillation column process is a physically complicated system in many aspects. Specifically, the nested interrelationship between system inputs and outputs constitutes one of the significant challenges in system control design. Mostly, such a process is to be decoupled into several input/output pairings (loops), so that a single controller can be assigned for each loop. In the frame of this research, the Brain Emotional Learning Based Intelligent Controller (BELBIC) forms the control structure for each decoupled loop. The paper's main objective is to develop a parameterization technique for decoupling and control schemes, which ensures robust control behavior. In this regard, the novel optimization technique Bacterial Swarm Optimization (BSO) is utilized for the minimization of summation of the integral time-weighted squared errors (ITSEs) for all control loops. This optimization technique constitutes a hybrid between two techniques, which are the Particle Swarm and Bacterial Foraging algorithms. According to the simulation results, this hybridized technique ensures low mathematical burdens and high decoupling and control accuracy. Moreover, the behavior analysis of the proposed BELBIC shows a remarkable improvement in the time domain behavior and robustness over the conventional PID controller.
A secure and robust information hiding technique for covert communication
NASA Astrophysics Data System (ADS)
Parah, S. A.; Sheikh, J. A.; Hafiz, A. M.; Bhat, G. M.
2015-08-01
The unprecedented advancement of multimedia and growth of the internet has made it possible to reproduce and distribute digital media easier and faster. This has given birth to information security issues, especially when the information pertains to national security, e-banking transactions, etc. The disguised form of encrypted data makes an adversary suspicious and increases the chance of attack. Information hiding overcomes this inherent problem of cryptographic systems and is emerging as an effective means of securing sensitive data being transmitted over insecure channels. In this paper, a secure and robust information hiding technique referred to as Intermediate Significant Bit Plane Embedding (ISBPE) is presented. The data to be embedded is scrambled and embedding is carried out using the concept of Pseudorandom Address Vector (PAV) and Complementary Address Vector (CAV) to enhance the security of the embedded data. The proposed ISBPE technique is fully immune to Least Significant Bit (LSB) removal/replacement attack. Experimental investigations reveal that the proposed technique is more robust to various image processing attacks like JPEG compression, Additive White Gaussian Noise (AWGN), low pass filtering, etc. compared to conventional LSB techniques. The various advantages offered by ISBPE technique make it a good candidate for covert communication.
NASA Astrophysics Data System (ADS)
Li, Jianyong; Dodson, John; Yan, Hong; Cheng, Bo; Zhang, Xiaojian; Xu, Qinghai; Ni, Jian; Lu, Fengyan
2017-05-01
Quantitative information regarding the long-term variability of precipitation and vegetation during the period covering both the Late Glacial and the Holocene on the Qinghai-Tibetan Plateau (QTP) is scarce. Herein, we provide new and numerical reconstructions of annual mean precipitation (PANN) and vegetation history over the last 18,000 years using high-resolution pollen data from Lakes Dalianhai and Qinghai on the northeastern QTP. Here, five calibration techniques, including weighted averaging, weighted averaging-partial least squares regression, the modern analogue technique, locally weighted weighted-averaging regression, and maximum likelihood, were employed for the first time to construct robust inference models and to produce reliable PANN estimates on the QTP. The biomization method was applied for reconstructing the vegetation dynamics. The study area was dominated by steppe and characterized by a highly variable, relatively dry climate at 18,000-11,000 cal years B.P. PANN increased from the early Holocene, reached a maximum at 8000-3000 cal years B.P. with coniferous-temperate mixed forest as the dominant biome, and thereafter declined to the present. The PANN reconstructions are broadly consistent with other proxy-based paleoclimatic records from the northeastern QTP and the northern region of monsoonal China. The possible mechanisms behind the precipitation changes may be tentatively attributed to the internal feedback processes of higher-latitude (e.g., North Atlantic) and lower-latitude (e.g., subtropical monsoon) competing climatic regimes, which are primarily modulated by solar energy output as the external driving force. These findings may provide important insights into understanding future Asian precipitation dynamics under the projected global warming.
NASA Astrophysics Data System (ADS)
Lee, Kun Chang; Park, Bong-Won
Many online game users purchase game items with which to play free-to-play games. Because there is little research on, and no established framework for, categorizing the values of game items, this study proposes four types of online game item values based on an analysis of literature regarding online game characteristics. It then investigates how online game users derive satisfaction and purchase intention from the proposed four types of online game item values. Though regression analysis has been used frequently to answer this kind of research question, we propose a new approach, a General Bayesian Network (GBN), which can be performed in an understandable way without sacrificing predictive accuracy. Conventional techniques such as regression analysis do not provide a satisfactory explanation for this kind of problem because they are fixed to a linear structure and are limited in explaining why customers are likely to purchase game items and whether they are satisfied with their purchases. In contrast, the proposed GBN provides a flexible underlying structure based on questionnaire survey data and offers robust decision support on this kind of research question by identifying its causal relationships. To illustrate the validity of GBN in solving the research question in this study, 327 valid questionnaires were analyzed using GBN with what-if and goal-seeking approaches. The experimental results were promising and meaningful in comparison with regression analysis results.
Solar cycle in current reanalyses: (non)linear attribution study
NASA Astrophysics Data System (ADS)
Kuchar, A.; Sacha, P.; Miksovsky, J.; Pisoft, P.
2014-12-01
This study focusses on the variability of temperature, ozone and circulation characteristics in the stratosphere and lower mesosphere with regard to the influence of the 11 year solar cycle. It is based on attribution analysis using multiple nonlinear techniques (Support Vector Regression, Neural Networks) besides the traditional linear approach. The analysis was applied to several current reanalysis datasets for the 1979-2013 period, including MERRA, ERA-Interim and JRA-55, with the aim to compare how this type of data resolves especially the double-peaked solar response in temperature and ozone variables and the consequent changes induced by these anomalies. Equatorial temperature signals in the lower and upper stratosphere were found to be sufficiently robust and in qualitative agreement with previous observational studies. The analysis also pointed to the solar signal in the ozone datasets (i.e. MERRA and ERA-Interim) not being consistent with the observed double-peaked ozone anomaly extracted from satellite measurements. Consequently the results obtained by linear regression were confirmed by the nonlinear approach through all datasets, suggesting that linear regression is a relevant tool to sufficiently resolve the solar signal in the middle atmosphere. Furthermore, the seasonal dependence of the solar response was also discussed, mainly as a source of dynamical causalities in the wave propagation characteristics in the zonal wind and the induced meridional circulation in the winter hemispheres. The hypothetical mechanism of a weaker Brewer Dobson circulation was reviewed together with discussion of polar vortex stability.
Zhang, Haihong; Guan, Cuntai; Ang, Kai Keng; Wang, Chuanchu
2012-01-01
Detecting motor imagery activities versus non-control in brain signals is the basis of self-paced brain-computer interfaces (BCIs), but also poses a considerable challenge to signal processing due to the complex and non-stationary characteristics of motor imagery as well as non-control. This paper presents a self-paced BCI based on a robust learning mechanism that extracts and selects spatio-spectral features for differentiating multiple EEG classes. It also employs a non-linear regression and post-processing technique for predicting the time-series of class labels from the spatio-spectral features. The method was validated in the BCI Competition IV on Dataset I where it produced the lowest prediction error of class labels continuously. This report also presents and discusses analysis of the method using the competition data set. PMID:22347153
Pointwise probability reinforcements for robust statistical inference.
Frénay, Benoît; Verleysen, Michel
2014-02-01
Statistical inference using machine learning techniques may be difficult with small datasets because of abnormally frequent data (AFDs). AFDs are observations that are much more frequent in the training sample than they should be with respect to their theoretical probability, and include, e.g., outliers. Estimates of parameters tend to be biased towards models which support such data. This paper proposes to introduce pointwise probability reinforcements (PPRs): the probability of each observation is reinforced by a PPR, and a regularisation allows controlling the amount of reinforcement which compensates for AFDs. The proposed solution is very generic, since it can be used to robustify any statistical inference method which can be formulated as a likelihood maximisation. Experiments show that PPRs can be easily used to tackle regression, classification and projection: models are freed from the influence of outliers. Moreover, outliers can be filtered manually since an abnormality degree is obtained for each observation. Copyright © 2013 Elsevier Ltd. All rights reserved.
Katsarov, Plamen; Gergov, Georgi; Alin, Aylin; Pilicheva, Bissera; Al-Degs, Yahya; Simeonov, Vasil; Kassarova, Margarita
2018-03-01
The prediction power of partial least squares (PLS) and multivariate curve resolution-alternating least squares (MCR-ALS) methods have been studied for simultaneous quantitative analysis of the binary drug combination - doxylamine succinate and pyridoxine hydrochloride. Analysis of first-order UV overlapped spectra was performed using different PLS models - classical PLS1 and PLS2 as well as partial robust M-regression (PRM). These linear models were compared to MCR-ALS with equality and correlation constraints (MCR-ALS-CC). All techniques operated within the full spectral region and extracted maximum information for the drugs analysed. The developed chemometric methods were validated on external sample sets and were applied to the analyses of pharmaceutical formulations. The obtained statistical parameters were satisfactory for calibration and validation sets. All developed methods can be successfully applied for simultaneous spectrophotometric determination of doxylamine and pyridoxine both in laboratory-prepared mixtures and commercial dosage forms.
Trace element analysis of rough diamond by LA-ICP-MS: a case of source discrimination?
Dalpé, Claude; Hudon, Pierre; Ballantyne, David J; Williams, Darrell; Marcotte, Denis
2010-11-01
Current profiling of rough diamond source is performed using different physical and/or morphological techniques that require strong knowledge and experience in the field. More recently, chemical impurities have been used to discriminate diamond source and with the advance of laser ablation-inductively coupled plasma-mass spectrometry (LA-ICP-MS) empirical profiling of rough diamonds is possible to some extent. In this study, we present a LA-ICP-MS methodology that we developed for analyzing ultra-trace element impurities in rough diamond for origin determination ("profiling"). Diamonds from two sources were analyzed by LA-ICP-MS and were statistically classified by accepted methods. For the two diamond populations analyzed in this study, binomial logistic regression produced a better overall correct classification than linear discriminant analysis. The results suggest that an anticipated matrix match reference material would improve the robustness of our methodology for forensic applications. © 2010 American Academy of Forensic Sciences.
A Robust Absorbing Boundary Condition for Compressible Flows
NASA Technical Reports Server (NTRS)
Loh, Ching Y.; Jorgenson, Philip C. E.
2005-01-01
An absorbing non-reflecting boundary condition (NRBC) for practical computations in fluid dynamics and aeroacoustics is presented with theoretical proof. This paper is a continuation and improvement of a previous paper by the author. The absorbing NRBC technique is based on a first principle of non reflecting, which contains the essential physics that a plane wave solution of the Euler equations remains intact across the boundary. The technique is theoretically shown to work for a large class of finite volume approaches. When combined with the hyperbolic conservation laws, the NRBC is simple, robust and truly multi-dimensional; no additional implementation is needed except the prescribed physical boundary conditions. Several numerical examples in multi-dimensional spaces using two different finite volume schemes are illustrated to demonstrate its robustness in practical computations. Limitations and remedies of the technique are also discussed.
Hybrid approach for robust diagnostics of cutting tools
NASA Astrophysics Data System (ADS)
Ramamurthi, K.; Hough, C. L., Jr.
1994-03-01
A new multisensor based hybrid technique has been developed for robust diagnosis of cutting tools. The technique combines the concepts of pattern classification and real-time knowledge based systems (RTKBS) and draws upon their strengths; learning facility in the case of pattern classification and a higher level of reasoning in the case of RTKBS. It eliminates some of their major drawbacks: false alarms or delayed/lack of diagnosis in case of pattern classification and tedious knowledge base generation in case of RTKBS. It utilizes a dynamic distance classifier, developed upon a new separability criterion and a new definition of robust diagnosis for achieving these benefits. The promise of this technique has been proven concretely through an on-line diagnosis of drill wear. Its suitability for practical implementation is substantiated by the use of practical, inexpensive, machine-mounted sensors and low-cost delivery systems.
Geodesic least squares regression for scaling studies in magnetic confinement fusion
DOE Office of Scientific and Technical Information (OSTI.GOV)
Verdoolaege, Geert
In regression analyses for deriving scaling laws that occur in various scientific disciplines, usually standard regression methods have been applied, of which ordinary least squares (OLS) is the most popular. However, concerns have been raised with respect to several assumptions underlying OLS in its application to scaling laws. We here discuss a new regression method that is robust in the presence of significant uncertainty on both the data and the regression model. The method, which we call geodesic least squares regression (GLS), is based on minimization of the Rao geodesic distance on a probabilistic manifold. We demonstrate the superiority of the method using synthetic data and we present an application to the scaling law for the power threshold for the transition to the high confinement regime in magnetic confinement fusion devices.
Rhodes, Kirsty M; Turner, Rebecca M; White, Ian R; Jackson, Dan; Spiegelhalter, David J; Higgins, Julian P T
2016-12-20
Many meta-analyses combine results from only a small number of studies, a situation in which the between-study variance is imprecisely estimated when standard methods are applied. Bayesian meta-analysis allows incorporation of external evidence on heterogeneity, providing the potential for more robust inference on the effect size of interest. We present a method for performing Bayesian meta-analysis using data augmentation, in which we represent an informative conjugate prior for between-study variance by pseudo data and use meta-regression for estimation. To assist in this, we derive predictive inverse-gamma distributions for the between-study variance expected in future meta-analyses. These may serve as priors for heterogeneity in new meta-analyses. In a simulation study, we compare approximate Bayesian methods using meta-regression and pseudo data against fully Bayesian approaches based on importance sampling techniques and Markov chain Monte Carlo (MCMC). We compare the frequentist properties of these Bayesian methods with those of the commonly used frequentist DerSimonian and Laird procedure. The method is implemented in standard statistical software and provides a less complex alternative to standard MCMC approaches. An importance sampling approach produces almost identical results to standard MCMC approaches, and results obtained through meta-regression and pseudo data are very similar. On average, data augmentation provides closer results to MCMC, if implemented using restricted maximum likelihood estimation rather than DerSimonian and Laird or maximum likelihood estimation. The methods are applied to real datasets, and an extension to network meta-analysis is described. The proposed method facilitates Bayesian meta-analysis in a way that is accessible to applied researchers. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
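For orientation, the sketch below implements the frequentist DerSimonian and Laird procedure that the abstract above uses as a benchmark; it is not the authors' pseudo-data augmentation method. The effect sizes and variances are made up for illustration.

```python
import numpy as np

def dersimonian_laird(y, v):
    """Standard DerSimonian-Laird random-effects meta-analysis.

    y : array of study effect estimates
    v : array of within-study variances
    Returns the pooled effect, its standard error, and the
    between-study variance estimate tau^2.
    """
    y, v = np.asarray(y, float), np.asarray(v, float)
    w = 1.0 / v                                   # fixed-effect weights
    mu_fe = np.sum(w * y) / np.sum(w)             # fixed-effect pooled mean
    Q = np.sum(w * (y - mu_fe) ** 2)              # Cochran's Q statistic
    k = len(y)
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (Q - (k - 1)) / c)            # method-of-moments tau^2
    w_re = 1.0 / (v + tau2)                       # random-effects weights
    mu_re = np.sum(w_re * y) / np.sum(w_re)
    se_re = np.sqrt(1.0 / np.sum(w_re))
    return mu_re, se_re, tau2

# Illustrative (made-up) log odds ratios and variances from 5 small trials
effects = [0.10, -0.25, 0.40, 0.05, 0.30]
variances = [0.04, 0.09, 0.06, 0.12, 0.05]
print(dersimonian_laird(effects, variances))
```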
Popular song and lyrics synchronization and its application to music information retrieval
NASA Astrophysics Data System (ADS)
Chen, Kai; Gao, Sheng; Zhu, Yongwei; Sun, Qibin
2006-01-01
An automatic synchronization system for popular songs and their lyrics is presented in this paper. The system includes two main components: a) automatically detecting vocal/non-vocal segments in the audio signal and b) automatically aligning the acoustic signal of the song with its lyrics using speech recognition techniques and positioning the boundaries of the lyrics in the acoustic realization at multiple levels simultaneously (e.g. the word/syllable level and the phrase level). GMM models and a set of HMM-based acoustic model units are carefully designed and trained for the detection and alignment. To eliminate the severe mismatch due to the diversity of musical signals and the sparse training data available, an unsupervised adaptation technique, maximum likelihood linear regression (MLLR), is exploited to tailor the models to the real environment, which improves the robustness of the synchronization system. To further reduce the effect of missed non-vocal music on alignment, a novel grammar net is built to direct the alignment. To our knowledge, this is the first automatic synchronization system based only on low-level acoustic features such as MFCC. We evaluate the system on a Chinese song dataset collected from 3 popular singers. We obtain 76.1% for the boundary accuracy at the syllable level (BAS) and 81.5% for the boundary accuracy at the phrase level (BAP) using fully automatic vocal/non-vocal detection and alignment. The synchronization system has many applications such as multi-modality (audio and textual) content-based popular song browsing and retrieval. Through this study, we would like to open up discussion of some challenging problems in developing a robust synchronization system for a large-scale database.
ℓ(p)-Norm multikernel learning approach for stock market price forecasting.
Shao, Xigao; Wu, Kun; Liao, Bifeng
2012-01-01
Linear multiple kernel learning model has been used for predicting financial time series. However, ℓ(1)-norm multiple support vector regression is rarely observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtures that generalize well, we adopt ℓ(p)-norm multiple kernel support vector regression (1 ≤ p < ∞) as a stock price prediction model. The optimization problem is decomposed into smaller subproblems, and the interleaved optimization strategy is employed to solve the regression model. The model is evaluated on forecasting the daily stock closing prices of Shanghai Stock Index in China. Experimental results show that our proposed model performs better than ℓ(1)-norm multiple support vector regression model.
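A minimal sketch of multiple-kernel support vector regression in the spirit of the abstract above: several RBF kernels are combined and fed to an SVR with a precomputed kernel. The kernel weights are fixed and uniform here, whereas the paper learns them under an ℓp-norm constraint with interleaved optimization; the lagged "price" series is synthetic.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic price series: lagged values as features (illustrative only)
prices = np.cumsum(rng.normal(0, 1, 300)) + 100
lags = 5
X = np.column_stack([prices[i:len(prices) - lags + i] for i in range(lags)])
y = prices[lags:]
X_tr, X_te, y_tr, y_te = X[:250], X[250:], y[:250], y[250:]

# A small bank of RBF kernels with different bandwidths
gammas = [0.01, 0.1, 1.0]
weights = np.ones(len(gammas)) / len(gammas)   # fixed weights; the paper instead
                                               # learns them under an lp-norm constraint

def combined_kernel(A, B):
    return sum(w * rbf_kernel(A, B, gamma=g) for w, g in zip(weights, gammas))

model = SVR(kernel="precomputed", C=10.0, epsilon=0.1)
model.fit(combined_kernel(X_tr, X_tr), y_tr)
pred = model.predict(combined_kernel(X_te, X_tr))
print("RMSE:", np.sqrt(np.mean((pred - y_te) ** 2)))
```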
Robust Regression through Robust Covariances.
1985-01-01
...we apply (2.3). But first let us examine the influence function (see Hampel (1974)). In order to simplify the formulas we will first consider the case... remember that the influence function is an asymptotic tool and that therefore the population values of our estimators appear in the formula. V(GR) is... the parameters (a, V) based on the data Z1, ..., Zn... Now we can apply the standard formulas to get the influence function (see Huber (1981))...
Lorenz, David L.; Sanocki, Chris A.; Kocian, Matthew J.
2010-01-01
Knowledge of the peak flow of floods of a given recurrence interval is essential for regulation and planning of water resources and for design of bridges, culverts, and dams along Minnesota's rivers and streams. Statistical techniques are needed to estimate peak flow at ungaged sites because long-term streamflow records are available at relatively few places. Because of the need to have up-to-date peak-flow frequency information in order to estimate peak flows at ungaged sites, the U.S. Geological Survey (USGS) conducted a peak-flow frequency study in cooperation with the Minnesota Department of Transportation and the Minnesota Pollution Control Agency. Estimates of peak-flow magnitudes for 1.5-, 2-, 5-, 10-, 25-, 50-, 100-, and 500-year recurrence intervals are presented for 330 streamflow-gaging stations in Minnesota and adjacent areas in Iowa and South Dakota based on data through water year 2005. The peak-flow frequency information was subsequently used in regression analyses to develop equations relating peak flows for selected recurrence intervals to various basin and climatic characteristics. Two statistically derived techniques-regional regression equation and region of influence regression-can be used to estimate peak flow on ungaged streams smaller than 3,000 square miles in Minnesota. Regional regression equations were developed for selected recurrence intervals in each of six regions in Minnesota: A (northwestern), B (north central and east central), C (northeastern), D (west central and south central), E (southwestern), and F (southeastern). The regression equations can be used to estimate peak flows at ungaged sites. The region of influence regression technique dynamically selects streamflow-gaging stations with characteristics similar to a site of interest. Thus, the region of influence regression technique allows use of a potentially unique set of gaging stations for estimating peak flow at each site of interest. Two methods of selecting streamflow-gaging stations, similarity and proximity, can be used for the region of influence regression technique. The regional regression equation technique is the preferred technique as an estimate of peak flow in all six regions for ungaged sites. The region of influence regression technique is not appropriate for regions C, E, and F because the interrelations of some characteristics of those regions do not agree with the interrelations throughout the rest of the State. Both the similarity and proximity methods for the region of influence technique can be used in the other regions (A, B, and D) to provide additional estimates of peak flow. The peak-flow-frequency estimates and basin characteristics for selected streamflow-gaging stations and regional peak-flow regression equations are included in this report.
Low order H∞ optimal control for ACFA blended wing body aircraft
NASA Astrophysics Data System (ADS)
Haniš, T.; Kucera, V.; Hromčík, M.
2013-12-01
Advanced nonconvex nonsmooth optimization techniques for fixed-order H∞ robust control are proposed in this paper for the design of flight control systems (FCS) with a prescribed structure. Compared to classical techniques - tuning and successive closure of particular single-input single-output (SISO) loops like dampers, attitude stabilizers, etc. - all loops are designed simultaneously by means of a quite intuitive weighting filter selection. In contrast to standard optimization techniques (H2, H∞ optimization), though, the resulting controller respects the prescribed structure in terms of engaged channels and orders (e.g., proportional (P), proportional-integral (PI), and proportional-integral-derivative (PID) controllers). In addition, robustness with regard to multimodel uncertainty is also addressed, which is of great importance for aerospace applications as well. In this way, robust controllers for various Mach numbers, altitudes, or mass cases can be obtained directly, based only on the particular mathematical models for the respective combinations of the flight parameters.
Correlation techniques to determine model form in robust nonlinear system realization/identification
NASA Technical Reports Server (NTRS)
Stry, Greselda I.; Mook, D. Joseph
1991-01-01
The fundamental challenge in identification of nonlinear dynamic systems is determining the appropriate form of the model. A robust technique is presented which essentially eliminates this problem for many applications. The technique is based on the Minimum Model Error (MME) optimal estimation approach. A detailed literature review is included in which fundamental differences between the current approach and previous work are described. The most significant feature is the ability to identify nonlinear dynamic systems without prior assumptions regarding the form of the nonlinearities, in contrast to existing nonlinear identification approaches which usually require detailed assumptions about the nonlinearities. Model form is determined via statistical correlation of the MME optimal state estimates with the MME optimal model error estimates. The example illustrations indicate that the method is robust with respect to prior ignorance of the model, and with respect to measurement noise, measurement frequency, and measurement record length.
Regression Commonality Analysis: A Technique for Quantitative Theory Building
ERIC Educational Resources Information Center
Nimon, Kim; Reio, Thomas G., Jr.
2011-01-01
When it comes to multiple linear regression analysis (MLR), it is common for social and behavioral science researchers to rely predominately on beta weights when evaluating how predictors contribute to a regression model. Presenting an underutilized statistical technique, this article describes how organizational researchers can use commonality…
Development of an automated energy audit protocol for office buildings
NASA Astrophysics Data System (ADS)
Deb, Chirag
This study aims to enhance the building energy audit process and to reduce the time and cost required to conduct a full physical audit. For this, a total of 5 Energy Service Companies in Singapore collaborated and provided energy audit reports for 62 office buildings. Several statistical techniques are adopted to analyse these reports. These techniques comprise cluster analysis and the development of prediction models to predict energy savings for buildings. The cluster analysis shows that there are 3 clusters of buildings experiencing different levels of energy savings. To understand the effect of building variables on the change in energy use intensity (EUI), a robust iterative process for selecting the appropriate variables is developed. The results show that the 4 variables of GFA, non-air-conditioning energy consumption, average chiller plant efficiency and installed capacity of chillers should be used for clustering. This analysis is extended to the development of prediction models using linear regression and artificial neural networks (ANN). An exhaustive variable selection algorithm is developed to select the input variables for the two energy-saving prediction models. The results show that the ANN prediction model can predict the energy saving potential of a given building with an accuracy of +/-14.8%.
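A minimal sketch of exhaustive variable selection of the kind described above, assuming a plain cross-validated linear regression as the scoring model; the predictors and response are synthetic stand-ins for the building variables in the thesis.

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)

# Hypothetical building-level predictors (columns) and observed energy savings
n = 62
X = rng.normal(size=(n, 6))          # e.g. GFA, chiller efficiency, etc.
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=n)

best_score, best_subset = -np.inf, None
for k in range(1, X.shape[1] + 1):
    for subset in combinations(range(X.shape[1]), k):
        score = cross_val_score(LinearRegression(), X[:, subset], y,
                                cv=5, scoring="r2").mean()
        if score > best_score:
            best_score, best_subset = score, subset

print("Best subset of predictors:", best_subset, "CV R^2:", round(best_score, 3))
```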
Pollen-based continental climate reconstructions at 6 and 21 ka: a global synthesis
Bartlein, P.J.; Harrison, S.P.; Brewer, Sandra; Connor, S.; Davis, B.A.S.; Gajewski, K.; Guiot, J.; Harrison-Prentice, T. I.; Henderson, A.; Peyron, O.; Prentice, I.C.; Scholze, M.; Seppa, H.; Shuman, B.; Sugita, S.; Thompson, R.S.; Viau, A.E.; Williams, J.; Wu, H.
2010-01-01
Subfossil pollen and plant macrofossil data derived from 14C-dated sediment profiles can provide quantitative information on glacial and interglacial climates. The data allow climate variables related to growing-season warmth, winter cold, and plant-available moisture to be reconstructed. Continental-scale reconstructions have been made for the mid-Holocene (MH, around 6 ka) and Last Glacial Maximum (LGM, around 21 ka), allowing comparison with palaeoclimate simulations currently being carried out as part of the fifth Assessment Report (AR5) of the Intergovernmental Panel on Climate Change. The synthesis of the available MH and LGM climate reconstructions and their uncertainties, obtained using modern-analogue, regression and model-inversion techniques, is presented for four temperature variables and two moisture variables. Reconstructions of the same variables based on surface-pollen assemblages are shown to be accurate and unbiased. Reconstructed LGM and MH climate anomaly patterns are coherent, consistent between variables, and robust with respect to the choice of technique. They support a conceptual model of the controls of Late Quaternary climate change whereby the first-order effects of orbital variations and greenhouse forcing on the seasonal cycle of temperature are predictably modified by responses of the atmospheric circulation and surface energy balance.
Robust kernel representation with statistical local features for face recognition.
Yang, Meng; Zhang, Lei; Shiu, Simon Chi-Keung; Zhang, David
2013-06-01
Factors such as misalignment, pose variation, and occlusion make robust face recognition a difficult problem. It is known that statistical features such as local binary pattern are effective for local feature extraction, whereas the recently proposed sparse or collaborative representation-based classification has shown interesting results in robust face recognition. In this paper, we propose a novel robust kernel representation model with statistical local features (SLF) for robust face recognition. Initially, multipartition max pooling is used to enhance the invariance of SLF to image registration error. Then, a kernel-based representation model is proposed to fully exploit the discrimination information embedded in the SLF, and robust regression is adopted to effectively handle the occlusion in face images. Extensive experiments are conducted on benchmark face databases, including extended Yale B, AR (A. Martinez and R. Benavente), multiple pose, illumination, and expression (multi-PIE), facial recognition technology (FERET), face recognition grand challenge (FRGC), and labeled faces in the wild (LFW), which have different variations of lighting, expression, pose, and occlusions, demonstrating the promising performance of the proposed method.
Bayesian Regression with Network Prior: Optimal Bayesian Filtering Perspective
Qian, Xiaoning; Dougherty, Edward R.
2017-01-01
The recently introduced intrinsically Bayesian robust filter (IBRF) provides fully optimal filtering relative to a prior distribution over an uncertainty class of joint random process models, whereas formerly the theory was limited to model-constrained Bayesian robust filters, for which optimization was limited to the filters that are optimal for models in the uncertainty class. This paper extends the IBRF theory to the situation where there are both a prior on the uncertainty class and sample data. The result is optimal Bayesian filtering (OBF), where optimality is relative to the posterior distribution derived from the prior and the data. The IBRF theories for effective characteristics and canonical expansions extend to the OBF setting. A salient focus of the present work is to demonstrate the advantages of Bayesian regression within the OBF setting over the classical Bayesian approach in the context of linear Gaussian models. PMID:28824268
System Synthesis in Preliminary Aircraft Design using Statistical Methods
NASA Technical Reports Server (NTRS)
DeLaurentis, Daniel; Mavris, Dimitri N.; Schrage, Daniel P.
1996-01-01
This paper documents an approach to conceptual and preliminary aircraft design in which system synthesis is achieved using statistical methods, specifically design of experiments (DOE) and response surface methodology (RSM). These methods are employed in order to search the design space for optimum configurations more efficiently. In particular, a methodology incorporating three uses of these techniques is presented. First, response surface equations are formed which represent aerodynamic analyses, in the form of regression polynomials, which are more sophisticated than those generally available in early design stages. Next, a regression equation for an overall evaluation criterion is constructed for the purpose of constrained optimization at the system level. This optimization, though achieved in an innovative way, is still traditional in that it is a point-design solution. The methodology put forward here remedies this by introducing uncertainty into the problem, resulting in solutions which are probabilistic in nature. DOE/RSM is used for the third time in this setting. The process is demonstrated through a detailed aero-propulsion optimization of a high speed civil transport. Fundamental goals of the methodology, then, are to introduce higher fidelity disciplinary analyses to the conceptual aircraft synthesis and provide a roadmap for transitioning from point solutions to probabilistic designs (and eventually robust ones).
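A minimal sketch of the response-surface step described above: a second-order regression polynomial is fitted to responses sampled at DOE points. The two design variables and the "analysis" that generates the responses are invented placeholders, not the paper's aero-propulsion model.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(2)

# Hypothetical DOE sample: two design variables (e.g. wing area, sweep) and a
# response from a stand-in for an expensive aerodynamic analysis
X = rng.uniform(-1, 1, size=(30, 2))
y = 1.0 + 0.8 * X[:, 0] - 0.5 * X[:, 1] + 0.3 * X[:, 0] * X[:, 1] \
    - 0.7 * X[:, 0] ** 2 + rng.normal(scale=0.05, size=30)

# Second-order response surface equation (full quadratic regression polynomial)
rsm = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
rsm.fit(X, y)
print("Predicted response at a candidate design:", rsm.predict([[0.2, -0.4]]))
```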
Robust, Decoupled, Flight Control Design with Rate Saturating Actuators
NASA Technical Reports Server (NTRS)
Snell, S. A.; Hess, R. A.
1997-01-01
Techniques for the design of control systems for manually controlled, high-performance aircraft must provide the following: (1) multi-input, multi-output (MIMO) solutions, (2) acceptable handling qualities including no tendencies for pilot-induced oscillations, (3) a tractable approach for compensator design, (4) performance and stability robustness in the presence of significant plant uncertainty, and (5) performance and stability robustness in the presence of actuator saturation (particularly rate saturation). A design technique built upon Quantitative Feedback Theory is offered as a candidate methodology which can provide flight control systems meeting these requirements, and do so over a considerable part of the flight envelope. An example utilizing a simplified model of a supermaneuverable fighter aircraft demonstrates the proposed design methodology.
Efficient and robust analysis of complex scattering data under noise in microwave resonators.
Probst, S; Song, F B; Bushev, P A; Ustinov, A V; Weides, M
2015-02-01
Superconducting microwave resonators are reliable circuits widely used for detection and as test devices for material research. A reliable determination of their external and internal quality factors is crucial for many modern applications, which either require fast measurements or operate in the single photon regime with small signal to noise ratios. Here, we use the circle fit technique with diameter correction and provide a step by step guide for implementing an algorithm for robust fitting and calibration of complex resonator scattering data in the presence of noise. The speedup and robustness of the analysis are achieved by employing an algebraic rather than an iterative fit technique for the resonance circle.
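The sketch below shows a generic algebraic (Kasa-style) circle fit to complex scattering data, i.e. the non-iterative linear least-squares step that the circle-fit approach above builds on; it does not reproduce the authors' diameter correction or calibration procedure, and the resonance data are synthetic.

```python
import numpy as np

def algebraic_circle_fit(z):
    """Kasa-style algebraic circle fit to complex scattering data z = x + iy.

    Solves the linear least-squares problem
        x^2 + y^2 = 2*a*x + 2*b*y + c
    and returns the circle centre (a, b) and radius r.
    """
    x, y = z.real, z.imag
    A = np.column_stack([2 * x, 2 * y, np.ones_like(x)])
    rhs = x ** 2 + y ** 2
    (a, b, c), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    r = np.sqrt(c + a ** 2 + b ** 2)
    return a, b, r

# Noisy synthetic resonance circle in the complex plane
rng = np.random.default_rng(3)
theta = rng.uniform(0, 2 * np.pi, 200)
z = (0.5 + 0.3 * np.cos(theta)) + 1j * (-0.2 + 0.3 * np.sin(theta))
z += rng.normal(scale=0.01, size=200) + 1j * rng.normal(scale=0.01, size=200)
print(algebraic_circle_fit(z))   # centre near (0.5, -0.2), radius near 0.3
```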
Biometric feature embedding using robust steganography technique
NASA Astrophysics Data System (ADS)
Rashid, Rasber D.; Sellahewa, Harin; Jassim, Sabah A.
2013-05-01
This paper is concerned with robust steganographic techniques to hide and communicate biometric data in mobile media objects like images, over open networks. More specifically, the aim is to embed binarised features extracted using discrete wavelet transforms and local binary patterns of face images as a secret message in an image. The need for such techniques can arise in law enforcement, forensics, counter terrorism, internet/mobile banking and border control. What differentiates this problem from normal information hiding techniques is the added requirement that there should be minimal effect on face recognition accuracy. We propose an LSB-Witness embedding technique in which the secret message is already present in the LSB plane but, instead of changing the cover image LSB values, the second LSB plane is changed to stand as a witness/informer to the receiver during message recovery. Although this approach may affect the stego quality, it eliminates the weakness of traditional LSB schemes that is exploited by LSB steganalysis techniques, such as PoV and RS steganalysis, to detect the existence of a secret message. Experimental results show that the proposed method is robust against PoV and RS attacks compared to other variants of LSB. We also discuss variants of this approach and determine capacity requirements for embedding face biometric feature vectors while maintaining the accuracy of face recognition.
Adaptive torque estimation of robot joint with harmonic drive transmission
NASA Astrophysics Data System (ADS)
Shi, Zhiguo; Li, Yuankai; Liu, Guangjun
2017-11-01
Robot joint torque estimation using input and output position measurements is a promising technique, but the result may be affected by the load variation of the joint. In this paper, a torque estimation method with adaptive robustness and optimality adjustment according to load variation is proposed for robot joint with harmonic drive transmission. Based on a harmonic drive model and a redundant adaptive robust Kalman filter (RARKF), the proposed approach can adapt torque estimation filtering optimality and robustness to the load variation by self-tuning the filtering gain and self-switching the filtering mode between optimal and robust. The redundant factor of RARKF is designed as a function of the motor current for tolerating the modeling error and load-dependent filtering mode switching. The proposed joint torque estimation method has been experimentally studied in comparison with a commercial torque sensor and two representative filtering methods. The results have demonstrated the effectiveness of the proposed torque estimation technique.
Accurate Diabetes Risk Stratification Using Machine Learning: Role of Missing Value and Outliers.
Maniruzzaman, Md; Rahman, Md Jahanur; Al-MehediHasan, Md; Suri, Harman S; Abedin, Md Menhazul; El-Baz, Ayman; Suri, Jasjit S
2018-04-10
Diabetes mellitus is a group of metabolic diseases in which blood sugar levels are too high. About 8.8% of the world's population was diabetic in 2017. It is projected that this will reach nearly 10% by 2045. The major challenge is that when machine learning-based classifiers are applied to such data sets for risk stratification, the result is lower performance. Thus, our objective is to develop an optimized and robust machine learning (ML) system under the assumption that missing values or outliers, if replaced by a median configuration, will yield higher risk stratification accuracy. This ML-based risk stratification is designed, optimized and evaluated, where: (i) the features are extracted and optimized from six feature selection techniques (random forest, logistic regression, mutual information, principal component analysis, analysis of variance, and Fisher discriminant ratio) and combined with ten different types of classifiers (linear discriminant analysis, quadratic discriminant analysis, naïve Bayes, Gaussian process classification, support vector machine, artificial neural network, Adaboost, logistic regression, decision tree, and random forest) under the hypothesis that both missing values and outliers, when replaced by computed medians, will improve the risk stratification accuracy. The Pima Indian diabetic dataset (768 patients: 268 diabetic and 500 controls) was used. Our results demonstrate that replacing the missing values and outliers by group median and median values, respectively, and further using the combination of random forest feature selection and random forest classification yields an accuracy, sensitivity, specificity, positive predictive value, negative predictive value and area under the curve of 92.26%, 95.96%, 79.72%, 91.14%, 91.20%, and 0.93, respectively. This is an improvement of 10% over previously developed techniques published in the literature. The system was validated for its stability and reliability. The RF-based model showed the best performance when outliers are replaced by median values.
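A minimal scikit-learn sketch of the winning combination reported above (median replacement, random forest feature selection, random forest classification). The file name "pima_diabetes.csv", the column layout, and the blanket treatment of zeros as missing values are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

# Placeholder: load the Pima Indians diabetes data (features + 0/1 outcome).
df = pd.read_csv("pima_diabetes.csv")
X, y = df.drop(columns="Outcome").to_numpy(), df["Outcome"].to_numpy()

# Zeros in several clinical columns encode missing values in this dataset;
# for simplicity this sketch treats zeros in all feature columns as missing.
X = np.where(X == 0, np.nan, X)

pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),             # median replacement
    ("select", SelectFromModel(RandomForestClassifier(n_estimators=200,
                                                      random_state=0))),
    ("classify", RandomForestClassifier(n_estimators=200, random_state=0)),
])

print("CV accuracy:", cross_val_score(pipeline, X, y, cv=10).mean())
```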
Handling nonnormality and variance heterogeneity for quantitative sublethal toxicity tests.
Ritz, Christian; Van der Vliet, Leana
2009-09-01
The advantages of using regression-based techniques to derive endpoints from environmental toxicity data are clear, and slowly, this superior analytical technique is gaining acceptance. As use of regression-based analysis becomes more widespread, some of the associated nuances and potential problems come into sharper focus. Looking at data sets that cover a broad spectrum of standard test species, we noticed that some model fits to data failed to meet two key assumptions-variance homogeneity and normality-that are necessary for correct statistical analysis via regression-based techniques. Failure to meet these assumptions often is caused by reduced variance at the concentrations showing severe adverse effects. Although commonly used with linear regression analysis, transformation of the response variable only is not appropriate when fitting data using nonlinear regression techniques. Through analysis of sample data sets, including Lemna minor, Eisenia andrei (terrestrial earthworm), and algae, we show that both the so-called Box-Cox transformation and use of the Poisson distribution can help to correct variance heterogeneity and nonnormality and so allow nonlinear regression analysis to be implemented. Both the Box-Cox transformation and the Poisson distribution can be readily implemented into existing protocols for statistical analysis. By correcting for nonnormality and variance heterogeneity, these two statistical tools can be used to encourage the transition to regression-based analysis and the depreciation of less-desirable and less-flexible analytical techniques, such as linear interpolation.
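A minimal sketch of the Box-Cox "transform both sides" idea mentioned above, applied to a generic three-parameter log-logistic dose-response model; the concentration-response data are made up, and the exact transformation workflow in the cited protocols may differ.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.stats import boxcox

rng = np.random.default_rng(4)

# Made-up sublethal toxicity data: growth response versus concentration
conc = np.array([0.1, 0.3, 1.0, 3.0, 10.0, 30.0, 100.0])
resp = np.array([98.0, 95.0, 90.0, 70.0, 40.0, 15.0, 5.0]) + rng.normal(scale=2.0, size=7)

# Estimate a Box-Cox lambda from the (strictly positive) responses
_, lam = boxcox(resp)

def bc(y, lam):
    """Box-Cox transform."""
    return (y ** lam - 1.0) / lam if abs(lam) > 1e-8 else np.log(y)

def log_logistic(x, top, ec50, slope):
    """Three-parameter log-logistic dose-response curve."""
    return top / (1.0 + (x / ec50) ** slope)

def residuals(theta):
    # "Transform both sides": compare response and model on the Box-Cox scale,
    # rather than transforming the response variable alone.
    return bc(resp, lam) - bc(log_logistic(conc, *theta), lam)

fit = least_squares(residuals, x0=[100.0, 5.0, 1.0], bounds=(0.0, np.inf))
print("Box-Cox lambda:", round(lam, 2), "fitted (top, EC50, slope):", fit.x)
```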
NASA Astrophysics Data System (ADS)
Mahaboob, B.; Venkateswarlu, B.; Sankar, J. Ravi; Balasiddamuni, P.
2017-11-01
This paper uses matrix calculus techniques to obtain the Nonlinear Least Squares Estimator (NLSE), Maximum Likelihood Estimator (MLE) and linear pseudo model for a nonlinear regression model. David Pollard and Peter Radchenko [1] explained analytic techniques to compute the NLSE. However, the present research paper introduces an innovative method to compute the NLSE using principles of multivariate calculus. This study is concerned with very new optimization techniques used to compute the MLE and NLSE. Anh [2] derived the NLSE and MLE of a heteroscedastic regression model. Lemcoff [3] discussed a procedure to obtain a linear pseudo model for a nonlinear regression model. In this research article a new technique is developed to obtain the linear pseudo model for a nonlinear regression model using multivariate calculus. The linear pseudo model of Edmond Malinvaud [4] has been explained in a very different way in this paper. David Pollard et al. used empirical process techniques to study the asymptotics of the LSE (least-squares estimation) for the fitting of a nonlinear regression function in 2006. Jae Myung [13] provided a conceptual introduction to maximum likelihood estimation in his work "Tutorial on maximum likelihood estimation".
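For readers less familiar with the objects named above, the following is the standard textbook formulation of nonlinear least squares and its link to maximum likelihood under Gaussian errors; it is not the paper's specific matrix-calculus derivation or its linear pseudo model.

```latex
% Model: y_i = f(x_i, \theta) + e_i, i = 1, ..., n.
\begin{align}
  S(\theta) &= \sum_{i=1}^{n}\bigl(y_i - f(x_i,\theta)\bigr)^2
             = \bigl(y - f(\theta)\bigr)^{\top}\bigl(y - f(\theta)\bigr), \\
  \frac{\partial S}{\partial \theta} &= -2\,J(\theta)^{\top}\bigl(y - f(\theta)\bigr) = 0,
  \qquad J_{ij}(\theta) = \frac{\partial f(x_i,\theta)}{\partial \theta_j}, \\
  \theta^{(k+1)} &= \theta^{(k)}
     + \bigl(J^{\top}J\bigr)^{-1} J^{\top}\bigl(y - f(\theta^{(k)})\bigr)
  \quad \text{(Gauss--Newton step).}
\end{align}
% Under i.i.d. Gaussian errors, maximizing the likelihood is equivalent to
% minimizing S(\theta), so the MLE coincides with the NLSE.
```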
Estimating the Standard Error of Robust Regression Estimates.
1987-03-01
...error is O(n^{4/5}). In another Monte Carlo study, McKean and Schrader (1984) found that the tests resulting from studentizing by ... d^{1/2} with d = O(n^{4/5})... Sheather, S. J. and McKean, J. W. (1987). A comparison of testing and... Wiley, New York. Welsch, R. E. (1980). Regression Sensitivity Analysis and Bounded-Influence Estimation, in Evaluation of Econometric Models, eds. J...
Quantitative Structure Retention Relationships of Polychlorinated Dibenzodioxins and Dibenzofurans
1991-08-01
...be a projection onto the X-Y plane. The algorithm for this calculation can be found in Stouch and Jurs (22), but was further refined by Rohrbaugh and... through-space distances. WPSA2 (c) weighted positive charged surface area. MOMH2 (c) second major moment of inertia with hydrogens attached. CSTR3 (d) sum... of the models. The robust regression analysis method calculates a regression model using a least median squares algorithm, which is not as susceptible...
ERIC Educational Resources Information Center
Coskuntuncel, Orkun
2013-01-01
The purpose of this study is two-fold; the first aim being to show the effect of outliers on the widely used least squares regression estimator in social sciences. The second aim is to compare the classical method of least squares with the robust M-estimator using the coefficient of determination (R²). For this purpose,…
NASA Astrophysics Data System (ADS)
Kargoll, Boris; Omidalizarandi, Mohammad; Loth, Ina; Paffenholz, Jens-André; Alkhatib, Hamza
2018-03-01
In this paper, we investigate a linear regression time series model of possibly outlier-afflicted observations and autocorrelated random deviations. This colored noise is represented by a covariance-stationary autoregressive (AR) process, in which the independent error components follow a scaled (Student's) t-distribution. This error model allows for the stochastic modeling of multiple outliers and for an adaptive robust maximum likelihood (ML) estimation of the unknown regression and AR coefficients, the scale parameter, and the degree of freedom of the t-distribution. This approach is meant to be an extension of known estimators, which tend to focus only on the regression model, or on the AR error model, or on normally distributed errors. For the purpose of ML estimation, we derive an expectation conditional maximization either algorithm, which leads to an easy-to-implement version of iteratively reweighted least squares. The estimation performance of the algorithm is evaluated via Monte Carlo simulations for a Fourier as well as a spline model in connection with AR colored noise models of different orders and with three different sampling distributions generating the white noise components. We apply the algorithm to a vibration dataset recorded by a high-accuracy, single-axis accelerometer, focusing on the evaluation of the estimated AR colored noise model.
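A minimal sketch of the iteratively reweighted least squares idea underlying the estimator described above, for a plain linear regression with scaled-t errors; the AR colored-noise structure is ignored and the degree of freedom is held fixed rather than estimated, and all data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic regression data with a few gross outliers
n = 200
t = np.linspace(0, 1, n)
X = np.column_stack([np.ones(n), t])            # design matrix (intercept, trend)
beta_true = np.array([1.0, 3.0])
y = X @ beta_true + rng.normal(scale=0.2, size=n)
y[::25] += rng.normal(scale=3.0, size=len(y[::25]))   # injected outliers

nu = 4.0                                        # fixed t degrees of freedom
beta = np.linalg.lstsq(X, y, rcond=None)[0]     # ordinary least squares start
sigma2 = np.var(y - X @ beta)

for _ in range(50):                             # iteratively reweighted least squares
    r = y - X @ beta
    w = (nu + 1.0) / (nu + r ** 2 / sigma2)     # downweights large residuals
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    sigma2 = np.sum(w * (y - X @ beta) ** 2) / n

print("Robust estimate of (intercept, slope):", beta)
```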
Dual measurement self-sensing technique of NiTi actuators for use in robust control
NASA Astrophysics Data System (ADS)
Gurley, Austin; Lambert, Tyler Ross; Beale, David; Broughton, Royall
2017-10-01
Using a shape memory alloy actuator as both an actuator and a sensor provides huge benefits in cost reduction and miniaturization of robotic devices. Despite much effort, reliable and robust self-sensing (using the actuator as a position sensor) had not been achieved for general temperature, loading, hysteresis path, and fatigue conditions. Prior research has sought to model the intricacies of the electrical resistivity changes within the NiTi material. However, for the models to be solvable, nearly every previous technique only models the actuator within very specific boundary conditions. Here, we measure both the voltage across the entire NiTi wire and of a fixed-length segment of it; these dual measurements allow direct calculation of the actuator length without a material model. We review previous self-sensing literature, illustrate the mechanism design that makes the new technique possible, and use the dual measurement technique to determine the length of a single straight wire actuator under controlled conditions. This robust measurement can be used for feedback control in unknown ambient and loading conditions.
Realisation and robustness evaluation of a blind spatial domain watermarking technique
NASA Astrophysics Data System (ADS)
Parah, Shabir A.; Sheikh, Javaid A.; Assad, Umer I.; Bhat, Ghulam M.
2017-04-01
A blind digital image watermarking scheme based on the spatial domain is presented and investigated in this paper. The watermark has been embedded in intermediate significant bit planes, besides the least significant bit plane, at the address locations determined by a pseudorandom address vector (PAV). The watermark embedding using the PAV makes it difficult for an adversary to locate the watermark and hence adds to the security of the system. The scheme has been evaluated to ascertain the spatial locations that are robust to various image processing and geometric attacks: JPEG compression, additive white Gaussian noise, salt and pepper noise, filtering and rotation. The experimental results reveal an interesting fact: for all the above-mentioned attacks other than rotation, the higher the bit plane in which the watermark is embedded, the more robust the system. Further, the perceptual quality of the watermarked images obtained in the proposed system has been compared with some state-of-the-art watermarking techniques. The proposed technique outperforms the techniques under comparison, even when compared with the worst-case peak signal-to-noise ratio obtained in our scheme.
Spatial Assessment of Model Errors from Four Regression Techniques
Lianjun Zhang; Jeffrey H. Gove; Jeffrey H. Gove
2005-01-01
Forest modelers have attempted to account for the spatial autocorrelations among trees in growth and yield models by applying alternative regression techniques such as linear mixed models (LMM), generalized additive models (GAM), and geographically weighted regression (GWR). However, the model errors are commonly assessed using average errors across the entire study...
Running Technique is an Important Component of Running Economy and Performance
FOLLAND, JONATHAN P.; ALLEN, SAM J.; BLACK, MATTHEW I.; HANDSAKER, JOSEPH C.; FORRESTER, STEPHANIE E.
2017-01-01
ABSTRACT Despite an intuitive relationship between technique and both running economy (RE) and performance, and the diverse techniques used by runners to achieve forward locomotion, the objective importance of overall technique and the key components therein remain to be elucidated. Purpose This study aimed to determine the relationship between individual and combined kinematic measures of technique with both RE and performance. Methods Ninety-seven endurance runners (47 females) of diverse competitive standards performed a discontinuous protocol of incremental treadmill running (4-min stages, 1-km·h−1 increments). Measurements included three-dimensional full-body kinematics, respiratory gases to determine energy cost, and velocity of lactate turn point. Five categories of kinematic measures (vertical oscillation, braking, posture, stride parameters, and lower limb angles) and locomotory energy cost (LEc) were averaged across 10–12 km·h−1 (the highest common velocity < velocity of lactate turn point). Performance was measured as season's best (SB) time converted to a sex-specific z-score. Results Numerous kinematic variables were correlated with RE and performance (LEc, 19 variables; SB time, 11 variables). Regression analysis found three variables (pelvis vertical oscillation during ground contact normalized to height, minimum knee joint angle during ground contact, and minimum horizontal pelvis velocity) explained 39% of LEc variability. In addition, four variables (minimum horizontal pelvis velocity, shank touchdown angle, duty factor, and trunk forward lean) combined to explain 31% of the variability in performance (SB time). Conclusions This study provides novel and robust evidence that technique explains a substantial proportion of the variance in RE and performance. We recommend that runners and coaches are attentive to specific aspects of stride parameters and lower limb angles in part to optimize pelvis movement, and ultimately enhance performance. PMID:28263283
Use of Empirical Estimates of Shrinkage in Multiple Regression: A Caution.
ERIC Educational Resources Information Center
Kromrey, Jeffrey D.; Hines, Constance V.
1995-01-01
The accuracy of four empirical techniques to estimate shrinkage in multiple regression was studied through Monte Carlo simulation. None of the techniques provided unbiased estimates of the population squared multiple correlation coefficient, but the normalized jackknife and bootstrap techniques demonstrated marginally acceptable performance with…
Yelland, Lisa N; Salter, Amy B; Ryan, Philip
2011-10-15
Modified Poisson regression, which combines a log Poisson regression model with robust variance estimation, is a useful alternative to log binomial regression for estimating relative risks. Previous studies have shown both analytically and by simulation that modified Poisson regression is appropriate for independent prospective data. This method is often applied to clustered prospective data, despite a lack of evidence to support its use in this setting. The purpose of this article is to evaluate the performance of the modified Poisson regression approach for estimating relative risks from clustered prospective data, by using generalized estimating equations to account for clustering. A simulation study is conducted to compare log binomial regression and modified Poisson regression for analyzing clustered data from intervention and observational studies. Both methods generally perform well in terms of bias, type I error, and coverage. Unlike log binomial regression, modified Poisson regression is not prone to convergence problems. The methods are contrasted by using example data sets from 2 large studies. The results presented in this article support the use of modified Poisson regression as an alternative to log binomial regression for analyzing clustered prospective data when clustering is taken into account by using generalized estimating equations.
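A minimal sketch of the modified Poisson approach for clustered data described above, using the GEE implementation in statsmodels (Poisson family with an exchangeable working correlation; the default covariance estimate is robust). The clustered binary data are simulated, and the effect sizes are arbitrary.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(6)

# Synthetic clustered data: 50 clusters of 10, binary outcome, binary exposure
clusters = np.repeat(np.arange(50), 10)
exposure = rng.integers(0, 2, size=clusters.size)
cluster_effect = rng.normal(scale=0.3, size=50)[clusters]
p = 1 / (1 + np.exp(-(-1.0 + 0.5 * exposure + cluster_effect)))
y = rng.binomial(1, p)

X = sm.add_constant(pd.DataFrame({"exposure": exposure}))

# Modified Poisson regression: Poisson family with log link, clustering handled
# by generalized estimating equations, robust (sandwich) standard errors
model = sm.GEE(y, X, groups=clusters, family=sm.families.Poisson(),
               cov_struct=sm.cov_struct.Exchangeable())
result = model.fit()
print(result.summary())
print("Estimated relative risk for exposure:", np.exp(result.params["exposure"]))
```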
Visual tracking using objectness-bounding box regression and correlation filters
NASA Astrophysics Data System (ADS)
Mbelwa, Jimmy T.; Zhao, Qingjie; Lu, Yao; Wang, Fasheng; Mbise, Mercy
2018-03-01
Visual tracking is a fundamental problem in computer vision with extensive application domains in surveillance and intelligent systems. Recently, correlation filter-based tracking methods have shown a great achievement in terms of robustness, accuracy, and speed. However, such methods have a problem of dealing with fast motion (FM), motion blur (MB), illumination variation (IV), and drifting caused by occlusion (OCC). To solve this problem, a tracking method that integrates objectness-bounding box regression (O-BBR) model and a scheme based on kernelized correlation filter (KCF) is proposed. The scheme based on KCF is used to improve the tracking performance of FM and MB. For handling drift problem caused by OCC and IV, we propose objectness proposals trained in bounding box regression as prior knowledge to provide candidates and background suppression. Finally, scheme KCF as a base tracker and O-BBR are fused to obtain a state of a target object. Extensive experimental comparisons of the developed tracking method with other state-of-the-art trackers are performed on some of the challenging video sequences. Experimental comparison results show that our proposed tracking method outperforms other state-of-the-art tracking methods in terms of effectiveness, accuracy, and robustness.
Tools to Support Interpreting Multiple Regression in the Face of Multicollinearity
Kraha, Amanda; Turner, Heather; Nimon, Kim; Zientek, Linda Reichwein; Henson, Robin K.
2012-01-01
While multicollinearity may increase the difficulty of interpreting multiple regression (MR) results, it should not cause undue problems for the knowledgeable researcher. In the current paper, we argue that rather than using one technique to investigate regression results, researchers should consider multiple indices to understand the contributions that predictors make not only to a regression model, but to each other as well. Some of the techniques to interpret MR effects include, but are not limited to, correlation coefficients, beta weights, structure coefficients, all possible subsets regression, commonality coefficients, dominance weights, and relative importance weights. This article will review a set of techniques to interpret MR effects, identify the elements of the data on which the methods focus, and identify statistical software to support such analyses. PMID:22457655
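A minimal numpy sketch of two of the indices named above, beta weights and structure coefficients, computed on synthetic data with deliberately collinear predictors; the other indices (commonality coefficients, dominance weights, etc.) require dedicated software and are not shown.

```python
import numpy as np

rng = np.random.default_rng(7)

# Correlated predictors (multicollinearity) and a response
n = 300
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)       # collinear with x1
x3 = rng.normal(size=n)
y = 1.0 * x1 + 0.5 * x2 + 0.2 * x3 + rng.normal(size=n)

X = np.column_stack([x1, x2, x3])
Z = (X - X.mean(0)) / X.std(0)                 # standardized predictors
zy = (y - y.mean()) / y.std()

# Beta weights: coefficients of the regression on standardized variables
beta = np.linalg.lstsq(Z, zy, rcond=None)[0]

# Structure coefficients: correlation of each predictor with the predicted y
y_hat = Z @ beta
structure = np.array([np.corrcoef(Z[:, j], y_hat)[0, 1] for j in range(Z.shape[1])])

for j, name in enumerate(["x1", "x2", "x3"]):
    print(f"{name}: beta = {beta[j]:.3f}, structure coefficient = {structure[j]:.3f}")
```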
USDA-ARS?s Scientific Manuscript database
Parametric non-linear regression (PNR) techniques commonly are used to develop weed seedling emergence models. Such techniques, however, require statistical assumptions that are difficult to meet. To examine and overcome these limitations, we compared PNR with a nonparametric estimation technique. F...
Geographically weighted regression and multicollinearity: dispelling the myth
NASA Astrophysics Data System (ADS)
Fotheringham, A. Stewart; Oshan, Taylor M.
2016-10-01
Geographically weighted regression (GWR) extends the familiar regression framework by estimating a set of parameters for any number of locations within a study area, rather than producing a single parameter estimate for each relationship specified in the model. Recent literature has suggested that GWR is highly susceptible to the effects of multicollinearity between explanatory variables and has proposed a series of local measures of multicollinearity as an indicator of potential problems. In this paper, we employ a controlled simulation to demonstrate that GWR is in fact very robust to the effects of multicollinearity. Consequently, the contention that GWR is highly susceptible to multicollinearity issues needs rethinking.
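A minimal sketch of the local calibration step at the heart of GWR: a weighted least-squares fit at one target location using a Gaussian distance kernel. The coordinates, predictors, bandwidth, and the spatially drifting coefficient are all invented for illustration; a full GWR repeats this fit at every location and selects the bandwidth by cross-validation.

```python
import numpy as np

rng = np.random.default_rng(8)

# Synthetic spatial data: coordinates, two predictors, spatially varying slope
n = 400
coords = rng.uniform(0, 10, size=(n, 2))
x1, x2 = rng.normal(size=n), rng.normal(size=n)
slope1 = 1.0 + 0.2 * coords[:, 0]              # coefficient drifts west to east
y = 0.5 + slope1 * x1 - 0.7 * x2 + rng.normal(scale=0.3, size=n)
X = np.column_stack([np.ones(n), x1, x2])

def local_coefficients(target, bandwidth=2.0):
    """Weighted least squares at one location with a Gaussian distance kernel
    (the calibration step GWR repeats at every target location)."""
    d = np.linalg.norm(coords - target, axis=1)
    w = np.exp(-0.5 * (d / bandwidth) ** 2)
    XtW = X.T * w
    return np.linalg.solve(XtW @ X, XtW @ y)

print("Local coefficients near (1, 5):", local_coefficients(np.array([1.0, 5.0])))
print("Local coefficients near (9, 5):", local_coefficients(np.array([9.0, 5.0])))
```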
Quantile regression in the presence of monotone missingness with sensitivity analysis
Liu, Minzhao; Daniels, Michael J.; Perri, Michael G.
2016-01-01
In this paper, we develop methods for longitudinal quantile regression when there is monotone missingness. In particular, we propose pattern mixture models with a constraint that provides a straightforward interpretation of the marginal quantile regression parameters. Our approach allows sensitivity analysis which is an essential component in inference for incomplete data. To facilitate computation of the likelihood, we propose a novel way to obtain analytic forms for the required integrals. We conduct simulations to examine the robustness of our approach to modeling assumptions and compare its performance to competing approaches. The model is applied to data from a recent clinical trial on weight management. PMID:26041008
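As background to the abstract above, the sketch below fits ordinary (complete-data) quantile regression with statsmodels at several quantiles; it does not reproduce the paper's pattern-mixture models, missing-data constraints, or sensitivity analysis, and the weight-change data are simulated.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(9)

# Simulated (complete) data: weight change versus weeks in a programme
weeks = rng.uniform(0, 26, size=200)
weight_change = -0.3 * weeks + rng.standard_t(df=3, size=200)   # heavy-tailed noise
data = pd.DataFrame({"weeks": weeks, "weight_change": weight_change})

X = sm.add_constant(data[["weeks"]])
for q in (0.25, 0.5, 0.75):
    fit = sm.QuantReg(data["weight_change"], X).fit(q=q)
    print(f"q = {q}: intercept = {fit.params['const']:.2f}, "
          f"slope = {fit.params['weeks']:.2f}")
```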
A Streamflow Statistics (StreamStats) Web Application for Ohio
Koltun, G.F.; Kula, Stephanie P.; Puskas, Barry M.
2006-01-01
A StreamStats Web application was developed for Ohio that implements equations for estimating a variety of streamflow statistics including the 2-, 5-, 10-, 25-, 50-, 100-, and 500-year peak streamflows, mean annual streamflow, mean monthly streamflows, harmonic mean streamflow, and 25th-, 50th-, and 75th-percentile streamflows. StreamStats is a Web-based geographic information system application designed to facilitate the estimation of streamflow statistics at ungaged locations on streams. StreamStats can also serve precomputed streamflow statistics determined from streamflow-gaging station data. The basic structure, use, and limitations of StreamStats are described in this report. To facilitate the level of automation required for Ohio's StreamStats application, the technique used by Koltun (2003) for computing main-channel slope was replaced with a new computationally robust technique. The new channel-slope characteristic, referred to as SL10-85, differed from the National Hydrography Data based channel slope values (SL) reported by Koltun (2003) by an average of -28.3 percent, with the median change being -13.2 percent. In spite of the differences, the two slope measures are strongly correlated. The change in channel slope values resulting from the change in computational method necessitated revision of the full-model equations for flood-peak discharges originally presented by Koltun (2003). Average standard errors of prediction for the revised full-model equations presented in this report increased by a small amount over those reported by Koltun (2003), with increases ranging from 0.7 to 0.9 percent. Mean percentage changes in the revised regression and weighted flood-frequency estimates relative to regression and weighted estimates reported by Koltun (2003) were small, ranging from -0.72 to -0.25 percent and -0.22 to 0.07 percent, respectively.
Regression without truth with Markov chain Monte-Carlo
NASA Astrophysics Data System (ADS)
Madan, Hennadii; Pernuš, Franjo; Likar, Boštjan; Špiclin, Žiga
2017-03-01
Regression without truth (RWT) is a statistical technique for estimating error model parameters of each method in a group of methods used for measurement of a certain quantity. A very attractive aspect of RWT is that it does not rely on a reference method or "gold standard" data, which is otherwise difficult to obtain. RWT was used for a reference-free performance comparison of several methods for measuring left ventricular ejection fraction (EF), i.e. a percentage of blood leaving the ventricle each time the heart contracts, and has since been applied for various other quantitative imaging biomarkers (QIBs). Herein, we show how Markov chain Monte-Carlo (MCMC), a computational technique for drawing samples from a statistical distribution with probability density function known only up to a normalizing coefficient, can be used to augment RWT to gain a number of important benefits compared to the original approach based on iterative optimization. For instance, the proposed MCMC-based RWT enables the estimation of the joint posterior distribution of the parameters of the error model, straightforward quantification of uncertainty of the estimates, estimation of the true value of the measurand and corresponding credible intervals (CIs), does not require a finite support for the prior distribution of the measurand, and generally has much improved robustness against convergence to non-global maxima. The proposed approach is validated using synthetic data that emulate the EF data for 45 patients measured with 8 different methods. The obtained results show that the 90% CIs of the corresponding parameter estimates contain the true values of all error model parameters and the measurand. A potential real-world application is to take measurements of a certain QIB with several different methods and then use the proposed framework to compute the estimates of the true values and their uncertainty, vital information for diagnosis based on QIBs.
Seed robustness of oriented relative fuzzy connectedness: core computation and its applications
NASA Astrophysics Data System (ADS)
Tavares, Anderson C. M.; Bejar, Hans H. C.; Miranda, Paulo A. V.
2017-02-01
In this work, we present a formal definition and an efficient algorithm to compute the cores of Oriented Relative Fuzzy Connectedness (ORFC), a recent seed-based segmentation technique. The core is a region where the seed can be moved without altering the segmentation, an important aspect for robust techniques and reduction of user effort. We show how ORFC cores can be used to build a powerful hybrid image segmentation approach. We also provide some new theoretical relations between ORFC and Oriented Image Foresting Transform (OIFT), as well as their cores. Experimental results among several methods show that the hybrid approach conserves high accuracy, avoids the shrinking problem and provides robustness to seed placement inside the desired object due to the cores properties.
H.264/AVC digital fingerprinting based on spatio-temporal just noticeable distortion
NASA Astrophysics Data System (ADS)
Ait Saadi, Karima; Bouridane, Ahmed; Guessoum, Abderrezak
2014-01-01
This paper presents a robust adaptive embedding scheme using a modified spatio-temporal just noticeable distortion (JND) model designed for tracing the distribution of H.264/AVC video content and protecting it from unauthorized redistribution. The embedding process is performed during coding in selected macroblocks of type Intra 4x4 within I-frames. The method uses a spread-spectrum technique in order to obtain robustness against collusion attacks, and the JND model to dynamically adjust the embedding strength and control the energy of the embedded fingerprints so as to ensure their imperceptibility. Linear and nonlinear collusion attacks are performed to show the robustness of the proposed technique against collusion attacks while leaving the visual quality unchanged.
Predicting School Enrollments Using the Modified Regression Technique.
ERIC Educational Resources Information Center
Grip, Richard S.; Young, John W.
This report is based on a study in which a regression model was constructed to increase accuracy in enrollment predictions. A model, known as the Modified Regression Technique (MRT), was used to examine K-12 enrollment over the past 20 years in 2 New Jersey school districts of similar size and ethnicity. To test the model's accuracy, MRT was…
What Are the Odds of that? A Primer on Understanding Logistic Regression
ERIC Educational Resources Information Center
Huang, Francis L.; Moon, Tonya R.
2013-01-01
The purpose of this Methodological Brief is to present a brief primer on logistic regression, a commonly used technique when modeling dichotomous outcomes. Using data from the National Education Longitudinal Study of 1988 (NELS:88), logistic regression techniques were used to investigate student-level variables in eighth grade (i.e., enrolled in a…
A Method to Achieve High Fidelity in Internet-Distributed Hardware-in-the-Loop Simulation
2012-08-01
Gilliom, Robert J.; Helsel, Dennis R.
1986-01-01
A recurring difficulty encountered in investigations of many metals and organic contaminants in ambient waters is that a substantial portion of water sample concentrations are below limits of detection established by analytical laboratories. Several methods were evaluated for estimating distributional parameters for such censored data sets using only uncensored observations. Their reliabilities were evaluated by a Monte Carlo experiment in which small samples were generated from a wide range of parent distributions and censored at varying levels. Eight methods were used to estimate the mean, standard deviation, median, and interquartile range. Criteria were developed, based on the distribution of uncensored observations, for determining the best performing parameter estimation method for any particular data set. The most robust method for minimizing error in censored-sample estimates of the four distributional parameters over all simulation conditions was the log-probability regression method. With this method, censored observations are assumed to follow the zero-to-censoring level portion of a lognormal distribution obtained by a least squares regression between logarithms of uncensored concentration observations and their z scores. When method performance was separately evaluated for each distributional parameter over all simulation conditions, the log-probability regression method still had the smallest errors for the mean and standard deviation, but the lognormal maximum likelihood method had the smallest errors for the median and interquartile range. When data sets were classified prior to parameter estimation into groups reflecting their probable parent distributions, the ranking of estimation methods was similar, but the accuracy of error estimates was markedly improved over those without classification.
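A minimal sketch of the log-probability regression idea described above is given below: logarithms of the uncensored concentrations are regressed on their normal quantiles, and the fitted lower tail of the implied lognormal distribution is used to fill in the censored values. The detection limit, concentration values, and plotting-position choice are hypothetical, a single detection limit is assumed, and numpy/scipy availability is assumed.

```python
import numpy as np
from scipy import stats

# Hypothetical concentrations; values below the detection limit are censored.
detection_limit = 1.0
observed = np.array([3.2, 0.8, 1.5, 0.4, 2.1, 5.6, 0.9, 1.2, 0.7, 4.3])
censored = observed < detection_limit
uncensored = observed[~censored]

# Full-sample ranks for uncensored values (all censored values rank below them),
# converted to plotting positions and then to normal quantiles (z scores).
n = observed.size
ranks = np.argsort(np.argsort(uncensored)) + 1 + censored.sum()
pp = ranks / (n + 1.0)
z = stats.norm.ppf(pp)

# Least-squares fit of log(concentration) on z: slope and intercept define the
# lognormal distribution used to model the zero-to-censoring-level portion.
slope, intercept = np.polyfit(z, np.log(uncensored), 1)

# Impute censored observations from the fitted lower tail of that lognormal.
z_cens = stats.norm.ppf(np.arange(1, censored.sum() + 1) / (n + 1.0))
imputed = np.exp(intercept + slope * z_cens)

filled = observed.copy()
filled[censored] = np.sort(imputed)
print("estimated mean:", filled.mean(), "estimated std:", filled.std(ddof=1))
```

With the censored tail filled in this way, the mean, standard deviation, median, and interquartile range can all be computed from the completed sample.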
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gilliom, R.J.; Helsel, D.R.
1986-02-01
A recurring difficulty encountered in investigations of many metals and organic contaminants in ambient waters is that a substantial portion of water sample concentrations are below limits of detection established by analytical laboratories. Several methods were evaluated for estimating distributional parameters for such censored data sets using only uncensored observations. Their reliabilities were evaluated by a Monte Carlo experiment in which small samples were generated from a wide range of parent distributions and censored at varying levels. Eight methods were used to estimate the mean, standard deviation, median, and interquartile range. Criteria were developed, based on the distribution of uncensored observations, for determining the best performing parameter estimation method for any particular data set. The most robust method for minimizing error in censored-sample estimates of the four distributional parameters over all simulation conditions was the log-probability regression method. With this method, censored observations are assumed to follow the zero-to-censoring level portion of a lognormal distribution obtained by a least squares regression between logarithms of uncensored concentration observations and their z scores. When method performance was separately evaluated for each distributional parameter over all simulation conditions, the log-probability regression method still had the smallest errors for the mean and standard deviation, but the lognormal maximum likelihood method had the smallest errors for the median and interquartile range. When data sets were classified prior to parameter estimation into groups reflecting their probable parent distributions, the ranking of estimation methods was similar, but the accuracy of error estimates was markedly improved over those without classification.
McLaren, Christine E.; Chen, Wen-Pin; Nie, Ke; Su, Min-Ying
2009-01-01
Rationale and Objectives: Dynamic contrast enhanced MRI (DCE-MRI) is a clinical imaging modality for detection and diagnosis of breast lesions. Analytical methods were compared for diagnostic feature selection and performance of lesion classification to differentiate between malignant and benign lesions in patients. Materials and Methods: The study included 43 malignant and 28 benign histologically-proven lesions. Eight morphological parameters, ten gray level co-occurrence matrices (GLCM) texture features, and fourteen Laws' texture features were obtained using automated lesion segmentation and quantitative feature extraction. Artificial neural network (ANN) and logistic regression analysis were compared for selection of the best predictors of malignant lesions among the normalized features. Results: Using ANN, the final four selected features were compactness, energy, homogeneity, and Law_LS, with area under the receiver operating characteristic curve (AUC) = 0.82, and accuracy = 0.76. The diagnostic performance of these 4 features computed on the basis of logistic regression yielded AUC = 0.80 (95% CI, 0.688 to 0.905), similar to that of ANN. The analysis also shows that the odds of a malignant lesion decreased by 48% (95% CI, 25% to 92%) for every increase of 1 SD in the Law_LS feature, adjusted for differences in compactness, energy, and homogeneity. Using logistic regression with z-score transformation, a model comprised of compactness, NRL entropy, and gray level sum average was selected, and it had the highest overall accuracy of 0.75 among all models, with AUC = 0.77 (95% CI, 0.660 to 0.880). When logistic modeling of transformations using the Box-Cox method was performed, the most parsimonious model with predictors, compactness and Law_LS, had an AUC of 0.79 (95% CI, 0.672 to 0.898). Conclusion: The diagnostic performance of models selected by ANN and logistic regression was similar. The analytic methods were found to be roughly equivalent in terms of predictive ability when a small number of variables were chosen. The robust ANN methodology utilizes a sophisticated non-linear model, while logistic regression analysis provides insightful information to enhance interpretation of the model features. PMID:19409817
Logistic regression for risk factor modelling in stuttering research.
Reed, Phil; Wu, Yaqionq
2013-06-01
To outline the uses of logistic regression and other statistical methods for risk factor analysis in the context of research on stuttering. The principles underlying the application of a logistic regression are illustrated, and the types of questions to which such a technique has been applied in the stuttering field are outlined. The assumptions and limitations of the technique are discussed with respect to existing stuttering research, and with respect to formulating appropriate research strategies to accommodate these considerations. Finally, some alternatives to the approach are briefly discussed. The way the statistical procedures are employed is demonstrated with some hypothetical data. Research into several practical issues concerning stuttering could benefit if risk factor modelling were used. Important examples are early diagnosis, prognosis (whether a child will recover or persist) and assessment of treatment outcome. After reading this article you will: (a) Summarize the situations in which logistic regression can be applied to a range of issues about stuttering; (b) Follow the steps in performing a logistic regression analysis; (c) Describe the assumptions of the logistic regression technique and the precautions that need to be checked when it is employed; (d) Be able to summarize its advantages over other techniques like estimation of group differences and simple regression. Copyright © 2012 Elsevier Inc. All rights reserved.
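To make the risk-factor workflow concrete, the sketch below fits a logistic regression to hypothetical persistence-versus-recovery data and reports odds ratios. The predictors, effect sizes, and sample are invented for illustration only, and statsmodels is assumed to be available.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Hypothetical risk-factor data: age at onset and a family-history indicator
# predicting persistence (1) versus recovery (0). Purely illustrative.
n = 300
age_onset = rng.normal(4.0, 1.0, size=n)
family_history = rng.binomial(1, 0.3, size=n)
logit_p = -2.0 + 0.4 * age_onset + 0.9 * family_history
persist = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit_p)))

# Fit the logistic regression (constant + two risk factors).
X = sm.add_constant(np.column_stack([age_onset, family_history]))
model = sm.Logit(persist, X).fit(disp=False)

# Exponentiated coefficients are the odds ratios for each risk factor.
print(model.summary())
print("odds ratios:", np.exp(model.params))
```

An odds ratio above 1 for a factor indicates higher odds of persistence per unit increase in that factor, holding the others fixed, which is the quantity typically reported in prognosis studies.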
Evaluation of Ares-I Control System Robustness to Uncertain Aerodynamics and Flex Dynamics
NASA Technical Reports Server (NTRS)
Jang, Jiann-Woei; VanTassel, Chris; Bedrossian, Nazareth; Hall, Charles; Spanos, Pol
2008-01-01
This paper discusses the application of robust control theory to evaluate robustness of the Ares-I control systems. Three techniques for estimating upper and lower bounds of uncertain parameters which yield stable closed-loop response are used here: (1) Monte Carlo analysis, (2) mu analysis, and (3) characteristic frequency response analysis. All three methods are used to evaluate stability envelopes of the Ares-I control systems with uncertain aerodynamics and flex dynamics. The results show that characteristic frequency response analysis is the most effective of these methods for assessing robustness.
Advanced statistics: linear regression, part I: simple linear regression.
Marill, Keith A
2004-01-01
Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.
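In the same spirit as the simplified clinical examples mentioned above, the following sketch derives a least-squares regression line for a single predictor. The numbers are hypothetical and numpy is assumed; it is an illustration of the method of least squares, not an example from the article.

```python
import numpy as np

# Hypothetical clinical-style example: predict systolic blood pressure from age.
age = np.array([25, 32, 41, 47, 53, 60, 66, 72], dtype=float)
sbp = np.array([118, 121, 127, 130, 134, 139, 142, 148], dtype=float)

# Method of least squares: slope and intercept that minimize squared residuals.
slope, intercept = np.polyfit(age, sbp, 1)
predicted = intercept + slope * age
residuals = sbp - predicted

# Proportion of outcome variance explained by the regression line.
r_squared = 1 - residuals.var(ddof=0) / sbp.var(ddof=0)
print(f"SBP = {intercept:.1f} + {slope:.2f} * age,  R^2 = {r_squared:.3f}")
```

Plotting the residuals against the predictor is the usual quick check of the linearity and constant-variance assumptions discussed in the article.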
Hierarchical Modeling and Robust Synthesis for the Preliminary Design of Large Scale Complex Systems
NASA Technical Reports Server (NTRS)
Koch, Patrick N.
1997-01-01
Large-scale complex systems are characterized by multiple interacting subsystems and the analysis of multiple disciplines. The design and development of such systems inevitably requires the resolution of multiple conflicting objectives. The size of complex systems, however, prohibits the development of comprehensive system models, and thus these systems must be partitioned into their constituent parts. Because simultaneous solution of individual subsystem models is often not manageable, iteration is inevitable and often excessive. In this dissertation these issues are addressed through the development of a method for hierarchical robust preliminary design exploration to facilitate concurrent system and subsystem design exploration, for the concurrent generation of robust system and subsystem specifications for the preliminary design of multi-level, multi-objective, large-scale complex systems. This method is developed through the integration and expansion of current design techniques: Hierarchical partitioning and modeling techniques for partitioning large-scale complex systems into more tractable parts, and allowing integration of subproblems for system synthesis; Statistical experimentation and approximation techniques for increasing both the efficiency and the comprehensiveness of preliminary design exploration; and Noise modeling techniques for implementing robust preliminary design when approximate models are employed. Hierarchical partitioning and modeling techniques including intermediate responses, linking variables, and compatibility constraints are incorporated within a hierarchical compromise decision support problem formulation for synthesizing subproblem solutions for a partitioned system. Experimentation and approximation techniques are employed for concurrent investigations and modeling of partitioned subproblems. A modified composite experiment is introduced for fitting better predictive models across the ranges of the factors, and an approach for constructing partitioned response surfaces is developed to reduce the computational expense of experimentation for fitting models in a large number of factors. Noise modeling techniques are compared and recommendations are offered for the implementation of robust design when approximate models are sought. These techniques, approaches, and recommendations are incorporated within the method developed for hierarchical robust preliminary design exploration. This method as well as the associated approaches are illustrated through their application to the preliminary design of a commercial turbofan turbine propulsion system. The case study is developed in collaboration with Allison Engine Company, Rolls Royce Aerospace, and is based on the Allison AE3007 existing engine designed for midsize commercial, regional business jets. For this case study, the turbofan system-level problem is partitioned into engine cycle design and configuration design, and a compressor module is integrated for more detailed subsystem-level design exploration, improving system evaluation. The fan and low pressure turbine subsystems are also modeled, but in less detail. Given the defined partitioning, these subproblems are investigated independently and concurrently, and response surface models are constructed to approximate the responses of each. These response models are then incorporated within a commercial turbofan hierarchical compromise decision support problem formulation. Five design scenarios are investigated, and robust solutions are identified.
The method and solutions identified are verified by comparison with the AE3007 engine. The solutions obtained are similar to the AE3007 cycle and configuration, but are better with respect to many of the requirements.
DCT-based cyber defense techniques
NASA Astrophysics Data System (ADS)
Amsalem, Yaron; Puzanov, Anton; Bedinerman, Anton; Kutcher, Maxim; Hadar, Ofer
2015-09-01
With the increasing popularity of video streaming services and multimedia sharing via social networks, there is a need to protect the multimedia from malicious use. An attacker may use steganography and watermarking techniques to embed malicious content, in order to attack the end user. Most of the attack algorithms are robust to basic image processing techniques such as filtering, compression, noise addition, etc. Hence, in this article two novel real-time defense techniques are proposed: smart threshold and anomaly correction. Both techniques operate in the DCT domain and are applicable to JPEG images and H.264 I-frames. The defense performance was evaluated against a highly robust attack, and the perceptual quality degradation was measured by the well-known PSNR and SSIM quality assessment metrics. A set of defense techniques is suggested for improving the defense efficiency. For the most aggressive attack configuration, the combination of all the defense techniques results in 80% protection against cyber-attacks with a PSNR of 25.74 dB.
Video multiple watermarking technique based on image interlacing using DWT.
Ibrahim, Mohamed M; Abdel Kader, Neamat S; Zorkany, M
2014-01-01
Digital watermarking is one of the important techniques to secure digital media files in the domains of data authentication and copyright protection. In the nonblind watermarking systems, the need of the original host file in the watermark recovery operation makes an overhead over the system resources, doubles memory capacity, and doubles communications bandwidth. In this paper, a robust video multiple watermarking technique is proposed to solve this problem. This technique is based on image interlacing. In this technique, three-level discrete wavelet transform (DWT) is used as a watermark embedding/extracting domain, Arnold transform is used as a watermark encryption/decryption method, and different types of media (gray image, color image, and video) are used as watermarks. The robustness of this technique is tested by applying different types of attacks such as: geometric, noising, format-compression, and image-processing attacks. The simulation results show the effectiveness and good performance of the proposed technique in saving system resources, memory capacity, and communications bandwidth.
Unsupervised, Robust Estimation-based Clustering for Multispectral Images
NASA Technical Reports Server (NTRS)
Netanyahu, Nathan S.
1997-01-01
To prepare for the challenge of handling the archiving and querying of terabyte-sized scientific spatial databases, the NASA Goddard Space Flight Center's Applied Information Sciences Branch (AISB, Code 935) developed a number of characterization algorithms that rely on supervised clustering techniques. The research reported upon here has been aimed at continuing the evolution of some of these supervised techniques, namely the neural network and decision tree-based classifiers, plus extending the approach to incorporating unsupervised clustering algorithms, such as those based on robust estimation (RE) techniques. The algorithms developed under this task should be suited for use by the Intelligent Information Fusion System (IIFS) metadata extraction modules, and as such these algorithms must be fast, robust, and anytime in nature. Finally, so that the planner/scheduler module of the IIFS can oversee the use and execution of these algorithms, all information required by the planner/scheduler must be provided to the IIFS development team to ensure the timely integration of these algorithms into the overall system.
Robust learning for optimal treatment decision with NP-dimensionality
Shi, Chengchun; Song, Rui; Lu, Wenbin
2016-01-01
In order to identify important variables that are involved in making optimal treatment decisions, Lu, Zhang and Zeng (2013) proposed a penalized least squares regression framework for a fixed number of predictors, which is robust against the misspecification of the conditional mean model. Two problems arise: (i) in a world of explosively big data, effective methods are needed to handle ultra-high dimensional data sets, for example, where the dimension of predictors is of non-polynomial (NP) order in the sample size; (ii) both the propensity score and conditional mean models need to be estimated from data under NP dimensionality. In this paper, we propose a robust procedure for estimating the optimal treatment regime under NP dimensionality. In both steps, penalized regressions are employed with the non-concave penalty function, where the conditional mean model of the response given predictors may be misspecified. The asymptotic properties, such as weak oracle properties, selection consistency and oracle distributions, of the proposed estimators are investigated. In addition, we study the limiting distribution of the estimated value function for the obtained optimal treatment regime. The empirical performance of the proposed estimation method is evaluated by simulations and an application to a depression dataset from the STAR*D study. PMID:28781717
Li, Zhaoying; Zhou, Wenjie; Liu, Hao
2016-09-01
This paper addresses the nonlinear robust tracking controller design problem for hypersonic vehicles. This problem is challenging due to strong coupling between the aerodynamics and the propulsion system, and the uncertainties involved in the vehicle dynamics including parametric uncertainties, unmodeled model uncertainties, and external disturbances. By utilizing the feedback linearization technique, a linear tracking error system is established with prescribed references. For the linear model, a robust controller is proposed based on the signal compensation theory to guarantee that the tracking error dynamics is robustly stable. Numerical simulation results are given to show the advantages of the proposed nonlinear robust control method, compared to the robust loop-shaping control approach. Copyright © 2016 ISA. Published by Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Sutrisno, Agung; Gunawan, Indra; Vanany, Iwan
2017-11-01
Despite being an integral part of risk-based quality improvement efforts, studies improving the quality of corrective action prioritization using the FMEA technique are still limited in the literature, and none of the existing studies considers robustness and risk when selecting among competing improvement initiatives. This study proposes a theoretical model for selecting risk-based competing corrective actions by considering the robustness and risk of the competing alternatives. We incorporate the principle of robust design in computing the preference score among corrective action candidates. Along with the cost and benefit of competing corrective actions, we also incorporate their risk and robustness. An example is provided to demonstrate the applicability of the proposed model.
Robust detrending, rereferencing, outlier detection, and inpainting for multichannel data.
de Cheveigné, Alain; Arzounian, Dorothée
2018-05-15
Electroencephalography (EEG), magnetoencephalography (MEG) and related techniques are prone to glitches, slow drift, steps, etc., that contaminate the data and interfere with the analysis and interpretation. These artifacts are usually addressed in a preprocessing phase that attempts to remove them or minimize their impact. This paper offers a set of useful techniques for this purpose: robust detrending, robust rereferencing, outlier detection, data interpolation (inpainting), step removal, and filter ringing artifact removal. These techniques provide a less wasteful alternative to discarding corrupted trials or channels, and they are relatively immune to artifacts that disrupt alternative approaches such as filtering. Robust detrending allows slow drifts and common mode signals to be factored out while avoiding the deleterious effects of glitches. Robust rereferencing reduces the impact of artifacts on the reference. Inpainting allows corrupt data to be interpolated from intact parts based on the correlation structure estimated over the intact parts. Outlier detection allows the corrupt parts to be identified. Step removal fixes the high-amplitude flux jump artifacts that are common with some MEG systems. Ringing removal allows the ringing response of the antialiasing filter to glitches (steps, pulses) to be suppressed. The performance of the methods is illustrated and evaluated using synthetic data and data from real EEG and MEG systems. These methods, which are mainly automatic and require little tuning, can greatly improve the quality of the data. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
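The robust detrending idea can be sketched as an iterative polynomial fit in which samples with glitch-like residuals are excluded from the next fit. This is a simplified illustration under assumed parameters (polynomial order, threshold), not the authors' published implementation, and it assumes numpy.

```python
import numpy as np

def robust_detrend(signal, order=3, n_iter=5, thresh=3.0):
    """Fit a polynomial trend while iteratively masking glitch-like outliers."""
    t = np.arange(signal.size)
    keep = np.ones(signal.size, dtype=bool)
    for _ in range(n_iter):
        coeffs = np.polyfit(t[keep], signal[keep], order)
        trend = np.polyval(coeffs, t)
        resid = signal - trend
        scale = 1.4826 * np.median(np.abs(resid - np.median(resid)))  # robust sigma
        keep = np.abs(resid) < thresh * scale   # exclude glitches from the next fit
    return signal - trend

# Synthetic channel: slow drift plus a glitch that would bias an ordinary fit.
rng = np.random.default_rng(2)
t = np.arange(1000)
data = 0.00002 * t**2 + rng.normal(0, 0.5, t.size)
data[400:410] += 30.0          # glitch
clean = robust_detrend(data)
```

Because the glitch samples are masked out before the final fit, the removed trend follows the slow drift rather than being pulled toward the artifact, which is the behavior the abstract describes.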
NASA Astrophysics Data System (ADS)
Liu, Weiqiang; Chen, Rujun; Cai, Hongzhu; Luo, Weibin
2016-12-01
In this paper, we investigated the robust processing of noisy spread spectrum induced polarization (SSIP) data. SSIP is a new frequency-domain induced polarization method that transmits a pseudo-random m-sequence, a broadband signal, as the source current. The potential information at multiple frequencies can be obtained through measurement. Removing the noise is a crucial problem in SSIP data processing. If the ordinary mean stack and digital filter cannot reduce the impulse noise effectively, its impact will remain in the complex resistivity spectrum and affect the interpretation of profile anomalies. We applied a robust statistical method to SSIP data processing. Robust least-squares regression is used to fit and remove the linear trend from the original data before stacking. A robust M-estimate is used to stack the data of all periods. A robust smooth filter is used to suppress the residual noise in the data after stacking. For the robust statistical scheme, the most appropriate influence function and iterative algorithm are chosen by testing on simulated data to suppress the influence of outliers. We tested the benefits of the robust SSIP data processing using examples of SSIP data recorded at a test site beside a mine in Gansu province, China.
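One possible form of the robust M-estimate stacking step described above is an iteratively reweighted mean with a Huber-type influence function, sketched below on synthetic repeated periods. The influence function, tuning constant, and data are illustrative assumptions, not the authors' exact scheme; numpy is assumed.

```python
import numpy as np

def huber_stack(periods, k=1.345, n_iter=20, tol=1e-8):
    """Robust M-estimate stack of repeated periods (rows) at each sample (column)."""
    x = np.asarray(periods, dtype=float)
    est = np.median(x, axis=0)                       # robust starting value
    for _ in range(n_iter):
        resid = x - est
        scale = 1.4826 * np.median(np.abs(resid), axis=0) + 1e-12
        u = resid / scale
        w = np.minimum(1.0, k / (np.abs(u) + 1e-12))  # Huber weights: 1 inside, k/|u| outside
        new_est = np.sum(w * x, axis=0) / np.sum(w, axis=0)
        if np.max(np.abs(new_est - est)) < tol:
            break
        est = new_est
    return est

# Synthetic example: 50 repeated periods of a waveform, one hit by impulse noise.
rng = np.random.default_rng(3)
t = np.linspace(0, 1, 256)
signal = np.sin(2 * np.pi * 5 * t)
stack_in = signal + rng.normal(0, 0.2, (50, t.size))
stack_in[5, 100:110] += 20.0                         # impulse noise on one period
robust_mean = huber_stack(stack_in)
```

Unlike the ordinary mean stack, the impulse-contaminated samples receive small weights, so the stacked waveform stays close to the underlying signal.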
Hierarchical modeling and robust synthesis for the preliminary design of large scale complex systems
NASA Astrophysics Data System (ADS)
Koch, Patrick Nathan
Large-scale complex systems are characterized by multiple interacting subsystems and the analysis of multiple disciplines. The design and development of such systems inevitably requires the resolution of multiple conflicting objectives. The size of complex systems, however, prohibits the development of comprehensive system models, and thus these systems must be partitioned into their constituent parts. Because simultaneous solution of individual subsystem models is often not manageable, iteration is inevitable and often excessive. In this dissertation these issues are addressed through the development of a method for hierarchical robust preliminary design exploration to facilitate concurrent system and subsystem design exploration, for the concurrent generation of robust system and subsystem specifications for the preliminary design of multi-level, multi-objective, large-scale complex systems. This method is developed through the integration and expansion of current design techniques: (1) Hierarchical partitioning and modeling techniques for partitioning large-scale complex systems into more tractable parts, and allowing integration of subproblems for system synthesis, (2) Statistical experimentation and approximation techniques for increasing both the efficiency and the comprehensiveness of preliminary design exploration, and (3) Noise modeling techniques for implementing robust preliminary design when approximate models are employed. The method developed and associated approaches are illustrated through their application to the preliminary design of a commercial turbofan turbine propulsion system; the turbofan system-level problem is partitioned into engine cycle and configuration design and a compressor module is integrated for more detailed subsystem-level design exploration, improving system evaluation.
NASA Astrophysics Data System (ADS)
Wu, Cheng; Zhen Yu, Jian
2018-03-01
Linear regression techniques are widely used in atmospheric science, but they are often improperly applied due to lack of consideration or inappropriate handling of measurement uncertainty. In this work, numerical experiments are performed to evaluate the performance of five linear regression techniques, significantly extending previous works by Chu and Saylor. The five techniques are ordinary least squares (OLS), Deming regression (DR), orthogonal distance regression (ODR), weighted ODR (WODR), and York regression (YR). We first introduce a new data generation scheme that employs the Mersenne twister (MT) pseudorandom number generator. The numerical simulations are also improved by (a) refining the parameterization of nonlinear measurement uncertainties, (b) inclusion of a linear measurement uncertainty, and (c) inclusion of WODR for comparison. Results show that DR, WODR and YR produce an accurate slope, but the intercept by WODR and YR is overestimated and the degree of bias is more pronounced with a low R2 XY dataset. The importance of a proper weighting parameter λ in DR is investigated by sensitivity tests, and it is found that an improper λ in DR can lead to a bias in both the slope and intercept estimation. Because the λ calculation depends on the actual form of the measurement error, it is essential to determine the exact form of measurement error in the XY data during the measurement stage. If the a priori error in one of the variables is unknown, or the described measurement error cannot be trusted, DR, WODR and YR can provide the least biases in slope and intercept among all tested regression techniques. For these reasons, DR, WODR and YR are recommended for atmospheric studies when both X and Y data have measurement errors. An Igor Pro-based program (Scatter Plot) was developed to facilitate the implementation of error-in-variables regressions.
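For readers unfamiliar with error-in-variables fitting, the sketch below contrasts OLS with a closed-form Deming regression on synthetic data that has errors in both X and Y. Here delta denotes the ratio of the y-error variance to the x-error variance, which may not correspond exactly to the paper's λ; the data are synthetic and numpy is assumed.

```python
import numpy as np

def deming(x, y, delta=1.0):
    """Deming regression; delta = var(y errors) / var(x errors).
    delta = 1 corresponds to orthogonal regression."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    mx, my = x.mean(), y.mean()
    sxx = np.mean((x - mx) ** 2)
    syy = np.mean((y - my) ** 2)
    sxy = np.mean((x - mx) * (y - my))
    slope = (syy - delta * sxx
             + np.sqrt((syy - delta * sxx) ** 2 + 4 * delta * sxy ** 2)) / (2 * sxy)
    return slope, my - slope * mx

# Synthetic data with measurement error in both X and Y (true slope = 2).
rng = np.random.default_rng(4)
true_x = rng.uniform(0, 10, 100)
x = true_x + rng.normal(0, 0.5, 100)                 # error in X
y = 2.0 * true_x + 1.0 + rng.normal(0, 0.5, 100)     # error in Y
print("OLS slope:   ", np.polyfit(x, y, 1)[0])       # attenuated toward zero
print("Deming slope:", deming(x, y, delta=1.0)[0])
```

The OLS slope is biased low because the X errors are ignored, whereas the Deming estimate recovers a slope close to the true value when delta is specified appropriately, which is the sensitivity the abstract highlights.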
Robust control for a biaxial servo with time delay system based on adaptive tuning technique.
Chen, Tien-Chi; Yu, Chih-Hsien
2009-07-01
A robust control method for synchronizing a biaxial servo system motion is proposed in this paper. A new network based cross-coupled control and adaptive tuning techniques are used together to cancel out the skew error. The conventional fixed gain PID cross-coupled controller (CCC) is replaced with the adaptive cross-coupled controller (ACCC) in the proposed control scheme to maintain biaxial servo system synchronization motion. Adaptive-tuning PID (APID) position and velocity controllers provide the necessary control actions to maintain synchronization while following a variable command trajectory. A delay-time compensator (DTC) with an adaptive controller was augmented to set the time delay element, effectively moving it outside the closed loop, enhancing the stability of the robust controlled system. This scheme provides strong robustness with respect to uncertain dynamics and disturbances. The simulation and experimental results reveal that the proposed control structure adapts to a wide range of operating conditions and provides promising results under parameter variations and load changes.
Srinivasa, Narayan; Zhang, Deying; Grigorian, Beayna
2014-03-01
This paper describes a novel architecture for enabling robust and efficient neuromorphic communication. The architecture combines two concepts: 1) synaptic time multiplexing (STM) that trades space for speed of processing to create an intragroup communication approach that is firing rate independent and offers more flexibility in connectivity than cross-bar architectures and 2) a wired multiple input multiple output (MIMO) communication with orthogonal frequency division multiplexing (OFDM) techniques to enable a robust and efficient intergroup communication for neuromorphic systems. The MIMO-OFDM concept for the proposed architecture was analyzed by simulating large-scale spiking neural network architecture. Analysis shows that the neuromorphic system with MIMO-OFDM exhibits robust and efficient communication while operating in real time with a high bit rate. Through combining STM with MIMO-OFDM techniques, the resulting system offers a flexible and scalable connectivity as well as a power and area efficient solution for the implementation of very large-scale spiking neural architectures in hardware.
NASA Astrophysics Data System (ADS)
Sumantari, Y. D.; Slamet, I.; Sugiyanto
2017-06-01
Semiparametric regression is a statistical analysis method that combines parametric and nonparametric regression components. There are various approach techniques in nonparametric regression; one of them is the spline. Central Java is one of the most densely populated provinces in Indonesia. Population density in this province can be modeled by semiparametric regression because it involves both parametric and nonparametric components. Therefore, the purpose of this paper is to determine the factors that influence population density in Central Java using the semiparametric spline regression model. The results show that the factors which influence population density in Central Java are the number of active Family Planning (FP) participants and the district minimum wage.
Robust scoring functions for protein-ligand interactions with quantum chemical charge models.
Wang, Jui-Chih; Lin, Jung-Hsin; Chen, Chung-Ming; Perryman, Alex L; Olson, Arthur J
2011-10-24
Ordinary least-squares (OLS) regression has been used widely for constructing the scoring functions for protein-ligand interactions. However, OLS is very sensitive to the existence of outliers, and models constructed using it are easily affected by the outliers or even the choice of the data set. On the other hand, determination of atomic charges is regarded as of central importance, because the electrostatic interaction is known to be a key contributing factor for biomolecular association. In the development of the AutoDock4 scoring function, only OLS was conducted, and the simple Gasteiger method was adopted. It is therefore of considerable interest to see whether more rigorous charge models could improve the statistical performance of the AutoDock4 scoring function. In this study, we have employed two well-established quantum chemical approaches, namely the restrained electrostatic potential (RESP) and the Austin-model 1-bond charge correction (AM1-BCC) methods, to obtain atomic partial charges, and we have compared how different charge models affect the performance of AutoDock4 scoring functions. In combination with robust regression analysis and outlier exclusion, our new protein-ligand free energy regression model with AM1-BCC charges for ligands and Amber99SB charges for proteins achieves the lowest root-mean-squared error of 1.637 kcal/mol for the training set of 147 complexes and 2.176 kcal/mol for the external test set of 1427 complexes. The assessment for binding pose prediction with the 100 external decoy sets indicates a very high success rate of 87% with the criterion of a predicted root-mean-squared deviation of less than 2 Å. The success rates and statistical performance of our robust scoring functions are only weakly class-dependent (hydrophobic, hydrophilic, or mixed).
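The contrast between OLS and a robust fit that downweights outlying complexes can be illustrated with a Huber-loss regression, as in the sketch below. The feature names, coefficients, and contamination are hypothetical and scikit-learn is assumed; this is not the authors' actual scoring-function pipeline.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

rng = np.random.default_rng(5)

# Hypothetical energy terms (e.g., vdW, H-bond, desolvation) vs. binding free energy.
n = 150
terms = rng.normal(size=(n, 3))
dg = terms @ np.array([-1.2, -0.8, 0.5]) + rng.normal(0, 0.3, n)
dg[:8] += rng.normal(8, 2, 8)          # a few gross outliers in the training set

ols = LinearRegression().fit(terms, dg)
huber = HuberRegressor(epsilon=1.35).fit(terms, dg)   # downweights large residuals

print("OLS coefficients:   ", np.round(ols.coef_, 2))
print("Robust coefficients:", np.round(huber.coef_, 2))
```

The robust coefficients stay close to the generating values despite the contaminated complexes, which is the practical motivation for combining robust regression with outlier exclusion.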
Sparse alignment for robust tensor learning.
Lai, Zhihui; Wong, Wai Keung; Xu, Yong; Zhao, Cairong; Sun, Mingming
2014-10-01
Multilinear/tensor extensions of manifold learning based algorithms have been widely used in computer vision and pattern recognition. This paper first provides a systematic analysis of the multilinear extensions for the most popular methods by using alignment techniques, thereby obtaining a general tensor alignment framework. From this framework, it is easy to show that the manifold learning based tensor learning methods are intrinsically different from the alignment techniques. Based on the alignment framework, a robust tensor learning method called sparse tensor alignment (STA) is then proposed for unsupervised tensor feature extraction. Different from the existing tensor learning methods, L1- and L2-norms are introduced to enhance the robustness in the alignment step of the STA. The advantage of the proposed technique is that the difficulty in selecting the size of the local neighborhood can be avoided in the manifold learning based tensor feature extraction algorithms. Although STA is an unsupervised learning method, the sparsity encodes the discriminative information in the alignment step and provides the robustness of STA. Extensive experiments on the well-known image databases as well as action and hand gesture databases by encoding object images as tensors demonstrate that the proposed STA algorithm gives the most competitive performance when compared with the tensor-based unsupervised learning methods.
Control algorithms for aerobraking in the Martian atmosphere
NASA Technical Reports Server (NTRS)
Ward, Donald T.; Shipley, Buford W., Jr.
1991-01-01
The Analytic Predictor Corrector (APC) and Energy Controller (EC) atmospheric guidance concepts were adapted to control an interplanetary vehicle aerobraking in the Martian atmosphere. Changes are made to the APC to improve its robustness to density variations. These changes include adaptation of a new exit phase algorithm, an adaptive transition velocity to initiate the exit phase, refinement of the reference dynamic pressure calculation and two improved density estimation techniques. The modified controller with the hybrid density estimation technique is called the Mars Hybrid Predictor Corrector (MHPC), while the modified controller with a polynomial density estimator is called the Mars Predictor Corrector (MPC). A Lyapunov Steepest Descent Controller (LSDC) is adapted to control the vehicle. The LSDC lacked robustness, so a Lyapunov tracking exit phase algorithm is developed to guide the vehicle along a reference trajectory. This algorithm, when using the hybrid density estimation technique to define the reference path, is called the Lyapunov Hybrid Tracking Controller (LHTC). With the polynomial density estimator used to define the reference trajectory, the algorithm is called the Lyapunov Tracking Controller (LTC). These four new controllers are tested using a six degree of freedom computer simulation to evaluate their robustness. The MHPC, MPC, LHTC, and LTC show dramatic improvements in robustness over the APC and EC.
Robust Stability Analysis of the Space Launch System Control Design: A Singular Value Approach
NASA Technical Reports Server (NTRS)
Pei, Jing; Newsome, Jerry R.
2015-01-01
Classical stability analysis consists of breaking the feedback loops one at a time and determining separately how much gain or phase variations would destabilize the stable nominal feedback system. For typical launch vehicle control design, classical control techniques are generally employed. In addition to stability margins, frequency domain Monte Carlo methods are used to evaluate the robustness of the design. However, such techniques were developed for Single-Input-Single-Output (SISO) systems and do not take into consideration the off-diagonal terms in the transfer function matrix of Multi-Input-Multi-Output (MIMO) systems. Robust stability analysis techniques such as H(sub infinity) and mu are applicable to MIMO systems but have not been adopted as standard practices within the launch vehicle controls community. This paper took advantage of a simple singular-value-based MIMO stability margin evaluation method based on work done by Mukhopadhyay and Newsom and applied it to the SLS high-fidelity dynamics model. The method computes a simultaneous multi-loop gain and phase margin that could be related back to classical margins. The results presented in this paper suggest that for the SLS system, traditional SISO stability margins are similar to the MIMO margins. This additional level of verification provides confidence in the robustness of the control design.
Bhanot, Gyan; Alexe, Gabriela; Levine, Arnold J; Stolovitzky, Gustavo
2005-01-01
A major challenge in cancer diagnosis from microarray data is the need for robust, accurate classification models which are independent of the analysis techniques used and can combine data from different laboratories. We propose such a classification scheme originally developed for phenotype identification from mass spectrometry data. The method uses a robust multivariate gene selection procedure and combines the results of several machine learning tools trained on raw and pattern data to produce an accurate meta-classifier. We illustrate and validate our method by applying it to gene expression datasets: the oligonucleotide HuGeneFL microarray dataset of Shipp et al. (www.genome.wi.mit.du/MPR/lymphoma) and the Hu95Av2 Affymetrix dataset (DallaFavera's laboratory, Columbia University). Our pattern-based meta-classification technique achieves higher predictive accuracies than each of the individual classifiers, is robust against data perturbations and provides subsets of related predictive genes. Our techniques predict that combinations of some genes in the p53 pathway are highly predictive of phenotype. In particular, we find that in 80% of DLBCL cases the mRNA level of at least one of the three genes p53, PLK1 and CDK2 is elevated, while in 80% of FL cases, the mRNA level of at most one of them is elevated.
Kim, Jongpal; Kim, Jihoon; Ko, Hyoungho
2015-12-31
To overcome light interference, including a large DC offset and ambient light variation, a robust photoplethysmogram (PPG) readout chip is fabricated using a 0.13-μm complementary metal-oxide-semiconductor (CMOS) process. Against the large DC offset, a saturation detection and current feedback circuit is proposed to compensate for an offset current of up to 30 μA. For robustness against optical path variation, an automatic emitted light compensation method is adopted. To prevent ambient light interference, an alternating sampling and charge redistribution technique is also proposed. In the proposed technique, no additional power is consumed, and only three differential switches and one capacitor are required. The PPG readout channel consumes 26.4 μW and has an input referred current noise of 260 pArms.
Kim, Jongpal; Kim, Jihoon; Ko, Hyoungho
2015-01-01
To overcome light interference, including a large DC offset and ambient light variation, a robust photoplethysmogram (PPG) readout chip is fabricated using a 0.13-μm complementary metal–oxide–semiconductor (CMOS) process. Against the large DC offset, a saturation detection and current feedback circuit is proposed to compensate for an offset current of up to 30 μA. For robustness against optical path variation, an automatic emitted light compensation method is adopted. To prevent ambient light interference, an alternating sampling and charge redistribution technique is also proposed. In the proposed technique, no additional power is consumed, and only three differential switches and one capacitor are required. The PPG readout channel consumes 26.4 μW and has an input referred current noise of 260 pArms. PMID:26729122
Impact of multicollinearity on small sample hydrologic regression models
NASA Astrophysics Data System (ADS)
Kroll, Charles N.; Song, Peter
2013-06-01
Often hydrologic regression models are developed with ordinary least squares (OLS) procedures. The use of OLS with highly correlated explanatory variables produces multicollinearity, which creates highly sensitive parameter estimators with inflated variances and improper model selection. It is not clear how to best address multicollinearity in hydrologic regression models. Here a Monte Carlo simulation is developed to compare four techniques to address multicollinearity: OLS, OLS with variance inflation factor screening (VIF), principal component regression (PCR), and partial least squares regression (PLS). The performance of these four techniques was observed for varying sample sizes, correlation coefficients between the explanatory variables, and model error variances consistent with hydrologic regional regression models. The negative effects of multicollinearity are magnified at smaller sample sizes, higher correlations between the variables, and larger model error variances (smaller R2). The Monte Carlo simulation indicates that if the true model is known, multicollinearity is present, and the estimation and statistical testing of regression parameters are of interest, then PCR or PLS should be employed. If the model is unknown, or if the interest is solely on model predictions, is it recommended that OLS be employed since using more complicated techniques did not produce any improvement in model performance. A leave-one-out cross-validation case study was also performed using low-streamflow data sets from the eastern United States. Results indicate that OLS with stepwise selection generally produces models across study regions with varying levels of multicollinearity that are as good as biased regression techniques such as PCR and PLS.
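Two of the techniques compared above, VIF screening and principal component regression, can be illustrated briefly on hypothetical basin characteristics. The variable names, data, and component count are invented for illustration, and scikit-learn/numpy availability is assumed.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(6)

# Hypothetical basin characteristics with strong collinearity between slope and log(area).
n = 40
area = rng.lognormal(3, 0.5, n)
precip = rng.normal(1000, 150, n)
slope = 0.8 * np.log(area) + rng.normal(0, 0.1, n)
X = np.column_stack([np.log(area), precip, slope])
y = 0.6 * np.log(area) + 0.002 * precip + rng.normal(0, 0.2, n)

def vif(X):
    """Variance inflation factor: 1 / (1 - R^2) from regressing each column on the rest."""
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

print("VIFs:", np.round(vif(X), 1))    # large values flag collinear predictors

# Principal component regression: regress y on a reduced set of orthogonal components.
pcr = make_pipeline(StandardScaler(), PCA(n_components=2), LinearRegression()).fit(X, y)
print("PCR R^2:", round(pcr.score(X, y), 3))
```

A VIF well above the commonly used screening thresholds signals the kind of inflated-variance problem described above, while PCR sidesteps it by fitting on orthogonal components at the cost of less interpretable coefficients.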
Gibson, Scott M; Ficklin, Stephen P; Isaacson, Sven; Luo, Feng; Feltus, Frank A; Smith, Melissa C
2013-01-01
The study of gene relationships and their effect on biological function and phenotype is a focal point in systems biology. Gene co-expression networks built using microarray expression profiles are one technique for discovering and interpreting gene relationships. A knowledge-independent thresholding technique, such as Random Matrix Theory (RMT), is useful for identifying meaningful relationships. Highly connected genes in the thresholded network are then grouped into modules that provide insight into their collective functionality. While it has been shown that co-expression networks are biologically relevant, it has not been determined to what extent any given network is functionally robust given perturbations in the input sample set. For such a test, hundreds of networks are needed, and hence a tool to rapidly construct these networks is required. To examine functional robustness of networks with varying input, we enhanced an existing RMT implementation for improved scalability and tested functional robustness of human (Homo sapiens), rice (Oryza sativa) and budding yeast (Saccharomyces cerevisiae). We demonstrate a dramatic decrease in network construction time and computational requirements and show that despite some variation in global properties between networks, functional similarity remains high. Moreover, the biological function captured by co-expression networks thresholded by RMT is highly robust.
Xu, Yuan; Ding, Kun; Huo, Chunlei; Zhong, Zisha; Li, Haichang; Pan, Chunhong
2015-01-01
Very high resolution (VHR) image change detection is challenging due to the low discriminative ability of change feature and the difficulty of change decision in utilizing the multilevel contextual information. Most change feature extraction techniques put emphasis on the change degree description (i.e., in what degree the changes have happened), while they ignore the change pattern description (i.e., how the changes changed), which is of equal importance in characterizing the change signatures. Moreover, the simultaneous consideration of the classification robust to the registration noise and the multiscale region-consistent fusion is often neglected in change decision. To overcome such drawbacks, in this paper, a novel VHR image change detection method is proposed based on sparse change descriptor and robust discriminative dictionary learning. Sparse change descriptor combines the change degree component and the change pattern component, which are encoded by the sparse representation error and the morphological profile feature, respectively. Robust change decision is conducted by multiscale region-consistent fusion, which is implemented by the superpixel-level cosparse representation with robust discriminative dictionary and the conditional random field model. Experimental results confirm the effectiveness of the proposed change detection technique. PMID:25918748
O'Connell, Allan F.; Gardner, Beth; Oppel, Steffen; Meirinho, Ana; Ramírez, Iván; Miller, Peter I.; Louzao, Maite
2012-01-01
Knowledge about the spatial distribution of seabirds at sea is important for conservation. During marine conservation planning, logistical constraints preclude seabird surveys covering the complete area of interest and spatial distribution of seabirds is frequently inferred from predictive statistical models. Increasingly complex models are available to relate the distribution and abundance of pelagic seabirds to environmental variables, but a comparison of their usefulness for delineating protected areas for seabirds is lacking. Here we compare the performance of five modelling techniques (generalised linear models, generalised additive models, Random Forest, boosted regression trees, and maximum entropy) to predict the distribution of Balearic Shearwaters (Puffinus mauretanicus) along the coast of the western Iberian Peninsula. We used ship transect data from 2004 to 2009 and 13 environmental variables to predict occurrence and density, and evaluated predictive performance of all models using spatially segregated test data. Predicted distribution varied among the different models, although predictive performance varied little. An ensemble prediction that combined results from all five techniques was robust and confirmed the existence of marine important bird areas for Balearic Shearwaters in Portugal and Spain. Our predictions suggested additional areas that would be of high priority for conservation and could be proposed as protected areas. Abundance data were extremely difficult to predict, and none of five modelling techniques provided a reliable prediction of spatial patterns. We advocate the use of ensemble modelling that combines the output of several methods to predict the spatial distribution of seabirds, and use these predictions to target separate surveys assessing the abundance of seabirds in areas of regular use.
Huang, Lei
2015-01-01
To solve the problem that conventional ARMA modeling methods for gyro random noise require a large number of samples and converge slowly, an ARMA modeling method using robust Kalman filtering is developed. The ARMA model parameters are employed as state arguments. Unknown time-varying estimators of the observation noise are used to obtain the estimated mean and variance of the observation noise. Using robust Kalman filtering, the ARMA model parameters are estimated accurately. The developed ARMA modeling method has the advantages of rapid convergence and high accuracy; thus, the required sample size is reduced. It can be applied in modeling applications for gyro random noise where a fast and accurate ARMA modeling method is required. PMID:26437409
NASA Astrophysics Data System (ADS)
de Andrés, Javier; Landajo, Manuel; Lorca, Pedro; Labra, Jose; Ordóñez, Patricia
Artificial neural networks have proven to be useful tools for solving financial analysis problems such as financial distress prediction and audit risk assessment. In this paper we focus on the performance of robust (least absolute deviation-based) neural networks on measuring liquidity of firms. The problem of learning the bivariate relationship between the components (namely, current liabilities and current assets) of the so-called current ratio is analyzed, and the predictive performance of several modelling paradigms (namely, linear and log-linear regressions, classical ratios and neural networks) is compared. An empirical analysis is conducted on a representative data base from the Spanish economy. Results indicate that classical ratio models are largely inadequate as a realistic description of the studied relationship, especially when used for predictive purposes. In a number of cases, especially when the analyzed firms are microenterprises, the linear specification is improved by considering the flexible non-linear structures provided by neural networks.
Recovering Galaxy Properties Using Gaussian Process SED Fitting
NASA Astrophysics Data System (ADS)
Iyer, Kartheik; Awan, Humna
2018-01-01
Information about physical quantities like the stellar mass, star formation rates, and ages for distant galaxies is contained in their spectral energy distributions (SEDs), obtained through photometric surveys like SDSS, CANDELS, LSST, etc. However, noise in the photometric observations is often a problem, and using naive machine learning methods to estimate physical quantities can result in overfitting the noise, or converging on solutions that lie outside the physical regime of parameter space. We use Gaussian Process regression trained on a sample of SEDs corresponding to galaxies from a Semi-Analytic model (Somerville+15a) to estimate their stellar masses, and compare its performance to a variety of different methods, including simple linear regression, Random Forests, and k-Nearest Neighbours. We find that the Gaussian Process method is robust to noise and predicts not only stellar masses but also their uncertainties. The method is also robust in the cases where the distribution of the training data is not identical to the target data, which can be extremely useful when generalized to more subtle galaxy properties.
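A minimal sketch of Gaussian Process regression that returns both predictions and per-object uncertainties, using scikit-learn on synthetic photometry; the toy band fluxes and kernel choice are assumptions for illustration and do not reproduce the semi-analytic training set.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)

# Toy stand-in for SED photometry: 5 noisy "bands" whose fluxes scale with a
# latent log stellar mass (purely illustrative, not the SAM training set).
n_train, n_bands = 500, 5
log_mass = rng.uniform(8, 11, n_train)
bands = np.outer(log_mass, rng.uniform(0.5, 1.5, n_bands))
bands += 0.3 * rng.standard_normal(bands.shape)          # photometric noise

kernel = RBF(length_scale=5.0) + WhiteKernel(noise_level=0.1)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gpr.fit(bands, log_mass)

# Predictions come with per-object uncertainties (posterior standard deviation).
test = bands[:10] + 0.3 * rng.standard_normal((10, n_bands))
mass_pred, mass_std = gpr.predict(test, return_std=True)
for m, s in zip(mass_pred, mass_std):
    print(f"log M* = {m:.2f} +/- {s:.2f}")
```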
DOE Office of Scientific and Technical Information (OSTI.GOV)
Warren, Samantha, E-mail: samantha.warren@oncology.ox.ac.uk; Partridge, Mike; Bolsi, Alessandra
Purpose: Planning studies to compare x-ray and proton techniques and to select the most suitable technique for each patient have been hampered by the nonequivalence of several aspects of treatment planning and delivery. A fair comparison should compare similarly advanced delivery techniques from current clinical practice and also assess the robustness of each technique. The present study therefore compared volumetric modulated arc therapy (VMAT) and single-field optimization (SFO) spot scanning proton therapy plans created using a simultaneous integrated boost (SIB) for dose escalation in midesophageal cancer and analyzed the effect of setup and range uncertainties on these plans. Methods and Materials: For 21 patients, SIB plans with a physical dose prescription of 2 Gy or 2.5 Gy/fraction in 25 fractions to planning target volume (PTV)50Gy or PTV62.5Gy (primary tumor with 0.5 cm margins) were created and evaluated for robustness to random setup errors and proton range errors. Dose–volume metrics were compared for the optimal and uncertainty plans, with P<.05 (Wilcoxon) considered significant. Results: SFO reduced the mean lung dose by 51.4% (range 35.1%-76.1%) and the mean heart dose by 40.9% (range 15.0%-57.4%) compared with VMAT. Proton plan robustness to a 3.5% range error was acceptable. For all patients, the clinical target volume D98 was 95.0% to 100.4% of the prescribed dose and gross tumor volume (GTV) D98 was 98.8% to 101%. Setup error robustness was patient anatomy dependent, and the potential minimum dose per fraction was always lower with SFO than with VMAT. The clinical target volume D98 was lower by 0.6% to 7.8% of the prescribed dose, and the GTV D98 was lower by 0.3% to 2.2% of the prescribed GTV dose. Conclusions: The SFO plans achieved significant sparing of normal tissue compared with the VMAT plans for midesophageal cancer. The target dose coverage in the SIB proton plans was less robust to random setup errors and might be unacceptable for certain patients. Robust optimization to ensure adequate target coverage of SIB proton plans might be beneficial.
Warren, Samantha; Partridge, Mike; Bolsi, Alessandra; Lomax, Anthony J.; Hurt, Chris; Crosby, Thomas; Hawkins, Maria A.
2016-01-01
Purpose Planning studies to compare x-ray and proton techniques and to select the most suitable technique for each patient have been hampered by the nonequivalence of several aspects of treatment planning and delivery. A fair comparison should compare similarly advanced delivery techniques from current clinical practice and also assess the robustness of each technique. The present study therefore compared volumetric modulated arc therapy (VMAT) and single-field optimization (SFO) spot scanning proton therapy plans created using a simultaneous integrated boost (SIB) for dose escalation in midesophageal cancer and analyzed the effect of setup and range uncertainties on these plans. Methods and Materials For 21 patients, SIB plans with a physical dose prescription of 2 Gy or 2.5 Gy/fraction in 25 fractions to planning target volume (PTV)50Gy or PTV62.5Gy (primary tumor with 0.5 cm margins) were created and evaluated for robustness to random setup errors and proton range errors. Dose–volume metrics were compared for the optimal and uncertainty plans, with P<.05 (Wilcoxon) considered significant. Results SFO reduced the mean lung dose by 51.4% (range 35.1%-76.1%) and the mean heart dose by 40.9% (range 15.0%-57.4%) compared with VMAT. Proton plan robustness to a 3.5% range error was acceptable. For all patients, the clinical target volume D98 was 95.0% to 100.4% of the prescribed dose and gross tumor volume (GTV) D98 was 98.8% to 101%. Setup error robustness was patient anatomy dependent, and the potential minimum dose per fraction was always lower with SFO than with VMAT. The clinical target volume D98 was lower by 0.6% to 7.8% of the prescribed dose, and the GTV D98 was lower by 0.3% to 2.2% of the prescribed GTV dose. Conclusions The SFO plans achieved significant sparing of normal tissue compared with the VMAT plans for midesophageal cancer. The target dose coverage in the SIB proton plans was less robust to random setup errors and might be unacceptable for certain patients. Robust optimization to ensure adequate target coverage of SIB proton plans might be beneficial. PMID:27084641
Warren, Samantha; Partridge, Mike; Bolsi, Alessandra; Lomax, Anthony J; Hurt, Chris; Crosby, Thomas; Hawkins, Maria A
2016-05-01
Planning studies to compare x-ray and proton techniques and to select the most suitable technique for each patient have been hampered by the nonequivalence of several aspects of treatment planning and delivery. A fair comparison should compare similarly advanced delivery techniques from current clinical practice and also assess the robustness of each technique. The present study therefore compared volumetric modulated arc therapy (VMAT) and single-field optimization (SFO) spot scanning proton therapy plans created using a simultaneous integrated boost (SIB) for dose escalation in midesophageal cancer and analyzed the effect of setup and range uncertainties on these plans. For 21 patients, SIB plans with a physical dose prescription of 2 Gy or 2.5 Gy/fraction in 25 fractions to planning target volume (PTV)50Gy or PTV62.5Gy (primary tumor with 0.5 cm margins) were created and evaluated for robustness to random setup errors and proton range errors. Dose-volume metrics were compared for the optimal and uncertainty plans, with P<.05 (Wilcoxon) considered significant. SFO reduced the mean lung dose by 51.4% (range 35.1%-76.1%) and the mean heart dose by 40.9% (range 15.0%-57.4%) compared with VMAT. Proton plan robustness to a 3.5% range error was acceptable. For all patients, the clinical target volume D98 was 95.0% to 100.4% of the prescribed dose and gross tumor volume (GTV) D98 was 98.8% to 101%. Setup error robustness was patient anatomy dependent, and the potential minimum dose per fraction was always lower with SFO than with VMAT. The clinical target volume D98 was lower by 0.6% to 7.8% of the prescribed dose, and the GTV D98 was lower by 0.3% to 2.2% of the prescribed GTV dose. The SFO plans achieved significant sparing of normal tissue compared with the VMAT plans for midesophageal cancer. The target dose coverage in the SIB proton plans was less robust to random setup errors and might be unacceptable for certain patients. Robust optimization to ensure adequate target coverage of SIB proton plans might be beneficial. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
Real-Time PCR Quantification Using A Variable Reaction Efficiency Model
Platts, Adrian E.; Johnson, Graham D.; Linnemann, Amelia K.; Krawetz, Stephen A.
2008-01-01
Quantitative real-time PCR remains a cornerstone technique in gene expression analysis and sequence characterization. Despite the importance of the approach to experimental biology, the confident assignment of reaction efficiency to the early cycles of real-time PCR reactions remains problematic. Considerable noise may be generated where few cycles in the amplification are available to estimate peak efficiency. An alternate approach that uses data from beyond the log-linear amplification phase is explored with the aim of reducing noise and adding confidence to efficiency estimates. PCR reaction efficiency is regressed to estimate the per-cycle profile of an asymptotically departed peak efficiency, even when this is not closely approximated in the measurable cycles. The process can be repeated over replicates to develop a robust estimate of peak reaction efficiency. This leads to an estimate of the maximum reaction efficiency that may be considered primer-design specific. Using a series of biological scenarios, we demonstrate that this approach can provide an accurate estimate of initial template concentration. PMID:18570886
An Annotation Agnostic Algorithm for Detecting Nascent RNA Transcripts in GRO-Seq.
Azofeifa, Joseph G; Allen, Mary A; Lladser, Manuel E; Dowell, Robin D
2017-01-01
We present a fast and simple algorithm to detect nascent RNA transcription in global nuclear run-on sequencing (GRO-seq). GRO-seq is a relatively new protocol that captures nascent transcripts from actively engaged polymerase, providing a direct read-out on bona fide transcription. Most traditional assays, such as RNA-seq, measure steady state RNA levels which are affected by transcription, post-transcriptional processing, and RNA stability. GRO-seq data, however, presents unique analysis challenges that are only beginning to be addressed. Here, we describe a new algorithm, Fast Read Stitcher (FStitch), that takes advantage of two popular machine-learning techniques, hidden Markov models and logistic regression, to classify which regions of the genome are transcribed. Given a small user-defined training set, our algorithm is accurate, robust to varying read depth, annotation agnostic, and fast. Analysis of GRO-seq data without a priori need for annotation uncovers surprising new insights into several aspects of the transcription process.
2015-06-04
[Fragment] This work addresses problems that involve physics coupling with phase change in the simulation of 3D deep convection. The VMS+DC approach is shown to be a robust technique that can damp the high-order modes characterizing the spectral element method. Keywords: Spectral Elements, Deep Convection, Kessler Microphysics. Preprint, J. Comput. Phys. 283 (2015) 360-373.
Design, innovation, and rural creative places: Are the arts the cherry on top, or the secret sauce?
Wojan, Timothy R; Nichols, Bonnie
2018-01-01
Creative class theory explains the positive relationship between the arts and commercial innovation as the mutual attraction of artists and other creative workers by an unobserved creative milieu. This study explores alternative theories for rural settings, by analyzing establishment-level survey data combined with data on the local arts scene. The study identifies the local contextual factors associated with a strong design orientation, and estimates the impact that a strong design orientation has on the local economy. Data on innovation and design come from a nationally representative sample of establishments in tradable industries. Latent class analysis allows identifying unobserved subpopulations comprised of establishments with different design and innovation orientations. Logistic regression allows estimating the association between an establishment's design orientation and local contextual factors. A quantile instrumental variable regression allows assessing the robustness of the logistic regression results with respect to endogeneity. An estimate of design orientation at the local level derived from the survey is used to examine variation in economic performance during the period of recovery from the Great Recession (2010-2014). Three distinct innovation (substantive, nominal, and non-innovators) and design orientations (design-integrated, "design last finish," and no systematic approach to design) are identified. Innovation- and design-intensive establishments were identified in both rural and urban areas. Rural design-integrated establishments tended to locate in counties with more highly educated workforces and containing at least one performing arts organization. A quantile instrumental variable regression confirmed that the logistic regression result is robust to endogeneity concerns. Finally, rural areas characterized by design-integrated establishments experienced faster growth in wages relative to rural areas characterized by establishments using no systematic approach to design.
NASA Astrophysics Data System (ADS)
Verrelst, Jochem; Rivera, J. P.; Alonso, L.; Guanter, L.; Moreno, J.
2012-04-01
ESA’s upcoming satellites Sentinel-2 (S2) and Sentinel-3 (S3) aim to ensure continuity for Landsat 5/7, SPOT-5, SPOT-Vegetation and Envisat MERIS observations by providing superspectral images of high spatial and temporal resolution. S2 and S3 will deliver near real-time operational products with a high accuracy for land monitoring. This unprecedented data availability leads to an urgent need for developing robust and accurate retrieval methods. Machine learning regression algorithms could be powerful candidates for the estimation of biophysical parameters from satellite reflectance measurements because of their ability to perform adaptive, nonlinear data fitting. By using data from the ESA-led field campaign SPARC (Barrax, Spain), it was recently found [1] that Gaussian process regression (GPR) outperformed competitive machine learning algorithms such as neural networks, support vector regression, and kernel ridge regression both in terms of accuracy and computational speed. For various Sentinel configurations (S2-10m, S2-20m, S2-60m and S3-300m) three important biophysical parameters were estimated: leaf chlorophyll content (Chl), leaf area index (LAI) and fractional vegetation cover (FVC). GPR was the only method that reached the 10% precision required by end users in the estimation of Chl. In view of implementing the regressor into operational monitoring applications, here the portability of locally trained GPR models to other images was evaluated. The associated confidence maps proved to be a good indicator for evaluating the robustness of the trained models. Consistent retrievals were obtained across the different images, particularly over agricultural sites. To make the method suitable for operational use, however, the poorer confidences over bare soil areas suggest that the training dataset should be expanded with inputs from various land cover types.
Design, innovation, and rural creative places: Are the arts the cherry on top, or the secret sauce?
Nichols, Bonnie
2018-01-01
Objective Creative class theory explains the positive relationship between the arts and commercial innovation as the mutual attraction of artists and other creative workers by an unobserved creative milieu. This study explores alternative theories for rural settings, by analyzing establishment-level survey data combined with data on the local arts scene. The study identifies the local contextual factors associated with a strong design orientation, and estimates the impact that a strong design orientation has on the local economy. Method Data on innovation and design come from a nationally representative sample of establishments in tradable industries. Latent class analysis allows identifying unobserved subpopulations comprised of establishments with different design and innovation orientations. Logistic regression allows estimating the association between an establishment’s design orientation and local contextual factors. A quantile instrumental variable regression allows assessing the robustness of the logistic regression results with respect to endogeneity. An estimate of design orientation at the local level derived from the survey is used to examine variation in economic performance during the period of recovery from the Great Recession (2010–2014). Results Three distinct innovation (substantive, nominal, and non-innovators) and design orientations (design-integrated, “design last finish,” and no systematic approach to design) are identified. Innovation- and design-intensive establishments were identified in both rural and urban areas. Rural design-integrated establishments tended to locate in counties with more highly educated workforces and containing at least one performing arts organization. A quantile instrumental variable regression confirmed that the logistic regression result is robust to endogeneity concerns. Finally, rural areas characterized by design-integrated establishments experienced faster growth in wages relative to rural areas characterized by establishments using no systematic approach to design. PMID:29489884
Model reference tracking control of an aircraft: a robust adaptive approach
NASA Astrophysics Data System (ADS)
Tanyer, Ilker; Tatlicioglu, Enver; Zergeroglu, Erkan
2017-05-01
This work presents the design and the corresponding analysis of a nonlinear robust adaptive controller for model reference tracking of an aircraft that has parametric uncertainties in its system matrices and additive state- and/or time-dependent nonlinear disturbance-like terms in its dynamics. Specifically, robust integral of the sign of the error feedback term and an adaptive term is fused with a proportional integral controller. Lyapunov-based stability analysis techniques are utilised to prove global asymptotic convergence of the output tracking error. Extensive numerical simulations are presented to illustrate the performance of the proposed robust adaptive controller.
Alwee, Razana; Hj Shamsuddin, Siti Mariyam; Sallehuddin, Roselina
2013-01-01
Crimes forecasting is an important area in the field of criminology. Linear models, such as regression and econometric models, are commonly applied in crime forecasting. However, in real crimes data, it is common that the data consists of both linear and nonlinear components. A single model may not be sufficient to identify all the characteristics of the data. The purpose of this study is to introduce a hybrid model that combines support vector regression (SVR) and autoregressive integrated moving average (ARIMA) to be applied in crime rates forecasting. SVR is very robust with small training data and high-dimensional problems. Meanwhile, ARIMA has the ability to model several types of time series. However, the accuracy of the SVR model depends on the values of its parameters, while ARIMA is not robust when applied to small data sets. Therefore, to overcome this problem, particle swarm optimization is used to estimate the parameters of the SVR and ARIMA models. The proposed hybrid model is used to forecast the property crime rates of the United States based on economic indicators. The experimental results show that the proposed hybrid model is able to produce more accurate forecasting results as compared to the individual models. PMID:23766729
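A hedged sketch of the hybrid idea: an ARIMA model captures the linear component and an SVR fitted to the ARIMA residuals captures the nonlinear component. A small grid search stands in for the particle swarm optimisation of the model parameters, and the synthetic series, model orders, and lag count are illustrative assumptions.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from sklearn.svm import SVR

rng = np.random.default_rng(2)

# Synthetic "crime rate" series with a linear trend plus a nonlinear component.
t = np.arange(120)
y = 50 + 0.2 * t + 3 * np.sin(t / 6.0) + rng.normal(0, 1, t.size)
train, test = y[:100], y[100:]

# 1) ARIMA captures the linear structure.
arima = ARIMA(train, order=(1, 1, 1)).fit()
linear_fc = arima.forecast(steps=test.size)
resid = arima.resid

# 2) SVR models the nonlinear structure left in the ARIMA residuals, using
#    lagged residuals as inputs (grid search here is a simple stand-in for the
#    particle swarm optimisation used in the paper).
p = 4
X = np.column_stack([resid[i:len(resid) - p + i] for i in range(p)])
z = resid[p:]
best, best_err = None, np.inf
for C in (1.0, 10.0, 100.0):
    for eps in (0.01, 0.1):
        m = SVR(C=C, epsilon=eps).fit(X[:-10], z[:-10])
        err = np.mean((m.predict(X[-10:]) - z[-10:]) ** 2)
        if err < best_err:
            best, best_err = m, err

# 3) Combine: recursive residual forecasts added to the ARIMA forecast.
hist = list(resid[-p:])
resid_fc = []
for _ in range(test.size):
    r = best.predict(np.array(hist[-p:]).reshape(1, -1))[0]
    resid_fc.append(r)
    hist.append(r)
hybrid_fc = linear_fc + np.array(resid_fc)
print("ARIMA MSE :", np.mean((linear_fc - test) ** 2))
print("Hybrid MSE:", np.mean((hybrid_fc - test) ** 2))
```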
Alwee, Razana; Shamsuddin, Siti Mariyam Hj; Sallehuddin, Roselina
2013-01-01
Crimes forecasting is an important area in the field of criminology. Linear models, such as regression and econometric models, are commonly applied in crime forecasting. However, in real crimes data, it is common that the data consists of both linear and nonlinear components. A single model may not be sufficient to identify all the characteristics of the data. The purpose of this study is to introduce a hybrid model that combines support vector regression (SVR) and autoregressive integrated moving average (ARIMA) to be applied in crime rates forecasting. SVR is very robust with small training data and high-dimensional problems. Meanwhile, ARIMA has the ability to model several types of time series. However, the accuracy of the SVR model depends on the values of its parameters, while ARIMA is not robust when applied to small data sets. Therefore, to overcome this problem, particle swarm optimization is used to estimate the parameters of the SVR and ARIMA models. The proposed hybrid model is used to forecast the property crime rates of the United States based on economic indicators. The experimental results show that the proposed hybrid model is able to produce more accurate forecasting results as compared to the individual models.
Risk prediction for myocardial infarction via generalized functional regression models.
Ieva, Francesca; Paganoni, Anna M
2016-08-01
In this paper, we propose a generalized functional linear regression model for a binary outcome indicating the presence/absence of a cardiac disease with multivariate functional data among the relevant predictors. In particular, the motivating aim is the analysis of electrocardiographic traces of patients whose pre-hospital electrocardiogram (ECG) has been sent to 118 Dispatch Center of Milan (the Italian toll-free number for emergencies) by life support personnel of the basic rescue units. The statistical analysis starts with a preprocessing of ECGs treated as multivariate functional data. The signals are reconstructed from noisy observations. The biological variability is then removed by a nonlinear registration procedure based on landmarks. Thus, in order to perform a data-driven dimensional reduction, a multivariate functional principal component analysis is carried out on the variance-covariance matrix of the reconstructed and registered ECGs and their first derivatives. We use the scores of the Principal Components decomposition as covariates in a generalized linear model to predict the presence of the disease in a new patient. Hence, a new semi-automatic diagnostic procedure is proposed to estimate the risk of infarction (in the case of interest, the probability of being affected by Left Bundle Branch Block). The performance of this classification method is evaluated and compared with other methods proposed in the literature. Finally, the robustness of the procedure is checked via leave-j-out techniques. © The Author(s) 2013.
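A minimal sketch of the dimension-reduction-plus-GLM pipeline on synthetic curves: principal component scores of discretised traces (a crude stand-in for multivariate functional PCA of registered ECGs and their derivatives) feed a logistic regression that predicts the binary outcome. The toy waveforms and the number of components are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)

# Toy stand-in for registered ECG traces: 200 curves sampled on 150 time points;
# diseased curves (label 1) have a slightly altered waveform shape.
n, m = 200, 150
t = np.linspace(0, 1, m)
labels = rng.integers(0, 2, n)
curves = np.array([np.sin(2 * np.pi * t) + 0.4 * lab * np.sin(6 * np.pi * t)
                   + 0.3 * rng.standard_normal(m) for lab in labels])

# Data-driven dimension reduction: principal component scores of the curves
# (a discretised analogue of multivariate functional PCA).
pca = PCA(n_components=5)
scores = pca.fit_transform(curves)

# Generalised linear model (logistic) on the scores predicts disease status.
clf = LogisticRegression(max_iter=1000)
acc = cross_val_score(clf, scores, labels, cv=5).mean()
print("cross-validated accuracy on PC scores:", round(acc, 3))
```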
Ono, Daiki; Bamba, Takeshi; Oku, Yuichi; Yonetani, Tsutomu; Fukusaki, Eiichiro
2011-09-01
In this study, we constructed prediction models by metabolic fingerprinting of fresh green tea leaves using Fourier transform near-infrared (FT-NIR) spectroscopy and partial least squares (PLS) regression analysis to objectively optimize the steaming process conditions in green tea manufacture. The steaming process is the most important step for manufacturing high quality green tea products. However, the parameter setting of the steamer is currently determined subjectively by the manufacturer. Therefore, a simple and robust system that can be used to objectively set the steaming process parameters is necessary. We focused on FT-NIR spectroscopy because of its simple operation, quick measurement, and low running costs. After removal of noise in the spectral data by principal component analysis (PCA), PLS regression analysis was performed using spectral information as independent variables, and the steaming parameters set by experienced manufacturers as dependent variables. The prediction models were successfully constructed with satisfactory accuracy. Moreover, the results of the demonstration experiment suggested that the green tea steaming process parameters could be predicted on a larger manufacturing scale. This technique will contribute to improvement of the quality and productivity of green tea because it can objectively optimize the complicated green tea steaming process and will be suitable for practical use in green tea manufacture. Copyright © 2011 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.
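A minimal sketch of the chemometric core, calibrating a partial least squares regression on synthetic NIR-like spectra against a continuous steaming parameter; the simulated bands, noise level, and number of latent components are assumptions, and the PCA denoising step described above is omitted for brevity.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)

# Toy NIR-like spectra: 120 samples x 400 wavelengths, with two broad bands
# whose intensities depend on a latent "steaming parameter" to be predicted.
n, w = 120, 400
grid = np.linspace(0, 1, w)
param = rng.uniform(0, 10, n)                      # steaming setting (target)
band1 = np.exp(-((grid - 0.3) ** 2) / 0.002)
band2 = np.exp(-((grid - 0.7) ** 2) / 0.004)
spectra = (np.outer(param, band1) + np.outer(10 - param, band2)
           + 0.05 * rng.standard_normal((n, w)))

X_tr, X_te, y_tr, y_te = train_test_split(spectra, param, random_state=0)
pls = PLSRegression(n_components=4)
pls.fit(X_tr, y_tr)
print("R^2 on held-out spectra:", round(pls.score(X_te, y_te), 3))
```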
NASA Astrophysics Data System (ADS)
Xiao, Fan; Chen, Zhijun; Chen, Jianguo; Zhou, Yongzhang
2016-05-01
In this study, a novel batch sliding window (BSW) based singularity mapping approach is proposed. Compared with the traditional sliding window (SW) technique, which suffers from the empirical predetermination of a fixed maximum window size and from the outlier sensitivity of the least-squares (LS) linear regression method, the BSW based singularity mapping approach can automatically determine the optimal size of the largest window for each estimated position, and utilizes robust linear regression (RLR), which is insensitive to outlier values. In the case study, tin geochemical data from Gejiu, Yunnan, were processed by the BSW based singularity mapping approach. The results show that the BSW approach can improve the accuracy of the calculated singularity exponent values because the optimal maximum window size is determined. The use of the RLR method in the BSW approach smooths the distribution of singularity index values, with few or none of the highly fluctuating, noise-like values that usually make a singularity map rough and discontinuous. Furthermore, the Student's t-statistic diagram indicates a strong spatial correlation between high geochemical anomalies and known tin polymetallic deposits. The target areas within the high tin geochemical anomalies probably have much higher potential for the exploration of new tin polymetallic deposits than other areas, particularly those areas that show strong tin geochemical anomalies but in which no tin polymetallic deposits have yet been found.
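A hedged sketch of the robust-regression step only: the singularity exponent at one location is estimated from the scaling of window-averaged concentrations versus window size, comparing an ordinary least-squares fit with a robust (Huber) fit in statsmodels. The synthetic grid, the fixed set of window sizes, and the 2D scaling relation log C(eps) = c + (alpha - 2) log eps are assumptions; the batch sliding window logic for choosing the optimal maximum window is not implemented here.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)

# Toy geochemical grid with a multiplicative anomaly at the centre plus an outlier.
size = 64
grid = rng.lognormal(mean=0.0, sigma=0.3, size=(size, size))
cx = cy = size // 2
yy, xx = np.mgrid[0:size, 0:size]
r2 = (xx - cx) ** 2 + (yy - cy) ** 2
grid *= 1.0 + 5.0 * np.exp(-r2 / 40.0)       # local enrichment (singularity)
grid[cx + 1, cy + 2] *= 30.0                 # a single outlier cell

# Mean concentration within square windows of half-width k around the centre.
half_widths = np.arange(1, 12)
means = [grid[cx - k:cx + k + 1, cy - k:cy + k + 1].mean() for k in half_widths]

# log C(eps) = c + (alpha - 2) * log eps  ->  slope gives the singularity index.
X = sm.add_constant(np.log(2 * half_widths + 1.0))
y = np.log(means)
ols = sm.OLS(y, X).fit()
rlm = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()
print("alpha (ordinary LS)    :", 2 + ols.params[1])
print("alpha (robust fit, RLR):", 2 + rlm.params[1])
```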
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tatiana G. Levitskaia; James M. Peterson; Emily L. Campbell
2013-12-01
In liquid–liquid extraction separation processes, accumulation of organic solvent degradation products is detrimental to the process robustness, and frequent solvent analysis is warranted. Our research explores the feasibility of online monitoring of the organic solvents relevant to used nuclear fuel reprocessing. This paper describes the first phase of developing a system for monitoring the tributyl phosphate (TBP)/n-dodecane solvent commonly used to separate used nuclear fuel. In this investigation, the effect of extraction of nitric acid from aqueous solutions of variable concentrations on the quantification of TBP and its major degradation product dibutylphosphoric acid (HDBP) was assessed. Fourier transform infrared (FTIR) spectroscopy was used to discriminate between HDBP and TBP in the nitric acid-containing TBP/n-dodecane solvent. Multivariate analysis of the spectral data facilitated the development of regression models for HDBP and TBP quantification in real time, enabling online implementation of the monitoring system. The predictive regression models were validated using TBP/n-dodecane solvent samples subjected to high-dose external γ-irradiation. The predictive models were translated to flow conditions using a hollow fiber FTIR probe installed in a centrifugal contactor extraction apparatus, demonstrating the applicability of the FTIR technique coupled with multivariate analysis for the online monitoring of the organic solvent degradation products.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Levitskaia, Tatiana G.; Peterson, James M.; Campbell, Emily L.
2013-11-05
In liquid-liquid extraction separation processes, accumulation of organic solvent degradation products is detrimental to the process robustness and frequent solvent analysis is warranted. Our research explores the feasibility of online monitoring of the organic solvents relevant to used nuclear fuel reprocessing. This paper describes the first phase of developing a system for monitoring the tributyl phosphate (TBP)/n-dodecane solvent commonly used to separate used nuclear fuel. In this investigation, the effect of extraction of nitric acid from aqueous solutions of variable concentrations on the quantification of TBP and its major degradation product dibutyl phosphoric acid (HDBP) was assessed. Fourier transform infrared (FTIR) spectroscopy was used to discriminate between HDBP and TBP in the nitric acid-containing TBP/n-dodecane solvent. Multivariate analysis of the spectral data facilitated the development of regression models for HDBP and TBP quantification in real time, enabling online implementation of the monitoring system. The predictive regression models were validated using TBP/n-dodecane solvent samples subjected to high-dose external gamma irradiation. The predictive models were translated to flow conditions using a hollow fiber FTIR probe installed in a centrifugal contactor extraction apparatus, demonstrating the applicability of the FTIR technique coupled with multivariate analysis for the online monitoring of the organic solvent degradation products.
Regression techniques for oceanographic parameter retrieval using space-borne microwave radiometry
NASA Technical Reports Server (NTRS)
Hofer, R.; Njoku, E. G.
1981-01-01
Variations of conventional multiple regression techniques are applied to the problem of remote sensing of oceanographic parameters from space. The techniques are specifically adapted to the scanning multichannel microwave radiometer (SMMR) launched on the Seasat and Nimbus 7 satellites to determine ocean surface temperature, wind speed, and atmospheric water content. The retrievals are studied primarily from a theoretical viewpoint, to illustrate the retrieval error structure, the relative importance of different radiometer channels, and the tradeoffs between spatial resolution and retrieval accuracy. Comparisons between regressions using simulated and actual SMMR data are discussed; they show similar behavior.
Aeroelastic Model Structure Computation for Envelope Expansion
NASA Technical Reports Server (NTRS)
Kukreja, Sunil L.
2007-01-01
Structure detection is a procedure for selecting a subset of candidate terms, from a full model description, that best describes the observed output. This is a necessary procedure to compute an efficient system description which may afford greater insight into the functionality of the system or a simpler controller design. Structure computation as a tool for black-box modelling may be of critical importance in the development of robust, parsimonious models for the flight-test community. Moreover, this approach may lead to efficient strategies for rapid envelope expansion which may save significant development time and costs. In this study, a least absolute shrinkage and selection operator (LASSO) technique is investigated for computing efficient model descriptions of nonlinear aeroelastic systems. The LASSO minimises the residual sum of squares by the addition of an l(sub 1) penalty term on the parameter vector of the traditional l(sub 2) minimisation problem. Its use for structure detection is a natural extension of this constrained minimisation approach to pseudolinear regression problems which produces some model parameters that are exactly zero and, therefore, yields a parsimonious system description. Applicability of this technique for model structure computation for the F/A-18 Active Aeroelastic Wing using flight test data is shown for several flight conditions (Mach numbers) by identifying a parsimonious system description with a high percent fit for cross-validated data.
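A minimal sketch of LASSO-based structure detection on a synthetic linear-in-the-parameters problem: the l1 penalty drives the coefficients of spurious candidate terms exactly to zero, whereas the unpenalised fit retains every term. The candidate set, penalty weight, and noise level are assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(6)

# Full candidate model: 12 regressors, but only 3 terms are truly in the system.
n, p = 300, 12
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[[0, 3, 7]] = [1.5, -2.0, 0.8]
y = X @ beta_true + 0.2 * rng.standard_normal(n)

# The l2-only fit keeps every candidate term; the l1 penalty zeroes the spurious ones.
ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=0.05).fit(X, y)
print("OLS nonzero terms  :", np.sum(np.abs(ols.coef_) > 1e-6))
print("LASSO nonzero terms:", np.sum(np.abs(lasso.coef_) > 1e-6))
print("selected indices   :", np.flatnonzero(np.abs(lasso.coef_) > 1e-6))
```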
A review on the applications of portable near-infrared spectrometers in the agro-food industry.
dos Santos, Cláudia A Teixeira; Lopo, Miguel; Páscoa, Ricardo N M J; Lopes, João A
2013-11-01
Industry has created the need for a cost-effective and nondestructive quality-control analysis system. This requirement has increased interest in near-infrared (NIR) spectroscopy, leading to the development and marketing of handheld devices that enable new applications that can be implemented in situ. Portable NIR spectrometers are powerful instruments offering several advantages for nondestructive, online, or in situ analysis: small size, low cost, robustness, simplicity of analysis, simple user interface, portability, and ergonomic design. Several studies of on-site NIR applications are presented: characterization of internal and external parameters of fruits and vegetables; conservation state and fat content of meat and fish; distinguishing among and quality evaluation of beverages and dairy products; protein content of cereals; evaluation of grape ripeness in vineyards; and soil analysis. Chemometrics is an essential part of NIR spectroscopy manipulation because wavelength-dependent scattering effects, instrumental noise, ambient effects, and other sources of variability may complicate the spectra. As a consequence, it is difficult to assign specific absorption bands to specific functional groups. To achieve useful and meaningful results, multivariate statistical techniques (essentially involving regression techniques coupled with spectral preprocessing) are therefore required to extract the information hidden in the spectra. This work reviews the evolution of the use of portable near-infrared spectrometers in the agro-food industry.
Pollen-based continental climate reconstructions at 6 and 21 ka: A global synthesis
Bartlein, P.J.; Harrison, S.P.; Brewer, Sandra; Connor, S.; Davis, B.A.S.; Gajewski, K.; Guiot, J.; Harrison-Prentice, T. I.; Henderson, A.; Peyron, O.; Prentice, I.C.; Scholze, M.; Seppa, H.; Shuman, B.; Sugita, S.; Thompson, R.S.; Viau, A.E.; Williams, J.; Wu, H.
2011-01-01
Subfossil pollen and plant macrofossil data derived from 14C-dated sediment profiles can provide quantitative information on glacial and interglacial climates. The data allow climate variables related to growing-season warmth, winter cold, and plant-available moisture to be reconstructed. Continental-scale reconstructions have been made for the mid-Holocene (MH, around 6 ka) and Last Glacial Maximum (LGM, around 21 ka), allowing comparison with palaeoclimate simulations currently being carried out as part of the Fifth Assessment Report (AR5) of the Intergovernmental Panel on Climate Change. The synthesis of the available MH and LGM climate reconstructions and their uncertainties, obtained using modern-analogue, regression and model-inversion techniques, is presented for four temperature variables and two moisture variables. Reconstructions of the same variables based on surface-pollen assemblages are shown to be accurate and unbiased. Reconstructed LGM and MH climate anomaly patterns are coherent, consistent between variables, and robust with respect to the choice of technique. They support a conceptual model of the controls of Late Quaternary climate change whereby the first-order effects of orbital variations and greenhouse forcing on the seasonal cycle of temperature are predictably modified by responses of the atmospheric circulation and surface energy balance. © 2010 The Author(s).
Feedback system design with an uncertain plant
NASA Technical Reports Server (NTRS)
Milich, D.; Valavani, L.; Athans, M.
1986-01-01
A method is developed to design a fixed-parameter compensator for a linear, time-invariant, SISO (single-input single-output) plant model characterized by significant structured, as well as unstructured, uncertainty. The controller minimizes the H(infinity) norm of the worst-case sensitivity function over the operating band and the resulting feedback system exhibits robust stability and robust performance. It is conjectured that such a robust nonadaptive control design technique can be used on-line in an adaptive control system.
Research in robust control for hypersonic aircraft
NASA Technical Reports Server (NTRS)
Calise, A. J.
1993-01-01
The research during the second reporting period has focused on robust control design for hypersonic vehicles. An already existing design for the Hypersonic Winged-Cone Configuration has been enhanced. Uncertainty models for the effects of propulsion system perturbations due to angle of attack variations, structural vibrations, and uncertainty in control effectiveness were developed. Using H(sub infinity) and mu-synthesis techniques, various control designs were performed in order to investigate the impact of these effects on achievable robust performance.
Sparse coding for flexible, robust 3D facial-expression synthesis.
Lin, Yuxu; Song, Mingli; Quynh, Dao Thi Phuong; He, Ying; Chen, Chun
2012-01-01
Computer animation researchers have been extensively investigating 3D facial-expression synthesis for decades. However, flexible, robust production of realistic 3D facial expressions is still technically challenging. A proposed modeling framework applies sparse coding to synthesize 3D expressive faces, using specified coefficients or expression examples. It also robustly recovers facial expressions from noisy and incomplete data. This approach can synthesize higher-quality expressions in less time than the state-of-the-art techniques.
Covariate selection with group lasso and doubly robust estimation of causal effects
Koch, Brandon; Vock, David M.; Wolfson, Julian
2017-01-01
Summary The efficiency of doubly robust estimators of the average causal effect (ACE) of a treatment can be improved by including in the treatment and outcome models only those covariates which are related to both treatment and outcome (i.e., confounders) or related only to the outcome. However, it is often challenging to identify such covariates among the large number that may be measured in a given study. In this paper, we propose GLiDeR (Group Lasso and Doubly Robust Estimation), a novel variable selection technique for identifying confounders and predictors of outcome using an adaptive group lasso approach that simultaneously performs coefficient selection, regularization, and estimation across the treatment and outcome models. The selected variables and corresponding coefficient estimates are used in a standard doubly robust ACE estimator. We provide asymptotic results showing that, for a broad class of data generating mechanisms, GLiDeR yields a consistent estimator of the ACE when either the outcome or treatment model is correctly specified. A comprehensive simulation study shows that GLiDeR is more efficient than doubly robust methods using standard variable selection techniques and has substantial computational advantages over a recently proposed doubly robust Bayesian model averaging method. We illustrate our method by estimating the causal treatment effect of bilateral versus single-lung transplant on forced expiratory volume in one year after transplant using an observational registry. PMID:28636276
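A hedged sketch of the downstream estimator: lasso-based covariate selection (a plain stand-in for GLiDeR's adaptive group lasso across both models) followed by a standard augmented inverse-probability-weighted, doubly robust estimate of the ACE. The simulated data-generating mechanism and working-model choices are assumptions.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LogisticRegression, LinearRegression

rng = np.random.default_rng(7)

# Simulated observational data: 10 covariates, binary treatment, true ACE = 2.
n, p = 2000, 10
X = rng.standard_normal((n, p))
ps_true = 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))
A = rng.binomial(1, ps_true)
Y = 2.0 * A + 1.0 * X[:, 0] + 0.7 * X[:, 2] + rng.standard_normal(n)

# Variable selection on the outcome model (plain lasso here, as a simple
# stand-in for GLiDeR's adaptive group lasso across both models).
sel = np.flatnonzero(np.abs(LassoCV(cv=5).fit(X, Y).coef_) > 1e-3)
Xs = X[:, sel]

# Fit working models on the selected covariates.
ps = LogisticRegression(max_iter=1000).fit(Xs, A).predict_proba(Xs)[:, 1]
mu1 = LinearRegression().fit(Xs[A == 1], Y[A == 1]).predict(Xs)
mu0 = LinearRegression().fit(Xs[A == 0], Y[A == 0]).predict(Xs)

# Augmented inverse-probability-weighted (doubly robust) ACE estimate.
aipw = np.mean(A * (Y - mu1) / ps + mu1) - np.mean((1 - A) * (Y - mu0) / (1 - ps) + mu0)
print("doubly robust ACE estimate:", round(aipw, 3))   # should be near 2.0
```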
Covariate selection with group lasso and doubly robust estimation of causal effects.
Koch, Brandon; Vock, David M; Wolfson, Julian
2018-03-01
The efficiency of doubly robust estimators of the average causal effect (ACE) of a treatment can be improved by including in the treatment and outcome models only those covariates which are related to both treatment and outcome (i.e., confounders) or related only to the outcome. However, it is often challenging to identify such covariates among the large number that may be measured in a given study. In this article, we propose GLiDeR (Group Lasso and Doubly Robust Estimation), a novel variable selection technique for identifying confounders and predictors of outcome using an adaptive group lasso approach that simultaneously performs coefficient selection, regularization, and estimation across the treatment and outcome models. The selected variables and corresponding coefficient estimates are used in a standard doubly robust ACE estimator. We provide asymptotic results showing that, for a broad class of data generating mechanisms, GLiDeR yields a consistent estimator of the ACE when either the outcome or treatment model is correctly specified. A comprehensive simulation study shows that GLiDeR is more efficient than doubly robust methods using standard variable selection techniques and has substantial computational advantages over a recently proposed doubly robust Bayesian model averaging method. We illustrate our method by estimating the causal treatment effect of bilateral versus single-lung transplant on forced expiratory volume in one year after transplant using an observational registry. © 2017, The International Biometric Society.
Robust Crossfeed Design for Hovering Rotorcraft
NASA Technical Reports Server (NTRS)
Catapang, David R.
1993-01-01
Control law design for rotorcraft fly-by-wire systems normally attempts to decouple angular responses using fixed-gain crossfeeds. This approach can lead to poor decoupling over the frequency range of pilot inputs and increase the load on the feedback loops. In order to improve the decoupling performance, dynamic crossfeeds may be adopted. Moreover, because of the large changes that occur in rotorcraft dynamics due to small changes about the nominal design condition, especially for near-hovering flight, the crossfeed design must be 'robust'. A new low-order matching method is presented here to design robust crossfeed compensators for multi-input, multi-output (MIMO) systems. The technique identifies degrees-of-freedom that can be decoupled using crossfeeds, given an anticipated set of parameter variations for the range of flight conditions of concern. Cross-coupling is then reduced for degrees-of-freedom that can use crossfeed compensation by minimizing off-axis response magnitude average and variance. Results are presented for the analysis of pitch, roll, yaw and heave coupling of the UH-60 Black Hawk helicopter in near-hovering flight. Robust crossfeeds are designed that show significant improvement in decoupling performance and robustness over nominal, single design point, compensators. The design method and results are presented in an easily used graphical format that lends significant physical insight to the design procedure. This plant pre-compensation technique is an appropriate preliminary step to the design of robust feedback control laws for rotorcraft.
Nonparametric methods for doubly robust estimation of continuous treatment effects.
Kennedy, Edward H; Ma, Zongming; McHugh, Matthew D; Small, Dylan S
2017-09-01
Continuous treatments (e.g., doses) arise often in practice, but many available causal effect estimators are limited by either requiring parametric models for the effect curve, or by not allowing doubly robust covariate adjustment. We develop a novel kernel smoothing approach that requires only mild smoothness assumptions on the effect curve, and still allows for misspecification of either the treatment density or outcome regression. We derive asymptotic properties and give a procedure for data-driven bandwidth selection. The methods are illustrated via simulation and in a study of the effect of nurse staffing on hospital readmissions penalties.
1987-03-01
some general results and definitions from robust statistics (see Hampel et al., 1986). The influence function of an M-estimator defined by a score function psi has the standard form IC(y, x; B) = -E0[psi'(y, x, B)]^(-1) psi(y, x, B); in particular, the influence function for (3.7) and (3.8) is equal to -E0[psi'_cond(y, X, B)]^(-1) psi_cond(y, x, B). (4.3) On the other hand, the influence function for the
A Survey of UML Based Regression Testing
NASA Astrophysics Data System (ADS)
Fahad, Muhammad; Nadeem, Aamer
Regression testing is the process of ensuring software quality by analyzing whether changed parts behave as intended and unchanged parts are not affected by the modifications. Since it is a costly process, many techniques have been proposed in the research literature to show testers how to build a regression test suite from an existing test suite at minimum cost. In this paper, we discuss the advantages and drawbacks of using UML diagrams for regression testing and argue that the UML model helps in identifying changes for regression test selection effectively. We survey the existing UML based regression testing techniques and provide an analysis matrix to give a quick insight into prominent features of the literature. We also discuss the open research issues that remain to be addressed for UML based regression testing, such as managing and reducing the size of the regression test suite and prioritizing test cases under tight schedules and limited resources.
A robust nonlinear filter for image restoration.
Koivunen, V
1995-01-01
A class of nonlinear regression filters based on robust estimation theory is introduced. The goal of the filtering is to recover a high-quality image from degraded observations. Models for desired image structures and contaminating processes are employed, but deviations from strict assumptions are allowed since the assumptions on signal and noise are typically only approximately true. The robustness of filters is usually addressed only in a distributional sense, i.e., the actual error distribution deviates from the nominal one. In this paper, the robustness is considered in a broad sense since the outliers may also be due to inappropriate signal model, or there may be more than one statistical population present in the processing window, causing biased estimates. Two filtering algorithms minimizing a least trimmed squares criterion are provided. The design of the filters is simple since no scale parameters or context-dependent threshold values are required. Experimental results using both real and simulated data are presented. The filters effectively attenuate both impulsive and nonimpulsive noise while recovering the signal structure and preserving interesting details.
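A minimal sketch of a least-trimmed-squares location filter on a 1D signal, assuming a fixed window width and trimming fraction: each output value is the mean of the h in-window samples whose squared deviations about their own mean are smallest, which attenuates impulsive outliers while preserving edges. The test signal and the parameter values are illustrative only.

```python
import numpy as np

def lts_location(window, h):
    """Least trimmed squares location estimate: the mean of the h samples whose
    squared deviations from their own mean have the smallest sum. The optimal
    h-subset is a contiguous block of the sorted window."""
    s = np.sort(window)
    best_mean, best_score = s[:h].mean(), np.inf
    for i in range(len(s) - h + 1):
        block = s[i:i + h]
        score = np.sum((block - block.mean()) ** 2)
        if score < best_score:
            best_score, best_mean = score, block.mean()
    return best_mean

def lts_filter(signal, width=9, trim=0.3):
    h = int(np.ceil((1 - trim) * width))
    pad = width // 2
    padded = np.pad(signal, pad, mode="edge")
    return np.array([lts_location(padded[i:i + width], h)
                     for i in range(len(signal))])

rng = np.random.default_rng(8)
x = np.linspace(0, 4 * np.pi, 300)
clean = np.sign(np.sin(x))                              # piecewise-constant structure
noisy = clean + 0.1 * rng.standard_normal(x.size)
noisy[rng.choice(x.size, 15, replace=False)] += 3.0     # impulsive outliers
print("RMSE noisy     :", np.sqrt(np.mean((noisy - clean) ** 2)).round(3))
print("RMSE LTS output:", np.sqrt(np.mean((lts_filter(noisy) - clean) ** 2)).round(3))
```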
Jabeen, Safia; Mehmood, Zahid; Mahmood, Toqeer; Saba, Tanzila; Rehman, Amjad; Mahmood, Muhammad Tariq
2018-01-01
For the last three decades, content-based image retrieval (CBIR) has been an active research area, representing a viable solution for retrieving similar images from an image repository. In this article, we propose a novel CBIR technique based on the visual words fusion of speeded-up robust features (SURF) and fast retina keypoint (FREAK) feature descriptors. SURF is a sparse descriptor whereas FREAK is a dense descriptor. Moreover, SURF is a scale and rotation-invariant descriptor that performs better in the case of repeatability, distinctiveness, and robustness. It is robust to noise, detection errors, geometric, and photometric deformations. It also performs better at low illumination within an image as compared to the FREAK descriptor. In contrast, FREAK is a retina-inspired speedy descriptor that performs better for classification-based problems as compared to the SURF descriptor. Experimental results show that the proposed technique based on the visual words fusion of SURF-FREAK descriptors combines the features of both descriptors and resolves the aforementioned issues. The qualitative and quantitative analysis performed on three image collections, namely Corel-1000, Corel-1500, and Caltech-256, shows that proposed technique based on visual words fusion significantly improved the performance of the CBIR as compared to the feature fusion of both descriptors and state-of-the-art image retrieval techniques. PMID:29694429
Jabeen, Safia; Mehmood, Zahid; Mahmood, Toqeer; Saba, Tanzila; Rehman, Amjad; Mahmood, Muhammad Tariq
2018-01-01
For the last three decades, content-based image retrieval (CBIR) has been an active research area, representing a viable solution for retrieving similar images from an image repository. In this article, we propose a novel CBIR technique based on the visual words fusion of speeded-up robust features (SURF) and fast retina keypoint (FREAK) feature descriptors. SURF is a sparse descriptor whereas FREAK is a dense descriptor. Moreover, SURF is a scale and rotation-invariant descriptor that performs better in the case of repeatability, distinctiveness, and robustness. It is robust to noise, detection errors, geometric, and photometric deformations. It also performs better at low illumination within an image as compared to the FREAK descriptor. In contrast, FREAK is a retina-inspired speedy descriptor that performs better for classification-based problems as compared to the SURF descriptor. Experimental results show that the proposed technique based on the visual words fusion of SURF-FREAK descriptors combines the features of both descriptors and resolves the aforementioned issues. The qualitative and quantitative analysis performed on three image collections, namely Corel-1000, Corel-1500, and Caltech-256, shows that proposed technique based on visual words fusion significantly improved the performance of the CBIR as compared to the feature fusion of both descriptors and state-of-the-art image retrieval techniques.
Use of fish embryo toxicity tests for the prediction of acute fish toxicity to chemicals.
Belanger, Scott E; Rawlings, Jane M; Carr, Gregory J
2013-08-01
The fish embryo test (FET) is a potential animal alternative for the acute fish toxicity (AFT) test. A comprehensive validation program assessed 20 different chemicals to understand intra- and interlaboratory variability for the FET. The FET had sufficient reproducibility across a range of potencies and modes of action. In the present study, the suitability of the FET as an alternative model is reviewed by relating FET and AFT. In total, 985 FET studies and 1531 AFT studies were summarized. The authors performed FET-AFT regressions to understand potential relationships based on physical-chemical properties, species choices, duration of exposure, chemical classes, chemical functional uses, and modes of action. The FET-AFT relationships are very robust (slopes near 1.0, intercepts near 0) across 9 orders of magnitude in potency. A recommendation for the predictive regression relationship is based on 96-h FET and AFT data: log FET median lethal concentration (LC50) = (0.989 × log fish LC50) - 0.195; n = 72 chemicals, r = 0.95, p < 0.001, LC50 in mg/L. A similar, not statistically different regression was developed for the entire data set (n = 144 chemicals, unreliable studies deleted). The FET-AFT regressions were robust for major chemical classes with suitably large data sets. Furthermore, regressions were similar to those for large groups of functional chemical categories such as pesticides, surfactants, and industrial organics. Pharmaceutical regressions (n = 8 studies only) were directionally correct. The FET-AFT relationships were not quantitatively different from acute fish-acute fish toxicity relationships with the following species: fathead minnow, rainbow trout, bluegill sunfish, Japanese medaka, and zebrafish. The FET is scientifically supportable as a rational animal alternative model for ecotoxicological testing of acute toxicity of chemicals to fish. Copyright © 2013 SETAC.
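The recommended predictive relationship can be applied directly; the small helper below uses the coefficients quoted above (log FET LC50 = 0.989 × log fish LC50 − 0.195, LC50 in mg/L). The inverse mapping from FET to fish LC50 is an added convenience for illustration, not part of the published regression.

```python
import math

# Recommended 96-h relationship from the abstract (LC50 values in mg/L):
#   log10(FET LC50) = 0.989 * log10(fish LC50) - 0.195
def fet_from_aft(fish_lc50_mg_per_l):
    return 10 ** (0.989 * math.log10(fish_lc50_mg_per_l) - 0.195)

def aft_from_fet(fet_lc50_mg_per_l):
    # inverse of the same regression, useful when only FET data are available
    return 10 ** ((math.log10(fet_lc50_mg_per_l) + 0.195) / 0.989)

for lc50 in (0.01, 1.0, 100.0):
    print(f"fish LC50 {lc50:g} mg/L -> predicted FET LC50 {fet_from_aft(lc50):.3g} mg/L")
print(f"FET LC50 2.0 mg/L -> predicted fish LC50 {aft_from_fet(2.0):.3g} mg/L")
```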
2010-01-01
Background There is growing concern in communities surrounding airports regarding the contribution of various emission sources (such as aircraft and ground support equipment) to nearby ambient concentrations. We used extensive monitoring of nitrogen dioxide (NO2) in neighborhoods surrounding T.F. Green Airport in Warwick, RI, and land-use regression (LUR) modeling techniques to determine the impact of proximity to the airport and local traffic on these concentrations. Methods Palmes diffusion tube samplers were deployed along the airport's fence line and within surrounding neighborhoods for one to two weeks. In total, 644 measurements were collected over three sampling campaigns (October 2007, March 2008 and June 2008) and each sampling location was geocoded. GIS-based variables were created as proxies for local traffic and airport activity. A forward stepwise regression methodology was employed to create general linear models (GLMs) of NO2 variability near the airport. The effect of local meteorology on associations with GIS-based variables was also explored. Results Higher concentrations of NO2 were seen near the airport terminal, entrance roads to the terminal, and near major roads, with qualitatively consistent spatial patterns between seasons. In our final multivariate model (R2 = 0.32), the local influences of highways and arterial/collector roads were statistically significant, as were local traffic density and distance to the airport terminal (all p < 0.001). Local meteorology did not significantly affect associations with principal GIS variables, and the regression model structure was robust to various model-building approaches. Conclusion Our study has shown that there are clear local variations in NO2 in the neighborhoods that surround an urban airport, which are spatially consistent across seasons. LUR modeling demonstrated a strong influence of local traffic, except the smallest roads that predominate in residential areas, as well as proximity to the airport terminal. PMID:21083910
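A minimal sketch of forward stepwise model building for a land-use regression, assuming synthetic GIS proxies and AIC as the entry criterion (the study's exact stepwise criterion is not reproduced here): predictors are added one at a time as long as they improve the fit.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(9)

# Toy site-level dataset: GIS-derived proxies for traffic and airport activity.
n = 200
df = pd.DataFrame({
    "traffic_density":  rng.gamma(2.0, 50.0, n),
    "dist_highway_m":   rng.uniform(20, 2000, n),
    "dist_terminal_m":  rng.uniform(100, 8000, n),
    "pop_density":      rng.gamma(2.0, 300.0, n),
})
no2 = (8 + 0.02 * df.traffic_density - 0.002 * df.dist_highway_m
       - 0.0008 * df.dist_terminal_m + rng.normal(0, 2, n))

# Forward stepwise selection by AIC: add the predictor that improves AIC most.
selected, remaining = [], list(df.columns)
best_aic = sm.OLS(no2, np.ones(n)).fit().aic
while remaining:
    trials = [(sm.OLS(no2, sm.add_constant(df[selected + [c]])).fit().aic, c)
              for c in remaining]
    aic, cand = min(trials)
    if aic >= best_aic:
        break
    best_aic = aic
    selected.append(cand)
    remaining.remove(cand)

final = sm.OLS(no2, sm.add_constant(df[selected])).fit()
print("selected predictors:", selected)
print("R^2:", round(final.rsquared, 2))
```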
Adamkiewicz, Gary; Hsu, Hsiao-Hsien; Vallarino, Jose; Melly, Steven J; Spengler, John D; Levy, Jonathan I
2010-11-17
There is growing concern in communities surrounding airports regarding the contribution of various emission sources (such as aircraft and ground support equipment) to nearby ambient concentrations. We used extensive monitoring of nitrogen dioxide (NO2) in neighborhoods surrounding T.F. Green Airport in Warwick, RI, and land-use regression (LUR) modeling techniques to determine the impact of proximity to the airport and local traffic on these concentrations. Palmes diffusion tube samplers were deployed along the airport's fence line and within surrounding neighborhoods for one to two weeks. In total, 644 measurements were collected over three sampling campaigns (October 2007, March 2008 and June 2008) and each sampling location was geocoded. GIS-based variables were created as proxies for local traffic and airport activity. A forward stepwise regression methodology was employed to create general linear models (GLMs) of NO2 variability near the airport. The effect of local meteorology on associations with GIS-based variables was also explored. Higher concentrations of NO2 were seen near the airport terminal, entrance roads to the terminal, and near major roads, with qualitatively consistent spatial patterns between seasons. In our final multivariate model (R2 = 0.32), the local influences of highways and arterial/collector roads were statistically significant, as were local traffic density and distance to the airport terminal (all p < 0.001). Local meteorology did not significantly affect associations with principal GIS variables, and the regression model structure was robust to various model-building approaches. Our study has shown that there are clear local variations in NO2 in the neighborhoods that surround an urban airport, which are spatially consistent across seasons. LUR modeling demonstrated a strong influence of local traffic, except the smallest roads that predominate in residential areas, as well as proximity to the airport terminal.
Evaluating uses of data mining techniques in propensity score estimation: a simulation study.
Setoguchi, Soko; Schneeweiss, Sebastian; Brookhart, M Alan; Glynn, Robert J; Cook, E Francis
2008-06-01
In propensity score modeling, it is a standard practice to optimize the prediction of exposure status based on the covariate information. In a simulation study, we examined in what situations analyses based on various types of exposure propensity score (EPS) models using data mining techniques such as recursive partitioning (RP) and neural networks (NN) produce unbiased and/or efficient results. We simulated data for a hypothetical cohort study (n = 2000) with a binary exposure/outcome and 10 binary/continuous covariates with seven scenarios differing by non-linear and/or non-additive associations between exposure and covariates. EPS models used logistic regression (LR) (all possible main effects), RP1 (without pruning), RP2 (with pruning), and NN. We calculated c-statistics (C), standard errors (SE), and bias of exposure-effect estimates from outcome models for the PS-matched dataset. Data mining techniques yielded higher C than LR (mean: NN, 0.86; RP1, 0.79; RP2, 0.72; and LR, 0.76). SE tended to be greater in models with higher C. Overall bias was small for each strategy, although NN estimates tended to be the least biased. C was not correlated with the magnitude of bias (correlation coefficient [COR] = -0.3, p = 0.1) but was correlated with increased SE (COR = 0.7, p < 0.001). Effect estimates from EPS models by simple LR were generally robust. NN models generally provided the least numerically biased estimates. C was not associated with the magnitude of bias but was associated with increased SE.
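A minimal sketch of the baseline strategy, assuming simulated cohort data: a main-effects logistic regression estimates the exposure propensity score, the c-statistic summarises its discrimination, and 1:1 nearest-neighbour matching on the score yields a matched risk difference. The data-generating model and matching details are illustrative, and the RP/NN comparators are omitted.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(10)

# Simulated cohort: 10 covariates, binary exposure, binary outcome.
n, p = 2000, 10
X = rng.standard_normal((n, p))
expo = rng.binomial(1, 1 / (1 + np.exp(-(0.6 * X[:, 0] + 0.4 * X[:, 1] * X[:, 2]))))
out = rng.binomial(1, 1 / (1 + np.exp(-(np.log(2) * expo + 0.5 * X[:, 0] - 2))))

# Exposure propensity score by main-effects logistic regression.
eps_model = LogisticRegression(max_iter=1000).fit(X, expo)
ps = eps_model.predict_proba(X)[:, 1]
print("c-statistic of the EPS model:", round(roc_auc_score(expo, ps), 3))

# 1:1 nearest-neighbour matching (with replacement) of exposed to unexposed.
treated, control = np.flatnonzero(expo == 1), np.flatnonzero(expo == 0)
nn = NearestNeighbors(n_neighbors=1).fit(ps[control].reshape(-1, 1))
_, idx = nn.kneighbors(ps[treated].reshape(-1, 1))
matched_control = control[idx.ravel()]

risk_t = out[treated].mean()
risk_c = out[matched_control].mean()
print("matched risk difference:", round(risk_t - risk_c, 3))
```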
Mahlke, C; Hernando, D; Jahn, C; Cigliano, A; Ittermann, T; Mössler, A; Kromrey, ML; Domaska, G; Reeder, SB; Kühn, JP
2016-01-01
Purpose To investigate the feasibility of estimating the proton-density fat fraction (PDFF) using a 7.1 Tesla magnetic resonance imaging (MRI) system and to compare the accuracy of liver fat quantification using different fitting approaches. Materials and Methods Fourteen leptin-deficient ob/ob mice and eight intact controls were examined in a 7.1 Tesla animal scanner using a 3-dimensional six-echo chemical shift-encoded pulse sequence. Confounder-corrected PDFF was calculated using magnitude (magnitude data alone) and combined fitting (complex and magnitude data). Differences between fitting techniques were compared using Bland-Altman analysis. In addition, PDFFs derived with both reconstructions were correlated with histopathological fat content and triglyceride mass fraction using linear regression analysis. Results The PDFFs determined with use of both reconstructions correlated very strongly (r=0.91). However, small mean bias between reconstructions demonstrated divergent results (3.9%; CI 2.7%-5.1%). For both reconstructions, there was linear correlation with histopathology (combined fitting: r=0.61; magnitude fitting: r=0.64) and triglyceride content (combined fitting: r=0.79; magnitude fitting: r=0.70). Conclusion Liver fat quantification using the PDFF derived from MRI performed at 7.1 Tesla is feasible. PDFF has strong correlations with histopathologically determined fat and with triglyceride content. However, small differences between PDFF reconstruction techniques may impair the robustness and reliability of the biomarker at 7.1 Tesla. PMID:27197806
Robust passivity analysis for discrete-time recurrent neural networks with mixed delays
NASA Astrophysics Data System (ADS)
Huang, Chuan-Kuei; Shu, Yu-Jeng; Chang, Koan-Yuh; Shou, Ho-Nien; Lu, Chien-Yu
2015-02-01
This article considers the robust passivity analysis for a class of discrete-time recurrent neural networks (DRNNs) with mixed time-delays and uncertain parameters. The mixed time-delays that consist of both the discrete time-varying and distributed time-delays in a given range are presented, and the uncertain parameters are norm-bounded. The activation functions are assumed to be globally Lipschitz continuous. Based on a new bounding technique and an appropriate type of Lyapunov functional, a sufficient condition is investigated to guarantee the existence of the desired robust passivity condition for the DRNNs, which can be derived in terms of a family of linear matrix inequalities (LMIs). Some free-weighting matrices are introduced to reduce the conservatism of the criterion by using the bounding technique. A numerical example is given to illustrate the effectiveness and applicability of the proposed approach.
Statistics based sampling for controller and estimator design
NASA Astrophysics Data System (ADS)
Tenne, Dirk
The purpose of this research is the development of statistical design tools for robust feed-forward/feedback controllers and nonlinear estimators. This dissertation is threefold and addresses the aforementioned topics: nonlinear estimation, target tracking and robust control. To develop statistically robust controllers and nonlinear estimation algorithms, research has been performed to extend existing techniques, which propagate the statistics of the state, to achieve higher order accuracy. The so-called unscented transformation has been extended to capture higher order moments. Furthermore, higher order moment update algorithms based on a truncated power series have been developed. The proposed techniques are tested on various benchmark examples. Furthermore, the unscented transformation has been utilized to develop a three dimensional geometrically constrained target tracker. The proposed planar circular prediction algorithm has been developed in a local coordinate framework, which is amenable to extension of the tracking algorithm to three dimensional space. This tracker combines the predictions of a circular prediction algorithm and a constant velocity filter by utilizing the Covariance Intersection. This combined prediction can be updated with the subsequent measurement using a linear estimator. The proposed technique is illustrated on a 3D benchmark trajectory, which includes coordinated turns and straight line maneuvers. The third part of this dissertation addresses the design of controllers which include knowledge of parametric uncertainties and their distributions. The parameter distributions are approximated by a finite set of points which are calculated by the unscented transformation. This set of points is used to design robust controllers which minimize a statistical performance measure of the plant over the domain of uncertainty, consisting of a combination of the mean and variance. The proposed technique is illustrated on three benchmark problems. The first relates to the design of prefilters for a linear and nonlinear spring-mass-dashpot system and the second applies a feedback controller to a hovering helicopter. Lastly, the statistical robust controller design is devoted to a concurrent feed-forward/feedback controller structure for a high-speed low tension tape drive.
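The baseline tool extended in this dissertation, the standard unscented transformation, can be sketched as follows; this is only the ordinary second-order transform for propagating a mean and covariance through a nonlinear function, and the higher-order-moment extensions described above are not reproduced.

    # Sketch of the standard unscented transform: propagate the mean/covariance of x
    # through a nonlinear function f using 2n+1 sigma points. Baseline second-order
    # version only; kappa is a tuning parameter (kappa = 0 used here for simplicity).
    import numpy as np

    def unscented_transform(f, mean, cov, kappa=0.0):
        n = len(mean)
        L = np.linalg.cholesky((n + kappa) * cov)
        sigma = [mean] + [mean + L[:, i] for i in range(n)] + [mean - L[:, i] for i in range(n)]
        w = np.full(2 * n + 1, 1.0 / (2 * (n + kappa)))
        w[0] = kappa / (n + kappa)
        y = np.array([f(s) for s in sigma])
        y_mean = w @ y
        y_cov = sum(wi * np.outer(yi - y_mean, yi - y_mean) for wi, yi in zip(w, y))
        return y_mean, y_cov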
Ku, Yuen-Ching; Chan, Chun-Kit; Chen, Lian-Kuan
2007-06-15
We propose and experimentally demonstrate a novel in-band optical signal-to-noise ratio (OSNR) monitoring technique using a phase-modulator-embedded fiber loop mirror. This technique measures the in-band OSNR accurately by observing the output power of a fiber loop mirror filter, where the transmittance is adjusted by an embedded phase modulator driven by a low-frequency periodic signal. The measurement errors are less than 0.5 dB for an OSNR between 0 and 40 dB in a 10 Gbit/s non-return-to-zero system. This technique was also shown experimentally to be highly robust against various system impairments and highly feasible for practical deployment.
Balabin, Roman M; Smirnov, Sergey V
2011-07-15
Melamine (2,4,6-triamino-1,3,5-triazine) is a nitrogen-rich chemical implicated in the pet and human food recalls and in the global food safety scares involving milk products. Due to the serious health concerns associated with melamine consumption and the extensive scope of affected products, rapid and sensitive methods to detect melamine's presence are essential. We propose the use of spectroscopy data, produced by near-infrared (near-IR/NIR) and mid-infrared (mid-IR/MIR) spectroscopies in particular, for melamine detection in complex dairy matrixes. None of the IR-based methods for melamine detection reported to date has unambiguously shown wide applicability to different dairy products, or a limit of detection (LOD) below 1 ppm on an independent sample set. It was found that infrared spectroscopy is an effective tool to detect melamine in dairy products, such as infant formula, milk powder, or liquid milk. A LOD below 1 ppm (0.76±0.11 ppm) can be reached if a correct spectrum preprocessing (pretreatment) technique and a correct multivariate data analysis (MDA) algorithm (partial least squares regression (PLS), polynomial PLS (Poly-PLS), artificial neural network (ANN), support vector regression (SVR), or least squares support vector machine (LS-SVM)) are used for spectrum analysis. The relationship between the MIR/NIR spectrum of milk products and melamine content is nonlinear. Thus, nonlinear regression methods are needed to correctly predict the triazine-derivative content of milk products. It can be concluded that mid- and near-infrared spectroscopy can be regarded as a quick, sensitive, robust, and low-cost method for liquid milk, infant formula, and milk powder analysis. Copyright © 2011 Elsevier B.V. All rights reserved.
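As a hedged sketch of the linear baseline in this comparison, the following fits a PLS calibration to IR spectra with a simple Savitzky-Golay derivative as pretreatment; the window length, number of components and variable names are assumptions, and the nonlinear models (Poly-PLS, ANN, SVR, LS-SVM) favored by the study are not shown.

    # Sketch of a linear PLS calibration for melamine content from IR spectra,
    # using a Savitzky-Golay first derivative as spectrum pretreatment and
    # 10-fold cross-validation to estimate prediction error.
    import numpy as np
    from scipy.signal import savgol_filter
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import cross_val_predict

    # spectra: (n_samples, n_wavelengths) absorbance matrix; melamine_ppm: reference values
    def pls_baseline(spectra, melamine_ppm, n_components=10):
        X = savgol_filter(spectra, window_length=11, polyorder=2, deriv=1, axis=1)
        pls = PLSRegression(n_components=n_components)
        pred = cross_val_predict(pls, X, melamine_ppm, cv=10)
        rmsecv = np.sqrt(np.mean((pred.ravel() - melamine_ppm) ** 2))
        return rmsecv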
Thalagala, N
2015-11-01
The normative age ranges during which cohorts of children achieve milestones are called windows of achievement. The patterns of these windows of achievement are known to be both genetically and environmentally dependent. This study aimed to determine the windows of achievement for motor, social emotional, language and cognitive development milestones for infants and toddlers in Sri Lanka. A set of 293 milestones identified through a literature review were subjected to content validation using parent and expert reviews, which resulted in the selection of a revised set of 277 milestones. Thereafter, a sample of 1036 children from 2 months to 30 months was examined to see whether or not they had attained the selected milestones. Percentile ages of attaining milestones were determined using a rearranged closed form equation related to the logistic regression. The parameters required for calculations were derived through the logistic regression of milestone achievement statuses against ages of children. These percentile ages were used to define the respective windows of achievement. A set of 178 robust indicators that represent motor, socio emotional, language and cognitive development skills and their windows of achievement relevant to 2 to 24 months of age were determined. Windows of achievement for six gross motor milestones determined in the study were shown to closely overlap a similar set of windows of achievement published by the World Health Organization, indicating the validity of some findings. A methodology combining content validation based on qualitative techniques and age validation based on regression modelling was found to be effective for determining age percentiles for realizing milestones and determining respective windows of achievement. © 2015 John Wiley & Sons Ltd.
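The "rearranged closed form equation" can be illustrated with a simple single-predictor sketch: fit a logistic regression of milestone attainment on age, then invert it to obtain the age at which a chosen percentile of children attain the milestone. The paper's actual model specification may differ.

    # Sketch: estimate the age at which a given percentile of children attain a
    # milestone by inverting a fitted logistic regression of attainment on age.
    # Assumes a single-predictor model logit(p) = b0 + b1*age.
    import numpy as np
    import statsmodels.api as sm

    def percentile_age(ages, attained, percentiles=(0.03, 0.50, 0.97)):
        X = sm.add_constant(np.asarray(ages, dtype=float))
        fit = sm.Logit(attained, X).fit(disp=0)
        b0, b1 = fit.params
        return {p: (np.log(p / (1 - p)) - b0) / b1 for p in percentiles}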
The economic impact of state cigarette taxes and smoke-free air policies on convenience stores.
Huang, Jidong; Chaloupka, Frank J
2013-03-01
To investigate whether increasing state cigarette taxes and/or enacting stronger smoke-free air (SFA) policies have a negative impact on convenience store density in a state, a proxy that is determined by store openings and closings, which reflects store profits. State-level business count estimates for convenience stores for 50 states and District of Columbia from 1997 to 2009 were analysed using two-way fixed effects regression techniques that control for state-specific and year-specific determinants of convenience store density. The impact of tax and SFA policies was examined using a quasi-experimental research design that exploits changes in cigarette taxes and SFA policies within a state over time. Taxes are found to be uncorrelated with the density of combined convenience stores and gas stations in a state. Taxes are positively correlated with the density of convenience stores; however, the magnitude of this correlation is small, with a 10% increase in state cigarette taxes associated with a 0.19% (p<0.05) increase in the number of convenience stores per million people in a state. State-level SFA policies do not correlate with convenience store density in a state, regardless of whether gas stations were included. These results are robust across different model specifications. In addition, they are robust with regard to the inclusion/exclusion of other state-level tobacco control measures and gasoline prices. Contrary to tobacco industry and related organisations' claims, higher cigarette taxes and stronger SFA policies do not negatively affect convenience stores.
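A minimal sketch of a two-way fixed-effects specification of this kind is given below, with state and year dummies and standard errors clustered by state; the column names are placeholders and the published model includes additional controls.

    # Sketch: two-way fixed-effects regression of convenience-store density on
    # state cigarette taxes and a smoke-free-air policy index, with state and
    # year dummies and state-clustered standard errors. Column names are illustrative.
    import statsmodels.formula.api as smf

    def two_way_fe(panel):
        # panel: DataFrame with columns log_store_density, log_cig_tax, sfa_index, state, year
        model = smf.ols(
            "log_store_density ~ log_cig_tax + sfa_index + C(state) + C(year)",
            data=panel,
        ).fit(cov_type="cluster", cov_kwds={"groups": panel["state"]})
        return model.params[["log_cig_tax", "sfa_index"]]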
SKYNET: an efficient and robust neural network training tool for machine learning in astronomy
NASA Astrophysics Data System (ADS)
Graff, Philip; Feroz, Farhan; Hobson, Michael P.; Lasenby, Anthony
2014-06-01
We present the first public release of our generic neural network training algorithm, called SKYNET. This efficient and robust machine learning tool is able to train large and deep feed-forward neural networks, including autoencoders, for use in a wide range of supervised and unsupervised learning applications, such as regression, classification, density estimation, clustering and dimensionality reduction. SKYNET uses a `pre-training' method to obtain a set of network parameters that has empirically been shown to be close to a good solution, followed by further optimization using a regularized variant of Newton's method, where the level of regularization is determined and adjusted automatically; the latter uses second-order derivative information to improve convergence, but without the need to evaluate or store the full Hessian matrix, by using a fast approximate method to calculate Hessian-vector products. This combination of methods allows for the training of complicated networks that are difficult to optimize using standard backpropagation techniques. SKYNET employs convergence criteria that naturally prevent overfitting, and also includes a fast algorithm for estimating the accuracy of network outputs. The utility and flexibility of SKYNET are demonstrated by application to a number of toy problems, and to astronomical problems focusing on the recovery of structure from blurred and noisy images, the identification of gamma-ray bursters, and the compression and denoising of galaxy images. The SKYNET software, which is implemented in standard ANSI C and fully parallelized using MPI, is available at http://www.mrao.cam.ac.uk/software/skynet/.
Design and implementation of robust controllers for a gait trainer.
Wang, F C; Yu, C H; Chou, T Y
2009-08-01
This paper applies robust algorithms to control an active gait trainer for children with walking disabilities. Compared with traditional rehabilitation procedures, in which two or three trainers are required to assist the patient, a motor-driven mechanism was constructed to improve the efficiency of the procedures. First, a six-bar mechanism was designed and constructed to mimic the trajectory of children's ankles in walking. Second, system identification techniques were applied to obtain system transfer functions at different operating points by experiments. Third, robust control algorithms were used to design H-infinity robust controllers for the system. Finally, the designed controllers were implemented to verify experimentally the system performance. From the results, the proposed robust control strategies are shown to be effective.
Identification and robust control of an experimental servo motor.
Adam, E J; Guestrin, E D
2002-04-01
In this work, the design of a robust controller for an experimental laboratory-scale position control system based on a dc motor drive as well as the corresponding identification and robust stability analysis are presented. In order to carry out the robust design procedure, first, a classic closed-loop identification technique is applied and then, the parametrization by internal model control is used. The model uncertainty is evaluated under both parametric and global representation. For the latter case, an interesting discussion about the conservativeness of this description is presented by means of a comparison between the uncertainty disk and the critical perturbation radius approaches. Finally, conclusions about the performance of the experimental system with the robust controller are discussed using comparative graphics of the controlled variable and the Nyquist stability margin as a robustness measurement.
Li, Zukui; Ding, Ran; Floudas, Christodoulos A.
2011-01-01
Robust counterpart optimization techniques for linear optimization and mixed integer linear optimization problems are studied in this paper. Different uncertainty sets, including those studied in literature (i.e., interval set; combined interval and ellipsoidal set; combined interval and polyhedral set) and new ones (i.e., adjustable box; pure ellipsoidal; pure polyhedral; combined interval, ellipsoidal, and polyhedral set) are studied in this work and their geometric relationship is discussed. For uncertainty in the left hand side, right hand side, and objective function of the optimization problems, robust counterpart optimization formulations induced by those different uncertainty sets are derived. Numerical studies are performed to compare the solutions of the robust counterpart optimization models and applications in refinery production planning and batch process scheduling problem are presented. PMID:21935263
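As a small illustration of the interval (box) uncertainty set discussed above, the sketch below forms the robust counterpart of a single uncertain linear constraint and solves it as an ordinary LP; the data are invented and the formulation is the textbook box-uncertainty counterpart, not the paper's full model.

    # Sketch: interval (box) uncertainty-set robust counterpart of a single linear
    # constraint a@x <= b, where each coefficient can deviate by +/- a_hat. The
    # worst case adds a_hat@|x|, linearized with auxiliary variables u >= |x|.
    import numpy as np
    from scipy.optimize import linprog

    c = np.array([-3.0, -2.0])        # maximize 3*x1 + 2*x2  ->  minimize -(3*x1 + 2*x2)
    a = np.array([2.0, 1.0])          # nominal constraint coefficients
    a_hat = np.array([0.2, 0.1])      # maximum coefficient deviations
    b = 10.0

    # Decision vector z = [x1, x2, u1, u2]
    A_ub = np.array([
        np.concatenate([a, a_hat]),              # a@x + a_hat@u <= b
        [1, 0, -1, 0], [-1, 0, -1, 0],           # |x1| <= u1
        [0, 1, 0, -1], [0, -1, 0, -1],           # |x2| <= u2
    ])
    b_ub = np.array([b, 0, 0, 0, 0])
    res = linprog(np.concatenate([c, [0.0, 0.0]]), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * 2 + [(0, None)] * 2)
    print(res.x[:2])  # robust solution in the original variables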
Application of the Overclaiming Technique to Scholastic Assessment
ERIC Educational Resources Information Center
Paulhus, Delroy L.; Dubois, Patrick J.
2014-01-01
The overclaiming technique is a novel assessment procedure that uses signal detection analysis to generate indices of knowledge accuracy (OC-accuracy) and self-enhancement (OC-bias). The technique has previously shown robustness over varied knowledge domains as well as low reactivity across administration contexts. Here we compared the OC-accuracy…
Predicting the rate of change in timber value for forest stands infested with gypsy moth
David A. Gansner; Owen W. Herrick
1982-01-01
Presents a method for estimating the potential impact of gypsy moth attacks on forest-stand value. Robust regression analysis is used to develop an equation for predicting the rate of change in timber value from easy-to-measure key characteristics of stand condition.
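A hedged sketch of this kind of robust regression, using a Huber M-estimator from statsmodels, is shown below; the predictor names are illustrative and are not the stand characteristics used in the 1982 study.

    # Sketch of a robust (M-estimator) regression of the rate of change in stand
    # value on easy-to-measure stand characteristics. Predictor names are hypothetical.
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    def fit_value_change(stands):
        # stands: DataFrame with value_change_pct, basal_area, pct_preferred_species, site_index
        return smf.rlm(
            "value_change_pct ~ basal_area + pct_preferred_species + site_index",
            data=stands, M=sm.robust.norms.HuberT(),
        ).fit()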
Robust Mosaicking of Stereo Digital Elevation Models from the Ames Stereo Pipeline
NASA Technical Reports Server (NTRS)
Kim, Tae Min; Moratto, Zachary M.; Nefian, Ara Victor
2010-01-01
A robust estimation method is proposed to combine multiple observations and create consistent, accurate, dense Digital Elevation Models (DEMs) from lunar orbital imagery. The NASA Ames Intelligent Robotics Group (IRG) aims to produce higher-quality terrain reconstructions of the Moon from Apollo Metric Camera (AMC) data than is currently possible. In particular, IRG makes use of a stereo vision process, the Ames Stereo Pipeline (ASP), to automatically generate DEMs from consecutive AMC image pairs. However, the DEMs currently produced by the ASP often contain errors and inconsistencies due to image noise, shadows, etc. The proposed method addresses this problem by making use of multiple observations and by considering their goodness of fit to improve both the accuracy and robustness of the estimate. The stepwise regression method is applied to estimate the relaxed weight of each observation.
Modeling of a Robust Confidence Band for the Power Curve of a Wind Turbine.
Hernandez, Wilmar; Méndez, Alfredo; Maldonado-Correa, Jorge L; Balleteros, Francisco
2016-12-07
Having an accurate model of the power curve of a wind turbine allows us to better monitor its operation and planning of storage capacity. Since wind speed and direction are of a highly stochastic nature, the forecasting of the power generated by the wind turbine is of the same nature as well. In this paper, a method for obtaining a robust confidence band containing the power curve of a wind turbine under test conditions is presented. Here, the confidence band is bounded by two curves which are estimated using parametric statistical inference techniques. However, the observations that are used for carrying out the statistical analysis are obtained by using the binning method, and in each bin, the outliers are eliminated by using a censorship process based on robust statistical techniques. Then, the observations that are not outliers are divided into observation sets. Finally, both the power curve of the wind turbine and the two curves that define the robust confidence band are estimated using each of the previously mentioned observation sets.
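A rough sketch of the binning-plus-censorship step described above follows; the 0.5 m/s bin width and the 3-MAD cutoff are assumptions rather than the paper's values, and the parametric confidence-band estimation is not reproduced.

    # Sketch: bin wind-speed/power observations (the "binning method"), censor
    # outliers in each bin with a robust median/MAD rule, then summarize the bin.
    import numpy as np

    def robust_power_curve(wind_speed, power, bin_width=0.5, mad_cutoff=3.0):
        edges = np.arange(0, wind_speed.max() + bin_width, bin_width)
        centers, medians = [], []
        for lo, hi in zip(edges[:-1], edges[1:]):
            p = power[(wind_speed >= lo) & (wind_speed < hi)]
            if len(p) == 0:
                continue
            med = np.median(p)
            mad = np.median(np.abs(p - med)) * 1.4826
            keep = p if mad == 0 else p[np.abs(p - med) <= mad_cutoff * mad]
            centers.append((lo + hi) / 2)
            medians.append(np.median(keep))
        return np.array(centers), np.array(medians)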
UNIX-based operating systems robustness evaluation
NASA Technical Reports Server (NTRS)
Chang, Yu-Ming
1996-01-01
Robust operating systems are required for reliable computing. Techniques for robustness evaluation of operating systems not only enhance the understanding of the reliability of computer systems, but also provide valuable feedback to system designers. This thesis presents results from robustness evaluation experiments on five UNIX-based operating systems, which include Digital Equipment's OSF/1, Hewlett Packard's HP-UX, Sun Microsystems' Solaris and SunOS, and Silicon Graphics' IRIX. Three sets of experiments were performed. The methodology for evaluation tested (1) the exception handling mechanism, (2) system resource management, and (3) system capacity under high workload stress. An exception generator was used to evaluate the exception handling mechanism of the operating systems. Results included exit status of the exception generator and the system state. Resource management techniques used by individual operating systems were tested using programs designed to usurp system resources such as physical memory and process slots. Finally, the workload stress testing evaluated the effect of the workload on system performance by running a synthetic workload and recording the response time of local and remote user requests. Moderate to severe performance degradations were observed on the systems under stress.
NASA Astrophysics Data System (ADS)
Friedel, M. J.; Daughney, C.
2016-12-01
The development of a successful surface-groundwater management strategy depends on the quality of data provided for analysis. This study evaluates the statistical robustness when using a modified self-organizing map (MSOM) technique to estimate missing values for three hypersurface models: synoptic groundwater-surface water hydrochemistry, time-series of groundwater-surface water hydrochemistry, and mixed-survey (combination of groundwater-surface water hydrochemistry and lithologies) hydrostratigraphic unit data. These models of increasing complexity are developed and validated based on observations from the Southland region of New Zealand. In each case, the estimation method is sufficiently robust to cope with groundwater-surface water hydrochemistry vagaries due to sample size and extreme data insufficiency, even when >80% of the data are missing. The estimation of surface water hydrochemistry time series values enabled the evaluation of seasonal variation, and the imputation of lithologies facilitated the evaluation of hydrostratigraphic controls on groundwater-surface water interaction. The robust statistical results for groundwater-surface water models of increasing data complexity provide justification to apply the MSOM technique in other regions of New Zealand and abroad.
Precise visual navigation using multi-stereo vision and landmark matching
NASA Astrophysics Data System (ADS)
Zhu, Zhiwei; Oskiper, Taragay; Samarasekera, Supun; Kumar, Rakesh
2007-04-01
Traditional vision-based navigation systems often drift over time during navigation. In this paper, we propose a set of techniques which greatly reduce the long term drift and also improve robustness to many failure conditions. In our approach, two pairs of stereo cameras are integrated to form a forward/backward multi-stereo camera system. As a result, the Field-Of-View of the system is extended significantly to capture more natural landmarks from the scene. This helps to increase the pose estimation accuracy as well as reduce the failure situations. Secondly, a global landmark matching technique is used to recognize the previously visited locations during navigation. Using the matched landmarks, a pose correction technique is used to eliminate the accumulated navigation drift. Finally, in order to further improve the robustness of the system, measurements from low-cost Inertial Measurement Unit (IMU) and Global Positioning System (GPS) sensors are integrated with the visual odometry in an extended Kalman Filtering framework. Our system is significantly more accurate and robust than previously published techniques (1~5% localization error) over long-distance navigation both indoors and outdoors. Real world experiments on a human worn system show that the location can be estimated within 1 meter over 500 meters (around 0.1% localization error on average) without the use of GPS information.
Using ridge regression in systematic pointing error corrections
NASA Technical Reports Server (NTRS)
Guiar, C. N.
1988-01-01
A pointing error model is used in the antenna calibration process. Data from spacecraft or radio star observations are used to determine the parameters in the model. However, the regression variables are not truly independent, displaying a condition known as multicollinearity. Ridge regression, a biased estimation technique, is used to combat the multicollinearity problem. Two data sets pertaining to Voyager 1 spacecraft tracking (days 105 and 106 of 1987) were analyzed using both linear least squares and ridge regression methods. The advantages and limitations of employing the technique are presented. The problem is not yet fully resolved.
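The effect ridge regression is being used for here can be seen in a toy sketch: with two nearly collinear regressors, OLS coefficients become unstable while ridge shrinks them toward similar, stable values. The synthetic data below are purely illustrative and have nothing to do with the antenna pointing model.

    # Sketch: ridge regression vs. ordinary least squares on collinear regressors.
    import numpy as np
    from sklearn.linear_model import LinearRegression, Ridge

    rng = np.random.default_rng(0)
    x1 = rng.normal(size=200)
    x2 = x1 + 0.01 * rng.normal(size=200)      # nearly collinear with x1
    X = np.column_stack([x1, x2])
    y = 1.0 * x1 + 1.0 * x2 + 0.1 * rng.normal(size=200)

    print(LinearRegression().fit(X, y).coef_)  # unstable, can be far from (1, 1)
    print(Ridge(alpha=1.0).fit(X, y).coef_)    # shrunken toward similar, stable values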
Slater, Graham J; Pennell, Matthew W
2014-05-01
A central prediction of much theory on adaptive radiations is that traits should evolve rapidly during the early stages of a clade's history and subsequently slow down in rate as niches become saturated--a so-called "Early Burst." Although a common pattern in the fossil record, evidence for early bursts of trait evolution in phylogenetic comparative data has been equivocal at best. We show here that this may not necessarily be due to the absence of this pattern in nature. Rather, commonly used methods to infer its presence perform poorly when the strength of the burst--the rate at which phenotypic evolution declines--is small, and when some morphological convergence is present within the clade. We present two modifications to existing comparative methods that allow greater power to detect early bursts in simulated datasets. First, we develop posterior predictive simulation approaches and show that they outperform maximum likelihood approaches at identifying early bursts at moderate strength. Second, we use a robust regression procedure that allows for the identification and down-weighting of convergent taxa, leading to moderate increases in method performance. We demonstrate the utility and power of these approaches by investigating the evolution of body size in cetaceans. Model fitting using maximum likelihood is equivocal with regard to the mode of cetacean body size evolution. However, posterior predictive simulation combined with a robust node height test returns low support for Brownian motion or rate shift models, but not for the early burst model. While the jury is still out on whether early bursts are actually common in nature, our approach will hopefully facilitate more robust testing of this hypothesis. We advocate the adoption of similar posterior predictive approaches to improve the fit and to assess the adequacy of macroevolutionary models in general.
A robust optimization methodology for preliminary aircraft design
NASA Astrophysics Data System (ADS)
Prigent, S.; Maréchal, P.; Rondepierre, A.; Druot, T.; Belleville, M.
2016-05-01
This article focuses on a robust optimization of an aircraft preliminary design under operational constraints. According to engineers' know-how, the aircraft preliminary design problem can be modelled as an uncertain optimization problem whose objective (the cost or the fuel consumption) is almost affine, and whose constraints are convex. It is shown that this uncertain optimization problem can be approximated in a conservative manner by an uncertain linear optimization program, which enables the use of the techniques of robust linear programming of Ben-Tal, El Ghaoui, and Nemirovski [Robust Optimization, Princeton University Press, 2009]. This methodology is then applied to two real cases of aircraft design and numerical results are presented.
Isaacson, Sven; Luo, Feng; Feltus, Frank A.; Smith, Melissa C.
2013-01-01
The study of gene relationships and their effect on biological function and phenotype is a focal point in systems biology. Gene co-expression networks built using microarray expression profiles are one technique for discovering and interpreting gene relationships. A knowledge-independent thresholding technique, such as Random Matrix Theory (RMT), is useful for identifying meaningful relationships. Highly connected genes in the thresholded network are then grouped into modules that provide insight into their collective functionality. While it has been shown that co-expression networks are biologically relevant, it has not been determined to what extent any given network is functionally robust given perturbations in the input sample set. For such a test, hundreds of networks are needed and hence a tool to rapidly construct these networks. To examine functional robustness of networks with varying input, we enhanced an existing RMT implementation for improved scalability and tested functional robustness of human (Homo sapiens), rice (Oryza sativa) and budding yeast (Saccharomyces cerevisiae). We demonstrate dramatic decrease in network construction time and computational requirements and show that despite some variation in global properties between networks, functional similarity remains high. Moreover, the biological function captured by co-expression networks thresholded by RMT is highly robust. PMID:23409071
An improved, robust, axial line singularity method for bodies of revolution
NASA Technical Reports Server (NTRS)
Hemsch, Michael J.
1989-01-01
The failures encountered in attempts to increase the range of applicability of the axial line singularity method for representing incompressible, inviscid flow about an inclined and slender body-of-revolution are presently noted to be common to all efforts to solve Fredholm equations of the first kind. It is shown that a previously developed smoothing technique yields a robust method for numerical solution of the governing equations; this technique is easily retrofitted to existing codes, and allows the number of singularities to be increased until the most accurate line singularity solution is obtained.
Robust Approach for Nonuniformity Correction in Infrared Focal Plane Array.
Boutemedjet, Ayoub; Deng, Chenwei; Zhao, Baojun
2016-11-10
In this paper, we propose a new scene-based nonuniformity correction technique for infrared focal plane arrays. Our work is based on the use of two well-known scene-based methods, namely, adaptive and interframe registration-based exploiting pure translation motion model between frames. The two approaches have their benefits and drawbacks, which make them extremely effective in certain conditions and not adapted for others. Following on that, we developed a method robust to various conditions, which may slow or affect the correction process by elaborating a decision criterion that adapts the process to the most effective technique to ensure fast and reliable correction. In addition to that, problems such as bad pixels and ghosting artifacts are also dealt with to enhance the overall quality of the correction. The performance of the proposed technique is investigated and compared to the two state-of-the-art techniques cited above.
Improved Signal Processing Technique Leads to More Robust Self Diagnostic Accelerometer System
NASA Technical Reports Server (NTRS)
Tokars, Roger; Lekki, John; Jaros, Dave; Riggs, Terrence; Evans, Kenneth P.
2010-01-01
The self diagnostic accelerometer (SDA) is a sensor system designed to actively monitor the health of an accelerometer. In this case an accelerometer is considered healthy if it can be determined that it is operating correctly and its measurements may be relied upon. The SDA system accomplishes this by actively monitoring the accelerometer for a variety of failure conditions including accelerometer structural damage, an electrical open circuit, and most importantly accelerometer detachment. In recent testing of the SDA system in emulated engine operating conditions it has been found that a more robust signal processing technique was necessary. An improved accelerometer diagnostic technique and test results of the SDA system utilizing this technique are presented here. Furthermore, the real time, autonomous capability of the SDA system to concurrently compensate for effects from real operating conditions such as temperature changes and mechanical noise, while monitoring the condition of the accelerometer health and attachment, will be demonstrated.
An adaptive discontinuous Galerkin solver for aerodynamic flows
NASA Astrophysics Data System (ADS)
Burgess, Nicholas K.
This work considers the accuracy, efficiency, and robustness of an unstructured high-order accurate discontinuous Galerkin (DG) solver for computational fluid dynamics (CFD). Recently, there has been a drive to reduce the discretization error of CFD simulations using high-order methods on unstructured grids. However, high-order methods are often criticized for lacking robustness and having high computational cost. The goal of this work is to investigate methods that enhance the robustness of high-order discontinuous Galerkin (DG) methods on unstructured meshes, while maintaining low computational cost and high accuracy of the numerical solutions. This work investigates robustness enhancement of high-order methods by examining effective non-linear solvers, shock capturing methods, turbulence model discretizations and adaptive refinement techniques. The goal is to develop an all encompassing solver that can simulate a large range of physical phenomena, where all aspects of the solver work together to achieve a robust, efficient and accurate solution strategy. The components and framework for a robust high-order accurate solver that is capable of solving viscous, Reynolds Averaged Navier-Stokes (RANS) and shocked flows is presented. In particular, this work discusses robust discretizations of the turbulence model equation used to close the RANS equations, as well as stable shock capturing strategies that are applicable across a wide range of discretization orders and applicable to very strong shock waves. Furthermore, refinement techniques are considered as both efficiency and robustness enhancement strategies. Additionally, efficient non-linear solvers based on multigrid and Krylov subspace methods are presented. The accuracy, efficiency, and robustness of the solver is demonstrated using a variety of challenging aerodynamic test problems, which include turbulent high-lift and viscous hypersonic flows. Adaptive mesh refinement was found to play a critical role in obtaining a robust and efficient high-order accurate flow solver. A goal-oriented error estimation technique has been developed to estimate the discretization error of simulation outputs. For high-order discretizations, it is shown that functional output error super-convergence can be obtained, provided the discretization satisfies a property known as dual consistency. The dual consistency of the DG methods developed in this work is shown via mathematical analysis and numerical experimentation. Goal-oriented error estimation is also used to drive an hp-adaptive mesh refinement strategy, where a combination of mesh or h-refinement, and order or p-enrichment, is employed based on the smoothness of the solution. The results demonstrate that the combination of goal-oriented error estimation and hp-adaptation yield superior accuracy, as well as enhanced robustness and efficiency for a variety of aerodynamic flows including flows with strong shock waves. This work demonstrates that DG discretizations can be the basis of an accurate, efficient, and robust CFD solver. Furthermore, enhancing the robustness of DG methods does not adversely impact the accuracy or efficiency of the solver for challenging and complex flow problems. In particular, when considering the computation of shocked flows, this work demonstrates that the available shock capturing techniques are sufficiently accurate and robust, particularly when used in conjunction with adaptive mesh refinement . 
This work also demonstrates that robust solutions of the Reynolds Averaged Navier-Stokes (RANS) and turbulence model equations can be obtained for complex and challenging aerodynamic flows. In this context, the most robust strategy was determined to be a low-order turbulence model discretization coupled to a high-order discretization of the RANS equations. Although RANS solutions using high-order accurate discretizations of the turbulence model were obtained, the behavior of current-day RANS turbulence models discretized to high-order was found to be problematic, leading to solver robustness issues. This suggests that future work is warranted in the area of turbulence model formulation for use with high-order discretizations. Alternately, the use of Large-Eddy Simulation (LES) subgrid scale models with high-order DG methods offers the potential to leverage the high accuracy of these methods for very high fidelity turbulent simulations. This thesis has developed the algorithmic improvements that will lay the foundation for the development of a three-dimensional high-order flow solution strategy that can be used as the basis for future LES simulations.
Fourier-Mellin moment-based intertwining map for image encryption
NASA Astrophysics Data System (ADS)
Kaur, Manjit; Kumar, Vijay
2018-03-01
In this paper, a robust image encryption technique that utilizes Fourier-Mellin moments and intertwining logistic map is proposed. Fourier-Mellin moment-based intertwining logistic map has been designed to overcome the issue of low sensitivity of an input image. Multi-objective Non-Dominated Sorting Genetic Algorithm (NSGA-II) based on Reinforcement Learning (MNSGA-RL) has been used to optimize the required parameters of intertwining logistic map. Fourier-Mellin moments are used to make the secret keys more secure. Thereafter, permutation and diffusion operations are carried out on input image using secret keys. The performance of proposed image encryption technique has been evaluated on five well-known benchmark images and also compared with seven well-known existing encryption techniques. The experimental results reveal that the proposed technique outperforms others in terms of entropy, correlation analysis, a unified average changing intensity and the number of changing pixel rate. The simulation results reveal that the proposed technique provides high level of security and robustness against various types of attacks.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ding, Xuanfeng, E-mail: Xuanfeng.ding@beaumont.org; Li, Xiaoqiang; Zhang, J. Michele
Purpose: To present a novel robust and delivery-efficient spot-scanning proton arc (SPArc) therapy technique. Methods and Materials: A SPArc optimization algorithm was developed that integrates control point resampling, energy layer redistribution, energy layer filtration, and energy layer resampling. The feasibility of such a technique was evaluated using sample patients: 1 patient with locally advanced head and neck oropharyngeal cancer with bilateral lymph node coverage, and 1 with a nonmobile lung cancer. Plan quality, robustness, and total estimated delivery time were compared with the robust optimized multifield step-and-shoot arc plan without SPArc optimization (Arc_multi-field) and the standard robust optimized intensity modulated proton therapy (IMPT) plan. Dose-volume histograms of target and organs at risk were analyzed, taking into account the setup and range uncertainties. Total delivery time was calculated on the basis of a 360° gantry room with 1 revolution per minute gantry rotation speed, 2-millisecond spot switching time, 1-nA beam current, 0.01 minimum spot monitor unit, and energy layer switching time of 0.5 to 4 seconds. Results: The SPArc plan showed potential dosimetric advantages for both clinical sample cases. Compared with IMPT, SPArc delivered 8% and 14% less integral dose for oropharyngeal and lung cancer cases, respectively. Furthermore, evaluating the lung cancer plan compared with IMPT, it was evident that the maximum skin dose, the mean lung dose, and the maximum dose to ribs were reduced by 60%, 15%, and 35%, respectively, whereas the conformity index was improved from 7.6 (IMPT) to 4.0 (SPArc). The total treatment delivery time for lung and oropharyngeal cancer patients was reduced by 55% to 60% and 56% to 67%, respectively, when compared with Arc_multi-field plans. Conclusion: The SPArc plan is the first robust and delivery-efficient proton spot-scanning arc therapy technique, which could potentially be implemented into routine clinical practice.
Genetic Programming Transforms in Linear Regression Situations
NASA Astrophysics Data System (ADS)
Castillo, Flor; Kordon, Arthur; Villa, Carlos
The chapter summarizes the use of Genetic Programming (GP) in Multiple Linear Regression (MLR) to address multicollinearity and Lack of Fit (LOF). The basis of the proposed method is applying appropriate input transforms (model respecification) that deal with these issues while preserving the information content of the original variables. The transforms are selected from symbolic regression models with optimal trade-off between accuracy of prediction and expressional complexity, generated by multiobjective Pareto-front GP. The chapter includes a comparative study of the GP-generated transforms with Ridge Regression, a variant of ordinary Multiple Linear Regression, which has been a useful and commonly employed approach for reducing multicollinearity. The advantages of GP-generated model respecification are clearly defined and demonstrated. Some recommendations for transforms selection are given as well. The application benefits of the proposed approach are illustrated with a real industrial application in one of the broadest empirical modeling areas in manufacturing - robust inferential sensors. The chapter contributes to increasing the awareness of the potential of GP in statistical model building by MLR.
Levine, Matthew E; Albers, David J; Hripcsak, George
2016-01-01
Time series analysis methods have been shown to reveal clinical and biological associations in data collected in the electronic health record. We wish to develop reliable high-throughput methods for identifying adverse drug effects that are easy to implement and produce readily interpretable results. To move toward this goal, we used univariate and multivariate lagged regression models to investigate associations between twenty pairs of drug orders and laboratory measurements. Multivariate lagged regression models exhibited higher sensitivity and specificity than univariate lagged regression in the 20 examples, and incorporating autoregressive terms for labs and drugs produced more robust signals in cases of known associations among the 20 example pairings. Moreover, including inpatient admission terms in the model attenuated the signals for some cases of unlikely associations, demonstrating how multivariate lagged regression models' explicit handling of context-based variables can provide a simple way to probe for health-care processes that confound analyses of EHR data.
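A minimal sketch of a multivariate lagged regression of this kind, with autoregressive lab terms, lagged drug exposure and an inpatient-admission covariate, is given below; the column names and the number of lags are illustrative.

    # Sketch: lagged regression of a lab value on lagged drug exposure, with
    # autoregressive lab terms and an inpatient-admission covariate.
    import statsmodels.formula.api as smf

    def lagged_model(ts, n_lags=3):
        # ts: DataFrame indexed by day with columns lab, drug, inpatient
        df = ts.copy()
        for k in range(1, n_lags + 1):
            df[f"lab_lag{k}"] = df["lab"].shift(k)
            df[f"drug_lag{k}"] = df["drug"].shift(k)
        lags = " + ".join([f"lab_lag{k} + drug_lag{k}" for k in range(1, n_lags + 1)])
        return smf.ols(f"lab ~ {lags} + inpatient", data=df.dropna()).fit()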
Analysis of Learning Curve Fitting Techniques.
1987-09-01
[Abstract garbled in extraction; only fragments survive. The recoverable text cites Neter et al., Applied Linear Regression Models (Homewood, IL: Irwin) and the SAS User's Guide: Basics, Version 5 Edition, and notes that random errors are assumed to be normally distributed when ordinary least squares is used to fit the improvement (learning) curve formula to lot cost estimates.]
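The surviving fragments refer to fitting improvement (learning) curves by ordinary least squares; the usual approach is to linearize the unit-cost power model y = A*x^b with a log transform, as sketched below on synthetic data.

    # Sketch: fit the unit-cost learning curve y = A * x**b by ordinary least
    # squares on log-transformed data; the slope b gives the learning rate
    # (cost multiplier per doubling of cumulative output = 2**b).
    import numpy as np

    def fit_learning_curve(units, costs):
        b, log_a = np.polyfit(np.log(units), np.log(costs), 1)
        return np.exp(log_a), b

    # Synthetic example: unit costs following roughly an 80 percent curve
    units = np.arange(1, 33)
    a, b = fit_learning_curve(units, 100.0 * units ** -0.322)
    print(a, 2 ** b)   # approx. 100 and approx. 0.80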
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barten, Danique L. J., E-mail: d.barten@vumc.nl; Tol, Jim P.; Dahele, Max
Purpose: Proton radiotherapy for head-and-neck cancer (HNC) aims to improve organ-at-risk (OAR) sparing over photon radiotherapy. However, it may be less robust for setup and range uncertainties. The authors investigated OAR sparing and plan robustness for spot-scanning proton planning techniques and compared these with volumetric modulated arc therapy (VMAT) photon plans. Methods: Ten HNC patients were replanned using two arc VMAT (RapidArc) and spot-scanning proton techniques. OARs to be spared included the contra- and ipsilateral parotid and submandibular glands and individual swallowing muscles. Proton plans were made using Multifield Optimization (MFO, using three, five, and seven fields) and Single-field Optimization (SFO, using three fields). OAR sparing was evaluated using mean dose to composite salivary glands (Comp_Sal) and composite swallowing muscles (Comp_Swal). Plan robustness was determined for setup and range uncertainties (±3 mm for setup, ±3% HU) evaluating V95% and V107% for clinical target volumes. Results: Averaged over all patients Comp_Sal/Comp_Swal mean doses were lower for the three-field MFO plans (14.6/16.4 Gy) compared to the three-field SFO plans (20.0/23.7 Gy) and VMAT plans (23.0/25.3 Gy). Using more than three fields resulted in differences in OAR sparing of less than 1.5 Gy between plans. SFO plans were significantly more robust than MFO plans. VMAT plans were the most robust. Conclusions: MFO plans had improved OAR sparing but were less robust than SFO and VMAT plans, while SFO plans were more robust than MFO plans but resulted in less OAR sparing. Robustness of the MFO plans did not increase with more fields.
Krasikova, Dina V; Le, Huy; Bachura, Eric
2018-06-01
To address a long-standing concern regarding a gap between organizational science and practice, scholars called for more intuitive and meaningful ways of communicating research results to users of academic research. In this article, we develop a common language effect size index (CLβ) that can help translate research results to practice. We demonstrate how CLβ can be computed and used to interpret the effects of continuous and categorical predictors in multiple linear regression models. We also elaborate on how the proposed CLβ index is computed and used to interpret interactions and nonlinear effects in regression models. In addition, we test the robustness of the proposed index to violations of normality and provide means for computing standard errors and constructing confidence intervals around its estimates. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Analysis and Interpretation of Findings Using Multiple Regression Techniques
ERIC Educational Resources Information Center
Hoyt, William T.; Leierer, Stephen; Millington, Michael J.
2006-01-01
Multiple regression and correlation (MRC) methods form a flexible family of statistical techniques that can address a wide variety of different types of research questions of interest to rehabilitation professionals. In this article, we review basic concepts and terms, with an emphasis on interpretation of findings relevant to research questions…
Chen, Wen; Chowdhury, Fahmida N; Djuric, Ana; Yeh, Chih-Ping
2014-09-01
This paper provides a new design of robust fault detection for turbofan engines with adaptive controllers. The critical issue is that the adaptive controllers can mask the effects of faults such that the actual system outputs remain at the pre-specified values, making it difficult to detect faults/failures. To solve this problem, a Total Measurable Fault Information Residual (ToMFIR) technique with the aid of system transformation is adopted to detect faults in turbofan engines with adaptive controllers. This design is a ToMFIR-redundancy-based robust fault-detection scheme. The ToMFIR is first introduced and existing results are also summarized. The detailed design process of the ToMFIRs is presented and a turbofan engine model is simulated to verify the effectiveness of the proposed ToMFIR-based fault-detection strategy. Copyright © 2013 ISA. Published by Elsevier Ltd. All rights reserved.
Design and analysis issues in quantitative proteomics studies.
Karp, Natasha A; Lilley, Kathryn S
2007-09-01
Quantitative proteomics is the comparison of distinct proteomes which enables the identification of protein species which exhibit changes in expression or post-translational state in response to a given stimulus. Many different quantitative techniques are being utilized and generate large datasets. Independent of the technique used, these large datasets need robust data analysis to ensure valid conclusions are drawn from such studies. Approaches to address the problems that arise with large datasets are discussed to give insight into the types of statistical analyses of data appropriate for the various experimental strategies that can be employed by quantitative proteomic studies. This review also highlights the importance of employing a robust experimental design and highlights various issues surrounding the design of experiments. The concepts and examples discussed within will show how robust design and analysis will lead to confident results that will ensure quantitative proteomics delivers.
Robust pupil center detection using a curvature algorithm
NASA Technical Reports Server (NTRS)
Zhu, D.; Moore, S. T.; Raphan, T.; Wall, C. C. (Principal Investigator)
1999-01-01
Determining the pupil center is fundamental for calculating eye orientation in video-based systems. Existing techniques are error prone and not robust because eyelids, eyelashes, corneal reflections or shadows in many instances occlude the pupil. We have developed a new algorithm which utilizes curvature characteristics of the pupil boundary to eliminate these artifacts. Pupil center is computed based solely on points related to the pupil boundary. For each boundary point, a curvature value is computed. Occlusion of the boundary induces characteristic peaks in the curvature function. Curvature values for normal pupil sizes were determined and a threshold was found which together with heuristics discriminated normal from abnormal curvature. Remaining boundary points were fit with an ellipse using a least squares error criterion. The center of the ellipse is an estimate of the pupil center. This technique is robust and accurately estimates pupil center with less than 40% of the pupil boundary points visible.
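The final fitting step can be sketched as follows: after the curvature-based screening (not reproduced here) removes occluded boundary points, an ellipse is fit to the remaining points by linear least squares on the conic form and its center is taken as the pupil-center estimate.

    # Sketch: least-squares ellipse fit to pupil-boundary points via the conic
    # form A*x^2 + B*x*y + C*y^2 + D*x + E*y = 1; the center is where the
    # gradient of the conic vanishes. Inputs are 1-D float arrays of coordinates.
    import numpy as np

    def ellipse_center(x, y):
        M = np.column_stack([x**2, x*y, y**2, x, y])
        A, B, C, D, E = np.linalg.lstsq(M, np.ones(len(x)), rcond=None)[0]
        cx, cy = np.linalg.solve([[2*A, B], [B, 2*C]], [-D, -E])
        return cx, cy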
NASA Astrophysics Data System (ADS)
Rounaghi, Mohammad Mahdi; Abbaszadeh, Mohammad Reza; Arashi, Mohammad
2015-11-01
One of the most important topics of interest to investors is stock price changes. Investors whose goals are long term are sensitive to stock prices and their changes and react to them. In this regard, we used the multivariate adaptive regression splines (MARS) model and a semi-parametric splines technique for predicting stock price in this study. The MARS model, as a nonparametric method, is an adaptive regression method that is well suited to problems with high dimensions and several variables. The semi-parametric technique used in this study is smoothing splines, a nonparametric regression method. We used 40 variables (30 accounting variables and 10 economic variables) for predicting stock price with the MARS model and with the semi-parametric splines technique. After investigating the models, we selected 4 accounting variables (book value per share, predicted earnings per share, P/E ratio and risk) as influencing variables for predicting stock price using the MARS model. After fitting the semi-parametric splines technique, only 4 accounting variables (dividends, net EPS, EPS Forecast and P/E Ratio) were selected as variables effective in forecasting stock prices.
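For the smoothing-splines half of the comparison, a minimal sketch with scipy is shown below, fitting price against a single accounting variable; a full MARS fit would require a dedicated package (for example py-earth) and is not shown, and the variable names are placeholders.

    # Sketch: nonparametric smoothing-spline fit of stock price against one
    # accounting variable. Assumes distinct predictor values; s=None lets scipy
    # choose a default smoothing level.
    import numpy as np
    from scipy.interpolate import UnivariateSpline

    def smooth_fit(x, y, smoothing=None):
        order = np.argsort(x)
        return UnivariateSpline(x[order], y[order], s=smoothing)

    # price_hat = smooth_fit(eps_forecast, price)(eps_forecast)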
Morse Code, Scrabble, and the Alphabet
ERIC Educational Resources Information Center
Richardson, Mary; Gabrosek, John; Reischman, Diann; Curtiss, Phyliss
2004-01-01
In this paper we describe an interactive activity that illustrates simple linear regression. Students collect data and analyze it using simple linear regression techniques taught in an introductory applied statistics course. The activity is extended to illustrate checks for regression assumptions and regression diagnostics taught in an…
Advanced statistics: linear regression, part II: multiple linear regression.
Marill, Keith A
2004-01-01
The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.
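A small sketch tying together two of the concepts named above, an interaction term and a multicollinearity check via variance inflation factors, is given below; the formula and column names are illustrative only.

    # Sketch: multiple linear regression with an interaction term, plus variance
    # inflation factors (VIF) to screen the predictors for multicollinearity.
    import statsmodels.formula.api as smf
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    def fit_with_interaction(df):
        model = smf.ols("outcome ~ age * treatment + severity", data=df).fit()
        exog = model.model.exog
        vifs = [variance_inflation_factor(exog, i) for i in range(1, exog.shape[1])]
        return model, vifs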
NASA Astrophysics Data System (ADS)
Chang, Insu
The objective of the thesis is to introduce a relatively general nonlinear controller/estimator synthesis framework using a special type of the state-dependent Riccati equation technique. The continuous time state-dependent Riccati equation (SDRE) technique is extended to discrete-time under input and state constraints, yielding constrained (C) discrete-time (D) SDRE, referred to as CD-SDRE. For the latter, stability analysis and calculation of a region of attraction are carried out. The derivation of the D-SDRE under state-dependent weights is provided. Stability of the D-SDRE feedback system is established using Lyapunov stability approach. Receding horizon strategy is used to take into account the constraints on D-SDRE controller. Stability condition of the CD-SDRE controller is analyzed by using a switched system. The use of CD-SDRE scheme in the presence of constraints is then systematically demonstrated by applying this scheme to problems of spacecraft formation orbit reconfiguration under limited performance on thrusters. Simulation results demonstrate the efficacy and reliability of the proposed CD-SDRE. The CD-SDRE technique is further investigated in a case where there are uncertainties in nonlinear systems to be controlled. First, the system stability under each of the controllers in the robust CD-SDRE technique is separately established. The stability of the closed-loop system under the robust CD-SDRE controller is then proven based on the stability of each control system comprising switching configuration. A high fidelity dynamical model of spacecraft attitude motion in 3-dimensional space is derived with a partially filled fuel tank, assumed to have the first fuel slosh mode. The proposed robust CD-SDRE controller is then applied to the spacecraft attitude control system to stabilize its motion in the presence of uncertainties characterized by the first fuel slosh mode. The performance of the robust CD-SDRE technique is discussed. Subsequently, filtering techniques are investigated by using the D-SDRE technique. Detailed derivation of the D-SDRE-based filter (D-SDREF) is provided under the assumption of Gaussian noises and the stability condition of the error signal between the measured signal and the estimated signals is proven to be input-to-state stable. For the non-Gaussian distributed noises, we propose a filter by combining the D-SDREF and the particle filter (PF), named the combined D-SDRE/PF. Two algorithms for the filtering techniques are provided. Several filtering techniques are compared with challenging numerical examples to show the reliability and efficacy of the proposed D-SDREF and the combined D-SDRE/PF.
Post-processing through linear regression
NASA Astrophysics Data System (ADS)
van Schaeybroeck, B.; Vannitsem, S.
2011-03-01
Various post-processing techniques are compared for both deterministic and ensemble forecasts, all based on linear regression between forecast data and observations. In order to evaluate the quality of the regression methods, three criteria are proposed, related to the effective correction of forecast error, the optimal variability of the corrected forecast and multicollinearity. The regression schemes under consideration include the ordinary least-square (OLS) method, a new time-dependent Tikhonov regularization (TDTR) method, the total least-square method, a new geometric-mean regression (GM), a recently introduced error-in-variables (EVMOS) method and, finally, a "best member" OLS method. The advantages and drawbacks of each method are clarified. These techniques are applied in the context of the Lorenz '63 system, whose model version is affected by both initial condition and model errors. For short forecast lead times, the number and choice of predictors play an important role. In contrast to the other techniques, GM degrades as the number of predictors increases. At intermediate lead times, linear regression is unable to provide corrections to the forecast and can sometimes degrade the performance (GM and the best-member OLS with noise). At long lead times, the regression schemes that yield the correct variability and the largest correlation between ensemble error and spread (EVMOS, TDTR) should be preferred.
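A minimal sketch of the underlying idea, regressing observations on forecasts and applying the fitted correction, with a Tikhonov (ridge) penalty standing in for the regularized variants; the data and parameter values are synthetic, not from the study.

```python
# Hedged sketch (not the authors' code): linear-regression post-processing of
# a biased forecast, once with plain OLS and once with a Tikhonov/ridge penalty.
import numpy as np

rng = np.random.default_rng(1)
truth = np.cumsum(rng.normal(size=300))           # synthetic "observations"
forecast = 0.8 * truth + rng.normal(0, 0.5, 300)  # biased, noisy forecast

X = np.column_stack([np.ones_like(forecast), forecast])

# OLS correction: obs ≈ a + b * forecast
beta_ols = np.linalg.lstsq(X, truth, rcond=None)[0]

# Tikhonov (ridge) correction damps coefficients when predictors are collinear.
lam = 1.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ truth)

corrected = X @ beta_ols
print("raw forecast RMSE   ", np.sqrt(np.mean((forecast - truth) ** 2)))
print("corrected (OLS) RMSE", np.sqrt(np.mean((corrected - truth) ** 2)))
print("ridge coefficients  ", beta_ridge)
```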
Interference Resilient Sigma Delta-Based Pulse Oximeter.
Shokouhian, Mohsen; Morling, Richard; Kale, Izzet
2016-06-01
Ambient light and optical interference can severely affect the performance of pulse oximeters. Deploying a robust modulation technique to drive the pulse oximeter LEDs can reduce these unwanted effects and increases the resilience of the pulse oximeter to artificial ambient light. The time-division modulation technique used in conventional pulse oximeters cannot remove the effect of modulated light from the surrounding environment, which can cause large measurement errors in pulse oximeter readings. This paper presents a novel cross-coupled sigma delta modulator which makes measurement accuracy more robust than the conventional fixed-frequency oximeter modulation technique, especially in the presence of pulsed artificial ambient light. Moreover, this modulator gives extra control over the pulse oximeter's power consumption, leading to improved power management.
Fault Accommodation in Control of Flexible Systems
NASA Technical Reports Server (NTRS)
Maghami, Peiman G.; Sparks, Dean W., Jr.; Lim, Kyong B.
1998-01-01
New synthesis techniques for the design of fault accommodating controllers for flexible systems are developed. Three robust control design strategies, static dissipative, dynamic dissipative and mu-synthesis, are used in the approach. The approach provides techniques for designing controllers that maximize, in some sense, the tolerance of the closed-loop system against faults in actuators and sensors, while guaranteeing performance robustness at a specified performance level, measured in terms of the proximity of the closed-loop poles to the imaginary axis (the degree of stability). For dissipative control designs, nonlinear programming is employed to synthesize the controllers, whereas in mu-synthesis, the traditional D-K iteration is used. To demonstrate the feasibility of the proposed techniques, they are applied to the control design of a structural model of a flexible laboratory test structure.
Nweke, Mauryn C; McCartney, R Graham; Bracewell, Daniel G
2017-12-29
Mechanical characterisation of agarose-based resins is an important factor in ensuring robust chromatographic performance in the manufacture of biopharmaceuticals. Pressure-flow profiles are most commonly used to characterise these properties. There are a number of drawbacks with this method, including the potential need for several re-packs to achieve the desired packing quality, the impact of wall effects on experimental set up and the quantities of chromatography media and buffers required. To address these issues, we have developed a dynamic mechanical analysis (DMA) technique that characterises the mechanical properties of resins based on the viscoelasticity of a 1 ml sample of slurry. This technique was conducted on seven resins with varying degrees of mechanical robustness and the results were compared to pressure-flow test results on the same resins. Results show a strong correlation between the two techniques. The most mechanically robust resin (Capto Q) had a critical velocity 3.3 times higher than the weakest (Sepharose CL-4B), whilst the DMA technique showed Capto Q to have a slurry deformation rate 8.3 times lower than Sepharose CL-4B. To ascertain whether polymer structure is indicative of mechanical strength, scanning electron microscopy images were also used to study the structural properties of each resin. Results indicate that DMA can be used as a small volume, complementary technique for the mechanical characterisation of chromatography media. Copyright © 2017 The Author(s). Published by Elsevier B.V. All rights reserved.
Panel regressions to estimate low-flow response to rainfall variability in ungaged basins
Bassiouni, Maoya; Vogel, Richard M.; Archfield, Stacey A.
2016-01-01
Multicollinearity and omitted-variable bias are major limitations to developing multiple linear regression models to estimate streamflow characteristics in ungaged areas and varying rainfall conditions. Panel regression is used to overcome limitations of traditional regression methods, and obtain reliable model coefficients, in particular to understand the elasticity of streamflow to rainfall. Using annual rainfall and selected basin characteristics at 86 gaged streams in the Hawaiian Islands, regional regression models for three stream classes were developed to estimate the annual low-flow duration discharges. Three panel-regression structures (random effects, fixed effects, and pooled) were compared to traditional regression methods, in which space is substituted for time. Results indicated that panel regression generally was able to reproduce the temporal behavior of streamflow and reduce the standard errors of model coefficients compared to traditional regression, even for models in which the unobserved heterogeneity between streams is significant and the variance inflation factor for rainfall is much greater than 10. This is because both spatial and temporal variability were better characterized in panel regression. In a case study, regional rainfall elasticities estimated from panel regressions were applied to ungaged basins on Maui, using available rainfall projections to estimate plausible changes in surface-water availability and usable stream habitat for native species. The presented panel-regression framework is shown to offer benefits over existing traditional hydrologic regression methods for developing robust regional relations to investigate streamflow response in a changing climate.
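A hedged sketch of the fixed-effects ("within") panel-regression idea described here, using invented basin data rather than the Hawaiian dataset: demeaning discharge and rainfall by basin removes time-invariant basin effects before the rainfall elasticity is fitted.

```python
# Illustrative fixed-effects panel regression on synthetic basin/year data.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
basins, years = 20, 15
panel = pd.DataFrame({
    "basin": np.repeat(np.arange(basins), years),
    "log_rain": rng.normal(7, 0.3, basins * years),
})
basin_effect = np.repeat(rng.normal(0, 1, basins), years)   # unobserved heterogeneity
panel["log_q"] = 1.5 * panel.log_rain + basin_effect + rng.normal(0, 0.2, len(panel))

# Within transformation: subtract each basin's mean from the response and predictor.
means = panel.groupby("basin")[["log_q", "log_rain"]].transform("mean")
demeaned = panel[["log_q", "log_rain"]] - means

fe_fit = sm.OLS(demeaned.log_q, demeaned.log_rain).fit()
print("estimated rainfall elasticity:", fe_fit.params.iloc[0])  # close to the true 1.5
```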
Robust Means and Covariance Matrices by the Minimum Volume Ellipsoid (MVE).
ERIC Educational Resources Information Center
Blankmeyer, Eric
P. Rousseeuw and A. Leroy (1987) proposed a very robust alternative to classical estimates of mean vectors and covariance matrices, the Minimum Volume Ellipsoid (MVE). This paper describes the MVE technique and presents a BASIC program to implement it. The MVE is a "high breakdown" estimator, one that can cope with samples in which as…
Statistical Techniques to Analyze Pesticide Data Program Food Residue Observations.
Szarka, Arpad Z; Hayworth, Carol G; Ramanarayanan, Tharacad S; Joseph, Robert S I
2018-06-26
The U.S. EPA conducts dietary-risk assessments to ensure that levels of pesticides on food in the U.S. food supply are safe. Often these assessments utilize conservative residue estimates, maximum residue levels (MRLs), and a high-end estimate derived from registrant-generated field-trial data sets. A more realistic estimate of consumers' pesticide exposure from food may be obtained by utilizing residues from food-monitoring programs, such as the Pesticide Data Program (PDP) of the U.S. Department of Agriculture. A substantial portion of food-residue concentrations in PDP monitoring programs are below the limits of detection (left-censored), which makes the comparison of regulatory-field-trial and PDP residue levels difficult. In this paper, we present a novel adaptation of established statistical techniques, the Kaplan-Meier estimator (K-M), robust regression on order statistics (ROS), and the maximum-likelihood estimator (MLE), to quantify pesticide-residue concentrations in the presence of heavily censored data sets. The examined statistical approaches include the most commonly used parametric and nonparametric methods for handling left-censored data that have been used in the fields of medical and environmental sciences. This work presents a case study in which data of thiamethoxam residue on bell pepper generated from registrant field trials were compared with PDP-monitoring residue values. The results from the statistical techniques were evaluated and compared with commonly used simple substitution methods for the determination of summary statistics. It was found that the MLE is the most appropriate statistical method to analyze this residue data set. Using the MLE technique, the data analyses showed that the median and mean PDP bell pepper residue levels were approximately 19 and 7 times lower, respectively, than the corresponding statistics of the field-trial residues.
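A minimal sketch of a maximum-likelihood fit to left-censored residue data, assuming a lognormal model; the detection limit and data below are invented for illustration, not taken from the PDP or field-trial data sets.

```python
# Hedged sketch: MLE for left-censored concentrations under a lognormal model.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(3)
true = rng.lognormal(mean=-2.0, sigma=1.0, size=500)
lod = 0.05                                   # assumed limit of detection
censored = true < lod                        # "< LOD" records
obs = np.where(censored, lod, true)

def neg_loglik(params):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)                # keeps sigma positive
    z = (np.log(obs[~censored]) - mu) / sigma
    ll_detected = np.sum(norm.logpdf(z) - np.log(sigma))
    ll_censored = censored.sum() * norm.logcdf((np.log(lod) - mu) / sigma)
    return -(ll_detected + ll_censored)

fit = minimize(neg_loglik, x0=[0.0, 0.0])
mu_hat, sigma_hat = fit.x[0], np.exp(fit.x[1])
print("MLE mean residue:", np.exp(mu_hat + sigma_hat ** 2 / 2))
```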
Relative velocity change measurement based on seismic noise analysis in exploration geophysics
NASA Astrophysics Data System (ADS)
Corciulo, M.; Roux, P.; Campillo, M.; Dubuq, D.
2011-12-01
Passive monitoring techniques based on noise cross-correlation analysis are still debated in exploration geophysics, even though recent studies have shown impressive performance in seismology at larger scales. Tracking the time evolution of complex geological structures using noise data involves localization of noise sources and measurement of relative velocity variations. Monitoring relative velocity variations only requires the measurement of phase shifts of seismic noise cross-correlation functions computed for successive time recordings. Existing algorithms, such as Stretching and the Doublet method, classically require great effort in terms of computation time, making them impractical when continuous datasets from dense arrays are acquired. We present here an innovative technique for passive monitoring based on measuring the instantaneous phase of noise-correlated signals. The Instantaneous Phase Variation (IPV) technique aims at combining the advantages of the Stretching and Doublet methods while providing a faster measurement of the relative velocity change. The IPV takes advantage of the Hilbert transform to compute, in the time domain, the phase difference between two noise correlation functions. The relative velocity variation is measured through the slope of the linear regression of the phase-difference curve as a function of correlation time. The large number of noise correlation functions, classically available at exploration scale on dense arrays, allows for a statistical analysis that further improves the precision of the estimated velocity change. In this work, numerical tests first compare the IPV performance to the Stretching and Doublet techniques in terms of accuracy, robustness and computation time. Experimental results are then presented using a seismic noise dataset with five days of continuous recording on 397 geophones spread over an area of ~1 km².
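A rough sketch of the instantaneous-phase idea (not the authors' implementation): the phase difference between two correlation functions is taken from the analytic signal, converted to a time delay, and the relative velocity change is read from the slope of that delay versus lag time. The signals, sampling rate and dominant frequency below are all synthetic.

```python
# Hedged sketch of an instantaneous-phase dv/v estimate via the Hilbert transform.
import numpy as np
from scipy.signal import hilbert

fs = 500.0                                     # assumed sampling rate (Hz)
t = np.arange(0.0, 2.0, 1.0 / fs)
f0 = 20.0                                      # dominant frequency of the correlations
dvv_true = 0.01                                # imposed 1% relative velocity change

ref = np.exp(-t) * np.sin(2 * np.pi * f0 * t)                      # reference correlation
cur = np.exp(-t) * np.sin(2 * np.pi * f0 * t * (1 + dvv_true))     # "current" correlation

# Instantaneous phases from the analytic signals, then their difference.
dphi = np.unwrap(np.angle(hilbert(cur))) - np.unwrap(np.angle(hilbert(ref)))
delay = dphi / (2 * np.pi * f0)                # phase difference converted to time

# The delay grows linearly with lag time; its slope recovers dv/v
# (the sign depends on the delay convention adopted).
slope = np.polyfit(t, delay, 1)[0]
print("estimated |dv/v|:", abs(slope), " true:", dvv_true)
```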
NASA Astrophysics Data System (ADS)
Lee, Seungjoon; Kevrekidis, Ioannis G.; Karniadakis, George Em
2017-09-01
Exascale-level simulations require fault-resilient algorithms that are robust against repeated and expected software and/or hardware failures during computations, which may render the simulation results unsatisfactory. If each processor can share some global information about the simulation from a coarse, limited-accuracy but relatively inexpensive auxiliary simulator, we can effectively fill in the missing spatial data at the required times by a statistical learning technique - multi-level Gaussian process regression - on the fly; this has been demonstrated in previous work [1]. Building on that work, we also employ another (nonlinear) statistical learning technique, Diffusion Maps, which detects computational redundancy in time and hence accelerates the simulation by projective time integration, giving the overall computation a "patch dynamics" flavor. Furthermore, we are now able to perform information fusion with multi-fidelity and heterogeneous data (including stochastic data). Finally, we set the foundations of a new framework in CFD, called patch simulation, that combines information fusion techniques from, in principle, multiple fidelity and resolution simulations (and even experiments) with a new adaptive timestep refinement technique. We present two benchmark problems (the heat equation and the Navier-Stokes equations) to demonstrate the new capability that statistical learning tools can bring to traditional scientific computing algorithms. For each problem, we rely on heterogeneous and multi-fidelity data, either from a coarse simulation of the same equation or from a stochastic, particle-based, more "microscopic" simulation. We consider, as such "auxiliary" models, a Monte Carlo random walk for the heat equation and a dissipative particle dynamics (DPD) model for the Navier-Stokes equations. More broadly, in this paper we demonstrate the symbiotic and synergistic combination of statistical learning, domain decomposition, and scientific computing in exascale simulations.
SU-F-T-185: Study of the Robustness of a Proton Arc Technique Based On PBS Beams
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Z; Zheng, Y
Purpose: One potential technique to realize proton arc therapy is to use PBS beams from many directions to form overlaid Bragg peak (OBP) spots and to place these OBP spots throughout the target volume to achieve the desired dose distribution. In this study, we analyzed the robustness of this proton arc technique. Methods: We used a cylindrical water phantom of 20 cm radius in our robustness analysis. To study the range uncertainty effect, we changed the density of the phantom by ±3%. To study the setup uncertainty effect, we shifted the phantom by 3 and 5 mm. We also combined the range and setup uncertainties (3 mm/±3%). For each test plan, we performed dose calculation for the nominal and 6 disturbed scenarios. Two test plans were used, one with a single OBP spot and the other consisting of 121 OBP spots covering a 10×10 cm² area. We compared the dose profiles between the nominal and disturbed scenarios to estimate the impact of the uncertainties. Dose calculation was performed with Gate/GEANT-based Monte Carlo software in a cloud computing environment. Results: For each of the 7 scenarios, we simulated 100k and 10M events for the plans consisting of a single OBP spot and 121 OBP spots, respectively. For the single OBP spot, the setup uncertainty had minimal impact on the spot's dose profile, while the range uncertainty had a significant impact. For the plan consisting of 121 OBP spots, a similar effect was observed, but the extent of the disturbance was much smaller than for the single OBP spot. Conclusion: For the PBS arc technique, range uncertainty has significantly more impact than setup uncertainty. Although a single OBP spot can be severely disturbed by the range uncertainty, the overall effect is much smaller when a large number of OBP spots are used. Robustness optimization for the PBS arc technique should consider range uncertainty with priority.
NASA Astrophysics Data System (ADS)
Norajitra, Tobias; Meinzer, Hans-Peter; Maier-Hein, Klaus H.
2015-03-01
During image segmentation, 3D Statistical Shape Models (SSM) usually conduct a limited search for target landmarks within one-dimensional search profiles perpendicular to the model surface. In addition, landmark appearance is modeled only locally based on linear profiles and weak learners, altogether leading to segmentation errors from landmark ambiguities and limited search coverage. We present a new method for 3D SSM segmentation based on 3D Random Forest Regression Voting. For each surface landmark, a Random Regression Forest is trained that learns a 3D spatial displacement function between the corresponding reference landmark and a set of surrounding sample points, based on an infinite set of non-local randomized 3D Haar-like features. Landmark search is then conducted omni-directionally within 3D search spaces, where voxelwise forest predictions on landmark position contribute to a common voting map which reflects the overall position estimate. Segmentation experiments were conducted on a set of 45 CT volumes of the human liver, of which 40 images were randomly chosen for training and 5 for testing. Without parameter optimization, using a simple candidate selection and a single-resolution approach, excellent results were achieved, while faster convergence and better concavity segmentation were observed, altogether underlining the potential of our approach in terms of increased robustness from distinct landmark detection and from better search coverage.
Classification of mislabelled microarrays using robust sparse logistic regression.
Bootkrajang, Jakramate; Kabán, Ata
2013-04-01
Previous studies reported that labelling errors are not uncommon in microarray datasets. In such cases, the training set may become misleading, and the ability of classifiers to make reliable inferences from the data is compromised. Yet, few methods are currently available in the bioinformatics literature to deal with this problem. The few existing methods focus on data cleansing alone, without reference to classification, and their performance crucially depends on some tuning parameters. In this article, we develop a new method to detect mislabelled arrays simultaneously with learning a sparse logistic regression classifier. Our method may be seen as a label-noise robust extension of the well-known and successful Bayesian logistic regression classifier. To account for possible mislabelling, we formulate a label-flipping process as part of the classifier. The regularization parameter is automatically set using Bayesian regularization, which not only saves the computation time that cross-validation would take, but also eliminates any unwanted effects of label noise when setting the regularization parameter. Extensive experiments with both synthetic data and real microarray datasets demonstrate that our approach is able to counter the bad effects of labelling errors in terms of predictive performance, it is effective at identifying marker genes and simultaneously it detects mislabelled arrays to high accuracy. The code is available from http://cs.bham.ac.uk/∼jxb008. Supplementary data are available at Bioinformatics online.
Effective techniques in healthy eating and physical activity interventions: a meta-regression.
Michie, Susan; Abraham, Charles; Whittington, Craig; McAteer, John; Gupta, Sunjai
2009-11-01
Meta-analyses of behavior change (BC) interventions typically find large heterogeneity in effectiveness and small effects. This study aimed to assess the effectiveness of active BC interventions designed to promote physical activity and healthy eating and investigate whether theoretically specified BC techniques improve outcome. Interventions, evaluated in experimental or quasi-experimental studies, using behavioral and/or cognitive techniques to increase physical activity and healthy eating in adults, were systematically reviewed. Intervention content was reliably classified into 26 BC techniques and the effects of individual techniques, and of a theoretically derived combination of self-regulation techniques, were assessed using meta-regression. Outcomes were valid measures of physical activity and healthy eating. The 122 evaluations (N = 44,747) produced an overall pooled effect size of 0.31 (95% confidence interval = 0.26 to 0.36, I(2) = 69%). The technique, "self-monitoring," explained the greatest amount of among-study heterogeneity (13%). Interventions that combined self-monitoring with at least one other technique derived from control theory were significantly more effective than the other interventions (0.42 vs. 0.26). Classifying interventions according to component techniques and theoretically derived technique combinations and conducting meta-regression enabled identification of effective components of interventions designed to increase physical activity and healthy eating. PsycINFO Database Record (c) 2009 APA, all rights reserved.
Diagnostics and Robust Estimation When Transforming the Regression Model and the Response.
1986-10-01
Bounded-influence estimators place a bound on the influence function of each observation. Bounded-influence regression estimators have been proposed by ...; see Hampel et al. (1986, chapter 4) for further details. The remaining recoverable text derives the influence function of the estimator satisfying (14), a definition that is conditional on x1, ..., xN.
Intrinsic Raman spectroscopy for quantitative biological spectroscopy Part II
Bechtel, Kate L.; Shih, Wei-Chuan; Feld, Michael S.
2009-01-01
We demonstrate the effectiveness of intrinsic Raman spectroscopy (IRS) at reducing errors caused by absorption and scattering. Physical tissue models, solutions of varying absorption and scattering coefficients with known concentrations of Raman scatterers, are studied. We show significant improvement in prediction error by implementing IRS to predict concentrations of Raman scatterers using both ordinary least squares regression (OLS) and partial least squares regression (PLS). In particular, we show that IRS provides a robust calibration model that does not increase in error when applied to samples with optical properties outside the range of calibration. PMID:18711512
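An illustrative comparison of the two calibration models mentioned, OLS and PLS, on synthetic spectra; the spectra, concentrations and component count are invented and are not the paper's tissue-phantom data.

```python
# Hedged sketch: OLS vs. PLS calibration of analyte concentration from spectra.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n_samples, n_wavelengths = 120, 300
conc = rng.uniform(0, 1, n_samples)
peak = np.exp(-0.5 * ((np.arange(n_wavelengths) - 150) / 10) ** 2)  # synthetic band
spectra = np.outer(conc, peak) + 0.05 * rng.normal(size=(n_samples, n_wavelengths))

X_tr, X_te, y_tr, y_te = train_test_split(spectra, conc, random_state=0)

for name, model in [("OLS", LinearRegression()), ("PLS", PLSRegression(n_components=5))]:
    model.fit(X_tr, y_tr)
    pred = np.ravel(model.predict(X_te))
    rmse = np.sqrt(np.mean((pred - y_te) ** 2))
    print(f"{name} prediction RMSE: {rmse:.4f}")
```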
Schooling, Literacy and Individual Earnings. International Adult Literacy Survey.
ERIC Educational Resources Information Center
Osberg, Lars
This paper uses direct measures of literacy skill levels provided by the International Adult Literacy Survey to estimate the return to literacy skills. Using a very simple human capital earnings equation and standard ordinary least squares regression, it tested estimates of the return to literacy skills for their robustness to alternative scalings…
Least Principal Components Analysis (LPCA): An Alternative to Regression Analysis.
ERIC Educational Resources Information Center
Olson, Jeffery E.
Often, all of the variables in a model are latent, random, or subject to measurement error, or there is not an obvious dependent variable. When any of these conditions exist, an appropriate method for estimating the linear relationships among the variables is Least Principal Components Analysis. Least Principal Components are robust, consistent,…
Effects of Diverse Forms of Family Structure on Female and Male Homicide
ERIC Educational Resources Information Center
Schwartz, Jennifer
2006-01-01
Utilizing 2000 data on 1,618 counties and seemingly unrelated regression, I assess whether family structure effects on homicide vary across family structure measures and gender. There is evidence of robust, multidimensional family structure effects across constructs reflecting the presence of two-parent families: mother/father absence, shortages…
Accurate motion parameter estimation for colonoscopy tracking using a regression method
NASA Astrophysics Data System (ADS)
Liu, Jianfei; Subramanian, Kalpathi R.; Yoo, Terry S.
2010-03-01
Co-located optical and virtual colonoscopy images have the potential to provide important clinical information during routine colonoscopy procedures. In our earlier work, we presented an optical-flow-based algorithm to compute egomotion from live colonoscopy video, permitting navigation and visualization of the corresponding patient anatomy. In the original algorithm, motion parameters were estimated using the traditional least sum of squares (LS) procedure, which can be unstable in the presence of optical flow vectors with large errors. In the improved algorithm, we use the Least Median of Squares (LMS) method, a robust regression method, for motion parameter estimation. Using the LMS method, we iteratively analyze and converge toward the main distribution of the flow vectors, while disregarding outliers. We show through three experiments the improvement in tracking results obtained using the LMS method, in comparison to the LS estimator. The first experiment demonstrates better spatial accuracy in positioning the virtual camera in the sigmoid colon. The second and third experiments demonstrate the robustness of this estimator, resulting in longer tracked sequences: from 300 to 1310 in the ascending colon, and from 410 to 1316 in the transverse colon.
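A minimal least-median-of-squares fit by random subsampling, shown on a toy 1-D line rather than the optical-flow motion model, to illustrate why LMS resists gross outliers; the fitting function and data are invented for this sketch.

```python
# Hedged sketch of an LMS line fit: minimise the median squared residual over
# candidate fits drawn from random two-point subsamples.
import numpy as np

def lms_line_fit(x, y, n_trials=500, rng=None):
    """Return (slope, intercept) minimising the median squared residual."""
    rng = rng or np.random.default_rng()
    best, best_med = None, np.inf
    for _ in range(n_trials):
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        med = np.median((y - (slope * x + intercept)) ** 2)
        if med < best_med:
            best, best_med = (slope, intercept), med
    return best

rng = np.random.default_rng(5)
x = np.linspace(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(0, 0.3, 100)
y[::10] += 20.0                       # gross outliers, akin to bad flow vectors
print(lms_line_fit(x, y, rng=rng))    # close to (2.0, 1.0) despite the outliers
```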
Li, Zhixun; Zhang, Yingtao; Gong, Huiling; Li, Weimin; Tang, Xianglong
2016-12-01
Coronary artery disease has become one of the most dangerous diseases threatening human life, and coronary artery segmentation is the basis of computer-aided diagnosis and analysis. Existing segmentation methods have difficulty handling the complex vascular texture caused by the projective nature of conventional coronary angiography. Due to the large amount of data and the complex vascular shapes, manual annotation has become increasingly unrealistic, so a fully automatic segmentation method is necessary in clinical practice. In this work, we study a method based on reliable boundaries via multi-domain remapping and robust discrepancy correction via distance balance and quantile regression for automatic coronary artery segmentation of angiography images. The proposed method can not only segment overlapping vascular structures robustly, but also achieves good performance in low-contrast regions. The effectiveness of our approach is demonstrated on a variety of coronary blood vessels and compared with existing methods. The overall segmentation performances si, fnvf, fvpf and tpvf were 95.135%, 3.733%, 6.113% and 96.268%, respectively. Copyright © 2016 Elsevier Ltd. All rights reserved.
Signaling mechanisms underlying the robustness and tunability of the plant immune network
Kim, Yungil; Tsuda, Kenichi; Igarashi, Daisuke; Hillmer, Rachel A.; Sakakibara, Hitoshi; Myers, Chad L.; Katagiri, Fumiaki
2014-01-01
How does robust and tunable behavior emerge in a complex biological network? We sought to understand this for the signaling network controlling pattern-triggered immunity (PTI) in Arabidopsis. A dynamic network model containing four major signaling sectors, the jasmonate, ethylene, PAD4, and salicylate sectors, which together explain up to 80% of the PTI level, was built using data for dynamic sector activities and PTI levels under exhaustive combinatorial sector perturbations. Our regularized multiple regression model had a high level of predictive power and captured known and unexpected signal flows in the network. The sole inhibitory sector in the model, the ethylene sector, was central to the network robustness via its inhibition of the jasmonate sector. The model's multiple input sites linked specific signal input patterns varying in strength and timing to different network response patterns, indicating a mechanism enabling tunability. PMID:24439900
TLE uncertainty estimation using robust weighted differencing
NASA Astrophysics Data System (ADS)
Geul, Jacco; Mooij, Erwin; Noomen, Ron
2017-05-01
Accurate knowledge of satellite orbit errors is essential for many types of analyses. Unfortunately, for two-line elements (TLEs) this is not available. This paper presents a weighted differencing method using robust least-squares regression for estimating many important error characteristics. The method is applied to both classic and enhanced TLEs, compared to previous implementations, and validated using Global Positioning System (GPS) solutions for the GOCE satellite in Low-Earth Orbit (LEO), prior to its re-entry. The method is found to be more accurate than previous TLE differencing efforts in estimating initial uncertainty, as well as error growth. The method also proves more reliable and requires no data filtering (such as outlier removal). Sensitivity analysis shows a strong relationship between argument of latitude and covariance (standard deviations and correlations), which the method is able to approximate. Overall, the method proves accurate, computationally fast, and robust, and is applicable to any object in the satellite catalogue (SATCAT).
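A hedged sketch of the robust-regression ingredient: a Huber M-estimator fit of error growth to synthetic TLE position differences, with statsmodels' RLM standing in for the authors' robust least-squares scheme (the data and growth rate are invented).

```python
# Hedged sketch: robust fit of initial uncertainty and error growth from
# position differences, insensitive to occasional gross outliers.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
days_since_epoch = np.linspace(0, 5, 200)
pos_error_km = 0.5 + 0.8 * days_since_epoch + rng.normal(0, 0.1, 200)
pos_error_km[rng.choice(200, 8, replace=False)] += 10.0   # occasional bad differences

X = sm.add_constant(days_since_epoch)
robust_fit = sm.RLM(pos_error_km, X, M=sm.robust.norms.HuberT()).fit()
print("initial uncertainty (km):", robust_fit.params[0])
print("error growth (km/day):  ", robust_fit.params[1])
```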
Process for manufacture of semipermeable silicon nitride membranes
Galambos, Paul Charles; Shul, Randy J.; Willison, Christi Gober
2003-12-09
A new class of semipermeable membranes, and techniques for their fabrication, have been developed. These membranes, formed by appropriate etching of a deposited silicon nitride layer, are robust, easily manufacturable, and compatible with a wide range of silicon micromachining techniques.
APPLICATION OF STABLE ISOTOPE TECHNIQUES TO AIR POLLUTION RESEARCH
Stable isotope techniques provide a robust, yet under-utilized tool for examining pollutant effects on plant growth and ecosystem function. Here, we survey a range of mixing model, physiological and system level applications for documenting pollutant effects. Mixing model examp...
Lin, Ying-Ting
2013-04-30
A tandem technique based on hardware is often used for the chemical analysis of a single cell, first to isolate and then to detect the wanted identities. The first part is the separation of the wanted chemicals from the bulk of a cell; the second part is the actual detection of the important identities. To identify the key structural modifications around ligand binding, the present study aims to develop a cheminformatics counterpart of this tandem technique. A statistical regression and its outliers act as the computational counterpart of the separation step. A PPARγ (peroxisome proliferator-activated receptor gamma) agonist cellular system was subjected to such an investigation. Results show that this tandem regression-outlier analysis, or the prioritization of the context equations tagged with features of the outliers, is an effective cheminformatics regression technique to detect key structural modifications, as well as their tendency to impact ligand binding. The key structural modifications around ligand binding are effectively extracted or characterized out of cellular reactions. This is because molecular binding is the paramount factor in such a ligand cellular system, and key structural modifications around ligand binding are expected to create outliers. Therefore, such outliers can be captured by this tandem regression-outlier analysis.
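A small sketch of the regression-outlier idea using externally studentized residuals to flag influential compounds; the descriptor names and activity values are synthetic placeholders, not the PPARγ data.

```python
# Hedged sketch: fit an ordinary regression to activity data and flag compounds
# whose studentized residuals are extreme, treating them as candidate carriers
# of key structural modifications.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
descriptors = rng.normal(size=(60, 3))               # e.g. logP, MW, TPSA (assumed)
activity = descriptors @ np.array([1.0, -0.5, 0.3]) + rng.normal(0, 0.2, 60)
activity[[5, 17]] += 3.0                             # "key modification" compounds

fit = sm.OLS(activity, sm.add_constant(descriptors)).fit()
student_resid = fit.get_influence().resid_studentized_external
outliers = np.where(np.abs(student_resid) > 3)[0]
print("flagged compounds:", outliers)                # should include 5 and 17
```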
Robust-mode analysis of hydrodynamic flows
NASA Astrophysics Data System (ADS)
Roy, Sukesh; Gord, James R.; Hua, Jia-Chen; Gunaratne, Gemunu H.
2017-04-01
The emergence of techniques to extract high-frequency high-resolution data introduces a new avenue for modal decomposition to assess the underlying dynamics, especially of complex flows. However, this task requires the differentiation of robust, repeatable flow constituents from noise and other irregular features of a flow. Traditional approaches involving low-pass filtering and principal components analysis have shortcomings. The approach outlined here, referred to as robust-mode analysis, is based on Koopman decomposition. Three applications to (a) a counter-rotating cellular flame state, (b) variations in financial markets, and (c) turbulent injector flows are provided.
Song, Qiankun; Yu, Qinqin; Zhao, Zhenjiang; Liu, Yurong; Alsaadi, Fuad E
2018-07-01
In this paper, the boundedness and robust stability of a class of delayed complex-valued neural networks with interval parameter uncertainties are investigated. By using the homomorphic mapping theorem, the Lyapunov method and inequality techniques, a sufficient condition guaranteeing the boundedness of the networks and the existence, uniqueness and global robust stability of the equilibrium point is derived for the considered uncertain neural networks. The obtained robust stability criterion is expressed as a complex-valued LMI, which can be solved numerically using YALMIP with the SDPT3 solver in MATLAB. An example with simulations is supplied to show the applicability and advantages of the acquired result. Copyright © 2018 Elsevier Ltd. All rights reserved.
A Comparison of Mean Phase Difference and Generalized Least Squares for Analyzing Single-Case Data
ERIC Educational Resources Information Center
Manolov, Rumen; Solanas, Antonio
2013-01-01
The present study focuses on single-case data analysis specifically on two procedures for quantifying differences between baseline and treatment measurements. The first technique tested is based on generalized least square regression analysis and is compared to a proposed non-regression technique, which allows obtaining similar information. The…
Propensity Score Estimation with Data Mining Techniques: Alternatives to Logistic Regression
ERIC Educational Resources Information Center
Keller, Bryan S. B.; Kim, Jee-Seon; Steiner, Peter M.
2013-01-01
Propensity score analysis (PSA) is a methodological technique which may correct for selection bias in a quasi-experiment by modeling the selection process using observed covariates. Because logistic regression is well understood by researchers in a variety of fields and easy to implement in a number of popular software packages, it has…
Two biased estimation techniques in linear regression: Application to aircraft
NASA Technical Reports Server (NTRS)
Klein, Vladislav
1988-01-01
Several ways to detect and assess collinearity in measured data are discussed. Because data collinearity usually results in poor least squares estimates, two estimation techniques which can limit the damaging effect of collinearity are presented. These two techniques, principal components regression and mixed estimation, belong to a class of biased estimation techniques. Detection and assessment of data collinearity and the two biased estimation techniques are demonstrated in two examples using flight test data from longitudinal maneuvers of an experimental aircraft. The eigensystem analysis and parameter variance decomposition appeared to be a promising tool for collinearity evaluation. The biased estimators had far better accuracy than the results from the ordinary least squares technique.
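A brief sketch of principal components regression, one of the two biased estimators discussed, applied to deliberately collinear synthetic predictors rather than the flight-test data.

```python
# Hedged sketch: regress the response on the leading principal components of
# collinear predictors instead of on the raw, ill-conditioned design matrix.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(8)
n = 150
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)          # nearly collinear with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 1.0 * x1 + 1.0 * x2 + 0.5 * x3 + rng.normal(0, 0.1, n)

pcr = make_pipeline(StandardScaler(), PCA(n_components=2), LinearRegression())
pcr.fit(X, y)
print("R^2 with 2 principal components:", pcr.score(X, y))
```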
Zhao, Junbo; Wang, Shaobu; Mili, Lamine; ...
2018-01-08
Here, this paper develops a robust power system state estimation framework with consideration of measurement correlations and imperfect synchronization. In the framework, correlations of SCADA and phasor measurement unit (PMU) measurements are calculated separately through an unscented transformation and a Vector Auto-Regression (VAR) model. In particular, PMU measurements during the waiting period between two SCADA measurement scans are buffered to develop the VAR model, with parameters robustly estimated using a projection statistics approach. The latter takes into account the temporal and spatial correlations of PMU measurements and provides redundant measurements to suppress bad data and mitigate imperfect synchronization. In cases where the SCADA and PMU measurements are not time-synchronized, either the forecasted PMU measurements or the prior SCADA measurements from the last estimation run are leveraged to restore system observability. Then, a robust generalized maximum-likelihood (GM) estimator is extended to integrate measurement error correlations and to handle outliers in the SCADA and PMU measurements. Simulation results that stem from a comprehensive comparison with other alternatives under various conditions demonstrate the benefits of the proposed framework.
Defining surfaces for skewed, highly variable data
Helsel, D.R.; Ryker, S.J.
2002-01-01
Skewness of environmental data is often caused by more than simply a handful of outliers in an otherwise normal distribution. Statistical procedures for such datasets must be sufficiently robust to deal with distributions that are strongly non-normal, containing both a large proportion of outliers and a skewed main body of data. In the field of water quality, skewness is commonly associated with large variation over short distances. Spatial analysis of such data generally requires either considerable effort at modeling or the use of robust procedures not strongly affected by skewness and local variability. Using a skewed dataset of 675 nitrate measurements in ground water, commonly used methods for defining a surface (least-squares regression and kriging) are compared to a more robust method (loess). Three choices are critical in defining a surface: (i) is the surface to be a central mean or median surface? (ii) is either a well-fitting transformation or a robust and scale-independent measure of center used? (iii) does local spatial autocorrelation assist in or detract from addressing objectives? Published in 2002 by John Wiley & Sons, Ltd.
Topology of foreign exchange markets using hierarchical structure methods
NASA Astrophysics Data System (ADS)
Naylor, Michael J.; Rose, Lawrence C.; Moyle, Brendan J.
2007-08-01
This paper uses two physics derived hierarchical techniques, a minimal spanning tree and an ultrametric hierarchical tree, to extract a topological influence map for major currencies from the ultrametric distance matrix for 1995-2001. We find that these two techniques generate a defined and robust scale free network with meaningful taxonomy. The topology is shown to be robust with respect to method, to time horizon and is stable during market crises. This topology, appropriately used, gives a useful guide to determining the underlying economic or regional causal relationships for individual currencies and to understanding the dynamics of exchange rate price determination as part of a complex network.
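An illustrative sketch of the minimal-spanning-tree step on synthetic currency returns, using the standard correlation-based distance; the currencies and returns are invented and are not the 1995-2001 dataset.

```python
# Hedged sketch: build the distance d_ij = sqrt(2(1 - rho_ij)) from return
# correlations and extract a minimal spanning tree with SciPy.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

rng = np.random.default_rng(9)
n_days = 500
currencies = ["EUR", "GBP", "CHF", "JPY", "AUD"]          # hypothetical basket
common = rng.normal(size=n_days)                           # shared market factor
returns = np.column_stack(
    [common + rng.normal(0, s, n_days) for s in (0.5, 0.6, 0.6, 1.2, 1.0)]
)

corr = np.corrcoef(returns, rowvar=False)
dist = np.sqrt(2.0 * (1.0 - corr))
mst = minimum_spanning_tree(dist).toarray()

for i, j in zip(*np.nonzero(mst)):
    print(f"{currencies[i]} -- {currencies[j]}  d = {mst[i, j]:.3f}")
```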
NASA Astrophysics Data System (ADS)
Dimova, E.; Steflekova, V.; Karatodorov, S.; Kyoseva, E.
2018-03-01
We propose a way of achieving efficient and robust second-harmonic generation. The technique proposed is similar to the adiabatic population transfer in a two-state quantum system with crossing energies. If the phase mismatching changes slowly, e.g., due to a temperature gradient along the crystal, and makes the phase match for second-harmonic generation to occur, then the energy would be converted adiabatically to the second harmonic. As an adiabatic technique, the second-harmonic generation scheme presented is stable to variations in the crystal parameters, as well as in the input light, crystal length, input intensity, wavelength and angle of incidence.
Design, test, and evaluation of three active flutter suppression controllers
NASA Technical Reports Server (NTRS)
Adams, William M., Jr.; Christhilf, David M.; Waszak, Martin R.; Mukhopadhyay, Vivek; Srinathkumar, S.
1992-01-01
Three control law design techniques for flutter suppression are presented. Each technique uses multiple control surfaces and/or sensors. The first method uses traditional tools (such as pole/zero loci and Nyquist diagrams) to produce a controller that has minimal complexity and is sufficiently robust to handle plant uncertainty. The second procedure uses linear combinations of several accelerometer signals and dynamic compensation to synthesize the modal rate of the critical mode for feedback to the distributed control surfaces. The third technique starts with a minimum-energy linear quadratic Gaussian controller, iteratively modifies intensity matrices corresponding to input and output noise, and applies controller order reduction to achieve a low-order, robust controller. The resulting designs were implemented digitally and tested subsonically on the active flexible wing wind-tunnel model in the Langley Transonic Dynamics Tunnel. Only the traditional pole/zero loci design was sufficiently robust to errors in the nominal plant to successfully suppress flutter during the test. The traditional pole/zero loci design provided simultaneous suppression of symmetric and antisymmetric flutter with a 24-percent increase in attainable dynamic pressure. Posttest analyses are shown which illustrate the problems encountered with the other laws.
Chaotic CDMA watermarking algorithm for digital image in FRFT domain
NASA Astrophysics Data System (ADS)
Liu, Weizhong; Yang, Wentao; Feng, Zhuoming; Zou, Xuecheng
2007-11-01
A digital image watermarking algorithm based on the fractional Fourier transform (FRFT) domain is presented, utilizing a chaotic CDMA technique. As a popular and typical transmission technique, CDMA has many advantages such as privacy, anti-jamming and low power spectral density, which can provide robustness against image distortions and malicious attempts to remove or tamper with the watermark. A super-hybrid chaotic map, with good auto-correlation and cross-correlation characteristics, is adopted to produce many quasi-orthogonal codes (QOC) that can replace the periodic PN-code used in traditional CDMA systems. The watermark data are divided into many segments that correspond to different chaotic QOCs and are modulated into CDMA watermark data embedded into the low-frequency amplitude coefficients of the FRFT domain of the cover image. During watermark detection, each chaotic QOC extracts its corresponding watermark segment by calculating correlation coefficients between the chaotic QOC and the watermarked data of the detected image. The CDMA technique not only enhances the robustness of the watermark but also compresses the data of the modulated watermark. Experimental results show that the watermarking algorithm performs well in three aspects: imperceptibility, anti-attack robustness and security.
A Novel MEMS Gyro North Finder Design Based on the Rotation Modulation Technique
Zhang, Yongjian; Zhou, Bin; Song, Mingliang; Hou, Bo; Xing, Haifeng; Zhang, Rong
2017-01-01
Gyro north finders have been widely used in maneuvering weapon orientation, oil drilling and other areas. This paper proposes a novel Micro-Electro-Mechanical System (MEMS) gyroscope north finder based on the rotation modulation (RM) technique. Two rotation modulation modes (static and dynamic modulation) are applied. Compared to the traditional gyro north finders, only one single MEMS gyroscope and one MEMS accelerometer are needed, reducing the total cost since high-precision gyroscopes and accelerometers are the most expensive components in gyro north finders. To reduce the volume and enhance the reliability, wireless power and wireless data transmission technique are introduced into the rotation modulation system for the first time. To enhance the system robustness, the robust least square method (RLSM) and robust Kalman filter (RKF) are applied in the static and dynamic north finding methods, respectively. Experimental characterization resulted in a static accuracy of 0.66° and a dynamic repeatability accuracy of 1°, respectively, confirming the excellent potential of the novel north finding system. The proposed single gyro and single accelerometer north finding scheme is universal, and can be an important reference to both scientific research and industrial applications. PMID:28452936
Sainath, Kamalesh; Teixeira, Fernando L; Donderici, Burkay
2014-01-01
We develop a general-purpose formulation, based on two-dimensional spectral integrals, for computing electromagnetic fields produced by arbitrarily oriented dipoles in planar-stratified environments, where each layer may exhibit arbitrary and independent anisotropy in both its (complex) permittivity and permeability tensors. Among the salient features of our formulation are (i) computation of eigenmodes (characteristic plane waves) supported in arbitrarily anisotropic media in a numerically robust fashion, (ii) implementation of an hp-adaptive refinement for the numerical integration to evaluate the radiation and weakly evanescent spectra contributions, and (iii) development of an adaptive extension of an integral convergence acceleration technique to compute the strongly evanescent spectrum contribution. While other semianalytic techniques exist to solve this problem, none have full applicability to media exhibiting arbitrary double anisotropies in each layer, where one must account for the whole range of possible phenomena (e.g., mode coupling at interfaces and nonreciprocal mode propagation). Brute-force numerical methods can tackle this problem but only at a much higher computational cost. The present formulation provides an efficient and robust technique for field computation in arbitrary planar-stratified environments. We demonstrate the formulation for a number of problems related to geophysical exploration.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ding, X; Wu, H; Rosen, L
2015-06-15
Purpose: To develop a clinically feasible and robust proton therapy technique to spare the bowel, bladder and rectum in high-risk prostate cancer patients. Methods: The study includes 3 high-risk prostate cancer cases treated at our institution with bilateral opposed SFUD using a lateral penumbra gradient matching technique, prescribed to 5400 cGyE in 30 fractions. To treat the whole pelvic lymph node chain, a complicated 'H' shape, using the SFUD technique, we divided the target into two sub-targets (the LLAT beam treating a '90 degree T-shape' and the RLAT beam treating a ': shape') in Plan A, with lateral penumbra gradient matching at the patient's left side, and vice versa in Plan B. Each plan delivers half of the prescription dose. Beam-specific PTVs were created to take range uncertainty and setup error into account. For daily treatment, the patient received four fields per day, from both Plan A and Plan B. Robustness evaluations were performed in the worst-case scenario with 3.5% range uncertainty and 1, 2, 3 mm overlap or gap between the LLAT and RLAT field matching in RayStation 4.0. All cases also had a Tomotherapy backup plan approved by the physician as a dosimetric comparison. Results: The total treatment time takes 15-20 min, including IGRT and delivery of the four fields on ProteusONE, a compact PBS proton system, compared to 25-30 min with traditional Tomotherapy. Robustness analysis shows that this planning technique is insensitive to range uncertainties. With the lateral gradient matching, 1, 2 and 3 mm overlaps render only 2.5%, 5.5% and 8% hot or cold spots in the junction areas. Dosimetric comparisons with Tomotherapy show a significant dose reduction in bladder D50% (14.7±9.3 Gy) and D35% (7.3±5.8 Gy), and in small bowel and rectum average dose (19.6±7.5 Gy and 14.5±6.3 Gy, respectively). Conclusion: The bilateral opposed SFUD plan with lateral penumbra gradient matching has proven to be a safe, robust and efficient treatment option for whole-pelvis high-risk prostate cancer patients that significantly spares the OARs.
Eddy current technique for predicting burst pressure
Petri, Mark C.; Kupperman, David S.; Morman, James A.; Reifman, Jaques; Wei, Thomas Y. C.
2003-01-01
A signal processing technique which correlates eddy current inspection data from a tube having a critical tubing defect with a range of predicted burst pressures for the tube is provided. The method can directly correlate the raw eddy current inspection data representing the critical tubing defect with the range of burst pressures using a regression technique, preferably an artificial neural network. Alternatively, the technique deconvolves the raw eddy current inspection data into a set of undistorted signals, each of which represents a separate defect of the tube. The undistorted defect signal which represents the critical tubing defect is related to a range of burst pressures utilizing a regression technique.
Improvement of Storm Forecasts Using Gridded Bayesian Linear Regression for Northeast United States
NASA Astrophysics Data System (ADS)
Yang, J.; Astitha, M.; Schwartz, C. S.
2017-12-01
Bayesian linear regression (BLR) is a post-processing technique in which regression coefficients are derived and used to correct raw forecasts based on pairs of observation-model values. This study presents the development and application of a gridded Bayesian linear regression (GBLR) as a new post-processing technique to improve numerical weather prediction (NWP) of rain and wind storm forecasts over northeast United States. Ten controlled variables produced from ten ensemble members of the National Center for Atmospheric Research (NCAR) real-time prediction system are used for a GBLR model. In the GBLR framework, leave-one-storm-out cross-validation is utilized to study the performances of the post-processing technique in a database composed of 92 storms. To estimate the regression coefficients of the GBLR, optimization procedures that minimize the systematic and random error of predicted atmospheric variables (wind speed, precipitation, etc.) are implemented for the modeled-observed pairs of training storms. The regression coefficients calculated for meteorological stations of the National Weather Service are interpolated back to the model domain. An analysis of forecast improvements based on error reductions during the storms will demonstrate the value of GBLR approach. This presentation will also illustrate how the variances are optimized for the training partition in GBLR and discuss the verification strategy for grid points where no observations are available. The new post-processing technique is successful in improving wind speed and precipitation storm forecasts using past event-based data and has the potential to be implemented in real-time.
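A minimal conjugate Bayesian linear regression sketch for a single grid point (not the NCAR/GBLR code); the prior and observation-error variances below are assumed values, and the forecast-observation pairs are simulated.

```python
# Hedged sketch: posterior over correction coefficients given model-observation
# pairs, with a Gaussian prior acting as regularization; the posterior mean is
# then used to correct a new raw forecast.
import numpy as np

rng = np.random.default_rng(10)
n = 150
forecast_wind = rng.uniform(0, 25, n)
obs_wind = 0.85 * forecast_wind + 1.5 + rng.normal(0, 1.0, n)

X = np.column_stack([np.ones(n), forecast_wind])
sigma2 = 1.0                   # assumed observation-error variance
tau2 = 10.0                    # assumed prior variance on coefficients

# Posterior: N(m, S) with S = (X'X/sigma^2 + I/tau^2)^-1 and m = S X'y/sigma^2.
S = np.linalg.inv(X.T @ X / sigma2 + np.eye(2) / tau2)
m = S @ X.T @ obs_wind / sigma2

new_forecast = np.array([1.0, 18.0])       # correct an 18 m/s raw forecast
print("corrected wind:", new_forecast @ m)
print("coefficient posterior std:", np.sqrt(np.diag(S)))
```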
Robust detection, isolation and accommodation for sensor failures
NASA Technical Reports Server (NTRS)
Emami-Naeini, A.; Akhter, M. M.; Rock, S. M.
1986-01-01
The objective is to extend the recent advances in robust control system design of multivariable systems to sensor failure detection, isolation, and accommodation (DIA), and estimator design. This effort provides analysis tools to quantify the trade-off between performance robustness and DIA sensitivity, which are to be used to achieve higher levels of performance robustness for given levels of DIA sensitivity. An innovations-based DIA scheme is used. Estimators, which depend upon a model of the process and process inputs and outputs, are used to generate these innovations. Thresholds used to determine failure detection are computed based on bounds on modeling errors, noise properties, and the class of failures. The applicability of the newly developed tools is demonstrated on a multivariable aircraft turbojet engine example. A new concept called the threshold selector was developed. It represents a significant and innovative tool for the analysis and synthesis of DIA algorithms. The estimators were made robust by the introduction of an internal model and by frequency shaping. The internal model provides asymptotically unbiased filter estimates. The incorporation of frequency shaping of the linear quadratic Gaussian cost functional modifies the estimator design to make it suitable for sensor failure DIA. The results are compared with previous studies which used thresholds that were selected empirically. Comparison of these two techniques on a nonlinear dynamic engine simulation shows improved performance of the new method compared to previous techniques.
USING BIOASSAYS TO EVALUATE THE PERFORMANCE OF RISK MANAGEMENT TECHNIQUES
Often, the performance of risk management techniques is evaluated by measuring the concentrations of the chemicals of concern before and after risk management efforts. However, using bioassays and chemical data provides a more robust understanding of the effectiveness of risk man...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Callister, Stephen J.; Barry, Richard C.; Adkins, Joshua N.
2006-02-01
Central tendency, linear regression, locally weighted regression, and quantile techniques were investigated for normalization of peptide abundance measurements obtained from high-throughput liquid chromatography-Fourier transform ion cyclotron resonance mass spectrometry (LC-FTICR MS). Arbitrary abundances of peptides were obtained from three sample sets, including a standard protein sample, two Deinococcus radiodurans samples taken from different growth phases, and two mouse striatum samples from control and methamphetamine-stressed mice (strain C57BL/6). The selected normalization techniques were evaluated in both the absence and presence of biological variability by estimating extraneous variability prior to and following normalization. Prior to normalization, replicate runs from each sample set were observed to be statistically different, while following normalization replicate runs were no longer statistically different. Although all techniques reduced systematic bias, assigned ranks among the techniques revealed significant trends. For most LC-FTICR MS analyses, linear regression normalization ranked either first or second among the four techniques, suggesting that this technique was more generally suitable for reducing systematic biases.
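A hedged sketch of the linear-regression normalization that ranked best for most runs: regress one replicate's log abundances on a reference run and remove the fitted systematic trend. The peptide abundances below are simulated, not the LC-FTICR MS data.

```python
# Hedged sketch: linear-regression normalization of one replicate against a
# reference run in log2 space.
import numpy as np

rng = np.random.default_rng(11)
reference = rng.lognormal(mean=10, sigma=1, size=1000)
# A replicate run carrying a multiplicative bias plus noise.
replicate = reference * 1.4 * np.exp(rng.normal(0, 0.1, 1000))

x, y = np.log2(reference), np.log2(replicate)
slope, intercept = np.polyfit(x, y, 1)
normalized = y - (slope * x + intercept) + x      # remove fitted bias, keep scale

print("median log2 ratio before:", np.median(y - x))
print("median log2 ratio after: ", np.median(normalized - x))
```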
Caballero, Daniel; Antequera, Teresa; Caro, Andrés; Ávila, María Del Mar; G Rodríguez, Pablo; Perez-Palacios, Trinidad
2017-07-01
Magnetic resonance imaging (MRI) combined with computer vision techniques has been proposed as an alternative or complementary technique to determine the quality parameters of food in a non-destructive way. The aim of this work was to analyze the sensory attributes of dry-cured loins using this technique. For that, different MRI acquisition sequences (spin echo, gradient echo and turbo 3D), algorithms for MRI analysis (GLCM, NGLDM, GLRLM and GLCM-NGLDM-GLRLM) and predictive data mining techniques (multiple linear regression and isotonic regression) were tested. The correlation coefficient (R) and mean absolute error (MAE) were used to validate the prediction results. The combination of spin echo, GLCM and isotonic regression produced the most accurate results. In addition, the MRI data from dry-cured loins seem to be more suitable than the data from fresh loins. The application of predictive data mining techniques on computational texture features from the MRI data of loins enables the determination of the sensory traits of dry-cured loins in a non-destructive way. © 2016 Society of Chemical Industry.
Simple to complex modeling of breathing volume using a motion sensor.
John, Dinesh; Staudenmayer, John; Freedson, Patty
2013-06-01
To compare simple and complex modeling techniques to estimate categories of low, medium, and high ventilation (VE) from ActiGraph™ activity counts. Vertical axis ActiGraph™ GT1M activity counts, oxygen consumption and VE were measured during treadmill walking and running, sports, household chores and labor-intensive employment activities. Categories of low (<19.3 l/min), medium (19.3 to 35.4 l/min) and high (>35.4 l/min) VE were derived from activity intensity classifications (light <2.9 METs, moderate 3.0 to 5.9 METs and vigorous >6.0 METs). We examined the accuracy of two simple modeling techniques (multiple regression and activity count cut-point analyses) and one complex modeling technique (random forest) in predicting VE from activity counts. Prediction accuracy of the complex random forest technique was marginally better than the simple multiple regression method. Both techniques accurately predicted VE categories almost 80% of the time. The multiple regression and random forest techniques were more accurate (85 to 88%) in predicting medium VE. Both techniques predicted high VE (70 to 73%) with greater accuracy than low VE (57 to 60%). ActiGraph™ cut-points for low, medium and high VE were <1381, 1381 to 3660 and >3660 cpm. There were minor differences in prediction accuracy between the multiple regression and the random forest technique. This study provides methods to objectively estimate VE categories using activity monitors that can easily be deployed in the field. Objective estimates of VE should provide a better understanding of the dose-response relationship between internal exposure to pollutants and disease. Copyright © 2013 Elsevier B.V. All rights reserved.
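A rough sketch of the two approaches (cut-point classification versus a random forest) on synthetic count-VE data follows; the only values taken from the study are the reported cut-points and VE category boundaries.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
counts = rng.uniform(0, 8000, size=2000)                     # activity counts (cpm)
ve = 5.0 + 0.009 * counts + rng.normal(0, 5.0, size=2000)    # illustrative VE (l/min)
ve_cat = np.digitize(ve, bins=[19.3, 35.4])                  # 0=low, 1=medium, 2=high

def cutpoint_predict(c):
    # Simple cut-point classifier using the cut-points reported in the abstract.
    return np.digitize(c, bins=[1381, 3660])

cut_acc = np.mean(cutpoint_predict(counts) == ve_cat)
rf_acc = cross_val_score(RandomForestClassifier(n_estimators=200, random_state=0),
                         counts.reshape(-1, 1), ve_cat, cv=5).mean()
print(f"cut-point accuracy {cut_acc:.2f}, random forest accuracy {rf_acc:.2f}")
```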
NASA Astrophysics Data System (ADS)
Shimoyama, Koji; Jeong, Shinkyu; Obayashi, Shigeru
A new approach for multi-objective robust design optimization was proposed and applied to a real-world design problem with a large number of objective functions. The present approach is assisted by response surface approximation and visual data-mining, and resulted in two major gains regarding computational time and data interpretation. The Kriging model for response surface approximation can markedly reduce the computational time for predictions of robustness. In addition, the use of self-organizing maps as a data-mining technique allows visualization of complicated design information between optimality and robustness in a comprehensible two-dimensional form. Therefore, the extraction and interpretation of trade-off relations between optimality and robustness of design, and also the location of sweet spots in the design space, can be performed in a comprehensive manner.
Analysis and Synthesis of Robust Data Structures
1990-08-01
[Fragments recovered from the report's table of contents and Section 1: 1.3.2 Multiversion Software; 1.3.3 Robust Data Structure.] The fault-tolerance approaches discussed in this context include multiversion software, an adaptation of the N-modular redundancy (NMR) technique, and recovery blocks; Avizienis [AC77] was the first to adapt the NMR technique in this setting.
Human motion tracking by temporal-spatial local gaussian process experts.
Zhao, Xu; Fu, Yun; Liu, Yuncai
2011-04-01
Human pose estimation via motion tracking systems can be considered as a regression problem within a discriminative framework. It is always a challenging task to model the mapping from observation space to state space because of the high-dimensional characteristic of the multimodal conditional distribution. In order to build the mapping, existing techniques usually involve a large set of training samples in the learning process but are limited in their capability to deal with multimodality. We propose, in this work, a novel online sparse Gaussian Process (GP) regression model to recover 3-D human motion in monocular videos. In particular, we exploit the fact that for a given test input, its output is mainly determined by the training samples potentially residing in its local neighborhood and defined in the unified input-output space. This leads to a local mixture of GP experts, each of which dominates a mapping behavior with a specific covariance function adapted to a local region. To handle the multimodality, we combine both temporal and spatial information and therefore obtain two categories of local experts. The temporal and spatial experts are integrated into a seamless hybrid system, which is automatically self-initialized and robust for visual tracking of nonlinear human motion. Learning and inference are extremely efficient as all the local experts are defined online within very small neighborhoods. Extensive experiments on two real-world databases, HumanEva and PEAR, demonstrate the effectiveness of our proposed model, which significantly improves the performance of existing models.
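The local-expert idea can be sketched roughly as fitting a small GP only on the neighborhood of each test input; the following toy example (1-D synthetic data, scikit-learn GP, no temporal/spatial mixture) is an illustration, not the authors' system.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(400, 1))                  # toy "observations"
y = np.sin(2 * X[:, 0]) + 0.1 * rng.normal(size=400)   # toy "pose" target

nn = NearestNeighbors(n_neighbors=30).fit(X)

def local_gp_predict(x_test):
    # Fit an expert only on the local neighborhood of the query input.
    _, idx = nn.kneighbors(x_test.reshape(1, -1))
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(X[idx[0]], y[idx[0]])
    return gp.predict(x_test.reshape(1, -1))[0]

print(round(local_gp_predict(np.array([1.0])), 2), "vs true", round(np.sin(2.0), 2))
```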
Besseris, George J
2018-03-01
Generalized regression neural networks (GRNN) may act as crowdsourcing cognitive agents to screen small, dense and complex datasets. The concurrent screening and optimization of several complex physical and sensory traits of bread is developed using a structured Taguchi-type micro-mining technique. A novel product outlook is offered to industrial operations to cover separate aspects of smart product design, engineering and marketing. Four controlling factors were selected to be modulated directly on a modern production line: 1) the dough weight, 2) the proofing time, 3) the baking time, and 4) the oven zone temperatures. Concentrated experimental recipes were programmed using the Taguchi-type L9(3^4) OA-sampler to detect potentially non-linear multi-response tendencies. The fused behavior of the master-ranked bread characteristics was smart-sampled with GRNN crowdsourcing and robust analysis. The combination of the oven zone temperatures was found to play a highly influential role in all investigated scenarios. Moreover, the oven zone temperatures and the dough weight appeared to be instrumental when attempting to adjust all four physical characteristics synchronously. The optimal oven-zone temperature setting for concurrent screening-and-optimization was found to be 270-240 °C. The optimized (median) responses for loaf weight, moisture, height, width, color, flavor, crumb structure, softness, and elasticity are: 782 g, 34.8%, 9.36 cm, 10.41 cm, 6.6, 7.2, 7.6, 7.3, and 7.0, respectively.
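Since a GRNN prediction is essentially a kernel-weighted average of training responses, a minimal Nadaraya-Watson-style sketch is shown below; the factor settings and responses are synthetic placeholders, not the production-line data.

```python
import numpy as np

def grnn_predict(X_train, y_train, x_query, sigma=0.5):
    # GRNN / Nadaraya-Watson: Gaussian-kernel-weighted average of training responses.
    d2 = np.sum((X_train - x_query) ** 2, axis=1)
    w = np.exp(-d2 / (2 * sigma ** 2))
    return np.sum(w * y_train) / np.sum(w)

rng = np.random.default_rng(11)
X = rng.uniform(0, 1, size=(9, 4))            # nine recipes, four controlling factors (scaled)
y = 700 + 100 * X[:, 0] - 50 * X[:, 3] + rng.normal(0, 5, size=9)   # e.g. loaf weight (g)
print(round(grnn_predict(X, y, np.array([0.5, 0.5, 0.5, 0.5])), 1))
```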
Wang, Peifang; Wang, Teng; Yao, Yu; Wang, Chao; Liu, Cui; Yuan, Ye
2016-01-01
Management of heavy metal contamination requires accurate information about the distribution of bioavailable fractions, and about exchange between the solid and solution phases. In this study, we employed diffusive gradients in thin-films (DGT) and traditional chemical extraction methods (soil solution, HOAc, EDTA, CaCl2, and NaOAc) to determine the Cd bioavailability in Cd-contaminated soil with the addition of Pb. Two typical terrestrial species (wheat, Bainong AK58; maize, Zhengdan 958) were selected as the accumulation plants. The results showed that the added Pb may enhance the efficiency of Cd phytoextraction, as indicated by the increasing concentration of Cd accumulating in the plant tissues. The DGT-measured Cd concentrations and the Cd concentrations measured by all the selected traditional extractants increased with increasing concentration of added Pb, a trend similar to that of the Cd accumulated in plant tissues. Moreover, the Pearson regression coefficients between the Cd concentrations obtained by the different indicators and the Cd concentrations taken up by the plants indicated significant correlations (p < 0.01). However, the values of the Pearson regression coefficients showed the merits of DGT, CaCl2, and Csol over the other three methods. Consequently, the in situ measurement of DGT and the ex situ traditional methods could all reflect the inhibition effects between Cd and Pb. Owing to its dynamic measurement capability, DGT could be a robust tool to predict Cd bioavailability in complex contaminated soil. PMID:27271644
Use of a machine learning framework to predict substance use disorder treatment success.
Acion, Laura; Kelmansky, Diana; van der Laan, Mark; Sahker, Ethan; Jones, DeShauna; Arndt, Stephan
2017-01-01
There are several methods for building prediction models. The wealth of currently available modeling techniques usually forces the researcher to judge, a priori, what will likely be the best method. Super learning (SL) is a methodology that facilitates this decision by combining all identified prediction algorithms pertinent for a particular prediction problem. SL generates a final model that is at least as good as any of the other models considered for predicting the outcome. The overarching aim of this work is to introduce SL to analysts and practitioners. This work compares the performance of logistic regression, penalized regression, random forests, deep learning neural networks, and SL to predict successful substance use disorders (SUD) treatment. A nationwide database including 99,013 SUD treatment patients was used. All algorithms were evaluated using the area under the receiver operating characteristic curve (AUC) in a test sample that was not included in the training sample used to fit the prediction models. AUC for the models ranged between 0.793 and 0.820. SL was superior to all but one of the algorithms compared. An explanation of SL steps is provided. SL is the first step in targeted learning, an analytic framework that yields double robust effect estimation and inference with fewer assumptions than the usual parametric methods. Different aspects of SL depending on the context, its function within the targeted learning framework, and the benefits of this methodology in the addiction field are discussed.
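A hedged sketch of the stacking idea behind super learning, using scikit-learn's StackingClassifier on synthetic data (not the nationwide SUD database or the authors' exact learner library), follows.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary outcome data standing in for treatment success records.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Base learners are combined by a meta-learner fit on cross-validated predictions.
stack = StackingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("rf", RandomForestClassifier(n_estimators=200, random_state=0))],
    final_estimator=LogisticRegression(),
    cv=5)
stack.fit(X_tr, y_tr)
print("test AUC:", round(roc_auc_score(y_te, stack.predict_proba(X_te)[:, 1]), 3))
```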
2012-01-01
Background A discrete choice experiment (DCE) is a preference survey which asks participants to make a choice among product portfolios, comparing the key product characteristics by performing several choice tasks. Analyzing DCE data needs to account for within-participant correlation because choices from the same participant are likely to be similar. In this study, we empirically compared some commonly-used statistical methods for analyzing DCE data while accounting for within-participant correlation, based on a survey of patient preference for colorectal cancer (CRC) screening tests conducted in Hamilton, Ontario, Canada in 2002. Methods A two-stage DCE design was used to investigate the impact of six attributes on participants' preferences for CRC screening test and willingness to undertake the test. We compared six models for clustered binary outcomes (logistic and probit regressions using cluster-robust standard error (SE), random-effects and generalized estimating equation approaches) and three models for clustered nominal outcomes (multinomial logistic and probit regressions with cluster-robust SE and random-effects multinomial logistic model). We also fitted a bivariate probit model with cluster-robust SE treating the choices from two stages as two correlated binary outcomes. The rank of relative importance between attributes and the estimates of the β coefficients within attributes were used to assess model robustness. Results In total, 468 participants, each completing 10 choices, were analyzed. Similar results were reported for the rank of relative importance and β coefficients across models for stage-one data evaluating participants' preferences for the test. The six attributes ranked from high to low as follows: cost, specificity, process, sensitivity, preparation and pain. However, the results differed across models for stage-two data evaluating participants' willingness to undertake the tests. Little within-patient correlation (ICC ≈ 0) was found in stage-one data, but substantial within-patient correlation existed (ICC = 0.659) in stage-two data. Conclusions When a small clustering effect was present in the DCE data, results remained robust across statistical models. However, results varied when a larger clustering effect was present. Therefore, it is important to assess the robustness of the estimates via sensitivity analysis using different models for analyzing clustered data from DCE studies. PMID:22348526
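As an illustration of one of the compared models, a logistic regression with cluster-robust standard errors can be fitted with statsmodels as sketched below; the participant-clustered choice data are synthetic, and the single attribute is a placeholder.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n_subj, n_tasks = 200, 10
subj = np.repeat(np.arange(n_subj), n_tasks)          # cluster id per choice task
cost = rng.uniform(0, 1, size=n_subj * n_tasks)       # one illustrative attribute
subj_effect = rng.normal(0, 1.0, size=n_subj)[subj]   # induces within-participant correlation
p = 1 / (1 + np.exp(-(0.5 - 2.0 * cost + subj_effect)))
choice = rng.binomial(1, p)

# Cluster-robust (sandwich) standard errors grouped by participant.
X = sm.add_constant(cost)
fit = sm.Logit(choice, X).fit(disp=False,
                              cov_type="cluster", cov_kwds={"groups": subj})
print(fit.summary().tables[1])
```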
NASA Astrophysics Data System (ADS)
Loria Salazar, S. M.; Holmes, H.
2015-12-01
Health effects studies of aerosol pollution have been extended spatially using data assimilation techniques that combine surface PM2.5 concentrations and Aerosol Optical Depth (AOD) from satellite retrievals. While most of these models were developed for the dark-vegetated eastern U.S., they are being used in the semi-arid western U.S. to remotely sense atmospheric aerosol concentrations. These models are helpful for understanding the spatial variability of surface PM2.5 concentrations in the western U.S. because of the sparse network of surface monitoring stations. However, the models developed for the eastern U.S. are not robust in the western U.S. due to different aerosol formation mechanisms, transport phenomena, and optical properties. This region is a challenge because of complex terrain, anthropogenic and biogenic emissions, secondary organic aerosol formation, smoke from wildfires, and low background aerosol concentrations. This research concentrates on the use and evaluation of satellite remote sensing to estimate surface PM2.5 concentrations from AOD satellite retrievals over California and Nevada during the summer months of 2012 and 2013. The aim of this investigation is to incorporate a spatial statistical model that uses AOD from AERONET as well as MODIS, surface PM2.5 concentrations, and land-use regression to characterize spatial surface PM2.5 concentrations. The land use regression model uses traditional inputs (e.g. meteorology, population density, terrain) and non-traditional variables (e.g. Fire INventory from NCAR (FINN) emissions and the MODIS albedo product) to account for variability related to smoke plume trajectories and land use. The results will be used in a spatially resolved health study to determine the association between wildfire smoke exposure and cardiorespiratory health endpoints. This relationship can be used with future projections of wildfire emissions related to climate change and droughts to quantify the expected health impact.
Alves, Pedro; Liu, Shuang; Wang, Daifeng; Gerstein, Mark
2018-01-01
Machine learning is an integral part of computational biology and has already shown its use in various applications, such as prognostic tests. In the last few years in the non-biological machine learning community, ensembling techniques have shown their power in data mining competitions such as the Netflix challenge; however, such methods have not found wide use in computational biology. In this work, we endeavor to show how ensembling techniques can be applied to practical problems, including problems in the field of bioinformatics, and how they often outperform other machine learning techniques in both predictive power and robustness. Furthermore, we develop an ensembling methodology, Multi-Swarm Ensemble (MSWE), which uses multiple particle swarm optimizations, and demonstrate its ability to further enhance the performance of ensembles.
Image Hashes as Templates for Verification
DOE Office of Scientific and Technical Information (OSTI.GOV)
Janik, Tadeusz; Jarman, Kenneth D.; Robinson, Sean M.
2012-07-17
Imaging systems can provide measurements that confidently assess characteristics of nuclear weapons and dismantled weapon components, and such assessment will be needed in future verification for arms control. Yet imaging is often viewed as too intrusive, raising concern about the ability to protect sensitive information. In particular, the prospect of using image-based templates for verifying the presence or absence of a warhead, or of the declared configuration of fissile material in storage, may be rejected out-of-hand as being too vulnerable to violation of information barrier (IB) principles. Development of a rigorous approach for generating and comparing reduced-information templates from images, and assessing the security, sensitivity, and robustness of verification using such templates, are needed to address these concerns. We discuss our efforts to develop such a rigorous approach based on a combination of image-feature extraction and encryption-utilizing hash functions to confirm proffered declarations, providing strong classified data security while maintaining high confidence for verification. The proposed work is focused on developing secure, robust, tamper-sensitive and automatic techniques that may enable the comparison of non-sensitive hashed image data outside an IB. It is rooted in research on so-called perceptual hash functions for image comparison, at the interface of signal/image processing, pattern recognition, cryptography, and information theory. Such perceptual or robust image hashing (which, strictly speaking, is not truly cryptographic hashing) has extensive application in content authentication and information retrieval, database search, and security assurance. Applying and extending the principles of perceptual hashing to imaging for arms control, we propose techniques that are sensitive to altering, forging and tampering of the imaged object yet robust and tolerant to content-preserving image distortions and noise. Ensuring that the information contained in the hashed image data (available out-of-IB) cannot be used to extract sensitive information about the imaged object is of primary concern. Thus the techniques are characterized by high unpredictability to guarantee security. We will present an assessment of the performance of our techniques with respect to security, sensitivity and robustness on the basis of a methodical and mathematically precise framework.
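A generic perceptual-hash sketch (a simple average hash, not the authors' secure scheme) illustrates the robustness/sensitivity trade-off the abstract describes; the images below are synthetic arrays.

```python
import numpy as np

def average_hash(img, size=8):
    # Block-average the image to size x size, threshold at the mean -> 64-bit signature.
    h, w = img.shape
    img = img[:h - h % size, :w - w % size]
    blocks = img.reshape(size, img.shape[0] // size,
                         size, img.shape[1] // size).mean(axis=(1, 3))
    return (blocks > blocks.mean()).astype(np.uint8).ravel()

rng = np.random.default_rng(5)
image = rng.uniform(0, 1, size=(128, 128))
noisy = image + rng.normal(0, 0.02, size=image.shape)   # content-preserving distortion
tampered = image.copy()
tampered[32:64, 32:64] += 0.5                           # content change

print("Hamming distance, noisy copy:   ", np.sum(average_hash(image) != average_hash(noisy)))
print("Hamming distance, tampered copy:", np.sum(average_hash(image) != average_hash(tampered)))
```

A secure scheme would additionally need the keyed unpredictability the abstract emphasizes; this sketch only shows the tolerance-versus-sensitivity behavior.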
NASA Astrophysics Data System (ADS)
Vavadi, Hamed; Mostafa, Atahar; Li, Jinglong; Zhou, Feifei; Uddin, Shihab; Xu, Chen; Zhu, Quing
2017-02-01
According to the World Health Organization, breast cancer is the most common cancer among women worldwide, claiming the lives of hundreds of thousands of women each year. Near infrared diffuse optical tomography (DOT) has demonstrated great potential as an adjunct modality for differentiation of malignant and benign breast lesions and for monitoring treatment response of patients with locally advanced breast cancers. The path toward commercialization of DOT techniques depends upon the improvement of robustness and user-friendliness of this technique in hardware and software. In the past, our group has developed three frequency-domain prototype systems which were used in several clinical studies. In this study, we introduce our new US-guided DOT system under development, which is being improved in terms of size, robustness and user-friendliness through several custom electronic and mechanical designs. A new and robust probe was designed to reduce preparation time in the clinical process. The processing procedure, data selection and user interface software were also updated. With all these improvements, the new system is more robust and accurate, which is one step closer to commercialization and wide use of this technology in clinical settings. The system is aimed to be used by minimally trained users in clinical settings with robust performance. The system performance has been tested in phantom experiments and initial results are demonstrated in this study. We are currently working on finalizing the system and performing further testing to validate its performance. We aim toward use of this system in a clinical setting for patients with breast cancer.
Liquid electrolyte informatics using an exhaustive search with linear regression.
Sodeyama, Keitaro; Igarashi, Yasuhiko; Nakayama, Tomofumi; Tateyama, Yoshitaka; Okada, Masato
2018-06-14
Exploring new liquid electrolyte materials is a fundamental target for developing new high-performance lithium-ion batteries. In contrast to solid materials, disordered liquid solution properties have been less studied by data-driven informatics techniques. Here, we examined the estimation accuracy and efficiency of three informatics techniques, multiple linear regression (MLR), least absolute shrinkage and selection operator (LASSO), and exhaustive search with linear regression (ES-LiR), by using coordination energy and melting point as test liquid properties. We then confirmed that ES-LiR gives the most accurate estimation among the techniques. We also found that ES-LiR can provide the relationship between the "prediction accuracy" and "calculation cost" of the properties via a weight diagram of descriptors. This technique makes it possible to choose the balance of "accuracy" and "cost" when searching a huge number of new materials.
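A minimal sketch of the ES-LiR idea, exhaustively fitting every descriptor subset with linear regression and ranking subsets by cross-validated error, follows; the descriptors and property values are synthetic placeholders, not the electrolyte features used in the study.

```python
from itertools import combinations

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
X = rng.normal(size=(80, 6))                                     # six candidate descriptors
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.1 * rng.normal(size=80)    # synthetic property

results = []
for k in range(1, X.shape[1] + 1):
    for subset in combinations(range(X.shape[1]), k):
        # Cross-validated mean squared error for this descriptor subset.
        score = cross_val_score(LinearRegression(), X[:, subset], y,
                                scoring="neg_mean_squared_error", cv=5).mean()
        results.append((-score, subset))

results.sort()
print("best descriptor subsets (MSE, columns):", results[:3])
```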
Alexander, Terry W.; Wilson, Gary L.
1995-01-01
A generalized least-squares regression technique was used to relate the 2- to 500-year flood discharges from 278 selected streamflow-gaging stations to statistically significant basin characteristics. The regression relations (estimating equations) were defined for three hydrologic regions (I, II, and III) in rural Missouri. Ordinary least-squares regression analyses indicate that drainage area (Regions I, II, and III) and main-channel slope (Regions I and II) are the only basin characteristics needed for computing the 2- to 500-year design-flood discharges at gaged or ungaged stream locations. The resulting generalized least-squares regression equations provide a technique for estimating the 2-, 5-, 10-, 25-, 50-, 100-, and 500-year flood discharges on unregulated streams in rural Missouri. The regression equations for Regions I and II were developed from streamflow-gaging stations with drainage areas ranging from 0.13 to 11,500 square miles and 0.13 to 14,000 square miles, and main-channel slopes ranging from 1.35 to 150 feet per mile and 1.20 to 279 feet per mile. The regression equations for Region III were developed from streamflow-gaging stations with drainage areas ranging from 0.48 to 1,040 square miles. Standard errors of estimate for the generalized least-squares regression equations in Regions I, II, and III ranged from 30 to 49 percent.
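For illustration, the form of such estimating equations can be sketched as an ordinary least-squares fit of log discharge on log drainage area and log slope (the study itself used generalized least squares); the station data and coefficients below are synthetic, not the published Missouri values.

```python
import numpy as np

rng = np.random.default_rng(7)
area = rng.uniform(0.5, 10000, size=120)    # drainage area, square miles
slope = rng.uniform(1.5, 250, size=120)     # main-channel slope, feet per mile
q100 = 80 * area**0.7 * slope**0.3 * np.exp(rng.normal(0, 0.2, size=120))  # 100-year discharge, cfs

# Fit log10(Q100) = b0 + b1*log10(A) + b2*log10(S) by ordinary least squares.
X = np.column_stack([np.ones_like(area), np.log10(area), np.log10(slope)])
coef, *_ = np.linalg.lstsq(X, np.log10(q100), rcond=None)
b0, b1, b2 = coef
print(f"Q100 ~= {10**b0:.1f} * A^{b1:.2f} * S^{b2:.2f}")
```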
Forest type mapping of the Interior West
Bonnie Ruefenacht; Gretchen G. Moisen; Jock A. Blackard
2004-01-01
This paper develops techniques for the mapping of forest types in Arizona, New Mexico, and Wyoming. The methods involve regression-tree modeling using a variety of remote sensing and GIS layers along with Forest Inventory and Analysis (FIA) point data. Regression-tree modeling is a fast and efficient technique for estimating variables for large data sets with high accuracy...
Mokhtari, Amirhossein; Christopher Frey, H; Zheng, Junyu
2006-11-01
Sensitivity analyses of exposure or risk models can help identify the most significant factors to aid in risk management or to prioritize additional research to reduce uncertainty in the estimates. However, sensitivity analysis is challenged by non-linearity, interactions between inputs, and multiple days or time scales. Selected sensitivity analysis methods are evaluated with respect to their applicability to human exposure models with such features using a testbed. The testbed is a simplified version of the US Environmental Protection Agency's Stochastic Human Exposure and Dose Simulation (SHEDS) model. The methods evaluated include the Pearson and Spearman correlation, sample and rank regression, analysis of variance, Fourier amplitude sensitivity test (FAST), and Sobol's method. The first five methods are known as "sampling-based" techniques, whereas the latter two methods are known as "variance-based" techniques. The main objective of the test cases was to identify the main and total contributions of individual inputs to the output variance. Sobol's method and FAST directly quantified these measures of sensitivity. Results show that the sensitivity of an input typically changed when evaluated under different time scales (e.g., daily versus monthly). All methods provided similar insights regarding less important inputs; however, Sobol's method and FAST provided more robust insights with respect to the sensitivity of important inputs compared to the sampling-based techniques. Thus, the sampling-based methods can be used in a screening step to identify unimportant inputs, followed by application of more computationally intensive refined methods to a smaller set of inputs. The implications of time variation in sensitivity results for risk management are briefly discussed.
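A minimal Monte Carlo estimator of first-order Sobol indices (Saltelli/Jansen scheme) on a toy nonlinear model is sketched below; it is illustrative only and not the SHEDS testbed.

```python
import numpy as np

def model(x):
    # Toy nonlinear model with an interaction between inputs 0 and 2.
    return x[:, 0] ** 2 + 2.0 * x[:, 1] + 0.5 * x[:, 0] * x[:, 2]

rng = np.random.default_rng(8)
n, d = 20000, 3
A = rng.uniform(0, 1, size=(n, d))
B = rng.uniform(0, 1, size=(n, d))
fA, fB = model(A), model(B)
var_y = np.concatenate([fA, fB]).var()

for i in range(d):
    ABi = A.copy()
    ABi[:, i] = B[:, i]                              # resample only input i
    s_i = np.mean(fB * (model(ABi) - fA)) / var_y    # Saltelli (2010) first-order estimator
    print(f"first-order Sobol index S_{i}: {s_i:.2f}")
```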
NASA Astrophysics Data System (ADS)
Kim, Saejoon
2018-01-01
We consider the problem of low-volatility portfolio selection, which has been the subject of extensive research in the field of portfolio selection. To improve on currently existing techniques that rely purely on past information to select low-volatility portfolios, this paper investigates the use of time series regression techniques that forecast future volatility to select the portfolios. In particular, for the first time, the utility of support vector regression and its enhancements as portfolio selection techniques is provided. It is shown that our regression-based portfolio selection provides attractive outperformance compared to the benchmark index and to the portfolio defined by a well-known strategy on the data-sets of the S&P 500 and the KOSPI 200.
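A hedged sketch of the forecasting step, support vector regression on lagged realized volatilities followed by selection of the lowest-forecast assets, is given below with synthetic data (not the S&P 500 or KOSPI 200 experiments).

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(9)
n_assets, n_periods = 50, 60
vol = np.abs(rng.normal(0.02, 0.01, size=(n_assets, n_periods)))   # realized volatilities

X = vol[:, -4:-1]      # three lagged volatilities as features
y = vol[:, -1]         # next-period volatility as target
svr = SVR(kernel="rbf", C=1.0, epsilon=0.001).fit(X, y)

forecast = svr.predict(vol[:, -3:])            # forecast the period after the sample
low_vol_portfolio = np.argsort(forecast)[:10]  # ten assets with the lowest forecast volatility
print("selected assets:", low_vol_portfolio)
```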
Aeroelastic Model Structure Computation for Envelope Expansion
NASA Technical Reports Server (NTRS)
Kukreja, Sunil L.
2007-01-01
Structure detection is a procedure for selecting a subset of candidate terms, from a full model description, that best describes the observed output. This is a necessary procedure to compute an efficient system description which may afford greater insight into the functionality of the system or a simpler controller design. Structure computation as a tool for black-box modeling may be of critical importance in the development of robust, parsimonious models for the flight-test community. Moreover, this approach may lead to efficient strategies for rapid envelope expansion that may save significant development time and costs. In this study, a least absolute shrinkage and selection operator (LASSO) technique is investigated for computing efficient model descriptions of non-linear aeroelastic systems. The LASSO minimises the residual sum of squares with the addition of an l1 penalty term on the parameter vector of the traditional l2 minimisation problem. Its use for structure detection is a natural extension of this constrained minimisation approach to pseudo-linear regression problems which produces some model parameters that are exactly zero and, therefore, yields a parsimonious system description. Applicability of this technique for model structure computation for the F/A-18 (McDonnell Douglas, now The Boeing Company, Chicago, Illinois) Active Aeroelastic Wing project using flight test data is shown for several flight conditions (Mach numbers) by identifying a parsimonious system description with a high percent fit for cross-validated data.
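A small sketch of LASSO-based structure detection on a synthetic candidate-term dictionary follows; it is not the F/A-18 aeroelastic identification, only an illustration of how the l1 penalty zeroes non-contributing terms.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(10)
u = rng.normal(size=500)
# Full candidate dictionary of pseudo-linear regressors.
X = np.column_stack([u, u**2, u**3, np.sin(u), np.cos(u), u * np.abs(u)])
y = 1.5 * u - 0.8 * u**3 + 0.05 * rng.normal(size=500)   # the true structure uses two terms

lasso = Lasso(alpha=0.05).fit(X, y)
terms = ["u", "u^2", "u^3", "sin(u)", "cos(u)", "u|u|"]
selected = [t for t, c in zip(terms, lasso.coef_) if abs(c) > 1e-6]
print("selected structure:", selected)
print("coefficients:", lasso.coef_.round(2))
```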
Kastberger, G; Kranner, G
2000-02-01
Viscovery SOMine is a software tool for advanced analysis and monitoring of numerical data sets. It was developed for professional use in business, industry, and science and to support dependency analysis, deviation detection, unsupervised clustering, nonlinear regression, data association, pattern recognition, and animated monitoring. Based on the concept of self-organizing maps (SOMs), it employs a robust variant of unsupervised neural networks--namely, Kohonen's Batch-SOM, which is further enhanced with a new scaling technique for speeding up the learning process. This tool provides a powerful means by which to analyze complex data sets without prior statistical knowledge. The data representation contained in the trained SOM is systematically converted to be used in a spectrum of visualization techniques, such as evaluating dependencies between components, investigating geometric properties of the data distribution, searching for clusters, or monitoring new data. We have used this software tool to analyze and visualize multiple influences of the ocellar system on free-flight behavior in giant honeybees. Occlusion of ocelli will affect orienting reactivities in relation to flight target, level of disturbance, and position of the bee in the flight chamber; it will induce phototaxis and make orienting imprecise and dependent on motivational settings. Ocelli permit the adjustment of orienting strategies to environmental demands by enforcing abilities such as centering or flight kinetics and by providing independent control of posture and flight course.
Hyperspectral techniques in analysis of oral dosage forms.
Hamilton, Sara J; Lowell, Amanda E; Lodder, Robert A
2002-10-01
Pharmaceutical oral dosage forms are used in this paper to test the sensitivity and spatial resolution of hyperspectral imaging instruments. The first experiment tested the hypothesis that a near-infrared (IR) tunable diode-based remote sensing system is capable of monitoring degradation of hard gelatin capsules at a relatively long distance (0.5 km). Spectra from the capsules were used to differentiate among capsules exposed to an atmosphere containing 150 ppb formaldehyde for 0, 2, 4, and 8 h. Robust median-based principal component regression with Bayesian inference was employed for outlier detection. The second experiment tested the hypothesis that near-IR imaging spectrometry of tablets permits the identification and composition of multiple individual tablets to be determined simultaneously. A near-IR camera was used to collect thousands of spectra simultaneously from a field of blister-packaged tablets. The number of tablets that a typical near-IR camera can currently analyze simultaneously was estimated to be approximately 1300. The bootstrap error-adjusted single-sample technique chemometric-imaging algorithm was used to draw probability-density contour plots that revealed tablet composition. The single-capsule analysis provides an indication of how far apart the sample and instrumentation can be and still maintain adequate signal-to-noise ratio (S/N), while the multiple-tablet imaging experiment gives an indication of how many samples can be analyzed simultaneously while maintaining an adequate S/N and pixel coverage on each sample.