Sample records for outlier detection method

  1. Bayesian methods for outliers detection in GNSS time series

    NASA Astrophysics Data System (ADS)

    Qianqian, Zhang; Qingming, Gui

    2013-07-01

    This article is concerned with the problem of detecting outliers in GNSS time series based on Bayesian statistical theory. Firstly, a new model is proposed to simultaneously detect different types of outliers by introducing a distinct classification variable for each outlier type; the problem of outlier detection is thereby converted into the computation of the corresponding posterior probabilities, and an algorithm for computing these posterior probabilities based on a standard Gibbs sampler is designed. Secondly, we analyze in depth the causes of masking and swamping when detecting patches of additive outliers, and propose an unmasking Bayesian method for detecting additive outlier patches based on an adaptive Gibbs sampler. Thirdly, the correctness of the theories and methods proposed above is illustrated on simulated data and then on real GNSS observations, such as cycle-slip detection in carrier phase data. The examples show that the Bayesian methods for outlier detection in GNSS time series proposed in this paper are capable of detecting not only isolated outliers but also additive outlier patches. Furthermore, the approach can be successfully used to process cycle slips in phase data, which solves the problem of small cycle slips.
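
    As a hedged illustration of the indicator-variable idea (not the paper's full Gibbs sampler for GNSS series), the posterior probability that each point is an additive outlier can be computed in closed form for a simplified two-component Gaussian model; all names, the prior, and the inflated-variance choice here are assumptions:

    ```python
    import numpy as np
    from scipy.stats import norm

    def outlier_posterior(y, prior_p=0.05, tau=5.0):
        # Robust location/scale so that the outliers do not distort the fit.
        mu = np.median(y)
        sigma = 1.4826 * np.median(np.abs(y - mu))
        like_in = norm.pdf(y, mu, sigma)                    # inlier likelihood
        like_out = norm.pdf(y, mu, np.hypot(sigma, tau))    # inflated-variance outlier
        num = prior_p * like_out
        return num / (num + (1 - prior_p) * like_in)        # P(delta_t = 1 | y_t)

    y = np.concatenate([np.random.normal(0.0, 1.0, 200), [8.0, 9.5]])
    print(np.where(outlier_posterior(y) > 0.5)[0])          # indices of likely outliers
    ```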

  2. Incremental Principal Component Analysis Based Outlier Detection Methods for Spatiotemporal Data Streams

    NASA Astrophysics Data System (ADS)

    Bhushan, A.; Sharker, M. H.; Karimi, H. A.

    2015-07-01

    In this paper, we address outliers in spatiotemporal data streams obtained from sensors placed across geographically distributed locations. Outliers may appear in such sensor data for various reasons, such as instrumental error and environmental change. Real-time detection of these outliers is essential to prevent the propagation of errors into subsequent analyses and results. Incremental Principal Component Analysis (IPCA) is one possible approach for detecting outliers in this type of spatiotemporal data stream. IPCA has been widely used in many real-time applications such as credit card fraud detection, pattern recognition, and image analysis. However, the suitability of IPCA for outlier detection in spatiotemporal data streams is unknown and needs to be investigated. To fill this research gap, this paper contributes two new IPCA-based outlier detection methods and a comparative analysis with existing IPCA-based outlier detection methods to assess their suitability for spatiotemporal sensor data streams.
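
    A minimal sketch of scoring a sensor-data stream with scikit-learn's IncrementalPCA: each incoming batch is scored by reconstruction error under the model fitted so far, then used to update the model. The batch source, the 3-sigma threshold rule, and all names are illustrative assumptions, not the paper's methods:

    ```python
    import numpy as np
    from sklearn.decomposition import IncrementalPCA

    def reconstruction_error(ipca, X):
        # Squared reconstruction error of each sample under the current model.
        Z = ipca.transform(X)
        return np.sum((X - ipca.inverse_transform(Z)) ** 2, axis=1)

    rng = np.random.default_rng(0)
    ipca = IncrementalPCA(n_components=3)
    for t in range(50):                        # stream of sensor batches
        X = rng.normal(size=(100, 10))         # placeholder spatiotemporal readings
        if t > 0:                              # score once the model has seen data
            err = reconstruction_error(ipca, X)
            outliers = np.where(err > err.mean() + 3 * err.std())[0]
        ipca.partial_fit(X)                    # then update the model incrementally
    ```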

  3. Development of a methodology for the detection of hospital financial outliers using information systems.

    PubMed

    Okada, Sachiko; Nagase, Keisuke; Ito, Ayako; Ando, Fumihiko; Nakagawa, Yoshiaki; Okamoto, Kazuya; Kume, Naoto; Takemura, Tadamasa; Kuroda, Tomohiro; Yoshihara, Hiroyuki

    2014-01-01

    Comparison of financial indices helps to illustrate differences in operations and efficiency among similar hospitals. Outlier data tend to influence statistical indices, so detection of outliers is desirable. Development of a methodology for financial outlier detection using information systems will help to reduce the time and effort required, eliminate subjective elements in the detection of outlier data, and improve the efficiency and quality of analysis. The purpose of this research was to develop such a methodology. Financial outliers were defined based on a case model. An outlier-detection method using the distances between cases in multi-dimensional space is proposed. Experiments using three diagnosis groups indicated successful detection of cases whose profitability and income structure differed from the other cases. Therefore, the method proposed here can be used to detect outliers.

  4. Outlier detection for particle image velocimetry data using a locally estimated noise variance

    NASA Astrophysics Data System (ADS)

    Lee, Yong; Yang, Hua; Yin, ZhouPing

    2017-03-01

    This work describes an adaptive, spatially variable threshold outlier detection algorithm for raw gridded particle image velocimetry data using a locally estimated noise variance. The method is an iterative procedure in which each iteration comprises a reference vector field reconstruction step and an outlier detection step. We construct the reference vector field using a weighted adaptive smoothing method (Garcia 2010 Comput. Stat. Data Anal. 54 1167-78), with the weights determined in the outlier detection step using a modified outlier detector (Ma et al 2014 IEEE Trans. Image Process. 23 1706-21). A hard decision on the final weights of the iteration produces the outlier labels of the field. The technical contribution is that, for the first time, the spatially variable threshold is embedded in the modified outlier detector with a locally estimated noise variance in an iterative framework. A spatially variable threshold turns out to be preferable to a single spatially constant threshold in complicated flows such as vortex or turbulent flows. Synthetic cellular vortical flows with simulated scattered or clustered outliers are adopted to evaluate the performance of the proposed method in comparison with popular validation approaches, and the method also proves beneficial in a real PIV measurement of turbulent flow. The experimental results demonstrate that the proposed method yields competitive performance in terms of outlier under-detection and over-detection counts. In addition, the outlier detection method is computationally efficient and adaptive, requires no user-defined parameters, and implementations are provided in the supplementary materials.
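
    For context, a sketch of the classical normalized median test (Westerweel and Scarano 2005) that this line of work generalizes; the paper's locally estimated noise variance and iterative reconstruction are not reproduced here, and the parameter values are the usual defaults rather than the authors':

    ```python
    import numpy as np

    def normalized_median_test(u, eps=0.1, thresh=2.0):
        # u: one velocity component on a 2D grid; returns a boolean outlier mask.
        ny, nx = u.shape
        out = np.zeros_like(u, dtype=bool)
        for j in range(1, ny - 1):
            for i in range(1, nx - 1):
                nb = np.delete(u[j-1:j+2, i-1:i+2].ravel(), 4)  # 8 neighbours
                med = np.median(nb)
                rm = np.median(np.abs(nb - med))                # neighbourhood residual
                out[j, i] = abs(u[j, i] - med) / (rm + eps) > thresh
        return out
    ```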

  5. Outlier Detection in Urban Air Quality Sensor Networks.

    PubMed

    van Zoest, V M; Stein, A; Hoek, G

    2018-01-01

    Low-cost urban air quality sensor networks are increasingly used to study the spatio-temporal variability in air pollutant concentrations. Recently installed low-cost urban sensors, however, are more prone than conventional monitors to producing erroneous data, e.g., outliers. Commonly applied outlier detection methods are unsuitable for air pollutant measurements with the large spatial and temporal variations that occur in urban areas. We present a novel outlier detection method based upon a spatio-temporal classification, focusing on hourly NO2 concentrations. We divide a full year's observations into 16 spatio-temporal classes, reflecting urban background vs. urban traffic stations, weekdays vs. weekends, and four periods per day. For each spatio-temporal class, we detect outliers using the mean and standard deviation of the normal distribution underlying the truncated normal distribution of the NO2 observations. Applying this method to a low-cost air quality sensor network in the city of Eindhoven, the Netherlands, we found 0.1-0.5% outliers. Outliers could reflect measurement errors or unusually high air pollution events. Additional evaluation using expert knowledge is needed to decide on the treatment of the identified outliers. We conclude that our method is able to detect outliers while maintaining the spatio-temporal variability of air pollutant concentrations in urban areas.
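
    A rough sketch of the core step, flagging values against the normal distribution underlying a truncated sample, with assumed truncation bounds and illustrative names (the paper's 16-class spatio-temporal stratification is not reproduced):

    ```python
    import numpy as np
    from scipy.stats import truncnorm
    from scipy.optimize import minimize

    def fit_underlying_normal(x, lo, hi):
        # MLE of (mu, sigma) of the normal behind data truncated to [lo, hi].
        def nll(p):
            mu, sd = p
            if sd <= 0:
                return np.inf
            a, b = (lo - mu) / sd, (hi - mu) / sd
            return -np.sum(truncnorm.logpdf(x, a, b, loc=mu, scale=sd))
        return minimize(nll, x0=[x.mean(), x.std()], method="Nelder-Mead").x

    x = np.random.normal(30, 10, 4000)
    x = x[(x > 0) & (x < 60)]                # truncated NO2-like sample
    mu, sd = fit_underlying_normal(x, 0, 60)
    flags = np.abs(x - mu) > 3 * sd          # outliers w.r.t. the underlying normal
    ```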

  6. Ensemble Learning Method for Outlier Detection and its Application to Astronomical Light Curves

    NASA Astrophysics Data System (ADS)

    Nun, Isadora; Protopapas, Pavlos; Sim, Brandon; Chen, Wesley

    2016-09-01

    Outlier detection is necessary for automated data analysis, with applications spanning almost every domain from financial markets to epidemiology to fraud detection. We introduce a novel mixture-of-experts outlier detection model, which uses a dynamically trained, weighted network of five distinct outlier detection methods. After dimensionality reduction, the individual outlier detection methods score each data point for "outlierness" in the new feature space. Our model then uses dynamically trained parameters to weigh the scores of each method, yielding a final outlier score. We find that the mixture-of-experts model performs, on average, better than any single expert model in identifying both artificially and manually picked outliers. The mixture model is applied to a data set of astronomical light curves, after dimensionality reduction via time series feature extraction. Our model was tested using three fields from the MACHO catalog and generated a list of anomalous candidates. We confirm that the outliers detected using this method belong to rare classes, like Novae, He-burning, and red giant stars; other outlier light curves identified have no available information associated with them. To elucidate their nature, we created a website containing the light-curve data and information about these objects. Users can attempt to classify the light curves, offer conjectures about their identities, and sign up for follow-up messages about the progress made on identifying these objects. This user-submitted data can be used to further train our mixture-of-experts model. Our code is publicly available to all who are interested.

  7. Detecting multiple outliers in linear functional relationship model for circular variables using clustering technique

    NASA Astrophysics Data System (ADS)

    Mokhtar, Nurkhairany Amyra; Zubairi, Yong Zulina; Hussin, Abdul Ghapor

    2017-05-01

    Outlier detection has been used extensively in data analysis to detect anomalous observations in data, and has important applications in fraud detection and robust analysis. In this paper, we propose a method for detecting multiple outliers for circular variables in the linear functional relationship model. Using the residual values of the Caires and Wyatt model, we applied a hierarchical clustering procedure. With the use of a tree diagram, we illustrate the graphical approach to outlier detection. A simulation study is done to verify the accuracy of the proposed method, and an illustration with a real data set shows its practical applicability.

  8. Stratification-Based Outlier Detection over the Deep Web.

    PubMed

    Xian, Xuefeng; Zhao, Pengpeng; Sheng, Victor S; Fang, Ligang; Gu, Caidong; Yang, Yuanfeng; Cui, Zhiming

    2016-01-01

    For many applications, finding rare instances or outliers can be more interesting than finding common patterns. Existing work in outlier detection has not considered the context of the deep web. In this paper, we argue that, for many scenarios, it is more meaningful to detect outliers over the deep web. In the context of the deep web, users must submit queries through a query interface to retrieve the corresponding data, so traditional data mining methods cannot be directly applied. The primary contribution of this paper is a new data mining method for outlier detection over the deep web. In our approach, the query space of a deep web data source is stratified based on a pilot sample. Neighborhood sampling and uncertainty sampling are developed with the goal of improving recall and precision based on the stratification. Finally, a careful performance evaluation of our algorithm confirms that our approach can effectively detect outliers over the deep web.

  9. Stratification-Based Outlier Detection over the Deep Web

    PubMed Central

    Xian, Xuefeng; Zhao, Pengpeng; Sheng, Victor S.; Fang, Ligang; Gu, Caidong; Yang, Yuanfeng; Cui, Zhiming

    2016-01-01

    For many applications, finding rare instances or outliers can be more interesting than finding common patterns. Existing work in outlier detection has not considered the context of the deep web. In this paper, we argue that, for many scenarios, it is more meaningful to detect outliers over the deep web. In the context of the deep web, users must submit queries through a query interface to retrieve the corresponding data, so traditional data mining methods cannot be directly applied. The primary contribution of this paper is a new data mining method for outlier detection over the deep web. In our approach, the query space of a deep web data source is stratified based on a pilot sample. Neighborhood sampling and uncertainty sampling are developed with the goal of improving recall and precision based on the stratification. Finally, a careful performance evaluation of our algorithm confirms that our approach can effectively detect outliers over the deep web. PMID:27313603

  10. Detecting measurement outliers: remeasure efficiently

    NASA Astrophysics Data System (ADS)

    Ullrich, Albrecht

    2010-09-01

    Shrinking structures, advanced optical proximity correction (OPC), and complex measurement strategies continually challenge critical dimension (CD) metrology tools and recipe creation processes. One important quality-ensuring task is the control of measurement outlier behavior. Outliers can trigger false-positive alarms for specification violations, impacting cycle time and potentially yield. A constantly high level of outliers not only deteriorates cycle time but also puts unnecessary stress on tool operators, eventually leading to human errors. At the tool level, the sources of outliers are natural variations (e.g., beam current), drifts, contrast conditions, focus determination, or pattern recognition issues. Some of these can result from suboptimal or even wrong recipe settings, such as the focus position or measurement box size. Such outliers, created by an automatic recipe creation process faced with more complicated structures, manifest themselves as systematic variation of measurements rather than as 'pure' tool variation. I analyzed several statistical methods to detect outliers, ranging from classical outlier tests for extrema and robust metrics like the interquartile range (IQR) to methods evaluating the distribution of different populations of measurement sites, like the Cochran test; the latter especially suits the detection of systematic effects. The next level of outlier detection combines additional information about the mask and the manufacturing process with the measurement results. The methods were reviewed for measured variations assumed to be normally distributed with zero mean, but also in the presence of a statistically significant spatial process signature. I arrive at the conclusion that intelligent outlier detection can greatly influence the efficiency and cycle time of CD metrology. In combination with process information like the target, typical platform variation, and signature, one can tailor the detection to the needs of the photomask at hand. By monitoring the outlier behavior carefully, weaknesses of the automatic recipe creation process can be spotted.

  11. Robust Mokken Scale Analysis by Means of the Forward Search Algorithm for Outlier Detection

    ERIC Educational Resources Information Center

    Zijlstra, Wobbe P.; van der Ark, L. Andries; Sijtsma, Klaas

    2011-01-01

    Exploratory Mokken scale analysis (MSA) is a popular method for identifying scales from larger sets of items. As with any statistical method, in MSA the presence of outliers in the data may result in biased results and wrong conclusions. The forward search algorithm is a robust diagnostic method for outlier detection, which we adapt here to…

  12. Online Conditional Outlier Detection in Nonstationary Time Series

    PubMed Central

    Liu, Siqi; Wright, Adam; Hauskrecht, Milos

    2017-01-01

    The objective of this work is to develop methods for detecting outliers in time series data. Such methods can become the key component of various monitoring and alerting systems, where an outlier may correspond to an adverse condition that needs human attention. However, real-world time series are often affected by various sources of variability present in the environment that may influence the quality of detection; these may (1) explain some of the changes in the signal that would otherwise lead to false-positive detections, and (2) reduce the sensitivity of the detection algorithm, leading to an increase in false negatives. To alleviate these problems, we propose a new two-layer outlier detection approach that first tries to model and account for the nonstationarity and periodic variation in the time series, and then tries to use other observable variables in the environment to explain any additional signal variation. Our experiments on several data sets in different domains show that our method provides more accurate modeling of the time series and is able to significantly improve outlier detection performance. PMID:29644345
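
    A toy sketch of the two-layer idea under stated assumptions: a daily harmonic stands in for the paper's nonstationarity model, ordinary least squares stands in for its conditional model, and all names are illustrative:

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression

    def two_layer_outliers(t_hours, y, covariates, z_cut=3.0):
        # Layer 1: a daily harmonic explains periodic variation in the signal.
        H = np.column_stack([np.sin(2 * np.pi * t_hours / 24),
                             np.cos(2 * np.pi * t_hours / 24)])
        r1 = y - LinearRegression().fit(H, y).predict(H)
        # Layer 2: observable environment variables explain residual variation.
        r2 = r1 - LinearRegression().fit(covariates, r1).predict(covariates)
        z = (r2 - r2.mean()) / r2.std()
        return np.abs(z) > z_cut           # remaining large residuals = outliers
    ```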

  13. Online Conditional Outlier Detection in Nonstationary Time Series.

    PubMed

    Liu, Siqi; Wright, Adam; Hauskrecht, Milos

    2017-05-01

    The objective of this work is to develop methods for detecting outliers in time series data. Such methods can become the key component of various monitoring and alerting systems, where an outlier may correspond to an adverse condition that needs human attention. However, real-world time series are often affected by various sources of variability present in the environment that may influence the quality of detection; these may (1) explain some of the changes in the signal that would otherwise lead to false-positive detections, and (2) reduce the sensitivity of the detection algorithm, leading to an increase in false negatives. To alleviate these problems, we propose a new two-layer outlier detection approach that first tries to model and account for the nonstationarity and periodic variation in the time series, and then tries to use other observable variables in the environment to explain any additional signal variation. Our experiments on several data sets in different domains show that our method provides more accurate modeling of the time series and is able to significantly improve outlier detection performance.

  14. Detecting outliers when fitting data with nonlinear regression – a new method based on robust nonlinear regression and the false discovery rate

    PubMed Central

    Motulsky, Harvey J; Brown, Ronald E

    2006-01-01

    Background: Nonlinear regression, like linear regression, assumes that the scatter of data around the ideal curve follows a Gaussian or normal distribution. This assumption leads to the familiar goal of regression: to minimize the sum of the squares of the vertical or Y-value distances between the points and the curve. Outliers can dominate the sum-of-the-squares calculation and lead to misleading results. However, we know of no practical method for routinely identifying outliers when fitting curves with nonlinear regression. Results: We describe a new method for identifying outliers when fitting data with nonlinear regression. We first fit the data using a robust form of nonlinear regression, based on the assumption that scatter follows a Lorentzian distribution. We devised a new adaptive method that gradually becomes more robust as the fit proceeds. To define outliers, we adapted the false discovery rate approach to handling multiple comparisons. We then remove the outliers and analyze the data using ordinary least-squares regression. Because the method combines robust regression and outlier removal, we call it the ROUT method. When analyzing simulated data, where all scatter is Gaussian, our method detects (falsely) one or more outliers in only about 1–3% of experiments. When analyzing data contaminated with one or several outliers, the ROUT method performs well at outlier identification, with an average false discovery rate less than 1%. Conclusion: Our method, which combines a new method of robust nonlinear regression with a new method of outlier identification, identifies outliers from nonlinear curve fits with reasonable power and few false positives. PMID:16526949
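
    A condensed sketch of the two ROUT ingredients, robust fitting and FDR-based flagging, using SciPy's Cauchy (Lorentzian) loss and a hand-rolled Benjamini-Hochberg step; the paper's adaptive robustification is omitted and the helper names are invented:

    ```python
    import numpy as np
    from scipy.optimize import least_squares
    from scipy.stats import norm

    def rout_like(model, p0, x, y, q=0.01):
        fit = least_squares(lambda p: model(x, p) - y, p0, loss="cauchy")
        r = fit.fun                                    # residuals at the robust fit
        scale = 1.4826 * np.median(np.abs(r - np.median(r)))
        pvals = 2 * norm.sf(np.abs(r) / scale)
        order = np.argsort(pvals)                      # Benjamini-Hochberg step-up
        m = pvals.size
        below = pvals[order] <= q * np.arange(1, m + 1) / m
        out = np.zeros(m, dtype=bool)
        if below.any():
            out[order[:np.nonzero(below)[0].max() + 1]] = True
        return fit.x, out                              # then refit OLS on ~out

    model = lambda x, p: p[0] * np.exp(-p[1] * x)      # hypothetical curve model
    ```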

  15. Trend-Residual Dual Modeling for Detection of Outliers in Low-Cost GPS Trajectories.

    PubMed

    Chen, Xiaojian; Cui, Tingting; Fu, Jianhong; Peng, Jianwei; Shan, Jie

    2016-12-01

    Low-cost GPS receivers have become a ubiquitous and integral part of our daily life. Despite noticeable advantages such as being cheap, small, light, and easy to use, their limited positioning accuracy devalues and hampers their wide application to reliable mapping and analysis. Two conventional techniques for removing outliers from a GPS trajectory are thresholding and Kalman-based methods, for which selecting appropriate thresholds and modeling the trajectories is difficult. Moreover, they are insensitive to medium and small outliers, especially for low-sample-rate trajectories. This paper proposes a model-based GPS trajectory cleaner. Rather than examining speed and acceleration or assuming a pre-determined trajectory model, we first use a cubic smoothing spline to adaptively model the trend of the trajectory. The residuals, i.e., the differences between the trend and the GPS measurements, are then further modeled by a time series method. Outliers are detected by scoring the residuals at every GPS trajectory point. Compared to the conventional procedures, the trend-residual dual modeling approach has the following features: (a) it models trajectories and detects outliers adaptively; (b) only one critical value, for the outlier scores, needs to be set; (c) it robustly detects unapparent outliers; and (d) it is effective in cleaning outliers from GPS trajectories with low sample rates. Tests are carried out on three real-world GPS trajectory datasets. The evaluation demonstrates an average of 9.27 times better performance in outlier detection for GPS trajectories than thresholding and Kalman-based techniques.
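
    A simplified sketch of the trend-residual idea, with a cubic smoothing spline for the trend and a robust z-score standing in for the paper's time-series model of the residuals; the smoothing factor and names are assumptions:

    ```python
    import numpy as np
    from scipy.interpolate import UnivariateSpline

    def trajectory_outliers(t, lat, lon, z_cut=3.5):
        # t must be strictly increasing; flags points deviating from the trend.
        flags = np.zeros(len(t), dtype=bool)
        for y in (lat, lon):
            trend = UnivariateSpline(t, y, k=3, s=len(t))(t)  # smoothing spline
            r = y - trend
            mad = 1.4826 * np.median(np.abs(r - np.median(r)))
            flags |= np.abs(r - np.median(r)) / mad > z_cut
        return flags
    ```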

  16. A simple transformation independent method for outlier definition.

    PubMed

    Johansen, Martin Berg; Christensen, Peter Astrup

    2018-04-10

    Definition and elimination of outliers is a key element for medical laboratories establishing or verifying reference intervals (RIs), especially as the inclusion of just a few outlying observations may seriously affect the determination of the reference limits. Many methods have been developed for the definition of outliers. Several of them were developed for the normal distribution, and data often require transformation before outlier elimination. We have developed a non-parametric, transformation-independent outlier definition. The new method relies on drawing reproducible histograms, achieved by using defined bin sizes above and below the median. The method is compared to the method recommended by CLSI/IFCC, which uses Box-Cox transformation (BCT) and Tukey's fences for outlier definition. The comparison is done on eight simulated distributions and an indirect clinical dataset. The comparison on the simulated distributions shows that, without added outliers, the recommended method generally defines fewer outliers. However, when outliers are added on one side, the proposed method often produces better results; with outliers on both sides, the methods are equally good. Furthermore, the presence of outliers is found to affect the BCT, and subsequently the limits determined by the currently recommended methods; this is especially seen in skewed distributions. The proposed outlier definition reproduced current RI limits on clinical data containing outliers. We find our simple transformation-independent outlier detection method to be as good as or better than the currently recommended methods.

  17. The effect of different distance measures in detecting outliers using clustering-based algorithm for circular regression model

    NASA Astrophysics Data System (ADS)

    Di, Nur Faraidah Muhammad; Satari, Siti Zanariah

    2017-05-01

    Outlier detection in linear data sets has been studied extensively, but only a small amount of work has been done on outlier detection in circular data. In this study, we propose multiple-outlier detection for circular regression models based on a clustering algorithm. Clustering techniques fundamentally rely on a distance measure to define the distance between data points. Here, we introduce a similarity distance based on the Euclidean distance for the circular model and obtain a cluster tree using the single-linkage clustering algorithm. Then, a stopping rule for the cluster tree based on the mean direction and circular standard deviation of the tree heights is proposed. We classify cluster groups that exceed the stopping rule as potential outliers. Our aim is to demonstrate the effectiveness of the proposed algorithms with the similarity distances in detecting the outliers. The proposed methods are found to perform well and to be applicable to circular regression models.
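
    A compact sketch of the clustering route to circular outliers: a circular distance matrix, single-linkage clustering, and a stopping height derived from the merge heights. The paper's rule uses the mean direction and circular standard deviation; a plain mean and standard deviation are substituted here, and all names are illustrative:

    ```python
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    def circular_outliers(theta, c=3.0):
        d = np.abs(theta[:, None] - theta[None, :])
        d = np.minimum(d, 2 * np.pi - d)            # geodesic distance on the circle
        Z = linkage(squareform(d, checks=False), method="single")
        stop = Z[:, 2].mean() + c * Z[:, 2].std()   # stopping rule on tree heights
        labels = fcluster(Z, t=stop, criterion="distance")
        main = np.bincount(labels).argmax()         # the largest cluster = inliers
        return labels != main
    ```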

  18. Trend-Residual Dual Modeling for Detection of Outliers in Low-Cost GPS Trajectories

    PubMed Central

    Chen, Xiaojian; Cui, Tingting; Fu, Jianhong; Peng, Jianwei; Shan, Jie

    2016-01-01

    Low-cost GPS (receiver) has become a ubiquitous and integral part of our daily life. Despite noticeable advantages such as being cheap, small, light, and easy to use, its limited positioning accuracy devalues and hampers its wide applications for reliable mapping and analysis. Two conventional techniques to remove outliers in a GPS trajectory are thresholding and Kalman-based methods, which are difficult in selecting appropriate thresholds and modeling the trajectories. Moreover, they are insensitive to medium and small outliers, especially for low-sample-rate trajectories. This paper proposes a model-based GPS trajectory cleaner. Rather than examining speed and acceleration or assuming a pre-determined trajectory model, we first use cubic smooth spline to adaptively model the trend of the trajectory. The residuals, i.e., the differences between the trend and GPS measurements, are then further modeled by time series method. Outliers are detected by scoring the residuals at every GPS trajectory point. Comparing to the conventional procedures, the trend-residual dual modeling approach has the following features: (a) it is able to model trajectories and detect outliers adaptively; (b) only one critical value for outlier scores needs to be set; (c) it is able to robustly detect unapparent outliers; and (d) it is effective in cleaning outliers for GPS trajectories with low sample rates. Tests are carried out on three real-world GPS trajectories datasets. The evaluation demonstrates an average of 9.27 times better performance in outlier detection for GPS trajectories than thresholding and Kalman-based techniques. PMID:27916944

  19. [Research on outlier detection methods for determination of oil yield in oil shales using near-infrared spectroscopy].

    PubMed

    Zhang, Huai-zhu; Lin, Jun; Zhang, Huai-Zhu

    2014-06-01

    In the present paper, outlier detection methods for the determination of oil yield in oil shale using near-infrared (NIR) diffuse reflection spectroscopy were studied. During quantitative analysis with near-infrared spectroscopy, environmental changes and operator errors both produce outliers. The presence of outliers affects the overall distribution trend of the samples and leads to a decrease in predictive capability, so the detection of outliers is important for the construction of high-quality calibration models. The methods of principal component analysis-Mahalanobis distance (PCA-MD) and resampling by half-means (RHM) were applied to the discrimination and elimination of outliers in this work. The thresholds and confidence levels for MD and RHM were optimized using the performance of partial least squares (PLS) models constructed after the elimination of outliers. Compared with the model constructed with the full data set, the RMSEP values of the models constructed after applying PCA-MD with a threshold equal to the sum of the mean and standard deviation of the MD, RHM with a confidence level of 85%, and the combination of PCA-MD and RHM were reduced by 48.3%, 27.5%, and 44.8%, respectively. The predictive ability of the calibration model was thus effectively improved.
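
    A sketch of the PCA-MD half of the procedure, using the threshold quoted in the abstract (the sum of the mean and standard deviation of the Mahalanobis distances); the RHM half is not reproduced, and the names and component count are illustrative:

    ```python
    import numpy as np
    from sklearn.decomposition import PCA

    def pca_md_outliers(X, n_pc=5):
        T = PCA(n_components=n_pc).fit_transform(X)      # spectra -> PC scores
        cov_inv = np.linalg.inv(np.cov(T, rowvar=False))
        diff = T - T.mean(axis=0)
        md = np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))
        return md > md.mean() + md.std()                 # abstract's threshold rule
    ```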

  20. Identifying Outliers of Non-Gaussian Groundwater State Data Based on Ensemble Estimation for Long-Term Trends

    NASA Astrophysics Data System (ADS)

    Park, E.; Jeong, J.; Choi, J.; Han, W. S.; Yun, S. T.

    2016-12-01

    Three modified outlier identification methods, the three-sigma rule (3σ), the interquartile range (IQR), and the median absolute deviation (MAD), all of which take advantage of the ensemble regression method, are proposed. For validation purposes, the performance of the methods is compared using simulated and actual groundwater data under a few hypothetical conditions. In the validations using simulated data, all of the proposed methods reasonably identify outliers at a 5% outlier level, whereas only the IQR method performs well for identifying outliers at a 30% outlier level. When applying the methods to real groundwater data, the outlier identification performance of the IQR method is found to be superior to the other two methods. However, the IQR method has a limitation in that it falsely identifies excessive outliers, which may be remedied by joint application with the other methods (i.e., the 3σ rule and MAD methods). The proposed methods can also be applied as potential tools for future anomaly detection by model training based on currently available data.
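
    The three screening rules themselves are standard; a minimal sketch applied to residuals r, which in the paper's setting would come from the ensemble-regression trend estimate (variable names are illustrative):

    ```python
    import numpy as np

    def rule_3sigma(r):
        return np.abs(r - r.mean()) > 3 * r.std()

    def rule_iqr(r):
        q1, q3 = np.percentile(r, [25, 75])
        return (r < q1 - 1.5 * (q3 - q1)) | (r > q3 + 1.5 * (q3 - q1))

    def rule_mad(r, c=3.5):
        m = np.median(r)
        return np.abs(r - m) / (1.4826 * np.median(np.abs(r - m))) > c
    ```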

  1. Sparsity-weighted outlier FLOODing (OFLOOD) method: Efficient rare event sampling method using sparsity of distribution.

    PubMed

    Harada, Ryuhei; Nakamura, Tomotake; Shigeta, Yasuteru

    2016-03-30

    As an extension of the Outlier FLOODing (OFLOOD) method [Harada et al., J. Comput. Chem. 2015, 36, 763], the sparsity of the outliers defined by a hierarchical clustering algorithm, FlexDice, was considered to achieve an efficient conformational search as sparsity-weighted "OFLOOD." In OFLOOD, FlexDice detects areas of sparse distribution as outliers. The outliers are regarded as candidates with high potential to promote conformational transitions and are employed as initial structures for conformational resampling by restarting molecular dynamics simulations. When detecting outliers, FlexDice assigns each outlier a rank in the hierarchy, which relates to its sparsity in the distribution. In this study, we define low-rank (first-ranked), medium-rank (second-ranked), and highest-rank (third-ranked) outliers. For instance, the first-ranked outliers are located in conformational space away from the clusters (highly sparse distribution), whereas the third-ranked outliers are near the clusters (a moderately sparse distribution). To achieve an efficient conformational search, resampling is performed from the outliers of a given rank. As demonstrations, the method was applied to several model systems: alanine dipeptide, Met-enkephalin, Trp-cage, T4 lysozyme, and glutamine binding protein. In each demonstration, the present method successfully reproduced transitions among metastable states. In particular, the first-ranked OFLOOD highly accelerated the exploration of conformational space by expanding its edges, whereas the third-ranked OFLOOD intensively reproduced local transitions among neighboring metastable states. For quantitative evaluation of the sampled snapshots, free energy calculations were performed with a combination of umbrella samplings, providing rigorous landscapes of the biomolecules.

  2. Method for outlier detection: a tool to assess the consistency between laboratory data and ultraviolet-visible absorbance spectra in wastewater samples.

    PubMed

    Zamora, D; Torres, A

    2014-01-01

    Reliable estimation of the evolution of water quality parameters using in situ technologies makes it possible to follow the operation of a wastewater treatment plant (WWTP), as well as to improve the understanding and control of the operation, especially in the detection of disturbances. However, ultraviolet (UV)-Vis sensors have to be calibrated by means of a local fingerprint data-set of laboratory reference concentration values, so the detection of outliers in these data-sets is important. This paper presents a method for detecting outliers in UV-Vis absorbances coupled with water quality reference laboratory concentrations for samples used for calibration purposes. An application to samples from the influent of the San Fernando WWTP (Medellín, Colombia) is shown. After the removal of outliers, the predictability of the influent concentrations using absorbance spectra improved.

  3. Identifying outliers of non-Gaussian groundwater state data based on ensemble estimation for long-term trends

    NASA Astrophysics Data System (ADS)

    Jeong, Jina; Park, Eungyu; Han, Weon Shik; Kim, Kueyoung; Choung, Sungwook; Chung, Il Moon

    2017-05-01

    A hydrogeological dataset often includes substantial deviations that need to be inspected. In the present study, three outlier identification methods, the three-sigma rule (3σ), the interquartile range (IQR), and the median absolute deviation (MAD), all of which take advantage of the ensemble regression method, are proposed in consideration of the non-Gaussian characteristics of groundwater data. For validation purposes, the performance of the methods is compared using simulated and actual groundwater data under a few hypothetical conditions. In the validations using simulated data, all of the proposed methods reasonably identify outliers at a 5% outlier level, whereas only the IQR method performs well for identifying outliers at a 30% outlier level. When applying the methods to real groundwater data, the outlier identification performance of the IQR method is found to be superior to the other two methods. However, the IQR method shows a limitation in that it identifies excessive false outliers, which may be overcome by its joint application with other methods (for example, the 3σ rule and MAD methods). The proposed methods can also be applied as potential tools for the detection of future anomalies by model training based on currently available data.

  4. Using State Estimation Residuals to Detect Abnormal SCADA Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ma, Jian; Chen, Yousu; Huang, Zhenyu

    2010-04-30

    Detection of abnormal supervisory control and data acquisition (SCADA) data is critically important for the safe and secure operation of modern power systems. In this paper, a methodology for abnormal SCADA data detection based on state estimation residuals is presented. Preceded by a brief overview of outlier detection methods and bad SCADA data detection for state estimation, the framework of the proposed methodology is described. Instead of using the original SCADA measurements as the bad data source, the residuals calculated from the results of the state estimator are used as the input to the outlier detection algorithm. The BACON algorithm is applied to the outlier detection task. The IEEE 118-bus system is used as a test base to evaluate the effectiveness of the proposed methodology. The accuracy of the BACON method is compared with that of the 3-σ method for the simulated SCADA measurements and residuals.
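
    A simplified sketch of the BACON idea: grow a presumed-clean "basic subset" and flag points whose Mahalanobis distance from it exceeds a chi-square cutoff. The published algorithm includes correction factors omitted here, the input rows would be the state-estimator residuals, and all names are illustrative:

    ```python
    import numpy as np
    from scipy.stats import chi2

    def bacon(X, alpha=0.05, m_factor=4):
        n, p = X.shape
        d0 = np.linalg.norm(X - np.median(X, axis=0), axis=1)
        subset = np.sort(np.argsort(d0)[: m_factor * p])   # initial basic subset
        cut = np.sqrt(chi2.ppf(1 - alpha / n, p))
        while True:
            mu = X[subset].mean(axis=0)
            cov_inv = np.linalg.inv(np.cov(X[subset], rowvar=False))
            diff = X - mu
            md = np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))
            new = np.where(md <= cut)[0]
            if np.array_equal(new, subset):
                return md > cut                            # True = flagged outlier
            subset = new
    ```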

  5. Multiple approaches to detect outliers in a genome scan for selection in ocellated lizards (Lacerta lepida) along an environmental gradient.

    PubMed

    Nunes, Vera L; Beaumont, Mark A; Butlin, Roger K; Paulo, Octávio S

    2011-01-01

    Identification of loci with adaptive importance is a key step to understand the speciation process in natural populations, because those loci are responsible for phenotypic variation that affects fitness in different environments. We conducted an AFLP genome scan in populations of ocellated lizards (Lacerta lepida) to search for candidate loci influenced by selection along an environmental gradient in the Iberian Peninsula. This gradient is strongly influenced by climatic variables, and two subspecies can be recognized at the opposite extremes: L. lepida iberica in the northwest and L. lepida nevadensis in the southeast. Both subspecies show substantial morphological differences that may be involved in their local adaptation to the climatic extremes. To investigate how the use of a particular outlier detection method can influence the results, a frequentist method, DFDIST, and a Bayesian method, BayeScan, were used to search for outliers influenced by selection. Additionally, the spatial analysis method was used to test for associations of AFLP marker band frequencies with 54 climatic variables by logistic regression. Results obtained with each method highlight differences in their sensitivity. DFDIST and BayeScan detected a similar proportion of outliers (3-4%), but only a few loci were simultaneously detected by both methods. Several loci detected as outliers were also associated with temperature, insolation or precipitation according to the spatial analysis method. These results are in accordance with reported data in the literature about morphological and life-history variation of L. lepida subspecies along the environmental gradient.

  6. A Finite Mixture Method for Outlier Detection and Robustness in Meta-Analysis

    ERIC Educational Resources Information Center

    Beath, Ken J.

    2014-01-01

    When performing a meta-analysis unexplained variation above that predicted by within study variation is usually modeled by a random effect. However, in some cases, this is not sufficient to explain all the variation because of outlier or unusual studies. A previously described method is to define an outlier as a study requiring a higher random…

  7. Detecting isotopic ratio outliers

    NASA Astrophysics Data System (ADS)

    Bayne, C. K.; Smith, D. H.

    An alternative method is proposed for improving isotopic ratio estimates. The method mathematically models pulse-count data and uses iteratively reweighted Poisson regression to estimate the model parameters from which the isotopic ratios are calculated. This computer-oriented approach provides theoretically better ways than conventional techniques to establish error limits and to identify outliers.

  8. [Outlier sample discriminating methods for building calibration model in melons quality detecting using NIR spectra].

    PubMed

    Tian, Hai-Qing; Wang, Chun-Guang; Zhang, Hai-Jun; Yu, Zhi-Hong; Li, Jian-Kang

    2012-11-01

    Outlier samples strongly influence the precision of calibration models for measuring the soluble solids content of melons using NIR spectra. According to the possible sources of outlier samples, three methods (predicted concentration residual test; Chauvenet test; leverage and studentized residual test) were used to discriminate these outliers. Nine suspicious outliers were detected in a calibration set of 85 fruit samples. Because the 9 suspicious outlier samples might include some non-outlier samples, they were returned to the model one by one to see whether they influenced the model and its prediction precision. In this way, 5 samples that were helpful to the model rejoined the calibration set, and a new model was developed with a correlation coefficient (r) of 0.889 and a root mean square error of calibration (RMSEC) of 0.601 degrees Brix. For 35 unknown samples, the root mean square error of prediction (RMSEP) was 0.854 degrees Brix. This model performed better than the model developed without eliminating any outliers from the calibration set (r = 0.797, RMSEC = 0.849 degrees Brix, RMSEP = 1.19 degrees Brix), and was more representative and stable than the model with all 9 samples eliminated from the calibration set (r = 0.892, RMSEC = 0.605 degrees Brix, RMSEP = 0.862 degrees Brix).

  9. Observed to expected or logistic regression to identify hospitals with high or low 30-day mortality?

    PubMed Central

    Helgeland, Jon; Clench-Aas, Jocelyne; Laake, Petter; Veierød, Marit B.

    2018-01-01

    Introduction: A common quality indicator for monitoring and comparing hospitals is based on death within 30 days of admission. An important use is to determine whether a hospital has higher or lower mortality than other hospitals; thus, the ability to identify such outliers correctly is essential. Two approaches for detection are: 1) calculating the ratio of observed to expected number of deaths (OE) per hospital, and 2) including all hospitals in a logistic regression (LR) comparing each hospital to a form of average over all hospitals. The aim of this study was to compare OE and LR with respect to correctly identifying 30-day mortality outliers. Modifications of the methods, i.e., the variance-corrected OE (OE-Faris), the bias-corrected LR (LR-Firth), and trimmed-mean variants of LR and LR-Firth, were also studied. Materials and methods: To study the properties of OE and LR and their variants, we performed a simulation study by generating patient data from hospitals with known outlier status (low mortality, high mortality, non-outlier). Data from simulated scenarios with varying numbers of hospitals, hospital volumes, and mortality outlier statuses were analysed by the different methods and compared by level of significance (ability to falsely claim an outlier) and power (ability to reveal an outlier). Moreover, administrative data for patients with acute myocardial infarction (AMI), stroke, and hip fracture from Norwegian hospitals for 2012–2014 were analysed. Results: None of the methods achieved the nominal (test) level of significance for both low and high mortality outliers. For low mortality outliers, the levels of significance were increased four- to fivefold for OE and OE-Faris. For high mortality outliers, OE and OE-Faris, LR 25% trimmed, and LR-Firth 10% and 25% trimmed maintained approximately the nominal level. The methods agreed on outlier status for 94.1% of the AMI hospitals, 98.0% of the stroke hospitals, and 97.8% of the hip fracture hospitals. Conclusion: We recommend, on balance, LR-Firth 10% or 25% trimmed for detection of both low and high mortality outliers. PMID:29652941
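
    For the OE side, a small sketch of one common formulation: treat each hospital's observed death count as Poisson with mean equal to its model-expected count and flag small two-sided tail probabilities. This specific test is an assumption for illustration, not necessarily the paper's exact variant, and the numbers are made up:

    ```python
    import numpy as np
    from scipy.stats import poisson

    def oe_flag(observed, expected, alpha=0.05):
        high = poisson.sf(observed - 1, expected) < alpha / 2   # P(X >= obs) small
        low = poisson.cdf(observed, expected) < alpha / 2       # P(X <= obs) small
        return np.where(high, "high", np.where(low, "low", "none"))

    obs = np.array([30, 12, 55])           # hypothetical 30-day death counts
    exp = np.array([22.4, 14.1, 38.0])     # sums of predicted death probabilities
    print(oe_flag(obs, exp))
    ```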

  10. Statistics and Machine Learning based Outlier Detection Techniques for Exoplanets

    NASA Astrophysics Data System (ADS)

    Goel, Amit; Montgomery, Michele

    2015-08-01

    Architectures of planetary systems are observable snapshots in time that can indicate the formation and dynamic evolution of planets. The observable key parameters that we consider are planetary mass and orbital period. If planet masses are significantly less than their host star masses, then Keplerian motion gives P^2 = a^3, where P is the orbital period in years and a is the semi-major axis in Astronomical Units (AU). Keplerian motion holds on small scales such as the Solar System but not on large scales such as the Milky Way Galaxy. In this work, for confirmed exoplanets of known stellar mass, planetary mass, orbital period, and stellar age, we analyze the Keplerian motion of systems by stellar age, to see whether Keplerian motion has an age dependency and to identify outliers. For detecting outliers, we apply several techniques based on statistical and machine learning methods, such as probabilistic, linear, and proximity-based models. In probabilistic and statistical models of outliers, the parameters of a closed-form probability distribution are learned in order to detect the outliers. Linear models use regression-analysis-based techniques for detecting outliers. Proximity-based models use distance-based algorithms such as k-nearest neighbour, clustering algorithms such as k-means, or density-based algorithms such as kernel density estimation. In this work, we use unsupervised learning algorithms with only the proximity-based models. In addition, we explore the relative strengths and weaknesses of the various techniques by validating the outliers. The validation criterion for the outliers is whether the ratio of planetary mass to stellar mass is less than 0.001. We present our statistical analysis of the outliers thus detected.
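
    Of the techniques listed, the proximity-based score is the simplest to sketch: the distance to the k-th nearest neighbour in a standardized (log period, log mass) feature space, with larger scores marking candidate outliers. The feature choice and names are illustrative assumptions:

    ```python
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def knn_outlier_scores(period, mass, k=5):
        F = np.column_stack([np.log10(period), np.log10(mass)])
        F = (F - F.mean(axis=0)) / F.std(axis=0)          # standardize features
        nn = NearestNeighbors(n_neighbors=k + 1).fit(F)   # +1 because self counts
        dist, _ = nn.kneighbors(F)
        return dist[:, -1]                                # k-th neighbour distance
    ```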

  11. Penalized unsupervised learning with outliers

    PubMed Central

    Witten, Daniela M.

    2013-01-01

    We consider the problem of performing unsupervised learning in the presence of outliers – that is, observations that do not come from the same distribution as the rest of the data. It is known that in this setting, standard approaches for unsupervised learning can yield unsatisfactory results. For instance, in the presence of severe outliers, K-means clustering will often assign each outlier to its own cluster, or alternatively may yield distorted clusters in order to accommodate the outliers. In this paper, we take a new approach to extending existing unsupervised learning techniques to accommodate outliers. Our approach is an extension of a recent proposal for outlier detection in the regression setting. We allow each observation to take on an “error” term, and we penalize the errors using a group lasso penalty in order to encourage most of the observations’ errors to exactly equal zero. We show that this approach can be used in order to develop extensions of K-means clustering and principal components analysis that result in accurate outlier detection, as well as improved performance in the presence of outliers. These methods are illustrated in a simulation study and on two gene expression data sets, and connections with M-estimation are explored. PMID:23875057

  12. Moving object detection via low-rank total variation regularization

    NASA Astrophysics Data System (ADS)

    Wang, Pengcheng; Chen, Qian; Shao, Na

    2016-09-01

    Moving object detection is a challenging task in video surveillance. The recently proposed Robust Principal Component Analysis (RPCA) can recover outlier patterns from low-rank data under some mild conditions. However, the l1-penalty in RPCA does not work well in moving object detection because the irrepresentable condition is often not satisfied. In this paper, a method based on a total variation (TV) regularization scheme is proposed. In our model, image sequences captured with a static camera are highly related and can be described by a low-rank matrix, which also absorbs background motion, e.g., periodic and random perturbation. The foreground objects in the sequence are usually sparsely distributed and drift continuously, and can be treated as group outliers against the highly related background scenes. Instead of the l1-penalty, we exploit the total variation of the foreground. By minimizing the total variation energy, the outliers tend to collapse and finally converge to the exact moving objects. The TV-penalty is superior to the l1-penalty especially when the outliers are in the majority for some pixels, and our method can estimate the outliers explicitly with less bias but higher variance. To solve the problem, a joint optimization function is formulated and effectively solved through the inexact Augmented Lagrange Multiplier (ALM) method. We evaluate our method along with several state-of-the-art approaches in MATLAB. Both qualitative and quantitative results demonstrate that our proposed method works effectively on a large range of complex scenarios.
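
    For reference, a sketch of the standard l1-penalized RPCA baseline that the paper replaces with a total-variation penalty, solved by the inexact ALM iteration; the parameter choices follow common defaults from the RPCA literature, not the paper:

    ```python
    import numpy as np

    def rpca_ialm(D, lam=None, tol=1e-7, max_iter=500):
        m, n = D.shape
        if lam is None:
            lam = 1.0 / np.sqrt(max(m, n))
        spec = np.linalg.norm(D, 2)
        Y = D / max(spec, np.abs(D).max() / lam)     # dual variable initialization
        mu, S = 1.25 / spec, np.zeros_like(D)
        for _ in range(max_iter):
            U, sig, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
            L = (U * np.maximum(sig - 1 / mu, 0)) @ Vt            # singular value shrinkage
            T = D - L + Y / mu
            S = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0)  # soft threshold
            Z = D - L - S
            Y += mu * Z
            mu *= 1.6
            if np.linalg.norm(Z) / np.linalg.norm(D) < tol:
                break
        return L, S        # L: low-rank background, S: sparse moving objects
    ```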

  13. System and Method for Outlier Detection via Estimating Clusters

    NASA Technical Reports Server (NTRS)

    Iverson, David J. (Inventor)

    2016-01-01

    An efficient method and system for real-time or offline analysis of multivariate sensor data for use in anomaly detection, fault detection, and system health monitoring is provided. Models automatically derived from training data, typically nominal system data acquired from sensors in normally operating conditions or from detailed simulations, are used to identify unusual, out-of-family data samples (outliers) that indicate possible system failure or degradation. Outliers are determined by analyzing the degree of deviation of current system behavior from the models formed from the nominal system data. The deviation of current system behavior is presented as an easy-to-interpret numerical score, along with a measure of the relative contribution of each system parameter to any off-nominal deviation. The techniques described herein may also be used to "clean" the training data.

  14. Adaptive vector validation in image velocimetry to minimise the influence of outlier clusters

    NASA Astrophysics Data System (ADS)

    Masullo, Alessandro; Theunissen, Raf

    2016-03-01

    The universal outlier detection scheme (Westerweel and Scarano in Exp Fluids 39:1096-1100, 2005) and the distance-weighted universal outlier detection scheme for unstructured data (Duncan et al. in Meas Sci Technol 21:057002, 2010) are the most common PIV data validation routines. However, such techniques rely on a spatial comparison of each vector with those in a fixed-size neighbourhood, and their performance consequently suffers in the presence of clusters of outliers. This paper proposes an advancement that renders outlier detection more robust while reducing the probability of mistakenly invalidating correct vectors. Velocity fields undergo a preliminary evaluation in terms of local coherency, which parametrises the extent of the neighbourhood with which each vector is subsequently compared. Such adaptivity is shown to reduce the number of undetected outliers, even when implemented in the aforementioned validation schemes. In addition, the authors present an alternative residual definition considering vector magnitude and angle, adopting a modified Gaussian-weighted distance-based averaging median. This procedure is able to adapt the degree of acceptable background fluctuations in velocity to the local displacement magnitude. The traditional, extended, and recommended validation methods are numerically assessed on the basis of flow fields from an isolated vortex, a turbulent channel flow, and a DNS simulation of forced isotropic turbulence. The resulting validation method is adaptive, requires no user-defined parameters, and is demonstrated to yield the best performance in terms of outlier under- and over-detection. Finally, the novel validation routine is applied to the PIV analysis of experimental studies focused on the near wake behind a porous disc and on a supersonic jet, illustrating the potential gains in spatial resolution and accuracy.

  15. Case-Deletion Diagnostics for Maximum Likelihood Multipoint Quantitative Trait Locus Linkage Analysis

    PubMed Central

    Mendoza, Maria C.B.; Burns, Trudy L.; Jones, Michael P.

    2009-01-01

    Objectives: Case-deletion diagnostic methods are tools that allow identification of influential observations that may affect parameter estimates and model-fitting conclusions. The goal of this paper was to develop two case-deletion diagnostics, the exact case deletion (ECD) and the empirical influence function (EIF), for detecting outliers that can affect the results of sib-pair maximum likelihood quantitative trait locus (QTL) linkage analysis. Methods: Subroutines to compute the ECD and EIF were incorporated into the maximum likelihood QTL variance estimation components of the linkage analysis program MAPMAKER/SIBS. Performance of the diagnostics was compared in simulation studies that evaluated the proportion of outliers correctly identified (sensitivity) and the proportion of non-outliers correctly identified (specificity). Results: Simulations involving nuclear family data sets with one outlier showed that EIF sensitivities approximated ECD sensitivities well for outlier-affected parameters. Sensitivities were high, indicating the outlier was identified a high proportion of the time. Simulations also showed the enormous computational time advantage of the EIF. Diagnostics applied to body mass index in nuclear families detected observations influential on the lod score and model parameter estimates. Conclusions: The EIF is a practical diagnostic tool that has the advantages of high sensitivity and quick computation. PMID:19172086

  16. PCA leverage: outlier detection for high-dimensional functional magnetic resonance imaging data.

    PubMed

    Mejia, Amanda F; Nebel, Mary Beth; Eloyan, Ani; Caffo, Brian; Lindquist, Martin A

    2017-07-01

    Outlier detection for high-dimensional (HD) data is a popular topic in modern statistical research. However, one source of HD data that has received relatively little attention is functional magnetic resonance images (fMRI), which consist of hundreds of thousands of measurements sampled at hundreds of time points. At a time when the availability of fMRI data is rapidly growing, primarily through large, publicly available grassroots datasets, automated quality control and outlier detection methods are greatly needed. We propose principal components analysis (PCA) leverage and demonstrate how it can be used to identify outlying time points in an fMRI run. Furthermore, PCA leverage is a measure of the influence of each observation on the estimation of principal components, which are often of interest in fMRI data. We also propose an alternative measure, PCA robust distance, which is less sensitive to outliers and has controllable statistical properties. The proposed methods are validated through simulation studies and are shown to be highly accurate. We also conduct a reliability study using resting-state fMRI data from the Autism Brain Imaging Data Exchange and find that removal of outliers using the proposed methods results in more reliable estimation of subject-level resting-state networks using independent components analysis.
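
    The leverage itself is one line of linear algebra; a sketch with an illustrative robust cutoff (the paper's exact thresholding rule is not reproduced, and the names are assumptions):

    ```python
    import numpy as np

    def pca_leverage(X, q=10):
        # X: time points x voxels; leverage of each time point on the first q PCs.
        U, _, _ = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
        lev = np.sum(U[:, :q] ** 2, axis=1)
        med = np.median(lev)
        mad = 1.4826 * np.median(np.abs(lev - med))
        return lev, lev > med + 3 * mad    # scores and an illustrative outlier flag
    ```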

  17. Applying robust variant of Principal Component Analysis as a damage detector in the presence of outliers

    NASA Astrophysics Data System (ADS)

    Gharibnezhad, Fahit; Mujica, Luis E.; Rodellar, José

    2015-01-01

    Using Principal Component Analysis (PCA) for Structural Health Monitoring (SHM) has received considerable attention over the past few years. PCA has been used not only as a direct method to identify, classify, and localize damage but also as a significant primary step for other methods. Despite the several advantages that PCA conveys, it is very sensitive to outliers, anomalous observations that can affect the variance and covariance that are vital parts of the PCA method. Therefore, results based on PCA in the presence of outliers are not fully satisfactory. As its main contribution, this work suggests the use of a robust variant of PCA that is not sensitive to outliers as an effective way to deal with this problem in the SHM field. In addition, robust PCA is compared with classical PCA in the sense of detecting probable damage. The comparison between the results shows that robust PCA can distinguish the damage much better than the classical variant, and in many cases enables detection where classical PCA is not able to discern between damaged and non-damaged structures. Moreover, different types of robust PCA are compared with each other, as well as with their classical counterpart, in terms of damage detection. All the results are obtained through experiments with an aircraft turbine blade using piezoelectric transducers as sensors and actuators and adding simulated damage.

  18. An integrated approach for identifying wrongly labelled samples when performing classification in microarray data.

    PubMed

    Leung, Yuk Yee; Chang, Chun Qi; Hung, Yeung Sam

    2012-01-01

    Using a hybrid approach for gene selection and classification is common, as the results obtained are generally better than performing the two tasks independently. Yet, for some microarray datasets, both the classification accuracy and the stability of the gene sets obtained still have room for improvement. This may be due to the presence of samples with wrong class labels (i.e. outliers). Outlier detection algorithms proposed so far are either not suitable for microarray data, or only solve the outlier detection problem on their own. We tackle the outlier detection problem based on a previously proposed Multiple-Filter-Multiple-Wrapper (MFMW) model, which was demonstrated to yield promising results when compared to other hybrid approaches (Leung and Hung, 2010). To incorporate outlier detection and overcome limitations of the existing MFMW model, three new features are introduced in our proposed MFMW-outlier approach: 1) an unbiased external Leave-One-Out Cross-Validation framework is developed to replace the internal cross-validation of the previous MFMW model; 2) wrongly labeled samples are identified within the MFMW-outlier model; and 3) a stable set of genes is selected using an L1-norm SVM that removes any redundant genes present. Six binary-class microarray datasets were tested. Compared with outlier detection studies on the same datasets, MFMW-outlier detected all the outliers found in the original paper (for which the data were provided for analysis), and the genes selected after outlier removal were shown to have biological relevance. We also compared MFMW-outlier with PRAPIV (Zhang et al., 2006) on the same synthetic datasets. MFMW-outlier gave better average precision and recall values in three different settings. Lastly, artificially flipped microarray datasets were created by removing our detected outliers and flipping some of the remaining samples' labels. Almost all the 'wrong' (artificially flipped) samples were detected, suggesting that MFMW-outlier is sufficiently powerful to detect outliers in high-dimensional microarray datasets.

  19. Novel Hyperspectral Anomaly Detection Methods Based on Unsupervised Nearest Regularized Subspace

    NASA Astrophysics Data System (ADS)

    Hou, Z.; Chen, Y.; Tan, K.; Du, P.

    2018-04-01

    Anomaly detection has been of great interest in hyperspectral imagery analysis. Most conventional anomaly detectors merely take advantage of spectral and spatial information within neighboring pixels. In this paper, two methods, the Unsupervised Nearest Regularized Subspace with Outlier Removal Anomaly Detector (UNRSORAD) and the Local Summation UNRSORAD (LSUNRSORAD), are proposed. Both are based on the idea that each background pixel can be approximately represented by its spatial neighborhood, while anomalies cannot. Using a dual window, each test pixel is approximated by a linear combination of its surrounding data. Because outliers in the dual window degrade detection accuracy, the proposed detectors remove outlier pixels that differ significantly from the majority of pixels. To make full use of the various local spatial distributions of the pixels neighboring the pixel under test, we adopt a local summation dual-window sliding strategy. The residual image is obtained by subtracting the predicted background from the original hyperspectral imagery, and anomalies can be detected in the residual image. Experimental results show that the proposed methods greatly improve detection accuracy compared with other traditional detection methods.
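
    The dual-window idea with outlier removal can be caricatured in a few lines: approximate each pixel by the robust mean of its outer-ring neighbours after discarding neighbours far from the neighbourhood median, and score by the residual. This loop-based sketch (invented window sizes and data) is only loosely analogous to the regularized-subspace representation used in the paper.

      import numpy as np

      def dual_window_scores(img, inner=1, outer=3, k=2.5):
          # img: (rows, cols, bands) hyperspectral cube.
          rows, cols, _ = img.shape
          scores = np.zeros((rows, cols))
          for i in range(outer, rows - outer):
              for j in range(outer, cols - outer):
                  patch = img[i-outer:i+outer+1, j-outer:j+outer+1]
                  mask = np.ones(patch.shape[:2], bool)
                  mask[outer-inner:outer+inner+1,
                       outer-inner:outer+inner+1] = False
                  nb = patch[mask]                    # outer-ring neighbours
                  d = np.linalg.norm(nb - np.median(nb, axis=0), axis=1)
                  nb = nb[d <= k * (np.median(d) + 1e-12)]  # outlier removal
                  scores[i, j] = np.linalg.norm(img[i, j] - nb.mean(axis=0))
          return scores

      rng = np.random.default_rng(2)
      cube = rng.normal(size=(20, 20, 5))
      cube[10, 10] += 5.0                             # implanted anomaly
      s = dual_window_scores(cube)
      print(np.unravel_index(s.argmax(), s.shape))    # -> (10, 10)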

  20. Detection of outliers in water quality monitoring samples using functional data analysis in San Esteban estuary (Northern Spain).

    PubMed

    Díaz Muñiz, C; García Nieto, P J; Alonso Fernández, J R; Martínez Torres, J; Taboada, J

    2012-11-15

    Water quality controls involve a large number of variables and observations, often subject to some outliers. An outlier is an observation that is numerically distant from the rest of the data or that appears to deviate markedly from other members of the sample in which it occurs. An interesting analysis is to find those observations that produce measurements different from the pattern established in the sample. Identification of atypical observations is therefore an important concern in water quality monitoring, and a difficult task because of the multivariate nature of water quality data. Our study provides a new method for detecting outliers in water quality monitoring parameters, using oxygen and turbidity as indicator variables. Until now, methods were based on considering the different parameters as a vector whose components were their concentration values. Our approach lies in considering water quality monitoring through time as curves instead of vectors; that is, the data set of the problem is treated as a time-dependent function rather than as a set of discrete values at different time instants. The methodology, which is based on the concept of functional depth, was applied to the detection of outliers in water quality monitoring samples in the San Esteban estuary. Results were discussed in terms of origin, causes, etc., and compared with those obtained using the conventional method based on vector comparison. Finally, the advantages of the functional method are presented. Copyright © 2012 Elsevier B.V. All rights reserved.
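
    One concrete realization of depth-based flagging is a Fraiman-Muniz-style depth computed from pointwise ranks; this is one of several functional depths and not necessarily the exact one used in the study.

      import numpy as np

      def fraiman_muniz_depth(curves):
          # curves: (n_curves, n_times); depth integrates the pointwise
          # univariate depth 1 - |1/2 - empirical cdf| over the time grid.
          n = curves.shape[0]
          ranks = np.argsort(np.argsort(curves, axis=0), axis=0) + 1
          return (1.0 - np.abs(0.5 - ranks / n)).mean(axis=1)

      rng = np.random.default_rng(3)
      t = np.linspace(0, 1, 50)
      curves = np.sin(2 * np.pi * t) + rng.normal(0, 0.1, size=(30, 50))
      curves[0] += 2.0                       # a magnitude outlier

      depth = fraiman_muniz_depth(curves)
      cutoff = np.quantile(depth, 0.05)      # flag the least-deep curves
      print("outlying curves:", np.where(depth <= cutoff)[0])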

  1. Unsupervised Scalable Statistical Method for Identifying Influential Users in Online Social Networks.

    PubMed

    Azcorra, A; Chiroque, L F; Cuevas, R; Fernández Anta, A; Laniado, H; Lillo, R E; Romo, J; Sguera, C

    2018-05-03

    Billions of users interact intensively every day via Online Social Networks (OSNs) such as Facebook, Twitter, or Google+. This makes OSNs an invaluable source of information, and a channel of actuation, for sectors like advertising, marketing, or politics. To get the most out of OSNs, analysts need to identify influential users that can be leveraged for promoting products, distributing messages, or improving the image of companies. In this report we propose a new unsupervised method, Massive Unsupervised Outlier Detection (MUOD), based on outlier detection, for supporting the identification of influential users. MUOD is scalable and can hence be used in large OSNs. Moreover, it labels the outliers as shape, magnitude, or amplitude outliers, depending on their features. This allows classifying the outlier users into multiple different classes, which are likely to include different types of influential users. Applying MUOD to a subset of roughly 400 million Google+ users allowed us to automatically identify and discriminate sets of outlier users that present features associated with different definitions of influential users, such as the capacity to attract engagement, the capacity to attract a large number of followers, or high infection capacity.

  2. The detection and correction of outlying determinations that may occur during geochemical analysis

    USGS Publications Warehouse

    Harvey, P.K.

    1974-01-01

    'Wild', 'rogue' or outlying determinations occur periodically during geochemical analysis. Existing tests in the literature for the detection of such determinations within a set of replicate measurements are often misleading. This account describes the chances of detecting outliers and the extent to which correction may be made for their presence in sample sizes of three to seven replicate measurements. A systematic procedure for monitoring data for outliers is outlined. The problem of outliers becomes more important as instrumental methods of analysis become faster and more highly automated; a state in which it becomes increasingly difficult for the analyst to examine every determination. The recommended procedure is easily adapted to such analytical systems. © 1974.
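
    For small replicate sets like these, a classical one-outlier screen is the Grubbs test; the sketch below implements it under a normality assumption and is only a stand-in for the paper's procedure.

      import numpy as np
      from scipy import stats

      def grubbs_flag(x, alpha=0.05):
          # Returns (outlier present?, test statistic, critical value).
          x = np.asarray(x, dtype=float)
          n = len(x)
          g = np.max(np.abs(x - x.mean())) / x.std(ddof=1)
          t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
          g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))
          return g > g_crit, g, g_crit

      print(grubbs_flag([10.1, 10.3, 9.9, 10.2, 14.8]))   # flags 14.8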

  3. Evaluation of the Bitterness of Traditional Chinese Medicines using an E-Tongue Coupled with a Robust Partial Least Squares Regression Method.

    PubMed

    Lin, Zhaozhou; Zhang, Qiao; Liu, Ruixin; Gao, Xiaojie; Zhang, Lu; Kang, Bingya; Shi, Junhan; Wu, Zidan; Gui, Xinjing; Li, Xuelin

    2016-01-25

    To accurately, safely, and efficiently evaluate the bitterness of Traditional Chinese Medicines (TCMs), a robust predictor was developed using a robust partial least squares (RPLS) regression method based on data obtained from an electronic tongue (e-tongue) system. The data quality was verified by Grubbs' test. Moreover, potential outliers were detected based on both the standardized residual and the score distance calculated for each sample. The performance of RPLS on the dataset before and after outlier detection was compared to other state-of-the-art methods including multivariate linear regression, least squares support vector machine, and plain partial least squares regression. Both R² and the root-mean-square error (RMSE) of cross-validation (CV) were recorded for each model. With four latent variables, a robust RMSECV value of 0.3916, with bitterness values ranging from 0.63 to 4.78, was obtained for the RPLS model constructed on the dataset including outliers. Meanwhile, the RMSECV calculated for the models constructed by the other methods was larger than that of the RPLS model. After six outliers were excluded, the performance of all benchmark methods markedly improved, but the difference between the RPLS models constructed before and after outlier exclusion was negligible. In conclusion, the bitterness of TCM decoctions can be accurately evaluated with an RPLS model constructed using e-tongue data.

  4. Comparison of outlier identification methods in hospital surgical quality improvement programs.

    PubMed

    Bilimoria, Karl Y; Cohen, Mark E; Merkow, Ryan P; Wang, Xue; Bentrem, David J; Ingraham, Angela M; Richards, Karen; Hall, Bruce L; Ko, Clifford Y

    2010-10-01

    Surgeons and hospitals are being increasingly assessed by third parties regarding surgical quality and outcomes, and much of this information is reported publicly. Our objective was to compare the various methods used to classify hospitals as outliers in established surgical quality assessment programs by applying each approach to a single data set. Using American College of Surgeons National Surgical Quality Improvement Program data (7/2008-6/2009), hospital risk-adjusted 30-day morbidity and mortality were assessed for general surgery at 231 hospitals (cases = 217,630) and for colorectal surgery at 109 hospitals (cases = 17,251). The numbers of outliers (poor performers) identified using different methods and criteria were compared. The overall morbidity was 10.3% for general surgery and 25.3% for colorectal surgery. The mortality was 1.6% for general surgery and 4.0% for colorectal surgery. Programs used different methods (logistic regression, hierarchical modeling, partitioning) and criteria (P < 0.01, P < 0.05, P < 0.10) to identify outliers. Depending on the outlier identification methods and criteria employed, when each approach was applied to this single dataset the number of outliers ranged from 7 to 57 hospitals for general surgery morbidity, 1 to 57 hospitals for general surgery mortality, 4 to 27 hospitals for colorectal morbidity, and 0 to 27 hospitals for colorectal mortality. There was considerable variation in the number of outliers identified using different detection approaches. Quality programs seem to be utilizing outlier identification methods contrary to what might be expected; thus, they should justify their methodology based on the intent of the program (i.e., quality improvement vs. reimbursement). Surgeons and hospitals should be aware of the variability in the methods used to assess their performance, as these outlier designations will likely have referral and reimbursement consequences.

  5. Outliers in Questionnaire Data: Can They Be Detected and Should They Be Removed?

    ERIC Educational Resources Information Center

    Zijlstra, Wobbe P.; van der Ark, L. Andries; Sijtsma, Klaas

    2011-01-01

    Outliers in questionnaire data are unusual observations, which may bias statistical results, and outlier statistics may be used to detect such outliers. The authors investigated the effect outliers have on the specificity and the sensitivity of each of six different outlier statistics. The Mahalanobis distance and the item-pair based outlier…
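
    The Mahalanobis distance mentioned above becomes an outlier statistic once paired with a chi-square cutoff; a simplified sketch on synthetic item scores (the 0.999 quantile is an arbitrary choice):

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(4)
      X = rng.normal(size=(500, 10))     # questionnaire item scores
      X[0] = 6.0                         # an aberrant response pattern

      mu = X.mean(axis=0)
      cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
      d2 = np.einsum('ij,jk,ik->i', X - mu, cov_inv, X - mu)
      cutoff = stats.chi2.ppf(0.999, df=X.shape[1])
      print("flagged respondents:", np.where(d2 > cutoff)[0])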

  6. Micro- and macro-geographic scale effect on the molecular imprint of selection and adaptation in Norway spruce.

    PubMed

    Scalfi, Marta; Mosca, Elena; Di Pierro, Erica Adele; Troggio, Michela; Vendramin, Giovanni Giuseppe; Sperisen, Christoph; La Porta, Nicola; Neale, David B

    2014-01-01

    Forest tree species of temperate and boreal regions have undergone a long history of demographic changes and evolutionary adaptations. The main objective of this study was to detect signals of selection in Norway spruce (Picea abies [L.] Karst), at different sampling scales and to investigate, accounting for population structure, the effect of environment on species genetic diversity. A total of 384 single nucleotide polymorphisms (SNPs) representing 290 genes were genotyped at two geographic scales: across 12 populations distributed along two altitudinal transects in the Alps (micro-geographic scale), and across 27 populations belonging to the range of Norway spruce in central and south-east Europe (macro-geographic scale). At the macro-geographic scale, principal component analysis combined with Bayesian clustering revealed three major clusters, corresponding to the main areas of southern spruce occurrence, i.e. the Alps, Carpathians, and Hercynia. The populations along the altitudinal transects were not differentiated. To assess the role of selection in structuring genetic variation, we applied a Bayesian and coalescent-based F(ST)-outlier method and tested for correlations between allele frequencies and climatic variables using regression analyses. At the macro-geographic scale, the F(ST)-outlier methods together detected 11 F(ST)-outliers. Six outliers were detected when the same analyses were carried out taking into account the genetic structure. Regression analyses with population structure correction resulted in the identification of two (micro-geographic scale) and 38 SNPs (macro-geographic scale) significantly correlated with temperature and/or precipitation. Six of these loci overlapped with F(ST)-outliers, among them two loci encoding an enzyme involved in riboflavin biosynthesis and a sucrose synthase. The results of this study indicate a strong relationship between genetic and environmental variation at both geographic scales. They also suggest that an integrative approach combining different outlier detection methods and population sampling at different geographic scales is useful for identifying loci potentially involved in adaptation.

  7. Micro- and Macro-Geographic Scale Effect on the Molecular Imprint of Selection and Adaptation in Norway Spruce

    PubMed Central

    Scalfi, Marta; Mosca, Elena; Di Pierro, Erica Adele; Troggio, Michela; Vendramin, Giovanni Giuseppe; Sperisen, Christoph; La Porta, Nicola; Neale, David B.

    2014-01-01

    Forest tree species of temperate and boreal regions have undergone a long history of demographic changes and evolutionary adaptations. The main objective of this study was to detect signals of selection in Norway spruce (Picea abies [L.] Karst), at different sampling scales and to investigate, accounting for population structure, the effect of environment on species genetic diversity. A total of 384 single nucleotide polymorphisms (SNPs) representing 290 genes were genotyped at two geographic scales: across 12 populations distributed along two altitudinal transects in the Alps (micro-geographic scale), and across 27 populations belonging to the range of Norway spruce in central and south-east Europe (macro-geographic scale). At the macro-geographic scale, principal component analysis combined with Bayesian clustering revealed three major clusters, corresponding to the main areas of southern spruce occurrence, i.e. the Alps, Carpathians, and Hercynia. The populations along the altitudinal transects were not differentiated. To assess the role of selection in structuring genetic variation, we applied a Bayesian and coalescent-based F ST-outlier method and tested for correlations between allele frequencies and climatic variables using regression analyses. At the macro-geographic scale, the F ST-outlier methods together detected 11 F ST-outliers. Six outliers were detected when the same analyses were carried out taking into account the genetic structure. Regression analyses with population structure correction resulted in the identification of two (micro-geographic scale) and 38 SNPs (macro-geographic scale) significantly correlated with temperature and/or precipitation. Six of these loci overlapped with F ST-outliers, among them two loci encoding an enzyme involved in riboflavin biosynthesis and a sucrose synthase. The results of this study indicate a strong relationship between genetic and environmental variation at both geographic scales. They also suggest that an integrative approach combining different outlier detection methods and population sampling at different geographic scales is useful for identifying loci potentially involved in adaptation. PMID:25551624

  8. Iterative outlier removal: A method for identifying outliers in laboratory recalibration studies

    PubMed Central

    Parrinello, Christina M.; Grams, Morgan E.; Sang, Yingying; Couper, David; Wruck, Lisa M.; Li, Danni; Eckfeldt, John H.; Selvin, Elizabeth; Coresh, Josef

    2016-01-01

    Background: Extreme values that arise for any reason, including through non-laboratory measurement procedure-related processes (inadequate mixing, evaporation, mislabeling), lead to outliers and inflate errors in recalibration studies. We present an approach termed iterative outlier removal (IOR) for identifying such outliers. Methods: We previously identified substantial laboratory drift in uric acid measurements in the Atherosclerosis Risk in Communities (ARIC) Study over time. Serum uric acid was originally measured in 1990–92 on a Coulter DACOS instrument using an uricase-based measurement procedure. To recalibrate previous measured concentrations to a newer enzymatic colorimetric measurement procedure, uric acid was re-measured in 200 participants from stored plasma in 2011–13 on a Beckman Olympus 480 autoanalyzer. To conduct IOR, we excluded data points >3 standard deviations (SDs) from the mean difference. We continued this process using the resulting data until no outliers remained. Results: IOR detected more outliers and yielded greater precision in simulation. The original mean difference (SD) in uric acid was 1.25 (0.62) mg/dL. After four iterations, 9 outliers were excluded, and the mean difference (SD) was 1.23 (0.45) mg/dL. Conducting only one round of outlier removal (the standard approach) would have excluded 4 outliers (mean difference [SD] = 1.22 [0.51] mg/dL). Applying the recalibration (derived from Deming regression) from each approach to the original measurements, the prevalence of hyperuricemia (>7 mg/dL) was 28.5% before IOR and 8.5% after IOR. Conclusion: IOR is a useful method for removing extreme outliers irrelevant to recalibrating laboratory measurements, and identifies more extraneous outliers than the standard approach. PMID:27197675
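
    The IOR loop is short enough to state directly; this sketch mirrors the description (iterate 3-SD trimming on the paired differences until no further point is excluded), with simulated differences in place of the ARIC data.

      import numpy as np

      def iterative_outlier_removal(diff, k=3.0):
          diff = np.asarray(diff, dtype=float)
          keep = np.ones(diff.size, dtype=bool)
          while True:
              d = diff[keep]
              far = np.abs(diff - d.mean()) > k * d.std(ddof=1)
              if not (far & keep).any():
                  return keep            # True where a point survived
              keep &= ~far

      rng = np.random.default_rng(5)
      diff = np.append(rng.normal(1.25, 0.45, 195), [4.2, 5.5, -2.0, 6.1, -3.3])
      print((~iterative_outlier_removal(diff)).sum(), "outliers removed")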

  9. Visualizing Big Data Outliers through Distributed Aggregation.

    PubMed

    Wilkinson, Leland

    2017-08-29

    Visualizing outliers in massive datasets requires statistical pre-processing in order to reduce the scale of the problem to a size amenable to rendering systems like D3, Plotly or analytic systems like R or SAS. This paper presents a new algorithm, called hdoutliers, for detecting multidimensional outliers. It is unique for a) dealing with a mixture of categorical and continuous variables, b) dealing with big-p (many columns of data), c) dealing with big-n (many rows of data), d) dealing with outliers that mask other outliers, and e) dealing consistently with unidimensional and multidimensional datasets. Unlike ad hoc methods found in many machine learning papers, hdoutliers is based on a distributional model that allows outliers to be tagged with a probability. This critical feature reduces the likelihood of false discoveries.

  10. An online outlier identification and removal scheme for improving fault detection performance.

    PubMed

    Ferdowsi, Hasan; Jagannathan, Sarangapani; Zawodniok, Maciej

    2014-05-01

    Measured data or states for a nonlinear dynamic system are usually contaminated by outliers. Identifying and removing outliers makes the data (or system states) more trustworthy and reliable, since outliers in the measured data (or states) can cause missed or false alarms during fault diagnosis. In addition, faults can make the system states nonstationary, which calls for a novel analytical model-based fault detection (FD) framework. In this paper, an online outlier identification and removal (OIR) scheme is proposed for a nonlinear dynamic system. Since the dynamics of the system can experience unknown changes due to faults, traditional observer-based techniques cannot be used to remove the outliers. The OIR scheme uses a neural network (NN) to estimate the actual system states from measured system states involving outliers. With this method, outlier detection is performed online at each time instant by finding the difference between the estimated and the measured states and comparing its median with its standard deviation over a moving time window. The NN weight update law in OIR is designed such that the detected outliers have no effect on the state estimation, which is subsequently used for model-based fault diagnosis. In addition, since the OIR estimator cannot distinguish between faulty and healthy operating conditions, a separate model-based observer is designed for fault diagnosis, which uses the OIR scheme as a preprocessing unit to improve FD performance. The stability analysis of both the OIR and fault diagnosis schemes is presented. Finally, a three-tank benchmarking system and a simple linear system are used to verify the proposed scheme in simulations, and the scheme is then applied on an axial piston pump testbed. The scheme can be applied to nonlinear systems whose dynamics and underlying distribution of states are subject to change due to both unknown faults and operating conditions.
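
    A stripped-down version of the windowed residual test might look as follows; the neural-network state estimator is omitted, and the window length and threshold are placeholders.

      import numpy as np
      from collections import deque

      def online_outlier_flags(residuals, window=25, k=3.0):
          # Flag a residual that sits far from the median of the recent
          # window relative to the window's spread.
          buf, flags = deque(maxlen=window), []
          for r in residuals:
              if len(buf) >= 5:
                  med, sd = np.median(buf), np.std(buf, ddof=1)
                  flags.append(abs(r - med) > k * sd)
              else:
                  flags.append(False)
              buf.append(r)
          return np.array(flags)

      rng = np.random.default_rng(6)
      res = rng.normal(0, 0.1, 300)
      res[120] += 2.0                    # an injected outlier
      print(np.where(online_outlier_flags(res))[0])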

  11. Evaluation of the Bitterness of Traditional Chinese Medicines using an E-Tongue Coupled with a Robust Partial Least Squares Regression Method

    PubMed Central

    Lin, Zhaozhou; Zhang, Qiao; Liu, Ruixin; Gao, Xiaojie; Zhang, Lu; Kang, Bingya; Shi, Junhan; Wu, Zidan; Gui, Xinjing; Li, Xuelin

    2016-01-01

    To accurately, safely, and efficiently evaluate the bitterness of Traditional Chinese Medicines (TCMs), a robust predictor was developed using a robust partial least squares (RPLS) regression method based on data obtained from an electronic tongue (e-tongue) system. The data quality was verified by Grubbs’ test. Moreover, potential outliers were detected based on both the standardized residual and the score distance calculated for each sample. The performance of RPLS on the dataset before and after outlier detection was compared to other state-of-the-art methods including multivariate linear regression, least squares support vector machine, and plain partial least squares regression. Both R2 and the root-mean-square error (RMSE) of cross-validation (CV) were recorded for each model. With four latent variables, a robust RMSECV value of 0.3916, with bitterness values ranging from 0.63 to 4.78, was obtained for the RPLS model constructed on the dataset including outliers. Meanwhile, the RMSECV calculated for the models constructed by the other methods was larger than that of the RPLS model. After six outliers were excluded, the performance of all benchmark methods markedly improved, but the difference between the RPLS models constructed before and after outlier exclusion was negligible. In conclusion, the bitterness of TCM decoctions can be accurately evaluated with an RPLS model constructed using e-tongue data. PMID:26821026

  12. Accuracy of GIPSY PPP from version 6.2: a robust method to remove outliers

    NASA Astrophysics Data System (ADS)

    Hayal, Adem G.; Ugur Sanli, D.

    2014-05-01

    In this paper, we assess the accuracy of GIPSY PPP from the latest version, version 6.2. As the research community prepares for real-time PPP, it is worth revisiting the accuracy of static GPS from the latest version of this well-established research software, the first of its kind. Although the results do not differ significantly from the previous version, version 6.1.1, we still observe a slight improvement in the vertical component due to the enhanced second-order ionospheric modeling introduced with the latest version. In this study, however, we turn our attention to outlier detection. Outliers usually occur among solutions from shorter observation sessions and degrade the quality of the accuracy modeling. In our previous analysis of version 6.1.1, we argued that eliminating outliers with the traditional method was cumbersome, since repeated trials were needed and subjectivity that could affect the statistical significance of the solutions might have existed among the results (Hayal and Sanli, 2013). Here we overcome this problem using a robust outlier elimination method. The median is perhaps the simplest of the robust outlier detection methods in terms of applicability, and it might be considered the most efficient one, having the highest breakdown point. In our analysis, we used a slightly different version of the median as introduced in Tut et al. (2013). Hence, we were able to remove the suspected outliers in one run, outliers which were more problematic to remove with the traditional methods from the solutions produced using the latest version of the software. References: Hayal, AG, Sanli DU, Accuracy of GIPSY PPP from version 6, GNSS Precise Point Positioning Workshop: Reaching Full Potential, Vol. 1, pp. 41-42, (2013); Tut, İ., Sanli D.U., Erdogan B., Hekimoglu S., Efficiency of BERNESE single baseline rapid static positioning solutions with SEARCH strategy, Survey Review, Vol. 45, Issue 331, pp. 296-304, (2013).
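
    A median/MAD screen captures the spirit of such one-pass robust elimination; this generic sketch is not the specific variant of Tut et al. (2013).

      import numpy as np

      def mad_outliers(x, k=3.0):
          x = np.asarray(x, dtype=float)
          med = np.median(x)
          mad = 1.4826 * np.median(np.abs(x - med))  # ~sigma for Gaussian data
          return np.abs(x - med) > k * mad

      heights = np.array([12.1, 12.3, 11.9, 12.0, 19.4, 12.2])
      print(mad_outliers(heights))       # flags the 19.4 solution in one run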

  13. Evaluation of two outlier-detection-based methods for detecting tissue-selective genes from microarray data.

    PubMed

    Kadota, Koji; Konishi, Tomokazu; Shimizu, Kentaro

    2007-05-01

    Large-scale expression profiling using DNA microarrays enables identification of tissue-selective genes for which expression is considerably higher and/or lower in some tissues than in others. Among numerous possible methods, only two outlier-detection-based methods (an AIC-based method and Sprent's non-parametric method) can treat equally various types of selective patterns, but they produce substantially different results. We investigated the performance of these two methods for different parameter settings and for a reduced number of samples. We focused on their ability to detect selective expression patterns robustly. We applied them to public microarray data collected from 36 normal human tissue samples and analyzed the effects of both changing the parameter settings and reducing the number of samples. The AIC-based method was more robust in both cases. The findings confirm that the use of the AIC-based method in the recently proposed ROKU method for detecting tissue-selective expression patterns is correct and that Sprent's method is not suitable for ROKU.

  14. HacDivSel: Two new methods (haplotype-based and outlier-based) for the detection of divergent selection in pairs of populations

    PubMed Central

    2017-01-01

    The detection of genomic regions involved in local adaptation is an important topic in current population genetics. There are several detection strategies available depending on the kind of genetic and demographic information at hand. A common drawback is the high risk of false positives. In this study we introduce two complementary methods for the detection of divergent selection from populations connected by migration. Both methods have been developed with the aim of being robust to false positives. The first method combines haplotype information with inter-population differentiation (FST). Evidence of divergent selection is concluded only when both the haplotype pattern and the FST value support it. The second method is developed for independently segregating markers i.e. there is no haplotype information. In this case, the power to detect selection is attained by developing a new outlier test based on detecting a bimodal distribution. The test computes the FST outliers and then assumes that those of interest would have a different mode. We demonstrate the utility of the two methods through simulations and the analysis of real data. The simulation results showed power ranging from 60–95% in several of the scenarios whilst the false positive rate was controlled below the nominal level. The analysis of real samples consisted of phased data from the HapMap project and unphased data from intertidal marine snail ecotypes. The results illustrate that the proposed methods could be useful for detecting locally adapted polymorphisms. The software HacDivSel implements the methods explained in this manuscript. PMID:28423003

  15. Comparison of tests for spatial heterogeneity on data with global clustering patterns and outliers

    PubMed Central

    Jackson, Monica C; Huang, Lan; Luo, Jun; Hachey, Mark; Feuer, Eric

    2009-01-01

    Background: The ability to evaluate geographic heterogeneity of cancer incidence and mortality is important in cancer surveillance. Many statistical methods for evaluating global clustering and local cluster patterns have been developed and examined in many simulation studies. However, the performance of these methods in two extreme cases (global clustering evaluation and local anomaly (outlier) detection) has not been thoroughly investigated. Methods: We compare methods for global clustering evaluation, including Tango's Index, Moran's I, and Oden's I*pop, and cluster detection methods, such as local Moran's I and the SaTScan elliptic version, on simulated count data that mimic global clustering patterns and outliers for cancer cases in the continental United States. We examine the power and precision of the selected methods in the purely spatial analysis. We illustrate Tango's MEET and the SaTScan elliptic version on 1987-2004 HIV and 1950-1969 lung cancer mortality data in the United States. Results: For simulated data with outlier patterns, Tango's MEET, Moran's I and I*pop had powers less than 0.2, and SaTScan had powers around 0.97. For simulated data with global clustering patterns, Tango's MEET and I*pop (with 50% of the total population as the maximum search window) had powers close to 1. SaTScan had powers around 0.7-0.8, and Moran's I had powers around 0.2-0.3. In the real data example, Tango's MEET indicated the existence of global clustering patterns in both the HIV and lung cancer mortality data. SaTScan found a large cluster for HIV mortality rates, which is consistent with the finding from Tango's MEET. SaTScan also found clusters and outliers in the lung cancer mortality data. Conclusion: The SaTScan elliptic version is more efficient for outlier detection compared with the other methods evaluated in this article. Tango's MEET and Oden's I*pop perform best in global clustering scenarios among the selected methods. SaTScan should be used with caution on data with global clustering patterns, since it may reveal an incorrect spatial pattern even though it has enough power to reject the null hypothesis of homogeneous relative risk. Tango's method should be used for global clustering evaluation instead of SaTScan. PMID:19822013

  16. Modeling Data Containing Outliers using ARIMA Additive Outlier (ARIMA-AO)

    NASA Astrophysics Data System (ADS)

    Saleh Ahmar, Ansari; Guritno, Suryo; Abdurakhman; Rahman, Abdul; Awi; Alimuddin; Minggi, Ilham; Arif Tiro, M.; Kasim Aidid, M.; Annas, Suwardi; Utami Sutiksno, Dian; Ahmar, Dewi S.; Ahmar, Kurniawan H.; Abqary Ahmar, A.; Zaki, Ahmad; Abdullah, Dahlan; Rahim, Robbi; Nurdiyanto, Heri; Hidayat, Rahmat; Napitupulu, Darmawan; Simarmata, Janner; Kurniasih, Nuning; Andretti Abdillah, Leon; Pranolo, Andri; Haviluddin; Albra, Wahyudin; Arifin, A. Nurani M.

    2018-01-01

    The aim of this study is to discuss the detection and correction of data containing additive outliers (AO) in an ARIMA(p, d, q) model. Detection and correction use the iterative procedure popularized by Box, Jenkins, and Reinsel (1994). With this method we obtain an ARIMA model that fits the data containing AOs: the coefficients obtained from the iteration process via regression are added to the original ARIMA model. For the simulated data, the initial model for the data containing AOs is ARIMA(2,0,0) with MSE = 36,780; after detection and correction of the data by iteration, the model is ARIMA(2,0,0) with the coefficients obtained from the regression Zt = 0,106 + 0,204Z(t-1) + 0,401Z(t-2) - 329X1(t) + 115X2(t) + 35,9X3(t) and MSE = 19,365. This shows an improvement in the forecasting error rate.
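
    The additive-outlier loop can be approximated in a few lines with statsmodels: fit the AR model, flag large residuals, then refit with a pulse dummy for each flagged instant. This is a simplified, hedged rendition, not the exact Box-Jenkins-Reinsel procedure.

      import numpy as np
      from statsmodels.tsa.arima.model import ARIMA

      rng = np.random.default_rng(7)
      z = np.zeros(200)
      for t in range(2, 200):            # simulate an AR(2) process
          z[t] = 0.2 * z[t-1] + 0.4 * z[t-2] + rng.normal()
      z[80] += 12.0                      # inject an additive outlier

      fit = ARIMA(z, order=(2, 0, 0)).fit()
      resid = fit.resid
      ao_idx = np.where(np.abs(resid) > 3 * resid.std())[0]

      if ao_idx.size:                    # one pulse dummy per detected AO
          pulses = np.zeros((len(z), ao_idx.size))
          pulses[ao_idx, np.arange(ao_idx.size)] = 1.0
          fit_ao = ARIMA(z, order=(2, 0, 0), exog=pulses).fit()
      print("detected AOs at:", ao_idx)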

  17. Latent Space Tracking from Heterogeneous Data with an Application for Anomaly Detection

    DTIC Science & Technology

    2015-11-01

    Huang, Jiaji; Ning, Xia. Only fragments of the abstract survive in this record: the report tracks a latent space from heterogeneous data with an application to anomaly detection; if an anomaly behaves as a sudden outlier after which the data stream goes back to its normal state, the anomalous data point should be isolated, and three types of synthetic anomalies, all sudden outliers, are introduced for evaluation.

  18. Mixture-Tuned, Clutter Matched Filter for Remote Detection of Subpixel Spectral Signals

    NASA Technical Reports Server (NTRS)

    Thompson, David R.; Mandrake, Lukas; Green, Robert O.

    2013-01-01

    Mapping localized spectral features in large images demands sensitive and robust detection algorithms. Two aspects of large images that can harm matched-filter detection performance are addressed simultaneously. First, multimodal backgrounds may thwart the typical Gaussian model. Second, outlier features can trigger false detections from large projections onto the target vector. Two state-of-the-art approaches are combined that independently address outlier false positives and multimodal backgrounds. The background clustering models multimodal backgrounds, and the mixture tuned matched filter (MT-MF) addresses outliers. Combining the two methods captures significant additional performance benefits. The resulting mixture tuned clutter matched filter (MT-CMF) shows effective performance on simulated and airborne datasets. The classical MNF transform was applied, followed by k-means clustering. Then, each cluster's mean, covariance, and the corresponding eigenvalues were estimated. This yields a cluster-specific matched filter estimate as well as a cluster-specific feasibility score to flag outlier false positives. The technology described is a proof of concept that may be employed in future target detection and mapping applications for remote imaging spectrometers. It is of most direct relevance to JPL proposals for airborne and orbital hyperspectral instruments. Applications include subpixel target detection in hyperspectral scenes for military surveillance. Earth science applications include mineralogical mapping, species discrimination for ecosystem health monitoring, and land use classification.

  19. Image Corruption Detection in Diffusion Tensor Imaging for Post-Processing and Real-Time Monitoring

    PubMed Central

    Li, Yue; Shea, Steven M.; Lorenz, Christine H.; Jiang, Hangyi; Chou, Ming-Chung; Mori, Susumu

    2013-01-01

    Due to the high sensitivity of diffusion tensor imaging (DTI) to physiological motion, clinical DTI scans often suffer a significant amount of artifacts. Tensor-fitting-based, post-processing outlier rejection is often used to reduce the influence of motion artifacts. Although it is an effective approach, when there are multiple corrupted data, this method may no longer correctly identify and reject the corrupted data. In this paper, we introduce a new criterion called “corrected Inter-Slice Intensity Discontinuity” (cISID) to detect motion-induced artifacts. We compared the performance of algorithms using cISID and other existing methods with regard to artifact detection. The experimental results show that the integration of cISID into fitting-based methods significantly improves the retrospective detection performance at post-processing analysis. The performance of the cISID criterion, if used alone, was inferior to the fitting-based methods, but cISID could effectively identify severely corrupted images with a rapid calculation time. In the second part of this paper, an outlier rejection scheme was implemented on a scanner for real-time monitoring of image quality and reacquisition of the corrupted data. The real-time monitoring, based on cISID and followed by post-processing, fitting-based outlier rejection, could provide a robust environment for routine DTI studies. PMID:24204551

  20. Comparison of tests for spatial heterogeneity on data with global clustering patterns and outliers.

    PubMed

    Jackson, Monica C; Huang, Lan; Luo, Jun; Hachey, Mark; Feuer, Eric

    2009-10-12

    The ability to evaluate geographic heterogeneity of cancer incidence and mortality is important in cancer surveillance. Many statistical methods for evaluating global clustering and local cluster patterns have been developed and examined in many simulation studies. However, the performance of these methods in two extreme cases (global clustering evaluation and local anomaly (outlier) detection) has not been thoroughly investigated. We compare methods for global clustering evaluation, including Tango's Index, Moran's I, and Oden's I*(pop), and cluster detection methods, such as local Moran's I and the SaTScan elliptic version, on simulated count data that mimic global clustering patterns and outliers for cancer cases in the continental United States. We examine the power and precision of the selected methods in the purely spatial analysis. We illustrate Tango's MEET and the SaTScan elliptic version on 1987-2004 HIV and 1950-1969 lung cancer mortality data in the United States. For simulated data with outlier patterns, Tango's MEET, Moran's I and I*(pop) had powers less than 0.2, and SaTScan had powers around 0.97. For simulated data with global clustering patterns, Tango's MEET and I*(pop) (with 50% of the total population as the maximum search window) had powers close to 1. SaTScan had powers around 0.7-0.8, and Moran's I had powers around 0.2-0.3. In the real data example, Tango's MEET indicated the existence of global clustering patterns in both the HIV and lung cancer mortality data. SaTScan found a large cluster for HIV mortality rates, which is consistent with the finding from Tango's MEET. SaTScan also found clusters and outliers in the lung cancer mortality data. The SaTScan elliptic version is more efficient for outlier detection compared with the other methods evaluated in this article. Tango's MEET and Oden's I*(pop) perform best in global clustering scenarios among the selected methods. SaTScan should be used with caution on data with global clustering patterns, since it may reveal an incorrect spatial pattern even though it has enough power to reject the null hypothesis of homogeneous relative risk. Tango's method should be used for global clustering evaluation instead of SaTScan.

  1. Evaluation of Two Outlier-Detection-Based Methods for Detecting Tissue-Selective Genes from Microarray Data

    PubMed Central

    Kadota, Koji; Konishi, Tomokazu; Shimizu, Kentaro

    2007-01-01

    Large-scale expression profiling using DNA microarrays enables identification of tissue-selective genes for which expression is considerably higher and/or lower in some tissues than in others. Among numerous possible methods, only two outlier-detection-based methods (an AIC-based method and Sprent’s non-parametric method) can treat equally various types of selective patterns, but they produce substantially different results. We investigated the performance of these two methods for different parameter settings and for a reduced number of samples. We focused on their ability to detect selective expression patterns robustly. We applied them to public microarray data collected from 36 normal human tissue samples and analyzed the effects of both changing the parameter settings and reducing the number of samples. The AIC-based method was more robust in both cases. The findings confirm that the use of the AIC-based method in the recently proposed ROKU method for detecting tissue-selective expression patterns is correct and that Sprent’s method is not suitable for ROKU. PMID:19936074

  2. Robust Surface Reconstruction via Laplace-Beltrami Eigen-Projection and Boundary Deformation

    PubMed Central

    Shi, Yonggang; Lai, Rongjie; Morra, Jonathan H.; Dinov, Ivo; Thompson, Paul M.; Toga, Arthur W.

    2010-01-01

    In medical shape analysis, a critical problem is reconstructing a smooth surface of correct topology from a binary mask that typically has spurious features due to segmentation artifacts. The challenge is the robust removal of these outliers without affecting the accuracy of other parts of the boundary. In this paper, we propose a novel approach for this problem based on the Laplace-Beltrami (LB) eigen-projection and properly designed boundary deformations. Using the metric distortion during the LB eigen-projection, our method automatically detects the location of outliers and feeds this information to a well-composed and topology-preserving deformation. By iterating between these two steps of outlier detection and boundary deformation, we can robustly filter out the outliers without moving the smooth part of the boundary. The final surface is the eigen-projection of the filtered mask boundary that has the correct topology, desired accuracy and smoothness. In our experiments, we illustrate the robustness of our method on different input masks of the same structure, and compare with the popular SPHARM tool and the topology preserving level set method to show that our method can reconstruct accurate surface representations without introducing artificial oscillations. We also successfully validate our method on a large data set of more than 900 hippocampal masks and demonstrate that the reconstructed surfaces retain volume information accurately. PMID:20624704

  3. Comparison of outliers and novelty detection to identify ionospheric TEC irregularities during geomagnetic storm and substorm

    NASA Astrophysics Data System (ADS)

    Pattisahusiwa, Asis; Houw Liong, The; Purqon, Acep

    2016-08-01

    In this study, we compare two learning mechanisms, outlier detection and novelty detection, for detecting the ionospheric TEC disturbances caused by the November 2004 geomagnetic storm and the January 2005 substorm. The mechanisms are applied using the v-SVR learning algorithm, a regression version of SVM. Our results show that both mechanisms are quite accurate in learning TEC data. However, novelty detection is more accurate than outlier detection in extracting anomalies related to geomagnetic events. The anomalies found by outlier detection are mostly related to trends in the data, while those found by novelty detection are associated with geomagnetic events. Novelty detection also shows evidence of LSTIDs during geomagnetic events.
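
    The two mechanisms differ mainly in what the model is trained on. The sketch below contrasts residual-based outlier flags from a nu-SVR fitted to the whole series with novelty flags from a one-class SVM trained only on quiet-time data (synthetic series; thresholds are invented, and this is not the authors' TEC pipeline).

      import numpy as np
      from sklearn.svm import NuSVR, OneClassSVM

      rng = np.random.default_rng(8)
      t = np.linspace(0, 10, 300).reshape(-1, 1)
      y = np.sin(t).ravel() + rng.normal(0, 0.05, 300)
      y[150:155] += 1.5                  # storm-like disturbance

      # Outlier view: large residuals of a regression fitted to everything.
      resid = y - NuSVR(nu=0.5).fit(t, y).predict(t)
      outliers = np.abs(resid) > 3 * np.std(resid)

      # Novelty view: train only on the quiet first third, test the rest.
      ocsvm = OneClassSVM(nu=0.05).fit(y[:100].reshape(-1, 1))
      novel = ocsvm.predict(y[100:].reshape(-1, 1)) == -1
      print(outliers.sum(), "outliers;", novel.sum(), "novel points")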

  4. Using State Estimation Residuals to Detect Abnormal SCADA Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ma, Jian; Chen, Yousu; Huang, Zhenyu

    2010-06-14

    Detection of manipulated supervisory control and data acquisition (SCADA) data is critically important for the safe and secure operation of modern power systems. In this paper, a methodology of detecting manipulated SCADA data based on state estimation residuals is presented. A framework of the proposed methodology is described. Instead of using original SCADA measurements as the bad data sources, the residuals calculated based on the results of the state estimator are used as the input for the outlier detection process. The BACON algorithm is applied to detect outliers in the state estimation residuals. The IEEE 118-bus system is used as a test case to evaluate the effectiveness of the proposed methodology. The accuracy of the BACON method is compared with that of the 3-σ method for the simulated SCADA measurements and residuals.
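
    For a univariate residual vector, the BACON idea (grow a "basic subset" of clean points and flag whatever never enters it) reduces to a short loop; this simplified 1-D version with an iteration cap is not the full multivariate algorithm.

      import numpy as np
      from scipy import stats

      def bacon_1d(r, alpha=0.001, m0=20):
          r = np.asarray(r, dtype=float)
          core = np.argsort(np.abs(r - np.median(r)))[:m0]   # initial subset
          for _ in range(100):                               # iteration cap
              mu, sd = r[core].mean(), r[core].std(ddof=1)
              cut = stats.norm.ppf(1 - alpha / 2) * sd
              new_core = np.where(np.abs(r - mu) <= cut)[0]
              if np.array_equal(np.sort(core), new_core):
                  break
              core = new_core
          return np.setdiff1d(np.arange(r.size), core)       # the outliers

      rng = np.random.default_rng(9)
      resid = rng.normal(0, 1, 500)
      resid[:4] += 8.0                   # corrupted measurements
      print(bacon_1d(resid))             # -> [0 1 2 3]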

  5. Spatio-temporal Outlier Detection in Precipitation Data

    NASA Astrophysics Data System (ADS)

    Wu, Elizabeth; Liu, Wei; Chawla, Sanjay

    The detection of outliers from spatio-temporal data is an important task due to the increasing amount of spatio-temporal data available and the need to understand and interpret it. Due to the limitations of current data mining techniques, new techniques to handle this data need to be developed. We propose a spatio-temporal outlier detection algorithm called Outstretch, which discovers the outlier movement patterns of the top-k spatial outliers over several time periods. The top-k spatial outliers are found using the Exact-Grid Top-k and Approx-Grid Top-k algorithms, which are an extension of algorithms developed by Agarwal et al. [1]. Since they use the Kulldorff spatial scan statistic, they are capable of discovering all outliers, unaffected by neighbouring regions that may contain missing values. After generating the outlier sequences, we show one way they can be interpreted, by comparing them to the phases of the El Niño Southern Oscillation (ENSO) weather phenomenon to provide a meaningful analysis of the results.

  6. Clustering analysis of line indices for LAMOST spectra with AstroStat

    NASA Astrophysics Data System (ADS)

    Chen, Shu-Xin; Sun, Wei-Min; Yan, Qi

    2018-06-01

    The application of data mining to astronomical surveys, such as the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) survey, provides an effective approach to automatically analyzing a large amount of complex survey data. Unsupervised clustering can help astronomers find associations and outliers in a big data set. In this paper, we employ the k-means method to perform clustering on the line indices of LAMOST spectra with the powerful software AstroStat. Implementing the line index approach for analyzing astronomical spectra is an effective way to extract spectral features from low resolution spectra, which can represent the main spectral characteristics of stars. A total of 144 340 line indices for A-type stars is analyzed by calculating intra- and inter-class distances between pairs of stars. For the intra-class distance, we use the Mahalanobis distance to explore the degree of clustering of each class, while for outlier detection we define a local outlier factor for each spectrum. AstroStat furnishes a set of visualization tools for illustrating the analysis results. Checking the spectra detected as outliers, we find that most of them are problematic data and only a few correspond to rare astronomical objects. We show two examples of these outliers: a spectrum with an abnormal continuum and a spectrum with emission lines. Our work demonstrates that line index clustering is a good method for examining data quality and identifying rare objects.
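
    A local-outlier-factor screen over line-index vectors is easy to run with scikit-learn and is analogous in spirit to the per-spectrum outlier factor described above, though AstroStat's exact definition may differ.

      import numpy as np
      from sklearn.neighbors import LocalOutlierFactor

      rng = np.random.default_rng(10)
      indices = rng.normal(size=(1000, 8))   # stand-in line-index vectors
      indices[:3] += 6.0                     # problematic spectra

      labels = LocalOutlierFactor(n_neighbors=35).fit_predict(indices)
      print("flagged spectra:", np.where(labels == -1)[0])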

  7. Correction of Dual-PRF Doppler Velocity Outliers in the Presence of Aliasing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Altube, Patricia; Bech, Joan; Argemí, Oriol

    In Doppler weather radars, the presence of unfolding errors or outliers is a well-known quality issue for radial velocity fields estimated using the dual-pulse repetition frequency (PRF) technique. Postprocessing methods have been developed to correct dual-PRF outliers, but these need prior application of a dealiasing algorithm for an adequate correction. Our paper presents an alternative procedure based on circular statistics that corrects dual-PRF errors in the presence of extended Nyquist aliasing. The correction potential of the proposed method is quantitatively tested by means of velocity field simulations and is exemplified in the application to real cases, including severe storm events. The comparison with two other existing correction methods indicates an improved performance in the correction of clustered outliers. The technique we propose is well suited for real-time applications requiring high-quality Doppler radar velocity fields, such as wind shear and mesocyclone detection algorithms, or assimilation in numerical weather prediction models.

  8. Correction of Dual-PRF Doppler Velocity Outliers in the Presence of Aliasing

    DOE PAGES

    Altube, Patricia; Bech, Joan; Argemí, Oriol; ...

    2017-07-18

    In Doppler weather radars, the presence of unfolding errors or outliers is a well-known quality issue for radial velocity fields estimated using the dual-pulse repetition frequency (PRF) technique. Postprocessing methods have been developed to correct dual-PRF outliers, but these need prior application of a dealiasing algorithm for an adequate correction. Our paper presents an alternative procedure based on circular statistics that corrects dual-PRF errors in the presence of extended Nyquist aliasing. The correction potential of the proposed method is quantitatively tested by means of velocity field simulations and is exemplified in the application to real cases, including severe storm events. The comparison with two other existing correction methods indicates an improved performance in the correction of clustered outliers. The technique we propose is well suited for real-time applications requiring high-quality Doppler radar velocity fields, such as wind shear and mesocyclone detection algorithms, or assimilation in numerical weather prediction models.

  9. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sheng, Y; Ge, Y; Yuan, L

    Purpose: To investigate the impact of outliers on knowledge modeling in radiation therapy, and to develop a systematic workflow for identifying and analyzing geometric and dosimetric outliers using pelvic cases. Methods: Four groups (G1-G4) of pelvic plans were included: G1 (37 prostate cases), G2 (37 prostate plus lymph node cases), and G3 (37 prostate bed cases) are all clinical IMRT cases. G4 comprises 10 plans outside G1 re-planned with dynamic arc to simulate dosimetric outliers. The workflow involves 2 steps: 1. identify geometric outliers, assess impact and clean up; 2. identify dosimetric outliers, assess impact and clean up. 1. A baseline model was trained with all G1 cases. G2/G3 cases were then individually added to the baseline model as geometric outliers. The impact on the model was assessed by comparing the leverage statistic of inliers (G1) and outliers (G2/G3). Receiver-operating-characteristic (ROC) analysis was performed to determine the optimal threshold. 2. A separate baseline model was trained with 32 G1 cases. Each G4 case (dosimetric outlier) was then progressively added to perturb this model. DVH predictions were performed using these perturbed models for the remaining 5 G1 cases. Normal tissue complication probability (NTCP) calculated from the predicted DVHs was used to evaluate the dosimetric outliers' impact. Results: The leverage of inliers and outliers was significantly different. The Area-Under-Curve (AUC) for differentiating G2 from G1 was 0.94 (threshold: 0.22) for bladder and 0.80 (threshold: 0.10) for rectum. For differentiating G3 from G1, the AUC (threshold) was 0.68 (0.09) for bladder and 0.76 (0.08) for rectum. A significant increase in NTCP started from models with 4 dosimetric outliers for bladder (p<0.05), and with only 1 dosimetric outlier for rectum (p<0.05). Conclusion: We established a systematic workflow for identifying and analyzing geometric and dosimetric outliers, and investigated statistical metrics for detecting them. The results validate the necessity of outlier detection and clean-up to enhance model quality in clinical practice. Research Grant: Varian master research grant.

  10. Detection of Outliers in Spatial-Temporal Data

    ERIC Educational Resources Information Center

    Rogers, James P.

    2010-01-01

    Outlier detection is an important data mining task that is focused on the discovery of objects that deviate significantly when compared with a set of observations that are considered typical. Outlier detection can reveal objects that behave anomalously with respect to other observations, and these objects may highlight current or future problems. …

  11. A method for separating seismo-ionospheric TEC outliers from heliogeomagnetic disturbances by using nu-SVR

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pattisahusiwa, Asis; Liong, The Houw; Purqon, Acep

    Seismo-ionospheric studies examine ionosphere disturbances associated with seismic activities. As many previous researches have shown, heliogeomagnetic or strong earthquake activity can cause disturbances in the ionosphere. However, it is difficult to separate these disturbances according to their sources. In this research, we propose a method to separate these disturbances/outliers by using nu-SVR with world-wide GPS data. TEC data related to the 26 December 2004 Sumatra and the 11 March 2011 Honshu earthquakes were analyzed. After analyzing TEC data at several locations around the earthquake epicenters and comparing them with geomagnetic data, the method shows a good result on average in detecting the source of these outliers. This method is promising for use in future research.

  12. Outlier-resilient complexity analysis of heartbeat dynamics

    NASA Astrophysics Data System (ADS)

    Lo, Men-Tzung; Chang, Yi-Chung; Lin, Chen; Young, Hsu-Wen Vincent; Lin, Yen-Hung; Ho, Yi-Lwun; Peng, Chung-Kang; Hu, Kun

    2015-03-01

    Complexity in physiological outputs is believed to be a hallmark of healthy physiological control. How to accurately quantify the degree of complexity in physiological signals with outliers remains a major barrier for translating this novel concept of nonlinear dynamic theory to clinical practice. Here we propose a new approach to estimate the complexity in a signal by analyzing the irregularity of the sign time series of its coarse-grained time series at different time scales. Using surrogate data, we show that the method can reliably assess the complexity in noisy data while being highly resilient to outliers. We further apply this method to the analysis of human heartbeat recordings. Without removing any outliers due to ectopic beats, the method is able to detect a degradation of cardiac control in patients with congestive heart failure, and a further degradation in critically ill patients whose survival relies on an extracorporeal membrane oxygenator (ECMO). Moreover, the derived complexity measures can predict the mortality of ECMO patients. These results indicate that the proposed method may serve as a promising tool for monitoring the cardiac function of patients in clinical settings.
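
    One way to make the sign-and-coarse-grain idea concrete: coarse-grain the series at several scales, take the signs of the increments, and measure the irregularity of the resulting binary sequence, here by Lempel-Ziv complexity. The published method's irregularity measure differs, so this is illustrative only.

      import numpy as np

      def lz_complexity(bits):
          # Phrase count of a simple Lempel-Ziv parsing of a binary string.
          s = ''.join('1' if b else '0' for b in bits)
          i, c, n = 0, 0, len(s)
          while i < n:
              j = 1
              while i + j <= n and s[i:i + j] in s[:i + j - 1]:
                  j += 1
              c += 1
              i += j
          return c

      def sign_complexity(x, scales=(1, 2, 4, 8)):
          x = np.asarray(x, dtype=float)
          out = {}
          for s in scales:
              cg = x[:x.size // s * s].reshape(-1, s).mean(axis=1)
              out[s] = lz_complexity(np.diff(cg) > 0)   # sign series
          return out

      rng = np.random.default_rng(11)
      rr = np.cumsum(rng.normal(0, 1, 1024))   # stand-in for RR intervals
      rr[::50] += 30.0                         # ectopic-beat-like outliers
      print(sign_complexity(rr))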

  13. Outlier and target detection in aerial hyperspectral imagery: a comparison of traditional and percentage occupancy hit or miss transform techniques

    NASA Astrophysics Data System (ADS)

    Young, Andrew; Marshall, Stephen; Gray, Alison

    2016-05-01

    The use of aerial hyperspectral imagery for the purpose of remote sensing is a rapidly growing research area. Currently, targets are generally detected by looking for distinct spectral features of the objects under surveillance. For example, a camouflaged vehicle, deliberately designed to blend into background trees and grass in the visible spectrum, can be revealed using spectral features in the near-infrared spectrum. This work aims to develop improved target detection methods using a two-stage approach: first, the development of a physics-based atmospheric correction algorithm to convert radiance into reflectance hyperspectral image data, and second, the use of improved outlier detection techniques. In this paper the use of the Percentage Occupancy Hit or Miss Transform is explored to provide an automated method for target detection in aerial hyperspectral imagery.

  14. An anomaly detection approach for the identification of DME patients using spectral domain optical coherence tomography images.

    PubMed

    Sidibé, Désiré; Sankar, Shrinivasan; Lemaître, Guillaume; Rastgoo, Mojdeh; Massich, Joan; Cheung, Carol Y; Tan, Gavin S W; Milea, Dan; Lamoureux, Ecosse; Wong, Tien Y; Mériaudeau, Fabrice

    2017-02-01

    This paper proposes a method for automatic classification of spectral domain OCT data for the identification of patients with retinal diseases such as Diabetic Macular Edema (DME). We address this issue as an anomaly detection problem and propose a method that not only allows the classification of the OCT volume, but also allows the identification of the individual diseased B-scans inside the volume. Our approach is based on modeling the appearance of normal OCT images with a Gaussian Mixture Model (GMM) and detecting abnormal OCT images as outliers. The classification of an OCT volume is based on the number of detected outliers. Experimental results with two different datasets show that the proposed method achieves a sensitivity and a specificity of 80% and 93% on the first dataset, and 100% and 80% on the second one. Moreover, the experiments show that the proposed method achieves better classification performance than other recently published works. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
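
    A bare-bones rendition of the scheme described above: model normal B-scan features with a GMM, score new scans by log-likelihood, and call a volume abnormal when too many of its B-scans fall below a threshold learned from normals. Features are synthetic and the threshold quantile and count are invented.

      import numpy as np
      from sklearn.mixture import GaussianMixture

      rng = np.random.default_rng(12)
      normal_feats = rng.normal(size=(500, 16))   # features of normal B-scans
      gmm = GaussianMixture(n_components=3, random_state=0).fit(normal_feats)
      thresh = np.quantile(gmm.score_samples(normal_feats), 0.01)

      volume = rng.normal(size=(64, 16))          # one OCT volume's B-scans
      volume[10:14] += 4.0                        # diseased-looking B-scans
      n_out = int((gmm.score_samples(volume) < thresh).sum())
      print("DME volume" if n_out > 2 else "normal volume", "-",
            n_out, "outlier B-scans")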

  15. "Contrasting patterns of selection at Pinus pinaster Ait. Drought stress candidate genes as revealed by genetic differentiation analyses".

    PubMed

    Eveno, Emmanuelle; Collada, Carmen; Guevara, M Angeles; Léger, Valérie; Soto, Alvaro; Díaz, Luis; Léger, Patrick; González-Martínez, Santiago C; Cervera, M Teresa; Plomion, Christophe; Garnier-Géré, Pauline H

    2008-02-01

    The importance of natural selection for shaping adaptive trait differentiation among natural populations of allogamous tree species has long been recognized. Determining the molecular basis of local adaptation remains largely unresolved, and the respective roles of selection and demography in shaping population structure are actively debated. Using a multilocus scan that aims to detect outliers from simulated neutral expectations, we analyzed patterns of nucleotide diversity and genetic differentiation at 11 polymorphic candidate genes for drought stress tolerance in phenotypically contrasted Pinus pinaster Ait. populations across its geographical range. We compared 3 coalescent-based methods: 2 frequentist-like, including 1 approach specifically developed for biallelic single nucleotide polymorphisms (SNPs) here and 1 Bayesian. Five genes showed outlier patterns that were robust across methods at the haplotype level for 2 of them. Two genes presented higher F(ST) values than expected (PR-AGP4 and erd3), suggesting that they could have been affected by the action of diversifying selection among populations. In contrast, 3 genes presented lower F(ST) values than expected (dhn-1, dhn2, and lp3-1), which could represent signatures of homogenizing selection among populations. A smaller proportion of outliers were detected at the SNP level suggesting the potential functional significance of particular combinations of sites in drought-response candidate genes. The Bayesian method appeared robust to low sample sizes, flexible to assumptions regarding migration rates, and powerful for detecting selection at the haplotype level, but the frequentist-like method adapted to SNPs was more efficient for the identification of outlier SNPs showing low differentiation. Population-specific effects estimated in the Bayesian method also revealed populations with lower immigration rates, which could have led to favorable situations for local adaptation. Outlier patterns are discussed in relation to the different genes' putative involvement in drought tolerance responses, from published results in transcriptomics and association mapping in P. pinaster and other related species. These genes clearly constitute relevant candidates for future association studies in P. pinaster.

  16. Analysis and detection of functional outliers in water quality parameters from different automated monitoring stations in the Nalón river basin (Northern Spain).

    PubMed

    Piñeiro Di Blasi, J I; Martínez Torres, J; García Nieto, P J; Alonso Fernández, J R; Díaz Muñiz, C; Taboada, J

    2015-01-01

    The purposes and intent of the authorities in establishing water quality standards are to provide enhancement of water quality and prevention of pollution, to protect the public health or welfare in accordance with the public interest for drinking water supplies, conservation of fish, wildlife and other beneficial aquatic life, and agricultural, industrial, recreational, and other reasonable and necessary uses, as well as to maintain and improve the biological integrity of the waters. Water quality controls therefore involve a large number of variables and observations, often subject to some outliers. An outlier is an observation that is numerically distant from the rest of the data or that appears to deviate markedly from other members of the sample in which it occurs. An interesting analysis is to find those observations that produce measurements differing from the pattern established in the sample. Identification of atypical observations is thus an important concern in water quality monitoring and a difficult task because of the multivariate nature of water quality data. Our study provides a new method for detecting outliers in water quality monitoring parameters, using turbidity, conductivity and ammonium ion as indicator variables. Until now, methods were based on considering the different parameters as a vector whose components were their concentration values. The innovation of this approach lies in considering water quality monitoring over time as continuous curves instead of discrete points; that is to say, the dataset is treated as a time-dependent function rather than as a set of discrete values at different time instants. This new methodology, based on the concept of functional depth, was successfully applied to the detection of outliers in water quality monitoring samples in the Nalón river basin. Results of this study are discussed in terms of origin, causes, etc. Finally, the conclusions as well as the advantages of the functional method are presented.
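    To make the functional-depth idea concrete, here is a minimal sketch using one common depth notion (a J = 2 modified band depth); the exact depth variant used by the authors may differ, and the curves below are synthetic stand-ins for monitored parameters.

```python
# Sketch: flag functional outliers as curves with low modified band depth.
import numpy as np

def modified_band_depth(curves):
    """curves: (n_curves, n_times). For each curve, average over all pairs
    of curves the fraction of time points at which the curve lies inside
    the band spanned by the pair (J = 2 modified band depth)."""
    n = len(curves)
    depth = np.zeros(n)
    for i in range(n):
        total = 0.0
        for j in range(n):
            for k in range(j + 1, n):
                lo = np.minimum(curves[j], curves[k])
                hi = np.maximum(curves[j], curves[k])
                total += np.mean((curves[i] >= lo) & (curves[i] <= hi))
        depth[i] = total / (n * (n - 1) / 2)
    return depth

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 48)
curves = np.sin(2 * np.pi * t) + 0.1 * rng.normal(size=(30, t.size))
curves[7] += 1.5                      # one anomalous daily profile
depth = modified_band_depth(curves)
print(np.argsort(depth)[:3])          # lowest-depth curves are outlier candidates
```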

  17. Query-Based Outlier Detection in Heterogeneous Information Networks.

    PubMed

    Kuck, Jonathan; Zhuang, Honglei; Yan, Xifeng; Cam, Hasan; Han, Jiawei

    2015-03-01

    Outlier or anomaly detection in large data sets is a fundamental task in data science, with broad applications. However, in real data sets with high-dimensional space, most outliers are hidden in certain dimensional combinations and are relative to a user's search space and interest. It is often more effective to give power to users and allow them to specify outlier queries flexibly, and the system will then process such mining queries efficiently. In this study, we introduce the concept of query-based outlier in heterogeneous information networks, design a query language to facilitate users to specify such queries flexibly, define a good outlier measure in heterogeneous networks, and study how to process outlier queries efficiently in large data sets. Our experiments on real data sets show that following such a methodology, interesting outliers can be defined and uncovered flexibly and effectively in large heterogeneous networks.

  18. Query-Based Outlier Detection in Heterogeneous Information Networks

    PubMed Central

    Kuck, Jonathan; Zhuang, Honglei; Yan, Xifeng; Cam, Hasan; Han, Jiawei

    2015-01-01

    Outlier or anomaly detection in large data sets is a fundamental task in data science, with broad applications. However, in real data sets with high-dimensional space, most outliers are hidden in certain dimensional combinations and are relative to a user’s search space and interest. It is often more effective to give power to users and allow them to specify outlier queries flexibly, and the system will then process such mining queries efficiently. In this study, we introduce the concept of query-based outlier in heterogeneous information networks, design a query language to facilitate users to specify such queries flexibly, define a good outlier measure in heterogeneous networks, and study how to process outlier queries efficiently in large data sets. Our experiments on real data sets show that following such a methodology, interesting outliers can be defined and uncovered flexibly and effectively in large heterogeneous networks. PMID:27064397

  19. The good, the bad and the outliers: automated detection of errors and outliers from groundwater hydrographs

    NASA Astrophysics Data System (ADS)

    Peterson, Tim J.; Western, Andrew W.; Cheng, Xiang

    2018-03-01

    Suspicious groundwater-level observations are common and can arise for many reasons, ranging from an unforeseen biophysical process to bore failure and data management errors. Unforeseen observations may provide valuable insights that challenge existing expectations and can be deemed outliers, while monitoring and data handling failures can be deemed errors and, if ignored, may compromise trend analysis and groundwater model calibration. Ideally, outliers and errors should be identified, but to date this has been a subjective process that is not reproducible and is inefficient. This paper presents an approach to objectively and efficiently identify multiple types of errors and outliers. The approach requires only the observed groundwater hydrograph, requires no particular consideration of the hydrogeology, the drivers (e.g. pumping) or the monitoring frequency, and is freely available in the HydroSight toolbox. Herein, the algorithms and time-series model are detailed and applied to four observation bores with varying dynamics. The detection of outliers was most reliable when the observation data were acquired quarterly or more frequently. Outlier detection where the groundwater-level variance is nonstationary or the absolute trend increases rapidly was more challenging, with the former likely to result in an underestimation of the number of outliers and the latter an overestimation.

  20. Robust w-Estimators for Cryo-EM Class Means

    PubMed Central

    Huang, Chenxi; Tagare, Hemant D.

    2016-01-01

    A critical step in cryogenic electron microscopy (cryo-EM) image analysis is to calculate the average of all images aligned to a projection direction. This average, called the “class mean”, improves the signal-to-noise ratio in single particle reconstruction (SPR). The averaging step is often compromised because of outlier images of ice, contaminants, and particle fragments. Outlier detection and rejection in the majority of current cryo-EM methods is done using cross-correlation with a manually determined threshold. Empirical assessment shows that the performance of these methods is very sensitive to the threshold. This paper proposes an alternative: a “w-estimator” of the average image, which is robust to outliers and which does not use a threshold. Various properties of the estimator, such as consistency and influence function are investigated. An extension of the estimator to images with different contrast transfer functions (CTFs) is also provided. Experiments with simulated and real cryo-EM images show that the proposed estimator performs quite well in the presence of outliers. PMID:26841397

  1. Robust w-Estimators for Cryo-EM Class Means.

    PubMed

    Huang, Chenxi; Tagare, Hemant D

    2016-02-01

    A critical step in cryogenic electron microscopy (cryo-EM) image analysis is to calculate the average of all images aligned to a projection direction. This average, called the class mean, improves the signal-to-noise ratio in single-particle reconstruction. The averaging step is often compromised because of the outlier images of ice, contaminants, and particle fragments. Outlier detection and rejection in the majority of current cryo-EM methods are done using cross-correlation with a manually determined threshold. Empirical assessment shows that the performance of these methods is very sensitive to the threshold. This paper proposes an alternative: a w-estimator of the average image, which is robust to outliers and which does not use a threshold. Various properties of the estimator, such as consistency and influence function are investigated. An extension of the estimator to images with different contrast transfer functions is also provided. Experiments with simulated and real cryo-EM images show that the proposed estimator performs quite well in the presence of outliers.

  2. Locally adaptive decision in detection of clustered microcalcifications in mammograms.

    PubMed

    Sainz de Cea, María V; Nishikawa, Robert M; Yang, Yongyi

    2018-02-15

    In computer-aided detection or diagnosis of clustered microcalcifications (MCs) in mammograms, the performance often suffers from not only the presence of false positives (FPs) among the detected individual MCs but also large variability in detection accuracy among different cases. To address this issue, we investigate a locally adaptive decision scheme in MC detection by exploiting the noise characteristics in a lesion area. Instead of developing a new MC detector, we propose a decision scheme on how to best decide whether a detected object is an MC or not in the detector output. We formulate the individual MCs as statistical outliers compared to the many noisy detections in a lesion area so as to account for the local image characteristics. To identify the MCs, we first consider a parametric method for outlier detection, the Mahalanobis distance detector, which is based on a multi-dimensional Gaussian distribution on the noisy detections. We also consider a non-parametric method which is based on a stochastic neighbor graph model of the detected objects. We demonstrated the proposed decision approach with two existing MC detectors on a set of 188 full-field digital mammograms (95 cases). The results, evaluated using free response operating characteristic (FROC) analysis, showed a significant improvement in detection accuracy by the proposed outlier decision approach over traditional thresholding (the partial area under the FROC curve increased from 3.95 to 4.25, p-value < 10^-4). There was also a reduction in case-to-case variability in detected FPs at a given sensitivity level. The proposed adaptive decision approach could not only reduce the number of FPs in detected MCs but also improve case-to-case consistency in detection.
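    The parametric (Mahalanobis distance) detector named in this record can be sketched compactly; the features, the significance level and the demo data below are illustrative assumptions, not the paper's configuration.

```python
# Sketch of a Mahalanobis-distance outlier detector with a chi-square cutoff.
import numpy as np
from scipy.stats import chi2

def mahalanobis_outliers(X, alpha=0.01):
    """Flag rows of X (n_samples, n_features) whose squared Mahalanobis
    distance from the sample mean exceeds the chi-square quantile."""
    mu = X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X - mu
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return d2 > chi2.ppf(1 - alpha, df=X.shape[1])

rng = np.random.default_rng(0)
noise = rng.normal(size=(200, 3))            # stand-in for noisy detections
mcs = rng.normal(4.0, 1.0, size=(5, 3))      # stand-in for true MC detections
flags = mahalanobis_outliers(np.vstack([noise, mcs]))
print(flags[-5:])                            # the MC-like rows stand out
```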

  3. Locally adaptive decision in detection of clustered microcalcifications in mammograms

    NASA Astrophysics Data System (ADS)

    Sainz de Cea, María V.; Nishikawa, Robert M.; Yang, Yongyi

    2018-02-01

    In computer-aided detection or diagnosis of clustered microcalcifications (MCs) in mammograms, the performance often suffers from not only the presence of false positives (FPs) among the detected individual MCs but also large variability in detection accuracy among different cases. To address this issue, we investigate a locally adaptive decision scheme in MC detection by exploiting the noise characteristics in a lesion area. Instead of developing a new MC detector, we propose a decision scheme on how to best decide whether a detected object is an MC or not in the detector output. We formulate the individual MCs as statistical outliers compared to the many noisy detections in a lesion area so as to account for the local image characteristics. To identify the MCs, we first consider a parametric method for outlier detection, the Mahalanobis distance detector, which is based on a multi-dimensional Gaussian distribution on the noisy detections. We also consider a non-parametric method which is based on a stochastic neighbor graph model of the detected objects. We demonstrated the proposed decision approach with two existing MC detectors on a set of 188 full-field digital mammograms (95 cases). The results, evaluated using free response operating characteristic (FROC) analysis, showed a significant improvement in detection accuracy by the proposed outlier decision approach over traditional thresholding (the partial area under the FROC curve increased from 3.95 to 4.25, p-value < 10^-4). There was also a reduction in case-to-case variability in detected FPs at a given sensitivity level. The proposed adaptive decision approach could not only reduce the number of FPs in detected MCs but also improve case-to-case consistency in detection.

  4. [Application of Stata software to test heterogeneity in meta-analysis method].

    PubMed

    Wang, Dan; Mou, Zhen-yun; Zhai, Jun-xia; Zong, Hong-xia; Zhao, Xiao-dong

    2008-07-01

    To introduce the application of Stata software to heterogeneity testing in meta-analysis. A data set was set up according to the example in the study, and the corresponding commands of the methods in Stata 9 software were applied to test the example. The methods used were the Q-test and the I2 statistic attached to the fixed-effect model forest plot, the H statistic, and the Galbraith plot. The existence of heterogeneity among studies could be detected by the Q-test and the H statistic, and the degree of heterogeneity could be measured by the I2 statistic. The outliers, which are sources of heterogeneity, could be spotted in the Galbraith plot. Heterogeneity testing in meta-analysis can be completed simply and quickly by these four methods in Stata. Among them, the H and I2 statistics are more robust, and the outliers responsible for heterogeneity can be seen clearly in the Galbraith plot.
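    The statistics themselves are easy to recompute outside Stata. Under the fixed-effect model, Q = sum of w_i*(y_i - y_pooled)^2 with w_i = 1/v_i, I^2 = max(0, (Q - df)/Q) and H = sqrt(Q/df); a minimal Python sketch with made-up effect sizes follows.

```python
# Recomputing the heterogeneity statistics described above (fixed-effect
# model): Cochran's Q, Higgins' I^2 and the H statistic.
import numpy as np

def heterogeneity_stats(effects, variances):
    w = 1.0 / np.asarray(variances, float)          # fixed-effect weights
    y = np.asarray(effects, float)
    pooled = np.sum(w * y) / np.sum(w)
    Q = np.sum(w * (y - pooled) ** 2)               # Cochran's Q
    df = y.size - 1
    I2 = max(0.0, (Q - df) / Q) * 100 if Q > 0 else 0.0
    H = np.sqrt(Q / df)
    return Q, I2, H

Q, I2, H = heterogeneity_stats([0.2, 0.3, 0.25, 0.9], [0.01, 0.02, 0.015, 0.01])
print(f"Q={Q:.2f}, I2={I2:.1f}%, H={H:.2f}")        # the 0.9 study drives Q up
```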

  5. Outlier detection in a new half-circular distribution

    NASA Astrophysics Data System (ADS)

    Rambli, Adzhar; Mohamed, Ibrahim Bin; Shimizu, Kunio; Khalidin, Nurliza

    2015-10-01

    In this paper, we use a discordancy test based on spacing theory to detect outliers in half-circular data. Up to now, numerous discordancy tests have been proposed to detect outliers in circular distributions, which are defined on [0,2π). However, some circular data lie within just half of this range. Therefore, we first introduce a new half-circular distribution developed using the inverse stereographic projection technique on a gamma distributed variable. Then, we develop a new discordancy test to detect single or multiple outliers in half-circular data based on spacing theory. We show the practical value of the test by applying it to an eye data set obtained from a glaucoma clinic at the University of Malaya Medical Centre, Malaysia.

  6. Outlier Detection in Hyperspectral Imagery Using Closest Distance to Center with Ellipsoidal Multivariate Trimming

    DTIC Science & Technology

    2011-01-01

    where r << P. The use of PCA for finding outliers in multivariate data is surveyed by Gnanadesikan and Kettenring [16] and Rao [17]. As alluded to earlier...1984. [16] Gnanadesikan R and Kettenring JR. Robust estimates, residuals, and outlier detection with multiresponse data. Biometrics 1972; 28: 81–124

  7. LOSITAN: a workbench to detect molecular adaptation based on a Fst-outlier method.

    PubMed

    Antao, Tiago; Lopes, Ana; Lopes, Ricardo J; Beja-Pereira, Albano; Luikart, Gordon

    2008-07-28

    Testing for selection is becoming one of the most important steps in the analysis of multilocus population genetics data sets. Existing applications are difficult to use, leaving many non-trivial, error-prone tasks to the user. Here we present LOSITAN, a selection detection workbench based on a well-evaluated Fst-outlier detection method. LOSITAN greatly facilitates correct approximation of model parameters (e.g., genome-wide average, neutral Fst), provides data import and export functions, iterative contour smoothing and generation of graphics in an easy-to-use graphical user interface. LOSITAN is able to use modern multi-core processor architectures by locally parallelizing fdist, reducing computation time by half on current dual-core machines, with almost linear performance gains on machines with more cores. LOSITAN makes selection detection feasible for a much wider range of users, even for large population genomic datasets, by providing both an easy-to-use interface and the essential functionality to complete the whole selection detection process.

  8. Rumbling Orchids: How To Assess Divergent Evolution Between Chloroplast Endosymbionts and the Nuclear Host.

    PubMed

    Pérez-Escobar, Oscar Alejandro; Balbuena, Juan Antonio; Gottschling, Marc

    2016-01-01

    Phylogenetic relationships inferred from multilocus organellar and nuclear DNA data are often difficult to resolve because of evolutionary conflicts among gene trees. However, conflicting or "outlier" associations (i.e., linked pairs of "operational terminal units" in two phylogenies) among these data sets often provide valuable information on evolutionary processes such as chloroplast capture following hybridization, incomplete lineage sorting, and horizontal gene transfer. Statistical tools that have to date been used only in cophylogenetic studies also have the potential to test for the degree of topological congruence between organellar and nuclear data sets and reliably detect outlier associations. Two distance-based methods, namely ParaFit and the Procrustean Approach to Cophylogeny (PACo), were used in conjunction to detect those outliers contributing to conflicting phylogenies independently derived from chloroplast and nuclear sequence data. We explored their efficiency in retrieving outlier associations, and the impact of input data (unit branch length and additive trees) between data sets, by using several simulation approaches. To test their performance on real data sets, we additionally inferred the phylogenetic relationships within Neotropical Catasetinae (Epidendroideae, Orchidaceae), which is a suitable group for investigating phylogenetic incongruence because of hybridization processes between some of its constituent species. A comparison between trees derived from chloroplast and nuclear sequence data reflected strong, well-supported incongruence within Catasetum, Cycnoches, and Mormodes. As a result, outliers among chloroplast and nuclear data sets, and in experimental simulations, were successfully detected by PACo when using patristic distance matrices obtained from phylograms, but not from unit branch length trees. The performance of ParaFit was overall inferior to that of PACo, using either phylograms or unit branch lengths as input data. Because workflows for applying cophylogenetic analyses are not yet standardized, we provide a pipeline for executing PACo and ParaFit as well as displaying outlier associations in plots and trees using the software R. The pipeline renders a method to identify outliers with high reliability and to assess the combinability of the independently derived data sets by means of statistical analyses. © The Author(s) 2015. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  9. Outlier detection for groundwater data in France

    NASA Astrophysics Data System (ADS)

    Valmy, Larissa; de Fouquet, Chantal; Bourgine, Bernard

    2014-05-01

    Water quality and quantity in France have been increasingly monitored since the 1970s. Moreover, in 2000, the EU Water Framework Directive established a framework for community action in the field of water policy for the protection of inland surface waters (rivers and lakes), transitional waters (estuaries), coastal waters and groundwater. It aims to ensure that all aquatic ecosystems and, with regard to their water needs, terrestrial ecosystems and wetlands meet 'good status' by 2015. The Directive requires Member States to establish river basin districts and, for each of these, a river basin management plan. In France, monitoring programs for water status have been implemented in each basin since 2007. The data collected through these programs feed into an information system which helps to check compliance with water environmental legislation, assess the status of waters, guide management actions (programs of measures), evaluate their effectiveness, and inform the public. Our work consists in studying groundwater quality and quantity data for some basins in France. We propose a specific mathematical approach in order to detect outliers and study trends in time series. In statistics, an outlier is an observation that lies outside the overall pattern of a distribution. Usually, the presence of an outlier indicates some sort of problem, and it is therefore important to detect it in order to identify the cause. Techniques for temporal data analysis have been developed for several decades in parallel with geostatistical methods. However, compared to standard statistical methods, geostatistical analysis allows the analysis of incomplete or irregular time series. Moreover, tests carried out by the BRGM showed the potential contribution of geostatistical methods to the characterization of environmental time series. Our approach is to exploit this potential through the development of specific algorithms, tests and validation of methods. We introduce and explain our method and approach by considering the case of the Loire-Bretagne basin.

  10. A framework for periodic outlier pattern detection in time-series sequences.

    PubMed

    Rasheed, Faraz; Alhajj, Reda

    2014-05-01

    Periodic pattern detection in time-ordered sequences is an important data mining task, which discovers in the time series all patterns that exhibit temporal regularities. Periodic pattern mining has a large number of applications in real life; it helps in understanding the regular trend of the data over time and enables the forecasting and prediction of future events. An interesting related and vital problem that has not received enough attention is to discover outlier periodic patterns in a time series. Outlier patterns are defined as those which are different from the rest of the patterns; outliers are not noise. While noise does not belong to the data and is mostly eliminated by preprocessing, outliers are actual instances in the data that have exceptional characteristics compared with the majority of the other instances. Outliers are unusual patterns that rarely occur and thus have lower support (frequency of appearance) in the data. Outlier patterns may hint at discrepancies in the data such as fraudulent transactions, network intrusion, change in customer behavior, recession in the economy, epidemic and disease biomarkers, severe weather conditions like tornados, etc. We argue that detecting the periodicity of outlier patterns might be more important in many sequences than the periodicity of regular, more frequent patterns. In this paper, we present a robust and time-efficient suffix tree-based algorithm capable of detecting the periodicity of outlier patterns in a time series by giving more significance to less frequent yet periodic patterns. Several experiments have been conducted using both real and synthetic data; all aspects of the proposed approach are compared with the existing algorithm InfoMiner; the reported results demonstrate the effectiveness and applicability of the proposed approach.

  11. Ranking Fragment Ions Based on Outlier Detection for Improved Label-Free Quantification in Data-Independent Acquisition LC-MS/MS

    PubMed Central

    Bilbao, Aivett; Zhang, Ying; Varesio, Emmanuel; Luban, Jeremy; Strambio-De-Castillia, Caterina; Lisacek, Frédérique; Hopfgartner, Gérard

    2016-01-01

    Data-independent acquisition LC-MS/MS techniques complement supervised methods for peptide quantification. However, due to the wide precursor isolation windows, these techniques are prone to interference at the fragment ion level, which in turn is detrimental for accurate quantification. The “non-outlier fragment ion” (NOFI) ranking algorithm has been developed to assign low priority to fragment ions affected by interference. By using the optimal subset of high-priority fragment ions, these interfered fragment ions are effectively excluded from quantification. NOFI represents each fragment ion as a vector of four dimensions related to chromatographic and MS fragmentation attributes and applies multivariate outlier detection techniques. Benchmarking conducted on a well-defined quantitative dataset (i.e., the SWATH Gold Standard) indicates that NOFI on average is able to accurately quantify 11-25% more peptides than the commonly used Top-N library intensity ranking method. The sum of the areas of the Top3-5 NOFIs produces coefficients of variation similar to those of the library intensity method, but with more accurate quantification results. On a biologically relevant human dendritic cell digest dataset, NOFI properly assigns low priority ranks to 85% of annotated interferences, resulting in sensitivity values between 0.92 and 0.80, against 0.76 for the Spectronaut interference detection algorithm. PMID:26412574
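    The ranking step can be pictured with a generic multivariate robust-scoring sketch; the four attribute columns and the scoring rule below are placeholders, not the published NOFI algorithm.

```python
# Generic sketch of ranking fragment ions by multivariate outlyingness:
# fragments that deviate strongly on any attribute get low priority.
import numpy as np

def rank_fragment_ions(features):
    """features: (n_fragments, 4) chromatographic/fragmentation attributes.
    Score each fragment by its largest per-dimension robust z-score and
    return fragment indices sorted from least to most outlying."""
    med = np.median(features, axis=0)
    mad = np.median(np.abs(features - med), axis=0) + 1e-12
    robust_z = np.abs(features - med) / (1.4826 * mad)
    return np.argsort(robust_z.max(axis=1))

rng = np.random.default_rng(2)
feats = rng.normal(size=(12, 4))
feats[3, 2] += 6.0                 # an interfered fragment ion
order = rank_fragment_ions(feats)
top_n = order[:5]                  # quantify on the least-interfered ions
print(top_n, order[-1])            # fragment 3 ranks last
```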

  12. Addressing the issue of insufficient information in data-based bridge health monitoring : final report.

    DOT National Transportation Integrated Search

    2015-11-01

    One of the most efficient ways to solve the damage detection problem using the statistical pattern recognition : approach is that of exploiting the methods of outlier analysis. Cast within the pattern recognition framework, : damage detection assesse...

  13. Stacked Autoencoders for Outlier Detection in Over-the-Horizon Radar Signals

    PubMed Central

    Protopapadakis, Eftychios; Doulamis, Anastasios; Doulamis, Nikolaos; Dres, Dimitrios; Bimpas, Matthaios

    2017-01-01

    Detection of outliers in radar signals is a considerable challenge in maritime surveillance applications. High-Frequency Surface-Wave (HFSW) radars have attracted significant interest as potential tools for long-range target identification and outlier detection at over-the-horizon (OTH) distances. However, a number of disadvantages, such as their low spatial resolution and presence of clutter, have a negative impact on their accuracy. In this paper, we explore the applicability of deep learning techniques for detecting deviations from the norm in behavioral patterns of vessels (outliers) as they are tracked from an OTH radar. The proposed methodology exploits the nonlinear mapping capabilities of deep stacked autoencoders in combination with density-based clustering. A comparative experimental evaluation of the approach shows promising results in terms of the proposed methodology's performance. PMID:29312449

  14. Lower reference limits of quantitative cord glucose-6-phosphate dehydrogenase estimated from healthy term neonates according to the clinical and laboratory standards institute guidelines: a cross sectional retrospective study

    PubMed Central

    2013-01-01

    Background Previous studies have reported the lower reference limit (LRL) of quantitative cord glucose-6-phosphate dehydrogenase (G6PD), but they have not used approved international statistical methodology. Using common standards is expected to yield more valid findings. Therefore, we aimed to estimate the LRL of quantitative G6PD detection in healthy term neonates by using statistical analyses endorsed by the International Federation of Clinical Chemistry (IFCC) and the Clinical and Laboratory Standards Institute (CLSI) for reference interval estimation. Methods This cross sectional retrospective study was performed at King Abdulaziz Hospital, Saudi Arabia, between March 2010 and June 2012. The study monitored consecutive neonates born to mothers from one Arab Muslim tribe that was assumed to have a low prevalence of G6PD-deficiency. Neonates that satisfied the following criteria were included: full-term birth (37 weeks); no admission to the special care nursery; no phototherapy treatment; negative direct antiglobulin test; and fathers of female neonates were from the same mothers’ tribe. The G6PD activity (Units/gram Hemoglobin) was measured spectrophotometrically by an automated kit. This study used statistical analyses endorsed by IFCC and CLSI for reference interval estimation. The 2.5th percentiles and the corresponding 95% confidence intervals (CI) were estimated as LRLs, both in the presence and absence of outliers. Results 207 male and 188 female term neonates who had cord blood quantitative G6PD testing met the inclusion criteria. Horn's method detected 20 G6PD values as outliers (8 males and 12 females). Distributions of quantitative cord G6PD values exhibited a normal distribution only in the absence of the outliers. The Harris-Boyd method and proportion criteria revealed that combined gender LRLs were reliable. The combined bootstrap LRL in the presence of the outliers was 10.0 (95% CI: 7.5-10.7) and the combined parametric LRL in the absence of the outliers was 11.0 (95% CI: 10.5-11.3). Conclusion These results contribute to the LRL of quantitative cord G6PD detection in full-term neonates. They are transferable to another laboratory when pre-analytical factors and testing methods are comparable and the IFCC-CLSI requirements of transference are satisfied. We suggest using the LRL estimated in the absence of the outliers, as mislabeling G6PD-deficient neonates as normal is intolerable, whereas mislabeling G6PD-normal neonates as deficient is tolerable. PMID:24016342
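    A minimal sketch of the nonparametric 2.5th-percentile LRL with a bootstrap confidence interval follows; the activity values are simulated, and the full CLSI workflow (e.g., Horn's outlier screening, transference checks) is omitted.

```python
# Sketch: nonparametric 2.5th-percentile lower reference limit with a
# bootstrap 95% CI, one common way to implement the CLSI-style estimate.
import numpy as np

def lower_reference_limit(values, q=2.5, n_boot=2000, seed=0):
    rng = np.random.default_rng(seed)
    values = np.asarray(values, float)
    lrl = np.percentile(values, q)
    boots = [np.percentile(rng.choice(values, size=values.size), q)
             for _ in range(n_boot)]
    lo, hi = np.percentile(boots, [2.5, 97.5])
    return lrl, (lo, hi)

activities = np.random.default_rng(1).normal(13.0, 1.2, size=395)  # U/g Hb
lrl, ci = lower_reference_limit(activities)
print(f"LRL = {lrl:.1f} (95% CI {ci[0]:.1f}-{ci[1]:.1f})")
```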

  15. Intelligent agent-based intrusion detection system using enhanced multiclass SVM.

    PubMed

    Ganapathy, S; Yogesh, P; Kannan, A

    2012-01-01

    Intrusion detection systems were used in the past along with various techniques to detect intrusions in networks effectively. However, most of these systems are able to detect the intruders only with a high false alarm rate. In this paper, we propose a new intelligent agent-based intrusion detection model for mobile ad hoc networks using a combination of attribute selection, outlier detection, and enhanced multiclass SVM classification methods. For this purpose, an effective preprocessing technique is proposed that improves the detection accuracy and reduces the processing time. Moreover, two new algorithms, namely, an Intelligent Agent Weighted Distance Outlier Detection algorithm and an Intelligent Agent-based Enhanced Multiclass Support Vector Machine algorithm, are proposed for detecting the intruders in a distributed database environment that uses intelligent agents for trust management and coordination in transaction processing. The experimental results of the proposed model show that this system detects anomalies with a low false alarm rate and a high detection rate when tested with the KDD Cup 99 data set.

  16. Intelligent Agent-Based Intrusion Detection System Using Enhanced Multiclass SVM

    PubMed Central

    Ganapathy, S.; Yogesh, P.; Kannan, A.

    2012-01-01

    Intrusion detection systems were used in the past along with various techniques to detect intrusions in networks effectively. However, most of these systems are able to detect the intruders only with a high false alarm rate. In this paper, we propose a new intelligent agent-based intrusion detection model for mobile ad hoc networks using a combination of attribute selection, outlier detection, and enhanced multiclass SVM classification methods. For this purpose, an effective preprocessing technique is proposed that improves the detection accuracy and reduces the processing time. Moreover, two new algorithms, namely, an Intelligent Agent Weighted Distance Outlier Detection algorithm and an Intelligent Agent-based Enhanced Multiclass Support Vector Machine algorithm, are proposed for detecting the intruders in a distributed database environment that uses intelligent agents for trust management and coordination in transaction processing. The experimental results of the proposed model show that this system detects anomalies with a low false alarm rate and a high detection rate when tested with the KDD Cup 99 data set. PMID:23056036

  17. Model diagnostics in reduced-rank estimation

    PubMed Central

    Chen, Kun

    2016-01-01

    Reduced-rank methods are very popular in high-dimensional multivariate analysis for conducting simultaneous dimension reduction and model estimation. However, the commonly-used reduced-rank methods are not robust, as the underlying reduced-rank structure can be easily distorted by only a few data outliers. Anomalies are bound to exist in big data problems, and in some applications they themselves could be of the primary interest. While naive residual analysis is often inadequate for outlier detection due to potential masking and swamping, robust reduced-rank estimation approaches could be computationally demanding. Under Stein's unbiased risk estimation framework, we propose a set of tools, including leverage score and generalized information score, to perform model diagnostics and outlier detection in large-scale reduced-rank estimation. The leverage scores give an exact decomposition of the so-called model degrees of freedom to the observation level, which leads to an exact decomposition of many commonly-used information criteria; the resulting quantities are thus named information scores of the observations. The proposed information score approach provides a principled way of combining the residuals and leverage scores for anomaly detection. Simulation studies confirm that the proposed diagnostic tools work well. A pattern recognition example with handwritten digit images and a time series analysis example with monthly U.S. macroeconomic data further demonstrate the efficacy of the proposed approaches. PMID:28003860

  18. Model diagnostics in reduced-rank estimation.

    PubMed

    Chen, Kun

    2016-01-01

    Reduced-rank methods are very popular in high-dimensional multivariate analysis for conducting simultaneous dimension reduction and model estimation. However, the commonly-used reduced-rank methods are not robust, as the underlying reduced-rank structure can be easily distorted by only a few data outliers. Anomalies are bound to exist in big data problems, and in some applications they themselves could be of the primary interest. While naive residual analysis is often inadequate for outlier detection due to potential masking and swamping, robust reduced-rank estimation approaches could be computationally demanding. Under Stein's unbiased risk estimation framework, we propose a set of tools, including leverage score and generalized information score, to perform model diagnostics and outlier detection in large-scale reduced-rank estimation. The leverage scores give an exact decomposition of the so-called model degrees of freedom to the observation level, which leads to an exact decomposition of many commonly-used information criteria; the resulting quantities are thus named information scores of the observations. The proposed information score approach provides a principled way of combining the residuals and leverage scores for anomaly detection. Simulation studies confirm that the proposed diagnostic tools work well. A pattern recognition example with handwritten digit images and a time series analysis example with monthly U.S. macroeconomic data further demonstrate the efficacy of the proposed approaches.

  19. Portraying the Expression Landscapes of B-Cell Lymphoma-Intuitive Detection of Outlier Samples and of Molecular Subtypes

    PubMed Central

    Hopp, Lydia; Lembcke, Kathrin; Binder, Hans; Wirth, Henry

    2013-01-01

    We present an analytic framework based on Self-Organizing Map (SOM) machine learning to study large-scale patient data sets. The potency of the approach is demonstrated in a case study using gene expression data of more than 200 mature aggressive B-cell lymphoma patients. The method portrays each sample with individual resolution, characterizes the subtypes, disentangles the expression patterns into distinct modules, extracts their functional context using enrichment techniques and enables investigation of the similarity relations between the samples. The method also allows outliers caused by contamination to be detected and corrected. Based on our analysis, we propose a refined classification of B-cell lymphoma into four molecular subtypes which are characterized by differential functional and clinical characteristics. PMID:24833231

  20. Slowing ash mortality: a potential strategy to slam emerald ash borer in outlier sites

    Treesearch

    Deborah G. McCullough; Nathan W. Siegert; John Bedford

    2009-01-01

    Several isolated outlier populations of emerald ash borer (Agrilus planipennis Fairmaire) were discovered in 2008 and additional outliers will likely be found as detection surveys and public outreach activities...

  1. Aberrant Gene Expression in Humans

    PubMed Central

    Yang, Ence; Ji, Guoli; Brinkmeyer-Langford, Candice L.; Cai, James J.

    2015-01-01

    Gene expression as an intermediate molecular phenotype has been a focus of research interest. In particular, studies of expression quantitative trait loci (eQTL) have offered promise for understanding gene regulation through the discovery of genetic variants that explain variation in gene expression levels. Existing eQTL methods are designed for assessing the effects of common variants, but not rare variants. Here, we address the problem by establishing a novel analytical framework for evaluating the effects of rare or private variants on gene expression. Our method starts from the identification of outlier individuals that show markedly different gene expression from the majority of a population, and then reveals the contributions of private SNPs to the aberrant gene expression in these outliers. Using population-scale mRNA sequencing data, we identify outlier individuals using a multivariate approach. We find that outlier individuals are more readily detected with respect to gene sets that include genes involved in cellular regulation and signal transduction, and less likely to be detected with respect to the gene sets with genes involved in metabolic pathways and other fundamental molecular functions. Analysis of polymorphic data suggests that private SNPs of outlier individuals are enriched in the enhancer and promoter regions of corresponding aberrantly-expressed genes, suggesting a specific regulatory role of private SNPs, while the commonly-occurring regulatory genetic variants (i.e., eQTL SNPs) show little evidence of involvement. Additional data suggest that non-genetic factors may also underlie aberrant gene expression. Taken together, our findings advance a novel viewpoint relevant to situations wherein common eQTLs fail to predict gene expression when heritable, rare inter-individual variation exists. The analytical framework we describe, taking into consideration the reality of differential phenotypic robustness, may be valuable for investigating complex traits and conditions. PMID:25617623

  2. Unsupervised Sequential Outlier Detection With Deep Architectures.

    PubMed

    Lu, Weining; Cheng, Yu; Xiao, Cao; Chang, Shiyu; Huang, Shuai; Liang, Bin; Huang, Thomas

    2017-09-01

    Unsupervised outlier detection is a vital task and has high impact on a wide variety of application domains, such as image analysis and video surveillance. It has also gained long-standing attention and has been extensively studied in multiple research areas. Detecting and taking action on outliers as quickly as possible is imperative in order to protect network and related stakeholders or to maintain the reliability of critical systems. However, outlier detection is difficult due to the one-class nature of the problem and challenges in feature construction. Sequential anomaly detection is even harder, with additional challenges from temporal correlation in data, as well as the presence of noise and high dimensionality. In this paper, we introduce a novel deep structured framework to solve the challenging sequential outlier detection problem. We use autoencoder models to capture the intrinsic difference between outliers and normal instances and integrate the models into recurrent neural networks that allow the learning to make use of previous context and make the learners more robust to warping along the time axis. Furthermore, we propose a layerwise training procedure, which significantly simplifies training and hence helps achieve efficient and scalable training. In addition, we investigate a fine-tuning step to update all parameter sets by incorporating the temporal correlation in the sequence. We further apply our proposed models to conduct systematic experiments on five real-world benchmark data sets. Experimental results demonstrate the effectiveness of our model, compared with other state-of-the-art approaches.
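    The core reconstruction-error idea behind autoencoder outlier detectors can be shown in a few lines of Keras. This is a generic sketch, not the paper's recurrent architecture: sequences are assumed pre-cut into fixed-length windows, and the cutoff percentile is an illustrative choice.

```python
# Minimal reconstruction-error outlier detector built on an autoencoder:
# windows the model reconstructs poorly are outlier candidates.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
windows = rng.normal(size=(2000, 32)).astype("float32")  # stand-in windows

ae = tf.keras.Sequential([
    tf.keras.Input(shape=(32,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(8, activation="relu"),         # bottleneck
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(32),
])
ae.compile(optimizer="adam", loss="mse")
ae.fit(windows, windows, epochs=5, batch_size=64, verbose=0)

errors = np.mean((windows - ae.predict(windows, verbose=0)) ** 2, axis=1)
outliers = np.where(errors > np.percentile(errors, 99))[0]
print(len(outliers))
```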

  3. Improving Electronic Sensor Reliability by Robust Outlier Screening

    PubMed Central

    Moreno-Lizaranzu, Manuel J.; Cuesta, Federico

    2013-01-01

    Electronic sensors are widely used in different application areas, and in some of them, such as automotive or medical equipment, they must perform with an extremely low defect rate. Increasing reliability is paramount. Outlier detection algorithms are a key component in screening latent defects and decreasing the number of customer quality incidents (CQIs). This paper focuses on new spatial algorithms (Good Die in a Bad Cluster with Statistical Bins (GDBC SB) and Bad Bin in a Bad Cluster (BBBC)) and an advanced outlier screening method, called Robust Dynamic Part Averaging Testing (RDPAT), as well as two practical improvements, which significantly enhance existing algorithms. Those methods have been used in production in Freescale® Semiconductor probe factories around the world for several years. Moreover, a study was conducted with production data of 289,080 dice with 26 CQIs to determine and compare the efficiency and effectiveness of all these algorithms in identifying CQIs. PMID:24113682

  4. Improving electronic sensor reliability by robust outlier screening.

    PubMed

    Moreno-Lizaranzu, Manuel J; Cuesta, Federico

    2013-10-09

    Electronic sensors are widely used in different application areas, and in some of them, such as automotive or medical equipment, they must perform with an extremely low defect rate. Increasing reliability is paramount. Outlier detection algorithms are a key component in screening latent defects and decreasing the number of customer quality incidents (CQIs). This paper focuses on new spatial algorithms (Good Die in a Bad Cluster with Statistical Bins (GDBC SB) and Bad Bin in a Bad Cluster (BBBC)) and an advanced outlier screening method, called Robust Dynamic Part Averaging Testing (RDPAT), as well as two practical improvements, which significantly enhance existing algorithms. Those methods have been used in production in Freescale® Semiconductor probe factories around the world for several years. Moreover, a study was conducted with production data of 289,080 dice with 26 CQIs to determine and compare the efficiency and effectiveness of all these algorithms in identifying CQIs.

  5. Using Innovative Outliers to Detect Discrete Shifts in Dynamics in Group-Based State-Space Models

    ERIC Educational Resources Information Center

    Chow, Sy-Miin; Hamaker, Ellen L.; Allaire, Jason C.

    2009-01-01

    Outliers are typically regarded as data anomalies that should be discarded. However, dynamic or "innovative" outliers can be appropriately utilized to capture unusual but substantively meaningful shifts in a system's dynamics. We extend De Jong and Penzer's 1998 approach for representing outliers in single-subject state-space models to a…

  6. A tandem regression-outlier analysis of a ligand cellular system for key structural modifications around ligand binding.

    PubMed

    Lin, Ying-Ting

    2013-04-30

    A tandem technique of hard equipment is often used for the chemical analysis of a single cell to first isolate and then detect the wanted identities. The first part is the separation of wanted chemicals from the bulk of a cell; the second part is the actual detection of the important identities. To identify the key structural modifications around ligand binding, the present study aims to develop a counterpart tandem technique for cheminformatics. A statistical regression and its outliers act as a computational technique for separation. A PPARγ (peroxisome proliferator-activated receptor gamma) agonist cellular system was subjected to such an investigation. Results show that this tandem regression-outlier analysis, or the prioritization of the context equations tagged with features of the outliers, is an effective cheminformatics regression technique for detecting key structural modifications, as well as their tendency to impact ligand binding. The key structural modifications around ligand binding are effectively extracted or characterized out of cellular reactions. This is because molecular binding is the paramount factor in such a ligand cellular system, and key structural modifications around ligand binding are expected to create outliers. Therefore, such outliers can be captured by this tandem regression-outlier analysis.
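    The separation step can be pictured with a generic regression-plus-outlier sketch; flagging by robust (MAD-scaled) residuals below is a stand-in for the paper's prioritization of context equations, not its exact procedure.

```python
# Generic regression-outlier separation sketch: fit a line, then flag
# points with large robust residuals as candidate "key modifications".
import numpy as np

def regression_outliers(x, y, k=3.0):
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    mad = np.median(np.abs(resid - np.median(resid)))
    return np.abs(resid - np.median(resid)) > k * 1.4826 * mad

rng = np.random.default_rng(3)
x = np.linspace(0, 10, 40)
y = 2.0 * x + 1.0 + 0.3 * rng.normal(size=x.size)
y[[5, 30]] += 6.0                               # two outlier-creating points
print(np.where(regression_outliers(x, y))[0])   # -> [ 5 30 ]
```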

  7. Outlier Detection in GNSS Pseudo-Range/Doppler Measurements for Robust Localization.

    PubMed

    Zair, Salim; Le Hégarat-Mascle, Sylvie; Seignez, Emmanuel

    2016-04-22

    In urban areas or space-constrained environments with obstacles, vehicle localization using Global Navigation Satellite System (GNSS) data is hindered by Non-Line Of Sight (NLOS) and multipath receptions. These phenomena induce faulty data that disrupt the precise localization of the GNSS receiver. In this study, we detect the outliers among the observations, Pseudo-Range (PR) and/or Doppler measurements, and we evaluate how discarding them improves the localization. We specify an a contrario model for GNSS raw data to derive an algorithm that partitions the dataset into inliers and outliers. Then, only the inlier data are considered in the localization process, performed either through a classical Particle Filter (PF) or a Rao-Blackwellization (RB) approach. Both localization algorithms exclusively use GNSS data, but they differ in the way Doppler measurements are processed. An experiment has been performed with a GPS receiver aboard a vehicle. Results show that the proposed algorithms are able to detect the 'outliers' in the raw data while being robust to non-Gaussian noise and to intermittent satellite blockage. We compare the performance results achieved either estimating only PR outliers or estimating both PR and Doppler outliers. The best localization is achieved using the RB approach coupled with PR-Doppler outlier estimation.

  8. Outlier detection in contamination control

    NASA Astrophysics Data System (ADS)

    Weintraub, Jeffrey; Warrick, Scott

    2018-03-01

    A machine-learning model is presented that effectively partitions historical process data into outlier and inlier subpopulations. This is necessary in order to avoid using outlier data to build a model for detecting process instability. Exact control limits are given without recourse to approximations and the error characteristics of the control model are derived. A worked example for contamination control is presented along with the machine learning algorithm used and all the programming statements needed for implementation.

  9. Detecting Outliers in Factor Analysis Using the Forward Search Algorithm

    ERIC Educational Resources Information Center

    Mavridis, Dimitris; Moustaki, Irini

    2008-01-01

    In this article we extend and implement the forward search algorithm for identifying atypical subjects/observations in factor analysis models. The forward search has been mainly developed for detecting aberrant observations in regression models (Atkinson, 1994) and in multivariate methods such as cluster and discriminant analysis (Atkinson, Riani,…

  10. Treatment of Outliers via Interpolation Method with Neural Network Forecast Performances

    NASA Astrophysics Data System (ADS)

    Wahir, N. A.; Nor, M. E.; Rusiman, M. S.; Gopal, K.

    2018-04-01

    Outliers often lurk in many datasets, especially in real data. Such anomalous data can negatively affect statistical analyses, primarily normality, variance, and estimation aspects. Hence, handling the occurrence of outliers requires special attention, and it is important to determine suitable ways of treating outliers to ensure that the quality of the analyzed data is high. As such, this paper discusses an alternative method of treating outliers via linear interpolation. Treating an outlier as a missing value in the dataset allows the interpolation method to be applied to the outliers, enabling the comparison of the data series using forecast accuracy before and after outlier treatment. The monthly time series of Malaysian tourist arrivals from January 1998 until December 2015 was used to interpolate the new series. The results indicated that the interpolated time series produced better forecasting results than the original time series data for both Box-Jenkins and neural network approaches.
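    The treat-as-missing-then-interpolate step is essentially a one-liner in pandas. A minimal sketch with made-up monthly figures follows; the modified Z-score rule used to flag outliers first is an assumption, since the record does not specify the flagging rule.

```python
# Mark outliers as missing, then linearly interpolate them (the treatment
# strategy described above), on illustrative monthly data.
import numpy as np
import pandas as pd

s = pd.Series([120., 131., 128., 540., 125., 133., 6., 129.],
              index=pd.date_range("1998-01-01", periods=8, freq="MS"))

# Modified Z-score: |x - median| / (1.4826 * MAD); 3.5 is a common cutoff.
dev = (s - s.median()).abs()
z = dev / (1.4826 * dev.median())
cleaned = s.mask(z > 3.5).interpolate(method="linear")
print(cleaned)   # 540 and 6 are replaced by values interpolated from neighbors
```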

  11. Detecting outliers and learning complex structures with large spectroscopic surveys - a case study with APOGEE stars

    NASA Astrophysics Data System (ADS)

    Reis, Itamar; Poznanski, Dovi; Baron, Dalya; Zasowski, Gail; Shahaf, Sahar

    2018-05-01

    In this work, we apply and expand on a recently introduced outlier detection algorithm that is based on an unsupervised random forest. We use the algorithm to calculate a similarity measure for stellar spectra from the Apache Point Observatory Galactic Evolution Experiment (APOGEE). We show that the similarity measure traces non-trivial physical properties and contains information about complex structures in the data. We use it for visualization and clustering of the data set, and discuss its ability to find groups of highly similar objects, including spectroscopic twins. Using the similarity matrix to search the data set for objects allows us to find objects that are impossible to find using their best-fitting model parameters. This includes extreme objects for which the models fail, and rare objects that are outside the scope of the model. We use the similarity measure to detect outliers in the data set, and find a number of previously unknown Be-type stars, spectroscopic binaries, carbon rich stars, young stars, and a few that we cannot interpret. Our work further demonstrates the potential for scientific discovery when combining machine learning methods with modern survey data.
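    An unsupervised-random-forest similarity of the kind this record builds on can be sketched in scikit-learn (in the Shi-Horvath style such algorithms typically follow; the APOGEE implementation differs in its details): train a forest to separate real objects from a column-permuted synthetic copy, define similarity as the fraction of trees in which two objects share a leaf, and score outliers by low average similarity.

```python
# Sketch of unsupervised random forest outlier scoring (Shi-Horvath style).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def urf_outlier_scores(X, n_trees=200, seed=0):
    rng = np.random.default_rng(seed)
    # Synthetic contrast data: permute each feature column independently.
    synth = np.column_stack([rng.permutation(col) for col in X.T])
    data = np.vstack([X, synth])
    labels = np.r_[np.ones(len(X)), np.zeros(len(synth))]
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    rf.fit(data, labels)
    leaves = rf.apply(X)                      # (n_samples, n_trees) leaf ids
    # Mean similarity of each object to all objects (fraction of shared leaves).
    sim = np.array([(leaves == row).mean(axis=1).mean() for row in leaves])
    return -sim                               # higher score = more outlying

X = np.random.default_rng(4).normal(size=(300, 20))  # stand-in spectra features
X[0] += 5.0                                          # an atypical "spectrum"
scores = urf_outlier_scores(X)
print(np.argmax(scores))                             # index 0 should rank high
```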

  12. Influence of outliers on accuracy estimation in genomic prediction in plant breeding.

    PubMed

    Estaghvirou, Sidi Boubacar Ould; Ogutu, Joseph O; Piepho, Hans-Peter

    2014-10-01

    Outliers often pose problems in analyses of data in plant breeding, but their influence on the performance of methods for estimating predictive accuracy in genomic prediction studies has not yet been evaluated. Here, we evaluate the influence of outliers on the performance of methods for accuracy estimation in genomic prediction studies using simulation. We simulated 1000 datasets for each of 10 scenarios to evaluate the influence of outliers on the performance of seven methods for estimating accuracy. These scenarios are defined by the number of genotypes, marker effect variance, and magnitude of outliers. To mimic outliers, we added to one observation in each simulated dataset, in turn, 5-, 8-, and 10-times the error SD used to simulate small and large phenotypic datasets. The effect of outliers on accuracy estimation was evaluated by comparing deviations in the estimated and true accuracies for datasets with and without outliers. Outliers adversely influenced accuracy estimation, more so at small values of genetic variance or number of genotypes. A method for estimating heritability and predictive accuracy in plant breeding and another used to estimate accuracy in animal breeding were the most accurate and resistant to outliers across all scenarios and are therefore preferable for accuracy estimation in genomic prediction studies. The performances of the other five methods that use cross-validation were less consistent and varied widely across scenarios. The computing time for the methods increased as the size of outliers and sample size increased and the genetic variance decreased. Copyright © 2014 Ould Estaghvirou et al.

  13. Identification of Outliers in Grace Data for Indo-Gangetic Plain Using Various Methods (Z-Score, Modified Z-score and Adjusted Boxplot) and Its Removal

    NASA Astrophysics Data System (ADS)

    Srivastava, S.

    2015-12-01

    Gravity Recovery and Climate Experiment (GRACE) data are widely used for hydrological studies of large-scale basins (≥100,000 sq km). The GRACE data (Stokes coefficients or equivalent water height) used for hydrological studies are not direct observations but result from high-level processing of raw data from the GRACE mission. Different partner agencies, such as CSR, GFZ and JPL, implement their own methodologies, and their processing methods are independent of each other. The primary sources of error in GRACE data are measurement and modeling errors and the processing strategies of these agencies. Because of the different processing methods, the final data from the partner agencies are inconsistent with each other at some epochs. GRACE data provide spatio-temporal variations in the Earth's gravity field, which are mainly attributed to seasonal fluctuations in water storage on and below the Earth's surface. During the quantification of errors/uncertainties, several high positive and negative peaks were observed which do not correspond to any hydrological process but may emanate from a combination of primary error sources or some other geophysical processes (e.g., earthquakes, landslides, etc.) resulting in a redistribution of the Earth's mass. Such peaks can be considered outliers for hydrological studies. In this work, an algorithm has been designed to extract outliers from the GRACE data for the Indo-Gangetic plain that accounts for the seasonal variations and the trend in the data. Different outlier detection methods have been used, such as the Z-score, the modified Z-score and the adjusted boxplot. For verification, assimilated hydrological (GLDAS) and hydro-meteorological data are used as the reference. The results show that the consistency among all data sets improved significantly after the removal of outliers.
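    The three flagging rules named in this record are easy to state in code. A minimal sketch follows, applied to a residual series (the abstract's seasonal-and-trend handling is assumed done upstream), with a naive O(n^2) medcouple for the adjusted boxplot; cutoff constants are the conventional choices, not necessarily this study's.

```python
# The three outlier rules: Z-score, modified Z-score, adjusted boxplot.
import numpy as np

def zscore_flags(x, k=3.0):
    """Classical rule: |x - mean| / sd > k."""
    return np.abs(x - x.mean()) / x.std(ddof=1) > k

def modified_zscore_flags(x, k=3.5):
    """Iglewicz-Hoaglin rule: 0.6745 * |x - median| / MAD > k (MAD > 0)."""
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    return np.abs(0.6745 * (x - med) / mad) > k

def adjusted_boxplot_flags(x):
    """Hubert-Vandervieren skewness-adjusted fences via a naive medcouple."""
    med = np.median(x)
    lo, hi = x[x <= med], x[x >= med]
    h = [((xj - med) - (med - xi)) / (xj - xi)
         for xi in lo for xj in hi if xj != xi]
    mc = np.median(h)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    if mc >= 0:
        lof, upf = q1 - 1.5 * np.exp(-4 * mc) * iqr, q3 + 1.5 * np.exp(3 * mc) * iqr
    else:
        lof, upf = q1 - 1.5 * np.exp(-3 * mc) * iqr, q3 + 1.5 * np.exp(4 * mc) * iqr
    return (x < lof) | (x > upf)

resid = np.random.default_rng(5).normal(size=200)   # stand-in residual series
resid[17] = 6.0
print(zscore_flags(resid).nonzero()[0],
      modified_zscore_flags(resid).nonzero()[0],
      adjusted_boxplot_flags(resid).nonzero()[0])
```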

  14. Outlier Detection in GNSS Pseudo-Range/Doppler Measurements for Robust Localization

    PubMed Central

    Zair, Salim; Le Hégarat-Mascle, Sylvie; Seignez, Emmanuel

    2016-01-01

    In urban areas or space-constrained environments with obstacles, vehicle localization using Global Navigation Satellite System (GNSS) data is hindered by Non-Line Of Sight (NLOS) and multipath receptions. These phenomena induce faulty data that disrupt the precise localization of the GNSS receiver. In this study, we detect the outliers among the observations, Pseudo-Range (PR) and/or Doppler measurements, and we evaluate how discarding them improves the localization. We specify an a contrario model for GNSS raw data to derive an algorithm that partitions the dataset into inliers and outliers. Then, only the inlier data are considered in the localization process, performed either through a classical Particle Filter (PF) or a Rao-Blackwellization (RB) approach. Both localization algorithms exclusively use GNSS data, but they differ in the way Doppler measurements are processed. An experiment has been performed with a GPS receiver aboard a vehicle. Results show that the proposed algorithms are able to detect the ‘outliers’ in the raw data while being robust to non-Gaussian noise and to intermittent satellite blockage. We compare the performance results achieved either estimating only PR outliers or estimating both PR and Doppler outliers. The best localization is achieved using the RB approach coupled with PR-Doppler outlier estimation. PMID:27110796

  15. Supervised Detection of Anomalous Light Curves in Massive Astronomical Catalogs

    NASA Astrophysics Data System (ADS)

    Nun, Isadora; Pichara, Karim; Protopapas, Pavlos; Kim, Dae-Won

    2014-09-01

    The development of synoptic sky surveys has led to a massive amount of data for which the resources needed for analysis are beyond human capabilities. In order to process this information and to extract all possible knowledge, machine learning techniques become necessary. Here we present a new methodology to automatically discover unknown variable objects in large astronomical catalogs. With the aim of taking full advantage of all the information we have about known objects, our method is based on a supervised algorithm. In particular, we train a random forest classifier using known variability classes of objects and obtain votes for each of the objects in the training set. We then model this voting distribution with a Bayesian network and obtain the joint voting distribution among the training objects. Consequently, an unknown object is considered an outlier insofar as it has a low joint probability. By leaving one of the classes out of the training set, we perform a validity test and show that when the random forest classifier attempts to classify unknown light curves (the class left out), it votes with an unusual distribution among the classes. This rare voting is detected by the Bayesian network and expressed as a low joint probability. Our method is suitable for exploring massive data sets given that the training process is performed offline. We tested our algorithm on 20 million light curves from the MACHO catalog and generated a list of anomalous candidates. After analysis, we divided the candidates into two main classes of outliers: artifacts and intrinsic outliers. Artifacts were principally due to air mass variation, seasonal variation, bad calibration, or instrumental errors and were consequently removed from our outlier list and added to the training set. After retraining, we selected about 4000 objects, which we passed to a post-analysis stage by performing a cross-match with all publicly available catalogs. Within these candidates we identified certain known but rare objects such as eclipsing Cepheids, blue variables, cataclysmic variables, and X-ray sources. For some outliers there was no additional information. Among them we identified three unknown variability types and a few individual outliers that will be followed up in order to perform a deeper analysis.

  16. Improving the estimation of zenith dry tropospheric delays using regional surface meteorological data

    NASA Astrophysics Data System (ADS)

    Luo, X.; Heck, B.; Awange, J. L.

    2013-12-01

    Global Navigation Satellite Systems (GNSS) are emerging as possible tools for remote sensing of high-resolution atmospheric water vapour that improves weather forecasting through numerical weather prediction models. Nowadays, the GNSS-derived tropospheric zenith total delay (ZTD), comprising the zenith dry delay (ZDD) and the zenith wet delay (ZWD), is achievable with sub-centimetre accuracy. However, if no representative near-site meteorological information is available, the quality of the ZDD derived from tropospheric models is degraded, leading to inaccurate estimation of the water vapour component ZWD as the difference between ZTD and ZDD. On the basis of freely accessible regional surface meteorological data, this paper proposes a height-dependent linear correction model for the a priori ZDD. By applying the ordinary least-squares estimation (OLSE), bootstrapping (BOOT), and leave-one-out cross-validation (CROS) methods, the model parameters are estimated and analysed with respect to outlier detection. The model validation is carried out using GNSS stations with near-site meteorological measurements. The results verify the efficiency of the proposed ZDD correction model, showing a significant reduction in the mean bias from several centimetres to about 5 mm. The OLSE method enables fast computation, while the CROS procedure allows for outlier detection. All three methods produce consistent results after outlier elimination, which improves the regression quality by about 20% and the model accuracy by up to 30%.
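    A minimal sketch of the estimation step under stated assumptions: the height-dependent correction is reduced to a two-parameter ordinary least-squares fit, and leave-one-out prediction residuals (in the spirit of the CROS procedure) screen for outliers; the station heights and a priori ZDD biases below are hypothetical stand-ins.

    ```python
    import numpy as np

    # Hypothetical station heights (m) and a priori ZDD biases (m).
    heights = np.array([12.0, 150.0, 320.0, 540.0, 875.0, 1210.0])
    zdd_bias = np.array([0.052, 0.047, 0.040, 0.033, 0.021, 0.018])

    A = np.column_stack([np.ones_like(heights), heights])   # intercept + slope

    def loo_residuals(A, y):
        """Leave-one-out prediction residual for each observation."""
        out = np.empty(len(y))
        for i in range(len(y)):
            keep = np.arange(len(y)) != i
            coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
            out[i] = y[i] - A[i] @ coef
        return out

    res = loo_residuals(A, zdd_bias)
    flagged = np.abs(res) > 3 * res.std(ddof=1)     # simple 3-sigma screen
    coef, *_ = np.linalg.lstsq(A[~flagged], zdd_bias[~flagged], rcond=None)
    print("intercept (m), slope (m per m of height):", coef)
    ```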

  17. Universal Linear Fit Identification: A Method Independent of Data, Outliers and Noise Distribution Model and Free of Missing or Removed Data Imputation.

    PubMed

    Adikaram, K K L B; Hussein, M A; Effenberger, M; Becker, T

    2015-01-01

    Data processing requires a robust linear fit identification method. In this paper, we introduce a non-parametric robust linear fit identification method for time series. The method uses an indicator 2/n to identify linear fit, where n is the number of terms in a series. The ratio Rmax = (amax - amin)/(Sn - amin*n) and the ratio Rmin = (amax - amin)/(amax*n - Sn) are always equal to 2/n for a linear series, where amax is the maximum element, amin is the minimum element and Sn is the sum of all elements. If any series expected to follow y = c consists of data that do not agree with the y = c form, Rmax > 2/n and Rmin > 2/n imply that the maximum and minimum elements, respectively, do not agree with the linear fit. We define threshold values for outlier and noise detection as 2/n * (1 + k1) and 2/n * (1 + k2), respectively, where k1 > k2 and 0 ≤ k1 ≤ n/2 - 1. Given this relation and a transformation technique that transforms data into the form y = c, we show that removing all data that do not agree with the linear fit is possible. Furthermore, the method is independent of the number of data points, missing data, removed data points and the nature of the distribution (Gaussian or non-Gaussian) of the outliers, noise and clean data. These are major advantages over existing linear fit methods. Since having a perfect linear relation between two variables in the real world is impossible, we used artificial data sets with extreme conditions to verify the method. The method detects the correct linear fit when the percentage of data agreeing with linear fit is less than 50%, and the deviation of data that do not agree with linear fit is very small, of the order of ±10^-4%. The method results in incorrect detections only when numerical accuracy is insufficient in the calculation process.
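    A minimal sketch of the indicator itself: for an exactly linear (arithmetic) series both ratios equal 2/n, and an injected outlier pushes Rmax above the threshold 2/n * (1 + k1); the constant k1 below is an arbitrary illustrative choice, not a value from the paper.

    ```python
    import numpy as np

    def rmax_rmin(a):
        """Rmax and Rmin as defined above; both equal 2/n for a linear series."""
        a = np.asarray(a, dtype=float)
        n, s = len(a), a.sum()
        amax, amin = a.max(), a.min()
        return (amax - amin) / (s - amin * n), (amax - amin) / (amax * n - s)

    linear = np.arange(1.0, 11.0)                 # exactly linear series, n = 10
    print(rmax_rmin(linear), 2 / len(linear))     # both ratios equal 2/n = 0.2

    corrupted = linear.copy()
    corrupted[4] = 40.0                           # inject one outlier
    k1 = 1.0                                      # illustrative sensitivity constant
    threshold = 2 / len(corrupted) * (1 + k1)
    rmax, _ = rmax_rmin(corrupted)
    print(rmax > threshold)                       # True: the maximum element is suspect
    ```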

  18. TreeShrink: fast and accurate detection of outlier long branches in collections of phylogenetic trees.

    PubMed

    Mai, Uyen; Mirarab, Siavash

    2018-05-08

    Sequence data used in reconstructing phylogenetic trees may include various sources of error. Typically, errors are detected at the sequence level, but when missed, the erroneous sequences often appear as unexpectedly long branches in the inferred phylogeny. We propose an automatic method to detect such errors. We build a phylogeny including all the data and then detect sequences that artificially inflate the tree diameter. We formulate an optimization problem, called the k-shrink problem, that seeks to find k leaves that could be removed to maximally reduce the tree diameter. We present an algorithm to find the exact solution for this problem in polynomial time. We then use several statistical tests to find outlier species that have an unexpectedly high impact on the tree diameter. These tests can use a single tree or a set of related gene trees and can also adjust to species-specific patterns of branch length. The resulting method is called TreeShrink. We test our method on six phylogenomic biological datasets and an HIV dataset and show that the method successfully detects and removes long branches. TreeShrink removes sequences more conservatively than rogue taxon removal and often reduces gene tree discordance more than rogue taxon removal once the amount of filtering is controlled. TreeShrink is an effective method for detecting sequences that lead to unrealistically long branch lengths in phylogenetic trees. The tool is publicly available at https://github.com/uym2/TreeShrink .

  19. A Content-Adaptive Analysis and Representation Framework for Audio Event Discovery from "Unscripted" Multimedia

    NASA Astrophysics Data System (ADS)

    Radhakrishnan, Regunathan; Divakaran, Ajay; Xiong, Ziyou; Otsuka, Isao

    2006-12-01

    We propose a content-adaptive analysis and representation framework to discover events using audio features from "unscripted" multimedia such as sports and surveillance for summarization. The proposed analysis framework performs an inlier/outlier-based temporal segmentation of the content. It is motivated by the observation that "interesting" events in unscripted multimedia occur sparsely in a background of usual or "uninteresting" events. We treat the sequence of low/mid-level features extracted from the audio as a time series and identify subsequences that are outliers. The outlier detection is based on eigenvector analysis of the affinity matrix constructed from statistical models estimated from the subsequences of the time series. We define the confidence measure on each of the detected outliers as the probability that it is an outlier. Then, we establish a relationship between the parameters of the proposed framework and the confidence measure. Furthermore, we use the confidence measure to rank the detected outliers in terms of their departures from the background process. Our experimental results with sequences of low- and mid-level audio features extracted from sports video show that "highlight" events can be extracted effectively as outliers from a background process using the proposed framework. We proceed to show the effectiveness of the proposed framework in bringing out suspicious events from surveillance videos without any a priori knowledge. We show that such temporal segmentation into background and outliers, along with the ranking based on the departure from the background, can be used to generate content summaries of any desired length. Finally, we also show that the proposed framework can be used to systematically select "key audio classes" that are indicative of events of interest in the chosen domain.
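    A minimal sketch of the eigenvector step under stated assumptions: subsequences are summarized here by plain feature vectors rather than fitted statistical models, affinities use a Gaussian kernel on Euclidean distances, and outliers are read off as the entries with small weight in the dominant eigenvector of the affinity matrix.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    usual = rng.normal(0.0, 1.0, size=(40, 8))    # background subsequences
    events = rng.normal(4.0, 1.0, size=(3, 8))    # sparse "interesting" events
    feats = np.vstack([usual, events])

    # Affinity matrix: Gaussian kernel on pairwise Euclidean distances.
    d = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    affinity = np.exp(-d**2 / (2 * np.median(d)**2))

    # The dominant eigenvector concentrates on the large background cluster;
    # rows with little weight in it are flagged as outlier subsequences.
    _, vecs = np.linalg.eigh(affinity)            # eigenvalues in ascending order
    weight = np.abs(vecs[:, -1])
    outliers = np.where(weight < 0.5 * np.median(weight))[0]
    print(outliers)    # expected: the three injected event rows (40, 41, 42)
    ```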

  20. Community trees: Identifying codiversification in the Páramo dipteran community.

    PubMed

    Carstens, Bryan C; Gruenstaeudl, Michael; Reid, Noah M

    2016-05-01

    Groups of codistributed species that responded in a concerted manner to environmental events are expected to share patterns of evolutionary diversification. However, the identification of such groups has largely been based on qualitative, post hoc analyses. We develop here two methods (posterior predictive simulation [PPS], Kuhner-Felsenstein [K-F] analysis of variance [ANOVA]) for the analysis of codistributed species that, given a group of species with a shared pattern of diversification, allow empiricists to identify those taxa that do not codiversify (i.e., "outlier" species). The identification of outlier species makes it possible to jointly estimate the evolutionary history of co-diversifying taxa. To evaluate the approaches presented here, we collected data from Páramo dipterans, identified outlier species, and estimated a "community tree" from species that are identified as having codiversified. Our results demonstrate that dipteran communities from different Páramo habitats in the same mountain range are more closely related than communities in other ranges. We also conduct simulation testing to evaluate this approach. Results suggest that our approach provides a useful addition to comparative phylogeographic methods, while identifying aspects of the analysis that require careful interpretation. In particular, both the PPS and K-F ANOVA perform acceptably when there are one or two outlier species, but less so as the number of outliers increases. This is likely a function of the corresponding degradation of the signal of community divergence; without a strong signal from a codiversifying community, there is no dominant pattern from which to detect an outlier species. For this reason, both the magnitude of K-F distance distribution and outside knowledge about the phylogeographic history of each putative member of the community should be considered when interpreting the results. © 2016 The Author(s). Evolution © 2016 The Society for the Study of Evolution.

  1. Quality assurance using outlier detection on an automatic segmentation method for the cerebellar peduncles

    NASA Astrophysics Data System (ADS)

    Li, Ke; Ye, Chuyang; Yang, Zhen; Carass, Aaron; Ying, Sarah H.; Prince, Jerry L.

    2016-03-01

    Cerebellar peduncles (CPs) are white matter tracts connecting the cerebellum to other brain regions. Automatic segmentation methods for the CPs have been proposed for studying their structure and function. Usually the performance of these methods is evaluated by comparing segmentation results with manual delineations (ground truth). However, when a segmentation method is run on new data (for which no ground truth exists) it is highly desirable to efficiently detect and assess algorithm failures so that these cases can be excluded from scientific analysis. In this work, two outlier detection methods aimed at assessing the performance of an automatic CP segmentation algorithm are presented. The first is a univariate non-parametric method using a box-whisker plot. We first categorize automatic segmentation results of a dataset of diffusion tensor imaging (DTI) scans from 48 subjects as either successes or failures. We then design three groups of features from the image data of the nine categorized failures for failure detection. Results show that most of these features can efficiently detect the true failures. The second method—supervised classification—was employed on a larger DTI dataset of 249 manually categorized subjects. Four classifiers—linear discriminant analysis (LDA), logistic regression (LR), support vector machine (SVM), and random forest classification (RFC)—were trained using the designed features and evaluated using leave-one-out cross-validation. Results show that the LR performs worst among the four classifiers and the other three perform comparably, which demonstrates the feasibility of automatically detecting segmentation failures using classification methods.
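    A minimal sketch of the first, box-whisker-style check under stated assumptions: a single QA feature computed from each new segmentation (the values below are hypothetical) is compared against Tukey fences at 1.5 times the interquartile range, the rule a box-and-whisker plot draws.

    ```python
    import numpy as np

    # Hypothetical QA feature (e.g., segmented CP volume in ml) per subject.
    feature = np.array([5.1, 4.8, 5.3, 5.0, 4.9, 5.2, 1.7, 5.05, 4.95, 8.9])

    q1, q3 = np.percentile(feature, [25, 75])
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr    # box-whisker (Tukey) fences
    failures = np.where((feature < low) | (feature > high))[0]
    print(failures)    # flags the 1.7 and 8.9 cases as likely failures
    ```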

  2. Robust detrending, rereferencing, outlier detection, and inpainting for multichannel data.

    PubMed

    de Cheveigné, Alain; Arzounian, Dorothée

    2018-05-15

    Electroencephalography (EEG), magnetoencephalography (MEG) and related techniques are prone to glitches, slow drift, steps, etc., that contaminate the data and interfere with the analysis and interpretation. These artifacts are usually addressed in a preprocessing phase that attempts to remove them or minimize their impact. This paper offers a set of useful techniques for this purpose: robust detrending, robust rereferencing, outlier detection, data interpolation (inpainting), step removal, and filter ringing artifact removal. These techniques provide a less wasteful alternative to discarding corrupted trials or channels, and they are relatively immune to artifacts that disrupt alternative approaches such as filtering. Robust detrending allows slow drifts and common mode signals to be factored out while avoiding the deleterious effects of glitches. Robust rereferencing reduces the impact of artifacts on the reference. Inpainting allows corrupt data to be interpolated from intact parts based on the correlation structure estimated over the intact parts. Outlier detection allows the corrupt parts to be identified. Step removal fixes the high-amplitude flux jump artifacts that are common with some MEG systems. Ringing removal allows the ringing response of the antialiasing filter to glitches (steps, pulses) to be suppressed. The performance of the methods is illustrated and evaluated using synthetic data and data from real EEG and MEG systems. These methods, which are mainly automatic and require little tuning, can greatly improve the quality of the data. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
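    A minimal sketch of robust detrending in this spirit, not the authors' exact procedure: a polynomial trend is fitted, samples deviating by more than a few robust standard deviations are masked, and the fit is repeated so that glitches cannot drag the trend.

    ```python
    import numpy as np

    def robust_detrend(x, order=3, n_iter=5, thresh=3.0):
        """Fit a slow polynomial trend while masking strongly deviating samples."""
        t = np.linspace(-1.0, 1.0, len(x))
        basis = np.vander(t, order + 1)           # polynomial regressors
        w = np.ones(len(x))                       # start with uniform weights
        for _ in range(n_iter):
            coef, *_ = np.linalg.lstsq(basis * w[:, None], x * w, rcond=None)
            resid = x - basis @ coef
            scale = 1.4826 * np.median(np.abs(resid - np.median(resid)))
            w = (np.abs(resid) < thresh * scale).astype(float)   # mask glitches
        return x - basis @ coef

    rng = np.random.default_rng(2)
    x = 1e-3 * np.arange(1000) + rng.normal(0, 0.1, 1000)   # slow drift + noise
    x[500:505] += 20.0                                      # large glitch
    detrended = robust_detrend(x)     # the glitch no longer biases the trend fit
    ```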

  3. Detecting short spatial scale local adaptation and epistatic selection in climate-related candidate genes in European beech (Fagus sylvatica) populations.

    PubMed

    Csilléry, Katalin; Lalagüe, Hadrien; Vendramin, Giovanni G; González-Martínez, Santiago C; Fady, Bruno; Oddou-Muratorio, Sylvie

    2014-10-01

    Detecting signatures of selection in tree populations threatened by climate change is currently a major research priority. Here, we investigated the signature of local adaptation over a short spatial scale using 96 European beech (Fagus sylvatica L.) individuals originating from two pairs of populations on the northern and southern slopes of Mont Ventoux (south-eastern France). We performed both single and multilocus analysis of selection based on 53 climate-related candidate genes containing 546 SNPs. FST outlier methods at the SNP level revealed a weak signal of selection, with three marginally significant outliers in the northern populations. At the gene level, considering haplotypes as alleles, two additional marginally significant outliers were detected, one on each slope. To account for the uncertainty of haplotype inference, we averaged the Bayes factors over many possible phase reconstructions. Epistatic selection offers a realistic multilocus model of selection in natural populations. Here, we used a test suggested by Ohta based on the decomposition of the variance of linkage disequilibrium. Over all populations, 0.23% of the SNP pairs (haplotypes) showed evidence of epistatic selection, with nearly 80% of them being within genes. One of the between-gene epistatic selection signals arose between an FST outlier and a nonsynonymous mutation in a drought response gene. Additionally, we identified haplotypes containing selectively advantageous allele combinations which were unique to high or low elevations and northern or southern populations. Several haplotypes contained nonsynonymous mutations situated in genes with known functional importance for adaptation to climatic factors. © 2014 John Wiley & Sons Ltd.

  4. An efficient sampling algorithm for uncertain abnormal data detection in biomedical image processing and disease prediction.

    PubMed

    Liu, Fei; Zhang, Xi; Jia, Yan

    2015-01-01

    In this paper, we propose a computer information processing algorithm that can be used for biomedical image processing and disease prediction. A biomedical image is considered a data object in a multi-dimensional space. Each dimension is a feature that can be used for disease diagnosis. We introduce the new concept of the top (k1,k2) outlier, which can be used to detect abnormal data objects in the multi-dimensional space. This technique focuses on uncertain space, where each data object has several possible instances with distinct probabilities. We design an efficient sampling algorithm for the top (k1,k2) outlier in uncertain space, with several improvement techniques used for acceleration. Experiments show that our methods achieve high accuracy and efficiency.

  5. Just-in-Time Correntropy Soft Sensor with Noisy Data for Industrial Silicon Content Prediction.

    PubMed

    Chen, Kun; Liang, Yu; Gao, Zengliang; Liu, Yi

    2017-08-08

    Development of accurate data-driven quality prediction models for industrial blast furnaces encounters several challenges, mainly because the collected data are nonlinear, non-Gaussian, and unevenly distributed. A just-in-time correntropy-based local soft sensing approach is presented to predict the silicon content in this work. Without cumbersome efforts for outlier detection, a correntropy support vector regression (CSVR) modeling framework is proposed to deal with soft sensor development and outlier detection simultaneously. Moreover, with a continuously updated database and a clustering strategy, a just-in-time CSVR (JCSVR) method is developed. Consequently, more accurate prediction and efficient implementations of JCSVR can be achieved. The better prediction performance of JCSVR is validated on online silicon content prediction, compared with traditional soft sensors.

  6. Just-in-Time Correntropy Soft Sensor with Noisy Data for Industrial Silicon Content Prediction

    PubMed Central

    Chen, Kun; Liang, Yu; Gao, Zengliang; Liu, Yi

    2017-01-01

    Development of accurate data-driven quality prediction models for industrial blast furnaces encounters several challenges, mainly because the collected data are nonlinear, non-Gaussian, and unevenly distributed. A just-in-time correntropy-based local soft sensing approach is presented to predict the silicon content in this work. Without cumbersome efforts for outlier detection, a correntropy support vector regression (CSVR) modeling framework is proposed to deal with soft sensor development and outlier detection simultaneously. Moreover, with a continuously updated database and a clustering strategy, a just-in-time CSVR (JCSVR) method is developed. Consequently, more accurate prediction and efficient implementations of JCSVR can be achieved. The better prediction performance of JCSVR is validated on online silicon content prediction, compared with traditional soft sensors. PMID:28786957

  7. Outlier identification in urban soils and its implications for identification of potential contaminated land

    NASA Astrophysics Data System (ADS)

    Zhang, Chaosheng

    2010-05-01

    Outliers in urban soil geochemical databases may imply potential contaminated land. Different methodologies which can be easily implemented for the identification of global and spatial outliers were applied to Pb concentrations in urban soils of Galway City in Ireland. Due to the strongly skewed probability distribution of the data, a Box-Cox transformation was performed prior to further analyses. The graphic methods of the histogram and the box-and-whisker plot were effective in identifying global outliers at the original scale of the dataset. Spatial outliers could be identified by a local indicator of spatial association (local Moran's I), cross-validation of kriging, and geographically weighted regression. The spatial locations of outliers were visualised using a geographical information system. Different methods showed generally consistent results, but differences existed. It is suggested that outliers identified by statistical methods should be confirmed and justified using scientific knowledge before they are properly dealt with.

  8. Robust High-dimensional Bioinformatics Data Streams Mining by ODR-ioVFDT

    PubMed Central

    Wang, Dantong; Fong, Simon; Wong, Raymond K.; Mohammed, Sabah; Fiaidhi, Jinan; Wong, Kelvin K. L.

    2017-01-01

    Outlier detection in bioinformatics data stream mining has received significant attention from research communities in recent years. The problems of how to distinguish noise from an exception, and of deciding whether to discard it or to devise an extra decision path to accommodate it, pose a dilemma. In this paper, we propose a novel algorithm called ODR with incrementally Optimized Very Fast Decision Tree (ODR-ioVFDT) for taking care of outliers during continuous data learning. Using an adaptive interquartile-range-based identification method, a tolerance threshold is set and then used to judge whether a data point of exceptional value should be included for training or not. This differs from traditional outlier detection/removal approaches, which are two separate steps in processing the data. The proposed algorithm is tested on datasets from five bioinformatics scenarios, comparing the performance of our model against counterparts without ODR. The results show that ODR-ioVFDT performs better in classification accuracy, kappa statistics, and time consumption. Applied to bioinformatics streaming data for detecting and quantifying information about life phenomena, states, characters, variables and components of an organism, ODR-ioVFDT can help to diagnose and treat disease more effectively. PMID:28230161

  9. A genome scan for selection signatures comparing farmed Atlantic salmon with two wild populations: Testing colocalization among outlier markers, candidate genes, and quantitative trait loci for production traits.

    PubMed

    Liu, Lei; Ang, Keng Pee; Elliott, J A K; Kent, Matthew Peter; Lien, Sigbjørn; MacDonald, Danielle; Boulding, Elizabeth Grace

    2017-03-01

    Comparative genome scans can be used to identify chromosome regions, but not traits, that are putatively under selection. Identification of targeted traits may be more likely in recently domesticated populations under strong artificial selection for increased production. We used a North American Atlantic salmon 6K SNP dataset to locate genome regions of an aquaculture strain (Saint John River) that were highly diverged from that of its putative wild founder population (Tobique River). First, admixed individuals with partial European ancestry were detected using STRUCTURE and removed from the dataset. Outlier loci were then identified as those showing extreme differentiation between the aquaculture population and the founder population. All Arlequin methods identified an overlapping subset of 17 outlier loci, three of which were also identified by BayeScan. Many outlier loci were near candidate genes and some were near published quantitative trait loci (QTLs) for growth, appetite, maturity, or disease resistance. Parallel comparisons using a wild, nonfounder population (Stewiacke River) yielded only one overlapping outlier locus as well as a known maturity QTL. We conclude that genome scans comparing a recently domesticated strain with its wild founder population can facilitate identification of candidate genes for traits known to have been under strong artificial selection.

  10. Detection of outliers in the response and explanatory variables of the simple circular regression model

    NASA Astrophysics Data System (ADS)

    Mahmood, Ehab A.; Rana, Sohel; Hussin, Abdul Ghapor; Midi, Habshah

    2016-06-01

    The circular regression model may contain one or more data points which appear to be peculiar or inconsistent with the main part of the model. This may occur due to recording errors, sudden short events, sampling under abnormal conditions, etc. The existence of these data points, "outliers", in the data set causes many problems in research results and conclusions. Therefore, we should identify them before applying statistical analysis. In this article, we aim to propose a statistic to identify outliers in both the response and explanatory variables of the simple circular regression model. Our proposed statistic is the robust circular distance RCDxy, and it is justified by three robustness measures: the proportion of detected outliers, and the masking and swamping rates.

  11. Sources of Artefacts in Synthetic Aperture Radar Interferometry Data Sets

    NASA Astrophysics Data System (ADS)

    Becek, K.; Borkowski, A.

    2012-07-01

    In recent years, much attention has been devoted to digital elevation models (DEMs) produced using Synthetic Aperture Radar Interferometry (InSAR). This has been triggered by the relative novelty of the InSAR method and its world-famous product—the Shuttle Radar Topography Mission (SRTM) DEM. However, much less attention, if any, has been paid to sources of artefacts in SRTM. In this work, we focus not on the missing pixels (null pixels) due to shadows or the layover effect, but rather on outliers that went undetected by the SRTM validation process. The aim of this study is to identify some of the causes of the elevation outliers in SRTM. Such knowledge may be helpful in mitigating similar problems in future InSAR DEMs, notably the ones currently being developed from data acquired by the TanDEM-X mission. We analysed many cross-sections derived from SRTM. These cross-sections were extracted over the elevation test areas available from the Global Elevation Data Testing Facility (GEDTF), whose database contains about 8,500 runways with known vertical profiles. Whenever a significant discrepancy between the known runway profile and the SRTM cross-section was detected, a visual interpretation of the high-resolution satellite image was carried out to identify the objects causing the irregularities. The distance and bearing from the outlier to the object were recorded. Moreover, we considered the SRTM look direction parameter. A comprehensive analysis of the acquired data allows us to establish that large metallic structures, such as hangars or car parking lots, cause the outliers. Water areas or wet terrain may also cause InSAR outliers. The look direction and the depression angle of the InSAR system in relation to the suspected objects influence the magnitude of the outliers. We hope that these findings will be helpful in designing the error detection routines of future InSAR or, in fact, any microwave aerial- or space-based survey. The presence of outliers in SRTM was first reported in Becek, K. (2008). Investigating error structure of shuttle radar topography mission elevation data product, Geophys. Res. Lett., 35, L15403.

  12. Spatial detection of tv channel logos as outliers from the content

    NASA Astrophysics Data System (ADS)

    Ekin, Ahmet; Braspenning, Ralph

    2006-01-01

    This paper proposes a purely image-based TV channel logo detection algorithm that can detect logos independently of their motion and transparency features. The proposed algorithm can robustly detect any type of logo, such as transparent and animated logos, without requiring any temporal constraints, whereas known methods have to wait for the occurrence of large motion in the scene and assume stationary logos. The algorithm models logo pixels as outliers from the actual scene content, which is represented by multiple 3-D histograms in the YCbCr space. We use four scene histograms corresponding to each of the four corners because the content characteristics change from one image corner to another. A further novelty of the proposed algorithm is that we define the image corners, and the areas where we compute the scene histograms, by a cinematic technique called the Golden Section Rule that is used by professionals. The robustness of the proposed algorithm is demonstrated over a dataset of representative TV content.

  13. Evaluating Effect of Albendazole on Trichuris trichiura Infection: A Systematic Review Article.

    PubMed

    Ahmadi Jouybari, Toraj; Najaf Ghobadi, Khadije; Lotfi, Bahare; Alavi Majd, Hamid; Ahmadi, Nayeb Ali; Rostami-Nejad, Mohammad; Aghaei, Abbas

    2016-01-01

    The aim of the study was to assess outlier and influential studies and to conduct a meta-analysis of the efficacy of single-dose oral albendazole against T. trichiura infection. We searched PubMed, ISI Web of Science, Science Direct, the Cochrane Central Register of Controlled Trials, and WHO library databases between 1983 and 2014. Data from 13 clinical trial articles were used. Each article compared the effect of a single oral dose (400 mg) of albendazole with placebo in two groups of patients with T. trichiura infection. For both groups in each article, the sample size, the number of those with T. trichiura infection, and the number of those who recovered following the intake of albendazole were identified and recorded. The relative risk (RR) and its variance were computed. Funnel plots and the Begg's and Egger's tests were used to assess publication bias. The random-effect variance shift outlier model and the likelihood ratio test were applied for detecting outliers. To detect influence, DFFITS values, Cook's distances and COVRATIO were used. Data were analyzed using STATA and R software. Articles 13 and 9 were identified as an outlier and an influential case, respectively. The outlier was diagnosed by the variance shift of the target study in the inferential method and by its RR value in the graphical method. The funnel plot and Begg's test did not show publication bias (P = 0.272); however, the Egger's test confirmed it (P = 0.034). Meta-analysis after removal of article 13 showed a relative risk of 1.99 (95% CI 1.71-2.31). The estimated RR and our meta-analyses show that treatment of T. trichiura with single oral doses of albendazole is unsatisfactory. New anthelminthics are urgently needed.

  14. Data quality enhancement and knowledge discovery from relevant signals in acoustic emission

    NASA Astrophysics Data System (ADS)

    Mejia, Felipe; Shyu, Mei-Ling; Nanni, Antonio

    2015-10-01

    The increasing popularity of structural health monitoring has brought with it a growing need for automated data management and data analysis tools. Of great importance are filters that can systematically detect unwanted signals in acoustic emission datasets. This study presents a semi-supervised data mining scheme that detects data belonging to unfamiliar distributions. This type of outlier detection scheme is useful for detecting the presence of new acoustic emission sources, given a training dataset of unwanted signals. In addition to classifying new observations (herein referred to as "outliers") within a dataset, the scheme generates a decision tree that classifies sub-clusters within the outlier context set. The obtained tree can be interpreted as a series of characterization rules for newly-observed data, and they can potentially describe the basic structure of different modes within the outlier distribution. The data mining scheme is first validated on a synthetic dataset, and an attempt is made to confirm the algorithm's ability to discriminate outlier acoustic emission sources from a controlled pencil-lead-break experiment. Finally, the scheme is applied to data from two fatigue crack-growth steel specimens, where it is shown that extracted rules can adequately describe crack-growth related acoustic emission sources while filtering out background "noise." Results show promising performance in filter generation, thereby allowing analysts to extract, characterize, and focus only on meaningful signals.

  15. Identification and influence of spatio-temporal outliers in urban air quality measurements.

    PubMed

    O'Leary, Brendan; Reiners, John J; Xu, Xiaohong; Lemke, Lawrence D

    2016-12-15

    Forty-eight potential outliers in air pollution measurements taken simultaneously in Detroit, Michigan, USA and Windsor, Ontario, Canada in 2008 and 2009 were identified using four independent methods: box plots, variogram clouds, difference maps, and the local Moran's I statistic. These methods were subsequently used in combination to reduce and select a final set of 13 outliers for nitrogen dioxide (NO2), volatile organic compounds (VOCs), total benzene, toluene, ethyl benzene, and xylene (BTEX), and particulate matter in two size fractions (PM2.5 and PM10). The selected outliers were excluded from the measurement datasets and used to revise air pollution models. In addition, a set of temporally-scaled air pollution models was generated using time series measurements from community air quality monitors, with and without the selected outliers. The influence of outlier exclusion on associations with asthma exacerbation rates aggregated at a postal zone scale in both cities was evaluated. Results demonstrate that the inclusion or exclusion of outliers influences the strength of observed associations between intraurban air quality and asthma exacerbation in both cities. The box plot, variogram cloud, and difference map methods largely determined the final list of outliers, due to the high degree of conformity among their results. The Moran's I approach was not useful for outlier identification in the datasets studied. Removing outliers changed the spatial distribution of modeled concentration values and derivative exposure estimates averaged over postal zones. Overall, associations between air pollution and acute asthma exacerbation rates were weaker with outliers removed, but improved with the addition of temporal information. Decreases in statistically significant associations between air pollution and asthma resulted, in part, from smaller pollutant concentration ranges used for linear regression. Nevertheless, the practice of identifying outliers through congruence among multiple methods strengthens confidence in the analysis of outlier presence and influence in environmental datasets. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  16. Robust volcano plot: identification of differential metabolites in the presence of outliers.

    PubMed

    Kumar, Nishith; Hoque, Md Aminul; Sugimoto, Masahiro

    2018-04-11

    The identification of differential metabolites in metabolomics is still a big challenge and plays a prominent role in metabolomics data analyses. Metabolomics datasets often contain outliers because of analytical, experimental, and biological ambiguity, but the currently available differential metabolite identification techniques are sensitive to outliers. We propose a kernel weight based outlier-robust volcano plot for identifying differential metabolites from noisy metabolomics datasets. Two numerical experiments are used to evaluate the performance of the proposed technique against nine existing techniques, including the t-test and the Kruskal-Wallis test. Artificially generated data with outliers reveal that the proposed method results in a lower misclassification error rate and a greater area under the receiver operating characteristic curve compared with existing methods. An experimentally measured breast cancer dataset to which outliers were artificially added reveals that our proposed method produces only two non-overlapping differential metabolites whereas the other nine methods produced between seven and 57 non-overlapping differential metabolites. Our data analyses show that the performance of the proposed differential metabolite identification technique is better than that of existing methods. Thus, the proposed method can contribute to analysis of metabolomics data with outliers. The R package and user manual of the proposed method are available at https://github.com/nishithkumarpaul/Rvolcano .

  17. Improvement of statistical methods for detecting anomalies in climate and environmental monitoring systems

    NASA Astrophysics Data System (ADS)

    Yakunin, A. G.; Hussein, H. M.

    2018-01-01

    The article shows how known statistical methods, widely used for financial problems and in a number of other fields of science and technology, can, after minor modification, be effectively applied in climate and environmental monitoring systems to problems such as detecting anomalies in the form of abrupt changes in signal level, positive and negative outliers, and violations of the cycle form in periodic processes.

  18. Analyzing contentious relationships and outlier genes in phylogenomics.

    PubMed

    Walker, Joseph F; Brown, Joseph W; Smith, Stephen A

    2018-06-08

    Recent studies have demonstrated that conflict is common among gene trees in phylogenomic studies, and that less than one percent of genes may ultimately drive species tree inference in supermatrix analyses. Here, we examined two datasets where supermatrix and coalescent-based species trees conflict. We identified two highly influential "outlier" genes in each dataset. When removed from each dataset, the inferred supermatrix trees matched the topologies obtained from coalescent analyses. We also demonstrate that, while the outlier genes in the vertebrate dataset have been shown in a previous study to be the result of errors in orthology detection, the outlier genes from a plant dataset did not exhibit any obvious systematic error and therefore may be the result of some biological process yet to be determined. While topological comparisons among a small set of alternate topologies can be helpful in discovering outlier genes, they can be limited in several ways, such as assuming all genes share the same topology. Coalescent species tree methods relax this assumption but do not explicitly facilitate the examination of specific edges. Coalescent methods often also assume that conflict is the result of incomplete lineage sorting (ILS). Here we explored a framework that allows for quickly examining alternative edges and support for large phylogenomic datasets that does not assume a single topology for all genes. For both datasets, these analyses provided detailed results confirming the support for coalescent-based topologies. This framework suggests that we can improve our understanding of the underlying signal in phylogenomic datasets by asking more targeted edge-based questions.

  19. The Outlier Detection for Ordinal Data Using Scalling Technique of Regression Coefficients

    NASA Astrophysics Data System (ADS)

    Adnan, Arisman; Sugiarto, Sigit

    2017-06-01

    The aim of this study is to detect outliers by using the coefficients of Ordinal Logistic Regression (OLR) for the case of k category responses, where scores run from 1 (the best) to 8 (the worst). We detect them by using the sum of moduli of the ordinal regression coefficients calculated by the jackknife technique. This technique is improved by scaling the regression coefficients to their means. The R language has been used on a set of ordinal data from a reference distribution. Furthermore, we compare this approach with studentised residual plots of the jackknife technique for ANOVA (Analysis of Variance) and OLR. This study shows that the jackknife technique, along with proper scaling, may reveal outliers in ordinal regression reasonably well.

  20. Using a Linear Regression Method to Detect Outliers in IRT Common Item Equating

    ERIC Educational Resources Information Center

    He, Yong; Cui, Zhongmin; Fang, Yu; Chen, Hanwei

    2013-01-01

    Common test items play an important role in equating alternate test forms under the common item nonequivalent groups design. When the item response theory (IRT) method is applied in equating, inconsistent item parameter estimates among common items can lead to large bias in equated scores. It is prudent to evaluate inconsistency in parameter…

  1. Supervised Outlier Detection in Large-Scale Mvs Point Clouds for 3d City Modeling Applications

    NASA Astrophysics Data System (ADS)

    Stucker, C.; Richard, A.; Wegner, J. D.; Schindler, K.

    2018-05-01

    We propose to use a discriminative classifier for outlier detection in large-scale point clouds of cities generated via multi-view stereo (MVS) from densely acquired images. What makes outlier removal hard are the varying distributions of inliers and outliers across a scene. Heuristic outlier removal using a specific feature that encodes point distribution often delivers unsatisfying results. Although most outliers can be identified correctly (high recall), many inliers are erroneously removed (low precision), too. This aggravates 3D object reconstruction due to missing data. We thus propose to discriminatively learn class-specific distributions directly from the data to achieve high precision. We apply a standard Random Forest classifier that infers a binary label (inlier or outlier) for each 3D point in the raw, unfiltered point cloud and test two approaches for training. In the first, non-semantic approach, features are extracted without considering the semantic interpretation of the 3D points. The trained model approximates the average distribution of inliers and outliers across all semantic classes. Second, semantic interpretation is incorporated into the learning process, i.e. we train separate inlier-outlier classifiers per semantic class (building facades, roofs, ground, vegetation, fields, and water). The performance of learned filtering is evaluated on several large SfM point clouds of cities. The results confirm our underlying assumption that discriminatively learning inlier-outlier distributions does improve precision over global heuristics by up to about 12 percentage points. Moreover, semantically informed filtering that models class-specific distributions further improves precision by up to about 10 percentage points, being able to remove very isolated building, roof, and water points while preserving inliers on building facades and vegetation.

  2. Outlier identification and visualization for Pb concentrations in urban soils and its implications for identification of potential contaminated land.

    PubMed

    Zhang, Chaosheng; Tang, Ya; Luo, Lin; Xu, Weilin

    2009-11-01

    Outliers in urban soil geochemical databases may imply potential contaminated land. Different methodologies which can be easily implemented for the identification of global and spatial outliers were applied to Pb concentrations in urban soils of Galway City in Ireland. Due to the strongly skewed probability distribution of the data, a Box-Cox transformation was performed prior to further analyses. The graphic methods of the histogram and the box-and-whisker plot were effective in identifying global outliers at the original scale of the dataset. Spatial outliers could be identified by a local indicator of spatial association (local Moran's I), cross-validation of kriging, and geographically weighted regression. The spatial locations of outliers were visualised using a geographical information system. Different methods showed generally consistent results, but differences existed. It is suggested that outliers identified by statistical methods should be confirmed and justified using scientific knowledge before they are properly dealt with.

  3. Risk adjustment in the American College of Surgeons National Surgical Quality Improvement Program: a comparison of logistic versus hierarchical modeling.

    PubMed

    Cohen, Mark E; Dimick, Justin B; Bilimoria, Karl Y; Ko, Clifford Y; Richards, Karen; Hall, Bruce Lee

    2009-12-01

    Although logistic regression has commonly been used to adjust for risk differences in patient and case mix to permit quality comparisons across hospitals, hierarchical modeling has been advocated as the preferred methodology, because it accounts for clustering of patients within hospitals. It is unclear whether hierarchical models would yield important differences in quality assessments compared with logistic models when applied to American College of Surgeons (ACS) National Surgical Quality Improvement Program (NSQIP) data. Our objective was to evaluate differences in logistic versus hierarchical modeling for identifying hospitals with outlying outcomes in the ACS-NSQIP. Data from ACS-NSQIP patients who underwent colorectal operations in 2008 at hospitals that reported at least 100 operations were used to generate logistic and hierarchical prediction models for 30-day morbidity and mortality. Differences in risk-adjusted performance (ratio of observed-to-expected events) and outlier detections from the two models were compared. Logistic and hierarchical models identified the same 25 hospitals as morbidity outliers (14 low and 11 high outliers), but the hierarchical model identified 2 additional high outliers. Both models identified the same eight hospitals as mortality outliers (five low and three high outliers). The values of observed-to-expected events ratios and p values from the two models were highly correlated. Results were similar when data from hospitals providing fewer than 100 patients were included. When applied to ACS-NSQIP data, logistic and hierarchical models provided nearly identical results with respect to identification of hospitals' observed-to-expected events ratio outliers. As hierarchical models are prone to implementation problems, logistic regression will remain an accurate and efficient method for performing risk adjustment of hospital quality comparisons.

  4. Measurement Consistency from Magnetic Resonance Images

    PubMed Central

    Chung, Dongjun; Chung, Moo K.; Durtschi, Reid B.; Lindell, R. Gentry; Vorperian, Houri K.

    2010-01-01

    Rationale and Objectives In quantifying medical images, length-based measurements are still obtained manually. Due to possible human error, a measurement protocol is required to guarantee the consistency of measurements. In this paper, we review various statistical techniques that can be used in determining measurement consistency. The focus is on detecting a possible measurement bias and determining the robustness of the procedures to outliers. Materials and Methods We review correlation analysis, linear regression, the Bland-Altman method, the paired t-test, and analysis of variance (ANOVA). These techniques were applied to measurements, obtained by two raters, of head and neck structures from magnetic resonance images (MRI). Results The correlation analysis and the linear regression were shown to be insufficient for detecting measurement inconsistency. They are also very sensitive to outliers. The widely used Bland-Altman method is a visualization technique, so it lacks numerical quantification. The paired t-test tends to be sensitive to small measurement bias. On the other hand, ANOVA performs well even under small measurement bias. Conclusion In almost all cases, using only one method is insufficient and it is recommended to use several methods simultaneously. In general, ANOVA performs the best. PMID:18790405
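    A minimal sketch of two of the reviewed techniques, the Bland-Altman quantities and the paired t-test, assuming SciPy; the rater measurements below are hypothetical.

    ```python
    import numpy as np
    from scipy import stats

    # Hypothetical length measurements (mm) of one structure by two raters.
    rater1 = np.array([10.2, 11.5, 9.8, 12.1, 10.9, 11.2, 10.4, 11.8])
    rater2 = np.array([10.5, 11.3, 10.1, 12.4, 11.0, 11.6, 10.2, 12.0])

    diff = rater1 - rater2
    means = (rater1 + rater2) / 2                 # x-axis of a Bland-Altman plot
    bias = diff.mean()                            # systematic difference
    loa = bias + np.array([-1.96, 1.96]) * diff.std(ddof=1)   # limits of agreement
    t_stat, p_value = stats.ttest_rel(rater1, rater2)          # paired t-test
    print(f"bias = {bias:.3f} mm, limits of agreement = {loa}, p = {p_value:.3f}")
    ```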

  5. Real-time detection of organic contamination events in water distribution systems by principal components analysis of ultraviolet spectral data.

    PubMed

    Zhang, Jian; Hou, Dibo; Wang, Ke; Huang, Pingjie; Zhang, Guangxin; Loáiciga, Hugo

    2017-05-01

    The detection of organic contaminants in water distribution systems is essential to protect public health from potentially harmful compounds resulting from accidental spills or intentional releases. Existing methods for detecting organic contaminants are based on quantitative analyses such as chemical testing and gas/liquid chromatography, which are time- and reagent-consuming and involve costly maintenance. This study proposes a novel procedure based on the discrete wavelet transform and principal component analysis for detecting organic contamination events from ultraviolet spectral data. Firstly, the spectrum of each observation is transformed using a discrete wavelet transform with a coiflet mother wavelet to capture abrupt changes along the wavelength axis. Principal component analysis is then employed to approximate the spectra based on feature capture and fusion. Hotelling's T2 statistic is calculated, and its significant values are used to detect outliers. A contamination event alarm is triggered by sequential Bayesian analysis when outliers appear continuously in several observations. The effectiveness of the proposed procedure is tested on-line using a pilot-scale setup and experimental data.
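    A minimal sketch of the detection core under stated assumptions: the wavelet step and the sequential Bayesian alarm are omitted, the spectra are synthetic stand-ins, and an observation is flagged when its Hotelling's T^2 in the leading principal components exceeds an empirical control limit.

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    spectra = rng.normal(size=(200, 50))      # stand-in UV absorbance spectra
    spectra[190:] += 5.0                      # simulated contamination events

    # PCA fitted on the presumed-clean history (first 180 observations).
    mu = spectra[:180].mean(axis=0)
    U, S, Vt = np.linalg.svd(spectra[:180] - mu, full_matrices=False)
    k = 5
    P = Vt[:k].T                              # loadings of the leading components
    var = S[:k]**2 / (180 - 1)                # variance captured by each component

    # Hotelling's T^2 for every observation; alarm above an empirical limit.
    scores = (spectra - mu) @ P
    T2 = np.sum(scores**2 / var, axis=1)
    limit = np.percentile(T2[:180], 99)       # 99% control limit from clean data
    print(np.where(T2 > limit)[0])            # expected to flag rows 190-199
    ```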

  6. Outlier Detection in Infrared Signatures

    DTIC Science & Technology

    1992-01-01

    for model identification. Gnanadesikan (1977) pointed out that Hampel's influence function (Hampel (1974)) can be used to estimate the effect...individual outliers have on sample estimates of parameters. Chernick noted that the influence function for parameters of interest to the users of a data...important outliers, while those with small estimated influence are not). In this way the influence function provides a "distance" measure for multi

  7. Unsupervised universal steganalyzer for high-dimensional steganalytic features

    NASA Astrophysics Data System (ADS)

    Hou, Xiaodan; Zhang, Tao

    2016-11-01

    The research in developing steganalytic features has been highly successful. These features are extremely powerful when applied to supervised binary classification problems. However, they are incompatible with unsupervised universal steganalysis because the unsupervised method cannot distinguish embedding distortion from the varying levels of noise caused by cover variation. This study attempts to alleviate the problem by introducing similarity retrieval of image statistical properties (SRISP), with the specific aim of mitigating the effect of cover variation on the existing steganalytic features. First, cover images with some statistical properties similar to those of a given test image are retrieved from a cover database to establish an aided sample set. Then, unsupervised outlier detection is performed on a test set composed of the given test image and its aided sample set to determine the type (cover or stego) of the given test image. Our proposed framework, called SRISP-aided unsupervised outlier detection, requires no training; thus, it does not suffer from model mismatch. Compared with prior unsupervised outlier detectors that do not consider SRISP, the proposed framework not only retains universality but also exhibits superior performance when applied to high-dimensional steganalytic features.

  8. Atomic absorption spectrophotometric determination of tin in canned foods, using nitric acid-hydrochloric acid digestion and nitrous oxide-acetylene flame: collaborative study.

    PubMed

    Dabeka, R W; McKenzie, A D; Albert, R H

    1985-01-01

    Twenty-six collaborators participated in a study to evaluate an atomic absorption spectrophotometric (AAS) method for the determination of tin in canned foods. The 5 foods evaluated were meat, pineapple juice, tomato paste, evaporated milk, and green beans, each spiked at 2 levels. The concentration range of tin in the samples was 10-450 micrograms/g, and each level was sent as a blind duplicate. Statistical treatment of results revealed no laboratory outliers and 6 individual or replicate-total outliers, accounting for 3.3% of the data. Repeatability (within-laboratory) coefficient of variation (CVo) ranged from 2.2 to 48%, depending on the tin level and food evaluated. For samples containing greater than or equal to 80 micrograms/g of tin, repeatability CV averaged 5.6% including outliers and 3.7% after their rejection. Overall among-laboratories coefficient of variation (CVx) varied from 3.3 to 58%; at levels greater than or equal to 80 micrograms/g, it averaged 7.3% with outliers and 5.3% after their rejection. Recovery of tin, based on spiking levels, ranged from 100.0 to 112.8% and averaged 105.4%. Detection limit range is 2-10 micrograms/g, and lower quantitation limit is 40 micrograms/g. This method has been adopted official first action.

  9. Detection of Outliers in TWSTFT Data Used in TAI

    DTIC Science & Technology

    2009-11-01

    ...data in two-way satellite time and frequency transfer (TWSTFT) time links. In the case of TWSTFT data used to calculate International Atomic Time...data; that TWSTFT links can show an underlying slope which renders the standard treatment more difficult. Using phase and frequency filtering

  10. Automated Detection of Knickpoints and Knickzones Across Transient Landscapes

    NASA Astrophysics Data System (ADS)

    Gailleton, B.; Mudd, S. M.; Clubb, F. J.

    2017-12-01

    Mountainous regions are ubiquitously dissected by river channels, which transmit climate and tectonic signals to the rest of the landscape by adjusting their long profiles. Fluvial response to allogenic forcing is often expressed through the upstream propagation of steepened reaches, referred to as knickpoints or knickzones. The identification and analysis of these steepened reaches has numerous applications in geomorphology, such as modelling long-term landscape evolution, understanding controls on fluvial incision, and constraining tectonic uplift histories. Traditionally, the identification of knickpoints or knickzones from fluvial profiles requires manual selection or calibration. This process is both time-consuming and subjective, as different workers may select different steepened reaches within the profile. We propose an objective, statistically-based method to systematically pick knickpoints/knickzones on a landscape scale using an outlier-detection algorithm. Our method integrates river profiles normalised by drainage area (Chi, using the approach of Perron and Royden, 2013), then separates the chi-elevation plots into a series of transient segments using the method of Mudd et al. (2014). This method allows the systematic detection of knickpoints across a DEM, regardless of size, using a high-performance algorithm implemented in the open-source Edinburgh Land Surface Dynamics Topographic Tools (LSDTopoTools) software package. After initial knickpoint identification, outliers are selected using several sorting and binning methods based on the Median Absolute Deviation, to avoid the influence of sample size. We test our method on a series of DEMs and grid resolutions, and show that it consistently identifies accurate knickpoint locations across each landscape tested.
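    A minimal sketch of Median Absolute Deviation (MAD) outlier selection of the kind described, with hypothetical per-reach steepness anomalies; the cutoff multiplier k is an illustrative choice.

    ```python
    import numpy as np

    def mad_outliers(values, k=3.0):
        """Flag values more than k robust standard deviations from the median."""
        med = np.median(values)
        mad = 1.4826 * np.median(np.abs(values - med))   # consistent with sigma
        return np.abs(values - med) > k * mad

    # Hypothetical steepness anomalies for candidate knickpoint reaches.
    steepness = np.array([1.1, 0.9, 1.0, 1.2, 0.8, 6.5, 1.05, 0.95])
    print(mad_outliers(steepness))    # only the 6.5 reach is flagged
    ```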

  11. New approach for the identification of implausible values and outliers in longitudinal childhood anthropometric data.

    PubMed

    Shi, Joy; Korsiak, Jill; Roth, Daniel E

    2018-03-01

    We aimed to demonstrate the use of jackknife residuals to take advantage of the longitudinal nature of available growth data in assessing potential biologically implausible values and outliers. Artificial errors were induced in 5% of length, weight, and head circumference measurements, measured on 1211 participants from the Maternal Vitamin D for Infant Growth (MDIG) trial from birth to 24 months of age. Each child's sex- and age-standardized z-score or raw measurements were regressed as a function of age in child-specific models. Each error responsible for a biologically implausible decrease between a consecutive pair of measurements was identified based on the higher of the two absolute values of jackknife residuals in each pair. In further analyses, outliers were identified as those values beyond fixed cutoffs of the jackknife residuals (e.g., greater than +5 or less than -5 in primary analyses). Kappa, sensitivity, and specificity were calculated over 1000 simulations to assess the ability of the jackknife residual method to detect induced errors and to compare these methods with the use of conditional growth percentiles and conventional cross-sectional methods. Among the induced errors that resulted in a biologically implausible decrease in measurement between two consecutive values, the jackknife residual method identified the correct value in 84.3%-91.5% of these instances when applied to the sex- and age-standardized z-scores, with kappa values ranging from 0.685 to 0.795. Sensitivity and specificity of the jackknife method were higher than those of the conditional growth percentile method, but specificity was lower than for conventional cross-sectional methods. Using jackknife residuals provides a simple method to identify biologically implausible values and outliers in longitudinal child growth data sets in which each child contributes at least 4 serial measurements. Crown Copyright © 2018. Published by Elsevier Inc. All rights reserved.
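    A minimal sketch of the jackknife-residual idea for one child's series, under stated assumptions: z-scores are regressed on age, each point is predicted from a fit that excludes it, and the externally studentized residual is compared with the ±5 cutoff mentioned above; the measurements are hypothetical, with an error induced at 9 months.

    ```python
    import numpy as np

    age = np.array([0.0, 2.0, 4.0, 6.0, 9.0, 12.0, 18.0, 24.0])   # months
    z = np.array([0.1, 0.2, 0.15, 0.3, -2.9, 0.35, 0.4, 0.5])     # error at 9 mo
    X = np.column_stack([np.ones_like(age), age])                 # child-specific model

    def jackknife_residuals(X, y):
        """Externally studentized residual for each observation."""
        p, r = X.shape[1], np.empty(len(y))
        for i in range(len(y)):
            keep = np.arange(len(y)) != i
            coef, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
            s2 = np.sum((y[keep] - X[keep] @ coef) ** 2) / (keep.sum() - p)
            h = X[i] @ np.linalg.inv(X[keep].T @ X[keep]) @ X[i]  # leverage of left-out point
            r[i] = (y[i] - X[i] @ coef) / np.sqrt(s2 * (1 + h))
        return r

    r = jackknife_residuals(X, z)
    print(np.where(np.abs(r) > 5)[0])   # flags the implausible 9-month value (index 4)
    ```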

  12. Cross-visit tumor sub-segmentation and registration with outlier rejection for dynamic contrast-enhanced MRI time series data.

    PubMed

    Buonaccorsi, G A; Rose, C J; O'Connor, J P B; Roberts, C; Watson, Y; Jackson, A; Jayson, G C; Parker, G J M

    2010-01-01

    Clinical trials of anti-angiogenic and vascular-disrupting agents often use biomarkers derived from DCE-MRI, typically reporting whole-tumor summary statistics and thereby overlooking spatial parameter variations caused by tissue heterogeneity. We present a data-driven segmentation method comprising tracer-kinetic model-driven registration for motion correction, conversion from MR signal intensity to contrast agent concentration for cross-visit normalization, iterative principal components analysis for imputation of missing data and dimensionality reduction, and statistical outlier detection using the minimum covariance determinant to obtain a robust Mahalanobis distance. After applying these techniques we cluster in the principal components space using k-means. We present results from a clinical trial of a VEGF inhibitor, using time-series data selected because of known problems with motion and outlier time series. We obtained spatially contiguous clusters that map to regions with distinct microvascular characteristics. This methodology has the potential to uncover localized effects in trials using DCE-MRI-based biomarkers.
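
    The robust-distance step can be sketched with scikit-learn's MinCovDet. This is a generic illustration of MCD-based Mahalanobis outlier scoring, not the authors' pipeline; the toy data and the 97.5% chi-square cutoff are common conventions assumed here:

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))      # stand-in for per-voxel feature vectors
X[:10] += 6.0                      # inject a few gross outliers

mcd = MinCovDet(random_state=0).fit(X)
d2 = mcd.mahalanobis(X)            # squared robust Mahalanobis distances
outliers = d2 > chi2.ppf(0.975, df=X.shape[1])   # chi-square cutoff
```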

  13. Robustly detecting differential expression in RNA sequencing data using observation weights

    PubMed Central

    Zhou, Xiaobei; Lindsay, Helen; Robinson, Mark D.

    2014-01-01

    A popular approach for comparing gene expression levels between (replicated) conditions of RNA sequencing data relies on counting reads that map to features of interest. Within such count-based methods, many flexible and advanced statistical approaches now exist and offer the ability to adjust for covariates (e.g. batch effects). Often, these methods include some sort of ‘sharing of information’ across features to improve inferences in small samples. It is important to achieve an appropriate tradeoff between statistical power and protection against outliers. Here, we study the robustness of existing approaches for count-based differential expression analysis and propose a new strategy based on observation weights that can be used within existing frameworks. The results suggest that outliers can have a global effect on differential analyses. We demonstrate the effectiveness of our new approach with real data and simulated data that reflect properties of real datasets (e.g. the dispersion-mean trend) and develop an extensible framework for comprehensive testing of current and future methods. In addition, we explore the origin of such outliers, in some cases highlighting additional biological or technical factors within the experiment. Further details can be downloaded from the project website: http://imlspenticton.uzh.ch/robinson_lab/edgeR_robust/. PMID:24753412

  14. A Geometrical-Statistical Approach to Outlier Removal for TDOA Measurements

    NASA Astrophysics Data System (ADS)

    Compagnoni, Marco; Pini, Alessia; Canclini, Antonio; Bestagini, Paolo; Antonacci, Fabio; Tubaro, Stefano; Sarti, Augusto

    2017-08-01

    The curse of outlier measurements in estimation problems is a well-known issue in a variety of fields. Outlier removal procedures, which enable the identification of spurious measurements within a set, have therefore been developed for many different scenarios and applications. In this paper, we propose a statistically motivated outlier removal algorithm for time differences of arrival (TDOAs), or equivalently range differences (RDs), acquired at sensor arrays. The method exploits the TDOA-space formalism and requires knowledge only of the relative sensor positions. As the proposed method is completely independent of the application for which the measurements are used, it can reliably identify outliers within a set of TDOA/RD measurements in different fields (e.g. acoustic source localization, sensor synchronization, radar, remote sensing, etc.). The proposed outlier removal algorithm is validated by means of synthetic simulations and real experiments.

  15. Outlier Detection in High-Stakes Certification Testing. Research Report.

    ERIC Educational Resources Information Center

    Meijer, Rob R.

    Recent developments of person-fit analysis in computerized adaptive testing (CAT) are discussed. Methods from statistical process control are presented that have been proposed to classify an item score pattern as fitting or misfitting the underlying item response theory (IRT) model in a CAT. Most person-fit research in CAT is restricted to…

  16. Outlier Detection in High-Stakes Certification Testing.

    ERIC Educational Resources Information Center

    Meijer, Rob R.

    2002-01-01

    Used empirical data from a certification test to study methods from statistical process control that have been proposed to classify an item score pattern as fitting or misfitting the underlying item response theory model in computerized adaptive testing. Results for 1,392 examinees show that different types of misfit can be distinguished. (SLD)

  17. Evaluation of statistical protocols for quality control of ecosystem carbon dioxide fluxes

    Treesearch

    Jorge F. Perez-Quezada; Nicanor Z. Saliendra; William E. Emmerich; Emilio A. Laca

    2007-01-01

    The process of quality control of micrometeorological and carbon dioxide (CO2) flux data can be subjective and may lack repeatability, which would undermine the results of many studies. Multivariate statistical methods and time series analysis were used together and independently to detect and replace outliers in CO2 flux...

  18. A feasibility study in adapting Shamos Bickel and Hodges Lehman estimator into T-Method for normalization

    NASA Astrophysics Data System (ADS)

    Harudin, N.; Jamaludin, K. R.; Muhtazaruddin, M. Nabil; Ramlie, F.; Muhamad, Wan Zuki Azman Wan

    2018-03-01

    T-Method is one of the techniques governed by the Mahalanobis Taguchi System, developed specifically for multivariate data prediction. Prediction using the T-Method is possible even with very limited sample sizes. Users of the T-Method must clearly understand the trend of the population data, since the method does not account for the effect of outliers within it. Outliers may cause apparent non-normality, under which classical methods break down. Robust parameter estimates exist that provide satisfactory results when the data contain outliers, as well as when the data are free of them; among these are the robust location and scale estimators Shamos-Bickel (SB) and Hodges-Lehmann (HL), which serve as robust counterparts to the classical mean and standard deviation. Embedding these into the T-Method normalization stage may feasibly enhance the accuracy of the T-Method and allows the robustness of the T-Method itself to be analysed. However, the results of the higher-sample-size case study show that the T-Method has the lowest average error percentage (3.09%) on data with extreme outliers, while HL and SB have the lowest error percentage (4.67%) on data without extreme outliers, with minimal difference from the T-Method. The trend in prediction error is reversed in the lower-sample-size case study. The results show that with a minimal sample size, where outliers pose low risk, the T-Method performs better, and with a higher sample size containing extreme outliers the T-Method likewise predicts better than the alternatives. For the case studies conducted in this research, normalization using the T-Method gives satisfactory results, and it is not worthwhile to adapt HL and SB (or the ordinary mean and standard deviation) into it, since doing so changes the error percentages only minimally. Normalization using the T-Method is still considered to carry lower risk from outlier effects.
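
    For concreteness, a sketch of the two robust estimators named above, assuming their usual definitions (Hodges-Lehmann as the median of pairwise Walsh averages; the Shamos scale as the median of pairwise absolute differences, optionally rescaled for consistency with the normal standard deviation):

```python
import numpy as np

def hodges_lehmann(x):
    """Robust location: median of the pairwise (Walsh) averages."""
    x = np.asarray(x, dtype=float)
    i, j = np.triu_indices(len(x))          # pairs with i <= j
    return float(np.median((x[i] + x[j]) / 2.0))

def shamos(x, consistent=True):
    """Robust scale: median of pairwise absolute differences; the
    ~1.048 factor (approximate) rescales it to estimate the normal
    standard deviation."""
    x = np.asarray(x, dtype=float)
    i, j = np.triu_indices(len(x), k=1)     # pairs with i < j
    s = float(np.median(np.abs(x[i] - x[j])))
    return 1.048 * s if consistent else s
```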

  19. Simple automatic strategy for background drift correction in chromatographic data analysis.

    PubMed

    Fu, Hai-Yan; Li, He-Dong; Yu, Yong-Jie; Wang, Bing; Lu, Peng; Cui, Hua-Peng; Liu, Ping-Ping; She, Yuan-Bin

    2016-06-03

    Chromatographic background drift correction, which influences peak detection and time-shift alignment results, is a critical stage in chromatographic data analysis. In this study, an automatic background drift correction methodology was developed. Local minimum values in a chromatogram were initially detected and organized as a new baseline vector. Iterative optimization was then employed to recognize outliers in this vector, which belong to the chromatographic peaks, and to update the outliers in the baseline until convergence. The optimized baseline vector was finally expanded into the original chromatogram, and linear interpolation was employed to estimate background drift in the chromatogram. The principle underlying the proposed method was confirmed using a complex gas chromatographic dataset. Finally, the proposed approach was applied to eliminate background drift in liquid chromatography quadrupole time-of-flight data from a metabolic study of Escherichia coli samples. The proposed method was comparable with three classical techniques: morphological weighted penalized least squares, the moving-window minimum value strategy, and background drift correction by orthogonal subspace projection. The proposed method allows almost automatic implementation of background drift correction, which is convenient for practical use. Copyright © 2016 Elsevier B.V. All rights reserved.
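
    A compact sketch of the strategy as described (local minima as an initial baseline vector, iterative outlier replacement, then linear interpolation back to full length). The window order, the z cutoff, and the mean-replacement update are illustrative assumptions, not the paper's tuned procedure:

```python
import numpy as np
from scipy.signal import argrelmin

def baseline_correct(y, z=3.0, order=3, max_iter=50):
    """Local-minima baseline with iterative outlier replacement,
    then linear interpolation back to the full chromatogram."""
    y = np.asarray(y, dtype=float)
    idx = argrelmin(y, order=order)[0]      # candidate baseline points
    base = y[idx].copy()
    for _ in range(max_iter):
        mu, sd = base.mean(), base.std()
        bad = base > mu + z * sd            # minima sitting on peaks
        if not bad.any():
            break
        base[bad] = mu                      # pull outliers toward baseline
    drift = np.interp(np.arange(len(y)), idx, base)
    return y - drift
```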

  20. The effectiveness of robust RMCD control chart as outliers’ detector

    NASA Astrophysics Data System (ADS)

    Darmanto; Astutik, Suci

    2017-12-01

    A well-known control chart for monitoring a multivariate process is Hotelling’s T², whose parameters are classically estimated; it is very sensitive to, and marred by, the masking and swamping effects of outliers in the data. To overcome this situation, robust estimators are strongly recommended. One such robust estimator is the re-weighted minimum covariance determinant (RMCD), which has the same robustness characteristics as the MCD. In this paper, effectiveness means the accuracy of the RMCD control chart in detecting outliers as real outliers; in other words, how effectively this control chart can identify and remove the masking and swamping effects of outliers. We assessed the effectiveness of the robust control chart by simulation, considering different scenarios: sample size n, proportion of outliers, and number of quality characteristics p. We found that in some scenarios this RMCD robust control chart works effectively.

  1. A Comparison of Best Fit Lines for Data with Outliers

    ERIC Educational Resources Information Center

    Glaister, P.

    2005-01-01

    Three techniques for determining a straight line fit to data are compared. The methods are applied to a range of datasets containing one or more outliers, and to a specific example from the field of chemistry. For the method which is the most resistant to the presence of outliers, a Microsoft Excel spreadsheet, as well as two Matlab routines, are…

  2. On damage detection in wind turbine gearboxes using outlier analysis

    NASA Astrophysics Data System (ADS)

    Antoniadou, Ifigeneia; Manson, Graeme; Dervilis, Nikolaos; Staszewski, Wieslaw J.; Worden, Keith

    2012-04-01

    The proportion of worldwide installed wind power in power systems increases over the years as a result of the steadily growing interest in renewable energy sources. Still, the advantages offered by the use of wind power are overshadowed by the high operational and maintenance costs, resulting in the low competitiveness of wind power in the energy market. In order to reduce the costs of corrective maintenance, the application of condition monitoring to gearboxes becomes highly important, since gearboxes are among the wind turbine components with the most frequent failure observations. While condition monitoring of gearboxes in general is common practice, with various methods having been developed over the last few decades, wind turbine gearbox condition monitoring faces a major challenge: the detection of faults under the time-varying load conditions prevailing in wind turbine systems. Classical time and frequency domain methods fail to detect faults under variable load conditions, due to the temporary effect that these faults have on vibration signals. This paper uses the statistical discipline of outlier analysis for the damage detection of gearbox tooth faults. A simplified two-degree-of-freedom gearbox model considering nonlinear backlash, time-periodic mesh stiffness and static transmission error, simulates the vibration signals to be analysed. Local stiffness reduction is used for the simulation of tooth faults and statistical processes determine the existence of intermittencies. The lowest level of fault detection, the threshold value, is considered and the Mahalanobis squared-distance is calculated for the novelty detection problem.

  3. A subagging regression method for estimating the qualitative and quantitative state of groundwater

    NASA Astrophysics Data System (ADS)

    Jeong, Jina; Park, Eungyu; Han, Weon Shik; Kim, Kue-Young

    2017-08-01

    A subsample aggregating (subagging) regression (SBR) method for the analysis of groundwater data, addressing the uncertainty associated with trend estimation, is proposed. The SBR method is validated against synthetic data in competition with other conventional robust and non-robust methods. The results verify that the estimation accuracies of the SBR method are consistent and superior to those of the other methods, and that its uncertainties are reasonably estimated, whereas the other methods offer no uncertainty analysis option. For further validation, actual groundwater data are employed and analyzed comparatively with Gaussian process regression (GPR). For all cases, the trend and the associated uncertainties are reasonably estimated by both SBR and GPR, regardless of Gaussian or non-Gaussian skewed data. However, GPR is expected to be limited in applications to data severely corrupted by outliers, owing to its non-robustness. From these implementations, it is concluded that the SBR method has the potential to be further developed as an effective tool for anomaly detection or outlier identification in groundwater state data such as groundwater level and contaminant concentration.
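
    A minimal sketch of subagging applied to a linear trend, under the assumption of half-size subsamples drawn without replacement; the SBR method itself is more elaborate, but the aggregation idea is the same:

```python
import numpy as np

def subagging_trend(t, y, n_boot=200, frac=0.5, seed=0):
    """Subsample-aggregating (subagging) linear trend: fit the slope
    on many half-size subsamples drawn without replacement, then
    report the ensemble median and a 95% uncertainty band."""
    t, y = np.asarray(t, float), np.asarray(y, float)
    rng = np.random.default_rng(seed)
    n, m = len(t), max(2, int(frac * len(t)))
    slopes = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.choice(n, size=m, replace=False)
        slopes[b] = np.polyfit(t[idx], y[idx], 1)[0]
    return np.median(slopes), np.percentile(slopes, [2.5, 97.5])
```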

  4. Linear regression based on Minimum Covariance Determinant (MCD) and TELBS methods on the productivity of phytoplankton

    NASA Astrophysics Data System (ADS)

    Gusriani, N.; Firdaniza

    2018-03-01

    The existence of outliers in multiple linear regression analysis causes the Gaussian assumption to be unfulfilled. If the least squares method is nevertheless applied to such data, it produces a model that cannot represent most of the data. A regression method robust against outliers is therefore needed. This paper compares the Minimum Covariance Determinant (MCD) method and the TELBS method on secondary data on the productivity of phytoplankton, which contain outliers. Based on the robust coefficient of determination, the MCD method produces a better model than the TELBS method.

  5. Optimum outlier model for potential improvement of environmental cleaning and disinfection.

    PubMed

    Rupp, Mark E; Huerta, Tomas; Cavalieri, R J; Lyden, Elizabeth; Van Schooneveld, Trevor; Carling, Philip; Smith, Philip W

    2014-06-01

    The effectiveness and efficiency of 17 housekeepers in terminal cleaning 292 hospital rooms was evaluated through adenosine triphosphate detection. A subgroup of housekeepers was identified who were significantly more effective and efficient than their coworkers. These optimum outliers may be used in performance improvement to optimize environmental cleaning.

  6. Baseline Estimation and Outlier Identification for Halocarbons

    NASA Astrophysics Data System (ADS)

    Wang, D.; Schuck, T.; Engel, A.; Gallman, F.

    2017-12-01

    The aim of this paper is to build a baseline model for halocarbons and to statistically identify outliers under specific conditions. Time series of regional CFC-11 and chloromethane measurements, taken over the last 4 years at two locations, are discussed: a monitoring station northwest of Frankfurt am Main (Germany) and the Mace Head station (Ireland). In addition to analyzing the time series of CFC-11 and chloromethane, and more importantly, a statistical approach to outlier identification is introduced in order to better estimate the baseline. A second-order polynomial plus harmonics is fitted to the CFC-11 and chloromethane mixing-ratio data. Measurements far from the fitted curve are regarded as outliers and flagged; the routine is then iterated without the flagged measurements until no additional outliers are found. Both the model fitting and the proposed outlier identification method are implemented in the programming language Python. Over the study period, CFC-11 shows a gradual downward trend, while the mixing ratios of chloromethane show a slightly upward trend. The concentration of chloromethane also has a strong seasonal variation, mostly due to the seasonal cycle of OH. The use of this statistical method has a considerable effect on the results: it efficiently identifies outliers according to the standard-deviation requirements, and after removing the outliers, the fitted curves and trend estimates are more reliable.
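
    A sketch of the fit-flag-refit loop, assuming time in decimal years, a single annual harmonic, and an illustrative 2.5-sigma flagging rule (the paper's exact harmonic terms and criteria may differ):

```python
import numpy as np

def fit_baseline(t, y, n_sigma=2.5, max_iter=20):
    """Fit a second-order polynomial plus one annual harmonic
    (t in decimal years), iteratively re-fitting with flagged
    outliers removed until the flag set stops changing."""
    t, y = np.asarray(t, float), np.asarray(y, float)
    w = 2.0 * np.pi                       # one cycle per year
    X = np.column_stack([np.ones_like(t), t, t**2,
                         np.sin(w * t), np.cos(w * t)])
    keep = np.ones(len(t), dtype=bool)
    for _ in range(max_iter):
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        resid = y - X @ beta
        new_keep = np.abs(resid) < n_sigma * resid[keep].std()
        if np.array_equal(new_keep, keep):
            break
        keep = new_keep
    return beta, ~keep                    # coefficients and outlier flags
```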

  7. Iterative Ellipsoidal Trimming.

    DTIC Science & Technology

    1980-02-11

    Iterative ellipsoidal trimming has been investigated before by other statisticians, most notably by Gnanadesikan and his coworkers. References cited include: …, Gnanadesikan, R., and Kettenring, J. R. (1975). “Robust estimation and outlier detection with correlation coefficients.” Biometrika, 62, 531-45. Duda, Richard, and Hart, Peter (1973). Pattern Classification and Scene Analysis. Wiley, New York. Gnanadesikan, R. (1977). Methods for…

  8. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nun, Isadora; Pichara, Karim; Protopapas, Pavlos

    The development of synoptic sky surveys has led to a massive amount of data for which the resources needed for analysis are beyond human capabilities. To process this information and to extract all possible knowledge, machine learning techniques become necessary. Here we present a new methodology to automatically discover unknown variable objects in large astronomical catalogs. With the aim of taking full advantage of all the information we have about known objects, our method is based on a supervised algorithm. In particular, we train a random forest classifier using known variability classes of objects and obtain votes for each of the objects in the training set. We then model this voting distribution with a Bayesian network and obtain the joint voting distribution among the training objects. Consequently, an unknown object is considered an outlier insofar as it has a low joint probability. By leaving out one of the classes in the training set, we perform a validity test and show that when the random forest classifier attempts to classify unknown light curves (the class left out), it votes with an unusual distribution among the classes. This rare voting is detected by the Bayesian network and expressed as a low joint probability. Our method is suitable for exploring massive data sets given that the training process is performed offline. We tested our algorithm on 20 million light curves from the MACHO catalog and generated a list of anomalous candidates. After analysis, we divided the candidates into two main classes of outliers: artifacts and intrinsic outliers. Artifacts were principally due to air mass variation, seasonal variation, bad calibration, or instrumental errors and were consequently removed from our outlier list and added to the training set. After retraining, we selected about 4000 objects, which we passed to a post-analysis stage by performing a cross-match with all publicly available catalogs. Within these candidates we identified certain known but rare objects such as eclipsing Cepheids, blue variables, cataclysmic variables, and X-ray sources. For some outliers there was no additional information. Among them we identified three unknown variability types and a few individual outliers that will be followed up in order to perform a deeper analysis.
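
    A much-simplified proxy for this idea: instead of modeling the joint vote distribution with a Bayesian network, the sketch below flags query objects whose random-forest vote distribution is unusually unconfident relative to the training set (function names and the 1% quantile threshold are assumptions):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def vote_outliers(X_train, y_train, X_query, quantile=0.01):
    """Train on known variability classes, then flag query objects
    whose maximum class-vote fraction falls below a low quantile of
    the training objects' own vote confidences."""
    rf = RandomForestClassifier(n_estimators=500, random_state=0)
    rf.fit(X_train, y_train)
    train_conf = rf.predict_proba(X_train).max(axis=1)
    query_conf = rf.predict_proba(X_query).max(axis=1)
    return query_conf < np.quantile(train_conf, quantile)
```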

  9. Analysis of lightning outliers in the EUCLID network

    NASA Astrophysics Data System (ADS)

    Poelman, Dieter R.; Schulz, Wolfgang; Kaltenboeck, Rudolf; Delobbe, Laurent

    2017-11-01

    Lightning data as observed by the European Cooperation for Lightning Detection (EUCLID) network are used in combination with radar data to retrieve the temporal and spatial behavior of lightning outliers, i.e., discharges located in the wrong place, over a 5-year period from 2011 to 2016. Cloud-to-ground (CG) stroke and intracloud (IC) pulse data are superimposed on corresponding 5 min radar precipitation fields in two topographically different areas, Belgium and Austria, in order to extract lightning outliers based on the distance between each lightning event and the nearest precipitation. It is shown that the percentage of outliers is sensitive to changes in the network and to the location algorithm itself. The total percentage of outliers for both regions varies over the years between 0.8 and 1.7% for a distance to the nearest precipitation of 2 km, with an average of approximately 1.2% in Belgium and Austria. Outside the European summer thunderstorm season, the percentage of outliers tends to increase somewhat. The majority of all outliers are low-peak-current events, with absolute values falling between 0 and 10 kA. More specifically, positive cloud-to-ground strokes are more likely to be classified as outliers than all other types of discharges. Furthermore, the number of sensors participating in locating a lightning discharge differs between outliers and correctly located events, with outliers involving the fewest sensors. In addition, it is shown that in most cases the semi-major axis (SMA) assigned to a lightning discharge as a confidence indicator of the location accuracy (LA) is smaller for correctly located events than for outliers.

  10. Privacy Preserving Nearest Neighbor Search

    NASA Astrophysics Data System (ADS)

    Shaneck, Mark; Kim, Yongdae; Kumar, Vipin

    Data mining is frequently obstructed by privacy concerns. In many cases data is distributed, and bringing the data together in one place for analysis is not possible due to privacy laws (e.g. HIPAA) or policies. Privacy preserving data mining techniques have been developed to address this issue by providing mechanisms to mine the data while giving certain privacy guarantees. In this chapter we address the issue of privacy preserving nearest neighbor search, which forms the kernel of many data mining applications. To this end, we present a novel algorithm based on secure multiparty computation primitives to compute the nearest neighbors of records in horizontally distributed data. We show how this algorithm can be used in three important data mining algorithms, namely LOF outlier detection, SNN clustering, and kNN classification. We prove the security of these algorithms under the semi-honest adversarial model, and describe methods that can be used to optimize their performance. Keywords: Privacy Preserving Data Mining, Nearest Neighbor Search, Outlier Detection, Clustering, Classification, Secure Multiparty Computation

  11. Detecting Anomalies from End-to-End Internet Performance Measurements (PingER) Using Cluster Based Local Outlier Factor

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ali, Saqib; Wang, Guojun; Cottrell, Roger Leslie

    PingER (Ping End-to-End Reporting) is a worldwide end-to-end Internet performance measurement framework. It was developed by the SLAC National Accelerator Laboratory, Stanford, USA, and has been running for the last 20 years. It has more than 700 monitoring agents and remote sites which monitor the performance of Internet links in around 170 countries of the world. At present, the size of the compressed PingER data set is about 60 GB, comprising 100,000 flat files. The data is publicly available for valuable Internet performance analyses. However, the data sets suffer from missing values and anomalies due to congestion, bottleneck links, queuing overflow, network software misconfiguration, hardware failure, cable cuts, and social upheavals. Therefore, the objective of this paper is to detect such performance drops or spikes, labeled as anomalies or outliers, in the PingER data set. In the proposed approach, the raw text files of the data set are transformed into a PingER dimensional model. The missing values are imputed using the k-NN algorithm. The data is partitioned into similar instances using the k-means clustering algorithm. Afterward, clustering is integrated with the Local Outlier Factor (LOF) using the Cluster Based Local Outlier Factor (CBLOF) algorithm to detect the anomalies or outliers in the PingER data. Lastly, anomalies are further analyzed to identify the time frame and location of the hosts generating the major percentage of the anomalies in the PingER data set, which ranges from 1998 to 2016.
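
    A simplified CBLOF-style scoring sketch using k-means, assuming the common large/small cluster split by a cumulative-size fraction alpha; the full algorithm scores points in large and small clusters slightly differently, which this sketch collapses for brevity:

```python
import numpy as np
from sklearn.cluster import KMeans

def cblof_scores(X, n_clusters=8, alpha=0.9):
    """Cluster with k-means, call the biggest clusters covering an
    alpha fraction of the data 'large', then score every point by
    its cluster size times its distance to the nearest large-cluster
    centroid (larger score = more anomalous)."""
    X = np.asarray(X, dtype=float)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    sizes = np.bincount(km.labels_, minlength=n_clusters)
    order = np.argsort(sizes)[::-1]
    big = order[np.cumsum(sizes[order]) <= alpha * len(X)]
    if big.size == 0:                      # one dominant cluster
        big = order[:1]
    d = np.linalg.norm(X[:, None, :] - km.cluster_centers_[big][None],
                       axis=2).min(axis=1)
    return sizes[km.labels_] * d
```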

  13. A New Secondary Structure Assignment Algorithm Using Cα Backbone Fragments

    PubMed Central

    Cao, Chen; Wang, Guishen; Liu, An; Xu, Shutan; Wang, Lincong; Zou, Shuxue

    2016-01-01

    The assignment of secondary structure elements in proteins is a key step in the analysis of their structures and functions. We have developed an algorithm, SACF (secondary structure assignment based on Cα fragments), for secondary structure element (SSE) assignment based on the alignment of Cα backbone fragments with central poses derived by clustering known SSE fragments. The assignment algorithm consists of three steps: First, the outlier fragments on known SSEs are detected. Next, the remaining fragments are clustered to obtain the central fragments for each cluster. Finally, the central fragments are used as a template to make assignments. Following a large-scale comparison of 11 secondary structure assignment methods, SACF, KAKSI and PROSS are found to have similar agreement with DSSP, while PCASSO agrees with DSSP best. SACF and PCASSO show preference to reducing residues in N and C cap regions, whereas KAKSI, P-SEA and SEGNO tend to add residues to the terminals when DSSP assignment is taken as standard. Moreover, our algorithm is able to assign subtle helices (3₁₀-helix, π-helix and left-handed helix) and make uniform assignments, as well as to detect rare SSEs in β-sheets or long helices as outlier fragments from other programs. The structural uniformity should be useful for protein structure classification and prediction, while outlier fragments underlie the structure–function relationship. PMID:26978354

  14. Anomaly Detection in Large Sets of High-Dimensional Symbol Sequences

    NASA Technical Reports Server (NTRS)

    Budalakoti, Suratna; Srivastava, Ashok N.; Akella, Ram; Turkov, Eugene

    2006-01-01

    This paper addresses the problem of detecting and describing anomalies in large sets of high-dimensional symbol sequences. The approach taken uses unsupervised clustering of sequences using the normalized longest common subsequence (LCS) as a similarity measure, followed by detailed analysis of outliers to detect anomalies. As the LCS measure is expensive to compute, the first part of the paper discusses existing algorithms, such as the Hunt-Szymanski algorithm, that have low time-complexity. We then discuss why these algorithms often do not work well in practice and present a new hybrid algorithm for computing the LCS that, in our tests, outperforms the Hunt-Szymanski algorithm by a factor of five. The second part of the paper presents new algorithms for outlier analysis that provide comprehensible indicators as to why a particular sequence was deemed to be an outlier. The algorithms provide a coherent description to an analyst of the anomalies in the sequence, compared to more normal sequences. The algorithms we present are general and domain-independent, so we discuss applications in related areas such as anomaly detection.
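
    For reference, the quadratic-time dynamic program for the LCS, normalized by sqrt(|a||b|) to give a similarity measure (a common normalization assumed here; the paper's hybrid algorithm is an optimization of this same quantity):

```python
def normalized_lcs(a, b):
    """Longest common subsequence length via the O(len(a)*len(b))
    dynamic program, normalized by sqrt(len(a)*len(b)) to give a
    similarity in [0, 1]."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n] / (m * n) ** 0.5
```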

  15. On Visualizing Mixed-Type Data: A Joint Metric Approach to Profile Construction and Outlier Detection

    ERIC Educational Resources Information Center

    Grané, Aurea; Romera, Rosario

    2018-01-01

    Survey data are usually of mixed type (quantitative, multistate categorical, and/or binary variables). Multidimensional scaling (MDS) is one of the most extended methodologies to visualize the profile structure of the data. Since the past 60s, MDS methods have been introduced in the literature, initially in publications in the psychometrics area.…

  16. Identification of unusual events in multichannel bridge monitoring data using wavelet transform and outlier analysis

    NASA Astrophysics Data System (ADS)

    Omenzetter, Piotr; Brownjohn, James M. W.; Moyo, Pilate

    2003-08-01

    Continuously operating instrumented structural health monitoring (SHM) systems are becoming a practical alternative to visual inspection for assessing the condition and soundness of civil infrastructure. However, converting the large amounts of data from an SHM system into usable information is a great challenge to which special signal processing techniques must be applied. This study is devoted to the identification of abrupt, anomalous and potentially onerous events in the time histories of static, hourly sampled strains recorded by a multi-sensor SHM system installed in a major bridge structure in Singapore and operating continuously for a long time. Such events may result, among other causes, from sudden settlement of the foundation, ground movement, excessive traffic load or failure of post-tensioning cables. A method of outlier detection in multivariate data has been applied to the problem of finding and localizing sudden events in the strain data. For sharp discrimination of abrupt strain changes from slowly varying ones, the wavelet transform has been used. The proposed method has been successfully tested using known events recorded during construction of the bridge, and later effectively used for detection of anomalous post-construction events.

  17. Meteor localization via statistical analysis of spatially temporal fluctuations in image sequences

    NASA Astrophysics Data System (ADS)

    Kukal, Jaromír.; Klimt, Martin; Šihlík, Jan; Fliegel, Karel

    2015-09-01

    Meteor detection is one of the most important procedures in astronomical imaging. A meteor path in Earth's atmosphere is traditionally reconstructed from a double-station video observation system generating 2D image sequences. However, atmospheric turbulence and other factors cause spatio-temporal fluctuations of the image background, which make the localization of the meteor path more difficult. Our approach is based on nonlinear preprocessing of image intensity using the Box-Cox transform, with the logarithmic transform as its particular case. The transformed image sequences are then differentiated along the discrete coordinates to obtain a statistical description of sky-background fluctuations, which can be modeled by a multivariate normal distribution. After verification and hypothesis testing, we use the statistical model for outlier detection. Isolated outlier points are ignored, while a compact cluster of outliers indicates the presence of a meteoroid after ignition.

  18. Open-Source Radiation Exposure Extraction Engine (RE3) with Patient-Specific Outlier Detection.

    PubMed

    Weisenthal, Samuel J; Folio, Les; Kovacs, William; Seff, Ari; Derderian, Vana; Summers, Ronald M; Yao, Jianhua

    2016-08-01

    We present an open-source, picture archiving and communication system (PACS)-integrated radiation exposure extraction engine (RE3) that provides study-, series-, and slice-specific data for automated monitoring of computed tomography (CT) radiation exposure. RE3 was built using open-source components and seamlessly integrates with the PACS. RE3 calculations of dose length product (DLP) from the Digital Imaging and Communications in Medicine (DICOM) headers showed high agreement (R² = 0.99) with the vendor dose pages. For study-specific outlier detection, RE3 constructs robust, automatically updating multivariable regression models to predict DLP in the context of patient gender and age, scan length, water-equivalent diameter (D_w), and scanned body volume (SBV). As proof of concept, the model was trained on 811 CT chest, abdomen + pelvis (CAP) exams and 29 outliers were detected. The continuous variables used in the outlier detection model were scan length (R² = 0.45), D_w (R² = 0.70), SBV (R² = 0.80), and age (R² = 0.01). The categorical variables were gender (male average 1182.7 ± 26.3 and female 1047.1 ± 26.9 mGy cm) and pediatric status (pediatric average 710.7 ± 73.6 mGy cm and adult 1134.5 ± 19.3 mGy cm).

  19. A robust ridge regression approach in the presence of both multicollinearity and outliers in the data

    NASA Astrophysics Data System (ADS)

    Shariff, Nurul Sima Mohamad; Ferdaos, Nur Aqilah

    2017-08-01

    Multicollinearity often leads to inconsistent and unreliable parameter estimates in regression analysis. The situation becomes more severe in the presence of outliers, which produce fatter tails in the error distribution than the normal distribution. The well-known procedure robust to the multicollinearity problem is the ridge regression method; this method, however, is expected to be affected by the presence of outliers because of assumptions imposed in the modeling procedure. Thus, a robust version of the existing ridge method, with modifications in the inverse matrix and the estimated response value, is introduced. The performance of the proposed method is discussed and comparisons are made with several existing estimators, namely ordinary least squares (OLS), ridge regression, and robust ridge regression based on GM-estimates. The findings of this study show that the proposed method produces reliable parameter estimates in the presence of both multicollinearity and outliers in the data.
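
    One readily available stand-in for a robust ridge (not the authors' modified estimator) is scikit-learn's HuberRegressor, which combines an outlier-insensitive loss with an L2 penalty; the collinear toy data below are illustrative:

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, Ridge

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)          # nearly collinear predictor
X = np.column_stack([x1, x2])
y = 2 * x1 + 3 * x2 + rng.normal(size=n)
y[:5] += 30.0                                # gross vertical outliers

ridge = Ridge(alpha=1.0).fit(X, y)           # handles collinearity, not outliers
huber = HuberRegressor(alpha=1.0).fit(X, y)  # L2 penalty + outlier-robust loss
print(ridge.coef_, huber.coef_)
```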

  20. An application of robust ridge regression model in the presence of outliers to real data problem

    NASA Astrophysics Data System (ADS)

    Shariff, N. S. Md.; Ferdaos, N. A.

    2017-09-01

    Multicollinearity and outliers often lead to inconsistent and unreliable parameter estimates in regression analysis. The well-known procedure robust to the multicollinearity problem is the ridge regression method; this method, however, is believed to be affected by the presence of outliers. A combination of GM-estimation and the ridge parameter that is robust to both problems is of interest in this study. As such, both techniques are employed to investigate the relationship between stock market prices and macroeconomic variables in Malaysia, a data set suspected to involve both multicollinearity and outlier problems. Four macroeconomic factors are selected for this study: the Consumer Price Index (CPI), Gross Domestic Product (GDP), Base Lending Rate (BLR) and Money Supply (M1). The results demonstrate that the proposed procedure is able to produce reliable results in the presence of multicollinearity and outliers in the real data.

  1. A practical method to detect the freezing/thawing onsets of seasonal frozen ground in Alaska

    NASA Astrophysics Data System (ADS)

    Chen, Xiyu; Liu, Lin

    2017-04-01

    Microwave remote sensing can provide useful information about the freeze/thaw state of soil at the Earth's surface. An edge-detection method is applied in this study to estimate the onsets of soil freeze/thaw state transitions using L-band spaceborne radiometer data. The Soil Moisture Active Passive (SMAP) mission carries an L-band radiometer and provides daily brightness temperature (TB) at horizontal and vertical polarizations. We use the normalized polarization ratio (NPR), calculated from the Level-1C TB product of SMAP (spatial resolution: 36 km), as the indicator of soil freeze/thaw state to estimate the freezing and thawing onsets in Alaska in 2015 and 2016. NPR is calculated from the difference between TB at vertical and horizontal polarizations; it is therefore strongly sensitive to changes in the liquid water content of the soil and independent of the soil temperature. Onset estimation is based on detecting abrupt changes of NPR in the transition seasons using the edge-detection method, and validation compares the estimated onsets with onsets derived from in situ measurements. According to this comparison, the estimated onsets were generally 15 days earlier than the measured onsets in 2015, but only about 4 days earlier on average in 2016, which may be due to thinner snow cover. Moreover, we extended our estimation to the entire state of Alaska. The estimated freeze/thaw onsets showed a reasonable latitude-dependent distribution, although some outliers remain, caused by noisy variation of NPR. Finally, we attempt to remove these outliers and improve the performance of the method by smoothing the NPR time series.
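
    A sketch of the two ingredients, assuming the common NPR definition (TBV − TBH)/(TBV + TBH) and a difference-of-moving-averages step-edge kernel for the abrupt-change search; the window length is an illustrative choice, not SMAP's operational value:

```python
import numpy as np

def npr(tb_v, tb_h):
    """Normalized polarization ratio from V/H brightness temperatures."""
    return (tb_v - tb_h) / (tb_v + tb_h)

def transition_onset(series, window=7):
    """Index of the sharpest step change, found by correlating the
    series with a step-edge kernel (difference of moving averages)."""
    kernel = np.concatenate([np.full(window, -1.0),
                             np.full(window, 1.0)]) / window
    edge = np.convolve(series, kernel, mode="valid")
    return int(np.argmax(np.abs(edge))) + window  # align to series index
```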

  2. SU-E-J-85: Leave-One-Out Perturbation (LOOP) Fitting Algorithm for Absolute Dose Film Calibration

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chu, A; Ahmad, M; Chen, Z

    2014-06-01

    Purpose: To introduce an outlier-recognizing fitting routine for film dosimetry. It is not only flexible with any linear or non-linear regression but can also provide information on the minimal number of sampling points, critical sampling distributions, and the evaluation of analytical functions for absolute film-dose calibration. Methods: The technique of leave-one-out (LOO) cross validation is often used for statistical analyses of model performance. We used LOO analyses with perturbed bootstrap fitting, called leave-one-out perturbation (LOOP), for film-dose calibration. Given a threshold, the LOO process detects unfit points (“outliers”) relative to the other cohorts, and a bootstrap fitting process follows to seek possibilities of using perturbations for further improvement. Outliers are then reconfirmed by traditional t-test statistics and eliminated, and another LOOP feedback yields the final result. An over-sampled film-dose-calibration dataset was collected as a reference (dose range: 0-800 cGy), and various simulated conditions for outliers and sampling distributions were derived from the reference. Comparisons were made over the various conditions, and the performance of fitting functions, polynomial and rational, was evaluated. Results: (1) LOOP demonstrates sensitive outlier recognition through the statistical correlation with an exceptionally better goodness-of-fit when outliers are left out. (2) With sufficient statistical information, LOOP can correct outliers under some low-sampling conditions where other “robust fits”, e.g., Least Absolute Residuals, cannot. (3) Complete cross-validated analyses of LOOP indicate that the rational function demonstrates much superior performance compared to the polynomial. Even with 5 data points including one outlier, LOOP with a rational function can restore more than 95% of the value back to its reference, while polynomial fitting completely fails under the same conditions. Conclusion: LOOP can cooperate with any fitting routine, functioning as a “robust fit”. In addition, it can be set as a benchmark for film-dose calibration fitting performance.

  3. L1 norm based common spatial patterns decomposition for scalp EEG BCI.

    PubMed

    Li, Peiyang; Xu, Peng; Zhang, Rui; Guo, Lanjin; Yao, Dezhong

    2013-08-06

    Brain-computer interfaces (BCIs) are one of the most popular branches of biomedical engineering, aiming to construct a communication channel between disabled persons and auxiliary equipment in order to improve patients' lives. In motor imagery (MI) based BCI, one of the popular feature extraction strategies is Common Spatial Patterns (CSP). In practical BCI situations, scalp EEG inevitably contains outliers and artifacts introduced by ocular activity, head motion, or loose electrode contact. Because outliers and artifacts are usually observed with large amplitudes, when CSP is solved under the L2 norm their effect is exaggerated by the squaring of residuals, which ultimately degrades MI-based BCI performance. The L1 norm lowers outlier effects, as proved in other application fields such as the EEG inverse problem and face recognition. In this paper, we present a new CSP implementation using the L1-norm technique, instead of the L2 norm, to solve the eigenproblem for spatial filter estimation, with the aim of improving the robustness of CSP to outliers. To evaluate the performance of our method, we applied it, together with standard CSP and regularized CSP with Tikhonov regularization (TR-CSP), to both a peer BCI dataset with simulated outliers and a dataset from the MI BCI system developed in our group. The McNemar test is used to investigate whether the differences among the three CSPs are statistically significant. The results on both the simulated and real BCI datasets consistently reveal that the proposed method achieves much higher classification accuracies than conventional CSP and TR-CSP. By combining L1-norm-based eigendecomposition into Common Spatial Patterns, the proposed approach can effectively improve the robustness of BCI systems to EEG outliers, and is thus promising for actual MI BCI applications, where outliers are inevitably introduced into EEG recordings.
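
    For orientation, the classical L2-norm CSP solved as a generalized eigenproblem (the paper's contribution replaces this closed-form step with an iterative L1-norm search, which is not reproduced here; shapes and filter count are illustrative):

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(trials_a, trials_b, n_filters=3):
    """Classical (L2-norm) CSP: solve the generalized eigenproblem
    of the two class-mean covariance matrices and keep the filters
    from both ends of the eigenvalue spectrum.

    trials_*: arrays of shape (n_trials, n_channels, n_samples).
    """
    def mean_cov(trials):
        covs = [x @ x.T / np.trace(x @ x.T) for x in trials]
        return np.mean(covs, axis=0)

    ca, cb = mean_cov(trials_a), mean_cov(trials_b)
    vals, vecs = eigh(ca, ca + cb)          # ascending eigenvalues
    pick = np.r_[np.arange(n_filters), np.arange(-n_filters, 0)]
    return vecs[:, pick].T                  # (2*n_filters, n_channels)
```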

  4. Single nucleotide polymorphisms unravel hierarchical divergence and signatures of selection among Alaskan sockeye salmon (Oncorhynchus nerka) populations.

    PubMed

    Gomez-Uchida, Daniel; Seeb, James E; Smith, Matt J; Habicht, Christopher; Quinn, Thomas P; Seeb, Lisa W

    2011-02-18

    Disentangling the roles of geography and ecology driving population divergence and distinguishing adaptive from neutral evolution at the molecular level have been common goals among evolutionary and conservation biologists. Using single nucleotide polymorphism (SNP) multilocus genotypes for 31 sockeye salmon (Oncorhynchus nerka) populations from the Kvichak River, Alaska, we assessed the relative roles of geography (discrete boundaries or continuous distance) and ecology (spawning habitat and timing) driving genetic divergence in this species at varying spatial scales within the drainage. We also evaluated two outlier detection methods to characterize candidate SNPs responding to environmental selection, emphasizing which mechanism(s) may maintain the genetic variation of outlier loci. For the entire drainage, Mantel tests suggested a greater role of geographic distance on population divergence than differences in spawn timing when each variable was correlated with pairwise genetic distances. Clustering and hierarchical analyses of molecular variance indicated that the largest genetic differentiation occurred between populations from distinct lakes or subdrainages. Within one population-rich lake, however, Mantel tests suggested a greater role of spawn timing than geographic distance on population divergence when each variable was correlated with pairwise genetic distances. Variable spawn timing among populations was linked to specific spawning habitats as revealed by principal coordinate analyses. We additionally identified two outlier SNPs located in the major histocompatibility complex (MHC) class II that appeared robust to violations of demographic assumptions from an initial pool of eight candidates for selection. First, our results suggest that geography and ecology have influenced genetic divergence between Alaskan sockeye salmon populations in a hierarchical manner depending on the spatial scale. Second, we found consistent evidence for diversifying selection in two loci located in the MHC class II by means of outlier detection methods; yet, alternative scenarios for the evolution of these loci were also evaluated. Both conclusions argue that historical contingency and contemporary adaptation have likely driven differentiation between Kvichak River sockeye salmon populations, as revealed by a suite of SNPs. Our findings highlight the need for conservation of complex population structure, because it provides resilience in the face of environmental change, both natural and anthropogenic.

  6. Demographically-Based Evaluation of Genomic Regions under Selection in Domestic Dogs

    PubMed Central

    Freedman, Adam H.; Schweizer, Rena M.; Ortega-Del Vecchyo, Diego; Han, Eunjung; Davis, Brian W.; Gronau, Ilan; Silva, Pedro M.; Galaverni, Marco; Fan, Zhenxin; Marx, Peter; Lorente-Galdos, Belen; Ramirez, Oscar; Hormozdiari, Farhad; Alkan, Can; Vilà, Carles; Squire, Kevin; Geffen, Eli; Kusak, Josip; Boyko, Adam R.; Parker, Heidi G.; Lee, Clarence; Tadigotla, Vasisht; Siepel, Adam; Bustamante, Carlos D.; Harkins, Timothy T.; Nelson, Stanley F.; Marques-Bonet, Tomas; Ostrander, Elaine A.; Wayne, Robert K.; Novembre, John

    2016-01-01

    Controlling for background demographic effects is important for accurately identifying loci that have recently undergone positive selection. To date, the effects of demography have not yet been explicitly considered when identifying loci under selection during dog domestication. To investigate positive selection on the dog lineage early in domestication, we examined patterns of polymorphism in six canid genomes that were previously used to infer a demographic model of dog domestication. Using the inferred demographic model, we computed false discovery rates (FDR) and identified 349 outlier regions consistent with positive selection at a low FDR. The signals in the top 100 regions were frequently centered on candidate genes related to brain function and behavior, including LHFPL3, CADM2, GRIK3, SH3GL2, MBP, PDE7B, NTAN1, and GLRA1. These regions contained significant enrichments in behavioral ontology categories. The third top hit, CCRN4L, plays a major role in lipid metabolism, which is supported by additional metabolism-related candidates revealed in our scan, including SCP2D1 and PDXC1. Comparing our method to an empirical outlier approach that does not directly account for demography, we found only modest overlap between the two methods, with 60% of empirical outliers having no overlap with our demography-based outlier detection approach. Demography-aware approaches have lower rates of false discovery. Our top candidates for selection, in addition to expanding the set of neurobehavioral candidate genes, include genes related to lipid metabolism, suggesting a dietary target of selection that was important during the period when proto-dogs hunted and fed alongside hunter-gatherers. PMID:26943675

  7. The Utility of Robust Means in Statistics

    ERIC Educational Resources Information Center

    Goodwyn, Fara

    2012-01-01

    Location estimates calculated from heuristic data were examined using traditional and robust statistical methods. The current paper demonstrates the impact outliers have on the sample mean and proposes robust methods to control for outliers in sample data. Traditional methods fail because they rely on the statistical assumptions of normality and…

  8. Pathway-based outlier method reveals heterogeneous genomic structure of autism in blood transcriptome

    PubMed Central

    2013-01-01

    Background Decades of research strongly suggest that the genetic etiology of autism spectrum disorders (ASDs) is heterogeneous. However, most published studies focus on group differences between cases and controls. In contrast, we hypothesized that the heterogeneity of the disorder could be characterized by identifying pathways for which individuals are outliers rather than pathways representative of shared group differences of the ASD diagnosis. Methods Two previously published blood gene expression data sets – the Translational Genetics Research Institute (TGen) dataset (70 cases and 60 unrelated controls) and the Simons Simplex Consortium (Simons) dataset (221 probands and 191 unaffected family members) – were analyzed. All individuals of each dataset were projected to biological pathways, and each sample’s Mahalanobis distance from a pooled centroid was calculated to compare the number of case and control outliers for each pathway. Results Analysis of a set of blood gene expression profiles from 70 ASD and 60 unrelated controls revealed three pathways whose outliers were significantly overrepresented in the ASD cases: neuron development including axonogenesis and neurite development (29% of ASD, 3% of control), nitric oxide signaling (29%, 3%), and skeletal development (27%, 3%). Overall, 50% of cases and 8% of controls were outliers in one of these three pathways, which could not be identified using group comparison or gene-level outlier methods. In an independently collected data set consisting of 221 ASD and 191 unaffected family members, outliers in the neurogenesis pathway were heavily biased towards cases (20.8% of ASD, 12.0% of control). Interestingly, neurogenesis outliers were more common among unaffected family members (Simons) than unrelated controls (TGen), but the statistical significance of this effect was marginal (Chi squared P < 0.09). Conclusions Unlike group difference approaches, our analysis identified the samples within the case and control groups that manifested each expression signal, and showed that outlier groups were distinct for each implicated pathway. Moreover, our results suggest that by seeking heterogeneity, pathway-based outlier analysis can reveal expression signals that are not apparent when considering only shared group differences. PMID:24063311

  9. A generalized Grubbs-Beck test statistic for detecting multiple potentially influential low outliers in flood series

    USGS Publications Warehouse

    Cohn, T.A.; England, J.F.; Berenbrock, C.E.; Mason, R.R.; Stedinger, J.R.; Lamontagne, J.R.

    2013-01-01

    The Grubbs-Beck test is recommended by the federal guidelines for detection of low outliers in flood flow frequency computation in the United States. This paper presents a generalization of the Grubbs-Beck test for normal data (similar to the Rosner (1983) test; see also Spencer and McCuen (1996)) that can provide a consistent standard for identifying multiple potentially influential low flows. In cases where low outliers have been identified, they can be represented as “less-than” values, and a frequency distribution can be developed using censored-data statistical techniques, such as the Expected Moments Algorithm. This approach can improve the fit of the right-hand tail of a frequency distribution and provide protection from lack-of-fit due to unimportant but potentially influential low flows (PILFs) in a flood series, thus making the flood frequency analysis procedure more robust.
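
    A sketch of the classical single-low-outlier Grubbs test on log-transformed flows, from which the generalized multiple-outlier version proceeds by sweeping over candidate values; the 10% significance level mirrors common flood-frequency practice, and the critical value uses the standard t-based formula (this is an illustration, not the paper's generalized statistic):

```python
import numpy as np
from scipy.stats import t as t_dist

def grubbs_low_outlier(flows, alpha=0.10):
    """One-sided Grubbs test for a single low outlier on
    log-transformed flows; returns True if the smallest value is a
    potentially influential low flow at significance level alpha."""
    x = np.log10(np.asarray(flows, dtype=float))
    n = len(x)
    g = (x.mean() - x.min()) / x.std(ddof=1)
    tval = t_dist.ppf(alpha / n, n - 2)            # lower-tail critical t
    gcrit = (n - 1) / np.sqrt(n) * np.sqrt(tval**2 / (n - 2 + tval**2))
    return g > gcrit
```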

  10. Outlier Detection for Patient Monitoring and Alerting

    PubMed Central

    Hauskrecht, Milos; Batal, Iyad; Valko, Michal; Visweswaran, Shyam; Cooper, Gregory F.; Clermont, Gilles

    2012-01-01

    We develop and evaluate a data-driven approach for detecting unusual (anomalous) patient-management decisions using past patient cases stored in electronic health records (EHRs). Our hypothesis is that a patient-management decision that is unusual with respect to past patient care may be due to an error and that it is worthwhile to generate an alert if such a decision is encountered. We evaluate this hypothesis using data obtained from EHRs of 4,486 post-cardiac surgical patients and a subset of 222 alerts generated from the data. We base the evaluation on the opinions of a panel of experts. The results of the study support our hypothesis that the outlier-based alerting can lead to promising true alert rates. We observed true alert rates that ranged from 25% to 66% for a variety of patient-management actions, with 66% corresponding to the strongest outliers. PMID:22944172

  11. A generalized Grubbs-Beck test statistic for detecting multiple potentially influential low outliers in flood series

    NASA Astrophysics Data System (ADS)

    Cohn, T. A.; England, J. F.; Berenbrock, C. E.; Mason, R. R.; Stedinger, J. R.; Lamontagne, J. R.

    2013-08-01

    The Grubbs-Beck test is recommended by the federal guidelines for detection of low outliers in flood flow frequency computation in the United States. This paper presents a generalization of the Grubbs-Beck test for normal data (similar to the Rosner (1983) test; see also Spencer and McCuen (1996)) that can provide a consistent standard for identifying multiple potentially influential low flows. In cases where low outliers have been identified, they can be represented as "less-than" values, and a frequency distribution can be developed using censored-data statistical techniques, such as the Expected Moments Algorithm. This approach can improve the fit of the right-hand tail of a frequency distribution and provide protection from lack-of-fit due to unimportant but potentially influential low flows (PILFs) in a flood series, thus making the flood frequency analysis procedure more robust.

  12. Adaptive population divergence and directional gene flow across steep elevational gradients in a climate‐sensitive mammal

    USGS Publications Warehouse

    Waterhouse, Matthew D.; Erb, Liesl P.; Beever, Erik; Russello, Michael A.

    2018-01-01

    The American pika is a thermally sensitive, alpine lagomorph species. Recent climate-associated population extirpations and genetic signatures of reduced population sizes range-wide indicate that the viability of this species is sensitive to climate change. To test for potential adaptive responses to climate stress, we sampled pikas along two elevational gradients (each ~470 to 1640 m) and employed three outlier detection methods, BAYESCAN, LFMM, and BAYPASS, to scan for genotype-environment associations in samples genotyped at 30,763 SNP loci. We resolved 173 loci with robust evidence of natural selection, either detected by two independent analyses or replicated in both transects. A BLASTN search of these outlier loci revealed several genes associated with metabolic function and oxygen transport, indicating natural selection from thermal stress and hypoxia. We also found evidence of directional gene flow primarily downslope from large high-elevation populations and reduced gene flow at outlier loci, a pattern suggesting potential impediments to the upward elevational movement of adaptive alleles in response to contemporary climate change. Finally, we documented evidence of reduced genetic diversity associated with the south-facing transect and an increase in corticosterone stress levels associated with inbreeding. This study suggests the American pika is already undergoing climate-associated natural selection at multiple genomic regions. Further analysis is needed to determine if the rate of climate adaptation in the American pika and other thermally sensitive species will be able to keep pace with rapidly changing climate conditions.

  13. Patient classification as an outlier detection problem: An application of the One-Class Support Vector Machine

    PubMed Central

    Mourão-Miranda, Janaina; Hardoon, David R.; Hahn, Tim; Marquand, Andre F.; Williams, Steve C.R.; Shawe-Taylor, John; Brammer, Michael

    2011-01-01

    Pattern recognition approaches, such as the Support Vector Machine (SVM), have been successfully used to classify groups of individuals based on their patterns of brain activity or structure. However, these approaches focus on finding group differences and are not applicable to situations where one is interested in assessing deviations from a specific class or population. In the present work we propose an application of the one-class SVM (OC-SVM) to investigate whether patterns of fMRI response to sad facial expressions in depressed patients would be classified as outliers in relation to patterns of healthy control subjects. We defined features based on whole-brain voxels and anatomical regions. In both cases we found a significant correlation between the OC-SVM predictions and the patients' Hamilton Rating Scale for Depression (HRSD), i.e., the more depressed the patients were, the more of an outlier they were. In addition, the OC-SVM split the patient group into two subgroups whose membership was associated with future response to treatment. When applied to region-based features, the OC-SVM classified 52% of patients as outliers. However, among the patients classified as outliers, 70% did not respond to treatment, and among those classified as non-outliers, 89% responded to treatment. In addition, 89% of the healthy controls were classified as non-outliers. PMID:21723950
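
    The one-class formulation is easy to reproduce with scikit-learn. The sketch below trains on control feature vectors only and scores patients against the learned boundary; the feature arrays and the nu/gamma settings are placeholders, not the study's values.

    ```python
    import numpy as np
    from sklearn.svm import OneClassSVM

    rng = np.random.default_rng(0)
    controls = rng.normal(size=(89, 20))            # placeholder control features
    patients = rng.normal(loc=0.5, size=(19, 20))   # placeholder patient features

    # Fit the decision boundary on healthy controls only
    oc_svm = OneClassSVM(kernel='rbf', nu=0.1, gamma='scale').fit(controls)

    labels = oc_svm.predict(patients)               # +1 inlier, -1 outlier
    scores = oc_svm.decision_function(patients)     # more negative = stronger outlier
    print((labels == -1).mean())                    # fraction of patients flagged
    ```

    The continuous decision scores, rather than the binary labels, are what would be correlated with a clinical scale such as the HRSD.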

  14. Outlier Analysis Defines Zinc Finger Gene Family DNA Methylation in Tumors and Saliva of Head and Neck Cancer Patients.

    PubMed

    Gaykalova, Daria A; Vatapalli, Rajita; Wei, Yingying; Tsai, Hua-Ling; Wang, Hao; Zhang, Chi; Hennessey, Patrick T; Guo, Theresa; Tan, Marietta; Li, Ryan; Ahn, Julie; Khan, Zubair; Westra, William H; Bishop, Justin A; Zaboli, David; Koch, Wayne M; Khan, Tanbir; Ochs, Michael F; Califano, Joseph A

    2015-01-01

    Head and Neck Squamous Cell Carcinoma (HNSCC) is the fifth most common cancer, annually affecting over half a million people worldwide. Presently, there are no accepted biomarkers for clinical detection and surveillance of HNSCC. In this work, a comprehensive genome-wide analysis of epigenetic alterations in primary HNSCC tumors was employed in conjunction with cancer-specific outlier statistics to define novel biomarker genes which are differentially methylated in HNSCC. The 37 identified biomarker candidates were top-scoring outlier genes with prominent differential methylation in tumors, but with no signal in normal tissues. These putative candidates were validated in independent HNSCC cohorts from our institution and TCGA (The Cancer Genome Atlas). Using the top candidates, ZNF14, ZNF160, and ZNF420, an assay was developed for detection of HNSCC cancer in primary tissue and saliva samples with 100% specificity when compared to normal control samples. Given the high detection specificity, the analysis of ZNF DNA methylation in combination with other DNA methylation biomarkers may be useful in the clinical setting for HNSCC detection and surveillance, particularly in high-risk patients. Several additional candidates identified through this work can be further investigated toward future development of a multi-gene panel of biomarkers for the surveillance and detection of HNSCC.

  15. GraphPrints: Towards a Graph Analytic Method for Network Anomaly Detection

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Harshaw, Chris R; Bridges, Robert A; Iannacone, Michael D

    This paper introduces a novel graph-analytic approach for detecting anomalies in network flow data called GraphPrints. Building on foundational network-mining techniques, our method represents time slices of traffic as a graph, then counts graphlets (small induced subgraphs that describe local topology). By performing outlier detection on the sequence of graphlet counts, anomalous intervals of traffic are identified, and furthermore, individual IPs experiencing abnormal behavior are singled out. Initial testing of GraphPrints is performed on real network data with an implanted anomaly. Evaluation shows false positive rates bounded by 2.84% at the time-interval level and 0.05% at the IP level, with 100% true positive rates at both.
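
    A toy version of this pipeline can be built with networkx. The features below (edges, wedges, triangles) stand in for the richer graphlet family that GraphPrints counts, and the robust z-score rule is one simple choice of outlier detector, not the paper's.

    ```python
    import networkx as nx
    import numpy as np

    def graph_features(edges):
        """Edge, wedge and triangle counts for one time slice of traffic.
        (A tiny stand-in for a full graphlet census.)"""
        g = nx.Graph(edges)
        tri = sum(nx.triangles(g).values()) // 3           # each triangle counted thrice
        wedges = sum(d * (d - 1) // 2 for _, d in g.degree()) - 3 * tri
        return np.array([g.number_of_edges(), wedges, tri], dtype=float)

    def flag_anomalous_windows(windows, z_thresh=3.0):
        """Robust z-score outlier detection over per-window feature vectors."""
        feats = np.array([graph_features(w) for w in windows])
        med = np.median(feats, axis=0)
        mad = np.median(np.abs(feats - med), axis=0) + 1e-9
        z = np.abs(feats - med) / (1.4826 * mad)           # MAD-based z-scores
        return np.where((z > z_thresh).any(axis=1))[0]     # indices of anomalous slices
    ```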

  16. Lesion identification using unified segmentation-normalisation models and fuzzy clustering

    PubMed Central

    Seghier, Mohamed L.; Ramlackhansingh, Anil; Crinion, Jenny; Leff, Alexander P.; Price, Cathy J.

    2008-01-01

    In this paper, we propose a new automated procedure for lesion identification from single images based on the detection of outlier voxels. We demonstrate the utility of this procedure using artificial and real lesions. The scheme rests on two innovations: First, we augment the generative model used for combined segmentation and normalization of images, with an empirical prior for an atypical tissue class, which can be optimised iteratively. Second, we adopt a fuzzy clustering procedure to identify outlier voxels in normalised gray and white matter segments. These two advances suppress misclassification of voxels and restrict lesion identification to gray/white matter lesions respectively. Our analyses show a high sensitivity for detecting and delineating brain lesions with different sizes, locations, and textures. Our approach has important implications for the generation of lesion overlap maps of a given population and the assessment of lesion-deficit mappings. From a clinical perspective, our method should help to compute the total volume of lesion or to trace precisely lesion boundaries that might be pertinent for surgical or diagnostic purposes. PMID:18482850

  17. ROKU: a novel method for identification of tissue-specific genes.

    PubMed

    Kadota, Koji; Ye, Jiazhen; Nakai, Yuji; Terada, Tohru; Shimizu, Kentaro

    2006-06-12

    One of the important goals of microarray research is the identification of genes whose expression is considerably higher or lower in some tissues than in others. We would like to have ways of identifying such tissue-specific genes. We describe a method, ROKU, which selects tissue-specific patterns from gene expression data for many tissues and thousands of genes. ROKU ranks genes according to their overall tissue specificity using Shannon entropy and detects tissues specific to each gene, if any exist, using an outlier detection method. We evaluated the capacity for the detection of various specific expression patterns using synthetic and real data. We observed that ROKU was superior to a conventional entropy-based method in its ability to rank genes according to overall tissue specificity and to detect genes whose expression patterns are specific only to objective tissues. ROKU is useful for the detection of various tissue-specific expression patterns. The framework is also directly applicable to the selection of diagnostic markers for molecular classification of multiple classes.
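
    The entropy ranking at ROKU's core is compact enough to show directly. This sketch omits ROKU's one-step Tukey biweight preprocessing and its outlier-detection step for assigning specific tissues, so it is the bare ranking idea only.

    ```python
    import numpy as np

    def expression_entropy(expr_row):
        """Shannon entropy of one gene's expression across tissues.

        Low entropy means expression is concentrated in a few tissues
        (tissue-specific); entropy near log2(n_tissues) means roughly
        uniform, housekeeping-like expression.
        """
        x = np.abs(np.asarray(expr_row, dtype=float))
        p = x / x.sum()
        p = p[p > 0]
        return float(-(p * np.log2(p)).sum())

    print(expression_entropy([9.5, 0.1, 0.1, 0.1]))  # low entropy: tissue-specific
    print(expression_entropy([2.0, 2.0, 2.0, 2.0]))  # maximal entropy: ubiquitous
    ```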

  18. Enhancement Strategies for Frame-to-Frame UAS Stereo Visual Odometry

    NASA Astrophysics Data System (ADS)

    Kersten, J.; Rodehorst, V.

    2016-06-01

    Autonomous navigation of indoor unmanned aircraft systems (UAS) requires accurate pose estimations usually obtained from indirect measurements. Navigation based on inertial measurement units (IMU) is known to be affected by high drift rates. The incorporation of cameras provides complementary information due to the different underlying measurement principle. The scale ambiguity problem for monocular cameras is avoided when a light-weight stereo camera setup is used. However, frame-to-frame stereo visual odometry (VO) approaches are also known to accumulate pose estimation errors over time. Several valuable real-time capable techniques for outlier detection and drift reduction in frame-to-frame VO, for example robust relative orientation estimation using random sample consensus (RANSAC) and bundle adjustment, are available. This study addresses the problem of choosing appropriate VO components. We propose a frame-to-frame stereo VO method based on carefully selected components and parameters. This method is evaluated regarding the impact and value of different outlier detection and drift-reduction strategies, for example keyframe selection and sparse bundle adjustment (SBA), using reference benchmark data as well as our own real stereo data. The experimental results demonstrate that our VO method is able to estimate quite accurate trajectories. Feature bucketing and keyframe selection are simple but effective strategies which further improve the VO results. Furthermore, introducing the stereo baseline constraint in pose graph optimization (PGO) leads to significant improvements.
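
    Among the components discussed, RANSAC-style outlier rejection is the most generic. A skeleton follows, with fit_model and residuals as caller-supplied stand-ins (hypothetical names) for, e.g., a minimal-sample relative-orientation solver and a reprojection-error measure.

    ```python
    import numpy as np

    def ransac(points, fit_model, residuals, n_min, n_iter=500, tol=1.0, seed=None):
        """Generic RANSAC skeleton for robust model estimation.

        fit_model(subset) -> model and residuals(model, points) -> per-point
        error are caller-supplied (hypothetical) callbacks.
        """
        rng = np.random.default_rng(seed)
        best_inliers = np.zeros(len(points), dtype=bool)
        for _ in range(n_iter):
            subset = points[rng.choice(len(points), n_min, replace=False)]
            model = fit_model(subset)                 # hypothesis from a minimal sample
            inliers = residuals(model, points) < tol  # consensus set for this hypothesis
            if inliers.sum() > best_inliers.sum():
                best_inliers = inliers
        return fit_model(points[best_inliers]), best_inliers  # refit on consensus set
    ```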

  19. Robustly Aligning a Shape Model and Its Application to Car Alignment of Unknown Pose.

    PubMed

    Li, Yan; Gu, Leon; Kanade, Takeo

    2011-09-01

    Precisely localizing in an image a set of feature points that form a shape of an object, such as car or face, is called alignment. Previous shape alignment methods attempted to fit a whole shape model to the observed data, based on the assumption of Gaussian observation noise and the associated regularization process. However, such an approach, though able to deal with Gaussian noise in feature detection, turns out not to be robust or precise because it is vulnerable to gross feature detection errors or outliers resulting from partial occlusions or spurious features from the background or neighboring objects. We address this problem by adopting a randomized hypothesis-and-test approach. First, a Bayesian inference algorithm is developed to generate a shape-and-pose hypothesis of the object from a partial shape or a subset of feature points. For alignment, a large number of hypotheses are generated by randomly sampling subsets of feature points, and then evaluated to find the one that minimizes the shape prediction error. This method of randomized subset-based matching can effectively handle outliers and recover the correct object shape. We apply this approach on a challenging data set of over 5,000 different-posed car images, spanning a wide variety of car types, lighting, background scenes, and partial occlusions. Experimental results demonstrate favorable improvements over previous methods on both accuracy and robustness.

  20. The comparison between several robust ridge regression estimators in the presence of multicollinearity and multiple outliers

    NASA Astrophysics Data System (ADS)

    Zahari, Siti Meriam; Ramli, Norazan Mohamed; Moktar, Balkiah; Zainol, Mohammad Said

    2014-09-01

    In the presence of multicollinearity and multiple outliers, statistical inference of a linear regression model using ordinary least squares (OLS) estimators would be severely affected and produce misleading results. To overcome this, many approaches have been investigated. These include robust methods, which were reported to be less sensitive to the presence of outliers. In addition, the ridge regression technique has been employed to tackle the multicollinearity problem. In order to mitigate both problems, a combination of ridge regression and robust methods was discussed in this study. The superiority of this approach was examined when multicollinearity and multiple outliers occurred simultaneously in multiple linear regression. This study aimed to look at the performance of several well-known robust estimators: M, MM, RIDGE, and the robust ridge regression estimators, namely Weighted Ridge M-estimator (WRM), Weighted Ridge MM (WRMM), and Ridge MM (RMM), in such a situation. Results of the study showed that in the presence of simultaneous multicollinearity and multiple outliers (in both the x- and y-directions), the RMM and RIDGE are more or less similar in terms of superiority over the other estimators, regardless of the number of observations, level of collinearity, and percentage of outliers used. However, when outliers occurred in only a single direction (the y-direction), the WRMM estimator is the most superior among the robust ridge regression estimators, producing the least variance. In conclusion, robust ridge regression is the best alternative to robust and conventional least squares estimators when dealing with the simultaneous presence of multicollinearity and outliers.
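
    The study's WRM/WRMM estimators are specific weighting schemes, but the underlying idea, a robust loss combined with a ridge penalty, can be illustrated with scikit-learn's HuberRegressor, whose alpha parameter adds an L2 penalty. This is an analogous estimator for illustration, not one of those evaluated in the paper.

    ```python
    import numpy as np
    from sklearn.linear_model import HuberRegressor, Ridge

    rng = np.random.default_rng(1)
    n = 200
    x1 = rng.normal(size=n)
    x2 = 0.95 * x1 + 0.05 * rng.normal(size=n)   # near-collinear second predictor
    X = np.column_stack([x1, x2])
    y = 2 * x1 + 3 * x2 + rng.normal(size=n)
    y[:10] += 20                                  # y-direction outliers

    ridge = Ridge(alpha=1.0).fit(X, y)
    huber_ridge = HuberRegressor(alpha=1.0, epsilon=1.35).fit(X, y)
    print(ridge.coef_)        # pulled by the contaminated responses
    print(huber_ridge.coef_)  # Huber loss + L2 penalty resists the outliers
    ```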

  1. The stopping rules for winsorized tree

    NASA Astrophysics Data System (ADS)

    Ch'ng, Chee Keong; Mahat, Nor Idayu

    2017-11-01

    Winsorized tree is a modified tree-based classifier that is able to investigate and handle all outliers in all nodes during the process of constructing the tree. It avoids the tedious process of constructing a classical tree, in which the splitting of branches and pruning proceed concurrently so that the constructed tree does not grow bushy. This mechanism is controlled by the proposed algorithm. In a winsorized tree, data are screened to identify outliers. If an outlier is detected, its value is neutralized using the winsorize approach. Both outlier identification and value neutralization are executed recursively in every node until a predetermined stopping criterion is met. The aim of this paper is to search for a significant stopping criterion that stops the tree from further splitting before overfitting. The results of the experiment conducted on the Pima Indian dataset indicate that a node can produce its final successor nodes (leaves) once it has achieved roughly 70% information gain.
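
    The neutralization step is ordinary winsorization, which scipy provides directly. A sketch of the per-node operation follows; the clipping fraction is an illustrative parameter, not the paper's setting.

    ```python
    import numpy as np
    from scipy.stats.mstats import winsorize

    def neutralize_node_outliers(values, tail=0.125):
        """Clamp the extreme `tail` fraction on each side to the nearest
        retained value, keeping every observation in the node."""
        return np.asarray(winsorize(values, limits=(tail, tail)))

    x = np.array([1.0, 2.0, 2.2, 2.4, 2.5, 2.7, 3.0, 50.0])
    print(neutralize_node_outliers(x))  # 1.0 -> 2.0 and 50.0 -> 3.0
    ```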

  2. Genome scans for divergent selection in natural populations of the widespread hardwood species Eucalyptus grandis (Myrtaceae) using microsatellites

    PubMed Central

    Song, Zhijiao; Zhang, Miaomiao; Li, Fagen; Weng, Qijie; Zhou, Chanpin; Li, Mei; Li, Jie; Huang, Huanhua; Mo, Xiaoyong; Gan, Siming

    2016-01-01

    Identification of loci or genes under natural selection is important both for understanding the genetic basis of local adaptation and for practical applications, and genome scans provide a powerful means for such identification purposes. In this study, genome-wide simple sequence repeat markers (SSRs) were used to scan for molecular footprints of divergent selection in Eucalyptus grandis, a hardwood species occurring widely in coastal areas from 32° S to 16° S in Australia. High population diversity levels and weak population structure were detected with putatively neutral genomic SSRs. Using three FST outlier detection methods, a total of 58 outlying SSRs were collectively identified as loci under divergent selection against three non-correlated climatic variables, namely, mean annual temperature, isothermality and annual precipitation. Using a spatial analysis method, nine significant associations were revealed between FST outlier allele frequencies and climatic variables, involving seven alleles from five SSR loci. Of the five significant SSRs, two (EUCeSSR1044 and Embra394) contained alleles of putative genes with known functional importance for response to climatic factors. Our study presents critical information on the population diversity and structure of the important woody species E. grandis and provides insight into the adaptive responses of perennial trees to climatic variations. PMID:27748400

  3. A Simulation-Based Comparison of Several Stochastic Linear Regression Methods in the Presence of Outliers.

    ERIC Educational Resources Information Center

    Rule, David L.

    Several regression methods were examined within the framework of weighted structural regression (WSR), comparing their regression weight stability and score estimation accuracy in the presence of outlier contamination. The methods compared are: (1) ordinary least squares; (2) WSR ridge regression; (3) minimum risk regression; (4) minimum risk 2;…

  4. A Hybrid One-Way ANOVA Approach for the Robust and Efficient Estimation of Differential Gene Expression with Multiple Patterns

    PubMed Central

    Mollah, Mohammad Manir Hossain; Jamal, Rahman; Mokhtar, Norfilza Mohd; Harun, Roslan; Mollah, Md. Nurul Haque

    2015-01-01

    Background Identifying genes that are differentially expressed (DE) between two or more conditions with multiple patterns of expression is one of the primary objectives of gene expression data analysis. Several statistical approaches, including one-way analysis of variance (ANOVA), are used to identify DE genes. However, most of these methods provide misleading results for two or more conditions with multiple patterns of expression in the presence of outlying genes. In this paper, an attempt is made to develop a hybrid one-way ANOVA approach that unifies the robustness and efficiency of estimation using the minimum β-divergence method to overcome some problems that arise in the existing robust methods for both small- and large-sample cases with multiple patterns of expression. Results The proposed method relies on a β-weight function, which produces values between 0 and 1. The β-weight function with β = 0.2 is used as a measure of outlier detection. It assigns smaller weights (≥ 0) to outlying expressions and larger weights (≤ 1) to typical expressions. The distribution of the β-weights is used to calculate the cut-off point, which is compared to the observed β-weight of an expression to determine whether that gene expression is an outlier. This weight function plays a key role in unifying the robustness and efficiency of estimation in one-way ANOVA. Conclusion Analyses of simulated gene expression profiles revealed that all eight methods (ANOVA, SAM, LIMMA, EBarrays, eLNN, KW, robust BetaEB and proposed) perform almost identically for m = 2 conditions in the absence of outliers. However, the robust BetaEB method and the proposed method exhibited considerably better performance than the other six methods in the presence of outliers. In this case, the BetaEB method exhibited slightly better performance than the proposed method for the small-sample cases, but the proposed method exhibited much better performance than the BetaEB method for both the small- and large-sample cases in the presence of more than 50% outlying genes. The proposed method also exhibited better performance than the other methods for m > 2 conditions with multiple patterns of expression, a setting to which the BetaEB method has not been extended. Therefore, the proposed approach would be more suitable and reliable on average for the identification of DE genes between two or more conditions with multiple patterns of expression. PMID:26413858
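
    For intuition, here is one plausible Gaussian form of a β-weight function; the assumed functional form is ours, and the paper's estimator iterates the location, scale, and weights jointly via the minimum β-divergence method, so this closed-form snapshot is only suggestive.

    ```python
    import numpy as np

    def beta_weights(x, mu, sigma, beta=0.2):
        """One plausible Gaussian beta-weight: values in (0, 1], close to 1
        for typical expressions and close to 0 for outlying ones.
        (Assumed form; the paper couples these weights to iteratively
        re-estimated mu and sigma.)"""
        z = (np.asarray(x, dtype=float) - mu) / sigma
        return np.exp(-0.5 * beta * z**2)

    print(beta_weights([0.10, 0.20, 0.15, 5.0], mu=0.15, sigma=0.1))
    # -> weights near 1 for the first three expressions, near 0 for 5.0
    ```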

  5. Hybrid online sensor error detection and functional redundancy for systems with time-varying parameters.

    PubMed

    Feng, Jianyuan; Turksoy, Kamuran; Samadi, Sediqeh; Hajizadeh, Iman; Littlejohn, Elizabeth; Cinar, Ali

    2017-12-01

    Supervision and control systems rely on signals from sensors to receive information to monitor the operation of a system and adjust manipulated variables to achieve the control objective. However, sensor performance is often limited by working conditions, and sensors may also be subjected to interference by other devices. Many different types of sensor errors, such as outliers, missing values, drifts and corruption with noise, may occur during process operation. A hybrid online sensor error detection and functional redundancy system is developed to detect errors in online signals and replace erroneous or missing values with model-based estimates. The proposed hybrid system relies on two techniques, an outlier-robust Kalman filter (ORKF) and a locally-weighted partial least squares (LW-PLS) regression model, which leverage the advantages of automatic measurement error elimination with ORKF and data-driven prediction with LW-PLS. The system includes a nominal angle analysis (NAA) method to distinguish between signal faults and large changes in sensor values caused by real dynamic changes in process operation. The performance of the system is illustrated with clinical data from continuous glucose monitoring (CGM) sensors of people with type 1 diabetes. More than 50,000 CGM sensor errors were added to the original CGM signals from 25 clinical experiments, and the performance of the error detection and functional redundancy algorithms was analyzed. The results indicate that the proposed system can successfully detect most of the erroneous signals and substitute them with reasonable estimates computed by the functional redundancy system.
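
    A full outlier-robust Kalman filter is involved, but the basic defensive idea can be shown with innovation gating on a scalar random-walk model: a measurement whose innovation is implausibly large gets its noise inflated instead of being trusted. This is a simple stand-in for the ORKF, not its actual algorithm.

    ```python
    import numpy as np

    def robust_kf_step(x, P, z, Q=0.01, R=1.0, gate=3.0):
        """One scalar Kalman update with innovation gating on a random-walk
        state model: an innovation beyond `gate` standard deviations marks
        the measurement as an outlier and inflates its noise before use."""
        x_pred, P_pred = x, P + Q           # predict (state assumed constant)
        innov = z - x_pred                  # innovation (measurement residual)
        S = P_pred + R                      # predicted innovation variance
        if abs(innov) > gate * np.sqrt(S):  # gate check: likely sensor error
            R *= (innov / (gate * np.sqrt(S))) ** 2   # inflate measurement noise
            S = P_pred + R
        K = P_pred / S                      # Kalman gain
        return x_pred + K * innov, (1.0 - K) * P_pred
    ```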

  6. A Robust Method for Ego-Motion Estimation in Urban Environment Using Stereo Camera.

    PubMed

    Ci, Wenyan; Huang, Yingping

    2016-10-17

    Visual odometry estimates the ego-motion of an agent (e.g., vehicle and robot) using image information and is a key component for autonomous vehicles and robotics. This paper proposes a robust and precise method for estimating the 6-DoF ego-motion, using a stereo rig with optical flow analysis. An objective function fitted with a set of feature points is created by establishing the mathematical relationship between optical flow, depth and camera ego-motion parameters through the camera's 3-dimensional motion and planar imaging model. Accordingly, the six motion parameters are computed by minimizing the objective function, using the iterative Levenberg-Marquardt method. One of the key points for visual odometry is that the feature points selected for the computation should contain as many inliers as possible. In this work, the feature points and their optical flows are initially detected by using the Kanade-Lucas-Tomasi (KLT) algorithm. Circle matching is then applied to remove the outliers caused by mismatching in the KLT algorithm. A space position constraint is imposed to filter out the moving points from the point set detected by the KLT algorithm. The Random Sample Consensus (RANSAC) algorithm is employed to further refine the feature point set, i.e., to eliminate the effects of outliers. The remaining points are tracked to estimate the ego-motion parameters in the subsequent frames. The approach presented here is tested on real traffic videos and the results prove the robustness and precision of the method.

  7. A Robust Method for Ego-Motion Estimation in Urban Environment Using Stereo Camera

    PubMed Central

    Ci, Wenyan; Huang, Yingping

    2016-01-01

    Visual odometry estimates the ego-motion of an agent (e.g., vehicle and robot) using image information and is a key component for autonomous vehicles and robotics. This paper proposes a robust and precise method for estimating the 6-DoF ego-motion, using a stereo rig with optical flow analysis. An objective function fitted with a set of feature points is created by establishing the mathematical relationship between optical flow, depth and camera ego-motion parameters through the camera’s 3-dimensional motion and planar imaging model. Accordingly, the six motion parameters are computed by minimizing the objective function, using the iterative Levenberg–Marquardt method. One of the key points for visual odometry is that the feature points selected for the computation should contain as many inliers as possible. In this work, the feature points and their optical flows are initially detected by using the Kanade–Lucas–Tomasi (KLT) algorithm. Circle matching is then applied to remove the outliers caused by mismatching in the KLT algorithm. A space position constraint is imposed to filter out the moving points from the point set detected by the KLT algorithm. The Random Sample Consensus (RANSAC) algorithm is employed to further refine the feature point set, i.e., to eliminate the effects of outliers. The remaining points are tracked to estimate the ego-motion parameters in the subsequent frames. The approach presented here is tested on real traffic videos and the results prove the robustness and precision of the method. PMID:27763508

  8. Robust and Adaptive Online Time Series Prediction with Long Short-Term Memory

    PubMed Central

    Tao, Qing

    2017-01-01

    Online time series prediction is the mainstream method in a wide range of fields, ranging from speech analysis and noise cancelation to stock market analysis. However, real-world data often contain many outliers as the length of a time series increases. These outliers can mislead the learned model if treated as normal points in the process of prediction. To address this issue, in this paper, we propose a robust and adaptive online gradient learning method, RoAdam (Robust Adam), for long short-term memory (LSTM) to predict time series with outliers. This method tunes the learning rate of the stochastic gradient algorithm adaptively in the process of prediction, which reduces the adverse effect of outliers. It tracks the relative prediction error of the loss function with a weighted average, through a modification of Adam, a popular stochastic gradient algorithm for training deep neural networks. In our algorithm, a large relative prediction error corresponds to a small learning rate, and vice versa. The experiments on both synthetic data and real time series show that our method achieves better performance compared to the existing methods based on LSTM. PMID:29391864
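
    The control logic can be sketched in a few lines; the update below captures the stated behavior (large relative prediction error, small learning rate) but is our illustrative rule, not the paper's exact formulation.

    ```python
    def adaptive_lr(base_lr, prev_loss, curr_loss, r_avg, rho=0.9, eps=1e-12):
        """Shrink the step when the current relative prediction error is
        unusually large compared to its running weighted average."""
        r = curr_loss / (prev_loss + eps)           # relative prediction error
        r_avg = rho * r_avg + (1 - rho) * r         # weighted moving average
        lr = base_lr * min(1.0, r_avg / (r + eps))  # large r -> small learning rate
        return lr, r_avg
    ```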

  9. Robust and Adaptive Online Time Series Prediction with Long Short-Term Memory.

    PubMed

    Yang, Haimin; Pan, Zhisong; Tao, Qing

    2017-01-01

    Online time series prediction is the mainstream method in a wide range of fields, ranging from speech analysis and noise cancelation to stock market analysis. However, real-world data often contain many outliers as the length of a time series increases. These outliers can mislead the learned model if treated as normal points in the process of prediction. To address this issue, in this paper, we propose a robust and adaptive online gradient learning method, RoAdam (Robust Adam), for long short-term memory (LSTM) to predict time series with outliers. This method tunes the learning rate of the stochastic gradient algorithm adaptively in the process of prediction, which reduces the adverse effect of outliers. It tracks the relative prediction error of the loss function with a weighted average, through a modification of Adam, a popular stochastic gradient algorithm for training deep neural networks. In our algorithm, a large relative prediction error corresponds to a small learning rate, and vice versa. The experiments on both synthetic data and real time series show that our method achieves better performance compared to the existing methods based on LSTM.

  10. Online gesture spotting from visual hull data.

    PubMed

    Peng, Bo; Qian, Gang

    2011-06-01

    This paper presents a robust framework for online full-body gesture spotting from visual hull data. Using view-invariant pose features as observations, hidden Markov models (HMMs) are trained for gesture spotting from continuous movement data streams. Two major contributions of this paper are 1) view-invariant pose feature extraction from visual hulls, and 2) a systematic approach to automatically detecting and modeling specific nongesture movement patterns and using their HMMs for outlier rejection in gesture spotting. The experimental results have shown the view-invariance property of the proposed pose features for both training poses and new poses unseen in training, as well as the efficacy of using specific nongesture models for outlier rejection. Using the IXMAS gesture data set, the proposed framework has been extensively tested and the gesture spotting results are superior to those reported on the same data set obtained using existing state-of-the-art gesture spotting methods.

  11. Evaluation of an automated safety surveillance system using risk adjusted sequential probability ratio testing.

    PubMed

    Matheny, Michael E; Normand, Sharon-Lise T; Gross, Thomas P; Marinac-Dabic, Danica; Loyo-Berrios, Nilsa; Vidi, Venkatesan D; Donnelly, Sharon; Resnic, Frederic S

    2011-12-14

    Automated adverse outcome surveillance tools and methods have potential utility in quality improvement and medical product surveillance activities. Their use for assessing hospital performance on the basis of patient outcomes has received little attention. We compared risk-adjusted sequential probability ratio testing (RA-SPRT) implemented in an automated tool to Massachusetts public reports of 30-day mortality after isolated coronary artery bypass graft surgery. A total of 23,020 isolated adult coronary artery bypass surgery admissions performed in Massachusetts hospitals between January 1, 2002 and September 30, 2007 were retrospectively re-evaluated. The RA-SPRT method was implemented within an automated surveillance tool to identify hospital outliers in yearly increments. We used an overall type I error rate of 0.05, an overall type II error rate of 0.10, and a threshold that signaled if the odds of dying 30 days after surgery were at least twice the expected odds. Annual hospital outlier status, based on the state-reported classification, was considered the gold standard. An event was defined as at least one occurrence of a higher-than-expected hospital mortality rate during a given year. We examined a total of 83 hospital-year observations. The RA-SPRT method flagged 6 events among three hospitals for 30-day mortality compared with 5 events among two hospitals using the state public reports, yielding a sensitivity of 100% (5/5) and specificity of 98.8% (79/80). The automated RA-SPRT method performed well, detecting all of the true institutional outliers with a small false-positive alerting rate. Such a system could provide confidential automated notification to local institutions in advance of public reporting, providing opportunities for earlier quality improvement interventions.
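
    The RA-SPRT reduces to accumulating a risk-adjusted log-likelihood ratio per case and alerting at a boundary set by the error rates. The sketch below follows the common Steiner-style formulation with the abstract's parameters (alpha = 0.05, beta = 0.10, doubled odds); details of the published tool may differ.

    ```python
    import numpy as np

    def ra_sprt_alert(outcomes, predicted_risks, odds_ratio=2.0,
                      alpha=0.05, beta=0.10):
        """Accumulate a risk-adjusted log-likelihood ratio over cases and
        alert when it crosses the upper SPRT boundary.

        outcomes        : 0/1 per case (death within 30 days)
        predicted_risks : per-case expected mortality probability
        """
        upper = np.log((1 - beta) / alpha)          # alert threshold
        llr = 0.0
        for y, p in zip(outcomes, predicted_risks):
            p1 = odds_ratio * p / (1 - p + odds_ratio * p)  # risk if odds doubled
            llr += y * np.log(p1 / p) + (1 - y) * np.log((1 - p1) / (1 - p))
            if llr >= upper:
                return True                         # higher-than-expected mortality
        return False
    ```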

  12. Smartphone-Based Indoor Localization with Bluetooth Low Energy Beacons

    PubMed Central

    Zhuang, Yuan; Yang, Jun; Li, You; Qi, Longning; El-Sheimy, Naser

    2016-01-01

    Indoor wireless localization using Bluetooth Low Energy (BLE) beacons has attracted considerable attention after the release of the BLE protocol. In this paper, we propose an algorithm that uses the combination of channel-separate polynomial regression model (PRM), channel-separate fingerprinting (FP), outlier detection and extended Kalman filtering (EKF) for smartphone-based indoor localization with BLE beacons. The proposed algorithm uses FP and PRM to estimate the target’s location and the distances between the target and BLE beacons, respectively. We compare the performance of distance estimation that uses a separate PRM for the three advertisement channels (i.e., the separate strategy) with that of an aggregate PRM generated through the combination of information from all channels (i.e., the aggregate strategy). The FP-based location estimation results of the separate strategy and the aggregate strategy are also compared. It was found that the separate strategy can provide higher accuracy; thus, it is preferable to adopt PRM and FP for each BLE advertisement channel separately. Furthermore, to enhance the robustness of the algorithm, a two-level outlier detection mechanism is designed. Distance and location estimates obtained from PRM and FP are passed to the first outlier detection step to generate improved distance estimates for the EKF. After the EKF process, a second outlier detection algorithm based on statistical testing is performed to remove the remaining outliers. The proposed algorithm was evaluated in various field experiments. Results show that the proposed algorithm achieved an accuracy of <2.56 m at 90% of the time with dense deployment of BLE beacons (1 beacon per 9 m), which is 35.82% better than the <3.99 m of the Propagation Model (PM) + EKF algorithm and 15.77% more accurate than the <3.04 m of the FP + EKF algorithm. With sparse deployment (1 beacon per 18 m), the proposed algorithm achieves an accuracy of <3.88 m at 90% of the time, which is 49.58% more accurate than the <8.00 m of the PM + EKF algorithm and 21.41% better than the <4.94 m of the FP + EKF algorithm. Therefore, the proposed algorithm is especially useful for improving localization accuracy in environments with sparse beacon deployment. PMID:27128917

  13. Smartphone-Based Indoor Localization with Bluetooth Low Energy Beacons.

    PubMed

    Zhuang, Yuan; Yang, Jun; Li, You; Qi, Longning; El-Sheimy, Naser

    2016-04-26

    Indoor wireless localization using Bluetooth Low Energy (BLE) beacons has attracted considerable attention after the release of the BLE protocol. In this paper, we propose an algorithm that uses the combination of channel-separate polynomial regression model (PRM), channel-separate fingerprinting (FP), outlier detection and extended Kalman filtering (EKF) for smartphone-based indoor localization with BLE beacons. The proposed algorithm uses FP and PRM to estimate the target's location and the distances between the target and BLE beacons, respectively. We compare the performance of distance estimation that uses a separate PRM for the three advertisement channels (i.e., the separate strategy) with that of an aggregate PRM generated through the combination of information from all channels (i.e., the aggregate strategy). The FP-based location estimation results of the separate strategy and the aggregate strategy are also compared. It was found that the separate strategy can provide higher accuracy; thus, it is preferable to adopt PRM and FP for each BLE advertisement channel separately. Furthermore, to enhance the robustness of the algorithm, a two-level outlier detection mechanism is designed. Distance and location estimates obtained from PRM and FP are passed to the first outlier detection step to generate improved distance estimates for the EKF. After the EKF process, a second outlier detection algorithm based on statistical testing is performed to remove the remaining outliers. The proposed algorithm was evaluated in various field experiments. Results show that the proposed algorithm achieved an accuracy of <2.56 m at 90% of the time with dense deployment of BLE beacons (1 beacon per 9 m), which is 35.82% better than the <3.99 m of the Propagation Model (PM) + EKF algorithm and 15.77% more accurate than the <3.04 m of the FP + EKF algorithm. With sparse deployment (1 beacon per 18 m), the proposed algorithm achieves an accuracy of <3.88 m at 90% of the time, which is 49.58% more accurate than the <8.00 m of the PM + EKF algorithm and 21.41% better than the <4.94 m of the FP + EKF algorithm. Therefore, the proposed algorithm is especially useful for improving localization accuracy in environments with sparse beacon deployment.

  14. Outlier Removal and the Relation with Reporting Errors and Quality of Psychological Research

    PubMed Central

    Bakker, Marjan; Wicherts, Jelte M.

    2014-01-01

    Background The removal of outliers to acquire a significant result is a questionable research practice that appears to be commonly used in psychology. In this study, we investigated whether the removal of outliers in psychology papers is related to weaker evidence (against the null hypothesis of no effect), a higher prevalence of reporting errors, and smaller sample sizes in these papers compared to papers in the same journals that did not report the exclusion of outliers from the analyses. Methods and Findings We retrieved a total of 2667 statistical results of null hypothesis significance tests from 153 articles in main psychology journals, and compared results from articles in which outliers were removed (N = 92) with results from articles that reported no exclusion of outliers (N = 61). We preregistered our hypotheses and methods and analyzed the data at the level of articles. Results show no significant difference between the two types of articles in median p value, sample sizes, or prevalence of all reporting errors, large reporting errors, and reporting errors that concerned the statistical significance. However, we did find a discrepancy between the reported degrees of freedom of t tests and the reported sample size in 41% of articles that did not report removal of any data values. This suggests common failure to report data exclusions (or missingness) in psychological articles. Conclusions We failed to find that the removal of outliers from the analysis in psychological articles was related to weaker evidence (against the null hypothesis of no effect), sample size, or the prevalence of errors. However, our control sample might be contaminated due to nondisclosure of excluded values in articles that did not report exclusion of outliers. Results therefore highlight the importance of more transparent reporting of statistical analyses. PMID:25072606

  15. Development of a computerized monitoring program to identify narcotic diversion in a pediatric anesthesia practice.

    PubMed

    Brenn, B Randall; Kim, Margaret A; Hilmas, Elora

    2015-08-15

    Development of an operational reporting dashboard designed to correlate data from multiple sources to help detect potential drug diversion by automated dispensing cabinet (ADC) users is described. A commercial business intelligence platform was used to create a dashboard tool for rapid detection of unusual patterns of ADC transactions by anesthesia service providers at a large pediatric hospital. By linking information from the hospital's pharmacy information management system (PIMS) and anesthesia information management system (AIMS) in an associative data model, the "narcotic reconciliation dashboard" can generate various reports to help spot outlier activity associated with ADC dispensing of controlled substances and documentation of medication waste processing. The dashboard's utility was evaluated by "back-testing" the program with historical data on an actual episode of diversion by an anesthesia provider that had not been detected through traditional methods of PIMS and AIMS data monitoring. Dashboard-generated reports on key metrics (e.g., ADC transaction counts, discrepancies in dispensed versus reconciled amounts of narcotics, PIMS-AIMS documentation mismatches) over various time frames during the period of known diversion clearly indicated the diverter's outlier status relative to other authorized ADC users. A dashboard program for correlating ADC transaction data with pharmacy and patient care data may be an effective tool for detecting patterns of ADC use that suggest drug diversion. Copyright © 2015 by the American Society of Health-System Pharmacists, Inc. All rights reserved.

  16. Simulation of a Geiger-Mode Imaging LADAR System for Performance Assessment

    PubMed Central

    Kim, Seongjoon; Lee, Impyeong; Kwon, Yong Joon

    2013-01-01

    As LADAR systems applications gradually become more diverse, new types of systems are being developed. When developing new systems, simulation studies are an essential prerequisite. A simulator enables performance predictions and optimal system parameters at the design level, as well as providing sample data for developing and validating application algorithms. The purpose of the study is to propose a method for simulating a Geiger-mode imaging LADAR system. We develop simulation software to assess system performance and generate sample data for the applications. The simulation is based on three aspects of modeling—the geometry, radiometry and detection. The geometric model computes the ranges to the reflection points of the laser pulses. The radiometric model generates the return signals, including the noises. The detection model determines the flight times of the laser pulses based on the nature of the Geiger-mode detector. We generated sample data using the simulator with the system parameters and analyzed the detection performance by comparing the simulated points to the reference points. The proportion of the outliers in the simulated points reached 25.53%, indicating the need for efficient outlier elimination algorithms. In addition, the false alarm rate and dropout rate of the designed system were computed as 1.76% and 1.06%, respectively. PMID:23823970

  17. Adaptive population divergence and directional gene flow across steep elevational gradients in a climate-sensitive mammal.

    PubMed

    Waterhouse, Matthew D; Erb, Liesl P; Beever, Erik A; Russello, Michael A

    2018-06-01

    The ecological effects of climate change have been shown in most major taxonomic groups; however, the evolutionary consequences are less well-documented. Adaptation to new climatic conditions offers a potential long-term mechanism for species to maintain viability in rapidly changing environments, but mammalian examples remain scarce. The American pika (Ochotona princeps) has been impacted by recent climate-associated extirpations and range-wide reductions in population sizes, establishing it as a sentinel mammalian species for climate change. To investigate evidence for local adaptation and reconstruct patterns of genomic diversity and gene flow across rapidly changing environments, we used a space-for-time design and restriction site-associated DNA sequencing to genotype American pikas along two steep elevational gradients at 30,966 SNPs and employed independent outlier detection methods that scanned for genotype-environment associations. We identified 338 outlier SNPs detected by two separate analyses and/or replicated in both transects, several of which were annotated to genes involved in metabolic function and oxygen transport. Additionally, we found evidence of directional gene flow primarily downslope from high-elevation populations, along with reduced gene flow at outlier loci. If this trend continues, elevational range contractions in American pikas will likely be from local extirpation rather than upward movement of low-elevation individuals; this, in turn, could limit the potential for adaptation within this landscape. These findings are of particular relevance for future conservation and management of American pikas and other elevationally restricted, thermally sensitive species. © 2018 John Wiley & Sons Ltd.

  18. Nonlinear optimization-based device-free localization with outlier link rejection.

    PubMed

    Xiao, Wendong; Song, Biao; Yu, Xiting; Chen, Peiyuan

    2015-04-07

    Device-free localization (DFL) is an emerging wireless technique for estimating the location of target that does not have any attached electronic device. It has found extensive use in Smart City applications such as healthcare at home and hospitals, location-based services at smart spaces, city emergency response and infrastructure security. In DFL, wireless devices are used as sensors that can sense the target by transmitting and receiving wireless signals collaboratively. Many DFL systems are implemented based on received signal strength (RSS) measurements and the location of the target is estimated by detecting the changes of the RSS measurements of the wireless links. Due to the uncertainty of the wireless channel, certain links may be seriously polluted and result in erroneous detection. In this paper, we propose a novel nonlinear optimization approach with outlier link rejection (NOOLR) for RSS-based DFL. It consists of three key strategies, including: (1) affected link identification by differential RSS detection; (2) outlier link rejection via geometrical positional relationship among links; (3) target location estimation by formulating and solving a nonlinear optimization problem. Experimental results demonstrate that NOOLR is robust to the fluctuation of the wireless signals with superior localization accuracy compared with the existing Radio Tomographic Imaging (RTI) approach.

  19. SNP mining in Crassostrea gigas EST data: transferability to four other Crassostrea species, phylogenetic inferences and outlier SNPs under selection.

    PubMed

    Zhong, Xiaoxiao; Li, Qi; Yu, Hong; Kong, Lingfeng

    2014-01-01

    Oysters, with high levels of phenotypic plasticity and wide geographic distribution, are a challenging group for taxonomists and phylogenetics. Our study is intended to generate new EST-SNP markers and to evaluate their potential for cross-species utilization in phylogenetic study of the genus Crassostrea. In the study, 57 novel SNPs were developed from an EST database of C. gigas by the HRM (high-resolution melting) method. Transferability of 377 SNPs developed for C. gigas was examined on four other Crassostrea species: C. sikamea, C. angulata, C. hongkongensis and C. ariakensis. Among the 377 primer pairs tested, 311 (82.5%) primers showed amplification in C. sikamea, 353 (93.6%) in C. angulata, 254 (67.4%) in C. hongkongensis and 253 (67.1%) in C. ariakensis. A total of 214 SNPs were found to be transferable to all four species. Phylogenetic analyses showed that C. hongkongensis was a sister species of C. ariakensis and that this clade was sister to the clade containing C. sikamea, C. angulata and C. gigas. Within this clade, C. gigas and C. angulata had the closest relationship, with C. sikamea being the sister group. In addition, we detected eight SNPs as potentially being under selection by two outlier tests (fdist and hierarchical methods). The SNPs studied here should be useful for genetic diversity, comparative mapping and phylogenetic studies across species in Crassostrea and the candidate outlier SNPs are worth exploring in more detail regarding association genetics and functional studies.

  20. Comparison of cardiac TnI outliers using a contemporary and a high-sensitivity assay on the Abbott Architect platform.

    PubMed

    Ryan, J B; Southby, S J; Stuart, L A; Mackay, R; Florkowski, C M; George, P M

    2014-07-01

    Assays for cardiac troponin (cTn) have undergone improvements in sensitivity and precision in recent years. Increased rates of outliers, however, have been reported on various cTn platforms, typically giving irreproducible, falsely higher results. We aimed to evaluate the outlier rate occurring in patients with elevated cTnI using a contemporary and a high-sensitivity assay. All samples with elevated cTnI (up to 300 ng/L) received over a 21-month period were assayed in duplicate. A contemporary assay (Abbott STAT Troponin-I) was used for the first part of the study and subsequently a high-sensitivity assay (Abbott STAT High-Sensitive Troponin-I) was used. Outliers exceeded a calculated critical difference (CD = z × √2 × SD_analytical), where z = 3.5 (for a probability of 0.0005); critical outliers additionally fell on the opposite side of the clinical decision level. The respective outlier and critical outlier rates were 0.22% and 0.10% for the contemporary assay (n = 4009) and 0.18% and 0.13% for the high-sensitivity assay (n = 3878). There was no significant reduction in outlier rate between the two assays (χ² = 0.034, P = 0.854). Fifty-six percent of outliers occurred in samples where cTn was an 'add-on' test (and was stored and refrigerated prior to assay). Despite recent improvements in cTn methods, outliers (including critical outliers) still occur at a low rate in both a contemporary and a high-sensitivity cTnI assay. Laboratory and clinical staff should be aware of this potential analytical error, particularly in samples with suboptimal sample handling such as add-on tests. © The Author(s) 2014 Reprints and permissions: sagepub.co.uk/journalsPermissions.nav.
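
    The outlier criterion itself is a one-line computation, shown here with the study's z = 3.5; the sample values are made up for illustration.

    ```python
    def duplicate_is_outlier(result_1, result_2, sd_analytical, z=3.5):
        """Apply the study's criterion: flag a duplicate pair whose
        difference exceeds CD = z * sqrt(2) * SD_analytical."""
        cd = z * (2 ** 0.5) * sd_analytical
        return abs(result_1 - result_2) > cd

    # Hypothetical duplicates of 80 and 45 ng/L with SD_analytical = 5 ng/L:
    print(duplicate_is_outlier(80, 45, sd_analytical=5))  # True (CD ~ 24.7 ng/L)
    ```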

  1. Multilayer Perceptron for Robust Nonlinear Interval Regression Analysis Using Genetic Algorithms

    PubMed Central

    2014-01-01

    On the basis of fuzzy regression, computational models in intelligence such as neural networks can be applied to nonlinear interval regression analysis for dealing with uncertain and imprecise data. When training data are not contaminated by outliers, computational models perform well by including almost all given training data in the data interval. Nevertheless, since training data are often corrupted by outliers, robust learning algorithms employed to resist outliers for interval regression analysis have been an interesting area of research. Several approaches involving computational intelligence are effective for resisting outliers, but the required parameters for these approaches depend on whether the collected data contain outliers or not. Since it seems difficult to prespecify the degree of contamination beforehand, this paper uses a multilayer perceptron to construct the robust nonlinear interval regression model using the genetic algorithm. Outliers beyond or beneath the data interval impose only a slight effect on the determination of the data interval. Simulation results demonstrate that the proposed method performs well for contaminated datasets. PMID:25110755

  2. Multilayer perceptron for robust nonlinear interval regression analysis using genetic algorithms.

    PubMed

    Hu, Yi-Chung

    2014-01-01

    On the basis of fuzzy regression, computational models in intelligence such as neural networks can be applied to nonlinear interval regression analysis for dealing with uncertain and imprecise data. When training data are not contaminated by outliers, computational models perform well by including almost all given training data in the data interval. Nevertheless, since training data are often corrupted by outliers, robust learning algorithms employed to resist outliers for interval regression analysis have been an interesting area of research. Several approaches involving computational intelligence are effective for resisting outliers, but the required parameters for these approaches depend on whether the collected data contain outliers or not. Since it seems difficult to prespecify the degree of contamination beforehand, this paper uses a multilayer perceptron to construct the robust nonlinear interval regression model using the genetic algorithm. Outliers beyond or beneath the data interval impose only a slight effect on the determination of the data interval. Simulation results demonstrate that the proposed method performs well for contaminated datasets.

  3. Outlier analyses to test for local adaptation to breeding grounds in a migratory arctic seabird.

    PubMed

    Tigano, Anna; Shultz, Allison J; Edwards, Scott V; Robertson, Gregory J; Friesen, Vicki L

    2017-04-01

    Investigating the extent (or the existence) of local adaptation is crucial to understanding how populations adapt. When experiments or fitness measurements are difficult or impossible to perform in natural populations, genomic techniques allow us to investigate local adaptation through the comparison of allele frequencies and outlier loci along environmental clines. The thick-billed murre (Uria lomvia) is a highly philopatric colonial arctic seabird that occupies a significant environmental gradient, shows marked phenotypic differences among colonies, and has large effective population sizes. To test whether thick-billed murres from five colonies along the eastern Canadian Arctic coast show genomic signatures of local adaptation to their breeding grounds, we analyzed geographic variation in genome-wide markers mapped to a newly assembled thick-billed murre reference genome. We used outlier analyses to detect loci putatively under selection, and clustering analyses to investigate patterns of differentiation based on 2220 genome-wide single nucleotide polymorphisms (SNPs) and 137 outlier SNPs. We found no evidence of population structure among colonies using all loci but found population structure based on outliers only, where birds from the two northernmost colonies (Minarets and Prince Leopold) grouped with birds from the southernmost colony (Gannet), and birds from Coats and Akpatok were distinct from all other colonies. Although results from our analyses did not support local adaptation along the latitudinal cline of breeding colonies, outlier loci grouped birds from different colonies according to their non-breeding distributions, suggesting that outliers may be informative about adaptation and/or demographic connectivity associated with their migration patterns or non-breeding grounds.

  4. Investigation of IRT-Based Equating Methods in the Presence of Outlier Common Items

    ERIC Educational Resources Information Center

    Hu, Huiqin; Rogers, W. Todd; Vukmirovic, Zarko

    2008-01-01

    Common items with inconsistent b-parameter estimates may have a serious impact on item response theory (IRT)--based equating results. To find a better way to deal with the outlier common items with inconsistent b-parameters, the current study investigated the comparability of 10 variations of four IRT-based equating methods (i.e., concurrent…

  5. Adaptive distributed outlier detection for WSNs.

    PubMed

    De Paola, Alessandra; Gaglio, Salvatore; Lo Re, Giuseppe; Milazzo, Fabrizio; Ortolani, Marco

    2015-05-01

    The paradigm of pervasive computing is gaining more and more attention nowadays, thanks to the possibility of obtaining precise and continuous monitoring. Ease of deployment and adaptivity are typically implemented by adopting autonomous and cooperative sensory devices; however, for such systems to be of any practical use, reliability and fault tolerance must be guaranteed, for instance by detecting corrupted readings amidst the huge amount of gathered sensory data. This paper proposes an adaptive distributed Bayesian approach for detecting outliers in data collected by a wireless sensor network; our algorithm aims at optimizing classification accuracy, time complexity and communication complexity, and also considering externally imposed constraints on such conflicting goals. The performed experimental evaluation showed that our approach is able to improve the considered metrics for latency and energy consumption, with limited impact on classification accuracy.

  6. Root System Water Consumption Pattern Identification on Time Series Data.

    PubMed

    Figueroa, Manuel; Pope, Christopher

    2017-06-16

    In agriculture, soil and meteorological sensors are used along low-power networks to capture data, which allows for optimal resource usage and minimal environmental impact. This study uses time series analysis methods for outlier detection and pattern recognition on soil moisture sensor data to identify irrigation and consumption patterns and to improve a soil moisture prediction and irrigation system. The study compares three new algorithms with the detection technique currently used in the project; all three greatly decrease the number of false positives detected. The best result is obtained by the Series Strings Comparison (SSC) algorithm, averaging a precision of 0.872 on the testing sets, vastly improving on the current system's precision of 0.348.

  7. ROKU: a novel method for identification of tissue-specific genes

    PubMed Central

    Kadota, Koji; Ye, Jiazhen; Nakai, Yuji; Terada, Tohru; Shimizu, Kentaro

    2006-01-01

    Background One of the important goals of microarray research is the identification of genes whose expression is considerably higher or lower in some tissues than in others. We would like to have ways of identifying such tissue-specific genes. Results We describe a method, ROKU, which selects tissue-specific patterns from gene expression data for many tissues and thousands of genes. ROKU ranks genes according to their overall tissue specificity using Shannon entropy and detects tissues specific to each gene, if any exist, using an outlier detection method. We evaluated the capacity for the detection of various specific expression patterns using synthetic and real data. We observed that ROKU was superior to a conventional entropy-based method in its ability to rank genes according to overall tissue specificity and to detect genes whose expression patterns are specific only to objective tissues. Conclusion ROKU is useful for the detection of various tissue-specific expression patterns. The framework is also directly applicable to the selection of diagnostic markers for molecular classification of multiple classes. PMID:16764735
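
    The entropy measure at the heart of this ranking is compact enough to sketch. Below is a minimal illustration of the idea only: low Shannon entropy of a gene's relative expression profile across tissues signals tissue specificity, while the maximum, log2(number of tissues), signals ubiquitous expression. This is not the full published ROKU pipeline (which also preprocesses each profile, e.g. with Tukey's biweight, before the entropy and outlier steps); the function name and toy profiles are hypothetical.

    ```python
    import numpy as np

    def tissue_specificity_entropy(expr):
        """Shannon entropy of a gene's expression profile across tissues.

        Low entropy means expression is concentrated in few tissues
        (tissue-specific); the maximum, log2(n_tissues), means uniform."""
        expr = np.asarray(expr, dtype=float)
        p = expr / expr.sum()          # relative expression per tissue
        p = p[p > 0]                   # treat 0 * log(0) as 0
        return -np.sum(p * np.log2(p))

    # A gene expressed almost exclusively in one of 10 tissues ...
    profile = np.array([0.1] * 9 + [50.0])
    print(tissue_specificity_entropy(profile))      # near 0 -> highly specific
    # ... versus a uniformly expressed gene
    print(tissue_specificity_entropy(np.ones(10)))  # log2(10) ~ 3.32 -> ubiquitous
    ```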

  8. How immunogenetically different are domestic pigs from wild boars: a perspective from single-nucleotide polymorphisms of 19 immunity-related candidate genes.

    PubMed

    Chen, Shanyuan; Gomes, Rui; Costa, Vânia; Santos, Pedro; Charneca, Rui; Zhang, Ya-ping; Liu, Xue-hong; Wang, Shao-qing; Bento, Pedro; Nunes, Jose-Luis; Buzgó, József; Varga, Gyula; Anton, István; Zsolnai, Attila; Beja-Pereira, Albano

    2013-10-01

    The coexistence of wild boars and domestic pigs across Eurasia makes it feasible to conduct comparative genetic or genomic analyses for addressing how genetically different a domestic species is from its wild ancestor. To test whether there are differences in patterns of genetic variability between wild and domestic pigs at immunity-related genes and to detect outlier loci putatively under selection that may underlie differences in immune responses, here we analyzed 54 single-nucleotide polymorphisms (SNPs) of 19 immunity-related candidate genes on 11 autosomes in three pairs of wild boar and domestic pig populations from China, Iberian Peninsula, and Hungary. Our results showed no statistically significant differences in allele frequency and heterozygosity across SNPs between three pairs of wild and domestic populations. This observation was more likely due to the widespread and long-lasting gene flow between wild boars and domestic pigs across Eurasia. In addition, we detected eight coding SNPs from six genes as outliers being under selection consistently by three outlier tests (BayeScan2.1, FDIST2, and Arlequin3.5). Among four non-synonymous outlier SNPs, one from TLR4 gene was identified as being subject to positive (diversifying) selection and three each from CD36, IFNW1, and IL1B genes were suggested as under balancing selection. All of these four non-synonymous variants were predicted as being benign by PolyPhen-2. Our results were supported by other independent lines of evidence for positive selection or balancing selection acting on these four immune genes (CD36, IFNW1, IL1B, and TLR4). Our study showed an example applying a candidate gene approach to identify functionally important mutations (i.e., outlier loci) in wild and domestic pigs for subsequent functional experiments.

  9. An Unsupervised Anomalous Event Detection and Interactive Analysis Framework for Large-scale Satellite Data

    NASA Astrophysics Data System (ADS)

    LIU, Q.; Lv, Q.; Klucik, R.; Chen, C.; Gallaher, D. W.; Grant, G.; Shang, L.

    2016-12-01

    Due to the high volume and complexity of satellite data, computer-aided tools for fast quality assessments and scientific discovery are indispensable for scientists in the era of Big Data. In this work, we have developed a framework for automated anomalous event detection in massive satellite data. The framework consists of a clustering-based anomaly detection algorithm and a cloud-based tool for interactive analysis of detected anomalies. The algorithm is unsupervised and requires no prior knowledge of the data (e.g., expected normal pattern or known anomalies). As such, it works for diverse data sets, and performs well even in the presence of missing and noisy data. The cloud-based tool provides an intuitive mapping interface that allows users to interactively analyze anomalies using multiple features. As a whole, our framework can (1) identify outliers in a spatio-temporal context, (2) recognize and distinguish meaningful anomalous events from individual outliers, (3) rank those events based on "interestingness" (e.g., rareness or total number of outliers) defined by users, and (4) enable interactive query, exploration, and analysis of those anomalous events. In this presentation, we will demonstrate the effectiveness and efficiency of our framework in the application of detecting data quality issues and unusual natural events using two satellite datasets. The techniques and tools developed in this project are applicable to a diverse set of satellite data and will be made publicly available for scientists in early 2017.

  10. Maximum Likelihood Methods in Treating Outliers and Symmetrically Heavy-Tailed Distributions for Nonlinear Structural Equation Models with Missing Data

    ERIC Educational Resources Information Center

    Lee, Sik-Yum; Xia, Ye-Mao

    2006-01-01

    By means of more than a dozen user friendly packages, structural equation models (SEMs) are widely used in behavioral, education, social, and psychological research. As the underlying theory and methods in these packages are vulnerable to outliers and distributions with longer-than-normal tails, a fundamental problem in the field is the…

  11. Bayesian methods to determine performance differences and to quantify variability among centers in multi-center trials: the IHAST trial.

    PubMed

    Bayman, Emine O; Chaloner, Kathryn M; Hindman, Bradley J; Todd, Michael M

    2013-01-16

    To quantify the variability among centers, to identify centers whose performance in the primary outcome is potentially outside of normal variability, and to propose a guideline for declaring them outliers. Novel statistical methodology using a Bayesian hierarchical model is used. Bayesian methods for estimation and outlier detection are applied assuming an additive random center effect on the log odds of response: centers are similar but different (exchangeable). The Intraoperative Hypothermia for Aneurysm Surgery Trial (IHAST) is used as an example. Analyses were adjusted for treatment, age, gender, aneurysm location, World Federation of Neurological Surgeons scale, Fisher score and baseline NIH stroke scale scores. Adjustments for differences in center characteristics were also examined. Graphical and numerical summaries of the between-center standard deviation (sd) and variability, as well as the identification of potential outliers, are implemented. In the IHAST, the center-to-center variation in the log odds of favorable outcome at each center is consistent with a normal distribution with posterior sd of 0.538 (95% credible interval: 0.397 to 0.726) after adjusting for the effects of important covariates. Outcome differences among centers showed no outlying centers: four potential outliers were identified but did not meet the proposed guideline for declaring them outlying. Center characteristics (number of subjects enrolled from the center, geographical location, learning over time, nitrous oxide, and temporary clipping use) did not predict outcome, but subject and disease characteristics did. Bayesian hierarchical methods allow for determination of whether outcomes from a specific center differ from others and whether specific clinical practices predict outcome, even when some centers/subgroups have relatively small sample sizes. In the IHAST no outlying centers were found. The estimated variability between centers was moderately large.

  12. Raman fiber-optical method for colon cancer detection: Cross-validation and outlier identification approach

    NASA Astrophysics Data System (ADS)

    Petersen, D.; Naveed, P.; Ragheb, A.; Niedieker, D.; El-Mashtoly, S. F.; Brechmann, T.; Kötting, C.; Schmiegel, W. H.; Freier, E.; Pox, C.; Gerwert, K.

    2017-06-01

    Endoscopy plays a major role in the early recognition of cancers that are not externally accessible, and thereby in increasing survival rates. Raman spectroscopic fiber-optical approaches can help to decrease the impact on the patient, increase objectivity in tissue characterization, reduce expenses and provide a significant time advantage in endoscopy. In gastroenterology, early recognition of malign and precursor lesions is relevant: instantaneous and precise differentiation between adenomas as precursor lesions for cancer and hyperplastic polyps on the one hand, and between high- and low-risk alterations on the other hand, is important. Raman fiber-optical measurements of colon biopsy samples taken during colonoscopy were carried out during a clinical study, and samples of adenocarcinoma (22), tubular adenomas (141), hyperplastic polyps (79) and normal tissue (101) from 151 patients were analyzed. This allows us to focus on the bioinformatic analysis and to set the stage for Raman endoscopic measurements. Since spectral differences between normal and cancerous biopsy samples are small, special care has to be taken in data analysis. Using a leave-one-patient-out cross-validation scheme, three different outlier identification methods were investigated to decrease the influence of systematic errors, such as a residual risk of misplacement of the sample and spectral dilution of marker bands (especially in cancerous tissue), and thereby optimize the experimental design. Furthermore, other validation schemes, namely leave-one-sample-out and leave-one-spectrum-out cross-validation, were compared with leave-one-patient-out cross-validation. High-risk lesions were differentiated from low-risk lesions with a sensitivity of 79%, specificity of 74% and an accuracy of 77%, and cancer from normal tissue with a sensitivity of 79%, specificity of 83% and an accuracy of 81%. Additionally applied outlier identification enabled us to improve the recognition of neoplastic biopsy samples.
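
    Leave-one-patient-out cross-validation, the central validation scheme here, is easy to mis-implement by splitting on spectra rather than patients. A minimal sketch follows, using scikit-learn's LeaveOneGroupOut with patient identifiers as groups; the classifier choice (LDA) and all arrays are hypothetical stand-ins, not the study's actual model or data.

    ```python
    import numpy as np
    from sklearn.model_selection import LeaveOneGroupOut
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    # Hypothetical inputs: X holds one Raman spectrum per row, y the tissue
    # label, and patient_ids maps each spectrum to its patient.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 200))             # 60 spectra x 200 wavenumber bins
    y = rng.integers(0, 2, size=60)            # 0 = normal, 1 = neoplastic
    patient_ids = np.repeat(np.arange(12), 5)  # 12 patients, 5 spectra each

    logo = LeaveOneGroupOut()                  # each fold holds out one patient
    correct = 0
    for train, test in logo.split(X, y, groups=patient_ids):
        clf = LinearDiscriminantAnalysis().fit(X[train], y[train])
        correct += (clf.predict(X[test]) == y[test]).sum()
    # Near chance level here, since this toy data carries no real signal.
    print(f"leave-one-patient-out accuracy: {correct / len(y):.2f}")
    ```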

  13. Data mining spacecraft telemetry: towards generic solutions to automatic health monitoring and status characterisation

    NASA Astrophysics Data System (ADS)

    Royer, P.; De Ridder, J.; Vandenbussche, B.; Regibo, S.; Huygen, R.; De Meester, W.; Evans, D. J.; Martinez, J.; Korte-Stapff, M.

    2016-07-01

    We present the first results of a study aimed at finding new and efficient ways to automatically process spacecraft telemetry for automatic health monitoring. The goal is to reduce the load on the flight control team while extending the "checkability" to the entire telemetry database, and provide efficient, robust and more accurate detection of anomalies in near real time. We present a set of effective methods to (a) detect outliers in the telemetry or in its statistical properties, (b) uncover and visualise special properties of the telemetry and (c) detect new behavior. Our results are structured around two main families of solutions. For parameters visiting a restricted set of signal values, i.e. all status parameters and about one third of all the others, we focus on a transition analysis, exploiting properties of Poincare plots. For parameters with an arbitrarily high number of possible signal values, we describe the statistical properties of the signal via its Kernel Density Estimate. We demonstrate that this allows for a generic and dynamic approach of the soft-limit definition. Thanks to a much more accurate description of the signal and of its time evolution, we are more sensitive and more responsive to outliers than the traditional checks against hard limits. Our methods were validated on two years of Venus Express telemetry. They are generic for assisting in health monitoring of any complex system with large amounts of diagnostic sensor data. Not only spacecraft systems but also present-day astronomical observatories can benefit from them.
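
    The kernel-density idea behind the dynamic soft limits can be illustrated briefly: estimate the density of a telemetry channel from its history and flag new samples whose likelihood falls below a data-driven bound, rather than checking against fixed hard limits. The sketch below uses scipy's gaussian_kde; the channel values and the choice of bound (the rarest density level seen historically) are illustrative assumptions, not the study's implementation.

    ```python
    import numpy as np
    from scipy.stats import gaussian_kde

    rng = np.random.default_rng(1)
    # Hypothetical telemetry channel: a long nominal history, then new
    # samples containing a few excursions.
    history = rng.normal(loc=20.0, scale=0.5, size=2000)
    new = np.concatenate([rng.normal(20.0, 0.5, size=200), [26.3, 14.1, 25.8]])

    kde = gaussian_kde(history)        # density estimate of the nominal regime
    soft_limit = kde(history).min()    # dynamic bound: rarest level seen so far
    outliers = new[kde(new) < soft_limit]
    print(outliers)                    # the three injected excursions
    ```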

  14. Robust power spectral estimation for EEG data

    PubMed Central

    Melman, Tamar; Victor, Jonathan D.

    2016-01-01

    Background Typical electroencephalogram (EEG) recordings often contain substantial artifact. These artifacts, often large and intermittent, can interfere with quantification of the EEG via its power spectrum. To reduce the impact of artifact, EEG records are typically cleaned by a preprocessing stage that removes individual segments or components of the recording. However, such preprocessing can introduce bias, discard available signal, and be labor-intensive. With this motivation, we present a method that uses robust statistics to reduce dependence on preprocessing by minimizing the effect of large intermittent outliers on the spectral estimates. New method Using the multitaper method [1] as a starting point, we replaced the final step of the standard power spectrum calculation with a quantile-based estimator, and the Jackknife approach to confidence intervals with a Bayesian approach. The method is implemented in provided MATLAB modules, which extend the widely used Chronux toolbox. Results Using both simulated and human data, we show that in the presence of large intermittent outliers, the robust method produces improved estimates of the power spectrum, and that the Bayesian confidence intervals yield close-to-veridical coverage factors. Comparison to existing method The robust method, as compared to the standard method, is less affected by artifact: inclusion of outliers produces fewer changes in the shape of the power spectrum as well as in the coverage factor. Conclusion In the presence of large intermittent outliers, the robust method can reduce dependence on data preprocessing as compared to standard methods of spectral estimation. PMID:27102041
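
    A minimal sketch of the quantile idea, under stated assumptions: eigenspectra are computed per DPSS taper and per short data segment, then combined with a median instead of the conventional mean, so a large artifact confined to a few segments has bounded influence. The authors' actual implementation is the provided Chronux-based MATLAB code; the segment length, sampling rate and synthetic signal below are illustrative only.

    ```python
    import numpy as np
    from scipy.signal.windows import dpss

    def robust_multitaper_psd(x, seg_len=512, NW=4):
        """Quantile-style multitaper PSD: eigenspectra per taper and per
        segment, combined with a median rather than a mean, so gross but
        intermittent outliers have bounded influence on the estimate."""
        K = 2 * NW - 1                               # usual taper count
        tapers = dpss(seg_len, NW, K)                # shape (K, seg_len)
        n_seg = len(x) // seg_len
        segs = x[:n_seg * seg_len].reshape(n_seg, seg_len)
        spectra = np.abs(np.fft.rfft(segs[:, None, :] * tapers[None, :, :],
                                     axis=2)) ** 2   # (n_seg, K, n_freq)
        return np.median(spectra.reshape(n_seg * K, -1), axis=0)

    rng = np.random.default_rng(2)
    t = np.arange(8192) / 256.0                      # 32 s sampled at 256 Hz
    x = np.sin(2 * np.pi * 10 * t) + 0.5 * rng.normal(size=t.size)
    x[1000:1100] += 40 * rng.normal(size=100)        # large intermittent artifact
    freqs = np.fft.rfftfreq(512, d=1 / 256.0)
    print(freqs[np.argmax(robust_multitaper_psd(x))])  # ~10.0 Hz despite artifact
    ```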

  15. Automatic detection of a hand-held needle in ultrasound via phase-based analysis of the tremor motion

    NASA Astrophysics Data System (ADS)

    Beigi, Parmida; Salcudean, Septimiu E.; Rohling, Robert; Ng, Gary C.

    2016-03-01

    This paper presents an automatic localization method for a standard hand-held needle in ultrasound based on temporal motion analysis of spatially decomposed data. Subtle displacement arising from tremor motion has a periodic pattern which is usually imperceptible in the intensity image but may convey information in the phase image. Our method aims to detect such periodic motion of a hand-held needle and distinguish it from intrinsic tissue motion, using a technique inspired by video magnification. Complex steerable pyramids allow specific design of the wavelets' orientations according to the insertion angle as well as the measurement of the local phase. We therefore use steerable pairs of even and odd Gabor wavelets to decompose the ultrasound B-mode sequence into various spatial frequency bands. Variations of the local phase measurements in the spatially decomposed input data are then temporally analyzed using a finite impulse response bandpass filter to detect regions with a tremor motion pattern. Results obtained from different pyramid levels are then combined and thresholded to generate the binary mask input for the Hough transform, which determines an estimate of the direction angle and discards some of the outliers. Polynomial fitting is used at the final stage to remove any remaining outliers and improve the trajectory detection. The detected needle is finally added back to the input sequence as an overlay of a cloud of points. We demonstrate the efficiency of our approach to detect the needle using subtle tremor motion in an agar phantom and in in-vivo porcine cases where intrinsic motion is also present. The localization accuracy was calculated by comparison with expert manual segmentation; the (mean, standard deviation, root-mean-square error) were (0.93°, 1.26° and 0.87°) for the trajectory and (1.53 mm, 1.02 mm and 1.82 mm) for the tip, respectively.

  16. Use of Mahalanobis Distance for Detecting Outliers and Outlier Clusters in Markedly Non-Normal Data: A Vehicular Traffic Example

    DTIC Science & Technology

    2011-06-01

    [Only fragmented excerpts of this report were recovered: remarks on pedestrian and vehicle passing conventions, and captions for figures showing northbound and southbound traffic volume panels. No abstract is available.]
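
    For reference, the core computation named in the title can be sketched in a few lines: squared Mahalanobis distances of observations from the sample mean, screened against a chi-square quantile. Under normality this cutoff is exact; for markedly non-normal data, as in the report, it is only a screening heuristic, and the report's cluster handling is not reproduced here. The data and threshold are illustrative.

    ```python
    import numpy as np
    from scipy.stats import chi2

    def mahalanobis_outliers(X, alpha=0.001):
        """Flag rows whose squared Mahalanobis distance from the sample mean
        exceeds the chi-square quantile. Note the classical estimates used
        here can themselves be distorted by heavy contamination (masking)."""
        mu = X.mean(axis=0)
        cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
        d = X - mu
        d2 = np.einsum('ij,jk,ik->i', d, cov_inv, d)   # squared distances
        return d2 > chi2.ppf(1 - alpha, df=X.shape[1])

    rng = np.random.default_rng(3)
    X = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=500)
    X[:3] += 8.0                                        # three planted outliers
    print(np.flatnonzero(mahalanobis_outliers(X)))      # [0 1 2]
    ```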

  17. Genomic Changes Associated with Reproductive and Migratory Ecotypes in Sockeye Salmon (Oncorhynchus nerka)

    PubMed Central

    Veale, Andrew J.

    2017-01-01

    Mechanisms underlying adaptive evolution can best be explored using paired populations displaying similar phenotypic divergence, illuminating the genomic changes associated with specific life history traits. Here, we used paired migratory [anadromous vs. resident (kokanee)] and reproductive [shore- vs. stream-spawning] ecotypes of sockeye salmon (Oncorhynchus nerka) sampled from seven lakes and two rivers spanning three catchments (Columbia, Fraser, and Skeena) in British Columbia, Canada to investigate the patterns and processes underlying their divergence. Restriction-site associated DNA sequencing was used to genotype this sampling at 7,347 single nucleotide polymorphisms, 334 of which were identified as outlier loci and candidates for divergent selection within at least one ecotype comparison. Sixty-eight of these outliers were present in two or more comparisons, with 33 detected across multiple catchments. Of particular note, one locus was detected as the most significant outlier between shore and stream-spawning ecotypes in multiple comparisons and across catchments (Columbia, Fraser, and Snake). We also detected several genomic islands of divergence, some shared among comparisons, potentially showing linked signals of differential selection. The single nucleotide polymorphisms and genomic regions identified in our study offer a range of mechanistic hypotheses associated with the genetic basis of O. nerka life history variation and provide novel tools for informing fisheries management. PMID:29045601

  18. New Quality Control Algorithm Based on GNSS Sensing Data for a Bridge Health Monitoring System

    PubMed Central

    Lee, Jae Kang; Lee, Jae One; Kim, Jung Ok

    2016-01-01

    This research introduces a plan for improving the reliability of Global Navigation Satellite System (GNSS) positioning solutions through suitable adjustment and positioning methodology, so as to maximize the utility of GNSS applications. Although various studies have addressed GNSS-based Bridge Health Monitoring Systems (BHMS), outliers that depend on the signal reception environment have not previously been considered. Since these outliers may be connected to GNSS data collected from major bridge members, and can reduce the reliability of the whole monitoring system through the delivery of false information, they should be detected and eliminated in a prior adjustment stage. In this investigation, the Detection, Identification, Adaptation (DIA) technique was applied and implemented through an algorithm. It can be applied directly to GNSS data collected from long-span cable-stayed bridges, where most outliers were efficiently detected and eliminated; this markedly improves GNSS reliability. Improved GNSS positioning accuracy is directly linked to the safety of the bridge itself, and at the same time increases the operational reliability of the monitoring system. PMID:27240375
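
    The Detection, Identification, Adaptation (DIA) loop can be sketched for a single coordinate series, assuming a linear trend model with known observation noise: detection by a global chi-square test on the residuals, identification by a Baarda-style w-test on the largest standardized residual (taking redundancy numbers as approximately one, reasonable when observations far outnumber parameters), and adaptation by removing the flagged epoch and refitting. The model, noise level and thresholds are illustrative assumptions, not the paper's implementation.

    ```python
    import numpy as np
    from scipy.stats import chi2, norm

    def dia_screen(t, y, sigma=0.005, alpha=0.001):
        """DIA sketch on a coordinate series: linear trend, known sigma (m)."""
        keep = np.ones(len(y), dtype=bool)
        while True:
            A = np.column_stack([np.ones(keep.sum()), t[keep]])
            coef, *_ = np.linalg.lstsq(A, y[keep], rcond=None)
            e = y[keep] - A @ coef
            # Detection: global model test on standardized residuals.
            if np.sum((e / sigma) ** 2) < chi2.ppf(1 - alpha, df=keep.sum() - 2):
                return keep, coef
            # Identification: largest w-statistic (redundancy taken as ~1).
            w = np.abs(e) / sigma
            if w.max() < norm.ppf(1 - alpha / 2):
                return keep, coef
            # Adaptation: remove the identified epoch and refit.
            keep[np.flatnonzero(keep)[np.argmax(w)]] = False

    rng = np.random.default_rng(4)
    t = np.arange(365.0)                                # one year, daily epochs
    y = 0.00002 * t + rng.normal(0, 0.005, size=365)    # ~7 mm/yr trend + noise
    y[100] += 0.1                                       # 10 cm outlier epoch
    keep, coef = dia_screen(t, y)
    print(np.flatnonzero(~keep))                        # [100]
    ```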

  19. New Quality Control Algorithm Based on GNSS Sensing Data for a Bridge Health Monitoring System.

    PubMed

    Lee, Jae Kang; Lee, Jae One; Kim, Jung Ok

    2016-05-27

    This research introduces a plan for improving the reliability of Global Navigation Satellite System (GNSS) positioning solutions through suitable adjustment and positioning methodology, so as to maximize the utility of GNSS applications. Although various studies have addressed GNSS-based Bridge Health Monitoring Systems (BHMS), outliers that depend on the signal reception environment have not previously been considered. Since these outliers may be connected to GNSS data collected from major bridge members, and can reduce the reliability of the whole monitoring system through the delivery of false information, they should be detected and eliminated in a prior adjustment stage. In this investigation, the Detection, Identification, Adaptation (DIA) technique was applied and implemented through an algorithm. It can be applied directly to GNSS data collected from long-span cable-stayed bridges, where most outliers were efficiently detected and eliminated; this markedly improves GNSS reliability. Improved GNSS positioning accuracy is directly linked to the safety of the bridge itself, and at the same time increases the operational reliability of the monitoring system.

  20. Onboard Robust Visual Tracking for UAVs Using a Reliable Global-Local Object Model

    PubMed Central

    Fu, Changhong; Duan, Ran; Kircali, Dogan; Kayacan, Erdal

    2016-01-01

    In this paper, we present a novel onboard robust visual algorithm for long-term arbitrary 2D and 3D object tracking using a reliable global-local object model for unmanned aerial vehicle (UAV) applications, e.g., autonomous tracking and chasing of a moving target. The first main component of this novel algorithm is a global matching and local tracking approach: the algorithm initially finds feature correspondences using an improved binary descriptor for global feature matching and an iterative Lucas–Kanade optical flow algorithm for local feature tracking. The second main module is an efficient local geometric filter (LGF), which handles outlier feature correspondences based on a new forward-backward pairwise dissimilarity measure, thereby maintaining pairwise geometric consistency. In the proposed LGF module, hierarchical agglomerative clustering, i.e., bottom-up aggregation, is applied using an effective single-link method. The third proposed module is a heuristic local outlier factor (to the best of our knowledge, utilized for the first time to deal with outlier features in a visual tracking application), which further maximizes the representation of the target object; here we formulate outlier feature detection as a binary classification problem with the output features of the LGF module. Extensive UAV flight experiments show that the proposed visual tracker achieves real-time frame rates of more than thirty-five frames per second on an i7 processor with 640 × 512 image resolution and compares favorably with the most popular state-of-the-art trackers in terms of robustness, efficiency and accuracy. PMID:27589769
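
    The local-outlier-factor step can be illustrated independently of the full tracker. A minimal sketch, assuming each feature correspondence is summarized by its frame-to-frame displacement vector (an assumption for illustration; the paper classifies the LGF module's output features): inlier correspondences move coherently with the target, so LOF flags the incoherent ones.

    ```python
    import numpy as np
    from sklearn.neighbors import LocalOutlierFactor

    rng = np.random.default_rng(5)
    # Hypothetical per-correspondence features: (dx, dy) displacement of each
    # matched keypoint between frames; inliers move coherently with the target.
    inlier_flow = rng.normal(loc=[4.0, -1.5], scale=0.3, size=(80, 2))
    outlier_flow = rng.uniform(-10, 10, size=(8, 2))   # mismatched features
    flow = np.vstack([inlier_flow, outlier_flow])

    lof = LocalOutlierFactor(n_neighbors=15)
    labels = lof.fit_predict(flow)                     # -1 marks outliers
    print(np.flatnonzero(labels == -1))                # mostly indices >= 80
    ```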

  1. Brain tissues volume measurements from 2D MRI using parametric approach

    NASA Astrophysics Data System (ADS)

    L'vov, A. A.; Toropova, O. A.; Litovka, Yu. V.

    2018-04-01

    The purpose of the paper is to propose a fully automated method for volume assessment of structures within the human brain. Our statistical approach uses the maximum interdependency principle in the decision-making process for measurement consistency and unequal observations. Outliers are detected using the maximum normalized residual test. We propose a statistical model that utilizes knowledge of the tissue distribution in the human brain and applies partial data restoration to improve precision. The approach is computationally efficient and independent of the segmentation algorithm used in the application.
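
    The maximum normalized residual test mentioned here is Grubbs' test, and one round of it is short enough to sketch: the largest standardized deviation is compared against a critical value derived from the t distribution. Iterating after removing a detected point extends it to multiple outliers. The measurement values below are hypothetical.

    ```python
    import numpy as np
    from scipy.stats import t as t_dist

    def max_normalized_residual_test(x, alpha=0.05):
        """One round of Grubbs' (maximum normalized residual) test.
        Returns the index of the detected outlier, or None."""
        x = np.asarray(x, dtype=float)
        n = len(x)
        G = np.abs(x - x.mean()) / x.std(ddof=1)       # normalized residuals
        i = int(np.argmax(G))
        tcrit = t_dist.ppf(1 - alpha / (2 * n), n - 2)
        Gcrit = (n - 1) / np.sqrt(n) * np.sqrt(tcrit**2 / (n - 2 + tcrit**2))
        return i if G[i] > Gcrit else None

    rng = np.random.default_rng(6)
    volumes = rng.normal(1200.0, 40.0, size=30)   # hypothetical ROI volumes, mm^3
    volumes[7] = 1600.0                           # one inconsistent measurement
    print(max_normalized_residual_test(volumes))  # 7
    ```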

  2. The search for loci under selection: trends, biases and progress.

    PubMed

    Ahrens, Collin W; Rymer, Paul D; Stow, Adam; Bragg, Jason; Dillon, Shannon; Umbers, Kate D L; Dudaniec, Rachael Y

    2018-03-01

    Detecting genetic variants under selection using F_ST outlier analysis (OA) and environmental association analyses (EAAs) are popular approaches that provide insight into the genetic basis of local adaptation. Despite the frequent use of OA and EAA approaches and their increasing attractiveness for detecting signatures of selection, their application to field-based empirical data has not been synthesized. Here, we review 66 empirical studies that use Single Nucleotide Polymorphisms (SNPs) in OA and EAA. We report trends and biases across biological systems, sequencing methods, approaches, parameters, environmental variables and their influence on detecting signatures of selection. We found striking variability in both the use and reporting of environmental data and statistical parameters. For example, linkage disequilibrium among SNPs and numbers of unique SNP associations identified with EAA were rarely reported. The proportion of putatively adaptive SNPs detected varied widely among studies, and decreased with the number of SNPs analysed. We found that genomic sampling effort had a greater impact than biological sampling effort on the proportion of identified SNPs under selection. OA identified a higher proportion of outliers when more individuals were sampled, but this was not the case for EAA. To facilitate repeatability, interpretation and synthesis of studies detecting selection, we recommend that future studies consistently report geographical coordinates, environmental data, model parameters, linkage disequilibrium, and measures of genetic structure. Identifying standards for how OA and EAA studies are designed and reported will aid future transparency and comparability of SNP-based selection studies and help to progress landscape and evolutionary genomics. © 2018 John Wiley & Sons Ltd.

  3. User Behavior Analytics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Turcotte, Melissa; Moore, Juston Shane

    User Behaviour Analytics is the tracking, collecting and assessing of user data and activities. The goal is to detect misuse of user credentials by developing models of the normal behaviour of user credentials within a computer network and detecting outliers with respect to that baseline.

  4. Factors influencing hospital high length of stay outliers

    PubMed Central

    2012-01-01

    Background The study of length of stay (LOS) outliers is important for the management and financing of hospitals. Our aim was to study variables associated with high LOS outliers and their evolution over time. Methods We used hospital administrative data from inpatient episodes in public acute care hospitals in the Portuguese National Health Service (NHS), with discharges between years 2000 and 2009, together with some hospital characteristics. The dependent variable, LOS outliers, was calculated for each diagnosis related group (DRG) using a trim point defined for each year by the geometric mean plus two standard deviations. Hospitals were classified on the basis of administrative, economic and teaching characteristics. We also studied the influence of comorbidities and readmissions. Logistic regression models, including a multivariable logistic regression, were used in the analysis. All the logistic regressions were fitted using generalized estimating equations (GEE). Results In nearly nine million inpatient episodes analysed we found a proportion of 3.9% high LOS outliers, accounting for 19.2% of total inpatient days. The number of hospital patient discharges increased between years 2000 and 2005 and slightly decreased after that. The proportion of outliers ranged between the lowest value of 3.6% (in years 2001 and 2002) and the highest value of 4.3% in 2009. Teaching hospitals with over 1,000 beds have significantly more outliers than other hospitals, even after adjustment for readmissions and several patient characteristics. Conclusions In recent years both average LOS and the proportion of high LOS outliers have been increasing in Portuguese NHS hospitals. As high LOS outliers represent an important proportion of total inpatient days, this should be seen as an important alert for the management of hospitals and for national health policies. As expected, age, type of admission, and hospital type were significantly associated with high LOS outliers. The proportion of high outliers does not seem to be related to their financial coverage; they should be studied in order to highlight areas for further investigation. The increasing complexity of both hospitals and patients may be the single most important determinant of high LOS outliers and must therefore be taken into account by health managers when considering hospital costs. PMID:22906386
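
    The trim-point rule is concrete enough for a short sketch. One common reading computes both moments on the log scale, i.e. the trim point is exp(mean(log LOS) + 2 sd(log LOS)); the paper's exact convention may differ, so treat that as an assumption. The simulated DRG below is illustrative only.

    ```python
    import numpy as np

    def high_los_outliers(los_days):
        """Flag high length-of-stay outliers for one DRG. Trim point:
        geometric mean plus two standard deviations, both taken on the
        log scale (an assumed convention)."""
        logs = np.log(los_days)
        trim_point = np.exp(logs.mean() + 2 * logs.std(ddof=1))
        return los_days > trim_point, trim_point

    # Hypothetical LOS distribution for a single DRG and year
    rng = np.random.default_rng(7)
    los = np.round(rng.lognormal(mean=1.5, sigma=0.5, size=1000)).clip(min=1)
    flags, trim = high_los_outliers(los)
    print(trim, flags.mean())   # trim point in days, share of high outliers
    ```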

  5. M-estimation for robust sparse unmixing of hyperspectral images

    NASA Astrophysics Data System (ADS)

    Toomik, Maria; Lu, Shijian; Nelson, James D. B.

    2016-10-01

    Hyperspectral unmixing methods often use a conventional least-squares-based lasso, which assumes that the data follow the Gaussian distribution. The normality assumption is an approximation which is generally invalid for real imagery data. We consider a robust (non-Gaussian) approach to sparse spectral unmixing of remotely sensed imagery which reduces the sensitivity of the estimator to outliers and relaxes the linearity assumption. The method combines several appropriate penalties. We propose to use an lp norm with 0 < p < 1 in the sparse regression problem, which induces more sparsity in the results but makes the problem non-convex. The problem can nevertheless be solved quite straightforwardly with an extensible algorithm based on iteratively reweighted least squares. To deal with the huge size of modern spectral libraries we introduce a library reduction step, similar to the multiple signal classification (MUSIC) array processing algorithm, which not only speeds up unmixing but also yields superior results. In the hyperspectral setting we extend the traditional least squares method to the robust heavy-tailed case and propose a generalised M-lasso solution. M-estimation replaces the Gaussian likelihood with a fixed function ρ(e) that restrains outliers. The M-estimate function reduces the effect of errors with large amplitudes or even assigns the outliers zero weights. Our experimental results on real hyperspectral data show that noise with large amplitudes (outliers) often exists in the data. The ability to mitigate the influence of such outliers can therefore offer greater robustness. Qualitative hyperspectral unmixing results on real hyperspectral image data corroborate the efficacy of the proposed method.

  6. Robust power spectral estimation for EEG data.

    PubMed

    Melman, Tamar; Victor, Jonathan D

    2016-08-01

    Typical electroencephalogram (EEG) recordings often contain substantial artifact. These artifacts, often large and intermittent, can interfere with quantification of the EEG via its power spectrum. To reduce the impact of artifact, EEG records are typically cleaned by a preprocessing stage that removes individual segments or components of the recording. However, such preprocessing can introduce bias, discard available signal, and be labor-intensive. With this motivation, we present a method that uses robust statistics to reduce dependence on preprocessing by minimizing the effect of large intermittent outliers on the spectral estimates. Using the multitaper method (Thomson, 1982) as a starting point, we replaced the final step of the standard power spectrum calculation with a quantile-based estimator, and the Jackknife approach to confidence intervals with a Bayesian approach. The method is implemented in provided MATLAB modules, which extend the widely used Chronux toolbox. Using both simulated and human data, we show that in the presence of large intermittent outliers, the robust method produces improved estimates of the power spectrum, and that the Bayesian confidence intervals yield close-to-veridical coverage factors. The robust method, as compared to the standard method, is less affected by artifact: inclusion of outliers produces fewer changes in the shape of the power spectrum as well as in the coverage factor. In the presence of large intermittent outliers, the robust method can reduce dependence on data preprocessing as compared to standard methods of spectral estimation. Copyright © 2016 Elsevier B.V. All rights reserved.

  7. Automated data selection method to improve robustness of diffuse optical tomography for breast cancer imaging

    PubMed Central

    Vavadi, Hamed; Zhu, Quing

    2016-01-01

    Imaging-guided near-infrared diffuse optical tomography (DOT) has demonstrated great potential as an adjunct modality for differentiation of malignant and benign breast lesions and for monitoring treatment response of breast cancers. However, diffused light measurements are sensitive to artifacts caused by outliers and errors in measurements due to probe-tissue coupling, patient and probe motions, and tissue heterogeneity. In general, pre-processing of the measurements by experienced users is needed to manually remove these outliers and thereby reduce imaging artifacts. An automated method for outlier removal, data selection, and filtering for diffuse optical tomography is introduced in this manuscript. The method consists of multiple steps: first, several data sets collected from the same patient at the contralateral normal breast are combined to form a single robust reference data set, using statistical tests and linear fitting of the measurements. The second step improves the perturbation measurements by filtering out outliers from the lesion-site measurements using model-based analysis. The results of 20 malignant and benign cases show similar performance between manual and automated data processing, and an improvement of about 27% in the malignant-to-benign tissue characterization ratio. PMID:27867711

  8. Outliers: Elementary Teachers Who Actually Teach Social Studies

    ERIC Educational Resources Information Center

    Anderson, Derek

    2014-01-01

    This mixed methods study identified six elementary teachers, who, despite the widespread marginalization of elementary social studies, spent considerable time on the subject. These six outliers from a sample of forty-six Michigan elementary teachers were interviewed, and their teaching was observed to better understand how and why they deviate…

  9. Autoregressive model in the Lp norm space for EEG analysis.

    PubMed

    Li, Peiyang; Wang, Xurui; Li, Fali; Zhang, Rui; Ma, Teng; Peng, Yueheng; Lei, Xu; Tian, Yin; Guo, Daqing; Liu, Tiejun; Yao, Dezhong; Xu, Peng

    2015-01-30

    The autoregressive (AR) model is widely used in electroencephalogram (EEG) analyses such as waveform fitting, spectrum estimation, and system identification. In real applications, EEGs are inevitably contaminated with unexpected outlier artifacts, and this must be overcome. However, most of the current AR models are based on the L2 norm structure, which exaggerates the outlier effect due to the square property of the L2 norm. In this paper, a novel AR object function is constructed in the Lp (p≤1) norm space with the aim to compress the outlier effects on EEG analysis, and a fast iteration procedure is developed to solve this new AR model. The quantitative evaluation using simulated EEGs with outliers proves that the proposed Lp (p≤1) AR can estimate the AR parameters more robustly than the Yule-Walker, Burg and LS methods, under various simulated outlier conditions. The actual application to the resting EEG recording with ocular artifacts also demonstrates that Lp (p≤1) AR can effectively address the outliers and recover a resting EEG power spectrum that is more consistent with its physiological basis. Copyright © 2014 Elsevier B.V. All rights reserved.
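
    The general shape of such an iteration can be sketched with plain iteratively reweighted least squares: residuals are reweighted by |r|^(p-2), so large residuals (outliers) receive small weight as p decreases toward 1. This is a generic IRLS sketch under that assumption, not the paper's specific procedure; the AR(2) process and spike contamination are synthetic.

    ```python
    import numpy as np

    def lp_ar_fit(x, order=4, p=1.0, n_iter=50, eps=1e-6):
        """Fit AR coefficients by minimizing the Lp norm of the residuals
        via iteratively reweighted least squares (IRLS sketch)."""
        X = np.column_stack([x[order - k - 1:len(x) - k - 1] for k in range(order)])
        y = x[order:]
        a = np.linalg.lstsq(X, y, rcond=None)[0]       # ordinary L2 start
        for _ in range(n_iter):
            r = y - X @ a
            w = np.maximum(np.abs(r), eps) ** (p - 2)  # Lp weights, eps-floored
            Xw = X * w[:, None]
            a = np.linalg.solve(X.T @ Xw, Xw.T @ y)    # weighted normal equations
        return a

    rng = np.random.default_rng(8)
    n = 2000
    x = np.zeros(n)
    for t in range(2, n):                              # AR(2): a1=1.2, a2=-0.5
        x[t] = 1.2 * x[t - 1] - 0.5 * x[t - 2] + rng.normal()
    x[::97] += rng.normal(0, 25, size=x[::97].shape)   # sparse ocular-like spikes
    print(lp_ar_fit(x, order=2, p=1.0))                # near [1.2, -0.5]
    ```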

  10. Locality-constrained anomaly detection for hyperspectral imagery

    NASA Astrophysics Data System (ADS)

    Liu, Jiabin; Li, Wei; Du, Qian; Liu, Kui

    2015-12-01

    Detecting a target with low-occurrence-probability from unknown background in a hyperspectral image, namely anomaly detection, is of practical significance. Reed-Xiaoli (RX) algorithm is considered as a classic anomaly detector, which calculates the Mahalanobis distance between local background and the pixel under test. Local RX, as an adaptive RX detector, employs a dual-window strategy to consider pixels within the frame between inner and outer windows as local background. However, the detector is sensitive if such a local region contains anomalous pixels (i.e., outliers). In this paper, a locality-constrained anomaly detector is proposed to remove outliers in the local background region before employing the RX algorithm. Specifically, a local linear representation is designed to exploit the internal relationship between linearly correlated pixels in the local background region and the pixel under test and its neighbors. Experimental results demonstrate that the proposed detector improves the original local RX algorithm.
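
    Plain dual-window local RX, the baseline this paper improves upon, can be sketched directly; the proposed locality-constrained step, which purges outliers from the background frame via local linear representation before this computation, is not reproduced here. Window sizes, the regularization term and the synthetic cube are illustrative.

    ```python
    import numpy as np

    def local_rx(img, inner=3, outer=9):
        """Dual-window local RX: background statistics come from the frame
        between inner and outer windows; the score is the Mahalanobis
        distance of the center pixel to that local background."""
        H, W, B = img.shape
        r_out, r_in = outer // 2, inner // 2
        mask = np.ones((outer, outer), dtype=bool)     # frame = True cells
        mask[r_out - r_in:r_out + r_in + 1, r_out - r_in:r_out + r_in + 1] = False
        scores = np.zeros((H, W))
        for i in range(r_out, H - r_out):
            for j in range(r_out, W - r_out):
                block = img[i - r_out:i + r_out + 1, j - r_out:j + r_out + 1]
                bg = block[mask]                       # local background pixels
                mu = bg.mean(axis=0)
                cov = np.cov(bg, rowvar=False) + 1e-6 * np.eye(B)
                d = img[i, j] - mu
                scores[i, j] = d @ np.linalg.solve(cov, d)
        return scores

    rng = np.random.default_rng(9)
    cube = rng.normal(size=(40, 40, 10))               # 10-band synthetic scene
    cube[20, 20] += 6.0                                # planted anomaly
    s = local_rx(cube)
    print(np.unravel_index(np.argmax(s), s.shape))     # (20, 20)
    ```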

  11. Combined CT-based and image-free navigation systems in TKA reduces postoperative outliers of rotational alignment of the tibial component.

    PubMed

    Mitsuhashi, Shota; Akamatsu, Yasushi; Kobayashi, Hideo; Kusayama, Yoshihiro; Kumagai, Ken; Saito, Tomoyuki

    2018-02-01

    Rotational malpositioning of the tibial component can lead to poor functional outcome in TKA. Although various surgical techniques have been proposed, precise rotational placement of the tibial component is difficult to accomplish even with the use of a navigation system. The purpose of this study is to assess whether combined CT-based and image-free navigation systems accurately replicate the rotational alignment of the tibial component that was preoperatively planned on CT, compared with the conventional method. We compared the number of outliers for rotational alignment of the tibial component using combined CT-based and image-free navigation systems (navigated group) with those of the conventional method (conventional group). Seventy-two TKAs were performed between May 2012 and December 2014. In the navigated group, the anteroposterior axis was prepared using a CT-based navigation system and the tibial component was positioned under control of the navigation. In the conventional group, the tibial component was placed with reference to the Akagi line, determined visually. Fisher's exact probability test was performed to evaluate the results. There was a significant difference between the two groups with regard to the number of outliers: 3 outliers in the navigated group compared with 12 outliers in the conventional group (P < 0.01). We conclude that combined CT-based and image-free navigation systems decreased the number of rotational outliers of the tibial component and were helpful for replicating the accurate rotational alignment of the tibial component that was preoperatively planned.
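
    The reported comparison of outlier counts can be checked with Fisher's exact test. One assumption is needed that the abstract does not state: how the 72 TKAs split between the two arms. Assuming an even 36/36 split purely for illustration:

    ```python
    from scipy.stats import fisher_exact

    # 2x2 table: rows are navigated vs. conventional, columns are rotational
    # outliers vs. non-outliers. The even 36/36 split is an assumption; the
    # abstract gives only the outlier counts (3 vs. 12).
    table = [[3, 33],
             [12, 24]]
    odds_ratio, p = fisher_exact(table)
    print(f"OR = {odds_ratio:.2f}, p = {p:.4f}")  # p < 0.05 under this split
    ```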

  12. A new algorithm for automatic Outlier Detection in GPS Time Series

    NASA Astrophysics Data System (ADS)

    Cannavo', Flavio; Mattia, Mario; Rossi, Massimo; Palano, Mimmo; Bruno, Valentina

    2010-05-01

    Nowadays continuous GPS time series are considered a crucial product of GPS permanent networks, useful in many geoscience fields, such as active tectonics, seismology, crustal deformation and volcano monitoring (Altamimi et al. 2002, Elósegui et al. 2006, Aloisi et al. 2009). Although GPS data processing software has increased in reliability, the time series are still affected by different kinds of noise, from intrinsic noise (e.g. tropospheric delay) to un-modeled noise (e.g. cycle slips, satellite faults, parameter changes). Typically, GPS time series present characteristic noise that is a linear combination of white noise and correlated colored noise, and this characteristic is fractal in the sense that it is evident at every considered time scale or sampling rate. The un-modeled noise sources result in spikes, outliers and steps. These kinds of errors can appreciably influence the estimation of velocities of the monitored sites. Outlier detection in generic time series is a widely treated problem in the literature (Wei, 2005), while it is not fully developed for the specific kind of GPS series. We propose a robust automatic procedure for cleaning GPS time series of outliers and, especially for long daily series, of steps due to strong seismic or volcanic events or merely to instrumentation changes such as antenna and receiver upgrades. The procedure is basically divided into two steps: a first step for colored noise reduction and a second step for outlier detection through adaptive series segmentation. Both algorithms present novel ideas and are nearly unsupervised. In particular, we propose an algorithm to estimate an autoregressive model for colored noise in GPS time series in order to subtract the effect of non-Gaussian noise on the series. This step is useful for the subsequent step (i.e. adaptive segmentation), which requires the hypothesis of Gaussian noise. The proposed algorithms are tested in a benchmark case study and the results confirm that the algorithms are effective and reasonable. Bibliography: Aloisi, M., A. Bonaccorso, F. Cannavò, S. Gambino, M. Mattia, G. Puglisi, E. Boschi, A new dyke intrusion style for the Mount Etna May 2008 eruption modelled through continuous tilt and GPS data, Terra Nova, 21(4), 316-321, doi:10.1111/j.1365-3121.2009.00889.x (2009). Altamimi, Z., P. Sillard, C. Boucher, ITRF2000: A new release of the International Terrestrial Reference Frame for earth science applications, J. Geophys. Res. Solid Earth, 107(B10), 2214 (2002). Elósegui, P., J. L. Davis, D. Oberlander, R. Baena, and G. Ekström, Accuracy of high-rate GPS for seismology, Geophys. Res. Lett., 33, L11308, doi:10.1029/2006GL026065 (2006). Wei, W. S., Time Series Analysis: Univariate and Multivariate Methods, Addison Wesley (2nd edition), ISBN-10: 0321322169 (2005).

  13. A practical guide to environmental association analysis in landscape genomics.

    PubMed

    Rellstab, Christian; Gugerli, Felix; Eckert, Andrew J; Hancock, Angela M; Holderegger, Rolf

    2015-09-01

    Landscape genomics is an emerging research field that aims to identify the environmental factors that shape adaptive genetic variation and the gene variants that drive local adaptation. Its development has been facilitated by next-generation sequencing, which allows for screening thousands to millions of single nucleotide polymorphisms in many individuals and populations at reasonable costs. In parallel, data sets describing environmental factors have greatly improved and increasingly become publicly accessible. Accordingly, numerous analytical methods for environmental association studies have been developed. Environmental association analysis identifies genetic variants associated with particular environmental factors and has the potential to uncover adaptive patterns that are not discovered by traditional tests for the detection of outlier loci based on population genetic differentiation. We review methods for conducting environmental association analysis including categorical tests, logistic regressions, matrix correlations, general linear models and mixed effects models. We discuss the advantages and disadvantages of different approaches, provide a list of dedicated software packages and their specific properties, and stress the importance of incorporating neutral genetic structure in the analysis. We also touch on additional important aspects such as sampling design, environmental data preparation, pooled and reduced-representation sequencing, candidate-gene approaches, linearity of allele-environment associations and the combination of environmental association analyses with traditional outlier detection tests. We conclude by summarizing expected future directions in the field, such as the extension of statistical approaches, environmental association analysis for ecological gene annotation, and the need for replication and post hoc validation studies. © 2015 John Wiley & Sons Ltd.

  14. Diagnostics for generalized linear hierarchical models in network meta-analysis.

    PubMed

    Zhao, Hong; Hodges, James S; Carlin, Bradley P

    2017-09-01

    Network meta-analysis (NMA) combines direct and indirect evidence comparing more than 2 treatments. Inconsistency arises when these 2 information sources differ. Previous work focuses on inconsistency detection, but little has been done on how to proceed after identifying inconsistency. The key issue is whether inconsistency changes an NMA's substantive conclusions. In this paper, we examine such discrepancies from a diagnostic point of view. Our methods seek to detect influential and outlying observations in NMA at a trial-by-arm level. These observations may have a large effect on the parameter estimates in NMA, or they may deviate markedly from other observations. We develop formal diagnostics for a Bayesian hierarchical model to check the effect of deleting any observation. Diagnostics are specified for generalized linear hierarchical NMA models and investigated for both published and simulated datasets. Results from our example dataset using either contrast- or arm-based models and from the simulated datasets indicate that the sources of inconsistency in NMA tend not to be influential, though results from the example dataset suggest that they are likely to be outliers. This mimics a familiar result from linear model theory, in which outliers with low leverage are not influential. Future extensions include incorporating baseline covariates and individual-level patient data. Copyright © 2017 John Wiley & Sons, Ltd.

  15. Genomic Changes Associated with Reproductive and Migratory Ecotypes in Sockeye Salmon (Oncorhynchus nerka).

    PubMed

    Veale, Andrew J; Russello, Michael A

    2017-10-01

    Mechanisms underlying adaptive evolution can best be explored using paired populations displaying similar phenotypic divergence, illuminating the genomic changes associated with specific life history traits. Here, we used paired migratory [anadromous vs. resident (kokanee)] and reproductive [shore- vs. stream-spawning] ecotypes of sockeye salmon (Oncorhynchus nerka) sampled from seven lakes and two rivers spanning three catchments (Columbia, Fraser, and Skeena) in British Columbia, Canada to investigate the patterns and processes underlying their divergence. Restriction-site associated DNA sequencing was used to genotype this sampling at 7,347 single nucleotide polymorphisms, 334 of which were identified as outlier loci and candidates for divergent selection within at least one ecotype comparison. Sixty-eight of these outliers were present in two or more comparisons, with 33 detected across multiple catchments. Of particular note, one locus was detected as the most significant outlier between shore and stream-spawning ecotypes in multiple comparisons and across catchments (Columbia, Fraser, and Snake). We also detected several genomic islands of divergence, some shared among comparisons, potentially showing linked signals of differential selection. The single nucleotide polymorphisms and genomic regions identified in our study offer a range of mechanistic hypotheses associated with the genetic basis of O. nerka life history variation and provide novel tools for informing fisheries management. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  16. Data Analytics for Smart Parking Applications.

    PubMed

    Piovesan, Nicola; Turi, Leo; Toigo, Enrico; Martinez, Borja; Rossi, Michele

    2016-09-23

    We consider real-life smart parking systems where parking lot occupancy data are collected from field sensor devices and sent to backend servers for further processing and usage for applications. Our objective is to make these data useful to end users, such as parking managers, and, ultimately, to citizens. To this end, we concoct and validate an automated classification algorithm having two objectives: (1) outlier detection: to detect sensors with anomalous behavioral patterns, i.e., outliers; and (2) clustering: to group the parking sensors exhibiting similar patterns into distinct clusters. We first analyze the statistics of real parking data, obtaining suitable simulation models for parking traces. We then consider a simple classification algorithm based on the empirical complementary distribution function of occupancy times and show its limitations. Hence, we design a more sophisticated algorithm exploiting unsupervised learning techniques (self-organizing maps). These are tuned following a supervised approach using our trace generator and are compared against other clustering schemes, namely expectation maximization, k-means clustering and DBSCAN, considering six months of data from a real sensor deployment. Our approach is found to be superior in terms of classification accuracy, while also being capable of identifying all of the outliers in the dataset.

  17. Data Analytics for Smart Parking Applications

    PubMed Central

    Piovesan, Nicola; Turi, Leo; Toigo, Enrico; Martinez, Borja; Rossi, Michele

    2016-01-01

    We consider real-life smart parking systems where parking lot occupancy data are collected from field sensor devices and sent to backend servers for further processing and usage for applications. Our objective is to make these data useful to end users, such as parking managers, and, ultimately, to citizens. To this end, we concoct and validate an automated classification algorithm having two objectives: (1) outlier detection: to detect sensors with anomalous behavioral patterns, i.e., outliers; and (2) clustering: to group the parking sensors exhibiting similar patterns into distinct clusters. We first analyze the statistics of real parking data, obtaining suitable simulation models for parking traces. We then consider a simple classification algorithm based on the empirical complementary distribution function of occupancy times and show its limitations. Hence, we design a more sophisticated algorithm exploiting unsupervised learning techniques (self-organizing maps). These are tuned following a supervised approach using our trace generator and are compared against other clustering schemes, namely expectation maximization, k-means clustering and DBSCAN, considering six months of data from a real sensor deployment. Our approach is found to be superior in terms of classification accuracy, while also being capable of identifying all of the outliers in the dataset. PMID:27669259

  18. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Delaney, Alexander R.; Tol, Jim P.; Dahele, Max

    Purpose: RapidPlan, a commercial knowledge-based planning solution, uses a model library containing the geometry and associated dosimetry of existing plans. This model predicts achievable dosimetry for prospective patients that can be used to guide plan optimization. However, it is unknown how suboptimal model plans (outliers) influence the predictions or resulting plans. We investigated the effect of, first, removing outliers from the model (cleaning it) and subsequently adding deliberate dosimetric outliers. Methods and Materials: Clinical plans from 70 head and neck cancer patients comprised the uncleaned (UC) Model_UC, from which outliers were cleaned (C) to create Model_C. The last 5 to 40 patients of Model_C were replanned with no attempt to spare the salivary glands. These substantial dosimetric outliers were reintroduced to the model in increments of 5, creating Model_5 to Model_40 (Model_5-40). These models were used to create plans for a 10-patient evaluation group. Plans from Model_UC and Model_C, and Model_C and Model_5-40, were compared on the basis of boost (B) and elective (E) target volume homogeneity indexes (HI_B/HI_E) and mean doses to oral cavity, composite salivary glands (comp_sal) and swallowing (comp_swal) structures. Results: On average, outlier removal (Model_C vs Model_UC) had minimal effects on HI_B/HI_E (0%-0.4%) and sparing of organs at risk (mean dose difference to oral cavity and comp_sal/comp_swal were ≤0.4 Gy). Model_5-10 marginally improved comp_sal sparing, whereas adding a larger number of outliers (Model_20-40) led to deteriorations in comp_sal up to 3.9 Gy, on average. These increases are modest compared to the 14.9 Gy dose increases in the added outlier plans, due to the placement of optimization objectives below the inferior boundary of the dose-volume histogram-predicted range. Conclusions: Overall, dosimetric outlier removal from or addition of 5 to 10 outliers to a 70-patient model had marginal effects on resulting plan quality. Although the addition of >20 outliers deteriorated plan quality, the effect was modest. In this study, RapidPlan demonstrated robustness for moderate proportions of salivary gland dosimetric outliers.

  19. Performance of digital RGB reflectance color extraction for plaque lesion

    NASA Astrophysics Data System (ADS)

    Hashim, Hadzli; Taib, Mohd Nasir; Jailani, Rozita; Sulaiman, Saadiah; Baba, Roshidah

    2005-01-01

    Several clinical psoriasis lesion groups have been studied for digital RGB color feature extraction. Previous works used sample sizes that included all the outliers lying beyond standard-deviation distances from the peak of the histograms. This paper describes the statistical performance of the RGB model with and without removing these outliers. Plaque lesions are compared with other types of psoriasis. The statistical tests are compared with respect to three sample sizes: the original 90 samples, a first reduction removing outliers beyond two standard deviations (2SD), and a second reduction removing outliers beyond one standard deviation (1SD). Quantification of the images through the direct and differential variants of the conventional reflectance method is considered. Performance is assessed from error plots with 95% confidence intervals and from the inference t-tests applied. The test outcomes show that the B component of the conventional differential method can distinctively classify plaque from the other psoriasis groups, consistent with the error-plot findings, with an improvement in p-value greater than 0.5.

  20. Robust Correlation Analyses: False Positive and Power Validation Using a New Open Source Matlab Toolbox

    PubMed Central

    Pernet, Cyril R.; Wilcox, Rand; Rousselet, Guillaume A.

    2012-01-01

    Pearson’s correlation measures the strength of the association between two variables. The technique is, however, restricted to linear associations and is overly sensitive to outliers. Indeed, a single outlier can result in a highly inaccurate summary of the data. Yet, it remains the most commonly used measure of association in psychology research. Here we describe a free Matlab(R) based toolbox (http://sourceforge.net/projects/robustcorrtool/) that computes robust measures of association between two or more random variables: the percentage-bend correlation and skipped-correlations. After illustrating how to use the toolbox, we show that robust methods, where outliers are down weighted or removed and accounted for in significance testing, provide better estimates of the true association with accurate false positive control and without loss of power. The different correlation methods were tested with normal data and normal data contaminated with marginal or bivariate outliers. We report estimates of effect size, false positive rate and power, and advise on which technique to use depending on the data at hand. PMID:23335907
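
    Of the two robust measures named, the skipped correlation is the easier to sketch: flag bivariate outliers with robust Mahalanobis distances, drop them, and correlate the remainder. The toolbox's own version uses projection-based outlier detection and adjusted critical values, so the minimum-covariance-determinant variant below is only the idea, with illustrative data.

    ```python
    import numpy as np
    from scipy.stats import chi2, pearsonr
    from sklearn.covariance import MinCovDet

    def skipped_correlation(x, y, alpha=0.01):
        """Simplified skipped correlation: remove bivariate outliers flagged
        by robust (MCD-based) Mahalanobis distances, then correlate."""
        X = np.column_stack([x, y])
        d2 = MinCovDet(random_state=0).fit(X).mahalanobis(X)
        keep = d2 < chi2.ppf(1 - alpha, df=2)
        r, p = pearsonr(x[keep], y[keep])
        return r, p, np.flatnonzero(~keep)

    rng = np.random.default_rng(10)
    x = rng.normal(size=100)
    y = 0.5 * x + rng.normal(scale=0.8, size=100)
    x[:2], y[:2] = [4.5, 5.0], [-4.0, -4.5]      # two planted bivariate outliers
    print(skipped_correlation(x, y))             # r near 0.5; flags indices 0, 1
    ```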

  1. Robust correlation analyses: false positive and power validation using a new open source matlab toolbox.

    PubMed

    Pernet, Cyril R; Wilcox, Rand; Rousselet, Guillaume A

    2012-01-01

    Pearson's correlation measures the strength of the association between two variables. The technique is, however, restricted to linear associations and is overly sensitive to outliers. Indeed, a single outlier can result in a highly inaccurate summary of the data. Yet, it remains the most commonly used measure of association in psychology research. Here we describe a free Matlab(R) based toolbox (http://sourceforge.net/projects/robustcorrtool/) that computes robust measures of association between two or more random variables: the percentage-bend correlation and skipped-correlations. After illustrating how to use the toolbox, we show that robust methods, where outliers are down weighted or removed and accounted for in significance testing, provide better estimates of the true association with accurate false positive control and without loss of power. The different correlation methods were tested with normal data and normal data contaminated with marginal or bivariate outliers. We report estimates of effect size, false positive rate and power, and advise on which technique to use depending on the data at hand.
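
    The skipped-correlation idea is straightforward to emulate outside MATLAB. The following is a minimal Python sketch, not the toolbox itself: bivariate outliers are flagged with a robust, MAD-based distance, and the association is then computed on the remaining points. The 2.5 cutoff and the use of Spearman's correlation on the cleaned subset are illustrative assumptions.

    ```python
    import numpy as np
    from scipy import stats

    def skipped_correlation(x, y, cutoff=2.5):
        x, y = np.asarray(x, float), np.asarray(y, float)
        data = np.column_stack([x, y])
        center = np.median(data, axis=0)
        # Median absolute deviation per axis, scaled for consistency with the SD
        mad = stats.median_abs_deviation(data, axis=0, scale='normal')
        dist = np.sqrt((((data - center) / mad) ** 2).sum(axis=1))
        keep = dist < cutoff
        r, p = stats.spearmanr(x[keep], y[keep])
        return r, p, np.where(~keep)[0]  # correlation, p-value, outlier indices
    ```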

  2. Robust curb detection with fusion of 3D-Lidar and camera data.

    PubMed

    Tan, Jun; Li, Jian; An, Xiangjing; He, Hangen

    2014-05-21

    Curb detection is an essential component of Autonomous Land Vehicles (ALV), especially important for safe driving in urban environments. In this paper, we propose a fusion-based curb detection method through exploiting 3D-Lidar and camera data. More specifically, we first fuse the sparse 3D-Lidar points and high-resolution camera images together to recover a dense depth image of the captured scene. Based on the recovered dense depth image, we propose a filter-based method to estimate the normal direction within the image. Then, by using the multi-scale normal patterns based on the curb's geometric property, curb point features fitting the patterns are detected in the normal image row by row. After that, we construct a Markov Chain to model the consistency of curb points which utilizes the continuous property of the curb, and thus the optimal curb path which links the curb points together can be efficiently estimated by dynamic programming. Finally, we perform post-processing operations to filter the outliers, parameterize the curbs and give the confidence scores on the detected curbs. Extensive evaluations clearly show that our proposed method can detect curbs with strong robustness at real-time speed for both static and dynamic scenes.

  3. Robust estimation procedure in panel data model

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shariff, Nurul Sima Mohamad; Hamzah, Nor Aishah

    2014-06-19

    Panel data modeling has received great attention in econometric research recently. This is due to the availability of data sources and the interest in studying cross sections of individuals observed over time. However, problems may arise in modeling the panel in the presence of cross-sectional dependence and outliers. Even though a few methods take into consideration the presence of cross-sectional dependence in the panel, these methods may provide inconsistent parameter estimates and inferences when outliers occur in the panel. As such, an alternative method that is robust to outliers and cross-sectional dependence is introduced in this paper. The properties and construction of the confidence interval for the parameter estimates are also considered. The robustness of the procedure is investigated and comparisons are made to the existing method via simulation studies. Our results show that the robust approach is able to produce accurate and reliable parameter estimates under the conditions considered.

  4. Ground-based remote sensing of HDO/H2O ratio profiles: introduction and validation of an innovative retrieval approach

    NASA Astrophysics Data System (ADS)

    Schneider, M.; Hase, F.; Blumenstock, T.

    2006-10-01

    We propose an innovative approach for analysing ground-based FTIR spectra which allows us to detect variabilities of lower and middle/upper tropospheric HDO/H2O ratios. We show that the proposed method is superior to common approaches. We estimate that lower tropospheric HDO/H2O ratios can be detected with a noise to signal ratio of 15% and middle/upper tropospheric ratios with a noise to signal ratio of 50%. The method requires the inversion to be performed on a logarithmic scale and to introduce an inter-species constraint. While common methods calculate the isotope ratio posterior to an independent, optimal estimation of the HDO and H2O profile, the proposed approach is an optimal estimator for the ratio itself. We apply the innovative approach to spectra measured continuously during 15 months and present, for the first time, an annual cycle of tropospheric HDO/H2O ratio profiles as detected by ground-based measurements. Outliers in the detected middle/upper tropospheric ratios are interpreted by backward trajectories.

  5. Ground-based remote sensing of HDO/H2O ratio profiles: introduction and validation of an innovative retrieval approach

    NASA Astrophysics Data System (ADS)

    Schneider, M.; Hase, F.; Blumenstock, T.

    2006-06-01

    We propose an innovative approach for analysing ground-based FTIR spectra which allows us to detect variabilities of lower and middle/upper tropospheric HDO/H2O ratios. We show that the proposed method is superior to common approaches. We estimate that lower tropospheric HDO/H2O ratios can be detected with a noise to signal ratio of 15% and middle/upper tropospheric ratios with a noise to signal ratio of 50%. The method requires the inversion to be performed on a logarithmic scale and to introduce an inter-species constraint. While common methods calculate the isotope ratio posterior to an independent, optimal estimation of the HDO and H2O profile, the proposed approach is an optimal estimator for the ratio itself. We apply the innovative approach to spectra measured continuously during 15 months and present, for the first time, an annual cycle of tropospheric HDO/H2O ratio profiles as detected by ground-based measurements. Outliers in the detected middle/upper tropospheric ratios are interpreted by backward trajectories.

  6. A Comprehensive review of group level model performance in the presence of heteroscedasticity: Can a single model control Type I errors in the presence of outliers?

    PubMed Central

    Mumford, Jeanette A.

    2017-01-01

    Even after thorough preprocessing and a careful time series analysis of functional magnetic resonance imaging (fMRI) data, artifact and other issues can lead to violations of the assumption that the variance is constant across subjects in the group level model. This is especially concerning when modeling a continuous covariate at the group level, as the slope is easily biased by outliers. Various models have been proposed to deal with outliers, including models that use the first level variance or that use the group level residual magnitude to differentially weight subjects. The most typically used robust regression, implementing a robust estimator of the regression slope, has been previously studied in the context of fMRI studies and was found to perform well in some scenarios, but a loss of Type I error control can occur for some outlier settings. A second type of robust regression using a heteroscedastic autocorrelation consistent (HAC) estimator, which produces robust slope and variance estimates, has been shown to perform well, with better Type I error control, but with large sample sizes (500–1000 subjects). The Type I error control with smaller sample sizes has not been studied in this model, nor has it been compared to other modeling approaches that handle outliers, such as FSL’s Flame 1 and FSL’s outlier de-weighting. Focusing on group level inference with a continuous covariate over a range of sample sizes and degrees of heteroscedasticity, which can be driven either by the within- or between-subject variability, both styles of robust regression are compared to ordinary least squares (OLS), FSL’s Flame 1, Flame 1 with the outlier de-weighting algorithm, and Kendall’s Tau. Additionally, subject omission using the Cook’s Distance measure with OLS and nonparametric inference with the OLS statistic are studied. Pros and cons of these models, as well as general strategies for detecting outliers in data and taking precaution to avoid inflated Type I error rates, are discussed. PMID:28030782
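
    The two robust-regression styles discussed above can be tried outside the fMRI packages with statsmodels. This is a hedged sketch: `covariate` and `subject_scores` are simulated stand-ins for per-subject group-level data, and an HC3 sandwich variance stands in for the HAC estimator proper.

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    covariate = rng.normal(size=40)
    # Heavy-tailed noise mimics a group containing outlying subjects
    subject_scores = 0.5 * covariate + rng.standard_t(df=3, size=40)

    X = sm.add_constant(covariate)
    # Style 1: robust slope estimate (M-estimation with Huber weights)
    rlm_fit = sm.RLM(subject_scores, X, M=sm.robust.norms.HuberT()).fit()
    # Style 2: OLS slope with a heteroscedasticity-robust (sandwich) variance
    ols_fit = sm.OLS(subject_scores, X).fit(cov_type='HC3')
    print(rlm_fit.params, ols_fit.bse)
    ```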

  7. Anomaly Detection for Beam Loss Maps in the Large Hadron Collider

    NASA Astrophysics Data System (ADS)

    Valentino, Gianluca; Bruce, Roderik; Redaelli, Stefano; Rossi, Roberto; Theodoropoulos, Panagiotis; Jaster-Merz, Sonja

    2017-07-01

    In the LHC, beam loss maps are used to validate collimator settings for cleaning and machine protection. This is done by monitoring the loss distribution in the ring during infrequent controlled loss map campaigns, as well as in standard operation. Due to the complexity of the system, consisting of more than 50 collimators per beam, it is difficult to identify small changes in the collimation hierarchy, which may be due to setting errors or beam orbit drifts with such methods. A technique based on Principal Component Analysis and Local Outlier Factor is presented to detect anomalies in the loss maps and therefore provide an automatic check of the collimation hierarchy.
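
    A minimal sklearn sketch of the PCA-plus-Local-Outlier-Factor combination described above; the `loss_maps` array is a hypothetical stand-in for normalized beam-loss readings, and the component and neighbor counts are illustrative.

    ```python
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.neighbors import LocalOutlierFactor

    rng = np.random.default_rng(0)
    loss_maps = rng.random((200, 50))      # stand-in: one row per loss map

    scores = PCA(n_components=5).fit_transform(loss_maps)
    labels = LocalOutlierFactor(n_neighbors=20).fit_predict(scores)
    anomalous = np.where(labels == -1)[0]  # -1 flags anomalous loss maps
    ```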

  8. The observational case for Jupiter being a typical massive planet.

    PubMed

    Lineweaver, Charles H; Grether, Daniel

    2002-01-01

    We identify a subsample of the recently detected extrasolar planets that is minimally affected by the selection effects of the Doppler detection method. With a simple analysis we quantify trends in the surface density of this subsample in the period-Msin(i) plane. A modest extrapolation of these trends puts Jupiter in the most densely occupied region of this parameter space, thus indicating that Jupiter is a typical massive planet rather than an outlier. Our analysis suggests that Jupiter is more typical than indicated by previous analyses. For example, instead of MJup mass exoplanets being twice as common as 2 MJup exoplanets, we find they are three times as common.

  9. Automated rice leaf disease detection using color image analysis

    NASA Astrophysics Data System (ADS)

    Pugoy, Reinald Adrian D. L.; Mariano, Vladimir Y.

    2011-06-01

    In rice-related institutions such as the International Rice Research Institute, assessing the health condition of a rice plant through its leaves, which is usually done as a manual eyeball exercise, is important to come up with good nutrient and disease management strategies. In this paper, an automated system that can detect diseases present in a rice leaf using color image analysis is presented. In the system, the outlier region is first obtained from a rice leaf image to be tested using histogram intersection between the test and healthy rice leaf images. Upon obtaining the outlier, it is then subjected to a threshold-based K-means clustering algorithm to group related regions into clusters. Then, these clusters are subjected to further analysis to finally determine the suspected diseases of the rice leaf.
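
    The first stage, comparing the colour histogram of a test leaf against a healthy reference, can be sketched as follows. This is a generic illustration rather than the authors' implementation; the bin count is an assumption.

    ```python
    import numpy as np

    def colour_histogram(image, bins=32):
        # image: (H, W, 3) uint8 RGB array -> normalized 3D colour histogram
        hist, _ = np.histogramdd(image.reshape(-1, 3), bins=(bins,) * 3,
                                 range=((0, 256),) * 3)
        return hist.ravel() / hist.sum()

    def histogram_intersection(h1, h2):
        # Both histograms sum to 1; the result lies in [0, 1], where 1 means
        # identical colour distributions and low values suggest outlier regions.
        return np.minimum(h1, h2).sum()
    ```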

  10. The cause of outliers in electromagnetic pulse (EMP) locations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fenimore, Edward E.

    2014-10-02

    We present methods to calculate the location of EMP pulses when observed by 5 or more satellites. Simulations show that, even with a good initial guess and fitting a location to all of the data, there are sometimes outlier results whose locations are much worse than in most cases. By comparing simulations using different ionospheric transfer functions (ITFs), it appears that the outliers are caused by not including the additional path length due to refraction, rather than by not including higher order terms in the Appleton-Hartree equation. We suggest ways that the outliers can be corrected. These correction methods require one to use an electron density profile along the line of sight from the event to the satellite rather than using the total electron content (TEC) to characterize the ionosphere.

  11. Genetic variation of loci potentially under selection confounds species-genetic diversity correlations in a fragmented habitat.

    PubMed

    Bertin, Angeline; Gouin, Nicolas; Baumel, Alex; Gianoli, Ernesto; Serratosa, Juan; Osorio, Rodomiro; Manel, Stephanie

    2017-01-01

    Positive species-genetic diversity correlations (SGDCs) are often thought to result from the parallel influence of neutral processes on genetic and species diversity. Yet, confounding effects of non-neutral mechanisms have not been explored. Here, we investigate the impact of non-neutral genetic diversity on SGDCs in high Andean wetlands. We compare correlations between plant species diversity and genetic diversity (GD) calculated with and without loci potentially under selection (outlier loci). The study system includes 2188 specimens from five species (three common aquatic macroinvertebrate and two dominant plant species) that were genotyped for 396 amplified fragment length polymorphism loci. We also appraise the importance of neutral processes on SGDCs by investigating the influence of habitat fragmentation features. Significant positive SGDCs were detected for all five species (mean SGDC = 0.52 ± 0.05). While only a few outlier loci were detected in each species, they resulted in significant decreases in GD and in SGDCs. This supports the hypothesis that neutral processes drive species-genetic diversity relationships in high Andean wetlands. Unexpectedly, in two species the effects of the habitat fragmentation characteristics studied here on GD increased in the presence of outlier loci. Overall, our results reveal pitfalls in using habitat features to infer processes driving SGDCs and show that a few loci potentially under selection are enough to cause a significant downward bias in SGDC. Investigating confounding effects of outlier loci thus represents a useful approach for demonstrating the contribution of neutral processes to species-genetic diversity relationships. © 2016 John Wiley & Sons Ltd.

  12. Genome scanning for detecting adaptive genes along environmental gradients in the Japanese conifer, Cryptomeria japonica.

    PubMed

    Tsumura, Y; Uchiyama, K; Moriguchi, Y; Ueno, S; Ihara-Ujino, T

    2012-12-01

    Local adaptation is important in evolutionary processes and speciation. We used multiple tests to identify several candidate genes that may be involved in local adaptation from 1026 loci in 14 natural populations of Cryptomeria japonica, the most economically important forestry tree in Japan. We also studied the relationships between genotypes and environmental variables to obtain information on the selective pressures acting on individual populations. Outlier loci were mapped onto a linkage map, and the positions of loci associated with specific environmental variables are considered. The outlier loci were not randomly distributed on the linkage map; linkage group 11 was identified as a genomic island of divergence. Three loci in this region were also associated with environmental variables such as mean annual temperature, daily maximum temperature, maximum snow depth, and so on. Outlier loci identified with high significance levels will be essential for conservation purposes and for future work on molecular breeding.

  13. Noise-robust unsupervised spike sorting based on discriminative subspace learning with outlier handling.

    PubMed

    Keshtkaran, Mohammad Reza; Yang, Zhi

    2017-06-01

    Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. Most of the feature extraction and dimensionality reduction techniques that have been used for spike sorting give a projection subspace which is not necessarily the most discriminative one. Therefore, clusters which appear inherently separable in some discriminative subspace may overlap if projected using conventional feature extraction approaches, leading to poor sorting accuracy, especially when the noise level is high. In this paper, we propose a noise-robust and unsupervised spike sorting algorithm based on learning discriminative spike features for clustering. The proposed algorithm uses discriminative subspace learning to extract low-dimensional and most discriminative features from the spike waveforms and performs clustering with automatic detection of the number of clusters. The core part of the algorithm involves iterative subspace selection using linear discriminant analysis and clustering using a Gaussian mixture model with outlier detection. A statistical test in the discriminative subspace is proposed to automatically detect the number of clusters. Comparative results on publicly available simulated and real in vivo datasets demonstrate that our algorithm achieves substantially improved cluster distinction, leading to higher sorting accuracy and more reliable detection of clusters which are highly overlapping and not detectable using conventional feature extraction techniques such as principal component analysis or wavelets. By providing more accurate information about the activity of a larger number of individual neurons, with high robustness to neural noise and outliers, the proposed unsupervised spike sorting algorithm facilitates more detailed and accurate analysis of single- and multi-unit activities in neuroscience and brain-machine interface studies.
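
    The core loop, alternating a discriminative projection with mixture-model clustering, can be sketched with sklearn stand-ins. This is not the authors' implementation: the iteration count, subspace dimension, and fixed cluster count are illustrative, and the automatic cluster-number test is omitted.

    ```python
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.mixture import GaussianMixture

    def iterate_subspace_clustering(waveforms, n_clusters, n_iter=10):
        # Crude initial labelling in the raw waveform space
        features = np.asarray(waveforms, float)
        labels = GaussianMixture(n_clusters, random_state=0).fit_predict(features)
        for _ in range(n_iter):
            # Project into the subspace that best separates the current clusters
            lda = LinearDiscriminantAnalysis(n_components=min(n_clusters - 1, 3))
            features = lda.fit_transform(waveforms, labels)
            # Re-cluster in the discriminative subspace
            labels = GaussianMixture(n_clusters, random_state=0).fit_predict(features)
        return labels, features
    ```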

  14. Noise-robust unsupervised spike sorting based on discriminative subspace learning with outlier handling

    NASA Astrophysics Data System (ADS)

    Keshtkaran, Mohammad Reza; Yang, Zhi

    2017-06-01

    Objective. Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. Most of the feature extraction and dimensionality reduction techniques that have been used for spike sorting give a projection subspace which is not necessarily the most discriminative one. Therefore, the clusters which appear inherently separable in some discriminative subspace may overlap if projected using conventional feature extraction approaches leading to a poor sorting accuracy especially when the noise level is high. In this paper, we propose a noise-robust and unsupervised spike sorting algorithm based on learning discriminative spike features for clustering. Approach. The proposed algorithm uses discriminative subspace learning to extract low dimensional and most discriminative features from the spike waveforms and perform clustering with automatic detection of the number of the clusters. The core part of the algorithm involves iterative subspace selection using linear discriminant analysis and clustering using Gaussian mixture model with outlier detection. A statistical test in the discriminative subspace is proposed to automatically detect the number of the clusters. Main results. Comparative results on publicly available simulated and real in vivo datasets demonstrate that our algorithm achieves substantially improved cluster distinction leading to higher sorting accuracy and more reliable detection of clusters which are highly overlapping and not detectable using conventional feature extraction techniques such as principal component analysis or wavelets. Significance. By providing more accurate information about the activity of more number of individual neurons with high robustness to neural noise and outliers, the proposed unsupervised spike sorting algorithm facilitates more detailed and accurate analysis of single- and multi-unit activities in neuroscience and brain machine interface studies.

  15. Appending Limited Clinical Data to an Administrative Database for Acute Myocardial Infarction Patients: The Impact on the Assessment of Hospital Quality.

    PubMed

    Hannan, Edward L; Samadashvili, Zaza; Cozzens, Kimberly; Jacobs, Alice K; Venditti, Ferdinand J; Holmes, David R; Berger, Peter B; Stamato, Nicholas J; Hughes, Suzanne; Walford, Gary

    2016-05-01

    Hospitals' risk-standardized mortality rates and outlier status (significantly higher/lower rates) are reported by the Centers for Medicare and Medicaid Services (CMS) for acute myocardial infarction (AMI) patients using Medicare claims data. New York now has AMI claims data with blood pressure and heart rate added. The objective of this study was to see whether the appended database yields different hospital assessments than standard claims data. New York State clinically appended claims data for AMI were used to create 2 different risk models based on CMS methods: 1 with and 1 without the added clinical data. Model discrimination was compared, and differences between the models in hospital outlier status and tertile status were examined. Mean arterial pressure and heart rate were both significant predictors of mortality in the clinically appended model. The C statistic for the model with the clinical variables added was significantly higher (0.803 vs. 0.773, P<0.001). The model without clinical variables identified 10 low outliers and all of them were percutaneous coronary intervention hospitals. When clinical variables were included in the model, only 6 of those 10 hospitals were low outliers, but there were 2 new low outliers. The model without clinical variables had only 3 high outliers, and the model with clinical variables included identified 2 new high outliers. Appending even a small number of clinical data elements to administrative data resulted in a difference in the assessment of hospital mortality outliers for AMI. The strategy of adding limited but important clinical data elements to administrative datasets should be considered when evaluating hospital quality for procedures and other medical conditions.

  16. Outlier Detection for Sensor Systems (ODSS): A MATLAB Macro for Evaluating Microphone Sensor Data Quality.

    PubMed

    Vasta, Robert; Crandell, Ian; Millican, Anthony; House, Leanna; Smith, Eric

    2017-10-13

    Microphone sensor systems provide information that may be used for a variety of applications. Such systems generate large amounts of data. One concern is with microphone failure and unusual values that may be generated as part of the information collection process. This paper describes methods and a MATLAB graphical interface that provides rapid evaluation of microphone performance and identifies irregularities. The approach and interface are described. An application to a microphone array used in a wind tunnel is used to illustrate the methodology.

  17. Power enhancement via multivariate outlier testing with gene expression arrays.

    PubMed

    Asare, Adam L; Gao, Zhong; Carey, Vincent J; Wang, Richard; Seyfert-Margolis, Vicki

    2009-01-01

    As the use of microarrays in human studies continues to increase, stringent quality assurance is necessary to ensure accurate experimental interpretation. We present a formal approach for microarray quality assessment that is based on dimension reduction of established measures of signal and noise components of expression followed by parametric multivariate outlier testing. We applied our approach to several data resources. First, as a negative control, we found that the Affymetrix and Illumina contributions to MAQC data were free from outliers at a nominal outlier flagging rate of alpha=0.01. Second, we created a tunable framework for artificially corrupting intensity data from the Affymetrix Latin Square spike-in experiment to allow investigation of sensitivity and specificity of quality assurance (QA) criteria. Third, we applied the procedure to 507 Affymetrix microarray GeneChips processed with RNA from human peripheral blood samples. We show that exclusion of arrays by this approach substantially increases inferential power, or the ability to detect differential expression, in large clinical studies. The arrayMvout and affyContam packages are available at http://bioconductor.org/packages/2.3/bioc/html/arrayMvout.html and http://bioconductor.org/packages/2.3/bioc/html/affyContam.html (credentials: readonly/readonly).
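
    A generic, hedged analogue of the parametric multivariate outlier test on dimension-reduced QA measures: PCA scores are tested via Mahalanobis distances against a chi-square cutoff at the nominal alpha = 0.01 flagging rate. The classical covariance estimate and component count used here are assumptions, not the arrayMvout internals.

    ```python
    import numpy as np
    from scipy import stats
    from sklearn.decomposition import PCA

    def flag_outlier_arrays(qa_measures, n_components=3, alpha=0.01):
        # qa_measures: (n_arrays, n_quality_metrics) matrix
        scores = PCA(n_components=n_components).fit_transform(qa_measures)
        centered = scores - scores.mean(axis=0)
        inv_cov = np.linalg.inv(np.cov(scores, rowvar=False))
        d2 = np.einsum('ij,jk,ik->i', centered, inv_cov, centered)
        cutoff = stats.chi2.ppf(1 - alpha, df=n_components)
        return np.where(d2 > cutoff)[0]  # indices of flagged arrays
    ```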

  18. The Space-Time Variation of Global Crop Yields, Detecting Simultaneous Outliers and Identifying the Teleconnections with Climatic Patterns

    NASA Astrophysics Data System (ADS)

    Najafi, E.; Devineni, N.; Pal, I.; Khanbilvardi, R.

    2017-12-01

    An understanding of the climate factors that influence the space-time variability of crop yields is important for food security purposes and can help us predict global food availability. In this study, we address how the crop yield trends of countries globally were related to each other during the last several decades, and the main climatic variables that triggered high/low crop yields simultaneously across the world. Robust Principal Component Analysis (rPCA) is used to identify the primary modes of variation in wheat, maize, sorghum, rice, soybeans, and barley yields. Relations between these modes of variability and important climatic variables, especially anomalous sea surface temperature (SSTa), are examined from 1964 to 2010. rPCA is also used to identify simultaneous outliers in each year, i.e. systematic high/low crop yields across the globe. The results demonstrate spatiotemporal patterns of these crop yields and the climate-related events that caused them, as well as the connection of outliers with weather extremes. We find that, among climatic variables, SST has had the most impact in creating simultaneous crop yield variability and yield outliers in many countries. An understanding of this phenomenon can benefit global crop trade networks.

  19. Genomic signatures of positive selection in humans and the limits of outlier approaches.

    PubMed

    Kelley, Joanna L; Madeoy, Jennifer; Calhoun, John C; Swanson, Willie; Akey, Joshua M

    2006-08-01

    Identifying regions of the human genome that have been targets of positive selection will provide important insights into recent human evolutionary history and may facilitate the search for complex disease genes. However, the confounding effects of population demographic history and selection on patterns of genetic variation complicate inferences of selection when a small number of loci are studied. To this end, identifying outlier loci from empirical genome-wide distributions of genetic variation is a promising strategy to detect targets of selection. Here, we evaluate the power and efficiency of a simple outlier approach and describe a genome-wide scan for positive selection using a dense catalog of 1.58 million SNPs that were genotyped in three human populations. In total, we analyzed 14,589 genes, 385 of which possess patterns of genetic variation consistent with the hypothesis of positive selection. Furthermore, several extended genomic regions were found, spanning >500 kb, that contained multiple contiguous candidate selection genes. More generally, these data provide important practical insights into the limits of outlier approaches in genome-wide scans for selection, provide strong candidate selection genes to study in greater detail, and may have important implications for disease related research.

  20. Detection of Anomalies in Hydrometric Data Using Artificial Intelligence Techniques

    NASA Astrophysics Data System (ADS)

    Lauzon, N.; Lence, B. J.

    2002-12-01

    This work focuses on the detection of anomalies in hydrometric data sequences, such as 1) outliers, which are individual data having statistical properties that differ from those of the overall population; 2) shifts, which are sudden changes over time in the statistical properties of the historical records of data; and 3) trends, which are systematic changes over time in the statistical properties. For the purpose of the design and management of water resources systems, it is important to be aware of these anomalies in hydrometric data, for they can induce a bias in the estimation of water quantity and quality parameters. These anomalies may be viewed as specific patterns affecting the data, and therefore pattern recognition techniques can be used for identifying them. However, the number of possible patterns is very large for each type of anomaly and consequently large computing capacities are required to account for all possibilities using the standard statistical techniques, such as cluster analysis. Artificial intelligence techniques, such as the Kohonen neural network and fuzzy c-means, are clustering techniques commonly used for pattern recognition in several areas of engineering and have recently begun to be used for the analysis of natural systems. They require much less computing capacity than the standard statistical techniques, and therefore are well suited for the identification of outliers, shifts and trends in hydrometric data. This work constitutes a preliminary study, using synthetic data representing hydrometric data that can be found in Canada. The analysis of the results obtained shows that the Kohonen neural network and fuzzy c-means are reasonably successful in identifying anomalies. This work also addresses the problem of uncertainties inherent to the calibration procedures that fit the clusters to the possible patterns for both the Kohonen neural network and fuzzy c-means. Indeed, for the same database, different sets of clusters can be established with these calibration procedures. A simple method for analyzing uncertainties associated with the Kohonen neural network and fuzzy c-means is developed here. The method combines the results from several sets of clusters, either from the Kohonen neural network or fuzzy c-means, so as to provide an overall diagnosis as to the identification of outliers, shifts and trends. The results indicate an improvement in the performance for identifying anomalies when the method of combining cluster sets is used, compared with when only one cluster set is used.
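
    The combine-several-cluster-sets idea can be illustrated with a hedged sketch that uses sklearn's KMeans as a generic stand-in for the Kohonen network and fuzzy c-means: each clustering run votes on whether a data window looks anomalous, and the votes are averaged into an overall diagnosis. The cluster count and flagging quantile are illustrative.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    def ensemble_anomaly_votes(windows, n_runs=10, n_clusters=8, quantile=0.95):
        # windows: (n_windows, window_length) segments of a hydrometric series
        votes = np.zeros(len(windows))
        for seed in range(n_runs):
            km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
            km.fit(windows)
            dist = km.transform(windows).min(axis=1)     # distance to own centroid
            votes += dist > np.quantile(dist, quantile)  # flag far-from-centre windows
        return votes / n_runs  # fraction of runs flagging each window
    ```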

  1. Toward the detection of abnormal chest radiographs the way radiologists do it

    NASA Astrophysics Data System (ADS)

    Alzubaidi, Mohammad; Patel, Ameet; Panchanathan, Sethuraman; Black, John A., Jr.

    2011-03-01

    Computer Aided Detection (CADe) and Computer Aided Diagnosis (CADx) are relatively recent areas of research that attempt to employ feature extraction, pattern recognition, and machine learning algorithms to aid radiologists in detecting and diagnosing abnormalities in medical images. However, these computational methods are based on the assumption that there are distinct classes of abnormalities, and that each class has some distinguishing features that set it apart from other classes. In reality, abnormalities in chest radiographs tend to be very heterogeneous. The literature suggests that thoracic (chest) radiologists develop their ability to detect abnormalities by developing a sense of what is normal, so that anything that is abnormal attracts their attention. This paper discusses an approach to CADe that is based on a technique called anomaly detection (which aims to detect outliers in data sets) for the purpose of detecting atypical regions in chest radiographs. However, in order to apply anomaly detection to chest radiographs, it is necessary to develop a basis for extracting features from corresponding anatomical locations in different chest radiographs. This paper proposes a method for doing this, and describes how it can be used to support CADe.

  2. Crowdtruth validation: a new paradigm for validating algorithms that rely on image correspondences.

    PubMed

    Maier-Hein, Lena; Kondermann, Daniel; Roß, Tobias; Mersmann, Sven; Heim, Eric; Bodenstedt, Sebastian; Kenngott, Hannes Götz; Sanchez, Alexandro; Wagner, Martin; Preukschas, Anas; Wekerle, Anna-Laura; Helfert, Stefanie; März, Keno; Mehrabi, Arianeb; Speidel, Stefanie; Stock, Christian

    2015-08-01

    Feature tracking and 3D surface reconstruction are key enabling techniques to computer-assisted minimally invasive surgery. One of the major bottlenecks related to training and validation of new algorithms is the lack of large amounts of annotated images that fully capture the wide range of anatomical/scene variance in clinical practice. To address this issue, we propose a novel approach to obtaining large numbers of high-quality reference image annotations at low cost in an extremely short period of time. The concept is based on outsourcing the correspondence search to a crowd of anonymous users from an online community (crowdsourcing) and comprises four stages: (1) feature detection, (2) correspondence search via crowdsourcing, (3) merging multiple annotations per feature by fitting Gaussian finite mixture models, (4) outlier removal using the result of the clustering as input for a second annotation task. On average, 10,000 annotations were obtained within 24 h at a cost of $100. The annotation of the crowd after clustering and before outlier removal was of expert quality with a median distance of about 1 pixel to a publically available reference annotation. The threshold for the outlier removal task directly determines the maximum annotation error, but also the number of points removed. Our concept is a novel and effective method for fast, low-cost and highly accurate correspondence generation that could be adapted to various other applications related to large-scale data annotation in medical image computing and computer-assisted interventions.

  3. Automatic EEG spike detection.

    PubMed

    Harner, Richard

    2009-10-01

    Since the 1970s advances in science and technology during each succeeding decade have renewed the expectation of efficient, reliable automatic epileptiform spike detection (AESD). But even when reinforced with better, faster tools, clinically reliable unsupervised spike detection remains beyond our reach. Expert-selected spike parameters were the first and still most widely used for AESD. Thresholds for amplitude, duration, sharpness, rise-time, fall-time, after-coming slow waves, background frequency, and more have been used. It is still unclear which of these wave parameters are essential, beyond peak-peak amplitude and duration. Wavelet parameters are very appropriate to AESD but need to be combined with other parameters to achieve desired levels of spike detection efficiency. Artificial Neural Network (ANN) and expert-system methods may have reached peak efficiency. Support Vector Machine (SVM) technology focuses on outliers rather than centroids of spike and nonspike data clusters and should improve AESD efficiency. An exemplary spike/nonspike database is suggested as a tool for assessing parameters and methods for AESD and is available in CSV or Matlab formats from the author at brainvue@gmail.com. Exploratory Data Analysis (EDA) is presented as a graphic method for finding better spike parameters and for the step-wise evaluation of the spike detection process.

  4. Classification of SD-OCT volumes for DME detection: an anomaly detection approach

    NASA Astrophysics Data System (ADS)

    Sankar, S.; Sidibé, D.; Cheung, Y.; Wong, T. Y.; Lamoureux, E.; Milea, D.; Meriaudeau, F.

    2016-03-01

    Diabetic Macular Edema (DME) is the leading cause of blindness amongst diabetic patients worldwide. It is characterized by accumulation of water molecules in the macula leading to swelling. Early detection of the disease helps prevent further loss of vision. Naturally, automated detection of DME from Optical Coherence Tomography (OCT) volumes plays a key role. To this end, a pipeline for detecting DME diseases in OCT volumes is proposed in this paper. The method is based on anomaly detection using Gaussian Mixture Model (GMM). It starts with pre-processing the B-scans by resizing, flattening, filtering and extracting features from them. Both intensity and Local Binary Pattern (LBP) features are considered. The dimensionality of the extracted features is reduced using PCA. As the last stage, a GMM is fitted with features from normal volumes. During testing, features extracted from the test volume are evaluated with the fitted model for anomaly and classification is made based on the number of B-scans detected as outliers. The proposed method is tested on two OCT datasets achieving a sensitivity and a specificity of 80% and 93% on the first dataset, and 100% and 80% on the second one. Moreover, experiments show that the proposed method achieves better classification performances than other recently published works.
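
    The anomaly-detection stage lends itself to a short sketch: fit a GMM on features from normal volumes, score the test B-scans by likelihood, and call the volume abnormal when enough B-scans fall below a threshold. `normal_features` and `test_features` are random stand-ins here, and the 5th-percentile threshold and 10% B-scan rule are illustrative rather than the paper's tuned values.

    ```python
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    normal_features = rng.normal(size=(500, 16))  # stand-in: normal B-scan features
    test_features = rng.normal(size=(128, 16))    # stand-in: one test volume

    gmm = GaussianMixture(n_components=4, random_state=0).fit(normal_features)
    threshold = np.percentile(gmm.score_samples(normal_features), 5)
    outliers = np.where(gmm.score_samples(test_features) < threshold)[0]
    # Volume-level decision: flag as DME if enough B-scans are outliers
    is_dme = outliers.size > 0.1 * len(test_features)
    ```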

  5. Stealthy false data injection attacks using matrix recovery and independent component analysis in smart grid

    NASA Astrophysics Data System (ADS)

    JiWei, Tian; BuHong, Wang; FuTe, Shang; Shuaiqi, Liu

    2017-05-01

    Exact state estimation is vitally important for maintaining normal operations of smart grids. Existing research demonstrates that state estimation output can be compromised by malicious attacks. However, to construct the attack vectors, the usual presumption in most works is that the attacker has perfect information regarding the topology and related parameters, even though such information is difficult to acquire in practice. Recent research shows that Independent Component Analysis (ICA) can be used to infer topology information, which in turn can be used to mount undetectable attacks and even to alter the price of electricity for the attackers' profit. However, we found that the above ICA-based blind attack tactic is only feasible in environments with Gaussian noise. If there are outliers (device malfunctions and communication errors), the Bad Data Detector will easily detect the attack. Hence, we propose a robust ICA-based blind attack strategy in which matrix recovery is used to circumvent the outlier problem and construct stealthy attack vectors. The proposed attack strategies are tested on the representative IEEE 14-bus system. Simulations verify the feasibility of the proposed method.

  6. LSST Astroinformatics And Astrostatistics: Data-oriented Astronomical Research

    NASA Astrophysics Data System (ADS)

    Borne, Kirk D.; Stassun, K.; Brunner, R. J.; Djorgovski, S. G.; Graham, M.; Hakkila, J.; Mahabal, A.; Paegert, M.; Pesenson, M.; Ptak, A.; Scargle, J.; Informatics, LSST; Statistics Team

    2011-01-01

    The LSST Informatics and Statistics Science Collaboration (ISSC) focuses on research and scientific discovery challenges posed by the very large and complex data collection that LSST will generate. Application areas include astroinformatics, machine learning, data mining, astrostatistics, visualization, scientific data semantics, time series analysis, and advanced signal processing. Research problems to be addressed with these methodologies include transient event characterization and classification, rare class discovery, correlation mining, outlier/anomaly/surprise detection, improved estimators (e.g., for photometric redshift or early onset supernova classification), exploration of highly dimensional (multivariate) data catalogs, and more. We present sample science results from these data-oriented approaches to large-data astronomical research. We present results from LSST ISSC team members, including the EB (Eclipsing Binary) Factory, the environmental variations in the fundamental plane of elliptical galaxies, and outlier detection in multivariate catalogs.

  7. A global method for identifying dependences between helio-geophysical and biological series by filtering the precedents (outliers)

    NASA Astrophysics Data System (ADS)

    Ozheredov, V. A.; Breus, T. K.; Gurfinkel, Yu. I.; Matveeva, T. A.

    2014-12-01

    A new approach to finding the dependence between heliophysical and meteorological factors and physiological parameters is considered that is based on the preliminary filtering of precedents (outliers). The sought-after dependence is masked by extraneous influences which cannot be taken into account. Therefore, the typically calculated correlation between the external-influence ( x) and physiology ( y) parameters is extremely low and does not allow their interdependence to be conclusively proved. A robust method for removing the precedents (outliers) from the database is proposed that is based on the intelligent sorting of the polynomial curves of possible dependences y( x), followed by filtering out the precedents which are far away from y( x) and optimizing the coefficient of nonlinear correlation between the regular, i.e., remaining, precedents. This optimization problem is shown to be a search for a maximum in the absence of the concept of gradient and requires the use of a genetic algorithm based on the Gray code. The relationships between the various medical and biological parameters and characteristics of the space and terrestrial weather are obtained and verified using the cross-validation method. It is proven that, by filtering out no more than 20% of precedents, it is possible to obtain a nonlinear correlation coefficient of no less than 0.5. A juxtaposition of the proposed method for filtering precedents (outliers) and the least-square method (LSM) for determining the optimal polynomial using multiple independent tests (Monte Carlo method) of models, which are as close as possible to real dependences, has shown that the LSM determination loses much in comparison to the proposed method.
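
    A much-simplified stand-in conveys the mechanics of precedent filtering: fit a polynomial, drop the worst-fitting fraction of points (a fixed 20% here, echoing the paper's bound), refit, and report the correlation over the remaining regular precedents. The genetic-algorithm search over candidate curves is omitted.

    ```python
    import numpy as np
    from scipy import stats

    def filtered_fit(x, y, degree=3, trim=0.2):
        x, y = np.asarray(x, float), np.asarray(y, float)
        coeffs = np.polyfit(x, y, degree)                     # first-pass fit
        residuals = np.abs(y - np.polyval(coeffs, x))
        keep = residuals <= np.quantile(residuals, 1 - trim)  # drop precedents
        coeffs = np.polyfit(x[keep], y[keep], degree)         # refit on regulars
        r = stats.pearsonr(np.polyval(coeffs, x[keep]), y[keep])[0]
        return coeffs, keep, r  # model, regular-precedent mask, correlation
    ```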

  8. Identification of suitable fundus images using automated quality assessment methods.

    PubMed

    Şevik, Uğur; Köse, Cemal; Berber, Tolga; Erdöl, Hidayet

    2014-04-01

    Retinal image quality assessment (IQA) is a crucial process for automated retinal image analysis systems to obtain an accurate and successful diagnosis of retinal diseases. Consequently, the first step in a good retinal image analysis system is measuring the quality of the input image. We present an approach for finding medically suitable retinal images for retinal diagnosis. We used a three-class grading system that consists of good, bad, and outlier classes. We created a retinal image quality dataset with a total of 216 consecutive images called the Diabetic Retinopathy Image Database. We identified the suitable images within the good images for automatic retinal image analysis systems using a novel method. Subsequently, we evaluated our retinal image suitability approach using the Digital Retinal Images for Vessel Extraction and Standard Diabetic Retinopathy Database Calibration level 1 public datasets. The results were measured through the F1 metric, which is a harmonic mean of precision and recall metrics. The highest F1 scores of the IQA tests were 99.60%, 96.50%, and 85.00% for good, bad, and outlier classes, respectively. Additionally, the accuracy of our suitable image detection approach was 98.08%. Our approach can be integrated into any automatic retinal analysis system with sufficient performance scores.

  9. Detection and monitoring of emerald ash borer populations: trap trees and the factors that may influence their effectiveness

    Treesearch

    Andrew J. Storer; Jessica A. Metzger; Ivich Fraser; Deborah G. McCullough; Therese M. Poland; Robert L. Heyd

    2007-01-01

    The exotic emerald ash borer (EAB), Agrilus planipennis Fairmaire, was first identified in Michigan in 2002, though it had likely been established there for a number of years prior to detection. A key to management of EAB populations is the ability to detect this insect in order to accurately describe its distribution and to locate new outlier...

  10. Comparison of robustness to outliers between robust poisson models and log-binomial models when estimating relative risks for common binary outcomes: a simulation study.

    PubMed

    Chen, Wansu; Shi, Jiaxiao; Qian, Lei; Azen, Stanley P

    2014-06-26

    To estimate relative risks or risk ratios for common binary outcomes, the most popular model-based methods are the robust (also known as modified) Poisson and the log-binomial regression. Of the two methods, it is believed that the log-binomial regression yields more efficient estimators because it is maximum likelihood based, while the robust Poisson model may be less affected by outliers. Evidence to support the robustness of robust Poisson models in comparison with log-binomial models is very limited. In this study a simulation was conducted to evaluate the performance of the two methods in several scenarios where outliers existed. The findings indicate that for data coming from a population where the relationship between the outcome and the covariate was in a simple form (e.g. log-linear), the two models yielded comparable biases and mean square errors. However, if the true relationship contained a higher order term, the robust Poisson models consistently outperformed the log-binomial models even when the level of contamination is low. The robust Poisson models are more robust (or less sensitive) to outliers compared to the log-binomial models when estimating relative risks or risk ratios for common binary outcomes. Users should be aware of the limitations when choosing appropriate models to estimate relative risks or risk ratios.
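
    In statsmodels the two competing model forms can be written down in a few lines. This is a hedged sketch of the models, not the study's simulation code; the data-generating step is a stand-in.

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = rng.normal(size=(500, 1))                   # stand-in covariate
    y = rng.binomial(1, np.clip(0.2 * np.exp(0.3 * x[:, 0]), 0, 1))

    X = sm.add_constant(x)
    # Log-binomial: binomial likelihood with a log link
    # (these fits can fail to converge when predicted risks approach 1)
    log_binomial = sm.GLM(y, X, family=sm.families.Binomial(
        link=sm.families.links.Log())).fit()
    # "Robust" Poisson: Poisson mean model with sandwich (HC0) standard errors
    robust_poisson = sm.GLM(y, X, family=sm.families.Poisson()).fit(cov_type='HC0')
    # exp(slope) estimates the relative risk per unit covariate in both models
    print(np.exp(log_binomial.params[1]), np.exp(robust_poisson.params[1]))
    ```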

  11. A New Quality Control Method base on IRMCD for Wind Profiler Observation towards Future Assimilation Application

    NASA Astrophysics Data System (ADS)

    Chen, Min; Zhang, Yu

    2017-04-01

    A wind profiler network with a total of 65 profiling radars was operated by the MOC/CMA in China until July 2015. In this study, a quality control procedure is constructed to incorporate the profiler data from the wind-profiling network into the local data assimilation and forecasting system (BJRUC). The procedure applies a blacklisting check that removes stations with gross errors and an outlier check that rejects data with large deviations from the background. Instead of the bi-weighting method, which has been commonly implemented in outlier elimination for one-dimensional scalar observations, an outlier elimination method is developed based on the iterated reweighted minimum covariance determinant (IRMCD) for multi-variate observations such as wind profiler data. A quality control experiment is separately performed for subsets containing profiler data tagged in parallel with/without rain flags at every 00UTC/12UTC from 20 June to 30 Sep 2015. From the results, we find that with the quality control, the frequency distributions of the differences between the observations and model background become more Gaussian-like and meet the requirements of a Gaussian distribution for data assimilation. Further intensive assessment for each quality control step reveals that the stations rejected by blacklisting contain poor data quality, and the IRMCD rejects outliers in a robust and physically reasonable manner.
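
    sklearn's MinCovDet offers a convenient stand-in for MCD-based screening (the iterated reweighted refinement of the paper is not reproduced); a hedged sketch for two-component wind innovations follows.

    ```python
    import numpy as np
    from scipy import stats
    from sklearn.covariance import MinCovDet

    def mcd_screen(innovations, alpha=0.01):
        # innovations: (n_obs, 2) u/v wind departures from the model background
        mcd = MinCovDet().fit(innovations)
        d2 = mcd.mahalanobis(innovations)  # squared robust distances
        cutoff = stats.chi2.ppf(1 - alpha, df=innovations.shape[1])
        return d2 <= cutoff                # True = keep the observation
    ```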

  12. Toward Automated Generation of Reservoir Water Elevation Changes From Satellite Radar Altimetry.

    NASA Astrophysics Data System (ADS)

    Okeowo, M. A.; Lee, H.; Hossain, F.

    2015-12-01

    Until now, processing satellite radar altimetry data over inland water bodies on a large scale has been a cumbersome task, primarily due to measurements contaminated by the surrounding topography. It becomes more challenging if the water body is small and the number of available high-rate measurements from the water surface is therefore limited. Manual removal of outliers is time consuming, which has limited the global generation of lake and reservoir elevation profiles for monitoring storage changes and hydrologic modeling. We have proposed a new method to automatically generate time-series information from raw satellite radar altimetry without user intervention. With this method, scientists with little knowledge of altimetry can independently process radar altimetry for diverse purposes. The method is based on K-means clustering, the backscatter coefficient, and statistical analysis of the dataset for outlier detection. The results of this method will be validated using in-situ gauges from US, Indus, and Bangladesh reservoirs. In addition, a sensitivity analysis will be performed to ascertain the limitations of the algorithm with respect to the surrounding topography and the length of the altimetry track's overlap with the lake/reservoir. Finally, reservoir storage change will be estimated at the study sites, using MODIS and Landsat water classification to estimate reservoir area and the Jason-2 and SARAL/Altika satellites to estimate height.

  13. Robust Curb Detection with Fusion of 3D-Lidar and Camera Data

    PubMed Central

    Tan, Jun; Li, Jian; An, Xiangjing; He, Hangen

    2014-01-01

    Curb detection is an essential component of Autonomous Land Vehicles (ALV), especially important for safe driving in urban environments. In this paper, we propose a fusion-based curb detection method through exploiting 3D-Lidar and camera data. More specifically, we first fuse the sparse 3D-Lidar points and high-resolution camera images together to recover a dense depth image of the captured scene. Based on the recovered dense depth image, we propose a filter-based method to estimate the normal direction within the image. Then, by using the multi-scale normal patterns based on the curb's geometric property, curb point features fitting the patterns are detected in the normal image row by row. After that, we construct a Markov Chain to model the consistency of curb points which utilizes the continuous property of the curb, and thus the optimal curb path which links the curb points together can be efficiently estimated by dynamic programming. Finally, we perform post-processing operations to filter the outliers, parameterize the curbs and give the confidence scores on the detected curbs. Extensive evaluations clearly show that our proposed method can detect curbs with strong robustness at real-time speed for both static and dynamic scenes. PMID:24854364

  14. Application of surface enhanced Raman scattering and competitive adaptive reweighted sampling on detecting furfural dissolved in transformer oil

    NASA Astrophysics Data System (ADS)

    Chen, Weigen; Zou, Jingxin; Wan, Fu; Fan, Zhou; Yang, Dingkun

    2018-03-01

    Detecting furfural dissolved in mineral oil is an essential technique for evaluating the ageing condition of oil-paper insulation and the degradation of its mechanical properties. Compared with traditional detection methods, Raman spectroscopy is markedly more convenient and time-saving in operation. This study explored the application of surface-enhanced Raman scattering (SERS) to quantitative analysis of furfural dissolved in oil. Oil solutions with different furfural concentrations were prepared and calibrated by high-performance liquid chromatography. Confocal laser Raman spectroscopy (CLRS) and SERS technology were employed to acquire Raman spectral data. Monte Carlo cross validation (MCCV) was used to eliminate outliers from the sample set; competitive adaptive reweighted sampling (CARS) was then applied to select an optimal combination of the informative variables that best reflect the chemical properties of concern. Based on the selected Raman spectral features, a support vector machine (SVM) combined with particle swarm optimization (PSO) was used to build a quantitative furfural analysis model. Finally, the generalization ability and prediction precision of the established method were verified using samples made in the lab. In summary, a new spectral method is proposed for the rapid detection of furfural in oil, laying a foundation for evaluating the ageing of oil-paper insulation in oil-immersed electrical equipment.

  15. A fast automatic target detection method for detecting ships in infrared scenes

    NASA Astrophysics Data System (ADS)

    Özertem, Kemal Arda

    2016-05-01

    Automatic target detection in infrared scenes is a vital task for many application areas such as defense, security, and border surveillance. For anti-ship missiles, a fast and robust ship detection algorithm is crucial for overall system performance. In this paper, a straightforward yet effective ship detection method for infrared scenes is introduced. First, morphological grayscale reconstruction is applied to the input image, followed by automatic thresholding of the suppressed image. For the segmentation step, connected component analysis is employed to obtain target candidate regions. At this point, the detection is vulnerable to outliers such as small objects with relatively high intensity values, or clouds. To deal with this drawback, a post-processing stage with two different methods is introduced. First, noisy detection results are rejected with respect to target size. Second, the waterline is detected using the Hough transform, and detection results located above the waterline (with a small margin) are rejected. After the post-processing stage, undesired holes remain, which cause one object to be detected as multiple objects, or an object not to be detected as a whole. To improve detection performance, another automatic thresholding is applied only to the target candidate regions. Finally, the two detection results are fused and the post-processing stage is repeated to obtain the final detection result. The performance of the overall methodology is tested with real-world infrared test data.
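
    The early stages of such a pipeline can be sketched with scikit-image. This is a generic illustration under assumed parameter choices (seed offset, Otsu thresholding, minimum area), not the paper's implementation.

    ```python
    import numpy as np
    from skimage import filters, measure, morphology

    def detect_candidates(ir_image, min_area=50):
        img = ir_image.astype(float)
        # Grayscale reconstruction from a lowered seed rebuilds the background;
        # subtracting it leaves the bright structures (candidate targets).
        seed = img - 0.2 * img.max()            # illustrative seed offset
        background = morphology.reconstruction(seed, img)
        residual = img - background
        mask = residual > filters.threshold_otsu(residual)
        labels = measure.label(mask)            # connected component analysis
        # Size-based rejection of small, bright outliers, as in post-processing
        return [r for r in measure.regionprops(labels) if r.area >= min_area]
    ```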

  16. A computational study on outliers in world music.

    PubMed

    Panteli, Maria; Benetos, Emmanouil; Dixon, Simon

    2017-01-01

    The comparative analysis of world music cultures has been the focus of several ethnomusicological studies in the last century. With the advances of Music Information Retrieval and the increased accessibility of sound archives, large-scale analysis of world music with computational tools is today feasible. We investigate music similarity in a corpus of 8200 recordings of folk and traditional music from 137 countries around the world. In particular, we aim to identify music recordings that are most distinct compared to the rest of our corpus. We refer to these recordings as 'outliers'. We use signal processing tools to extract music information from audio recordings, data mining to quantify similarity and detect outliers, and spatial statistics to account for geographical correlation. Our findings suggest that Botswana is the country with the most distinct recordings in the corpus and China is the country with the most distinct recordings when considering spatial correlation. Our analysis includes a comparison of musical attributes and styles that contribute to the 'uniqueness' of the music of each country.

  17. Automated artifact detection and removal for improved tensor estimation in motion-corrupted DTI data sets using the combination of local binary patterns and 2D partial least squares.

    PubMed

    Zhou, Zhenyu; Liu, Wei; Cui, Jiali; Wang, Xunheng; Arias, Diana; Wen, Ying; Bansal, Ravi; Hao, Xuejun; Wang, Zhishun; Peterson, Bradley S; Xu, Dongrong

    2011-02-01

    Signal variation in diffusion-weighted images (DWIs) is influenced both by thermal noise and by spatially and temporally varying artifacts, such as rigid-body motion and cardiac pulsation. Motion artifacts are particularly prevalent when scanning difficult patient populations, such as human infants. Although some motion during data acquisition can be corrected using image coregistration procedures, frequently individual DWIs are corrupted beyond repair by sudden, large-amplitude motion either within or outside of the imaging plane. We propose a novel approach to identify and reject outlier images automatically using local binary patterns (LBP) and 2D partial least squares (2D-PLS) to estimate diffusion tensors robustly. This method uses an enhanced LBP algorithm to extract local texture features from the image matrices of the DWI data. Because the images have been transformed to local texture matrices, we are able to extract discriminating information that identifies outliers in the data set by extending a traditional one-dimensional PLS algorithm to a two-dimensional operator. The class-membership matrix in this 2D-PLS algorithm is adapted to process samples that are image matrices, and the membership matrix thus represents varying degrees of importance of local information within the images. We also derive the analytic form of the generalized inverse of the class-membership matrix. We show that this method can effectively extract local features from brain images obtained from a large sample of human infants to identify images that are outliers in their textural features, permitting their exclusion from further processing when estimating tensors using the DWIs. This technique is shown to be superior in performance when compared with visual inspection and other common methods to address motion-related artifacts in DWI data. This technique is applicable to correcting motion artifacts in other magnetic resonance imaging (MRI) techniques (e.g., bootstrapping estimation) that use univariate or multivariate regression methods to fit MRI data to a pre-specified model. Copyright © 2011 Elsevier Inc. All rights reserved.
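
    The texture-signature step alone is easy to illustrate with scikit-image's standard LBP; the paper's enhanced LBP variant and the 2D-PLS outlier scoring are not shown.

    ```python
    import numpy as np
    from skimage.feature import local_binary_pattern

    def lbp_histogram(dwi_slice, P=8, R=1.0):
        # dwi_slice: 2D array; returns a normalized histogram of "uniform"
        # LBP codes, a compact texture signature for comparing DWIs.
        codes = local_binary_pattern(dwi_slice, P, R, method='uniform')
        hist, _ = np.histogram(codes, bins=np.arange(P + 3), density=True)
        return hist
    ```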

  18. The LSST Data Mining Research Agenda

    NASA Astrophysics Data System (ADS)

    Borne, K.; Becla, J.; Davidson, I.; Szalay, A.; Tyson, J. A.

    2008-12-01

    We describe features of the LSST science database that are amenable to scientific data mining, object classification, outlier identification, anomaly detection, image quality assurance, and survey science validation. The data mining research agenda includes: scalability (at petabyte scales) of existing machine learning and data mining algorithms; development of grid-enabled parallel data mining algorithms; design of a robust system for brokering classifications from the LSST event pipeline (which may produce 10,000 or more event alerts per night); multi-resolution methods for exploration of petascale databases; indexing of multi-attribute, multi-dimensional astronomical databases (beyond spatial indexing) for rapid querying of petabyte databases; and more.

  19. Robust nonlinear system identification: Bayesian mixture of experts using the t-distribution

    NASA Astrophysics Data System (ADS)

    Baldacchino, Tara; Worden, Keith; Rowson, Jennifer

    2017-02-01

    A novel variational Bayesian mixture of experts model for robust regression of bifurcating and piece-wise continuous processes is introduced. The mixture of experts model is a powerful model which probabilistically splits the input space, allowing different models to operate in the separate regions. However, current methods have no fail-safe against outliers. In this paper, a robust mixture of experts model is proposed which consists of Student-t mixture models at the gates and Student-t distributed experts, trained via Bayesian inference. The Student-t distribution has heavier tails than the Gaussian distribution, and so it is more robust to outliers, noise and non-normality in the data. Using both simulated data and real data obtained from the Z24 bridge, this robust mixture of experts performs better than its Gaussian counterpart when outliers are present. In particular, it provides robustness to outliers in two forms: unbiased regression parameter estimates, and robustness to overfitting/complex models.
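
    The robustness argument can be seen with plain maximum-likelihood fits: a Student-t location estimate is barely moved by gross outliers that visibly bias the Gaussian fit. A toy demonstration with SciPy (synthetic data, not the Z24 measurements):

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    clean = rng.normal(loc=0.0, scale=1.0, size=500)
    data = np.concatenate([clean, np.full(25, 15.0)])  # 5% gross outliers

    # Gaussian ML fit: the location is dragged towards the outliers.
    mu_g, sigma_g = stats.norm.fit(data)

    # Student-t ML fit: heavy tails absorb the outliers.
    df_t, mu_t, sigma_t = stats.t.fit(data)

    print(f"Gaussian location:  {mu_g:.3f}")  # noticeably biased
    print(f"Student-t location: {mu_t:.3f}")  # close to the true value 0
    ```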

  20. Discovering Structural Regularity in 3D Geometry

    PubMed Central

    Pauly, Mark; Mitra, Niloy J.; Wallner, Johannes; Pottmann, Helmut; Guibas, Leonidas J.

    2010-01-01

    We introduce a computational framework for discovering regular or repeated geometric structures in 3D shapes. We describe and classify possible regular structures and present an effective algorithm for detecting such repeated geometric patterns in point- or mesh-based models. Our method assumes no prior knowledge of the geometry or spatial location of the individual elements that define the pattern. Structure discovery is made possible by a careful analysis of pairwise similarity transformations that reveals prominent lattice structures in a suitable model of transformation space. We introduce an optimization method for detecting such uniform grids specifically designed to deal with outliers and missing elements. This yields a robust algorithm that successfully discovers complex regular structures amidst clutter, noise, and missing geometry. The accuracy of the extracted generating transformations is further improved using a novel simultaneous registration method in the spatial domain. We demonstrate the effectiveness of our algorithm on a variety of examples and show applications to compression, model repair, and geometry synthesis. PMID:21170292

  1. Reduction of ZTD outliers through improved GNSS data processing and screening strategies

    NASA Astrophysics Data System (ADS)

    Stepniak, Katarzyna; Bock, Olivier; Wielgosz, Pawel

    2018-03-01

    Though Global Navigation Satellite System (GNSS) data processing has been significantly improved over the years, it is still commonly observed that zenith tropospheric delay (ZTD) estimates contain many outliers which are detrimental to meteorological and climatological applications. In this paper, we show that ZTD outliers in double-difference processing are mostly caused by sub-daily data gaps at reference stations, which cause disconnections of clusters of stations from the reference network and common mode biases due to the strong correlation between stations in short baselines. They can reach a few centimetres in ZTD and usually coincide with a jump in formal errors. The magnitude and sign of these biases are impossible to predict because they depend on different errors in the observations and on the geometry of the baselines. We elaborate and test a new baseline strategy which solves this problem and significantly reduces the number of outliers compared to the standard strategy commonly used for positioning (e.g. determination of national reference frame) in which the pre-defined network is composed of a skeleton of reference stations to which secondary stations are connected in a star-like structure. The new strategy is also shown to perform better than the widely used strategy maximizing the number of observations available in many GNSS programs. The reason is that observations are maximized before processing, whereas the final number of used observations can be dramatically lower because of data rejection (screening) during the processing. The study relies on the analysis of 1 year of GPS (Global Positioning System) data from a regional network of 136 GNSS stations processed using Bernese GNSS Software v.5.2. A post-processing screening procedure is also proposed to detect and remove a few outliers which may still remain due to short data gaps. It is based on a combination of range checks and outlier checks of ZTD and formal errors. The accuracy of the final screened GPS ZTD estimates is assessed by comparison to ERA-Interim reanalysis.
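
    A minimal sketch of the kind of post-processing screening described above, assuming a combination of range checks on ZTD values and formal errors followed by a robust outlier check; the thresholds are illustrative, not those used in the paper.

    ```python
    import numpy as np

    def screen_ztd(ztd, sigma, ztd_range=(1.0, 3.0), sigma_max=0.01,
                   n_mad=5.0):
        """Two-step screening: (1) range checks on ZTD and formal errors,
        (2) outlier check on the remaining ZTD series using a robust
        MAD-based threshold. Values in metres; thresholds illustrative."""
        keep = (ztd > ztd_range[0]) & (ztd < ztd_range[1]) & (sigma < sigma_max)
        med = np.median(ztd[keep])
        mad = np.median(np.abs(ztd[keep] - med))
        keep &= np.abs(ztd - med) < n_mad * 1.4826 * mad
        return keep
    ```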

  2. Prospective clinical validation of independent DVH prediction for plan QA in automatic treatment planning for prostate cancer patients.

    PubMed

    Wang, Yibing; Heijmen, Ben J M; Petit, Steven F

    2017-12-01

    To prospectively investigate the use of an independent DVH prediction tool to detect outliers in the quality of fully automatically generated treatment plans for prostate cancer patients. A plan QA tool was developed to predict rectum, anus and bladder DVHs, based on overlap volume histograms and principal component analysis (PCA). The tool was trained with 22 automatically generated clinical plans, and independently validated with 21 plans. Its use was prospectively investigated for 50 new plans by replanning in case of detected outliers. For rectum D_mean, V_65Gy, V_75Gy, anus D_mean, and bladder D_mean, the difference between predicted and achieved values was within 0.4 Gy or 0.3% (SD within 1.8 Gy or 1.3%). Thirteen detected outliers were re-planned, leading to moderate but statistically significant improvements (mean, max): rectum D_mean (1.3 Gy, 3.4 Gy), V_65Gy (2.7%, 4.2%), anus D_mean (1.6 Gy, 6.9 Gy), and bladder D_mean (1.5 Gy, 5.1 Gy). The rectum V_75Gy of the new plans slightly increased (0.2%, p = 0.087). A high-accuracy DVH prediction tool was developed and used for independent QA of automatically generated plans. In 28% of plans, minor dosimetric deviations were observed that could be improved by plan adjustments. Larger gains are expected for manually generated plans. Copyright © 2017 Elsevier B.V. All rights reserved.

  3. Prospective casemix-based funding, analysis and financial impact of cost outliers in all-patient refined diagnosis related groups in three Belgian general hospitals.

    PubMed

    Pirson, Magali; Martins, Dimitri; Jackson, Terri; Dramaix, Michèle; Leclercq, Pol

    2006-03-01

    This study examined the impact of cost outliers in terms of hospital resource consumption, the financial impact of outliers under the Belgian casemix-based system, and the validity of two "proxies" for costs: length of stay and charges. The costs of all hospital stays at three Belgian general hospitals were calculated for the year 2001. High resource use outliers were selected according to the following rule: 75th percentile + 1.5 × inter-quartile range. The frequency of cost outliers varied from 7% to 8% across hospitals. Explanatory factors were: major or extreme severity of illness, longer length of stay, and intensive care unit stay. Cost outliers account for 22-30% of hospital costs. One-third of length-of-stay outliers are not cost outliers, and nearly one-quarter of charge outliers are not cost outliers. The current funding system in Belgium does not penalize hospitals having a high percentage of outliers; the billing generated by these patients largely compensates for the costs they generate. Length of stay and charges are therefore not good proxies for selecting cost outliers.
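
    The selection rule quoted above is the standard upper-fence boxplot criterion; in Python it takes only a few lines (the per-stay cost data here would be supplied by the hospital accounting system):

    ```python
    import numpy as np

    def cost_outliers(costs):
        """Upper-fence rule used in the study: flag stays whose cost
        exceeds the 75th percentile plus 1.5 times the IQR."""
        q1, q3 = np.percentile(costs, [25, 75])
        fence = q3 + 1.5 * (q3 - q1)
        return costs > fence
    ```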

  4. A subagging regression method for estimating the qualitative and quantitative state of groundwater

    NASA Astrophysics Data System (ADS)

    Jeong, J.; Park, E.; Choi, J.; Han, W. S.; Yun, S. T.

    2016-12-01

    A subagging regression (SBR) method is proposed for the analysis of groundwater data pertaining to the estimation of trends and the associated uncertainty. The SBR method is validated against synthetic data competitively with other conventional robust and non-robust methods. The results verify that the estimation accuracies of the SBR method are consistent and superior to those of the other methods, and that the uncertainties are reasonably estimated where the other methods offer no uncertainty analysis option. For further validation, real quantitative and qualitative data are employed and analyzed comparatively with Gaussian process regression (GPR). For all cases, the trend and the associated uncertainties are reasonably estimated by SBR, whereas GPR has limitations in representing the variability of non-Gaussian skewed data. From these applications, it is concluded that the SBR method has the potential to be further developed as an effective tool for anomaly detection or outlier identification in groundwater state data.
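
    A generic sketch of the subagging (subsample aggregating) idea applied to trend estimation, not the authors' exact estimator: many fits on random subsamples are aggregated by the median, and the spread of the ensemble provides the uncertainty estimate.

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression

    def subagging_trend(t, y, n_models=200, frac=0.5, seed=0):
        """Fit a linear trend on many random subsamples (without
        replacement); the median slope is the trend estimate and the
        2.5/97.5 percentiles of the slopes give an uncertainty band."""
        rng = np.random.default_rng(seed)
        n = len(t)
        m = max(2, int(frac * n))
        slopes = []
        for _ in range(n_models):
            idx = rng.choice(n, size=m, replace=False)
            fit = LinearRegression().fit(t[idx].reshape(-1, 1), y[idx])
            slopes.append(fit.coef_[0])
        slopes = np.asarray(slopes)
        return np.median(slopes), np.percentile(slopes, [2.5, 97.5])
    ```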

  5. Efficient robust doubly adaptive regularized regression with applications.

    PubMed

    Karunamuni, Rohana J; Kong, Linglong; Tu, Wei

    2018-01-01

    We consider the problem of estimation and variable selection for general linear regression models. Regularized regression procedures have been widely used for variable selection, but most existing methods perform poorly in the presence of outliers. We construct a new penalized procedure that simultaneously attains full efficiency and maximum robustness. Furthermore, the proposed procedure satisfies the oracle properties. The new procedure is designed to achieve sparse and robust solutions by imposing adaptive weights on both the decision loss and the penalty function. The proposed method of estimation and variable selection attains full efficiency when the model is correct and, at the same time, achieves maximum robustness when outliers are present. We examine the robustness properties using the finite-sample breakdown point and an influence function. We show that the proposed estimator attains the maximum breakdown point. Furthermore, there is no loss in efficiency when there are no outliers or the error distribution is normal. For practical implementation of the proposed method, we present a computational algorithm. We examine the finite-sample and robustness properties using Monte Carlo studies. Two datasets are also analyzed.
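
    A simplified two-step stand-in for the "adaptive weights on both the loss and the penalty" idea, assuming scikit-learn: a Huber fit supplies outlier-resistant initial coefficients, and adaptive-lasso weights are absorbed by rescaling the design columns. The paper's actual estimator is more refined (it attains full efficiency and the maximum breakdown point), so this is only a conceptual sketch.

    ```python
    import numpy as np
    from sklearn.linear_model import HuberRegressor, Lasso

    def robust_adaptive_lasso(X, y, alpha=0.1, eps=1e-6):
        """(1) Robust initial fit downweights outliers via the Huber
        loss; (2) adaptive penalty weights w_j = 1/|beta_j_init| are
        absorbed by rescaling columns, so a standard Lasso solves the
        adaptively weighted problem."""
        init = HuberRegressor().fit(X, y)
        w = 1.0 / (np.abs(init.coef_) + eps)  # adaptive penalty weights
        Xw = X / w                            # column j scaled by 1/w_j
        lasso = Lasso(alpha=alpha).fit(Xw, y)
        return lasso.coef_ / w                # map back to original scale
    ```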

  6. Unsupervised Outlier Profile Analysis

    PubMed Central

    Ghosh, Debashis; Li, Song

    2014-01-01

    In much of the analysis of high-throughput genomic data, “interesting” genes have been selected based on assessment of differential expression between two groups or generalizations thereof. Most of the literature focuses on changes in mean expression or the entire distribution. In this article, we explore the use of C(α) tests, which have been applied in other genomic data settings. Their use for the outlier expression problem, in particular with continuous data, is problematic but nevertheless motivates new statistics that give an unsupervised analog to previously developed outlier profile analysis approaches. Some simulation studies are used to evaluate the proposal. A bivariate extension is described that can accommodate data from two platforms on matched samples. The proposed methods are applied to data from a prostate cancer study. PMID:25452686

  7. Efficient Robust Regression via Two-Stage Generalized Empirical Likelihood

    PubMed Central

    Bondell, Howard D.; Stefanski, Leonard A.

    2013-01-01

    Large- and finite-sample efficiency and resistance to outliers are the key goals of robust statistics. Although often not simultaneously attainable, we develop and study a linear regression estimator that comes close. Efficiency obtains from the estimator’s close connection to generalized empirical likelihood, and its favorable robustness properties are obtained by constraining the associated sum of (weighted) squared residuals. We prove maximum attainable finite-sample replacement breakdown point, and full asymptotic efficiency for normal errors. Simulation evidence shows that compared to existing robust regression estimators, the new estimator has relatively high efficiency for small sample sizes, and comparable outlier resistance. The estimator is further illustrated and compared to existing methods via application to a real data set with purported outliers. PMID:23976805

  8. Robustness of fit indices to outliers and leverage observations in structural equation modeling.

    PubMed

    Yuan, Ke-Hai; Zhong, Xiaoling

    2013-06-01

    Normal-distribution-based maximum likelihood (NML) is the most widely used method in structural equation modeling (SEM), although practical data tend to be nonnormally distributed. The effect of nonnormally distributed data or data contamination on the normal-distribution-based likelihood ratio (LR) statistic is well understood due to many analytical and empirical studies. In SEM, fit indices are used as widely as the LR statistic. In addition to NML, robust procedures have been developed for more efficient and less biased parameter estimates with practical data. This article studies the effect of outliers and leverage observations on fit indices following NML and two robust methods. Analysis and empirical results indicate that good leverage observations following NML and one of the robust methods lead most fit indices to give more support to the substantive model. While outliers tend to make a good model superficially bad according to many fit indices following NML, they have little effect on those following the two robust procedures. Implications of the results to data analysis are discussed, and recommendations are provided regarding the use of estimation methods and interpretation of fit indices. (PsycINFO Database Record (c) 2013 APA, all rights reserved).

  9. A Hybrid Semi-Supervised Anomaly Detection Model for High-Dimensional Data.

    PubMed

    Song, Hongchao; Jiang, Zhuqing; Men, Aidong; Yang, Bo

    2017-01-01

    Anomaly detection, which aims to identify observations that deviate from a nominal sample, is a challenging task for high-dimensional data. Traditional distance-based anomaly detection methods compute neighborhood distances between observations and suffer from the curse of dimensionality in high-dimensional space; for example, the distances between any pair of samples become similar and each sample may appear to be an outlier. In this paper, we propose a hybrid semi-supervised anomaly detection model for high-dimensional data that consists of two parts: a deep autoencoder (DAE) and an ensemble of k-nearest neighbor graph (K-NNG) based anomaly detectors. Benefiting from its ability to perform nonlinear mapping, the DAE is first trained to learn the intrinsic features of a high-dimensional dataset so as to represent the high-dimensional data in a more compact subspace. Several nonparametric KNN-based anomaly detectors are then built from different subsets that are randomly sampled from the whole dataset. The final prediction is made by combining all the anomaly detectors. The performance of the proposed method is evaluated on several real-life datasets, and the results confirm that the proposed hybrid model improves the detection accuracy and reduces the computational complexity.
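
    A compact sketch of the two-part architecture, assuming PyTorch for the autoencoder and scikit-learn for the detectors; the layer sizes, code dimension and ensemble parameters are illustrative, not those of the paper.

    ```python
    import numpy as np
    import torch
    import torch.nn as nn
    from sklearn.neighbors import NearestNeighbors

    def train_dae(X, code_dim=8, epochs=100, lr=1e-3):
        """Autoencoder compresses the data into a compact code space."""
        d = X.shape[1]
        enc = nn.Sequential(nn.Linear(d, 64), nn.ReLU(),
                            nn.Linear(64, code_dim))
        dec = nn.Sequential(nn.Linear(code_dim, 64), nn.ReLU(),
                            nn.Linear(64, d))
        opt = torch.optim.Adam(list(enc.parameters()) +
                               list(dec.parameters()), lr=lr)
        Xt = torch.tensor(X, dtype=torch.float32)
        for _ in range(epochs):
            opt.zero_grad()
            loss = nn.functional.mse_loss(dec(enc(Xt)), Xt)
            loss.backward()
            opt.step()
        with torch.no_grad():
            return enc(Xt).numpy()

    def knn_ensemble_scores(codes, n_detectors=10, frac=0.5, k=5, seed=0):
        """Ensemble of kNN detectors built on random subsets of the
        codes; the averaged kNN distance is the anomaly score."""
        rng = np.random.default_rng(seed)
        n = len(codes)
        scores = np.zeros(n)
        for _ in range(n_detectors):
            idx = rng.choice(n, size=int(frac * n), replace=False)
            nbrs = NearestNeighbors(n_neighbors=k).fit(codes[idx])
            dist, _ = nbrs.kneighbors(codes)
            scores += dist.mean(axis=1)
        return scores / n_detectors
    ```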

  11. DARHT Multi-intelligence Seismic and Acoustic Data Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stevens, Garrison Nicole; Van Buren, Kendra Lu; Hemez, Francois M.

    The purpose of this report is to document the analysis of seismic and acoustic data collected at the Dual-Axis Radiographic Hydrodynamic Test (DARHT) facility at Los Alamos National Laboratory for robust, multi-intelligence decision making. The data utilized herein is obtained from two tri-axial seismic sensors and three acoustic sensors, resulting in a total of nine data channels. The goal of this analysis is to develop a generalized, automated framework to determine internal operations at DARHT using informative features extracted from measurements collected outside the facility. Our framework involves four components: (1) feature extraction, (2) data fusion, (3) classification, and finally (4) robustness analysis. Two approaches are taken for extracting features from the data. The first of these, generic feature extraction, involves extraction of statistical features from the nine data channels. The second approach, event detection, identifies specific events relevant to traffic entering and leaving the facility as well as explosive activities at DARHT and nearby explosive testing sites. Event detection is completed using a two-stage method, first utilizing signatures in the frequency domain to identify outliers and second extracting short duration events of interest among these outliers by evaluating residuals of an autoregressive exogenous time series model. Features extracted from each data set are then fused to perform analysis with a multi-intelligence paradigm, where information from multiple data sets is combined to generate more information than is available through analysis of each independently. The fused feature set is used to train a statistical classifier and predict the state of operations to inform a decision maker. We demonstrate this classification using both generic statistical features and event detection and provide a comparison of the two methods. Finally, the concept of decision robustness is presented through a preliminary analysis where uncertainty is added to the system through noise in the measurements.

  12. Evaluation schemes for video and image anomaly detection algorithms

    NASA Astrophysics Data System (ADS)

    Parameswaran, Shibin; Harguess, Josh; Barngrover, Christopher; Shafer, Scott; Reese, Michael

    2016-05-01

    Video anomaly detection is a critical research area in computer vision. It is a natural first step before applying object recognition algorithms. Many algorithms that detect anomalies (outliers) in videos and images have been introduced in recent years. However, these algorithms behave and perform differently based on differences in the domains and tasks to which they are applied. In order to better understand the strengths and weaknesses of outlier algorithms and their applicability in a particular domain or task of interest, it is important to measure and quantify their performance using appropriate evaluation metrics. Many evaluation metrics have been used in the literature, such as precision curves, precision-recall curves, and receiver operating characteristic (ROC) curves. In order to construct these different metrics, it is also important to choose an appropriate evaluation scheme that decides when a proposed detection is considered a true or a false detection. Choosing the right evaluation metric and the right scheme is critical, since the choice can introduce positive or negative bias in the measuring criterion and may favor (or work against) a particular algorithm or task. In this paper, we review evaluation metrics and popular evaluation schemes that are used to measure the performance of anomaly detection algorithms on videos and imagery with one or more anomalies. We analyze the biases introduced by these choices by measuring the performance of an existing anomaly detection algorithm.
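
    For completeness, the two metric families discussed can be computed directly with scikit-learn once an evaluation scheme has turned detections into per-sample labels and scores (the labels and scores below are synthetic):

    ```python
    import numpy as np
    from sklearn.metrics import roc_curve, precision_recall_curve, auc

    # y_true: 1 = anomaly, 0 = nominal; y_score: detector's anomaly score.
    rng = np.random.default_rng(0)
    y_true = rng.integers(0, 2, size=200)
    y_score = y_true * 0.5 + rng.normal(0.5, 0.3, size=200)  # toy scores

    fpr, tpr, _ = roc_curve(y_true, y_score)
    prec, rec, _ = precision_recall_curve(y_true, y_score)
    print(f"ROC AUC: {auc(fpr, tpr):.3f}")
    print(f"PR  AUC: {auc(rec, prec):.3f}")
    ```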

  13. Guidelines for determining flood flow frequency—Bulletin 17C

    USGS Publications Warehouse

    England, John F.; Cohn, Timothy A.; Faber, Beth A.; Stedinger, Jery R.; Thomas, Wilbert O.; Veilleux, Andrea G.; Kiang, Julie E.; Mason, Robert R.

    2018-03-29

    Accurate estimates of flood frequency and magnitude are a key component of any effective nationwide flood risk management and flood damage abatement program. In addition to accuracy, methods for estimating flood risk must be uniformly and consistently applied because management of the Nation’s water and related land resources is a collaborative effort involving multiple actors including most levels of government and the private sector. Flood frequency guidelines have been published in the United States since 1967, and have undergone periodic revisions. In 1967, the U.S. Water Resources Council presented a coherent approach to flood frequency with Bulletin 15, “A Uniform Technique for Determining Flood Flow Frequencies.” The method it recommended involved fitting the log-Pearson Type III distribution to annual peak flow data by the method of moments. The first extension and update of Bulletin 15 was published in 1976 as Bulletin 17, “Guidelines for Determining Flood Flow Frequency” (Guidelines). It extended the Bulletin 15 procedures by introducing methods for dealing with outliers, historical flood information, and regional skew. Bulletin 17A was published the following year to clarify the computation of weighted skew. The next revision, Bulletin 17B, provided a host of improvements and new techniques designed to address situations that often arise in practice, including better methods for estimating and using regional skew, weighting station and regional skew, detection of outliers, and use of the conditional probability adjustment. The current version of these Guidelines is presented in this document, denoted Bulletin 17C. It incorporates changes motivated by four of the items listed as “Future Work” in Bulletin 17B and 30 years of post-17B research on flood processes and statistical methods. The updates include: adoption of a generalized representation of flood data that allows for interval and censored data types; a new method, called the Expected Moments Algorithm, which extends the method of moments so that it can accommodate interval data; a generalized approach to identification of low outliers in flood data; and an improved method for computing confidence intervals. Federal agencies are requested to use these Guidelines in all planning activities involving water and related land resources. State, local, and private organizations are encouraged to use these Guidelines to assure uniformity in the flood frequency estimates that all agencies concerned with flood risk should use for Federal planning decisions. This revision is adopted with the knowledge and understanding that review of these procedures will be ongoing. Updated methods will be adopted when warranted by experience and by examination and testing of new techniques.
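
    A bare-bones sketch of the core fitting step, fitting log-Pearson Type III by the method of moments in log space with SciPy; it omits regional-skew weighting, the Expected Moments Algorithm, low-outlier identification and confidence intervals, all of which the Guidelines add on top.

    ```python
    import numpy as np
    from scipy import stats

    def lp3_quantile(peaks, return_period=100.0):
        """Fit log-Pearson Type III to annual peak flows by the method
        of moments in log10 space and return the flood quantile for
        the given return period."""
        logq = np.log10(np.asarray(peaks, dtype=float))
        mean, std = logq.mean(), logq.std(ddof=1)
        skew = stats.skew(logq, bias=False)
        p = 1.0 - 1.0 / return_period     # non-exceedance probability
        logq_T = stats.pearson3.ppf(p, skew, loc=mean, scale=std)
        return 10.0 ** logq_T
    ```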

  14. Detection of material property errors in handbooks and databases using artificial neural networks with hidden correlations

    NASA Astrophysics Data System (ADS)

    Zhang, Y. M.; Evans, J. R. G.; Yang, S. F.

    2010-11-01

    The authors have discovered a systematic, intelligent and potentially automatic method to detect errors in handbooks and stop their transmission using unrecognised relationships between materials properties. The scientific community relies on the veracity of scientific data in handbooks and databases, some of which have a long pedigree covering several decades. Although various outlier-detection procedures are employed to detect and, where appropriate, remove contaminated data, errors, which had not been discovered by established methods, were easily detected by our artificial neural network in tables of properties of the elements. We started using neural networks to discover unrecognised relationships between materials properties and quickly found that they were very good at finding inconsistencies in groups of data. They reveal variations from 10 to 900% in tables of property data for the elements and point out those that are most probably correct. Compared with the statistical method adopted by Ashby and co-workers [Proc. R. Soc. Lond. Ser. A 454 (1998) p. 1301, 1323], this method locates more inconsistencies and could be embedded in database software for automatic self-checking. We anticipate that our suggestion will be a starting point to deal with this basic problem that affects researchers in every field. The authors believe it may eventually moderate the current expectation that data field error rates will persist at between 1 and 5%.
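
    One simple way to reproduce the residual-screening idea, assuming scikit-learn: train a network to predict one property from the others and flag table entries with unusually large standardized residuals. The network size and cutoff are illustrative, not those used by the authors.

    ```python
    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.preprocessing import StandardScaler

    def flag_suspect_entries(X, y, z_cut=3.0):
        """Predict one property (y) from the others (X); entries with
        large standardized residuals are suspected errors for manual
        checking."""
        xs = StandardScaler().fit_transform(X)
        net = MLPRegressor(hidden_layer_sizes=(20,), max_iter=5000,
                           random_state=0).fit(xs, y)
        resid = y - net.predict(xs)
        z = (resid - resid.mean()) / resid.std()
        return np.abs(z) > z_cut
    ```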

  15. Genetic Algorithm for Initial Orbit Determination with Too Short Arc (Continued)

    NASA Astrophysics Data System (ADS)

    Li, X. R.; Wang, X.

    2016-03-01

    When using the genetic algorithm to solve the problem of too-short-arc (TSA) orbit determination, the classical methods for outlier editing are no longer applicable because the computing process of the genetic algorithm differs from that of the classical method. In the genetic algorithm, robust estimation is achieved by using different loss functions in the fitness function, which solves the outlier problem for TSAs. Compared with the classical method, the application of loss functions in the genetic algorithm is greatly simplified. A comparison of the results of different loss functions shows that the least median of squares and least trimmed squares methods can greatly improve the robustness of TSA determination and have a high breakdown point.
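
    The two loss functions named above are easy to state as GA fitness terms (to be minimized); the sketch below assumes a placeholder `model(params, t)` that predicts the observations from candidate orbital elements, which is where the actual orbit mechanics would go.

    ```python
    import numpy as np

    def lms_loss(params, t, obs, model):
        """Least median of squares: median of the squared residuals."""
        r = obs - model(params, t)
        return np.median(r ** 2)

    def lts_loss(params, t, obs, model, keep_frac=0.75):
        """Least trimmed squares: sum of the smallest squared residuals,
        discarding the worst (1 - keep_frac) share as outliers."""
        r2 = np.sort((obs - model(params, t)) ** 2)
        h = int(keep_frac * len(r2))
        return r2[:h].sum()
    ```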

  16. Feature selection for examining behavior by pathology laboratories.

    PubMed

    Hawkins, S; Williams, G; Baxter, R

    2001-08-01

    Australia has a universal health insurance scheme called Medicare, which is managed by Australia's Health Insurance Commission. Medicare payments for pathology services generate voluminous transaction data on patients, doctors and pathology laboratories. The Health Insurance Commission (HIC) currently uses predictive models to monitor compliance with regulatory requirements. The HIC commissioned a project to investigate the generation of new features from the data. Feature generation has not appeared as an important step in the knowledge discovery in databases (KDD) literature. New interesting features for use in predictive modeling are generated. These features were summarized, visualized and used as inputs for clustering and outlier detection methods. Data organization and data transformation methods are described for the efficient access and manipulation of these new features.

  17. Asynchronous P300 classification in a reactive brain-computer interface during an outlier detection task

    NASA Astrophysics Data System (ADS)

    Krumpe, Tanja; Walter, Carina; Rosenstiel, Wolfgang; Spüler, Martin

    2016-08-01

    Objective. In this study, the feasibility of detecting a P300 via an asynchronous classification mode in a reactive EEG-based brain-computer interface (BCI) was evaluated. The P300 is one of the most popular BCI control signals and therefore used in many applications, mostly for active communication purposes (e.g. P300 speller). As the majority of all systems work with a stimulus-locked mode of classification (synchronous), the field of applications is limited. A new approach needs to be applied in a setting in which a stimulus-locked classification cannot be used due to the fact that the presented stimuli cannot be controlled or predicted by the system. Approach. A continuous observation task requiring the detection of outliers was implemented to test such an approach. The study was divided into an offline and an online part. Main results. Both parts of the study revealed that an asynchronous detection of the P300 can successfully be used to detect single events with high specificity. It also revealed that no significant difference in performance was found between the synchronous and the asynchronous approach. Significance. The results encourage the use of an asynchronous classification approach in suitable applications without a potential loss in performance.

  18. Signatures of positive selection and local adaptation to urbanization in white-footed mice (Peromyscus leucopus).

    PubMed

    Harris, Stephen E; Munshi-South, Jason

    2017-11-01

    Urbanization significantly alters natural ecosystems and has accelerated globally. Urban wildlife populations are often highly fragmented by human infrastructure, and isolated populations may adapt in response to local urban pressures. However, relatively few studies have identified genomic signatures of adaptation in urban animals. We used a landscape genomic approach to examine signatures of selection in urban populations of white-footed mice (Peromyscus leucopus) in New York City. We analysed 154,770 SNPs identified from transcriptome data from 48 P. leucopus individuals from three urban and three rural populations and used outlier tests to identify evidence of urban adaptation. We accounted for demography by simulating a neutral SNP data set under an inferred demographic history as a null model for outlier analysis. We also tested whether candidate genes were associated with environmental variables related to urbanization. In total, we detected 381 outlier loci and after stringent filtering, identified and annotated 19 candidate loci. Many of the candidate genes were involved in metabolic processes and have well-established roles in metabolizing lipids and carbohydrates. Our results indicate that white-footed mice in New York City are adapting at the biomolecular level to local selective pressures in urban habitats. Annotation of outlier loci suggests selection is acting on metabolic pathways in urban populations, likely related to novel diets in cities that differ from diets in less disturbed areas. © 2017 John Wiley & Sons Ltd.

  19. Identification of piecewise affine systems based on fuzzy PCA-guided robust clustering technique

    NASA Astrophysics Data System (ADS)

    Khanmirza, Esmaeel; Nazarahari, Milad; Mousavi, Alireza

    2016-12-01

    Hybrid systems are a class of dynamical systems whose behavior arises from the interaction between discrete and continuous dynamics. Since a general method for the analysis of hybrid systems is not available, some researchers have focused on specific types of hybrid systems. Piecewise affine (PWA) systems are one subset of hybrid systems. The identification of PWA systems includes the estimation of the parameters of the affine subsystems and the coefficients of the hyperplanes defining the partition of the state-input domain. In this paper, we propose a PWA identification approach based on a modified clustering technique. By using a fuzzy PCA-guided robust k-means clustering algorithm along with neighborhood outlier detection, the two main drawbacks of the well-known clustering algorithms, i.e., poor initialization and the presence of outliers, are eliminated. Furthermore, this modified clustering technique enables us to determine the number of subsystems without any prior knowledge about the system. In addition, exploiting the structure of the state-input domain, that is, considering the time sequence of input-output pairs, provides a more efficient clustering algorithm, which is the other novelty of this work. Finally, the proposed algorithm has been evaluated by parameter identification of an IGV servo actuator. Simulation together with experimental analysis has proved the effectiveness of the proposed method.

  20. A SVM-based quantitative fMRI method for resting-state functional network detection.

    PubMed

    Song, Xiaomu; Chen, Nan-kuei

    2014-09-01

    Resting-state functional magnetic resonance imaging (fMRI) aims to measure baseline neuronal connectivity independent of specific functional tasks and to capture changes in the connectivity due to neurological diseases. Most existing network detection methods rely on a fixed threshold to identify functionally connected voxels under the resting state. Due to fMRI non-stationarity, the threshold cannot adapt to the variation of data characteristics across sessions and subjects, and generates unreliable mapping results. In this study, a new method is presented for resting-state fMRI data analysis. Specifically, resting-state network mapping is formulated as an outlier detection process that is implemented using a one-class support vector machine (SVM). The results are refined by using a spatial-feature-domain prototype selection method and two-class SVM reclassification. The final decision on each voxel is made by comparing its probabilities of being functionally connected and unconnected, rather than by applying a threshold. Multiple features for resting-state analysis were extracted and examined using an SVM-based feature selection method, and the most representative features were identified. The proposed method was evaluated using synthetic and experimental fMRI data. A comparison study was also performed with independent component analysis (ICA) and correlation analysis. The experimental results show that the proposed method can provide comparable or better network detection performance than ICA and correlation analysis. The method is potentially applicable to various resting-state quantitative fMRI studies. Copyright © 2014 Elsevier Inc. All rights reserved.
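
    The first stage, casting network mapping as outlier detection with a one-class SVM, can be sketched with scikit-learn; the per-voxel features and the contamination level `nu` below are hypothetical stand-ins for the fMRI features used in the study.

    ```python
    import numpy as np
    from sklearn.svm import OneClassSVM

    # Hypothetical stand-in data: one feature vector per voxel.
    rng = np.random.default_rng(0)
    background = rng.normal(0.0, 1.0, size=(5000, 6))  # unconnected voxels
    connected = rng.normal(2.5, 1.0, size=(250, 6))    # the "outliers"
    X = np.vstack([background, connected])

    # nu bounds the fraction of training points treated as outliers.
    ocsvm = OneClassSVM(kernel='rbf', nu=0.05, gamma='scale').fit(X)
    labels = ocsvm.predict(X)  # +1 = inlier, -1 = outlier
    print((labels == -1).sum(), "voxels flagged by the one-class SVM")
    ```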

  1. Advanced Tie Feature Matching for the Registration of Mobile Mapping Imaging Data and Aerial Imagery

    NASA Astrophysics Data System (ADS)

    Jende, P.; Peter, M.; Gerke, M.; Vosselman, G.

    2016-06-01

    Mobile Mapping's ability to acquire high-resolution ground data is undermined by the unreliable localisation capabilities of satellite-based positioning systems in urban areas. Buildings form canyons that impede a direct line-of-sight to navigation satellites, preventing accurate estimation of the mobile platform's position. Consequently, the positioning quality of the acquired data products is considerably diminished. This issue has been widely addressed in the literature and in research projects. However, consistent compliance with sub-decimetre accuracy, as well as correction of errors in height, remains unsolved. We propose a novel approach to enhance Mobile Mapping (MM) image orientation based on the utilisation of highly accurate orientation parameters derived from aerial imagery. In addition, the diminished exterior orientation parameters of the MM platform will be utilised, as they enable the application of accurate matching techniques needed to derive reliable tie information. This tie information will then be used within an adjustment solution to correct the affected MM data. This paper presents an advanced feature matching procedure as a prerequisite to the aforementioned orientation update. MM data is ortho-projected to gain a higher resemblance to aerial nadir data, simplifying the images' geometry for matching. By utilising MM exterior orientation parameters, search windows may be used in conjunction with selective keypoint detection and template matching. Originating from different sensor systems, however, difficulties arise with respect to changes in illumination, radiometry and a different original perspective. To respond to these challenges for feature detection, the procedure relies on detecting keypoints in only one image. Initial tests indicate a considerable improvement in comparison to classic detector/descriptor approaches in this particular matching scenario. This method leads to a significant reduction of outliers due to the limited availability of putative matches and the utilisation of templates instead of feature descriptors. In our experiments discussed in this paper, typical urban scenes have been used to evaluate the proposed method. Even though no additional outlier removal techniques have been used, our method yields almost 90% correct correspondences. However, repetitive image patterns may still induce ambiguities which cannot be fully averted by this technique. Hence, possible advancements will also be briefly presented.
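
    A minimal sketch of the window-constrained template matching step with OpenCV; the function name, window format and acceptance threshold are illustrative, and the keypoint selection and subsequent adjustment are omitted.

    ```python
    import cv2

    def match_in_window(mm_ortho, aerial, window, thresh=0.7):
        """Match an ortho-projected MM patch inside a search window of
        the aerial image using normalized cross-correlation; low-scoring
        matches are rejected, which suppresses outliers up front."""
        x, y, w, h = window                 # window from MM orientation
        search = aerial[y:y + h, x:x + w]
        score_map = cv2.matchTemplate(search, mm_ortho,
                                      cv2.TM_CCOEFF_NORMED)
        _, score, _, loc = cv2.minMaxLoc(score_map)
        if score < thresh:
            return None                     # no reliable correspondence
        return (x + loc[0], y + loc[1]), score
    ```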

  2. The Effect of Ionospheric Models on Electromagnetic Pulse Locations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fenimore, Edward E.; Triplett, Laurie A.

    2014-07-01

    Locations of electromagnetic pulses (EMPs) determined by time-of-arrival (TOA) often have outliers with significantly larger errors than expected. In the past, these errors were thought to arise from high-order terms in the Appleton-Hartree equation. We simulated 1000 events randomly spread around the Earth into a constellation of 22 GPS satellites. We used four different ionospheres: “simple”, where the time delay goes as the inverse of the frequency-squared, “full Appleton-Hartree”, the “BobRD integrals”, and a full raytracing code. The simple and full Appleton-Hartree ionospheres do not show outliers whereas the BobRD and raytracing do. This strongly suggests that the cause of the outliers is not additional terms in the Appleton-Hartree equation, but rather the additional path length caused by refraction. A method to fix the outliers is suggested, based on fitting a time to the delays calculated at the 5 GPS frequencies with the BobRD and simple ionospheres. The difference in time is used as a correction to the TOAs.

  3. Data Mining for Anomaly Detection

    NASA Technical Reports Server (NTRS)

    Biswas, Gautam; Mack, Daniel; Mylaraswamy, Dinkar; Bharadwaj, Raj

    2013-01-01

    The Vehicle Integrated Prognostics Reasoner (VIPR) program describes methods for enhanced diagnostics as well as a prognostic extension to the current state-of-the-art Aircraft Diagnostic and Maintenance System (ADMS). VIPR introduced a new anomaly detection function for discovering previously undetected and undocumented situations where there are clear deviations from nominal behavior. Once a baseline (nominal model of operations) is established, the detection and analysis is split between on-aircraft outlier generation and off-aircraft expert analysis to characterize and classify events that may not have been anticipated by individual system providers. Offline expert analysis is supported by data curation and data mining algorithms that can be applied in both supervised and unsupervised learning settings. In this report, we discuss efficient methods to implement the Kolmogorov complexity measure using compression algorithms, and run a systematic empirical analysis to determine the best compression measure. Our experiments established that the combination of the DZIP compression algorithm and the CiDM distance measure provides the best results for capturing relevant properties of time series data encountered in aircraft operations. This combination was used as the basis for developing an unsupervised learning algorithm to define "nominal" flight segments using historical flight segments.
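
    Compression-based complexity measures of this kind are typically applied through the normalized compression distance (NCD); the sketch below uses zlib as the compressor rather than the DZIP/CiDM combination selected in the report.

    ```python
    import zlib

    def ncd(x: bytes, y: bytes) -> float:
        """Normalized compression distance: a practical stand-in for
        the (uncomputable) Kolmogorov complexity."""
        cx = len(zlib.compress(x))
        cy = len(zlib.compress(y))
        cxy = len(zlib.compress(x + y))
        return (cxy - min(cx, cy)) / max(cx, cy)

    # Similar sequences compress well together -> small distance.
    a = bytes(100 * "0123456789", "ascii")
    b = bytes(100 * "0123456789", "ascii")
    c = bytes(1000 * "z", "ascii")
    print(ncd(a, b), ncd(a, c))  # near 0 vs. near 1
    ```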

  4. Accounting for regional background and population size in the detection of spatial clusters and outliers using geostatistical filtering and spatial neutral models: the case of lung cancer in Long Island, New York

    PubMed Central

    Goovaerts, Pierre; Jacquez, Geoffrey M

    2004-01-01

    Background Complete Spatial Randomness (CSR) is the null hypothesis employed by many statistical tests for spatial pattern, such as local cluster or boundary analysis. CSR is however not a relevant null hypothesis for highly complex and organized systems such as those encountered in the environmental and health sciences in which underlying spatial pattern is present. This paper presents a geostatistical approach to filter the noise caused by spatially varying population size and to generate spatially correlated neutral models that account for regional background obtained by geostatistical smoothing of observed mortality rates. These neutral models were used in conjunction with the local Moran statistics to identify spatial clusters and outliers in the geographical distribution of male and female lung cancer in Nassau, Queens, and Suffolk counties, New York, USA. Results We developed a typology of neutral models that progressively relaxes the assumptions of null hypotheses, allowing for the presence of spatial autocorrelation, non-uniform risk, and incorporation of spatially heterogeneous population sizes. Incorporation of spatial autocorrelation led to fewer significant ZIP codes than found in previous studies, confirming earlier claims that CSR can lead to over-identification of the number of significant spatial clusters or outliers. Accounting for population size through geostatistical filtering increased the size of clusters while removing most of the spatial outliers. Integration of regional background into the neutral models yielded substantially different spatial clusters and outliers, leading to the identification of ZIP codes where SMR values significantly depart from their regional background. Conclusion The approach presented in this paper enables researchers to assess geographic relationships using appropriate null hypotheses that account for the background variation extant in real-world systems. In particular, this new methodology allows one to identify geographic pattern above and beyond background variation. The implementation of this approach in spatial statistical software will facilitate the detection of spatial disparities in mortality rates, establishing the rationale for targeted cancer control interventions, including consideration of health services needs, and resource allocation for screening and diagnostic testing. It will allow researchers to systematically evaluate how sensitive their results are to assumptions implicit under alternative null hypotheses. PMID:15272930
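
    The local Moran statistic at the core of this analysis is straightforward to compute given a spatial weight matrix; in the paper, its significance is judged against the geostatistically simulated neutral models rather than CSR. A numpy sketch with a hypothetical weight matrix:

    ```python
    import numpy as np

    def local_moran(rates, W):
        """Local Moran statistic I_i for each region, given a
        row-standardized spatial weight matrix W."""
        z = rates - rates.mean()
        m2 = (z ** 2).mean()
        return z * (W @ z) / m2

    # Hypothetical usage with a random sparse neighbour matrix:
    rng = np.random.default_rng(0)
    n = 50
    W = rng.random((n, n)) * (rng.random((n, n)) < 0.1)
    np.fill_diagonal(W, 0.0)
    W /= np.maximum(W.sum(axis=1, keepdims=True), 1e-12)  # row-standardize
    print(local_moran(rng.normal(size=n), W)[:5])
    ```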

  5. TrigDB for improving the reliability of the epicenter locations by considering the neighborhood station's trigger and cutting out of outliers in operation of Earthquake Early Warning System.

    NASA Astrophysics Data System (ADS)

    Chi, H. C.; Park, J. H.; Lim, I. S.; Seong, Y. J.

    2016-12-01

    TrigDB was initially developed to discriminate teleseismic-origin false alarms in cases where unreasonably associated triggers produce mis-located epicenters. We have applied TrigDB to the current EEWS (Earthquake Early Warning System) since 2014. During the early testing stage of the EEWS, starting in 2011, we adapted ElarmS from the Berkeley Seismological Laboratory (BSL) to the Korean seismic network and operated it for more than 5 years. The real-time testing results showed that all events inside the seismic network with magnitudes greater than 3.0 were well detected. However, two offshore events produced false locations with magnitudes over 4.0, due to long-period, relatively high-amplitude signals related to teleseismic waves or regional deep sources. These teleseismic-related false events were caused by spurious logical correlation during the association procedure, and the corresponding geometric distribution of associated stations is crescent-shaped. Because seismic stations are not deployed uniformly, the expected bias ratio varies with the evaluated epicentral location. This ratio is calculated in advance and stored in a database, called TrigDB, for the discrimination of teleseismic-origin false alarms. We have upgraded this method with 'TrigDB back filling', which updates the location by supplementarily associating stations that were not associated previously, comparing trigger times between sandwiched stations against predefined criteria such as travel time. We have also tested a module that rejects outlier trigger times by comparing statistical values (sigma) with the trigger times. The outlier cut-off criterion works somewhat slowly until more than 8 stations are available; however, the resulting locations are much improved.

  6. Estimating multivariate response surface model with data outliers, case study in enhancing surface layer properties of an aircraft aluminium alloy

    NASA Astrophysics Data System (ADS)

    Widodo, Edy; Kariyam

    2017-03-01

    Response Surface Methodology (RSM) is used to determine the input variable settings that achieve the optimal compromise among the response variables. There are three primary steps in an RSM problem: data collection, modelling, and optimization. This study focuses on the establishment of response surface models, under the assumption that the collected data are correct. Usually the response surface model parameters are estimated by OLS; however, this method is highly sensitive to outliers. Outliers can generate substantial residuals and often distort the estimated models. The resulting estimates can be biased and can lead to errors in determining the optimal point, so that the main purpose of RSM is not achieved. Moreover, in practice the collected data often contain several response variables for a common set of independent variables, and treating each response separately with single-response procedures can result in misinterpretation. A model for the multi-response case is therefore needed: a multivariate response surface model that is resistant to outliers. As an alternative, this study discusses M-estimation as a parameter estimator in multivariate response surface models containing outliers. As an illustration, a case study is presented on experimental results for enhancing the surface layer properties of an aircraft aluminium alloy by shot peening.
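
    As a sketch of the proposed direction, a single-response second-order response surface can be fitted by Huber M-estimation with statsmodels; the design-matrix layout is the usual two-factor quadratic model and is only illustrative (the paper's contribution is the multivariate extension).

    ```python
    import numpy as np
    import statsmodels.api as sm

    def fit_robust_rsm(X, y):
        """Second-order response surface fitted by Huber M-estimation,
        so the parameter estimates resist outlying runs. One response
        is fitted at a time here."""
        # Design matrix: intercept, linear, quadratic, interaction terms.
        x1, x2 = X[:, 0], X[:, 1]
        D = np.column_stack([np.ones(len(y)), x1, x2,
                             x1 ** 2, x2 ** 2, x1 * x2])
        model = sm.RLM(y, D, M=sm.robust.norms.HuberT())
        return model.fit()

    # res = fit_robust_rsm(X, y); res.params holds the robust coefficients.
    ```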

  7. Automated brainstem co-registration (ABC) for MRI.

    PubMed

    Napadow, Vitaly; Dhond, Rupali; Kennedy, David; Hui, Kathleen K S; Makris, Nikos

    2006-09-01

    Group data analysis in brainstem neuroimaging is predicated on accurate co-registration of anatomy. As the brainstem is comprised of many functionally heterogeneous nuclei densely situated adjacent to one another, relatively small errors in co-registration can manifest in increased variance or decreased sensitivity (or significance) in detecting activations. We have devised a 2-stage automated, reference mask guided registration technique (Automated Brainstem Co-registration, or ABC) for improved brainstem co-registration. Our approach utilized a brainstem mask dataset to weight an automated co-registration cost function. Our method was validated through measurement of RMS error at 12 manually defined landmarks. These landmarks were also used as guides for a secondary manual co-registration option, intended for outlier individuals that may not adequately co-register with our automated method. Our methodology was tested on 10 healthy human subjects and compared to traditional co-registration techniques (Talairach transform and automated affine transform to the MNI-152 template). We found that ABC had a significantly lower mean RMS error (1.22 +/- 0.39 mm) than Talairach transform (2.88 +/- 1.22 mm, mu +/- sigma) and the global affine (3.26 +/- 0.81 mm) method. Improved accuracy was also found for our manual-landmark-guided option (1.51 +/- 0.43 mm). Visualizing individual brainstem borders demonstrated more consistent and uniform overlap for ABC compared to traditional global co-registration techniques. Improved robustness (lower susceptibility to outliers) was demonstrated with ABC through lower inter-subject RMS error variance compared with traditional co-registration methods. The use of easily available and validated tools (AFNI and FSL) for this method should ease adoption by other investigators interested in brainstem data group analysis.

  8. Dysmorphometrics: the modelling of morphological abnormalities.

    PubMed

    Claes, Peter; Daniels, Katleen; Walters, Mark; Clement, John; Vandermeulen, Dirk; Suetens, Paul

    2012-02-06

    The study of typical morphological variations using quantitative, morphometric descriptors has always interested biologists in general. However, unusual examples of form, such as abnormalities are often encountered in biomedical sciences. Despite the long history of morphometrics, the means to identify and quantify such unusual form differences remains limited. A theoretical concept, called dysmorphometrics, is introduced augmenting current geometric morphometrics with a focus on identifying and modelling form abnormalities. Dysmorphometrics applies the paradigm of detecting form differences as outliers compared to an appropriate norm. To achieve this, the likelihood formulation of landmark superimpositions is extended with outlier processes explicitly introducing a latent variable coding for abnormalities. A tractable solution to this augmented superimposition problem is obtained using Expectation-Maximization. The topography of detected abnormalities is encoded in a dysmorphogram. We demonstrate the use of dysmorphometrics to measure abrupt changes in time, asymmetry and discordancy in a set of human faces presenting with facial abnormalities. The results clearly illustrate the unique power to reveal unusual form differences given only normative data with clear applications in both biomedical practice & research.

  9. Simultaneous estimation of cross-validation errors in least squares collocation applied for statistical testing and evaluation of the noise variance components

    NASA Astrophysics Data System (ADS)

    Behnabian, Behzad; Mashhadi Hossainali, Masoud; Malekzadeh, Ahad

    2018-02-01

    The cross-validation technique is a popular method to assess and improve the quality of prediction by least squares collocation (LSC). We present a formula for direct estimation of the vector of cross-validation errors (CVEs) in LSC which is much faster than element-wise CVE computation. We show that a quadratic form of the CVEs follows a Chi-squared distribution. Furthermore, the a posteriori noise variance factor is derived from the quadratic form of the CVEs. In order to detect blunders in the observations, the estimated standardized CVE is proposed as the test statistic, which can be applied whether the noise variances are known or unknown. We use LSC together with the methods proposed in this research for interpolation of crustal subsidence on the northern coast of the Gulf of Mexico. The results show that after detecting and removing outliers, the root mean square (RMS) of the CVEs and the estimated noise standard deviation are reduced by about 51% and 59%, respectively. In addition, the RMS of the LSC prediction error at the data points and the RMS of the estimated observation noise are decreased by 39% and 67%, respectively. However, the RMS of the LSC prediction error on a regular grid of interpolation points covering the area is reduced by only about 4%, which is a consequence of the sparse distribution of data points in this case study. The influence of gross errors on the LSC prediction results is also investigated using lower cutoff CVEs. After elimination of outliers, the RMS of this type of error is also reduced, by 19.5% for a 5 km vicinity radius. We propose a method using standardized CVEs for classifying the dataset into three groups with presumed different noise variances. The noise variance components for each of the groups are estimated using the restricted maximum-likelihood method via the Fisher scoring technique. Finally, LSC assessment measures were computed for the estimated heterogeneous noise variance model and compared with those of the homogeneous model. The advantage of the proposed method is the reduction in estimated noise levels for the groups with fewer noisy data points.
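
    The flavor of a direct CVE formula can be conveyed with the ordinary-least-squares analogue, where all leave-one-out residuals follow from a single fit via the hat-matrix diagonal; the LSC formula in the paper plays the same role for collocation. A numpy sketch:

    ```python
    import numpy as np

    def cross_validation_errors(A, y):
        """Leave-one-out residuals for a least-squares fit y ~ A x,
        computed from one full solution: e_cv_i = r_i / (1 - h_ii),
        where h_ii is the hat-matrix diagonal (the leverage)."""
        Q, _ = np.linalg.qr(A)                 # reduced QR of the design
        h = (Q ** 2).sum(axis=1)               # leverages h_ii
        x, *_ = np.linalg.lstsq(A, y, rcond=None)
        r = y - A @ x                          # ordinary residuals
        return r / (1.0 - h)
    ```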

  10. Oximeter for reliable clinical determination of blood oxygen saturation in a fetus

    DOEpatents

    Robinson, Mark R.; Haaland, David M.; Ward, Kenneth J.

    1996-01-01

    With the crude instrumentation now in use to continuously monitor the status of the fetus at delivery, the obstetrician and labor room staff not only over-recognize the possibility of fetal distress with the resultant rise in operative deliveries, but at times do not identify fetal distress which may result in preventable fetal neurological harm. The invention, which addresses these two basic problems, comprises a method and apparatus for non-invasive determination of blood oxygen saturation in the fetus. The apparatus includes a multiple frequency light source which is coupled to an optical fiber. The output of the fiber is used to illuminate blood containing tissue of the fetus. In the preferred embodiment, the reflected light is transmitted back to the apparatus where the light intensities are simultaneously detected at multiple frequencies. The resulting spectrum is then analyzed for determination of oxygen saturation. The analysis method uses multivariate calibration techniques that compensate for nonlinear spectral response, model interfering spectral responses and detect outlier data with high sensitivity.

  11. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jordan, Dirk C.; Deceglie, Michael G.; Kurtz, Sarah R.

    What is the best method to determine long-term PV system performance and degradation rates? Ideally, one universally applicable methodology would be desirable so that a single number could be derived. However, data sets vary in their attributes, and evidence is presented that defining two methodologies may be preferable. Monte Carlo simulations of artificial performance data allowed investigation of different methodologies and their respective confidence intervals. Tradeoffs between different approaches were delineated, elucidating why two separate approaches may need to be included in a standard. Regression approaches tend to be preferable when data sets are less contaminated by seasonality, noise and occurrence of outliers, although robust regression can significantly improve the accuracy when outliers are present. In the presence of outliers, marked seasonality, or strong soiling events, year-on-year approaches tend to outperform regression approaches.

  12. Robust PLS approach for KPI-related prediction and diagnosis against outliers and missing data

    NASA Astrophysics Data System (ADS)

    Yin, Shen; Wang, Guang; Yang, Xu

    2014-07-01

    In practical industrial applications, the key performance indicator (KPI)-related prediction and diagnosis are quite important for the product quality and economic benefits. To meet these requirements, many advanced prediction and monitoring approaches have been developed which can be classified into model-based or data-driven techniques. Among these approaches, partial least squares (PLS) is one of the most popular data-driven methods due to its simplicity and easy implementation in large-scale industrial process. As PLS is totally based on the measured process data, the characteristics of the process data are critical for the success of PLS. Outliers and missing values are two common characteristics of the measured data which can severely affect the effectiveness of PLS. To ensure the applicability of PLS in practical industrial applications, this paper introduces a robust version of PLS to deal with outliers and missing values, simultaneously. The effectiveness of the proposed method is finally demonstrated by the application results of the KPI-related prediction and diagnosis on an industrial benchmark of Tennessee Eastman process.
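
    The paper's robust PLS variant is its own algorithm; as a pragmatic baseline, the two data problems it targets are often handled by preprocessing in a standard PLS pipeline, e.g. with scikit-learn:

    ```python
    from sklearn.pipeline import make_pipeline
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import RobustScaler
    from sklearn.cross_decomposition import PLSRegression

    # Median imputation handles missing values; robust (median/IQR)
    # scaling limits the leverage of outlying measurements before PLS.
    kpi_model = make_pipeline(
        SimpleImputer(strategy='median'),
        RobustScaler(),
        PLSRegression(n_components=5),
    )
    # kpi_model.fit(X_process, y_kpi); kpi_model.predict(X_new)
    ```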

  13. 42 CFR 413.237 - Outliers.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ...-only drugs effective January 1, 2014. (2) Adult predicted ESRD outlier services Medicare allowable... furnished to an adult beneficiary by an ESRD facility. (3) Pediatric predicted ESRD outlier services... outlier services furnished to a pediatric beneficiary by an ESRD facility. (4) Adult fixed dollar loss...

  14. Multi-atlas segmentation enables robust multi-contrast MRI spleen segmentation for splenomegaly

    NASA Astrophysics Data System (ADS)

    Huo, Yuankai; Liu, Jiaqi; Xu, Zhoubing; Harrigan, Robert L.; Assad, Albert; Abramson, Richard G.; Landman, Bennett A.

    2017-02-01

    Non-invasive spleen volume estimation is essential in detecting splenomegaly. Magnetic resonance imaging (MRI) has been used to facilitate splenomegaly diagnosis in vivo. However, achieving accurate spleen volume estimation from MR images is challenging given the great inter-subject variance of human abdomens and the wide variety of clinical images/modalities. Multi-atlas segmentation has been shown to be a promising approach to handle heterogeneous data and difficult anatomical scenarios. In this paper, we propose to use multi-atlas segmentation frameworks for MRI spleen segmentation for splenomegaly. To the best of our knowledge, this is the first work that integrates multi-atlas segmentation for splenomegaly as seen on MRI. To address the particular concerns of spleen MRI, automated and novel semi-automated atlas selection approaches are introduced. The automated approach iteratively selects a subset of atlases using the selective and iterative method for performance level estimation (SIMPLE). To further control the outliers, a semi-automated craniocaudal-length-based SIMPLE atlas selection (L-SIMPLE) is proposed to introduce a spatial prior that guides the iterative atlas selection. A dataset from a clinical trial containing 55 MRI volumes (28 T1-weighted and 27 T2-weighted) was used to evaluate the different methods. Both automated and semi-automated methods achieved median DSC > 0.9. The outliers were alleviated by L-SIMPLE (≈1 min of manual effort per scan), which achieved a 0.9713 Pearson correlation with the manual segmentation. The results demonstrated that multi-atlas segmentation is able to achieve accurate spleen segmentation from multi-contrast splenomegaly MRI scans.

  15. Multi-atlas Segmentation Enables Robust Multi-contrast MRI Spleen Segmentation for Splenomegaly.

    PubMed

    Huo, Yuankai; Liu, Jiaqi; Xu, Zhoubing; Harrigan, Robert L; Assad, Albert; Abramson, Richard G; Landman, Bennett A

    2017-02-11

    Non-invasive spleen volume estimation is essential in detecting splenomegaly. Magnetic resonance imaging (MRI) has been used to facilitate splenomegaly diagnosis in vivo. However, achieving accurate spleen volume estimation from MR images is challenging given the great inter-subject variance of human abdomens and the wide variety of clinical images/modalities. Multi-atlas segmentation has been shown to be a promising approach to handle heterogeneous data and difficult anatomical scenarios. In this paper, we propose to use multi-atlas segmentation frameworks for MRI spleen segmentation for splenomegaly. To the best of our knowledge, this is the first work that integrates multi-atlas segmentation for splenomegaly as seen on MRI. To address the particular concerns of spleen MRI, automated and novel semi-automated atlas selection approaches are introduced. The automated approach iteratively selects a subset of atlases using the selective and iterative method for performance level estimation (SIMPLE). To further control the outliers, a semi-automated craniocaudal-length-based SIMPLE atlas selection (L-SIMPLE) is proposed to introduce a spatial prior that guides the iterative atlas selection. A dataset from a clinical trial containing 55 MRI volumes (28 T1-weighted and 27 T2-weighted) was used to evaluate the different methods. Both automated and semi-automated methods achieved median DSC > 0.9. The outliers were alleviated by L-SIMPLE (≈1 min of manual effort per scan), which achieved a 0.9713 Pearson correlation with the manual segmentation. The results demonstrated that multi-atlas segmentation is able to achieve accurate spleen segmentation from multi-contrast splenomegaly MRI scans.

  16. Chemical quality of bottom sediments in selected streams, Jefferson County, Kentucky, April-July 1992

    USGS Publications Warehouse

    Moore, B.L.; Evaldi, R.D.

    1995-01-01

    Bottom sediments from 25 stream sites in Jefferson County, Ky., were analyzed for percent volatile solids and concentrations of nutrients, major metals, trace elements, miscellaneous inorganic compounds, and selected organic compounds. Statistical high outliers of the constituent concentrations in the bottom sediments were defined as a measure of possible elevated concentrations. Statistical high outliers were determined for at least 1 constituent at each of 12 sampling sites in Jefferson County. Of the 10 stream basins sampled in Jefferson County, the Middle Fork Beargrass Basin, Cedar Creek Basin, and Harrods Creek Basin were the only three basins where a statistical high outlier was not found for any of the measured constituents. In the Pennsylvania Run Basin, total volatile solids, nitrate plus nitrite, and endrin were statistical high outliers. Pond Creek was the only basin where five constituents were statistical high outliers: barium, beryllium, cadmium, chromium, and silver. Nitrate plus nitrite and copper were the only statistical high outliers found in the Mill Creek Basin. In the Floyds Fork Basin, nitrate plus nitrite, phosphorus, mercury, and silver were the only statistical high outliers. Ammonia was the only statistical high outlier found in the South Fork Beargrass Basin. In the Goose Creek Basin, mercury and silver were the only statistical high outliers. Cyanide was the only statistical high outlier in the Muddy Fork Basin.

  17. Application of least median of squared orthogonal distance (LMD) and LMD-based reweighted least squares (RLS) methods on the stock-recruitment relationship

    NASA Astrophysics Data System (ADS)

    Wang, Yan-Jun; Liu, Qun

    1999-03-01

    Analysis of stock-recruitment (SR) data is most often done by fitting various SR relationship curves to the data. Fish population dynamics data often have stochastic variations and measurement errors, which usually result in a biased regression analysis. This paper presents a robust regression method, least median of squared orthogonal distance (LMD), which is insensitive to abnormal values in the dependent and independent variables of a regression analysis. Outliers that have significantly different variance from the rest of the data can be identified in a residual analysis. The least squares (LS) method is then applied to the SR data with the identified outliers down-weighted. The application of the LMD and LMD-based reweighted least squares (RLS) methods to simulated and real fisheries SR data is explored.
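
    A minimal sketch of the LMD-plus-RLS idea for a straight-line fit (assumptions: a linear relationship stands in for the paper's SR curves; the trial count, the cutoff constant c, and the 0.4549 consistency factor, the median of a chi-squared variable with one degree of freedom, are conventional robust-regression choices, not the authors' exact settings):

```python
import numpy as np

def lmd_line(x, y, n_trials=2000, seed=0):
    """Least median of squared orthogonal distance fit of a line y = a + b*x,
    found by scanning random two-point candidate lines."""
    rng = np.random.default_rng(seed)
    best, best_med = None, np.inf
    for _ in range(n_trials):
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        d2 = (y - a - b * x) ** 2 / (1.0 + b ** 2)  # squared orthogonal distance
        med = np.median(d2)
        if med < best_med:
            best, best_med = (a, b), med
    return best, best_med

def rls_refit(x, y, a, b, med, c=2.5):
    """Reweighted LS: drop points far (in orthogonal distance) from the LMD line."""
    d2 = (y - a - b * x) ** 2 / (1.0 + b ** 2)
    keep = d2 <= (c ** 2) * med / 0.4549       # med/0.4549 is a robust scale estimate
    A = np.vstack([np.ones(keep.sum()), x[keep]]).T
    return np.linalg.lstsq(A, y[keep], rcond=None)[0]
```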

  18. Genetic Algorithm for Initial Orbit Determination with Too Short Arc (Continued)

    NASA Astrophysics Data System (ADS)

    Li, Xin-ran; Wang, Xin

    2017-04-01

    When the genetic algorithm is used to solve the problem of too-short-arc (TSA) orbit determination, the original method of outlier deletion is no longer applicable, because the computing process of the genetic algorithm differs from that of the classical method. In the genetic algorithm, robust estimation is realized by introducing different loss functions into the fitness function, which solves the outlier problem of TSA orbit determination. Compared with the classical method, the genetic algorithm is greatly simplified by the introduction of these loss functions. A comparison of calculations with multiple loss functions shows that the least median square (LMS) and least trimmed square (LTS) estimators can greatly improve the robustness of TSA orbit determination and have a high breakdown point.
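
    The robust-fitness idea can be sketched as follows (assumptions: scipy's differential evolution stands in for the authors' genetic algorithm, and a polynomial stands in for the orbit model; `lms_loss` and `lts_loss` implement the two robust losses named in the abstract):

```python
import numpy as np
from scipy.optimize import differential_evolution

def lms_loss(residuals):
    """Least median of squares: robust to up to ~50% contamination."""
    return np.median(residuals ** 2)

def lts_loss(residuals, frac=0.7):
    """Least trimmed squares: sum of the smallest frac of squared residuals."""
    r2 = np.sort(residuals ** 2)
    return r2[: int(frac * len(r2))].sum()

# Placeholder residual model: in a real TSA problem `params` would be orbital
# elements and the residuals observed-minus-computed angles.
def residuals(params, t, obs):
    return obs - np.polyval(params, t)

t = np.linspace(0, 1, 60)
obs = np.polyval([2.0, -1.0, 0.5], t) + np.random.default_rng(1).normal(0, 0.01, 60)
obs[::15] += 1.0                        # inject gross outliers
fit = differential_evolution(lambda p: lms_loss(residuals(p, t, obs)),
                             bounds=[(-5, 5)] * 3, seed=1)
print(fit.x)                            # should stay close to [2, -1, 0.5]
```

    Because the evolutionary search only ever evaluates the fitness value, swapping the quadratic loss for LMS or LTS requires no other change, which is the simplification the record describes.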

  19. Identification of moisture content in tobacco plant leaves using outlier sample eliminating algorithms and hyperspectral data.

    PubMed

    Sun, Jun; Zhou, Xin; Wu, Xiaohong; Zhang, Xiaodong; Li, Qinglin

    2016-02-26

    Fast identification of moisture content in tobacco plant leaves plays a key role in the tobacco cultivation industry and benefits the management of tobacco plants on the farm. In order to identify the moisture content of tobacco plant leaves in a fast and nondestructive way, a method involving Mahalanobis distance coupled with Monte Carlo cross validation (MD-MCCV) was proposed in this study to eliminate outlier samples. The hyperspectral data of 200 tobacco plant leaf samples across 20 moisture gradients were obtained using a FieldSpec® 3 spectrometer. Savitzky-Golay smoothing (SG), roughness penalty smoothing (RPS), kernel smoothing (KS), and median smoothing (MS) were used to preprocess the raw spectra. In addition, Mahalanobis distance (MD), Monte Carlo cross validation (MCCV), and Mahalanobis distance coupled with Monte Carlo cross validation (MD-MCCV) were applied to select the outlier samples of the raw spectrum and the four smoothed spectra. The successive projections algorithm (SPA) was used to extract the most influential wavelengths. Multiple linear regression (MLR) was applied to build prediction models based on the preprocessed spectral features at the characteristic wavelengths. The results showed that the four best prediction models were MD-MCCV-SG (Rp² = 0.8401 and RMSEP = 0.1355), MD-MCCV-RPS (Rp² = 0.8030 and RMSEP = 0.1274), MD-MCCV-KS (Rp² = 0.8117 and RMSEP = 0.1433), and MD-MCCV-MS (Rp² = 0.9132 and RMSEP = 0.1162). The MD-MCCV algorithm performed best among the MD algorithm, the MCCV algorithm, and the method without sample pretreatment at eliminating outlier samples from the 20 moisture gradients of tobacco plant leaves, and MD-MCCV can be used to eliminate outlier samples in spectral preprocessing. Copyright © 2016 Elsevier Inc. All rights reserved.
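
    The Mahalanobis-distance step can be sketched as below (assumptions: distances are computed on a few PCA scores so that the covariance is well conditioned, and a chi-squared cutoff flags outliers; the MCCV half of MD-MCCV, which repeatedly splits the data and tracks prediction errors, is omitted for brevity):

```python
import numpy as np
from scipy.stats import chi2

def md_outliers(X, n_components=5, quantile=0.975):
    """Flag outlier spectra by Mahalanobis distance computed on PCA scores.

    X: (n_samples, n_wavelengths) spectra, ideally after smoothing.
    """
    Xc = X - X.mean(axis=0)
    # PCA via SVD; keeping a few scores keeps the covariance invertible.
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    cov_inv = np.linalg.inv(np.cov(scores, rowvar=False))
    d = scores - scores.mean(axis=0)
    md2 = np.einsum("ij,jk,ik->i", d, cov_inv, d)   # squared Mahalanobis distances
    return md2 > chi2.ppf(quantile, df=n_components)
```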

  20. No genetic adaptation of the Mediterranean keystone shrub Cistus ladanifer in response to experimental fire and extreme drought.

    PubMed

    Torres, Iván; Parra, Antonio; Moreno, José M; Durka, Walter

    2018-01-01

    In Mediterranean ecosystems, climate change is projected to increase fire danger and summer drought, thus reducing post-fire recruitment of obligate seeder species and possibly affecting population genetic structure. We performed a genome-wide genetic marker study, using AFLP markers, on individuals from one Central Spain population of the obligate post-fire seeder Cistus ladanifer L. that established after experimental fire and survived during four subsequent years under simulated drought implemented with a rainout shelter system. We explored the effects of the treatments on marker diversity, spatial genetic structure, and the presence of outlier loci suggestive of selection. We found no effect of fire or drought on any of the genetic diversity metrics. Analysis of Molecular Variance showed very low genetic differentiation among treatments. Neither fire nor drought altered the small-scale spatial genetic structure of the population. Only one locus was significantly associated with the fire treatment, but inconsistently across outlier detection methods. Neither fire nor drought is likely to affect the genetic makeup of emerging C. ladanifer, despite reduced recruitment caused by drought. The lack of genetic change suggests that reduced recruitment is a random, non-selective process with no genome-wide consequences for this keystone, drought- and fire-tolerant Mediterranean species.

  1. Bayesian hierarchical modeling for subject-level response classification in peptide microarray immunoassays

    PubMed Central

    Imholte, Gregory; Gottardo, Raphael

    2017-01-01

    The peptide microarray immunoassay simultaneously screens sample serum against thousands of peptides, determining the presence of antibodies bound to array probes. Peptide microarrays tiling immunogenic regions of pathogens (e.g. envelope proteins of a virus) are an important high throughput tool for querying and mapping antibody binding. Because of the assay’s many steps, from probe synthesis to incubation, peptide microarray data can be noisy with extreme outliers. In addition, subjects may produce different antibody profiles in response to an identical vaccine stimulus or infection, due to variability among subjects’ immune systems. We present a robust Bayesian hierarchical model for peptide microarray experiments, pepBayes, to estimate the probability of antibody response for each subject/peptide combination. Heavy-tailed error distributions accommodate outliers and extreme responses, and tailored random effect terms automatically incorporate technical effects prevalent in the assay. We apply our model to two vaccine trial datasets to demonstrate model performance. Our approach enjoys high sensitivity and specificity when detecting vaccine induced antibody responses. A simulation study shows an adaptive thresholding classification method has appropriate false discovery rate control with high sensitivity, and receiver operating characteristics generated on vaccine trial data suggest that pepBayes clearly separates responses from non-responses. PMID:27061097

  2. Pointwise probability reinforcements for robust statistical inference.

    PubMed

    Frénay, Benoît; Verleysen, Michel

    2014-02-01

    Statistical inference using machine learning techniques may be difficult with small datasets because of abnormally frequent data (AFDs). AFDs are observations that are much more frequent in the training sample than they should be, with respect to their theoretical probability, and include, e.g., outliers. Estimates of parameters tend to be biased towards models which support such data. This paper proposes to introduce pointwise probability reinforcements (PPRs): the probability of each observation is reinforced by a PPR, and a regularisation allows controlling the amount of reinforcement which compensates for AFDs. The proposed solution is very generic, since it can be used to robustify any statistical inference method which can be formulated as a likelihood maximisation. Experiments show that PPRs can be easily used to tackle regression, classification and projection: models are freed from the influence of outliers. Moreover, outliers can be filtered manually since an abnormality degree is obtained for each observation. Copyright © 2013 Elsevier Ltd. All rights reserved.

  3. Exceptional responders in conservation.

    PubMed

    Post, Gerald; Geldmann, Jonas

    2017-08-30

    Conservation operates within complex systems with incomplete knowledge of the system and the interventions utilized. This frequently results in the inability to find generally applicable methods to alleviate threats to Earth's vanishing wildlife. One approach used in medicine and the social sciences has been to develop a deeper understanding of positive outliers. Where such outliers share similar characteristics, they may be considered exceptional responders. We devised a 4-step framework for identifying exceptional responders in conservation: identification of the study system, identification of the response structure, identification of the threshold for exceptionalism, and identification of commonalities among outliers. Evaluation of exceptional responders provides additional information that is often ignored in randomized controlled trials and before-after control-intervention experiments. Interrogating the contextual factors that contribute to an exceptional outcome allow exceptional responders to become valuable pieces of information leading to unexpected discoveries and novel hypotheses. © 2017 Society for Conservation Biology.

  4. Patient-specific instrumentation improved mechanical alignment, while early clinical outcome was comparable to conventional instrumentation in TKA.

    PubMed

    Anderl, Werner; Pauzenberger, Leo; Kölblinger, Roman; Kiesselbach, Gabriele; Brandl, Georg; Laky, Brenda; Kriegleder, Bernhard; Heuberer, Philipp; Schwameis, Eva

    2016-01-01

    The aim of this prospective study was to compare early clinical outcome, radiological limb alignment, and three-dimensional (3D)-component positioning between conventional and computed tomography (CT)-based patient-specific instrumentation (PSI) in primary mobile-bearing total knee arthroplasty (TKA). Two hundred ninety consecutive patients (300 knees) with severe, debilitating osteoarthritis scheduled for TKA were included in this study using either conventional instrumentation (CVI, n = 150) or PSI (n = 150). Patients were clinically assessed before and 2 years after surgery according to the Knee-Society-Score (KSS) and the visual-analog-scale for pain (VAS). Additionally, the Western Ontario McMaster Universities Osteoarthritis Index (WOMAC) and the Oxford-Knee-Score (OKS) were collected at follow-up. To evaluate accuracy of CVI and PSI, hip-knee-ankle angle (HKA) and 3D-component positioning were assessed on postoperative radiographs and CT. Data of 222 knees (CVI: n = 108, PSI: n = 114) were available for analysis after a mean follow-up of 28.6 ± 5.2 months. At the early follow-up, clinical outcome (KSS, VAS, WOMAC, OKS) was comparable between the two groups. Mean HKA-deviation from the targeted neutral mechanical axis (CVI: 2.2° ± 1.7°; PSI: 1.5° ± 1.4°; p < 0.001), rates of outliers (CVI: 22.2%; PSI: 9.6%; p = 0.016), and 3D-component positioning outliers were significantly lower in the PSI group. Non-outliers (HKA: 180° ± 3°) showed better clinical results than outliers at the 2-year follow-up. CT-based PSI compared with CVI improves accuracy of mechanical alignment restoration and 3D-component positioning in primary TKA. While clinical outcome was comparable between the two instrumentation groups at early follow-up, significantly inferior outcome was detected in the subgroup of HKA-outliers. Prospective comparative study, Level II.

  5. Biomarker profiling in reef corals of Tonga’s Ha’apai and Vava’u archipelagos

    PubMed Central

    Chen, Chii-Shiarng; Dempsey, Alexandra C.

    2017-01-01

    Given the significant threats towards Earth’s coral reefs, there is an urgent need to document the current physiological condition of the resident organisms, particularly the reef-building scleractinians themselves. Unfortunately, most of the planet’s reefs are understudied, and some have yet to be seen. For instance, the Kingdom of Tonga possesses an extensive reef system, with thousands of hectares of unobserved reefs; little is known about their ecology, nor is there any information on the health of the resident corals. Given such knowledge deficiencies, 59 reefs across three Tongan archipelagos were surveyed herein, and pocilloporid corals were sampled from approximately half of these surveyed sites; 10 molecular-scale response variables were assessed in 88 of the sampled colonies, and 12 colonies were found to be outliers based on employment of a multivariate statistics-based aberrancy detection system. These outliers differed from the statistically normally behaving colonies in having not only higher RNA/DNA ratios but also elevated expression levels of three genes: 1) Symbiodinium zinc-induced facilitator-like 1-like, 2) host coral copper-zinc superoxide dismutase, and 3) host green fluorescent protein-like chromoprotein. Outliers were also characterized by significantly higher variation amongst the molecular response variables assessed, and the response variables that contributed most significantly to colonies being delineated as outliers differed between the two predominant reef coral species sampled, Pocillopora damicornis and P. acuta. These closely related species also displayed dissimilar temporal fluctuation patterns in their molecular physiologies, an observation that may have been driven by differences in their feeding strategies. Future works should attempt to determine whether corals displaying statistically aberrant molecular physiology, such as the 12 Tongan outliers identified herein, are indeed characterized by a diminished capacity for acclimating to the rapid changes in their abiotic milieu occurring as a result of global climate change. PMID:29091723

  6. Lilac and honeysuckle phenology data 1956-2014.

    PubMed

    Rosemartin, Alyssa H; Denny, Ellen G; Weltzin, Jake F; Lee Marsh, R; Wilson, Bruce E; Mehdipoor, Hamed; Zurita-Milla, Raul; Schwartz, Mark D

    2015-01-01

    The dataset is comprised of leafing and flowering data collected across the continental United States from 1956 to 2014 for purple common lilac (Syringa vulgaris), a cloned lilac cultivar (S. x chinensis 'Red Rothomagensis') and two cloned honeysuckle cultivars (Lonicera tatarica 'Arnold Red' and L. korolkowii 'Zabeli'). Applications of this observational dataset range from detecting regional weather patterns to understanding the impacts of global climate change on the onset of spring at the national scale. While minor changes in methods have occurred over time, and some documentation is lacking, outlier analyses identified fewer than 3% of records as unusually early or late. Lilac and honeysuckle phenology data have proven robust in both model development and climatic research.

  7. Generalized shrunken type-GM estimator and its application

    NASA Astrophysics Data System (ADS)

    Ma, C. Z.; Du, Y. L.

    2014-03-01

    The parameter estimation problem in the linear model is considered when multicollinearity and outliers exist simultaneously. A class of new robust biased estimators, generalized shrunken type-GM estimators, and their computational methods are established by combining the GM estimator with biased estimators, including the ridge, principal components, and Liu estimators. A numerical example shows that the most attractive advantage of these new estimators is that they can not only overcome the multicollinearity of the coefficient matrix and the outliers but also control the influence of leverage points.
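
    One simple member of this family can be sketched by combining ridge shrinkage (against multicollinearity) with iteratively reweighted least squares using Huber weights (against outliers); a full GM estimator would additionally down-weight high-leverage rows of X, which is omitted here, and the constants below are standard textbook choices rather than the authors' settings:

```python
import numpy as np

def huber_ridge(X, y, lam=1.0, c=1.345, n_iter=20):
    """Robust biased estimation sketch: ridge regression fit by IRLS with
    Huber weights, so large residuals are down-weighted while the ridge
    penalty stabilizes a nearly collinear design matrix."""
    n, p = X.shape
    beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)   # ridge start
    for _ in range(n_iter):
        r = y - X @ beta
        s = 1.4826 * np.median(np.abs(r - np.median(r))) + 1e-12  # MAD scale
        u = np.abs(r) / s
        w = np.where(u <= c, 1.0, c / u)        # Huber weights shrink outliers
        W = np.diag(w)
        beta = np.linalg.solve(X.T @ W @ X + lam * np.eye(p), X.T @ W @ y)
    return beta
```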

  8. Outliers: A Potential Data Problem.

    ERIC Educational Resources Information Center

    Douzenis, Cordelia; Rakow, Ernest A.

    Outliers, extreme data values relative to others in a sample, may distort statistics that assume interval levels of measurement and normal distributions. The outlier may be a valid value or an error. Several procedures are available for identifying outliers, and each may be applied to errors of prediction from the regression lines for utility in a…

  9. Optoelectronic instrumentation enhancement using data mining feedback for a 3D measurement system

    NASA Astrophysics Data System (ADS)

    Flores-Fuentes, Wendy; Sergiyenko, Oleg; Gonzalez-Navarro, Félix F.; Rivas-López, Moisés; Hernandez-Balbuena, Daniel; Rodríguez-Quiñonez, Julio C.; Tyrsa, Vera; Lindner, Lars

    2016-12-01

    3D measurement by a cyber-physical system based on optoelectronic scanning instrumentation has been enhanced by outlier-detection and regression data-mining feedback. The prototype has applications in (1) industrial manufacturing systems, including robotic machinery, embedded vision, and motion control; (2) health care systems for measurement scanning; and (3) infrastructure, by providing structural health monitoring. This paper presents new research on the data processing of a 3D measurement vision-sensing database. Outliers in the multivariate data have been detected and removed to improve the results of an artificial-intelligence regression algorithm. Physical measurement-error regression data have been used to correct 3D measurement errors. In conclusion, joining physical phenomena, measurement, and computation is an effective approach for feedback loops in the control of industrial, medical, and civil tasks.

  10. Locating Structural Centers: A Density-Based Clustering Method for Community Detection

    PubMed Central

    Liu, Gongshen; Li, Jianhua; Nees, Jan P.

    2017-01-01

    Uncovering underlying community structures in complex networks has received considerable attention because of its importance in understanding structural attributes and group characteristics of networks. The algorithmic identification of such structures is a significant challenge. Local expanding methods have proven to be efficient and effective in community detection, but most methods are sensitive to initial seeds and built-in parameters. In this paper, we present a local expansion method based on density-based clustering, which aims to uncover the intrinsic network communities by locating the structural centers of communities using a proposed structural centrality. The structural centrality takes into account the local density of nodes and the relative distance between nodes. The proposed algorithm expands a community from the structural center to the border with a single local search procedure. The local expanding procedure follows a heuristic strategy that allows it to find complete community structures. Moreover, it can identify different node roles (cores and outliers) in communities by defining a border region. The experiments involve both real-world and artificial networks and provide a comparative evaluation of the proposed method. The results of these experiments show that the proposed method performs more efficiently than current state-of-the-art methods, with comparable clustering performance. PMID:28046030

  11. Fast clustering using adaptive density peak detection.

    PubMed

    Wang, Xiao-Feng; Xu, Yifan

    2017-12-01

    Common limitations of clustering methods include slow algorithm convergence, instability under the pre-specification of a number of intrinsic parameters, and lack of robustness to outliers. A recent clustering approach proposed a fast search algorithm for cluster centers based on their local densities. However, the selection of the key intrinsic parameters in the algorithm was not systematically investigated. It is relatively difficult to estimate the "optimal" parameters since the original definition of the local density in the algorithm is based on a truncated counting measure. In this paper, we propose a clustering procedure with adaptive density peak detection, where the local density is estimated through nonparametric multivariate kernel estimation. The model parameter can then be calculated from equations with statistical theoretical justification. We also develop an automatic cluster centroid selection method that maximizes an average silhouette index. The advantage and flexibility of the proposed method are demonstrated through simulation studies and the analysis of a few benchmark gene expression data sets. The method runs in a single step without any iteration and is thus fast, with great potential for application to big data analysis. A user-friendly R package, ADPclust, has been developed for public use.
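
    A compact sketch of density-peak clustering in this spirit (assumptions: a Gaussian kernel density in place of the truncated count, a fixed bandwidth rather than the adaptive, theoretically justified choice described in the record, and the number of centers given directly instead of selected by the silhouette criterion):

```python
import numpy as np
from scipy.spatial.distance import cdist

def density_peaks(X, bandwidth=1.0, n_clusters=2):
    """Density-peak clustering sketch: centers combine high local density with
    a large distance to any denser point; every other point inherits the
    label of its nearest denser neighbour."""
    D = cdist(X, X)
    rho = np.exp(-(D / bandwidth) ** 2).sum(axis=1)    # kernel density estimate
    order = np.argsort(-rho)                           # indices by falling density
    delta = np.zeros(len(X))
    nneigh = np.zeros(len(X), dtype=int)
    delta[order[0]] = D[order[0]].max()                # densest point: max distance
    nneigh[order[0]] = order[0]
    for k, i in enumerate(order[1:], start=1):
        denser = order[:k]
        j = denser[np.argmin(D[i, denser])]
        delta[i], nneigh[i] = D[i, j], j
    centers = np.argsort(rho * delta)[-n_clusters:]    # largest gamma = rho * delta
    labels = np.full(len(X), -1)
    labels[centers] = np.arange(n_clusters)
    if labels[order[0]] == -1:                         # guard: densest point anchors a cluster
        labels[order[0]] = 0
    for i in order:                                    # propagate down the density ranking
        if labels[i] == -1:
            labels[i] = labels[nneigh[i]]
    return labels, centers

X = np.vstack([np.random.default_rng(0).normal(m, 0.3, (50, 2)) for m in (0, 3)])
labels, centers = density_peaks(X, bandwidth=0.5, n_clusters=2)
```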

  12. Unbalance detection in rotor systems with active bearings using self-sensing piezoelectric actuators

    NASA Astrophysics Data System (ADS)

    Ambur, Ramakrishnan; Rinderknecht, Stephan

    2018-03-01

    Machines developed today are highly automated due to the increased use of mechatronic systems. To ensure their reliable operation, fault detection and isolation (FDI) is an important feature, along with better control. This research work aims to achieve and integrate both of these functions with a minimum number of components in a mechatronic system. This article investigates a rotating machine with active bearings equipped with piezoelectric actuators. There is an inherent coupling between their electrical and mechanical properties, because of which they can also be used as sensors. Mechanical deflection can be reconstructed from these self-sensing actuators from measured voltage and current signals. These virtual sensor signals are utilised to detect unbalance in a rotor system. Parameters of the unbalance, such as its magnitude and phase, are estimated by a parametric estimation method in the frequency domain. The unbalance location is identified using a fault-localization hypothesis. The robustness of the estimates against outliers in the measurements is improved using the weighted least squares method. Unbalances are detected on a real test bench as well as in simulations using its model. Experiments are performed in the stationary as well as the transient case. As a further step, unbalances are estimated during simultaneous actuation of the actuators in closed loop with an adaptive algorithm for vibration minimisation. This strategy could be used in systems which aim at both fault detection and control action.

  13. Assessing significance in a Markov chain without mixing.

    PubMed

    Chikina, Maria; Frieze, Alan; Pegden, Wesley

    2017-03-14

    We present a statistical test to detect that a presented state of a reversible Markov chain was not chosen from a stationary distribution. In particular, given a value function for the states of the Markov chain, we would like to show rigorously that the presented state is an outlier with respect to the values, by establishing a p value under the null hypothesis that it was chosen from a stationary distribution of the chain. A simple heuristic used in practice is to sample ranks of states from long random trajectories on the Markov chain and compare these with the rank of the presented state; if the presented state is a 0.1% outlier compared with the sampled ranks (its rank is in the bottom 0.1% of sampled ranks), then this observation should correspond to a p value of 0.001. This significance is not rigorous, however, without good bounds on the mixing time of the Markov chain. Our test is the following: Given the presented state in the Markov chain, take a random walk from the presented state for any number of steps. We prove that observing that the presented state is an ε-outlier on the walk is significant at p = √(2ε) under the null hypothesis that the state was chosen from a stationary distribution. We assume nothing about the Markov chain beyond reversibility and show that significance at p ≈ √ε is best possible in general. We illustrate the use of our test with a potential application to the rigorous detection of gerrymandering in Congressional districting.

  14. Assessing significance in a Markov chain without mixing

    PubMed Central

    Chikina, Maria; Frieze, Alan; Pegden, Wesley

    2017-01-01

    We present a statistical test to detect that a presented state of a reversible Markov chain was not chosen from a stationary distribution. In particular, given a value function for the states of the Markov chain, we would like to show rigorously that the presented state is an outlier with respect to the values, by establishing a p value under the null hypothesis that it was chosen from a stationary distribution of the chain. A simple heuristic used in practice is to sample ranks of states from long random trajectories on the Markov chain and compare these with the rank of the presented state; if the presented state is a 0.1% outlier compared with the sampled ranks (its rank is in the bottom 0.1% of sampled ranks), then this observation should correspond to a p value of 0.001. This significance is not rigorous, however, without good bounds on the mixing time of the Markov chain. Our test is the following: Given the presented state in the Markov chain, take a random walk from the presented state for any number of steps. We prove that observing that the presented state is an ε-outlier on the walk is significant at p = √(2ε) under the null hypothesis that the state was chosen from a stationary distribution. We assume nothing about the Markov chain beyond reversibility and show that significance at p ≈ √ε is best possible in general. We illustrate the use of our test with a potential application to the rigorous detection of gerrymandering in Congressional districting. PMID:28246331
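
    The test is simple enough to sketch directly (assumptions: a toy reversible Metropolis chain on {0, ..., 99} whose stationary distribution favors large states, with the state's value as the test statistic; the districting chains the authors target are far larger, but the logic is identical):

```python
import numpy as np

def sqrt_eps_test(step, value, state0, n_steps=100_000, seed=0):
    """Run a random walk from the presented state; if that state is an
    eps-outlier (lowest-eps fraction by value) among the visited states,
    it is significant at p = sqrt(2 * eps) under the stationarity null."""
    rng = np.random.default_rng(seed)
    s, v0 = state0, value(state0)
    count = 1                                   # counts the presented state itself
    for _ in range(n_steps):
        s = step(s, rng)
        count += value(s) <= v0
    eps = count / (n_steps + 1)
    return np.sqrt(2 * eps)

# Toy reversible chain: Metropolis walk on {0,...,99} targeting pi(s) ~ exp(0.2*s).
def step(s, rng):
    t = min(99, max(0, s + rng.choice([-1, 1])))
    return t if t >= s or rng.random() < np.exp(-0.2) else s

print(sqrt_eps_test(step, value=lambda s: s, state0=0))  # small p: 0 is an outlier
```

    The point of the construction is that no mixing-time bound is needed: the walk can be far too short to reach stationarity and the √(2ε) significance still holds.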

  15. An approach to the analysis of SDSS spectroscopic outliers based on self-organizing maps. Designing the outlier analysis software package for the next Gaia survey

    NASA Astrophysics Data System (ADS)

    Fustes, D.; Manteiga, M.; Dafonte, C.; Arcay, B.; Ulla, A.; Smith, K.; Borrachero, R.; Sordo, R.

    2013-11-01

    Aims: A new method for the segmentation and further analysis of the outliers resulting from the classification of astronomical objects in large databases is discussed. The method is being used in the framework of the Gaia satellite Data Processing and Analysis Consortium (DPAC) activities to prepare automated software tools that will be used to derive the basic astrophysical information to be included in the final Gaia archive. Methods: Our algorithm has been tested by means of simulated Gaia spectrophotometry, which is based on SDSS observations and theoretical spectral libraries covering a wide sample of astronomical objects. Self-organizing map networks are used to organize the information in clusters of objects, as homogeneously as possible according to their spectral energy distributions, and to project them onto a 2D grid where the data structure can be visualized. Results: We demonstrate the usefulness of the method by analyzing the spectra that were rejected by the SDSS spectroscopic classification pipeline and thus classified as "UNKNOWN". First, our method can help distinguish between astrophysical objects and instrumental artifacts. Additionally, the application of our algorithm to SDSS objects of unknown nature has allowed us to identify classes of objects with similar astrophysical natures, and it allows for the potential discovery of hundreds of new objects, such as white dwarfs and quasars. The proposed method is therefore shown to be very promising for data exploration and knowledge discovery in very large astronomical databases, such as the archive from the upcoming Gaia mission.

  16. Gravity gradient preprocessing at the GOCE HPF

    NASA Astrophysics Data System (ADS)

    Bouman, J.; Rispens, S.; Gruber, T.; Schrama, E.; Visser, P.; Tscherning, C. C.; Veicherts, M.

    2009-04-01

    One of the products derived from the GOCE observations are the gravity gradients. These gravity gradients are provided in the Gradiometer Reference Frame (GRF) and are calibrated in-flight using satellite shaking and star sensor data. In order to use these gravity gradients for applications in Earth sciences and gravity field analysis, additional pre-processing needs to be done, including corrections for temporal gravity field signals to isolate the static gravity field part, screening for outliers, calibration by comparison with existing external gravity field information, and error assessment. The temporal gravity gradient corrections consist of tidal and non-tidal corrections. These are all generally below the gravity gradient error level, which is predicted to show a 1/f behaviour for low frequencies. In the outlier detection, the 1/f error is compensated for by subtracting a local median from the data, while the data error is assessed using the median absolute deviation. The local median acts as a high-pass filter and is robust, as is the median absolute deviation. Three different methods have been implemented for the calibration of the gravity gradients. All three methods use a high-pass filter to compensate for the 1/f gravity gradient error. The baseline method uses state-of-the-art global gravity field models, and the most accurate results are obtained if star sensor misalignments are estimated along with the calibration parameters. A second calibration method uses GOCE GPS data to estimate a low-degree gravity field model as well as gravity gradient scale factors. Both methods allow estimation of gravity gradient scale factors down to the 10⁻³ level. The third calibration method uses highly accurate terrestrial gravity data in selected regions to validate the gravity gradient scale factors, focussing on the measurement band. Gravity gradient scale factors may be estimated down to the 10⁻² level with this method.
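
    The screening step described here can be sketched in a few lines (assumptions: the window length and the threshold multiple are illustrative choices, not the mission settings; 1.4826 rescales the MAD to a standard-deviation equivalent for Gaussian noise):

```python
import numpy as np
from scipy.ndimage import median_filter

def median_mad_outliers(x, window=101, k=3.0):
    """Flag outliers in a series with 1/f noise: high-pass by subtracting a
    running (local) median, then threshold at k robust standard deviations."""
    resid = x - median_filter(x, size=window, mode="nearest")
    mad = np.median(np.abs(resid - np.median(resid)))
    return np.abs(resid) > k * 1.4826 * mad
```

    Both the running median and the MAD have a 50% breakdown point, which is why the scheme stays reliable even when clusters of outliers sit on top of the drifting 1/f error.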

  17. Preprocessing of gravity gradients at the GOCE high-level processing facility

    NASA Astrophysics Data System (ADS)

    Bouman, Johannes; Rispens, Sietse; Gruber, Thomas; Koop, Radboud; Schrama, Ernst; Visser, Pieter; Tscherning, Carl Christian; Veicherts, Martin

    2009-07-01

    One of the products derived from the gravity field and steady-state ocean circulation explorer (GOCE) observations are the gravity gradients. These gravity gradients are provided in the gradiometer reference frame (GRF) and are calibrated in-flight using satellite shaking and star sensor data. To use these gravity gradients for applications in Earth sciences and gravity field analysis, additional preprocessing needs to be done, including corrections for temporal gravity field signals to isolate the static gravity field part, screening for outliers, calibration by comparison with existing external gravity field information, and error assessment. The temporal gravity gradient corrections consist of tidal and nontidal corrections. These are all generally below the gravity gradient error level, which is predicted to show a 1/f behaviour for low frequencies. In the outlier detection, the 1/f error is compensated for by subtracting a local median from the data, while the data error is assessed using the median absolute deviation. The local median acts as a high-pass filter and is robust, as is the median absolute deviation. Three different methods have been implemented for the calibration of the gravity gradients. All three methods use a high-pass filter to compensate for the 1/f gravity gradient error. The baseline method uses state-of-the-art global gravity field models, and the most accurate results are obtained if star sensor misalignments are estimated along with the calibration parameters. A second calibration method uses GOCE GPS data to estimate a low-degree gravity field model as well as gravity gradient scale factors. Both methods allow estimation of gravity gradient scale factors down to the 10⁻³ level. The third calibration method uses highly accurate terrestrial gravity data in selected regions to validate the gravity gradient scale factors, focussing on the measurement band. Gravity gradient scale factors may be estimated down to the 10⁻² level with this method.

  18. Visual saliency detection based on modeling the spatial Gaussianity

    NASA Astrophysics Data System (ADS)

    Ju, Hongbin

    2015-04-01

    In this paper, a novel salient object detection method based on modeling spatial anomalies is presented. The proposed framework is inspired by the biological mechanism that human eyes are sensitive to unusual and anomalous objects among a complex background. It is supposed that a natural image can be seen as a combination of some similar or dissimilar basic patches, and there is a direct relationship between its saliency and anomaly. Some patches share a high degree of similarity and occur in vast numbers; they usually make up the background of an image. On the other hand, some patches present strong rarity and specificity; we name these patches "anomalies". Generally, an anomalous patch is a reflection of an edge or some special colors and textures in an image, and these patterns cannot be well "explained" by their surroundings. Human eyes show great interest in these anomalous patterns and will automatically pick out the anomalous parts of an image as the salient regions. To better evaluate the anomaly degree of the basic patches and exploit their nonlinear statistical characteristics, a multivariate Gaussian distribution saliency evaluation model is proposed. In this way, objects with anomalous patterns usually appear as outliers of the Gaussian distribution, and we identify these anomalous objects as salient ones. Experiments are conducted on the well-known MSRA saliency detection dataset. Compared with other recently developed visual saliency detection methods, our method shows significant advantages.
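
    The Gaussian outlier model can be sketched as follows (assumptions: non-overlapping gray-level patches and a single global Gaussian; the paper's feature extraction and any multi-scale details are not reproduced):

```python
import numpy as np

def gaussian_saliency(img, patch=8):
    """Score each non-overlapping patch by its Mahalanobis distance from a
    multivariate Gaussian fitted to all patches: anomalous patches score high."""
    ph, pw = img.shape[0] // patch, img.shape[1] // patch
    P = np.array([img[i*patch:(i+1)*patch, j*patch:(j+1)*patch].ravel()
                  for i in range(ph) for j in range(pw)])
    d = P - P.mean(axis=0)
    cov = np.cov(P, rowvar=False) + 1e-3 * np.eye(P.shape[1])   # regularized
    md = np.sqrt(np.einsum("ij,jk,ik->i", d, np.linalg.inv(cov), d))
    return md.reshape(ph, pw)                                    # saliency per patch

rng = np.random.default_rng(0)
img = rng.normal(0.0, 1.0, (128, 128))
img[48:56, 48:56] += 4.0                          # an anomalous bright block
sal = gaussian_saliency(img)
print(np.unravel_index(sal.argmax(), sal.shape))  # -> (6, 6), the odd patch
```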

  19. Searching Remote Homology with Spectral Clustering with Symmetry in Neighborhood Cluster Kernels

    PubMed Central

    Maulik, Ujjwal; Sarkar, Anasua

    2013-01-01

    Remote homology detection among proteins utilizing only the unlabelled sequences is a central problem in comparative genomics. The existing cluster kernel methods based on neighborhoods and profiles and the Markov clustering algorithms are currently the most popular methods for protein family recognition. The deviation from random walks with inflation, or the dependency on a hard threshold in the similarity measure, in those methods requires an enhancement for homology detection among multi-domain proteins. We propose to combine spectral clustering with neighborhood kernels in Markov similarity to enhance sensitivity in detecting homology, independent of “recent” paralogs. The spectral clustering approach with new combined local alignment kernels more effectively exploits the unsupervised protein sequences globally, reducing inter-cluster walks. When combined with corrections based on a modified symmetry-based proximity norm that de-emphasizes outliers, the technique proposed in this article outperforms other state-of-the-art cluster kernels among all twelve implemented kernels. The comparison with the state-of-the-art string and mismatch kernels also shows the superior performance scores provided by the proposed kernels. A similar performance improvement is also found over an existing large dataset. The proposed spectral clustering framework over combined local alignment kernels with modified symmetry-based correction therefore achieves superior performance for unsupervised remote homolog detection, even in multi-domain and promiscuous-domain proteins from Genolevures database families, with better biological relevance. Source code available upon request. Contact: sarkar@labri.fr. PMID:23457439

  20. Searching remote homology with spectral clustering with symmetry in neighborhood cluster kernels.

    PubMed

    Maulik, Ujjwal; Sarkar, Anasua

    2013-01-01

    Remote homology detection among proteins utilizing only the unlabelled sequences is a central problem in comparative genomics. The existing cluster kernel methods based on neighborhoods and profiles and the Markov clustering algorithms are currently the most popular methods for protein family recognition. The deviation from random walks with inflation, or the dependency on a hard threshold in the similarity measure, in those methods requires an enhancement for homology detection among multi-domain proteins. We propose to combine spectral clustering with neighborhood kernels in Markov similarity to enhance sensitivity in detecting homology, independent of "recent" paralogs. The spectral clustering approach with new combined local alignment kernels more effectively exploits the unsupervised protein sequences globally, reducing inter-cluster walks. When combined with corrections based on a modified symmetry-based proximity norm that de-emphasizes outliers, the technique proposed in this article outperforms other state-of-the-art cluster kernels among all twelve implemented kernels. The comparison with the state-of-the-art string and mismatch kernels also shows the superior performance scores provided by the proposed kernels. A similar performance improvement is also found over an existing large dataset. The proposed spectral clustering framework over combined local alignment kernels with modified symmetry-based correction therefore achieves superior performance for unsupervised remote homolog detection, even in multi-domain and promiscuous-domain proteins from Genolevures database families, with better biological relevance. Source code available upon request. sarkar@labri.fr.

  1. 75 FR 42835 - Medicare Program; Inpatient Rehabilitation Facility Prospective Payment System for Federal Fiscal...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-07-22

    ... estimated cost of the case exceeds the adjusted outlier threshold. We calculate the adjusted outlier... to 80 percent of the difference between the estimated cost of the case and the outlier threshold. In... Federal Prospective Payment Rates VI. Update to Payments for High-Cost Outliers under the IRF PPS A...

  2. Methodology to assess clinical liver safety data.

    PubMed

    Merz, Michael; Lee, Kwan R; Kullak-Ublick, Gerd A; Brueckner, Andreas; Watkins, Paul B

    2014-11-01

    Analysis of liver safety data has to be multivariate by nature and needs to take into account time dependency of observations. Current standard tools for liver safety assessment such as summary tables, individual data listings, and narratives address these requirements to a limited extent only. Using graphics in the context of a systematic workflow including predefined graph templates is a valuable addition to standard instruments, helping to ensure completeness of evaluation, and supporting both hypothesis generation and testing. Employing graphical workflows interactively allows analysis in a team-based setting and facilitates identification of the most suitable graphics for publishing and regulatory reporting. Another important tool is statistical outlier detection, accounting for the fact that for assessment of Drug-Induced Liver Injury, identification and thorough evaluation of extreme values has much more relevance than measures of central tendency in the data. Taken together, systematical graphical data exploration and statistical outlier detection may have the potential to significantly improve assessment and interpretation of clinical liver safety data. A workshop was convened to discuss best practices for the assessment of drug-induced liver injury (DILI) in clinical trials.

  3. Cycle bases to the rescue

    NASA Astrophysics Data System (ADS)

    Tóbiás, Roland; Furtenbacher, Tibor; Császár, Attila G.

    2017-12-01

    Cycle bases of graph theory are introduced for the analysis of transition data deposited in line-by-line rovibronic spectroscopic databases. The principal advantage of using cycle bases is that outlier transitions, almost always present in spectroscopic databases built from experimental data originating from many different sources, can be detected and identified straightforwardly and automatically. The data available for six water isotopologues, H216O, H217O, H218O, HD16O, HD17O, and HD18O, in the HITRAN2012 and GEISA2015 databases are used to demonstrate the utility of cycle-basis-based outlier-detection approaches. The spectroscopic databases appear to be sufficiently complete that the great majority of the entries of the minimum cycle basis have the minimum possible length of four. More than 2000 transition conflicts have been identified for the isotopologue H216O in the HITRAN2012 database, and the seven common conflict types are discussed. It is recommended to employ cycle bases, and especially a minimum cycle basis, for the analysis of transitions deposited in high-resolution spectroscopic databases.
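
    The cycle-closure idea behind this approach can be sketched with networkx (assumptions: a toy network with hypothetical level energies; networkx's basic `cycle_basis` is used for simplicity, whereas the paper advocates a minimum cycle basis; in a real database the level energies are unknown and the traversal direction comes from the assigned quantum numbers):

```python
import networkx as nx

# Toy spectroscopic network: nodes are energy levels, edges carry measured
# transition wavenumbers. The level energies here are hypothetical.
levels = {"A": 0.0, "B": 10.0, "C": 25.0, "D": 40.0}
G = nx.Graph()
for u, v in [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C"), ("B", "D")]:
    G.add_edge(u, v, wn=abs(levels[u] - levels[v]))
G["A"]["C"]["wn"] += 0.5                       # inject one outlier transition

for cycle in nx.cycle_basis(G):
    # The signed sum of wavenumbers around any closed cycle must vanish.
    ring = cycle + [cycle[0]]
    resid = sum((1 if levels[v] > levels[u] else -1) * G[u][v]["wn"]
                for u, v in zip(ring, ring[1:]))
    print(cycle, f"residual = {resid:+.2f}")   # a nonzero residual flags a bad edge
```

    Cycles whose residual exceeds the combined measurement uncertainty point at the outlier transition, which is exactly the conflict-detection mechanism the record describes.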

  4. Signatures of selection in the Iberian honey bee (Apis mellifera iberiensis) revealed by a genome scan analysis of single nucleotide polymorphisms.

    PubMed

    Chávez-Galarza, Julio; Henriques, Dora; Johnston, J Spencer; Azevedo, João C; Patton, John C; Muñoz, Irene; De la Rúa, Pilar; Pinto, M Alice

    2013-12-01

    Understanding the genetic mechanisms of adaptive population divergence is one of the most fundamental endeavours in evolutionary biology and is becoming increasingly important, as it will allow predictions about how organisms will respond to the global environmental crisis. This is particularly important for the honey bee, a species of unquestionable ecological and economic importance that has been exposed to increasing human-mediated selection pressures. Here, we conducted a single nucleotide polymorphism (SNP)-based genome scan in honey bees collected across an environmental gradient in Iberia and used four FST-based outlier tests to identify genomic regions exhibiting signatures of selection. Additionally, we analysed associations between genetic and environmental data to identify factors that might be correlated with, or act as, selective pressures. With these approaches, 4.4% (17 of 383) of outlier loci were cross-validated by all four FST-based methods, and 8.9% (34 of 383) were cross-validated by at least three methods. Of the 34 outliers, 15 were found to be strongly associated with one or more environmental variables. Further support for selection, provided by functional genomic information, was particularly compelling for SNP outliers mapped to different genes putatively involved in the same function, such as vision, xenobiotic detoxification, and innate immune response. This study enabled a more rigorous consideration of selection as the underlying cause of diversity patterns in Iberian honey bees, representing an important first step towards the identification of polymorphisms implicated in local adaptation and possibly in response to recent human-mediated environmental changes. © 2013 John Wiley & Sons Ltd.

  5. Computational methods for evaluation of cell-based data assessment--Bioconductor.

    PubMed

    Le Meur, Nolwenn

    2013-02-01

    Recent advances in miniaturization and automation of technologies have enabled high-throughput screening with cell-based assays, bringing along new challenges in data analysis. Automation, standardization, and reproducibility have become requirements for high-quality research. The Bioconductor community has worked in that direction, proposing several R packages to handle high-throughput data, including flow cytometry (FCM) experiments. Altogether, these packages cover the main steps of an FCM analysis workflow, that is, data management, quality assessment, normalization, outlier detection, automated gating, cluster labeling, and feature extraction. Additionally, the open-source philosophy of R and Bioconductor, which offers room for new development, continuously drives research and improvement of these analysis methods, especially in the field of clustering and data mining. This review presents the principal FCM packages currently available in R and Bioconductor, their advantages, and their limits. Copyright © 2012 Elsevier Ltd. All rights reserved.

  6. Structure-From-Motion in 3D Space Using 2D Lidars

    PubMed Central

    Choi, Dong-Geol; Bok, Yunsu; Kim, Jun-Sik; Shim, Inwook; Kweon, In So

    2017-01-01

    This paper presents a novel structure-from-motion methodology using 2D lidars (Light Detection And Ranging). In 3D space, 2D lidars do not provide sufficient information for pose estimation. For this reason, additional sensors have been used along with the lidar measurement. In this paper, we use a sensor system that consists of only 2D lidars, without any additional sensors. We propose a new method of estimating both the 6D pose of the system and the surrounding 3D structures. We compute the pose of the system using line segments of scan data and their corresponding planes. After discarding the outliers, both the pose and the 3D structures are refined via nonlinear optimization. Experiments with both synthetic and real data show the accuracy and robustness of the proposed method. PMID:28165372

  7. Robust prediction of protein subcellular localization combining PCA and WSVMs.

    PubMed

    Tian, Jiang; Gu, Hong; Liu, Wenqi; Gao, Chiyang

    2011-08-01

    Automated prediction of protein subcellular localization is an important tool for genome annotation and drug discovery, and Support Vector Machines (SVMs) can effectively solve this problem in a supervised manner. However, the datasets obtained from real experiments are likely to contain outliers or noise, which can lead to poor generalization ability and classification accuracy. To address this problem, we adopt strategies to lower the effect of outliers. First, we design a method based on Weighted SVMs (WSVMs): different weights are assigned to different data points, so the training algorithm learns the decision boundary according to the relative importance of the data points. Second, we analyse the influence of Principal Component Analysis (PCA) on WSVM classification and propose a hybrid classifier combining the merits of both PCA and WSVM. After dimension reduction is performed on the datasets, a kernel-based possibilistic c-means algorithm can generate more suitable weights for the training, since PCA transforms the data into a new coordinate system whose largest variances are greatly affected by the outliers. Experiments on benchmark datasets show promising results, which confirms the effectiveness of the proposed method in terms of prediction accuracy. Copyright © 2011 Elsevier Ltd. All rights reserved.
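
    The PCA-plus-weighted-SVM pipeline can be sketched with scikit-learn (assumptions: synthetic two-class data, and a simple distance-to-centroid weighting as a stand-in for the kernel possibilistic c-means weights used in the paper):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 20)), rng.normal(2, 1, (50, 20))])
y = np.repeat([0, 1], 50)
X[0] += 15                                   # a gross outlier in class 0

Z = PCA(n_components=5).fit_transform(X)     # denoise / reduce dimension first
# Down-weight points far from their class centroid, so the SVM boundary is
# learned mostly from typical, trustworthy samples.
w = np.ones(len(Z))
for c in (0, 1):
    d = np.linalg.norm(Z[y == c] - Z[y == c].mean(axis=0), axis=1)
    w[y == c] = 1.0 / (1.0 + (d / np.median(d)) ** 2)

clf = SVC(kernel="rbf").fit(Z, y, sample_weight=w)
print(clf.score(Z, y))
```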

  8. Exploring Innovative Approaches and Patient-Centered Outcomes from Positive Outliers in Childhood Obesity

    PubMed Central

    Sharifi, Mona; Marshall, Gareth; Goldman, Roberta; Rifas-Shiman, Sheryl L; Horan, Christine M; Koziol, Renata; Marshall, Richard; Sequist, Thomas D; Taveras, Elsie M

    2015-01-01

    Objective New approaches for obesity prevention and management can be gleaned from 'positive outliers', i.e., individuals who have succeeded in changing health behaviors and reducing their body mass index (BMI) in the context of adverse built and social environments. We explored perspectives and strategies of parents of positive outlier children living in high risk neighborhoods. Methods We collected up to five years of height/weight data from the electronic health records of 22,443 Massachusetts children, ages 6-12 years, seen for well-child care. We identified children with any history of BMI ≥95th percentile (n=4007) and generated a BMI z-score slope for each child using a linear mixed effects model. We recruited parents for focus groups from the sub-sample of children with negative slopes who also lived in zip codes where >15% of children were obese. We analyzed focus group transcripts using an immersion/crystallization approach. Results We reached thematic saturation after 5 focus groups with 41 parents. Commonly cited outcomes that mattered most to parents and motivated change were child inactivity, above-average clothing sizes, exercise intolerance, and negative peer interactions; few reported BMI as a motivator. Convergent strategies among positive outlier families were family-level changes, parent modeling, consistency, household rules/limits, and creativity in overcoming resistance. Parents voiced preferences for obesity interventions that include tailored education and support that extend outside clinical settings and are delivered by both health care professionals and successful peers. Conclusions Successful strategies learned from positive outlier families can be generalized and tested to accelerate progress in reducing childhood obesity. PMID:25439163

  9. Abundant Topological Outliers in Social Media Data and Their Effect on Spatial Analysis.

    PubMed

    Westerholt, Rene; Steiger, Enrico; Resch, Bernd; Zipf, Alexander

    2016-01-01

    Twitter and related social media feeds have become valuable data sources for many fields of research. Numerous researchers have thereby used social media posts for spatial analysis, since many of them contain explicit geographic locations. However, despite its widespread use within applied research, a thorough understanding of the underlying spatial characteristics of these data is still lacking. In this paper, we investigate how topological outliers influence the outcomes of spatial analyses of social media data. These outliers appear when different users contribute heterogeneous information about different phenomena simultaneously from similar locations. As a consequence, various messages representing different spatial phenomena are captured close to each other and are at risk of being falsely related in a spatial analysis. Our results reveal indications of corresponding spurious effects when analyzing Twitter data. Further, we show how the outliers distort the range of outcomes of spatial analysis methods. This has significant influence on the power of spatial inferential techniques and, more generally, on the validity and interpretability of spatial analysis results. We further investigate how the issues caused by topological outliers are composed in detail. We unveil that multiple disturbing effects act simultaneously and that these are related to the geographic scales of the involved overlapping patterns. Our results show that at some scale configurations, the disturbances added through overlap are more severe than at others. Further, their behavior turns into a volatile and almost chaotic fluctuation when the scales of the involved patterns become too different. Overall, our results highlight the critical importance of thoroughly considering the specific characteristics of social media data when analyzing them spatially.

  10. Indicator saturation: a novel approach to detect multiple breaks in geodetic time series.

    NASA Astrophysics Data System (ADS)

    Jackson, L. P.; Pretis, F.; Williams, S. D. P.

    2016-12-01

    Geodetic time series can record long term trends, quasi-periodic signals at a variety of time scales from days to decades, and sudden breaks due to natural or anthropogenic causes. The causes of breaks range from instrument replacement to earthquakes to unknown (i.e. no attributable cause). Furthermore, breaks can be permanent or short-lived and range at least two orders of magnitude in size (mm to 100s of mm). Accounting for this range of possible signal characteristics requires a flexible time series method that can distinguish between true and false breaks, outliers and time-varying trends. One such method, Indicator Saturation (IS), comes from the field of econometrics where analysing stochastic signals in these terms is a common problem. The IS approach differs from alternative break detection methods by considering every point in the time series as a break until it is demonstrated statistically that it is not. A linear model is constructed with a break function at every point in time, and all but the statistically significant breaks are removed through a general-to-specific model selection algorithm for more variables than observations. The IS method is flexible because it allows multiple breaks of different forms (e.g. impulses, shifts in the mean, and changing trends) to be detected, while simultaneously modelling any underlying variation driven by additional covariates. We apply the IS method to identify breaks in a suite of synthetic GPS time series used for the Detection of Offsets in GPS Experiments (DOGEX). We optimise the method to maximise the ratio of true-positive to false-positive detections, which improves estimates of errors in the long term rates of land motion currently required by the GPS community.
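
    The general-to-specific selection with more regressors than observations can be illustrated with a stripped-down split-half variant of step-indicator saturation; this is a sketch of the idea only, not the authors' implementation, and the significance level is illustrative:

        import numpy as np
        import statsmodels.api as sm

        def step_indicator_saturation(y, t, alpha=0.001):
            """Return indices where a statistically significant step occurs."""
            n = len(y)
            base = sm.add_constant(np.asarray(t, dtype=float))  # intercept + trend

            def significant_steps(candidates):
                cols = [(np.arange(n) >= j).astype(float) for j in candidates]
                X = np.column_stack([base] + cols) if cols else base
                res = sm.OLS(y, X).fit()
                pvals = res.pvalues[base.shape[1]:]
                return [j for j, p in zip(candidates, pvals) if p < alpha]

            half = n // 2  # saturate and estimate each half-set separately
            kept = (significant_steps(range(1, half)) +
                    significant_steps(range(half, n)))
            return significant_steps(kept)  # joint refit of the survivors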

  11. Outliers to the peak energy-isotropic energy relation in gamma-ray bursts

    NASA Astrophysics Data System (ADS)

    Nakar, Ehud; Piran, Tsvi

    2005-06-01

    The peak energy-isotropic energy (Ep-Ei) relation is among the most intriguing recent discoveries concerning gamma-ray bursts (GRBs). It can have numerous implications for our understanding of the emission mechanism of the bursts and for the application of GRBs to cosmological studies. However, this relation has been verified only for a small sample of bursts with measured redshifts. We propose here a test of whether a burst with an unknown redshift can potentially satisfy the Ep-Ei relation. Applying this test to a large sample of BATSE bursts, we find that a significant fraction of those bursts cannot satisfy this relation. Our test is sensitive only to dim and hard bursts, and therefore this relation might still hold as an inequality (i.e. there are no intrinsically bright and soft bursts). We conclude that the observed relation seen in the sample of bursts with known redshift might be influenced by observational biases and by the difficulty of detecting and localizing hard, weak bursts that have only a small number of photons. In particular, we point out that the threshold for detection, localization and redshift measurement is essentially higher than the threshold for detection alone. We predict that Swift will detect some hard and weak bursts that would be outliers to the Ep-Ei relation. However, we cannot quantify this prediction. We stress the importance of understanding the detection-localization-redshift threshold for the coming Swift detections.
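
    The test can be sketched as a scan over redshift: given a burst's observed peak energy and fluence, check whether any redshift makes its rest-frame peak energy and isotropic energy consistent with the relation. The normalization K and the scatter factor below are illustrative placeholders, not the paper's calibration, and a flat Planck cosmology is assumed:

        import numpy as np
        import astropy.units as u
        from astropy.cosmology import Planck18 as cosmo

        def can_satisfy_relation(ep_obs_keV, fluence_cgs, K=100.0, tol=3.0):
            # Scan redshifts; a burst is a candidate outlier if no z works.
            z = np.linspace(0.01, 10, 500)
            dl = cosmo.luminosity_distance(z).to(u.cm).value
            e_iso = 4 * np.pi * dl**2 * fluence_cgs / (1 + z)  # erg, isotropic
            ep_i = ep_obs_keV * (1 + z)                        # keV, rest frame
            # Relation sketch: Ep,i = K * (Eiso / 1e52 erg)^0.5, within scatter.
            ratio = ep_i / (K * np.sqrt(e_iso / 1e52))
            return np.any((ratio > 1 / tol) & (ratio < tol))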

  12. Robust Kriged Kalman Filtering

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Baingana, Brian; Dall'Anese, Emiliano; Mateos, Gonzalo

    2015-11-11

    Although the kriged Kalman filter (KKF) has well-documented merits for prediction of spatial-temporal processes, its performance degrades in the presence of outliers due to anomalous events or measurement equipment failures. This paper proposes a robust KKF model that explicitly accounts for the presence of measurement outliers. Exploiting outlier sparsity, a novel l1-regularized estimator is put forth that jointly predicts the spatial-temporal process at unmonitored locations while identifying measurement outliers. Numerical tests are conducted on a synthetic Internet protocol (IP) network, and real transformer load data. Test results corroborate the effectiveness of the novel estimator in joint spatial prediction and outlier identification.
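
    A generic sketch of the outlier-sparsity idea (not the authors' kriged Kalman estimator): model measurements as a regression plus a sparse outlier vector, and alternate a least-squares update with soft-thresholding of the residuals:

        import numpy as np

        def soft(x, lam):
            return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

        def robust_fit(X, y, lam=1.0, iters=50):
            # Model: y = X @ beta + o + noise, with o sparse.
            beta = np.linalg.lstsq(X, y, rcond=None)[0]
            o = np.zeros_like(y)
            for _ in range(iters):
                o = soft(y - X @ beta, lam)        # sparse outlier estimate
                beta = np.linalg.lstsq(X, y - o, rcond=None)[0]
            return beta, o  # nonzero entries of o flag outlying measurements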

  13. Scalable Robust Principal Component Analysis Using Grassmann Averages.

    PubMed

    Hauberg, Søren; Feragen, Aasa; Enficiaud, Raffi; Black, Michael J

    2016-11-01

    In large datasets, manual data verification is impossible, and we must expect the number of outliers to increase with data size. While principal component analysis (PCA) can reduce data size, and scalable solutions exist, it is well-known that outliers can arbitrarily corrupt the results. Unfortunately, state-of-the-art approaches for robust PCA are not scalable. We note that in a zero-mean dataset, each observation spans a one-dimensional subspace, giving a point on the Grassmann manifold. We show that the average subspace corresponds to the leading principal component for Gaussian data. We provide a simple algorithm for computing this Grassmann Average (GA), and show that the subspace estimate is less sensitive to outliers than PCA for general distributions. Because averages can be efficiently computed, we immediately gain scalability. We exploit robust averaging to formulate the Robust Grassmann Average (RGA) as a form of robust PCA. The resulting Trimmed Grassmann Average (TGA) is appropriate for computer vision because it is robust to pixel outliers. The algorithm has linear computational complexity and minimal memory requirements. We demonstrate TGA for background modeling, video restoration, and shadow removal. We show scalability by performing robust PCA on the entire Star Wars IV movie, a task beyond any current method. Source code is available online.
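
    A simplified sketch of the Grassmann Average iteration as described: average the unit observation directions with signs aligned to the current estimate; the trimmed (TGA) variant would replace the weighted mean with a trimmed mean:

        import numpy as np

        def grassmann_average(X, iters=100, seed=0):
            # X: (n, d) zero-mean data; returns a unit vector estimating the
            # leading principal component.
            rng = np.random.default_rng(seed)
            w = np.linalg.norm(X, axis=1)
            X, w = X[w > 0], w[w > 0]            # drop all-zero rows
            U = X / w[:, None]                   # unit observation directions
            q = rng.standard_normal(X.shape[1])
            q /= np.linalg.norm(q)
            for _ in range(iters):
                s = np.sign(U @ q)
                s[s == 0] = 1.0
                q = (w * s) @ U                  # weighted, sign-aligned mean
                q /= np.linalg.norm(q)
            return q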

  14. Robust identification of transcriptional regulatory networks using a Gibbs sampler on outlier sum statistic.

    PubMed

    Gu, Jinghua; Xuan, Jianhua; Riggins, Rebecca B; Chen, Li; Wang, Yue; Clarke, Robert

    2012-08-01

    Identification of transcriptional regulatory networks (TRNs) is of significant importance in computational biology for cancer research, providing a critical building block to unravel disease pathways. However, existing methods for TRN identification suffer from the inclusion of excessive 'noise' in microarray data and false-positives in binding data, especially when applied to human tumor-derived cell line studies. More robust methods that can counteract the imperfection of data sources are therefore needed for reliable identification of TRNs in this context. In this article, we propose to establish a link between the quality of one target gene to represent its regulator and the uncertainty of its expression to represent other target genes. Specifically, an outlier sum statistic was used to measure the aggregated evidence for regulation events between target genes and their corresponding transcription factors. A Gibbs sampling method was then developed to estimate the marginal distribution of the outlier sum statistic, hence, to uncover underlying regulatory relationships. To evaluate the effectiveness of our proposed method, we compared its performance with that of an existing sampling-based method using both simulation data and yeast cell cycle data. The experimental results show that our method consistently outperforms the competing method in different settings of signal-to-noise ratio and network topology, indicating its robustness for biological applications. Finally, we applied our method to breast cancer cell line data and demonstrated its ability to extract biologically meaningful regulatory modules related to estrogen signaling and action in breast cancer. Availability: The Gibbs sampler MATLAB package is freely available at http://www.cbil.ece.vt.edu/software.htm. Contact: xuan@vt.edu. Supplementary data are available at Bioinformatics online.
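
    An outlier-sum statistic of the kind used here can be sketched for a single expression vector as follows (robust standardization, then summing the values beyond a high fence); the Gibbs sampling of its marginal distribution is beyond this sketch:

        import numpy as np

        def outlier_sum(x):
            # Robustly standardize with median and MAD, then aggregate the
            # standardized values that lie above the upper Tukey-style fence.
            med = np.median(x)
            mad = max(np.median(np.abs(x - med)), 1e-12)
            z = (x - med) / mad
            q25, q75 = np.percentile(z, [25, 75])
            fence = q75 + (q75 - q25)
            return z[z > fence].sum()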

  15. Robust identification of transcriptional regulatory networks using a Gibbs sampler on outlier sum statistic

    PubMed Central

    Gu, Jinghua; Xuan, Jianhua; Riggins, Rebecca B.; Chen, Li; Wang, Yue; Clarke, Robert

    2012-01-01

    Motivation: Identification of transcriptional regulatory networks (TRNs) is of significant importance in computational biology for cancer research, providing a critical building block to unravel disease pathways. However, existing methods for TRN identification suffer from the inclusion of excessive ‘noise’ in microarray data and false-positives in binding data, especially when applied to human tumor-derived cell line studies. More robust methods that can counteract the imperfection of data sources are therefore needed for reliable identification of TRNs in this context. Results: In this article, we propose to establish a link between the quality of one target gene to represent its regulator and the uncertainty of its expression to represent other target genes. Specifically, an outlier sum statistic was used to measure the aggregated evidence for regulation events between target genes and their corresponding transcription factors. A Gibbs sampling method was then developed to estimate the marginal distribution of the outlier sum statistic, hence, to uncover underlying regulatory relationships. To evaluate the effectiveness of our proposed method, we compared its performance with that of an existing sampling-based method using both simulation data and yeast cell cycle data. The experimental results show that our method consistently outperforms the competing method in different settings of signal-to-noise ratio and network topology, indicating its robustness for biological applications. Finally, we applied our method to breast cancer cell line data and demonstrated its ability to extract biologically meaningful regulatory modules related to estrogen signaling and action in breast cancer. Availability and implementation: The Gibbs sampler MATLAB package is freely available at http://www.cbil.ece.vt.edu/software.htm. Contact: xuan@vt.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22595208

  16. A quick method based on SIMPLISMA-KPLS for simultaneously selecting outlier samples and informative samples for model standardization in near infrared spectroscopy

    NASA Astrophysics Data System (ADS)

    Li, Li-Na; Ma, Chang-Ming; Chang, Ming; Zhang, Ren-Cheng

    2017-12-01

    A novel method based on SIMPLe-to-use Interactive Self-modeling Mixture Analysis (SIMPLISMA) and Kernel Partial Least Squares (KPLS), named SIMPLISMA-KPLS, is proposed in this paper for the simultaneous selection of outlier samples and informative samples. It is a quick algorithm for model standardization (also called model transfer) in near-infrared (NIR) spectroscopy. NIR data from a corn experiment for the analysis of protein content are used to evaluate the proposed method. Piecewise direct standardization (PDS) is employed for model transfer, and SIMPLISMA-PDS-KPLS is compared with KS-PDS-KPLS in terms of the prediction accuracy of protein content and the calculation speed of each algorithm. The conclusions are that SIMPLISMA-KPLS can be used as an alternative sample selection method for model transfer. Although its accuracy is similar to that of Kennard-Stone (KS), it differs from KS in that it employs concentration information in the selection procedure. This ensures that analyte information is involved in the analysis and that the spectra (X) of the selected samples are correlated with the concentrations (y). It can also eliminate outlier samples simultaneously through validation of the calibration. The running-time statistics make clear that the sample selection process is more rapid when using KPLS. The quick SIMPLISMA-KPLS algorithm is beneficial for improving the speed of online measurement using NIR spectroscopy.
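
    For reference, the Kennard-Stone selection the method is compared against can be sketched in a few lines: seed with the two most distant samples, then repeatedly add the sample whose nearest selected neighbour is farthest away:

        import numpy as np
        from scipy.spatial.distance import cdist

        def kennard_stone(X, k):
            # X: (n_samples, n_features) spectra; returns k selected indices.
            D = cdist(X, X)
            i, j = np.unravel_index(np.argmax(D), D.shape)
            selected = [i, j]
            while len(selected) < k:
                remaining = [r for r in range(len(X)) if r not in selected]
                # Distance from each remaining sample to its nearest selection.
                dmin = D[np.ix_(remaining, selected)].min(axis=1)
                selected.append(remaining[int(np.argmax(dmin))])
            return selected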

  17. Supervised linear dimensionality reduction with robust margins for object recognition

    NASA Astrophysics Data System (ADS)

    Dornaika, F.; Assoum, A.

    2013-01-01

    Linear Dimensionality Reduction (LDR) techniques have been increasingly important in computer vision and pattern recognition since they permit a relatively simple mapping of data onto a lower dimensional subspace, leading to simple and computationally efficient classification strategies. Recently, many linear discriminant methods have been developed in order to reduce the dimensionality of visual data and to enhance the discrimination between different groups or classes. Many existing linear embedding techniques rely on the use of local margins in order to achieve good discrimination performance. However, dealing with outliers and within-class diversity has not been addressed by margin-based embedding methods. In this paper, we explore the use of different margin-based linear embedding methods. More precisely, we propose to use the concepts of Median miss and Median hit for building robust margin-based criteria. Based on such margins, we seek the projection directions (linear embedding) such that the sum of local margins is maximized. Our proposed approach has been applied to the problem of appearance-based face recognition. Experiments performed on four public face databases show that the proposed approach can give better generalization performance than the classic Average Neighborhood Margin Maximization (ANMM). Moreover, thanks to the use of robust margins, the proposed method degrades gracefully when label outliers contaminate the training data set. In particular, we show that the concept of Median hit is crucial for obtaining robust performance in the presence of outliers.
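
    A minimal sketch of a robust local margin built from the Median hit / Median miss idea; the projection-seeking optimization that maximizes the summed margins is omitted here:

        import numpy as np
        from scipy.spatial.distance import cdist

        def median_margins(X, y):
            # For each sample: median distance to other-class points (miss)
            # minus median distance to same-class points (hit). Medians make
            # the margin resistant to label outliers.
            D = cdist(X, X)
            n = len(X)
            margins = np.empty(n)
            for i in range(n):
                same = np.where((y == y[i]) & (np.arange(n) != i))[0]
                diff = np.where(y != y[i])[0]
                margins[i] = np.median(D[i, diff]) - np.median(D[i, same])
            return margins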

  18. Identification of unusual events in multi-channel bridge monitoring data

    NASA Astrophysics Data System (ADS)

    Omenzetter, Piotr; Brownjohn, James Mark William; Moyo, Pilate

    2004-03-01

    Continuously operating instrumented structural health monitoring (SHM) systems are becoming a practical alternative to replace visual inspection for assessment of condition and soundness of civil infrastructure such as bridges. However, converting large amounts of data from an SHM system into usable information is a great challenge to which special signal processing techniques must be applied. This study is devoted to identification of abrupt, anomalous and potentially onerous events in the time histories of static, hourly sampled strains recorded by a multi-sensor SHM system installed in a major bridge structure and operating continuously for a long time. Such events may result, among other causes, from sudden settlement of foundation, ground movement, excessive traffic load or failure of post-tensioning cables. A method of outlier detection in multivariate data has been applied to the problem of finding and localising sudden events in the strain data. For sharp discrimination of abrupt strain changes from slowly varying ones, the wavelet transform has been used. The proposed method has been successfully tested using known events recorded during construction of the bridge, and later effectively used for detection of anomalous post-construction events.
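
    The two ingredients described (wavelet sharpening of abrupt changes, multivariate outlier detection across channels) can be sketched with PyWavelets and a Mahalanobis distance; the wavelet choice and thresholding are illustrative, not the paper's settings:

        import numpy as np
        import pywt

        def event_scores(strains):
            # strains: (n_samples, n_sensors) hourly strain histories.
            # Level-1 detail coefficients respond to abrupt changes while
            # suppressing slowly varying (e.g. thermal) strain.
            D = np.column_stack([pywt.dwt(s, 'db4')[1] for s in strains.T])
            r = D - np.median(D, axis=0)
            Cinv = np.linalg.pinv(np.cov(D, rowvar=False))
            d2 = np.einsum('ij,jk,ik->i', r, Cinv, r)  # squared Mahalanobis
            return d2  # flag values above e.g. a chi-square(n_sensors) quantile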

  19. Missed detection of significant positive and negative shifts in gentamicin assay: implications for routine laboratory quality practices.

    PubMed

    Koerbin, Gus; Liu, Jiakai; Eigenstetter, Alex; Tan, Chin Hon; Badrick, Tony; Loh, Tze Ping

    2018-02-15

    A product recall was issued for the Roche/Hitachi Cobas Gentamicin II assays on 25th May 2016 in Australia, after a 15-20% positive analytical shift was discovered. Laboratories were advised to employ the Thermo Fisher Gentamicin assay as an alternative. Following the reintroduction of the revised assay on 12th September 2016, a second reagent recall was made on 20th March 2017 after the discovery of a 20% negative analytical shift due to an erroneous instrument adjustment factor. The practices of an index laboratory were examined to determine how the analytical shifts evaded detection by routine internal quality control (IQC) and external quality assurance (EQA) systems. The ability of patient-result-based approaches, including the moving average (MovAvg) and moving sum of outliers (MovSO), to detect these shifts was examined. Internal quality control data of the index laboratory were acceptable prior to the product recall. The practice of adjusting the IQC target following a change in assay method resulted in the missed negative shift when the revised Roche assay was reintroduced. While the EQA data of the Roche subgroup showed clear negative bias relative to other laboratory methods, the results were attributed to a possible 'matrix effect'. The MovAvg method detected the positive shift before the product recall. The MovSO did not detect the negative shift in the index laboratory but did so in another laboratory 5 days before the second product recall. There are gaps in current laboratory quality practices that leave room for analytical errors to evade detection.
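
    A moving-average monitor of the kind evaluated here can be sketched as follows; the window size, baseline length and control limit are illustrative, not the paper's optimized settings:

        import numpy as np

        def moving_average_alarms(results, window=50, baseline=500, k=3.0):
            # Baseline period sets the target mean and the expected standard
            # error of a window mean; later windows that drift beyond k
            # standard errors raise an alarm.
            x = np.asarray(results, dtype=float)
            base = x[:baseline]
            mu, sd = base.mean(), base.std(ddof=1)
            limit = k * sd / np.sqrt(window)
            csum = np.cumsum(np.insert(x, 0, 0.0))
            means = (csum[window:] - csum[:-window]) / window
            return np.where(np.abs(means - mu) > limit)[0] + window - 1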

  20. Detection of calcification clusters in digital breast tomosynthesis slices at different dose levels utilizing a SRSAR reconstruction and JAFROC

    NASA Astrophysics Data System (ADS)

    Timberg, P.; Dustler, M.; Petersson, H.; Tingberg, A.; Zackrisson, S.

    2015-03-01

    Purpose: To investigate detection performance for calcification clusters in reconstructed digital breast tomosynthesis (DBT) slices at different dose levels using a Super Resolution and Statistical Artifact Reduction (SRSAR) reconstruction method. Method: Simulated calcifications with irregular profiles (0.2 mm diameter) were combined to form clusters that were added to projection images (1-3 per abnormal image) acquired on a DBT system (Mammomat Inspiration, Siemens). The projection images were dose reduced by software to form 35 abnormal cases and 25 normal cases as if acquired at 100%, 75% and 50% dose levels (AGD of approximately 1.6 mGy for a 53 mm standard breast, measured according to EUREF v0.15). A standard FBP and a SRSAR reconstruction method (utilizing IRIS (iterative reconstruction filters), and outlier detection using Maximum-Intensity Projections and Average-Intensity Projections) were used to reconstruct single central slices to be used in a free-response task (60 images per observer and dose level). Six observers participated, and their task was to detect the clusters and assign confidence ratings in randomly presented images from the whole image set (balanced by dose level). Trials were separated by one week to reduce possible memory bias. The outcome was analyzed for statistical differences using Jackknifed Alternative Free-response Receiver Operating Characteristics. Results: The results indicate that it is possible to reduce the dose by 50% with SRSAR without jeopardizing cluster detection. Conclusions: The detection performance for clusters can be maintained at a lower dose level by using SRSAR reconstruction.

  1. Robust Gaussian Graphical Modeling via l1 Penalization

    PubMed Central

    Sun, Hokeun; Li, Hongzhe

    2012-01-01

    Gaussian graphical models have been widely used as an effective method for studying the conditional independency structure among genes and for constructing genetic networks. However, gene expression data typically have heavier tails or more outlying observations than the standard Gaussian distribution. Such outliers in gene expression data can lead to wrong inference on the dependency structure among the genes. We propose an l1-penalized estimation procedure for the sparse Gaussian graphical models that is robustified against possible outliers. Each observation's contribution to the likelihood is weighted according to how far it deviates, with the deviation measured by its own likelihood. An efficient computational algorithm based on the coordinate gradient descent method is developed to obtain the minimizer of the negative penalized robustified-likelihood, where nonzero elements of the concentration matrix represent the graphical links among the genes. After the graphical structure is obtained, we re-estimate the positive definite concentration matrix using an iterative proportional fitting algorithm. Through simulations, we demonstrate that the proposed robust method performs much better than the graphical Lasso for the Gaussian graphical models in terms of both graph structure selection and estimation when outliers are present. We apply the robust estimation procedure to an analysis of yeast gene expression data and show that the resulting graph has better biological interpretation than that obtained from the graphical Lasso. PMID:23020775
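
    A sketch of such a robustified estimation loop, using scikit-learn's graphical lasso; the Huber-type Mahalanobis down-weighting here is a simple stand-in for the paper's likelihood-based weights:

        import numpy as np
        from scipy.stats import chi2
        from sklearn.covariance import graphical_lasso

        def robust_glasso(X, alpha=0.1, iters=5):
            n, p = X.shape
            Xc = X - np.median(X, axis=0)              # robust centering
            cov = np.cov(Xc, rowvar=False)
            cutoff = chi2.ppf(0.95, df=p)
            for _ in range(iters):
                cov_hat, prec = graphical_lasso(cov, alpha=alpha)
                d2 = np.einsum('ij,jk,ik->i', Xc, prec, Xc)  # Mahalanobis^2
                w = np.minimum(1.0, cutoff / np.maximum(d2, 1e-12))
                W = w / w.sum()                        # down-weight outliers
                cov = (Xc * W[:, None]).T @ Xc         # weighted covariance
            return prec  # nonzero off-diagonals give the graphical links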

  2. Development of an ultrasonic nondestructive inspection method for impact damage detection in composite aircraft structures

    NASA Astrophysics Data System (ADS)

    Capriotti, M.; Kim, H. E.; Lanza di Scalea, F.; Kim, H.

    2017-04-01

    High Energy Wide Area Blunt Impact (HEWABI) due to ground service equipment can often occur in aircraft structures, causing major damage. These Wide Area Impact Damages (WAID) can affect the internal components of the structure, hence are usually neither visible nor detectable by typical one-sided NDE techniques, and can easily compromise the structural safety of the aircraft. In this study, the development of an NDI method is presented together with its application to impacted aircraft frames. The HEWABI from a typical ground service scenario has been previously tested and the desired types of damage have been generated, so that the aircraft panels could become representative study cases. The aircraft industry's need for a rapid, ramp-friendly system to detect such WAID is here addressed with guided ultrasonic waves (GUW) and a scanning tool that accesses the whole structure from the exterior side only. The wide coverage of the specimen provided by GUW has been coupled to a differential detection approach and is aided by statistical outlier analysis to inspect for and detect faults in the challenging composite material and complex structure. The results will be presented and discussed with respect to the detection capability of the system and its response to the different damage types. Receiver Operating Characteristic (ROC) curves are also produced to quantify and assess the performance of the proposed method. Ongoing work is currently aimed at the penetration of the inner components of the structure, such as shear ties and C-frames, exploiting different frequency ranges and signal processing techniques. From the hardware and tool development side, different transducers and coupling methods, such as air-coupled transducers, are under investigation together with the design of a more suitable scanning technique.

  3. Patterns of Care for Biologic-Dosing Outliers and Nonoutliers in Biologic-Naive Patients with Rheumatoid Arthritis.

    PubMed

    Delate, Thomas; Meyer, Roxanne; Jenkins, Daniel

    2017-08-01

    Although most biologic medications for patients with rheumatoid arthritis (RA) have recommended fixed dosing, actual biologic dosing may vary among real-world patients, since some patients can receive higher (high-dose outliers) or lower (low-dose outliers) doses than what is recommended in medication package inserts. To describe the patterns of care for biologic-dosing outliers and nonoutliers in biologic-naive patients with RA. This was a retrospective, longitudinal cohort study of patients with RA who were not pregnant and were aged ≥ 18 and < 90 years from an integrated health care delivery system. Patients were newly initiated on adalimumab (ADA), etanercept (ETN), or infliximab (IFX) as index biologic therapy between July 1, 2006, and February 28, 2014. Outlier status was defined as a patient having received at least 1 dose < 90% or > 110% of the approved dose in the package insert at any time during the study period. Baseline patient profiles, treatment exposures, and outcomes were collected during the 180 days before and up to 2 years after biologic initiation and compared across index biologic outlier groups. Patients were followed for at least 1 year, with a subanalysis of those patients who remained as members for 2 years. This study included 434 RA patients with 1 year of follow-up and 372 RA patients with 2 years of follow-up. Overall, the vast majority of patients were female (≈75%) and had similar baseline characteristics. Approximately 10% of patients were outliers in both follow-up cohorts. ETN patients were least likely to become outliers, and ADA patients were most likely to become outliers. Of all outliers during the 1-year follow-up, patients were more likely to be a high-dose outlier (55%) than a low-dose outlier (45%). Median 1- and 2-year adjusted total biologic costs (based on wholesale acquisition costs) were higher for ADA and ETN nonoutliers than for IFX nonoutliers. Biologic persistence was highest for IFX patients. Charlson Comorbidity Index score, ETN and IFX index biologic, and treatment with a nonbiologic disease-modifying antirheumatic drug (DMARD) before biologic initiation were associated with becoming high- or low-dose outliers (c-statistic = 0.79). Approximately 1 in 10 study patients with RA was identified as a biologic-dosing outlier. Dosing outliers did not appear to have better clinical outcomes compared with nonoutliers. Before initiating outlier biologic dosing, health care providers may better serve their RA patients by prescribing alternate DMARD therapy. This study was sponsored by Janssen Scientific Affairs. It is the policy of Janssen Scientific Affairs to publish all sponsored studies unless they are exploratory studies or are determined a priori for internal use only (e.g., to inform business decisions). Meyer is an employee of Janssen Scientific Affairs and a stockholder in Johnson and Johnson, its parent company. Delate and Jenkins have nothing to disclose. Study concept and design were contributed by Delate and Meyer. Delate took the lead in data collection, along with Jenkins. All authors participated in data analysis. The manuscript was written primarily by Delate, along with Meyer and Jenkins, and was revised by Meyer, along with Delate and Jenkins.
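
    The outlier definition itself reduces to a simple rule; a tiny sketch (function and argument names are hypothetical):

        def classify_dose(administered, approved):
            # Outlier if any dose falls below 90% or above 110% of the
            # package-insert dose.
            ratio = administered / approved
            if ratio < 0.90:
                return "low-dose outlier"
            if ratio > 1.10:
                return "high-dose outlier"
            return "nonoutlier"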

  4. Scanning the genome for gene single nucleotide polymorphisms involved in adaptive population differentiation in white spruce

    PubMed Central

    Namroud, Marie-Claire; Beaulieu, Jean; Juge, Nicolas; Laroche, Jérôme; Bousquet, Jean

    2008-01-01

    Conifers are characterized by a large genome size and a rapid decay of linkage disequilibrium, most often within gene limits. Genome scans based on noncoding markers are less likely to detect molecular adaptation linked to genes in these species. In this study, we assessed the effectiveness of a genome-wide single nucleotide polymorphism (SNP) scan focused on expressed genes in detecting local adaptation in a conifer species. Samples were collected from six natural populations of white spruce (Picea glauca) moderately differentiated for several quantitative characters. A total of 534 SNPs representing 345 expressed genes were analysed. Genes potentially under natural selection were identified by estimating the differentiation in SNP frequencies among populations (FST) and identifying outliers, and by estimating local differentiation using a Bayesian approach. Both average expected heterozygosity and population differentiation estimates (HE = 0.270 and FST = 0.006) were comparable to those obtained with other genetic markers. Of all genes, 5.5% were identified as outliers with FST at the 95% confidence level, while 14% were identified as candidates for local adaptation with the Bayesian method. There was some overlap between the two gene sets. More than half of the candidate genes for local adaptation were specific to the warmest population, about 20% to the most arid population, and 15% to the coldest and most humid higher altitude population. These adaptive trends were consistent with the genes’ putative functions and the divergence in quantitative traits noted among the populations. The results suggest that an approach separating the locus and population effects is useful to identify genes potentially under selection. These candidates are worth exploring in more details at the physiological and ecological levels. PMID:18662225
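
    A bare-bones per-SNP FST outlier scan in the spirit of the first approach described above; the simple HT/HS estimator below and the 95th-percentile fence are illustrative, not the study's exact estimator, and the Bayesian locus/population decomposition is beyond this sketch:

        import numpy as np

        def fst_per_snp(freqs, sizes):
            # freqs: (n_pops, n_snps) allele frequencies; sizes: (n_pops,)
            w = sizes / sizes.sum()
            p_bar = w @ freqs                       # pooled allele frequency
            ht = 2 * p_bar * (1 - p_bar)            # total heterozygosity
            hs = w @ (2 * freqs * (1 - freqs))      # mean within-pop het.
            with np.errstate(invalid='ignore', divide='ignore'):
                return (ht - hs) / ht

        def outlier_snps(freqs, sizes, q=95):
            fst = fst_per_snp(freqs, sizes)
            return np.where(fst > np.nanpercentile(fst, q))[0]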

  5. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wijesooriya, K; Seitter, K; Desai, V

    Purpose: To present our single institution experience on catching errors with trajectory log file analysis. The reported causes of failures, probability of occurrence (O), severity of effects (S), and probability of the failures going undetected (D) could be added to FMEA guidelines. Methods: From March 2013 to March 2014, 19569 patient treatment fields/arcs were analyzed. This work includes checking all 131 treatment delivery parameters for all patients, all treatment sites and all treatment delivery fractions. TrueBeam trajectory log files for all treatment field types as well as all imaging types were accessed, read at 20 ms intervals, and every control point (total of 37 million parameters) compared to the physician-approved plan in the planning system. Results: Couch angle outlier occurrence: N = 327, range = -1.7 to 1.2 deg; gantry angle outlier occurrence: N = 59, range = 0.09 to 5.61 deg; collimator angle outlier occurrence: N = 13, range = -0.2 to 0.2 deg. VMAT cases have slightly larger variations in mechanical parameters. MLC: 3D single-control-point fields have a maximum deviation of 0.04 mm; 39 step-and-shoot IMRT cases have MLC deviations of -0.3 to 0.5 mm; all (1286) VMAT cases have deviations of -0.9 to 0.7 mm. Two possible serious errors were found: 1) a 4 cm isocenter shift for the PA beam of an AP-PA pair, under-dosing a portion of the PTV by 25%; 2) delivery with MLC leaves abutted behind the jaws as opposed to the midline as planned, leading to an under-dosing of a small volume of the PTV by 25% by the boost plan alone. Due to their error origin, neither of these errors could have been detected by pre-treatment verification. Conclusion: Performing trajectory log file analysis could catch typically undetected errors and avoid potentially adverse incidents.
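
    The core check reduces to comparing each delivered control-point parameter against the plan and reporting excursions beyond a per-parameter tolerance; the log parsing and the tolerance table below are hypothetical placeholders:

        import numpy as np

        TOLERANCES = {"gantry_deg": 1.0, "collimator_deg": 0.5,
                      "couch_deg": 1.0, "mlc_mm": 0.5}  # illustrative only

        def flag_deviations(delivered, planned):
            # delivered, planned: dicts mapping parameter name to an array of
            # values, one per control point.
            report = {}
            for name, tol in TOLERANCES.items():
                diff = np.asarray(delivered[name]) - np.asarray(planned[name])
                bad = np.where(np.abs(diff) > tol)[0]
                if bad.size:
                    report[name] = (bad, diff[bad])  # indices and excursions
            return report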

  6. Age-dependent biochemical quantities: an approach for calculating reference intervals.

    PubMed

    Bjerner, J

    2007-01-01

    A parametric method is often preferred when calculating reference intervals for biochemical quantities, as non-parametric methods are less efficient and require more observations/study subjects. Parametric methods are complicated, however, because of three commonly encountered features. First, biochemical quantities seldom display a Gaussian distribution, so either a transformation procedure must be used to obtain such a distribution or a more complex distribution has to be used. Second, biochemical quantities are often dependent on a continuous covariate, exemplified by rising serum concentrations of MUC1 (episialin, CA15.3) with increasing age. Third, outliers often exert substantial influence on parametric estimations and therefore need to be excluded before calculations are made. The International Federation of Clinical Chemistry (IFCC) currently recommends that confidence intervals be calculated for the reference centiles obtained. However, common statistical packages allowing for the adjustment of a continuous covariate do not make this calculation. In the method described in the current study, Tukey's fence is used to eliminate outliers, and two-stage transformations (modulus-exponential-normal) are used to render the distributions Gaussian. Fractional polynomials are employed to model the mean and standard deviation as functions of a covariate, and the model is selected by maximum likelihood. Confidence intervals are calculated for the fitted centiles by combining parameter estimation and sampling uncertainties. Finally, the elimination of outliers was made dependent on covariates by reiteration. Though a good knowledge of statistical theory is needed when performing the analysis, the current method is rewarding because the results are of practical use in patient care.
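
    The first two steps (Tukey's fence, then central 95% reference limits) can be sketched as follows; the transformation, fractional-polynomial covariate modelling and confidence intervals described above are beyond this sketch:

        import numpy as np

        def tukey_clean(x, k=1.5):
            # Remove values beyond the Tukey fences Q1 - k*IQR and Q3 + k*IQR.
            q1, q3 = np.percentile(x, [25, 75])
            lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
            return x[(x >= lo) & (x <= hi)]

        def reference_interval(x):
            clean = tukey_clean(np.asarray(x, dtype=float))
            return np.percentile(clean, [2.5, 97.5])  # central 95% limits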

  7. Lilac and honeysuckle phenology data 1956–2014

    PubMed Central

    Rosemartin, Alyssa H.; Denny, Ellen G.; Weltzin, Jake F.; Lee Marsh, R.; Wilson, Bruce E.; Mehdipoor, Hamed; Zurita-Milla, Raul; Schwartz, Mark D.

    2015-01-01

    The dataset is comprised of leafing and flowering data collected across the continental United States from 1956 to 2014 for purple common lilac (Syringa vulgaris), a cloned lilac cultivar (S. x chinensis ‘Red Rothomagensis’) and two cloned honeysuckle cultivars (Lonicera tatarica ‘Arnold Red’ and L. korolkowii ‘Zabeli’). Applications of this observational dataset range from detecting regional weather patterns to understanding the impacts of global climate change on the onset of spring at the national scale. While minor changes in methods have occurred over time, and some documentation is lacking, outlier analyses identified fewer than 3% of records as unusually early or late. Lilac and honeysuckle phenology data have proven robust in both model development and climatic research. PMID:26306204

  8. Lilac and honeysuckle phenology data 1956–2014

    USGS Publications Warehouse

    Rosemartin, Alyssa H.; Denny, Ellen G.; Weltzin, Jake F.; Marsh, R. Lee; Wilson, Bruce E.; Mehdipoor, Hamed; Zurita-Milla, Raul; Schwartz, Mark D.

    2015-01-01

    The dataset is comprised of leafing and flowering data collected across the continental United States from 1956 to 2014 for purple common lilac (Syringa vulgaris), a cloned lilac cultivar (S. x chinensis ‘Red Rothomagensis’) and two cloned honeysuckle cultivars (Lonicera tatarica ‘Arnold Red’ and L. korolkowii ‘Zabeli’). Applications of this observational dataset range from detecting regional weather patterns to understanding the impacts of global climate change on the onset of spring at the national scale. While minor changes in methods have occurred over time, and some documentation is lacking, outlier analyses identified fewer than 3% of records as unusually early or late. Lilac and honeysuckle phenology data have proven robust in both model development and climatic research.

  9. A Phenological Legacy: Leafing and flowering data for lilacs and honeysuckles 1956-2014

    DOE PAGES

    Rosemartin, Alyssa; Denny, Ellen G.; Weltzin, Jake F.; ...

    2015-07-21

    The dataset is comprised of leafing and flowering data collected across the continental United States from 1956 to 2014 for purple common lilac (Syringa vulgaris), a cloned lilac cultivar (S. x chinensis 'Red Rothomagensis') and two cloned honeysuckle cultivars (Lonicera tatarica 'Arnold Red' and L. korolkowii 'Zabeli'). Applications of this observational dataset range from detecting regional weather patterns to understanding the impacts of global climate change on the onset of spring at the national scale. While minor changes in methods have occurred over time, and some documentation is lacking, outlier analyses identified fewer than 3% of records as unusually early or late. Lilac and honeysuckle phenology data have proven robust in both model development and climatic research.

  10. Tailor-made Surgical Guide Reduces Incidence of Outliers of Cup Placement.

    PubMed

    Hananouchi, Takehito; Saito, Masanobu; Koyama, Tsuyoshi; Sugano, Nobuhiko; Yoshikawa, Hideki

    2010-04-01

    Malalignment of the cup in total hip arthroplasty (THA) increases the risks of postoperative complications such as neck cup impingement, dislocation, and wear. We asked whether a tailor-made surgical guide based on CT images would reduce the incidence of outliers beyond 10 degrees from preoperatively planned alignment of the cup compared with those without the surgical guide. We prospectively followed 38 patients (38 hips, Group 1) having primary THA with the conventional technique and 31 patients (31 hips, Group 2) using the surgical guide. We designed the guide for Group 2 based on CT images and fixed it to the acetabular edge with a Kirschner wire to indicate the planned cup direction. Postoperative CT images showed the guide reduced the number of outliers compared with the conventional method (Group 1, 23.7%; Group 2, 0%). The surgical guide provided more reliable cup insertion compared with conventional techniques. Level II, therapeutic study. See the Guidelines for Authors for a complete description of levels of evidence.

  11. Enhancement of partial robust M-regression (PRM) performance using Bisquare weight function

    NASA Astrophysics Data System (ADS)

    Mohamad, Mazni; Ramli, Norazan Mohamed; Ghani@Mamat, Nor Azura Md; Ahmad, Sanizah

    2014-09-01

    Partial Least Squares (PLS) regression is a popular regression technique for handling multicollinearity in low- and high-dimensional data, fitting a linear relationship between sets of explanatory and response variables. Several robust PLS methods have been proposed to remedy the classical PLS algorithms, which are easily affected by the presence of outliers. The most recent of these is partial robust M-regression (PRM). Unfortunately, the monotone weighting function used in the PRM algorithm fails to assign appropriately small weights to large outliers according to their severity. Thus, in this paper, a modified partial robust M-regression is introduced to enhance the performance of the original PRM. A re-descending weight function, known as the bisquare weight function, is recommended to replace the fair function in the PRM. A simulation study is conducted to assess the performance of the modified PRM, and its efficiency is also tested on both contaminated and uncontaminated simulated data under various percentages of outliers, sample sizes and numbers of predictors.
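
    For concreteness, the re-descending bisquare (Tukey) weight recommended above, next to the monotone fair weight it replaces; c = 4.685 is the usual 95%-efficiency tuning constant for Gaussian errors, and the fair-function constant is illustrative:

        import numpy as np

        def bisquare_weight(r, c=4.685):
            # Re-descends to exactly zero: gross outliers get zero weight.
            u = np.abs(np.asarray(r, dtype=float)) / c
            w = (1 - u**2) ** 2
            w[u >= 1] = 0.0
            return w

        def fair_weight(r, c=1.4):
            # Monotone: even the largest outlier keeps a small positive weight.
            return 1.0 / (1.0 + np.abs(np.asarray(r, dtype=float)) / c)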

  12. Kfits: a software framework for fitting and cleaning outliers in kinetic measurements.

    PubMed

    Rimon, Oded; Reichmann, Dana

    2018-01-01

    Kinetic measurements have played an important role in elucidating biochemical and biophysical phenomena for over a century. While many tools for analysing kinetic measurements exist, most require low noise levels in the data, leaving outlier measurements to be cleaned manually. This is particularly true for protein misfolding and aggregation processes, which are extremely noisy and hence difficult to model. Understanding these processes is paramount, as they are associated with diverse physiological processes and disorders, most notably neurodegenerative diseases. Therefore, a better tool for analysing and cleaning protein aggregation traces is required. Here we introduce Kfits, an intuitive graphical tool for detecting and removing noise caused by outliers in protein aggregation kinetics data. Following its workflow allows the user to quickly and easily clean large quantities of data and receive kinetic parameters for assessment of the results. With minor adjustments, the software can be applied to any type of kinetic measurements, not restricted to protein aggregation. Kfits is implemented in Python and available online at http://kfits.reichmannlab.com, in source at https://github.com/odedrim/kfits/, or by direct installation from PyPI (`pip install kfits`). Contact: danare@mail.huji.ac.il. Supplementary data are available at Bioinformatics online.

  13. Simulated performance of an order statistic threshold strategy for detection of narrowband signals

    NASA Technical Reports Server (NTRS)

    Satorius, E.; Brady, R.; Deich, W.; Gulkis, S.; Olsen, E.

    1988-01-01

    The application of order statistics to signal detection is becoming an increasingly active area of research. This is due to the inherent robustness of rank estimators in the presence of large outliers that would significantly degrade more conventional mean-level-based detection systems. A detection strategy is presented in which the threshold estimate is obtained using order statistics. The performance of this algorithm in the presence of simulated interference and broadband noise is evaluated. In this way, the robustness of the proposed strategy in the presence of the interference can be fully assessed as a function of the interference, noise, and detector parameters.
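
    An order-statistic threshold can be sketched as follows: estimate each window's noise floor from a low-order statistic, which large narrowband outliers cannot inflate, then flag bins exceeding the floor by a margin; the rank and scale factor are illustrative:

        import numpy as np

        def os_detect(spectrum, window=64, rank=0.25, factor=5.0):
            # spectrum: nonnegative power values per frequency bin.
            hits = []
            for start in range(0, len(spectrum) - window + 1, window):
                seg = spectrum[start:start + window]
                # A low-order statistic (e.g. the 25th-percentile bin) is
                # robust to strong narrowband signals in the same window.
                floor = np.sort(seg)[int(rank * window)]
                hits.extend(start + np.where(seg > factor * floor)[0])
            return np.array(hits)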

  14. MIDAS robust trend estimator for accurate GPS station velocities without step detection

    PubMed Central

    Kreemer, Corné; Hammond, William C.; Gazeaux, Julien

    2016-01-01

    Automatic estimation of velocities from GPS coordinate time series is becoming required to cope with the exponentially increasing flood of available data, but problems detectable to the human eye are often overlooked. This motivates us to find an automatic and accurate estimator of trend that is resistant to common problems such as step discontinuities, outliers, seasonality, skewness, and heteroscedasticity. Developed here, Median Interannual Difference Adjusted for Skewness (MIDAS) is a variant of the Theil‐Sen median trend estimator, for which the ordinary version is the median of slopes vij = (xj–xi)/(tj–ti) computed between all data pairs i > j. For normally distributed data, Theil‐Sen and least squares trend estimates are statistically identical, but unlike least squares, Theil‐Sen is resistant to undetected data problems. To mitigate both seasonality and step discontinuities, MIDAS selects data pairs separated by 1 year. This condition is relaxed for time series with gaps so that all data are used. Slopes from data pairs spanning a step function produce one‐sided outliers that can bias the median. To reduce bias, MIDAS removes outliers and recomputes the median. MIDAS also computes a robust and realistic estimate of trend uncertainty. Statistical tests using GPS data in the rigid North American plate interior show ±0.23 mm/yr root‐mean‐square (RMS) accuracy in horizontal velocity. In blind tests using synthetic data, MIDAS velocities have an RMS accuracy of ±0.33 mm/yr horizontal, ±1.1 mm/yr up, with a 5th percentile range smaller than all 20 automatic estimators tested. Considering its general nature, MIDAS has the potential for broader application in the geosciences. PMID:27668140
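
    A condensed sketch of the recipe as described (one-year pairs, median slope, two-sigma trim via the scaled MAD, re-median); the pair matching and the uncertainty estimate are simplified relative to the published algorithm:

        import numpy as np

        def midas_rate(t, x, tol=0.01):
            # t in years, x in mm; pair each epoch with the epoch closest to
            # one year later, which cancels seasonality.
            slopes = []
            for i, ti in enumerate(t):
                j = np.argmin(np.abs(t - (ti + 1.0)))
                if abs(t[j] - ti - 1.0) < tol:
                    slopes.append((x[j] - x[i]) / (t[j] - t[i]))
            v = np.asarray(slopes)
            med = np.median(v)
            sigma = 1.4826 * np.median(np.abs(v - med))  # scaled MAD
            kept = v[np.abs(v - med) < 2 * sigma]        # trim one-sided outliers
            return np.median(kept)                       # mm/yr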

  15. MIDAS robust trend estimator for accurate GPS station velocities without step detection.

    PubMed

    Blewitt, Geoffrey; Kreemer, Corné; Hammond, William C; Gazeaux, Julien

    2016-03-01

    Automatic estimation of velocities from GPS coordinate time series is becoming required to cope with the exponentially increasing flood of available data, but problems detectable to the human eye are often overlooked. This motivates us to find an automatic and accurate estimator of trend that is resistant to common problems such as step discontinuities, outliers, seasonality, skewness, and heteroscedasticity. Developed here, Median Interannual Difference Adjusted for Skewness (MIDAS) is a variant of the Theil-Sen median trend estimator, for which the ordinary version is the median of slopes vij = (xj-xi)/(tj-ti) computed between all data pairs i > j. For normally distributed data, Theil-Sen and least squares trend estimates are statistically identical, but unlike least squares, Theil-Sen is resistant to undetected data problems. To mitigate both seasonality and step discontinuities, MIDAS selects data pairs separated by 1 year. This condition is relaxed for time series with gaps so that all data are used. Slopes from data pairs spanning a step function produce one-sided outliers that can bias the median. To reduce bias, MIDAS removes outliers and recomputes the median. MIDAS also computes a robust and realistic estimate of trend uncertainty. Statistical tests using GPS data in the rigid North American plate interior show ±0.23 mm/yr root-mean-square (RMS) accuracy in horizontal velocity. In blind tests using synthetic data, MIDAS velocities have an RMS accuracy of ±0.33 mm/yr horizontal, ±1.1 mm/yr up, with a 5th percentile range smaller than all 20 automatic estimators tested. Considering its general nature, MIDAS has the potential for broader application in the geosciences.

  16. MIDAS robust trend estimator for accurate GPS station velocities without step detection

    NASA Astrophysics Data System (ADS)

    Blewitt, Geoffrey; Kreemer, Corné; Hammond, William C.; Gazeaux, Julien

    2016-03-01

    Automatic estimation of velocities from GPS coordinate time series is becoming required to cope with the exponentially increasing flood of available data, but problems detectable to the human eye are often overlooked. This motivates us to find an automatic and accurate estimator of trend that is resistant to common problems such as step discontinuities, outliers, seasonality, skewness, and heteroscedasticity. Developed here, Median Interannual Difference Adjusted for Skewness (MIDAS) is a variant of the Theil-Sen median trend estimator, for which the ordinary version is the median of slopes vij = (xj-xi)/(tj-ti) computed between all data pairs i > j. For normally distributed data, Theil-Sen and least squares trend estimates are statistically identical, but unlike least squares, Theil-Sen is resistant to undetected data problems. To mitigate both seasonality and step discontinuities, MIDAS selects data pairs separated by 1 year. This condition is relaxed for time series with gaps so that all data are used. Slopes from data pairs spanning a step function produce one-sided outliers that can bias the median. To reduce bias, MIDAS removes outliers and recomputes the median. MIDAS also computes a robust and realistic estimate of trend uncertainty. Statistical tests using GPS data in the rigid North American plate interior show ±0.23 mm/yr root-mean-square (RMS) accuracy in horizontal velocity. In blind tests using synthetic data, MIDAS velocities have an RMS accuracy of ±0.33 mm/yr horizontal, ±1.1 mm/yr up, with a 5th percentile range smaller than all 20 automatic estimators tested. Considering its general nature, MIDAS has the potential for broader application in the geosciences.

  17. [Meta analysis of three-dimensional printing patient-specific instrumentation versus conventional instrumentation in total knee arthroplasty].

    PubMed

    Ren, J T; Xu, C; Wang, J S; Liu, X L

    2017-10-01

    Objective: To evaluate the effects of three-dimensional printing patient-specific instrumentation (PSI) versus conventional instrumentation (CI) in total knee arthroplasty. Methods: Using the terms "patient-specific", "patient-matched", "custom", "Instrumentation", "Guide Instrumentation", "cutting blocks", "total knee arthroplasty", "total knee replacement", "TKA" and "TKR", the literature on PubMed, EMbase, Cochrane Library, CBM and WanFang was searched. According to the inclusion and exclusion criteria, high-quality randomized controlled trial (RCT) studies of three-dimensional (3D) printing patient-specific instrumentation versus conventional instrumentation in total knee arthroplasty were collected. The post-operative limb mechanical axis outliers, the position of the component outliers, post-operative knee function, operative time, post-operative blood transfusion and complications were analyzed with RevMan 5.3 software. Results: A total of 13 high-quality RCT studies were included. The meta-analysis shows that there were no statistical differences in the post-operative limb mechanical axis outliers (Z = 0.55, P = 0.58, 95% CI: 0.78 to 1.56), femoral coronal component outliers (Z = 0.38, P = 0.71, 95% CI: 0.69 to 1.72), tibial coronal component outliers (Z = 1.95, P = 0.05, 95% CI: 1.00 to 3.38), femoral rotation angle outliers (Z = 0.36, P = 0.72, 95% CI: 0.49 to 1.64), post-operative knee function (Z = 1.18, P = 0.24, 95% CI: -0.66 to 2.63), post-operative blood transfusions (Z = 0.74, P = 0.46, 95% CI: -0.10 to 0.05) or complications (Z = 0.18, P = 0.86, 95% CI: -0.07 to 0.05) between the PSI group and the CI group. There were, however, statistical differences in operative time (Z = 2.66, P = 0.01, 95% CI: -15.97 to -2.41) and tibial sagittal component outliers (Z = 3.69, P = 0.00, 95% CI: 1.43 to 3.18) between the PSI group and the CI group. Conclusions: In primary total knee arthroplasty, PSI is not superior to CI for knees without severe varus or valgus deformity or contracture deformity, without deformity around the knee, and without knee bone loss or obesity. The use of PSI in primary total knee arthroplasty is therefore not recommended.

  18. The variance of length of stay and the optimal DRG outlier payments.

    PubMed

    Felder, Stefan

    2009-09-01

    Prospective payment schemes in health care often include supply-side insurance for cost outliers. In hospital reimbursement, prospective payments for patient discharges, based on their classification into diagnosis related group (DRGs), are complemented by outlier payments for long stay patients. The outlier scheme fixes the length of stay (LOS) threshold, constraining the profit risk of the hospitals. In most DRG systems, this threshold increases with the standard deviation of the LOS distribution. The present paper addresses the adequacy of this DRG outlier threshold rule for risk-averse hospitals with preferences depending on the expected value and the variance of profits. It first shows that the optimal threshold solves the hospital's tradeoff between higher profit risk and lower premium loading payments. It then demonstrates for normally distributed truncated LOS that the optimal outlier threshold indeed decreases with an increase in the standard deviation.

  19. Predictors of High Profit and High Deficit Outliers under SwissDRG of a Tertiary Care Center

    PubMed Central

    Mehra, Tarun; Müller, Christian Thomas Benedikt; Volbracht, Jörk; Seifert, Burkhardt; Moos, Rudolf

    2015-01-01

    Principles: Case weights of Diagnosis Related Groups (DRGs) are determined by the average cost of cases from a previous billing period. However, a significant number of cases are largely over- or underfunded. We therefore analyzed the earnings outliers of our hospital to search for predictors enabling better grouping under SwissDRG. Methods: 28,893 inpatient cases without additional private insurance discharged from our hospital in 2012 were included in our analysis. Outliers were defined by the interquartile range method. Predictors for deficit and profit outliers were determined with logistic regressions. Predictors were shortlisted with the LASSO regularized logistic regression method and compared with the results of random forest analysis. Ten of these parameters were selected for quantile regression analysis to quantify their impact on earnings. Results: Psychiatric diagnosis and admission as an emergency case were significant predictors for a higher deficit, with negative regression coefficients for all analyzed quantiles (p<0.001). Admission from an external health care provider was a significant predictor for a higher deficit in all but the 90% quantile (p<0.001 for Q10, Q20, Q50, Q80 and p = 0.0017 for Q90). Burns predicted higher earnings for cases which were favorably remunerated (p<0.001 for the 90% quantile). Osteoporosis predicted a higher deficit in the most underfunded cases, but did not predict differences in earnings for balanced or profitable cases (Q10 and Q20: p<0.001, Q50: p = 0.10, Q80: p = 0.88 and Q90: p = 0.52). ICU stay, mechanical ventilation and patient clinical complexity level score (PCCL) predicted higher losses at the 10% quantile but also higher profits at the 90% quantile (p<0.001). Conclusion: We suggest considering psychiatric diagnosis, admission as an emergency case and admission from an external health care provider as DRG split criteria, as they predict large, consistent and significant losses. PMID:26517545
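
    The interquartile-range outlier definition used here, as a sketch; k = 1.5 is the conventional fence constant, assumed rather than taken from the paper:

        import numpy as np

        def earning_outliers(earnings, k=1.5):
            # Returns boolean masks for deficit and profit outliers.
            e = np.asarray(earnings, dtype=float)
            q1, q3 = np.percentile(e, [25, 75])
            iqr = q3 - q1
            deficit = e < q1 - k * iqr   # heavily underfunded cases
            profit = e > q3 + k * iqr    # heavily overfunded cases
            return deficit, profit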

  20. Linear discriminant analysis based on L1-norm maximization.

    PubMed

    Zhong, Fujin; Zhang, Jiashu

    2013-08-01

    Linear discriminant analysis (LDA) is a well-known dimensionality reduction technique, which is widely used for many purposes. However, conventional LDA is sensitive to outliers because its objective function is based on the distance criterion using L2-norm. This paper proposes a simple but effective robust LDA version based on L1-norm maximization, which learns a set of locally optimal projection vectors by maximizing the ratio of the L1-norm-based between-class dispersion and the L1-norm-based within-class dispersion. The proposed method is theoretically proved to be feasible and robust to outliers while overcoming the singularity problem of the within-class scatter matrix in conventional LDA. Experiments on artificial datasets, standard classification datasets and three popular image databases demonstrate the efficacy of the proposed method.
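
    The objective can be sketched directly: the ratio of the L1 between-class dispersion to the L1 within-class dispersion for a projection w, here handed to a generic optimizer rather than the paper's iterative algorithm:

        import numpy as np
        from scipy.optimize import minimize

        def l1_lda_direction(X, y):
            classes, counts = np.unique(y, return_counts=True)
            mu = X.mean(axis=0)
            mus = np.array([X[y == c].mean(axis=0) for c in classes])
            # Within-class deviations, stacked across classes.
            Xw = np.concatenate([X[y == c] - m for c, m in zip(classes, mus)])

            def neg_ratio(w):
                w = w / (np.linalg.norm(w) + 1e-12)
                between = np.sum(counts * np.abs((mus - mu) @ w))
                within = np.sum(np.abs(Xw @ w)) + 1e-12
                return -between / within

            res = minimize(neg_ratio, mus[0] - mu, method="Nelder-Mead")
            return res.x / np.linalg.norm(res.x)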

  1. Abundant Topological Outliers in Social Media Data and Their Effect on Spatial Analysis

    PubMed Central

    Zipf, Alexander

    2016-01-01

    Twitter and related social media feeds have become valuable data sources to many fields of research. Numerous researchers have thereby used social media posts for spatial analysis, since many of them contain explicit geographic locations. However, despite their widespread use within applied research, a thorough understanding of the underlying spatial characteristics of these data is still lacking. In this paper, we investigate how topological outliers influence the outcomes of spatial analyses of social media data. These outliers appear when different users contribute heterogeneous information about different phenomena simultaneously from similar locations. As a consequence, various messages representing different spatial phenomena are captured close to each other and are at risk of being falsely related in a spatial analysis. Our results reveal indications for corresponding spurious effects when analyzing Twitter data. Further, we show how the outliers distort the range of outcomes of spatial analysis methods. This has significant influence on the power of spatial inferential techniques, and, more generally, on the validity and interpretability of spatial analysis results. We further investigate how the issues caused by topological outliers are composed in detail. We unveil that multiple disturbing effects are acting simultaneously and that these are related to the geographic scales of the involved overlapping patterns. Our results show that at some scale configurations, the disturbances added through overlap are more severe than at others. Further, their behavior turns into a volatile and almost chaotic fluctuation when the scales of the involved patterns become too different. Overall, our results highlight the critical importance of thoroughly considering the specific characteristics of social media data when analyzing them spatially. PMID:27611199

  2. Environmental versus geographical determinants of genetic structure in two subalpine conifers.

    PubMed

    Mosca, Elena; González-Martínez, Santiago C; Neale, David B

    2014-01-01

    Alpine ecosystems are facing rapid human-induced environmental changes, and so more knowledge about tree adaptive potential is needed. This study investigated the relative role of isolation by distance (IBD) versus isolation by adaptation (IBA) in explaining population genetic structure in Abies alba and Larix decidua, based on 231 and 233 single nucleotide polymorphisms (SNPs) sampled across 36 and 22 natural populations, respectively, in the Alps and Apennines. Genetic structure was investigated for both geographical and environmental groups, using analysis of molecular variance (AMOVA). For each species, nine environmental groups were defined using climate variables selected from a multiple factor analysis. Complementary methods were applied to identify outliers based on these groups, and to test for IBD versus IBA. AMOVA showed weak but significant genetic structure for both species, with higher values in L. decidua. Among the potential outliers detected, up to two loci were found for geographical groups and up to seven for environmental groups. A stronger effect of IBD than IBA was found in both species; nevertheless, once spatial effects had been removed, temperature and soil in A. alba, and precipitation in both species, were relevant factors explaining genetic structure. Based on our findings, in the Alpine region, genetic structure seems to be affected by both geographical isolation and environmental gradients, creating opportunities for local adaptation. © 2013 The Authors. New Phytologist © 2013 New Phytologist Trust.

  3. Detection of anomalous signals in temporally correlated data (Invited)

    NASA Astrophysics Data System (ADS)

    Langbein, J. O.

    2010-12-01

    Detection of transient tectonic signals in data obtained from large geodetic networks requires the ability to detect signals that are both temporally and spatially coherent. In this report I will describe a modification to an existing method that estimates both the coefficients of a temporally correlated noise model and an efficient filter based on that noise model. This filter, when applied to the original time-series, effectively whitens (or flattens) the power spectrum. The filtered data provide the means to calculate running averages, which are then used to detect deviations from the background trends. For large networks, time-series of signal-to-noise ratio (SNR) can be easily constructed since, through filtering, each of the original time-series has been transformed into one that is closer to having a Gaussian distribution with a variance of 1.0. Anomalous intervals may be identified by counting the number of GPS sites for which the SNR exceeds a specified value. For example, during one time interval, if there were 5 out of 20 time-series with SNR>2, this would be considered anomalous; typically, one would expect at 95% confidence at most 1 out of 20 time-series with an SNR>2. For time intervals with an anomalously large number of high SNR values, the spatial distribution of the SNR is mapped to identify the location of the anomalous signal(s) and their degree of spatial clustering. Estimating the filter that should be used to whiten the data requires modification of the existing methods that employ maximum likelihood estimation to determine the temporal covariance of the data. In these methods, it is assumed that the noise components in the data are a combination of white, flicker, and random-walk processes and that they are derived from three different and independent sources. Instead, in this new method, the covariance matrix is constructed assuming that only one source is responsible for the noise and that this source can be represented as a white-noise random-number generator convolved with a filter whose spectral properties are frequency (f) independent at the highest frequencies, 1/f at the middle frequencies, and 1/f^2 at the lowest frequencies. For data sets with no gaps in their time-series, construction of the covariance and inverse covariance matrices is extremely efficient. Application of the above algorithm to real data potentially involves several iterations, as the small tectonic signals of interest are often indistinguishable from background noise. Consequently, simple plotting of the time-series of each GPS site is used to identify the largest outliers and signals independent of their cause. Any analysis of the background noise levels must factor in these other signals, while the gross outliers need to be removed.
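
    The exceedance-counting step can be sketched compactly. The fragment below assumes an (n_stations, n_epochs) array of already-whitened residuals with unit variance; the window length and the exceedance budget are illustrative choices, not values from the report.

        import numpy as np

        def anomalous_intervals(whitened, window=30, snr_thresh=2.0, max_expected=1):
            # whitened: (n_stations, n_epochs) filtered residuals, ~N(0, 1).
            # A running mean over `window` epochs has std 1/sqrt(window), so
            # multiplying by sqrt(window) turns it into an SNR time series.
            n_sta, n_ep = whitened.shape
            kernel = np.ones(window) / window
            snr = np.empty((n_sta, n_ep - window + 1))
            for i in range(n_sta):
                snr[i] = np.convolve(whitened[i], kernel, mode="valid") * np.sqrt(window)
            exceed = np.abs(snr) > snr_thresh      # per-station exceedances
            counts = exceed.sum(axis=0)            # stations exceeding, per epoch
            return np.where(counts > max_expected)[0], counts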

  4. Investigating outliers to improve conceptual models of bedrock aquifers

    NASA Astrophysics Data System (ADS)

    Worthington, Stephen R. H.

    2018-06-01

    Numerical models play a prominent role in hydrogeology, with simplifying assumptions being inevitable when implementing these models. However, there is a risk of oversimplification, where important processes become neglected. Such processes may be associated with outliers, and consideration of outliers can lead to an improved scientific understanding of bedrock aquifers. Using rigorous logic to investigate outliers can help to explain fundamental scientific questions such as why there are large variations in permeability between different bedrock lithologies.

  5. Outlier removal, sum scores, and the inflation of the Type I error rate in independent samples t tests: the power of alternatives and recommendations.

    PubMed

    Bakker, Marjan; Wicherts, Jelte M

    2014-09-01

    In psychology, outliers are often excluded before running an independent samples t test, and data are often nonnormal because of the use of sum scores based on tests and questionnaires. This article concerns the handling of outliers in the context of independent samples t tests applied to nonnormal sum scores. After reviewing common practice, we present results of simulations of artificial and actual psychological data, which show that the removal of outliers based on commonly used Z value thresholds severely increases the Type I error rate. We found Type I error rates of above 20% after removing outliers with a threshold value of Z = 2 in a short and difficult test. Inflations of Type I error rates are particularly severe when researchers are given the freedom to alter threshold values of Z after having seen the effects thereof on outcomes. We recommend the use of nonparametric Mann-Whitney-Wilcoxon tests or robust Yuen-Welch tests without removing outliers. These alternatives to independent samples t tests are found to have nominal Type I error rates with a minimal loss of power when no outliers are present in the data and to have nominal Type I error rates and good power when outliers are present. PsycINFO Database Record (c) 2014 APA, all rights reserved.
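
    The inflation is easy to reproduce numerically. The following self-contained simulation (an exponential distribution stands in for a skewed sum score; the article's simulations were richer) removes |Z| > 2 observations per group before testing two samples drawn from the same distribution, so every rejection is a Type I error.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(1)

        def type1_rate_after_outlier_removal(n=30, z_thresh=2.0, n_sim=10_000):
            rejections = 0
            for _ in range(n_sim):
                a = rng.exponential(size=n)   # both groups from the same
                b = rng.exponential(size=n)   # skewed null distribution
                a = a[np.abs(stats.zscore(a)) <= z_thresh]   # outlier removal
                b = b[np.abs(stats.zscore(b)) <= z_thresh]
                if stats.ttest_ind(a, b).pvalue < 0.05:
                    rejections += 1
            return rejections / n_sim

        print(type1_rate_after_outlier_removal())  # typically well above .05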

  6. A group LASSO-based method for robustly inferring gene regulatory networks from multiple time-course datasets.

    PubMed

    Liu, Li-Zhi; Wu, Fang-Xiang; Zhang, Wen-Jun

    2014-01-01

    As an abstract mapping of the gene regulations in the cell, the gene regulatory network is important to both biological research and practical applications. The reverse engineering of gene regulatory networks from microarray gene expression data is a challenging research problem in systems biology. With the development of biological technologies, multiple time-course gene expression datasets may be collected for a specific gene network under different circumstances. The inference of a gene regulatory network can be improved by integrating these multiple datasets. It is also known that gene expression data may be contaminated with large errors or outliers, which may affect the inference results. A novel method, Huber group LASSO, is proposed to infer the same underlying network topology from multiple time-course gene expression datasets while taking robustness to large errors or outliers into account. To solve the optimization problem involved in the proposed method, an efficient algorithm combining the ideas of auxiliary function minimization and block descent is developed. A stability selection method is adapted to our method to find a network topology consisting of edges with scores. The proposed method is applied to both simulated datasets and real experimental datasets. It shows that the Huber group LASSO outperforms the group LASSO in terms of both the areas under receiver operating characteristic curves and the areas under precision-recall curves. The convergence analysis of the algorithm theoretically shows that the sequence generated by the algorithm converges to the optimal solution of the problem. The simulation and real data examples demonstrate the effectiveness of the Huber group LASSO in integrating multiple time-course gene expression datasets and improving the resistance to large errors or outliers.
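
    To make the shape of the method concrete, the fragment below evaluates a Huber-loss fit term summed over datasets plus a group penalty that ties each candidate regulator's coefficients together across datasets. It is a minimal sketch of the objective only, with illustrative delta and lambda values; it is not the authors' auxiliary-function/block-descent solver.

        import numpy as np

        def huber(r, delta=1.0):
            # Quadratic for |r| <= delta, linear beyond: robust to outliers.
            a = np.abs(r)
            return np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta))

        def huber_group_lasso_objective(Xs, ys, W, lam=0.1, delta=1.0):
            # Xs, ys: per-dataset regressor matrices and target-gene responses.
            # W: (n_datasets, n_regulators) coefficients; the group norm
            # ||W[:, j]|| selects regulator j jointly in all datasets.
            fit = sum(huber(y - X @ w, delta).sum()
                      for X, y, w in zip(Xs, ys, W))
            penalty = lam * np.linalg.norm(W, axis=0).sum()
            return fit + penalty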

  7. Detecting Solar-like Oscillations in Red Giants with Deep Learning

    NASA Astrophysics Data System (ADS)

    Hon, Marc; Stello, Dennis; Zinn, Joel C.

    2018-05-01

    Time-resolved photometry of tens of thousands of red giant stars from space missions like Kepler and K2 has created the need for automated asteroseismic analysis methods. The first and most fundamental step in such analysis is to identify which stars show oscillations. It is critical that this step be performed with no, or little, detection bias, particularly when performing subsequent ensemble analyses that aim to compare the properties of observed stellar populations with those from galactic models. However, an efficient, automated solution to this initial detection step still has not been found, meaning that expert visual inspection of data from each star is required to obtain the highest level of detections. Hence, to mimic how an expert eye analyzes the data, we use supervised deep learning to not only detect oscillations in red giants, but also to predict the location of the frequency at maximum power, νmax, by observing features in 2D images of power spectra. By training on Kepler data, we benchmark our deep-learning classifier against K2 data that are given detections by the expert eye, achieving a detection accuracy of 98% on K2 Campaign 6 stars and a detection accuracy of 99% on K2 Campaign 3 stars. We further find that the estimated uncertainty of our deep-learning-based νmax predictions is about 5%. This is comparable to human-level performance using visual inspection. When examining outliers, we find that the deep-learning results are more likely to provide robust νmax estimates than the classical model-fitting method.

  8. Improving the space surveillance telescope's performance using multi-hypothesis testing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zingarelli, J. Chris; Cain, Stephen; Pearce, Eric

    2014-05-01

    The Space Surveillance Telescope (SST) is a Defense Advanced Research Projects Agency program designed to detect objects in space, like near-Earth asteroids and space debris in the geosynchronous Earth orbit (GEO) belt. Binary hypothesis test (BHT) methods have historically been used to facilitate the detection of new objects in space. In this paper a multi-hypothesis detection strategy is introduced to improve the detection performance of SST. In this context, the multi-hypothesis testing (MHT) determines whether an unresolvable point source is in the center, a corner, or a side of a pixel, in contrast to BHT, which only tests whether an object is in the pixel or not. The images recorded by SST are undersampled so as to cause aliasing, which degrades the performance of traditional detection schemes. The equations for the MHT are derived in terms of signal-to-noise ratio (S/N), which is computed by subtracting the background light level around the pixel being tested and dividing by the standard deviation of the noise. A new method for determining the local noise statistics that rejects outliers is introduced in combination with the MHT. An experiment using observations of a known GEO satellite is used to demonstrate the improved detection performance of the new algorithm over algorithms previously reported in the literature. The results show a significant improvement in the probability of detection, by as much as 50%, over existing algorithms. In addition to detection, the S/N results prove to be linearly related to the least-squares estimates of point source irradiance, thus improving photometric accuracy.
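
    As a rough illustration of the S/N computation, the sketch below uses a median background and a MAD-based noise scale as one possible outlier-rejecting choice of local statistics; the paper's exact estimator is not reproduced here.

        import numpy as np

        def local_snr(image, y, x, half=5):
            # Background and noise from the window around (y, x); the median/MAD
            # pair is insensitive to stars or debris falling inside the window.
            win = image[max(0, y - half):y + half + 1,
                        max(0, x - half):x + half + 1].astype(float)
            bg = np.median(win)
            sigma = 1.4826 * np.median(np.abs(win - bg))  # MAD -> std (Gaussian)
            if sigma == 0:
                sigma = win.std() or 1.0                  # fallback for flat windows
            return (image[y, x] - bg) / sigma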

  9. Multi-temporal thermal analyses for submarine groundwater discharge (SGD) detection over large spatial scales in the Mediterranean

    NASA Astrophysics Data System (ADS)

    Hennig, Hanna; Mallast, Ulf; Merz, Ralf

    2015-04-01

    Submarine groundwater discharge (SGD) sites act as important pathways for nutrients and contaminants that deteriorate marine ecosystems. In the Mediterranean, it is estimated that 75% of the freshwater input is contributed from karst aquifers. Thermal remote sensing can be used for pre-screening potential SGD sites in order to optimize field surveys. Although different platforms (ground-, air- and spaceborne) may serve for thermal remote sensing, the most cost-effective are spaceborne platforms (satellites), which likewise cover the largest spatial scale (>100 km per image). Therefore an automated and objective approach using thermal satellite images from Landsat 7 and Landsat 8 was applied to localize potential SGD sites on a large spatial scale. The method of Mallast et al. (2014), which uses descriptive statistical parameters, specifically the range and standard deviation, was adapted to the Mediterranean Sea. Since that method was developed for the Dead Sea, where satellite images with cloud cover are rare and no sea level change occurs through tidal cycles, it was essential to adapt it to a region where tidal cycles occur and cloud cover is more frequent. These adaptations include: (1) an automatic and adaptive coastline detection; (2) the inclusion and processing of cloud-covered scenes to enlarge the data basis; (3) the implementation of tidal data in order to analyze low-tide images, as SGD is enhanced during these phases; and (4) a test of the applicability of Landsat 8 images, which will provide data in the future once Landsat 7 stops working. As previously shown, the range method gives more accurate results than the standard deviation. However, its result depends exclusively on two scenes (minimum and maximum) and is largely influenced by outliers. To counteract this drawback we developed a new approach. Since sea surface temperature (SST) is assumed to be stabilized by groundwater at SGD sites, the slope of a bootstrapped linear model fitted to the sorted SST per pixel should be less steep than the slope of the surrounding area, resulting in less influence of outliers and an equal weighting of all integrated scenes. Both methods could be used to detect SGD sites in the Mediterranean regardless of the discharge characteristics (diffuse or focused); exceptions are sites with deep emergences. Better results were obtained in bays than at more exposed sites. Since the range of the SST is mostly influenced by the maximum and minimum scenes, the slope approach can be seen as the more representative method, using all scenes. References: Mallast, U., Gloaguen, R., Friesen, J., Rödiger, T., Geyer, S., Merz, R., Siebert, C., 2014. How to identify groundwater-caused thermal anomalies in lakes based on multi-temporal satellite data in semi-arid regions. Hydrol. Earth Syst. Sci. 18 (7), 2773-2787.
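
    A rough Python rendering of the per-pixel slope idea is given below; the stack layout, bootstrap fraction, and iteration count are illustrative assumptions rather than values from the study.

        import numpy as np

        def sst_slope_map(sst_stack, n_boot=100, frac=0.8, seed=0):
            # sst_stack: (n_scenes, ny, nx) sea-surface temperatures. For each
            # pixel, scenes are subsampled, sorted, and a line is fitted; pixels
            # stabilized by groundwater should show a flatter slope than their
            # surroundings, with all scenes weighted equally.
            rng = np.random.default_rng(seed)
            n = sst_stack.shape[0]
            k = max(2, int(frac * n))
            t = np.arange(k)
            denom = ((t - t.mean()) ** 2).sum()
            slopes = np.zeros(sst_stack.shape[1:])
            for _ in range(n_boot):
                sub = np.sort(sst_stack[rng.choice(n, size=k, replace=False)], axis=0)
                slopes += ((t[:, None, None] - t.mean())
                           * (sub - sub.mean(axis=0))).sum(axis=0) / denom
            return slopes / n_boot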

  10. Privacy-preserving outlier detection through random nonlinear data distortion.

    PubMed

    Bhaduri, Kanishka; Stefanski, Mark D; Srivastava, Ashok N

    2011-02-01

    Consider a scenario in which the data owner has some private or sensitive data and wants a data miner to access them for studying important patterns without revealing the sensitive information. Privacy-preserving data mining aims to solve this problem by randomly transforming the data prior to their release to the data miners. Previous works only considered the case of linear data perturbations (additive, multiplicative, or a combination of both) for studying the usefulness of the perturbed output. In this paper, we discuss nonlinear data distortion using potentially nonlinear random data transformation and show how it can be useful for privacy-preserving anomaly detection from sensitive data sets. We develop bounds on the expected accuracy of the nonlinear distortion and also quantify privacy by using standard definitions. The highlight of this approach is to allow a user to control the amount of privacy by varying the degree of nonlinearity. We show how our general transformation can be used for anomaly detection in practice for two specific problem instances: a linear model and a popular nonlinear model using the sigmoid function. We also analyze the proposed nonlinear transformation in full generality and then show that, for specific cases, it is distance preserving. A main contribution of this paper is the discussion of the relationship between the invertibility of a transformation and privacy preservation, and the application of these techniques to outlier detection. The experiments conducted on real-life data sets demonstrate the effectiveness of the approach.
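
    The fragment below illustrates the general flavor of such a transform: a secret random linear map followed by an elementwise sigmoid, whose steepness controls the degree of nonlinearity. It is a loose sketch, not the paper's exact construction or its accuracy and privacy bounds.

        import numpy as np

        def sigmoid_distort(X, scale=1.0, seed=0):
            # X: (n, d) sensitive data. R, b and scale stay with the data owner;
            # the miner runs its outlier detector on the distorted output.
            rng = np.random.default_rng(seed)
            d = X.shape[1]
            R = rng.normal(size=(d, d))        # secret random linear map
            b = rng.normal(size=d)             # secret offset
            Z = (X @ R + b) * scale            # larger scale -> more nonlinearity
            return 1.0 / (1.0 + np.exp(-Z))    # elementwise sigmoid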

  11. 42 CFR 484.240 - Methodology used for the calculation of the outlier payment.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... for each case-mix group. (b) The outlier threshold for each case-mix group is the episode payment... the same for all case-mix groups. (c) The outlier payment is a proportion of the amount of estimated...

  12. Robust estimation of partially linear models for longitudinal data with dropouts and measurement error.

    PubMed

    Qin, Guoyou; Zhang, Jiajia; Zhu, Zhongyi; Fung, Wing

    2016-12-20

    Outliers, measurement error, and missing data are commonly seen in longitudinal data because of its data collection process. However, no method can address all three of these issues simultaneously. This paper focuses on the robust estimation of partially linear models for longitudinal data with dropouts and measurement error. A new robust estimating equation, simultaneously tackling outliers, measurement error, and missingness, is proposed. The asymptotic properties of the proposed estimator are established under some regularity conditions. The proposed method is easy to implement in practice by utilizing the existing standard generalized estimating equations algorithms. The comprehensive simulation studies show the strength of the proposed method in dealing with longitudinal data with all three features. Finally, the proposed method is applied to data from the Lifestyle Education for Activity and Nutrition study and confirms the effectiveness of the intervention in producing weight loss at month 9. Copyright © 2016 John Wiley & Sons, Ltd.

  13. Weak lensing magnification in the Dark Energy Survey Science Verification Data

    DOE PAGES

    Garcia-Fernandez, M.; et al.

    2018-02-02

    In this paper the effect of weak lensing magnification on galaxy number counts is studied by cross-correlating the positions of two galaxy samples, separated by redshift, using data from the Dark Energy Survey Science Verification dataset. The analysis is carried out for two photometrically-selected galaxy samples, with mean photometric redshifts in the $0.2 < z < 0.4$ and $0.7 < z < 1.0$ ranges, in the riz bands. A signal is detected with a $3.5\sigma$ significance level in each of the bands tested, and is compatible with the magnification predicted by the $\Lambda$CDM model. After an extensive analysis, it cannot be attributed to any known systematic effect. The detection of the magnification signal is robust to estimated uncertainties in the outlier rate of the photometric redshifts, but this will be an important issue for use of photometric redshifts in magnification measurements from larger samples. In addition to the detection of the magnification signal, a method to select the sample with the maximum signal-to-noise is proposed and validated with data.

  14. Hierarchical Kohonen net for anomaly detection in network security.

    PubMed

    Sarasamma, Suseela T; Zhu, Qiuming A; Huff, Julie

    2005-04-01

    A novel multilevel hierarchical Kohonen Net (K-Map) for an intrusion detection system is presented. Each level of the hierarchical map is modeled as a simple winner-take-all K-Map. One significant advantage of this multilevel hierarchical K-Map is its computational efficiency. Unlike other statistical anomaly detection methods, such as the nearest neighbor approach, K-means clustering, or probabilistic analysis, which employ distance computation in the feature space to identify the outliers, our approach does not involve costly point-to-point computation in organizing the data into clusters. Another advantage is the reduced network size. We use the classification capability of the K-Map on selected dimensions of the data set in detecting anomalies. Randomly selected subsets that contain both attacks and normal records from the KDD Cup 1999 benchmark data are used to train the hierarchical net. We use a confidence measure to label the clusters. Then we use the test set from the same KDD Cup 1999 benchmark to test the hierarchical net. We show that a hierarchical K-Map in which each layer operates on a small subset of the feature space is superior to a single-layer K-Map operating on the whole feature space in detecting a variety of attacks, in terms of detection rate as well as false positive rate.

  15. Efficient hybrid monocular-stereo approach to on-board video-based traffic sign detection and tracking

    NASA Astrophysics Data System (ADS)

    Marinas, Javier; Salgado, Luis; Arróspide, Jon; Camplani, Massimo

    2012-01-01

    In this paper we propose an innovative method for the automatic detection and tracking of road traffic signs using an onboard stereo camera. It involves a combination of monocular and stereo analysis strategies to increase the reliability of the detections such that it can boost the performance of any traffic sign recognition scheme. Firstly, an adaptive color- and appearance-based detection is applied at the single-camera level to generate a set of traffic sign hypotheses. In turn, stereo information allows for sparse 3D reconstruction of potential traffic signs through a SURF-based matching strategy. Namely, the plane that best fits the cloud of 3D points traced back from feature matches is estimated using a RANSAC-based approach to improve robustness to outliers. Temporal consistency of the 3D information is ensured through a Kalman-based tracking stage. This also allows for the generation of a predicted 3D traffic sign model, which is in turn used to enhance the previously mentioned color-based detector through a feedback loop, thus improving detection accuracy. The proposed solution has been tested with real sequences under several illumination conditions and in both urban areas and highways, achieving very high detection rates in challenging environments, including rapid motion and significant perspective distortion.
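
    The plane-fitting step follows the usual RANSAC recipe; a minimal sketch (tolerance and iteration count are arbitrary placeholders) could look as follows.

        import numpy as np

        def ransac_plane(points, n_iter=200, tol=0.02, seed=0):
            # points: (n, 3) cloud traced back from feature matches, n >= 3.
            rng = np.random.default_rng(seed)
            best = np.zeros(len(points), dtype=bool)
            for _ in range(n_iter):
                p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
                n = np.cross(p1 - p0, p2 - p0)
                if np.linalg.norm(n) < 1e-12:
                    continue                       # degenerate (collinear) sample
                n /= np.linalg.norm(n)
                inliers = np.abs((points - p0) @ n) < tol
                if inliers.sum() > best.sum():
                    best = inliers
            P = points[best]                       # least-squares refit on inliers
            c = P.mean(axis=0)
            normal = np.linalg.svd(P - c)[2][-1]
            return normal, -normal @ c             # plane: normal . p + d = 0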

  16. Identification of Differential Item Functioning in Multiple-Group Settings: A Multivariate Outlier Detection Approach

    ERIC Educational Resources Information Center

    Magis, David; De Boeck, Paul

    2011-01-01

    We focus on the identification of differential item functioning (DIF) when more than two groups of examinees are considered. We propose to consider items as elements of a multivariate space, where DIF items are outlying elements. Following this approach, the situation of multiple groups is a quite natural case. A robust statistics technique is…

  17. BKG/DGFI Combination Center Annual Report 2012

    NASA Technical Reports Server (NTRS)

    Bachmann, Sabine; Loesler, Michael; Heinkelmann, Robert; Gerstl, Michael

    2013-01-01

    This report summarizes the activities of the Federal Agency for Cartography and Geodesy (Bundesamt fuer Kartographie und Geodaesie, BKG) and the German Geodetic Research Institute (Deutsches Geodaetisches Forschungsinstitut, DGFI) BKG/DGFI Combination Center in 2011 and outlines the planned activities for the year 2012. The main focus was to stabilize outlier detection and to update the Web presentation of the combined products.

  18. Extensive cross-talk and global regulators identified from an analysis of the integrated transcriptional and signaling network in Escherichia coli.

    PubMed

    Antiqueira, Lucas; Janga, Sarath Chandra; Costa, Luciano da Fontoura

    2012-11-01

    To understand the regulatory dynamics of transcription factors (TFs) and their interplay with other cellular components, we have integrated the transcriptional, protein-protein, and allosteric or equivalent interactions which mediate the physiological activity of TFs in Escherichia coli. To study this integrated network we computed a set of network measurements followed by principal component analysis (PCA), investigated the correlations between network structure and dynamics, and carried out a procedure for motif detection. In particular, we show that outliers identified in the integrated network based on their network properties correspond to previously characterized global transcriptional regulators. Furthermore, outliers are highly and widely expressed across conditions, thus supporting their global nature in controlling many genes in the cell. Motifs revealed that TFs not only interact physically with each other but also obtain feedback from signals delivered by signaling proteins, supporting the extensive cross-talk between different types of networks. Our analysis can lead to the development of a general framework for detecting and understanding global regulatory factors in regulatory networks and reinforces the importance of integrating multiple types of interactions in underpinning the interrelationships between them.

  1. Moving standard deviation and moving sum of outliers as quality tools for monitoring analytical precision.

    PubMed

    Liu, Jiakai; Tan, Chin Hon; Badrick, Tony; Loh, Tze Ping

    2018-02-01

    An increase in analytical imprecision (expressed as CVa) can introduce additional variability (i.e. noise) into patient results, which poses a challenge to the optimal management of patients. Relatively little work has been done to address the need for continuous monitoring of analytical imprecision. Through numerical simulations, we describe the use of the moving standard deviation (movSD) and a recently described moving sum of outlier (movSO) patient results as means for detecting increased analytical imprecision, and compare their performance against internal quality control (QC) and average of normals (AoN) approaches. The power to detect an increase in CVa is suboptimal under routine internal QC procedures. The AoN technique almost always had the highest average number of patient results affected before error detection (ANPed), indicating that it generally had the worst capability for detecting an increased CVa. On the other hand, the movSD and movSO approaches were able to detect an increased CVa at significantly lower ANPed, particularly for measurands that displayed a relatively small ratio of biological variation to CVa. In conclusion, the movSD and movSO approaches are effective in detecting an increase in CVa for high-risk measurands with small biological variation. Their performance is relatively poor when the biological variation is large. However, the clinical risk of an increase in analytical imprecision is attenuated for these measurands, as the increased imprecision adds only marginally to the total variation and is less likely to impact clinical care. Copyright © 2017 The Canadian Society of Clinical Chemists. Published by Elsevier Inc. All rights reserved.
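
    A minimal moving-standard-deviation monitor might look as follows; the window length and the alarm rule are illustrative and would need tuning per measurand, as the paper's simulations indicate.

        import numpy as np

        def moving_sd(results, window=50):
            # Standard deviation over a sliding window of consecutive
            # patient results; a sustained rise above the baseline level
            # suggests increased analytical imprecision (CVa).
            results = np.asarray(results, dtype=float)
            out = np.full(results.shape, np.nan)
            for i in range(window - 1, len(results)):
                out[i] = results[i - window + 1:i + 1].std(ddof=1)
            return out

        # e.g. alarm when movSD exceeds 1.2x the median of a stable baseline:
        # sd = moving_sd(x); alarm = sd > 1.2 * np.nanmedian(sd[:200])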

  2. A hybrid approach to advancing quantitative prediction of tissue distribution of basic drugs in human

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Poulin, Patrick; Ekins, Sean

    A general toxicity of basic drugs is related to phospholipidosis in tissues. Therefore, it is essential to predict the tissue distribution of basic drugs to facilitate an initial estimate of that toxicity. The objective of the present study was to further assess the original prediction method that consisted of using the binding to red blood cells measured in vitro for the unbound drug (RBCu) as a surrogate for tissue distribution, by correlating it to unbound tissue:plasma partition coefficients (Kpu) of several tissues, and finally to predict volume of distribution at steady-state (Vss) in humans under in vivo conditions. This correlation method demonstrated inaccurate predictions of Vss for particular basic drugs that did not follow the original correlation principle. Therefore, the novelty of this study is to provide clarity on the actual hypotheses to identify i) the impact of pharmacological mode of action on the generic correlation of RBCu-Kpu, ii) additional mechanisms of tissue distribution for the outlier drugs, iii) molecular features and properties that differentiate compounds as outliers in the original correlation analysis in order to facilitate its applicability domain alongside the properties already used so far, and finally iv) to present a novel and refined correlation method that is superior to what has been previously published for the prediction of human Vss of basic drugs. Applying a refined correlation method after identifying outliers would facilitate the prediction of more accurate distribution parameters as key inputs used in physiologically based pharmacokinetic (PBPK) and phospholipidosis models.

  3. A robust interpolation method for constructing digital elevation models from remote sensing data

    NASA Astrophysics Data System (ADS)

    Chen, Chuanfa; Liu, Fengying; Li, Yanyan; Yan, Changqing; Liu, Guolin

    2016-09-01

    A digital elevation model (DEM) derived from remote sensing data often suffers from outliers due to various reasons, such as the physical limitations of sensors and the low contrast of terrain textures. In order to reduce the effect of outliers on DEM construction, a robust algorithm of multiquadric (MQ) methodology based on M-estimators (MQ-M) was proposed. MQ-M adopts an adaptive three-part weight function: the weight is zero for large errors, one for small errors, and quadratic in between. A mathematical surface was employed to comparatively analyze the robustness of MQ-M, and its performance was compared with those of the classical MQ and a recently developed robust MQ method based on least absolute deviation (MQ-L). Numerical tests show that MQ-M is comparable to the classical MQ and superior to MQ-L when sample points follow normal and Laplace distributions, and that in the presence of outliers the former is more accurate than the latter. A real-world example of DEM construction using stereo images indicates that, compared with classical interpolation methods such as natural neighbor (NN), ordinary kriging (OK), ANUDEM, MQ-L and MQ, MQ-M has a better ability to preserve subtle terrain features. MQ-M was also substituted for the thin plate spline (TPS) in reference-DEM construction to assess its contribution to our recently developed multiresolution hierarchical classification method (MHC). Classifying the 15 groups of benchmark datasets provided by the ISPRS Commission demonstrates that MQ-M-based MHC is more accurate than MQ-L-based and TPS-based MHCs. MQ-M has high potential for DEM construction.
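
    The three-part weight can be written down directly; the sketch below uses fixed illustrative thresholds, whereas in practice they would be tied to a robust estimate of the residual scale.

        import numpy as np

        def three_part_weight(residuals, small=1.0, large=3.0):
            # 1 for small residuals, 0 for large ones, quadratic taper between:
            # the taper equals 1 at |r| = small and decays to 0 at |r| = large.
            a = np.abs(np.asarray(residuals, dtype=float))
            w = np.zeros_like(a)
            w[a <= small] = 1.0
            mid = (a > small) & (a < large)
            w[mid] = ((large - a[mid]) / (large - small)) ** 2
            return w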

  4. kruX: matrix-based non-parametric eQTL discovery.

    PubMed

    Qi, Jianlong; Asl, Hassan Foroughi; Björkegren, Johan; Michoel, Tom

    2014-01-14

    The Kruskal-Wallis test is a popular non-parametric statistical test for identifying expression quantitative trait loci (eQTLs) from genome-wide data due to its robustness against variations in the underlying genetic model and expression trait distribution, but testing billions of marker-trait combinations one-by-one can become computationally prohibitive. We developed kruX, an algorithm implemented in Matlab, Python and R that uses matrix multiplications to simultaneously calculate the Kruskal-Wallis test statistic for several millions of marker-trait combinations at once. kruX is more than ten thousand times faster than computing associations one-by-one on a typical human dataset. We used kruX and a dataset of more than 500k SNPs and 20k expression traits measured in 102 human blood samples to compare eQTLs detected by the Kruskal-Wallis test to eQTLs detected by the parametric ANOVA and linear model methods. We found that the Kruskal-Wallis test is more robust against data outliers and heterogeneous genotype group sizes and detects a higher proportion of non-linear associations, but is more conservative for calling additive linear associations. kruX enables the use of robust non-parametric methods for massive eQTL mapping without the need for a high-performance computing infrastructure and is freely available from http://krux.googlecode.com.
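
    The core trick, computing all rank sums with one matrix product per genotype class, can be sketched for a single trait as below. Ties are ignored (no tie correction) and markers with a single genotype class get df = 0 and NaN p-values; both are simplifications relative to kruX itself.

        import numpy as np
        from scipy.stats import rankdata, chi2

        def kw_many_markers(expr, geno):
            # expr: (n,) trait values; geno: (n, m) genotypes coded 0/1/2.
            n = len(expr)
            r = rankdata(expr)                    # ranks computed once
            h = np.zeros(geno.shape[1])
            groups = np.zeros(geno.shape[1], dtype=int)
            for g in (0, 1, 2):
                ind = (geno == g).astype(float)   # (n, m) membership indicators
                ng = ind.sum(axis=0)              # group sizes per marker
                rg = r @ ind                      # all rank sums in one product
                ok = ng > 0
                h[ok] += rg[ok] ** 2 / ng[ok]
                groups += ok
            h = 12.0 / (n * (n + 1)) * h - 3.0 * (n + 1)
            return h, chi2.sf(h, groups - 1)      # asymptotic p-values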

  5. Robust Indoor Human Activity Recognition Using Wireless Signals.

    PubMed

    Wang, Yi; Jiang, Xinli; Cao, Rongyu; Wang, Xiyang

    2015-07-15

    Wireless signals-based activity detection and recognition technology may be complementary to the existing vision-based methods, especially under circumstances of occlusion, viewpoint change, complex background, lighting condition change, and so on. This paper explores the properties of the channel state information (CSI) of Wi-Fi signals, and presents a robust indoor daily human activity recognition framework with only one pair of transmission points (TP) and access points (AP). First, some indoor human actions are selected as primitive actions forming a training set. Then, an online filtering method is designed to make the actions' CSI curves smooth and allow them to contain enough pattern information. Each primitive action pattern can be segmented from the outliers of its multi-input multi-output (MIMO) signals by a proposed segmentation method. Lastly, in online activity recognition, by selecting proper features and using Support Vector Machine (SVM) based multi-classification, activities constituted by primitive actions can be recognized in a way that is insensitive to location, orientation, and speed.

  6. Estimating clinical chemistry reference values based on an existing data set of unselected animals.

    PubMed

    Dimauro, Corrado; Bonelli, Piero; Nicolussi, Paola; Rassu, Salvatore P G; Cappio-Borlino, Aldo; Pulina, Giuseppe

    2008-11-01

    In an attempt to standardise the determination of biological reference values, the International Federation of Clinical Chemistry (IFCC) has published a series of recommendations on developing reference intervals. The IFCC recommends the use of an a priori sampling of at least 120 healthy individuals. However, such a large number of samples and laboratory analyses is expensive, time-consuming and not always feasible, especially in veterinary medicine. In this paper, an alternative (a posteriori) method is described and is used to determine reference intervals for biochemical parameters of farm animals using an existing laboratory data set. The method was based on the detection and removal of outliers to obtain, from the existing data set, a large sample of animals likely to be healthy. This allowed the estimation of reliable reference intervals for biochemical parameters in Sarda dairy sheep. This method may also be useful for the determination of reference intervals for different species, ages and genders.
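
    One simple a posteriori scheme along these lines is to trim outliers iteratively and then take the central 95% of what remains. The sketch below uses Tukey fences as a stand-in for the paper's outlier-removal step, which is not reproduced here.

        import numpy as np

        def a_posteriori_reference_interval(values, k=1.5, max_iter=10):
            # Iteratively drop values outside the Tukey fences, then report
            # the central 95% of the retained, presumably healthy, animals.
            x = np.sort(np.asarray(values, dtype=float))
            for _ in range(max_iter):
                q1, q3 = np.percentile(x, [25, 75])
                keep = (x >= q1 - k * (q3 - q1)) & (x <= q3 + k * (q3 - q1))
                if keep.all():
                    break                          # converged: no outliers left
                x = x[keep]
            lo, hi = np.percentile(x, [2.5, 97.5])
            return lo, hi, len(x)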

  7. Dry column-thermal energy analyzer method for determining N-nitrosopyrrolidine in fried bacon: collaborative study.

    PubMed

    Fiddler, W; Pensabene, J W; Gates, R A; Phillips, J G

    1984-01-01

    A dry column method for isolating N-nitrosopyrrolidine (NPYR) from fried, cure-pumped bacon and detection by gas chromatography-thermal energy analyzer (TEA) was studied collaboratively. Testing the results obtained from 11 collaborators for homogeneous variances among samples resulted in splitting the nonzero samples into 2 groups of sample levels, each with similar variances. Outlying results were identified by AOAC-recommended procedures, and laboratories having outliers within a group were excluded. Results from the 9 collaborators remaining in the low group yielded coefficients of variation (CV) of 6.00% and 7.47% for repeatability and reproducibility, respectively, and the 8 collaborators remaining in the high group yielded CV values of 5.64% and 13.72%, respectively. An 85.2% overall average recovery of the N-nitrosoazetidine internal standard was obtained with an average laboratory CV of 10.5%. The method has been adopted official first action as an alternative to the mineral oil distillation-TEA screening procedure.

  8. The influence of outliers on results of wet deposition measurements as a function of measurement strategy

    NASA Astrophysics Data System (ADS)

    Slanina, J.; Möls, J. J.; Baard, J. H.

    The results of a wet deposition monitoring experiment, carried out by eight identical wet-only precipitation samplers operating on the basis of 24 h samples, have been used to investigate the accuracy and uncertainties of wet deposition measurements. The experiment was conducted near Lelystad, The Netherlands, over the period 1 March 1983 to 31 December 1985. By rearranging the data for one to eight samplers and sampling periods of 1 day to 1 month, both systematic and random errors were investigated as a function of measuring strategy. A Gaussian distribution of the results was observed. Outliers, detected by a Dixon test (α = 0.05), strongly influenced both the yearly averaged results and the standard deviation of this average as a function of the number of samplers and the length of the sampling period. The systematic bias in bulk elements, using one sampler, typically varies from 2 to 20%, and for trace elements from 10 to 500%. Severe problems are encountered in the case of Zn, Cu, Cr, Ni and especially Cd. For the sensitive detection of trends, generally more than one sampler per measuring station is necessary, as the standard deviation in the yearly averaged wet deposition is typically 10-20% relative for one sampler. Using three identical samplers, trends of, e.g., 3% per year will generally be detected within 6 years.

  9. VizieR Online Data Catalog: SDSS-DR9 photometric redshifts (Brescia+, 2014)

    NASA Astrophysics Data System (ADS)

    Brescia, M.; Cavuoti, S.; Longo, G.; de Stefano, V.

    2014-07-01

    We present an application of a machine learning method to the estimation of photometric redshifts for the galaxies in the SDSS Data Release 9 (SDSS-DR9). Photometric redshifts for more than 143 million galaxies were produced. The MLPQNA (Multi Layer Perceptron with Quasi Newton Algorithm) model provided within the framework of the DAMEWARE (DAta Mining and Exploration Web Application REsource) is an interpolative method derived from machine learning models. The obtained redshifts have an overall uncertainty of σ=0.023 with a very small average bias of about 3x10^-5 and a fraction of catastrophic outliers of about 5%. After removal of the catastrophic outliers, the uncertainty is about σ=0.017. The catalogue files report in their name the range of DEC degrees related to the included objects. (60 data files).

  10. Groundwater depth prediction in a shallow aquifer in north China by a quantile regression model

    NASA Astrophysics Data System (ADS)

    Li, Fawen; Wei, Wan; Zhao, Yong; Qiao, Jiale

    2017-01-01

    There is a close relationship between the groundwater level in a shallow aquifer and the surface ecological environment; hence, it is important to accurately simulate and predict the groundwater level in eco-environmental construction projects. The multiple linear regression (MLR) model is one of the most useful methods to predict groundwater level (depth); however, the values predicted by this model only reflect the mean distribution of the observations and cannot effectively fit the extremes of the distribution (outliers). The study reported here builds a prediction model of groundwater-depth dynamics in a shallow aquifer using the quantile regression (QR) method on the basis of the observed data of groundwater depth and related factors. The proposed approach was applied to five sites in Tianjin city, north China, and the groundwater depth was calculated for different quantiles, from which the optimal quantile was screened out according to the box plot method and compared to the values predicted by the MLR model. The results showed that the related factors at the five sites did not follow the standard normal distribution and that there were outliers in the precipitation and last-month (initial state) groundwater-depth factors, so the basic assumptions of the MLR model could not be satisfied, thereby causing errors. Nevertheless, these conditions had no effect on the QR model, as it could more effectively describe the distribution of the original data and had a higher precision in fitting the outliers.
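
    Quantile regression of this kind is readily available in standard libraries. The sketch below fits several quantiles with statsmodels on synthetic stand-ins for the study's predictors (precipitation and last-month depth); the data and coefficients are invented for illustration.

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(0)
        precip = rng.gamma(2.0, 20.0, size=120)                 # hypothetical
        prev_depth = 5 + np.cumsum(rng.normal(0, 0.1, size=120))
        depth = 0.9 * prev_depth - 0.01 * precip + rng.normal(0, 0.2, size=120)

        X = sm.add_constant(np.column_stack([precip, prev_depth]))
        for q in (0.25, 0.5, 0.75):              # candidate quantiles; the
            fit = sm.QuantReg(depth, X).fit(q=q)  # optimal one is screened later
            print(q, fit.params.round(3))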

  11. Specifying the Probability Characteristics of Funnel Plot Control Limits: An Investigation of Three Approaches

    PubMed Central

    Manktelow, Bradley N.; Seaton, Sarah E.

    2012-01-01

    Background: Emphasis is increasingly being placed on the monitoring and comparison of clinical outcomes between healthcare providers. Funnel plots have become a standard graphical methodology to identify outliers; they comprise plotting an outcome summary statistic from each provider against a specified ‘target’, together with upper and lower control limits. With discrete probability distributions it is not possible to specify the exact probability that an observation from an ‘in-control’ provider will fall outside the control limits. However, general probability characteristics can be set and specified using interpolation methods. Guidelines recommend that providers falling outside such control limits should be investigated, potentially with significant consequences, so it is important that the properties of the limits are understood. Methods: Control limits for funnel plots for the Standardised Mortality Ratio (SMR) based on the Poisson distribution were calculated using three proposed interpolation methods, and the probability was calculated of an ‘in-control’ provider falling outside of the limits. Examples using published data are shown to demonstrate the potential differences in the identification of outliers. Results: The first interpolation method ensured that the probability of an observation of an ‘in control’ provider falling outside either limit was always less than a specified nominal probability (p). The second method resulted in such an observation falling outside either limit with a probability that could be either greater or less than p, depending on the expected number of events. The third method led to a probability that was always greater than, or equal to, p. Conclusion: The use of different interpolation methods can lead to differences in the identification of outliers. This is particularly important when the expected number of events is small. We recommend that users of these methods be aware of the differences, and specify which interpolation method is to be used prior to any analysis. PMID:23029202

  12. HIV quality report cards: impact of case-mix adjustment and statistical methods.

    PubMed

    Ohl, Michael E; Richardson, Kelly K; Goto, Michihiko; Vaughan-Sarrazin, Mary; Schweizer, Marin L; Perencevich, Eli N

    2014-10-15

    There will be increasing pressure to publicly report and rank the performance of healthcare systems on human immunodeficiency virus (HIV) quality measures. To inform discussion of public reporting, we evaluated the influence of case-mix adjustment when ranking individual care systems on the viral control quality measure. We used data from the Veterans Health Administration (VHA) HIV Clinical Case Registry and administrative databases to estimate case-mix adjusted viral control for 91 local systems caring for 12 368 patients. We compared results using 2 adjustment methods, the observed-to-expected estimator and the risk-standardized ratio. Overall, 10 913 patients (88.2%) achieved viral control (viral load ≤400 copies/mL). Prior to case-mix adjustment, system-level viral control ranged from 51% to 100%. Seventeen (19%) systems were labeled as low outliers (performance significantly below the overall mean) and 11 (12%) as high outliers. Adjustment for case mix (patient demographics, comorbidity, CD4 nadir, time on therapy, and income from VHA administrative databases) reduced the number of low outliers by approximately one-third, but results differed by method. The adjustment model had moderate discrimination (c statistic = 0.66), suggesting potential for unadjusted risk when using administrative data to measure case mix. Case-mix adjustment affects rankings of care systems on the viral control quality measure. Given the sensitivity of rankings to the selection of case-mix adjustment methods, and the potential for unadjusted risk when using variables limited to current administrative databases, the HIV care community should explore optimal methods for case-mix adjustment before moving forward with public reporting. Published by Oxford University Press on behalf of the Infectious Diseases Society of America 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.

  13. Efficient l1-norm-based low-rank matrix approximations for large-scale problems using alternating rectified gradient method.

    PubMed

    Kim, Eunwoo; Lee, Minsik; Choi, Chong-Ho; Kwak, Nojun; Oh, Songhwai

    2015-02-01

    Low-rank matrix approximation plays an important role in the area of computer vision and image processing. Most of the conventional low-rank matrix approximation methods are based on the l2-norm (Frobenius norm), with principal component analysis (PCA) being the most popular among them. However, this can give a poor approximation for data contaminated by outliers (including missing data), because the l2-norm exaggerates the negative effect of outliers. Recently, to overcome this problem, various methods based on the l1-norm, such as robust PCA methods, have been proposed for low-rank matrix approximation. Despite the robustness of these methods, they require heavy computational effort and substantial memory for high-dimensional data, which is impractical for real-world problems. In this paper, we propose two efficient low-rank factorization methods based on the l1-norm that find proper projection and coefficient matrices using the alternating rectified gradient method. The proposed methods are applied to a number of low-rank matrix approximation problems to demonstrate their efficiency and robustness. The experimental results show that our proposals are efficient in both execution time and reconstruction performance, unlike other state-of-the-art methods.

  14. Resampling approach for anomalous change detection

    NASA Astrophysics Data System (ADS)

    Theiler, James; Perkins, Simon

    2007-04-01

    We investigate the problem of identifying pixels in pairs of co-registered images that correspond to real changes on the ground. Changes that are due to environmental differences (illumination, atmospheric distortion, etc.) or sensor differences (focus, contrast, etc.) will be widespread throughout the image, and the aim is to avoid these changes in favor of changes that occur in only one or a few pixels. Formal outlier detection schemes (such as the one-class support vector machine) can identify rare occurrences, but will be confounded by pixels that are "equally rare" in both images: they may be anomalous, but they are not changes. We describe a resampling scheme we have developed that formally addresses both of these issues, and reduces the problem to a binary classification, a problem for which a large variety of machine learning tools have been developed. In principle, the effects of misregistration will manifest themselves as pervasive changes, and our method will be robust against them - but in practice, misregistration remains a serious issue.

  15. Erosion-tectonics feedbacks in shaping the landscape: An example from the Mekele Outlier (Tigray, Ethiopia)

    NASA Astrophysics Data System (ADS)

    Sembroni, Andrea; Molin, Paola; Dramis, Francesco; Faccenna, Claudio; Abebe, Bekele

    2017-05-01

    An outlier consists of an area of younger rocks surrounded by older ones. Its formation is mainly related to the erosion of the surrounding rocks, which interrupts the original continuity of the strata. Because of its origin, an outlier is an important witness of the paleogeography of a region and, therefore, essential for understanding its topographic and geological evolution. The Mekele Outlier (N Ethiopia) is characterized by poorly incised Mesozoic marine sediments and dolerites (∼2000 m in elevation), surrounded by strongly eroded Precambrian and Paleozoic rocks and Tertiary volcanic deposits, in a context of mantle-supported topography. In the past, studies of the Mekele Outlier focused mainly on describing the stratigraphic and tectonic settings, without taking into account the feedback between surface and deep processes in shaping such a peculiar feature. In this study we present geological and geomorphometric analyses of the Mekele Outlier, taking into account the general topographic features (slope map, swath profiles, local relief), the river network and the principal tectonic lineaments of the outlier. The results trace the evolution of the study area as related not only to the mere erosion of the surrounding rocks but to a complex interaction between surface and deep processes in which the lithology played a crucial role.

  16. Updating the Standard Spatial Observer for Contrast Detection

    NASA Technical Reports Server (NTRS)

    Ahumada, Albert J.; Watson, Andrew B.

    2011-01-01

    Watson and Ahumada (2005) constructed a Standard Spatial Observer (SSO) model for foveal luminance contrast signal detection based on the ModelFest data (Watson, 1999). Here we propose two changes to the model: dropping the oblique effect from the CSF and using the cone density data of Curcio et al. (1990) to estimate the variation of sensitivity with eccentricity. Dropping the complex images, and using medians to exclude outlier data points, the SSO model now accounts for essentially all the predictable variance in the data, with an RMS prediction error of only 0.67 dB.

  17. Finding the Genomic Basis of Local Adaptation: Pitfalls, Practical Solutions, and Future Directions.

    PubMed

    Hoban, Sean; Kelley, Joanna L; Lotterhos, Katie E; Antolin, Michael F; Bradburd, Gideon; Lowry, David B; Poss, Mary L; Reed, Laura K; Storfer, Andrew; Whitlock, Michael C

    2016-10-01

    Uncovering the genetic and evolutionary basis of local adaptation is a major focus of evolutionary biology. The recent development of cost-effective methods for obtaining high-quality genome-scale data makes it possible to identify some of the loci responsible for adaptive differences among populations. Two basic approaches for identifying putatively locally adaptive loci have been developed and are broadly used: one that identifies loci with unusually high genetic differentiation among populations (differentiation outlier methods) and one that searches for correlations between local population allele frequencies and local environments (genetic-environment association methods). Here, we review the promises and challenges of these genome scan methods, including correcting for the confounding influence of a species' demographic history, biases caused by missing aspects of the genome, matching scales of environmental data with population structure, and other statistical considerations. In each case, we make suggestions for best practices for maximizing the accuracy and efficiency of genome scans to detect the underlying genetic basis of local adaptation. With attention to their current limitations, genome scan methods can be an important tool in finding the genetic basis of adaptive evolutionary change.

  18. A Hierarchical Convolutional Neural Network for vesicle fusion event classification.

    PubMed

    Li, Haohan; Mao, Yunxiang; Yin, Zhaozheng; Xu, Yingke

    2017-09-01

    Quantitative analysis of vesicle exocytosis and classification of different modes of vesicle fusion from fluorescence microscopy are of primary importance for biomedical research. In this paper, we propose a novel Hierarchical Convolutional Neural Network (HCNN) method to automatically identify vesicle fusion events in time-lapse Total Internal Reflection Fluorescence Microscopy (TIRFM) image sequences. Firstly, a detection and tracking method is developed to extract image patch sequences containing potential fusion events. Then, a Gaussian Mixture Model (GMM) is applied to each image patch of the patch sequence, with outliers rejected for robust Gaussian fitting. By utilizing the high-level time-series intensity change features introduced by the GMM and the visual appearance features embedded in some key moments of the fusion process, the proposed HCNN architecture is able to classify each candidate patch sequence into three classes: full fusion event, partial fusion event and non-fusion event. Finally, we validate the performance of our method on 9 challenging datasets that have been annotated by cell biologists, and our method achieves better performance than three previous methods. Copyright © 2017 Elsevier Ltd. All rights reserved.

  19. Detection of Patient Subgroups with Differential Expression in Omics Data: A Comprehensive Comparison of Univariate Measures

    PubMed Central

    Ahrens, Maike; Turewicz, Michael; Casjens, Swaantje; May, Caroline; Pesch, Beate; Stephan, Christian; Woitalla, Dirk; Gold, Ralf; Brüning, Thomas; Meyer, Helmut E.

    2013-01-01

    Detection of yet unknown subgroups showing differential gene or protein expression is a frequent goal in the analysis of modern molecular data. Applications range from cancer biology over developmental biology to toxicology. Often a control and an experimental group are compared, and subgroups can be characterized by differential expression for only a subgroup-specific set of genes or proteins. Finding such genes and corresponding patient subgroups can help in understanding pathological pathways, diagnosis and defining drug targets. The size of the subgroup and the type of differential expression determine the optimal strategy for subgroup identification. To date, commonly used software packages hardly provide statistical tests and methods for the detection of such subgroups. Different univariate methods for subgroup detection are characterized and compared, both on simulated and on real data. We present an advanced design for simulation studies: data are simulated under different distributional assumptions for the expression of the subgroup, and performance results are compared against theoretical upper bounds. For each distribution, different degrees of deviation from the majority of observations are considered for the subgroup. We evaluate classical approaches as well as various new suggestions in the context of omics data, including the outlier sum, PADGE, and kurtosis. We also propose the new FisherSum score. ROC curve analysis and AUC values are used to quantify the ability of the methods to distinguish between genes or proteins with and without certain subgroup patterns. In general, FisherSum for small subgroups and the t test for large subgroups achieve the best results. We apply each method to a case-control study on Parkinson's disease and underline the biological benefit of the new method. PMID:24278130
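
    Among the compared scores, the outlier sum is particularly easy to state. A minimal version for one gene (following Tibshirani and Hastie's definition; the thresholding details here are a plain rendering, not the paper's code) is:

        import numpy as np

        def outlier_sum(control, case):
            # Median/MAD-standardize using all samples, then sum the case
            # values lying above the combined q75 + IQR; large sums indicate
            # a high-expression subgroup within the cases.
            x = np.concatenate([control, case])
            med = np.median(x)
            mad = np.median(np.abs(x - med)) or 1.0   # guard against MAD = 0
            z_case = (np.asarray(case, dtype=float) - med) / mad
            z_all = (x - med) / mad
            q1, q3 = np.percentile(z_all, [25, 75])
            return z_case[z_case > q3 + (q3 - q1)].sum()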

  20. Quality control of the soil moisture probe response patterns from a green infrastructure site using Dynamic Time Warping (DTW) and association rule learning

    NASA Astrophysics Data System (ADS)

    Yu, Z.; Bedig, A.; Quigley, M.; Montalto, F. A.

    2017-12-01

    In-situ field monitoring can help to improve the design and management of decentralized Green Infrastructure (GI) systems in urban areas. Because of the vast quantity of continuous data generated from multi-site sensor systems, cost-effective post-construction opportunities for real-time control are limited; and the physical processes that influence the observed phenomena (e.g. soil moisture) are hard to track and control. To derive knowledge efficiently from real-time monitoring data, there is currently a need to develop more efficient approaches to data quality control. In this paper, we employ dynamic time warping method to compare the similarity of two soil moisture patterns without ignoring the inherent autocorrelation. We also use a rule-based machine learning method to investigate the feasibility of detecting anomalous responses from soil moisture probes. The data was generated from both individual and clusters of probes, deployed in a GI site in Milwaukee, WI. In contrast to traditional QAQC methods, which seek to detect outliers at individual time steps, the new method presented here converts the continuous time series into event-based symbolic sequences from which unusual response patterns can be detected. Different Matching rules are developed on different physical characteristics for different seasons. The results suggest that this method could be used alternatively to detect sensor failure, to identify extreme events, and to call out abnormal change patterns, compared to intra-probe and inter-probe historical observations. Though this algorithm was developed for soil moisture probes, the same approach could easily be extended to advance QAQC efficiency for any continuous environmental datasets.
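
    The underlying similarity measure is standard dynamic time warping. A textbook DTW distance between two 1-D sequences can be written as below; this is a generic sketch, not the authors' event-based variant.

        import numpy as np

        def dtw_distance(a, b):
            # Classic O(n*m) dynamic-programming DTW with an
            # absolute-difference local cost.
            n, m = len(a), len(b)
            D = np.full((n + 1, m + 1), np.inf)
            D[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    cost = abs(a[i - 1] - b[j - 1])
                    D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
            return D[n, m]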

  1. A Review of Subsequence Time Series Clustering

    PubMed Central

    Teh, Ying Wah

    2014-01-01

    Clustering of subsequence time series remains an open issue in time series clustering. Subsequence time series clustering is used in different fields, such as e-commerce, outlier detection, speech recognition, biological systems, DNA recognition, and text mining. One of the useful fields in the domain of subsequence time series clustering is pattern recognition. To improve this field, a sequence of time series data is used. This paper reviews some definitions and backgrounds related to subsequence time series clustering. The categorization of the literature reviews is divided into three groups: preproof, interproof, and postproof period. Moreover, various state-of-the-art approaches in performing subsequence time series clustering are discussed under each of the following categories. The strengths and weaknesses of the employed methods are evaluated as potential issues for future studies. PMID:25140332

  2. A review of subsequence time series clustering.

    PubMed

    Zolhavarieh, Seyedjamal; Aghabozorgi, Saeed; Teh, Ying Wah

    2014-01-01

    Clustering of subsequence time series remains an open issue in time series clustering. Subsequence time series clustering is used in different fields, such as e-commerce, outlier detection, speech recognition, biological systems, DNA recognition, and text mining. One of the useful fields in the domain of subsequence time series clustering is pattern recognition. To improve this field, a sequence of time series data is used. This paper reviews some definitions and backgrounds related to subsequence time series clustering. The categorization of the literature reviews is divided into three groups: preproof, interproof, and postproof period. Moreover, various state-of-the-art approaches in performing subsequence time series clustering are discussed under each of the following categories. The strengths and weaknesses of the employed methods are evaluated as potential issues for future studies.
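
    The basic construction the review examines is easy to reproduce: slide a fixed-length window over the series and cluster the resulting subsequences. The sketch below shows the canonical formulation; the window length, z-normalization and use of k-means are common but assumed choices.

        import numpy as np
        from sklearn.cluster import KMeans

        def subsequence_clusters(series, w, k, step=1):
            # Extract sliding windows of length w, z-normalize each
            # subsequence, then cluster them with k-means.
            subs = np.array([series[i:i + w]
                             for i in range(0, len(series) - w + 1, step)])
            subs = (subs - subs.mean(axis=1, keepdims=True)) \
                   / (subs.std(axis=1, keepdims=True) + 1e-12)
            return KMeans(n_clusters=k, n_init=10).fit_predict(subs)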

  3. Gas detection by correlation spectroscopy employing a multimode diode laser.

    PubMed

    Lou, Xiutao; Somesfalean, Gabriel; Zhang, Zhiguo

    2008-05-01

    A gas sensor based on the gas-correlation technique has been developed using a multimode diode laser (MDL) in a dual-beam detection scheme. Measurement of CO(2) mixed with CO as an interfering gas is successfully demonstrated using a 1570 nm tunable MDL. Despite overlapping absorption spectra and occasional mode hops, the interfering signals can be effectively excluded by a statistical procedure including correlation analysis and outlier identification. The gas concentration is retrieved from several pair-correlated signals by a linear-regression scheme, yielding a reliable and accurate measurement. This demonstrates the utility of the unsophisticated MDLs as novel light sources for gas detection applications.

  4. Comparison of mode estimation methods and application in molecular clock analysis

    NASA Technical Reports Server (NTRS)

    Hedges, S. Blair; Shah, Prachi

    2003-01-01

    BACKGROUND: Distributions of time estimates in molecular clock studies are sometimes skewed or contain outliers. In those cases, the mode is a better estimator of the overall time of divergence than the mean or median. However, different methods are available for estimating the mode. We compared these methods in simulations to determine their strengths and weaknesses and further assessed their performance when applied to real data sets from a molecular clock study. RESULTS: We found that the half-range mode and robust parametric mode methods have a lower bias than other mode methods under a diversity of conditions. However, the half-range mode suffers from a relatively high variance and the robust parametric mode is more susceptible to bias by outliers. We determined that bootstrapping reduces the variance of both mode estimators. Application of the different methods to real data sets yielded results that were concordant with the simulations. CONCLUSION: Because the half-range mode is a simple and fast method, and produced less bias overall in our simulations, we recommend the bootstrapped version of it as a general-purpose mode estimator and suggest a bootstrap method for obtaining the standard error and 95% confidence interval of the mode.
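
    The half-range mode and its bootstrap are concise to sketch. The following Python is an assumed reading of the half-range idea (repeatedly keep the half-width window containing the most observations), not the authors' code; tie-breaking details vary between published variants.

        import numpy as np

        def half_range_mode(x):
            # Shrink to the half-range window with the most points until
            # at most two observations remain, then average them.
            x = np.sort(np.asarray(x, dtype=float))
            while len(x) > 2:
                w = (x[-1] - x[0]) / 2.0
                counts = [np.searchsorted(x, xi + w, side="right") - i
                          for i, xi in enumerate(x)]
                best = int(np.argmax(counts))
                if counts[best] == len(x):  # all remaining values identical
                    break
                x = x[best:best + counts[best]]
            return float(x.mean())

        def bootstrap_mode(x, n_boot=2000, seed=0):
            # Bootstrap the estimator to get a standard error and a
            # percentile 95% confidence interval, as recommended above.
            rng = np.random.default_rng(seed)
            modes = np.array([half_range_mode(rng.choice(x, size=len(x)))
                              for _ in range(n_boot)])
            return modes.mean(), modes.std(), np.percentile(modes, [2.5, 97.5])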

  5. Detection of latent fingerprints using high-resolution 3D confocal microscopy in non-planar acquisition scenarios

    NASA Astrophysics Data System (ADS)

    Kirst, Stefan; Vielhauer, Claus

    2015-03-01

    In digitized forensics the support of investigators in any manner is one of the main goals. Using conservative lifting methods, the detection of traces is done manually. For non-destructive contactless methods, the necessity for detecting traces is obvious for further biometric analysis. High-resolution 3D confocal laser scanning microscopy (CLSM) enables a detection-by-segmentation approach with improved detection results. Optimal scan results with CLSM are achieved on surfaces orthogonal to the sensor, which is not always possible due to environmental circumstances or the surface's shape. This introduces additional noise, outliers and a lack of contrast, making the detection of traces even harder. Prior work showed the possibility of determining angle-independent classification models for the detection of latent fingerprints (LFP). Enhancing this approach, we introduce a larger feature space containing a variety of statistical, roughness, color, edge-directivity, histogram, Gabor, gradient and Tamura features based on raw data and gray-level co-occurrence matrices (GLCM), using high-resolution data. Our test set consists of eight different surfaces for the detection of LFP at four different acquisition angles, with a total of 1920 single scans. For each surface, at angles in steps of 10°, we capture samples from five donors to introduce variance through a variety of sweat compositions and application influences such as pressure or differences in ridge thickness. By analyzing the present test set with our approach, we intend to derive angle- and substrate-dependent classification models for determining optimal surface-specific acquisition setups, as well as classification models for general detection across both angles and substrates. The results on overall models, with classification rates up to 75.15% (kappa 0.50), already show a positive tendency regarding the usability of the proposed methods for LFP detection on varying surfaces in non-planar scenarios.

  6. 40 CFR Appendix XVIII to Part 86 - Statistical Outlier Identification Procedure for Light-Duty Vehicles and Light Light-Duty Trucks...

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 19 2011-07-01 2011-07-01 false Statistical Outlier Identification... (CONTINUED) Pt. 86, App. XVIII Appendix XVIII to Part 86—Statistical Outlier Identification Procedure for..., but suffer theoretical deficiencies if statistical significance tests are required. Consequently, the...

  7. 40 CFR Appendix XVIII to Part 86 - Statistical Outlier Identification Procedure for Light-Duty Vehicles and Light Light-Duty Trucks...

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 19 2010-07-01 2010-07-01 false Statistical Outlier Identification... (CONTINUED) Pt. 86, App. XVIII Appendix XVIII to Part 86—Statistical Outlier Identification Procedure for..., but suffer theoretical deficiencies if statistical significance tests are required. Consequently, the...

  8. Information Processing Research.

    DTIC Science & Technology

    1986-09-01

    Kuroe. The 3D MOSAIC Scene Understanding System. In Alan Bundy, Editor, Proceedings of the Eighth International Joint Conference on Artificial ... Artificial Intelligence 17(1-3):409-460, August, 1981. Given a single picture which is a projection of a three-dimensional scene onto the two... values are detected as outliers by computing the distribution of values over a sliding 80 msec window. During the third pass (based on artificial...

  9. A genome-wide scan for signatures of selection in Chinese indigenous and commercial pig breeds.

    PubMed

    Yang, Songbai; Li, Xiuling; Li, Kui; Fan, Bin; Tang, Zhonglin

    2014-01-15

    Modern breeding and artificial selection play critical roles in pig domestication and shape the genetic variation of different breeds. China has many indigenous pig breeds with various characteristics in morphology and production performance that differ from those of foreign commercial pig breeds. However, the signatures of selection on genes underlying economic traits in Chinese indigenous versus commercial pigs remain poorly understood. We identified footprints of positive selection at the whole-genome level, comprising 44,652 SNPs genotyped in six Chinese indigenous pig breeds, one developed breed and two commercial breeds. An empirical genome-wide distribution of Fst (F-statistics) was constructed based on estimations of Fst for each SNP across these nine breeds. We detected selection at the genome level using the High-Fst outlier method and found that 81 candidate genes show strong evidence of positive selection. Furthermore, the results of network analyses showed that the genes that displayed evidence of positive selection were mainly involved in the development of tissues and organs, and the immune response. In addition, we calculated the pairwise Fst between Chinese indigenous and commercial breeds (CHN VS EURO) and between Northern and Southern Chinese indigenous breeds (Northern VS Southern). The IGF1R and ESR1 genes showed evidence of positive selection in the CHN VS EURO and Northern VS Southern groups, respectively. In this study, we identified for the first time the genomic regions showing evidence of selection between Chinese indigenous and commercial pig breeds using the High-Fst outlier method. These regions were found to be involved in the development of tissues and organs, the immune response, growth and litter size. The results of this study provide new insights into understanding the genetic variation and domestication in pigs.

  10. A genome-wide scan for signatures of selection in Chinese indigenous and commercial pig breeds

    PubMed Central

    2014-01-01

    Background Modern breeding and artificial selection play critical roles in pig domestication and shape the genetic variation of different breeds. China has many indigenous pig breeds with various characteristics in morphology and production performance that differ from those of foreign commercial pig breeds. However, the signatures of selection on genes underlying economic traits in Chinese indigenous versus commercial pigs remain poorly understood. Results We identified footprints of positive selection at the whole-genome level, comprising 44,652 SNPs genotyped in six Chinese indigenous pig breeds, one developed breed and two commercial breeds. An empirical genome-wide distribution of Fst (F-statistics) was constructed based on estimations of Fst for each SNP across these nine breeds. We detected selection at the genome level using the High-Fst outlier method and found that 81 candidate genes show strong evidence of positive selection. Furthermore, the results of network analyses showed that the genes that displayed evidence of positive selection were mainly involved in the development of tissues and organs, and the immune response. In addition, we calculated the pairwise Fst between Chinese indigenous and commercial breeds (CHN VS EURO) and between Northern and Southern Chinese indigenous breeds (Northern VS Southern). The IGF1R and ESR1 genes showed evidence of positive selection in the CHN VS EURO and Northern VS Southern groups, respectively. Conclusions In this study, we identified for the first time the genomic regions showing evidence of selection between Chinese indigenous and commercial pig breeds using the High-Fst outlier method. These regions were found to be involved in the development of tissues and organs, the immune response, growth and litter size. The results of this study provide new insights into understanding the genetic variation and domestication in pigs. PMID:24422716
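
    The empirical high-Fst outlier scan reduces to flagging the upper tail of the genome-wide Fst distribution. A minimal sketch follows; the 1% tail is an assumed cutoff, since the paper's exact threshold is not restated here.

        import numpy as np

        def fst_outliers(fst, top=0.01):
            # Flag SNPs in the upper tail of the empirical genome-wide
            # Fst distribution as candidates for positive selection.
            cutoff = np.quantile(fst, 1.0 - top)
            return np.flatnonzero(fst >= cutoff), cutoff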

  11. Detecting Outliers in Marathon Data by Means of the Andrews Plot

    NASA Astrophysics Data System (ADS)

    Stehlík, Milan; Wald, Helmut; Bielik, Viktor; Petrovič, Juraj

    2011-09-01

    For optimal race performance, it is important that the runner keeps a steady pace during most of the competition. First-time runners or athletes with little competition experience often "blow out" after a few kilometers of the race, whether from strong emotional experiences or poor control of running intensity. The half-marathon competition pace of middle-level recreational athletes is approximately 10 sec quicker than their training pace. If an athlete runs the first third of the race (7 km) at a pace 20 sec quicker than his capacity (trainability) allows, he will "blow out" in the last third of the race. This is reflected in reduced running intensity, an inability to keep a steady pace over the last kilometers, and a slower final time. In sports science, many diagnostic methods ([3], [2], [6]) are used to predict optimal race pace and final time; however, practical evidence for these methods in the field (competition, race) is lacking. One condition that must be met is that athletes not only have similar final times but also keep as constant a pace as possible during the whole race. For this reason it is very important to find outliers. Our experimental group consisted of 20 recreationally trained athletes (mean age 32.6 ± 8.9 years). Before the race the athletes were instructed to run on the basis of their subjective feeling and previous experience. The data (running pace for each kilometer, average and maximal heart rate for each kilometer) were collected by a GPS-enabled personal trainer (Forerunner 305).
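
    For reference, the Andrews plot maps each observation x = (x1, x2, ...) to the curve f_x(t) = x1/sqrt(2) + x2*sin(t) + x3*cos(t) + x4*sin(2t) + ..., so runners with atypical per-kilometer pace vectors trace curves far from the main bundle. A generic sketch of the construction (not the authors' code):

        import numpy as np

        def andrews_curve(x, t):
            # f_x(t) = x[0]/sqrt(2) + x[1]*sin(t) + x[2]*cos(t)
            #          + x[3]*sin(2t) + x[4]*cos(2t) + ...
            vals = np.full_like(t, x[0] / np.sqrt(2.0))
            for k, xk in enumerate(x[1:], start=1):
                h = (k + 1) // 2
                vals = vals + xk * (np.sin(h * t) if k % 2 == 1 else np.cos(h * t))
            return vals

        # Example: one curve per runner, with x the per-kilometer paces
        # and t = np.linspace(-np.pi, np.pi, 200).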

  12. Robust feature matching via support-line voting and affine-invariant ratios

    NASA Astrophysics Data System (ADS)

    Li, Jiayuan; Hu, Qingwu; Ai, Mingyao; Zhong, Ruofei

    2017-10-01

    Robust image matching is crucial for many applications of remote sensing and photogrammetry, such as image fusion, image registration, and change detection. In this paper, we propose a robust feature matching method based on support-line voting and affine-invariant ratios. We first use popular feature matching algorithms, such as SIFT, to obtain a set of initial matches. A support-line descriptor based on multiple adaptive binning gradient histograms is subsequently applied in the support-line voting stage to filter outliers. In addition, we use affine-invariant ratios computed by a two-line structure to refine the matching results and estimate the local affine transformation. The local affine model is more robust to distortions caused by elevation differences than the global affine transformation, especially for high-resolution remote sensing images and UAV images. Thus, the proposed method is suitable for both rigid and non-rigid image matching problems. Finally, we extract as many high-precision correspondences as possible based on the local affine extension and build a grid-wise affine model for remote sensing image registration. We compare the proposed method with six state-of-the-art algorithms on several data sets and show that our method significantly outperforms the other methods. The proposed method achieves 94.46% average precision on 15 challenging remote sensing image pairs, while the second-best method, RANSAC, only achieves 70.3%. In addition, the number of detected correct matches of the proposed method is approximately four times the number of initial SIFT matches.

  13. Self-calibration of photometric redshift scatter in weak-lensing surveys

    DOE PAGES

    Zhang, Pengjie; Pen, Ue -Li; Bernstein, Gary

    2010-06-11

    Photo-z errors, especially catastrophic errors, are a major uncertainty for precision weak lensing cosmology. We find that the shear-(galaxy number) density and density-density cross correlation measurements between photo-z bins, available from the same lensing surveys, contain valuable information for self-calibration of the scattering probabilities between the true-z and photo-z bins. The self-calibration technique we propose does not rely on cosmological priors nor parameterization of the photo-z probability distribution function, and preserves all of the cosmological information available from shear-shear measurement. We estimate the calibration accuracy through the Fisher matrix formalism. We find that, for advanced lensing surveys such as the planned stage IV surveys, the rate of photo-z outliers can be determined with statistical uncertainties of 0.01-1% for z < 2 galaxies. Among the several sources of calibration error that we identify and investigate, the galaxy distribution bias is likely the most dominant systematic error, whereby photo-z outliers have different redshift distributions and/or bias than non-outliers from the same bin. This bias affects all photo-z calibration techniques based on correlation measurements. As a result, galaxy bias variations of O(0.1) produce biases in photo-z outlier rates similar to the statistical errors of our method, so this galaxy distribution bias may bias the reconstructed scatters at several-σ level, but is unlikely to completely invalidate the self-calibration technique.

  14. Adaptive Kalman filter for indoor localization using Bluetooth Low Energy and inertial measurement unit.

    PubMed

    Yoon, Paul K; Zihajehzadeh, Shaghayegh; Bong-Soo Kang; Park, Edward J

    2015-08-01

    This paper proposes a novel indoor localization method using Bluetooth Low Energy (BLE) and an inertial measurement unit (IMU). The multipath and non-line-of-sight errors from low-power wireless localization systems commonly result in outliers, affecting the positioning accuracy. We address this problem by adaptively weighting the estimates from the IMU and BLE in our proposed cascaded Kalman filter (KF). The positioning accuracy is further improved with the Rauch-Tung-Striebel smoother. The performance of the proposed algorithm is compared experimentally against that of the standard KF. The results show that the proposed algorithm can maintain high accuracy in tracking the sensor position in the presence of outliers.
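
    The outlier down-weighting idea can be illustrated with a toy scalar filter. The sketch below is a stand-in under stated assumptions (a random-walk state model and a simple innovation gate), not the paper's cascaded BLE/IMU filter: measurements whose normalized innovation exceeds the gate get their noise variance inflated and hence receive little weight.

        import numpy as np

        def adaptive_kf(zs, q=1e-3, r=1.0, gate=3.0):
            # Scalar random-walk Kalman filter with measurement-noise
            # inflation for improbably large innovations (outliers).
            x, p = zs[0], 1.0
            estimates = []
            for z in zs:
                p += q                              # predict
                nu = z - x                          # innovation
                s = p + r                           # nominal innovation variance
                r_eff = r * max(1.0, nu * nu / (gate * gate * s))
                k = p / (p + r_eff)                 # adaptive gain
                x += k * nu
                p *= 1.0 - k
                estimates.append(x)
            return np.array(estimates)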

  15. How recalibration method, pricing, and coding affect DRG weights

    PubMed Central

    Carter, Grace M.; Rogowski, Jeannette A.

    1992-01-01

    We compared diagnosis-related group (DRG) weights calculated using the hospital-specific relative-value (HSRV) methodology with those calculated using the standard methodology for each year from 1985 through 1989 and analyzed differences between the two methods in detail for 1989. We provide evidence suggesting that classification error and subsidies of higher weighted cases by lower weighted cases caused compression in the weights used for payment as late as the fifth year of the prospective payment system. However, later weights calculated by the standard method are not compressed because a statistical correlation between high markups and high case-mix indexes offsets the cross-subsidization. HSRV weights from the same files are compressed because this methodology is more sensitive to cross-subsidies. However, both sets of weights produce equally good estimates of hospital-level costs net of those expenses that are paid by outlier payments. The greater compression of the HSRV weights is counterbalanced by the fact that more high-weight cases qualify as outliers. PMID:10127456

  16. Robust group-wise rigid registration of point sets using t-mixture model

    NASA Astrophysics Data System (ADS)

    Ravikumar, Nishant; Gooya, Ali; Frangi, Alejandro F.; Taylor, Zeike A.

    2016-03-01

    A probabilistic framework is presented for robust, group-wise rigid alignment of point sets using a mixture of Student's t-distributions, especially when the point sets are of varying lengths, are corrupted by an unknown degree of outliers, or contain missing data. Medical images (in particular magnetic resonance (MR) images), their segmentations and consequently the point sets generated from these are highly susceptible to corruption by outliers. This poses a problem for robust correspondence estimation and accurate alignment of shapes, necessary for training statistical shape models (SSMs). To address these issues, this study proposes to use a t-mixture model (TMM) to approximate the underlying joint probability density of a group of similar shapes and align them to a common reference frame. The heavy-tailed nature of t-distributions provides a more robust registration framework than state-of-the-art algorithms. A significant reduction in alignment errors is achieved in the presence of outliers using the proposed TMM-based group-wise rigid registration method, in comparison to its Gaussian mixture model (GMM) counterparts. The proposed TMM framework is compared with a group-wise variant of the well-known Coherent Point Drift (CPD) algorithm and two other group-wise methods using GMMs, on both synthetic and real data sets. Rigid alignment errors for groups of shapes are quantified using the Hausdorff distance (HD) and quadratic surface distance (QSD) metrics.

  17. Improving the Space Surveillance Telescope's Performance Using Multi-Hypothesis Testing

    NASA Astrophysics Data System (ADS)

    Zingarelli, J. Chris; Pearce, Eric; Lambour, Richard; Blake, Travis; Peterson, Curtis J. R.; Cain, Stephen

    2014-05-01

    The Space Surveillance Telescope (SST) is a Defense Advanced Research Projects Agency program designed to detect objects in space like near Earth asteroids and space debris in the geosynchronous Earth orbit (GEO) belt. Binary hypothesis test (BHT) methods have historically been used to facilitate the detection of new objects in space. In this paper a multi-hypothesis detection strategy is introduced to improve the detection performance of SST. In this context, the multi-hypothesis testing (MHT) determines if an unresolvable point source is in either the center, a corner, or a side of a pixel in contrast to BHT, which only tests whether an object is in the pixel or not. The images recorded by SST are undersampled such as to cause aliasing, which degrades the performance of traditional detection schemes. The equations for the MHT are derived in terms of signal-to-noise ratio (S/N), which is computed by subtracting the background light level around the pixel being tested and dividing by the standard deviation of the noise. A new method for determining the local noise statistics that rejects outliers is introduced in combination with the MHT. An experiment using observations of a known GEO satellite are used to demonstrate the improved detection performance of the new algorithm over algorithms previously reported in the literature. The results show a significant improvement in the probability of detection by as much as 50% over existing algorithms. In addition to detection, the S/N results prove to be linearly related to the least-squares estimates of point source irradiance, thus improving photometric accuracy. The views expressed are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government.
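
    The robust local-noise step can be sketched simply: estimate the background and noise from a window around the tested pixel with sigma clipping, then form the S/N. This is an assumed illustration of the general idea, not the paper's exact estimator.

        import numpy as np

        def robust_noise_stats(window, k=3.0, n_iter=3):
            # Sigma-clipped background mean and noise sigma for a local
            # window; clipping rejects bright outlier pixels.
            w = np.asarray(window, dtype=float).ravel()
            for _ in range(n_iter):
                mu, sd = w.mean(), w.std()
                w = w[np.abs(w - mu) < k * sd]
            return w.mean(), w.std()

        # S/N of a tested pixel value I against its local surroundings:
        # mu, sd = robust_noise_stats(local_patch); snr = (I - mu) / sd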

  18. Mass Spectral Library Quality Assurance by Inter-Library Comparison

    NASA Astrophysics Data System (ADS)

    Wallace, William E.; Ji, Weihua; Tchekhovskoi, Dmitrii V.; Phinney, Karen W.; Stein, Stephen E.

    2017-04-01

    A method to discover and correct errors in mass spectral libraries is described. Comparing compounds that have the same chemical structure across a set of highly curated reference libraries quickly identifies entries that are outliers. In cases where three or more entries for the same compound are compared, the outlier as determined by visual inspection was almost always found to contain the error. These errors were either in the spectrum itself or in the chemical descriptors that accompanied it. The method is demonstrated on finding errors in compounds of forensic interest in the NIST/EPA/NIH Mass Spectral Library. The target list of compounds checked was the Scientific Working Group for the Analysis of Seized Drugs (SWGDRUG) mass spectral library. Some examples of errors found are described. A checklist of errors that curators should look for when performing inter-library comparisons is provided.

  19. Effects of rooting via out-groups on in-group topology in phylogeny.

    PubMed

    Ackerman, Margareta; Brown, Daniel G; Loker, David

    2014-01-01

    Users of phylogenetic methods require rooted trees, because the direction of time depends on the placement of the root. While phylogenetic trees are typically rooted by using an out-group, this mechanism is inappropriate when the addition of an out-group changes the in-group topology. We perform a formal analysis of phylogenetic algorithms under the inclusion of distant out-groups. It turns out that linkage-based algorithms (including UPGMA) and a class of bisecting methods do not modify the topology of the in-group when an out-group is included. By contrast, the popular neighbour joining algorithm fails this property in a strong sense: every data set can have its structure destroyed by some arbitrarily distant outlier. Furthermore, including multiple outliers can lead to an arbitrary topology on the in-group. The standard rooting approach that uses out-groups may be fundamentally unsuited for neighbour joining.

  20. Discriminant locality preserving projections based on L1-norm maximization.

    PubMed

    Zhong, Fujin; Zhang, Jiashu; Li, Defang

    2014-11-01

    Conventional discriminant locality preserving projection (DLPP) is a dimensionality reduction technique based on manifold learning, which has demonstrated good performance in pattern recognition. However, because its objective function is based on the distance criterion using L2-norm, conventional DLPP is not robust to outliers which are present in many applications. This paper proposes an effective and robust DLPP version based on L1-norm maximization, which learns a set of local optimal projection vectors by maximizing the ratio of the L1-norm-based locality preserving between-class dispersion and the L1-norm-based locality preserving within-class dispersion. The proposed method is proven to be feasible and also robust to outliers while overcoming the small sample size problem. The experimental results on artificial datasets, Binary Alphadigits dataset, FERET face dataset and PolyU palmprint dataset have demonstrated the effectiveness of the proposed method.

  1. New fuzzy support vector machine for the class imbalance problem in medical datasets classification.

    PubMed

    Gu, Xiaoqing; Ni, Tongguang; Wang, Hongyuan

    2014-01-01

    In medical dataset classification, the support vector machine (SVM) is considered one of the most successful methods. However, most real-world medical datasets contain outliers/noise and often suffer from class imbalance. In this paper, a fuzzy support vector machine (FSVM) for the class imbalance problem (called FSVM-CIP) is presented, which can be seen as a modified FSVM obtained by extending manifold regularization and assigning two misclassification costs to the two classes. The proposed FSVM-CIP can be used to handle the class imbalance problem in the presence of outliers/noise, and to enhance the locality maximum margin. Five real-world medical datasets, breast, heart, hepatitis, BUPA liver, and pima diabetes, from the UCI medical database are employed to illustrate the method presented in this paper. Experimental results on these datasets show the superior or comparable effectiveness of FSVM-CIP.

  2. Mass Spectral Library Quality Assurance by Inter-Library Comparison

    PubMed Central

    Wallace, W.E.; Ji, W.; Tchekhovskoi, D.V.; Phinney, K.W.; Stein, S.E.

    2017-01-01

    A method to discover and correct errors in mass spectral libraries is described. Comparing compounds that have the same chemical structure across a set of highly curated reference libraries quickly identifies entries that are outliers. In cases where three or more entries for the same compound are compared, the outlier as determined by visual inspection was almost always found to contain the error. These errors were either in the spectrum itself or in the chemical descriptors that accompanied it. The method is demonstrated on finding errors in compounds of forensic interest in the NIST/EPA/NIH Mass Spectral Library. The target list of compounds checked was the Scientific Working Group for the Analysis of Seized Drugs (SWGDRUG) mass spectral library. Some examples of errors found are described. A checklist of errors that curators should look for when performing inter-library comparisons is provided. PMID:28127680

  3. How Significant Is a Boxplot Outlier?

    ERIC Educational Resources Information Center

    Dawson, Robert

    2011-01-01

    It is common to consider Tukey's schematic ("full") boxplot as an informal test for the existence of outliers. While the procedure is useful, it should be used with caution, as at least 30% of samples from a normally-distributed population of any size will be flagged as containing an outlier, while for small samples (N less than 10) even extreme…
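
    The claim is easy to check by simulation. The short script below estimates how often a pure-normal sample is flagged by Tukey's 1.5 x IQR rule; the sample sizes and replication count are arbitrary choices.

        import numpy as np

        rng = np.random.default_rng(0)

        def has_boxplot_outlier(sample):
            # Tukey's rule: any point beyond 1.5 * IQR from the quartiles.
            q1, q3 = np.percentile(sample, [25, 75])
            fence = 1.5 * (q3 - q1)
            return np.any((sample < q1 - fence) | (sample > q3 + fence))

        for n in (10, 30, 100):
            rate = np.mean([has_boxplot_outlier(rng.normal(size=n))
                            for _ in range(10_000)])
            print(f"n={n}: fraction of normal samples flagged = {rate:.3f}")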

  4. The Impact of Outliers on Cronbach's Coefficient Alpha Estimate of Reliability: Visual Analogue Scales

    ERIC Educational Resources Information Center

    Liu, Yan; Zumbo, Bruno D.

    2007-01-01

    The impact of outliers on Cronbach's coefficient [alpha] has not been documented in the psychometric or statistical literature. This is an important gap because coefficient [alpha] is the most widely used measurement statistic in all of the social, educational, and health sciences. The impact of outliers on coefficient [alpha] is investigated for…

  5. 42 CFR 412.82 - Payment for extended length-of-stay cases (day outliers).

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... 42 Public Health 2 2012-10-01 2012-10-01 false Payment for extended length-of-stay cases (day outliers). 412.82 Section 412.82 Public Health CENTERS FOR MEDICARE & MEDICAID SERVICES, DEPARTMENT OF... Certain Replaced Devices Payment for Outlier Cases § 412.82 Payment for extended length-of-stay cases (day...

  6. 42 CFR 412.82 - Payment for extended length-of-stay cases (day outliers).

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... 42 Public Health 2 2013-10-01 2013-10-01 false Payment for extended length-of-stay cases (day outliers). 412.82 Section 412.82 Public Health CENTERS FOR MEDICARE & MEDICAID SERVICES, DEPARTMENT OF... Certain Replaced Devices Payment for Outlier Cases § 412.82 Payment for extended length-of-stay cases (day...

  7. 42 CFR 412.82 - Payment for extended length-of-stay cases (day outliers).

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 42 Public Health 2 2011-10-01 2011-10-01 false Payment for extended length-of-stay cases (day outliers). 412.82 Section 412.82 Public Health CENTERS FOR MEDICARE & MEDICAID SERVICES, DEPARTMENT OF... Certain Replaced Devices Payment for Outlier Cases § 412.82 Payment for extended length-of-stay cases (day...

  8. 42 CFR 412.82 - Payment for extended length-of-stay cases (day outliers).

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 42 Public Health 2 2010-10-01 2010-10-01 false Payment for extended length-of-stay cases (day outliers). 412.82 Section 412.82 Public Health CENTERS FOR MEDICARE & MEDICAID SERVICES, DEPARTMENT OF... Certain Replaced Devices Payment for Outlier Cases § 412.82 Payment for extended length-of-stay cases (day...

  9. Comparing 2 methods of assessing 30-day readmissions: what is the impact on hospital profiling in the veterans health administration?

    PubMed

    Mull, Hillary J; Chen, Qi; O'Brien, William J; Shwartz, Michael; Borzecki, Ann M; Hanchate, Amresh; Rosen, Amy K

    2013-07-01

    The Centers for Medicare and Medicaid Services' (CMS) all-cause readmission measure and the 3M Health Information System Division Potentially Preventable Readmissions (PPR) measure are both used for public reporting. These 2 methods have not been directly compared in terms of how they identify high-performing and low-performing hospitals. To examine how consistently the CMS and PPR methods identify performance outliers, and explore how the PPR preventability component impacts hospital readmission rates, public reporting on CMS' Hospital Compare website, and pay-for-performance under CMS' Hospital Readmission Reduction Program for 3 conditions (acute myocardial infarction, heart failure, and pneumonia). We applied the CMS all-cause model and the PPR software to VA administrative data to calculate 30-day observed FY08-10 VA hospital readmission rates and hospital profiles. We then tested the effect of preventability on hospital readmission rates and outlier identification for reporting and pay-for-performance by replacing the dependent variable in the CMS all-cause model (Yes/No readmission) with the dichotomous PPR outcome (Yes/No preventable readmission). The CMS and PPR methods had moderate correlations in readmission rates for each condition. After controlling for all methodological differences but preventability, correlations increased to >90%. The assessment of preventability yielded different outlier results for public reporting in 7% of hospitals; for 30% of hospitals there would be an impact on Hospital Readmission Reduction Program reimbursement rates. Despite uncertainty over which readmission measure is superior in evaluating hospital performance, we confirmed that there are differences in CMS-generated and PPR-generated hospital profiles for reporting and pay-for-performance, because of methodological differences and the PPR's preventability component.

  10. Initiating statistical process control to improve quality outcomes in colorectal surgery.

    PubMed

    Keller, Deborah S; Stulberg, Jonah J; Lawrence, Justin K; Samia, Hoda; Delaney, Conor P

    2015-12-01

    Unexpected variations in postoperative length of stay (LOS) negatively impact resources and patient outcomes. Statistical process control (SPC) measures performance, evaluates productivity, and modifies processes for optimal performance. The goal of this study was to initiate SPC to identify LOS outliers and evaluate its feasibility to improve outcomes in colorectal surgery. Review of a prospective database identified colorectal procedures performed by a single surgeon. Patients were grouped into elective and emergent categories and then stratified by laparoscopic and open approaches. All followed a standardized enhanced recovery protocol. SPC was applied to identify outliers and evaluate causes within each group. A total of 1294 cases were analyzed--83% elective (n = 1074) and 17% emergent (n = 220). Emergent cases were 70.5% open and 29.5% laparoscopic; elective cases were 36.8% open and 63.2% laparoscopic. All groups had a wide range in LOS. LOS outliers ranged from 8.6% (elective laparoscopic) to 10.8% (emergent laparoscopic). Evaluation of outliers demonstrated patient characteristics of higher ASA scores, longer operating times, ICU requirement, and temporary nursing at discharge. Outliers had higher postoperative complication rates in elective open (57.1 vs. 20.0%) and elective lap groups (77.6 vs. 26.1%). Outliers also had higher readmission rates for emergent open (11.4 vs. 5.4%), emergent lap (14.3 vs. 9.2%), and elective lap (32.8 vs. 6.9%). Elective open outliers did not follow trends of longer LOS or higher reoperation rates. SPC is feasible and promising for improving colorectal surgery outcomes. SPC identified patient and process characteristics associated with increased LOS. SPC may allow real-time outlier identification, during quality improvement efforts, and reevaluation of outcomes after introducing process change. SPC has clinical implications for improving patient outcomes and resource utilization.
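
    The abstract does not restate which control chart was used, so the sketch below assumes an individuals (XmR) chart, a common SPC choice for per-case length of stay: cases outside the natural process limits, the mean plus or minus 2.66 times the average moving range, are flagged as outliers.

        import numpy as np

        def xmr_outliers(los):
            # Individuals (XmR) chart: natural process limits are
            # mean +/- 2.66 * mean moving range (2.66 = 3 / d2, d2 = 1.128).
            los = np.asarray(los, dtype=float)
            mr_bar = np.abs(np.diff(los)).mean()
            lcl = los.mean() - 2.66 * mr_bar
            ucl = los.mean() + 2.66 * mr_bar
            return (los < lcl) | (los > ucl), (lcl, ucl)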

  11. Patient specific guides for total knee arthroplasty are ready for primetime

    PubMed Central

    Schotanus, Martijn GM; Boonen, Bert; Kort, Nanne P

    2016-01-01

    AIM: To present the radiological results of total knee arthroplasty (TKA) with the use of patient-specific matched guides (PSG) from different manufacturers in patients suffering from severe osteoarthritis of the knee joint. METHODS: This study describes the results of 57 knees operated with 4 different PSG systems and a group operated with conventional instrumentation (n = 60) by a single surgeon. The PSG systems were compared with each other and subdivided into cut- and pin-PSG. The biomechanical axis [hip-knee-ankle angle (HKA)], varus/valgus of the femoral [frontal femoral component (FFC)] and tibial [frontal tibial component] components, flexion/extension of the femoral component (LFC) and posterior slope of the tibial component [lateral tibial component (LTC)] were evaluated on long-leg standing and lateral X-rays. A deviation of > 3° was counted as an outlier. RESULTS: The intraclass correlation coefficient (ICC) revealed that radiographic measurements between both assessors were reliable (ICC > 0.8). Fisher's exact test was used to test differences of proportions. The percentage of outliers of the HKA axis was comparable between the PSG and conventional groups (12.28% vs 18.33%, P < 0.424) and between the cut- and pin-PSG groups (14.3% vs 10.3%, P < 1.00). The percentages of outliers of the FFC (0% vs 18.33%, P < 0.000), LFC (15.78% vs 58.33%, P < 0.000) and LTC (15.78% vs 41.67%, P < 0.033) were significantly different in favour of the PSG group. There were no significant differences regarding outliers between the individual PSG systems or within the PSG group subdivided into cut- and pin-PSG. CONCLUSION: PSG for TKA show significantly fewer outliers than the conventional technique. These single-surgeon results suggest that PSG are ready for primetime. PMID:26807358

  12. Low income, community poverty and risk of end stage renal disease.

    PubMed

    Crews, Deidra C; Gutiérrez, Orlando M; Fedewa, Stacey A; Luthi, Jean-Christophe; Shoham, David; Judd, Suzanne E; Powe, Neil R; McClellan, William M

    2014-12-04

    The risk of end stage renal disease (ESRD) is increased among individuals with low income and in low income communities. However, few studies have examined the relation of both individual and community socioeconomic status (SES) with incident ESRD. Among 23,314 U.S. adults in the population-based Reasons for Geographic and Racial Differences in Stroke study, we assessed participant differences across geospatially-linked categories of county poverty [outlier poverty, extremely high poverty, very high poverty, high poverty, neither (reference), high affluence and outlier affluence]. Multivariable Cox proportional hazards models were used to examine associations of annual household income and geospatially-linked county poverty measures with incident ESRD, while accounting for death as a competing event using the Fine and Gray method. There were 158 ESRD cases during follow-up. Incident ESRD rates were 178.8 per 100,000 person-years (10^5 py) in high poverty outlier counties and were 76.3/10^5 py in affluent outlier counties, p trend=0.06. In unadjusted competing risk models, persons residing in high poverty outlier counties had higher incidence of ESRD (which was not statistically significant) when compared to those persons residing in counties with neither high poverty nor affluence [hazard ratio (HR) 1.54, 95% Confidence Interval (CI) 0.75-3.20]. This association was markedly attenuated following adjustment for socio-demographic factors (age, sex, race, education, and income); HR 0.96, 95% CI 0.46-2.00. However, in the same adjusted model, income was independently associated with risk of ESRD [HR 3.75, 95% CI 1.62-8.64, comparing the <$20,000 income group to the >$75,000 group]. There were no statistically significant associations of county measures of poverty with incident ESRD, and no evidence of effect modification. In contrast to annual family income, geospatially-linked measures of county poverty have little relation with risk of ESRD. Efforts to mitigate socioeconomic disparities in kidney disease may be best appropriated at the individual level.

  13. On the predictability of outliers in ensemble forecasts

    NASA Astrophysics Data System (ADS)

    Siegert, S.; Bröcker, J.; Kantz, H.

    2012-03-01

    In numerical weather prediction, ensembles are used to retrieve probabilistic forecasts of future weather conditions. We consider events where the verification is smaller than the smallest, or larger than the largest ensemble member of a scalar ensemble forecast. These events are called outliers. In a statistically consistent K-member ensemble, outliers should occur with a base rate of 2/(K+1). In operational ensembles this base rate tends to be higher. We study the predictability of outlier events in terms of the Brier Skill Score and find that forecast probabilities can be calculated which are more skillful than the unconditional base rate. This is shown analytically for statistically consistent ensembles. Using logistic regression, forecast probabilities for outlier events in an operational ensemble are calculated. These probabilities exhibit positive skill which is quantitatively similar to the analytical results. Possible causes of these results as well as their consequences for ensemble interpretation are discussed.
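
    The 2/(K+1) base rate follows from exchangeability: in a statistically consistent ensemble the verification is equally likely to occupy any of the K+1 ranks defined by the K members, and two of those ranks lie outside the envelope. A quick simulation confirms it (K and the sample size are arbitrary choices):

        import numpy as np

        rng = np.random.default_rng(1)
        K, n = 20, 100_000
        ensemble = rng.normal(size=(n, K))  # statistically consistent ensemble
        obs = rng.normal(size=n)            # verification from the same law
        outlier = (obs < ensemble.min(axis=1)) | (obs > ensemble.max(axis=1))
        print(outlier.mean(), 2.0 / (K + 1))  # both approximately 0.095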

  14. Near-real time 3D probabilistic earthquakes locations at Mt. Etna volcano

    NASA Astrophysics Data System (ADS)

    Barberi, G.; D'Agostino, M.; Mostaccio, A.; Patane', D.; Tuve', T.

    2012-04-01

    Automatic procedures for locating earthquakes in quasi-real time must provide a good estimate of earthquake location within a few seconds after the event is first detected, and are strongly needed for seismic warning systems. The reliability of an automatic location algorithm is influenced by several factors such as errors in picking seismic phases, network geometry, and velocity model uncertainties. On Mt. Etna, the seismic network is managed by INGV and the quasi-real-time earthquake locations are performed by using an automatic picking algorithm based on short-term-average to long-term-average ratios (STA/LTA) calculated from an approximate squared envelope function of the seismogram, which furnishes a list of P-wave arrival times, and the location algorithm Hypoellipse with a 1D velocity model. The main purpose of this work is to investigate the performance of a different automatic procedure to improve the quasi-real-time earthquake locations. In fact, as the automatic data processing may be affected by outliers (wrong picks), the use of traditional earthquake location techniques based on a least-squares misfit function (L2-norm) often yields unstable and unreliable solutions. Moreover, on Mt. Etna, the 1D model is often unable to represent the complex structure of the volcano (in particular the strong lateral heterogeneities), whereas the increasing accuracy of 3D velocity models at Mt. Etna during recent years allows their use today in routine earthquake locations. Therefore, we selected as reference locations all the events that occurred on Mt. Etna in the last year (2011), which were automatically detected and located by means of the Hypoellipse code. Using this dataset (more than 300 events), we applied a nonlinear probabilistic earthquake location algorithm using the Equal Differential Time (EDT) likelihood function (Font et al., 2004; Lomax, 2005), which is much more robust in the presence of outliers in the data. Subsequently, by using a probabilistic nonlinear method (NonLinLoc, Lomax, 2001) and the 3D velocity model derived from the one developed by Patanè et al. (2006), integrated with that obtained by Chiarabba et al. (2004), we obtained the best possible constraint on the location of the foci, expressed as a probability density function (PDF) for the hypocenter location in 3D space. As expected, the obtained results, compared with the reference ones, show that the NonLinLoc software (applied to a 3D velocity model) is more reliable than the Hypoellipse code (applied to layered 1D velocity models), leading to more reliable automatic locations also when outliers are present.

  15. How Historical Information Can Improve Extreme Value Analysis of Coastal Water Levels

    NASA Astrophysics Data System (ADS)

    Le Cozannet, G.; Bulteau, T.; Idier, D.; Lambert, J.; Garcin, M.

    2016-12-01

    The knowledge of extreme coastal water levels is useful for coastal flooding studies or the design of coastal defences. While deriving such extremes with standard analyses using tide gauge measurements, one often needs to deal with limited effective duration of observation which can result in large statistical uncertainties. This is even truer when one faces outliers, those particularly extreme values distant from the others. In a recent work (Bulteau et al., 2015), we investigated how historical information of past events reported in archives can reduce statistical uncertainties and relativize such outlying observations. We adapted a Bayesian Markov Chain Monte Carlo method, initially developed in the hydrology field (Reis and Stedinger, 2005), to the specific case of coastal water levels. We applied this method to the site of La Rochelle (France), where the storm Xynthia in 2010 generated a water level considered so far as an outlier. Based on 30 years of tide gauge measurements and 8 historical events since 1890, the results showed a significant decrease in statistical uncertainties on return levels when historical information is used. Also, Xynthia's water level no longer appeared as an outlier and we could have reasonably predicted the annual exceedance probability of that level beforehand (predictive probability for 2010 based on data until the end of 2009 of the same order of magnitude as the standard estimative probability using data until the end of 2010). Such results illustrate the usefulness of historical information in extreme value analyses of coastal water levels, as well as the relevance of the proposed method to integrate heterogeneous data in such analyses.

  16. Detecting New Pedestrian Facilities from VGI Data Sources

    NASA Astrophysics Data System (ADS)

    Zhong, S.; Xie, Z.

    2017-12-01

    Pedestrian facility (e.g. footbridge, pedestrian crossing and underground passage) information is an important basic data source for location-based services (LBS) for pedestrians. However, keeping pedestrian facility information up to date is challenging because facilities change frequently. Collecting and updating pedestrian facility information has traditionally been done by highly trained specialists, an approach with several disadvantages such as high cost and a long update cycle. Volunteered Geographic Information (VGI) has proven efficient at providing new, free and fast-growing spatial data. Pedestrian trajectories, which can be seen as measurements of real pedestrian roads, are among the most valuable VGI data. Although the accuracy of the trajectories is not high, the large number of measurements allows an improvement in the quality of the road information. Thus, we develop a method for detecting new pedestrian facilities based on the current road network and pedestrian trajectories. Specifically, 1) by analyzing speed, distance and direction, outliers among the pedestrian trajectories are removed; 2) a road network matching algorithm is developed for eliminating redundant trajectories; and 3) a space-time cluster algorithm is adopted for detecting new walking facilities. The performance of the method is evaluated in a series of experiments conducted on part of the road network of Hefei and a large number of real pedestrian trajectories, with results verified against the Tencent Street map. The results show that the proposed method is able to detect new pedestrian facilities from VGI data accurately. We believe that the proposed method provides an alternative way for general road data acquisition, and can improve the quality of LBS for pedestrians.
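
    One plausible reading of the speed-based cleaning step is a greedy forward pass that drops fixes implying implausible pedestrian speeds; the threshold and the pass structure below are assumptions, not the authors' algorithm.

        import numpy as np

        def drop_speed_outliers(t, xy, vmax=3.0):
            # Keep a GPS fix only if the speed implied relative to the
            # previous kept fix is plausible for a pedestrian (vmax in m/s).
            keep = [0]
            for i in range(1, len(t)):
                j = keep[-1]
                speed = np.linalg.norm(xy[i] - xy[j]) / max(t[i] - t[j], 1e-9)
                if speed <= vmax:
                    keep.append(i)
            return np.asarray(keep)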

  17. Predictors of High Profit and High Deficit Outliers under SwissDRG of a Tertiary Care Center.

    PubMed

    Mehra, Tarun; Müller, Christian Thomas Benedikt; Volbracht, Jörk; Seifert, Burkhardt; Moos, Rudolf

    2015-01-01

    Case weights of Diagnosis Related Groups (DRGs) are determined by the average cost of cases from a previous billing period. However, a significant number of cases are largely over- or underfunded. We therefore decided to analyze the earning outliers of our hospital in search of predictors enabling better grouping under SwissDRG. 28,893 inpatient cases without additional private insurance discharged from our hospital in 2012 were included in our analysis. Outliers were defined by the interquartile range method. Predictors for deficit and profit outliers were determined with logistic regressions. Predictors were shortlisted with the LASSO regularized logistic regression method and compared to the results of a Random forest analysis. 10 of these parameters were selected for quantile regression analysis to quantify their impact on earnings. Psychiatric diagnosis and admission as an emergency case were significant predictors of a higher deficit, with negative regression coefficients for all analyzed quantiles (p<0.001). Admission from an external health care provider was a significant predictor of a higher deficit in all but the 90% quantile (p<0.001 for Q10, Q20, Q50, Q80 and p = 0.0017 for Q90). Burns predicted higher earnings for cases which were favorably remunerated (p<0.001 for the 90% quantile). Osteoporosis predicted a higher deficit in the most underfunded cases, but did not predict differences in earnings for balanced or profitable cases (Q10 and Q20: p<0.001, Q50: p = 0.10, Q80: p = 0.88 and Q90: p = 0.52). ICU stay, mechanical ventilation and the patient clinical complexity level (PCCL) score predicted higher losses at the 10% quantile but also higher profits at the 90% quantile (p<0.001). We suggest considering psychiatric diagnosis, admission as an emergency case and admission from an external health care provider as DRG split criteria, as they predict large, consistent and significant losses.
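
    The interquartile-range definition of profit and deficit outliers is straightforward to sketch; the k = 1.5 fence multiplier below is the conventional choice and is an assumption, since the abstract does not restate it.

        import numpy as np

        def iqr_outliers(earnings, k=1.5):
            # Cases below Q1 - k*IQR are deficit outliers; cases above
            # Q3 + k*IQR are profit outliers.
            q1, q3 = np.percentile(earnings, [25, 75])
            iqr = q3 - q1
            return earnings < q1 - k * iqr, earnings > q3 + k * iqr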

  18. Dispersal of emerald ash borer at outlier sites: three case studies

    Treesearch

    Deborah G. McCullough; Nathan W. Siegert; Therese M. Poland; David L. Cappaert; Ivich Fraser; David Williams

    2005-01-01

    We worked with cooperators from several state and federal agencies in 2003 and 2004 to assess dispersal of emerald ash borer (EAB), Agrilus planipennis Fairmaire, from known source points in three outlier sites. In February 2003, we felled and sampled more than 200 ash trees at an outlier site near Tipton, Michigan, where one generation of adult...

  19. Neural Mechanisms Behind Identification of Leptokurtic Noise and Adaptive Behavioral Response

    PubMed Central

    d'Acremont, Mathieu; Bossaerts, Peter

    2016-01-01

    Large-scale human interaction through, for example, financial markets causes ceaseless random changes in outcome variability, producing frequent and salient outliers that render the outcome distribution more peaked than the Gaussian distribution, and with longer tails. Here, we study how humans cope with this evolutionarily novel leptokurtic noise, focusing on the neurobiological mechanisms that allow the brain 1) to recognize the outliers as noise and 2) to regulate the control necessary for an adaptive response. We used functional magnetic resonance imaging while participants tracked a target whose movements were affected by leptokurtic noise. After initial overreaction and insufficient subsequent correction, participants improved performance significantly. Yet, persistently long reaction times pointed to a continued need for vigilance and control. We ran a contrasting treatment where outliers reflected permanent moves of the target, as in traditional mean-shift paradigms. Importantly, outliers were equally frequent and salient. There, control was superior and reaction time was faster. We present a novel reinforcement learning model that fits observed choices better than the Bayes-optimal model. Only the anterior insula discriminated between the 2 types of outliers. In both treatments, outliers initially activated an extensive bottom-up attention and belief network, followed by sustained engagement of the fronto-parietal control network. PMID:26850528

  20. Results of continuous monitoring of the performance of rubella virus IgG and hepatitis B virus surface antibody assays using trueness controls in a multicenter trial.

    PubMed

    Kruk, Tamara; Ratnam, Sam; Preiksaitis, Jutta; Lau, Allan; Hatchette, Todd; Horsman, Greg; Van Caeseele, Paul; Timmons, Brian; Tipples, Graham

    2012-10-01

    We conducted a multicenter trial in Canada to assess the value of using trueness controls (TC) for rubella virus IgG and hepatitis B virus surface antibody (anti-HBs) serology to determine test performance across laboratories over time. TC were obtained from a single source with known international units. Seven laboratories using different test systems and kit lots included the TC in routine assay runs of the analytes. TC measurements of 1,095 rubella virus IgG and 1,195 anti-HBs runs were plotted on Levey-Jennings control charts for individual laboratories and analyzed using a multirule quality control (MQC) scheme as well as a single three-standard-deviation (3-SD) rule. All rubella virus IgG TC results were "in control" in only one of the seven laboratories. Among the rest, "out-of-control" results ranged from 5.6% to 10% with an outlier at 20.3% by MQC and from 1.1% to 5.6% with an outlier at 13.4% by the 3-SD rule. All anti-HBs TC results were "in control" in only two laboratories. Among the rest, "out-of-control" results ranged from 3.3% to 7.9% with an outlier at 19.8% by MQC and from 0% to 3.3% with an outlier at 10.5% by the 3-SD rule. In conclusion, through the continuous monitoring of assay performance using TC and quality control rules, our trial detected significant intra- and interlaboratory, test system, and kit lot variations for both analytes. In most cases the assay rejections could be attributable to the laboratories rather than to kit lots. This has implications for routine diagnostic screening and clinical practice guidelines and underscores the value of using an approach as described above for continuous quality improvement in result reporting and harmonization for these analytes.
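
    The single 3-SD rule is simple to sketch; a full Westgard multirule scheme adds further checks (e.g., 2-2s, R-4s) on the same control-chart z-scores. The names and inputs below are illustrative assumptions.

        import numpy as np

        def three_sd_flags(tc_values, target_mean, target_sd):
            # Levey-Jennings single-rule check: flag trueness-control runs
            # whose result falls outside target_mean +/- 3 * target_sd.
            z = (np.asarray(tc_values, dtype=float) - target_mean) / target_sd
            return np.abs(z) > 3.0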

  1. A Self-Aware Machine Platform in Manufacturing Shop Floor Utilizing MTConnect Data

    DTIC Science & Technology

    2014-10-02

    condition monitoring, and equipment time to failure prediction in manufacturing 1 ANNUAL CONFERENCE OF THE PROGNOSTICS AND HEALTH MANAGEMENT SOCIETY 2014 589...Component Level Health Monitoring and Prediction One of the characteristics of a self-aware machine is to be able to detect its components...the annual conference of the prognostics and health management society. Filzmoser, P., Garrett, R. G., & Reimann, C. (2005). Multivariate outlier

  2. A Multi Agent System for Flow-Based Intrusion Detection

    DTIC Science & Technology

    2013-03-01

    Student t-test, as it is less likely to spuriously indicate significance because of the presence of outliers [128]. We use the MATLAB ranksum function [77...effectiveness of self-organization and “entangled hierarchies” for accomplishing scenario objectives. One of the interesting features of SOMAS is the ability...cross-validation and automatic model selection. It has interfaces for Java, Python, R, Splus, MATLAB, Perl, Ruby, and LabVIEW. Kernels: linear
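
    The snippet's preference for the rank-sum test over the Student t-test is easy to demonstrate. A minimal sketch using SciPy's equivalents of MATLAB's ranksum, on synthetic data (not the thesis's flow features):

    ```python
    import numpy as np
    from scipy.stats import ranksums, ttest_ind

    rng = np.random.default_rng(1)
    a = rng.normal(0.0, 1.0, 50)
    b = rng.normal(0.0, 1.0, 50)
    b[0] = 40.0  # a single extreme outlier in one sample

    # The t-test statistic is moment-based, so one outlier can distort it;
    # the rank-sum test depends only on orderings and barely moves.
    print(ttest_ind(a, b, equal_var=False).pvalue)
    print(ranksums(a, b).pvalue)
    ```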

  3. Temporal interpolation alters motion in fMRI scans: Magnitudes and consequences for artifact detection.

    PubMed

    Power, Jonathan D; Plitt, Mark; Kundu, Prantik; Bandettini, Peter A; Martin, Alex

    2017-01-01

    Head motion can be estimated at any point of fMRI image processing. Processing steps involving temporal interpolation (e.g., slice time correction or outlier replacement) often precede motion estimation in the literature. From first principles it can be anticipated that temporal interpolation will alter head motion in a scan. Here we demonstrate this effect and its consequences in five large fMRI datasets. Estimated head motion was reduced by 10-50% or more following temporal interpolation, and reductions were often visible to the naked eye. Such reductions make the data seem to be of improved quality. Such reductions also degrade the sensitivity of analyses aimed at detecting motion-related artifact and can cause a dataset with artifact to falsely appear artifact-free. These reduced motion estimates will be particularly problematic for studies needing estimates of motion in time, such as studies of dynamics. Based on these findings, it is sensible to obtain motion estimates prior to any image processing (regardless of subsequent processing steps and the actual timing of motion correction procedures, which need not be changed). We also find that outlier replacement procedures change signals almost entirely during times of motion and therefore have notable similarities to motion-targeting censoring strategies (which withhold or replace signals entirely during times of motion).
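
    The abstract does not specify a motion summary; a common one is framewise displacement computed from the six realignment parameters (an assumption here, following the convention of summing absolute backward differences with rotations converted at a 50 mm head radius). A minimal sketch with a hypothetical parameter array:

    ```python
    import numpy as np

    def framewise_displacement(params, head_radius=50.0):
        """Framewise displacement from 6 realignment parameters per volume.

        params: (T, 6) array of [trans_x, trans_y, trans_z (mm),
                rot_x, rot_y, rot_z (radians)].
        Rotations become mm of arc on a sphere of head_radius mm.
        """
        p = np.array(params, dtype=float)     # copy so the caller's array is untouched
        p[:, 3:] *= head_radius               # radians -> mm
        return np.abs(np.diff(p, axis=0)).sum(axis=1)

    # hypothetical realignment output for 5 volumes
    rp = np.array([[0.0, 0.0, 0.0, 0.0,   0.0,   0.0],
                   [0.1, 0.0, 0.0, 0.001, 0.0,   0.0],
                   [0.1, 0.3, 0.0, 0.001, 0.0,   0.0],
                   [0.1, 0.3, 0.0, 0.002, 0.002, 0.0],
                   [0.2, 0.3, 0.1, 0.002, 0.002, 0.0]])
    print(framewise_displacement(rp))
    ```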

  5. Feature Detection in SAR Interferograms With Missing Data Displays Fault Slip Near El Mayor-Cucapah and South Napa Earthquakes

    NASA Astrophysics Data System (ADS)

    Parker, J. W.; Donnellan, A.; Glasscoe, M. T.; Stough, T.

    2015-12-01

    Edge detection identifies seismic or aseismic fault motion, as demonstrated in repeat-pass interferograms obtained by the Uninhabited Aerial Vehicle Synthetic Aperture Radar (UAVSAR) program. But this identification, demonstrated in 2010, was not robust: for best results, it requires a flattened background image, interpolation into missing data (holes) and outliers, and background noise that is either sufficiently small or roughly white Gaussian. Proper treatment of missing data, bursting noise patches, and tiny noise differences at short distances from bursts is essential to creating an acceptably reliable method sensitive to small near-surface fractures. Clearly a robust method is needed for machine scanning of the thousands of UAVSAR repeat-pass interferograms for evidence of fault slip, landslides, and other local features: hand-crafted intervention will not do. Effective methods of identifying, removing and filling in bad pixels reveal significant features of surface fractures. A rich network of edges (probably fractures and subsidence) in difference images spanning the South Napa earthquake gives way to a simple set of postseismically slipping faults. Coseismic El Mayor-Cucapah interferograms compared to post-seismic difference images show nearly disjoint patterns of surface fractures in California's Sonoran Desert; the combined pattern reveals a network of near-perpendicular, probably conjugate faults not mapped before the earthquake. The current algorithms for UAVSAR interferogram edge detection are shown to be effective in difficult environments, including agricultural (Napa, Imperial Valley) and urban (Orange County) areas.
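
    The abstract names the required preprocessing (outlier removal, hole filling) without giving an algorithm. A minimal sketch of one such pipeline under assumed conventions (NaNs mark holes, MAD-based outlier screening, nearest-valid-pixel filling, Sobel gradients), not the authors' implementation:

    ```python
    import numpy as np
    from scipy import ndimage

    def clean_and_detect_edges(ifg, z_thresh=5.0):
        """Fill holes and outliers in an interferogram image, then edge-detect.

        ifg: 2-D array of phase values with NaNs marking missing data (holes).
        """
        img = np.array(ifg, dtype=float)
        finite = np.isfinite(img)

        # 1) Demote gross outliers to missing, via a robust (median/MAD) z-score.
        med = np.median(img[finite])
        mad = np.median(np.abs(img[finite] - med)) + 1e-12
        z = np.zeros_like(img)
        z[finite] = np.abs(img[finite] - med) / (1.4826 * mad)
        img[z > z_thresh] = np.nan

        # 2) Fill every missing pixel with its nearest valid pixel's value.
        idx = ndimage.distance_transform_edt(~np.isfinite(img),
                                             return_distances=False,
                                             return_indices=True)
        filled = img[tuple(idx)]

        # 3) Gradient-magnitude edge image on the filled field.
        gx, gy = ndimage.sobel(filled, axis=0), ndimage.sobel(filled, axis=1)
        return np.hypot(gx, gy)
    ```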

  6. Robust Approach to Verifying the Weak Form of the Efficient Market Hypothesis

    NASA Astrophysics Data System (ADS)

    Střelec, Luboš

    2011-09-01

    The weak form of the efficient markets hypothesis states that prices incorporate only past information about the asset. An implication of this form of the hypothesis is that one cannot detect mispriced assets and consistently outperform the market through technical analysis of past prices. One possible formulation of the efficient market hypothesis used for weak-form tests is that share prices follow a random walk, i.e., that returns are realizations of an IID sequence of random variables. Consequently, for verifying the weak form of the efficient market hypothesis, we can use, among other tools, distribution tests such as tests of normality and graphical methods. Many procedures for testing the normality of univariate samples have been proposed in the literature [7]. Today the most popular omnibus test of normality for general use is the Shapiro-Wilk test. The Jarque-Bera test is the most widely adopted omnibus test of normality in econometrics and related fields; in particular, the Jarque-Bera test (based on the classical measures of skewness and kurtosis) is frequently used when one is more concerned about heavy-tailed alternatives. As these measures are based on moments of the data, the test has a zero breakdown value [2]: in other words, a single outlier can make the test worthless. The reason so many classical procedures are nonrobust to outliers is that the parameters of the model are expressed in terms of moments, and their classical estimators are expressed in terms of sample moments, which are very sensitive to outliers. Another approach to robustness is to concentrate on the parameters of interest suggested by the problem under study. Consequently, novel robust procedures for testing normality are presented in this paper to overcome the shortcomings of classical normality tests on financial data, which typically exhibit remote data points and other deviations from normality. This study also discusses results of simulation power studies of these tests against selected alternatives. Based on the outcome of the power simulation study, selected normality tests were then used to verify the weak form of efficiency in Central European stock markets.
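
    The zero breakdown value is simple to exhibit: a single contaminated observation flips the Jarque-Bera verdict. A minimal sketch on simulated IID Gaussian "returns":

    ```python
    import numpy as np
    from scipy.stats import jarque_bera

    rng = np.random.default_rng(2)
    returns = rng.normal(0.0, 0.01, 1000)   # simulated IID Gaussian returns

    print(jarque_bera(returns).pvalue)       # large: normality not rejected

    returns[0] = 0.25                        # one extreme return (a single outlier)
    print(jarque_bera(returns).pvalue)       # ~0: one point alone drives rejection
    ```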

  7. Exploring signatures of positive selection in pigmentation candidate genes in populations of East Asian ancestry

    PubMed Central

    2013-01-01

    Background Currently, there is very limited knowledge about the genes involved in normal pigmentation variation in East Asian populations. We carried out a genome-wide scan of signatures of positive selection using the 1000 Genomes Phase I dataset, in order to identify pigmentation genes showing putative signatures of selective sweeps in East Asia. We applied a broad range of methods to detect signatures of selection including: 1) Tests designed to identify deviations of the Site Frequency Spectrum (SFS) from neutral expectations (Tajima’s D, Fay and Wu’s H and Fu and Li’s D* and F*), 2) Tests focused on the identification of high-frequency haplotypes with extended linkage disequilibrium (iHS and Rsb) and 3) Tests based on genetic differentiation between populations (LSBL). Based on the results obtained from a genome-wide analysis of 25 kb windows, we constructed an empirical distribution for each statistic across all windows, and identified pigmentation genes that are outliers in the distribution. Results Our tests identified twenty genes that are relevant for pigmentation biology. Of these, eight genes (ATRN, EDAR, KLHL7, MITF, OCA2, TH, TMEM33 and TRPM1) were extreme outliers (top 0.1% of the empirical distribution) for at least one statistic, and twelve genes (ADAM17, BNC2, CTSD, DCT, EGFR, LYST, MC1R, MLPH, OPRM1, PDIA6, PMEL (SILV) and TYRP1) were in the top 1% of the empirical distribution for at least one statistic. Additionally, eight of these genes (BNC2, EGFR, LYST, MC1R, OCA2, OPRM1, PMEL (SILV) and TYRP1) have been associated with pigmentary traits in association studies. Conclusions We identified a number of putative pigmentation genes showing extremely unusual patterns of genetic variation in East Asia. Most of these genes are outliers for different tests and/or different populations, and have already been described in previous scans for positive selection, providing strong support to the hypothesis that recent selective sweeps left a signature in these regions. However, it will be necessary to carry out association and functional studies to demonstrate the implication of these genes in normal pigmentation variation. PMID:23848512
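
    The outlier criterion here is purely empirical: tail quantiles of the genome-wide distribution of each statistic. A minimal sketch with hypothetical window statistics (for statistics such as Tajima's D, where sweeps push values low, the lower tail would be used instead):

    ```python
    import numpy as np

    def empirical_outlier_windows(stat_values, top=0.001):
        """Indices of windows in the extreme upper tail of a genome-wide statistic.

        stat_values: one selection statistic (e.g., |iHS|) per 25 kb window.
        top: tail fraction (0.001 = top 0.1%, the 'extreme outlier' cutoff).
        """
        stat_values = np.asarray(stat_values, dtype=float)
        cutoff = np.quantile(stat_values, 1.0 - top)
        return np.flatnonzero(stat_values >= cutoff), cutoff

    # hypothetical statistics for 120,000 windows
    stats = np.random.default_rng(3).normal(size=120_000)
    idx, cut = empirical_outlier_windows(stats, top=0.001)
    print(len(idx), cut)  # ~120 windows beyond the empirical 99.9th percentile
    ```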

  9. Increased frequencies of aberrant sperm as indicators of mutagenic damage in mice.

    PubMed

    Soares, E R; Sheridan, W; Haseman, J K; Segall, M

    1979-02-01

    We have tested the effects of TEM in 3 strains of mice using the sperm morphology assay. In addition, we have made an attempt to evaluate this test system with respect to experimental design, statistical problems and possible interlaboratory differences. Treatment with TEM results in significant increases in the percent of abnormally shaped sperm. These increases are readily detectable in sperm exposed during the spermatocyte and spermatogonial stages. Our data indicate possible problems associated with interlaboratory variation in slide analysis. We have found that despite the introduction of such sources of variation, our data were consistent with respect to the effects of TEM. Another area of concern in the sperm morphology test is the presence of "outlier" animals. In our study, such animals comprised 4% of the total number of animals considered. Statistical analysis of the slides from these animals has shown that this problem can be dealt with and that, when recognized as such, "outliers" do not affect the outcome of the sperm morphology assay.

  10. A computational study on outliers in world music

    PubMed Central

    Benetos, Emmanouil; Dixon, Simon

    2017-01-01

    The comparative analysis of world music cultures has been the focus of several ethnomusicological studies in the last century. With the advances of Music Information Retrieval and the increased accessibility of sound archives, large-scale analysis of world music with computational tools is today feasible. We investigate music similarity in a corpus of 8200 recordings of folk and traditional music from 137 countries around the world. In particular, we aim to identify music recordings that are most distinct compared to the rest of our corpus. We refer to these recordings as ‘outliers’. We use signal processing tools to extract music information from audio recordings, data mining to quantify similarity and detect outliers, and spatial statistics to account for geographical correlation. Our findings suggest that Botswana is the country with the most distinct recordings in the corpus and China is the country with the most distinct recordings when considering spatial correlation. Our analysis includes a comparison of musical attributes and styles that contribute to the ‘uniqueness’ of the music of each country. PMID:29253027
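
    The paper's exact features and similarity measure are not reproduced here; a minimal sketch of one common distance-based recipe (score each recording by its mean distance to its k nearest neighbours, with assumed descriptor vectors standing in for the audio features):

    ```python
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def knn_outlier_scores(features, k=10):
        """Outlier score per recording: mean distance to its k nearest neighbours.

        features: (n_recordings, n_dims) array of audio descriptors
        (hypothetical here, not the paper's feature set).
        """
        nn = NearestNeighbors(n_neighbors=k + 1).fit(features)
        dist, _ = nn.kneighbors(features)     # column 0 is the point itself
        return dist[:, 1:].mean(axis=1)

    # hypothetical corpus: 8200 recordings, 64-dim descriptors
    X = np.random.default_rng(4).normal(size=(8200, 64))
    scores = knn_outlier_scores(X)
    outliers = np.flatnonzero(scores > np.quantile(scores, 0.99))
    print(len(outliers))   # the most distinct ~1% of recordings
    ```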

  11. Hot spots, cluster detection and spatial outlier analysis of teen birth rates in the U.S., 2003-2012.

    PubMed

    Khan, Diba; Rossen, Lauren M; Hamilton, Brady E; He, Yulei; Wei, Rong; Dienes, Erin

    2017-06-01

    Teen birth rates have evidenced a significant decline in the United States over the past few decades. Most of the states in the US have mirrored this national decline, though some reports have illustrated substantial variation in the magnitude of these decreases across the U.S. Importantly, geographic variation at the county level has largely not been explored. We used National Vital Statistics Births data and Hierarchical Bayesian space-time interaction models to produce smoothed estimates of teen birth rates at the county level from 2003-2012. Results indicate that teen birth rates show evidence of clustering, where hot and cold spots occur, and identify spatial outliers. Findings from this analysis may help inform prevention efforts by illustrating how geographic patterns of teen birth rates have changed over the past decade and where clusters of high or low teen birth rates are evident. Published by Elsevier Ltd.
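
    The abstract does not state its spatial-outlier statistic; local Moran's I is a standard choice, where strongly negative values mark areas unlike their neighbours and positive values mark clusters. A minimal, unscaled sketch with a toy weights matrix (not the paper's model):

    ```python
    import numpy as np

    def local_morans_i(rates, W):
        """Simplified local Moran's I per area: z_i * (row-standardized spatial
        lag of z)_i. Negative values flag spatial outliers."""
        z = (rates - rates.mean()) / rates.std()
        return z * (W @ z)

    # toy example: 4 areas on a line, one high-rate area among low-rate neighbours
    rates = np.array([10.0, 11.0, 40.0, 9.0])
    W = np.array([[0.0, 1.0, 0.0, 0.0],
                  [0.5, 0.0, 0.5, 0.0],
                  [0.0, 0.5, 0.0, 0.5],
                  [0.0, 0.0, 1.0, 0.0]])
    # the high-rate area (index 2) and its discordant neighbour score negative
    print(local_morans_i(rates, W))
    ```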

  13. Damage localization by statistical evaluation of signal-processed mode shapes

    NASA Astrophysics Data System (ADS)

    Ulriksen, M. D.; Damkilde, L.

    2015-07-01

    Due to their inherent ability to provide structural information on a local level, mode shapes and their derivatives are utilized extensively for structural damage identification. Typically, more or less advanced mathematical methods are implemented to identify damage-induced discontinuities in the spatial mode shape signals, thereby potentially facilitating damage detection and/or localization. However, by being based on distinguishing damage-induced discontinuities from other signal irregularities, an intrinsic deficiency in these methods is the high sensitivity towards measurement noise. The present article introduces a damage localization method which, compared to the conventional mode shape-based methods, has greatly enhanced robustness towards measurement noise. The method is based on signal processing of spatial mode shapes by means of continuous wavelet transformation (CWT) and subsequent application of a generalized discrete Teager-Kaiser energy operator (GDTKEO) to identify damage-induced mode shape discontinuities. In order to evaluate whether the identified discontinuities are in fact damage-induced, outlier analysis of principal components of the signal-processed mode shapes is conducted on the basis of T2-statistics. The proposed method is demonstrated in the context of analytical work with a free-vibrating Euler-Bernoulli beam under noisy conditions.
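
    The GDTKEO itself is not specified in the abstract, but the basic discrete Teager-Kaiser energy operator it generalizes is three lines, and already exposes a slope discontinuity that is nearly invisible in the raw mode shape. A minimal sketch on a synthetic beam mode:

    ```python
    import numpy as np

    def tkeo(x):
        """Discrete Teager-Kaiser energy operator:
        psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
        x = np.asarray(x, dtype=float)
        return x[1:-1] ** 2 - x[:-2] * x[2:]

    # mode-shape-like signal with a tiny damage-induced slope discontinuity
    s = np.linspace(0.0, 1.0, 201)
    shape = np.sin(np.pi * s)
    shape[100:] += 0.002 * (s[100:] - s[100])   # barely visible kink at midspan

    psi = tkeo(shape)
    # the operator spikes at the kink, far above its smooth baseline
    print(np.argmax(np.abs(psi - np.median(psi))) + 1)   # ~100
    ```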

  14. Least Median of Squares Filtering of Locally Optimal Point Matches for Compressible Flow Image Registration

    PubMed Central

    Castillo, Edward; Castillo, Richard; White, Benjamin; Rojo, Javier; Guerrero, Thomas

    2012-01-01

    Compressible flow based image registration operates under the assumption that the mass of the imaged material is conserved from one image to the next. Depending on how the mass conservation assumption is modeled, the performance of existing compressible flow methods is limited by factors such as image quality, noise, large magnitude voxel displacements, and computational requirements. The Least Median of Squares Filtered Compressible Flow (LFC) method introduced here is based on a localized, nonlinear least squares, compressible flow model that describes the displacement of a single voxel and lends itself to a simple grid search (block matching) optimization strategy. Spatially inaccurate grid search point matches, corresponding to erroneous local minimizers of the nonlinear compressible flow model, are removed by a novel filtering approach based on least median of squares fitting and the forward search outlier detection method. The spatial accuracy of the method is measured using ten thoracic CT image sets and large samples of expert-determined landmarks (available at www.dir-lab.com). The LFC method produces an average error within the intra-observer error on eight of the ten cases, indicating that the method is capable of achieving a high spatial accuracy for thoracic CT registration. PMID:22797602
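
    Least median of squares has no closed form and is usually approximated by random sampling of minimal subsets. A minimal sketch for a line fit (a stand-in for the paper's compressible-flow model) illustrating the tolerance to up to ~50% outlying point matches:

    ```python
    import numpy as np

    def lmeds_line(points, n_trials=500, rng=None):
        """Least-median-of-squares fit of y = a*x + b by random sampling:
        keep the candidate minimizing the MEDIAN squared residual."""
        rng = rng or np.random.default_rng(0)
        pts = np.asarray(points, dtype=float)
        best, best_med = None, np.inf
        for _ in range(n_trials):
            i, j = rng.choice(len(pts), size=2, replace=False)
            if pts[j, 0] == pts[i, 0]:
                continue                      # degenerate minimal sample
            a = (pts[j, 1] - pts[i, 1]) / (pts[j, 0] - pts[i, 0])
            b = pts[i, 1] - a * pts[i, 0]
            med = np.median((pts[:, 1] - (a * pts[:, 0] + b)) ** 2)
            if med < best_med:
                best, best_med = (a, b), med
        return best

    # 70% good matches on y = 2x + 1, 30% gross errors
    rng = np.random.default_rng(5)
    x = rng.uniform(0, 10, 100)
    y = 2 * x + 1 + rng.normal(0, 0.1, 100)
    y[:30] += rng.uniform(20, 50, 30)           # erroneous point matches
    print(lmeds_line(np.column_stack([x, y])))  # ~ (2.0, 1.0)
    ```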

  15. Slope angle estimation method based on sparse subspace clustering for probe safe landing

    NASA Astrophysics Data System (ADS)

    Li, Haibo; Cao, Yunfeng; Ding, Meng; Zhuang, Likui

    2018-06-01

    To avoid planetary probes landing on steep slopes where they may slip or tip over, a new method of slope angle estimation based on sparse subspace clustering is proposed to improve accuracy. First, a coordinate system is defined and established to describe the measured data of light detection and ranging (LIDAR). Second, these data are processed and expressed with a sparse representation. Third, on this basis, the data are clustered to determine which subspace each point belongs to. Fourth, after eliminating outliers within each subspace, the remaining points are used to fit planes. Finally, the vectors normal to the planes are obtained from the plane model, and the angle between the normal vectors is computed; by the geometric relationship, this angle is equal in value to the slope angle. The proposed method was tested in a series of experiments. The experimental results show that this method can effectively estimate the slope angle, overcoming the influence of noise to obtain an accurate slope angle. Compared with other methods, this method can minimize the measuring errors and further improve the estimation accuracy of the slope angle.
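
    The final steps reduce to fitting planes and comparing their normals. A minimal sketch (SVD plane fit on hypothetical LIDAR points; the sparse-representation and clustering steps are omitted):

    ```python
    import numpy as np

    def plane_normal(points):
        """Unit normal of the best-fit plane through 3-D points (via SVD)."""
        pts = np.asarray(points, dtype=float)
        centered = pts - pts.mean(axis=0)
        # right singular vector with the smallest singular value = plane normal
        return np.linalg.svd(centered)[2][-1]

    def slope_angle_deg(ground_pts, slope_pts):
        """Angle between the two fitted plane normals = slope angle (degrees)."""
        n1, n2 = plane_normal(ground_pts), plane_normal(slope_pts)
        c = abs(n1 @ n2) / (np.linalg.norm(n1) * np.linalg.norm(n2))
        return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

    # hypothetical LIDAR returns: flat ground and a 20-degree incline
    rng = np.random.default_rng(6)
    xy = rng.uniform(-1, 1, (200, 2))
    ground = np.column_stack([xy, np.zeros(200)])
    slope = np.column_stack([xy, np.tan(np.radians(20)) * xy[:, 0]])
    print(slope_angle_deg(ground, slope))   # ~20
    ```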

  16. Robust estimation approach for blind denoising.

    PubMed

    Rabie, Tamer

    2005-11-01

    This work develops a new robust statistical framework for blind image denoising. Robust statistics addresses the problem of estimation when the idealized assumptions about a system are occasionally violated. The contaminating noise in an image is considered as a violation of the assumption of spatial coherence of the image intensities and is treated as an outlier random variable. A denoised image is estimated by fitting a spatially coherent stationary image model to the available noisy data using a robust estimator-based regression method within an optimal-size adaptive window. The robust formulation aims at eliminating the noise outliers while preserving the edge structures in the restored image. Several examples demonstrating the effectiveness of this robust denoising technique are reported and a comparison with other standard denoising filters is presented.

  17. Rapid amino acid quantitation with pre-column derivatization; ultra-performance reverse phase liquid chromatography and single quadrupole mass spectrometry.

    PubMed

    Pretorius, Carel J; McWhinney, Brett C; Sipinkoski, Bilyana; Wilce, Alice; Cox, David; McWhinney, Avis; Ungerer, Jacobus P J

    2018-03-01

    We optimized a quantitative amino acid method with pre-column derivatization, norvaline (nva) internal standard and reverse phase ultra-performance liquid chromatography by replacing the ultraviolet detector with a single quadrupole mass spectrometer (MSnva). We used 13C/15N isotopically labeled amino acid internal standards and a C18 column with 1.6 μm particles to optimize the chromatography and to confirm separation of isobaric compounds (MSlis). We compared the analytical performance of MSnva with MSlis and the original method (UVnva) on clinical samples. The chromatography time per sample of MSnva was 8 min, detection capabilities were <1 μmol/L for most components, intermediate imprecisions at low concentrations were <10%, and there was negligible carryover. MSnva was linear up to a total amino acid concentration in a sample of approximately 9500 μmol/L. The agreement for most individual amino acids was satisfactory compared to UVnva, with the latter prone to outliers and suboptimal quantitation of urinary arginine, aspartate, glutamate and methionine. MSnva reliably detected argininosuccinate, β-alanine, citrulline and cysteine-S-sulfate. MSnva resulted in a more than fivefold increase in operational efficiency with accurate detection of amino acids and metabolic intermediates in clinical samples. Crown Copyright © 2017. Published by Elsevier B.V. All rights reserved.

  18. kruX: matrix-based non-parametric eQTL discovery

    PubMed Central

    2014-01-01

    Background The Kruskal-Wallis test is a popular non-parametric statistical test for identifying expression quantitative trait loci (eQTLs) from genome-wide data due to its robustness against variations in the underlying genetic model and expression trait distribution, but testing billions of marker-trait combinations one-by-one can become computationally prohibitive. Results We developed kruX, an algorithm implemented in Matlab, Python and R that uses matrix multiplications to simultaneously calculate the Kruskal-Wallis test statistic for several millions of marker-trait combinations at once. KruX is more than ten thousand times faster than computing associations one-by-one on a typical human dataset. We used kruX and a dataset of more than 500k SNPs and 20k expression traits measured in 102 human blood samples to compare eQTLs detected by the Kruskal-Wallis test to eQTLs detected by the parametric ANOVA and linear model methods. We found that the Kruskal-Wallis test is more robust against data outliers and heterogeneous genotype group sizes and detects a higher proportion of non-linear associations, but is more conservative for calling additive linear associations. Conclusion kruX enables the use of robust non-parametric methods for massive eQTL mapping without the need for a high-performance computing infrastructure and is freely available from http://krux.googlecode.com. PMID:24423115
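
    kruX's trick is that per-genotype rank sums for all marker-trait pairs are matrix products. A minimal sketch of the Kruskal-Wallis H computed this way (tie correction omitted for brevity, unlike kruX itself; the chi-square p-value with df=2 assumes all three genotype classes are present at a marker):

    ```python
    import numpy as np
    from scipy.stats import rankdata, kruskal

    def kw_matrix(expr, geno):
        """Kruskal-Wallis H for every (trait, marker) pair via matrix products.

        expr: (n_samples, n_traits) expression matrix.
        geno: (n_samples, n_markers) genotypes coded 0/1/2.
        Returns H with shape (n_traits, n_markers)."""
        n = expr.shape[0]
        ranks = np.apply_along_axis(rankdata, 0, expr)      # rank each trait
        H = np.zeros((expr.shape[1], geno.shape[1]))
        for c in (0, 1, 2):
            ind = (geno == c).astype(float)                 # samples x markers
            n_c = ind.sum(axis=0)                           # group size per marker
            R_c = ranks.T @ ind                             # rank sums, traits x markers
            with np.errstate(divide="ignore", invalid="ignore"):
                H += np.where(n_c > 0, R_c**2 / n_c, 0.0)
        return 12.0 / (n * (n + 1)) * H - 3.0 * (n + 1)

    # spot-check one pair against scipy.stats.kruskal (no ties in continuous data)
    E = np.random.default_rng(8).normal(size=(102, 5))
    G = np.random.default_rng(9).integers(0, 3, size=(102, 4))
    groups = [E[G[:, 0] == c, 0] for c in (0, 1, 2)]
    print(kw_matrix(E, G)[0, 0], kruskal(*groups).statistic)   # should agree
    ```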

  19. Shadow Areas Robust Matching Among Image Sequence in Planetary Landing

    NASA Astrophysics Data System (ADS)

    Ruoyan, Wei; Xiaogang, Ruan; Naigong, Yu; Xiaoqing, Zhu; Jia, Lin

    2017-01-01

    In this paper, an approach for robustly matching shadow areas in autonomous visual navigation and planetary landing is proposed. The approach begins by detecting shadow areas, which are extracted by Maximally Stable Extremal Regions (MSER). Then, an affine normalization algorithm is applied to normalize the areas. Thirdly, a descriptor called Multiple Angles-SIFT (MA-SIFT), derived from SIFT, is proposed; the descriptor can extract more features of an area. Finally, to eliminate the influence of outliers, a method of improved RANSAC based on the Skinner Operation Condition is proposed to extract inliers. A series of experiments was conducted to test the performance of the proposed approach; the results show that the approach can maintain matching accuracy at a high level even when the differences among the images are obvious and no attitude measurements are supplied.
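
    The improved RANSAC is not specified in the abstract; a minimal sketch of plain RANSAC on putative matches, here with a pure-translation model for brevity (the paper's model and acceptance rule differ):

    ```python
    import numpy as np

    def ransac_translation(src, dst, n_iter=300, tol=2.0, rng=None):
        """RANSAC on putative point matches for a pure-translation model:
        keep the largest consensus (inlier) set."""
        rng = rng or np.random.default_rng(0)
        src, dst = np.asarray(src, float), np.asarray(dst, float)
        best_inliers = np.zeros(len(src), dtype=bool)
        for _ in range(n_iter):
            i = rng.integers(len(src))        # minimal sample: one match
            t = dst[i] - src[i]
            inliers = np.linalg.norm(dst - (src + t), axis=1) < tol
            if inliers.sum() > best_inliers.sum():
                best_inliers = inliers
        return best_inliers

    # 80 correct matches shifted by (5, -3) plus 20 random mismatches
    rng = np.random.default_rng(7)
    src = rng.uniform(0, 100, (100, 2))
    dst = src + np.array([5.0, -3.0]) + rng.normal(0, 0.5, (100, 2))
    dst[80:] = rng.uniform(0, 100, (20, 2))    # outlier matches
    print(ransac_translation(src, dst).sum())  # ~80 inliers retained
    ```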

  20. Rejection of Multivariate Outliers.

    DTIC Science & Technology

    1983-05-01

    available in Gnanadesikan (1977). 2 The motivation for the present investigation lies in a recent paper of Schwager and Margolin (1982) who derive a... Gnanadesikan, R. (1977). Methods for Statistical Data Analysis of Multivariate Observations. Wiley, New York. [7] Hawkins, D.M. (1980). Identification of
