outlier detection algorithm: Topics by Science.gov

Sample records for outlier detection algorithm

The effect of different distance measures in detecting outliers using clustering-based algorithm for circular regression model

NASA Astrophysics Data System (ADS)

Di, Nur Faraidah Muhammad; Satari, Siti Zanariah

2017-05-01

Outlier detection in linear data sets has been studied extensively, but only a small amount of work has addressed outlier detection in circular data. In this study, we propose a method for detecting multiple outliers in circular regression models based on a clustering algorithm. Clustering techniques use a distance measure to quantify how far apart data points are. Here, we introduce a similarity distance based on the Euclidean distance for the circular model and obtain a cluster tree using the single-linkage clustering algorithm. Then, we propose a stopping rule for the cluster tree based on the mean direction and circular standard deviation of the tree height. Any cluster group that exceeds the stopping rule is classified as a potential outlier. Our aim is to demonstrate the effectiveness of the proposed algorithms with these similarity distances in detecting outliers. The proposed methods are found to perform well and to be applicable to the circular regression model.
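A minimal Python sketch of the clustering step, under stated assumptions. Each observation is taken to be a pair of angles (x, y) in radians, and the distance between two observations is assumed to be the Euclidean distance between their embeddings on the unit circle, sqrt(2(1 - cos Δx) + 2(1 - cos Δy)). That distance, the function names, and the cut rule "mean + k·SD of the merge heights" (hypothetical k = 2) are illustrative stand-ins, not the authors' exact similarity distance or their stopping rule based on the mean direction and circular standard deviation. The sketch uses the fact that single-linkage merge heights equal the edge weights of a minimum spanning tree, so cutting the tree at height h is the same as dropping MST edges longer than h.

```python
import math
import statistics


def circular_distance(p, q):
    # Euclidean distance between (cos x, sin x, cos y, sin y) embeddings (assumed metric)
    return math.sqrt(sum(2.0 * (1.0 - math.cos(a - b)) for a, b in zip(p, q)))


def mst_edges(points):
    """Prim's algorithm; returns (weight, i, j) for the n - 1 MST edges.
    These weights are exactly the single-linkage merge heights."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = circular_distance(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def single_linkage_outliers(points, k=2.0):
    """Cut the single-linkage tree at mean + k * SD of the merge heights and
    report every point outside the largest resulting cluster (needs >= 3 points)."""
    edges = mst_edges(points)
    heights = [w for w, _, _ in edges]
    threshold = statistics.mean(heights) + k * statistics.stdev(heights)
    root = list(range(len(points)))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    # Union only along MST edges that lie below the cut height
    for w, i, j in edges:
        if w <= threshold:
            root[find(i)] = find(j)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    main = max(groups.values(), key=len)
    return sorted(i for i in range(len(points)) if i not in main)
```

With ten closely spaced angle pairs plus one pair near (3, 3) radians, the far pair is the only point left outside the main cluster.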
Spatio-temporal Outlier Detection in Precipitation Data

NASA Astrophysics Data System (ADS)

Wu, Elizabeth; Liu, Wei; Chawla, Sanjay

The detection of outliers from spatio-temporal data is an important task due to the increasing amount of spatio-temporal data available and the need to understand and interpret it. Due to the limitations of current data mining techniques, new techniques to handle these data need to be developed. We propose a spatio-temporal outlier detection algorithm called Outstretch, which discovers the outlier movement patterns of the top-k spatial outliers over several time periods. The top-k spatial outliers are found using the Exact-Grid Top-k and Approx-Grid Top-k algorithms, which are extensions of algorithms developed by Agarwal et al. [1]. Since they use the Kulldorff spatial scan statistic, they are capable of discovering all outliers, unaffected by neighbouring regions that may contain missing values. After generating the outlier sequences, we show one way they can be interpreted, by comparing them to the phases of the El Niño Southern Oscillation (ENSO) weather phenomenon, to provide a meaningful analysis of the results.
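To make the Kulldorff scan statistic concrete, here is a minimal Python sketch of an exact grid search: every axis-aligned rectangle of a grid of counts is scored with the Poisson log-likelihood ratio c·log(c/b) + (C−c)·log((C−c)/(B−b)) − C·log(C/B), where c and b are the observed count and baseline inside the region and C and B are the grid totals; only regions with an elevated rate (c/b > C/B) score above zero. The function names, the assumption of strictly positive baselines, and the use of plain `heapq.nlargest` (which does not suppress overlapping rectangles) are illustrative choices, not the authors' Exact-Grid Top-k implementation.

```python
import heapq
import math


def kulldorff_llr(c, b, total_c, total_b):
    """Poisson log-likelihood ratio of Kulldorff's scan statistic for one region."""
    if b <= 0 or c / b <= total_c / total_b:
        return 0.0  # not an elevated-rate region

    def term(x, y):
        # x * log(x / y), with the convention 0 * log(0) = 0
        return x * math.log(x / y) if x > 0 else 0.0

    return term(c, b) + term(total_c - c, total_b - b) - term(total_c, total_b)


def prefix_sums(grid):
    rows, cols = len(grid), len(grid[0])
    s = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            s[i + 1][j + 1] = grid[i][j] + s[i][j + 1] + s[i + 1][j] - s[i][j]
    return s


def rect_sum(s, r0, c0, r1, c1):
    # Sum over rows r0..r1 and columns c0..c1 (inclusive)
    return s[r1 + 1][c1 + 1] - s[r0][c1 + 1] - s[r1 + 1][c0] + s[r0][c0]


def exact_grid_top_k(counts, baseline, k):
    """Score every rectangle (O(g^4) for a g x g grid) and keep the k best."""
    rows, cols = len(counts), len(counts[0])
    sc, sb = prefix_sums(counts), prefix_sums(baseline)
    total_c, total_b = sc[rows][cols], sb[rows][cols]
    scored = []
    for r0 in range(rows):
        for r1 in range(r0, rows):
            for c0 in range(cols):
                for c1 in range(c0, cols):
                    c = rect_sum(sc, r0, c0, r1, c1)
                    b = rect_sum(sb, r0, c0, r1, c1)
                    score = kulldorff_llr(c, b, total_c, total_b)
                    scored.append((score, (r0, c0, r1, c1)))
    return heapq.nlargest(k, scored)
```

On a 4 × 4 grid with unit baselines and a single hot cell, the single-cell rectangle around the hot spot scores highest. Rectangles are given as (row0, col0, row1, col1).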
Robust Mokken Scale Analysis by Means of the Forward Search Algorithm for Outlier Detection

ERIC Educational Resources Information Center

Zijlstra, Wobbe P.; van der Ark, L. Andries; Sijtsma, Klaas

2011-01-01

Exploratory Mokken scale analysis (MSA) is a popular method for identifying scales from larger sets of items. As with any statistical method, in MSA the presence of outliers in the data may bias the results and lead to wrong conclusions. The forward search algorithm is a robust diagnostic method for outlier detection, which we adapt here to…
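The abstract is truncated, so how the forward search is adapted to MSA is not stated here. For orientation only, this is a minimal Python sketch of a generic univariate forward search, assuming a mean/standard-deviation fit and a hypothetical initial subset size m0 = 3. It starts from a small subset near the median, grows the subset one unit at a time by refitting and keeping the closest units, and monitors the minimum scaled distance among units still outside. A sharp jump in that trace when a unit finally joins signals an outlier. It is not the authors' MSA-specific procedure, and it assumes the subset never has zero spread.

```python
import statistics


def forward_search(x, m0=3):
    """Univariate forward search sketch.

    Returns (trace, last_unit): trace[s] is the minimum scaled distance among
    units outside the subset at step s; last_unit is the final unit to join.
    """
    n = len(x)
    med = statistics.median(x)
    # Initial subset: the m0 observations closest to the median
    subset = sorted(range(n), key=lambda i: abs(x[i] - med))[:m0]
    trace, last_unit = [], None
    while len(subset) < n:
        values = [x[i] for i in subset]
        mu, sd = statistics.mean(values), statistics.stdev(values)
        dist = [abs(v - mu) / sd for v in x]
        outside = [i for i in range(n) if i not in subset]
        if len(outside) == 1:
            last_unit = outside[0]
        trace.append(min(dist[i] for i in outside))
        # Next subset: the len(subset) + 1 units closest to the current fit
        subset = sorted(range(n), key=lambda i: dist[i])[: len(subset) + 1]
    return trace, last_unit
```

Units may leave and re-enter the subset between steps, which is a deliberate feature of the forward search rather than a bug.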
Statistics and Machine Learning based Outlier Detection Techniques for Exoplanets

NASA Astrophysics Data System (ADS)

Goel, Amit; Montgomery, Michele

2015-08-01

Architectures of planetary systems are observable snapshots in time that can indicate the formation and dynamical evolution of planets. The observable key parameters that we consider are planetary mass and orbital period. If planet masses are much smaller than their host star masses, Keplerian motion is described by P^2 = a^3, where P is the orbital period in years and a is the semi-major axis in Astronomical Units (AU); this form holds for a host of one solar mass, and more generally P^2 = a^3/M, with M the stellar mass in solar masses. Keplerian motion works on small scales, such as the size of the Solar System, but not on large scales, such as the size of the Milky Way Galaxy. In this work, for confirmed exoplanets of known stellar mass, planetary mass, orbital period, and stellar age, we analyze the Keplerian motion of systems as a function of stellar age to determine whether Keplerian motion has an age dependency and to identify outliers. For detecting outliers, we apply several techniques based on statistical and machine learning methods, such as probabilistic, linear, and proximity-based models. In probabilistic and statistical models of outliers, the parameters of a closed-form probability distribution are learned in order to detect the outliers. Linear models use regression-analysis-based techniques for detecting outliers. Proximity-based models use distance-based algorithms such as k-nearest neighbours, clustering algorithms such as k-means, or density-based algorithms such as kernel density estimation. In this work, we use unsupervised learning algorithms with proximity-based models only. In addition, we explore the relative strengths and weaknesses of the various techniques by validating the outliers. The validation criterion for an outlier is whether the ratio of planetary mass to stellar mass is less than 0.001. In this work, we present our statistical analysis of the outliers thus detected.
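A minimal Python sketch of one proximity-based model of the kind listed above, the k-nearest-neighbour distance score, together with the mass-ratio validation rule stated in the abstract. The feature vectors (for example, hypothetical log-scaled mass and period values), the choice k = 2, and the function names are assumptions for illustration; the abstract does not specify the authors' features or parameters.

```python
import math


def knn_outlier_scores(points, k=2):
    """Distance to the k-th nearest neighbour; larger means more isolated."""
    scores = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(d[k - 1])
    return scores


def passes_mass_ratio_check(planet_mass, star_mass, limit=1e-3):
    """Validation rule from the abstract: planet/star mass ratio below 0.001.
    Both masses must be expressed in the same unit."""
    return planet_mass / star_mass < limit
```

Ranking points by `knn_outlier_scores` and then filtering the top candidates with `passes_mass_ratio_check` mirrors the detect-then-validate flow described above.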
Anomaly Detection in Large Sets of High-Dimensional Symbol Sequences

NASA Technical Reports Server (NTRS)

Budalakoti, Suratna; Srivastava, Ashok N.; Akella, Ram; Turkov, Eugene

2006-01-01

This paper addresses the problem of detecting and describing anomalies in large sets of high-dimensional symbol sequences. The approach taken uses unsupervised clustering of sequences using the normalized longest common subsequence (LCS) as a similarity measure, followed by detailed analysis of outliers to detect anomalies. As the LCS measure is expensive to compute, the first part of the paper discusses existing algorithms, such as the Hunt-Szymanski algorithm, that have low time-complexity. We then discuss why these algorithms often do not work well in practice and present a new hybrid algorithm for computing the LCS that, in our tests, outperforms the Hunt-Szymanski algorithm by a factor of five. The second part of the paper presents new algorithms for outlier analysis that provide comprehensible indicators as to why a particular sequence was deemed to be an outlier. The algorithms provide a coherent description to an analyst of the anomalies in the sequence, compared to more normal sequences. The algorithms we present are general and domain-independent, so we discuss applications in related areas such as anomaly detection.
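For reference, a minimal Python sketch of the normalized LCS similarity. It uses the plain quadratic dynamic program with two rolling rows, not the Hunt-Szymanski algorithm or the paper's faster hybrid. Normalizing by sqrt(|a|·|b|) is one common choice and is assumed here rather than taken from the paper. Inputs can be strings or any sequences of hashable symbols.

```python
import math


def lcs_length(a, b):
    """Length of the longest common subsequence (O(len(a) * len(b)) time)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            # Extend the diagonal on a match, else carry the best neighbour
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def nlcs(a, b):
    """Normalized LCS similarity in [0, 1]: |LCS| / sqrt(|a| * |b|)."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / math.sqrt(len(a) * len(b))
```

Identical sequences score 1.0 and sequences with no common symbol score 0.0, so 1 − nlcs can serve as a dissimilarity for clustering.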
Outlier Detection in GNSS Pseudo-Range/Doppler Measurements for Robust Localization.

PubMed

Zair, Salim; Le Hégarat-Mascle, Sylvie; Seignez, Emmanuel

2016-04-22

In urban areas or space-constrained environments with obstacles, vehicle localization using Global Navigation Satellite System (GNSS) data is hindered by Non-Line Of Sight (NLOS) and multipath receptions. These phenomena induce faulty data that disrupt the precise localization of the GNSS receiver. In this study, we detect the outliers among the observations, Pseudo-Range (PR) and/or Doppler measurements, and we evaluate how discarding them improves the localization. We specify an a contrario model of the GNSS raw data to derive an algorithm that partitions the dataset into inliers and outliers. Then, only the inlier data are considered in the localization process, performed either through a classical Particle Filter (PF) or a Rao-Blackwellization (RB) approach. Both localization algorithms use GNSS data exclusively, but they differ in the way Doppler measurements are processed. An experiment was performed with a GPS receiver aboard a vehicle. Results show that the proposed algorithms are able to detect the 'outliers' in the raw data while being robust to non-Gaussian noise and to intermittent satellite blockage. We compare the performance achieved when estimating only PR outliers with that achieved when estimating both PR and Doppler outliers. The best localization is achieved using the RB approach coupled with PR-Doppler outlier estimation.
Bayesian methods for outliers detection in GNSS time series

NASA Astrophysics Data System (ADS)

Qianqian, Zhang; Qingming, Gui

2013-07-01

This article is concerned with the problem of detecting outliers in GNSS time series based on Bayesian statistical theory. First, a new model is proposed to detect different types of outliers simultaneously by introducing a separate classification variable for each type of outlier; the outlier detection problem is thereby converted into the computation of the corresponding posterior probabilities, and an algorithm based on the standard Gibbs sampler is designed to compute them. Second, we analyze in detail the causes of masking and swamping when detecting patches of additive outliers, and we propose an unmasking Bayesian method for detecting additive outlier patches based on an adaptive Gibbs sampler. Third, the correctness of the proposed theory and methods is illustrated first with simulated data and then with real GNSS observations, such as cycle slip detection in carrier phase data. The examples show that the proposed Bayesian methods for outlier detection in GNSS time series are capable of detecting not only isolated outliers but also additive outlier patches. Furthermore, they can be used successfully to process cycle slips in phase data, which addresses the problem of small cycle slips.
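The idea of attaching a classification variable to each observation and reading off posterior outlier probabilities can be sketched with a small Gibbs sampler in Python. This minimal version assumes a variance-inflation model: an inlier is drawn from N(mu, sigma²), an outlier from N(mu, (kappa·sigma)²), and the prior outlier probability is eps. It also assumes sigma is known and mu has a flat prior. The model, the values sigma = 1, kappa = 10, eps = 0.05, and the function names are illustrative assumptions. This is a stand-in for, not a reproduction of, the paper's multi-type outlier model or its adaptive sampler for outlier patches.

```python
import math
import random
import statistics


def normal_pdf(v, mean, sd):
    return math.exp(-0.5 * ((v - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def gibbs_outlier_probabilities(y, sigma=1.0, kappa=10.0, eps=0.05,
                                iters=3000, burn=500, seed=0):
    """Posterior probability that each y[i] is an outlier (variance-inflation model)."""
    rng = random.Random(seed)
    n = len(y)
    mu = statistics.median(y)
    z = [0] * n
    hits = [0] * n
    for it in range(iters):
        # 1) Draw each indicator z_i from its full conditional given mu
        for i, yi in enumerate(y):
            p_in = (1.0 - eps) * normal_pdf(yi, mu, sigma)
            p_out = eps * normal_pdf(yi, mu, kappa * sigma)
            z[i] = 1 if rng.random() < p_out / (p_in + p_out) else 0
        # 2) Draw mu given the indicators (flat prior gives a normal full conditional)
        w = [1.0 / ((kappa * sigma) ** 2 if zi else sigma ** 2) for zi in z]
        precision = sum(w)
        mean = sum(wi * yi for wi, yi in zip(w, y)) / precision
        mu = rng.gauss(mean, 1.0 / math.sqrt(precision))
        # 3) After burn-in, accumulate how often each point is labelled an outlier
        if it >= burn:
            for i in range(n):
                hits[i] += z[i]
    return [h / (iters - burn) for h in hits]
```

Because the indicators are sampled jointly with mu, a gross outlier is down-weighted when mu is updated, which limits its pull on the location estimate.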
Outlier Detection in GNSS Pseudo-Range/Doppler Measurements for Robust Localization

PubMed Central

Zair, Salim; Le Hégarat-Mascle, Sylvie; Seignez, Emmanuel

2016-01-01

In urban areas or space-constrained environments with obstacles, vehicle localization using Global Navigation Satellite System (GNSS) data is hindered by Non-Line Of Sight (NLOS) and multipath receptions. These phenomena induce faulty data that disrupt the precise localization of the GNSS receiver. In this study, we detect the outliers among the observations, Pseudo-Range (PR) and/or Doppler measurements, and we evaluate how discarding them improves the localization. We specify an a contrario model of the GNSS raw data to derive an algorithm that partitions the dataset into inliers and outliers. Then, only the inlier data are considered in the localization process, performed either through a classical Particle Filter (PF) or a Rao-Blackwellization (RB) approach. Both localization algorithms use GNSS data exclusively, but they differ in the way Doppler measurements are processed. An experiment was performed with a GPS receiver aboard a vehicle. Results show that the proposed algorithms are able to detect the ‘outliers’ in the raw data while being robust to non-Gaussian noise and to intermittent satellite blockage. We compare the performance achieved when estimating only PR outliers with that achieved when estimating both PR and Doppler outliers. The best localization is achieved using the RB approach coupled with PR-Doppler outlier estimation. PMID:27110796
Comparison of outliers and novelty detection to identify ionospheric TEC irregularities during geomagnetic storm and substorm

NASA Astrophysics Data System (ADS)

Pattisahusiwa, Asis; Houw Liong, The; Purqon, Acep

2016-08-01

In this study, we compare two learning mechanisms, outlier detection and novelty detection, for detecting ionospheric TEC disturbances caused by the November 2004 geomagnetic storm and the January 2005 substorm. The mechanisms are applied using the ν-SVR learning algorithm, which is a regression version of the SVM. Our results show that both mechanisms are quite accurate in learning TEC data. However, novelty detection is more accurate than outlier detection in extracting anomalies related to geomagnetic events. The anomalies detected by outlier detection are mostly related to the trend of the data, while those detected by novelty detection are associated with geomagnetic events. Novelty detection also shows evidence of large-scale traveling ionospheric disturbances (LSTIDs) during geomagnetic events.
Smartphone-Based Indoor Localization with Bluetooth Low Energy Beacons

PubMed Central

Zhuang, Yuan; Yang, Jun; Li, You; Qi, Longning; El-Sheimy, Naser

2016-01-01

Indoor wireless localization using Bluetooth Low Energy (BLE) beacons has attracted considerable attention since the release of the BLE protocol. In this paper, we propose an algorithm that combines a channel-separate polynomial regression model (PRM), channel-separate fingerprinting (FP), outlier detection, and extended Kalman filtering (EKF) for smartphone-based indoor localization with BLE beacons. The proposed algorithm uses FP and PRM to estimate the target’s location and the distances between the target and the BLE beacons, respectively. We compare the performance of distance estimation using a separate PRM for each of the three advertisement channels (i.e., the separate strategy) with that of an aggregate PRM generated by combining information from all channels (i.e., the aggregate strategy). The FP-based location estimates of the separate and aggregate strategies are also compared. The separate strategy was found to provide higher accuracy; thus, it is preferable to apply PRM and FP to each BLE advertisement channel separately. Furthermore, to enhance the robustness of the algorithm, a two-level outlier detection mechanism is designed. Distance and location estimates obtained from PRM and FP are passed to the first outlier detection stage to generate improved distance estimates for the EKF. After the EKF process, a second outlier detection algorithm based on statistical testing is applied to remove the remaining outliers. The proposed algorithm was evaluated in various field experiments. Results show that the proposed algorithm achieved an accuracy of <2.56 m 90% of the time with dense deployment of BLE beacons (1 beacon per 9 m), which is 35.82% better than the <3.99 m of the Propagation Model (PM) + EKF algorithm and 15.77% more accurate than the <3.04 m of the FP + EKF algorithm.
With sparse deployment (1 beacon per 18 m), the proposed algorithm achieves an accuracy of <3.88 m 90% of the time, which is 49.58% more accurate than the <8.00 m of the PM + EKF algorithm and 21.41% better than the <4.94 m of the FP + EKF algorithm. Therefore, the proposed algorithm is especially useful for improving localization accuracy in environments with sparse beacon deployment. PMID:27128917
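One common form of statistical testing after a Kalman filter step is innovation gating: a measurement is rejected when its normalized innovation squared exceeds a chi-square quantile. The minimal Python sketch below assumes a scalar, linear Kalman filter with a random-walk state and a directly observed state (H = 1), along with a 95% gate for one degree of freedom (3.841) and hypothetical noise values. The paper's second-level test runs after an EKF and its exact statistic is not given in the abstract, so this illustrates the general technique only.

```python
CHI2_1DOF_95 = 3.841  # 95% quantile of the chi-square distribution with 1 dof


def gated_kalman_1d(measurements, x0, p0, q, r, gate=CHI2_1DOF_95):
    """Scalar Kalman filter that skips updates whose innovation fails a chi-square test.

    Returns the final state estimate and a list of accept/reject flags.
    """
    x, p = x0, p0
    accepted = []
    for z in measurements:
        p = p + q                # predict: random-walk state, so x is unchanged
        innovation = z - x
        s = p + r                # innovation variance for H = 1
        if innovation * innovation / s > gate:
            accepted.append(False)  # treat as an outlier and skip the update
            continue
        k = p / s                # Kalman gain
        x = x + k * innovation
        p = (1.0 - k) * p
        accepted.append(True)
    return x, accepted
```

In an EKF the same test applies with s = H P Hᵀ + R computed from the linearized measurement model, and with the chi-square degrees of freedom equal to the measurement dimension.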
Smartphone-Based Indoor Localization with Bluetooth Low Energy Beacons.

PubMed

Zhuang, Yuan; Yang, Jun; Li, You; Qi, Longning; El-Sheimy, Naser

2016-04-26

Indoor wireless localization using Bluetooth Low Energy (BLE) beacons has attracted considerable attention since the release of the BLE protocol. In this paper, we propose an algorithm that combines a channel-separate polynomial regression model (PRM), channel-separate fingerprinting (FP), outlier detection, and extended Kalman filtering (EKF) for smartphone-based indoor localization with BLE beacons. The proposed algorithm uses FP and PRM to estimate the target's location and the distances between the target and the BLE beacons, respectively. We compare the performance of distance estimation using a separate PRM for each of the three advertisement channels (i.e., the separate strategy) with that of an aggregate PRM generated by combining information from all channels (i.e., the aggregate strategy). The FP-based location estimates of the separate and aggregate strategies are also compared. The separate strategy was found to provide higher accuracy; thus, it is preferable to apply PRM and FP to each BLE advertisement channel separately. Furthermore, to enhance the robustness of the algorithm, a two-level outlier detection mechanism is designed. Distance and location estimates obtained from PRM and FP are passed to the first outlier detection stage to generate improved distance estimates for the EKF. After the EKF process, a second outlier detection algorithm based on statistical testing is applied to remove the remaining outliers. The proposed algorithm was evaluated in various field experiments. Results show that the proposed algorithm achieved an accuracy of <2.56 m 90% of the time with dense deployment of BLE beacons (1 beacon per 9 m), which is 35.82% better than the <3.99 m of the Propagation Model (PM) + EKF algorithm and 15.77% more accurate than the <3.04 m of the FP + EKF algorithm.
With sparse deployment (1 beacon per 18 m), the proposed algorithm achieves an accuracy of <3.88 m 90% of the time, which is 49.58% more accurate than the <8.00 m of the PM + EKF algorithm and 21.41% better than the <4.94 m of the FP + EKF algorithm. Therefore, the proposed algorithm is especially useful for improving localization accuracy in environments with sparse beacon deployment.
Stratification-Based Outlier Detection over the Deep Web.

PubMed

Xian, Xuefeng; Zhao, Pengpeng; Sheng, Victor S; Fang, Ligang; Gu, Caidong; Yang, Yuanfeng; Cui, Zhiming

2016-01-01

For many applications, finding rare instances or outliers can be more interesting than finding common patterns. Existing work on outlier detection has not considered the deep web. In this paper, we argue that, in many scenarios, it is more meaningful to detect outliers over the deep web. In the deep web, users must submit queries through a query interface to retrieve the corresponding data; therefore, traditional data mining methods cannot be applied directly. The primary contribution of this paper is a new data mining method for outlier detection over the deep web. In our approach, the query space of a deep web data source is stratified based on a pilot sample. Neighborhood sampling and uncertainty sampling are developed with the goal of improving recall and precision based on this stratification. Finally, a careful performance evaluation of our algorithm confirms that our approach can effectively detect outliers in the deep web.
Stratification-Based Outlier Detection over the Deep Web

PubMed Central

Xian, Xuefeng; Zhao, Pengpeng; Sheng, Victor S.; Fang, Ligang; Gu, Caidong; Yang, Yuanfeng; Cui, Zhiming

2016-01-01

For many applications, finding rare instances or outliers can be more interesting than finding common patterns. Existing work on outlier detection has not considered the deep web. In this paper, we argue that, in many scenarios, it is more meaningful to detect outliers over the deep web. In the deep web, users must submit queries through a query interface to retrieve the corresponding data; therefore, traditional data mining methods cannot be applied directly. The primary contribution of this paper is a new data mining method for outlier detection over the deep web. In our approach, the query space of a deep web data source is stratified based on a pilot sample. Neighborhood sampling and uncertainty sampling are developed with the goal of improving recall and precision based on this stratification. Finally, a careful performance evaluation of our algorithm confirms that our approach can effectively detect outliers in the deep web. PMID:27313603
Using State Estimation Residuals to Detect Abnormal SCADA Data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ma, Jian; Chen, Yousu; Huang, Zhenyu

2010-04-30

Detection of abnormal supervisory control and data acquisition (SCADA) data is critically important for the safe and secure operation of modern power systems. In this paper, a methodology for abnormal SCADA data detection based on state estimation residuals is presented. After a brief overview of outlier detection methods and of bad SCADA data detection for state estimation, the framework of the proposed methodology is described. Instead of using the original SCADA measurements as the bad data sources, the residuals calculated from the results of the state estimator are used as the input to the outlier detection algorithm. The BACON algorithm is applied to the outlier detection task. The IEEE 118-bus system is used as a test base to evaluate the effectiveness of the proposed methodology. The accuracy of the BACON method is compared with that of the 3-σ method for the simulated SCADA measurements and residuals.
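A minimal Python sketch contrasting the 3-σ rule with a simplified, univariate BACON-style procedure on a list of residuals. The published BACON algorithm (Billor, Hadi and Velleman) works with Mahalanobis distances and a chi-square-based cutoff with correction factors. Here that is replaced, as a loudly labeled simplification, by a scaled distance to the subset mean with a fixed cutoff c = 3. The procedure starts from the half of the data closest to the median and grows the "clean" basic subset until it stops changing. The function names and parameters are hypothetical. The example shows the masking effect: two large residuals inflate the overall standard deviation enough that the 3-σ rule misses both.

```python
import statistics


def three_sigma_outliers(r, k=3.0):
    """Flag residuals more than k standard deviations from the overall mean."""
    mu, sd = statistics.mean(r), statistics.stdev(r)
    return [i for i, v in enumerate(r) if abs(v - mu) > k * sd]


def bacon_like_outliers(r, c=3.0, max_iter=50):
    """Simplified univariate BACON-style detector (fixed cutoff, not the published one)."""
    n = len(r)
    med = statistics.median(r)
    # Initial basic subset: the half of the data closest to the median
    basic = set(sorted(range(n), key=lambda i: abs(r[i] - med))[: max(n // 2, 2)])
    for _ in range(max_iter):
        vals = [r[i] for i in basic]
        mu, sd = statistics.mean(vals), statistics.stdev(vals)
        new = {i for i in range(n) if abs(r[i] - mu) <= c * sd}
        if new == basic:
            break  # the basic subset has stabilized
        basic = new
    return sorted(set(range(n)) - basic)
```

Because the location and scale are estimated only from the growing clean subset, the two large residuals cannot mask each other the way they do under the global 3-σ rule.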
Outlier detection in contamination control

NASA Astrophysics Data System (ADS)

Weintraub, Jeffrey; Warrick, Scott

2018-03-01

A machine-learning model is presented that effectively partitions historical process data into outlier and inlier subpopulations. This is necessary in order to avoid using outlier data to build a model for detecting process instability. Exact control limits are given without recourse to approximations and the error characteristics of the control model are derived. A worked example for contamination control is presented along with the machine learning algorithm used and all the programming statements needed for implementation.
A framework for periodic outlier pattern detection in time-series sequences.

PubMed

Rasheed, Faraz; Alhajj, Reda

2014-05-01

Periodic pattern detection in time-ordered sequences is an important data mining task that discovers all patterns in a time series that exhibit temporal regularities. Periodic pattern mining has a large number of real-life applications; it helps in understanding the regular trends of the data over time and enables the forecasting and prediction of future events. A related and vital problem that has not received enough attention is the discovery of outlier periodic patterns in a time series. Outlier patterns are defined as those that differ from the rest of the patterns; outliers are not noise. Noise does not belong to the data and is mostly eliminated by preprocessing, whereas outliers are actual instances in the data that have exceptional characteristics compared with the majority of the other instances. Outliers are unusual patterns that rarely occur and thus have lower support (frequency of appearance) in the data. Outlier patterns may hint at discrepancies in the data such as fraudulent transactions, network intrusions, changes in customer behavior, economic recessions, epidemics and disease biomarkers, or severe weather conditions like tornados. We argue that, in many sequences, detecting the periodicity of outlier patterns might be more important than detecting the periodicity of regular, more frequent patterns. In this paper, we present a robust and time-efficient suffix tree-based algorithm capable of detecting the periodicity of outlier patterns in a time series by giving more significance to less frequent yet periodic patterns. Several experiments have been conducted using both real and synthetic data; all aspects of the proposed approach are compared with the existing algorithm InfoMiner, and the reported results demonstrate the effectiveness and applicability of the proposed approach.
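To make "low support yet periodic" concrete, here is a minimal brute-force Python sketch for single symbols. For every candidate period p and phase offset, it measures the fraction of positions offset, offset + p, … that hold the symbol, and it returns the most consistent (period, offset, confidence), preferring shorter periods on ties. This is a brute-force stand-in, roughly O(n²) per symbol, and not the suffix tree-based algorithm described above. The function name and the "at least two hits" rule are assumptions.

```python
def symbol_periodicity(s, symbol, max_period=None):
    """Return (period, offset, confidence) for the most regular recurrence of symbol.

    confidence = fraction of positions offset, offset + p, ... holding the symbol.
    Returns None if the symbol never recurs at least twice on any period/phase.
    """
    n = len(s)
    max_period = max_period or n // 2
    best = None
    for p in range(1, max_period + 1):
        for offset in range(p):
            positions = range(offset, n, p)
            hits = sum(1 for i in positions if s[i] == symbol)
            if hits < 2:
                continue  # a single occurrence says nothing about periodicity
            conf = hits / len(positions)
            # Strict '>' keeps the shorter period when confidences tie
            if best is None or conf > best[2]:
                best = (p, offset, conf)
    return best
```

In "aabax" repeated three times, the symbol "x" has a support of only 3 out of 15 positions, yet it recurs with period 5 at offset 4 with full confidence.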
An integrated approach for identifying wrongly labelled samples when performing classification in microarray data.

PubMed

Leung, Yuk Yee; Chang, Chun Qi; Hung, Yeung Sam

2012-01-01

Using a hybrid approach for gene selection and classification is common, as the results obtained are generally better than when the two tasks are performed independently. Yet, for some microarray datasets, both the classification accuracy and the stability of the gene sets obtained still leave room for improvement. This may be due to the presence of samples with wrong class labels (i.e., outliers). Outlier detection algorithms proposed so far are either not suitable for microarray data or only solve the outlier detection problem on its own. We tackle the outlier detection problem based on a previously proposed Multiple-Filter-Multiple-Wrapper (MFMW) model, which was demonstrated to yield promising results when compared to other hybrid approaches (Leung and Hung, 2010). To incorporate outlier detection and overcome limitations of the existing MFMW model, three new features are introduced in our proposed MFMW-outlier approach: 1) an unbiased external Leave-One-Out Cross-Validation framework is developed to replace internal cross-validation in the previous MFMW model; 2) wrongly labeled samples are identified within the MFMW-outlier model; and 3) a stable set of genes is selected using an L1-norm SVM that removes any redundant genes present. Six binary-class microarray datasets were tested. Compared with outlier detection studies on the same datasets, MFMW-outlier could detect all the outliers found in the original paper (for which the data was provided for analysis), and the genes selected after outlier removal were shown to have biological relevance. We also compared MFMW-outlier with PRAPIV (Zhang et al., 2006) on the same synthetic datasets; MFMW-outlier gave better average precision and recall values in three different settings. Lastly, artificially flipped microarray datasets were created by removing our detected outliers and flipping some of the remaining samples' labels.
Almost all the 'wrong' (artificially flipped) samples were detected, suggesting that MFMW-outlier was sufficiently powerful to detect outliers in high-dimensional microarray datasets.
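The core idea of flagging wrongly labelled samples with leave-one-out cross-validation can be illustrated with a minimal Python sketch. Each sample is held out, a classifier is fit on the rest, and the sample is flagged when its prediction disagrees with its label. A nearest-centroid classifier is swapped in here purely for illustration; the MFMW-outlier model uses multiple filters and wrappers with an L1-norm SVM, and none of that is reproduced. The function names are hypothetical, and each class is assumed to keep at least one sample when one sample is held out.

```python
import math


def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def loo_mislabel_candidates(X, y):
    """Indices whose held-out nearest-centroid prediction disagrees with their label."""
    flagged = []
    for i in range(len(X)):
        by_class = {}
        for j, (xj, yj) in enumerate(zip(X, y)):
            if j != i:  # leave sample i out of the training set
                by_class.setdefault(yj, []).append(xj)
        cents = {c: centroid(rows) for c, rows in by_class.items()}
        pred = min(cents, key=lambda c: math.dist(X[i], cents[c]))
        if pred != y[i]:
            flagged.append(i)
    return flagged
```

Because the held-out sample never contributes to the centroid it is compared against, a mislabelled sample cannot pull its own (wrong) class centre toward itself.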
Outlier detection for particle image velocimetry data using a locally estimated noise variance

NASA Astrophysics Data System (ADS)

Lee, Yong; Yang, Hua; Yin, ZhouPing

2017-03-01

This work describes an adaptive, spatially variable threshold outlier detection algorithm for raw gridded particle image velocimetry (PIV) data using a locally estimated noise variance. The method is an iterative procedure, and each iteration consists of a reference vector field reconstruction step and an outlier detection step. We construct the reference vector field using a weighted adaptive smoothing method (Garcia 2010 Comput. Stat. Data Anal. 54 1167-78), and the weights are determined in the outlier detection step using a modified outlier detector (Ma et al 2014 IEEE Trans. Image Process. 23 1706-21). A hard decision on the final weights of the iteration yields the outlier labels of the field. The technical contribution is that, for the first time, a spatially variable threshold is embedded in the modified outlier detector with a locally estimated noise variance in an iterative framework. A spatially variable threshold turns out to be preferable to a single spatially constant threshold in complicated flows such as vortex flows or turbulent flows. Synthetic cellular vortical flows with simulated scattered or clustered outliers are used to evaluate the performance of the proposed method in comparison with popular validation approaches. The method also proves beneficial in a real PIV measurement of turbulent flow. The experimental results demonstrate that the proposed method yields competitive performance in terms of outlier under-detection count and over-detection count. In addition, the outlier detection method is computationally efficient and adaptive and requires no user-defined parameters; corresponding implementations are also provided in the supplementary materials.
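The abstract compares against "popular validation approaches" without naming them. One widely used PIV baseline is the normalized median test of Westerweel and Scarano (2005), which uses a single, spatially constant threshold. The minimal Python sketch below implements that baseline on one velocity component over a 3 × 3 neighbourhood. It is not the authors' adaptive, variable-threshold detector. The noise level eps = 0.1 (in pixels) and threshold 2.0 are typical values assumed for illustration.

```python
import statistics


def normalized_median_test(u, eps=0.1, threshold=2.0):
    """Flag vectors whose normalized median residual exceeds a constant threshold.

    u is a 2-D list holding one velocity component on the PIV grid.
    """
    rows, cols = len(u), len(u[0])
    flags = [[False] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Neighbours in the 3 x 3 window, excluding the centre vector
            nbrs = [u[a][b]
                    for a in range(max(0, i - 1), min(rows, i + 2))
                    for b in range(max(0, j - 1), min(cols, j + 2))
                    if (a, b) != (i, j)]
            um = statistics.median(nbrs)
            rm = statistics.median(abs(v - um) for v in nbrs)
            flags[i][j] = abs(u[i][j] - um) / (rm + eps) > threshold
    return flags
```

Because `threshold` is the same everywhere on the grid, this baseline is exactly the kind of spatially constant rule that the abstract argues is less suited to vortex and turbulent flows.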
Spatial detection of tv channel logos as outliers from the content

NASA Astrophysics Data System (ADS)

Ekin, Ahmet; Braspenning, Ralph

2006-01-01

This paper proposes a purely image-based TV channel logo detection algorithm that can detect logos independently of their motion and transparency features. The proposed algorithm can robustly detect any type of logo, such as transparent and animated logos, without requiring any temporal constraints, whereas known methods have to wait for the occurrence of large motion in the scene and assume stationary logos. The algorithm models logo pixels as outliers from the actual scene content, which is represented by multiple 3-D histograms in the YCbCr space. We use four scene histograms, one for each of the four corners, because the content characteristics change from one image corner to another. A further novelty of the proposed algorithm is that we define the image corners and the areas where we compute the scene histograms using a cinematic technique called the Golden Section Rule, which is used by professionals. The robustness of the proposed algorithm is demonstrated on a dataset of representative TV content.
Intelligent agent-based intrusion detection system using enhanced multiclass SVM.

PubMed

Ganapathy, S; Yogesh, P; Kannan, A

2012-01-01

Intrusion detection systems have been used in the past, together with various techniques, to detect intrusions in networks. However, most of these systems can detect intruders only at the cost of a high false alarm rate. In this paper, we propose a new intelligent agent-based intrusion detection model for mobile ad hoc networks using a combination of attribute selection, outlier detection, and enhanced multiclass SVM classification methods. For this purpose, an effective preprocessing technique is proposed that improves detection accuracy and reduces processing time. Moreover, two new algorithms, namely an Intelligent Agent Weighted Distance Outlier Detection algorithm and an Intelligent Agent-based Enhanced Multiclass Support Vector Machine algorithm, are proposed for detecting intruders in a distributed database environment that uses intelligent agents for trust management and coordination in transaction processing. Experimental results on the KDD Cup 99 data set show that the proposed model detects anomalies with a low false alarm rate and a high detection rate.

Intelligent Agent-Based Intrusion Detection System Using Enhanced Multiclass SVM

PubMed Central

Ganapathy, S.; Yogesh, P.; Kannan, A.

2012-01-01

Intrusion detection systems have been used in the past, together with various techniques, to detect intrusions in networks. However, most of these systems can detect intruders only at the cost of a high false alarm rate. In this paper, we propose a new intelligent agent-based intrusion detection model for mobile ad hoc networks using a combination of attribute selection, outlier detection, and enhanced multiclass SVM classification methods. For this purpose, an effective preprocessing technique is proposed that improves detection accuracy and reduces processing time. Moreover, two new algorithms, namely an Intelligent Agent Weighted Distance Outlier Detection algorithm and an Intelligent Agent-based Enhanced Multiclass Support Vector Machine algorithm, are proposed for detecting intruders in a distributed database environment that uses intelligent agents for trust management and coordination in transaction processing. Experimental results on the KDD Cup 99 data set show that the proposed model detects anomalies with a low false alarm rate and a high detection rate. PMID:23056036
Online Conditional Outlier Detection in Nonstationary Time Series

PubMed Central

Liu, Siqi; Wright, Adam; Hauskrecht, Milos

2017-01-01

The objective of this work is to develop methods for detecting outliers in time series data. Such methods can become a key component of various monitoring and alerting systems, where an outlier may correspond to an adverse condition that needs human attention. However, real-world time series are often affected by various sources of variability present in the environment that may influence the quality of detection; these sources may (1) explain some of the changes in the signal that would otherwise lead to false positive detections, and (2) reduce the sensitivity of the detection algorithm, leading to an increase in false negatives. To alleviate these problems, we propose a new two-layer outlier detection approach that first models and accounts for the nonstationarity and periodic variation in the time series, and then uses other observable variables in the environment to explain any additional signal variation. Our experiments on several data sets in different domains show that our method provides more accurate modeling of the time series and significantly improves outlier detection performance. PMID:29644345
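A minimal Python sketch of the two-layer idea, under simplifying assumptions that are not from the paper. Layer 1 removes periodic variation by subtracting the median of each phase of a known period. Layer 2 explains what remains with one observed covariate x, fit by ordinary least squares, and flags residuals more than k = 3 standard deviations from their mean. The paper's actual models for nonstationarity and for the environmental variables are richer; the function name, the single covariate, and k are hypothetical.

```python
import statistics


def two_layer_outliers(y, x, period, k=3.0):
    """Flag time indices whose signal is unexplained by periodicity and a covariate."""
    n = len(y)
    # Layer 1: remove periodic variation with per-phase medians
    phase_median = [statistics.median(y[t] for t in range(p, n, period))
                    for p in range(period)]
    r = [y[t] - phase_median[t % period] for t in range(n)]
    # Layer 2: explain the remainder with the covariate x (one-predictor OLS)
    mx, mr = statistics.mean(x), statistics.mean(r)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (ri - mr) for xi, ri in zip(x, r)) / sxx
    intercept = mr - slope * mx
    e = [ri - (intercept + slope * xi) for ri, xi in zip(r, x)]
    # Flag residuals far from their mean after both layers
    me, se = statistics.mean(e), statistics.stdev(e)
    return [t for t, et in enumerate(e) if abs(et - me) > k * se]
```

In a synthetic series with a period-4 pattern, a covariate effect of 2·x, and one spike of +8, only the spike survives both layers. A detector applied to the raw series would instead have to treat the periodic swings and covariate-driven changes as potential alarms.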
Online Conditional Outlier Detection in Nonstationary Time Series.

PubMed

Liu, Siqi; Wright, Adam; Hauskrecht, Milos

2017-05-01

Detecting Anomalies from End-to-End Internet Performance Measurements (PingER) Using Cluster Based Local Outlier Factor

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ali, Saqib; Wang, Guojun; Cottrell, Roger Leslie

PingER (Ping End-to-End Reporting) is a worldwide end-to-end Internet performance measurement framework. It was developed by the SLAC National Accelerator Laboratory, Stanford, USA, and has been running for the last 20 years. It has more than 700 monitoring agents and remote sites, which monitor the performance of Internet links in around 170 countries. At present, the compressed PingER data set is about 60 GB and comprises 100,000 flat files. The data is publicly available and valuable for Internet performance analyses. However, the data sets suffer from missing values and from anomalies caused by congestion, bottleneck links, queuing overflow, network software misconfiguration, hardware failure, cable cuts, and social upheavals. Therefore, the objective of this paper is to detect such performance drops or spikes, labeled as anomalies or outliers, in the PingER data set. In the proposed approach, the raw text files of the data set are transformed into a PingER dimensional model. Missing values are imputed using the k-NN algorithm. The data is partitioned into groups of similar instances using the k-means clustering algorithm. Afterward, clustering is integrated with the Local Outlier Factor (LOF) through the Cluster Based Local Outlier Factor (CBLOF) algorithm to detect anomalies or outliers in the PingER data. Lastly, the anomalies are further analyzed to identify the time frame and location of the hosts generating the majority of the anomalies in the PingER data set from 1998 to 2016.
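The CBLOF scoring step can be sketched as follows, assuming cluster labels have already been produced (in the paper, by k-means after k-NN imputation). It follows the usual CBLOF recipe of splitting clusters into "large" and "small" with parameters `alpha` and `beta`: a point in a large cluster is scored by its distance to its own cluster center, and a point in a small cluster by its distance to the nearest large-cluster center. The original definition multiplies that distance by the cluster size; the unweighted variant is exposed through a hypothetical `use_weights` flag. The parameter defaults and the toy points are illustrative, not taken from the paper.

```python
import math
from collections import defaultdict


def centroid(pts):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(sum(coord) / len(pts) for coord in zip(*pts))


def cblof_scores(points, labels, alpha=0.9, beta=5.0, use_weights=True):
    """Cluster-Based Local Outlier Factor for already-clustered points."""
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)
    # Sort clusters by size, largest first, and find the large/small boundary b.
    order = sorted(clusters, key=lambda lab: len(clusters[lab]), reverse=True)
    sizes = [len(clusters[lab]) for lab in order]
    b, running = len(order), 0
    for i, size in enumerate(sizes):
        running += size
        big_gap = i + 1 < len(sizes) and size / sizes[i + 1] >= beta
        if running >= alpha * len(points) or big_gap:
            b = i + 1
            break
    large = order[:b]
    centers = {lab: centroid(pts) for lab, pts in clusters.items()}
    scores = []
    for p, lab in zip(points, labels):
        if lab in large:
            d = math.dist(p, centers[lab])  # distance to own (large) cluster
        else:
            d = min(math.dist(p, centers[big]) for big in large)  # nearest large cluster
        scores.append(len(clusters[lab]) * d if use_weights else d)
    return scores


points = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (0.5, 0),
          (10, 0), (11, 0), (10, 1), (11, 1), (10.5, 0.5),
          (5, 20)]
labels = ["A"] * 6 + ["B"] * 5 + ["C"]
scores = cblof_scores(points, labels)
print(max(range(len(points)), key=scores.__getitem__))  # 11
```

Size weighting lowers the scores of points in small clusters, which is why some implementations default to unweighted distances; in this toy example the isolated point ranks first either way.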
Detecting Anomalies from End-to-End Internet Performance Measurements (PingER) Using Cluster Based Local Outlier Factor

DOE PAGES

Ali, Saqib; Wang, Guojun; Cottrell, Roger Leslie; ...

2018-05-28

The good, the bad and the outliers: automated detection of errors and outliers from groundwater hydrographs

NASA Astrophysics Data System (ADS)

Peterson, Tim J.; Western, Andrew W.; Cheng, Xiang

2018-03-01

Suspicious groundwater-level observations are common and can arise for many reasons, ranging from an unforeseen biophysical process to bore failure and data management errors. Unforeseen observations may provide valuable insights that challenge existing expectations and can be deemed outliers, while monitoring and data handling failures can be deemed errors and, if ignored, may compromise trend analysis and groundwater model calibration. Ideally, outliers and errors should be identified, but to date this has been a subjective process that is neither reproducible nor efficient. This paper presents an approach to objectively and efficiently identify multiple types of errors and outliers. The approach requires only the observed groundwater hydrograph, requires no particular consideration of the hydrogeology, the drivers (e.g. pumping) or the monitoring frequency, and is freely available in the HydroSight toolbox. Herein, the algorithms and time-series model are detailed and applied to four observation bores with varying dynamics. The detection of outliers was most reliable when the observation data were acquired quarterly or more frequently. Outlier detection was more challenging where the groundwater-level variance is nonstationary or the absolute trend increases rapidly, with the former likely to result in an underestimation of the number of outliers and the latter in an overestimation.
Improving Electronic Sensor Reliability by Robust Outlier Screening

PubMed Central

Moreno-Lizaranzu, Manuel J.; Cuesta, Federico

2013-01-01

Electronic sensors are widely used in different application areas, and in some of them, such as automotive or medical equipment, they must perform with an extremely low defect rate. Increasing reliability is paramount. Outlier detection algorithms are a key component in screening latent defects and decreasing the number of customer quality incidents (CQIs). This paper focuses on new spatial algorithms (Good Die in a Bad Cluster with Statistical Bins (GDBC SB) and Bad Bin in a Bad Cluster (BBBC)) and an advanced outlier screening method, called Robust Dynamic Part Averaging Testing (RDPAT), as well as two practical improvements, which significantly enhance existing algorithms. Those methods have been used in production in Freescale® Semiconductor probe factories around the world for several years. Moreover, a study was conducted with production data of 289,080 dice with 26 CQIs to determine and compare the efficiency and effectiveness of all these algorithms in identifying CQIs. PMID:24113682
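Part Average Testing flags parts whose measurement falls outside limits computed from the population being tested. A minimal sketch of a robust, per-lot variant follows, assuming limits of median ± k robust sigmas, with robust sigma estimated as IQR/1.35 (a common robust-PAT convention) and `k = 6`. The paper's RDPAT and its spatial GDBC SB and BBBC algorithms are not reproduced here, and the measurement values are synthetic.

```python
import statistics


def robust_pat_limits(values, k=6.0):
    """Return (lower, upper) limits as median -/+ k robust sigmas, sigma = IQR / 1.35."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    sigma = (q3 - q1) / 1.35
    med = statistics.median(values)
    return med - k * sigma, med + k * sigma


def screen_lot(values, k=6.0):
    """Indices of parts outside the limits computed dynamically from this lot."""
    lo, hi = robust_pat_limits(values, k)
    return [i for i, v in enumerate(values) if v < lo or v > hi]


lot = [1.00, 1.01, 0.99, 1.02, 0.98, 1.00, 1.01, 0.99, 1.00, 1.50]
print(screen_lot(lot))  # [9]
```

With a plain mean and standard deviation, the single deviant part inflates sigma to about 0.16 in this lot, so it would pass a ±6 sigma screen. This is the motivation for robust estimates.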
Improving electronic sensor reliability by robust outlier screening.

PubMed

Moreno-Lizaranzu, Manuel J; Cuesta, Federico

2013-10-09

Evaluation schemes for video and image anomaly detection algorithms

NASA Astrophysics Data System (ADS)

Parameswaran, Shibin; Harguess, Josh; Barngrover, Christopher; Shafer, Scott; Reese, Michael

2016-05-01

Video anomaly detection is a critical research area in computer vision. It is a natural first step before applying object recognition algorithms. Many algorithms that detect anomalies (outliers) in videos and images have been introduced in recent years. However, these algorithms behave and perform differently depending on the domains and tasks to which they are applied. To better understand the strengths and weaknesses of outlier algorithms and their applicability to a particular domain or task of interest, it is important to measure and quantify their performance using appropriate evaluation metrics. Many evaluation metrics have been used in the literature, such as precision curves, precision-recall curves, and receiver operating characteristic (ROC) curves. To construct these metrics, it is also important to choose an appropriate evaluation scheme that decides when a proposed detection is considered a true or a false detection. Choosing the right evaluation metric and the right scheme is critical, since the choice can introduce positive or negative bias into the measuring criterion and may favor (or work against) a particular algorithm or task. In this paper, we review evaluation metrics and popular evaluation schemes that are used to measure the performance of anomaly detection algorithms on videos and imagery with one or more anomalies. We analyze the biases introduced by these schemes by measuring the performance of an existing anomaly detection algorithm.
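One widely used evaluation scheme for localized detections counts a detection as true when its intersection-over-union (IoU) with a not-yet-matched ground-truth box reaches a threshold. The sketch below assumes axis-aligned boxes `(x1, y1, x2, y2)`, greedy matching in descending score order and a default threshold of 0.5. It is only one of the schemes such a review would compare.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    def area(r):
        return (r[2] - r[0]) * (r[3] - r[1])

    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(detections, truths, thr=0.5):
    """detections: list of (score, box). Each ground-truth box can be matched once."""
    matched, tp = set(), 0
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best, best_j = 0.0, None
        for j, truth in enumerate(truths):
            if j not in matched and iou(box, truth) > best:
                best, best_j = iou(box, truth), j
        if best_j is not None and best >= thr:  # true detection under this scheme
            matched.add(best_j)
            tp += 1
    precision = tp / len(detections) if detections else 0.0
    recall = tp / len(truths) if truths else 0.0
    return precision, recall


truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(0.9, (1, 1, 11, 11)), (0.8, (50, 50, 60, 60)), (0.7, (0, 0, 5, 5))]
print(precision_recall(detections, truths))
print(precision_recall(detections, truths, thr=0.7))  # (0.0, 0.0)
```

Raising the threshold to 0.7 in this toy example turns the only true positive (IoU of about 0.68) into a miss, so precision and recall both drop to zero. This is a small instance of the scheme-dependent bias described above.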
Detecting Outliers in Factor Analysis Using the Forward Search Algorithm

ERIC Educational Resources Information Center

Mavridis, Dimitris; Moustaki, Irini

2008-01-01

In this article we extend and implement the forward search algorithm for identifying atypical subjects/observations in factor analysis models. The forward search has been mainly developed for detecting aberrant observations in regression models (Atkinson, 1994) and in multivariate methods such as cluster and discriminant analysis (Atkinson, Riani,…
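The forward search idea can be shown on the simplest possible model, a univariate location estimate, rather than the factor-analysis extension described in the article. Starting from a small subset assumed to be outlier-free (here, the `m0` units closest to the median), the subset grows one unit at a time, and the minimum standardized distance of the units still outside it is monitored. A sudden jump at the end signals that atypical observations are entering. The starting rule, `m0 = 3` and the data are illustrative assumptions.

```python
import statistics


def forward_search(data, m0=3):
    """Forward search for a univariate location model.

    Returns the monitoring curve (minimum standardized distance of the units
    outside the current subset, one value per step) and the last unit to enter.
    """
    n = len(data)
    med = statistics.median(data)
    # Robust start: the m0 units closest to the median.
    subset = set(sorted(range(n), key=lambda i: abs(data[i] - med))[:m0])
    monitor, outside = [], []
    for m in range(m0, n):
        vals = [data[i] for i in subset]
        mu = statistics.fmean(vals)
        sd = statistics.stdev(vals) or 1e-12  # guard against a zero-spread start
        dist = [abs(x - mu) / sd for x in data]
        outside = [i for i in range(n) if i not in subset]
        monitor.append(min(dist[i] for i in outside))
        # Grow: the m + 1 units closest to the current fit form the next subset.
        subset = set(sorted(range(n), key=lambda i: dist[i])[:m + 1])
    return monitor, (outside[0] if len(outside) == 1 else None)


data = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 15.0]
monitor, last = forward_search(data)
print(last)  # 6
print([round(v, 2) for v in monitor])
```

Units may leave as well as enter the subset between steps, which is why the subset is rebuilt from the current fit rather than extended by a single fixed unit.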
Locality-constrained anomaly detection for hyperspectral imagery

NASA Astrophysics Data System (ADS)

Liu, Jiabin; Li, Wei; Du, Qian; Liu, Kui

2015-12-01

Detecting a target with a low probability of occurrence against an unknown background in a hyperspectral image, namely anomaly detection, is of practical significance. The Reed-Xiaoli (RX) algorithm is considered a classic anomaly detector; it calculates the Mahalanobis distance between the local background and the pixel under test. Local RX, an adaptive RX detector, employs a dual-window strategy that treats the pixels in the frame between the inner and outer windows as the local background. However, the detector is sensitive to anomalous pixels (i.e., outliers) within this local region. In this paper, a locality-constrained anomaly detector is proposed that removes outliers from the local background region before the RX algorithm is applied. Specifically, a local linear representation is designed to exploit the internal relationship between linearly correlated pixels in the local background region and the pixel under test and its neighbors. Experimental results demonstrate that the proposed detector improves on the original local RX algorithm.
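The local RX detector that the proposed method builds on can be sketched in a few lines of plain Python. The background is the ring of pixels between an inner (guard) window and an outer window, and the score is the Mahalanobis distance of the pixel under test from that background's mean and covariance. This is the classic dual-window RX, not the locality-constrained detector; the window half-widths, the two-band toy image and the helper `solve` are assumptions made for the example.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= f * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def local_rx(img, r0, c0, inner=1, outer=3):
    """Dual-window RX score of pixel (r0, c0); img[r][c] is a tuple of band values.

    Assumes the pixel lies at least `outer` pixels from the image border.
    """
    bg = [img[r][c]
          for r in range(r0 - outer, r0 + outer + 1)
          for c in range(c0 - outer, c0 + outer + 1)
          if max(abs(r - r0), abs(c - c0)) > inner]  # ring outside the guard window
    bands, n = len(bg[0]), len(bg)
    mu = [sum(p[j] for p in bg) / n for j in range(bands)]
    cov = [[sum((p[i] - mu[i]) * (p[j] - mu[j]) for p in bg) / (n - 1)
            for j in range(bands)] for i in range(bands)]
    d = [img[r0][c0][j] - mu[j] for j in range(bands)]
    return sum(di * yi for di, yi in zip(d, solve(cov, d)))  # d^T cov^-1 d


# 7x7 two-band toy image with an anomalous center pixel.
img = [[((3 * r + 5 * c) % 7, (5 * r + 2 * c) % 7) for c in range(7)] for r in range(7)]
img[3][3] = (30, 30)
clean = local_rx(img, 3, 3)
img[0][0] = (30, 30)  # a second anomaly inside the background ring
contaminated = local_rx(img, 3, 3)
print(clean > contaminated)  # True
```

The second anomalous pixel inflates the background covariance along the anomaly's direction and lowers the center pixel's score, which is the sensitivity that motivates removing outliers from the local background first.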
An efficient sampling algorithm for uncertain abnormal data detection in biomedical image processing and disease prediction.

PubMed

Liu, Fei; Zhang, Xi; Jia, Yan

2015-01-01

In this paper, we propose a computer information processing algorithm that can be used for biomedical image processing and disease prediction. A biomedical image is considered a data object in a multi-dimensional space. Each dimension is a feature that can be used for disease diagnosis. We introduce a new concept of the top (k1,k2) outlier. It can be used to detect abnormal data objects in the multi-dimensional space. This technique focuses on uncertain space, where each data object has several possible instances with distinct probabilities. We design an efficient sampling algorithm for the top (k1,k2) outlier in uncertain space. Some improvement techniques are used for acceleration. Experiments show our methods' high accuracy and high efficiency.
Noise-robust unsupervised spike sorting based on discriminative subspace learning with outlier handling.

PubMed

Keshtkaran, Mohammad Reza; Yang, Zhi

2017-06-01

Spike sorting is a fundamental preprocessing step for many neuroscience studies that rely on the analysis of spike trains. Most of the feature extraction and dimensionality reduction techniques that have been used for spike sorting give a projection subspace that is not necessarily the most discriminative one. Therefore, clusters that appear inherently separable in some discriminative subspace may overlap when projected using conventional feature extraction approaches, leading to poor sorting accuracy, especially when the noise level is high. In this paper, we propose a noise-robust and unsupervised spike sorting algorithm based on learning discriminative spike features for clustering. The proposed algorithm uses discriminative subspace learning to extract low-dimensional, maximally discriminative features from the spike waveforms and performs clustering with automatic detection of the number of clusters. The core of the algorithm involves iterative subspace selection using linear discriminant analysis and clustering using a Gaussian mixture model with outlier detection. A statistical test in the discriminative subspace is proposed to automatically detect the number of clusters. Comparative results on publicly available simulated and real in vivo datasets demonstrate that our algorithm achieves substantially improved cluster distinction, leading to higher sorting accuracy and more reliable detection of clusters that are highly overlapping and not detectable using conventional feature extraction techniques such as principal component analysis or wavelets. By providing more accurate information about the activity of a larger number of individual neurons, with high robustness to neural noise and outliers, the proposed unsupervised spike sorting algorithm facilitates more detailed and accurate analysis of single- and multi-unit activities in neuroscience and brain-machine interface studies.
Noise-robust unsupervised spike sorting based on discriminative subspace learning with outlier handling

NASA Astrophysics Data System (ADS)

Keshtkaran, Mohammad Reza; Yang, Zhi

2017-06-01

Objective. Spike sorting is a fundamental preprocessing step for many neuroscience studies that rely on the analysis of spike trains. Most of the feature extraction and dimensionality reduction techniques that have been used for spike sorting give a projection subspace that is not necessarily the most discriminative one. Therefore, clusters that appear inherently separable in some discriminative subspace may overlap when projected using conventional feature extraction approaches, leading to poor sorting accuracy, especially when the noise level is high. In this paper, we propose a noise-robust and unsupervised spike sorting algorithm based on learning discriminative spike features for clustering. Approach. The proposed algorithm uses discriminative subspace learning to extract low-dimensional, maximally discriminative features from the spike waveforms and performs clustering with automatic detection of the number of clusters. The core of the algorithm involves iterative subspace selection using linear discriminant analysis and clustering using a Gaussian mixture model with outlier detection. A statistical test in the discriminative subspace is proposed to automatically detect the number of clusters. Main results. Comparative results on publicly available simulated and real in vivo datasets demonstrate that our algorithm achieves substantially improved cluster distinction, leading to higher sorting accuracy and more reliable detection of clusters that are highly overlapping and not detectable using conventional feature extraction techniques such as principal component analysis or wavelets. Significance. By providing more accurate information about the activity of a larger number of individual neurons, with high robustness to neural noise and outliers, the proposed unsupervised spike sorting algorithm facilitates more detailed and accurate analysis of single- and multi-unit activities in neuroscience and brain-machine interface studies.
Visualizing Big Data Outliers through Distributed Aggregation.

PubMed

Wilkinson, Leland

2017-08-29

Visualizing outliers in massive datasets requires statistical pre-processing in order to reduce the scale of the problem to a size amenable to rendering systems like D3, Plotly or analytic systems like R or SAS. This paper presents a new algorithm, called hdoutliers, for detecting multidimensional outliers. It is unique for a) dealing with a mixture of categorical and continuous variables, b) dealing with big-p (many columns of data), c) dealing with big-n (many rows of data), d) dealing with outliers that mask other outliers, and e) dealing consistently with unidimensional and multidimensional datasets. Unlike ad hoc methods found in many machine learning papers, hdoutliers is based on a distributional model that allows outliers to be tagged with a probability. This critical feature reduces the likelihood of false discoveries.
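The core of a distributional outlier test in the spirit of hdoutliers can be sketched for continuous columns only. Each column is scaled to [0, 1], each point's nearest-neighbor distance is computed, an exponential distribution is fitted to those distances, and points whose distance has an upper-tail probability below `alpha` are flagged. This is a simplification, not Wilkinson's algorithm. It skips the categorical handling and the clustering step that lets the method scale to big-n. It also estimates the exponential rate from the median distance, an assumption chosen here because the median is insensitive to the few outliers being sought.

```python
import math
import statistics


def normalize(rows):
    """Scale every column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [tuple((v - a) / (b - a) if b > a else 0.0 for v, a, b in zip(r, lo, hi))
            for r in rows]


def nn_distances(rows):
    """Distance from each row to its nearest other row (O(n^2))."""
    return [min(math.dist(p, q) for j, q in enumerate(rows) if j != i)
            for i, p in enumerate(rows)]


def exp_tail_outliers(rows, alpha=0.05):
    """Flag rows whose nearest-neighbor distance has exponential tail probability < alpha."""
    nn = nn_distances(normalize(rows))
    med = statistics.median(nn)
    if med == 0:
        return []  # too many duplicates to estimate a scale
    rate = math.log(2) / med          # median of an exponential is ln 2 / rate
    cutoff = -math.log(alpha) / rate  # P(D > cutoff) = alpha
    return [i for i, d in enumerate(nn) if d > cutoff]


rows = [(x, y) for x in range(5) for y in range(5)] + [(20, 20)]
print(exp_tail_outliers(rows))  # [25]
```

Each point is tested against the same cutoff, so `alpha` here is a per-point tail probability rather than a false-discovery guarantee for the whole data set.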
Data Analytics for Smart Parking Applications.

PubMed

Piovesan, Nicola; Turi, Leo; Toigo, Enrico; Martinez, Borja; Rossi, Michele

2016-09-23

We consider real-life smart parking systems where parking lot occupancy data are collected from field sensor devices and sent to backend servers for further processing and usage for applications. Our objective is to make these data useful to end users, such as parking managers, and, ultimately, to citizens. To this end, we concoct and validate an automated classification algorithm having two objectives: (1) outlier detection: to detect sensors with anomalous behavioral patterns, i.e., outliers; and (2) clustering: to group the parking sensors exhibiting similar patterns into distinct clusters. We first analyze the statistics of real parking data, obtaining suitable simulation models for parking traces. We then consider a simple classification algorithm based on the empirical complementary distribution function of occupancy times and show its limitations. Hence, we design a more sophisticated algorithm exploiting unsupervised learning techniques (self-organizing maps). These are tuned following a supervised approach using our trace generator and are compared against other clustering schemes, namely expectation maximization, k-means clustering and DBSCAN, considering six months of data from a real sensor deployment. Our approach is found to be superior in terms of classification accuracy, while also being capable of identifying all of the outliers in the dataset.
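The simple baseline mentioned above works from the empirical complementary distribution function (ECCDF) of occupancy times, that is, the fraction of occupancy times longer than t. A minimal sketch of one way to turn it into an outlier flag follows. It assumes that each sensor is compared with the pooled ECCDF by the largest absolute gap over a grid of times (a Kolmogorov-Smirnov-style distance) and that a gap above 0.5 marks an outlier. These choices are assumptions rather than the paper's classifier, and the occupancy times are synthetic.

```python
from bisect import bisect_right


def eccdf(samples):
    """Return a function t -> fraction of samples strictly greater than t."""
    xs = sorted(samples)
    return lambda t: 1.0 - bisect_right(xs, t) / len(xs)


def eccdf_gaps(sensor_samples, grid):
    """Largest |ECCDF_sensor(t) - ECCDF_pooled(t)| over the grid, per sensor."""
    pooled = eccdf([x for s in sensor_samples.values() for x in s])
    gaps = {}
    for name, samples in sensor_samples.items():
        own = eccdf(samples)
        gaps[name] = max(abs(own(t) - pooled(t)) for t in grid)
    return gaps


# Occupancy times in minutes; sensor D behaves as if stuck in the "occupied" state.
sensors = {
    "A": [10, 20, 30, 40, 60],
    "B": [12, 25, 35, 45, 55],
    "C": [8, 18, 28, 50, 65],
    "D": [600, 720, 900, 1000, 1200],
}
gaps = eccdf_gaps(sensors, grid=[15, 30, 60, 120, 240, 480])
print([name for name, g in gaps.items() if g > 0.5])  # ['D']
```

The pooled ECCDF includes the anomalous sensor itself; with many sensors its influence on the pooled curve becomes small, but a fixed gap threshold like this one is exactly the kind of rigid rule whose limitations motivate a learned classifier.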
Data Analytics for Smart Parking Applications

PubMed Central

Piovesan, Nicola; Turi, Leo; Toigo, Enrico; Martinez, Borja; Rossi, Michele

2016-01-01

We consider real-life smart parking systems where parking lot occupancy data are collected from field sensor devices and sent to backend servers for further processing and usage for applications. Our objective is to make these data useful to end users, such as parking managers, and, ultimately, to citizens. To this end, we concoct and validate an automated classification algorithm having two objectives: (1) outlier detection: to detect sensors with anomalous behavioral patterns, i.e., outliers; and (2) clustering: to group the parking sensors exhibiting similar patterns into distinct clusters. We first analyze the statistics of real parking data, obtaining suitable simulation models for parking traces. We then consider a simple classification algorithm based on the empirical complementary distribution function of occupancy times and show its limitations. Hence, we design a more sophisticated algorithm exploiting unsupervised learning techniques (self-organizing maps). These are tuned following a supervised approach using our trace generator and are compared against other clustering schemes, namely expectation maximization, k-means clustering and DBSCAN, considering six months of data from a real sensor deployment. Our approach is found to be superior in terms of classification accuracy, while also being capable of identifying all of the outliers in the dataset. PMID:27669259
Correction of Dual-PRF Doppler Velocity Outliers in the Presence of Aliasing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Altube, Patricia; Bech, Joan; Argemí, Oriol

In Doppler weather radars, the presence of unfolding errors or outliers is a well-known quality issue for radial velocity fields estimated using the dual-pulse repetition frequency (PRF) technique. Postprocessing methods have been developed to correct dual-PRF outliers, but these need prior application of a dealiasing algorithm for an adequate correction. Our paper presents an alternative procedure based on circular statistics that corrects dual-PRF errors in the presence of extended Nyquist aliasing. The correction potential of the proposed method is quantitatively tested by means of velocity field simulations and is exemplified in the application to real cases, including severe storm events. The comparison with two other existing correction methods indicates an improved performance in the correction of clustered outliers. The technique we propose is well suited for real-time applications requiring high-quality Doppler radar velocity fields, such as wind shear and mesocyclone detection algorithms, or assimilation in numerical weather prediction models.
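The benefit of circular statistics in the presence of aliasing can be sketched as follows. The sketch assumes that velocities are mapped onto the circle through the extended Nyquist velocity of the dual-PRF pair, that a reference is taken as the circular mean of neighboring gates, and that a suspect gate is corrected by adding the integer multiple of twice its own PRF's Nyquist velocity that brings it closest to that reference. This is an illustrative reconstruction of the general idea, not the authors' procedure. The Nyquist values (12 and 16 m/s, giving an extended Nyquist velocity of 48 m/s) and `max_k` are assumptions.

```python
import cmath
import math


def wrap(v, v_ny):
    """Wrap a velocity into [-v_ny, v_ny)."""
    return (v + v_ny) % (2 * v_ny) - v_ny


def circular_mean_velocity(velocities, v_ny):
    """Mean computed on the circle, so values on both sides of +/-v_ny average correctly."""
    z = sum(cmath.exp(1j * math.pi * v / v_ny) for v in velocities) / len(velocities)
    return cmath.phase(z) * v_ny / math.pi


def correct_gate(v, neighbours, v_ny_ext, v_ny_gate, max_k=3):
    """Pick v + 2*k*v_ny_gate (|k| <= max_k) closest on the circle to the neighbor mean."""
    ref = circular_mean_velocity(neighbours, v_ny_ext)
    candidates = [wrap(v + 2 * k * v_ny_gate, v_ny_ext) for k in range(-max_k, max_k + 1)]
    return min(candidates, key=lambda c: abs(wrap(c - ref, v_ny_ext)))


# Neighbors straddle the extended Nyquist limit (+/-48 m/s); the gate reads 22 m/s,
# i.e. a dual-PRF unfolding error of 2 * 12 m/s relative to its neighbors.
neighbours = [46, 47, -47, -46]
print(correct_gate(22, neighbours, v_ny_ext=48, v_ny_gate=12))  # 46
```

An arithmetic mean of the same neighbors is 0 m/s, which would pull the correction towards -2 m/s; the circular mean instead places the reference at the aliasing boundary, next to the true velocities.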
Correction of Dual-PRF Doppler Velocity Outliers in the Presence of Aliasing

DOE PAGES

Altube, Patricia; Bech, Joan; Argemí, Oriol; ...

2017-07-18

Privacy Preserving Nearest Neighbor Search

NASA Astrophysics Data System (ADS)

Shaneck, Mark; Kim, Yongdae; Kumar, Vipin

Data mining is frequently obstructed by privacy concerns. In many cases data is distributed, and bringing the data together in one place for analysis is not possible due to privacy laws (e.g. HIPAA) or policies. Privacy preserving data mining techniques have been developed to address this issue by providing mechanisms to mine the data while giving certain privacy guarantees. In this chapter we address the issue of privacy preserving nearest neighbor search, which forms the kernel of many data mining applications. To this end, we present a novel algorithm based on secure multiparty computation primitives to compute the nearest neighbors of records in horizontally distributed data. We show how this algorithm can be used in three important data mining algorithms, namely LOF outlier detection, SNN clustering, and kNN classification. We prove the security of these algorithms under the semi-honest adversarial model, and describe methods that can be used to optimize their performance. Keywords: Privacy Preserving Data Mining, Nearest Neighbor Search, Outlier Detection, Clustering, Classification, Secure Multiparty Computation
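Since the chapter applies its private nearest-neighbor search to LOF outlier detection, it may help to recall what LOF computes from those neighbors. The sketch below is a plain, non-private LOF in pure Python. It assumes distinct points, exactly `k` nearest neighbors (ties broken by index) and Euclidean distance; none of the secure multiparty computation machinery is shown.

```python
import math


def lof(points, k=3):
    """Local Outlier Factor for a small list of distinct points."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    neigh, kdist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neigh.append(order[:k])
        kdist.append(dist[i][order[k - 1]])  # distance to the k-th nearest neighbor
    # Local reachability density: inverse mean reachability distance to the neighbors.
    lrd = []
    for i in range(n):
        reach = [max(kdist[o], dist[i][o]) for o in neigh[i]]
        lrd.append(k / sum(reach))
    # LOF: average ratio of the neighbors' density to the point's own density.
    return [sum(lrd[o] for o in neigh[i]) / (k * lrd[i]) for i in range(n)]


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.5, 1.5), (5, 5)]
scores = lof(pts)
print(max(range(len(pts)), key=scores.__getitem__))  # 6
```

The nearest-neighbor lists and k-distances are the only inputs LOF needs from the data, which is why a privacy-preserving nearest-neighbor primitive is enough to support it.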

Detecting outliers and learning complex structures with large spectroscopic surveys - a case study with APOGEE stars

NASA Astrophysics Data System (ADS)

Reis, Itamar; Poznanski, Dovi; Baron, Dalya; Zasowski, Gail; Shahaf, Sahar

2018-05-01

In this work, we apply and expand on a recently introduced outlier detection algorithm that is based on an unsupervised random forest. We use the algorithm to calculate a similarity measure for stellar spectra from the Apache Point Observatory Galactic Evolution Experiment (APOGEE). We show that the similarity measure traces non-trivial physical properties and contains information about complex structures in the data. We use it for visualization and clustering of the data set, and discuss its ability to find groups of highly similar objects, including spectroscopic twins. Using the similarity matrix to search the data set for objects allows us to find objects that are impossible to find using their best-fitting model parameters. This includes extreme objects for which the models fail, and rare objects that are outside the scope of the model. We use the similarity measure to detect outliers in the data set, and find a number of previously unknown Be-type stars, spectroscopic binaries, carbon rich stars, young stars, and a few that we cannot interpret. Our work further demonstrates the potential for scientific discovery when combining machine learning methods with modern survey data.
A New Secondary Structure Assignment Algorithm Using Cα Backbone Fragments

PubMed Central

Cao, Chen; Wang, Guishen; Liu, An; Xu, Shutan; Wang, Lincong; Zou, Shuxue

2016-01-01

The assignment of secondary structure elements in proteins is a key step in the analysis of their structures and functions. We have developed an algorithm, SACF (secondary structure assignment based on Cα fragments), for secondary structure element (SSE) assignment based on the alignment of Cα backbone fragments with central poses derived by clustering known SSE fragments. The assignment algorithm consists of three steps: first, the outlier fragments on known SSEs are detected; next, the remaining fragments are clustered to obtain the central fragment for each cluster; finally, the central fragments are used as templates to make assignments. In a large-scale comparison of 11 secondary structure assignment methods, SACF, KAKSI and PROSS were found to have similar agreement with DSSP, while PCASSO agrees best with DSSP. When the DSSP assignment is taken as the standard, SACF and PCASSO tend to exclude residues in the N- and C-cap regions, whereas KAKSI, P-SEA and SEGNO tend to add residues at the termini. Moreover, our algorithm is able to assign subtle helices (3₁₀-helix, π-helix and left-handed helix) and make uniform assignments, as well as to detect rare SSEs in β-sheets or long helices as outlier fragments from other programs. The structural uniformity should be useful for protein structure classification and prediction, while outlier fragments underlie the structure–function relationship. PMID:26978354
A robust data scaling algorithm to improve classification accuracies in biomedical data.

PubMed

Cao, Xi Hang; Stojkovic, Ivan; Obradovic, Zoran

2016-09-09

Machine learning models have been adopted in biomedical research and practice for knowledge discovery and decision support. While mainstream biomedical informatics research focuses on developing more accurate models, the importance of data preprocessing draws less attention. We propose the Generalized Logistic (GL) algorithm, which scales data uniformly to an appropriate interval by learning a generalized logistic function to fit the empirical cumulative distribution function of the data. The GL algorithm is simple yet effective: it is intrinsically robust to outliers, so it is particularly suitable for diagnostic/classification models in clinical/medical applications where the number of samples is usually small, and it scales the data in a nonlinear fashion, which can potentially improve accuracy. To evaluate the effectiveness of the proposed algorithm, we conducted experiments on 16 binary classification tasks that involve different variable types and cover a wide range of applications. In terms of area under the receiver operating characteristic curve (AUROC) and percentage of correct classification, models learned using data scaled by the GL algorithm outperformed those using data scaled by the Min-max and Z-score algorithms, which are the most commonly used data scaling algorithms. The proposed GL algorithm is simple and effective. It is robust to outliers, so no additional denoising or outlier detection step is needed in data preprocessing. Empirical results also show that models learned from data scaled by the GL algorithm have higher accuracy than those learned from data scaled by the commonly used data scaling algorithms.
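The robustness argument can be illustrated by comparing min-max scaling with a logistic squashing whose location and scale come from the median and interquartile range. This is not the GL algorithm, which fits a generalized logistic function to the empirical CDF. Here a plain logistic is matched to the sample quartiles (for a logistic distribution, IQR = 2 s ln 3), an assumption that keeps the example short.

```python
import math
import statistics


def minmax(xs):
    """Classic min-max scaling to [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def logistic_scale(xs):
    """Map xs to [0, 1] with a logistic CDF matched to the sample median and quartiles."""
    q1, med, q3 = statistics.quantiles(xs, n=4)
    s = (q3 - q1) / (2 * math.log(3))  # logistic quartiles sit at med -/+ s ln 3

    def sigmoid(z):
        # Numerically stable for large |z| (avoids overflow in math.exp).
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    return [sigmoid((x - med) / s) for x in xs]


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]  # one extreme value
mm, lg = minmax(data), logistic_scale(data)
print(max(mm[:9]) - min(mm[:9]), max(lg[:9]) - min(lg[:9]))
```

In this example the nine ordinary values occupy less than 1% of the [0, 1] range after min-max scaling but roughly two thirds of it after the logistic mapping, because the single extreme value no longer sets the scale.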
Mixture-Tuned, Clutter Matched Filter for Remote Detection of Subpixel Spectral Signals

NASA Technical Reports Server (NTRS)

Thompson, David R.; Mandrake, Lukas; Green, Robert O.

2013-01-01

Mapping localized spectral features in large images demands sensitive and robust detection algorithms. Two aspects of large images that can harm matched-filter detection performance are addressed simultaneously. First, multimodal backgrounds may thwart the typical Gaussian model. Second, outlier features can trigger false detections from large projections onto the target vector. Two state-of-the-art approaches are combined that independently address outlier false positives and multimodal backgrounds. The background clustering models multimodal backgrounds, and the mixture tuned matched filter (MT-MF) addresses outliers. Combining the two methods captures significant additional performance benefits. The resulting mixture tuned clutter matched filter (MT-CMF) shows effective performance on simulated and airborne datasets. The classical MNF transform was applied, followed by k-means clustering. Then, each cluster's mean, covariance, and the corresponding eigenvalues were estimated. This yields a cluster-specific matched filter estimate as well as a cluster-specific feasibility score to flag outlier false positives. The technology described is a proof of concept that may be employed in future target detection and mapping applications for remote imaging spectrometers. It is of most direct relevance to JPL proposals for airborne and orbital hyperspectral instruments. Applications include subpixel target detection in hyperspectral scenes for military surveillance. Earth science applications include mineralogical mapping, species discrimination for ecosystem health monitoring, and land use classification.
Data quality enhancement and knowledge discovery from relevant signals in acoustic emission

NASA Astrophysics Data System (ADS)

Mejia, Felipe; Shyu, Mei-Ling; Nanni, Antonio

2015-10-01

The increasing popularity of structural health monitoring has brought with it a growing need for automated data management and data analysis tools. Of great importance are filters that can systematically detect unwanted signals in acoustic emission datasets. This study presents a semi-supervised data mining scheme that detects data belonging to unfamiliar distributions. This type of outlier detection scheme is useful for detecting the presence of new acoustic emission sources, given a training dataset of unwanted signals. In addition to classifying new observations (herein referred to as "outliers") within a dataset, the scheme generates a decision tree that classifies sub-clusters within the outlier context set. The obtained tree can be interpreted as a series of characterization rules for newly observed data, and these rules can potentially describe the basic structure of different modes within the outlier distribution. The data mining scheme is first validated on a synthetic dataset, and an attempt is made to confirm the scheme's ability to discriminate outlier acoustic emission sources in a controlled pencil-lead-break experiment. Finally, the scheme is applied to data from two fatigue crack-growth steel specimens, where it is shown that the extracted rules can adequately describe crack-growth-related acoustic emission sources while filtering out background "noise." Results show promising performance in filter generation, thereby allowing analysts to extract, characterize, and focus only on meaningful signals.
Sparsity-weighted outlier FLOODing (OFLOOD) method: Efficient rare event sampling method using sparsity of distribution.

PubMed

Harada, Ryuhei; Nakamura, Tomotake; Shigeta, Yasuteru

2016-03-30

As an extension of the Outlier FLOODing (OFLOOD) method [Harada et al., J. Comput. Chem. 2015, 36, 763], the sparsity of the outliers defined by a hierarchical clustering algorithm, FlexDice, was considered to achieve an efficient conformational search as sparsity-weighted "OFLOOD." In OFLOOD, FlexDice detects areas of sparse distribution as outliers. The outliers are regarded as candidates that have high potential to promote conformational transitions and are employed as initial structures for conformational resampling by restarting molecular dynamics simulations. When detecting outliers, FlexDice defines a rank in the hierarchy for each outlier, which relates to sparsity in the distribution. In this study, we define lowest-rank (first-ranked), medium-rank (second-ranked), and highest-rank (third-ranked) outliers. For instance, the first-ranked outliers are located in a given conformational space away from the clusters (a highly sparse distribution), whereas the third-ranked outliers lie near the clusters (a moderately sparse distribution). To achieve an efficient conformational search, resampling is performed from the outliers with a given rank. As demonstrations, this method was applied to several model systems: alanine dipeptide, Met-enkephalin, Trp-cage, T4 lysozyme, and glutamine binding protein. In each demonstration, the present method successfully reproduced transitions among metastable states. In particular, the first-ranked OFLOOD greatly accelerated the exploration of conformational space by expanding the edges. In contrast, the third-ranked OFLOOD intensively reproduced local transitions among neighboring metastable states. For quantitative evaluation of the sampled snapshots, free energy calculations were performed in combination with umbrella sampling, providing rigorous free energy landscapes of the biomolecules. © 2015 Wiley Periodicals, Inc.
New Quality Control Algorithm Based on GNSS Sensing Data for a Bridge Health Monitoring System

PubMed Central

Lee, Jae Kang; Lee, Jae One; Kim, Jung Ok

2016-01-01

This research introduces an improvement plan for the reliability of Global Navigation Satellite System (GNSS) positioning solutions. The most suitable methodology for GNSS adjustment and positioning should be considered in order to maximize the utilization of GNSS applications. Although various studies have been conducted on Bridge Health Monitoring Systems (BHMS) based on GNSS, outliers that depend on the signal reception environment have not been considered until now. These outliers may be present in GNSS data collected from major bridge members, and they can reduce the reliability of the whole monitoring system by delivering false information. They should therefore be detected and eliminated in the preceding adjustment stage. In this investigation, the Detection, Identification, Adaptation (DIA) technique was applied and implemented as an algorithm. The algorithm can be applied directly to GNSS data collected from long-span cable-stayed bridges, and most outliers were efficiently detected and eliminated simultaneously. As a result, the reliability of GNSS should be greatly improved. Improved GNSS positioning accuracy is directly linked to the safety of the bridges themselves. At the same time, it can also increase the reliability of the monitoring system in terms of system operation. PMID:27240375
Robust High-dimensional Bioinformatics Data Streams Mining by ODR-ioVFDT

PubMed Central

Wang, Dantong; Fong, Simon; Wong, Raymond K.; Mohammed, Sabah; Fiaidhi, Jinan; Wong, Kelvin K. L.

2017-01-01

Outlier detection in bioinformatics data stream mining has received significant attention from research communities in recent years. Two problems pose a dilemma: how to distinguish noise from an exception, and whether to discard the exception or to devise an extra decision path to accommodate it. In this paper, we propose a novel algorithm called ODR with incrementally Optimized Very Fast Decision Tree (ODR-ioVFDT) for handling outliers during continuous data learning. An adaptive interquartile-range-based identification method is used to set a tolerance threshold. This threshold is then used to judge whether an exceptional data value should be included for training. This differs from traditional outlier detection/removal approaches, which process the data in two separate steps. The proposed algorithm is tested on datasets from five bioinformatics scenarios by comparing the performance of our model with that of models without ODR. The results show that ODR-ioVFDT has better performance in classification accuracy, kappa statistics, and time consumption. ODR-ioVFDT can be applied to bioinformatics streaming data to detect and quantify information on the life phenomena, states, characters, variables and components of the organism. Used in this way, it can help diagnose and treat disease more effectively. PMID:28230161
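A tolerance threshold of this kind can be illustrated with an interquartile-range gate that judges one value at a time. The sketch below is hypothetical and is not the authors' ODR procedure. The sliding window of accepted values, the 1.5 × IQR multiplier, and the warm-up length `min_history` are all assumptions chosen for illustration. A value outside the band is withheld from training, and each accepted value updates the band. Detection and training therefore happen in a single pass rather than in two separate steps.

```python
from collections import deque
import statistics

class IQRGate:
    """Hypothetical streaming gate: keep a value for training only if it lies
    inside [Q1 - k*IQR, Q3 + k*IQR] computed over recently accepted values."""

    def __init__(self, window=200, k=1.5, min_history=5):
        self.history = deque(maxlen=window)   # sliding window of accepted values
        self.k = k                            # fence multiplier (assumed)
        self.min_history = min_history        # accept unconditionally until this many values

    def accept(self, x):
        """Return True if x is kept for training, False if it is treated as an outlier."""
        if len(self.history) >= self.min_history:
            q1, _, q3 = statistics.quantiles(self.history, n=4, method="inclusive")
            iqr = q3 - q1
            if not (q1 - self.k * iqr <= x <= q3 + self.k * iqr):
                return False                  # outside the tolerance band: do not train on it
        self.history.append(x)                # only accepted values move the band
        return True
```

For example, in the stream `[10, 11, 9, 10, 12, 10, 11, 50, 10, 9]` only the value 50 is rejected.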
New Quality Control Algorithm Based on GNSS Sensing Data for a Bridge Health Monitoring System.

PubMed

Lee, Jae Kang; Lee, Jae One; Kim, Jung Ok

2016-05-27

This research introduces an improvement plan for the reliability of Global Navigation Satellite System (GNSS) positioning solutions. The most suitable methodology for GNSS adjustment and positioning should be considered in order to maximize the utilization of GNSS applications. Although various studies have been conducted on Bridge Health Monitoring Systems (BHMS) based on GNSS, outliers that depend on the signal reception environment have not been considered until now. These outliers may be present in GNSS data collected from major bridge members, and they can reduce the reliability of the whole monitoring system by delivering false information. They should therefore be detected and eliminated in the preceding adjustment stage. In this investigation, the Detection, Identification, Adaptation (DIA) technique was applied and implemented as an algorithm. The algorithm can be applied directly to GNSS data collected from long-span cable-stayed bridges, and most outliers were efficiently detected and eliminated simultaneously. As a result, the reliability of GNSS should be greatly improved. Improved GNSS positioning accuracy is directly linked to the safety of the bridges themselves. At the same time, it can also increase the reliability of the monitoring system in terms of system operation.
An Unsupervised Anomalous Event Detection and Interactive Analysis Framework for Large-scale Satellite Data

NASA Astrophysics Data System (ADS)

LIU, Q.; Lv, Q.; Klucik, R.; Chen, C.; Gallaher, D. W.; Grant, G.; Shang, L.

2016-12-01

Due to the high volume and complexity of satellite data, computer-aided tools for fast quality assessments and scientific discovery are indispensable for scientists in the era of Big Data. In this work, we have developed a framework for automated anomalous event detection in massive satellite data. The framework consists of a clustering-based anomaly detection algorithm and a cloud-based tool for interactive analysis of detected anomalies. The algorithm is unsupervised and requires no prior knowledge of the data (e.g., expected normal pattern or known anomalies). As such, it works for diverse data sets, and performs well even in the presence of missing and noisy data. The cloud-based tool provides an intuitive mapping interface that allows users to interactively analyze anomalies using multiple features. As a whole, our framework can (1) identify outliers in a spatio-temporal context, (2) recognize and distinguish meaningful anomalous events from individual outliers, (3) rank those events based on "interestingness" (e.g., rareness or total number of outliers) defined by users, and (4) enable interactive querying, exploration, and analysis of those anomalous events. In this presentation, we will demonstrate the effectiveness and efficiency of our framework in the application of detecting data quality issues and unusual natural events using two satellite datasets. The techniques and tools developed in this project are applicable for a diverse set of satellite data and will be made publicly available for scientists in early 2017.
A generalized Grubbs-Beck test statistic for detecting multiple potentially influential low outliers in flood series

USGS Publications Warehouse

Cohn, T.A.; England, J.F.; Berenbrock, C.E.; Mason, R.R.; Stedinger, J.R.; Lamontagne, J.R.

2013-01-01

The Grubbs-Beck test is recommended by the federal guidelines for detection of low outliers in flood flow frequency computation in the United States. This paper presents a generalization of the Grubbs-Beck test for normal data (similar to the Rosner (1983) test; see also Spencer and McCuen (1996)) that can provide a consistent standard for identifying multiple potentially influential low flows. In cases where low outliers have been identified, they can be represented as “less-than” values, and a frequency distribution can be developed using censored-data statistical techniques, such as the Expected Moments Algorithm. This approach can improve the fit of the right-hand tail of a frequency distribution and provide protection from lack-of-fit due to unimportant but potentially influential low flows (PILFs) in a flood series, thus making the flood frequency analysis procedure more robust.
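For reference, the classical single Grubbs-Beck screen that the paper generalizes can be sketched as below. This minimal sketch assumes positive flows and a record length of roughly 5 to 150 years. It works on log10 flows and uses a commonly quoted polynomial approximation of the 10%-significance critical value K_N (about 2.036 for N = 10). It does not implement the generalized multiple-outlier statistic or the Expected Moments Algorithm, and the function name is hypothetical.

```python
import math
import statistics

def grubbs_beck_low_outliers(flows):
    """Classical single Grubbs-Beck low-outlier screen on log10 peak flows.

    Critical value: approximation K_N = -0.9043 + 3.345*sqrt(log10 N) - 0.4046*log10 N
    at the 10% significance level (assumed valid for about 5 <= N <= 150).
    Returns the flows whose log10 value falls below mean - K_N * s.
    """
    logs = [math.log10(q) for q in flows]          # flows must be strictly positive
    n = len(logs)
    k_n = -0.9043 + 3.345 * math.sqrt(math.log10(n)) - 0.4046 * math.log10(n)
    threshold = statistics.mean(logs) - k_n * statistics.stdev(logs)
    return [q for q, lq in zip(flows, logs) if lq < threshold]
```

A single screen like this tests against one threshold computed from all the data. A generalized test is designed to handle several low flows that could otherwise mask each other.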
A generalized Grubbs-Beck test statistic for detecting multiple potentially influential low outliers in flood series

NASA Astrophysics Data System (ADS)

Cohn, T. A.; England, J. F.; Berenbrock, C. E.; Mason, R. R.; Stedinger, J. R.; Lamontagne, J. R.

2013-08-01

The Grubbs-Beck test is recommended by the federal guidelines for detection of low outliers in flood flow frequency computation in the United States. This paper presents a generalization of the Grubbs-Beck test for normal data (similar to the Rosner (1983) test; see also Spencer and McCuen (1996)) that can provide a consistent standard for identifying multiple potentially influential low flows. In cases where low outliers have been identified, they can be represented as "less-than" values, and a frequency distribution can be developed using censored-data statistical techniques, such as the Expected Moments Algorithm. This approach can improve the fit of the right-hand tail of a frequency distribution and provide protection from lack-of-fit due to unimportant but potentially influential low flows (PILFs) in a flood series, thus making the flood frequency analysis procedure more robust.
Automated Discovery of Long Intergenic RNAs Associated with Breast Cancer Progression

DTIC Science & Technology

2012-02-01

manuscript in preparation), (2) development and publication of an algorithm for detecting gene fusions in RNA-Seq data [1], and (3) discovery of outlier long...subjected to de novo assembly algorithms to discover novel transcripts representing either unannotated genes or novel somatic mutations such as gene...fusions. To this end the P.I. developed and published a novel algorithm called ChimeraScan to facilitate the discovery and validation of gene
Using State Estimation Residuals to Detect Abnormal SCADA Data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ma, Jian; Chen, Yousu; Huang, Zhenyu

2010-06-14

Detection of manipulated supervisory control and data acquisition (SCADA) data is critically important for the safe and secure operation of modern power systems. In this paper, a methodology for detecting manipulated SCADA data based on state estimation residuals is presented. A framework of the proposed methodology is described. Instead of using original SCADA measurements as the bad data sources, the residuals calculated from the results of the state estimator are used as the input for the outlier detection process. The BACON algorithm is applied to detect outliers in the state estimation residuals. The IEEE 118-bus system is used as a test case to evaluate the effectiveness of the proposed methodology. The accuracy of the BACON method is compared with that of the 3-σ method for the simulated SCADA measurements and residuals.
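The 3-σ baseline that BACON is compared against can be sketched in a few lines. This minimal sketch assumes the state estimation residuals are already available as a list of numbers, and the function name is hypothetical. The mean and standard deviation are computed from the same, possibly contaminated, residuals. Large outliers therefore inflate σ and can mask smaller ones. Robust methods such as BACON, which grow a clean basic subset iteratively, are designed to reduce this weakness.

```python
import statistics

def three_sigma_outliers(residuals):
    """Return indices of residuals lying more than 3 sample standard deviations from the mean."""
    mu = statistics.mean(residuals)
    sigma = statistics.stdev(residuals)     # sample standard deviation (n - 1)
    return [i for i, r in enumerate(residuals) if abs(r - mu) > 3 * sigma]
```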
Ranking Fragment Ions Based on Outlier Detection for Improved Label-Free Quantification in Data-Independent Acquisition LC-MS/MS

PubMed Central

Bilbao, Aivett; Zhang, Ying; Varesio, Emmanuel; Luban, Jeremy; Strambio-De-Castillia, Caterina; Lisacek, Frédérique; Hopfgartner, Gérard

2016-01-01

Data-independent acquisition LC-MS/MS techniques complement supervised methods for peptide quantification. However, due to the wide precursor isolation windows, these techniques are prone to interference at the fragment ion level, which in turn is detrimental for accurate quantification. The “non-outlier fragment ion” (NOFI) ranking algorithm has been developed to assign low priority to fragment ions affected by interference. By using the optimal subset of high priority fragment ions these interfered fragment ions are effectively excluded from quantification. NOFI represents each fragment ion as a vector of four dimensions related to chromatographic and MS fragmentation attributes and applies multivariate outlier detection techniques. Benchmarking conducted on a well-defined quantitative dataset (i.e. the SWATH Gold Standard), indicates that NOFI on average is able to accurately quantify 11-25% more peptides than the commonly used Top-N library intensity ranking method. The sum of the area of the Top3-5 NOFIs produces similar coefficients of variation as compared to the library intensity method but with more accurate quantification results. On a biologically relevant human dendritic cell digest dataset, NOFI properly assigns low priority ranks to 85% of annotated interferences, resulting in sensitivity values between 0.92 and 0.80 against 0.76 for the Spectronaut interference detection algorithm. PMID:26412574
The stopping rules for winsorized tree

NASA Astrophysics Data System (ADS)

Ch'ng, Chee Keong; Mahat, Nor Idayu

2017-11-01

Winsorized tree is a modified tree-based classifier that is able to investigate and handle all outliers in all nodes during construction of the tree. It avoids the tedious process of constructing a classical tree, in which the splitting of branches and pruning proceed concurrently so that the constructed tree does not grow bushy. This mechanism is controlled by the proposed algorithm. In a winsorized tree, the data are screened to identify outliers. If an outlier is detected, its value is neutralized using the winsorizing approach. Both outlier identification and value neutralization are executed recursively in every node until a predetermined stopping criterion is met. The aim of this paper is to find a significant stopping criterion that stops the tree from further splitting before it overfits. Results from an experiment on the Pima Indian dataset showed that a node can produce the final successor nodes (leaves) once it has reached an information gain of around 70%.
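The value-neutralization step can be illustrated with classic k-fold winsorization. In this method the k smallest and k largest values are pulled in to the nearest remaining value instead of being discarded. This is a hypothetical sketch. The abstract does not state how outliers are identified in each node or how many values are replaced, so the parameter `k` is an assumption.

```python
def winsorize(values, k):
    """Replace the k smallest and k largest values with the nearest remaining value.

    Extreme values are neutralized rather than dropped, so the sample size
    is unchanged; the original order of `values` is preserved.
    """
    n = len(values)
    if not 0 <= k < n / 2:
        raise ValueError("k must satisfy 0 <= k < len(values) / 2")
    ordered = sorted(values)
    low, high = ordered[k], ordered[n - 1 - k]      # innermost retained extremes
    return [min(max(v, low), high) for v in values]
```

For example, `winsorize([3, 100, 1, 4, 2], 1)` returns `[3, 4, 2, 4, 2]`.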
Adaptive distributed outlier detection for WSNs.

PubMed

De Paola, Alessandra; Gaglio, Salvatore; Lo Re, Giuseppe; Milazzo, Fabrizio; Ortolani, Marco

2015-05-01

The paradigm of pervasive computing is gaining more and more attention nowadays, thanks to the possibility of obtaining precise and continuous monitoring. Ease of deployment and adaptivity are typically implemented by adopting autonomous and cooperative sensory devices. However, for such systems to be of any practical use, reliability and fault tolerance must be guaranteed, for instance by detecting corrupted readings amidst the huge amount of gathered sensory data. This paper proposes an adaptive distributed Bayesian approach for detecting outliers in data collected by a wireless sensor network. Our algorithm aims at optimizing classification accuracy, time complexity and communication complexity, while also considering externally imposed constraints on such conflicting goals. The experimental evaluation showed that our approach is able to improve the considered metrics for latency and energy consumption, with limited impact on classification accuracy.
Multilayer Perceptron for Robust Nonlinear Interval Regression Analysis Using Genetic Algorithms

PubMed Central

2014-01-01

On the basis of fuzzy regression, computational intelligence models such as neural networks can be applied to nonlinear interval regression analysis to deal with uncertain and imprecise data. When training data are not contaminated by outliers, such models perform well by including almost all given training data in the data interval. Nevertheless, since training data are often corrupted by outliers, robust learning algorithms that resist outliers in interval regression analysis have been an interesting area of research. Several computational intelligence approaches are effective at resisting outliers, but their required parameters depend on whether the collected data contain outliers. Since it seems difficult to prespecify the degree of contamination, this paper uses a multilayer perceptron to construct a robust nonlinear interval regression model using a genetic algorithm. Outliers above or below the data interval have only a slight effect on the determination of the data interval. Simulation results demonstrate that the proposed method performs well for contaminated datasets. PMID:25110755
Multilayer perceptron for robust nonlinear interval regression analysis using genetic algorithms.

PubMed

Hu, Yi-Chung

2014-01-01

On the basis of fuzzy regression, computational intelligence models such as neural networks can be applied to nonlinear interval regression analysis to deal with uncertain and imprecise data. When training data are not contaminated by outliers, such models perform well by including almost all given training data in the data interval. Nevertheless, since training data are often corrupted by outliers, robust learning algorithms that resist outliers in interval regression analysis have been an interesting area of research. Several computational intelligence approaches are effective at resisting outliers, but their required parameters depend on whether the collected data contain outliers. Since it seems difficult to prespecify the degree of contamination, this paper uses a multilayer perceptron to construct a robust nonlinear interval regression model using a genetic algorithm. Outliers above or below the data interval have only a slight effect on the determination of the data interval. Simulation results demonstrate that the proposed method performs well for contaminated datasets.
Outlier and target detection in aerial hyperspectral imagery: a comparison of traditional and percentage occupancy hit or miss transform techniques

NASA Astrophysics Data System (ADS)

Young, Andrew; Marshall, Stephen; Gray, Alison

2016-05-01

The use of aerial hyperspectral imagery for the purpose of remote sensing is a rapidly growing research area. Currently, targets are generally detected by looking for distinct spectral features of the objects under surveillance. For example, a camouflaged vehicle, deliberately designed to blend into background trees and grass in the visible spectrum, can be revealed using spectral features in the near-infrared spectrum. This work aims to develop improved target detection methods using a two-stage approach: firstly by development of a physics-based atmospheric correction algorithm to convert radiance into reflectance hyperspectral image data, and secondly by use of improved outlier detection techniques. In this paper the use of the Percentage Occupancy Hit or Miss Transform is explored to provide an automated method for target detection in aerial hyperspectral imagery.

Onboard Robust Visual Tracking for UAVs Using a Reliable Global-Local Object Model

PubMed Central

Fu, Changhong; Duan, Ran; Kircali, Dogan; Kayacan, Erdal

2016-01-01

In this paper, we present a novel onboard robust visual algorithm for long-term tracking of arbitrary 2D and 3D objects using a reliable global-local object model for unmanned aerial vehicle (UAV) applications, e.g., autonomously tracking and chasing a moving target. The first main component of the algorithm is a global matching and local tracking approach. The algorithm first finds feature correspondences, using an improved binary descriptor for global feature matching and an iterative Lucas–Kanade optical flow algorithm for local feature tracking. The second main module is an efficient local geometric filter (LGF), which handles outlier feature correspondences based on a new forward-backward pairwise dissimilarity measure, thereby maintaining pairwise geometric consistency. In the proposed LGF module, hierarchical agglomerative clustering, i.e., bottom-up aggregation, is applied using an effective single-link method. The third proposed module is a heuristic local outlier factor, which further maximizes the representation of the target object. To the best of our knowledge, this is the first time a local outlier factor has been used to deal with outlier features in a visual tracking application. In this module, outlier feature detection is formulated as a binary classification problem on the output features of the LGF module. Extensive UAV flight experiments show that the proposed visual tracker achieves real-time frame rates of more than thirty-five frames per second on an i7 processor with 640 × 512 image resolution. It also compares favorably with the most popular state-of-the-art trackers in terms of robustness, efficiency and accuracy. PMID:27589769
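The standard (non-heuristic) local outlier factor can be sketched as follows. This brute-force illustration assumes a small set of distinct 2D feature positions and a neighborhood size `k`. It does not reproduce the paper's heuristic variant or its binary classification of LGF output features. A score near 1 means a point is about as dense as its neighbors. A score well above 1 means the point is isolated.

```python
import math

def local_outlier_factor(points, k):
    """Standard LOF score for each point, computed by brute force in O(n^2).

    Assumes the points are distinct (so reachability distances are not all zero)
    and that len(points) > k.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # k-distance of each point and its k-neighborhood (ties at the k-distance are included)
    k_dist, neighbors = [], []
    for i in range(n):
        others = sorted(dist[i][j] for j in range(n) if j != i)
        kd = others[k - 1]
        k_dist.append(kd)
        neighbors.append([j for j in range(n) if j != i and dist[i][j] <= kd])
    # local reachability density: inverse mean reachability distance to the neighbors
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean ratio of the neighbors' densities to the point's own density
    return [sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i]) for i in range(n)]
```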
Multi-modal automatic montaging of adaptive optics retinal images

PubMed Central

Chen, Min; Cooper, Robert F.; Han, Grace K.; Gee, James; Brainard, David H.; Morgan, Jessica I. W.

2016-01-01

We present a fully automated adaptive optics (AO) retinal image montaging algorithm using the classic scale-invariant feature transform (SIFT) with random sample consensus (RANSAC) for outlier removal. Our approach is capable of using information from multiple AO modalities (confocal, split detection, and dark field) and can accurately detect discontinuities in the montage. The algorithm output is compared to manual montaging by evaluating the similarity of the overlapping regions after montaging, and by calculating the detection rate of discontinuities in the montage. Our results show that the proposed algorithm has high alignment accuracy and a discontinuity detection rate that is comparable (and often superior) to manual montaging. In addition, we analyze and show the benefits of using multiple modalities in the montaging process. We provide the algorithm presented in this paper as open-source and freely available to download. PMID:28018714
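The RANSAC outlier-removal idea can be illustrated for the simplest motion model, a pure 2D translation between two overlapping images. This minimal sketch assumes putative keypoint matches (for example from SIFT) are already available. The translation-only model, the pixel tolerance `tol`, and the iteration count are assumptions, not the paper's settings. A single match is enough to hypothesize a translation. The hypothesis with the largest consensus set wins, and the translation is then refit from its inliers.

```python
import random

def ransac_translation(src, dst, tol=2.0, iters=100, seed=0):
    """Estimate a 2D translation t with dst ≈ src + t from putative matches.

    src, dst: equal-length lists of (x, y) keypoint coordinates, where src[i]
    is matched to dst[i]. Returns (t, inlier_mask); t is refit as the mean
    offset of the largest consensus set.
    """
    rng = random.Random(seed)
    offsets = [(bx - ax, by - ay) for (ax, ay), (bx, by) in zip(src, dst)]
    best = []
    for _ in range(iters):
        # Minimal sample: one match fully determines a translation hypothesis.
        tx, ty = rng.choice(offsets)
        consensus = [i for i, (ox, oy) in enumerate(offsets)
                     if (ox - tx) ** 2 + (oy - ty) ** 2 <= tol ** 2]
        if len(consensus) > len(best):
            best = consensus
    tx = sum(offsets[i][0] for i in best) / len(best)
    ty = sum(offsets[i][1] for i in best) / len(best)
    inliers = set(best)
    return (tx, ty), [i in inliers for i in range(len(offsets))]
```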
Geomagnetic matching navigation algorithm based on robust estimation

NASA Astrophysics Data System (ADS)

Xie, Weinan; Huang, Liping; Qu, Zhenshen; Wang, Zhenhuan

2017-08-01

Outliers in geomagnetic survey data seriously affect the precision of geomagnetic matching navigation and badly disrupt its reliability. A novel algorithm that can eliminate the influence of outliers is investigated in this paper. First, the weight function is designed and the principle of robust estimation is introduced. The relation equation between the matching trajectory and the reference trajectory is combined with the Taylor series expansion of the geomagnetic information. This yields a mathematical expression for the longitude, latitude and heading errors. The robust target function is obtained from the weight function and this mathematical expression. The geomagnetic matching problem is then converted into the solution of a set of nonlinear equations. Finally, Newton iteration is applied to implement the novel algorithm. Simulation results at an outlier of 40 nT show that the matching error of the novel algorithm is decreased to 7.75% compared with the conventional mean square difference (MSD) algorithm. It is decreased to 18.39% compared with the conventional iterative contour matching algorithm. Meanwhile, when the outlier is 400 nT, the position error of the novel algorithm is 0.017°, while the other two algorithms fail to match.
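The weighting idea behind robust estimation can be shown on the simplest case: estimating a location from contaminated samples by iteratively reweighted least squares. This sketch uses Huber weights with a fixed scale based on the median absolute deviation (MAD). That weighting is an assumption standing in for the paper's own weight function. The sketch does not reproduce the trajectory error model or the Newton iteration described above.

```python
import statistics

def huber_location(data, c=1.345, tol=1e-8, max_iter=100):
    """Robust location estimate by iteratively reweighted least squares with Huber weights.

    Residuals within c*scale get weight 1; larger residuals are down-weighted
    by c*scale/|r|, so gross outliers barely move the estimate. The scale is the
    normalized median absolute deviation and is held fixed.
    """
    mu = statistics.median(data)
    scale = 1.4826 * statistics.median(abs(x - mu) for x in data)
    if scale == 0:
        return mu
    for _ in range(max_iter):
        weights = [1.0 if abs(x - mu) <= c * scale else c * scale / abs(x - mu) for x in data]
        new_mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu
```

On `[1.0, 1.1, 0.9, 1.05, 0.95, 50.0]` the estimate stays close to 1, whereas the plain mean is above 9.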
A new algorithm for automatic Outlier Detection in GPS Time Series

NASA Astrophysics Data System (ADS)

Cannavo', Flavio; Mattia, Mario; Rossi, Massimo; Palano, Mimmo; Bruno, Valentina

2010-05-01

Nowadays, continuous GPS time series are considered a crucial product of GPS permanent networks, useful in many geoscience fields, such as active tectonics, seismology, crustal deformation and volcano monitoring (Altamimi et al. 2002, Elósegui et al. 2006, Aloisi et al. 2009). Although GPS data processing software has become more reliable, the time series are still affected by different kinds of noise. These range from intrinsic noise (e.g., tropospheric delay) to un-modeled noise (e.g., cycle slips, satellite faults, parameter changes). GPS time series typically present characteristic noise that is a linear combination of white noise and correlated colored noise. This characteristic is fractal in the sense that it is evident at every considered time scale or sampling rate. The un-modeled noise sources result in spikes, outliers and steps. These kinds of errors can appreciably influence the estimation of the velocities of the monitored sites. Outlier detection in generic time series is a widely treated problem in the literature (Wei, 2005), but it is not fully developed for the specific kind of GPS series. We propose a robust automatic procedure for cleaning GPS time series of outliers. Especially for long daily series, the procedure also removes steps due to strong seismic or volcanic events, or merely to instrumentation changes such as antenna and receiver upgrades. The procedure is basically divided into two steps: a first step for colored noise reduction and a second step for outlier detection through adaptive series segmentation. Both algorithms present novel ideas and are nearly unsupervised. In particular, we propose an algorithm to estimate an autoregressive model for colored noise in GPS time series in order to subtract the effect of non-Gaussian noise on the series. This step is useful for the subsequent step (i.e., adaptive segmentation), which requires the hypothesis of Gaussian noise.
The proposed algorithms are tested in a benchmark case study, and the results confirm that the algorithms are effective and reasonable.
Bibliography
- Aloisi M., A. Bonaccorso, F. Cannavò, S. Gambino, M. Mattia, G. Puglisi, E. Boschi, A new dyke intrusion style for the Mount Etna May 2008 eruption modelled through continuous tilt and GPS data, Terra Nova, Volume 21, Issue 4, Pages 316-321, doi: 10.1111/j.1365-3121.2009.00889.x (August 2009)
- Altamimi Z., Sillard P., Boucher C., ITRF2000: A new release of the International Terrestrial Reference frame for earth science applications, J Geophys Res-Solid Earth, 107 (B10): art. no. 2214 (Oct 2002)
- Elósegui, P., J. L. Davis, D. Oberlander, R. Baena, and G. Ekström, Accuracy of high-rate GPS for seismology, Geophys. Res. Lett., 33, L11308, doi:10.1029/2006GL026065 (2006)
- Wei W. S., Time Series Analysis: Univariate and Multivariate Methods, Addison Wesley (2nd edition), ISBN-10: 0321322169 (July 2005)
Supervised Detection of Anomalous Light Curves in Massive Astronomical Catalogs

NASA Astrophysics Data System (ADS)

Nun, Isadora; Pichara, Karim; Protopapas, Pavlos; Kim, Dae-Won

2014-09-01

The development of synoptic sky surveys has led to a massive amount of data for which resources needed for analysis are beyond human capabilities. In order to process this information and to extract all possible knowledge, machine learning techniques become necessary. Here we present a new methodology to automatically discover unknown variable objects in large astronomical catalogs. With the aim of taking full advantage of all information we have about known objects, our method is based on a supervised algorithm. In particular, we train a random forest classifier using known variability classes of objects and obtain votes for each of the objects in the training set. We then model this voting distribution with a Bayesian network and obtain the joint voting distribution among the training objects. Consequently, an unknown object is considered an outlier insofar as it has a low joint probability. By leaving out one of the classes from the training set, we perform a validity test. This test shows that when the random forest classifier attempts to classify unknown light curves (the class left out), it votes with an unusual distribution among the classes. This rare voting is detected by the Bayesian network and expressed as a low joint probability. Our method is suitable for exploring massive data sets given that the training process is performed offline. We tested our algorithm on 20 million light curves from the MACHO catalog and generated a list of anomalous candidates. After analysis, we divided the candidates into two main classes of outliers: artifacts and intrinsic outliers. Artifacts were principally due to air mass variation, seasonal variation, bad calibration, or instrumental errors and were consequently removed from our outlier list and added to the training set. After retraining, we selected about 4000 objects, which we passed to a post-analysis stage by performing a cross-match with all publicly available catalogs.
Within these candidates we identified certain known but rare objects such as eclipsing Cepheids, blue variables, cataclysmic variables, and X-ray sources. For some outliers there was no additional information. Among them we identified three unknown variability types and a few individual outliers that will be followed up in order to perform a deeper analysis.
A Geometrical-Statistical Approach to Outlier Removal for TDOA Measurements

NASA Astrophysics Data System (ADS)

Compagnoni, Marco; Pini, Alessia; Canclini, Antonio; Bestagini, Paolo; Antonacci, Fabio; Tubaro, Stefano; Sarti, Augusto

2017-08-01

The curse of outlier measurements in estimation problems is a well-known issue in a variety of fields. Therefore, outlier removal procedures, which enable the identification of spurious measurements within a set, have been developed for many different scenarios and applications. In this paper, we propose a statistically motivated outlier removal algorithm for time differences of arrival (TDOAs), or equivalently range differences (RDs), acquired at sensor arrays. The method exploits the TDOA-space formalism and requires only knowledge of the relative sensor positions. The proposed method is completely independent of the application for which the measurements are used. It can therefore be reliably used to identify outliers within a set of TDOA/RD measurements in different fields (e.g., acoustic source localization, sensor synchronization, radar, remote sensing, etc.). The proposed outlier removal algorithm is validated by means of synthetic simulations and real experiments.
Analysis of lightning outliers in the EUCLID network

NASA Astrophysics Data System (ADS)

Poelman, Dieter R.; Schulz, Wolfgang; Kaltenboeck, Rudolf; Delobbe, Laurent

2017-11-01

Lightning data as observed by the European Cooperation for Lightning Detection (EUCLID) network are used in combination with radar data to retrieve the temporal and spatial behavior of lightning outliers, i.e., discharges located in the wrong place, over a 5-year period from 2011 to 2016. Cloud-to-ground (CG) stroke and intracloud (IC) pulse data are superimposed on corresponding 5 min radar precipitation fields in two topographically different areas, Belgium and Austria. Lightning outliers are then extracted based on the distance between each lightning event and the nearest precipitation. It is shown that the percentage of outliers is sensitive to changes in the network and to the location algorithm itself. The total percentage of outliers for both regions varies over the years between 0.8 and 1.7 % for a distance to the nearest precipitation of 2 km, with an average of approximately 1.2 % in Belgium and Austria. Outside the European summer thunderstorm season, the percentage of outliers tends to increase somewhat. The majority of all the outliers are low peak current events with absolute values falling between 0 and 10 kA. More specifically, positive cloud-to-ground strokes are more likely to be classified as outliers than all other types of discharges. Furthermore, the number of sensors participating in locating a lightning discharge differs between outliers and correctly located events, with outliers having the fewest participating sensors. In addition, the semi-major axis (SMA) is assigned to a lightning discharge as a confidence indicator for the location accuracy (LA). In most cases, the SMA is smaller for correctly located events than for outliers.
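The distance-based classification described above can be sketched as a nearest-neighbor test. This minimal sketch assumes that strokes and precipitating radar cells are given as (x, y) positions in kilometers on a common projected grid. Which radar values count as precipitation is left to the caller, and the function name is hypothetical. The 2 km default matches the distance quoted in the abstract.

```python
import math

def flag_lightning_outliers(strokes, precip_cells, max_km=2.0):
    """Flag strokes whose nearest precipitation cell lies farther than max_km.

    strokes, precip_cells: lists of (x_km, y_km) on a common projected grid.
    Returns a list of booleans (True = outlier). Brute-force nearest neighbor;
    with no precipitation cells at all, every stroke is flagged.
    """
    flags = []
    for s in strokes:
        nearest = min((math.dist(s, p) for p in precip_cells), default=math.inf)
        flags.append(nearest > max_km)
    return flags
```

The outlier percentage reported per region and year then follows as `100 * sum(flags) / len(flags)`.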
Development of a methodology for the detection of hospital financial outliers using information systems.

PubMed

Okada, Sachiko; Nagase, Keisuke; Ito, Ayako; Ando, Fumihiko; Nakagawa, Yoshiaki; Okamoto, Kazuya; Kume, Naoto; Takemura, Tadamasa; Kuroda, Tomohiro; Yoshihara, Hiroyuki

2014-01-01

Comparison of financial indices helps to illustrate differences in operations and efficiency among similar hospitals. Outlier data tend to influence statistical indices, and so detection of outliers is desirable. Development of a methodology for financial outlier detection using information systems will help to reduce the time and effort required, eliminate the subjective elements in detection of outlier data, and improve the efficiency and quality of analysis. The purpose of this research was to develop such a methodology. Financial outliers were defined based on a case model. An outlier-detection method using the distances between cases in multi-dimensional space is proposed. Experiments using three diagnosis groups indicated successful detection of cases for which the profitability and income structure differed from other cases. Therefore, the method proposed here can be used to detect outliers. Copyright © 2013 John Wiley & Sons, Ltd.
Incremental Principal Component Analysis Based Outlier Detection Methods for Spatiotemporal Data Streams

NASA Astrophysics Data System (ADS)

Bhushan, A.; Sharker, M. H.; Karimi, H. A.

2015-07-01

In this paper, we address outliers in spatiotemporal data streams obtained from sensors placed across geographically distributed locations. Outliers may appear in such sensor data for various reasons, such as instrumental error and environmental change. Real-time detection of these outliers is essential to prevent the propagation of errors into subsequent analyses and results. Incremental Principal Component Analysis (IPCA) is one possible approach for detecting outliers in this type of spatiotemporal data stream. IPCA has been widely used in many real-time applications such as credit card fraud detection, pattern recognition, and image analysis. However, the suitability of IPCA for outlier detection in spatiotemporal data streams is unknown and needs to be investigated. To fill this research gap, this paper presents two new IPCA-based outlier detection methods and compares them with existing IPCA-based outlier detection methods to assess their suitability for spatiotemporal sensor data streams.
Automated rice leaf disease detection using color image analysis

NASA Astrophysics Data System (ADS)

Pugoy, Reinald Adrian D. L.; Mariano, Vladimir Y.

2011-06-01

In rice-related institutions such as the International Rice Research Institute, assessing the health condition of a rice plant through its leaves, which is usually done as a manual eyeball exercise, is important to come up with good nutrient and disease management strategies. In this paper, an automated system that can detect diseases present in a rice leaf using color image analysis is presented. In the system, the outlier region is first obtained from a rice leaf image to be tested using histogram intersection between the test and healthy rice leaf images. Upon obtaining the outlier, it is then subjected to a threshold-based K-means clustering algorithm to group related regions into clusters. Then, these clusters are subjected to further analysis to finally determine the suspected diseases of the rice leaf.
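The histogram-intersection step above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' system: RGB pixels are quantized into a coarse histogram (`bins_per_channel` is an assumed parameter), the Swain-Ballard intersection measures how much of the test leaf's colour distribution the healthy leaf explains, and the rule that a bin is "outlier" when the test histogram exceeds the healthy one there is a hypothetical choice made for illustration. The later K-means grouping is not shown.

```python
from collections import Counter

def color_histogram(pixels, bins_per_channel=4):
    """Normalized RGB histogram: maps a quantized (r, g, b) bin to its pixel fraction."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    return {key: c / len(pixels) for key, c in counts.items()}

def histogram_intersection(h_test, h_ref):
    """Swain-Ballard intersection of normalized histograms (1.0 means identical)."""
    return sum(min(frac, h_ref.get(key, 0.0)) for key, frac in h_test.items())

def outlier_pixels(test_pixels, healthy_pixels, bins_per_channel=4):
    """Indices of test pixels whose colour bin is over-represented vs. the healthy leaf.

    Hypothetical rule: a bin is 'outlier' if the test histogram exceeds the healthy one there.
    """
    h_test = color_histogram(test_pixels, bins_per_channel)
    h_ref = color_histogram(healthy_pixels, bins_per_channel)
    step = 256 // bins_per_channel
    excess = {key for key, frac in h_test.items() if frac > h_ref.get(key, 0.0)}
    return [i for i, (r, g, b) in enumerate(test_pixels)
            if (r // step, g // step, b // step) in excess]


healthy_leaf = [(30, 160, 40)] * 10                     # uniformly green
test_leaf = [(30, 160, 40)] * 8 + [(150, 100, 30)] * 2  # two brownish lesion pixels
print(histogram_intersection(color_histogram(test_leaf), color_histogram(healthy_leaf)))  # 0.8
print(outlier_pixels(test_leaf, healthy_leaf))  # [8, 9]
```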
Image Corruption Detection in Diffusion Tensor Imaging for Post-Processing and Real-Time Monitoring

PubMed Central

Li, Yue; Shea, Steven M.; Lorenz, Christine H.; Jiang, Hangyi; Chou, Ming-Chung; Mori, Susumu

2013-01-01

Due to the high sensitivity of diffusion tensor imaging (DTI) to physiological motion, clinical DTI scans often suffer from substantial artifacts. Tensor-fitting-based, post-processing outlier rejection is often used to reduce the influence of motion artifacts. Although it is an effective approach, this method may no longer correctly identify and reject the corrupted data when multiple data are corrupted. In this paper, we introduce a new criterion called “corrected Inter-Slice Intensity Discontinuity” (cISID) to detect motion-induced artifacts. We compared the artifact-detection performance of algorithms using cISID with that of other existing methods. The experimental results show that integrating cISID into fitting-based methods significantly improves retrospective detection performance in post-processing analysis. Used alone, the cISID criterion performed worse than the fitting-based methods, but it could effectively identify severely corrupted images with a short calculation time. In the second part of this paper, an outlier rejection scheme was implemented on a scanner for real-time monitoring of image quality and reacquisition of the corrupted data. Real-time monitoring based on cISID, followed by post-processing, fitting-based outlier rejection, could provide a robust environment for routine DTI studies. PMID:24204551
Root System Water Consumption Pattern Identification on Time Series Data.

PubMed

Figueroa, Manuel; Pope, Christopher

2017-06-16

In agriculture, soil and meteorological sensors are used along with low-power networks to capture data, which allows for optimal resource usage and minimal environmental impact. This study uses time series analysis methods for outlier detection and pattern recognition on soil moisture sensor data to identify irrigation and consumption patterns and to improve a soil moisture prediction and irrigation system. This study compares three new algorithms with the detection technique currently used in the project; the new algorithms greatly decrease the number of false positives detected. The best result is obtained by the Series Strings Comparison (SSC) algorithm, with an average precision of 0.872 on the testing sets, a large improvement over the current system's precision of 0.348.
Outliers in Questionnaire Data: Can They Be Detected and Should They Be Removed?

ERIC Educational Resources Information Center

Zijlstra, Wobbe P.; van der Ark, L. Andries; Sijtsma, Klaas

2011-01-01

Outliers in questionnaire data are unusual observations, which may bias statistical results, and outlier statistics may be used to detect such outliers. The authors investigated the effect outliers have on the specificity and the sensitivity of each of six different outlier statistics. The Mahalanobis distance and the item-pair based outlier…
WAMS measurements pre-processing for detecting low-frequency oscillations in power systems

NASA Astrophysics Data System (ADS)

Kovalenko, P. Y.

2017-07-01

When processing data received from measurement systems, one or more registered values may stand apart from the rest of the sample. These values are referred to as “outliers”. Their presence in the data sample under consideration can significantly influence the processing results. To ensure accurate detection of low-frequency oscillations in power systems, an algorithm has been developed for detecting and eliminating outliers. The algorithm is based on the concept of the irregular component of the measurement signal. This component comprises measurement errors and is assumed to be a Gaussian random variable. Median filtering is employed to detect values lying outside the range of the normally distributed measurement error on the basis of a 3σ criterion. The algorithm has been validated using both simulated signals and WAMS data.
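A minimal Python sketch of the median-filter-plus-3σ idea follows. It is an illustration under assumptions, not the published algorithm: the irregular component is taken as the deviation from a centered running median (the window length of 5 is an assumed value), and σ is estimated robustly as 1.4826 times the median absolute deviation, a swapped-in choice so that the outliers being hunted do not inflate their own threshold.

```python
import statistics

def running_median(x, window=5):
    """Centered running median; near the edges the window is truncated."""
    half = window // 2
    return [statistics.median(x[max(0, i - half): i + half + 1]) for i in range(len(x))]

def three_sigma_outliers(x, window=5):
    """Indices whose deviation from the running median exceeds 3 sigma."""
    trend = running_median(x, window)
    resid = [xi - ti for xi, ti in zip(x, trend)]  # irregular component
    center = statistics.median(resid)
    # Robust sigma via the median absolute deviation (assumption): 1.4826 * MAD
    # estimates the standard deviation when the errors are Gaussian.
    sigma = 1.4826 * statistics.median(abs(r - center) for r in resid)
    return [i for i, r in enumerate(resid) if abs(r - center) > 3 * sigma]


# Synthetic frequency-like signal with one spike at index 6.
x = [50.0, 50.1, 49.9, 50.0, 50.2, 49.8, 55.0, 50.0, 50.1, 49.9, 50.0, 50.1]
print(three_sigma_outliers(x))  # [6]
```

A flagged sample could then be replaced by the running-median value before the oscillation analysis, which is one plausible reading of "elimination".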
Simulation of a Geiger-Mode Imaging LADAR System for Performance Assessment

PubMed Central

Kim, Seongjoon; Lee, Impyeong; Kwon, Yong Joon

2013-01-01

As LADAR system applications become more diverse, new types of systems are being developed. When developing new systems, simulation studies are an essential prerequisite. A simulator enables performance prediction and the choice of optimal system parameters at the design level, and it also provides sample data for developing and validating application algorithms. The purpose of this study is to propose a method for simulating a Geiger-mode imaging LADAR system. We develop simulation software to assess system performance and to generate sample data for applications. The simulation is based on three aspects of modeling: geometry, radiometry, and detection. The geometric model computes the ranges to the reflection points of the laser pulses. The radiometric model generates the return signals, including noise. The detection model determines the flight times of the laser pulses based on the nature of the Geiger-mode detector. We generated sample data using the simulator with the system parameters and analyzed the detection performance by comparing the simulated points with the reference points. The proportion of outliers in the simulated points reached 25.53%, indicating the need for efficient outlier elimination algorithms. In addition, the false alarm rate and dropout rate of the designed system were computed as 1.76% and 1.06%, respectively. PMID:23823970
A Robust Method for Ego-Motion Estimation in Urban Environment Using Stereo Camera.

PubMed

Ci, Wenyan; Huang, Yingping

2016-10-17

Visual odometry estimates the ego-motion of an agent (e.g., a vehicle or robot) using image information and is a key component of autonomous vehicles and robotics. This paper proposes a robust and precise method for estimating the 6-DoF ego-motion using a stereo rig with optical flow analysis. An objective function fitted with a set of feature points is created by establishing the mathematical relationship between optical flow, depth, and camera ego-motion parameters through the camera's 3-dimensional motion and planar imaging model. Accordingly, the six motion parameters are computed by minimizing the objective function using the iterative Levenberg-Marquardt method. A key requirement for visual odometry is that the feature points selected for the computation should contain as many inliers as possible. In this work, the feature points and their optical flows are initially detected using the Kanade-Lucas-Tomasi (KLT) algorithm. Circle matching is then applied to remove the outliers caused by mismatches in the KLT algorithm. A spatial position constraint is imposed to filter out moving points from the point set detected by the KLT algorithm. The Random Sample Consensus (RANSAC) algorithm is employed to further refine the feature point set, i.e., to eliminate the effects of outliers. The remaining points are tracked to estimate the ego-motion parameters in subsequent frames. The approach is tested on real traffic videos, and the results demonstrate the robustness and precision of the method.
A Robust Method for Ego-Motion Estimation in Urban Environment Using Stereo Camera

PubMed Central

Ci, Wenyan; Huang, Yingping

2016-01-01

Visual odometry estimates the ego-motion of an agent (e.g., a vehicle or robot) using image information and is a key component of autonomous vehicles and robotics. This paper proposes a robust and precise method for estimating the 6-DoF ego-motion using a stereo rig with optical flow analysis. An objective function fitted with a set of feature points is created by establishing the mathematical relationship between optical flow, depth, and camera ego-motion parameters through the camera’s 3-dimensional motion and planar imaging model. Accordingly, the six motion parameters are computed by minimizing the objective function using the iterative Levenberg–Marquardt method. A key requirement for visual odometry is that the feature points selected for the computation should contain as many inliers as possible. In this work, the feature points and their optical flows are initially detected using the Kanade–Lucas–Tomasi (KLT) algorithm. Circle matching is then applied to remove the outliers caused by mismatches in the KLT algorithm. A spatial position constraint is imposed to filter out moving points from the point set detected by the KLT algorithm. The Random Sample Consensus (RANSAC) algorithm is employed to further refine the feature point set, i.e., to eliminate the effects of outliers. The remaining points are tracked to estimate the ego-motion parameters in subsequent frames. The approach is tested on real traffic videos, and the results demonstrate the robustness and precision of the method. PMID:27763508
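The RANSAC refinement step mentioned above can be illustrated with a deliberately small Python sketch. The paper's model is 6-DoF camera motion; here a 2-D line y = a*x + b stands in as an assumed toy model so that the hypothesize, score, and keep-the-largest-consensus loop is easy to follow. The iteration count, tolerance, and seed are assumed values.

```python
import random

def ransac_line(points, n_iter=200, tol=0.5, seed=0):
    """Robustly fit y = a*x + b; return ((a, b), sorted inlier indices)."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_iter):
        # 1. Draw a minimal sample: two points define a line.
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical/degenerate sample; this parameterization cannot fit it
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        # 2. Consensus set: points whose vertical residual is within tol.
        inliers = [i for i, (x, y) in enumerate(points) if abs(y - (a * x + b)) <= tol]
        # 3. Keep the hypothesis with the largest consensus.
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if len(best_inliers) < 2:
        raise ValueError("no non-degenerate hypothesis found")
    # 4. Refit by ordinary least squares on the consensus set only.
    xs = [points[i][0] for i in best_inliers]
    ys = [points[i][1] for i in best_inliers]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return (a, my - a * mx), best_inliers


points = [(x, 2 * x + 1) for x in range(10)] + [(3, 20), (7, -5), (5, 30)]
model, inliers = ransac_line(points)
print(model)  # (2.0, 1.0)
```

The three gross outliers never join the consensus set, so the final least-squares refit sees only the ten points on the true line.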
Outlier Detection in Urban Air Quality Sensor Networks.

PubMed

van Zoest, V M; Stein, A; Hoek, G

2018-01-01

Low-cost urban air quality sensor networks are increasingly used to study the spatio-temporal variability in air pollutant concentrations. Recently installed low-cost urban sensors, however, are more prone than conventional monitors to producing erroneous data, e.g., outliers. Commonly applied outlier detection methods are unsuitable for air pollutant measurements with the large spatial and temporal variations that occur in urban areas. We present a novel outlier detection method based upon a spatio-temporal classification, focusing on hourly NO2 concentrations. We divide a full year's observations into 16 spatio-temporal classes, reflecting urban background vs. urban traffic stations, weekdays vs. weekends, and four periods per day. For each spatio-temporal class, we detect outliers using the mean and standard deviation of the normal distribution underlying the truncated normal distribution of the NO2 observations. Applying this method to a low-cost air quality sensor network in the city of Eindhoven, the Netherlands, we found 0.1-0.5% outliers. Outliers could reflect measurement errors or unusually high air pollution events. Additional evaluation using expert knowledge is needed to decide on the treatment of the identified outliers. We conclude that our method is able to detect outliers while maintaining the spatio-temporal variability of air pollutant concentrations in urban areas.
Fusion of an Ensemble of Augmented Image Detectors for Robust Object Detection

PubMed Central

Wei, Pan; Anderson, Derek T.

2018-01-01

A significant challenge in object detection is accurately identifying an object’s position in image space; one algorithm with one set of parameters is usually not enough, whereas the fusion of multiple algorithms and/or parameters can lead to more robust results. Herein, a new computational intelligence fusion approach based on the dynamic analysis of agreement among object detection outputs is proposed. Furthermore, we propose an image augmentation strategy that is applied online rather than only during training. Experiments comparing the results with and without fusion are presented. We demonstrate that the augmented and fused combination gives the best results, with higher accuracy rates and reduced influence of outliers. The approach is demonstrated in the context of cone, pedestrian, and box detection for Advanced Driver Assistance Systems (ADAS) applications. PMID:29562609
Automated peroperative assessment of stents apposition from OCT pullbacks.

PubMed

Dubuisson, Florian; Péry, Emilie; Ouchchane, Lemlih; Combaret, Nicolas; Kauffmann, Claude; Souteyrand, Géraud; Motreff, Pascal; Sarry, Laurent

2015-04-01

This study's aim was to assess stent apposition by automatically analyzing endovascular optical coherence tomography (OCT) sequences. The lumen is detected using threshold, morphological, and gradient operators to run a Dijkstra algorithm. Wrong detections tagged by the user and caused by bifurcations, the presence of struts, thrombotic lesions, or dissections can be corrected using a morphing algorithm. Struts are also segmented by computing symmetrical and morphological operators. The Euclidean distance between the detected struts and the arterial wall initializes a complete distance map of the stent, and missing data are interpolated with thin-plate spline functions. Rejection of detected outliers, regularization of parameters by generalized cross-validation, and use of the one-sided cyclic property of the map also optimize accuracy. Several indices computed from the map provide quantitative values of malapposition. The algorithm was run on four in-vivo OCT sequences including different cases of incomplete stent apposition. Comparison with manual expert measurements validates the segmentation's accuracy and shows almost perfect concordance of the automated results. Copyright © 2014 Elsevier Ltd. All rights reserved.

Detecting measurement outliers: remeasure efficiently

NASA Astrophysics Data System (ADS)

Ullrich, Albrecht

2010-09-01

Shrinking structures, advanced optical proximity correction (OPC), and complex measurement strategies continually challenge critical dimension (CD) metrology tools and recipe creation processes. One important quality-assurance task is controlling measurement outlier behavior. Outliers can trigger false-positive alarms for specification violations, impacting cycle time or potentially yield. A persistently high level of outliers not only degrades cycle time but also puts unnecessary stress on tool operators, eventually leading to human errors. At the tool level, the sources of outliers are natural variations (e.g., beam current), drifts, contrast conditions, focus determination or pattern recognition issues, etc. Some of these can result from suboptimal or even wrong recipe settings, such as focus position or measurement box size. Such outliers, created by an automatic recipe creation process faced with more complicated structures, would manifest themselves as systematic variation of the measurements rather than as variation caused by 'pure' tool variation. I analyzed several statistical methods to detect outliers. These range from classical outlier tests for extrema, through robust metrics such as the interquartile range (IQR), to methods evaluating the distribution of different populations of measurement sites, such as the Cochran test. The latter is especially suited to detecting systematic effects. The next level of outlier detection combines additional information about the mask and the manufacturing process with the measurement results. The methods were reviewed for measured variations assumed to be normally distributed with zero mean, and also in the presence of a statistically significant spatial process signature. I conclude that intelligent outlier detection can greatly influence the efficiency and cycle time of CD metrology.
In combination with process information like target, typical platform variation and signature, one can tailor the detection to the needs of the photomask at hand. By monitoring the outlier behavior carefully, weaknesses of the automatic recipe creation process can be spotted.
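Of the methods listed above, the IQR rule is the simplest to show concretely. The Python sketch below applies Tukey's fences, [Q1 - k·IQR, Q3 + k·IQR], to a set of CD values; the conventional k = 1.5 and the sample values are assumptions for illustration, and quartiles come from `statistics.quantiles` with its default (exclusive) method, so other quartile conventions may shift the fences slightly. The Cochran test and process-aware methods are not shown.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # requires Python 3.8+
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]


# Hypothetical CD measurements (nm) at ten sites, one clearly off.
cds = [50.1, 50.3, 49.9, 50.0, 50.2, 50.1, 49.8, 50.0, 53.5, 50.2]
print(iqr_outliers(cds))  # [53.5]
```

Because the quartiles ignore the tails, a single extreme site barely moves the fences, which is why IQR counts among the robust metrics.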
Quality assurance using outlier detection on an automatic segmentation method for the cerebellar peduncles

NASA Astrophysics Data System (ADS)

Li, Ke; Ye, Chuyang; Yang, Zhen; Carass, Aaron; Ying, Sarah H.; Prince, Jerry L.

2016-03-01

Cerebellar peduncles (CPs) are white matter tracts connecting the cerebellum to other brain regions. Automatic segmentation methods for the CPs have been proposed for studying their structure and function. Usually the performance of these methods is evaluated by comparing segmentation results with manual delineations (ground truth). However, when a segmentation method is run on new data (for which no ground truth exists), it is highly desirable to efficiently detect and assess algorithm failures so that these cases can be excluded from scientific analysis. In this work, two outlier detection methods aimed at assessing the performance of an automatic CP segmentation algorithm are presented. The first is a univariate non-parametric method using a box-whisker plot. We first categorize the automatic segmentation results of a dataset of diffusion tensor imaging (DTI) scans from 48 subjects as either a success or a failure. We then design three groups of features from the image data of nine categorized failures for failure detection. Results show that most of these features can efficiently detect the true failures. The second method, supervised classification, was employed on a larger DTI dataset of 249 manually categorized subjects. Four classifiers, linear discriminant analysis (LDA), logistic regression (LR), support vector machine (SVM), and random forest classification (RFC), were trained using the designed features and evaluated using leave-one-out cross-validation. Results show that LR performs worst among the four classifiers and the other three perform comparably, which demonstrates the feasibility of automatically detecting segmentation failures using classification methods.
Algorithm for Identifying Erroneous Rain-Gauge Readings

NASA Technical Reports Server (NTRS)

Rickman, Doug

2005-01-01

An algorithm analyzes rain-gauge data to identify statistical outliers that could be deemed to be erroneous readings. Heretofore, analyses of this type have been performed in burdensome manual procedures that have involved subjective judgements. Sometimes, the analyses have included computational assistance for detecting values falling outside of arbitrary limits. The analyses have been performed without statistically valid knowledge of the spatial and temporal variations of precipitation within rain events. In contrast, the present algorithm makes it possible to automate such an analysis, makes the analysis objective, takes account of the spatial distribution of rain gauges in conjunction with the statistical nature of spatial variations in rainfall readings, and minimizes the use of arbitrary criteria. The algorithm implements an iterative process that involves nonparametric statistics.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Nun, Isadora; Pichara, Karim; Protopapas, Pavlos

The development of synoptic sky surveys has led to a massive amount of data for which the resources needed for analysis are beyond human capabilities. In order to process this information and to extract all possible knowledge, machine learning techniques become necessary. Here we present a new methodology to automatically discover unknown variable objects in large astronomical catalogs. With the aim of taking full advantage of all the information we have about known objects, our method is based on a supervised algorithm. In particular, we train a random forest classifier using known variability classes of objects and obtain votes for each of the objects in the training set. We then model this voting distribution with a Bayesian network and obtain the joint voting distribution among the training objects. Consequently, an unknown object is considered an outlier insofar as it has a low joint probability. By leaving out one of the classes in the training set, we perform a validity test and show that when the random forest classifier attempts to classify unknown light curves (the class left out), it votes with an unusual distribution among the classes. This rare voting is detected by the Bayesian network and expressed as a low joint probability. Our method is suitable for exploring massive data sets given that the training process is performed offline. We tested our algorithm on 20 million light curves from the MACHO catalog and generated a list of anomalous candidates. After analysis, we divided the candidates into two main classes of outliers: artifacts and intrinsic outliers. Artifacts were principally due to air mass variation, seasonal variation, bad calibration, or instrumental errors and were consequently removed from our outlier list and added to the training set. After retraining, we selected about 4000 objects, which we passed to a post-analysis stage by performing a cross-match with all publicly available catalogs.
Within these candidates we identified certain known but rare objects such as eclipsing Cepheids, blue variables, cataclysmic variables, and X-ray sources. For some outliers there was no additional information. Among them we identified three unknown variability types and a few individual outliers that will be followed up in order to perform a deeper analysis.
Mining Distance Based Outliers in Near Linear Time with Randomization and a Simple Pruning Rule

NASA Technical Reports Server (NTRS)

Bay, Stephen D.; Schwabacher, Mark

2003-01-01

Defining outliers by their distance to neighboring examples is a popular approach to finding unusual examples in a data set. Recently, much work has been conducted with the goal of finding fast algorithms for this task. We show that a simple nested loop algorithm that in the worst case is quadratic can give near linear time performance when the data is in random order and a simple pruning rule is used. We test our algorithm on real high-dimensional data sets with millions of examples and show that the near linear scaling holds over several orders of magnitude. Our average case analysis suggests that much of the efficiency is because the time to process non-outliers, which are the majority of examples, does not depend on the size of the data set.
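The nested-loop idea described above can be sketched in Python as follows. This is an illustrative reimplementation under assumptions, not the authors' code: the outlier score is taken as the distance to the k-th nearest neighbour, examples are visited in a random order, and the pruning rule discards an example as soon as its running k-NN distance (which can only shrink as more neighbours are seen) falls below the weakest score in the current top n. The published algorithm processes examples in blocks; the one-at-a-time loop here is a simplification, and `k`, `n`, and `seed` are assumed parameters.

```python
import heapq
import math
import random

def top_n_outliers(data, k=3, n=2, seed=0):
    """Top-n outliers scored by distance to the k-th nearest neighbor.

    Nested loop with random ordering and cutoff pruning; assumes len(data) > k.
    Returns [(score, index), ...] sorted by decreasing score.
    """
    order = list(range(len(data)))
    random.Random(seed).shuffle(order)  # random order lets the cutoff rise early
    top = []       # min-heap of (score, index): the current top-n outliers
    cutoff = 0.0   # weakest score in `top` once it holds n entries
    for i in order:
        knn = []   # max-heap via negated distances: k nearest neighbors seen so far
        pruned = False
        for j in order:
            if j == i:
                continue
            d = math.dist(data[i], data[j])
            if len(knn) < k:
                heapq.heappush(knn, -d)
            elif d < -knn[0]:
                heapq.heapreplace(knn, -d)
            # The running k-NN distance only shrinks, so once it falls below the
            # cutoff this point can no longer enter the top n: stop scanning.
            if len(knn) == k and -knn[0] < cutoff:
                pruned = True
                break
        if pruned:
            continue
        score = -knn[0]
        if len(top) < n:
            heapq.heappush(top, (score, i))
        elif score > top[0][0]:
            heapq.heapreplace(top, (score, i))
        if len(top) == n:
            cutoff = top[0][0]
    return sorted(top, reverse=True)


data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10), (-8, 9)]
print([i for _, i in top_n_outliers(data)])  # [5, 6]
```

Pruning never changes the answer, only the work done: for the tightly clustered points the inner loop stops after a handful of neighbours once the cutoff is set, which mirrors the abstract's observation that non-outliers are cheap to process.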
Ensemble Learning Method for Outlier Detection and its Application to Astronomical Light Curves

NASA Astrophysics Data System (ADS)

Nun, Isadora; Protopapas, Pavlos; Sim, Brandon; Chen, Wesley

2016-09-01

Outlier detection is necessary for automated data analysis, with specific applications spanning almost every domain from financial markets to epidemiology to fraud detection. We introduce a novel mixture-of-experts outlier detection model, which uses a dynamically trained, weighted network of five distinct outlier detection methods. After dimensionality reduction, individual outlier detection methods score each data point for “outlierness” in this new feature space. Our model then uses dynamically trained parameters to weight the scores of each method, yielding a final outlier score. We find that the mixture-of-experts model performs, on average, better than any single expert model in identifying both artificially and manually picked outliers. This mixture model is applied to a data set of astronomical light curves, after dimensionality reduction via time series feature extraction. Our model was tested using three fields from the MACHO catalog and generated a list of anomalous candidates. We confirm that the outliers detected using this method belong to rare classes, like Novae, He-burning, and red giant stars; other outlier light curves identified have no available information associated with them. To elucidate their nature, we created a website containing the light-curve data and information about these objects. Users can attempt to classify the light curves, give conjectures about their identities, and sign up for follow-up messages about the progress made on identifying these objects. These user-submitted data can be used to further train our mixture-of-experts model. Our code is publicly available to all who are interested.
Detecting multiple outliers in linear functional relationship model for circular variables using clustering technique

NASA Astrophysics Data System (ADS)

Mokhtar, Nurkhairany Amyra; Zubairi, Yong Zulina; Hussin, Abdul Ghapor

2017-05-01

Outlier detection has been used extensively in data analysis to detect anomalous observations and has important applications in fraud detection and robust analysis. In this paper, we propose a method for detecting multiple outliers for circular variables in the linear functional relationship model. Using the residual values of the Caires and Wyatt model, we apply a hierarchical clustering procedure. Using a tree diagram, we illustrate the graphical approach to outlier detection. A simulation study is done to verify the accuracy of the proposed method. An illustration on a real data set is also given to show its practical applicability.
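The clustering step can be sketched in Python under several assumptions that the abstract does not fix. The residuals are taken as already computed (the Caires and Wyatt fit itself is not shown); the dissimilarity between two angles is assumed to be 1 - cos(a - b), which respects wrap-around; cutting the single-linkage tree at a fixed height (`cut_height`, an assumed value standing in for a stopping rule) is done with union-find, since single-linkage clusters at height h are exactly the connected components of the "distance at most h" graph; and everything outside the largest cluster is flagged, a hypothetical rule for illustration.

```python
import math

def circular_distance(a, b):
    """Dissimilarity between two angles (radians): 0 when equal, 2 when opposite."""
    return 1.0 - math.cos(a - b)

def single_linkage_clusters(angles, cut_height):
    """Clusters from cutting a single-linkage tree at cut_height (union-find over close pairs)."""
    parent = list(range(len(angles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(angles)):
        for j in range(i + 1, len(angles)):
            if circular_distance(angles[i], angles[j]) <= cut_height:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(angles)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

def circular_outliers(residuals, cut_height=0.05):
    """Flag every point outside the largest single-linkage cluster (assumed rule)."""
    clusters = single_linkage_clusters(residuals, cut_height)
    main = max(clusters, key=len)
    return sorted(i for c in clusters if c is not main for i in c)


# 6.2 rad lies close to 0 on the circle, so it is NOT flagged; 2.5 rad is.
residuals = [0.0, 0.05, -0.03, 0.1, 6.2, 2.5]
print(circular_outliers(residuals))  # [5]
```

The residual of 6.2 rad illustrates why a circular distance matters: on a linear scale it would look like the most extreme value, yet it is within 0.09 rad of zero.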
Identification of moisture content in tobacco plant leaves using outlier sample eliminating algorithms and hyperspectral data.

PubMed

Sun, Jun; Zhou, Xin; Wu, Xiaohong; Zhang, Xiaodong; Li, Qinglin

2016-02-26

Fast identification of moisture content in tobacco plant leaves plays a key role in the tobacco cultivation industry and benefits the management of tobacco plants on the farm. In order to identify the moisture content of tobacco plant leaves in a fast and nondestructive way, a method involving Mahalanobis distance coupled with Monte Carlo cross-validation (MD-MCCV) was proposed in this study to eliminate outlier samples. The hyperspectral data of 200 tobacco plant leaf samples at 20 moisture gradients were obtained using a FieldSpec® 3 spectrometer. Savitzky-Golay smoothing (SG), roughness penalty smoothing (RPS), kernel smoothing (KS), and median smoothing (MS) were used to preprocess the raw spectra. In addition, Mahalanobis distance (MD), Monte Carlo cross-validation (MCCV), and Mahalanobis distance coupled with Monte Carlo cross-validation (MD-MCCV) were applied to select the outlier samples of the raw spectra and the four smoothed spectra. The successive projections algorithm (SPA) was used to extract the most influential wavelengths. Multiple linear regression (MLR) was applied to build the prediction models based on preprocessed spectral features at the characteristic wavelengths. The results showed that the four best prediction models were MD-MCCV-SG (Rp² = 0.8401 and RMSEP = 0.1355), MD-MCCV-RPS (Rp² = 0.8030 and RMSEP = 0.1274), MD-MCCV-KS (Rp² = 0.8117 and RMSEP = 0.1433), and MD-MCCV-MS (Rp² = 0.9132 and RMSEP = 0.1162). For eliminating outlier samples from the 20 different moisture gradients of tobacco plant leaves, MD-MCCV performed best among the MD algorithm, the MCCV algorithm, and the method without sample pretreatment, so MD-MCCV can be used to eliminate outlier samples in spectral preprocessing. Copyright © 2016 Elsevier Inc. All rights reserved.
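The Mahalanobis-distance half of MD-MCCV can be illustrated with a short Python sketch; the Monte Carlo cross-validation half is not shown. For readability the samples are assumed to be 2-D points (for example, two hypothetical spectral scores per leaf) rather than full spectra, and the squared distance uses the sample mean and the (n - 1)-normalized covariance. The point is that a sample can be ordinary in each coordinate yet far from the others once their correlation is taken into account.

```python
import statistics

def mahalanobis_2d(points):
    """Squared Mahalanobis distance of each 2-D point from the sample mean."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    n = len(points)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    # Entries of the inverse of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    return [ixx * (x - mx) ** 2 + 2 * ixy * (x - mx) * (y - my) + iyy * (y - my) ** 2
            for x, y in points]


# Eight correlated samples plus one (index 8) that breaks the correlation.
pts = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.9), (5, 5.1), (6, 5.8), (7, 7.2), (8, 8.0), (3, 7)]
d2 = mahalanobis_2d(pts)
print(max(range(len(d2)), key=d2.__getitem__))  # 8
```

Neither coordinate of sample 8 is extreme, yet its squared distance (about 7.0) dwarfs the rest (at most about 2.5). As a sanity check, these squared distances always sum to (n - 1) times the dimension, here 16.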
The LSST Data Mining Research Agenda

NASA Astrophysics Data System (ADS)

Borne, K.; Becla, J.; Davidson, I.; Szalay, A.; Tyson, J. A.

2008-12-01

We describe features of the LSST science database that are amenable to scientific data mining, object classification, outlier identification, anomaly detection, image quality assurance, and survey science validation. The data mining research agenda includes: scalability (at petabyte scales) of existing machine learning and data mining algorithms; development of grid-enabled parallel data mining algorithms; design of a robust system for brokering classifications from the LSST event pipeline (which may produce 10,000 or more event alerts per night); multi-resolution methods for exploration of petascale databases; indexing of multi-attribute, multi-dimensional astronomical databases (beyond spatial indexing) for rapid querying of petabyte databases; and more.
Simulated performance of an order statistic threshold strategy for detection of narrowband signals

NASA Technical Reports Server (NTRS)

Satorius, E.; Brady, R.; Deich, W.; Gulkis, S.; Olsen, E.

1988-01-01

The application of order statistics to signal detection is becoming an increasingly active area of research. This is due to the inherent robustness of rank estimators in the presence of large outliers that would significantly degrade more conventional mean-level-based detection systems. A detection strategy is presented in which the threshold estimate is obtained using order statistics. The performance of this algorithm in the presence of simulated interference and broadband noise is evaluated. In this way, the robustness of the proposed strategy in the presence of the interference can be fully assessed as a function of the interference, noise, and detector parameters.
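The robustness argument above can be illustrated with a small Python comparison. This is a hypothetical sketch, not the strategy evaluated in the report: the noise level is estimated as an order statistic of the spectral bin powers (by default the median, `rank_fraction=0.5`), the detection threshold is a fixed multiple of it (`scale`, an assumed value), and a mean-level detector with the same multiplier is included for contrast.

```python
def order_statistic_detect(powers, rank_fraction=0.5, scale=8.0):
    """Flag bins whose power exceeds scale times an order-statistic noise estimate."""
    ranked = sorted(powers)
    r = min(int(rank_fraction * len(ranked)), len(ranked) - 1)
    noise_level = ranked[r]  # r-th smallest power: barely moved by a few huge outliers
    return [i for i, p in enumerate(powers) if p > scale * noise_level]

def mean_level_detect(powers, scale=8.0):
    """Same rule, but with the arithmetic mean as the noise estimate (for comparison)."""
    noise_level = sum(powers) / len(powers)
    return [i for i, p in enumerate(powers) if p > scale * noise_level]


# Noise-like bins, a strong interferer at bin 5, and a weak narrowband tone at bin 12.
powers = [1.0, 1.2, 0.8, 1.1, 0.9, 500.0, 1.0, 1.3, 0.7, 1.0,
          1.1, 0.9, 15.0, 1.0, 1.2, 0.8, 1.0, 0.9, 1.1, 1.0]
print(order_statistic_detect(powers))  # [5, 12]
print(mean_level_detect(powers))       # [5]
```

The strong interferer raises the mean to about 26.7, pushing the mean-level threshold above the weak tone at bin 12, while the median-based threshold stays at 8 and still detects it.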
Optoelectronic instrumentation enhancement using data mining feedback for a 3D measurement system

NASA Astrophysics Data System (ADS)

Flores-Fuentes, Wendy; Sergiyenko, Oleg; Gonzalez-Navarro, Félix F.; Rivas-López, Moisés; Hernandez-Balbuena, Daniel; Rodríguez-Quiñonez, Julio C.; Tyrsa, Vera; Lindner, Lars

2016-12-01

3D measurement by a cyber-physical system based on optoelectronic scanning instrumentation has been enhanced by outlier-detection and regression data-mining feedback. The prototype has applications in (1) industrial manufacturing systems, including robotic machinery, embedded vision, and motion control; (2) health care systems for measurement scanning; and (3) infrastructure, by providing structural health monitoring. This paper presents new research on data processing of a 3D measurement vision-sensing database. Outliers in the multivariate data have been detected and removed to improve the results of an artificial intelligence regression algorithm. Regression data on physical measurement error have been used to correct 3D measurement errors. In conclusion, combining physical phenomena, measurement, and computation is an effective approach to feedback loops in the control of industrial, medical, and civil tasks.
Detection of Outliers in Spatial-Temporal Data

ERIC Educational Resources Information Center

Rogers, James P.

2010-01-01

Outlier detection is an important data mining task that is focused on the discovery of objects that deviate significantly when compared with a set of observations that are considered typical. Outlier detection can reveal objects that behave anomalously with respect to other observations, and these objects may highlight current or future problems. …
Trend-Residual Dual Modeling for Detection of Outliers in Low-Cost GPS Trajectories.

PubMed

Chen, Xiaojian; Cui, Tingting; Fu, Jianhong; Peng, Jianwei; Shan, Jie

2016-12-01

Low-cost GPS receivers have become a ubiquitous and integral part of daily life. Despite noticeable advantages such as being cheap, small, light, and easy to use, their limited positioning accuracy devalues and hampers their wide application for reliable mapping and analysis. Two conventional techniques for removing outliers from a GPS trajectory are thresholding and Kalman-based methods, both of which make it difficult to select appropriate thresholds and to model the trajectories. Moreover, they are insensitive to medium and small outliers, especially for low-sample-rate trajectories. This paper proposes a model-based GPS trajectory cleaner. Rather than examining speed and acceleration or assuming a predetermined trajectory model, we first use a cubic smoothing spline to adaptively model the trend of the trajectory. The residuals, i.e., the differences between the trend and the GPS measurements, are then further modeled by a time series method. Outliers are detected by scoring the residuals at every GPS trajectory point. Compared to the conventional procedures, the trend-residual dual modeling approach has the following features: (a) it models trajectories and detects outliers adaptively; (b) only one critical value for outlier scores needs to be set; (c) it robustly detects unapparent outliers; and (d) it is effective in cleaning outliers from GPS trajectories with low sample rates. Tests are carried out on three real-world GPS trajectory datasets. The evaluation demonstrates, on average, 9.27 times better outlier detection performance for GPS trajectories than thresholding and Kalman-based techniques.
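The trend-then-score idea can be sketched for one coordinate of a trajectory (apply it separately to easting and northing). This is a simplified stand-in, not the paper's method: a centered running median replaces the cubic smoothing spline, and a median absolute deviation (MAD) score replaces the time series model of the residuals. The single `critical` value of 3.5 and the example data are assumptions.

```python
import statistics


def moving_median(values, half_window):
    """Centered running median; a stand-in for the paper's cubic smoothing spline."""
    n = len(values)
    return [
        statistics.median(values[max(0, i - half_window): min(n, i + half_window + 1)])
        for i in range(n)
    ]


def residual_scores(values, half_window=2):
    """Robust z-scores of residuals (observation minus trend), scaled by the MAD."""
    trend = moving_median(values, half_window)
    residuals = [v - t for v, t in zip(values, trend)]
    center = statistics.median(residuals)
    mad = statistics.median([abs(r - center) for r in residuals])
    scale = 1.4826 * mad  # makes the MAD comparable to a std. dev. for Gaussian noise
    if scale == 0:
        return [0.0 if r == center else float("inf") for r in residuals]
    return [abs(r - center) / scale for r in residuals]


def flag_outliers(values, critical=3.5, half_window=2):
    """One critical value applied to every point's residual score."""
    return [i for i, s in enumerate(residual_scores(values, half_window)) if s > critical]


if __name__ == "__main__":
    easting = [0.0, 1.1, 1.9, 3.05, 4.0, 15.0, 6.1, 6.9, 8.0, 9.05, 10.0, 10.95]
    print(flag_outliers(easting))  # [5]
```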
Genetic Algorithm for Initial Orbit Determination with Too Short Arc (Continued)

NASA Astrophysics Data System (ADS)

Li, X. R.; Wang, X.

2016-03-01

When the genetic algorithm is used to solve the problem of too-short-arc (TSA) orbit determination, the classical methods for outlier editing are no longer applicable, because the computing process of the genetic algorithm differs from that of the classical method. In the genetic algorithm, robust estimation is achieved by using different loss functions in the fitness function, which solves the outlier problem for TSAs. Compared with the classical method, applying loss functions in the genetic algorithm is greatly simplified. A comparison of results from different loss functions shows that the least median of squares and least trimmed squares methods can greatly improve the robustness of TSA orbit determination and have a high breakdown point.
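The loss functions compared in this work can be written generically over a residual vector. Below is a hedged sketch: a genetic algorithm would rank candidates by `fitness`, and here a constant "model" stands in for the orbit model, which is out of scope. The LTS default `h = n // 2 + 1`, the `model(candidate, i)` signature, and the toy observations are assumptions for illustration.

```python
import statistics


def least_squares(residuals):
    """Classical loss: sum of squared residuals (a single outlier can dominate it)."""
    return sum(r * r for r in residuals)


def least_median_of_squares(residuals):
    """LMS loss: median of the squared residuals."""
    return statistics.median([r * r for r in residuals])


def least_trimmed_squares(residuals, h=None):
    """LTS loss: sum of the h smallest squared residuals.

    h defaults to n // 2 + 1 here (an assumption; other choices are common).
    """
    squared = sorted(r * r for r in residuals)
    if h is None:
        h = len(squared) // 2 + 1
    return sum(squared[:h])


def fitness(candidate, observations, model, loss):
    """Fitness to maximize in a GA: negative loss of the candidate's residuals."""
    return -loss([obs - model(candidate, i) for i, obs in enumerate(observations)])


if __name__ == "__main__":
    obs = [1.0, 1.1, 0.9, 1.05, 0.95, 10.0]  # last value is an outlier

    def constant_model(c, i):
        return c  # hypothetical stand-in for an orbit model

    for loss in (least_squares, least_median_of_squares, least_trimmed_squares):
        best = max((1.0, 2.5), key=lambda c: fitness(c, obs, constant_model, loss))
        print(loss.__name__, best)
```

Plain least squares prefers the candidate pulled toward the outlier (2.5, the mean), while LMS and LTS prefer the value supported by the bulk of the data (1.0).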
Identification of piecewise affine systems based on fuzzy PCA-guided robust clustering technique

NASA Astrophysics Data System (ADS)

Khanmirza, Esmaeel; Nazarahari, Milad; Mousavi, Alireza

2016-12-01

Hybrid systems are a class of dynamical systems whose behaviors are based on the interaction between discrete and continuous dynamical behaviors. Since a general method for the analysis of hybrid systems is not available, some researchers have focused on specific types of hybrid systems. Piecewise affine (PWA) systems are one subset of hybrid systems. The identification of PWA systems includes the estimation of the parameters of affine subsystems and the coefficients of the hyperplanes defining the partition of the state-input domain. In this paper, we propose a PWA identification approach based on a modified clustering technique. By using a fuzzy PCA-guided robust k-means clustering algorithm along with neighborhood outlier detection, the two main drawbacks of the well-known clustering algorithms, i.e., poor initialization and the presence of outliers, are eliminated. Furthermore, this modified clustering technique enables us to determine the number of subsystems without any prior knowledge about the system. In addition, exploiting the structure of the state-input domain, that is, considering the time sequence of input-output pairs, yields a more efficient clustering algorithm, which is the other novelty of this work. Finally, the proposed algorithm has been evaluated by parameter identification of an IGV servo actuator. Simulation together with experimental analysis has demonstrated the effectiveness of the proposed method.
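The abstract does not specify its neighborhood outlier detector. As a hedged illustration only, a common generic choice scores each point by the distance to its k-th nearest neighbour and removes points far above the typical score before clustering. The names `k` and `factor` and the cutoff rule are assumptions, not the authors' method.

```python
import math
import statistics


def knn_distance(points, k):
    """Distance from each point to its k-th nearest neighbour (brute force)."""
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < len(points)")
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def split_neighborhood_outliers(points, k=2, factor=3.0):
    """Flag points whose k-NN distance exceeds factor * median k-NN distance."""
    scores = knn_distance(points, k)
    cutoff = factor * statistics.median(scores)
    inliers = [p for p, s in zip(points, scores) if s <= cutoff]
    outliers = [p for p, s in zip(points, scores) if s > cutoff]
    return inliers, outliers


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11), (50, -40)]
    inliers, outliers = split_neighborhood_outliers(data)
    print(outliers)  # [(50, -40)]
```

The remaining `inliers` would then be passed to the clustering step, so that isolated points cannot drag cluster centres.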
Robust and Adaptive Online Time Series Prediction with Long Short-Term Memory

PubMed Central

Tao, Qing

2017-01-01

Online time series prediction is the mainstream method in a wide range of fields, ranging from speech analysis and noise cancellation to stock market analysis. However, as time series grow longer, real-world data often contain many outliers. These outliers can mislead the learned model if treated as normal points during prediction. To address this issue, we propose a robust and adaptive online gradient learning method, RoAdam (Robust Adam), for long short-term memory (LSTM) networks to predict time series with outliers. This method adaptively tunes the learning rate of the stochastic gradient algorithm during prediction, which reduces the adverse effect of outliers. It tracks the relative prediction error of the loss function with a weighted average by modifying Adam, a popular stochastic gradient algorithm for training deep neural networks. In our algorithm, a large relative prediction error corresponds to a small learning rate, and vice versa. Experiments on both synthetic data and real time series show that our method performs better than existing LSTM-based methods. PMID:29391864
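A scalar sketch of the mechanism described in the abstract: Adam's moment estimates are kept, and the step is additionally divided by `d`, a weighted average of the clipped ratio between the current and previous loss, so a loss spike (for example, from an outlier) shrinks the step. The clipping bounds `low`/`high`, `beta3`, and the class name are illustrative assumptions; the paper should be consulted for the exact update and defaults, and an LSTM would apply this per parameter tensor.

```python
import math


class RoAdamSketch:
    """Scalar Adam whose step is also divided by d, a smoothed, clipped loss ratio."""

    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, beta3=0.999,
                 low=0.1, high=10.0, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.beta3 = lr, beta1, beta2, beta3
        self.low, self.high, self.eps = low, high, eps
        self.m = 0.0          # first-moment estimate (as in Adam)
        self.v = 0.0          # second-moment estimate (as in Adam)
        self.d = 1.0          # weighted average of the relative prediction error
        self.t = 0
        self.prev_loss = None

    def step(self, param, grad, loss):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad * grad
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        if self.prev_loss is not None and self.prev_loss > 0:
            # Relative prediction error: current loss over previous loss, clipped.
            ratio = min(max(self.low, abs(loss / self.prev_loss)), self.high)
            self.d = self.beta3 * self.d + (1 - self.beta3) * ratio
        self.prev_loss = loss
        # A large d (recent loss spike) means a smaller effective learning rate.
        return param - self.lr * m_hat / (self.d * math.sqrt(v_hat) + self.eps)


if __name__ == "__main__":
    spiky, steady = RoAdamSketch(), RoAdamSketch()
    p1 = spiky.step(0.0, 0.5, loss=1.0)
    q1 = steady.step(0.0, 0.5, loss=1.0)
    p2 = spiky.step(p1, 0.5, loss=100.0)   # sudden loss spike, e.g. an outlier
    q2 = steady.step(q1, 0.5, loss=1.0)
    print(abs(p2 - p1) < abs(q2 - q1))     # True: the spike damps the update
```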
Robust and Adaptive Online Time Series Prediction with Long Short-Term Memory.

PubMed

Yang, Haimin; Pan, Zhisong; Tao, Qing

2017-01-01

Online time series prediction is the mainstream method in a wide range of fields, ranging from speech analysis and noise cancellation to stock market analysis. However, as time series grow longer, real-world data often contain many outliers. These outliers can mislead the learned model if treated as normal points during prediction. To address this issue, we propose a robust and adaptive online gradient learning method, RoAdam (Robust Adam), for long short-term memory (LSTM) networks to predict time series with outliers. This method adaptively tunes the learning rate of the stochastic gradient algorithm during prediction, which reduces the adverse effect of outliers. It tracks the relative prediction error of the loss function with a weighted average by modifying Adam, a popular stochastic gradient algorithm for training deep neural networks. In our algorithm, a large relative prediction error corresponds to a small learning rate, and vice versa. Experiments on both synthetic data and real time series show that our method performs better than existing LSTM-based methods.
Anomaly detection in hyperspectral imagery: statistics vs. graph-based algorithms

NASA Astrophysics Data System (ADS)

Berkson, Emily E.; Messinger, David W.

2016-05-01

Anomaly detection (AD) algorithms are frequently applied to hyperspectral imagery, but different algorithms produce different outlier results depending on the image scene content and the assumed background model. This work provides the first comparison of anomaly score distributions between common statistics-based anomaly detection algorithms (RX and subspace-RX) and the graph-based Topological Anomaly Detector (TAD). Anomaly scores in statistical AD algorithms should theoretically approximate a chi-squared distribution; however, this is rarely the case with real hyperspectral imagery. The expected distribution of scores found with graph-based methods remains unclear. We also look for general trends in algorithm performance with varied scene content. Three separate scenes were extracted from the hyperspectral MegaScene image taken over downtown Rochester, NY with the VIS-NIR-SWIR ProSpecTIR instrument. In order of most to least cluttered, we study an urban, suburban, and rural scene. The three AD algorithms were applied to each scene, and the distributions of the most anomalous 5% of pixels were compared. We find that subspace-RX performs better than RX, because the data becomes more normal when the highest variance principal components are removed. We also see that compared to statistical detectors, anomalies detected by TAD are easier to separate from the background. Due to their different underlying assumptions, the statistical and graph-based algorithms highlighted different anomalies within the urban scene. These results will lead to a deeper understanding of these algorithms and their applicability across different types of imagery.
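The RX detector scores each pixel by its squared Mahalanobis distance from the scene's background mean and covariance; for a Gaussian background with P bands these scores would follow a chi-squared distribution with P degrees of freedom, the assumption the abstract reports is rarely met by real imagery. Below is a minimal global-RX sketch in plain Python (Gauss-Jordan inversion, no NumPy), assuming each pixel is a sequence of band values. Subspace-RX (removing the highest-variance principal components first) and TAD are not shown.

```python
def mean_vector(pixels):
    n, bands = len(pixels), len(pixels[0])
    return [sum(p[b] for p in pixels) / n for b in range(bands)]


def covariance(pixels, mu):
    """Sample covariance matrix (divides by n - 1)."""
    n, bands = len(pixels), len(mu)
    cov = [[0.0] * bands for _ in range(bands)]
    for p in pixels:
        d = [p[b] - mu[b] for b in range(bands)]
        for i in range(bands):
            for j in range(bands):
                cov[i][j] += d[i] * d[j]
    return [[c / (n - 1) for c in row] for row in cov]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def rx_scores(pixels):
    """Global RX: squared Mahalanobis distance of each pixel from the scene mean."""
    mu = mean_vector(pixels)
    inv = invert(covariance(pixels, mu))
    scores = []
    for p in pixels:
        d = [x - m for x, m in zip(p, mu)]
        scores.append(sum(d[i] * inv[i][j] * d[j]
                          for i in range(len(d)) for j in range(len(d))))
    return scores


if __name__ == "__main__":
    scene = [(1.0, 2.0), (1.2, 2.1), (0.9, 1.8), (1.1, 2.2),
             (1.0, 1.9), (0.8, 2.0), (1.3, 2.3), (5.0, 0.5)]
    s = rx_scores(scene)
    print(s.index(max(s)))  # 7
```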
Automatic detection of end-diastolic and end-systolic frames in 2D echocardiography.

PubMed

Zolgharni, Massoud; Negoita, Madalina; Dhutia, Niti M; Mielewczik, Michael; Manoharan, Karikaran; Sohaib, S M Afzal; Finegold, Judith A; Sacchi, Stefania; Cole, Graham D; Francis, Darrel P

2017-07-01

Correctly selecting the end-diastolic and end-systolic frames on a 2D echocardiogram is important and challenging, for both human experts and automated algorithms. Manual selection is time-consuming and subject to uncertainty, and may affect the results obtained, especially for advanced measurements such as myocardial strain. We developed and evaluated algorithms which can automatically extract global and regional cardiac velocity, and identify end-diastolic and end-systolic frames. We acquired apical four-chamber 2D echocardiographic video recordings, each at least 10 heartbeats long, acquired twice at frame rates of 52 and 79 frames/s from 19 patients, yielding 38 recordings. Five experienced echocardiographers independently marked end-systolic and end-diastolic frames for the first 10 heartbeats of each recording. The automated algorithm also did this. Using the average of time points identified by five human operators as the reference gold standard, the individual operators had a root mean square difference from that gold standard of 46.5 ms. The algorithm had a root mean square difference from the human gold standard of 40.5 ms (P<.0001). Put another way, the algorithm-identified time point was an outlier in 122/564 heartbeats (21.6%), whereas the average human operator was an outlier in 254/564 heartbeats (45%). An automated algorithm can identify the end-systolic and end-diastolic frames with performance indistinguishable from that of human experts. This saves staff time, which could therefore be invested in assessing more beats, and reduces uncertainty about the reliability of the choice of frame. © 2017, Wiley Periodicals, Inc.
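The reference standard in this study is the per-beat average of the five operators' time points, and agreement is summarized as a root mean square (RMS) difference from that average. A minimal sketch of the arithmetic, assuming times in milliseconds with one value per heartbeat; the numbers in the example are made up.

```python
import math


def gold_standard(operator_times):
    """Per-beat mean of several operators' time points (ms)."""
    n_ops = len(operator_times)
    return [sum(beat) / n_ops for beat in zip(*operator_times)]


def rms_difference(times, reference):
    """Root mean square difference (ms) between two equal-length series."""
    if len(times) != len(reference):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((t - r) ** 2 for t, r in zip(times, reference)) / len(reference))


if __name__ == "__main__":
    operators = [[100, 200], [110, 210], [90, 190]]   # hypothetical, 2 beats
    reference = gold_standard(operators)             # [100.0, 200.0]
    print(rms_difference([105, 195], reference))     # 5.0
```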
Trend-Residual Dual Modeling for Detection of Outliers in Low-Cost GPS Trajectories

PubMed Central

Chen, Xiaojian; Cui, Tingting; Fu, Jianhong; Peng, Jianwei; Shan, Jie

2016-01-01

Low-cost GPS receivers have become a ubiquitous and integral part of daily life. Despite noticeable advantages such as being cheap, small, light, and easy to use, their limited positioning accuracy devalues and hampers their wide application for reliable mapping and analysis. Two conventional techniques for removing outliers from a GPS trajectory are thresholding and Kalman-based methods, both of which make it difficult to select appropriate thresholds and to model the trajectories. Moreover, they are insensitive to medium and small outliers, especially for low-sample-rate trajectories. This paper proposes a model-based GPS trajectory cleaner. Rather than examining speed and acceleration or assuming a predetermined trajectory model, we first use a cubic smoothing spline to adaptively model the trend of the trajectory. The residuals, i.e., the differences between the trend and the GPS measurements, are then further modeled by a time series method. Outliers are detected by scoring the residuals at every GPS trajectory point. Compared to the conventional procedures, the trend-residual dual modeling approach has the following features: (a) it models trajectories and detects outliers adaptively; (b) only one critical value for outlier scores needs to be set; (c) it robustly detects unapparent outliers; and (d) it is effective in cleaning outliers from GPS trajectories with low sample rates. Tests are carried out on three real-world GPS trajectory datasets. The evaluation demonstrates, on average, 9.27 times better outlier detection performance for GPS trajectories than thresholding and Kalman-based techniques. PMID:27916944

Automated Detection of Knickpoints and Knickzones Across Transient Landscapes

NASA Astrophysics Data System (ADS)

Gailleton, B.; Mudd, S. M.; Clubb, F. J.

2017-12-01

Mountainous regions are ubiquitously dissected by river channels, which transmit climate and tectonic signals to the rest of the landscape by adjusting their long profiles. Fluvial response to allogenic forcing is often expressed through the upstream propagation of steepened reaches, referred to as knickpoints or knickzones. The identification and analysis of these steepened reaches has numerous applications in geomorphology, such as modelling long-term landscape evolution, understanding controls on fluvial incision, and constraining tectonic uplift histories. Traditionally, the identification of knickpoints or knickzones from fluvial profiles requires manual selection or calibration. This process is both time-consuming and subjective, as different workers may select different steepened reaches within the profile. We propose an objective, statistically based method to systematically pick knickpoints/knickzones on a landscape scale using an outlier-detection algorithm. Our method integrates river profiles normalised by drainage area (Chi, using the approach of Perron and Royden, 2013), then separates the chi-elevation plots into a series of transient segments using the method of Mudd et al. (2014). This method allows the systematic detection of knickpoints across a DEM, regardless of size, using a high-performance algorithm implemented in the open-source Edinburgh Land Surface Dynamics Topographic Tools (LSDTopoTools) software package. After initial knickpoint identification, outliers are selected using several sorting and binning methods based on the Median Absolute Deviation, to avoid the influence of sample size. We test our method on a series of DEMs and grid resolutions, and show that it consistently identifies accurate knickpoint locations across each landscape tested.
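The chi coordinate of Perron and Royden (2013) integrates upstream distance weighted by drainage area, chi(x) = integral from base level to x of (A0 / A(x'))^theta dx', where theta is the concavity index. The sketch below computes it with the trapezoidal rule and then flags values with a median absolute deviation (MAD) score. The reference area A0 = 1, theta = 0.45, the 3.5 cutoff, and the input layout are assumptions; the sorting and binning steps of the LSDTopoTools implementation are not reproduced.

```python
import statistics


def chi_coordinate(distances, areas, a0=1.0, theta=0.45):
    """Trapezoidal chi = integral of (a0 / A)**theta along flow distance from base level.

    distances: upstream distance of each node, increasing from the outlet.
    areas: drainage area at each node. a0 and theta are assumed defaults.
    """
    chi = [0.0]
    for i in range(1, len(distances)):
        dx = distances[i] - distances[i - 1]
        f_prev = (a0 / areas[i - 1]) ** theta
        f_curr = (a0 / areas[i]) ** theta
        chi.append(chi[-1] + 0.5 * (f_prev + f_curr) * dx)
    return chi


def mad_outliers(values, cutoff=3.5):
    """Indices whose MAD-scaled distance from the median exceeds cutoff."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return [i for i, v in enumerate(values) if v != med]
    return [i for i, v in enumerate(values) if abs(v - med) / (1.4826 * mad) > cutoff]


if __name__ == "__main__":
    print(chi_coordinate([0, 10, 20], [1.0, 1.0, 1.0]))   # [0.0, 10.0, 20.0]
    # Hypothetical per-segment steepness changes; the last one stands out.
    print(mad_outliers([0.1, 0.12, 0.09, 0.11, 0.1, 0.9]))  # [5]
```

Because the median and MAD are themselves insensitive to a few extreme values, the cutoff does not drift as more knickpoint candidates are added, which is the motivation the abstract gives for a MAD-based selection.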
Query-Based Outlier Detection in Heterogeneous Information Networks.

PubMed

Kuck, Jonathan; Zhuang, Honglei; Yan, Xifeng; Cam, Hasan; Han, Jiawei

2015-03-01

Outlier or anomaly detection in large data sets is a fundamental task in data science, with broad applications. However, in real data sets with high-dimensional space, most outliers are hidden in certain dimensional combinations and are relative to a user's search space and interest. It is often more effective to give power to users and allow them to specify outlier queries flexibly, and the system will then process such mining queries efficiently. In this study, we introduce the concept of query-based outliers in heterogeneous information networks, design a query language that enables users to specify such queries flexibly, define a good outlier measure in heterogeneous networks, and study how to process outlier queries efficiently in large data sets. Our experiments on real data sets show that, following such a methodology, interesting outliers can be defined and uncovered flexibly and effectively in large heterogeneous networks.
Query-Based Outlier Detection in Heterogeneous Information Networks

PubMed Central

Kuck, Jonathan; Zhuang, Honglei; Yan, Xifeng; Cam, Hasan; Han, Jiawei

2015-01-01

Outlier or anomaly detection in large data sets is a fundamental task in data science, with broad applications. However, in real data sets with high-dimensional space, most outliers are hidden in certain dimensional combinations and are relative to a user’s search space and interest. It is often more effective to give power to users and allow them to specify outlier queries flexibly, and the system will then process such mining queries efficiently. In this study, we introduce the concept of query-based outliers in heterogeneous information networks, design a query language that enables users to specify such queries flexibly, define a good outlier measure in heterogeneous networks, and study how to process outlier queries efficiently in large data sets. Our experiments on real data sets show that, following such a methodology, interesting outliers can be defined and uncovered flexibly and effectively in large heterogeneous networks. PMID:27064397
ROBNCA: robust network component analysis for recovering transcription factor activities.

PubMed

Noor, Amina; Ahmad, Aitzaz; Serpedin, Erchin; Nounou, Mohamed; Nounou, Hazem

2013-10-01

Network component analysis (NCA) is an efficient method of reconstructing transcription factor activity (TFA) that makes use of gene expression data and prior information available about transcription factor (TF)-gene regulation. Most contemporary algorithms either suffer from inconsistency and poor reliability or have prohibitive computational complexity. In addition, existing algorithms cannot counteract the presence of outliers in microarray data. Hence, robust and computationally efficient algorithms are needed to enable practical applications. We propose ROBust Network Component Analysis (ROBNCA), a novel iterative algorithm that explicitly models the possible outliers in the microarray data. An attractive feature of the ROBNCA algorithm is the derivation of a closed-form solution for estimating the connectivity matrix, which was not available in prior contributions. The ROBNCA algorithm is compared with FastNCA and the non-iterative NCA (NI-NCA). ROBNCA estimates the TF activity profiles as well as the TF-gene control strength matrix with a much higher degree of accuracy than FastNCA and NI-NCA, irrespective of varying noise, correlation and/or amount of outliers in the case of synthetic data. The ROBNCA algorithm is also tested on Saccharomyces cerevisiae data and Escherichia coli data, and it is observed to outperform the existing algorithms. The run time of the ROBNCA algorithm is comparable with that of FastNCA and hundreds of times faster than that of NI-NCA. The ROBNCA software is available at http://people.tamu.edu/~amina/ROBNCA
Space Object Maneuver Detection Algorithms Using TLE Data

NASA Astrophysics Data System (ADS)

Pittelkau, M.

2016-09-01

An important aspect of Space Situational Awareness (SSA) is the detection of deliberate and accidental orbit changes of space objects. Although space surveillance systems detect orbit maneuvers within their tracking algorithms, maneuver data are not readily disseminated for general use. However, two-line element (TLE) data are available and can be used to detect maneuvers of space objects. This work is an attempt to improve upon existing TLE-based maneuver detection algorithms. Three adaptive maneuver detection algorithms are developed and evaluated. The first is a fading-memory Kalman filter, which is equivalent to the sliding-window least-squares polynomial fit but computationally more efficient and adaptive to the noise in the TLE data. The second algorithm is based on a sample cumulative distribution function (CDF) computed from a histogram of the magnitude-squared |ΔV|² of the change-in-velocity vectors (ΔV), which are computed from the TLE data. A maneuver detection threshold is computed from the median estimated from the CDF, or from the CDF and a specified probability of false alarm. The third algorithm is a median filter. The median filter is the simplest of a class of nonlinear filters called order statistics filters, which fall within the theory of robust statistics. The output of the median filter is practically insensitive to outliers, or large maneuvers. The median of the |ΔV|² data is proportional to the variance of ΔV, so the variance is estimated from the output of the median filter. A maneuver is detected when the input data exceed a constant times the estimated variance.
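The third (median-filter) algorithm can be sketched as follows, assuming the change-in-velocity vectors ΔV have already been derived from consecutive TLEs. Since the median of |ΔV|² is taken as proportional to the variance of ΔV, the unknown proportionality constant is folded into `factor` here; that value, the window size, and the example vectors are hypothetical.

```python
import statistics


def squared_magnitudes(dv_vectors):
    """|dV|^2 for each change-in-velocity vector."""
    return [sum(c * c for c in v) for v in dv_vectors]


def running_median(values, half_window):
    """Centered median filter; insensitive to isolated large values (maneuvers)."""
    return [statistics.median(values[max(0, i - half_window): i + half_window + 1])
            for i in range(len(values))]


def detect_maneuvers(dv_vectors, half_window=3, factor=10.0):
    """Flag epochs whose |dV|^2 exceeds factor * running median of |dV|^2.

    The running median serves as a scaled variance estimate; the scaling
    constant is absorbed into `factor` (an assumption).
    """
    mag2 = squared_magnitudes(dv_vectors)
    baseline = running_median(mag2, half_window)
    return [i for i, (m, b) in enumerate(zip(mag2, baseline)) if m > factor * b]


if __name__ == "__main__":
    dv = [(0.010, 0.000, 0.005), (0.008, 0.004, 0.000), (0.011, 0.002, 0.003),
          (0.009, 0.005, 0.001), (0.010, 0.003, 0.002), (1.000, 0.500, 0.000),
          (0.012, 0.001, 0.004), (0.007, 0.006, 0.002), (0.010, 0.002, 0.003),
          (0.009, 0.004, 0.001)]
    print(detect_maneuvers(dv))  # [5]
```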
Crowdtruth validation: a new paradigm for validating algorithms that rely on image correspondences.

PubMed

Maier-Hein, Lena; Kondermann, Daniel; Roß, Tobias; Mersmann, Sven; Heim, Eric; Bodenstedt, Sebastian; Kenngott, Hannes Götz; Sanchez, Alexandro; Wagner, Martin; Preukschas, Anas; Wekerle, Anna-Laura; Helfert, Stefanie; März, Keno; Mehrabi, Arianeb; Speidel, Stefanie; Stock, Christian

2015-08-01

Feature tracking and 3D surface reconstruction are key enabling techniques for computer-assisted minimally invasive surgery. One of the major bottlenecks in training and validating new algorithms is the lack of large amounts of annotated images that fully capture the wide range of anatomical/scene variance in clinical practice. To address this issue, we propose a novel approach to obtaining large numbers of high-quality reference image annotations at low cost in an extremely short period of time. The concept is based on outsourcing the correspondence search to a crowd of anonymous users from an online community (crowdsourcing) and comprises four stages: (1) feature detection, (2) correspondence search via crowdsourcing, (3) merging multiple annotations per feature by fitting Gaussian finite mixture models, and (4) outlier removal using the result of the clustering as input for a second annotation task. On average, 10,000 annotations were obtained within 24 h at a cost of $100. The annotation of the crowd after clustering and before outlier removal was of expert quality, with a median distance of about 1 pixel to a publicly available reference annotation. The threshold for the outlier removal task directly determines the maximum annotation error, but also the number of points removed. Our concept is a novel and effective method for fast, low-cost and highly accurate correspondence generation that could be adapted to various other applications related to large-scale data annotation in medical image computing and computer-assisted interventions.
Genetic Algorithm for Initial Orbit Determination with Too Short Arc (Continued)

NASA Astrophysics Data System (ADS)

Li, Xin-ran; Wang, Xin

2017-04-01

When the genetic algorithm is used to solve the problem of too-short-arc (TSA) orbit determination, the original method for outlier deletion is no longer applicable, because the computing process of the genetic algorithm differs from that of the classical method. In the genetic algorithm, robust estimation is realized by introducing different loss functions into the fitness function, which solves the outlier problem of TSA orbit determination. Compared with the classical method, the genetic algorithm is greatly simplified by introducing different loss functions. A comparison of calculations with multiple loss functions shows that least median of squares (LMS) estimation and least trimmed squares (LTS) estimation can greatly improve the robustness of TSA orbit determination and have a high breakdown point.
Outlier detection for groundwater data in France

NASA Astrophysics Data System (ADS)

Valmy, Larissa; de Fouquet, Chantal; Bourgine, Bernard

2014-05-01

Water quality and quantity in France have been increasingly monitored since the 1970s. Moreover, in 2000, the EU Water Framework Directive established a framework for community action in the field of water policy for the protection of inland surface waters (rivers and lakes), transitional waters (estuaries), coastal waters and groundwater. It is intended to ensure that all aquatic ecosystems and, with regard to their water needs, terrestrial ecosystems and wetlands meet 'good status' by 2015. The Directive requires Member States to establish river basin districts and, for each of these, a river basin management plan. In France, monitoring programs for water status have been implemented in each basin since 2007. The data collected through these programs feed into an information system that helps check compliance in implementing environmental water legislation, assess the status of water, guide management actions (programs of measures) and evaluate their effectiveness, and inform the public. Our work consists of studying groundwater quality and quantity data for several basins in France. We propose a specific mathematical approach to detect outliers and study trends in time series. In statistics, an outlier is an observation that lies outside the overall pattern of a distribution. The presence of an outlier usually indicates some sort of problem, so it is important to detect it in order to identify the cause. Techniques for temporal data analysis have been developed for several decades in parallel with geostatistical methods. However, compared to standard statistical methods, geostatistical analysis allows the analysis of incomplete or irregular time series. Moreover, tests carried out by the BRGM have shown the potential contribution of geostatistical methods to characterizing environmental data time series. Our approach is to exploit this potential through the development of specific algorithms, tests, and validation of methods. 
We will introduce and explain our method and approach by considering the Loire Bretagne basin case.
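One reason geostatistics suits irregular series is that the experimental variogram, gamma(h) = 1 / (2 N(h)) * sum of [z(t_i) - z(t_j)]^2 over the N(h) pairs whose time separation falls in lag class h, is built from whatever pairs exist, so no regular sampling grid is required. A minimal sketch, assuming times and values as parallel lists and fixed-width lag classes; the authors' specific algorithms are not described in the abstract, so this only illustrates the general idea.

```python
def experimental_variogram(times, values, lag_width, max_lag):
    """Return (lag-class centre, gamma, pair count) for each non-empty lag class.

    Works directly on irregularly spaced or gappy series: every pair of
    observations contributes to the lag class of its actual time separation.
    """
    n_classes = int(max_lag // lag_width)
    sums = [0.0] * n_classes
    counts = [0] * n_classes
    for i in range(len(times)):
        for j in range(i + 1, len(times)):
            h = abs(times[j] - times[i])
            k = int(h // lag_width)
            if k >= n_classes:
                continue  # separation beyond the largest lag class considered
            sums[k] += (values[j] - values[i]) ** 2
            counts[k] += 1
    return [(lag_width * (k + 0.5), sums[k] / (2 * counts[k]), counts[k])
            for k in range(n_classes) if counts[k] > 0]


if __name__ == "__main__":
    t = [0, 1, 2, 4, 7]          # irregular sampling times (e.g. months)
    z = [0, 1, 2, 4, 7]          # toy groundwater levels with a linear trend
    print(experimental_variogram(t, z, lag_width=1, max_lag=3))
    # [(1.5, 0.5, 2), (2.5, 2.0, 2)]
```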
Data Mining for Anomaly Detection

NASA Technical Reports Server (NTRS)

Biswas, Gautam; Mack, Daniel; Mylaraswamy, Dinkar; Bharadwaj, Raj

2013-01-01

The Vehicle Integrated Prognostics Reasoner (VIPR) program describes methods for enhanced diagnostics as well as a prognostic extension to the current state-of-the-art Aircraft Diagnostic and Maintenance System (ADMS). VIPR introduced a new anomaly detection function for discovering previously undetected and undocumented situations where there are clear deviations from nominal behavior. Once a baseline (a nominal model of operations) is established, detection and analysis are split between on-aircraft outlier generation and off-aircraft expert analysis to characterize and classify events that may not have been anticipated by individual system providers. Offline expert analysis is supported by data curation and data mining algorithms that can be applied in both supervised and unsupervised learning contexts. In this report, we discuss efficient methods to implement the Kolmogorov complexity measure using compression algorithms, and run a systematic empirical analysis to determine the best compression measure. Our experiments established that the combination of the DZIP compression algorithm and the CiDM distance measure provides the best results for capturing relevant properties of time series data encountered in aircraft operations. This combination was used as the basis for developing an unsupervised learning algorithm to define "nominal" flight segments using historical flight segments.
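Compression-based distances approximate the (uncomputable) Kolmogorov complexity by the length of a compressed string. The sketch below uses Python's `zlib` as a swapped-in compressor (DZIP is not in the standard library) and shows two generic measures: the normalized compression distance (NCD) of Cilibrasi and Vitányi, and the compression-based dissimilarity measure C(xy) / (C(x) + C(y)) of Keogh et al. Neither is claimed to be the CiDM measure used in the report, and the `quantize` encoding of time series into bytes is hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data (zlib stands in for DZIP here)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: small for similar inputs, near 1 for unrelated."""
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def cdm(x: bytes, y: bytes) -> float:
    """Compression-based dissimilarity: C(xy) / (C(x) + C(y)), roughly 0.5 to 1."""
    return compressed_size(x + y) / (compressed_size(x) + compressed_size(y))


def quantize(series, lo, hi, levels=16):
    """Hypothetical encoding: map each sample to one of `levels` byte symbols."""
    step = (hi - lo) / levels
    return bytes(min(levels - 1, max(0, int((v - lo) / step))) for v in series)


if __name__ == "__main__":
    nominal = b"climb cruise descend " * 50
    similar = b"climb cruise descend " * 48 + b"climb cruise hold descend "
    unrelated = bytes((i * 7919 + 13) % 251 for i in range(1000))
    print(ncd(nominal, similar) < ncd(nominal, unrelated))   # True
    print(cdm(nominal, similar) < cdm(nominal, unrelated))   # True
```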
Outlier detection in a new half-circular distribution

NASA Astrophysics Data System (ADS)

Rambli, Adzhar; Mohamed, Ibrahim Bin; Shimizu, Kunio; Khalidin, Nurliza

2015-10-01

In this paper, we use a discordancy test based on spacing theory to detect outliers in half-circular data. Numerous discordancy tests have been proposed to detect outliers in circular distributions, which are defined on [0, 2π). However, some circular data lie within just half of this range. Therefore, we first introduce a new half-circular distribution developed by applying the inverse stereographic projection technique to a gamma-distributed variable. We then develop a new discordancy test, based on spacing theory, to detect single or multiple outliers in half-circular data. We show the practical value of the test by applying it to an eye data set obtained from a glaucoma clinic at the University of Malaya Medical Centre, Malaysia.
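Inverse stereographic projection maps a non-negative variable x onto the half-circle through theta = 2 * arctan(x), so a gamma-distributed x yields angles in [0, π). The sketch below shows that mapping and the sorted spacings on which spacing-based tests are built. The paper's actual discordancy statistic is not given in the abstract; `end_gap_ratio` is a hypothetical stand-in for illustration only and has no tabulated critical values.

```python
import math


def to_half_circle(x):
    """Inverse stereographic projection of x >= 0 onto [0, pi): theta = 2 * atan(x)."""
    if x < 0:
        raise ValueError("x must be non-negative")
    return 2.0 * math.atan(x)


def spacings(angles):
    """Gaps between consecutive sorted angles (no wrap-around on a half-circle)."""
    ordered = sorted(angles)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def end_gap_ratio(angles):
    """Hypothetical statistic: the larger of the two gaps isolating the smallest and
    largest observations, divided by the mean spacing."""
    gaps = spacings(angles)
    mean_gap = sum(gaps) / len(gaps)
    return max(gaps[0], gaps[-1]) / mean_gap


if __name__ == "__main__":
    sample = [0.4, 0.5, 0.6, 0.55, 0.45, 3.0]       # hypothetical gamma-like draws
    angles = [to_half_circle(x) for x in sample]
    print(round(end_gap_ratio(angles), 2))
```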
Outlier Detection in Hyperspectral Imagery Using Closest Distance to Center with Ellipsoidal Multivariate Trimming

DTIC Science & Technology

2011-01-01

where r << P. The use of PCA for finding outliers in multivariate data is surveyed by Gnanadesikan and Kettenring [16] and Rao [17]. As alluded to earlier... 1984. [16] Gnanadesikan R, Kettenring JR. Robust estimates, residuals, and outlier detection with multiresponse data. Biometrics 1972; 28: 81–124
A Comprehensive review of group level model performance in the presence of heteroscedasticity: Can a single model control Type I errors in the presence of outliers?

PubMed Central

Mumford, Jeanette A.

2017-01-01

Even after thorough preprocessing and a careful time series analysis of functional magnetic resonance imaging (fMRI) data, artifacts and other issues can lead to violations of the assumption that the variance is constant across subjects in the group level model. This is especially concerning when modeling a continuous covariate at the group level, as the slope is easily biased by outliers. Various models have been proposed to deal with outliers, including models that use the first level variance or the group level residual magnitude to differentially weight subjects. The most commonly used robust regression, which implements a robust estimator of the regression slope, has previously been studied in the context of fMRI studies and was found to perform well in some scenarios, but a loss of Type I error control can occur for some outlier settings. A second type of robust regression, using a heteroscedastic autocorrelation consistent (HAC) estimator, which produces robust slope and variance estimates, has been shown to perform well, with better Type I error control, but only at large sample sizes (500–1000 subjects). Its Type I error control with smaller sample sizes has not been studied, and it has not been compared to other modeling approaches that handle outliers, such as FSL’s Flame 1 and FSL’s outlier de-weighting. Focusing on group level inference with a continuous covariate over a range of sample sizes and degrees of heteroscedasticity, which can be driven either by within- or between-subject variability, both styles of robust regression are compared to ordinary least squares (OLS), FSL’s Flame 1, Flame 1 with the outlier de-weighting algorithm, and Kendall’s Tau. Additionally, subject omission using the Cook’s Distance measure with OLS and nonparametric inference with the OLS statistic are studied. 
Pros and cons of these models as well as general strategies for detecting outliers in data and taking precaution to avoid inflated Type I error rates are discussed. PMID:28030782
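The abstract mentions subject omission with Cook's Distance under OLS. A minimal sketch for a single continuous group-level covariate follows; the 4/n cutoff is a common rule of thumb assumed here, not a value taken from the study, and the function names are illustrative.

```python
# Sketch: Cook's distance for a simple OLS fit y = b0 + b1 * x (one continuous
# covariate), followed by omission of subjects whose distance exceeds a cutoff.

def cooks_distance(x, y):
    """Return Cook's distance for every observation of a simple OLS fit."""
    n = len(x)
    p = 2  # fitted parameters: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - p)  # residual variance estimate
    distances = []
    for xi, e in zip(x, residuals):
        h = 1.0 / n + (xi - x_bar) ** 2 / sxx  # leverage (hat-matrix diagonal)
        distances.append(e * e / (p * s2) * h / (1.0 - h) ** 2)
    return distances


def omit_influential(x, y, cutoff=None):
    """Drop observations whose Cook's distance exceeds cutoff (default: 4 / n)."""
    d = cooks_distance(x, y)
    limit = 4.0 / len(x) if cutoff is None else cutoff
    keep = [i for i, di in enumerate(d) if di <= limit]
    return [x[i] for i in keep], [y[i] for i in keep]


x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.0, 8.1, 9.9, 12.0, 14.1, 15.9, 18.0, 40.0]  # last subject is aberrant
x_kept, y_kept = omit_influential(x, y)
```

Here only the last subject, which combines high leverage with a large residual, exceeds the cutoff; the retained subjects would then be refit with OLS.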
First order augmentation to tensor voting for boundary inference and multiscale analysis in 3D.

PubMed

Tong, Wai-Shun; Tang, Chi-Keung; Mordohai, Philippos; Medioni, Gérard

2004-05-01

Most computer vision applications require the reliable detection of boundaries. In the presence of outliers, missing data, orientation discontinuities, and occlusion, this problem is particularly challenging. We propose to address it by complementing the tensor voting framework, which was limited to second order properties, with first order representation and voting. First order voting fields and a mechanism to vote for 3D surface and volume boundaries and curve endpoints in 3D are defined. Boundary inference is also useful for a second difficult problem in grouping, namely, automatic scale selection. We propose an algorithm that automatically infers the smallest scale that can preserve the finest details. Our algorithm then proceeds with progressively larger scales to ensure continuity where it has not been achieved. Therefore, the proposed approach does not oversmooth features or delay the handling of boundaries and discontinuities until model misfit occurs. The interaction of smooth features, boundaries, and outliers is accommodated by the unified representation, making possible the perceptual organization of data in curves, surfaces, volumes, and their boundaries simultaneously. We present results on a variety of data sets to show the efficacy of the improved formalism.
Brain tissues volume measurements from 2D MRI using parametric approach

NASA Astrophysics Data System (ADS)

L'vov, A. A.; Toropova, O. A.; Litovka, Yu. V.

2018-04-01

The purpose of the paper is to propose a fully automated method for assessing the volume of structures within the human brain. Our statistical approach uses the maximum interdependency principle in the decision-making process regarding the consistency of measurements and unequal observations. Outliers are detected using the maximum normalized residual test. We propose a statistical model that utilizes knowledge of tissue distribution in the human brain and applies partial data restoration to improve precision. The proposed approach is computationally efficient and independent of the segmentation algorithm used in the application.
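The maximum normalized residual test is also known as Grubbs' test. The sketch below computes the statistic and removes extreme values iteratively; the critical-value function is supplied by the caller (the constant used in the example is illustrative only, not a tabulated value), and the helper names are hypothetical.

```python
import statistics


def max_normalized_residual(values):
    """Return (index, G) with G = max |x_i - mean| / s, the Grubbs statistic."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    idx = max(range(len(values)), key=lambda i: abs(values[i] - mean))
    if s == 0:
        return idx, 0.0  # all values identical: nothing to flag
    return idx, abs(values[idx] - mean) / s


def grubbs_filter(values, critical):
    """Repeatedly remove the most extreme value while G exceeds critical(n).

    critical is a caller-supplied function mapping the current sample size n
    to a critical value (for example, looked up from a table for a chosen alpha).
    """
    kept = list(values)
    removed = []
    while len(kept) > 2:
        idx, g = max_normalized_residual(kept)
        if g <= critical(len(kept)):
            break
        removed.append(kept.pop(idx))
    return kept, removed


measurements = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 15.0]
kept, removed = grubbs_filter(measurements, critical=lambda n: 2.0)  # illustrative constant
```

Because the critical value depends on n, a real application would pass a function that looks up (or computes from the t distribution) the value for the chosen significance level.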
Discovering Structural Regularity in 3D Geometry

PubMed Central

Pauly, Mark; Mitra, Niloy J.; Wallner, Johannes; Pottmann, Helmut; Guibas, Leonidas J.

2010-01-01

We introduce a computational framework for discovering regular or repeated geometric structures in 3D shapes. We describe and classify possible regular structures and present an effective algorithm for detecting such repeated geometric patterns in point- or mesh-based models. Our method assumes no prior knowledge of the geometry or spatial location of the individual elements that define the pattern. Structure discovery is made possible by a careful analysis of pairwise similarity transformations that reveals prominent lattice structures in a suitable model of transformation space. We introduce an optimization method for detecting such uniform grids specifically designed to deal with outliers and missing elements. This yields a robust algorithm that successfully discovers complex regular structures amidst clutter, noise, and missing geometry. The accuracy of the extracted generating transformations is further improved using a novel simultaneous registration method in the spatial domain. We demonstrate the effectiveness of our algorithm on a variety of examples and show applications to compression, model repair, and geometry synthesis. PMID:21170292
Stacked Autoencoders for Outlier Detection in Over-the-Horizon Radar Signals

PubMed Central

Protopapadakis, Eftychios; Doulamis, Anastasios; Doulamis, Nikolaos; Dres, Dimitrios; Bimpas, Matthaios

2017-01-01

Detection of outliers in radar signals is a considerable challenge in maritime surveillance applications. High-Frequency Surface-Wave (HFSW) radars have attracted significant interest as potential tools for long-range target identification and outlier detection at over-the-horizon (OTH) distances. However, a number of disadvantages, such as their low spatial resolution and presence of clutter, have a negative impact on their accuracy. In this paper, we explore the applicability of deep learning techniques for detecting deviations from the norm in behavioral patterns of vessels (outliers) as they are tracked from an OTH radar. The proposed methodology exploits the nonlinear mapping capabilities of deep stacked autoencoders in combination with density-based clustering. A comparative experimental evaluation of the approach shows promising results in terms of the proposed methodology's performance. PMID:29312449
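The abstract pairs stacked autoencoders with density-based clustering without naming the clustering algorithm. Assuming DBSCAN as a representative choice, and assuming the inputs are low-dimensional codes already produced by an encoder, points that end up in no dense cluster can be read as outliers. The sketch below covers only this clustering step.

```python
import math


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns a list of labels: cluster ids 0, 1, ... or -1 for noise."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def region(i):
        # indices within eps of point i (including i itself)
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:
            labels[i] = -1  # provisionally noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
    return labels


# Hypothetical 2-D encoder outputs for tracked vessels.
codes = [(0, 0), (0, 0.1), (0.1, 0), (0.1, 0.1),
         (5, 5), (5, 5.1), (5.1, 5), (5.1, 5.1),
         (2.5, 2.5)]
labels = dbscan(codes, eps=0.2, min_pts=3)
outliers = [i for i, lab in enumerate(labels) if lab == -1]
```

Increasing eps or lowering min_pts makes the detector more permissive; both need tuning to the scale of the encoded features.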
Penalized unsupervised learning with outliers

PubMed Central

Witten, Daniela M.

2013-01-01

We consider the problem of performing unsupervised learning in the presence of outliers – that is, observations that do not come from the same distribution as the rest of the data. It is known that in this setting, standard approaches for unsupervised learning can yield unsatisfactory results. For instance, in the presence of severe outliers, K-means clustering will often assign each outlier to its own cluster, or alternatively may yield distorted clusters in order to accommodate the outliers. In this paper, we take a new approach to extending existing unsupervised learning techniques to accommodate outliers. Our approach is an extension of a recent proposal for outlier detection in the regression setting. We allow each observation to take on an “error” term, and we penalize the errors using a group lasso penalty in order to encourage most of the observations’ errors to exactly equal zero. We show that this approach can be used in order to develop extensions of K-means clustering and principal components analysis that result in accurate outlier detection, as well as improved performance in the presence of outliers. These methods are illustrated in a simulation study and on two gene expression data sets, and connections with M-estimation are explored. PMID:23875057
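A minimal sketch of the idea for K-means, assuming the objective sum_i 0.5 * ||x_i - mu_c(i) - e_i||^2 + lam * sum_i ||e_i||_2 (the scaling of the penalty is an assumption). With the centers held fixed, the group lasso penalty has a closed-form update that sets e_i exactly to zero unless the residual norm exceeds lam, which is what makes most errors vanish. The alternating scheme and names are illustrative, not the paper's code.

```python
import math


def group_soft_threshold(r, lam):
    """Minimiser of 0.5 * ||r - e||^2 + lam * ||e||_2 over e: shrinks the whole vector r."""
    norm = math.sqrt(sum(v * v for v in r))
    if norm <= lam:
        return [0.0] * len(r)
    scale = 1.0 - lam / norm
    return [scale * v for v in r]


def penalized_kmeans(X, init_centers, lam, n_iter=20):
    """K-means with a per-observation error term e_i penalised by lam * ||e_i||_2.

    Returns (centers, assignments, outlier_indices); an observation is reported
    as an outlier when its fitted error vector is non-zero.
    """
    centers = [list(c) for c in init_centers]
    errors = [[0.0] * len(X[0]) for _ in X]
    assign = [0] * len(X)
    for _ in range(n_iter):
        corrected = [[xv - ev for xv, ev in zip(x, e)] for x, e in zip(X, errors)]
        # 1) assign each error-corrected observation to its nearest center
        for i, z in enumerate(corrected):
            assign[i] = min(range(len(centers)), key=lambda k: math.dist(z, centers[k]))
        # 2) recompute each center from its error-corrected members
        for k in range(len(centers)):
            members = [corrected[i] for i in range(len(X)) if assign[i] == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
        # 3) update the error terms by group soft-thresholding the residuals
        for i, x in enumerate(X):
            residual = [xv - cv for xv, cv in zip(x, centers[assign[i]])]
            errors[i] = group_soft_threshold(residual, lam)
    outliers = [i for i, e in enumerate(errors) if any(v != 0.0 for v in e)]
    return centers, assign, outliers


X = [(0, 0), (0.2, 0), (0, 0.2), (0.2, 0.2),
     (10, 10), (10.2, 10), (10, 10.2), (10.2, 10.2),
     (0, 30)]
centers, assign, outliers = penalized_kmeans(X, [(0, 0), (10, 10)], lam=3.0)
```

Only the point at (0, 30) keeps a non-zero error vector; because that error absorbs most of its deviation, the second center stays close to the inliers instead of being dragged toward the outlier.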
Global Bathymetry: Machine Learning for Data Editing

NASA Astrophysics Data System (ADS)

Sandwell, D. T.; Tea, B.; Freund, Y.

2017-12-01

The accuracy of global bathymetry depends primarily on the coverage and accuracy of the sounding data and secondarily on the depth predicted from gravity. A main focus of our research is to add newly available data to the global compilation. Most data sources contain 1–12% erroneous soundings caused by a wide array of blunders and measurement errors. Over the years we have hand-edited these data using undergraduate employees at UCSD (440 million soundings at 500 m resolution). We are developing a machine learning approach to refine the flagging of the older soundings and provide automated editing of newly acquired soundings. The approach has three main steps: 1) Combine the sounding data with additional information that may inform the machine learning algorithm. The additional parameters include: depth predicted from gravity; distance to the nearest sounding from other cruises; seafloor age; spreading rate; sediment thickness; and vertical gravity gradient. 2) Use available edit decisions as training data sets for a boosted tree algorithm with a binary logistic objective function and L2 regularization. Initial results with poor quality single beam soundings show that the automated algorithm matches the hand-edited data 89% of the time. The results show that most of the information for detecting outliers comes from predicted depth, with secondary contributions from distance to the nearest sounding and longitude. A similar analysis using very high quality multibeam data shows that the automated algorithm matches the hand-edited data 93% of the time. Again, most of the information for detecting outliers comes from predicted depth, with secondary contributions from distance to the nearest sounding and longitude. 3) The third step in the process is to use the machine learning parameters, derived from the training data, to edit 12 million newly acquired single beam soundings provided by the National Geospatial-Intelligence Agency. 
The output of the learning algorithm will be confidence-rated, indicating which edits the algorithm is confident in and which it is not. We expect the majority (90%) of edits to be confident and not require human intervention. Human intervention will be required only for the 10% of unconfident decisions, thus reducing the amount of human work by a factor of 10 or more.
Slowing ash mortality: a potential strategy to slam emerald ash borer in outlier sites

Treesearch

Deborah G. McCullough; Nathan W. Siegert; John Bedford

2009-01-01

Several isolated outlier populations of emerald ash borer (Agrilus planipennis Fairmaire) were discovered in 2008 and additional outliers will likely be found as detection surveys and public outreach activities...
Identification of Outliers in Grace Data for Indo-Gangetic Plain Using Various Methods (Z-Score, Modified Z-score and Adjusted Boxplot) and Its Removal

NASA Astrophysics Data System (ADS)

Srivastava, S.

2015-12-01

Gravity Recovery and Climate Experiment (GRACE) data are widely used for hydrological studies of large-scale basins (≥100,000 sq km). The GRACE data used for hydrological studies (Stokes coefficients or equivalent water height) are not direct observations but result from high-level processing of raw data from the GRACE mission. Different partner agencies, such as CSR, GFZ and JPL, implement their own methodologies, and their processing methods are independent of each other. The primary sources of error in GRACE data are measurement and modeling errors and the processing strategies of these agencies. Because of the different processing methods, the final data from the partner agencies are inconsistent with each other at some epochs. GRACE data provide spatio-temporal variations in Earth's gravity, which are mainly attributed to seasonal fluctuations in water level on the Earth's surface and in the subsurface. During the quantification of errors/uncertainties, several high positive and negative peaks were observed that do not correspond to any hydrological process but may emanate from a combination of the primary error sources or from other geophysical processes (e.g. earthquakes, landslides) resulting in a redistribution of Earth's mass. Such peaks can be considered outliers for hydrological studies. In this work, an algorithm has been designed to extract outliers from the GRACE data for the Indo-Gangetic plain that accounts for the seasonal variations and the trend in the data. Different outlier detection methods have been used, such as the Z-score, the modified Z-score and the adjusted boxplot. For verification, assimilated hydrological (GLDAS) and hydro-meteorological data are used as the reference. The results show that the consistency among all data sets improved significantly after the removal of outliers.
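The Z-score and modified Z-score differ in how they estimate location and scale. A sketch of both, intended to be applied after removing a seasonal cycle by subtracting per-phase (for example per-calendar-month) means, is below; trend removal and the adjusted boxplot (which relies on the medcouple skewness measure) are omitted, and the thresholds 3 and 3.5 are conventional choices rather than values from the study.

```python
import statistics


def deseasonalize(series, period=12):
    """Subtract the mean of each phase (e.g. calendar month) from a regular series."""
    phase_means = [statistics.mean(series[p::period]) for p in range(period)]
    return [v - phase_means[i % period] for i, v in enumerate(series)]


def z_scores(values):
    """Classical z-score: (x - mean) / sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def modified_z_scores(values):
    """Iglewicz-Hoaglin modified z-score: 0.6745 * (x - median) / MAD."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; modified z-scores are undefined")
    return [0.6745 * (v - med) / mad for v in values]


def flag(scores, threshold):
    """Indices whose absolute score exceeds the threshold."""
    return [i for i, s in enumerate(scores) if abs(s) > threshold]


anomalies = [1.0, 1.2, 0.9, 1.1, 1.0, 0.95, 1.05, 8.0]  # toy deseasonalized values
flag_z = flag(z_scores(anomalies), 3.0)
flag_mz = flag(modified_z_scores(anomalies), 3.5)
```

With only eight values the classical z-score cannot exceed (n − 1)/√n ≈ 2.47, so the spike escapes a threshold of 3 (`flag_z` is empty), while the median-based score flags it.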

Unsupervised Sequential Outlier Detection With Deep Architectures.

PubMed

Lu, Weining; Cheng, Yu; Xiao, Cao; Chang, Shiyu; Huang, Shuai; Liang, Bin; Huang, Thomas

2017-09-01

Unsupervised outlier detection is a vital task with high impact on a wide variety of application domains, such as image analysis and video surveillance. It has long attracted attention and has been extensively studied in multiple research areas. Detecting and acting on outliers as quickly as possible is imperative in order to protect networks and related stakeholders or to maintain the reliability of critical systems. However, outlier detection is difficult due to its one-class nature and challenges in feature construction. Sequential anomaly detection is even harder, with additional challenges from temporal correlation in the data, as well as the presence of noise and high dimensionality. In this paper, we introduce a novel deep structured framework to solve the challenging sequential outlier detection problem. We use autoencoder models to capture the intrinsic difference between outliers and normal instances and integrate the models into recurrent neural networks that allow the learning to make use of previous context and make the learners more robust to warping along the time axis. Furthermore, we propose a layerwise training procedure, which significantly simplifies training and hence helps achieve efficient and scalable training. In addition, we investigate a fine-tuning step that updates the full parameter set by incorporating the temporal correlation in the sequence. We further apply our proposed models in systematic experiments on five real-world benchmark data sets. Experimental results demonstrate the effectiveness of our model compared with other state-of-the-art approaches.
Using Innovative Outliers to Detect Discrete Shifts in Dynamics in Group-Based State-Space Models

ERIC Educational Resources Information Center

Chow, Sy-Miin; Hamaker, Ellen L.; Allaire, Jason C.

2009-01-01

Outliers are typically regarded as data anomalies that should be discarded. However, dynamic or "innovative" outliers can be appropriately utilized to capture unusual but substantively meaningful shifts in a system's dynamics. We extend De Jong and Penzer's 1998 approach for representing outliers in single-subject state-space models to a…
Pose estimation for augmented reality applications using genetic algorithm.

PubMed

Yu, Ying Kin; Wong, Kin Hong; Chang, Michael Ming Yuen

2005-12-01

This paper describes a genetic algorithm that tackles the pose-estimation problem in computer vision. Our genetic algorithm can accurately find the rotation and translation of an object when its three-dimensional structure is given. In our implementation, each chromosome encodes both the pose and the indexes of the selected point features of the object. Instead of searching only for the pose, as in existing work, our algorithm simultaneously searches for a set containing the most reliable feature points. This mismatch filtering strategy makes the algorithm more robust in the presence of point mismatches and outliers in the images. Our algorithm has been tested with both synthetic and real data with good results. The accuracy of the recovered pose is compared to that of existing algorithms. Our approach outperformed Lowe's method and two other genetic algorithms in the presence of point mismatches and outliers. In addition, it has been used to estimate the pose of a real object. It is shown that the proposed method is applicable to augmented reality applications.
A tandem regression-outlier analysis of a ligand cellular system for key structural modifications around ligand binding.

PubMed

Lin, Ying-Ting

2013-04-30

A tandem hardware technique is often used for the chemical analysis of a single cell, first isolating and then detecting the target identities. The first part is the separation of the target chemicals from the bulk of the cell; the second part is the actual detection of the important identities. To identify the key structural modifications around ligand binding, the present study aims to develop a counterpart of the tandem technique for cheminformatics. A statistical regression and its outliers act as a computational technique for separation. A PPARγ (peroxisome proliferator-activated receptor gamma) agonist cellular system was subjected to such an investigation. Results show that this tandem regression-outlier analysis, or the prioritization of the context equations tagged with features of the outliers, is an effective regression technique in cheminformatics for detecting key structural modifications, as well as their tendency of impact on ligand binding. The key structural modifications around ligand binding are effectively extracted or characterized from the cellular reactions. This is because molecular binding is the paramount factor in such a ligand cellular system, and key structural modifications around ligand binding are expected to create outliers. Therefore, such outliers can be captured by this tandem regression-outlier analysis.
[Research on outlier detection methods for determination of oil yield in oil shales using near-infrared spectroscopy].

PubMed

Zhang, Huai-zhu; Lin, Jun; Zhang, Huai-Zhu

2014-06-01

In the present paper, outlier detection methods for the determination of oil yield in oil shale using near-infrared (NIR) diffuse reflection spectroscopy were studied. During quantitative analysis with near-infrared spectroscopy, both environmental changes and operator errors produce outliers. The presence of outliers affects the overall distribution trend of the samples and decreases predictive capability. Thus, the detection of outliers is important for constructing high-quality calibration models. Methods including principal component analysis-Mahalanobis distance (PCA-MD) and resampling by half-means (RHM) were applied to the discrimination and elimination of outliers in this work. The thresholds for MD and the confidence levels for RHM were optimized using the performance of partial least squares (PLS) models constructed after the elimination of outliers. Compared with the model constructed with the full-spectrum data, the RMSEP values of the models constructed after applying PCA-MD with a threshold equal to the mean plus one standard deviation of MD, RHM with a confidence level of 85%, and the combination of PCA-MD and RHM were reduced by 48.3%, 27.5% and 44.8%, respectively. The predictive ability of the calibration model was effectively improved.
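PCA scores of different components are uncorrelated, so in score space the Mahalanobis distance reduces to a variance-scaled Euclidean distance. The sketch assumes the PCA scores have already been computed elsewhere and applies the threshold described above (mean plus one standard deviation of MD); the function name and toy scores are illustrative.

```python
import math
import statistics


def pca_md_outliers(scores):
    """Flag samples whose Mahalanobis distance in PCA-score space exceeds mean + 1 SD.

    scores: one list of principal-component scores per sample (computed elsewhere).
    Because the components are uncorrelated, each one is simply scaled by its variance.
    """
    n_comp = len(scores[0])
    columns = [[s[k] for s in scores] for k in range(n_comp)]
    means = [statistics.mean(c) for c in columns]
    variances = [statistics.variance(c) for c in columns]
    md = [
        math.sqrt(sum((s[k] - means[k]) ** 2 / variances[k] for k in range(n_comp)))
        for s in scores
    ]
    threshold = statistics.mean(md) + statistics.stdev(md)
    return md, threshold, [i for i, d in enumerate(md) if d > threshold]


toy_scores = [(1, 0), (-1, 0), (0, 1), (0, -1),
              (0.5, 0.5), (-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5),
              (6, 0), (-6, 0)]
md, threshold, flagged = pca_md_outliers(toy_scores)
```

Note that the samples at (0, ±1) get a larger distance than those at (±1, 0) because the second component has the smaller variance.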
A post-processing algorithm for time domain pitch trackers

NASA Astrophysics Data System (ADS)

Specker, P.

1983-01-01

This paper describes a powerful post-processing algorithm for time-domain pitch trackers. On two successive passes, the post-processing algorithm eliminates errors produced during a first pass by a time-domain pitch tracker. During the second pass, incorrect pitch values are detected as outliers by computing the distribution of values over a sliding 80 msec window. During the third pass (based on artificial intelligence techniques), remaining pitch pulses are used as anchor points to reconstruct the pitch train from the original waveform. The algorithm produced a decrease in the error rate from 21% obtained with the original time domain pitch tracker to 2% for isolated words and sentences produced in an office environment by 3 male and 3 female talkers. In a noisy computer room errors decreased from 52% to 2.9% for the same stimuli produced by 2 male talkers. The algorithm is efficient, accurate, and resistant to noise. The fundamental frequency micro-structure is tracked sufficiently well to be used in extracting phonetic features in a feature-based recognition system.
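The second-pass check can be sketched as a sliding-window comparison. The abstract only says that the distribution of values over an 80 msec window is used; the window median, the 20% relative tolerance and the 10 ms frame step below are assumptions.

```python
import statistics


def flag_pitch_outliers(pitch_hz, frame_ms=10.0, window_ms=80.0, rel_tol=0.2):
    """Flag frames whose F0 deviates from the median of a sliding window.

    The window spans about window_ms of frames centred on each frame; a value is
    flagged when it differs from the window median by more than rel_tol * median.
    """
    half = max(1, int(round(window_ms / frame_ms / 2)))
    flags = []
    for i, f0 in enumerate(pitch_hz):
        window = pitch_hz[max(0, i - half): i + half + 1]
        med = statistics.median(window)
        flags.append(abs(f0 - med) > rel_tol * med)
    return flags


contour = [120, 121, 119, 240, 120, 122, 121, 60, 120, 119]  # toy F0 track in Hz
flags = flag_pitch_outliers(contour)
```

Octave jumps (doubling or halving of F0) are a common failure of time-domain trackers, and both are flagged in the toy contour (frames 3 and 7).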
Near-real time 3D probabilistic earthquakes locations at Mt. Etna volcano

NASA Astrophysics Data System (ADS)

Barberi, G.; D'Agostino, M.; Mostaccio, A.; Patane', D.; Tuve', T.

2012-04-01

An automatic procedure for locating earthquakes in quasi-real time must provide a good estimate of earthquake locations within a few seconds after the event is first detected, and such a procedure is strongly needed for seismic warning systems. The reliability of an automatic location algorithm is influenced by several factors, such as errors in picking seismic phases, network geometry, and velocity model uncertainties. On Mt. Etna, the seismic network is managed by INGV, and the quasi-real time earthquake locations are computed using an automatic-picking algorithm based on short-term-average to long-term-average ratios (STA/LTA), calculated from an approximate squared envelope function of the seismogram, which furnishes a list of P-wave arrival times, together with the location algorithm Hypoellipse and a 1D velocity model. The main purpose of this work is to investigate the performance of a different automatic procedure to improve the quasi-real time earthquake locations. In fact, as the automatic data processing may be affected by outliers (wrong picks), traditional earthquake location techniques based on a least-squares misfit function (L2-norm) often yield unstable and unreliable solutions. Moreover, on Mt. Etna, the 1D model is often unable to represent the complex structure of the volcano (in particular the strong lateral heterogeneities), whereas the increasing accuracy of the 3D velocity models of Mt. Etna in recent years allows their use today in routine earthquake locations. Therefore, we selected as reference locations all the events that occurred on Mt. Etna in the last year (2011) and were automatically detected and located by means of the Hypoellipse code. Using this dataset (more than 300 events), we applied a nonlinear probabilistic earthquake location algorithm using the Equal Differential Time (EDT) likelihood function (Font et al., 2004; Lomax, 2005), which is much more robust in the presence of outliers in the data. 
Subsequently, by using a probabilistic nonlinear method (NonLinLoc, Lomax, 2001) and the 3D velocity model, derived from the one developed by Patanè et al. (2006) integrated with that obtained by Chiarabba et al. (2004), we obtained the best possible constraint on the location of the foci, expressed as a probability density function (PDF) for the hypocenter location in 3D space. As expected, the results, compared with the reference ones, show that the NonLinLoc software (applied to a 3D velocity model) is more reliable than the Hypoellipse code (applied to layered 1D velocity models), leading to more reliable automatic locations even when outliers are present.
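The robustness of an EDT-type likelihood comes from working with differences of arrival times between station pairs: the origin time cancels, and a wrong pick only affects the pairs that contain it. The toy 2D grid search below contrasts such a score with an L2 misfit; the homogeneous velocity, equal pick uncertainty sigma and unnormalized Gaussian kernel are simplifying assumptions, not the NonLinLoc implementation, and the station layout is invented.

```python
import itertools
import math


def travel_time(src, station, v):
    """Straight-ray travel time in a homogeneous medium with speed v (km/s)."""
    return math.dist(src, station) / v


def edt_score(src, stations, picks, v, sigma):
    """EDT-style score: Gaussian kernel on differential-time residuals over station pairs."""
    total = 0.0
    for a, b in itertools.combinations(range(len(stations)), 2):
        observed = picks[a] - picks[b]  # origin time cancels in the difference
        predicted = travel_time(src, stations[a], v) - travel_time(src, stations[b], v)
        total += math.exp(-((observed - predicted) ** 2) / (2.0 * sigma ** 2))
    return total


def l2_misfit(src, stations, picks, v):
    """Least-squares (L2) misfit, with the origin time set to the mean residual."""
    tts = [travel_time(src, s, v) for s in stations]
    t0 = sum(p - t for p, t in zip(picks, tts)) / len(picks)
    return sum((p - t0 - t) ** 2 for p, t in zip(picks, tts))


def locate(stations, picks, v, sigma, grid):
    """Grid search: return (best EDT location, best L2 location)."""
    best_edt = max(grid, key=lambda g: edt_score(g, stations, picks, v, sigma))
    best_l2 = min(grid, key=lambda g: l2_misfit(g, stations, picks, v))
    return best_edt, best_l2


stations = [(0, 0), (20, 0), (0, 20), (20, 20), (10, -5), (-5, 10)]  # km
true_src, v, t0 = (8, 12), 3.0, 1.0
picks = [t0 + travel_time(true_src, s, v) for s in stations]
picks[2] += 2.0  # one wrong pick (outlier)
grid = [(x, y) for x in range(-5, 26) for y in range(-5, 26)]
best_edt, best_l2 = locate(stations, picks, v, sigma=0.1, grid=grid)
```

Comparing `best_l2` with `true_src` shows whether, and how far, the least-squares solution is pulled by the single wrong pick in this configuration.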
Cloud Screening and Quality Control Algorithm for Star Photometer Data: Assessment with Lidar Measurements and with All-sky Images

NASA Technical Reports Server (NTRS)

Ramirez, Daniel Perez; Lyamani, H.; Olmo, F. J.; Whiteman, D. N.; Navas-Guzman, F.; Alados-Arboledas, L.

2012-01-01

This paper presents the development and setup of a cloud screening and data quality control algorithm for a star photometer that uses a CCD camera as its detector. These algorithms are necessary for passive remote sensing techniques to retrieve the columnar aerosol optical depth, δAe(λ), and the precipitable water vapor content, W, at nighttime. The cloud screening procedure consists of calculating moving averages of δAe(λ) and W over different time windows, combined with a procedure for detecting outliers. Additionally, to avoid undesirable δAe(λ) and W fluctuations caused by atmospheric turbulence, the data are averaged over 30 min. The algorithm is applied to the star photometer deployed in the city of Granada (37.16° N, 3.60° W, 680 m a.s.l.; south-east Spain) for measurements acquired between March 2007 and September 2009. The algorithm is evaluated with correlative measurements registered by a lidar system and also with all-sky images obtained at sunset and sunrise of the previous and following days. Promising results are obtained in detecting cloud-affected data. Additionally, the cloud screening algorithm has been evaluated under different aerosol conditions, including Saharan dust intrusion, biomass burning and pollution events.
SU-E-J-85: Leave-One-Out Perturbation (LOOP) Fitting Algorithm for Absolute Dose Film Calibration

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chu, A; Ahmad, M; Chen, Z

2014-06-01

Purpose: To introduce an outlier-recognition fitting routine for film dosimetry. It is not only flexible with any linear or non-linear regression but can also provide information on the minimal number of sampling points, critical sampling distributions and the evaluation of analytical functions for absolute film-dose calibration. Methods: The leave-one-out (LOO) cross-validation technique is often used for statistical analyses of model performance. We used LOO analyses with perturbed bootstrap fitting, called leave-one-out perturbation (LOOP), for film-dose calibration. Given a threshold, the LOO process detects unfit points (“outliers”) compared to the other cohorts, and a bootstrap fitting process follows to seek any possibilities of using perturbations for further improvement. After that, outliers were reconfirmed by a traditional t-test and eliminated, and another LOOP feedback pass produced the final result. An over-sampled film-dose calibration dataset was collected as a reference (dose range: 0–800 cGy), and various simulated conditions for outliers and sampling distributions were derived from the reference. Comparisons over the various conditions were made, and the performance of the fitting functions, polynomial and rational functions, was evaluated. Results: (1) LOOP demonstrates sensitive outlier recognition through its statistical correlation with an exceptionally better goodness-of-fit when outliers are left out. (2) With sufficient statistical information, LOOP can correct outliers under some low-sampling conditions that other “robust fits”, e.g. Least Absolute Residuals, cannot. (3) Complete cross-validated analyses of LOOP indicate that the rational-type function performs much better than the polynomial. Even with 5 data points including one outlier, using LOOP with a rational function can restore the value to more than 95% of its reference value, while the polynomial fitting completely failed under the same conditions. 
Conclusion: LOOP can cooperate with any fitting routine, functioning as a “robust fit”. In addition, it can be set as a benchmark for film-dose calibration fitting performance.
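The leave-one-out step of LOOP can be sketched as refitting without each point and checking the held-out prediction error against a threshold. A straight-line fit stands in here for the polynomial and rational calibration functions of the study, the bootstrap perturbation and t-test reconfirmation are omitted, and the data are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (intercept, slope)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
    return y_mean - slope * x_mean, slope


def loo_outliers(xs, ys, threshold):
    """Leave-one-out: refit without each point and flag large held-out errors."""
    errors = []
    for i in range(len(xs)):
        b0, b1 = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errors.append(ys[i] - (b0 + b1 * xs[i]))
    return errors, [i for i, e in enumerate(errors) if abs(e) > threshold]


doses = [0, 1, 2, 3, 4, 5]         # hypothetical dose levels
responses = [0, 2, 4, 13, 8, 10]   # point 3 should be about 6
errors, flagged = loo_outliers(doses, responses, threshold=5.0)
```

Leaving out point 3 makes the remaining points fit a line exactly, so its held-out error (about 7) stands well apart from the others; that contrast is what the LOO stage exploits.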
Method for outlier detection: a tool to assess the consistency between laboratory data and ultraviolet-visible absorbance spectra in wastewater samples.

PubMed

Zamora, D; Torres, A

2014-01-01

Reliable estimations of the evolution of water quality parameters by using in situ technologies make it possible to follow the operation of a wastewater treatment plant (WWTP), as well as improving the understanding and control of the operation, especially in the detection of disturbances. However, ultraviolet (UV)-Vis sensors have to be calibrated by means of a local fingerprint laboratory reference concentration-value data-set. The detection of outliers in these data-sets is therefore important. This paper presents a method for detecting outliers in UV-Vis absorbances coupled to water quality reference laboratory concentrations for samples used for calibration purposes. Application to samples from the influent of the San Fernando WWTP (Medellín, Colombia) is shown. After the removal of outliers, improvements in the predictability of the influent concentrations using absorbance spectra were found.
Hybrid online sensor error detection and functional redundancy for systems with time-varying parameters.

PubMed

Feng, Jianyuan; Turksoy, Kamuran; Samadi, Sediqeh; Hajizadeh, Iman; Littlejohn, Elizabeth; Cinar, Ali

2017-12-01

Supervision and control systems rely on signals from sensors to receive information, monitor the operation of a system and adjust manipulated variables to achieve the control objective. However, sensor performance is often limited by working conditions, and sensors may also be subject to interference from other devices. Many different types of sensor errors, such as outliers, missing values, drifts and corruption with noise, may occur during process operation. A hybrid online sensor error detection and functional redundancy system is developed to detect errors in online signals and replace erroneous or missing values with model-based estimates. The proposed hybrid system relies on two techniques, an outlier-robust Kalman filter (ORKF) and a locally-weighted partial least squares (LW-PLS) regression model, which leverage the advantages of automatic measurement error elimination with ORKF and data-driven prediction with LW-PLS. The system includes a nominal angle analysis (NAA) method to distinguish between signal faults and large changes in sensor values caused by real dynamic changes in process operation. The performance of the system is illustrated with clinical data from continuous glucose monitoring (CGM) sensors of people with type 1 diabetes. More than 50,000 CGM sensor errors were added to the original CGM signals from 25 clinical experiments, and the performance of the error detection and functional redundancy algorithms was then analyzed. The results indicate that the proposed system can successfully detect most of the erroneous signals and substitute them with reasonable estimated values computed by the functional redundancy system.
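The ORKF itself is not reproduced here. As a simpler stand-in that shows the same division of labour (reject an erroneous sample, substitute a model-based estimate), the sketch below gates a scalar Kalman filter on its innovation and keeps the prediction for missing or outlying samples; the random-walk model, noise variances, 3-sigma gate and toy CGM values are assumptions.

```python
import math


def gated_kalman(measurements, q, r, gate=3.0, x0=None, p0=1.0):
    """Scalar random-walk Kalman filter with innovation gating.

    A measurement that is missing (None) or whose innovation exceeds `gate`
    standard deviations is not used; the model prediction is kept instead.
    Returns (estimates, flags), where flags[i] is True for replaced samples.
    """
    x = measurements[0] if x0 is None else x0  # first sample must be valid if x0 is None
    p = p0
    estimates, flags = [], []
    for z in measurements:
        p = p + q  # predict: random walk keeps x, inflates variance
        s = p + r  # innovation variance
        if z is None or abs(z - x) > gate * math.sqrt(s):
            flags.append(True)  # missing or outlying: substitute the prediction
        else:
            k = p / s  # Kalman gain
            x = x + k * (z - x)
            p = (1.0 - k) * p
            flags.append(False)
        estimates.append(x)
    return estimates, flags


cgm = [100, 101, 102, 103, 180, 104, None, 106, 107]  # mg/dL, toy values
estimates, flags = gated_kalman(cgm, q=1.0, r=4.0)
```

A gate like this cannot tell a sensor fault from a genuine rapid change in glucose; that distinction is what the NAA step in the abstract addresses.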
TreeShrink: fast and accurate detection of outlier long branches in collections of phylogenetic trees.

PubMed

Mai, Uyen; Mirarab, Siavash

2018-05-08

Sequence data used in reconstructing phylogenetic trees may include various sources of error. Typically errors are detected at the sequence level, but when missed, the erroneous sequences often appear as unexpectedly long branches in the inferred phylogeny. We propose an automatic method to detect such errors. We build a phylogeny including all the data then detect sequences that artificially inflate the tree diameter. We formulate an optimization problem, called the k-shrink problem, that seeks to find k leaves that could be removed to maximally reduce the tree diameter. We present an algorithm to find the exact solution for this problem in polynomial time. We then use several statistical tests to find outlier species that have an unexpectedly high impact on the tree diameter. These tests can use a single tree or a set of related gene trees and can also adjust to species-specific patterns of branch length. The resulting method is called TreeShrink. We test our method on six phylogenomic biological datasets and an HIV dataset and show that the method successfully detects and removes long branches. TreeShrink removes sequences more conservatively than rogue taxon removal and often reduces gene tree discordance more than rogue taxon removal once the amount of filtering is controlled. TreeShrink is an effective method for detecting sequences that lead to unrealistically long branch lengths in phylogenetic trees. The tool is publicly available at https://github.com/uym2/TreeShrink .
An online outlier identification and removal scheme for improving fault detection performance.

PubMed

Ferdowsi, Hasan; Jagannathan, Sarangapani; Zawodniok, Maciej

2014-05-01

Measured data or states for a nonlinear dynamic system are usually contaminated by outliers. Identifying and removing outliers makes the data (or system states) more trustworthy and reliable, since outliers in the measured data (or states) can cause missed or false alarms during fault diagnosis. In addition, faults can make the system states nonstationary, requiring a novel analytical model-based fault detection (FD) framework. In this paper, an online outlier identification and removal (OIR) scheme is proposed for a nonlinear dynamic system. Since the dynamics of the system can experience unknown changes due to faults, traditional observer-based techniques cannot be used to remove the outliers. The OIR scheme uses a neural network (NN) to estimate the actual system states from measured system states involving outliers. With this method, outlier detection is performed online at each time instant by finding the difference between the estimated and the measured states and comparing its median with its standard deviation over a moving time window. The NN weight update law in OIR is designed such that the detected outliers have no effect on the state estimation, which is subsequently used for model-based fault diagnosis. In addition, since the OIR estimator cannot distinguish between faulty and healthy operating conditions, a separate model-based observer is designed for fault diagnosis, which uses the OIR scheme as a preprocessing unit to improve FD performance. The stability analysis of both the OIR and fault diagnosis schemes is presented. Finally, a three-tank benchmark system and a simple linear system are used to verify the proposed scheme in simulations, and the scheme is then applied to an axial piston pump testbed. The scheme can be applied to nonlinear systems whose dynamics and underlying distribution of states are subject to change due to both unknown faults and operating conditions.
An Adaptive Buddy Check for Observational Quality Control

NASA Technical Reports Server (NTRS)

Dee, Dick P.; Rukhovets, Leonid; Todling, Ricardo; DaSilva, Arlindo M.; Larson, Jay W.; Einaudi, Franco (Technical Monitor)

2000-01-01

An adaptive buddy check algorithm is presented that adjusts tolerances for outlier observations based on the variability of surrounding data. The algorithm derives from a statistical hypothesis test combined with maximum-likelihood covariance estimation. Its stability is shown to depend on the initial identification of outliers by a simple background check. The adaptive feature ensures that the final quality control decisions are not very sensitive to the prescribed statistics of first-guess and observation errors, nor to other approximations introduced into the algorithm. The implementation of the algorithm in a global atmospheric data assimilation system is described. Its performance is contrasted with that of a non-adaptive buddy check for the surface analysis of an extreme storm that took place in Europe on 27 December 1999. The adaptive algorithm allowed the inclusion of many important observations that differed greatly from the first guess and that would have been excluded on the basis of prescribed statistics. As a result of these additional observations, the analysis of the storm development was much improved.
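The two-stage idea (a background check first, then a buddy check whose tolerance widens with local variability) can be sketched as follows. This is a simplified illustration, not the algorithm of Dee et al. The maximum-likelihood covariance estimation is replaced by the sample variance of neighbouring departures. The names `radius`, `sigma`, `k_bg` and `k` are hypothetical parameters.

```python
import math
import statistics

def adaptive_buddy_check(positions, departures, radius, sigma, k_bg=5.0, k=3.0):
    """Return one boolean per observation: True = accepted.

    positions: (x, y) tuples; departures: observation minus first guess.
    sigma: prescribed departure standard deviation (hypothetical input).
    """
    n = len(departures)
    # Stage 1: a simple background check removes gross errors first.
    passed_bg = [abs(d) <= k_bg * sigma for d in departures]
    accepted = []
    for i in range(n):
        if not passed_bg[i]:
            accepted.append(False)
            continue
        buddies = [departures[j] for j in range(n)
                   if j != i and passed_bg[j]
                   and math.dist(positions[i], positions[j]) <= radius]
        if len(buddies) < 2:
            accepted.append(True)  # too few buddies: keep the stage-1 decision
            continue
        mean_b = statistics.fmean(buddies)
        local_var = statistics.variance(buddies)
        # Stage 2: the tolerance widens where neighbouring departures vary more.
        tol = k * math.sqrt(sigma ** 2 + local_var)
        accepted.append(abs(departures[i] - mean_b) <= tol)
    return accepted
```

Because the tolerance grows with `local_var`, a coherent group of large departures (as in a storm) can survive even though each departure is large relative to `sigma`.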
Detecting outliers when fitting data with nonlinear regression – a new method based on robust nonlinear regression and the false discovery rate

PubMed Central

Motulsky, Harvey J; Brown, Ronald E

2006-01-01

Background Nonlinear regression, like linear regression, assumes that the scatter of data around the ideal curve follows a Gaussian or normal distribution. This assumption leads to the familiar goal of regression: to minimize the sum of the squares of the vertical or Y-value distances between the points and the curve. Outliers can dominate the sum-of-the-squares calculation and lead to misleading results. However, we know of no practical method for routinely identifying outliers when fitting curves with nonlinear regression. Results We describe a new method for identifying outliers when fitting data with nonlinear regression. We first fit the data using a robust form of nonlinear regression, based on the assumption that scatter follows a Lorentzian distribution. We devised a new adaptive method that gradually becomes more robust as the method proceeds. To define outliers, we adapted the false discovery rate approach to handling multiple comparisons. We then remove the outliers and analyze the data using ordinary least-squares regression. Because the method combines robust regression and outlier removal, we call it the ROUT method. When analyzing simulated data, where all scatter is Gaussian, our method (falsely) detects one or more outliers in only about 1–3% of experiments. When analyzing data contaminated with one or several outliers, the ROUT method performs well at outlier identification, with an average False Discovery Rate less than 1%. Conclusion Our method, which combines a new method of robust nonlinear regression with a new method of outlier identification, identifies outliers from nonlinear curve fits with reasonable power and few false positives. PMID:16526949
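The false-discovery-rate step can be sketched as follows, assuming the robust (Lorentzian) fit has already been done and only its residuals are passed in. Two simplifications are deliberate and differ from ROUT as published:

- the robust scale is the 68.27th percentile of the absolute residuals, without a degrees-of-freedom correction;
- p-values use a normal approximation instead of a t distribution.

`rout_like_outliers` and `q` are hypothetical names.

```python
import math

def rout_like_outliers(residuals, q=0.01):
    """Flag outliers among robust-fit residuals with the Benjamini-Hochberg procedure."""
    n = len(residuals)
    abs_res = sorted(abs(r) for r in residuals)
    # Nearest-rank 68.27th percentile of |residual| as a robust SD estimate.
    idx = min(n - 1, max(0, math.ceil(0.6827 * n) - 1))
    scale = abs_res[idx] or 1e-12
    # Two-sided normal tail probability: P(|Z| > z) = erfc(z / sqrt(2)).
    pvals = [math.erfc(abs(r) / scale / math.sqrt(2)) for r in residuals]
    # Benjamini-Hochberg: largest rank k with p_(k) <= q * k / n.
    order = sorted(range(n), key=lambda i: pvals[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / n:
            cutoff_rank = rank
    flagged = set(order[:cutoff_rank])
    return [i in flagged for i in range(n)]
```

After flagging, the abstract's final step (refitting the remaining points by ordinary least squares) would follow.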
Fast clustering using adaptive density peak detection.

PubMed

Wang, Xiao-Feng; Xu, Yifan

2017-12-01

Common limitations of clustering methods include slow algorithm convergence, instability caused by the pre-specification of a number of intrinsic parameters, and a lack of robustness to outliers. A recent clustering approach proposed a fast search algorithm for cluster centers based on their local densities. However, the selection of the key intrinsic parameters in that algorithm was not systematically investigated. It is relatively difficult to estimate the "optimal" parameters, since the original definition of the local density in the algorithm is based on a truncated counting measure. In this paper, we propose a clustering procedure with adaptive density peak detection, in which the local density is estimated through nonparametric multivariate kernel estimation. The model parameter can then be calculated from equations with statistical theoretical justification. We also develop an automatic cluster centroid selection method by maximizing an average silhouette index. The advantage and flexibility of the proposed method are demonstrated through simulation studies and the analysis of a few benchmark gene expression data sets. The method runs in a single step without any iteration, so it is fast and has great potential for big data analysis. A user-friendly R package, ADPclust, has been developed for public use.
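The density-peak idea works as follows. Each point gets a local density rho and a distance delta to the nearest denser point, and centres are points where both are large. A minimal sketch with a Gaussian kernel density is below. The bandwidth is fixed by hand and centres are chosen by the largest rho*delta. Both are simplifying assumptions: ADPclust derives the bandwidth from data and selects centres by silhouette. This is not the package's API.

```python
import math

def density_peaks(points, bandwidth=1.0, n_centers=2):
    """Return (centers, labels) for density-peak clustering (simplified sketch)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Gaussian-kernel local density (unnormalised; only the ranking matters).
    rho = [sum(math.exp(-0.5 * (dist[i][j] / bandwidth) ** 2) for j in range(n))
           for i in range(n)]
    order = sorted(range(n), key=lambda i: rho[i], reverse=True)
    delta = [0.0] * n
    nearest_higher = [-1] * n
    for pos, i in enumerate(order):
        if pos == 0:
            delta[i] = max(dist[i])  # density maximum gets the largest delta
        else:
            j = min(order[:pos], key=lambda j: dist[i][j])
            nearest_higher[i], delta[i] = j, dist[i][j]
    # The density maximum always has the largest rho * delta, so it is a centre.
    gamma = [rho[i] * delta[i] for i in range(n)]
    centers = sorted(range(n), key=lambda i: gamma[i], reverse=True)[:n_centers]
    labels = [-1] * n
    for c_idx, c in enumerate(centers):
        labels[c] = c_idx
    # Non-centres inherit the label of their nearest denser neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return centers, labels
```

A single pass over points sorted by density assigns every label, which matches the abstract's point that no iteration is needed.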
[Outlier sample discriminating methods for building calibration model in melons quality detecting using NIR spectra].

PubMed

Tian, Hai-Qing; Wang, Chun-Guang; Zhang, Hai-Jun; Yu, Zhi-Hong; Li, Jian-Kang

2012-11-01

Outlier samples strongly influence the precision of the calibration model for measuring the soluble solids content of melons using NIR spectra. According to the possible sources of outlier samples, three methods (predicted concentration residual test; Chauvenet test; leverage and studentized residual test) were used to discriminate these outliers. Nine suspicious outliers were detected in a calibration set of 85 fruit samples. Because the 9 suspicious samples might include some that were not true outliers, they were returned to the model one by one to see whether they affected the model and its prediction precision. In this way, 5 samples that were helpful to the model rejoined the calibration set, and a new model was developed with a correlation coefficient (r) of 0.889 and a root mean square error of calibration (RMSEC) of 0.6010 degrees Brix. For 35 unknown samples, the root mean square error of prediction (RMSEP) was 0.854 degrees Brix. This model performed better than the one developed without eliminating any outliers from the calibration set (r = 0.797, RMSEC = 0.849 degrees Brix, RMSEP = 1.19 degrees Brix), and it was more representative and stable than the one developed with all 9 samples eliminated from the calibration set (r = 0.892, RMSEC = 0.605 degrees Brix, RMSEP = 0.862 degrees Brix).
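Of the three screening methods listed, the Chauvenet test is the simplest to show in code. The sketch below is a single-pass version that assumes approximately normal values. A value is suspect when the expected number of samples at least as extreme, N * P(|Z| >= z), falls below 0.5. The paper's procedure of reclaiming suspects one by one is not reproduced.

```python
import math
import statistics

def chauvenet_suspects(values):
    """Return indices that fail Chauvenet's criterion (single pass, normality assumed)."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    suspects = []
    for i, v in enumerate(values):
        z = abs(v - mean) / sd
        tail = math.erfc(z / math.sqrt(2))  # two-sided P(|Z| >= z)
        if n * tail < 0.5:
            suspects.append(i)
    return suspects
```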
Standard and Robust Methods in Regression Imputation

ERIC Educational Resources Information Center

Moraveji, Behjat; Jafarian, Koorosh

2014-01-01

The aim of this paper is to provide an introduction to new imputation algorithms for estimating missing values from official statistics in larger data sets of data pre-processing, or outliers. The goal is to propose a new algorithm called IRMI (iterative robust model-based imputation). This algorithm is able to deal with all challenges like…
Real Time Search Algorithm for Observation Outliers During Monitoring Engineering Constructions

NASA Astrophysics Data System (ADS)

Latos, Dorota; Kolanowski, Bogdan; Pachelski, Wojciech; Sołoducha, Ryszard

2017-12-01

Real-time monitoring of engineering structures in an emergency or disaster requires collecting a large amount of data to be processed by specific analytical techniques. A quick and accurate assessment of the state of the object is crucial for any rescue action. One of the more significant methods for evaluating large data sets, whether collected over a specified interval of time or permanently, is time series analysis. This paper presents a search algorithm for time series elements that deviate from the values expected during monitoring. Quick and proper detection of observations indicating anomalous behavior of the structure allows a variety of preventive actions to be taken. The mathematical formulae used in the algorithm provide maximal sensitivity to even minimal changes in the object's behavior. Sensitivity analyses were conducted for the moving average algorithm as well as for the Douglas-Peucker algorithm used in the generalization of linear objects in GIS. In addition to determining the size of deviations from the average, the so-called Hausdorff distance was used. Simulations and verification with laboratory survey data showed that the approach provides sufficient sensitivity for automatic real-time analysis of large amounts of data obtained from various sensors (total stations, leveling, camera, radar).
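Two of the building blocks named above are sketched here. The first is the deviation of each sample from a trailing moving average. The second is the Hausdorff distance between two monitored shapes. The window length is a hypothetical parameter, and the Hausdorff function compares vertex sets only, not continuous polylines. The paper's exact formulae are not given in the abstract.

```python
import math

def moving_average_deviations(series, window=5):
    """Deviation of each sample from the mean of the preceding `window` samples."""
    devs = []
    for t in range(window, len(series)):
        expected = sum(series[t - window:t]) / window
        devs.append(series[t] - expected)
    return devs

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two finite point sets."""
    def directed(p_set, q_set):
        # Worst-case distance from a point in p_set to its closest point in q_set.
        return max(min(math.dist(p, q) for q in q_set) for p in p_set)
    return max(directed(a, b), directed(b, a))
```

A large moving-average deviation flags a single anomalous reading. A growing Hausdorff distance between a reference profile and the current epoch's profile flags a change in the whole shape.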
Enhancement of partial robust M-regression (PRM) performance using Bisquare weight function

NASA Astrophysics Data System (ADS)

Mohamad, Mazni; Ramli, Norazan Mohamed; Ghani@Mamat, Nor Azura Md; Ahmad, Sanizah

2014-09-01

Partial Least Squares (PLS) regression is a popular regression technique for handling multicollinearity in low- and high-dimensional data; it fits a linear relationship between sets of explanatory and response variables. Several robust PLS methods have been proposed because classical PLS algorithms are easily affected by the presence of outliers. A recent one is partial robust M-regression (PRM). Unfortunately, the monotone weighting function used in the PRM algorithm fails to assign appropriate weights to large outliers according to their severity. Thus, in this paper, a modified partial robust M-regression is introduced to enhance the performance of the original PRM. A redescending weight function, known as the Bisquare weight function, is recommended to replace the Fair function in the PRM. A simulation study is performed to assess the performance of the modified PRM, and its efficiency is tested on both contaminated and uncontaminated simulated data under various percentages of outliers, sample sizes and numbers of predictors.
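The difference between a monotone and a redescending weight function is easiest to see side by side. In the sketch below:

- the Fair weight is written in the form commonly given for PRM;
- the Bisquare weight is the standard Tukey form;
- the tuning constants used in any call (e.g. 4 and 4.685) are illustrative, not the paper's settings.

```python
def fair_weight(r, c):
    """Fair weight: decreases monotonically with |r| but never reaches zero."""
    return 1.0 / (1.0 + abs(r / c)) ** 2

def bisquare_weight(r, c):
    """Tukey bisquare weight: redescending, exactly zero once |r| >= c."""
    u = r / c
    return (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0
```

For a very large residual, `fair_weight` still returns a small positive value, so a severe outlier keeps some influence. `bisquare_weight` returns exactly 0.0 and removes it.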

A Fast Density-Based Clustering Algorithm for Real-Time Internet of Things Stream

PubMed Central

Ying Wah, Teh

2014-01-01

Data streams are continuously generated over time by Internet of Things (IoT) devices. The faster this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based methods are a prominent class of data stream clustering methods. They can detect arbitrarily shaped clusters and handle outliers, and they do not need the number of clusters in advance. Therefore, density-based clustering is a suitable choice for clustering IoT streams. Several density-based algorithms have recently been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method is fast enough for real-time IoT applications. Experimental results show that the proposed approach obtains high-quality results with low computation time on real and synthetic datasets. PMID:25110753
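The three properties claimed for density-based clustering (arbitrary shapes, outlier handling, no preset cluster count) can be seen in classic batch DBSCAN, sketched below. This is the generic algorithm on a static point set. It is not the paper's streaming method, and `eps` and `min_pts` are the usual DBSCAN parameters chosen by hand.

```python
import math

def dbscan(points, eps, min_pts):
    """Label points with cluster ids 0, 1, ... or -1 for noise (outliers)."""
    n = len(points)
    labels = [None] * n  # None = not yet visited
    # Neighbourhoods include the point itself, as in the standard definition.
    neighbours = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # j is a core point: keep expanding
    return labels
```

The number of clusters emerges from the data, and isolated points stay labelled -1. The quadratic neighbour search is what streaming variants such as the one described above aim to avoid.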
A fast density-based clustering algorithm for real-time Internet of Things stream.

PubMed

Amini, Amineh; Saboohi, Hadi; Wah, Teh Ying; Herawan, Tutut

2014-01-01

Data streams are continuously generated over time by Internet of Things (IoT) devices. The faster this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based methods are a prominent class of data stream clustering methods. They can detect arbitrarily shaped clusters and handle outliers, and they do not need the number of clusters in advance. Therefore, density-based clustering is a suitable choice for clustering IoT streams. Several density-based algorithms have recently been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method is fast enough for real-time IoT applications. Experimental results show that the proposed approach obtains high-quality results with low computation time on real and synthetic datasets.
Adaptive vector validation in image velocimetry to minimise the influence of outlier clusters

NASA Astrophysics Data System (ADS)

Masullo, Alessandro; Theunissen, Raf

2016-03-01

The universal outlier detection scheme (Westerweel and Scarano in Exp Fluids 39:1096-1100, 2005) and the distance-weighted universal outlier detection scheme for unstructured data (Duncan et al. in Meas Sci Technol 21:057002, 2010) are the most common PIV data validation routines. However, such techniques rely on a spatial comparison of each vector with those in a fixed-size neighbourhood, and their performance consequently suffers in the presence of clusters of outliers. This paper proposes an advancement that renders outlier detection more robust while reducing the probability of mistakenly invalidating correct vectors. Velocity fields undergo a preliminary evaluation in terms of local coherency, which parametrises the extent of the neighbourhood with which each vector is subsequently compared. Such adaptivity is shown to reduce the number of undetected outliers, even when implemented in the aforementioned validation schemes. In addition, the authors present an alternative residual definition that considers vector magnitude and angle and adopts a modified Gaussian-weighted distance-based averaging median. This procedure adapts the degree of acceptable background fluctuations in velocity to the local displacement magnitude. The traditional, extended and recommended validation methods are numerically assessed on flow fields from an isolated vortex, a turbulent channel flow and a DNS simulation of forced isotropic turbulence. The resulting validation method is adaptive, requires no user-defined parameters and is shown to yield the best performance in terms of outlier under- and over-detection. Finally, the novel validation routine is applied to the PIV analysis of experimental studies of the near wake behind a porous disc and of a supersonic jet, illustrating the potential gains in spatial resolution and accuracy.
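The baseline that the paper improves on is the normalised median test with a fixed 3x3 neighbourhood, sketched below for one velocity component on a regular grid. The values `eps=0.1` (pixels) and `threshold=2.0` are the typical settings associated with the universal outlier detection scheme, used here as assumptions. The paper's adaptive neighbourhood size and magnitude-angle residual are not implemented.

```python
import statistics

def universal_outlier_detection(field, eps=0.1, threshold=2.0):
    """Normalised median test on a 2-D grid of one velocity component.

    Uses a fixed 3x3 neighbourhood (8 neighbours, fewer at the edges).
    Returns a grid of booleans, True where the vector is flagged.
    """
    rows, cols = len(field), len(field[0])
    flags = [[False] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            neigh = [field[a][b]
                     for a in range(max(0, i - 1), min(rows, i + 2))
                     for b in range(max(0, j - 1), min(cols, j + 2))
                     if (a, b) != (i, j)]
            u_med = statistics.median(neigh)
            # Median residual of the neighbours normalises the test.
            r_med = statistics.median(abs(u - u_med) for u in neigh)
            residual = abs(field[i][j] - u_med) / (r_med + eps)
            flags[i][j] = residual > threshold
    return flags
```

When several neighbouring vectors are all wrong, `u_med` itself is corrupted. This is the failure mode with outlier clusters that the abstract addresses.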
Evaluation of the MV (CAPON) Coherent Doppler Lidar Velocity Estimator

NASA Technical Reports Server (NTRS)

Lottman, B.; Frehlich, R.

1997-01-01

The performance of the CAPON velocity estimator for coherent Doppler lidar is determined for typical space-based and ground-based parameter regimes. Optimal input parameters for the algorithm were determined for each regime. For weak signals, performance is described by the standard deviation of the good estimates and the fraction of outliers. For strong signals, the fraction of outliers is zero. The computational effort was also determined.
Scalable Robust Principal Component Analysis Using Grassmann Averages.

PubMed

Hauberg, Søren; Feragen, Aasa; Enficiaud, Raffi; Black, Michael J

2016-11-01

In large datasets, manual data verification is impossible, and we must expect the number of outliers to increase with data size. While principal component analysis (PCA) can reduce data size, and scalable solutions exist, it is well-known that outliers can arbitrarily corrupt the results. Unfortunately, state-of-the-art approaches for robust PCA are not scalable. We note that in a zero-mean dataset, each observation spans a one-dimensional subspace, giving a point on the Grassmann manifold. We show that the average subspace corresponds to the leading principal component for Gaussian data. We provide a simple algorithm for computing this Grassmann Average (GA), and show that the subspace estimate is less sensitive to outliers than PCA for general distributions. Because averages can be efficiently computed, we immediately gain scalability. We exploit robust averaging to formulate the Robust Grassmann Average (RGA) as a form of robust PCA. The resulting Trimmed Grassmann Average (TGA) is appropriate for computer vision because it is robust to pixel outliers. The algorithm has linear computational complexity and minimal memory requirements. We demonstrate TGA for background modeling, video restoration, and shadow removal. We show scalability by performing robust PCA on the entire Star Wars IV movie; a task beyond any current method. Source code is available online.
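The Grassmann Average iteration for the leading component can be sketched in a few lines for zero-mean data. Each observation is weighted by its norm, its sign is aligned with the current estimate q, and the sum is renormalised. This is a plain (untrimmed) GA sketch with a hypothetical fixed initial vector. The authors' released source code and the trimmed TGA variant, which replaces the sum with a robust trimmed average, are not reproduced.

```python
import math

def grassmann_average(data, init=(1.0, 1.0), iters=100):
    """Leading-subspace estimate via the Grassmann Average iteration.

    data: zero-mean vectors. Update: q <- normalise(sum_i sign(x_i . q) * x_i),
    i.e. unit directions x_i / ||x_i|| averaged with weights ||x_i||.
    """
    def normalise(v):
        norm = math.sqrt(sum(c * c for c in v))
        return [c / norm for c in v]

    q = normalise(list(init))
    for _ in range(iters):
        acc = [0.0] * len(q)
        for x in data:
            s = 1.0 if sum(a * b for a, b in zip(x, q)) >= 0 else -1.0
            acc = [a + s * c for a, c in zip(acc, x)]
        new_q = normalise(acc)
        if new_q == q:
            break  # sign assignments stopped changing: converged
        q = new_q
    return q
```

Each iteration is a single pass over the data, which is where the linear complexity mentioned in the abstract comes from.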
DigOut: viewing differential expression genes as outliers.

PubMed

Yu, Hui; Tu, Kang; Xie, Lu; Li, Yuan-Yuan

2010-12-01

With regard to well-replicated two-conditional microarray datasets, the selection of differentially expressed (DE) genes is a well-studied computational topic, but for multi-conditional microarray datasets with limited or no replication, the same task has not been properly addressed by previous studies. This paper adopts multivariate outlier analysis to analyze replication-lacking multi-conditional microarray datasets, finding that it performs significantly better than the widely used limit fold change (LFC) model in a simulated comparative experiment. Compared with the LFC model, the multivariate outlier analysis also demonstrates improved stability against sample variations in a series of manipulated real expression datasets. The reanalysis of a real non-replicated multi-conditional expression dataset series leads to satisfactory results. In conclusion, a multivariate outlier analysis algorithm, like DigOut, is particularly useful for selecting DE genes from non-replicated multi-conditional gene expression datasets.
A Content-Adaptive Analysis and Representation Framework for Audio Event Discovery from "Unscripted" Multimedia

NASA Astrophysics Data System (ADS)

Radhakrishnan, Regunathan; Divakaran, Ajay; Xiong, Ziyou; Otsuka, Isao

2006-12-01

We propose a content-adaptive analysis and representation framework to discover events using audio features from "unscripted" multimedia such as sports and surveillance for summarization. The proposed analysis framework performs an inlier/outlier-based temporal segmentation of the content. It is motivated by the observation that "interesting" events in unscripted multimedia occur sparsely in a background of usual or "uninteresting" events. We treat the sequence of low/mid-level features extracted from the audio as a time series and identify subsequences that are outliers. The outlier detection is based on eigenvector analysis of the affinity matrix constructed from statistical models estimated from the subsequences of the time series. We define the confidence measure for each detected outlier as the probability that it is an outlier. Then, we establish a relationship between the parameters of the proposed framework and the confidence measure. Furthermore, we use the confidence measure to rank the detected outliers in terms of their departures from the background process. Our experimental results with sequences of low- and mid-level audio features extracted from sports video show that "highlight" events can be extracted effectively as outliers from a background process using the proposed framework. We proceed to show the effectiveness of the proposed framework in bringing out suspicious events from surveillance videos without any a priori knowledge. We show that such temporal segmentation into background and outliers, along with the ranking based on the departure from the background, can be used to generate content summaries of any desired length. Finally, we also show that the proposed framework can be used to systematically select "key audio classes" that are indicative of events of interest in the chosen domain.
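The eigenvector idea can be illustrated with a heavily simplified sketch:

- each non-overlapping subsequence is summarised only by its mean, standing in for the statistical models the paper estimates;
- the affinity between subsequences is a Gaussian kernel on those means;
- the leading eigenvector of the affinity matrix, found by power iteration, concentrates on the large group of mutually similar "background" subsequences, so entries far below the maximum are flagged.

`sigma` and `rel_threshold` are hypothetical parameters.

```python
import math

def affinity_outliers(series, window, sigma=1.0, rel_threshold=0.1, iters=200):
    """Flag subsequences weakly connected to the dominant background."""
    # Summarise each non-overlapping subsequence by its mean.
    means = [sum(series[k:k + window]) / window
             for k in range(0, len(series) - window + 1, window)]
    n = len(means)
    # Gaussian affinity between subsequence summaries.
    A = [[math.exp(-(mi - mj) ** 2 / (2 * sigma ** 2)) for mj in means]
         for mi in means]
    # Power iteration for the leading eigenvector of A.
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    vmax = max(v)
    return [x < rel_threshold * vmax for x in v]
```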
Adaptive Kalman filter for indoor localization using Bluetooth Low Energy and inertial measurement unit.

PubMed

Yoon, Paul K; Zihajehzadeh, Shaghayegh; Kang, Bong-Soo; Park, Edward J

2015-08-01

This paper proposes a novel indoor localization method using Bluetooth Low Energy (BLE) and an inertial measurement unit (IMU). Multipath and non-line-of-sight errors in low-power wireless localization systems commonly result in outliers, affecting positioning accuracy. We address this problem by adaptively weighting the estimates from the IMU and BLE in our proposed cascaded Kalman filter (KF). Positioning accuracy is further improved with the Rauch-Tung-Striebel smoother. The performance of the proposed algorithm is compared experimentally against that of the standard KF. The results show that the proposed algorithm can maintain high accuracy in tracking the sensor's position in the presence of outliers.
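The abstract does not give the adaptive weighting rule. The sketch below therefore uses a generic innovation-gated scalar Kalman filter to show the principle: when a reading's normalised innovation is implausibly large, its measurement variance is inflated so it gets little weight. This is not the authors' cascaded BLE/IMU filter or their smoother. The random-walk state model, `gate`, and the inflation rule are assumptions.

```python
def adaptive_kalman_1d(measurements, q=0.01, r=1.0, gate=9.0, x0=0.0, p0=1.0):
    """Scalar random-walk Kalman filter with innovation-based down-weighting.

    If the normalised innovation squared nu^2 / S exceeds `gate`, the measurement
    variance for that step is multiplied by nu^2 / (S * gate) (hypothetical rule).
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + q                # predict step (random-walk state model)
        nu = z - x               # innovation
        s = p + r                # innovation variance
        if nu * nu / s > gate:
            r_eff = r * (nu * nu / (s * gate))  # inflate measurement noise
            s = p + r_eff
        k = p / s                # Kalman gain
        x = x + k * nu
        p = (1 - k) * p
        estimates.append(x)
    return estimates
```

Passing `gate=float("inf")` disables the adaptation and recovers the standard filter, which makes the effect easy to compare.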
MRI Brain Tumor Segmentation and Necrosis Detection Using Adaptive Sobolev Snakes.

PubMed

Nakhmani, Arie; Kikinis, Ron; Tannenbaum, Allen

2014-03-21

Brain tumor segmentation in brain MRI volumes is used in neurosurgical planning and illness staging. It is important to explore the tumor shape and necrosis regions at different points in time to evaluate the disease progression. We propose an algorithm for semi-automatic tumor segmentation and necrosis detection. Our algorithm consists of three parts: conversion of the MRI volume to a probability space based on the on-line learned model, tumor probability density estimation, and adaptive segmentation in the probability space. We use manually selected acceptance and rejection classes on a single MRI slice to learn the background and foreground statistical models. Then, we propagate this model to all MRI slices to compute the most probable regions of the tumor. Anisotropic 3D diffusion is used to estimate the probability density. Finally, the estimated density is segmented by the Sobolev active contour (snake) algorithm to select smoothed regions of the maximum tumor probability. The segmentation approach is robust to noise and not very sensitive to the manual initialization in the volumes tested. It is also appropriate for low contrast imagery. The irregular necrosis regions are detected by using the outliers of the probability distribution inside the segmented region. Necrosis regions of small width are removed because of the high probability that they result from noisy measurements. The MRI volume segmentation results obtained by our algorithm are very similar to expert manual segmentation.
MRI brain tumor segmentation and necrosis detection using adaptive Sobolev snakes

NASA Astrophysics Data System (ADS)

Nakhmani, Arie; Kikinis, Ron; Tannenbaum, Allen

2014-03-01

Brain tumor segmentation in brain MRI volumes is used in neurosurgical planning and illness staging. It is important to explore the tumor shape and necrosis regions at different points in time to evaluate the disease progression. We propose an algorithm for semi-automatic tumor segmentation and necrosis detection. Our algorithm consists of three parts: conversion of the MRI volume to a probability space based on the on-line learned model, tumor probability density estimation, and adaptive segmentation in the probability space. We use manually selected acceptance and rejection classes on a single MRI slice to learn the background and foreground statistical models. Then, we propagate this model to all MRI slices to compute the most probable regions of the tumor. Anisotropic 3D diffusion is used to estimate the probability density. Finally, the estimated density is segmented by the Sobolev active contour (snake) algorithm to select smoothed regions of the maximum tumor probability. The segmentation approach is robust to noise and not very sensitive to the manual initialization in the volumes tested. It is also appropriate for low contrast imagery. The irregular necrosis regions are detected by using the outliers of the probability distribution inside the segmented region. Necrosis regions of small width are removed because of the high probability that they result from noisy measurements. The MRI volume segmentation results obtained by our algorithm are very similar to expert manual segmentation.
Context-specific selection of algorithms for recursive feature tracking in endoscopic image using a new methodology.

PubMed

Selka, F; Nicolau, S; Agnus, V; Bessaid, A; Marescaux, J; Soler, L

2015-03-01

In minimally invasive surgery, tracking deformable tissue is a critical component of image-guided applications. Tissue deformation can be recovered by tracking features using tissue surface information (texture, color, ...). Recent work in this field has shown success in acquiring tissue motion. However, the performance evaluation of detection and tracking algorithms on such images is still difficult and not standardized, mainly because of the lack of ground truth for real data. Moreover, no quantitative work has been undertaken to evaluate the benefit of a pre-process based on image filtering, which can improve feature tracking robustness and so avoid supplementary techniques for removing outliers. In this paper, we propose a methodology to validate detection and feature tracking algorithms, using a technique based on forward-backward tracking that provides artificial ground truth data. We describe a clear and complete methodology to evaluate and compare different detection and tracking algorithms. In addition, we extend our framework to propose a strategy for identifying the best combinations from a set of detector, tracker and pre-process algorithms, according to the live intra-operative data. Experiments were performed on in vivo datasets and show that pre-processing can have a strong influence on tracking performance and that our strategy for finding the best combinations is relevant at a reasonable computation cost. Copyright © 2014 Elsevier Ltd. All rights reserved.
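The forward-backward check works as follows. A feature is tracked from frame t to t+1 and then back to t. The drift between the starting point and the back-tracked point gives a per-feature error without external ground truth. A minimal sketch is below. The tracker is abstracted as two hypothetical callables, `track_fwd` and `track_bwd`, and any real tracker would take their place. The paper's full evaluation protocol is not reproduced.

```python
import math

def forward_backward_errors(points, track_fwd, track_bwd):
    """Per-feature forward-backward error.

    track_fwd / track_bwd: hypothetical callables mapping an (x, y) position in
    one frame to the tracker's estimated position in the other frame.
    """
    errors = []
    for p in points:
        p_next = track_fwd(p)       # frame t -> t+1
        p_back = track_bwd(p_next)  # frame t+1 -> t
        errors.append(math.dist(p, p_back))
    return errors
```

Features whose error exceeds a chosen threshold can be treated as tracking failures. This lets different detector, tracker and pre-process combinations be compared on real data.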
Digital Terrain from a Two-Step Segmentation and Outlier-Based Algorithm

NASA Astrophysics Data System (ADS)

Hingee, Kassel; Caccetta, Peter; Caccetta, Louis; Wu, Xiaoliang; Devereaux, Drew

2016-06-01

We present a novel ground filter for remotely sensed height data. Our filter has two phases: the first phase segments the DSM with a slope threshold and uses gradient direction to identify candidate ground segments; the second phase fits surfaces to the candidate ground points and removes outliers. Digital terrain is obtained by a surface fit to the final set of ground points. We tested the new algorithm on digital surface models (DSMs) for a 9600 km² region around Perth, Australia. This region contains a large mix of land uses (urban, grassland, native forest and plantation forest) and includes both a sandy coastal plain and a hillier region (elevations up to 0.5 km). The DSMs are captured annually at 0.2 m resolution using aerial stereo photography, resulting in 1.2 TB of input data per annum. Overall accuracy of the filter was estimated to be 89.6%, and on a small semi-rural subset our algorithm was found to have 40% fewer errors than Inpho's Match-T algorithm.
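The second phase (fit a surface to candidate ground points, then remove outliers) can be sketched with the simplest possible surface. A least-squares plane z = a + b*x + c*y is fitted, and the single worst point is dropped and the plane refitted until every residual is within a tolerance. The plane model, the one-at-a-time removal and the 0.5 tolerance are assumptions for illustration. The paper's surface model is not specified in the abstract.

```python
def solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for k in range(col, 4):
                a[r][k] -= f * a[col][k]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (a[r][3] - sum(a[r][k] * x[k] for k in range(r + 1, 3))) / a[r][r]
    return x

def fit_plane(points):
    """Least-squares plane z = a + b*x + c*y from the normal equations."""
    rows = [(1.0, x, y) for x, y, _ in points]
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    v = [sum(r[i] * p[2] for r, p in zip(rows, points)) for i in range(3)]
    return solve3(m, v)

def ground_points(points, threshold=0.5):
    """Repeatedly drop the worst-fitting point until all residuals <= threshold."""
    kept = list(points)
    while True:
        a, b, c = fit_plane(kept)
        residuals = [abs(z - (a + b * x + c * y)) for x, y, z in kept]
        worst = max(range(len(kept)), key=residuals.__getitem__)
        if residuals[worst] <= threshold or len(kept) <= 3:
            return kept, (a, b, c)
        kept.pop(worst)  # remove one outlier, then refit
```

Removing one point per refit avoids discarding genuine ground points that only look bad because a tall object has tilted the first fit.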
On robust parameter estimation in brain-computer interfacing

NASA Astrophysics Data System (ADS)

Samek, Wojciech; Nakajima, Shinichi; Kawanabe, Motoaki; Müller, Klaus-Robert

2017-12-01

Objective. The reliable estimation of parameters such as mean or covariance matrix from noisy and high-dimensional observations is a prerequisite for successful application of signal processing and machine learning algorithms in brain-computer interfacing (BCI). This challenging task becomes significantly more difficult if the data set contains outliers, e.g. due to subject movements, eye blinks or loose electrodes, as they may heavily bias the estimation and the subsequent statistical analysis. Although various robust estimators have been developed to tackle the outlier problem, they ignore important structural information in the data and thus may not be optimal. Typical structural elements in BCI data are the trials consisting of a few hundred EEG samples and indicating the start and end of a task. Approach. This work discusses the parameter estimation problem in BCI and introduces a novel hierarchical view on robustness which naturally comprises different types of outlierness occurring in structured data. Furthermore, the class of minimum divergence estimators is reviewed and a robust mean and covariance estimator for structured data is derived and evaluated with simulations and on a benchmark data set. Main results. The results show that state-of-the-art BCI algorithms benefit from robustly estimated parameters. Significance. Since parameter estimation is an integral part of various machine learning algorithms, the presented techniques are applicable to many problems beyond BCI.
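The trial-level side of the "hierarchical" robustness idea can be illustrated with a beta-divergence (density power divergence) estimate of a mean. Under a Gaussian model with fixed variance, the estimating equation for the mean has the fixed point mu = sum w_t m_t / sum w_t, with w_t = exp(-beta (m_t - mu)^2 / (2 sigma^2)). If each trial t is summarised by its mean m_t, a whole corrupted trial (e.g. a loose electrode) is down-weighted at once. This is a one-channel sketch with hypothetical `beta` and `sigma`. It is not the estimator derived in the paper, which also handles covariance and sample-level outliers.

```python
import math

def trial_robust_mean(trials, beta=0.5, sigma=1.0, iters=100):
    """Robust mean of one channel, with weights assigned per trial.

    Each trial is summarised by its sample mean m_t; iterate the beta-divergence
    fixed point mu = sum(w_t m_t) / sum(w_t), w_t = exp(-beta (m_t - mu)^2 / (2 sigma^2)).
    """
    means = [sum(t) / len(t) for t in trials]
    mu = sorted(means)[len(means) // 2]  # start from a median trial mean
    for _ in range(iters):
        w = [math.exp(-beta * (m - mu) ** 2 / (2 * sigma ** 2)) for m in means]
        mu = sum(wi * m for wi, m in zip(w, means)) / sum(w)
    return mu
```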
M-estimation for robust sparse unmixing of hyperspectral images

NASA Astrophysics Data System (ADS)

Toomik, Maria; Lu, Shijian; Nelson, James D. B.

2016-10-01

Hyperspectral unmixing methods often use a conventional least-squares-based lasso, which assumes that the data follow a Gaussian distribution. The normality assumption is an approximation which is generally invalid for real imagery data. We consider a robust (non-Gaussian) approach to sparse spectral unmixing of remotely sensed imagery which reduces the sensitivity of the estimator to outliers and relaxes the linearity assumption. The method consists of several appropriate penalties. We propose to use an lp norm with 0 < p < 1 in the sparse regression problem, which induces more sparsity in the results but makes the problem non-convex. However, the problem, though non-convex, can be solved quite straightforwardly with an extensible algorithm based on iteratively reweighted least squares. To deal with the huge size of modern spectral libraries, we introduce a library reduction step, similar to the multiple signal classification (MUSIC) array processing algorithm, which not only speeds up unmixing but also yields superior results. In the hyperspectral setting we extend the traditional least squares method to the robust heavy-tailed case and propose a generalised M-lasso solution. M-estimation replaces the Gaussian likelihood with a fixed function ρ(e) that restrains outliers. The M-estimate function reduces the effect of errors with large amplitudes or even assigns the outliers zero weights. Our experimental results on real hyperspectral data show that noise with large amplitudes (outliers) often exists in the data. This ability to mitigate the influence of such outliers can therefore offer greater robustness. Qualitative hyperspectral unmixing results on real hyperspectral image data corroborate the efficacy of the proposed method.
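The mechanism of "assigning outliers zero weight" through iteratively reweighted least squares (IRLS) can be shown on the smallest possible model, a straight line y = a + b*x, using Tukey bisquare weights. This sketch deliberately omits the sparsity penalty, the lp norm and the library reduction step, so it is not the M-lasso itself. Two choices are assumptions:

- the scale is fixed from the initial least-squares residuals (median absolute residual / 0.6745);
- c = 4.685 is the usual bisquare tuning constant.

```python
import statistics

def irls_bisquare_line(xs, ys, c=4.685, iters=50):
    """Robust line fit y = a + b*x by IRLS with Tukey bisquare weights."""
    def weighted_fit(w):
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        b = sxy / sxx
        return my - b * mx, b

    a, b = weighted_fit([1.0] * len(xs))  # start from ordinary least squares
    res = [y - (a + b * x) for x, y in zip(xs, ys)]
    scale = max(statistics.median(abs(r) for r in res) / 0.6745, 1e-12)
    for _ in range(iters):
        res = [y - (a + b * x) for x, y in zip(xs, ys)]
        # Redescending weights: residuals beyond c * scale get weight exactly 0.
        w = [(1 - (r / (c * scale)) ** 2) ** 2 if abs(r) < c * scale else 0.0
             for r in res]
        a, b = weighted_fit(w)
    return a, b
```

With points on y = 1 + 2x plus one gross outlier, the outlier's weight drops to zero after a few iterations and the fit recovers the line.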
feets: feATURE eXTRACTOR for tIME sERIES

NASA Astrophysics Data System (ADS)

Cabral, Juan; Sanchez, Bruno; Ramos, Felipe; Gurovich, Sebastián; Granitto, Pablo; VanderPlas, Jake

2018-06-01

feets characterizes and analyzes light-curves from astronomical photometric databases for modelling, classification, data cleaning, outlier detection and data analysis. It uses machine learning algorithms to determine the numerical descriptors that characterize and distinguish the different variability classes of light-curves; these range from basic statistical measures such as the mean or standard deviation to complex time-series characteristics such as the autocorrelation function. The library is not restricted to the astronomical field and could also be applied to any kind of time series. This project is a derivative work of FATS (ascl:1711.017).
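The kinds of descriptors mentioned above (basic statistics up to autocorrelation) are simple to compute. The sketch below is illustrative only and does not use or reproduce the feets API. The function name and the lag-1 default are hypothetical.

```python
import statistics

def basic_light_curve_features(magnitudes, lag=1):
    """Mean, standard deviation and lag-k autocorrelation of a light curve."""
    mean = statistics.fmean(magnitudes)
    std = statistics.stdev(magnitudes)
    n = len(magnitudes)
    centred = [m - mean for m in magnitudes]
    # Sample autocorrelation at the given lag (biased estimator).
    autocorr = (sum(centred[i] * centred[i + lag] for i in range(n - lag))
                / sum(c * c for c in centred))
    return {"mean": mean, "std": std, f"autocorr_lag{lag}": autocorr}
```

Light curves whose feature vectors sit far from those of their class are natural outlier candidates. This is one way such descriptors feed into outlier detection.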
Robust point matching via vector field consensus.

PubMed

Jiayi Ma; Ji Zhao; Jinwen Tian; Yuille, Alan L; Zhuowen Tu

2014-04-01

In this paper, we propose an efficient algorithm, called vector field consensus, for establishing robust point correspondences between two sets of points. Our algorithm starts by creating a set of putative correspondences which can contain a very large number of false correspondences, or outliers, in addition to a limited number of true correspondences (inliers). Next, we solve for correspondence by interpolating a vector field between the two point sets, which involves estimating a consensus of inlier points whose matching follows a nonparametric geometrical constraint. We formulate this as a maximum a posteriori (MAP) estimation of a Bayesian model with hidden/latent variables indicating whether matches in the putative set are outliers or inliers. We impose nonparametric geometrical constraints on the correspondence, as a prior distribution, using Tikhonov regularizers in a reproducing kernel Hilbert space. MAP estimation is performed by the EM algorithm, which, by also estimating the variance of the prior model (initialized to a large value), is able to obtain good estimates very quickly (e.g., avoiding many of the local minima inherent in this formulation). We illustrate this method on data sets in 2D and 3D and demonstrate that it is robust to a very large number of outliers (even up to 90%). We also show that in the special case where there is an underlying parametric geometrical model (e.g., the epipolar line constraint), we obtain better results than standard alternatives like RANSAC if a large number of outliers are present. This suggests a two-stage strategy, where we use our nonparametric model to reduce the size of the putative set and then apply a parametric variant of our approach to estimate the geometric parameters. Our algorithm is computationally efficient and we provide code for others to use it. In addition, our approach is general and can be applied to other problems, such as learning with a badly corrupted training data set.
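The EM structure can be illustrated with a simplified parametric stand-in: a Gaussian inlier model mixed with a uniform outlier model, and a variance started large and shrunk by EM. Here every inlier is assumed to move by the same 2-D translation t, instead of following a smooth vector field in an RKHS, so this is closer to the "parametric variant" mentioned above than to VFC itself. The outlier-region size `area` and the clamping of `gamma` are assumptions.

```python
import math

def em_translation_inliers(src, dst, gamma=0.9, area=100.0, iters=100):
    """EM estimate of a common translation t plus inlier posteriors for matches.

    Inlier displacement ~ N(t, sigma^2 I) in 2-D; outliers uniform over `area`.
    """
    d = [(q[0] - p[0], q[1] - p[1]) for p, q in zip(src, dst)]
    n = len(d)
    t = (sum(v[0] for v in d) / n, sum(v[1] for v in d) / n)

    def sq_res(v, t):
        return (v[0] - t[0]) ** 2 + (v[1] - t[1]) ** 2

    sigma2 = sum(sq_res(v, t) for v in d) / (2 * n)  # large initial variance
    post = [gamma] * n
    for _ in range(iters):
        # E-step: posterior probability that each putative match is an inlier.
        outlier_term = 2 * math.pi * sigma2 * (1 - gamma) / area
        post = []
        for v in d:
            g = gamma * math.exp(-sq_res(v, t) / (2 * sigma2))
            post.append(g / (g + outlier_term))
        # M-step: posterior-weighted translation, variance and inlier ratio.
        sp = sum(post)
        t = (sum(p * v[0] for p, v in zip(post, d)) / sp,
             sum(p * v[1] for p, v in zip(post, d)) / sp)
        sigma2 = max(sum(p * sq_res(v, t) for p, v in zip(post, d)) / (2 * sp), 1e-9)
        gamma = min(max(sp / n, 0.01), 0.99)
    return t, post
```

Starting with a large variance lets gross outliers share some weight in early iterations. As the variance shrinks, their posteriors fall to zero, which mirrors the coarse-to-fine behaviour the abstract attributes to VFC.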
Automated artifact detection and removal for improved tensor estimation in motion-corrupted DTI data sets using the combination of local binary patterns and 2D partial least squares.

PubMed

Zhou, Zhenyu; Liu, Wei; Cui, Jiali; Wang, Xunheng; Arias, Diana; Wen, Ying; Bansal, Ravi; Hao, Xuejun; Wang, Zhishun; Peterson, Bradley S; Xu, Dongrong

2011-02-01

Signal variation in diffusion-weighted images (DWIs) is influenced both by thermal noise and by spatially and temporally varying artifacts, such as rigid-body motion and cardiac pulsation. Motion artifacts are particularly prevalent when scanning difficult patient populations, such as human infants. Although some motion during data acquisition can be corrected using image coregistration procedures, individual DWIs are frequently corrupted beyond repair by sudden, large-amplitude motion either within or outside of the imaging plane. We propose a novel approach to identify and reject outlier images automatically using local binary patterns (LBP) and 2D partial least squares (2D-PLS) to estimate diffusion tensors robustly. This method uses an enhanced LBP algorithm to extract local texture features from the image matrices of the DWI data. Because the images have been transformed to local texture matrices, we are able to extract discriminating information that identifies outliers in the data set by extending a traditional one-dimensional PLS algorithm to a two-dimensional operator. The class-membership matrix in this 2D-PLS algorithm is adapted to process samples that are image matrices, and the membership matrix thus represents varying degrees of importance of local information within the images. We also derive the analytic form of the generalized inverse of the class-membership matrix. We show that this method can effectively extract local features from brain images obtained from a large sample of human infants to identify images that are outliers in their textural features, permitting their exclusion from further processing when estimating tensors using the DWIs. This technique is shown to be superior in performance when compared with visual inspection and other common methods to address motion-related artifacts in DWI data.
This technique is applicable to correct motion artifact in other magnetic resonance imaging (MRI) techniques (e.g., the bootstrapping estimation) that use univariate or multivariate regression methods to fit MRI data to a pre-specified model. Copyright © 2011 Elsevier Inc. All rights reserved.
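The enhanced LBP variant and the 2D-PLS step are not described in enough detail to reproduce here. As an illustration of the basic texture operator only, the sketch below computes the standard 3×3, 8-neighbour LBP code of each interior pixel and a normalized histogram of those codes; the function names are hypothetical.

```python
import numpy as np


def lbp_3x3(img):
    """Basic 8-neighbour local binary pattern code for each interior pixel."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    c = img[1:-1, 1:-1]                                  # centre pixels
    # neighbours clockwise from the top-left corner; each sets one bit
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(c.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (nb >= c).astype(np.uint8) << bit      # 1 if neighbour >= centre
    return codes


def lbp_histogram(img):
    """Normalized 256-bin histogram of LBP codes, usable as a texture feature."""
    hist = np.bincount(lbp_3x3(img).ravel(), minlength=256)
    return hist / hist.sum()
```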
A simple transformation independent method for outlier definition.

PubMed

Johansen, Martin Berg; Christensen, Peter Astrup

2018-04-10

Definition and elimination of outliers is a key element for medical laboratories establishing or verifying reference intervals (RIs), especially as the inclusion of just a few outlying observations may seriously affect the determination of the reference limits. Many methods have been developed for the definition of outliers. Several of these methods were developed for the normal distribution, and data often require transformation before outlier elimination. We have developed a non-parametric, transformation-independent outlier definition. The new method relies on drawing reproducible histograms, which is done by using defined bin sizes above and below the median. The method is compared with the method recommended by CLSI/IFCC, which uses the Box-Cox transformation (BCT) and Tukey's fences for outlier definition. The comparison is done on eight simulated distributions and an indirect clinical dataset. The comparison on simulated distributions shows that, without outliers added, the recommended method in general defines fewer outliers. However, when outliers are added on one side, the proposed method often produces better results. With outliers on both sides, the methods are equally good. Furthermore, it is found that the presence of outliers affects the BCT, and subsequently affects the limits determined by the currently recommended methods. This is especially seen in skewed distributions. The proposed outlier definition reproduced current RI limits on clinical data containing outliers. We find our simple transformation-independent outlier detection method as good as or better than the currently recommended methods.
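For reference, the Tukey's fences step used by the CLSI/IFCC procedure can be sketched as follows. This is a minimal sketch assuming the conventional multiplier k = 1.5 and NumPy's default linear-interpolation percentiles; neither the Box-Cox step that precedes it in the recommended method nor the histogram-based method proposed in the paper is shown.

```python
import numpy as np


def tukey_outliers(values, k=1.5):
    """Flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1                                   # interquartile range
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)
```

In the recommended workflow the fences are applied to Box-Cox-transformed values; passing `np.log(values)` is a simpler stand-in for right-skewed data, not the same transformation.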
The detection and correction of outlying determinations that may occur during geochemical analysis

USGS Publications Warehouse

Harvey, P.K.

1974-01-01

'Wild', 'rogue' or outlying determinations occur periodically during geochemical analysis. Existing tests in the literature for the detection of such determinations within a set of replicate measurements are often misleading. This account describes the chances of detecting outliers and the extent to which correction may be made for their presence in sample sizes of three to seven replicate measurements. A systematic procedure for monitoring data for outliers is outlined. The problem of outliers becomes more important as instrumental methods of analysis become faster and more highly automated; a state in which it becomes increasingly difficult for the analyst to examine every determination. The recommended procedure is easily adapted to such analytical systems. © 1974.
Effects of rooting via out-groups on in-group topology in phylogeny.

PubMed

Ackerman, Margareta; Brown, Daniel G; Loker, David

2014-01-01

Users of phylogenetic methods require rooted trees, because the direction of time depends on the placement of the root. While phylogenetic trees are typically rooted by using an out-group, this mechanism is inappropriate when the addition of an out-group changes the in-group topology. We perform a formal analysis of phylogenetic algorithms under the inclusion of distant out-groups. It turns out that linkage-based algorithms (including UPGMA) and a class of bisecting methods do not modify the topology of the in-group when an out-group is included. By contrast, the popular neighbour joining algorithm fails this property in a strong sense: every data set can have its structure destroyed by some arbitrarily distant outlier. Furthermore, including multiple outliers can lead to an arbitrary topology on the in-group. The standard rooting approach that uses out-groups may be fundamentally unsuited for neighbour joining.

A robust multi-frequency mixing algorithm for suppression of rivet signal in GMR inspection of riveted structures

NASA Astrophysics Data System (ADS)

Safdernejad, Morteza S.; Karpenko, Oleksii; Ye, Chaofeng; Udpa, Lalita; Udpa, Satish

2016-02-01

The advent of Giant Magneto-Resistive (GMR) technology permits the development of novel, highly sensitive array probes for Eddy Current (EC) inspection of multi-layer riveted structures. Multi-frequency GMR measurements with different EC penetration depths show promise for the detection of bottom-layer notches at fastener sites. However, the distortion of the induced magnetic field due to flaws is dominated by the strong fastener signal, which makes defect detection and classification a challenging problem. This issue is more pronounced for ferromagnetic fasteners, which concentrate most of the magnetic flux. In the present work, a novel multi-frequency mixing algorithm is proposed to suppress the rivet signal response and enhance the defect detection capability of the GMR array probe. The algorithm is baseline-free and does not require any assumptions about the geometry of the sample being inspected. Fastener signal suppression is based upon the random sample consensus (RANSAC) method, which iteratively estimates parameters of a mathematical model from a set of observed data containing outliers. Bottom-layer defects at the fastener site are simulated as EDM notches of different lengths. Performance of the proposed multi-frequency mixing approach is evaluated on finite element data and experimental GMR measurements obtained with unidirectional planar current excitation. Initial results are promising, demonstrating the feasibility of the approach.
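RANSAC's loop (fit a model to a random minimal sample, count the points that agree within a tolerance, keep the largest consensus set) can be sketched for the simplest case, a straight line. This is a generic illustration with assumed parameters `n_iter`, `tol` and `seed`, not the paper's fastener-suppression model, whose parameterization is not given in the abstract.

```python
import numpy as np


def ransac_line(x, y, n_iter=200, tol=0.1, seed=0):
    """Fit y = m*x + b with RANSAC; return (m, b, inlier_mask)."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(x), dtype=bool)
    for _ in range(n_iter):
        i, j = rng.choice(len(x), size=2, replace=False)   # minimal sample
        if x[i] == x[j]:
            continue                                        # degenerate pair
        m = (y[j] - y[i]) / (x[j] - x[i])
        b = y[i] - m * x[i]
        inliers = np.abs(y - (m * x + b)) < tol             # consensus set
        if inliers.sum() > best.sum():
            best = inliers
    # refit on the largest consensus set with ordinary least squares
    m, b = np.polyfit(x[best], y[best], 1)
    return m, b, best
```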
Detection of outliers in the response and explanatory variables of the simple circular regression model

NASA Astrophysics Data System (ADS)

Mahmood, Ehab A.; Rana, Sohel; Hussin, Abdul Ghapor; Midi, Habshah

2016-06-01

The circular regression model may contain one or more data points that appear to be peculiar or inconsistent with the main part of the model. This may occur due to recording errors, sudden short events, sampling under abnormal conditions, etc. The presence of these data points ("outliers") in the data set causes many problems for research results and conclusions. Therefore, they should be identified before applying statistical analysis. In this article, we propose a statistic to identify outliers in both the response and explanatory variables of the simple circular regression model. Our proposed statistic is the robust circular distance RCDxy, and it is justified by three robustness measures: the proportion of detected outliers, and the masking and swamping rates.
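The RCDxy statistic itself is not defined in the abstract. Circular outlier diagnostics do, however, rest on a distance between angles and on the mean direction, and these two building blocks can be sketched as below, using arc-length distance in [0, π] (the alternative 1 - cos(α - β) is also common). The function names are hypothetical.

```python
import numpy as np


def circ_dist(a, b):
    """Angular (arc-length) distance in [0, pi] between angles a and b, in radians."""
    d = np.mod(a - b, 2 * np.pi)
    return np.minimum(d, 2 * np.pi - d)


def circ_mean(theta):
    """Mean direction of a sample of angles, returned in [0, 2*pi)."""
    theta = np.asarray(theta, dtype=float)
    return np.mod(np.arctan2(np.sin(theta).sum(), np.cos(theta).sum()), 2 * np.pi)
```

For a fitted circular regression, `circ_dist(y, y_hat)` gives per-observation angular residuals that a detection rule could then threshold.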
Helicopter rotor blade frequency evolution with damage growth and signal processing

NASA Astrophysics Data System (ADS)

Roy, Niranjan; Ganguli, Ranjan

2005-05-01

Structural damage in materials evolves over time due to the growth of fatigue cracks in homogeneous materials and a complicated process of matrix cracking, delamination, fiber breakage and fiber-matrix debonding in composite materials. In this study, a finite element model of the helicopter rotor blade is used to analyze the effect of damage growth on the modal frequencies in a qualitative manner. Phenomenological models of material degradation for homogeneous and composite materials are used. Results show that damage can be detected by monitoring changes in lower as well as higher mode flap (out-of-plane bending), lag (in-plane bending) and torsion rotating frequencies, especially for composite materials, where the onset of the last stage of damage, fiber breakage, is most critical. Curve fits are also proposed for mathematical modeling of the relationship between rotating frequencies and cycles. Finally, since operational data are noisy and also contaminated with outliers, denoising algorithms based on recursive median filters, radial basis function neural networks and wavelets are studied and compared with a moving average filter using simulated data for improved health-monitoring applications. A novel recursive median filter is designed using integer programming through a genetic algorithm and is found to have comparable performance to neural networks with much less complexity, and to be better than wavelet denoising for outlier removal. This filter is proposed as a tool for denoising time series of damage indicators.
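A recursive median filter differs from an ordinary running median in that each output is computed from a window whose left half already holds filtered values. The sketch below shows that basic recursion with an assumed window half-width, leaving the end samples unchanged; it does not reproduce the paper's integer-programming and genetic-algorithm design of the filter.

```python
import numpy as np


def recursive_median(x, half_width=2):
    """Recursive median filter with window length 2 * half_width + 1."""
    y = np.asarray(x, dtype=float).copy()
    for i in range(half_width, len(y) - half_width):
        # y[i - half_width:i] already holds filtered outputs; the rest is raw input
        y[i] = np.median(y[i - half_width:i + half_width + 1])
    return y
```

An isolated spike is removed while a step edge passes through unchanged, which is why median-type filters suit outlier-contaminated damage indicators better than a moving average.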
Ellipsoids for anomaly detection in remote sensing imagery

NASA Astrophysics Data System (ADS)

Grosklos, Guenchik; Theiler, James

2015-05-01

For many target and anomaly detection algorithms, a key step is the estimation of a centroid (relatively easy) and a covariance matrix (somewhat harder) that characterize the background clutter. For a background that can be modeled as a multivariate Gaussian, the centroid and covariance lead to an explicit probability density function that can be used in likelihood ratio tests for optimal detection statistics. But ellipsoidal contours can characterize a much larger class of multivariate density functions, and the ellipsoids that characterize the outer periphery of the distribution are most appropriate for detection in the low false alarm rate regime. Traditionally the sample mean and sample covariance are used to estimate ellipsoid location and shape, but these quantities are confounded both by large lever-arm outliers and by non-Gaussian distributions within the ellipsoid of interest. This paper compares a variety of centroid and covariance estimation schemes with the aim of characterizing the periphery of the background distribution. In particular, we will consider a robust variant of the Khachiyan algorithm for the minimum-volume enclosing ellipsoid. The performance of these different approaches is evaluated on multispectral and hyperspectral remote sensing imagery using coverage plots of ellipsoid volume versus false alarm rate.
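Khachiyan's algorithm for the minimum-volume enclosing ellipsoid repeatedly shifts weight onto the point lying furthest outside the current ellipsoid. The sketch below is the standard (non-robust) version, assuming points as rows of an array and a tolerance on the weight update; the robust variant considered in the paper is not reproduced.

```python
import numpy as np


def mvee(points, tol=1e-4, max_iter=1000):
    """Khachiyan's algorithm; returns (A, c) with (x - c)^T A (x - c) <= ~1."""
    P = np.asarray(points, dtype=float)
    n, d = P.shape
    Q = np.vstack([P.T, np.ones(n)])                # lift points to (d+1) x n
    u = np.full(n, 1.0 / n)                         # weights, summing to 1
    for _ in range(max_iter):
        X = Q @ np.diag(u) @ Q.T
        M = np.einsum("ij,ji->i", Q.T @ np.linalg.inv(X), Q)  # diag(Q^T X^-1 Q)
        j = int(np.argmax(M))                       # point furthest outside
        step = (M[j] - d - 1.0) / ((d + 1.0) * (M[j] - 1.0))
        new_u = (1.0 - step) * u
        new_u[j] += step
        done = np.linalg.norm(new_u - u) < tol
        u = new_u
        if done:
            break
    c = P.T @ u                                     # ellipsoid centre
    A = np.linalg.inv(P.T @ np.diag(u) @ P - np.outer(c, c)) / d
    return A, c
```

Convergence is only linear, so interior points lose weight slowly; `max_iter` caps the work, and the returned ellipsoid may then be slightly larger than the true minimum.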
Defining ‘Unhealthy’: A Systematic Analysis of Alignment between the Australian Dietary Guidelines and the Health Star Rating System

PubMed Central

Rådholm, Karin; Neal, Bruce

2018-01-01

The Australian Dietary Guidelines (ADGs) and Health Star Rating (HSR) front-of-pack labelling system are two national interventions to promote healthier diets. Our aim was to assess the degree of alignment between the two policies. Methods: Nutrition information was extracted for 65,660 packaged foods available in The George Institute’s Australian FoodSwitch database. Products were classified ‘core’ or ‘discretionary’ based on the ADGs, and a HSR generated irrespective of whether currently displayed on pack. Apparent outliers were identified as those products classified ‘core’ that received HSR ≤ 2.0; and those classified ‘discretionary’ that received HSR ≥ 3.5. Nutrient cut-offs were applied to determine whether apparent outliers were ‘high in’ salt, total sugar or saturated fat, and outlier status thereby attributed to a failure of the ADGs or HSR algorithm. Results: 47,116 products (23,460 core; 23,656 discretionary) were included. Median (Q1, Q3) HSRs were 4.0 (3.0 to 4.5) for core and 2.0 (1.0 to 3.0) for discretionary products. Overall alignment was good: 86.6% of products received a HSR aligned with their ADG classification. Among 6324 products identified as apparent outliers, 5246 (83.0%) were ultimately determined to be ADG failures, largely caused by challenges in defining foods as ‘core’ or ‘discretionary’. In total, 1078 (17.0%) were determined to be true failures of the HSR algorithm. Conclusion: The scope of genuine misalignment between the ADGs and HSR algorithm is very small. We provide evidence-informed recommendations for strengthening both policies to more effectively guide Australians towards healthier choices. PMID:29670024
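The apparent-outlier rule stated in the Methods can be written directly as a predicate. This sketch covers only that screening step, with the thresholds taken from the abstract; the subsequent nutrient cut-off analysis is not reproduced, and the function name is hypothetical.

```python
def is_apparent_outlier(adg_class, hsr):
    """Screening rule: 'core' foods rated HSR <= 2.0 and 'discretionary'
    foods rated HSR >= 3.5 are apparent outliers."""
    if adg_class == "core":
        return hsr <= 2.0
    if adg_class == "discretionary":
        return hsr >= 3.5
    raise ValueError(f"unknown ADG class: {adg_class!r}")
```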
DOE Office of Scientific and Technical Information (OSTI.GOV)

Sheng, Y; Ge, Y; Yuan, L

Purpose: To investigate the impact of outliers on knowledge modeling in radiation therapy, and develop a systematic workflow for identifying and analyzing geometric and dosimetric outliers using pelvic cases. Methods: Four groups (G1-G4) of pelvic plans were included: G1 (37 prostate cases), G2 (37 prostate plus lymph node cases), and G3 (37 prostate bed cases) are all clinical IMRT cases. G4 comprises 10 plans outside G1 re-planned with dynamic arc to simulate dosimetric outliers. The workflow involves 2 steps: 1. identify geometric outliers, assess impact and clean up; 2. identify dosimetric outliers, assess impact and clean up. Step 1: A baseline model was trained with all G1 cases. G2/G3 cases were then individually added to the baseline model as geometric outliers. The impact on the model was assessed by comparing the leverage statistic of inliers (G1) and outliers (G2/G3). Receiver operating characteristic (ROC) analysis was performed to determine the optimal threshold. Step 2: A separate baseline model was trained with 32 G1 cases. Each G4 case (dosimetric outliers) was then progressively added to perturb this model. DVH predictions were performed using these perturbed models for the remaining 5 G1 cases. Normal tissue complication probabilities (NTCP) calculated from the predicted DVHs were used to evaluate the dosimetric outliers’ impact. Results: The leverage of inliers and outliers was significantly different. The area under the curve (AUC) for differentiating G2 from G1 was 0.94 (threshold: 0.22) for bladder and 0.80 (threshold: 0.10) for rectum. For differentiating G3 from G1, the AUC (threshold) was 0.68 (0.09) for bladder and 0.76 (0.08) for rectum. A significant increase in NTCP started from models with 4 dosimetric outliers for bladder (p<0.05), and with only 1 dosimetric outlier for rectum (p<0.05). Conclusion: We established a systematic workflow for identifying and analyzing geometric and dosimetric outliers, and investigated statistical metrics for detecting them.
Results validated the necessity of outlier detection and clean-up to enhance model quality in clinical practice. Research Grant: Varian master research grant.
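The leverage statistic is the diagonal of the hat matrix of a linear model. A generic sketch for an ordinary least-squares model with an intercept follows; the features used in the knowledge-based DVH model are not given in the abstract, so the input is simply an assumed design matrix, and the cut-off is left to the caller.

```python
import numpy as np


def leverage(X):
    """Diagonal of the hat matrix H = X (X^T X)^-1 X^T, intercept included."""
    X = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    Q, _ = np.linalg.qr(X)              # for full-rank X, H = Q Q^T
    return (Q ** 2).sum(axis=1)         # row sums give the diagonal of H
```

The leverages sum to the number of model parameters p, so a common rule of thumb flags cases with leverage above about 2p/n; the study instead chose thresholds from ROC analysis.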
Improving the space surveillance telescope's performance using multi-hypothesis testing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chris Zingarelli, J.; Cain, Stephen; Pearce, Eric

2014-05-01

The Space Surveillance Telescope (SST) is a Defense Advanced Research Projects Agency program designed to detect objects in space, such as near-Earth asteroids and space debris in the geosynchronous Earth orbit (GEO) belt. Binary hypothesis test (BHT) methods have historically been used to facilitate the detection of new objects in space. In this paper, a multi-hypothesis detection strategy is introduced to improve the detection performance of SST. In this context, multi-hypothesis testing (MHT) determines whether an unresolvable point source is in the center, a corner, or a side of a pixel, in contrast to BHT, which only tests whether an object is in the pixel or not. The images recorded by SST are undersampled to the point of causing aliasing, which degrades the performance of traditional detection schemes. The equations for the MHT are derived in terms of signal-to-noise ratio (S/N), which is computed by subtracting the background light level around the pixel being tested and dividing by the standard deviation of the noise. A new method for determining the local noise statistics that rejects outliers is introduced in combination with the MHT. An experiment using observations of a known GEO satellite is used to demonstrate the improved detection performance of the new algorithm over algorithms previously reported in the literature. The results show a significant improvement in the probability of detection, by as much as 50%, over existing algorithms. In addition to detection, the S/N results prove to be linearly related to the least-squares estimates of point source irradiance, thus improving photometric accuracy.
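The S/N computation described above (local background subtracted, then divided by the noise standard deviation) can be sketched with an outlier-rejecting background estimate. The paper's own noise-statistics method is not described in the abstract; this sketch substitutes iterative sigma clipping over a square annulus, with assumed `inner`, `outer`, `clip` and `n_iter` parameters.

```python
import numpy as np


def local_snr(img, row, col, inner=2, outer=5, clip=3.0, n_iter=5):
    """S/N of pixel (row, col) against a sigma-clipped annulus background."""
    r0, c0 = max(row - outer, 0), max(col - outer, 0)
    win = np.asarray(img, dtype=float)[r0:row + outer + 1, c0:col + outer + 1]
    rr = np.arange(r0, r0 + win.shape[0])[:, None]
    cc = np.arange(c0, c0 + win.shape[1])[None, :]
    # square annulus: Chebyshev distance from the tested pixel above `inner`
    ring = np.maximum(np.abs(rr - row), np.abs(cc - col)) > inner
    bg = win[ring]
    keep = np.ones(bg.size, dtype=bool)
    for _ in range(n_iter):                       # iterative sigma clipping
        mu, sd = bg[keep].mean(), bg[keep].std()
        new_keep = np.abs(bg - mu) <= clip * sd
        if np.array_equal(new_keep, keep):
            break
        keep = new_keep
    mu, sd = bg[keep].mean(), bg[keep].std()
    return (img[row, col] - mu) / sd
```

Without the clipping loop, a single bright star in the annulus would inflate the noise estimate and suppress the S/N of a faint source.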
Residual Error Based Anomaly Detection Using Auto-Encoder in SMD Machine Sound.

PubMed

Oh, Dong Yul; Yun, Il Dong

2018-04-24

Detecting an anomaly or an abnormal situation from a given sound is highly useful in environments where a machine must be constantly verified and monitored. As deep learning algorithms have developed further, recent studies have focused on this problem. However, there are too many variables to define anomalies, and human annotation of a large collection of abnormal data labeled at the class level is very labor-intensive. In this paper, we propose to detect abnormal operation sounds or outliers in a very complex machine while reducing the data-driven annotation cost. The architecture of the proposed model is based on an auto-encoder, and it uses the residual error, which represents the reconstruction quality, to identify anomalies. We assess our model using Surface-Mounted Device (SMD) machine sound, which is very complex, as experimental data, and it achieves state-of-the-art performance for anomaly detection.
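The residual-error idea can be illustrated without a deep network by using PCA as a linear auto-encoder: encoding projects onto the top-k principal directions, decoding maps back, and the squared reconstruction error serves as the anomaly score. This is a stand-in chosen for brevity, not the paper's architecture; `k` and the percentile-based threshold in the usage below are assumptions.

```python
import numpy as np


class LinearAutoencoder:
    """PCA as a linear auto-encoder; anomaly score = squared residual error."""

    def __init__(self, k):
        self.k = k

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        _, _, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.W_ = vt[: self.k].T                    # d x k encoder weights
        return self

    def score(self, X):
        X = np.asarray(X, dtype=float)
        Z = (X - self.mean_) @ self.W_              # encode
        X_hat = Z @ self.W_.T + self.mean_          # decode
        return ((X - X_hat) ** 2).sum(axis=1)       # residual error per sample
```

Fitting on normal data only and thresholding at, say, the 99th percentile of the training scores avoids labelling any abnormal examples, which is the annotation saving the abstract emphasizes.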
A Finite Mixture Method for Outlier Detection and Robustness in Meta-Analysis

ERIC Educational Resources Information Center

Beath, Ken J.

2014-01-01

When performing a meta-analysis unexplained variation above that predicted by within study variation is usually modeled by a random effect. However, in some cases, this is not sufficient to explain all the variation because of outlier or unusual studies. A previously described method is to define an outlier as a study requiring a higher random…
Case-Deletion Diagnostics for Maximum Likelihood Multipoint Quantitative Trait Locus Linkage Analysis

PubMed Central

Mendoza, Maria C.B.; Burns, Trudy L.; Jones, Michael P.

2009-01-01

Objectives: Case-deletion diagnostic methods are tools that allow identification of influential observations that may affect parameter estimates and model fitting conclusions. The goal of this paper was to develop two case-deletion diagnostics, the exact case deletion (ECD) and the empirical influence function (EIF), for detecting outliers that can affect results of sib-pair maximum likelihood quantitative trait locus (QTL) linkage analysis. Methods: Subroutines to compute the ECD and EIF were incorporated into the maximum likelihood QTL variance estimation components of the linkage analysis program MAPMAKER/SIBS. Performance of the diagnostics was compared in simulation studies that evaluated the proportion of outliers correctly identified (sensitivity), and the proportion of non-outliers correctly identified (specificity). Results: Simulations involving nuclear family data sets with one outlier showed EIF sensitivities approximated ECD sensitivities well for outlier-affected parameters. Sensitivities were high, indicating the outlier was identified a high proportion of the time. Simulations also showed the enormous computational time advantage of the EIF. Diagnostics applied to body mass index in nuclear families detected observations influential on the lod score and model parameter estimates. Conclusions: The EIF is a practical diagnostic tool that has the advantages of high sensitivity and quick computation. PMID:19172086
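The trade-off between ECD and EIF is easiest to see for the sample mean, where the empirical influence has a closed form. In the sketch below, ECD refits the estimator n times while the scaled EIF needs a single pass, and for the mean the two coincide exactly. The paper applies both diagnostics to maximum likelihood variance-component estimates in MAPMAKER/SIBS, which this sketch does not attempt.

```python
import numpy as np


def exact_case_deletion(x, estimator):
    """Change in the estimate when each observation is deleted (n refits)."""
    x = np.asarray(x, dtype=float)
    full = estimator(x)
    return np.array([full - estimator(np.delete(x, i)) for i in range(len(x))])


def eif_mean(x):
    """Empirical influence on the sample mean, scaled by 1/(n-1) so that it
    approximates the deletion effect; a single pass over the data."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (len(x) - 1)
```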
The Outlier Detection for Ordinal Data Using Scaling Technique of Regression Coefficients

NASA Astrophysics Data System (ADS)

Adnan, Arisman; Sugiarto, Sigit

2017-06-01

The aim of this study is to detect outliers by using the coefficients of Ordinal Logistic Regression (OLR) for the case of k-category responses, where scores range from 1 (the best) to 8 (the worst). We detect them by using the sum of the moduli of the ordinal regression coefficients calculated by the jackknife technique. This technique is improved by scaling the regression coefficients to their means. The R language was used to analyse a set of ordinal data from a reference distribution. Furthermore, we compare this approach with studentised residual plots from the jackknife technique for ANOVA (Analysis of Variance) and OLR. This study shows that the jackknife technique, along with proper scaling, may reveal outliers in ordinal regression reasonably well.
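The jackknife scoring idea can be sketched with ordinary least squares standing in for ordinal logistic regression, which would need an external fitting library. For each case the model is refit without it, and the moduli of the coefficient changes are summed after scaling each coefficient by its mean absolute jackknife value; that scaling is an assumed reading of "scaling the regression coefficients to their means", and the function name is hypothetical.

```python
import numpy as np


def jackknife_coef_scores(X, y):
    """Per-case sum of scaled |coefficient changes| under leave-one-out refits."""
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    coefs = np.array([
        np.linalg.lstsq(np.delete(X, i, axis=0), np.delete(y, i), rcond=None)[0]
        for i in range(len(y))
    ])
    full = np.linalg.lstsq(X, y, rcond=None)[0]
    scale = np.abs(coefs).mean(axis=0)          # assumed scaling per coefficient
    return (np.abs(coefs - full) / scale).sum(axis=1)
```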
A Novel Real-Time Reference Key Frame Scan Matching Method.

PubMed

Mohamed, Haytham; Moussa, Adel; Elhabiby, Mohamed; El-Sheimy, Naser; Sesay, Abu

2017-05-07

Unmanned aerial vehicles (UAVs) represent an effective technology for indoor search and rescue operations. The environments of most indoor missions are typically unknown, unstructured, and/or dynamic. Navigation of UAVs in such environments is addressed by the simultaneous localization and mapping approach, using either local or global approaches. Both approaches suffer from accumulated errors and high processing time due to the iterative nature of the scan matching method. Moreover, point-to-point scan matching is prone to outlier associations. This paper proposes a low-cost, novel method for 2D real-time scan matching based on a reference key frame (RKF). RKF is a hybrid scan matching technique composed of feature-to-feature and point-to-point approaches. This algorithm aims to mitigate error accumulation using the key frame technique, which is inspired by the video streaming broadcast process. The algorithm relies on the iterative closest point algorithm when linear features are lacking, which is typical of unstructured environments. The algorithm switches back to the RKF once linear features are detected. To validate and evaluate the algorithm, its mapping performance and time consumption are compared with those of various algorithms in static and dynamic environments. The algorithm exhibits promising navigation and mapping results with very short computation time, indicating its potential for use in real-time systems.
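The point-to-point iterative closest point (ICP) step that the method falls back on can be sketched in 2D: match each source point to its nearest target point, solve the best rigid transform in closed form (the SVD-based Kabsch solution), apply it and repeat. This is a generic sketch with brute-force matching and a fixed iteration count, not the RKF method.

```python
import numpy as np


def best_rigid_transform(src, dst):
    """Least-squares R, t with R @ src_i + t ~ dst_i (Kabsch/SVD solution)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, np.sign(np.linalg.det(Vt.T @ U.T))])   # no reflections
    R = Vt.T @ D @ U.T
    return R, mu_d - R @ mu_s


def icp(src, dst, iters=30):
    """Point-to-point ICP; returns the accumulated (R, t) mapping src onto dst."""
    cur = np.asarray(src, dtype=float).copy()
    R_total, t_total = np.eye(2), np.zeros(2)
    for _ in range(iters):
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(axis=-1)
        matched = dst[d2.argmin(axis=1)]           # nearest-neighbour association
        R, t = best_rigid_transform(cur, matched)
        cur = cur @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total
```

Because each association is taken blindly from the nearest neighbour, spurious matches bias the transform, which is the outlier-association weakness of point-to-point matching that the abstract mentions.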
Supervised Outlier Detection in Large-Scale MVS Point Clouds for 3D City Modeling Applications

NASA Astrophysics Data System (ADS)

Stucker, C.; Richard, A.; Wegner, J. D.; Schindler, K.

2018-05-01

We propose to use a discriminative classifier for outlier detection in large-scale point clouds of cities generated via multi-view stereo (MVS) from densely acquired images. What makes outlier removal hard is the varying distribution of inliers and outliers across a scene. Heuristic outlier removal using a specific feature that encodes point distribution often delivers unsatisfying results. Although most outliers can be identified correctly (high recall), many inliers are erroneously removed as well (low precision). This hampers 3D object reconstruction because of missing data. We thus propose to discriminatively learn class-specific distributions directly from the data to achieve high precision. We apply a standard Random Forest classifier that infers a binary label (inlier or outlier) for each 3D point in the raw, unfiltered point cloud and test two approaches for training. In the first, non-semantic approach, features are extracted without considering the semantic interpretation of the 3D points. The trained model approximates the average distribution of inliers and outliers across all semantic classes. In the second approach, semantic interpretation is incorporated into the learning process, i.e., we train separate inlier-outlier classifiers per semantic class (building facades, roof, ground, vegetation, fields, and water). Performance of learned filtering is evaluated on several large SfM point clouds of cities. We find that the results confirm our underlying assumption that discriminatively learning inlier-outlier distributions does improve precision over global heuristics, by up to ≈ 12 percentage points. Moreover, semantically informed filtering that models class-specific distributions further improves precision by up to ≈ 10 percentage points, being able to remove very isolated building, roof, and water points while preserving inliers on building facades and vegetation.
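For contrast, the kind of global heuristic the paper argues against can be sketched as a statistical filter on a single point-distribution feature, the mean distance to the k nearest neighbours. The defaults for `k` and `n_std` are assumptions, and the brute-force distance matrix limits this sketch to small clouds; the paper's learned Random Forest filter is not reproduced.

```python
import numpy as np


def knn_outlier_mask(points, k=8, n_std=2.0):
    """Flag points whose mean k-NN distance exceeds the global mean by n_std std devs."""
    P = np.asarray(points, dtype=float)
    d = np.sqrt(((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1))
    d.sort(axis=1)
    mean_knn = d[:, 1:k + 1].mean(axis=1)       # column 0 is the point itself
    thresh = mean_knn.mean() + n_std * mean_knn.std()
    return mean_knn > thresh
```

A single global threshold like this cannot adapt to classes whose inliers are naturally sparse, which is the precision problem the learned, class-specific filter targets.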
Log Pearson type 3 quantile estimators with regional skew information and low outlier adjustments

USGS Publications Warehouse

Griffis, V.W.; Stedinger, Jery R.; Cohn, T.A.

2004-01-01

The recently developed expected moments algorithm (EMA) [Cohn et al., 1997] performs as well as maximum likelihood estimation in estimating log‐Pearson type 3 (LP3) flood quantiles using systematic and historical flood information. Needed extensions include use of a regional skewness estimator and its precision to be consistent with Bulletin 17B. Another issue addressed by Bulletin 17B is the treatment of low outliers. A Monte Carlo study compares the performance of Bulletin 17B using the entire sample with and without regional skew with estimators that use regional skew and censor low outliers, including an extended EMA estimator, the conditional probability adjustment (CPA) from Bulletin 17B, and an estimator that uses probability plot regression (PPR) to compute substitute values for low outliers. Estimators that neglect regional skew information do much worse than estimators that use an informative regional skewness estimator. For LP3 data the low outlier rejection procedure generally results in no loss of overall accuracy, and the differences between the MSEs of the estimators that used an informative regional skew are generally modest in the skewness range of real interest. Samples contaminated to model actual flood data demonstrate that estimators which give special treatment to low outliers significantly outperform estimators that make no such adjustment.
Log Pearson type 3 quantile estimators with regional skew information and low outlier adjustments

NASA Astrophysics Data System (ADS)

Griffis, V. W.; Stedinger, J. R.; Cohn, T. A.

2004-07-01

The recently developed expected moments algorithm (EMA) [Cohn et al., 1997] performs as well as maximum likelihood estimation in estimating log-Pearson type 3 (LP3) flood quantiles using systematic and historical flood information. Needed extensions include use of a regional skewness estimator and its precision to be consistent with Bulletin 17B. Another issue addressed by Bulletin 17B is the treatment of low outliers. A Monte Carlo study compares the performance of Bulletin 17B using the entire sample with and without regional skew with estimators that use regional skew and censor low outliers, including an extended EMA estimator, the conditional probability adjustment (CPA) from Bulletin 17B, and an estimator that uses probability plot regression (PPR) to compute substitute values for low outliers. Estimators that neglect regional skew information do much worse than estimators that use an informative regional skewness estimator. For LP3 data the low outlier rejection procedure generally results in no loss of overall accuracy, and the differences between the MSEs of the estimators that used an informative regional skew are generally modest in the skewness range of real interest. Samples contaminated to model actual flood data demonstrate that estimators which give special treatment to low outliers significantly outperform estimators that make no such adjustment.
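Bulletin 17B's low-outlier screen compares the base-10 logarithms of annual peak flows against a one-sided Grubbs-Beck threshold, mean - K_N * s. The sketch below uses a widely cited polynomial approximation of the 10% critical value K_N (for N = 10 it gives about 2.036); treat the approximation and the function names as assumptions rather than as any of the estimators compared in the paper.

```python
import math
import statistics


def grubbs_beck_kn(n):
    """Approximate one-sided 10% Grubbs-Beck critical value for sample size n."""
    x = math.log10(n)
    return -0.9043 + 3.345 * math.sqrt(x) - 0.4046 * x


def low_outlier_threshold(peaks):
    """Flows below the returned value are flagged as low outliers.
    Computed on base-10 logs, so all peaks must be positive."""
    logs = [math.log10(q) for q in peaks]
    mean, sd = statistics.mean(logs), statistics.stdev(logs)
    return 10 ** (mean - grubbs_beck_kn(len(logs)) * sd)
```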
Observed to expected or logistic regression to identify hospitals with high or low 30-day mortality?

PubMed Central

Helgeland, Jon; Clench-Aas, Jocelyne; Laake, Petter; Veierød, Marit B.

2018-01-01

Introduction: A common quality indicator for monitoring and comparing hospitals is based on death within 30 days of admission. An important use is to determine whether a hospital has higher or lower mortality than other hospitals. Thus, the ability to identify such outliers correctly is essential. Two approaches for detection are: 1) calculating the ratio of observed to expected number of deaths (OE) per hospital and 2) including all hospitals in a logistic regression (LR) comparing each hospital to a form of average over all hospitals. The aim of this study was to compare OE and LR with respect to correctly identifying 30-day mortality outliers. Modifications of the methods, i.e., the variance-corrected approach of OE (OE-Faris), bias-corrected LR (LR-Firth), and trimmed mean variants of LR and LR-Firth, were also studied. Materials and methods: To study the properties of OE and LR and their variants, we performed a simulation study by generating patient data from hospitals with known outlier status (low mortality, high mortality, non-outlier). Data from simulated scenarios with varying numbers of hospitals, hospital volume, and mortality outlier status were analysed by the different methods and compared by level of significance (ability to falsely claim an outlier) and power (ability to reveal an outlier). Moreover, administrative data for patients with acute myocardial infarction (AMI), stroke, and hip fracture from Norwegian hospitals for 2012–2014 were analysed. Results: None of the methods achieved the nominal (test) level of significance for both low and high mortality outliers. For low mortality outliers, the levels of significance were increased four- to fivefold for OE and OE-Faris. For high mortality outliers, OE and OE-Faris, LR 25% trimmed and LR-Firth 10% and 25% trimmed maintained approximately the nominal level. The methods agreed with respect to outlier status for 94.1% of the AMI hospitals, 98.0% of the stroke hospitals, and 97.8% of the hip fracture hospitals.
Conclusion: On balance, we recommend LR-Firth 10% or 25% trimmed for detection of both low and high mortality outliers. PMID:29652941
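The OE approach can be sketched for a single hospital: the expected count is the sum of each patient's risk-adjusted probability of death, and a normal approximation gives a z statistic. This is a plain illustration assuming the predicted probabilities are already available; it is neither the OE-Faris variance correction nor any of the logistic-regression variants the study recommends.

```python
import math


def observed_to_expected(died, p_hat):
    """O/E ratio and normal-approximation z statistic for one hospital.

    died  : 0/1 outcome for each of the hospital's patients
    p_hat : each patient's risk-adjusted predicted probability of death
    """
    observed = sum(died)
    expected = sum(p_hat)
    var = sum(p * (1 - p) for p in p_hat)   # variance of O under the risk model
    z = (observed - expected) / math.sqrt(var)
    return observed / expected, z
```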
Outlier Detection in Infrared Signatures

DTIC Science & Technology

1992-01-01

for model identification. Gnanadesikan (1977) pointed out that Hampel’s influence function (Hampel (1974)) can be used to estimate the effect...individual outliers have on sample estimates of parameters. Chernick noted that the influence function for parameters of interest to the users of a data...important outliers, while those with small estimated influence are not). In this way the influence function provides a "distance" measure for multi
Parallel and Scalable Clustering and Classification for Big Data in Geosciences

NASA Astrophysics Data System (ADS)

Riedel, M.

2015-12-01

Machine learning, data mining, and statistical computing are common techniques for analysis in the earth sciences. This contribution focuses on two concrete and widely used data analytics methods suitable for analysing 'big data' in geoscience use cases: clustering and classification. From the broad class of available clustering methods we focus on the density-based spatial clustering of applications with noise (DBSCAN) algorithm, which enables the identification of outliers or interesting anomalies. A new open-source parallel and scalable DBSCAN implementation will be discussed in the light of a scientific use case that detects water mixing events in the Koljoefjords. The second technique we cover is classification, with a focus on the support vector machine (SVM) algorithm as one of the best out-of-the-box classification algorithms. A parallel and scalable SVM implementation will be discussed in the light of a scientific use case in the field of remote sensing with 52 different classes of land cover types.
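The following is a minimal serial sketch of how DBSCAN separates noise points (the outliers) from dense clusters. It is O(n²) pure Python, not the parallel implementation discussed in the abstract. The parameter names `eps` and `min_pts` follow common usage rather than any particular library.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    labels = [None] * n  # None = not yet visited

    def neighbours(i):
        # All points within eps of point i, including i itself
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neigh = neighbours(j)
            if len(j_neigh) >= min_pts:
                queue.extend(j_neigh)  # j is a core point, so keep expanding
    return labels

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Points labelled -1 belong to no dense region. These are the candidates that DBSCAN-based analyses treat as outliers or anomalies.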
A clustering algorithm for sample data based on environmental pollution characteristics

NASA Astrophysics Data System (ADS)

Chen, Mei; Wang, Pengfei; Chen, Qiang; Wu, Jiadong; Chen, Xiaoyun

2015-04-01

Environmental pollution has become an issue of serious international concern in recent years. Among the receptor-oriented pollution models, CMB, PMF, UNMIX, and PCA are widely used as source apportionment models. To improve the accuracy of source apportionment and to classify the sample data for these models, this study proposes an easy-to-use, high-dimensional EPC algorithm. The algorithm organizes all of the sample data into different groups according to similarities in pollution characteristics, such as pollution sources and concentrations, and simultaneously detects outliers. The main clustering process consists of three steps:
1. Select the first unlabelled point as a cluster centre.
2. In each iteration, assign each data point in the sample dataset to its most similar cluster centre, according to both the user-defined threshold and the value of the similarity function.
3. Modify the clusters using a method similar to k-Means.
The validity and accuracy of the algorithm were tested on both real and synthetic datasets. The results indicate that the EPC algorithm is practical and effective for classifying sample data for source apportionment models, and helpful for better understanding and interpreting the sources of pollution.
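The clustering steps described above resemble a leader-style pass followed by k-means refinement. The sketch below is a hypothetical illustration under that assumption, not the published EPC algorithm:

- Euclidean distance stands in for the paper's similarity function.
- `threshold` plays the role of the user-defined cut-off.
- Clusters with fewer than `min_size` members are reported as outliers.

```python
import math

def leader_kmeans(points, threshold, min_size=2, iters=10):
    """Hypothetical threshold clustering with k-means-style refinement.

    Returns (labels, centres); label -1 marks points treated as outliers.
    """
    # Leader pass: a point farther than `threshold` from every existing
    # centre becomes a new cluster centre.
    centres = []
    for p in points:
        if all(math.dist(p, c) > threshold for c in centres):
            centres.append(tuple(p))

    # Refinement: reassign points to the nearest centre, then move each
    # centre to the mean of its members, until the centres stop changing.
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(len(centres)), key=lambda k: math.dist(p, centres[k]))
                  for p in points]
        new_centres = []
        for k, c in enumerate(centres):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centres.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centres.append(c)  # leave empty clusters where they are
        if new_centres == centres:
            break
        centres = new_centres

    # Very small clusters are reported as outliers.
    sizes = [labels.count(k) for k in range(len(centres))]
    return [lab if sizes[lab] >= min_size else -1 for lab in labels], centres

pts = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (20, 20)]
labels, centres = leader_kmeans(pts, threshold=2.0)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```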
Multiple approaches to detect outliers in a genome scan for selection in ocellated lizards (Lacerta lepida) along an environmental gradient.

PubMed

Nunes, Vera L; Beaumont, Mark A; Butlin, Roger K; Paulo, Octávio S

2011-01-01

Identification of loci with adaptive importance is a key step toward understanding the speciation process in natural populations, because those loci are responsible for phenotypic variation that affects fitness in different environments. We conducted an AFLP genome scan in populations of ocellated lizards (Lacerta lepida) to search for candidate loci influenced by selection along an environmental gradient in the Iberian Peninsula. This gradient is strongly influenced by climatic variables, and two subspecies can be recognized at the opposite extremes: L. lepida iberica in the northwest and L. lepida nevadensis in the southeast. Both subspecies show substantial morphological differences that may be involved in their local adaptation to the climatic extremes. To investigate how the choice of a particular outlier detection method can influence the results, a frequentist method, DFDIST, and a Bayesian method, BayeScan, were used to search for outliers influenced by selection. Additionally, the spatial analysis method was used to test for associations of AFLP marker band frequencies with 54 climatic variables by logistic regression. Results obtained with each method highlight differences in their sensitivity. DFDIST and BayeScan detected a similar proportion of outliers (3-4%), but only a few loci were simultaneously detected by both methods. Several loci detected as outliers were also associated with temperature, insolation or precipitation according to the spatial analysis method. These results are in accordance with reported data in the literature about morphological and life-history variation of L. lepida subspecies along the environmental gradient. © 2010 Blackwell Publishing Ltd.

Unsupervised Scalable Statistical Method for Identifying Influential Users in Online Social Networks.

PubMed

Azcorra, A; Chiroque, L F; Cuevas, R; Fernández Anta, A; Laniado, H; Lillo, R E; Romo, J; Sguera, C

2018-05-03

Billions of users interact intensively every day via Online Social Networks (OSNs) such as Facebook, Twitter, or Google+. This makes OSNs an invaluable source of information, and channel of actuation, for sectors like advertising, marketing, or politics. To get the most out of OSNs, analysts need to identify influential users who can be leveraged for promoting products, distributing messages, or improving the image of companies. In this report we propose a new unsupervised method based on outlier detection, Massive Unsupervised Outlier Detection (MUOD), to support the identification of influential users. MUOD is scalable and can hence be used in large OSNs. Moreover, it labels outliers as shape, magnitude, or amplitude outliers, depending on their features. This allows the outlier users to be classified into multiple classes, which are likely to include different types of influential users. MUOD was applied to a subset of roughly 400 million Google+ users. It automatically identified and discriminated sets of outlier users whose features are associated with different definitions of influential users, such as the capacity to attract engagement, the capacity to attract a large number of followers, or high infection capacity.
System and Method for Outlier Detection via Estimating Clusters

NASA Technical Reports Server (NTRS)

Iverson, David J. (Inventor)

2016-01-01

An efficient method and system for real-time or offline analysis of multivariate sensor data for use in anomaly detection, fault detection, and system health monitoring is provided. Models automatically derived from training data, typically nominal system data acquired from sensors in normally operating conditions or from detailed simulations, are used to identify unusual, out of family data samples (outliers) that indicate possible system failure or degradation. Outliers are determined through analyzing a degree of deviation of current system behavior from the models formed from the nominal system data. The deviation of current system behavior is presented as an easy to interpret numerical score along with a measure of the relative contribution of each system parameter to any off-nominal deviation. The techniques described herein may also be used to "clean" the training data.
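The following is an illustrative sketch only, not the patented method. It assumes each nominal cluster is represented as an axis-aligned box of per-parameter minimum and maximum values learned from training data:

- The deviation score is the distance from a new sample to the nearest box.
- Each parameter's share of the squared distance gives its relative contribution.

The box representation and the name `box_deviation` are assumptions. Parameters are assumed to be pre-scaled to comparable units.

```python
import math

def box_deviation(sample, boxes):
    """Distance from `sample` to the nearest nominal cluster box, plus per-parameter shares.

    boxes: list of (lows, highs) pairs, one per nominal cluster (hypothetical model).
    Returns (score, contributions); contributions sum to 1 when score > 0.
    """
    best = None
    for lows, highs in boxes:
        # Per-parameter gap is zero when the value lies inside [low, high].
        gaps = [max(lo - x, 0.0, x - hi) for x, lo, hi in zip(sample, lows, highs)]
        dist = math.sqrt(sum(g * g for g in gaps))
        if best is None or dist < best[0]:
            best = (dist, gaps)
    score, gaps = best
    total = score * score
    contributions = [g * g / total if total > 0 else 0.0 for g in gaps]
    return score, contributions

nominal = [((0, 0), (1, 1)), ((5, 5), (6, 6))]
print(box_deviation((0.5, 0.5), nominal))  # (0.0, [0.0, 0.0]): inside a nominal box
print(box_deviation((0.5, 4.0), nominal))  # (3.0, [0.0, 1.0]): parameter 2 is off-nominal
```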
Identifying Outliers of Non-Gaussian Groundwater State Data Based on Ensemble Estimation for Long-Term Trends

NASA Astrophysics Data System (ADS)

Park, E.; Jeong, J.; Choi, J.; Han, W. S.; Yun, S. T.

2016-12-01

Three modified outlier identification methods that take advantage of the ensemble regression method are proposed: the three-sigma rule (3σ), the interquartile range (IQR), and the median absolute deviation (MAD). For validation purposes, the performance of the methods is compared using simulated and actual groundwater data under a few hypothetical conditions. In the validations using simulated data, all of the proposed methods reasonably identify outliers at a 5% outlier level, whereas only the IQR method performs well at a 30% outlier level. When the methods are applied to real groundwater data, the IQR method is found to identify outliers better than the other two methods. However, the IQR method tends to falsely flag an excessive number of outliers. This limitation may be mitigated by applying it jointly with the other methods (i.e., the 3σ rule and MAD). The proposed methods can also be applied as a potential tool for future anomaly detection by model training based on currently available data.
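The three rules can be sketched on detrended residuals as below. The ensemble regression used in the study is not reproduced: the residuals are assumed to come from some trend model. The cut-offs (3σ, 1.5 × IQR, 3 × normal-scaled MAD) are conventional choices assumed here, not the study's settings.

```python
import statistics

def three_sigma(res):
    """Flag residuals more than 3 standard deviations from the mean."""
    mu, sd = statistics.mean(res), statistics.stdev(res)
    return [abs(r - mu) > 3 * sd for r in res]

def iqr_rule(res, k=1.5):
    """Flag residuals outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(res, n=4)
    iqr = q3 - q1
    return [r < q1 - k * iqr or r > q3 + k * iqr for r in res]

def mad_rule(res, k=3.0):
    """Flag residuals more than k scaled MADs from the median."""
    med = statistics.median(res)
    # 1.4826 rescales the MAD to match the standard deviation under normality
    mad = 1.4826 * statistics.median(abs(r - med) for r in res)
    return [abs(r - med) > k * mad for r in res]

residuals = [0.1, -0.2, 0.05, 0.0, -0.1, 0.15, -0.05, 0.2, -0.15, 5.0]
for rule in (three_sigma, iqr_rule, mad_rule):
    print(rule.__name__, [i for i, flag in enumerate(rule(residuals)) if flag])
```

In this toy series, the single spike inflates the standard deviation so much that the 3σ rule misses it. The IQR and MAD rules, which rely on robust location and scale estimates, both flag it.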
AUTOCLASSIFICATION OF THE VARIABLE 3XMM SOURCES USING THE RANDOM FOREST MACHINE LEARNING ALGORITHM

DOE Office of Scientific and Technical Information (OSTI.GOV)

Farrell, Sean A.; Murphy, Tara; Lo, Kitty K., E-mail: s.farrell@physics.usyd.edu.au

In the current era of large surveys and massive data sets, autoclassification of astrophysical sources using intelligent algorithms is becoming increasingly important. In this paper we present the catalog of variable sources in the Third XMM-Newton Serendipitous Source catalog (3XMM) autoclassified using the Random Forest machine learning algorithm. We used a sample of manually classified variable sources from the second data release of the XMM-Newton catalogs (2XMMi-DR2) to train the classifier, obtaining an accuracy of ∼92%. We also evaluated the effectiveness of identifying spurious detections using a sample of spurious sources, achieving an accuracy of ∼95%. Manual investigation of a random sample of classified sources confirmed these accuracy levels and showed that the Random Forest machine learning algorithm is highly effective at automatically classifying 3XMM sources. Here we present the catalog of classified 3XMM variable sources. We also present three previously unidentified unusual sources that were flagged as outlier sources by the algorithm: a new candidate supergiant fast X-ray transient, a 400 s X-ray pulsar, and an eclipsing 5 hr binary system coincident with a known Cepheid.
Improving the Space Surveillance Telescope's Performance Using Multi-Hypothesis Testing

NASA Astrophysics Data System (ADS)

Zingarelli, J. Chris; Pearce, Eric; Lambour, Richard; Blake, Travis; Peterson, Curtis J. R.; Cain, Stephen

2014-05-01

The Space Surveillance Telescope (SST) is a Defense Advanced Research Projects Agency program designed to detect objects in space, such as near-Earth asteroids and space debris in the geosynchronous Earth orbit (GEO) belt. Binary hypothesis test (BHT) methods have historically been used to facilitate the detection of new objects in space. In this paper a multi-hypothesis detection strategy is introduced to improve the detection performance of SST. In this context, multi-hypothesis testing (MHT) determines whether an unresolvable point source lies in the center, a corner, or a side of a pixel. BHT, in contrast, only tests whether an object is in the pixel or not. The images recorded by SST are undersampled, which causes aliasing and degrades the performance of traditional detection schemes. The equations for the MHT are derived in terms of signal-to-noise ratio (S/N). The S/N is computed by subtracting the background light level around the pixel being tested and dividing by the standard deviation of the noise. A new method for determining the local noise statistics that rejects outliers is introduced in combination with the MHT. An experiment using observations of a known GEO satellite is used to demonstrate the improved detection performance of the new algorithm over algorithms previously reported in the literature. The results show a significant improvement in the probability of detection, of as much as 50% over existing algorithms. In addition to detection, the S/N results prove to be linearly related to the least-squares estimates of point source irradiance, thus improving photometric accuracy. The views expressed are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government.
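The S/N computation described above can be sketched as follows. The abstract does not detail the paper's outlier-rejecting noise estimator, so iterative sigma clipping is used here as a stand-in. The square-annulus geometry and the names `inner`, `outer` and `pixel_snr` are hypothetical.

```python
import statistics

def sigma_clip(values, k=3.0, max_iter=5):
    """Iteratively drop values more than k standard deviations from the mean."""
    kept = list(values)
    for _ in range(max_iter):
        mu, sd = statistics.mean(kept), statistics.pstdev(kept)
        survivors = [v for v in kept if abs(v - mu) <= k * sd]
        if len(survivors) == len(kept):
            break
        kept = survivors
    return kept

def pixel_snr(image, row, col, inner=1, outer=3, k=3.0):
    """S/N of one pixel against a sigma-clipped background from a square annulus.

    Pixels whose Chebyshev distance from (row, col) lies in (inner, outer] form
    the background sample; assumes the clipped background has non-zero spread.
    """
    ring = []
    for r in range(row - outer, row + outer + 1):
        for c in range(col - outer, col + outer + 1):
            inside = 0 <= r < len(image) and 0 <= c < len(image[0])
            if inside and max(abs(r - row), abs(c - col)) > inner:
                ring.append(image[r][c])
    background = sigma_clip(ring, k)
    return (image[row][col] - statistics.mean(background)) / statistics.pstdev(background)

# Checkerboard background (10/12), a faint source at the centre, a bright star nearby
img = [[10.0 if (r + c) % 2 == 0 else 12.0 for c in range(9)] for r in range(9)]
img[4][4] = 21.0
img[1][4] = 1000.0
print(pixel_snr(img, 4, 4))  # roughly 10 once the star is clipped from the background
```

Without the clipping step, the bright star in the annulus would dominate the background mean and spread, driving the computed S/N of the faint source below zero.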
Detection of Outliers in TWSTFT Data Used in TAI

DTIC Science & Technology

2009-11-01

41st Annual Precise Time and Time Interval (PTTI) Meeting. DETECTION OF OUTLIERS IN TWSTFT DATA USED IN TAI A...data in two-way satellite time and frequency transfer (TWSTFT) time links. In the case of TWSTFT data used to calculate International Atomic Time...data; that TWSTFT links can show an underlying slope which renders the standard treatment more difficult. Using phase and frequency filtering
Robust prediction of protein subcellular localization combining PCA and WSVMs.

PubMed

Tian, Jiang; Gu, Hong; Liu, Wenqi; Gao, Chiyang

2011-08-01

Automated prediction of protein subcellular localization is an important tool for genome annotation and drug discovery, and Support Vector Machines (SVMs) can effectively solve this problem in a supervised manner. However, datasets obtained from real experiments are likely to contain outliers or noise, which can degrade generalization ability and classification accuracy. To address this problem, we adopt strategies to lower the effect of outliers. First, we design a method based on Weighted SVMs (WSVMs). Different weights are assigned to different data points, so that the training algorithm learns the decision boundary according to the relative importance of the data points. Second, we analyse the influence of Principal Component Analysis (PCA) on WSVM classification and propose a hybrid classifier that combines the merits of both PCA and WSVM. After dimension reduction is performed on the datasets, the kernel-based possibilistic c-means algorithm can generate more suitable weights for training. This is because PCA transforms the data into a new coordinate system whose directions of largest variance are strongly affected by the outliers. Experiments on benchmark datasets show promising results, which confirm the effectiveness of the proposed method in terms of prediction accuracy. Copyright © 2011 Elsevier Ltd. All rights reserved.
PCA leverage: outlier detection for high-dimensional functional magnetic resonance imaging data.

PubMed

Mejia, Amanda F; Nebel, Mary Beth; Eloyan, Ani; Caffo, Brian; Lindquist, Martin A

2017-07-01

Outlier detection for high-dimensional (HD) data is a popular topic in modern statistical research. However, one source of HD data that has received relatively little attention is functional magnetic resonance images (fMRI), which consist of hundreds of thousands of measurements sampled at hundreds of time points. The availability of fMRI data is rapidly growing, primarily through large, publicly available grassroots datasets, so automated quality control and outlier detection methods are greatly needed. We propose principal components analysis (PCA) leverage and demonstrate how it can be used to identify outlying time points in an fMRI run. Furthermore, PCA leverage is a measure of the influence of each observation on the estimation of principal components, which are often of interest in fMRI data. We also propose an alternative measure, PCA robust distance, which is less sensitive to outliers and has controllable statistical properties. The proposed methods are validated through simulation studies and are shown to be highly accurate. We also conduct a reliability study using resting-state fMRI data from the Autism Brain Imaging Data Exchange. Removal of outliers using the proposed methods results in more reliable estimation of subject-level resting-state networks using independent components analysis. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
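PCA leverage can be illustrated for the first principal component with a pure-Python sketch. It centres the columns, finds the leading loading vector by power iteration, and takes each time point's squared normalized score as its leverage. Restricting to one component and using power iteration are simplifications made here. In practice one would use an SVD routine and keep several components.

```python
import math

def pca_leverage_pc1(X, iters=100):
    """Leverage of each row (time point) on the first principal component.

    X is a list of rows, one per time point, each with p measurements.
    The leverage of row t is u_t**2, where u is the first left singular
    vector of the column-centred data; the leverages sum to 1.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]

    # Power iteration on Xc^T Xc for the leading loading vector v.
    v = [1.0 / math.sqrt(p)] * p  # assumes this start is not orthogonal to PC1
    for _ in range(iters):
        scores = [sum(r[j] * v[j] for j in range(p)) for r in Xc]
        w = [sum(Xc[t][j] * scores[t] for t in range(n)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    scores = [sum(r[j] * v[j] for j in range(p)) for r in Xc]
    total = sum(s * s for s in scores)
    return [s * s / total for s in scores]

X = [[1, 2], [2, 1], [1.5, 1.5], [2, 2], [1, 1], [10, 10]]
print(pca_leverage_pc1(X))  # the last time point carries most of the leverage
```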
Applying robust variant of Principal Component Analysis as a damage detector in the presence of outliers

NASA Astrophysics Data System (ADS)

Gharibnezhad, Fahit; Mujica, Luis E.; Rodellar, José

2015-01-01

Using Principal Component Analysis (PCA) for Structural Health Monitoring (SHM) has received considerable attention over the past few years. PCA has been used not only as a direct method to identify, classify and localize damage but also as an important preliminary step for other methods. Despite its many advantages, PCA is very sensitive to outliers. Outliers are anomalous observations that can affect the variance and covariance, which are vital parts of the PCA method. Therefore, results based on PCA in the presence of outliers are not fully satisfactory. As its main contribution, this work suggests the use of a robust variant of PCA, insensitive to outliers, as an effective way to deal with this problem in the SHM field. In addition, robust PCA is compared with classical PCA in terms of detecting probable damage. The comparison shows that robust PCA distinguishes damage much better than classical PCA. In many cases it even allows detection where classical PCA cannot discern between damaged and undamaged structures. Moreover, different types of robust PCA are compared with each other, as well as with their classical counterpart, in terms of damage detection. All results are obtained through experiments on an aircraft turbine blade with simulated damage added, using piezoelectric transducers as sensors and actuators.
Robustly Aligning a Shape Model and Its Application to Car Alignment of Unknown Pose.

PubMed

Li, Yan; Gu, Leon; Kanade, Takeo

2011-09-01

Precisely localizing in an image the set of feature points that form the shape of an object, such as a car or a face, is called alignment. Previous shape alignment methods attempted to fit a whole shape model to the observed data, based on the assumption of Gaussian observation noise and the associated regularization process. Such an approach can deal with Gaussian noise in feature detection, but it turns out to be neither robust nor precise. It is vulnerable to gross feature detection errors, or outliers, resulting from partial occlusions or from spurious features of the background or neighboring objects. We address this problem by adopting a randomized hypothesis-and-test approach. First, a Bayesian inference algorithm is developed to generate a shape-and-pose hypothesis of the object from a partial shape or a subset of feature points. For alignment, a large number of hypotheses are generated by randomly sampling subsets of feature points, and then evaluated to find the one that minimizes the shape prediction error. This method of randomized subset-based matching can effectively handle outliers and recover the correct object shape. We apply this approach to a challenging data set of over 5,000 car images in different poses, spanning a wide variety of car types, lighting, background scenes, and partial occlusions. Experimental results demonstrate favorable improvements over previous methods in both accuracy and robustness.
Optimization of seasonal ARIMA models using differential evolution - simulated annealing (DESA) algorithm in forecasting dengue cases in Baguio City

NASA Astrophysics Data System (ADS)

Addawe, Rizavel C.; Addawe, Joel M.; Magadia, Joselito C.

2016-10-01

Accurate forecasting of dengue cases would significantly improve epidemic prevention and control capabilities. This paper attempts to provide useful models in forecasting dengue epidemic specific to the young and adult population of Baguio City. To capture the seasonal variations in dengue incidence, this paper develops a robust modeling approach to identify and estimate seasonal autoregressive integrated moving average (SARIMA) models in the presence of additive outliers. Since the least squares estimators are not robust in the presence of outliers, we suggest a robust estimation based on winsorized and reweighted least squares estimators. A hybrid algorithm, Differential Evolution - Simulated Annealing (DESA), is used to identify and estimate the parameters of the optimal SARIMA model. The method is applied to the monthly reported dengue cases in Baguio City, Philippines.
A quick method based on SIMPLISMA-KPLS for simultaneously selecting outlier samples and informative samples for model standardization in near infrared spectroscopy

NASA Astrophysics Data System (ADS)

Li, Li-Na; Ma, Chang-Ming; Chang, Ming; Zhang, Ren-Cheng

2017-12-01

A novel method based on SIMPLe-to-use Interactive Self-modeling Mixture Analysis (SIMPLISMA) and Kernel Partial Least Squares (KPLS), named SIMPLISMA-KPLS, is proposed in this paper for selecting outlier samples and informative samples simultaneously. It is a fast algorithm for model standardization (also called model transfer) in near infrared (NIR) spectroscopy. NIR data from corn samples, used for analysis of protein content, are used to evaluate the proposed method. Piecewise direct standardization (PDS) is employed for model transfer. SIMPLISMA-PDS-KPLS and KS-PDS-KPLS are compared in terms of the prediction accuracy of protein content and the calculation speed of each algorithm. The conclusions are that SIMPLISMA-KPLS can be used as an alternative sample selection method for model transfer. Its accuracy is similar to that of Kennard-Stone (KS), but unlike KS it uses concentration information in the selection procedure. This ensures that analyte information is involved in the analysis and that the spectra (X) of the selected samples are related to the concentration (y). It can also be used simultaneously for outlier sample elimination by validation of calibration. According to the running-time statistics, the sample selection process is faster when using KPLS. The fast SIMPLISMA-KPLS algorithm is beneficial for improving the speed of online measurement using NIR spectroscopy.
A Novel Real-Time Reference Key Frame Scan Matching Method

PubMed Central

Mohamed, Haytham; Moussa, Adel; Elhabiby, Mohamed; El-Sheimy, Naser; Sesay, Abu

2017-01-01

Unmanned aerial vehicles (UAVs) represent an effective technology for indoor search and rescue operations. Typically, the environments of indoor missions are unknown, unstructured, and/or dynamic. Navigation of UAVs in such environments is addressed by simultaneous localization and mapping, using either local or global approaches. Both approaches suffer from accumulated errors and high processing time due to the iterative nature of the scan matching method. Moreover, point-to-point scan matching is prone to outlier associations. This paper proposes a low-cost novel method for 2D real-time scan matching based on a reference key frame (RKF). RKF is a hybrid scan matching technique that combines feature-to-feature and point-to-point approaches. The algorithm aims to mitigate error accumulation using the key frame technique, which is inspired by the video streaming broadcast process. It relies on the iterative closest point algorithm when linear features are lacking, as is typical of unstructured environments, and switches back to RKF once linear features are detected. To validate and evaluate the algorithm, its mapping performance and time consumption are compared with those of various algorithms in static and dynamic environments. The algorithm exhibits promising navigation and mapping results and very short computational time, which indicates its potential for use in real-time systems. PMID:28481285
Identifying outliers of non-Gaussian groundwater state data based on ensemble estimation for long-term trends

NASA Astrophysics Data System (ADS)

Jeong, Jina; Park, Eungyu; Han, Weon Shik; Kim, Kueyoung; Choung, Sungwook; Chung, Il Moon

2017-05-01

A hydrogeological dataset often includes substantial deviations that need to be inspected. In the present study, three outlier identification methods are proposed that take advantage of the ensemble regression method and consider the non-Gaussian characteristics of groundwater data: the three-sigma rule (3σ), the interquartile range (IQR), and the median absolute deviation (MAD). For validation purposes, the performance of the methods is compared using simulated and actual groundwater data under a few hypothetical conditions. In the validations using simulated data, all of the proposed methods reasonably identify outliers at a 5% outlier level, whereas only the IQR method performs well at a 30% outlier level. When the methods are applied to real groundwater data, the IQR method is found to identify outliers better than the other two methods. However, the IQR method has the limitation of identifying excessive false outliers. This may be overcome by applying it jointly with other methods (for example, the 3σ rule and MAD methods). The proposed methods can also be applied as potential tools for the detection of future anomalies by model training based on currently available data.
Micro- and macro-geographic scale effect on the molecular imprint of selection and adaptation in Norway spruce.

PubMed

Scalfi, Marta; Mosca, Elena; Di Pierro, Erica Adele; Troggio, Michela; Vendramin, Giovanni Giuseppe; Sperisen, Christoph; La Porta, Nicola; Neale, David B

2014-01-01

Forest tree species of temperate and boreal regions have undergone a long history of demographic changes and evolutionary adaptations. The main objective of this study was to detect signals of selection in Norway spruce (Picea abies [L.] Karst), at different sampling-scales and to investigate, accounting for population structure, the effect of environment on species genetic diversity. A total of 384 single nucleotide polymorphisms (SNPs) representing 290 genes were genotyped at two geographic scales: across 12 populations distributed along two altitudinal-transects in the Alps (micro-geographic scale), and across 27 populations belonging to the range of Norway spruce in central and south-east Europe (macro-geographic scale). At the macrogeographic scale, principal component analysis combined with Bayesian clustering revealed three major clusters, corresponding to the main areas of southern spruce occurrence, i.e. the Alps, Carpathians, and Hercynia. The populations along the altitudinal transects were not differentiated. To assess the role of selection in structuring genetic variation, we applied a Bayesian and coalescent-based F(ST)-outlier method and tested for correlations between allele frequencies and climatic variables using regression analyses. At the macro-geographic scale, the F(ST)-outlier methods detected together 11 F(ST)-outliers. Six outliers were detected when the same analyses were carried out taking into account the genetic structure. Regression analyses with population structure correction resulted in the identification of two (micro-geographic scale) and 38 SNPs (macro-geographic scale) significantly correlated with temperature and/or precipitation. Six of these loci overlapped with F(ST)-outliers, among them two loci encoding an enzyme involved in riboflavin biosynthesis and a sucrose synthase. The results of this study indicate a strong relationship between genetic and environmental variation at both geographic scales. 
It also suggests that an integrative approach combining different outlier detection methods and population sampling at different geographic scales is useful to identify loci potentially involved in adaptation.
Micro- and Macro-Geographic Scale Effect on the Molecular Imprint of Selection and Adaptation in Norway Spruce

PubMed Central

Scalfi, Marta; Mosca, Elena; Di Pierro, Erica Adele; Troggio, Michela; Vendramin, Giovanni Giuseppe; Sperisen, Christoph; La Porta, Nicola; Neale, David B.

2014-01-01

Forest tree species of temperate and boreal regions have undergone a long history of demographic changes and evolutionary adaptations. The main objective of this study was to detect signals of selection in Norway spruce (Picea abies [L.] Karst), at different sampling-scales and to investigate, accounting for population structure, the effect of environment on species genetic diversity. A total of 384 single nucleotide polymorphisms (SNPs) representing 290 genes were genotyped at two geographic scales: across 12 populations distributed along two altitudinal-transects in the Alps (micro-geographic scale), and across 27 populations belonging to the range of Norway spruce in central and south-east Europe (macro-geographic scale). At the macrogeographic scale, principal component analysis combined with Bayesian clustering revealed three major clusters, corresponding to the main areas of southern spruce occurrence, i.e. the Alps, Carpathians, and Hercynia. The populations along the altitudinal transects were not differentiated. To assess the role of selection in structuring genetic variation, we applied a Bayesian and coalescent-based F(ST)-outlier method and tested for correlations between allele frequencies and climatic variables using regression analyses. At the macro-geographic scale, the F(ST)-outlier methods detected together 11 F(ST)-outliers. Six outliers were detected when the same analyses were carried out taking into account the genetic structure. Regression analyses with population structure correction resulted in the identification of two (micro-geographic scale) and 38 SNPs (macro-geographic scale) significantly correlated with temperature and/or precipitation. Six of these loci overlapped with F(ST)-outliers, among them two loci encoding an enzyme involved in riboflavin biosynthesis and a sucrose synthase. The results of this study indicate a strong relationship between genetic and environmental variation at both geographic scales. 
It also suggests that an integrative approach combining different outlier detection methods and population sampling at different geographic scales is useful to identify loci potentially involved in adaptation. PMID:25551624
Track-Before-Detect Algorithm for Faint Moving Objects based on Random Sampling and Consensus

NASA Astrophysics Data System (ADS)

Dao, P.; Rast, R.; Schlaegel, W.; Schmidt, V.; Dentamaro, A.

2014-09-01

There are many algorithms developed for tracking and detecting faint moving objects in congested backgrounds. One obvious application is detection of targets in images where each pixel corresponds to the received power in a particular location. In our application, a visible imager operated in stare mode observes geostationary objects as fixed, stars as moving and non-geostationary objects as drifting in the field of view. Our goal is high-sensitivity detection of the drifters. Track-before-detect (TBD) processing collects and collates target information before the detection decision is made. Its ability to improve SNR allows respectable performance against dim moving objects. Generally, a TBD algorithm consists of a pre-processing stage that highlights potential targets and a temporal filtering stage. However, the algorithms that have been successfully demonstrated, e.g. Viterbi-based and Bayesian-based, demand formidable processing power and memory. We propose an algorithm based on an iterative method called "RANdom SAmple Consensus" (RANSAC) that can run in real time on a typical PC. It exploits the quasi-constant velocity of objects, the predictability of the stellar clutter and the intrinsically low false alarm rate of detecting signature candidates in 3-D. The technique is tailored for searching objects with small telescopes in stare mode. Our RANSAC-MT (Moving Target) algorithm estimates parameters of a mathematical model (e.g., linear motion) from a set of observed data that contains a significant number of outliers, while identifying inliers. In the pre-processing phase, candidate blobs were selected based on morphology and an intensity threshold that would normally generate an unacceptable level of false alarms. The RANSAC sampling rejects candidates that conform to the predictable motion of the stars. 
Data collected with a 17-inch telescope by AFRL/RH and a COTS lens/EM-CCD sensor by the AFRL/RD Satellite Assessment Center are used to assess the performance of the algorithm. In the second application, a visible imager operated in sidereal mode observes geostationary objects as moving, stars as fixed except for field rotation, and non-geostationary objects as drifting. RANSAC-MT is used to detect the drifter. In this set of data, the drifting space object was detected at a distance of 13,800 km. The AFRL/RH set of data, collected in the stare mode, contained the signatures of two geostationary satellites. The signature of a moving object was simulated and added to the sequence of frames to determine the sensitivity in magnitude. The performance compares well with the more computationally intensive TBD algorithms reported in the literature.
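The core RANSAC idea behind RANSAC-MT can be sketched for a constant-velocity (linear motion) model:

1. Fit a track to two randomly chosen detections from different frames.
2. Count the detections that lie within a tolerance of that track.
3. Repeat, keeping the model with the most supporting detections.

This is a generic RANSAC sketch under these assumptions. The pre-processing, star-motion rejection and 3-D candidate handling of RANSAC-MT are not reproduced, and the name `ransac_linear_track` is hypothetical.

```python
import math
import random

def ransac_linear_track(detections, tol=1.0, n_iter=200, seed=0):
    """Find the constant-velocity track supported by the most detections.

    detections: list of (t, x, y) candidate positions from different frames.
    Returns (model, inlier_indices) where model = (x0, vx, y0, vy).
    """
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(n_iter):
        (t1, x1, y1), (t2, x2, y2) = rng.sample(detections, 2)
        if t1 == t2:
            continue  # two detections from one frame cannot define a velocity
        vx, vy = (x2 - x1) / (t2 - t1), (y2 - y1) / (t2 - t1)
        x0, y0 = x1 - vx * t1, y1 - vy * t1
        inliers = [i for i, (t, x, y) in enumerate(detections)
                   if math.hypot(x - (x0 + vx * t), y - (y0 + vy * t)) <= tol]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (x0, vx, y0, vy), inliers
    return best_model, best_inliers

# A drifter moving at (1.5, -0.5) pixels per frame, plus scattered clutter
track = [(t, 2 + 1.5 * t, 5 - 0.5 * t) for t in range(10)]
clutter = [(0, 50, 50), (3, -20, 7), (5, 30, -10), (7, 0, 40), (2, 100, 3), (8, -5, -30)]
model, inliers = ransac_linear_track(track + clutter)
print(model, inliers)
```

The clutter points do not lie on any common straight-line track. The consensus set is therefore the ten drifter detections, and the recovered velocity matches the simulated one.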
A Fast Algorithm of Convex Hull Vertices Selection for Online Classification.

PubMed

Ding, Shuguang; Nie, Xiangli; Qiao, Hong; Zhang, Bo

2018-04-01

Reducing samples through convex hull vertices selection (CHVS) within each class is an important and effective method for online classification problems, since the classifier can be trained rapidly with the selected samples. However, the process of CHVS is NP-hard. In this paper, we propose a fast algorithm to select the convex hull vertices, based on convex hull decomposition and the properties of projection. In the proposed algorithm, the quadratic minimization problem of computing the distance between a point and a convex hull is converted into a linear equation problem with low computational complexity. When the data dimension is high, an approximate, rather than exact, convex hull can be selected by setting an appropriate termination condition, so that more unimportant samples are deleted. In addition, the impact of outliers is considered, and the proposed algorithm is improved by deleting the outliers in the initial procedure. Furthermore, a dimension convention technique via the kernel trick is used to deal with nonlinearly separable problems. An upper bound is theoretically proved for the difference between the support vector machine trained on the selected approximate convex hull vertices and the one trained on all the training samples. Experimental results on both synthetic and real data sets show the effectiveness and validity of the proposed algorithm.
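The underlying sample-reduction idea (keep only the convex hull vertices of each class, since interior points do not determine the separating margin) can be illustrated in two dimensions. This sketch swaps in Andrew's monotone-chain algorithm for an exact 2-D hull; it is not the paper's fast projection-based CHVS for high dimensions, and the helper names are hypothetical.

```python
def convex_hull_vertices(points):
    """Vertices of the 2-D convex hull (Andrew's monotone chain), counter-clockwise.

    Collinear points on hull edges are dropped, so only true vertices remain.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a left (counter-clockwise) turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def reduce_by_class(samples):
    """samples: dict label -> list of (x, y). Keep only each class's hull vertices."""
    return {label: convex_hull_vertices(pts) for label, pts in samples.items()}
```

A classifier such as an SVM would then be trained on the reduced per-class sets; an outlier lying outside a class's bulk would itself become a hull vertex, which is why the abstract removes outliers before selection.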
Assessing Class-Wide Consistency and Randomness in Responses to True or False Questions Administered Online

ERIC Educational Resources Information Center

Pawl, Andrew; Teodorescu, Raluca E.; Peterson, Joseph D.

2013-01-01

We have developed simple data-mining algorithms to assess the consistency and the randomness of student responses to problems consisting of multiple true or false statements. In this paper we describe the algorithms and use them to analyze data from introductory physics courses. We investigate statements that emerge as outliers because the class…
The effectiveness of robust RMCD control chart as outliers’ detector

NASA Astrophysics Data System (ADS)

Darmanto; Astutik, Suci

2017-12-01

A well-known control chart for monitoring a multivariate process is Hotelling's T² chart. When its parameters are estimated classically, it is very sensitive to outliers and is marred by their masking and swamping effects. To overcome this, robust estimators are strongly recommended. One such robust estimator is the re-weighted minimum covariance determinant (RMCD), which has the same robustness properties as the MCD. In this paper, effectiveness means the accuracy with which the RMCD control chart flags real outliers as outliers; in other words, how effectively the chart can identify outliers and remove their masking and swamping effects. We assessed the effectiveness of the robust control chart by simulation under different scenarios: sample size n, proportion of outliers, and number p of quality characteristics. We found that in some scenarios this RMCD robust control chart works effectively.
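A robust T² chart of this kind can be sketched with NumPy alone: a raw MCD estimate from random starts refined by C-steps, one reweighting step, then T² statistics against an upper control limit. This is an illustrative assumption, not the authors' simulation code: the chi-square quantiles use the Wilson–Hilferty approximation (to avoid a SciPy dependency), the consistency factor is approximate, and the chi-square UCL is a simplification of the exact limits used for estimated parameters. All function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def chi2_quantile(q, df):
    """Wilson-Hilferty approximation to the chi-square quantile."""
    z = NormalDist().inv_cdf(q)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * c ** 0.5) ** 3

def mahalanobis_sq(X, mean, cov):
    diff = X - mean
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)

def reweighted_mcd(X, n_starts=50, n_csteps=30, seed=0):
    """Raw MCD via random starts + C-steps, followed by one reweighting step."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    h = (n + p + 1) // 2                       # size of the presumed clean subset
    rng = np.random.default_rng(seed)
    best_det, best_idx = np.inf, None
    for _ in range(n_starts):
        idx = np.sort(rng.choice(n, size=h, replace=False))
        for _ in range(n_csteps):              # a C-step never increases det(cov)
            mean = X[idx].mean(axis=0)
            cov = np.atleast_2d(np.cov(X[idx], rowvar=False))
            new_idx = np.sort(np.argsort(mahalanobis_sq(X, mean, cov))[:h])
            if np.array_equal(new_idx, idx):
                break
            idx = new_idx
        det = np.linalg.det(np.atleast_2d(np.cov(X[idx], rowvar=False)))
        if det < best_det:
            best_det, best_idx = det, idx
    mean = X[best_idx].mean(axis=0)
    cov = np.atleast_2d(np.cov(X[best_idx], rowvar=False))
    d2 = mahalanobis_sq(X, mean, cov)
    cov = cov * np.median(d2) / chi2_quantile(0.5, p)   # approximate consistency factor
    keep = mahalanobis_sq(X, mean, cov) <= chi2_quantile(0.975, p)
    return X[keep].mean(axis=0), np.atleast_2d(np.cov(X[keep], rowvar=False))

def rmcd_t2_chart(X, alpha=0.025, **kwargs):
    """T^2 statistics from RMCD estimates, with a chi-square approximate UCL."""
    X = np.asarray(X, dtype=float)
    mean, cov = reweighted_mcd(X, **kwargs)
    t2 = mahalanobis_sq(X, mean, cov)
    ucl = chi2_quantile(1.0 - alpha, X.shape[1])
    return t2, ucl, t2 > ucl
```

Because the mean and covariance come from the clean majority, a cluster of outliers cannot inflate the covariance and hide itself (masking), nor shift the mean so that clean points look extreme (swamping).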

Iterative outlier removal: A method for identifying outliers in laboratory recalibration studies

PubMed Central

Parrinello, Christina M.; Grams, Morgan E.; Sang, Yingying; Couper, David; Wruck, Lisa M.; Li, Danni; Eckfeldt, John H.; Selvin, Elizabeth; Coresh, Josef

2016-01-01

Background: Extreme values that arise for any reason, including non-laboratory measurement procedure-related processes (inadequate mixing, evaporation, mislabeling), lead to outliers and inflate errors in recalibration studies. We present an approach termed iterative outlier removal (IOR) for identifying such outliers. Methods: We previously identified substantial laboratory drift in uric acid measurements in the Atherosclerosis Risk in Communities (ARIC) Study over time. Serum uric acid was originally measured in 1990–92 on a Coulter DACOS instrument using a uricase-based measurement procedure. To recalibrate the previously measured concentrations to a newer enzymatic colorimetric measurement procedure, uric acid was re-measured in 200 participants from stored plasma in 2011–13 on a Beckman Olympus 480 autoanalyzer. To conduct IOR, we excluded data points >3 standard deviations (SDs) from the mean difference, and repeated this process on the remaining data until no outliers remained. Results: IOR detected more outliers and yielded greater precision in simulation. The original mean difference (SD) in uric acid was 1.25 (0.62) mg/dL. After four iterations, 9 outliers were excluded, and the mean difference (SD) was 1.23 (0.45) mg/dL. Conducting only one round of outlier removal (the standard approach) would have excluded 4 outliers (mean difference [SD] = 1.22 [0.51] mg/dL). Applying the recalibration (derived from Deming regression) from each approach to the original measurements, the prevalence of hyperuricemia (>7 mg/dL) was 28.5% before IOR and 8.5% after IOR. Conclusion: IOR is a useful method for removing extreme outliers irrelevant to recalibrating laboratory measurements, and it identifies more extraneous outliers than the standard approach. PMID:27197675
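The IOR rule as stated (exclude points more than 3 SDs from the mean difference, recompute on the remaining data, and repeat until nothing is excluded) is short enough to sketch directly. This is a minimal illustration with a hypothetical function name; stopping after the first pass corresponds to the single-round standard approach.

```python
from statistics import mean, stdev

def iterative_outlier_removal(values, k=3.0):
    """Drop points more than k SDs from the mean, recompute, repeat until none remain.

    Returns (kept, removed). The SD is recomputed on the surviving data at
    each round, so outliers masked by a previously inflated SD are exposed.
    """
    kept, removed = list(values), []
    while len(kept) > 2:
        m, s = mean(kept), stdev(kept)
        outliers = [v for v in kept if abs(v - m) > k * s]
        if not outliers:
            break
        removed.extend(outliers)
        kept = [v for v in kept if abs(v - m) <= k * s]
    return kept, removed
```

For instance, with twenty differences between 1.1 and 1.3 plus one value of 5.0, the first round removes 5.0 and the second round finds nothing further to exclude.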
Investigating the Consistency of Stellar Evolution Models with Globular Cluster Observations via the Red Giant Branch Bump

NASA Astrophysics Data System (ADS)

Joyce, Meridith; Chaboyer, Brian

2016-01-01

Synthetic Red Giant Branch Bump (RGBB) magnitudes are generated with the most recent theoretical stellar evolution models computed with the Dartmouth Stellar Evolution Program (DSEP) code. They are compared to the observational work of Nataf et al. (2013), who present RGBB magnitudes for 72 globular clusters. A DSEP model using a chemical composition with enhanced α capture, [α/Fe] = +0.4, and an age of 13 Gyr agrees with observations over metallicities from [Fe/H] = 0 to [Fe/H] ≈ -1.5, with discrepancies emerging at lower metallicities. A model-independent, density-based outlier detection routine known as the Local Outlier Factor (LOF) algorithm is applied to the observations to identify the clusters that deviate most in magnitude-metallicity space from the bulk of the observations. Our model's fit is scrutinized with a series of χ² routines performed on subsets of the data from which highly anomalous clusters have been selectively removed based on LOF identification. In particular, NGC 6254, 6681, 6218, and 1904 are recurrently tagged as outliers. The effects of systematic and non-systematic errors in metallicity are assessed, and the robustness of the observational error bars is investigated.
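The Local Outlier Factor compares each point's local reachability density with that of its k nearest neighbours; values near 1 indicate a point as dense as its surroundings, and values well above 1 indicate a sparser, outlying point. The following NumPy sketch implements the standard LOF definition (Breunig et al.); the choice of `k` and the use of plain Euclidean distance in a 2-D feature space such as magnitude-metallicity are assumptions, not the settings used in the study. It assumes no duplicate points (duplicates can make a reachability distance zero).

```python
import numpy as np

def local_outlier_factor(X, k=5):
    """Standard LOF scores for the rows of X (O(n^2) memory; fine for small n)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(d, np.inf)                 # a point is not its own neighbour
    nbrs = np.argsort(d, axis=1)[:, :k]         # indices of the k nearest neighbours
    k_dist = d[np.arange(n), nbrs[:, -1]]       # distance to the k-th neighbour
    # reach-dist_k(p, o) = max(k-distance(o), d(p, o)) for each neighbour o of p
    reach = np.maximum(k_dist[nbrs], d[np.arange(n)[:, None], nbrs])
    lrd = 1.0 / reach.mean(axis=1)              # local reachability density
    return lrd[nbrs].mean(axis=1) / lrd         # LOF: neighbours' density / own density
```

On a 5×5 grid of points plus one point at (10, 10), the isolated point receives by far the largest score, while grid points, including corners, stay close to 1.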
Moving object detection via low-rank total variation regularization

NASA Astrophysics Data System (ADS)

Wang, Pengcheng; Chen, Qian; Shao, Na

2016-09-01

Moving object detection is a challenging task in video surveillance. The recently proposed Robust Principal Component Analysis (RPCA) can recover outlier patterns from low-rank data under some mild conditions. However, the l1-penalty in RPCA does not work well in moving object detection because the irrepresentable condition is often not satisfied. In this paper, a method based on a total variation (TV) regularization scheme is proposed. In our model, image sequences captured with a static camera are highly correlated and can therefore be described by a low-rank matrix. Meanwhile, the low-rank matrix can absorb background motion, e.g. periodic and random perturbations. The foreground objects in the sequence are usually sparsely distributed and drift continuously, so they can be treated as group outliers against the highly correlated background scenes. Instead of the l1-penalty, we exploit the total variation of the foreground. By minimizing the total variation energy, the outliers tend to collapse and finally converge to the exact moving objects. The TV-penalty is superior to the l1-penalty especially when the outlier is in the majority for some pixels, and our method can estimate the outlier explicitly with less bias but higher variance. To solve the problem, a joint optimization function is formulated and effectively solved with the inexact Augmented Lagrange Multiplier (ALM) method. We evaluate our method along with several state-of-the-art approaches in MATLAB. Both qualitative and quantitative results demonstrate that our proposed method works effectively on a large range of complex scenarios.
Detection of outliers in water quality monitoring samples using functional data analysis in San Esteban estuary (Northern Spain).

PubMed

Díaz Muñiz, C; García Nieto, P J; Alonso Fernández, J R; Martínez Torres, J; Taboada, J

2012-11-15

Water quality controls involve a large number of variables and observations, often subject to some outliers. An outlier is an observation that is numerically distant from the rest of the data or that appears to deviate markedly from other members of the sample in which it occurs. An interesting analysis is to find the observations whose measurements differ from the pattern established in the sample. Therefore, identifying atypical observations is an important concern in water quality monitoring, and a difficult task because of the multivariate nature of water quality data. Our study provides a new method for detecting outliers in water quality monitoring parameters, using oxygen and turbidity as indicator variables. Until now, methods were based on treating the different parameters as a vector whose components were their concentration values. Our approach treats water quality monitoring through time as curves instead of vectors; that is, the data set is considered a time-dependent function rather than a set of discrete values at different time instants. The methodology, which is based on the concept of functional depth, was applied to the detection of outliers in water quality monitoring samples in the San Esteban estuary. Results were discussed in terms of origin, causes, etc., and compared with those obtained using the conventional method based on vector comparison. Finally, the advantages of the functional method are presented. Copyright © 2012 Elsevier B.V. All rights reserved.
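To make "functional depth" concrete, the sketch below computes the Fraiman–Muniz depth, one common functional depth: at each time point a curve's value receives a univariate depth of 1 − |1/2 − F_t(x(t))|, where F_t is the empirical distribution across curves, and these values are averaged over time. Curves with the lowest depth are outlier candidates. Which depth and which cutoff the study used is not stated here, so the choice of Fraiman–Muniz and the "flag the n lowest" rule are assumptions; function names are hypothetical.

```python
import numpy as np

def fraiman_muniz_depth(curves):
    """curves: (n_curves, n_times) array sampled on a common time grid."""
    Y = np.asarray(curves, dtype=float)
    # F[i, t] = fraction of curves whose value at time t is <= curve i's value
    F = (Y[None, :, :] <= Y[:, None, :]).mean(axis=1)
    # Univariate depth 1 - |1/2 - F|, averaged (integrated) over the time grid
    return (1.0 - np.abs(0.5 - F)).mean(axis=1)

def flag_shallow_curves(curves, n_flag=1):
    """Indices of the n_flag least deep (most outlying) curves, plus all depths."""
    depth = fraiman_muniz_depth(curves)
    return np.argsort(depth)[:n_flag], depth
```

A daily oxygen or turbidity profile that sits persistently above or below the others gets an empirical rank near 0 or 1 at every time point and therefore the smallest depth, even if no single reading is extreme on its own.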
Detecting isotopic ratio outliers

NASA Astrophysics Data System (ADS)

Bayne, C. K.; Smith, D. H.

An alternative method is proposed for improving isotopic ratio estimates. This method mathematically models pulse-count data and uses iterative reweighted Poisson regression to estimate model parameters to calculate the isotopic ratios. This computer-oriented approach provides theoretically better methods than conventional techniques to establish error limits and to identify outliers.
Robust Huber-based iterated divided difference filtering with application to cooperative localization of autonomous underwater vehicles.

PubMed

Gao, Wei; Liu, Yalong; Xu, Bo

2014-12-19

A new algorithm called Huber-based iterated divided difference filtering (HIDDF) is derived and applied to cooperative localization of autonomous underwater vehicles (AUVs) supported by a single surface leader. The position states are estimated using acoustic range measurements relative to the leader, which inherently suffer from weak observability, large initial errors and outlier-contaminated measurements. By combining the merits of iterated divided difference filtering (IDDF) and Huber's M-estimation methodology, the new filtering method not only achieves more accurate estimation and faster convergence than standard divided difference filtering (DDF) under weak observability and large initial error, but is also robust to outlier measurements, for which the standard IDDF would exhibit severe degradation in estimation accuracy. The correctness and validity of the algorithm are demonstrated through experimental results.
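Huber's M-estimation, the robust ingredient mentioned above, replaces the squared-error loss with one that is quadratic for small residuals and linear for large ones, which bounds the influence of any single outlier. The sketch below shows it in its simplest static form, as iteratively reweighted least squares for a linear model; it is not the HIDDF filter itself, and the tuning constant c = 1.345, the MAD scale estimate and the function names are assumptions for illustration.

```python
import numpy as np

def huber_weights(u, c=1.345):
    """IRLS weights for Huber's loss: 1 inside [-c, c], c/|u| outside."""
    a = np.abs(u)
    return np.where(a <= c, 1.0, c / np.maximum(a, 1e-12))

def huber_regression(A, y, c=1.345, n_iter=100, tol=1e-10):
    """Huber M-estimate of x in y ~ A x via iteratively reweighted least squares."""
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    x = np.linalg.lstsq(A, y, rcond=None)[0]             # ordinary least squares start
    for _ in range(n_iter):
        r = y - A @ x
        s = max(np.median(np.abs(r - np.median(r))) / 0.6745, 1e-12)  # robust (MAD) scale
        sw = np.sqrt(huber_weights(r / s, c))
        x_new = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
        if np.max(np.abs(x_new - x)) < tol:
            return x_new
        x = x_new
    return x
```

In a filter, the same weighting is applied to the measurement residuals inside the update step, so a corrupted acoustic range receives a small weight instead of dragging the state estimate.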
Robust estimation of adaptive tensors of curvature by tensor voting.

PubMed

Tong, Wai-Shun; Tang, Chi-Keung

2005-03-01

Although curvature estimation from a given mesh or regularly sampled point set is a well-studied problem, it is still challenging when the input consists of a cloud of unstructured points corrupted by misalignment error and outlier noise. Such input is ubiquitous in computer vision. In this paper, we propose a three-pass tensor voting algorithm to robustly estimate curvature tensors, from which accurate principal curvatures and directions can be calculated. Our quantitative estimation is an improvement over the previous two-pass algorithm, in which only qualitative curvature estimation (the sign of the Gaussian curvature) is performed. To overcome misalignment errors, our improved method automatically corrects input point locations at subvoxel precision, which also rejects outliers that are uncorrectable. To adapt to different scales locally, we define the RadiusHit of a curvature tensor to quantify estimation accuracy and applicability. Our curvature estimation algorithm has been validated with detailed quantitative experiments, performing better on a variety of standard error metrics (percentage error in curvature magnitudes, absolute angle difference in curvature direction) in the presence of a large amount of misalignment noise.
Groundspeed filtering for CTAS

NASA Technical Reports Server (NTRS)

Slater, Gary L.

1994-01-01

Ground speed is one of the radar observables obtained, along with position and heading, from NASA Ames Center radar. Within the Center TRACON Automation System (CTAS), groundspeed is converted into airspeed using the wind speeds that CTAS obtains from the NOAA weather grid. This airspeed is then used in the trajectory synthesis logic, which computes the trajectory for each individual aircraft. The time history of typical radar groundspeed data is generally quite noisy, with high-frequency variations on the order of five knots, and occasional 'outliers' that can be significantly different from the probable true speed. To smooth out these speeds and make the ETA estimate less erratic, the ground speed is filtered within CTAS. In its base form, the CTAS filter is a 'moving average' filter that averages the last ten radar values. In addition, there is separate logic to detect and correct for 'outliers', and acceleration logic that limits the groundspeed change between adjacent time samples. As will be shown, these additional modifications do cause significant changes in the actual groundspeed filter output. The conclusion is that the current ground speed filter logic is unable to track accurately the speed variations observed on many aircraft. The Kalman filter logic, however, appears to be an improvement on the current algorithm used to smooth ground speed variations, while being simpler and more efficient to implement. Additional logic to test for true 'outliers' can easily be added by looking at the difference between the a priori and a posteriori Kalman estimates, and not updating if this difference is too large.
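The two filters contrasted above can be sketched side by side: the ten-sample moving average, and a scalar Kalman filter that skips the update when the a posteriori estimate would move too far from the a priori one. The random-walk speed model and the values of `q`, `r` (5-knot noise, so variance 25) and `max_jump` are illustrative assumptions, not CTAS parameters, and the function names are hypothetical.

```python
def moving_average(speeds, window=10):
    """Average of the last `window` samples (fewer at the start of the track)."""
    out = []
    for i in range(len(speeds)):
        w = speeds[max(0, i - window + 1): i + 1]
        out.append(sum(w) / len(w))
    return out

def kalman_groundspeed(speeds, q=1.0, r=25.0, max_jump=5.0):
    """Scalar Kalman filter with a random-walk speed model and an outlier gate.

    q: process noise variance per step (assumed); r: measurement noise variance.
    If |a posteriori - a priori| > max_jump, the sample is treated as an
    outlier and the filter keeps the a priori estimate.
    """
    x, p = speeds[0], r
    out = [x]
    for z in speeds[1:]:
        x_prior, p_prior = x, p + q              # predict: speed unchanged, variance grows
        k = p_prior / (p_prior + r)              # Kalman gain
        x_post = x_prior + k * (z - x_prior)     # candidate update
        if abs(x_post - x_prior) > max_jump:
            x, p = x_prior, p_prior              # reject outlier: do not update
        else:
            x, p = x_post, (1.0 - k) * p_prior
        out.append(x)
    return out
```

With a constant 250-knot track and a single 400-knot spike, the moving average jumps to 265 knots and stays biased for ten samples, whereas the gated Kalman filter ignores the spike. A gate that is too tight could also reject a genuine acceleration, so `max_jump` would need tuning against real traffic.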
Latent Space Tracking from Heterogeneous Data with an Application for Anomaly Detection

DTIC Science & Technology

2015-11-01

specific, if the anomaly behaves as a sudden outlier after which the data stream goes back to a normal state, then the anomalous data point should be... introduced three types of anomalies, all of them sudden outliers. Table 2. Synthetic dataset: AUC and parameters method... Jiaji Huang1 and Xia Ning2, 1 Department of Electrical
Optimum outlier model for potential improvement of environmental cleaning and disinfection.

PubMed

Rupp, Mark E; Huerta, Tomas; Cavalieri, R J; Lyden, Elizabeth; Van Schooneveld, Trevor; Carling, Philip; Smith, Philip W

2014-06-01

The effectiveness and efficiency of 17 housekeepers in the terminal cleaning of 292 hospital rooms were evaluated through adenosine triphosphate detection. A subgroup of housekeepers was identified who were significantly more effective and efficient than their coworkers. These optimum outliers may be used in performance improvement to optimize environmental cleaning.
Novel Hyperspectral Anomaly Detection Methods Based on Unsupervised Nearest Regularized Subspace

NASA Astrophysics Data System (ADS)

Hou, Z.; Chen, Y.; Tan, K.; Du, P.

2018-04-01

Anomaly detection has been of great interest in hyperspectral imagery analysis. Most conventional anomaly detectors merely take advantage of spectral and spatial information within neighboring pixels. In this paper, two methods, the Unsupervised Nearest Regularized Subspace-based with Outlier Removal Anomaly Detector (UNRSORAD) and Local Summation UNRSORAD (LSUNRSORAD), are proposed, based on the concept that each background pixel can be approximately represented by its spatial neighborhood, while anomalies cannot. Using a dual window, each test pixel is approximated as a linear combination of the surrounding data. The existence of outliers in the dual window affects detection accuracy, so the proposed detectors remove outlier pixels that differ significantly from the majority of pixels. To make full use of the local spatial distribution of the pixels neighboring each pixel under test, we adopt a local summation dual-window sliding strategy. The residual image is formed by subtracting the predicted background from the original hyperspectral imagery, and anomalies can be detected in the residual image. Experimental results show that the proposed methods greatly improve detection accuracy compared with other traditional detection methods.
Avoidance of speckle noise in laser vibrometry by the use of kurtosis ratio: Application to mechanical fault diagnostics

NASA Astrophysics Data System (ADS)

Vass, J.; Šmíd, R.; Randall, R. B.; Sovka, P.; Cristalli, C.; Torcianti, B.

2008-04-01

This paper presents a statistical technique to enhance vibration signals measured by laser Doppler vibrometry (LDV). The method has been optimised for LDV signals measured on bearings of universal electric motors and applied to quality control of washing machines. Inherent problems of LDV are addressed, particularly the speckle noise occurring when rough surfaces are measured. The presence of speckle noise is detected using a new scalar indicator kurtosis ratio (KR), specifically designed to quantify the amount of random impulses generated by this noise. The KR is a ratio of the standard kurtosis and a robust estimate of kurtosis, thus indicating the outliers in the data. Since it is inefficient to reject the signals affected by the speckle noise, an algorithm for selecting an undistorted portion of a signal is proposed. The algorithm operates in the time domain and is thus fast and simple. The algorithm includes band-pass filtering and segmentation of the signal, as well as thresholding of the KR computed for each filtered signal segment. Algorithm parameters are discussed in detail and instructions for optimisation are provided. Experimental results demonstrate that speckle noise is effectively avoided in severely distorted signals, thus improving the signal-to-noise ratio (SNR) significantly. Typical faults are finally detected using squared envelope analysis. It is also shown that the KR of the band-pass filtered signal is related to the spectral kurtosis (SK).
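The kurtosis ratio compares the ordinary (moment-based) kurtosis, which impulsive speckle dropouts inflate strongly, with a robust kurtosis estimate that they barely affect. The abstract does not say which robust estimator is used, so this sketch assumes Moors' octile-based kurtosis, rescaled so that both estimates equal 3 for Gaussian data (KR ≈ 1 for clean signals). It also omits the band-pass filtering stage; `threshold` and all function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

_nd = NormalDist()
# Moors' octile kurtosis of a Gaussian (about 1.233), used to rescale it to 3.
_MOORS_GAUSS = (((_nd.inv_cdf(7 / 8) - _nd.inv_cdf(5 / 8))
                 + (_nd.inv_cdf(3 / 8) - _nd.inv_cdf(1 / 8)))
                / (_nd.inv_cdf(6 / 8) - _nd.inv_cdf(2 / 8)))

def kurtosis(x):
    """Standard (non-excess) moment kurtosis: 3 for Gaussian data."""
    d = np.asarray(x, dtype=float)
    d = d - d.mean()
    return np.mean(d ** 4) / np.mean(d ** 2) ** 2

def moors_kurtosis(x):
    """Octile-based robust kurtosis (Moors), rescaled to equal 3 for a Gaussian."""
    e1, e2, e3, e5, e6, e7 = np.quantile(x, [1/8, 2/8, 3/8, 5/8, 6/8, 7/8])
    return ((e7 - e5) + (e3 - e1)) / (e6 - e2) * 3.0 / _MOORS_GAUSS

def kurtosis_ratio(x):
    """KR near 1 for Gaussian-like data, much larger when sparse impulses are present."""
    return kurtosis(x) / moors_kurtosis(x)

def clean_segments(signal, seg_len, threshold=1.5):
    """Start indices of non-overlapping segments whose KR does not exceed threshold."""
    x = np.asarray(signal, dtype=float)
    starts = range(0, len(x) - seg_len + 1, seg_len)
    return [s for s in starts if kurtosis_ratio(x[s:s + seg_len]) <= threshold]
```

Because the octiles ignore the rare large impulses, a segment with about 1% speckle spikes has a moment kurtosis in the tens while its robust kurtosis stays near 3, so its KR clearly separates it from undistorted segments.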
DynPeak: An Algorithm for Pulse Detection and Frequency Analysis in Hormonal Time Series

PubMed Central

Vidal, Alexandre; Zhang, Qinghua; Médigue, Claire; Fabre, Stéphane; Clément, Frédérique

2012-01-01

The endocrine control of reproductive function is often studied by analyzing the pulsatile secretion of luteinizing hormone (LH) by the pituitary gland. Whereas measurements in the cavernous sinus combine anatomical and technical difficulties, LH levels can easily be assessed from jugular blood. However, plasma levels result from a convolution process due to clearance effects when LH enters the general circulation. Simultaneous measurements comparing LH levels in the cavernous sinus and jugular blood have revealed clear differences in pulse shape, amplitude and baseline. In addition, experimental sampling occurs at a relatively low frequency (typically every 10 min) with respect to the highest frequency of LH release (one pulse per hour), and the resulting LH measurements are corrupted by both experimental and assay errors. As a result, the pattern of plasma LH may not be clearly pulsatile. Yet reliable information on the InterPulse Intervals (IPI) is a prerequisite for precisely studying the steroid feedback exerted at the pituitary level. Hence, there is a real need for robust IPI detection algorithms. In this article, we present an algorithm for monitoring LH pulse frequency, based both on the available endocrinological knowledge of LH pulses (shape and duration with respect to the frequency regime) and on synthetic LH data generated by a simple model. We use synthetic data to clarify some basic notions underlying our algorithmic choices. We focus on explaining how the sampling process drastically affects the original pattern of secretion, and especially the amplitude of the detectable pulses. We then describe the algorithm in detail and apply it to different sets of both synthetic and experimental LH time series. We further comment on how to diagnose possible outliers from the series of IPIs, which is the main output of the algorithm. PMID:22802933
Toward the detection of abnormal chest radiographs the way radiologists do it

NASA Astrophysics Data System (ADS)

Alzubaidi, Mohammad; Patel, Ameet; Panchanathan, Sethuraman; Black, John A., Jr.

2011-03-01

Computer Aided Detection (CADe) and Computer Aided Diagnosis (CADx) are relatively recent areas of research that attempt to employ feature extraction, pattern recognition, and machine learning algorithms to aid radiologists in detecting and diagnosing abnormalities in medical images. However, these computational methods are based on the assumption that there are distinct classes of abnormalities, and that each class has some distinguishing features that set it apart from other classes. In practice, abnormalities in chest radiographs tend to be very heterogeneous. The literature suggests that thoracic (chest) radiologists develop their ability to detect abnormalities by developing a sense of what is normal, so that anything abnormal attracts their attention. This paper discusses an approach to CADe based on a technique called anomaly detection (which aims to detect outliers in data sets) for the purpose of detecting atypical regions in chest radiographs. To apply anomaly detection to chest radiographs, however, it is necessary to develop a basis for extracting features from corresponding anatomical locations in different chest radiographs. This paper proposes a method for doing this, and describes how it can be used to support CADe.
Probabilistic n/γ discrimination with robustness against outliers for use in neutron profile monitors

NASA Astrophysics Data System (ADS)

Uchida, Y.; Takada, E.; Fujisaki, A.; Kikuchi, T.; Ogawa, K.; Isobe, M.

2017-08-01

A method to stochastically discriminate neutron and γ-ray signals measured with a stilbene organic scintillator is proposed. Each pulse signal was stochastically categorized into two groups: neutron and γ-ray. In previous work, the Expectation Maximization (EM) algorithm was used with the assumption that the measured data followed a Gaussian mixture distribution. It was shown that probabilistic discrimination between these groups is possible. Moreover, by setting the initial parameters for the Gaussian mixture distribution with a k-means algorithm, the possibility of automatic discrimination was demonstrated. In this study, the Student's t-mixture distribution was used as a probabilistic distribution with the EM algorithm to improve the robustness against the effect of outliers caused by pileup of the signals. To validate the proposed method, the figures of merit (FOMs) were compared for the EM algorithm assuming a t-mixture distribution and a Gaussian mixture distribution. The t-mixture distribution resulted in an improvement of the FOMs compared with the Gaussian mixture distribution. The proposed data processing technique is a promising tool not only for neutron and γ-ray discrimination in fusion experiments but also in other fields, for example, homeland security, cancer therapy with high energy particles, nuclear reactor decommissioning, pattern recognition, and so on.
Modeling Data Containing Outliers using ARIMA Additive Outlier (ARIMA-AO)

NASA Astrophysics Data System (ADS)

Saleh Ahmar, Ansari; Guritno, Suryo; Abdurakhman; Rahman, Abdul; Awi; Alimuddin; Minggi, Ilham; Arif Tiro, M.; Kasim Aidid, M.; Annas, Suwardi; Utami Sutiksno, Dian; Ahmar, Dewi S.; Ahmar, Kurniawan H.; Abqary Ahmar, A.; Zaki, Ahmad; Abdullah, Dahlan; Rahim, Robbi; Nurdiyanto, Heri; Hidayat, Rahmat; Napitupulu, Darmawan; Simarmata, Janner; Kurniasih, Nuning; Andretti Abdillah, Leon; Pranolo, Andri; Haviluddin; Albra, Wahyudin; Arifin, A. Nurani M.

2018-01-01

The aim of this study is to discuss the detection and correction of data containing additive outliers (AO) in the ARIMA (p, d, q) model. Detection and correction use an iterative procedure popularized by Box, Jenkins, and Reinsel (1994). Using this method, we obtained ARIMA models fitted to the data containing AO; these models augment the original ARIMA model with coefficients obtained from the iteration process using regression methods. In the simulation, the initial model for the data containing AO is ARIMA (2,0,0) with MSE = 36.780; after detection and correction, the iteration yields an ARIMA (2,0,0) model with regression coefficients $Z_t = 0.106 + 0.204 Z_{t-1} + 0.401 Z_{t-2} - 329 X_1(t) + 115 X_2(t) + 35.9 X_3(t)$ and MSE = 19.365. This shows an improvement in the forecasting error rate.
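An iterative detect-and-correct loop for an ARIMA(2,0,0), i.e. AR(2), model can be sketched as follows: fit the model by least squares, find the largest robustly standardized residual, and if it exceeds a critical value, replace that observation by its one-step prediction and refit. This is a simplified stand-in for the Box–Jenkins–Reinsel procedure, not the authors' implementation: it corrects the series directly instead of adding indicator regressors $X_i(t)$, and the critical value `crit`, the MAD scale and the function names are assumptions.

```python
import numpy as np

def ar2_fit(z):
    """OLS fit of z_t = c + phi1 * z_{t-1} + phi2 * z_{t-2} + e_t."""
    z = np.asarray(z, dtype=float)
    t = np.arange(2, len(z))
    X = np.column_stack([np.ones(len(t)), z[t - 1], z[t - 2]])
    beta, *_ = np.linalg.lstsq(X, z[t], rcond=None)
    pred = X @ beta
    return beta, z[t] - pred, pred               # coefficients, residuals, one-step predictions

def detect_and_correct_ao(z, crit=3.5, max_outliers=5):
    """Iteratively flag the largest residual and replace that point by its prediction."""
    z = np.array(z, dtype=float)                 # work on a copy
    found = []
    for _ in range(max_outliers):
        beta, resid, pred = ar2_fit(z)
        scale = 1.4826 * np.median(np.abs(resid - np.median(resid)))  # robust sigma
        i = int(np.argmax(np.abs(resid)))
        if abs(resid[i]) <= crit * scale:
            break
        found.append(i + 2)                      # residual i belongs to time t = i + 2
        z[i + 2] = pred[i]                       # correct the additive outlier
    beta, resid, _ = ar2_fit(z)
    return z, found, beta, float(np.mean(resid ** 2))
```

An additive outlier at time T also contaminates the residuals at T+1 and T+2 through the lagged terms; correcting the observation itself, rather than only its own residual, removes all three effects at once, which is why the MSE of the refitted model drops.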
Meteor localization via statistical analysis of spatially temporal fluctuations in image sequences

NASA Astrophysics Data System (ADS)

Kukal, Jaromír; Klimt, Martin; Šihlík, Jan; Fliegel, Karel

2015-09-01

Meteor detection is one of the most important procedures in astronomical imaging. The meteor path in Earth's atmosphere is traditionally reconstructed from a double-station video observation system generating 2D image sequences. However, atmospheric turbulence and other factors cause spatio-temporal fluctuations of the image background, which make localizing the meteor path more difficult. Our approach is based on nonlinear preprocessing of image intensity using the Box-Cox transform, with the logarithmic transform as a particular case. The transformed image sequences are then differentiated along discrete coordinates to obtain a statistical description of sky background fluctuations, which can be modeled by a multivariate normal distribution. After verification and hypothesis testing, we use the statistical model for outlier detection. While isolated outlier points are ignored, compact clusters of outliers indicate the presence of meteoroids after ignition.
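The pipeline above (Box-Cox transform, differencing, outlier test, keep compact clusters and drop isolated points) can be sketched with NumPy. This is a simplified, assumption-laden version: it differences only along time, replaces the multivariate normal model with a univariate robust z-score (median/MAD over all differences), and uses a hypothetical "at least `min_neighbours` flagged 8-neighbours" rule for compactness; function names and parameters are illustrative.

```python
import numpy as np

def box_cox(x, lam):
    """Box-Cox transform of positive intensities; lam == 0 gives the log transform."""
    x = np.asarray(x, dtype=float)
    return np.log(x) if lam == 0 else (x ** lam - 1.0) / lam

def count_flagged_neighbours(mask):
    """Number of flagged 8-neighbours of each pixel in the last two axes."""
    m = mask.astype(int)
    p = np.pad(m, [(0, 0)] * (m.ndim - 2) + [(1, 1), (1, 1)])   # zero padding
    h, w = m.shape[-2:]
    total = sum(p[..., 1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
                for dy in (-1, 0, 1) for dx in (-1, 0, 1))
    return total - m                                           # exclude the pixel itself

def meteor_candidates(frames, lam=0.0, z_crit=5.0, min_neighbours=2):
    """frames: (T, H, W) positive intensities. Returns a (T-1, H, W) boolean mask."""
    y = box_cox(frames, lam)
    d = np.diff(y, axis=0)                          # temporal differences
    med = np.median(d)
    sigma = 1.4826 * np.median(np.abs(d - med))     # robust background fluctuation scale
    flagged = np.abs(d - med) > z_crit * sigma      # outliers w.r.t. the background model
    # Keep compact clusters only: isolated outlier pixels are ignored.
    return flagged & (count_flagged_neighbours(flagged) >= min_neighbours)
```

A streak several pixels wide survives the neighbour test, while a single hot pixel, though just as bright, is discarded because none of its neighbours is flagged.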
Open-Source Radiation Exposure Extraction Engine (RE3) with Patient-Specific Outlier Detection.

PubMed

Weisenthal, Samuel J; Folio, Les; Kovacs, William; Seff, Ari; Derderian, Vana; Summers, Ronald M; Yao, Jianhua

2016-08-01

We present an open-source, picture archiving and communication system (PACS)-integrated radiation exposure extraction engine (RE3) that provides study-, series-, and slice-specific data for automated monitoring of computed tomography (CT) radiation exposure. RE3 was built using open-source components and seamlessly integrates with the PACS. RE3 calculations of dose length product (DLP) from the Digital Imaging and Communications in Medicine (DICOM) headers showed high agreement (R² = 0.99) with the vendor dose pages. For study-specific outlier detection, RE3 constructs robust, automatically updating multivariable regression models to predict DLP in the context of patient gender and age, scan length, water-equivalent diameter (D_w), and scanned body volume (SBV). As proof of concept, the model was trained on 811 CT chest, abdomen + pelvis (CAP) exams and 29 outliers were detected. The continuous variables used in the outlier detection model were scan length (R² = 0.45), D_w (R² = 0.70), SBV (R² = 0.80), and age (R² = 0.01). The categorical variables were gender (male average 1182.7 ± 26.3 and female 1047.1 ± 26.9 mGy·cm) and pediatric status (pediatric average 710.7 ± 73.6 mGy·cm and adult 1134.5 ± 19.3 mGy·cm).
Clustering analysis of line indices for LAMOST spectra with AstroStat

NASA Astrophysics Data System (ADS)

Chen, Shu-Xin; Sun, Wei-Min; Yan, Qi

2018-06-01

The application of data mining in astronomical surveys, such as the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) survey, provides an effective approach to automatically analyze large amounts of complex survey data. Unsupervised clustering can help astronomers find associations and outliers in a big data set. In this paper, we employ the k-means method to cluster the line indices of LAMOST spectra with the AstroStat software. The line index approach is an effective way to extract spectral features from low-resolution spectra, since line indices represent the main spectral characteristics of stars. A total of 144 340 line indices for A-type stars are analyzed by calculating intra and inter distances between pairs of stars. For the intra distance, we use the Mahalanobis distance to explore the degree of clustering for each class, while for outlier detection, we define a local outlier factor for each spectrum. AstroStat furnishes a set of visualization tools for illustrating the analysis results. Checking the spectra detected as outliers, we find that most of them are problematic data and only a few correspond to rare astronomical objects. We show two examples of these outliers, a spectrum with an abnormal continuum and a spectrum with emission lines. Our work demonstrates that line index clustering is a good method for examining data quality and identifying rare objects.
Evaluation of the Bitterness of Traditional Chinese Medicines using an E-Tongue Coupled with a Robust Partial Least Squares Regression Method.

PubMed

Lin, Zhaozhou; Zhang, Qiao; Liu, Ruixin; Gao, Xiaojie; Zhang, Lu; Kang, Bingya; Shi, Junhan; Wu, Zidan; Gui, Xinjing; Li, Xuelin

2016-01-25

To accurately, safely, and efficiently evaluate the bitterness of Traditional Chinese Medicines (TCMs), a robust predictor was developed using the robust partial least squares (RPLS) regression method based on data obtained from an electronic tongue (e-tongue) system. The data quality was verified by Grubbs' test. Moreover, potential outliers were detected based on both the standardized residual and the score distance calculated for each sample. The performance of RPLS on the dataset before and after outlier detection was compared with other state-of-the-art methods, including multivariate linear regression, least squares support vector machine, and plain partial least squares regression. Both R² and the root-mean-square error (RMSE) of cross-validation (CV) were recorded for each model. With four latent variables, a robust RMSECV value of 0.3916, for bitterness values ranging from 0.63 to 4.78, was obtained for the RPLS model constructed on the dataset including outliers. Meanwhile, the RMSECV of the models constructed by the other methods was larger than that of the RPLS model. After six outliers were excluded, the performance of all benchmark methods markedly improved, but the difference between the RPLS models constructed before and after outlier exclusion was negligible. In conclusion, the bitterness of TCM decoctions can be accurately evaluated with the RPLS model constructed using e-tongue data.

Robust group-wise rigid registration of point sets using t-mixture model

NASA Astrophysics Data System (ADS)

Ravikumar, Nishant; Gooya, Ali; Frangi, Alejandro F.; Taylor, Zeike A.

2016-03-01

We present a probabilistic framework for robust, group-wise rigid alignment of point sets using a mixture of Student's t-distributions, designed especially for point sets that are of varying lengths, are corrupted by an unknown degree of outliers, or contain missing data. Medical images (in particular magnetic resonance (MR) images), their segmentations, and consequently the point sets generated from them are highly susceptible to corruption by outliers. This poses a problem for the robust correspondence estimation and accurate alignment of shapes that are necessary for training statistical shape models (SSMs). To address these issues, this study proposes using a t-mixture model (TMM) to approximate the underlying joint probability density of a group of similar shapes and to align them to a common reference frame. The heavy-tailed nature of t-distributions provides a more robust registration framework than state-of-the-art algorithms. Compared with its Gaussian mixture model (GMM) counterparts, the proposed TMM-based group-wise rigid registration method achieves a significant reduction in alignment errors in the presence of outliers. The proposed TMM framework is compared with a group-wise variant of the well-known Coherent Point Drift (CPD) algorithm and with two other group-wise methods using GMMs, on both synthetic and real data sets. Rigid alignment errors for groups of shapes are quantified using the Hausdorff distance (HD) and quadratic surface distance (QSD) metrics.
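The robustness argument can be seen in one dimension. In EM for a Student's t model with ν degrees of freedom, each point receives the weight (ν + 1) / (ν + δ²), where δ² is its squared standardized distance from the current location. Distant points are therefore down-weighted instead of dragging the estimate as they would under a Gaussian. Below is a minimal single-component sketch with ν = 3 held fixed (an assumption). It is not the authors' group-wise registration method.

```python
import statistics

def t_location_scale(x, nu=3.0, iters=50):
    """Fit a 1-D Student's t location and squared scale by EM with nu held fixed.
    Returns (location, squared scale, weights from the last E-step)."""
    mu = statistics.median(x)
    var = statistics.pvariance(x) or 1.0  # fall back to 1.0 if all values are equal
    w = [1.0] * len(x)
    for _ in range(iters):
        # E-step: expected latent precision; points far from mu get small weights
        w = [(nu + 1.0) / (nu + (xi - mu) ** 2 / var) for xi in x]
        # M-step: weighted mean, then weighted squared scale around the new mean
        mu = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
        var = max(sum(wi * (xi - mu) ** 2 for wi, xi in zip(w, x)) / len(x), 1e-12)
    return mu, var, w
```

On [0.0, 0.1, -0.1, 0.2, -0.2, 10.0] the ordinary mean is about 1.67. The t fit instead stays close to the five clustered values, and the last point ends up with a much smaller weight than the others.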
Robust Gaussian Graphical Modeling via l1 Penalization

PubMed Central

Sun, Hokeun; Li, Hongzhe

2012-01-01

Gaussian graphical models have been widely used as an effective method for studying the conditional independence structure among genes and for constructing genetic networks. However, gene expression data typically have heavier tails or more outlying observations than the standard Gaussian distribution. Such outliers in gene expression data can lead to wrong inference on the dependency structure among the genes. We propose an l1-penalized estimation procedure for sparse Gaussian graphical models that is robustified against possible outliers. The likelihood function is weighted according to how much each observation deviates, where the deviation of an observation is measured by its own likelihood. An efficient computational algorithm based on the coordinate gradient descent method is developed to obtain the minimizer of the negative penalized robustified likelihood, where the nonzero elements of the concentration matrix represent the graphical links among the genes. After the graphical structure is obtained, we re-estimate the positive definite concentration matrix using an iterative proportional fitting algorithm. Through simulations, we demonstrate that the proposed robust method performs much better than the graphical Lasso for Gaussian graphical models in terms of both graph structure selection and estimation when outliers are present. We apply the robust estimation procedure to an analysis of yeast gene expression data and show that the resulting graph has a better biological interpretation than that obtained from the graphical Lasso. PMID:23020775
Anomaly Detection Based on Sensor Data in Petroleum Industry Applications

PubMed Central

Martí, Luis; Sanchez-Pi, Nayat; Molina, José Manuel; Garcia, Ana Cristina Bicharra

2015-01-01

Anomaly detection is the problem of finding patterns in data that do not conform to an a priori expected behavior. It is related to the problem in which some samples are distant, in terms of a given metric, from the rest of the dataset; these anomalous samples are referred to as outliers. Anomaly detection has recently attracted the attention of the research community because of its relevance in real-world applications such as intrusion detection, fraud detection, fault detection and system health monitoring, among many others. Anomalies themselves can have a positive or negative nature, depending on their context and interpretation. In either case, however, it is important for decision makers to be able to detect them in order to take appropriate actions. The petroleum industry is one of the application contexts where these problems are present. Correctly detecting such unusual information empowers decision makers to act on the system in order to avoid, correct or react to the associated situations. In this context, heavy extraction machines for pumping and generation operations, such as turbomachines, are intensively monitored for damage prevention, each by hundreds of sensors that send measurements at high frequency. In this paper, we propose combining yet another segmentation algorithm (YASA), a novel, fast and high-quality segmentation algorithm, with a one-class support vector machine approach for efficient anomaly detection in turbomachines. The proposal is meant to address this task and to cope with the lack of labeled training data. We perform a series of empirical studies comparing our approach with other methods on benchmark problems and on a real-life application related to anomaly detection in oil platform turbomachinery. PMID:25633599
A group LASSO-based method for robustly inferring gene regulatory networks from multiple time-course datasets.

PubMed

Liu, Li-Zhi; Wu, Fang-Xiang; Zhang, Wen-Jun

2014-01-01

As an abstract mapping of gene regulation in the cell, a gene regulatory network is important to both biological research and practical applications. Reverse engineering gene regulatory networks from microarray gene expression data is a challenging research problem in systems biology. With the development of biological technologies, multiple time-course gene expression datasets may be collected for a specific gene network under different circumstances, and the inference of a gene regulatory network can be improved by integrating these multiple datasets. It is also known that gene expression data may be contaminated with large errors or outliers, which may affect the inference results. A novel method, Huber group LASSO, is proposed to infer the same underlying network topology from multiple time-course gene expression datasets while taking robustness to large errors or outliers into account. To solve the optimization problem involved in the proposed method, an efficient algorithm that combines the ideas of auxiliary function minimization and block descent is developed. A stability selection method is adapted to our method to find a network topology consisting of edges with scores. The proposed method is applied to both simulated datasets and real experimental datasets. The results show that Huber group LASSO outperforms the group LASSO in terms of both the area under the receiver operating characteristic curve and the area under the precision-recall curve. A convergence analysis shows theoretically that the sequence generated by the algorithm converges to the optimal solution of the problem. The simulation and real data examples demonstrate the effectiveness of Huber group LASSO in integrating multiple time-course gene expression datasets and in improving resistance to large errors or outliers.
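The two ingredients named in the abstract, the Huber loss and the group LASSO penalty, can be written down directly. The Huber loss is quadratic for small residuals and only linear for large ones, so a gross error cannot dominate the fit. Below is a minimal sketch of such an objective. The grouping (one group per candidate edge, holding that edge's coefficients across the datasets) is an assumption about how the datasets are combined, and the unweighted penalty and delta = 1.0 are illustrative choices. The authors' auxiliary-function and block-descent solver is not reproduced.

```python
import math

def huber(r, delta=1.0):
    """Huber loss: quadratic for |r| <= delta, linear beyond, continuous at |r| = delta."""
    a = abs(r)
    return 0.5 * r * r if a <= delta else delta * (a - 0.5 * delta)

def huber_group_lasso_objective(residuals, groups, lam, delta=1.0):
    """Huber data-fit term plus an unweighted group-lasso penalty lam * sum_g ||beta_g||_2.
    groups: hypothetical coefficient groups, e.g. one edge's weights across datasets."""
    loss = sum(huber(r, delta) for r in residuals)
    penalty = lam * sum(math.sqrt(sum(b * b for b in g)) for g in groups)
    return loss + penalty
```

A residual of 3 contributes 2.5 under the Huber loss (delta = 1) versus 4.5 under half the squared error. Because each group's Euclidean norm is not squared, the penalty can shrink a whole group (edge) to zero at once.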
Example-based human motion denoising.

PubMed

Lou, Hui; Chai, Jinxiang

2010-01-01

With the proliferation of motion capture data, interest in removing noise and outliers from motion capture data has increased. In this paper, we introduce an efficient human motion denoising technique for the simultaneous removal of noise and outliers from input human motion data. The key idea of our approach is to learn a series of filter bases from precaptured motion data and use them along with robust statistics techniques to filter noisy motion data. Mathematically, we formulate the motion denoising process in a nonlinear optimization framework. The objective function measures the distance between the noisy input and the filtered motion, as well as how well the filtered motion preserves the spatial-temporal patterns embedded in captured human motion data. Optimizing the objective function produces an optimal filtered motion that keeps the spatial-temporal patterns of the captured motion data. We also extend the algorithm to fill in missing values in input motion data. We demonstrate the effectiveness of our system by experimenting with both real and simulated motion data. We also show the superior performance of our algorithm by comparing it with three baseline algorithms and with state-of-the-art motion capture data processing software such as Vicon Blade.
Applied automatic offset detection using HECTOR within EPOS-IP

NASA Astrophysics Data System (ADS)

Fernandes, R. M. S.; Bos, M. S.

2016-12-01

It is well known that offsets are present in most GNSS coordinate time series. These offsets need to be taken into account in the analysis to avoid incorrect estimation of tectonic motions. The times of the offsets are normally determined by visual inspection of the time series, but with the ever-increasing number of GNSS stations this is becoming too time-consuming, and automatic offset detection algorithms are required. This is particularly true in projects like EPOS (European Plate Observing System), where routine analysis of thousands of daily time series will be required. It is also planned to include stations installed for technical applications, whose metadata are not always properly maintained. In this research we present an offset detection scheme that uses the Bayesian Information Criterion (BIC) to determine the most likely time of an offset. The novelty of this scheme is that it takes the temporal correlation of the noise into account. This aspect is normally ignored because it significantly increases the computation time; however, it needs to be taken into account to ensure that the estimated BIC value is correct. We were able to create a fast algorithm by adopting the methodology implemented in HECTOR (Bos et al., 2013). We evaluate the feasibility of the approach using the core IGS network, where most of the offsets have been accurately determined, which permits an external evaluation of this new outlier detection approach to be included in HECTOR. We also apply the scheme to regional networks in Iberia, where such offsets are often not properly documented, in order to compare the normal manual approach with the new automatic approach. Finally, we compare the optimal approach used by HECTOR with other algorithms such as MIDAS and STARS.
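Below is a minimal sketch of BIC-based single-offset detection under a deliberately simplified model: a constant mean before and after the offset, with white (uncorrelated) Gaussian noise. It therefore omits exactly what the abstract identifies as the scheme's novelty, the temporal correlation of the noise, and it is not HECTOR's implementation. The parameter counts (one mean without an offset; two means plus the offset epoch with one) and the min_seg segment length are assumptions.

```python
import math

def _rss(segment):
    # residual sum of squares around the segment mean
    m = sum(segment) / len(segment)
    return sum((v - m) ** 2 for v in segment)

def _bic(total_rss, n, k):
    # BIC for a Gaussian white-noise model with k estimated parameters;
    # the tiny floor avoids log(0) for perfectly flat segments
    return n * math.log(max(total_rss, 1e-12) / n) + k * math.log(n)

def detect_offset(y, min_seg=2):
    """Return (offset_index, bic) for the best single offset, or (None, bic)
    if the constant-mean model without an offset has the lower BIC."""
    n = len(y)
    best_t, best_bic = None, _bic(_rss(y), n, k=1)  # one mean, no offset
    for t in range(min_seg, n - min_seg + 1):
        # two means (before/after t) plus the offset epoch itself
        b = _bic(_rss(y[:t]) + _rss(y[t:]), n, k=3)
        if b < best_bic:
            best_t, best_bic = t, b
    return best_t, best_bic
```

A series that jumps by about 5 units after its fifth sample is assigned an offset at index 5. Pure noise returns None, because the extra parameters' BIC penalty outweighs the small reduction in residuals.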
Accuracy of GIPSY PPP from version 6.2: a robust method to remove outliers

NASA Astrophysics Data System (ADS)

Hayal, Adem G.; Ugur Sanli, D.

2014-05-01

In this paper, we assess the accuracy of GIPSY PPP in its latest version, 6.2. As the research community prepares for real-time PPP, it is worth reassessing the accuracy of static GPS positioning with the latest version of this well-established research software, the first of its kind. Although the results do not differ significantly from those of the previous version, 6.1.1, we still observe a slight improvement in the vertical component due to the enhanced second-order ionospheric modeling introduced in the latest version. In this study, however, we turn our attention mainly to outlier detection. Outliers usually occur among solutions from shorter observation sessions and degrade the quality of the accuracy modeling. In our previous analysis of version 6.1.1, we argued that eliminating outliers with the traditional method was cumbersome, since repeated trials were needed, and that subjectivity affecting the statistical significance of the solutions might have existed among the results (Hayal and Sanli, 2013). Here we overcome this problem using a robust outlier elimination method. The median is perhaps the simplest robust outlier detection method to apply, and it might also be considered the most efficient one, given its highest breakdown point. In our analysis, we used a slightly different version of the median, as introduced in Tut et al. (2013). Hence, we were able to remove suspected outliers in a single run; with the traditional methods, these outliers were more problematic to remove this time from the solutions produced by the latest version of the software. References: Hayal, A.G., Sanli, D.U. (2013). Accuracy of GIPSY PPP from version 6. GNSS Precise Point Positioning Workshop: Reaching Full Potential, Vol. 1, pp. 41-42. Tut, İ., Sanli, D.U., Erdogan, B., Hekimoglu, S. (2013). Efficiency of BERNESE single baseline rapid static positioning solutions with SEARCH strategy. Survey Review, Vol. 45, Issue 331, pp. 296-304.
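The variant of the median from Tut et al. (2013) is not described in the abstract, so the following sketch assumes the common median/MAD rule instead. Values whose distance from the median exceeds a threshold (assumed to be 3.0) times the scaled median absolute deviation are removed in a single pass, with no iterative re-fitting. The sample values are hypothetical.

```python
import statistics

def mad_outliers(values, threshold=3.0):
    """Return indices whose robust z-score |x - median| / (1.4826 * MAD) exceeds threshold."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    scale = 1.4826 * mad  # 1.4826 makes MAD consistent with sigma for Gaussian data
    if scale == 0:
        # more than half the values coincide: flag every value that differs from the median
        return [i for i, v in enumerate(values) if v != med]
    return [i for i, v in enumerate(values) if abs(v - med) / scale > threshold]
```

Because both the centre and the spread are medians, one gross error cannot inflate the scale used to judge it. This is why a single pass suffices here, whereas mean/standard-deviation rules often need repeated trials.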
Orthogonal sparse linear discriminant analysis

NASA Astrophysics Data System (ADS)

Liu, Zhonghua; Liu, Gang; Pu, Jiexin; Wang, Xiaohong; Wang, Haijun

2018-03-01

Linear discriminant analysis (LDA) is a linear feature extraction approach that has received much attention. Building on LDA, researchers have proposed many variants. However, the inherent problems of LDA have not been solved well by these variants. The major disadvantages of classical LDA are as follows. First, it is sensitive to outliers and noise. Second, only the global discriminant structure is preserved, while local discriminant information is ignored. In this paper, we present a new orthogonal sparse linear discriminant analysis (OSLDA) algorithm. A k-nearest-neighbour graph is first constructed to preserve the local discriminant information of the sample points. Then, an L2,1-norm constraint on the projection matrix is used as the loss function, which makes the proposed method robust to outliers in the data points. Extensive experiments on several standard public image databases demonstrate the performance of the proposed OSLDA algorithm.
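The L2,1 norm of a matrix is the sum of the Euclidean norms of its rows. Because the row norms are not squared, a single row with a large error contributes linearly rather than quadratically. This holds, for example, for one outlying sample if rows are taken as per-sample residuals, which is an illustrative assumption. Below is a minimal sketch comparing the L2,1 norm with the squared Frobenius norm.

```python
import math

def l21_norm(W):
    """Sum of the Euclidean (L2) norms of the rows of W."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in W)

def squared_frobenius(W):
    """Sum of the squared entries of W, for comparison."""
    return sum(w * w for row in W for w in row)
```

Scaling a row by 10 (from [3, 4] to [30, 40]) raises its L2,1 contribution from 5 to 50 but its squared-Frobenius contribution from 25 to 2500. This is why L2,1-based objectives are less dominated by outlying rows.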
Performance metrics for the assessment of satellite data products: an ocean color case study

PubMed Central

Seegers, Bridget N.; Stumpf, Richard P.; Schaeffer, Blake A.; Loftin, Keith A.; Werdell, P. Jeremy

2018-01-01

Performance assessment of ocean color satellite data has generally relied on statistical metrics chosen for their common usage and the rationale for selecting certain metrics is infrequently explained. Commonly reported statistics based on mean squared errors, such as the coefficient of determination (r2), root mean square error, and regression slopes, are most appropriate for Gaussian distributions without outliers and, therefore, are often not ideal for ocean color algorithm performance assessment, which is often limited by sample availability. In contrast, metrics based on simple deviations, such as bias and mean absolute error, as well as pair-wise comparisons, often provide more robust and straightforward quantities for evaluating ocean color algorithms with non-Gaussian distributions and outliers. This study uses a SeaWiFS chlorophyll-a validation data set to demonstrate a framework for satellite data product assessment and recommends a multi-metric and user-dependent approach that can be applied within science, modeling, and resource management communities. PMID:29609296
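To make the point about outliers concrete, here is a minimal sketch of bias, mean absolute error and RMSE on hypothetical predicted/observed pairs, one of which is far off. Bias is taken here as the mean of (predicted - observed), a sign convention assumed for illustration.

```python
import math

def bias(pred, obs):
    """Mean signed difference, predicted minus observed."""
    return sum(p - o for p, o in zip(pred, obs)) / len(obs)

def mae(pred, obs):
    """Mean absolute error."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)

def rmse(pred, obs):
    """Root mean square error."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))
```

With observed [1, 2, 3, 4] and predicted [1.1, 2.1, 2.9, 8.0], the single large error gives an RMSE of about 2.0 against an MAE of about 1.08. The squared-error metric is thus dominated by one point.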
Outlier Detection for Patient Monitoring and Alerting

PubMed Central

Hauskrecht, Milos; Batal, Iyad; Valko, Michal; Visweswaran, Shyam; Cooper, Gregory F.; Clermont, Gilles

2012-01-01

We develop and evaluate a data-driven approach for detecting unusual (anomalous) patient-management decisions using past patient cases stored in electronic health records (EHRs). Our hypothesis is that a patient-management decision that is unusual with respect to past patient care may be due to an error and that it is worthwhile to generate an alert if such a decision is encountered. We evaluate this hypothesis using data obtained from EHRs of 4,486 post-cardiac surgical patients and a subset of 222 alerts generated from the data. We base the evaluation on the opinions of a panel of experts. The results of the study support our hypothesis that the outlier-based alerting can lead to promising true alert rates. We observed true alert rates that ranged from 25% to 66% for a variety of patient-management actions, with 66% corresponding to the strongest outliers. PMID:22944172
Application of a fast skyline computation algorithm for serendipitous searching problems

NASA Astrophysics Data System (ADS)

Koizumi, Kenichi; Hiraki, Kei; Inaba, Mary

2018-02-01

Skyline computation is a method of extracting interesting entries from a large population with multiple attributes. These entries, called skyline or Pareto optimal entries, are known to have extreme characteristics that cannot be found by outlier detection methods. Skyline computation is an important task for characterizing large amounts of data and selecting interesting entries with extreme features. When the population changes dynamically, the task of calculating a sequence of skyline sets is called continuous skyline computation. This task is known to be difficult to perform for the following reasons: (1) information about non-skyline entries must be stored, since they may join the skyline in the future; (2) the appearance or disappearance of even a single entry can change the skyline drastically; (3) it is difficult to adopt a geometric acceleration algorithm for skyline computation tasks with high-dimensional datasets. Our new algorithm, called the jointed rooted-tree (JR-tree), manages entries using a rooted tree structure. The JR-tree delays extending the tree to deep levels in order to accelerate tree construction and traversal. In this study, we present the difficulties in extracting entries tagged with a rare label in high-dimensional space and the potential of fast skyline computation in low-latency cell identification technology.
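For reference, a static skyline can be computed with a naive pairwise-dominance check. This sketch assumes that larger is better in every attribute, and it is not the JR-tree algorithm. In the continuous setting it would have to be rerun after every insertion or deletion, which becomes expensive as the population changes.

```python
def dominates(a, b):
    """True if a is at least as good as b in every attribute and strictly better in one
    (here 'better' is assumed to mean larger)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def skyline(points):
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For example, among (1, 5), (2, 4), (3, 3), (2, 2), (0, 1) and (3, 1), only the first three survive. Each of the others is beaten or matched in both attributes by some skyline point.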
Rumbling Orchids: How To Assess Divergent Evolution Between Chloroplast Endosymbionts and the Nuclear Host.

PubMed

Pérez-Escobar, Oscar Alejandro; Balbuena, Juan Antonio; Gottschling, Marc

2016-01-01

Phylogenetic relationships inferred from multilocus organellar and nuclear DNA data are often difficult to resolve because of evolutionary conflicts among gene trees. However, conflicting or "outlier" associations (i.e., linked pairs of "operational terminal units" in two phylogenies) among these data sets often provide valuable information on evolutionary processes such as chloroplast capture following hybridization, incomplete lineage sorting, and horizontal gene transfer. Statistical tools that to date have been used only in cophylogenetic studies also have the potential to test the degree of topological congruence between organellar and nuclear data sets and to reliably detect outlier associations. Two distance-based methods, ParaFit and Procrustean Approach to Cophylogeny (PACo), were used in conjunction to detect the outliers contributing to conflicting phylogenies independently derived from chloroplast and nuclear sequence data. Using several simulation approaches, we explored their efficiency in retrieving outlier associations and the impact of the input data (unit branch length and additive trees) between data sets. To test their performance on real data sets, we additionally inferred the phylogenetic relationships within Neotropical Catasetinae (Epidendroideae, Orchidaceae), a suitable group for investigating phylogenetic incongruence because of hybridization between some of its constituent species. A comparison between trees derived from chloroplast and nuclear sequence data showed strong, well-supported incongruence within Catasetum, Cycnoches, and Mormodes. Outliers among the chloroplast and nuclear data sets, and in the experimental simulations, were successfully detected by PACo when using patristic distance matrices obtained from phylograms, but not from unit branch length trees. The performance of ParaFit was overall inferior to that of PACo, using either phylograms or unit branch lengths as input data. Because workflows for applying cophylogenetic analyses are not yet standardized, we provide a pipeline in the software R for executing PACo and ParaFit and for displaying outlier associations in plots and trees. The pipeline provides a method to identify outliers with high reliability and to assess the combinability of the independently derived data sets by means of statistical analyses. © The Author(s) 2015. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
A new morphology algorithm for shoreline extraction from DEM data

NASA Astrophysics Data System (ADS)

Yousef, Amr H.; Iftekharuddin, Khan; Karim, Mohammad

2013-03-01

Digital elevation models (DEMs) are digital representations of elevations at regularly spaced points, and they provide an accurate tool for extracting shoreline profiles. One emerging source for creating them is light detection and ranging (LiDAR), which can capture a highly dense point cloud in a short period of time, with a resolution that can reach 15 cm in the vertical direction and 100 cm in the horizontal direction. In this paper we present a multi-step morphological algorithm to extract shoreline locations from DEM data and a predefined tidal datum. Unlike similar approaches, it uses Lowess nonparametric regression to estimate missing values within the DEM file. It also detects and eliminates outliers and errors that result from waves, ships, etc. by means of an anomaly test with neighborhood constraints. Because there may be significant broken regions, such as branches and islands, it uses a constrained morphological open and close to reduce these artifacts, which can affect the extracted shorelines. In addition, it eliminates docks, bridges and fishing piers along the extracted shorelines by means of the Hough transform. Based on a specific tidal datum, the algorithm segments the DEM data into water and land objects. Without sacrificing the accuracy and spatial detail of the extracted boundaries, the algorithm smooths and extracts the shoreline profiles by tracing the boundary pixels between the land and water segments. For given tidal values, we qualitatively assess the visual quality of the extracted shorelines by superimposing them on the available aerial photographs.
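The abstract does not define its anomaly test. One plausible minimal version, sketched here as an assumption, compares each DEM cell with the median of its (up to) 8 neighbours. It flags cells that deviate by more than a hypothetical threshold in elevation units, so a ship or wave spike stands out while a gently varying surface does not.

```python
import statistics

def neighborhood_outliers(grid, threshold):
    """Return (row, col) of cells whose elevation differs from the median of their
    up-to-8 neighbours by more than threshold. Assumes a grid with at least two cells."""
    rows, cols = len(grid), len(grid[0])
    flagged = []
    for r in range(rows):
        for c in range(cols):
            # neighbours inside the grid, excluding the cell itself
            neigh = [grid[i][j]
                     for i in range(max(0, r - 1), min(rows, r + 2))
                     for j in range(max(0, c - 1), min(cols, c + 2))
                     if (i, j) != (r, c)]
            if abs(grid[r][c] - statistics.median(neigh)) > threshold:
                flagged.append((r, c))
    return flagged
```

Using the neighbourhood median rather than the mean keeps a single spike from raising the reference level of the cells around it. The spike's neighbours therefore do not get flagged as well.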
Comparison of outlier identification methods in hospital surgical quality improvement programs.

PubMed

Bilimoria, Karl Y; Cohen, Mark E; Merkow, Ryan P; Wang, Xue; Bentrem, David J; Ingraham, Angela M; Richards, Karen; Hall, Bruce L; Ko, Clifford Y

2010-10-01

Surgeons and hospitals are increasingly being assessed by third parties regarding surgical quality and outcomes, and much of this information is reported publicly. Our objective was to compare various methods used to classify hospitals as outliers in established surgical quality assessment programs by applying each approach to a single data set. Using American College of Surgeons National Surgical Quality Improvement Program data (7/2008-6/2009), hospital risk-adjusted 30-day morbidity and mortality were assessed for general surgery at 231 hospitals (cases = 217,630) and for colorectal surgery at 109 hospitals (cases = 17,251). The number of outliers (poor performers) identified using different methods and criteria was compared. The overall morbidity was 10.3% for general surgery and 25.3% for colorectal surgery. The mortality was 1.6% for general surgery and 4.0% for colorectal surgery. Programs used different methods (logistic regression, hierarchical modeling, partitioning) and criteria (P < 0.01, P < 0.05, P < 0.10) to identify outliers. Depending on the outlier identification methods and criteria employed, when each approach was applied to this single dataset, the number of outliers ranged from 7 to 57 hospitals for general surgery morbidity, 1 to 57 hospitals for general surgery mortality, 4 to 27 hospitals for colorectal morbidity, and 0 to 27 hospitals for colorectal mortality. There was considerable variation in the number of outliers identified using different detection approaches. Quality programs seem to be using outlier identification methods contrary to what might be expected; they should therefore justify their methodology based on the intent of the program (i.e., quality improvement vs. reimbursement). Surgeons and hospitals should be aware of the variability in methods used to assess their performance, as these outlier designations will likely have referral and reimbursement consequences.
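The sensitivity to the significance criterion reported above can be illustrated with a simple, hypothetical rule. It compares a hospital's observed event count with the count expected from a given risk-adjusted rate, using an exact one-sided binomial test. This is not the logistic-regression, hierarchical or partitioning method of any program. The expected rate is assumed to be supplied by a risk model, and the term-by-term sum suits only modest case counts (large counts overflow the float conversion and would need a log-space or library implementation).

```python
import math

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

def classify_hospital(observed, cases, expected_rate, alpha):
    """Return (O/E ratio, one-sided p-value, flagged_as_poor_performer)."""
    expected = cases * expected_rate
    p_value = binom_upper_tail(observed, cases, expected_rate)
    return observed / expected, p_value, p_value < alpha
```

Take 17 observed events in 100 cases with an expected rate of 10% (O/E = 1.7). The p-value is about 0.02, so the same hospital is flagged at P < 0.05 and P < 0.10 but not at P < 0.01.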
Patient classification as an outlier detection problem: An application of the One-Class Support Vector Machine

PubMed Central

Mourão-Miranda, Janaina; Hardoon, David R.; Hahn, Tim; Marquand, Andre F.; Williams, Steve C.R.; Shawe-Taylor, John; Brammer, Michael

2011-01-01

Pattern recognition approaches, such as the Support Vector Machine (SVM), have been successfully used to classify groups of individuals based on their patterns of brain activity or structure. However, these approaches focus on finding group differences and are not applicable to situations where one is interested in assessing deviations from a specific class or population. In the present work we propose an application of the one-class SVM (OC-SVM) to investigate whether patterns of fMRI response to sad facial expressions in depressed patients would be classified as outliers in relation to patterns of healthy control subjects. We defined features based on whole-brain voxels and on anatomical regions. In both cases we found a significant correlation between the OC-SVM predictions and the patients' Hamilton Rating Scale for Depression (HRSD) scores, i.e., the more depressed the patients were, the more of an outlier they were. In addition, the OC-SVM split the patient group into two subgroups whose membership was associated with future response to treatment. When applied to region-based features, the OC-SVM classified 52% of patients as outliers. Among the patients classified as outliers, 70% did not respond to treatment, whereas among those classified as non-outliers, 89% responded to treatment. In addition, 89% of the healthy controls were classified as non-outliers. PMID:21723950
Outlier Analysis Defines Zinc Finger Gene Family DNA Methylation in Tumors and Saliva of Head and Neck Cancer Patients.

PubMed

Gaykalova, Daria A; Vatapalli, Rajita; Wei, Yingying; Tsai, Hua-Ling; Wang, Hao; Zhang, Chi; Hennessey, Patrick T; Guo, Theresa; Tan, Marietta; Li, Ryan; Ahn, Julie; Khan, Zubair; Westra, William H; Bishop, Justin A; Zaboli, David; Koch, Wayne M; Khan, Tanbir; Ochs, Michael F; Califano, Joseph A

2015-01-01

Head and Neck Squamous Cell Carcinoma (HNSCC) is the fifth most common cancer, annually affecting over half a million people worldwide. Presently, there are no accepted biomarkers for clinical detection and surveillance of HNSCC. In this work, a comprehensive genome-wide analysis of epigenetic alterations in primary HNSCC tumors was employed in conjunction with cancer-specific outlier statistics to define novel biomarker genes which are differentially methylated in HNSCC. The 37 identified biomarker candidates were top-scoring outlier genes with prominent differential methylation in tumors, but with no signal in normal tissues. These putative candidates were validated in independent HNSCC cohorts from our institution and TCGA (The Cancer Genome Atlas). Using the top candidates, ZNF14, ZNF160, and ZNF420, an assay was developed for detection of HNSCC cancer in primary tissue and saliva samples with 100% specificity when compared to normal control samples. Given the high detection specificity, the analysis of ZNF DNA methylation in combination with other DNA methylation biomarkers may be useful in the clinical setting for HNSCC detection and surveillance, particularly in high-risk patients. Several additional candidates identified through this work can be further investigated toward future development of a multi-gene panel of biomarkers for the surveillance and detection of HNSCC.
Evaluation of the Bitterness of Traditional Chinese Medicines using an E-Tongue Coupled with a Robust Partial Least Squares Regression Method

PubMed Central

Lin, Zhaozhou; Zhang, Qiao; Liu, Ruixin; Gao, Xiaojie; Zhang, Lu; Kang, Bingya; Shi, Junhan; Wu, Zidan; Gui, Xinjing; Li, Xuelin

2016-01-01

To accurately, safely, and efficiently evaluate the bitterness of Traditional Chinese Medicines (TCMs), a robust predictor was developed using the robust partial least squares (RPLS) regression method based on data obtained from an electronic tongue (e-tongue) system. The data quality was verified by Grubbs' test. Moreover, potential outliers were detected based on both the standardized residual and the score distance calculated for each sample. The performance of RPLS on the dataset before and after outlier detection was compared with other state-of-the-art methods, including multivariate linear regression, least squares support vector machine, and plain partial least squares regression. Both R² and the root-mean-square error (RMSE) of cross-validation (CV) were recorded for each model. With four latent variables, the RPLS model constructed from the dataset including outliers achieved a robust RMSECV of 0.3916 for bitterness values ranging from 0.63 to 4.78. The RMSECV of the models constructed by the other methods was larger than that of the RPLS model. After six outliers were excluded, the performance of all benchmark methods markedly improved, but the difference between the RPLS models constructed before and after outlier exclusion was negligible. In conclusion, the bitterness of TCM decoctions can be accurately evaluated with the RPLS model constructed using e-tongue data. PMID:26821026
Locally adaptive decision in detection of clustered microcalcifications in mammograms.

PubMed

Sainz de Cea, María V; Nishikawa, Robert M; Yang, Yongyi

2018-02-15

In computer-aided detection or diagnosis of clustered microcalcifications (MCs) in mammograms, the performance often suffers from not only the presence of false positives (FPs) among the detected individual MCs but also large variability in detection accuracy among different cases. To address this issue, we investigate a locally adaptive decision scheme in MC detection by exploiting the noise characteristics in a lesion area. Instead of developing a new MC detector, we propose a decision scheme on how to best decide whether a detected object is an MC or not in the detector output. We formulate the individual MCs as statistical outliers compared to the many noisy detections in a lesion area so as to account for the local image characteristics. To identify the MCs, we first consider a parametric method for outlier detection, the Mahalanobis distance detector, which is based on a multi-dimensional Gaussian distribution on the noisy detections. We also consider a non-parametric method which is based on a stochastic neighbor graph model of the detected objects. We demonstrated the proposed decision approach with two existing MC detectors on a set of 188 full-field digital mammograms (95 cases). The results, evaluated using free response operating characteristic (FROC) analysis, showed a significant improvement in detection accuracy by the proposed outlier decision approach over traditional thresholding (the partial area under the FROC curve increased from 3.95 to 4.25, p-value < 10⁻⁴). There was also a reduction in case-to-case variability in detected FPs at a given sensitivity level. The proposed adaptive decision approach could not only reduce the number of FPs in detected MCs but also improve case-to-case consistency in detection.
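Here is a minimal sketch of the parametric idea: fit a Gaussian (mean and covariance) to the detections' feature vectors and score each detection by its Mahalanobis distance. The two features per detection are hypothetical. The covariance is estimated from all points, including any outliers, which can partly mask them. The authors' exact features and decision threshold are not given in the abstract.

```python
import statistics

def mahalanobis_2d(points):
    """Mahalanobis distance of each 2-D point from the sample mean,
    using the sample covariance of all points (no robust estimation)."""
    n = len(points)
    mx = statistics.fmean([p[0] for p in points])
    my = statistics.fmean([p[1] for p in points])
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy * sxy  # must be non-zero (points not all collinear)
    # entries of the inverse of the 2x2 covariance matrix
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    dists = []
    for x, y in points:
        dx, dy = x - mx, y - my
        dists.append((ixx * dx * dx + 2 * ixy * dx * dy + iyy * dy * dy) ** 0.5)
    return dists
```

Unlike a fixed threshold on raw feature values, the distance is scaled by the spread of the local noisy detections. The same object can therefore count as an outlier in a quiet lesion area but not in a noisy one.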
Locally adaptive decision in detection of clustered microcalcifications in mammograms

NASA Astrophysics Data System (ADS)

Sainz de Cea, María V.; Nishikawa, Robert M.; Yang, Yongyi

2018-02-01

In computer-aided detection or diagnosis of clustered microcalcifications (MCs) in mammograms, performance often suffers not only from the presence of false positives (FPs) among the detected individual MCs but also from large variability in detection accuracy among different cases. To address this issue, we investigate a locally adaptive decision scheme in MC detection by exploiting the noise characteristics in a lesion area. Instead of developing a new MC detector, we propose a decision scheme for how best to decide whether a detected object in the detector output is an MC or not. We formulate the individual MCs as statistical outliers compared to the many noisy detections in a lesion area so as to account for the local image characteristics. To identify the MCs, we first consider a parametric method for outlier detection, the Mahalanobis distance detector, which is based on a multi-dimensional Gaussian distribution of the noisy detections. We also consider a non-parametric method based on a stochastic neighbor graph model of the detected objects. We demonstrated the proposed decision approach with two existing MC detectors on a set of 188 full-field digital mammograms (95 cases). The results, evaluated using free-response operating characteristic (FROC) analysis, showed a significant improvement in detection accuracy by the proposed outlier decision approach over traditional thresholding (the partial area under the FROC curve increased from 3.95 to 4.25, p-value < 10^-4). There was also a reduction in case-to-case variability in detected FPs at a given sensitivity level. The proposed adaptive decision approach could not only reduce the number of FPs in detected MCs but also improve case-to-case consistency in detection.
Short-term change detection for UAV video

NASA Astrophysics Data System (ADS)

Saur, Günter; Krüger, Wolfgang

2012-11-01

In recent years, there has been increased use of unmanned aerial vehicles (UAVs) for video reconnaissance and surveillance. An important application in this context is change detection in UAV video data. Here we address short-term change detection, in which the time between observations ranges from several minutes to a few hours. We distinguish this task from video motion detection (shorter time scale) and from long-term change detection, which is based on time series of still images taken several days, weeks, or even years apart. Examples of relevant changes we are looking for are recently parked or moved vehicles. As a prerequisite, precise image-to-image registration is needed. Images are selected on the basis of the geo-coordinates of the sensor's footprint and with respect to a certain minimal overlap. The automatic image-based fine registration adjusts the image pair to a common geometry by using a robust matching approach to handle outliers. The change detection algorithm has to distinguish between relevant and non-relevant changes. Examples of non-relevant changes are stereo disparity at 3D structures of the scene, changed length of shadows, and compression or transmission artifacts. To detect changes in image pairs, we analyzed image differencing, local image correlation, and a transformation-based approach (multivariate alteration detection). As input, we used color and gradient magnitude images. To cope with local misalignment of image structures, we extended the approaches by a local neighborhood search. The algorithms are applied to several examples covering both urban and rural scenes. The local neighborhood search in combination with intensity and gradient magnitude differencing clearly improved the results. Extended image differencing performed better than both the correlation-based approach and multivariate alteration detection.
The algorithms are adapted for use in semi-automatic workflows of the ABUL video exploitation system of Fraunhofer IOSB; see Heinze et al. 2010. In a further step, we plan to incorporate more information from the video sequences into the change detection input images, e.g., through image enhancement or along-track stereo, both of which are available in the ABUL system.
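Extended image differencing with a local neighborhood search can be sketched as follows, assuming grayscale images stored as nested lists, a fixed intensity threshold, and a minimum-absolute-difference rule over the search window; the abstract does not specify the exact rule, so these choices and the name `changed_pixels` are hypothetical.

```python
def changed_pixels(img_a, img_b, radius=1, threshold=30):
    """Pixels of img_a with no similar intensity in img_b within a
    (2*radius+1) x (2*radius+1) window; radius=0 is plain differencing."""
    rows, cols = len(img_a), len(img_a[0])
    changes = []
    for r in range(rows):
        for c in range(cols):
            # Smallest absolute difference to any pixel in the search window
            best = min(
                abs(img_a[r][c] - img_b[rr][cc])
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            )
            if best > threshold:
                changes.append((r, c))
    return changes


before = [[0] * 5 for _ in range(5)]
before[2][2] = 200                      # a bright object
after = [[0] * 5 for _ in range(5)]
after[2][3] = 200                       # same object, misregistered by one pixel
print(changed_pixels(after, before, radius=0))  # [(2, 2), (2, 3)]
print(changed_pixels(after, before, radius=1))  # []
```

The test is one-directional: it finds structures present in the first image but absent nearby in the second, so a vehicle that has left the scene needs a second call with the images swapped.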

Locating Structural Centers: A Density-Based Clustering Method for Community Detection

PubMed Central

Liu, Gongshen; Li, Jianhua; Nees, Jan P.

2017-01-01

Uncovering underlying community structures in complex networks has received considerable attention because of its importance in understanding structural attributes and group characteristics of networks. The algorithmic identification of such structures is a significant challenge. Local expanding methods have proven to be efficient and effective in community detection, but most methods are sensitive to initial seeds and built-in parameters. In this paper, we present a local expansion method based on density-based clustering, which aims to uncover the intrinsic network communities by locating the structural centers of communities based on a proposed structural centrality. The structural centrality takes into account the local density of nodes and the relative distance between nodes. The proposed algorithm expands a community from the structural center to the border with a single local search procedure. The local expanding procedure follows a heuristic strategy that allows it to find complete community structures. Moreover, it can identify different node roles (cores and outliers) in communities by defining a border region. The experiments involve both real-world and artificial networks and compare the proposed method with existing ones. The results of these experiments show that the proposed method is more efficient than current state-of-the-art methods while achieving comparable clustering performance. PMID:28046030
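The structural-centrality idea resembles density-peak clustering: a center is a node that is dense and far from any denser node. The sketch below assumes degree as the local density and BFS hop count as the distance, and scores each node by their product; the paper's actual definitions and its expansion step are not reproduced.

```python
from collections import deque


def hop_distances(adj, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def structural_centrality(adj):
    """Score = density * distance to the nearest denser node (density = degree).
    Nodes with no denser node receive their eccentricity as the distance."""
    density = {v: len(adj[v]) for v in adj}
    scores = {}
    for v in adj:
        dist = hop_distances(adj, v)
        denser = [d for u, d in dist.items() if density[u] > density[v]]
        delta = min(denser) if denser else max(dist.values())
        scores[v] = density[v] * delta
    return scores


# Two star-shaped communities joined by a single edge a1-b1
graph = {
    "A": ["a1", "a2", "a3", "a4"], "a1": ["A", "b1"], "a2": ["A"], "a3": ["A"], "a4": ["A"],
    "B": ["b1", "b2", "b3", "b4"], "b1": ["B", "a1"], "b2": ["B"], "b3": ["B"], "b4": ["B"],
}
scores = structural_centrality(graph)
print(sorted(scores, key=scores.get, reverse=True)[:2])
```

The two hubs stand out because they are both dense and far from any denser node, whereas leaves sit one hop from a denser hub.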
Study of Track Irregularity Time Series Calibration and Variation Pattern at Unit Section

PubMed Central

Jia, Chaolong; Wei, Lili; Wang, Hanning; Yang, Jiulin

2014-01-01

Focusing on problems with the quality of track irregularity time series data, this paper first presents algorithms for abnormal data identification, data offset correction, local outlier data identification, and noise cancellation. It then proposes decomposing and reconstructing track irregularity time series through wavelet decomposition and reconstruction. Finally, the patterns and features of the track irregularity standard deviation data sequence in unit sections are studied, and the changing trend of the track irregularity time series is discovered and described. PMID:25435869
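Wavelet-based noise cancellation can be illustrated with one level of the orthonormal Haar transform: split the series into approximation and detail coefficients, zero the small details, and reconstruct. The Haar wavelet, hard thresholding, and the threshold value are assumptions; the abstract names neither the wavelet nor the decomposition depth.

```python
import math

SQRT_HALF = 1 / math.sqrt(2)


def haar_decompose(signal):
    """One level of the orthonormal Haar transform (even-length input)."""
    approx = [(signal[i] + signal[i + 1]) * SQRT_HALF for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * SQRT_HALF for i in range(0, len(signal), 2)]
    return approx, detail


def haar_reconstruct(approx, detail):
    """Inverse of haar_decompose."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * SQRT_HALF, (a - d) * SQRT_HALF])
    return out


def denoise(signal, threshold):
    """Hard-threshold the detail coefficients, then reconstruct."""
    approx, detail = haar_decompose(signal)
    detail = [d if abs(d) > threshold else 0.0 for d in detail]
    return haar_reconstruct(approx, detail)


print(denoise([1.0, 1.2, 3.0, 2.8], threshold=0.5))  # approximately [1.1, 1.1, 2.9, 2.9]
```

Applying `haar_decompose` again to the approximation coefficients gives a multi-level decomposition.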
Sources of Artefacts in Synthetic Aperture Radar Interferometry Data Sets

NASA Astrophysics Data System (ADS)

Becek, K.; Borkowski, A.

2012-07-01

In recent years, much attention has been devoted to digital elevation models (DEMs) produced using Synthetic Aperture Radar Interferometry (InSAR). This has been triggered by the relative novelty of the InSAR method and its world-famous product: the Shuttle Radar Topography Mission (SRTM) DEM. However, little, if any, attention has been paid to sources of artefacts in SRTM. In this work, we focus not on the missing pixels (null pixels) due to shadows or the layover effect, but rather on outliers that were undetected by the SRTM validation process. The aim of this study is to identify some of the causes of the elevation outliers in SRTM. Such knowledge may help mitigate similar problems in future InSAR DEMs, notably the ones currently being developed from data acquired by the TanDEM-X mission. We analysed many cross-sections derived from SRTM. These cross-sections were extracted over the elevation test areas available from the Global Elevation Data Testing Facility (GEDTF), whose database contains about 8,500 runways with known vertical profiles. Whenever a significant discrepancy between the known runway profile and the SRTM cross-section was detected, the high-resolution satellite image was visually interpreted to identify the objects causing the irregularities. The distance and bearing from the outlier to the object were recorded. Moreover, we considered the SRTM look direction parameter. A comprehensive analysis of the acquired data allows us to establish that large metallic structures, such as hangars or car parking lots, cause the outliers. Water areas or flat wet terrain may also cause InSAR outliers. The look direction and the depression angle of the InSAR system in relation to the suspected objects influence the magnitude of the outliers. We hope that these findings will be helpful in designing the error detection routines of future InSAR or, in fact, any microwave aerial or space-based survey.
The presence of outliers in SRTM was first reported in Becek, K. (2008). Investigating error structure of shuttle radar topography mission elevation data product, Geophys. Res. Lett., 35, L15403.
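The screening described above compares a known runway profile with the SRTM cross-section and, for each discrepancy, records the distance and bearing to a suspected object. A minimal sketch, assuming both profiles are sampled at the same points, a fixed tolerance in metres, and planar east/north coordinates (adequate over the short distances involved); the tolerance and helper names are hypothetical.

```python
import math


def flag_discrepancies(reference, srtm, tolerance):
    """Indices where the SRTM cross-section departs from the known runway
    profile by more than `tolerance` metres (both sampled at the same points)."""
    return [i for i, (r, s) in enumerate(zip(reference, srtm)) if abs(s - r) > tolerance]


def distance_and_bearing(outlier_en, object_en):
    """Planar distance (m) and bearing (degrees clockwise from north) from an
    outlier to a suspected object; inputs are (east, north) coordinates in metres."""
    de = object_en[0] - outlier_en[0]
    dn = object_en[1] - outlier_en[1]
    return math.hypot(de, dn), math.degrees(math.atan2(de, dn)) % 360


runway = [102.0, 102.1, 102.3, 102.4, 102.6]
srtm = [101.5, 102.9, 115.0, 102.0, 103.1]
print(flag_discrepancies(runway, srtm, tolerance=5.0))  # [2]
print(distance_and_bearing((0.0, 0.0), (30.0, 40.0)))  # (50.0, ~36.87)
```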
Automated novelty detection in the WISE survey with one-class support vector machines

NASA Astrophysics Data System (ADS)

Solarz, A.; Bilicki, M.; Gromadzki, M.; Pollo, A.; Durkalec, A.; Wypych, M.

2017-10-01

Wide-angle photometric surveys of previously uncharted sky areas or wavelength regimes will always bring in unexpected sources (novelties or even anomalies) whose existence and properties cannot be easily predicted from earlier observations. Such objects can be efficiently located with novelty detection algorithms. Here we present an application of such a method, called one-class support vector machines (OCSVM), to search for anomalous patterns among sources preselected from the mid-infrared AllWISE catalogue covering the whole sky. To create a model of expected data, we train the algorithm on a set of objects with spectroscopic identifications from the SDSS DR13 database that are also present in AllWISE. The OCSVM method detects as anomalous those sources whose patterns (WISE photometric measurements in this case) are inconsistent with the model. Among the detected anomalies we find artefacts, such as objects with spurious photometry due to blending, but more importantly also real sources of genuine astrophysical interest. Among the latter, OCSVM has identified a sample of heavily reddened AGN/quasar candidates distributed uniformly over the sky and largely absent from other WISE-based AGN catalogues. It also allowed us to find a specific group of sources of mixed types, mostly stars and compact galaxies. By combining the semi-supervised OCSVM algorithm with standard classification methods, it will be possible to improve the latter by accounting for sources which are not present in the training sample but are otherwise well represented in the target set. Anomaly detection adds flexibility to automated source separation procedures and helps verify the reliability and representativeness of the training samples. It should thus be considered an essential step in supervised classification schemes to ensure completeness and purity of produced catalogues.
The catalogues of outlier data are only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/606/A39
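A one-class SVM needs a quadratic-programming solver (scikit-learn's `sklearn.svm.OneClassSVM` is one option). To keep the sketch dependency-free, it substitutes a simpler novelty detector, the mean distance to the k nearest training objects, while following the same workflow: learn from known objects, then flag candidates the model finds inconsistent. The feature space, k, and the 95% cutoff are assumptions.

```python
import math


def knn_scores(train, candidates, k=3):
    """Novelty score: mean distance from each candidate to its k nearest training objects."""
    scores = []
    for c in candidates:
        dists = sorted(math.dist(c, t) for t in train)
        scores.append(sum(dists[:k]) / k)
    return scores


def flag_novelties(train, candidates, k=3, quantile=0.95):
    """Flag candidates whose score exceeds the given quantile of the
    leave-one-out scores of the training objects themselves."""
    loo = sorted(
        knn_scores(train[:i] + train[i + 1:], [train[i]], k)[0]
        for i in range(len(train))
    )
    cutoff = loo[min(len(loo) - 1, int(quantile * len(loo)))]
    return [s > cutoff for s in knn_scores(train, candidates, k)]


# Hypothetical 2-D features of objects with known identifications
known = [(x * 0.1, y * 0.1) for x in range(5) for y in range(5)]
print(flag_novelties(known, [(0.25, 0.25), (5.0, 5.0)]))  # [False, True]
```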
A closed-form solution to tensor voting: theory and applications.

PubMed

Wu, Tai-Pang; Yeung, Sai-Kit; Jia, Jiaya; Tang, Chi-Keung; Medioni, Gérard

2012-08-01

We prove a closed-form solution to tensor voting (CFTV): given a point set in any dimension, our closed-form solution provides an exact, continuous, and efficient algorithm for computing a structure-aware tensor that simultaneously achieves salient structure detection and outlier attenuation. Using CFTV, we prove the convergence of tensor voting on a Markov random field (MRF), hence termed MRFTV, where the structure-aware tensor at each input site reaches a stationary state upon convergence in structure propagation. We then embed the structure-aware tensor into expectation maximization (EM) for optimizing a single linear structure to achieve efficient and robust parameter estimation. Specifically, our EMTV algorithm optimizes both the tensor and the fitting parameters and does not require the random sample consensus (RANSAC) typically used in existing robust statistical techniques. We performed a quantitative evaluation of its accuracy and robustness, showing that EMTV performs better than the original TV and other state-of-the-art techniques in fundamental matrix estimation for multiview stereo matching. Extensions of CFTV and EMTV for extracting multiple and nonlinear structures are under way.
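To make structure-aware voting concrete, the sketch below accumulates a simplified first-pass vote c * (I - r r^T) from every neighbour, where r is the unit direction between two points and c a Gaussian decay, and reads curve saliency from the eigenvalue gap l1 - l2. This is a simplification in the spirit of tensor voting's ball votes, not the paper's closed-form CFTV tensor; sigma is an assumed scale.

```python
import math


def vote_saliency(points, sigma=1.0):
    """For each 2-D point, sum simplified votes c * (I - r r^T) from all other
    points (r: unit vector between them, c: Gaussian decay) and return
    (stick saliency l1 - l2, ball saliency l2) of the accumulated tensor."""
    result = []
    for i, (xi, yi) in enumerate(points):
        txx = txy = tyy = 0.0
        for j, (xj, yj) in enumerate(points):
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2
            if i == j or d2 == 0:
                continue
            c = math.exp(-d2 / sigma ** 2)
            rx, ry = (xi - xj) / math.sqrt(d2), (yi - yj) / math.sqrt(d2)
            txx += c * (1 - rx * rx)
            txy += -c * rx * ry
            tyy += c * (1 - ry * ry)
        # Eigenvalues of the symmetric tensor [[txx, txy], [txy, tyy]]
        half_trace = (txx + tyy) / 2
        radius = math.sqrt(((txx - tyy) / 2) ** 2 + txy ** 2)
        l1, l2 = half_trace + radius, half_trace - radius
        result.append((l1 - l2, l2))
    return result


points = [(float(x), 0.0) for x in range(10)] + [(5.0, 5.0)]  # a line plus one outlier
saliency = vote_saliency(points)
print(saliency[0][0], saliency[-1][0])  # clear line response vs. near-zero response
```

Points on the line reinforce one another, while the isolated point receives only exponentially decayed votes, which is the outlier attenuation the abstract refers to.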
Conformal Prediction Based on K-Nearest Neighbors for Discrimination of Ginsengs by a Home-Made Electronic Nose

PubMed Central

Sun, Xiyang; Miao, Jiacheng; Wang, You; Luo, Zhiyuan; Li, Guang

2017-01-01

An estimate of the reliability of predictions is essential in electronic nose applications, but it has not received enough attention. An algorithm framework called conformal prediction is introduced in this work for discriminating different kinds of ginsengs with a home-made electronic nose instrument. A nonconformity measure based on k-nearest neighbors (KNN) is implemented as the underlying algorithm of conformal prediction, separately for different values of k. In offline mode, the conformal predictor achieves a classification rate of 84.44% based on 1NN and 80.63% based on 3NN, both better than that of simple KNN. In addition, it provides an estimate of reliability for each prediction. In online mode, the validity of predictions is guaranteed, which means that the error rate of region predictions never exceeds the significance level set by the user. The potential of this framework for detecting borderline examples and outliers in E-nose applications is also investigated. The results show that conformal prediction is a promising framework for making electronic nose predictions with reliability and validity. PMID:28805721
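Conformal prediction turns a nonconformity measure into a p-value for each candidate label. The sketch below uses a common k-nearest-neighbour measure (distance to the k nearest same-label examples divided by distance to the k nearest other-label examples) in the transductive setting; the toy 2-D features stand in for e-nose sensor responses and are purely illustrative.

```python
import math


def knn_nonconformity(x, label, others, k):
    """Sum of distances to the k nearest same-label examples divided by the
    sum of distances to the k nearest other-label examples."""
    same = sorted(math.dist(x, p) for p, y in others if y == label)
    diff = sorted(math.dist(x, p) for p, y in others if y != label)
    return sum(same[:k]) / sum(diff[:k])


def conformal_p_values(train, x, labels, k=1):
    """Transductive conformal p-value of each candidate label for test object x."""
    p_values = {}
    for label in labels:
        bag = train + [(x, label)]
        # Nonconformity of every example against all the others
        alphas = [knn_nonconformity(p, y, bag[:i] + bag[i + 1:], k)
                  for i, (p, y) in enumerate(bag)]
        p_values[label] = sum(a >= alphas[-1] for a in alphas) / len(bag)
    return p_values


train = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"),
         ((10, 10), "B"), ((10, 11), "B"), ((11, 10), "B")]
p = conformal_p_values(train, (0.5, 0.5), ["A", "B"])
print(p)
```

At significance level epsilon, the region prediction contains every label whose p-value exceeds epsilon; an object with low p-values for all labels is a candidate outlier.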
Autocorrelation Analysis Combined with a Wavelet Transform Method to Detect and Remove Cosmic Rays in a Single Raman Spectrum.

PubMed

Maury, Augusto; Revilla, Reynier I

2015-08-01

Cosmic rays (CRs) occasionally affect charge-coupled device (CCD) detectors, introducing large spikes with very narrow bandwidth into the spectrum. These CR features can distort the chemical information expressed by the spectra. Consequently, we propose here an algorithm to identify and remove significant spikes in a single Raman spectrum. An autocorrelation analysis is first carried out to accentuate the CR features as outliers. Subsequently, with an appropriately selected threshold, a discrete wavelet transform filter is used to identify CR spikes. The identified data points are then replaced by interpolated values using the weighted-average interpolation technique. This approach modifies the data only in the close vicinity of the CRs. Additionally, robust wavelet transform parameters (a desirable property for automation) are proposed after optimizing them by applying the method to a large number of spectra. However, this algorithm, like all single-spectrum analysis procedures, is limited to cases in which CRs have much narrower bandwidth than the Raman bands. This might not be the case when low-resolution Raman instruments are used.
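The detect-then-interpolate structure of the method can be sketched as below. For brevity, a second-difference filter with a robust median/MAD threshold stands in for the autocorrelation plus discrete wavelet transform step, and inverse-distance weights are one possible reading of "weighted-average interpolation"; k = 6 is an arbitrary cutoff.

```python
import statistics


def remove_spikes(spectrum, k=6.0):
    """Flag one-point positive spikes and replace them by interpolation.

    Detection: second difference x[i] - (x[i-1] + x[i+1]) / 2, scaled by a
    robust noise estimate (1.4826 * median absolute deviation).
    Replacement: inverse-distance weighted average of the nearest
    unflagged neighbours on each side.
    """
    n = len(spectrum)
    resp = [0.0] + [spectrum[i] - (spectrum[i - 1] + spectrum[i + 1]) / 2
                    for i in range(1, n - 1)] + [0.0]
    med = statistics.median(resp)
    mad = statistics.median([abs(r - med) for r in resp]) or 1e-12  # avoid division by zero
    spikes = {i for i, r in enumerate(resp) if (r - med) / (1.4826 * mad) > k}
    cleaned = list(spectrum)
    for i in sorted(spikes):
        left = next((j for j in range(i - 1, -1, -1) if j not in spikes), None)
        right = next((j for j in range(i + 1, n) if j not in spikes), None)
        weights = [(j, 1.0 / abs(i - j)) for j in (left, right) if j is not None]
        cleaned[i] = sum(spectrum[j] * w for j, w in weights) / sum(w for _, w in weights)
    return cleaned, sorted(spikes)


raman = [10.0 + i for i in range(20)]  # smooth, linear baseline
raman[7] = 100.0                        # cosmic-ray spike
cleaned, idx = remove_spikes(raman)
print(idx, cleaned[7])  # [7] 17.0
```

Only the flagged point changes, which mirrors the abstract's point that data are modified only in the close vicinity of the CRs.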
Analysis and detection of functional outliers in water quality parameters from different automated monitoring stations in the Nalón river basin (Northern Spain).

PubMed

Piñeiro Di Blasi, J I; Martínez Torres, J; García Nieto, P J; Alonso Fernández, J R; Díaz Muñiz, C; Taboada, J

2015-01-01

The purposes and intent of the authorities in establishing water quality standards are to enhance water quality and prevent pollution in order to protect public health or welfare in accordance with the public interest for drinking water supplies; the conservation of fish, wildlife, and other beneficial aquatic life; and agricultural, industrial, recreational, and other reasonable and necessary uses, as well as to maintain and improve the biological integrity of the waters. Water quality control therefore involves a large number of variables and observations, often subject to some outliers. An outlier is an observation that is numerically distant from the rest of the data or that appears to deviate markedly from other members of the sample in which it occurs. An interesting analysis is to find those observations whose measurements differ from the pattern established in the sample. Identifying atypical observations is thus an important concern in water quality monitoring, and a difficult one because of the multivariate nature of water quality data. Our study provides a new method for detecting outliers in water quality monitoring parameters, using turbidity, conductivity, and ammonium ion as indicator variables. Until now, methods treated the different parameters as a vector whose components were their concentration values. The innovation of this approach lies in treating water quality monitoring over time as continuous curves instead of discrete points; that is, the data are considered as time-dependent functions rather than as sets of discrete values at different time instants. This new methodology, which is based on the concept of functional depth, was successfully applied to the detection of outliers in water quality monitoring samples in the Nalón river basin. The results of this study are discussed in terms of origin, causes, etc.
Finally, the conclusions and the advantages of the functional method are presented.
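The abstract does not say which functional depth was used; the Fraiman-Muniz depth is one common choice and is easy to sketch. Each monitoring record is treated as a curve sampled on a common time grid, and the curves with the lowest depth are outlier candidates; the cutoff rule for declaring an outlier is left out, and the toy curves are illustrative.

```python
def fraiman_muniz_depth(curves):
    """Depth of each curve (equal-length lists on a common time grid):
    the time average of 1 - |1/2 - F_t(x(t))|, where F_t is the empirical
    CDF of all curves at time t. Low depth means an atypical curve."""
    n, m = len(curves), len(curves[0])
    depths = []
    for curve in curves:
        total = 0.0
        for t in range(m):
            f = sum(other[t] <= curve[t] for other in curves) / n
            total += 1 - abs(0.5 - f)
        depths.append(total / m)
    return depths


turbidity = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.1],
    [0.9, 1.9, 2.9],
    [5.0, 6.0, 7.0],  # a shifted, atypical record
]
print(fraiman_muniz_depth(turbidity))  # [1.0, 0.75, 0.75, 0.5]
```

Because depth is computed on whole curves, a record can be flagged for an unusual shape over time even if no single value is extreme.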
Improved Cloud and Snow Screening in MAIAC Aerosol Retrievals Using Spectral and Spatial Analysis

NASA Technical Reports Server (NTRS)

Lyapustin, A.; Wang, Y.; Laszlo, I.; Korkin, S.

2012-01-01

An improved cloud/snow screening technique in the Multi-Angle Implementation of Atmospheric Correction (MAIAC) algorithm is described. It is implemented as part of MAIAC aerosol retrievals and is based on analysis of spectral residuals and spatial variability. Comparisons with AERONET aerosol observations and a large-scale MODIS data analysis show strong suppression of aerosol optical thickness outliers due to unresolved clouds and snow. At the same time, the developed filter does not reduce the aerosol retrieval capability at high 1 km resolution in strongly inhomogeneous environments, such as near the centers of active fires. Despite the significant improvement, optical depth outliers in high-spatial-resolution data are, and will remain, a problem to be addressed by application-dependent specialized filtering techniques.
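A spatial-variability test flags pixels whose neighbourhood is too heterogeneous to be clear-sky aerosol. The sketch below uses the population standard deviation in a 3x3 window and a fixed threshold; both are hypothetical, since the abstract does not give MAIAC's actual criteria, and the spectral-residual test is omitted.

```python
import statistics


def spatial_variability_mask(grid, max_std):
    """Flag pixels whose 3x3 neighbourhood standard deviation exceeds max_std."""
    rows, cols = len(grid), len(grid[0])
    mask = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            mask[r][c] = statistics.pstdev(window) > max_std
    return mask


aot = [[0.1] * 5 for _ in range(5)]  # smooth aerosol optical thickness field
aot[2][2] = 1.5                      # an unresolved cloud pixel
mask = spatial_variability_mask(aot, max_std=0.1)
print(sum(map(sum, mask)))  # 9: the cloud pixel and its 3x3 neighbourhood
```

A filter this naive would also reject genuinely inhomogeneous scenes such as smoke near active fires, which is exactly the failure mode the abstract says the MAIAC filter avoids.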
Geometry-based populated chessboard recognition

NASA Astrophysics Data System (ADS)

Xie, Youye; Tang, Gongguo; Hoff, William

2018-04-01

Chessboards are commonly used to calibrate cameras, and many robust methods have been developed to recognize unoccupied boards. However, when the chessboard is populated with chess pieces, such as during an actual game, recognizing the board is much harder. Challenges include occlusion caused by the chess pieces, the presence of outlier lines, and low viewing angles of the chessboard. In this paper, we present a novel approach to address the above challenges and recognize the chessboard. The Canny edge detector and Hough transform are used to capture all possible lines in the scene. The k-means clustering and a k-nearest-neighbors-inspired algorithm are applied to cluster the lines and reject outlier lines based on their Euclidean distances to their nearest neighbors in a scaled Hough transform space. Finally, based on prior knowledge of the chessboard structure, a geometric constraint is used to find the correspondences between image lines and the lines on the chessboard through the homography transformation. The proposed algorithm works over a wide range of operating angles and achieves high accuracy in experiments.
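The outlier-line rejection step can be sketched as a k-nearest-neighbour test in a scaled (rho, theta) space: lines belonging to the regular grid have close neighbours, stray lines do not. The rho scale factor, k, and the distance cutoff are assumptions, the k-means grouping is omitted, and angle wrap-around at theta = 0/pi is not handled.

```python
import math


def reject_outlier_lines(lines, k=2, max_mean_dist=0.2, rho_scale=1.0 / 500):
    """Keep Hough lines (rho in pixels, theta in radians) whose mean Euclidean
    distance to their k nearest neighbours in scaled (rho, theta) space is small."""
    scaled = [(rho * rho_scale, theta) for rho, theta in lines]
    kept = []
    for i, p in enumerate(scaled):
        dists = sorted(math.dist(p, q) for j, q in enumerate(scaled) if j != i)
        if sum(dists[:k]) / k <= max_mean_dist:
            kept.append(lines[i])
    return kept


board = [(100, math.pi / 2), (150, math.pi / 2), (200, math.pi / 2), (250, math.pi / 2)]
stray = (420, 0.3)  # e.g. an edge of a chess piece
print(reject_outlier_lines(board + [stray]) == board)  # True
```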
Nonlinear optimization-based device-free localization with outlier link rejection.

PubMed

Xiao, Wendong; Song, Biao; Yu, Xiting; Chen, Peiyuan

2015-04-07

Device-free localization (DFL) is an emerging wireless technique for estimating the location of a target that does not carry any electronic device. It has found extensive use in Smart City applications such as healthcare at home and in hospitals, location-based services in smart spaces, city emergency response, and infrastructure security. In DFL, wireless devices are used as sensors that sense the target by collaboratively transmitting and receiving wireless signals. Many DFL systems are based on received signal strength (RSS) measurements, and the location of the target is estimated by detecting changes in the RSS measurements of the wireless links. Due to the uncertainty of the wireless channel, certain links may be seriously polluted and cause erroneous detection. In this paper, we propose a novel nonlinear optimization approach with outlier link rejection (NOOLR) for RSS-based DFL. It consists of three key strategies: (1) affected link identification by differential RSS detection; (2) outlier link rejection via the geometrical positional relationships among links; and (3) target location estimation by formulating and solving a nonlinear optimization problem. Experimental results demonstrate that NOOLR is robust to fluctuations of the wireless signals and achieves better localization accuracy than the existing Radio Tomographic Imaging (RTI) approach.
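The three NOOLR stages can be sketched on a toy layout. Only stage (1), thresholding the RSS change per link, follows the abstract directly; the cost (summed squared distance from the target to the affected link segments), the grid search standing in for a nonlinear solver, and the distance-based rejection rule standing in for the paper's geometric test are hypothetical choices.

```python
import math


def seg_dist(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    vx, vy = b[0] - a[0], b[1] - a[1]
    t = ((p[0] - a[0]) * vx + (p[1] - a[1]) * vy) / (vx * vx + vy * vy)
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * vx), p[1] - (a[1] + t * vy))


def grid_locate(links, size=10.0, step=0.1):
    """Grid point minimising the summed squared distance to the given links."""
    n = int(round(size / step)) + 1
    candidates = ((i * step, j * step) for i in range(n) for j in range(n))
    return min(candidates, key=lambda p: sum(seg_dist(p, a, b) ** 2 for a, b in links))


def localize(links, baseline, current, rss_threshold=3.0, reject_dist=2.0):
    # (1) affected links: RSS changed by more than the threshold (dB)
    affected = [l for l, b, c in zip(links, baseline, current) if abs(c - b) > rss_threshold]
    estimate = grid_locate(affected)
    # (2) hypothetical rule: reject affected links far from the initial estimate
    kept = [l for l in affected if seg_dist(estimate, *l) <= reject_dist]
    # (3) re-estimate from the remaining links
    return grid_locate(kept) if kept else estimate


links = [((0, 5), (10, 5)), ((5, 0), (5, 10)), ((0, 0), (10, 10)),
         ((0, 9), (3, 9)),    # polluted link: RSS changed, but far from the target
         ((0, 1), (10, 1))]   # unaffected link
baseline = [-50.0] * 5
current = [-58.0, -57.0, -60.0, -56.0, -50.5]
print(localize(links, baseline, current))  # close to (5.0, 5.0)
```

Without stage (2), the polluted link drags the estimate to roughly (4.5, 6.5); after rejecting it, the three consistent links intersect at the target.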
Efficient robust doubly adaptive regularized regression with applications.

PubMed

Karunamuni, Rohana J; Kong, Linglong; Tu, Wei

2018-01-01

We consider the problem of estimation and variable selection for general linear regression models. Regularized regression procedures have been widely used for variable selection, but most existing methods perform poorly in the presence of outliers. We construct a new penalized procedure that simultaneously attains full efficiency and maximum robustness. Furthermore, the proposed procedure satisfies the oracle properties. The new procedure is designed to achieve sparse and robust solutions by imposing adaptive weights on both the decision loss and the penalty function. The proposed method of estimation and variable selection attains full efficiency when the model is correct and, at the same time, achieves maximum robustness when outliers are present. We examine the robustness properties using the finite-sample breakdown point and an influence function. We show that the proposed estimator attains the maximum breakdown point. Furthermore, there is no loss in efficiency when there are no outliers or the error distribution is normal. For practical implementation of the proposed method, we present a computational algorithm. We examine the finite-sample and robustness properties using Monte Carlo studies. Two datasets are also analyzed.
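The idea of adaptive weights on both the loss and the penalty can be illustrated with a stand-in: Huber-type observation weights (via iteratively reweighted least squares) bound the influence of outliers, and adaptive-lasso penalty weights 1/|pilot coefficient| shrink weak predictors to exactly zero. This is not the authors' estimator; the Huber constant c, lam, and the unpenalised least-squares pilot fit are assumptions.

```python
def soft(z, t):
    """Soft-thresholding operator."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso_cd(X, y, obs_w, pen_w, lam, beta, sweeps=100):
    """Coordinate descent for 0.5 * sum_i w_i r_i^2 + lam * sum_j v_j |b_j|."""
    n, p = len(X), len(X[0])
    beta = list(beta)
    for _ in range(sweeps):
        for j in range(p):
            # Weighted correlation of feature j with the partial residual
            z = sum(obs_w[i] * X[i][j] * (y[i] - sum(X[i][m] * beta[m] for m in range(p) if m != j))
                    for i in range(n))
            denom = sum(obs_w[i] * X[i][j] ** 2 for i in range(n))
            beta[j] = soft(z, lam * pen_w[j]) / denom
    return beta


def robust_adaptive_lasso(X, y, lam=2.0, c=1.0, outer=50):
    n, p = len(X), len(X[0])
    pilot = weighted_lasso_cd(X, y, [1.0] * n, [0.0] * p, 0.0, [0.0] * p)  # plain LS
    pen_w = [1.0 / max(abs(b), 1e-8) for b in pilot]  # adaptive penalty weights
    beta = pilot
    for _ in range(outer):
        resid = [y[i] - sum(X[i][m] * beta[m] for m in range(p)) for i in range(n)]
        obs_w = [1.0 if abs(r) <= c else c / abs(r) for r in resid]  # Huber IRLS weights
        beta = weighted_lasso_cd(X, y, obs_w, pen_w, lam, beta)
    return beta


X = [[1, 1], [-1, 1], [2, -1], [-2, -1], [3, 2], [-3, 2], [4, -2], [-4, -2], [5, 0.5], [-5, -0.5]]
y = [2.0 * row[0] for row in X]  # true model: y = 2*x1 + 0*x2
y[0] += 20.0                     # one gross outlier
print(robust_adaptive_lasso(X, y))
```

The unpenalised pilot fit is pulled by the outlier, yet the reweighted, penalised fit recovers a slope near 2 and sets the irrelevant coefficient to zero.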
Detecting short spatial scale local adaptation and epistatic selection in climate-related candidate genes in European beech (Fagus sylvatica) populations.

PubMed

Csilléry, Katalin; Lalagüe, Hadrien; Vendramin, Giovanni G; González-Martínez, Santiago C; Fady, Bruno; Oddou-Muratorio, Sylvie

2014-10-01

Detecting signatures of selection in tree populations threatened by climate change is currently a major research priority. Here, we investigated the signature of local adaptation over a short spatial scale using 96 European beech (Fagus sylvatica L.) individuals originating from two pairs of populations on the northern and southern slopes of Mont Ventoux (south-eastern France). We performed both single-locus and multilocus analyses of selection based on 53 climate-related candidate genes containing 546 SNPs. FST outlier methods at the SNP level revealed a weak signal of selection, with three marginally significant outliers in the northern populations. At the gene level, considering haplotypes as alleles, two additional marginally significant outliers were detected, one on each slope. To account for the uncertainty of haplotype inference, we averaged the Bayes factors over many possible phase reconstructions. Epistatic selection offers a realistic multilocus model of selection in natural populations. Here, we used a test suggested by Ohta based on the decomposition of the variance of linkage disequilibrium. Over all populations, 0.23% of the SNP pairs (haplotypes) showed evidence of epistatic selection, with nearly 80% of them being within genes. One of the between-gene epistatic selection signals arose between an FST outlier and a nonsynonymous mutation in a drought response gene. Additionally, we identified haplotypes containing selectively advantageous allele combinations that were unique to high or low elevations and to northern or southern populations. Several haplotypes contained nonsynonymous mutations located in genes with known functional importance for adaptation to climatic factors. © 2014 John Wiley & Sons Ltd.
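For two populations, a per-SNP differentiation index can be computed as Nei's G_ST = (H_T - H_S) / H_T from allele frequencies, and loci in the upper tail become outlier candidates. This is a simplified sketch: the fixed top-fraction rule is an assumption, whereas dedicated FST outlier methods assess loci against a neutral null model rather than a fixed quantile.

```python
def fst_two_pops(p1, p2):
    """Nei's G_ST = (H_T - H_S) / H_T for one biallelic SNP, given the
    reference-allele frequencies in two equally weighted populations."""
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    pbar = (p1 + p2) / 2
    ht = 2 * pbar * (1 - pbar)                          # total heterozygosity
    return 0.0 if ht == 0 else (ht - hs) / ht


def empirical_outliers(freqs, top_fraction=0.05):
    """Indices of SNPs whose G_ST lies in the top `top_fraction` of all SNPs."""
    fst = [fst_two_pops(p1, p2) for p1, p2 in freqs]
    n_top = max(1, int(len(fst) * top_fraction))
    cutoff = sorted(fst, reverse=True)[n_top - 1]
    return [i for i, f in enumerate(fst) if f >= cutoff]


snps = [(0.5, 0.45)] * 19 + [(0.9, 0.1)]  # hypothetical north/south allele frequencies
print(empirical_outliers(snps))  # [19]
```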
An analytic approach to the relation between GPS attitude determination accuracy and antenna configuration geometry

NASA Astrophysics Data System (ADS)

Kozlov, Alexander; Nikulin, Alexei

2017-01-01

The reliability and accuracy of GPS attitude determination are still the main theoretical questions in this field of study. Reliability derives from the probabilistic nature of phase ambiguity resolution algorithms, outlier measurement detection, and the effectiveness of multipath reduction, whereas accuracy is additionally affected by the geometric properties of the GNSS antenna configuration. Although trivial for a two-antenna system, the relation between GPS attitude determination accuracy and antenna spatial layout becomes much less intuitive for multi-antenna configurations, and seems to have been examined analytically only in some specific cases. For example, most research papers in the field use Euler angles as the attitude representation, which are singular in some cases, and consider no more than four antennas. We present some further investigation in this area.
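For the two-antenna case called trivial above, heading and pitch follow directly from the baseline vector, and the heading error scales roughly as the perpendicular baseline error divided by the horizontal baseline length. A minimal sketch, assuming an already resolved baseline (ambiguities fixed) in local east-north-up metres; the function names are illustrative.

```python
import math


def heading_pitch(baseline_enu):
    """Heading (degrees clockwise from north) and pitch (degrees) of a
    two-antenna baseline given in local east-north-up coordinates."""
    e, n, u = baseline_enu
    heading = math.degrees(math.atan2(e, n)) % 360
    pitch = math.degrees(math.atan2(u, math.hypot(e, n)))
    return heading, pitch


def heading_sigma_deg(sigma_perp_m, horizontal_length_m):
    """Small-angle approximation: heading error ~ sigma / L radians."""
    return math.degrees(sigma_perp_m / horizontal_length_m)


print(heading_pitch((1.0, 0.0, 0.0)))  # (90.0, 0.0)
print(heading_sigma_deg(0.005, 1.0))   # ~0.29 degrees for 5 mm error on a 1 m baseline
```

With more than two antennas, the layout enters through the full geometry of all baselines, which is where the analysis becomes non-trivial.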
Shadow Areas Robust Matching Among Image Sequence in Planetary Landing

NASA Astrophysics Data System (ADS)

Ruoyan, Wei; Xiaogang, Ruan; Naigong, Yu; Xiaoqing, Zhu; Jia, Lin

2017-01-01

In this paper, an approach for robustly matching shadow areas in autonomous visual navigation and planetary landing is proposed. The approach begins by detecting shadow areas, which are extracted by Maximally Stable Extremal Regions (MSER). Then, an affine normalization algorithm is applied to normalize the areas. Third, a SIFT-derived descriptor called Multiple Angles-SIFT (MA-SIFT) is proposed, which can extract more features from an area. Finally, to eliminate the influence of outliers, an improved RANSAC method based on the Skinner Operation Condition is proposed to extract inliers. A series of experiments was conducted to test the performance of the proposed approach; the results show that it maintains high matching accuracy even when the differences among the images are obvious and no attitude measurements are supplied.
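The inlier-extraction step builds on RANSAC. Below is a plain RANSAC sketch for a pure 2-D translation between matched keypoints; the translation model, the per-axis pixel threshold, and the averaging refit are assumptions, and the Skinner Operation Condition improvement is not reproduced.

```python
import random


def ransac_translation(matches, threshold=1.0, iterations=100, seed=0):
    """Estimate a 2-D translation from putative matches ((x, y), (u, v))
    with plain RANSAC; returns ((dx, dy), inlier indices)."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        (x, y), (u, v) = rng.choice(matches)  # minimal sample: one match
        dx, dy = u - x, v - y
        inliers = [i for i, ((a, b), (c, d)) in enumerate(matches)
                   if abs(c - a - dx) <= threshold and abs(d - b - dy) <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit on the consensus set by averaging the inlier displacements
    n = len(best_inliers)
    dx = sum(matches[i][1][0] - matches[i][0][0] for i in best_inliers) / n
    dy = sum(matches[i][1][1] - matches[i][0][1] for i in best_inliers) / n
    return (dx, dy), best_inliers


matches = [((i, 2 * i), (i + 5, 2 * i - 3)) for i in range(8)]  # true shift (5, -3)
matches += [((0, 0), (40, 40)), ((1, 1), (-20, 7))]              # mismatches
print(ransac_translation(matches))  # ((5.0, -3.0), [0, 1, 2, 3, 4, 5, 6, 7])
```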
Outlier analyses to test for local adaptation to breeding grounds in a migratory arctic seabird.

PubMed

Tigano, Anna; Shultz, Allison J; Edwards, Scott V; Robertson, Gregory J; Friesen, Vicki L

2017-04-01

Investigating the extent (or the existence) of local adaptation is crucial to understanding how populations adapt. When experiments or fitness measurements are difficult or impossible to perform in natural populations, genomic techniques allow us to investigate local adaptation through the comparison of allele frequencies and outlier loci along environmental clines. The thick-billed murre (Uria lomvia) is a highly philopatric colonial arctic seabird that occupies a significant environmental gradient, shows marked phenotypic differences among colonies, and has large effective population sizes. To test whether thick-billed murres from five colonies along the eastern Canadian Arctic coast show genomic signatures of local adaptation to their breeding grounds, we analyzed geographic variation in genome-wide markers mapped to a newly assembled thick-billed murre reference genome. We used outlier analyses to detect loci putatively under selection, and clustering analyses to investigate patterns of differentiation based on 2220 genome-wide single nucleotide polymorphisms (SNPs) and 137 outlier SNPs. We found no evidence of population structure among colonies using all loci but found population structure based on outliers only, where birds from the two northernmost colonies (Minarets and Prince Leopold) grouped with birds from the southernmost colony (Gannet), and birds from Coats and Akpatok were distinct from all other colonies. Although results from our analyses did not support local adaptation along the latitudinal cline of breeding colonies, outlier loci grouped birds from different colonies according to their non-breeding distributions, suggesting that outliers may be informative about adaptation and/or demographic connectivity associated with their migration patterns or non-breeding grounds.
Clustering for Binary Data Sets by Using Genetic Algorithm-Incremental K-means

NASA Astrophysics Data System (ADS)

Saharan, S.; Baragona, R.; Nor, M. E.; Salleh, R. M.; Asrah, N. M.

2018-04-01

This research was initially driven by the lack of clustering algorithms that specifically focus on binary data. To fill this gap, the main subject of this research became a promising technique for analysing this type of data, namely Genetic Algorithms (GA). GA was combined with the Incremental K-means (IKM) algorithm to cluster binary data streams; the combined method is called GAIKM. In GAIKM, the objective function is based on a few sufficient statistics that can be easily and quickly calculated on binary numbers. The use of IKM gives an advantage in terms of fast convergence. The results show that GAIKM is an efficient and effective new clustering algorithm compared with other clustering algorithms and with IKM itself. In conclusion, GAIKM outperformed other clustering algorithms such as GCUK, IKM, Scalable K-means (SKM), and K-means clustering, and it paves the way for future research involving missing data and outliers.
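The sufficient statistics of a cluster of binary vectors are its size and its per-feature counts of ones, from which the centroid follows. The sketch below is a one-pass incremental k-means that keeps only these statistics; the genetic-algorithm layer of GAIKM is omitted, and squared Euclidean distance and first-k seeding are assumptions.

```python
def sq_dist_to_centroid(x, cluster):
    """Squared Euclidean distance from x to a centroid stored as (n, ones)."""
    n, ones = cluster
    return sum((xj - sj / n) ** 2 for xj, sj in zip(x, ones))


def incremental_kmeans_binary(stream, k):
    """One pass over a stream of 0/1 vectors; each cluster keeps only its
    size n and per-feature counts of ones."""
    clusters = []  # each cluster: [n, ones]
    labels = []
    for x in stream:
        if len(clusters) < k:  # seed clusters with the first k items
            clusters.append([1, list(x)])
            labels.append(len(clusters) - 1)
            continue
        best = min(range(k), key=lambda i: sq_dist_to_centroid(x, clusters[i]))
        clusters[best][0] += 1  # update the sufficient statistics only
        clusters[best][1] = [sj + xj for sj, xj in zip(clusters[best][1], x)]
        labels.append(best)
    centroids = [[sj / n for sj in ones] for n, ones in clusters]
    return labels, centroids


stream = [(1, 1, 0, 0), (0, 0, 1, 1), (1, 1, 0, 1), (0, 0, 1, 0), (1, 0, 0, 0), (0, 1, 1, 1)]
labels, centroids = incremental_kmeans_binary(stream, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```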
HacDivSel: Two new methods (haplotype-based and outlier-based) for the detection of divergent selection in pairs of populations

PubMed Central

2017-01-01

The detection of genomic regions involved in local adaptation is an important topic in current population genetics. There are several detection strategies available depending on the kind of genetic and demographic information at hand. A common drawback is the high risk of false positives. In this study we introduce two complementary methods for the detection of divergent selection from populations connected by migration. Both methods have been developed with the aim of being robust to false positives. The first method combines haplotype information with inter-population differentiation (FST). Evidence of divergent selection is concluded only when both the haplotype pattern and the FST value support it. The second method is developed for independently segregating markers, i.e., when there is no haplotype information. In this case, the power to detect selection is attained by developing a new outlier test based on detecting a bimodal distribution. The test computes the FST outliers and then assumes that those of interest would have a different mode. We demonstrate the utility of the two methods through simulations and the analysis of real data. The simulation results showed power ranging from 60% to 95% in several of the scenarios, whilst the false positive rate was controlled below the nominal level. The analysis of real samples consisted of phased data from the HapMap project and unphased data from intertidal marine snail ecotypes. The results illustrate that the proposed methods could be useful for detecting locally adapted polymorphisms. The software HacDivSel implements the methods explained in this manuscript. PMID:28423003
An approach to the analysis of SDSS spectroscopic outliers based on self-organizing maps. Designing the outlier analysis software package for the next Gaia survey

NASA Astrophysics Data System (ADS)

Fustes, D.; Manteiga, M.; Dafonte, C.; Arcay, B.; Ulla, A.; Smith, K.; Borrachero, R.; Sordo, R.

2013-11-01

Aims: A new method for the segmentation and further analysis of the outliers that result from classifying astronomical objects in large databases is discussed. The method is being used in the framework of the Gaia satellite Data Processing and Analysis Consortium (DPAC) activities to prepare automated software tools that will derive basic astrophysical information to be included in the final Gaia archive. Methods: Our algorithm has been tested by means of simulated Gaia spectrophotometry, which is based on SDSS observations and theoretical spectral libraries covering a wide sample of astronomical objects. Self-organizing map networks are used to organize the information into clusters of objects that are as homogeneous as possible according to their spectral energy distributions, and to project them onto a 2D grid where the data structure can be visualized. Results: We demonstrate the usefulness of the method by analyzing the spectra that were rejected by the SDSS spectroscopic classification pipeline and thus classified as "UNKNOWN". First, our method can help distinguish between astrophysical objects and instrumental artifacts. Second, applying our algorithm to SDSS objects of unknown nature has allowed us to identify classes of objects with similar astrophysical natures. In addition, the method allows for the potential discovery of hundreds of new objects, such as white dwarfs and quasars. Therefore, the proposed method is shown to be very promising for data exploration and knowledge discovery in very large astronomical databases, such as the archive from the upcoming Gaia mission.
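A compact sketch of self-organizing-map training, assuming Gaussian neighbourhoods, a linearly decaying learning rate and radius, and linear initialisation. For brevity it uses a one-dimensional chain of nodes rather than the 2D grid described above; all names and parameter values are hypothetical and are not taken from the Gaia/DPAC software.

```python
import math
from typing import List, Sequence


def best_matching_unit(weights: Sequence[Sequence[float]], x: Sequence[float]) -> int:
    """Index of the node whose weight vector is closest (squared Euclidean) to x."""
    return min(range(len(weights)),
               key=lambda i: sum((wi - xi) ** 2 for wi, xi in zip(weights[i], x)))


def train_som_1d(data: Sequence[Sequence[float]], n_nodes: int,
                 epochs: int = 30, lr0: float = 0.5) -> List[List[float]]:
    dim = len(data[0])
    lo = [min(x[j] for x in data) for j in range(dim)]
    hi = [max(x[j] for x in data) for j in range(dim)]
    # Linear initialisation: nodes evenly spaced between per-feature minima and maxima.
    weights = [[lo[j] + (hi[j] - lo[j]) * i / (n_nodes - 1) for j in range(dim)]
               for i in range(n_nodes)]
    sigma0 = n_nodes / 2
    total, t = epochs * len(data), 0
    for _ in range(epochs):
        for x in data:
            frac = t / total
            lr = lr0 * (1 - frac)               # learning rate decays towards 0
            sigma = sigma0 * (1 - frac) + 0.01  # neighbourhood radius shrinks
            bmu = best_matching_unit(weights, x)
            for i, w in enumerate(weights):
                # Gaussian neighbourhood measured along the chain of nodes
                h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
                for j in range(dim):
                    w[j] += lr * h * (x[j] - w[j])
            t += 1
    return weights
```

After training, each spectrum can be mapped to its best-matching node with `best_matching_unit`, so that objects with similar feature vectors share nodes.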
Comparison of tests for spatial heterogeneity on data with global clustering patterns and outliers

PubMed Central

Jackson, Monica C; Huang, Lan; Luo, Jun; Hachey, Mark; Feuer, Eric

2009-01-01

Background The ability to evaluate geographic heterogeneity of cancer incidence and mortality is important in cancer surveillance. Many statistical methods for evaluating global clustering and local cluster patterns have been developed and examined in simulation studies. However, the performance of these methods in two extreme cases (global clustering evaluation and local anomaly (outlier) detection) has not been thoroughly investigated. Methods We compare methods for global clustering evaluation, including Tango's Index, Moran's I, and Oden's I*pop, and cluster detection methods such as local Moran's I and the SaTScan elliptic version, on simulated count data that mimic global clustering patterns and outliers for cancer cases in the continental United States. We examine the power and precision of the selected methods in a purely spatial analysis. We illustrate Tango's MEET and the SaTScan elliptic version on 1987-2004 HIV and 1950-1969 lung cancer mortality data in the United States. Results For simulated data with outlier patterns, Tango's MEET, Moran's I and I*pop had powers less than 0.2, and SaTScan had powers around 0.97. For simulated data with global clustering patterns, Tango's MEET and I*pop (with 50% of the total population as the maximum search window) had powers close to 1, SaTScan had powers around 0.7-0.8, and Moran's I had powers around 0.2-0.3. In the real data example, Tango's MEET indicated the existence of global clustering patterns in both the HIV and lung cancer mortality data. SaTScan found a large cluster for HIV mortality rates, which is consistent with the finding from Tango's MEET. SaTScan also found clusters and outliers in the lung cancer mortality data. Conclusion The SaTScan elliptic version is more efficient for outlier detection than the other methods evaluated in this article. Among the selected methods, Tango's MEET and Oden's I*pop perform best in global clustering scenarios.
SaTScan should be used with caution for data with global clustering patterns, since it may reveal an incorrect spatial pattern even though it has enough power to reject the null hypothesis of homogeneous relative risk. Tango's method should be used for global clustering evaluation instead of SaTScan. PMID:19822013
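For reference, global Moran's I, one of the global clustering statistics compared above, is I = (n / W) · Σ_i Σ_j w_ij (x_i − x̄)(x_j − x̄) / Σ_i (x_i − x̄)², where W is the sum of all weights. A minimal Python sketch, assuming a binary contiguity weight matrix with a zero diagonal (the weighting scheme used in the study is not stated here):

```python
from typing import Sequence


def morans_i(values: Sequence[float], w: Sequence[Sequence[float]]) -> float:
    """Global Moran's I for n regions with spatial weight matrix w (w[i][i] = 0)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]                 # deviations from the mean
    w_sum = sum(sum(row) for row in w)             # W: total weight
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / w_sum) * num / den
```

Positive values indicate that neighbouring regions have similar rates; negative values indicate a checkerboard-like pattern. Significance is usually assessed against a permutation or normal reference distribution, which is not shown.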

Comparison of tests for spatial heterogeneity on data with global clustering patterns and outliers.

PubMed

Jackson, Monica C; Huang, Lan; Luo, Jun; Hachey, Mark; Feuer, Eric

2009-10-12

The ability to evaluate geographic heterogeneity of cancer incidence and mortality is important in cancer surveillance. Many statistical methods for evaluating global clustering and local cluster patterns have been developed and examined in simulation studies. However, the performance of these methods in two extreme cases (global clustering evaluation and local anomaly (outlier) detection) has not been thoroughly investigated. We compare methods for global clustering evaluation, including Tango's Index, Moran's I, and Oden's I*(pop), and cluster detection methods such as local Moran's I and the SaTScan elliptic version, on simulated count data that mimic global clustering patterns and outliers for cancer cases in the continental United States. We examine the power and precision of the selected methods in a purely spatial analysis. We illustrate Tango's MEET and the SaTScan elliptic version on 1987-2004 HIV and 1950-1969 lung cancer mortality data in the United States. For simulated data with outlier patterns, Tango's MEET, Moran's I and I*(pop) had powers less than 0.2, and SaTScan had powers around 0.97. For simulated data with global clustering patterns, Tango's MEET and I*(pop) (with 50% of the total population as the maximum search window) had powers close to 1, SaTScan had powers around 0.7-0.8, and Moran's I had powers around 0.2-0.3. In the real data example, Tango's MEET indicated the existence of global clustering patterns in both the HIV and lung cancer mortality data. SaTScan found a large cluster for HIV mortality rates, which is consistent with the finding from Tango's MEET. SaTScan also found clusters and outliers in the lung cancer mortality data. The SaTScan elliptic version is more efficient for outlier detection than the other methods evaluated in this article. Among the selected methods, Tango's MEET and Oden's I*(pop) perform best in global clustering scenarios.
SaTScan should be used with caution for data with global clustering patterns, since it may reveal an incorrect spatial pattern even though it has enough power to reject the null hypothesis of homogeneous relative risk. Tango's method should be used for global clustering evaluation instead of SaTScan.
Robust w-Estimators for Cryo-EM Class Means

PubMed Central

Huang, Chenxi; Tagare, Hemant D.

2016-01-01

A critical step in cryogenic electron microscopy (cryo-EM) image analysis is to calculate the average of all images aligned to a projection direction. This average, called the “class mean”, improves the signal-to-noise ratio in single particle reconstruction (SPR). The averaging step is often compromised by outlier images of ice, contaminants, and particle fragments. In the majority of current cryo-EM methods, outlier detection and rejection are done using cross-correlation with a manually determined threshold. Empirical assessment shows that the performance of these methods is very sensitive to the threshold. This paper proposes an alternative: a “w-estimator” of the average image, which is robust to outliers and does not use a threshold. Various properties of the estimator, such as consistency and the influence function, are investigated. An extension of the estimator to images with different contrast transfer functions (CTFs) is also provided. Experiments with simulated and real cryo-EM images show that the proposed estimator performs quite well in the presence of outliers. PMID:26841397
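The exact form of the w-estimator is not given in the abstract. As a generic stand-in, the sketch below computes an iteratively reweighted mean in which each image (flattened to a list of pixel values) receives the smooth weight 1 / (1 + (d/c)²), where d is its Euclidean distance to the current mean. This Cauchy-type weight and the scale parameter `c` are assumptions, and unlike the paper's estimator the sketch ignores CTF differences.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def robust_class_mean(images: Sequence[Sequence[float]], c: float = 1.0,
                      iters: int = 20) -> Tuple[List[float], List[float]]:
    """Iteratively reweighted mean: images far from the current mean get weights near 0."""
    dim = len(images[0])
    # Start from the pixelwise median, which is already insensitive to a few outliers.
    mean = [statistics.median(img[j] for img in images) for j in range(dim)]
    weights = [1.0] * len(images)
    for _ in range(iters):
        dists = [math.dist(img, mean) for img in images]
        weights = [1.0 / (1.0 + (d / c) ** 2) for d in dists]  # smooth Cauchy-type weight, no hard cut-off
        total = sum(weights)
        mean = [sum(w * img[j] for w, img in zip(weights, images)) / total for j in range(dim)]
    return mean, weights
```

No image is ever hard-rejected: gross outliers simply receive weights close to zero, although `c` still sets the distance scale at which down-weighting begins.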
Robust w-Estimators for Cryo-EM Class Means.

PubMed

Huang, Chenxi; Tagare, Hemant D

2016-02-01

A critical step in cryogenic electron microscopy (cryo-EM) image analysis is to calculate the average of all images aligned to a projection direction. This average, called the class mean, improves the signal-to-noise ratio in single-particle reconstruction. The averaging step is often compromised by outlier images of ice, contaminants, and particle fragments. In the majority of current cryo-EM methods, outlier detection and rejection are done using cross-correlation with a manually determined threshold. Empirical assessment shows that the performance of these methods is very sensitive to the threshold. This paper proposes an alternative: a w-estimator of the average image, which is robust to outliers and does not use a threshold. Various properties of the estimator, such as consistency and the influence function, are investigated. An extension of the estimator to images with different contrast transfer functions is also provided. Experiments with simulated and real cryo-EM images show that the proposed estimator performs quite well in the presence of outliers.
Robust Surface Reconstruction via Laplace-Beltrami Eigen-Projection and Boundary Deformation

PubMed Central

Shi, Yonggang; Lai, Rongjie; Morra, Jonathan H.; Dinov, Ivo; Thompson, Paul M.; Toga, Arthur W.

2010-01-01

In medical shape analysis, a critical problem is reconstructing a smooth surface of correct topology from a binary mask that typically has spurious features due to segmentation artifacts. The challenge is to remove these outliers robustly without affecting the accuracy of other parts of the boundary. In this paper, we propose a novel approach to this problem based on the Laplace-Beltrami (LB) eigen-projection and properly designed boundary deformations. Using the metric distortion during the LB eigen-projection, our method automatically detects the location of outliers and feeds this information to a well-composed, topology-preserving deformation. By iterating between these two steps of outlier detection and boundary deformation, we can robustly filter out the outliers without moving the smooth part of the boundary. The final surface is the eigen-projection of the filtered mask boundary, and it has the correct topology, the desired accuracy and smoothness. In our experiments, we illustrate the robustness of our method on different input masks of the same structure and compare it with the popular SPHARM tool and the topology-preserving level set method, showing that our method can reconstruct accurate surface representations without introducing artificial oscillations. We also successfully validate our method on a large data set of more than 900 hippocampal masks and demonstrate that the reconstructed surfaces retain volume information accurately. PMID:20624704
How immunogenetically different are domestic pigs from wild boars: a perspective from single-nucleotide polymorphisms of 19 immunity-related candidate genes.

PubMed

Chen, Shanyuan; Gomes, Rui; Costa, Vânia; Santos, Pedro; Charneca, Rui; Zhang, Ya-ping; Liu, Xue-hong; Wang, Shao-qing; Bento, Pedro; Nunes, Jose-Luis; Buzgó, József; Varga, Gyula; Anton, István; Zsolnai, Attila; Beja-Pereira, Albano

2013-10-01

The coexistence of wild boars and domestic pigs across Eurasia makes it feasible to conduct comparative genetic or genomic analyses addressing how genetically different a domestic species is from its wild ancestor. To test whether there are differences in patterns of genetic variability between wild and domestic pigs at immunity-related genes, and to detect outlier loci putatively under selection that may underlie differences in immune responses, we analyzed 54 single-nucleotide polymorphisms (SNPs) of 19 immunity-related candidate genes on 11 autosomes in three pairs of wild boar and domestic pig populations from China, the Iberian Peninsula, and Hungary. Our results showed no statistically significant differences in allele frequency and heterozygosity across SNPs between the three pairs of wild and domestic populations. This observation was most likely due to widespread and long-lasting gene flow between wild boars and domestic pigs across Eurasia. In addition, we detected eight coding SNPs from six genes that were consistently identified as outliers under selection by three outlier tests (BayeScan2.1, FDIST2, and Arlequin3.5). Among four non-synonymous outlier SNPs, one from the TLR4 gene was identified as being subject to positive (diversifying) selection, and the other three, one each from the CD36, IFNW1, and IL1B genes, were suggested to be under balancing selection. All four non-synonymous variants were predicted to be benign by PolyPhen-2. Our results were supported by other independent lines of evidence for positive or balancing selection acting on these four immune genes (CD36, IFNW1, IL1B, and TLR4). Our study provides an example of applying a candidate gene approach to identify functionally important mutations (i.e., outlier loci) in wild and domestic pigs for subsequent functional experiments.
Aberrant Gene Expression in Humans

PubMed Central

Yang, Ence; Ji, Guoli; Brinkmeyer-Langford, Candice L.; Cai, James J.

2015-01-01

Gene expression as an intermediate molecular phenotype has been a focus of research interest. In particular, studies of expression quantitative trait loci (eQTL) have offered promise for understanding gene regulation through the discovery of genetic variants that explain variation in gene expression levels. Existing eQTL methods are designed for assessing the effects of common variants, but not rare variants. Here, we address the problem by establishing a novel analytical framework for evaluating the effects of rare or private variants on gene expression. Our method starts from the identification of outlier individuals that show markedly different gene expression from the majority of a population, and then reveals the contributions of private SNPs to the aberrant gene expression in these outliers. Using population-scale mRNA sequencing data, we identify outlier individuals using a multivariate approach. We find that outlier individuals are more readily detected with respect to gene sets that include genes involved in cellular regulation and signal transduction, and less likely to be detected with respect to the gene sets with genes involved in metabolic pathways and other fundamental molecular functions. Analysis of polymorphic data suggests that private SNPs of outlier individuals are enriched in the enhancer and promoter regions of corresponding aberrantly-expressed genes, suggesting a specific regulatory role of private SNPs, while the commonly-occurring regulatory genetic variants (i.e., eQTL SNPs) show little evidence of involvement. Additional data suggest that non-genetic factors may also underlie aberrant gene expression. Taken together, our findings advance a novel viewpoint relevant to situations wherein common eQTLs fail to predict gene expression when heritable, rare inter-individual variation exists. 
The analytical framework we describe, taking into consideration the reality of differential phenotypic robustness, may be valuable for investigating complex traits and conditions. PMID:25617623
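The abstract says outlier individuals are identified with a multivariate approach but does not specify it. A common choice is the squared Mahalanobis distance from the population mean; the sketch below assumes that choice and, for brevity, handles only two genes using the closed-form inverse of a 2×2 covariance matrix. Real gene sets would need a general (and ideally robust) covariance estimate.

```python
from typing import List, Sequence, Tuple


def mahalanobis_sq_2d(points: Sequence[Tuple[float, float]]) -> List[float]:
    """Squared Mahalanobis distance of each 2-D point from the sample mean."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    out = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # (dx, dy) S^-1 (dx, dy)^T, with S^-1 = [[syy, -sxy], [-sxy, sxx]] / det
        out.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return out
```

An individual whose two expression values are each unremarkable can still stand out when their combination breaks the correlation seen in the rest of the population, which a per-gene z-score would miss.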
A fast automatic target detection method for detecting ships in infrared scenes

NASA Astrophysics Data System (ADS)

Özertem, Kemal Arda

2016-05-01

Automatic target detection in infrared scenes is a vital task in many application areas, such as defense, security and border surveillance. For anti-ship missiles, a fast and robust ship detection algorithm is crucial for overall system performance. In this paper, a straightforward yet effective ship detection method for infrared scenes is introduced. First, morphological grayscale reconstruction is applied to the input image, followed by automatic thresholding of the suppressed image. For the segmentation step, connected component analysis is employed to obtain target candidate regions. At this stage, the detection is vulnerable to outliers such as small objects with relatively high intensity values, or clouds. To deal with this drawback, a post-processing stage with two methods is introduced. First, noisy detection results are rejected according to target size. Second, the waterline is detected using the Hough transform, and detection results located more than a small margin above the waterline are rejected. After post-processing, undesired holes still remain, which cause one object to be detected as multiple objects or not to be detected as a whole. To improve detection performance, a second automatic thresholding is applied only to the target candidate regions. Finally, the two detection results are fused and the post-processing stage is repeated to obtain the final detection result. The performance of the overall methodology is tested with real-world infrared test data.
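The candidate-extraction and post-processing steps can be sketched in Python as follows. This assumes a grayscale image given as a list of rows, a fixed threshold in place of the paper's automatic thresholding and morphological reconstruction, 4-connectivity, and a waterline row that has already been estimated (by a Hough transform, not shown). Row indices grow downward, so "above the waterline" means a smaller row index. Function names are hypothetical.

```python
from collections import deque
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int]


def connected_components(mask: Sequence[Sequence[bool]]) -> List[List[Pixel]]:
    """4-connected components of a boolean mask, found by breadth-first search."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    comps = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                queue, pixels = deque([(r, c)]), []
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                comps.append(pixels)
    return comps


def detect_targets(image, threshold: float, min_size: int,
                   waterline_row: int, margin: int = 1) -> List[List[Pixel]]:
    mask = [[v > threshold for v in row] for row in image]
    targets = []
    for pixels in connected_components(mask):
        if len(pixels) < min_size:
            continue  # size test: reject small bright blobs
        bottom = max(y for y, _ in pixels)
        if bottom < waterline_row - margin:
            continue  # lies entirely above the waterline (e.g. a cloud)
        targets.append(pixels)
    return targets
```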
An Integrated Ransac and Graph Based Mismatch Elimination Approach for Wide-Baseline Image Matching

NASA Astrophysics Data System (ADS)

Hasheminasab, M.; Ebadi, H.; Sedaghat, A.

2015-12-01

In this paper we propose an integrated approach to increase the precision of feature point matching. Many algorithms have been developed for short-baseline image matching, whereas wide-baseline image matching remains difficult because of illumination differences and viewpoint changes. Fortunately, recent developments in the automatic extraction of local invariant features make wide-baseline image matching possible. Matching algorithms based on the local feature similarity principle use feature descriptors to establish correspondence between feature point sets. To date, the most remarkable descriptor is the scale-invariant feature transform (SIFT) descriptor, which is invariant to image rotation and scale and remains robust across a substantial range of affine distortion, noise, and changes in illumination. The epipolar constraint based on the RANSAC (random sample consensus) method is a conventional model for mismatch elimination, particularly in computer vision. Because only the distance from the epipolar line is considered, some false matches remain among the matches selected using epipolar geometry and RANSAC. Aguilar et al. proposed the Graph Transformation Matching (GTM) algorithm to remove outliers, but it has difficulty when mismatched points are surrounded by the same local neighborhood structure. To overcome these limitations, this study presents a new three-step matching scheme. First, the SIFT algorithm is used to obtain initial corresponding point sets. Second, the RANSAC algorithm is applied to reduce the outliers. Finally, GTM based on the adjacent K-NN graph is applied to remove the remaining mismatches. Four close-range image datasets with changes in viewpoint are used to evaluate the performance of the proposed method, and the experimental results indicate its robustness and capability.
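RANSAC itself is simple to sketch. The Python example below fits a 2D line y = m·x + b rather than the epipolar geometry used in the paper, and it scores hypotheses by vertical residuals; both simplifications, the tolerance, and the iteration count are assumptions chosen for illustration.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def fit_line(p: Point, q: Point) -> Optional[Tuple[float, float]]:
    """Line through two points as (slope, intercept); None for a vertical pair."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        return None
    m = (y2 - y1) / (x2 - x1)
    return m, y1 - m * x1


def ransac_line(points: Sequence[Point], iters: int = 200, tol: float = 0.1,
                seed: int = 0) -> Tuple[Optional[Tuple[float, float]], List[Point]]:
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iters):
        model = fit_line(*rng.sample(list(points), 2))  # minimal sample: two points
        if model is None:
            continue
        m, b = model
        inliers = [(x, y) for x, y in points if abs(y - (m * x + b)) <= tol]
        if len(inliers) > len(best_inliers):  # keep the largest consensus set
            best_model, best_inliers = model, inliers
    return best_model, best_inliers
```

A hypothesis built from an outlier can agree with only a few points, so the largest consensus set comes from a pair of inliers; the remaining false matches are those that happen to lie within the tolerance, which is the weakness the GTM step is meant to address.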
Real-time Raman spectroscopy for in vivo, online gastric cancer diagnosis during clinical endoscopic examination.

PubMed

Duraipandian, Shiyamala; Sylvest Bergholt, Mads; Zheng, Wei; Yu Ho, Khek; Teh, Ming; Guan Yeoh, Khay; Bok Yan So, Jimmy; Shabbir, Asim; Huang, Zhiwei

2012-08-01

Optical spectroscopic techniques, including reflectance, fluorescence and Raman spectroscopy, have shown promising potential for in vivo precancer and cancer diagnostics in a variety of organs. However, data analysis has mostly been limited to post-processing and off-line algorithm development. In this work, we develop a fully automated on-line Raman spectral diagnostics framework integrated with a multimodal image-guided Raman technique for real-time in vivo cancer detection at endoscopy. A total of 2748 in vivo gastric tissue spectra (2465 normal and 283 cancer) were acquired from 305 patients to construct a spectral database for diagnostic algorithm development. The diagnostic scheme implements on-line preprocessing, outlier detection based on principal component analysis statistics (i.e., Hotelling's T2 and Q-residuals) for tissue Raman spectra verification, as well as organ-specific probabilistic diagnostics using different diagnostic algorithms. Free-running optical diagnosis with a processing time of < 0.5 s can be achieved, which is critical to realizing real-time in vivo tissue diagnostics during clinical endoscopic examination. The optimized partial least squares-discriminant analysis (PLS-DA) models based on the randomly resampled training database (80% for learning and 20% for testing) provide a diagnostic accuracy of 85.6% [95% confidence interval (CI): 82.9% to 88.2%] [sensitivity of 80.5% (95% CI: 71.4% to 89.6%) and specificity of 86.2% (95% CI: 83.6% to 88.7%)] for the detection of gastric cancer. The PLS-DA algorithms were further applied prospectively to 10 gastric patients at gastroscopy, achieving a predictive accuracy of 80.0% (60/75) [sensitivity of 90.0% (27/30) and specificity of 73.3% (33/45)] for in vivo diagnosis of gastric cancer. The receiver operating characteristic curves further confirmed the efficacy of Raman endoscopy together with PLS-DA algorithms for in vivo prospective diagnosis of gastric cancer.
This work successfully moves biomedical Raman spectroscopic technique into real-time, on-line clinical cancer diagnosis, especially in routine endoscopic diagnostic applications.
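Hotelling's T² and the Q-residual (squared prediction error) both come from a PCA model of normal spectra. A minimal pure-Python sketch, assuming a single principal component found by power iteration; the number of components and the control limits used in the study are not given, so no limits are computed here, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def pca_1d(data: Sequence[Sequence[float]], iters: int = 100) -> Tuple[List[float], List[float], float]:
    """Mean, leading unit eigenvector and its eigenvalue of the sample covariance."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):  # power iteration
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return mean, v, lam


def t2_and_q(x: Sequence[float], mean: Sequence[float], p: Sequence[float],
             lam: float) -> Tuple[float, float]:
    xc = [a - m for a, m in zip(x, mean)]
    t = sum(a * b for a, b in zip(xc, p))           # score on the principal component
    resid = [a - t * b for a, b in zip(xc, p)]      # part the model does not explain
    return t * t / lam, sum(r * r for r in resid)   # (Hotelling's T^2, Q)
```

T² measures how far a spectrum lies along the modelled directions, while Q measures how much of it the model cannot explain; a spectrum with a large Q is unlike anything in the training set.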
Real-time Raman spectroscopy for in vivo, online gastric cancer diagnosis during clinical endoscopic examination

NASA Astrophysics Data System (ADS)

Duraipandian, Shiyamala; Sylvest Bergholt, Mads; Zheng, Wei; Yu Ho, Khek; Teh, Ming; Guan Yeoh, Khay; Bok Yan So, Jimmy; Shabbir, Asim; Huang, Zhiwei

2012-08-01

Optical spectroscopic techniques, including reflectance, fluorescence and Raman spectroscopy, have shown promising potential for in vivo precancer and cancer diagnostics in a variety of organs. However, data analysis has mostly been limited to post-processing and off-line algorithm development. In this work, we develop a fully automated on-line Raman spectral diagnostics framework integrated with a multimodal image-guided Raman technique for real-time in vivo cancer detection at endoscopy. A total of 2748 in vivo gastric tissue spectra (2465 normal and 283 cancer) were acquired from 305 patients to construct a spectral database for diagnostic algorithm development. The diagnostic scheme implements on-line preprocessing, outlier detection based on principal component analysis statistics (i.e., Hotelling's T2 and Q-residuals) for tissue Raman spectra verification, as well as organ-specific probabilistic diagnostics using different diagnostic algorithms. Free-running optical diagnosis with a processing time of < 0.5 s can be achieved, which is critical to realizing real-time in vivo tissue diagnostics during clinical endoscopic examination. The optimized partial least squares-discriminant analysis (PLS-DA) models based on the randomly resampled training database (80% for learning and 20% for testing) provide a diagnostic accuracy of 85.6% [95% confidence interval (CI): 82.9% to 88.2%] [sensitivity of 80.5% (95% CI: 71.4% to 89.6%) and specificity of 86.2% (95% CI: 83.6% to 88.7%)] for the detection of gastric cancer. The PLS-DA algorithms were further applied prospectively to 10 gastric patients at gastroscopy, achieving a predictive accuracy of 80.0% (60/75) [sensitivity of 90.0% (27/30) and specificity of 73.3% (33/45)] for in vivo diagnosis of gastric cancer. The receiver operating characteristic curves further confirmed the efficacy of Raman endoscopy together with PLS-DA algorithms for in vivo prospective diagnosis of gastric cancer.
This work successfully moves biomedical Raman spectroscopic technique into real-time, on-line clinical cancer diagnosis, especially in routine endoscopic diagnostic applications.
A New Adaptive H-Infinity Filtering Algorithm for the GPS/INS Integrated Navigation

PubMed Central

Jiang, Chen; Zhang, Shu-Bi; Zhang, Qiu-Zhao

2016-01-01

The Kalman filter is an optimal estimator with numerous applications in technology, especially in systems with Gaussian distributed noise. Moreover, adaptive Kalman filtering algorithms, which are based on the Kalman filter, can control the influence of dynamic model errors. In contrast to adaptive Kalman filtering algorithms, the H-infinity filter can address errors in the stochastic model by minimizing the worst-case estimation error. In this paper, a novel adaptive H-infinity filtering algorithm that integrates the adaptive Kalman filter and the H-infinity filter into a comprehensive filtering algorithm is presented. In the proposed algorithm, a robust estimation method is employed to control the influence of outliers. To verify the proposed algorithm, experiments were conducted with real data from Global Positioning System (GPS) and Inertial Navigation System (INS) integrated navigation. The experimental results show that the proposed algorithm has multiple advantages over other filtering algorithms. PMID:27999361
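The abstract does not detail the robust estimation method. One common pattern is to down-weight a measurement whose normalized innovation is large, which is equivalent to inflating its noise variance. The scalar random-walk Kalman filter below assumes a Huber-type rule with threshold `k`; it is neither an H-infinity filter nor the authors' algorithm, only an illustration of outlier control inside a recursive filter. All parameter values are hypothetical.

```python
from typing import List, Sequence


def robust_kalman_1d(zs: Sequence[float], q: float = 0.01, r: float = 1.0,
                     x0: float = 0.0, p0: float = 1.0, k: float = 2.0) -> List[float]:
    """Scalar random-walk Kalman filter with Huber-type down-weighting of outlying measurements."""
    x, p = x0, p0
    estimates = []
    for z in zs:
        p += q                          # predict: state is a random walk
        v = z - x                       # innovation
        norm = abs(v) / (p + r) ** 0.5  # innovation normalized by its predicted std
        # Huber weight k/norm beyond the threshold, applied as an inflated measurement variance
        r_eff = r if norm <= k else r * norm / k
        gain = p / (p + r_eff)
        x += gain * v                   # update
        p *= 1 - gain
        estimates.append(x)
    return estimates
```

With a plain Kalman filter (no inflation), a single gross measurement error pulls the estimate by roughly the steady-state gain times the error; with the inflation, its influence grows only like the threshold `k` times the predicted innovation standard deviation.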
A New Adaptive H-Infinity Filtering Algorithm for the GPS/INS Integrated Navigation.

PubMed

Jiang, Chen; Zhang, Shu-Bi; Zhang, Qiu-Zhao

2016-12-19

The Kalman filter is an optimal estimator with numerous applications in technology, especially in systems with Gaussian distributed noise. Moreover, adaptive Kalman filtering algorithms, which are based on the Kalman filter, can control the influence of dynamic model errors. In contrast to adaptive Kalman filtering algorithms, the H-infinity filter can address errors in the stochastic model by minimizing the worst-case estimation error. In this paper, a novel adaptive H-infinity filtering algorithm that integrates the adaptive Kalman filter and the H-infinity filter into a comprehensive filtering algorithm is presented. In the proposed algorithm, a robust estimation method is employed to control the influence of outliers. To verify the proposed algorithm, experiments were conducted with real data from Global Positioning System (GPS) and Inertial Navigation System (INS) integrated navigation. The experimental results show that the proposed algorithm has multiple advantages over other filtering algorithms.
An iteratively reweighted least-squares approach to adaptive robust adjustment of parameters in linear regression models with autoregressive and t-distributed deviations

NASA Astrophysics Data System (ADS)

Kargoll, Boris; Omidalizarandi, Mohammad; Loth, Ina; Paffenholz, Jens-André; Alkhatib, Hamza

2018-03-01

In this paper, we investigate a linear regression time series model of possibly outlier-afflicted observations and autocorrelated random deviations. This colored noise is represented by a covariance-stationary autoregressive (AR) process, in which the independent error components follow a scaled (Student's) t-distribution. This error model allows for the stochastic modeling of multiple outliers and for an adaptive robust maximum likelihood (ML) estimation of the unknown regression and AR coefficients, the scale parameter, and the degree of freedom of the t-distribution. This approach is meant to be an extension of known estimators, which tend to focus only on the regression model, or on the AR error model, or on normally distributed errors. For the purpose of ML estimation, we derive an expectation conditional maximization either algorithm, which leads to an easy-to-implement version of iteratively reweighted least squares. The estimation performance of the algorithm is evaluated via Monte Carlo simulations for a Fourier as well as a spline model in connection with AR colored noise models of different orders and with three different sampling distributions generating the white noise components. We apply the algorithm to a vibration dataset recorded by a high-accuracy, single-axis accelerometer, focusing on the evaluation of the estimated AR colored noise model.
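Ignoring the AR colored-noise part and the estimation of the degree of freedom, the core of the IRLS scheme is the Student-t weight w_i = (ν + 1) / (ν + r_i²/σ²). The sketch below fits a straight line with a fixed ν; the fixed ν, the straight-line model, and the iteration count are simplifying assumptions, so this is not the ECME algorithm of the paper.

```python
from typing import Sequence, Tuple


def t_irls_line(x: Sequence[float], y: Sequence[float], nu: float = 3.0,
                iters: int = 50) -> Tuple[float, float, float]:
    """Fit y = a + b*x under Student-t errors with fixed degrees of freedom nu, by IRLS."""
    n = len(x)
    w = [1.0] * n  # first pass is ordinary least squares
    a = b = 0.0
    sigma2 = 1.0
    for _ in range(iters):
        # Weighted least squares for intercept a and slope b
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        b = (sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
             / sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)))
        a = my - b * mx
        r = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        sigma2 = sum(wi * ri * ri for wi, ri in zip(w, r)) / n  # scale update
        w = [(nu + 1) / (nu + ri * ri / sigma2) for ri in r]    # Student-t weights
    return a, b, sigma2
```

Observations with large standardized residuals receive small weights, so a gross outlier barely influences the fit, while the scale σ² is re-estimated from the weighted residuals on every pass.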
Active impulsive noise control using maximum correntropy with adaptive kernel size

NASA Astrophysics Data System (ADS)

Lu, Lu; Zhao, Haiquan

2017-03-01

Active noise control (ANC), based on the principle of superposition, is an attractive method for attenuating noise signals. However, impulsive noise in ANC systems degrades the performance of the controller. In this paper, a filtered-x recursive maximum correntropy (FxRMC) algorithm based on the maximum correntropy criterion (MCC) is proposed to reduce the effect of outliers. The proposed FxRMC algorithm does not require any a priori information about the noise characteristics and outperforms the filtered-x least mean square (FxLMS) algorithm for impulsive noise. Meanwhile, to adjust the kernel size of the FxRMC algorithm online, a recursive approach is proposed that takes into account past estimates of the error signals over a sliding window. Simulation and experimental results in the context of active impulsive noise control demonstrate that the proposed algorithms achieve much better performance than existing algorithms in various noise environments.
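The key property of the maximum correntropy criterion is that an error e enters the weight update scaled by the Gaussian kernel exp(−e²/(2σ²)), so impulsive errors are effectively ignored. The sketch below applies this to a plain LMS system-identification filter with a fixed kernel size σ; it omits the secondary-path filtering (filtered-x), the recursive (RLS-type) update, and the adaptive kernel size of FxRMC. The step size and kernel size are hypothetical.

```python
import math
import random
from typing import List, Sequence


def mcc_lms(x: Sequence[float], d: Sequence[float], taps: int,
            mu: float = 0.05, sigma: float = 1.0) -> List[float]:
    """LMS-type FIR adaptation under the maximum correntropy criterion."""
    w = [0.0] * taps
    for n in range(taps - 1, len(x)):
        u = [x[n - k] for k in range(taps)]         # most recent input sample first
        y = sum(wi * ui for wi, ui in zip(w, u))    # filter output
        e = d[n] - y
        g = math.exp(-e * e / (2 * sigma * sigma))  # kernel factor, ~0 for impulsive errors
        w = [wi + mu * g * e * ui for wi, ui in zip(w, u)]
    return w
```

With g fixed at 1 this reduces to ordinary LMS, where a single spike of size E moves the weights by about mu·E·u; the kernel factor suppresses exactly those updates.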
Robust image modeling techniques with an image restoration application

NASA Astrophysics Data System (ADS)

Kashyap, Rangasami L.; Eom, Kie-Bum

1988-08-01

A robust parameter-estimation algorithm for a nonsymmetric half-plane (NSHP) autoregressive model, where the driving noise is a mixture of a Gaussian and an outlier process, is presented. The convergence of the estimation algorithm is proved. An algorithm to estimate parameters and original image intensity simultaneously from the impulse-noise-corrupted image, where the model governing the image is not available, is also presented. The robustness of the parameter estimates is demonstrated by simulation. Finally, an algorithm to restore realistic images is presented. The entire image generally does not obey a simple image model, but a small portion (e.g., 8 x 8) of the image is assumed to obey an NSHP model. The original image is divided into windows and the robust estimation algorithm is applied for each window. The restoration algorithm is tested by comparing it to traditional methods on several different images.
A curvature-based weighted fuzzy c-means algorithm for point clouds de-noising

NASA Astrophysics Data System (ADS)

Cui, Xin; Li, Shipeng; Yan, Xiutian; He, Xinhua

2018-04-01

To remove noise from three-dimensional scattered point clouds and smooth the data while preserving sharp geometric features, a novel algorithm is proposed in this paper. A feature-preserving weight is added to the fuzzy c-means algorithm, yielding a curvature-weighted fuzzy c-means clustering algorithm. First, large-scale outliers are removed using statistics of the neighboring points within radius r. Then, the algorithm estimates the curvature of the point cloud data using a conicoid parabolic fitting method and calculates a curvature feature value. Finally, the proposed clustering algorithm is applied to calculate the weighted cluster centers, which are taken as the new points. The experimental results show that this approach is effective for noise of different scales and intensities in point clouds, with high precision, while preserving features. It is also robust to different noise models.
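The first step, removing large-scale outliers by examining each point's neighbours within radius r, can be sketched as below. The abstract does not say which statistic is used, so a fixed minimum neighbour count is assumed; the brute-force O(n²) search would normally be replaced by a spatial index such as a k-d tree for large clouds.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def radius_outlier_filter(points: Sequence[Point], r: float, min_neighbors: int) -> List[Point]:
    """Keep points with at least `min_neighbors` other points within distance r (brute force)."""
    kept = []
    for i, p in enumerate(points):
        count = sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= r)
        if count >= min_neighbors:
            kept.append(p)
    return kept
```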
MODVOLC2: A Hybrid Time Series Analysis for Detecting Thermal Anomalies Applied to Thermal Infrared Satellite Data

NASA Astrophysics Data System (ADS)

Koeppen, W. C.; Wright, R.; Pilger, E.

2009-12-01

We developed and tested a new, automated algorithm, MODVOLC2, which analyzes thermal infrared satellite time series data to detect and quantify the excess energy radiated from thermal anomalies such as active volcanoes, fires, and gas flares. MODVOLC2 combines two previously developed algorithms, a simple point operation algorithm (MODVOLC) and a more complex time series analysis (Robust AVHRR Techniques, or RAT) to overcome the limitations of using each approach alone. MODVOLC2 has four main steps: (1) it uses the original MODVOLC algorithm to process the satellite data on a pixel-by-pixel basis and remove thermal outliers, (2) it uses the remaining data to calculate reference and variability images for each calendar month, (3) it compares the original satellite data and any newly acquired data to the reference images normalized by their variability, and it detects pixels that fall outside the envelope of normal thermal behavior, (4) it adds any pixels detected by MODVOLC to those detected in the time series analysis. Using test sites at Anatahan and Kilauea volcanoes, we show that MODVOLC2 was able to detect ~15% more thermal anomalies than using MODVOLC alone, with very few, if any, known false detections. Using gas flares from the Cantarell oil field in the Gulf of Mexico, we show that MODVOLC2 provided results that were unattainable using a time series-only approach. Some thermal anomalies (e.g., Cantarell oil field flares) are so persistent that an additional, semi-automated 12-µm correction must be applied in order to correctly estimate both the number of anomalies and the total excess radiance being emitted by them. Although all available data should be included to make the best possible reference and variability images necessary for the MODVOLC2, we estimate that at least 80 images per calendar month are required to generate relatively good statistics from which to run MODVOLC2, a condition now globally met by a decade of MODIS observations. 
We also found that MODVOLC2 achieved good results on multiple sensors (MODIS and GOES), which provides confidence that MODVOLC2 can be run on future instruments regardless of their spatial and temporal resolutions. The improved performance of MODVOLC2 over MODVOLC makes possible the detection of lower temperature thermal anomalies that will be useful in improving our ability to document Earth’s volcanic eruptions as well as detect possible low temperature thermal precursors to larger eruptions.
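Steps (2) and (3) amount to building a per-pixel reference (mean) image and variability (standard deviation) image for each calendar month, then flagging pixels whose excess over the reference, normalized by the variability, is large. The sketch below assumes the step-(1) MODVOLC outliers have already been masked as NaN; the function names and the threshold `k=3.0` are hypothetical choices, not values from the abstract.

```python
import numpy as np

def monthly_reference(stack, months):
    """Per-pixel mean (reference) and standard deviation (variability) per calendar month.

    stack:  array (n_images, rows, cols); NaN where a pixel was masked as a thermal outlier.
    months: array (n_images,) of calendar months 1..12.
    """
    ref, var = {}, {}
    for m in np.unique(months):
        sub = stack[months == m]
        ref[m] = np.nanmean(sub, axis=0)
        var[m] = np.nanstd(sub, axis=0)
    return ref, var

def flag_anomalies(image, month, ref, var, k=3.0):
    """Flag pixels lying more than k variability units above that month's reference."""
    sd = np.where(var[month] > 0, var[month], np.nan)    # avoid division by zero
    index = (image - ref[month]) / sd
    return np.nan_to_num(index, nan=0.0) > k             # undefined pixels are not flagged
```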
Remote Sensing of Lake Ice Phenology in Alaska

NASA Astrophysics Data System (ADS)

Zhang, S.; Pavelsky, T.

2017-12-01

Lake ice phenology (e.g. ice break-up and freeze-up timing) in Alaska is potentially sensitive to climate change. However, there are few current lake ice records in this region, which hinders a comprehensive understanding of interactions between climate change and lake processes. To provide a lake ice database covering a comparatively long time period (2000 - 2017) and a large spatial extent (4000+ lakes) in Alaska, we have developed an algorithm to detect the timing of lake ice using Moderate Resolution Imaging Spectroradiometer (MODIS) satellite data. This approach generally consists of three major steps. First, we use a cloud mask (MOD09GA) to filter out satellite images with heavy cloud contamination. Second, daily MODIS reflectance values (MOD09GQ) of the lake surface are used to distinguish ice pixels from water pixels. The ice status of lakes can then be identified based on the fraction of ice pixels. Third, to improve the accuracy of ice phenology detection, we execute post-processing quality control to reduce false ice events caused by outliers. We validate the proposed algorithm over six lakes by comparing with Landsat-based reference data. Validation results indicate a high correlation between the MODIS results and reference data, with normalized root mean square error (NRMSE) ranging from 1.7% to 4.6%. The time series of this lake ice product is then examined to analyze the spatial and temporal patterns of lake ice phenology.
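The second and third steps (an ice fraction per lake, then post-processing to suppress false ice events) might look like the sketch below. The reflectance threshold, the 0.5 ice-fraction cut-off and the running-median window are all hypothetical; the abstract does not give the actual rules.

```python
import numpy as np

def ice_fraction(reflectance, ice_threshold):
    """Fraction of valid lake pixels brighter than `ice_threshold` (NaN = cloud-masked)."""
    valid = ~np.isnan(reflectance)
    if not valid.any():
        return np.nan
    return float((reflectance[valid] > ice_threshold).mean())

def clean_ice_series(fractions, frac_threshold=0.5, window=5):
    """Daily ice status, then a running median to remove isolated one-day flips.

    fractions: daily ice fractions with cloudy days already removed or interpolated.
    """
    status = (np.asarray(fractions, dtype=float) > frac_threshold).astype(float)
    half = window // 2
    padded = np.pad(status, half, mode="edge")          # repeat end values at the borders
    smoothed = np.array([np.median(padded[i:i + window]) for i in range(len(status))])
    return smoothed.astype(bool)
```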
Risk adjustment in the American College of Surgeons National Surgical Quality Improvement Program: a comparison of logistic versus hierarchical modeling.

PubMed

Cohen, Mark E; Dimick, Justin B; Bilimoria, Karl Y; Ko, Clifford Y; Richards, Karen; Hall, Bruce Lee

2009-12-01

Although logistic regression has commonly been used to adjust for risk differences in patient and case mix to permit quality comparisons across hospitals, hierarchical modeling has been advocated as the preferred methodology, because it accounts for clustering of patients within hospitals. It is unclear whether hierarchical models would yield important differences in quality assessments compared with logistic models when applied to American College of Surgeons (ACS) National Surgical Quality Improvement Program (NSQIP) data. Our objective was to evaluate differences in logistic versus hierarchical modeling for identifying hospitals with outlying outcomes in the ACS-NSQIP. Data from ACS-NSQIP patients who underwent colorectal operations in 2008 at hospitals that reported at least 100 operations were used to generate logistic and hierarchical prediction models for 30-day morbidity and mortality. Differences in risk-adjusted performance (ratio of observed-to-expected events) and outlier detections from the two models were compared. Logistic and hierarchical models identified the same 25 hospitals as morbidity outliers (14 low and 11 high outliers), but the hierarchical model identified 2 additional high outliers. Both models identified the same eight hospitals as mortality outliers (five low and three high outliers). The values of observed-to-expected events ratios and p values from the two models were highly correlated. Results were similar when data were permitted from hospitals providing < 100 patients. When applied to ACS-NSQIP data, logistic and hierarchical models provided nearly identical results with respect to identification of hospitals' observed-to-expected events ratio outliers. As hierarchical models are prone to implementation problems, logistic regression will remain an accurate and efficient method for performing risk adjustment of hospital quality comparisons.
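The observed-to-expected (O/E) ratio compared in the study divides each hospital's observed event count by the sum of the model-predicted risks of its patients. A minimal sketch follows; the approximate z-score for flagging outliers (a normal approximation with variance equal to the sum of p(1 - p)) is an illustrative assumption, not necessarily the test used by ACS-NSQIP.

```python
import numpy as np

def oe_ratios(outcomes, predicted_prob, hospital_ids):
    """Per-hospital O/E ratio and an approximate z-score for (O - E).

    outcomes: 0/1 events per patient; predicted_prob: risk from a fitted
    (e.g. logistic) model; hospital_ids: hospital label per patient.
    """
    y = np.asarray(outcomes, dtype=float)
    p = np.asarray(predicted_prob, dtype=float)
    ids = np.asarray(hospital_ids)
    result = {}
    for h in np.unique(ids):
        m = ids == h
        observed, expected = y[m].sum(), p[m].sum()
        # Var(O) = sum p(1 - p) for independent Bernoulli outcomes.
        z = (observed - expected) / np.sqrt((p[m] * (1 - p[m])).sum())
        result[h.item()] = (observed / expected, z)
    return result
```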
Use of Mahalanobis Distance for Detecting Outliers and Outlier Clusters in Markedly Non-Normal Data: A Vehicular Traffic Example

DTIC Science & Technology

2011-06-01

usually walking on the right of oncoming people, and cars discouraged from passing on the right of a car traveling in the same direction. “Usually... forces a loss of detail due to horizontal compression: valleys or troughs are squeezed into oblivion. To enable valleys to be seen, Figures 20 and 21... Volume. Left Panel: Northbound Traffic. Right Panel: Southbound Traffic. Northbound and Southbound Volume Ranges are Different 5.5 Fractional

Genomic Changes Associated with Reproductive and Migratory Ecotypes in Sockeye Salmon (Oncorhynchus nerka)

PubMed Central

Veale, Andrew J.

2017-01-01

Mechanisms underlying adaptive evolution can best be explored using paired populations displaying similar phenotypic divergence, illuminating the genomic changes associated with specific life history traits. Here, we used paired migratory [anadromous vs. resident (kokanee)] and reproductive [shore- vs. stream-spawning] ecotypes of sockeye salmon (Oncorhynchus nerka) sampled from seven lakes and two rivers spanning three catchments (Columbia, Fraser, and Skeena) in British Columbia, Canada to investigate the patterns and processes underlying their divergence. Restriction-site associated DNA sequencing was used to genotype these samples at 7,347 single nucleotide polymorphisms, 334 of which were identified as outlier loci and candidates for divergent selection within at least one ecotype comparison. Sixty-eight of these outliers were present in two or more comparisons, with 33 detected across multiple catchments. Of particular note, one locus was detected as the most significant outlier between shore and stream-spawning ecotypes in multiple comparisons and across catchments (Columbia, Fraser, and Snake). We also detected several genomic islands of divergence, some shared among comparisons, potentially showing linked signals of differential selection. The single nucleotide polymorphisms and genomic regions identified in our study offer a range of mechanistic hypotheses associated with the genetic basis of O. nerka life history variation and provide novel tools for informing fisheries management. PMID:29045601
Model diagnostics in reduced-rank estimation

PubMed Central

Chen, Kun

2016-01-01

Reduced-rank methods are very popular in high-dimensional multivariate analysis for conducting simultaneous dimension reduction and model estimation. However, the commonly-used reduced-rank methods are not robust, as the underlying reduced-rank structure can be easily distorted by only a few data outliers. Anomalies are bound to exist in big data problems, and in some applications they themselves could be of primary interest. While naive residual analysis is often inadequate for outlier detection due to potential masking and swamping, robust reduced-rank estimation approaches could be computationally demanding. Under Stein's unbiased risk estimation framework, we propose a set of tools, including leverage scores and generalized information scores, to perform model diagnostics and outlier detection in large-scale reduced-rank estimation. The leverage scores give an exact decomposition of the so-called model degrees of freedom to the observation level, which leads to an exact decomposition of many commonly-used information criteria; the resulting quantities are thus named information scores of the observations. The proposed information score approach provides a principled way of combining the residuals and leverage scores for anomaly detection. Simulation studies confirm that the proposed diagnostic tools work well. A pattern recognition example with handwritten digit images and a time series analysis example with monthly U.S. macroeconomic data further demonstrate the efficacy of the proposed approaches. PMID:28003860
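The idea of combining residuals with per-observation leverage scores can be illustrated in the ordinary least-squares setting, where the leverages are the diagonal of the hat matrix and sum to the model degrees of freedom p. The sketch below uses the classical Cook's distance as the combined score; it is a stand-in for intuition only, not the SURE-based information score proposed for reduced-rank models.

```python
import numpy as np

def leverage_and_cooks(X, y):
    """OLS leverage scores (hat-matrix diagonal) and Cook's distances."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    Q, _ = np.linalg.qr(X)                    # X = QR, so the hat matrix is Q Q^T
    h = (Q ** 2).sum(axis=1)                  # leverages; they sum to p (model df)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)
    r = resid / np.sqrt(s2 * (1 - h))         # internally studentized residuals
    cooks = r ** 2 * h / (p * (1 - h))        # large residual AND large leverage -> large score
    return h, cooks
```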
LOSITAN: a workbench to detect molecular adaptation based on a Fst-outlier method.

PubMed

Antao, Tiago; Lopes, Ana; Lopes, Ricardo J; Beja-Pereira, Albano; Luikart, Gordon

2008-07-28

Testing for selection is becoming one of the most important steps in the analysis of multilocus population genetics data sets. Existing applications are difficult to use, leaving many non-trivial, error-prone tasks to the user. Here we present LOSITAN, a selection detection workbench based on a well-evaluated Fst-outlier detection method. LOSITAN greatly facilitates correct approximation of model parameters (e.g., genome-wide average, neutral Fst), provides data import and export functions, iterative contour smoothing, and generation of graphics in an easy-to-use graphical user interface. LOSITAN is able to use modern multi-core processor architectures by locally parallelizing fdist, reducing computation time by half on current dual-core machines, with almost linear performance gains on machines with more cores. LOSITAN makes selection detection feasible for a much wider range of users, even for large population genomic datasets, by providing both an easy-to-use interface and the essential functionality to complete the whole selection detection process.
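An Fst-outlier scan ranks loci by differentiation and flags those in the extreme tail. LOSITAN derives that tail from fdist coalescent simulations conditioned on heterozygosity. The sketch below replaces those simulations with a plain empirical quantile and uses Nei's heterozygosity-based G_ST, (HT - HS) / HT, as the Fst estimate; both substitutions are simplifying assumptions.

```python
import numpy as np

def fst_per_locus(freqs):
    """Nei's G_ST = (HT - HS) / HT for each biallelic locus.

    freqs: array (n_pops, n_loci) with the frequency of one allele per population.
    """
    p = np.asarray(freqs, dtype=float)
    hs = (2 * p * (1 - p)).mean(axis=0)       # mean within-population heterozygosity
    pbar = p.mean(axis=0)
    ht = 2 * pbar * (1 - pbar)                 # heterozygosity of the pooled population
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(ht > 0, (ht - hs) / ht, 0.0)

def empirical_outliers(fst, upper_q=0.99):
    """Flag loci above the empirical upper quantile (stand-in for fdist simulations)."""
    fst = np.asarray(fst, dtype=float)
    return fst > np.quantile(fst, upper_q)
```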
Evaluation of two outlier-detection-based methods for detecting tissue-selective genes from microarray data.

PubMed

Kadota, Koji; Konishi, Tomokazu; Shimizu, Kentaro

2007-05-01

Large-scale expression profiling using DNA microarrays enables identification of tissue-selective genes for which expression is considerably higher and/or lower in some tissues than in others. Among numerous possible methods, only two outlier-detection-based methods (an AIC-based method and Sprent's non-parametric method) can treat various types of selective patterns equally, but they produce substantially different results. We investigated the performance of these two methods for different parameter settings and for a reduced number of samples. We focused on their ability to detect selective expression patterns robustly. We applied them to public microarray data collected from 36 normal human tissue samples and analyzed the effects of both changing the parameter settings and reducing the number of samples. The AIC-based method was more robust in both cases. The findings confirm that the use of the AIC-based method in the recently proposed ROKU method for detecting tissue-selective expression patterns is correct and that Sprent's method is not suitable for ROKU.
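A non-parametric rule of the kind associated with Sprent measures how far each tissue's expression lies from the median in units of the median absolute deviation (MAD). The sketch below follows that idea; the threshold of 5 MAD units and the +1/-1/0 coding of high, low and non-selective tissues are assumptions for illustration, and the AIC-based alternative is not shown.

```python
import numpy as np

def mad_outliers(values, threshold=5.0):
    """Label each tissue +1 (unusually high), -1 (unusually low) or 0.

    Hypothetical MAD-based rule; `threshold` is in MAD units.
    """
    x = np.asarray(values, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    if mad == 0:
        return np.zeros(x.shape, dtype=int)    # no spread: nothing can be called selective
    score = (x - med) / mad
    return np.where(score > threshold, 1, np.where(score < -threshold, -1, 0))
```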
A genome scan for selection signatures comparing farmed Atlantic salmon with two wild populations: Testing colocalization among outlier markers, candidate genes, and quantitative trait loci for production traits.

PubMed

Liu, Lei; Ang, Keng Pee; Elliott, J A K; Kent, Matthew Peter; Lien, Sigbjørn; MacDonald, Danielle; Boulding, Elizabeth Grace

2017-03-01

Comparative genome scans can be used to identify chromosome regions, but not traits, that are putatively under selection. Identification of targeted traits may be more likely in recently domesticated populations under strong artificial selection for increased production. We used a North American Atlantic salmon 6K SNP dataset to locate genome regions of an aquaculture strain (Saint John River) that were highly diverged from that of its putative wild founder population (Tobique River). First, admixed individuals with partial European ancestry were detected using STRUCTURE and removed from the dataset. Outlier loci were then identified as those showing extreme differentiation between the aquaculture population and the founder population. All Arlequin methods identified an overlapping subset of 17 outlier loci, three of which were also identified by BayeScan. Many outlier loci were near candidate genes and some were near published quantitative trait loci (QTLs) for growth, appetite, maturity, or disease resistance. Parallel comparisons using a wild, nonfounder population (Stewiacke River) yielded only one overlapping outlier locus as well as a known maturity QTL. We conclude that genome scans comparing a recently domesticated strain with its wild founder population can facilitate identification of candidate genes for traits known to have been under strong artificial selection.
Application of ant colony optimization in development of models for prediction of anti-HIV-1 activity of HEPT derivatives.

PubMed

Zare-Shahabadi, Vali; Abbasitabar, Fatemeh

2010-09-01

Quantitative structure-activity relationship models were derived for 107 analogs of 1-[(2-hydroxyethoxy) methyl]-6-(phenylthio)thymine, a potent inhibitor of the HIV-1 reverse transcriptase. The activities of these compounds were investigated by means of the multiple linear regression (MLR) technique. An ant colony optimization algorithm, called Memorized_ACS, was applied for selecting relevant descriptors and detecting outliers. This algorithm uses an external memory based upon knowledge incorporation from previous iterations. At first, the memory is empty, and then it is filled by running several ACS algorithms. In this respect, after each ACS run, the elite ant is stored in the memory and the process is continued to fill the memory. Here, pheromone updating is performed by all elite ants collected in the memory; this results in improvements in both the exploration and exploitation behaviors of the ACS algorithm. The memory is then emptied and filled again by performing several ACS algorithms using the updated pheromone trails. This process is repeated for several iterations. At the end, the memory contains several top solutions for the problem. The number of appearances of each descriptor in the external memory is a good criterion for its importance. Finally, prediction is performed by the elitist ant, and interpretation is carried out by considering the importance of each descriptor. The best MLR model has a training error of 0.47 log(1/EC50) units (R² = 0.90) and a prediction error of 0.76 log(1/EC50) units (R² = 0.88). Copyright 2010 Wiley Periodicals, Inc.
An improved initialization center k-means clustering algorithm based on distance and density

NASA Astrophysics Data System (ADS)

Duan, Yanling; Liu, Qun; Xia, Shuyin

2018-04-01

To address the problem that k-means clustering with randomly chosen initial centers is influenced by outlying data samples and gives unstable results across multiple runs, a center-initialization method based on larger distance and higher density is proposed. The reciprocal of the weighted average distance is used to represent sample density, and the data samples with larger distance and higher density are selected as the initial cluster centers to optimize the clustering results. Then, a clustering evaluation method based on distance and density is designed to verify the feasibility and practicality of the algorithm. Experimental results on UCI data sets show that the algorithm has a certain stability and practicality.
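A minimal sketch of density-and-distance initialization follows. It assumes a plain (unweighted) average distance for the density, since the abstract does not give the weights, and scores each candidate by density times its distance to the nearest already-chosen center; that scoring rule and the function name are illustrative assumptions.

```python
import numpy as np

def density_distance_init(X, k):
    """Pick k initial centers favoring dense points that are far from earlier picks.

    Hypothetical sketch: density = 1 / (mean distance to all other points).
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    density = (n - 1) / d.sum(axis=1)
    centers = [int(np.argmax(density))]           # densest point first
    for _ in range(1, k):
        min_dist = d[:, centers].min(axis=1)      # distance to the nearest chosen center
        centers.append(int(np.argmax(density * min_dist)))
    return X[centers]
```

Weighting distance by density keeps an isolated outlier from being chosen merely because it is far away, which is the weakness of pure farthest-point initialization.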
An automatic editing algorithm for GPS data

NASA Technical Reports Server (NTRS)

Blewitt, Geoffrey

1990-01-01

An algorithm has been developed to automatically edit Global Positioning System data such that outlier deletion, cycle slip identification, and correction are independent of clock instability, selective availability, receiver-satellite kinematics, and tropospheric conditions. This algorithm, called TurboEdit, operates on undifferenced, dual-frequency carrier phase data, and requires P-code pseudorange data and a smoothly varying ionospheric electron content. TurboEdit was tested on the large data set from the CASA Uno experiment, which contained over 2500 cycle slips. Analyst intervention was required on 1 percent of the station-satellite passes, almost all of these problems being due to difficulties in extrapolating variations in the ionospheric delay. The algorithm is presently being adapted for real-time data editing in the Rogue receiver for continuous monitoring applications.
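Independence from clocks, geometry and troposphere comes from working with combinations of the dual-frequency observables. A widely used combination for this is the Melbourne-Wübbena wide-lane combination, in which geometry, clocks, troposphere and first-order ionosphere cancel, leaving N1 - N2 (in wide-lane cycles) plus pseudorange noise, so a jump signals a cycle slip. The sketch shows only that idea: the running-mean detector, the 0.5-cycle threshold and the five-epoch warm-up are hypothetical, and the ionospheric-combination step (which relies on the smoothly varying electron content to tell which frequency slipped) is not reproduced.

```python
import numpy as np

C = 299_792_458.0                   # speed of light, m/s
F1, F2 = 1575.42e6, 1227.60e6       # GPS L1 and L2 carrier frequencies, Hz
LAMBDA_WL = C / (F1 - F2)           # wide-lane wavelength, about 0.86 m

def melbourne_wubbena(phi1, phi2, p1, p2):
    """Melbourne-Wubbena combination in wide-lane cycles.

    phi1, phi2: carrier phase in cycles; p1, p2: P-code pseudoranges in metres.
    """
    phi1, phi2 = np.asarray(phi1, float), np.asarray(phi2, float)
    narrow_lane_range = (F1 * np.asarray(p1, float) + F2 * np.asarray(p2, float)) / (F1 + F2)
    return (phi1 - phi2) - narrow_lane_range / LAMBDA_WL

def detect_slips(mw, threshold=0.5, min_epochs=5):
    """Hypothetical detector: flag epochs deviating from the running mean by > threshold cycles."""
    slips, start = [], 0
    for t in range(1, len(mw)):
        if t - start < min_epochs:
            continue                          # wait until the running mean is established
        if abs(mw[t] - mw[start:t].mean()) > threshold:
            slips.append(t)
            start = t                         # restart the running mean after a slip
    return slips
```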
User Behavior Analytics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Turcotte, Melissa; Moore, Juston Shane

User Behaviour Analytics is the tracking, collecting and assessing of user data and activities. The goal is to detect misuse of user credentials by developing models for the normal behaviour of user credentials within a computer network and detecting outliers with respect to this baseline.
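A baseline-and-outlier scheme of this kind can be sketched as a per-credential z-score on a daily activity count. The statistic (for example, the number of distinct hosts a credential authenticates to), the z-threshold of 4 and the floor on the standard deviation are all hypothetical; the abstract does not specify the models used.

```python
import numpy as np

def baseline_outliers(history, new_counts, z_threshold=4.0):
    """Return credentials whose new activity count is far above their own baseline.

    history:    dict credential -> list of past daily counts
    new_counts: dict credential -> today's count
    """
    flagged = []
    for cred, count in new_counts.items():
        past = np.asarray(history.get(cred, []), dtype=float)
        if past.size < 2:
            continue                          # no usable baseline yet
        mu, sd = past.mean(), past.std(ddof=1)
        z = (count - mu) / max(sd, 1.0)       # floor keeps near-constant users from over-flagging
        if z > z_threshold:
            flagged.append(cred)
    return flagged
```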
"Contrasting patterns of selection at Pinus pinaster Ait. Drought stress candidate genes as revealed by genetic differentiation analyses".

PubMed

Eveno, Emmanuelle; Collada, Carmen; Guevara, M Angeles; Léger, Valérie; Soto, Alvaro; Díaz, Luis; Léger, Patrick; González-Martínez, Santiago C; Cervera, M Teresa; Plomion, Christophe; Garnier-Géré, Pauline H

2008-02-01

The importance of natural selection for shaping adaptive trait differentiation among natural populations of allogamous tree species has long been recognized. Determining the molecular basis of local adaptation remains largely unresolved, and the respective roles of selection and demography in shaping population structure are actively debated. Using a multilocus scan that aims to detect outliers from simulated neutral expectations, we analyzed patterns of nucleotide diversity and genetic differentiation at 11 polymorphic candidate genes for drought stress tolerance in phenotypically contrasted Pinus pinaster Ait. populations across its geographical range. We compared 3 coalescent-based methods: 2 frequentist-like ones, including 1 approach specifically developed here for biallelic single nucleotide polymorphisms (SNPs), and 1 Bayesian one. Five genes showed outlier patterns that were robust across methods, at the haplotype level for 2 of them. Two genes presented higher F(ST) values than expected (PR-AGP4 and erd3), suggesting that they could have been affected by the action of diversifying selection among populations. In contrast, 3 genes presented lower F(ST) values than expected (dhn-1, dhn2, and lp3-1), which could represent signatures of homogenizing selection among populations. A smaller proportion of outliers was detected at the SNP level, suggesting the potential functional significance of particular combinations of sites in drought-response candidate genes. The Bayesian method appeared robust to low sample sizes, flexible to assumptions regarding migration rates, and powerful for detecting selection at the haplotype level, but the frequentist-like method adapted to SNPs was more efficient for the identification of outlier SNPs showing low differentiation. Population-specific effects estimated in the Bayesian method also revealed populations with lower immigration rates, which could have led to favorable situations for local adaptation.
Outlier patterns are discussed in relation to the different genes' putative involvement in drought tolerance responses, from published results in transcriptomics and association mapping in P. pinaster and other related species. These genes clearly constitute relevant candidates for future association studies in P. pinaster.
Outlier Removal in Model-Based Missing Value Imputation for Medical Datasets.

PubMed

Huang, Min-Wei; Lin, Wei-Chao; Tsai, Chih-Fong

2018-01-01

Many real-world medical datasets contain some proportion of missing (attribute) values. In general, missing value imputation can be performed to solve this problem, which is to provide estimations for the missing values by a reasoning process based on the (complete) observed data. However, if the observed data contain some noisy information or outliers, the estimations of the missing values may not be reliable or may even be quite different from the real values. The aim of this paper is to examine whether a combination of instance selection from the observed data and missing value imputation offers better performance than performing missing value imputation alone. In particular, three instance selection algorithms, DROP3, GA, and IB3, and three imputation algorithms, KNNI, MLP, and SVM, are used in order to find the best combination. The experimental results show that performing instance selection can have a positive impact on missing value imputation over the numerical data type of medical datasets, and specific combinations of instance selection and imputation methods can improve the imputation results over the mixed data type of medical datasets. However, instance selection does not have a clearly positive impact on the imputation result for categorical medical datasets.
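The pairing of instance selection with KNN imputation (KNNI) can be sketched as below. The optional instance-selection step here is deliberately crude (dropping complete rows with any |z| above `drop_z`) and is a hypothetical stand-in, not DROP3, GA or IB3; the function name and parameters are likewise illustrative.

```python
import numpy as np

def knn_impute(data, k=3, drop_z=None):
    """Fill NaNs with the mean of the k nearest complete rows (distance on observed columns)."""
    X = np.asarray(data, dtype=float).copy()
    complete = X[~np.isnan(X).any(axis=1)]
    if drop_z is not None:
        # Crude instance selection: drop complete rows with an extreme value in any column.
        sd = complete.std(axis=0)
        sd = np.where(sd > 0, sd, 1.0)
        z = np.abs((complete - complete.mean(axis=0)) / sd)
        complete = complete[(z <= drop_z).all(axis=1)]
    for i in np.where(np.isnan(X).any(axis=1))[0]:
        obs = ~np.isnan(X[i])
        dist = np.sqrt(((complete[:, obs] - X[i, obs]) ** 2).sum(axis=1))
        nearest = complete[np.argsort(dist)[:k]]
        X[i, ~obs] = nearest[:, ~obs].mean(axis=0)
    return X
```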
On the issues of probability distribution of GPS carrier phase observations

NASA Astrophysics Data System (ADS)

Luo, X.; Mayer, M.; Heck, B.

2009-04-01

In common practice the observables related to Global Positioning System (GPS) are assumed to follow a Gauss-Laplace normal distribution. Actually, full knowledge of the observables' distribution is not required for parameter estimation by means of the least-squares algorithm based on the functional relation between observations and unknown parameters as well as the associated variance-covariance matrix. However, the probability distribution of GPS observations plays a key role in procedures for quality control (e.g. outlier and cycle slips detection, ambiguity resolution) and in reliability-related assessments of the estimation results. Under non-ideal observation conditions with respect to the factors impacting GPS data quality, for example multipath effects and atmospheric delays, the validity of the normal distribution postulate of GPS observations is in doubt. This paper presents a detailed analysis of the distribution properties of GPS carrier phase observations using double difference residuals. For this purpose 1-Hz observation data from the permanent SAPOS
A random sampling approach for robust estimation of tissue-to-plasma ratio from extremely sparse data.

PubMed

Chu, Hui-May; Ette, Ene I

2005-09-02

This study was performed to develop a new nonparametric approach for the robust estimation of tissue-to-plasma ratio from extremely sparsely sampled paired data (ie, one sample each from plasma and tissue per subject). Tissue-to-plasma ratio was estimated from paired/unpaired experimental data using the independent time points approach, area under the curve (AUC) values calculated with the naïve data averaging approach, and AUC values calculated using sampling-based approaches (eg, the pseudoprofile-based bootstrap [PpbB] approach and the random sampling approach [our proposed approach]). The random sampling approach involves the use of a 2-phase algorithm. The convergence of the sampling/resampling approaches was investigated, as well as the robustness of the estimates produced by different approaches. To evaluate the latter, new data sets were generated by introducing outlier(s) into the real data set. One to 2 concentration values were inflated by 10% to 40% from their original values to produce the outliers. Tissue-to-plasma ratios computed using the independent time points approach varied between 0 and 50 across time points. The ratio obtained from AUC values acquired using the naive data averaging approach was not associated with any measure of uncertainty or variability. Calculating the ratio without regard to pairing yielded poorer estimates. The random sampling and pseudoprofile-based bootstrap approaches yielded tissue-to-plasma ratios with uncertainty and variability. However, the random sampling approach, because of the 2-phase nature of its algorithm, yielded more robust estimates and required fewer replications. Therefore, a 2-phase random sampling approach is proposed for the robust estimation of tissue-to-plasma ratio from extremely sparsely sampled data.
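The sampling-based AUC approaches compared here can be illustrated with a pseudoprofile bootstrap: at each time point, draw one subject's paired plasma and tissue values, compute both AUCs by the trapezoidal rule, and take their ratio. This generic sketch assumes the pairing is kept within each draw; it is not the authors' 2-phase random sampling algorithm, and the data layout and `n_boot` default are hypothetical.

```python
import numpy as np

def trapezoid_auc(t, c):
    """Area under a concentration-time profile by the trapezoidal rule."""
    t = np.asarray(t, dtype=float)
    c = np.asarray(c, dtype=float)
    return float(np.sum((t[1:] - t[:-1]) * (c[1:] + c[:-1]) / 2.0))

def pseudoprofile_ratio(times, plasma, tissue, n_boot=2000, seed=0):
    """Median tissue/plasma AUC ratio and 95% percentile interval.

    plasma[j], tissue[j]: paired values of the subjects sampled at times[j].
    """
    rng = np.random.default_rng(seed)
    ratios = np.empty(n_boot)
    for b in range(n_boot):
        picks = [rng.integers(len(p)) for p in plasma]        # one subject per time point
        cp = [plasma[j][i] for j, i in enumerate(picks)]
        ct = [tissue[j][i] for j, i in enumerate(picks)]        # same subject: pairing kept
        ratios[b] = trapezoid_auc(times, ct) / trapezoid_auc(times, cp)
    return float(np.median(ratios)), np.percentile(ratios, [2.5, 97.5])
```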
Outlier-resilient complexity analysis of heartbeat dynamics

NASA Astrophysics Data System (ADS)

Lo, Men-Tzung; Chang, Yi-Chung; Lin, Chen; Young, Hsu-Wen Vincent; Lin, Yen-Hung; Ho, Yi-Lwun; Peng, Chung-Kang; Hu, Kun

2015-03-01

Complexity in physiological outputs is believed to be a hallmark of healthy physiological control. How to accurately quantify the degree of complexity in physiological signals with outliers remains a major barrier for translating this novel concept of nonlinear dynamic theory to clinical practice. Here we propose a new approach to estimate the complexity in a signal by analyzing the irregularity of the sign time series of its coarse-grained time series at different time scales. Using surrogate data, we show that the method can reliably assess the complexity in noisy data while being highly resilient to outliers. We further apply this method to the analysis of human heartbeat recordings. Without removing any outliers due to ectopic beats, the method is able to detect a degradation of cardiac control in patients with congestive heart failure and a greater degradation in critically ill patients whose survival relies on extracorporeal membrane oxygenation (ECMO). Moreover, the derived complexity measures can predict the mortality of ECMO patients. These results indicate that the proposed method may serve as a promising tool for monitoring cardiac function of patients in clinical settings.
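The core steps (coarse-grain the signal at scale s, reduce it to the signs of successive increments, then measure the irregularity of that sign series) can be sketched as follows. Because only the direction of change is kept, a single huge value can affect only the few signs around it. The irregularity measure used here, normalized Shannon entropy of length-m sign words, is an assumption; the abstract does not name the exact statistic.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def coarse_grain(x, scale):
    """Average consecutive non-overlapping windows of length `scale`."""
    x = np.asarray(x, dtype=float)
    n = len(x) // scale
    return x[: n * scale].reshape(n, scale).mean(axis=1)

def sign_word_entropy(x, scale=1, m=3):
    """Normalized Shannon entropy (0 to 1) of length-m words of increment signs."""
    y = coarse_grain(x, scale)
    signs = (np.diff(y) > 0).astype(int)                 # 1 = increase, 0 = no increase
    # Encode each run of m consecutive signs as an integer word in [0, 2**m).
    words = sliding_window_view(signs, m) @ (2 ** np.arange(m - 1, -1, -1))
    counts = np.bincount(words, minlength=2 ** m)
    prob = counts[counts > 0] / counts.sum()
    return float(-(prob * np.log2(prob)).sum() / m)
```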
Toward Automated Generation of Reservoir Water Elevation Changes From Satellite Radar Altimetry.

NASA Astrophysics Data System (ADS)

Okeowo, M. A.; Lee, H.; Hossain, F.

2015-12-01

Until now, processing satellite radar altimetry data over inland water bodies on a large scale has been a cumbersome task, primarily due to measurements contaminated by the surrounding topography. It becomes more challenging if the water body is small and thus the number of available high-rate measurements from the water surface is limited. Manual removal of outliers is time consuming, which limits global generation of reservoir elevation profiles. This has limited global studies of lake and reservoir elevation profiles for monitoring storage changes and for hydrologic modeling. We have proposed a new method to automatically generate time-series information from raw satellite radar altimetry without user intervention. With this method, scientists with little knowledge of altimetry can independently process radar altimetry for diverse purposes. The method is based on K-means clustering, the backscatter coefficient, and statistical analysis of the dataset for outlier detection. The results of this method will be validated using in-situ gauges from US, Indus, and Bangladesh reservoirs. In addition, a sensitivity analysis will be done to ascertain the limitations of this algorithm based on the surrounding topography and the length of altimetry track overlap with the lake/reservoir. Finally, reservoir storage change will be estimated at the study sites, using MODIS and Landsat water classification to estimate the reservoir area and Jason-2 and SARAL/Altika satellites to estimate the height.
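A minimal sketch of clustering-based outlier removal for one altimetry pass: cluster the along-track heights with 1-D k-means, keep the largest cluster as the water surface, then apply a sigma clip before averaging. It uses heights only (no backscatter coefficient), and k = 3, the 2-sigma clip and the function names are assumptions, not the authors' settings.

```python
import numpy as np

def kmeans_1d(values, k=3, n_iter=50):
    """Plain Lloyd's k-means on scalar heights, started at evenly spaced quantiles."""
    v = np.asarray(values, dtype=float)
    centers = np.quantile(v, np.linspace(0, 1, k + 2)[1:-1])
    for _ in range(n_iter):
        labels = np.argmin(np.abs(v[:, None] - centers[None, :]), axis=1)
        new = np.array([v[labels == j].mean() if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

def water_level(heights, k=3, n_sigma=2.0):
    """Mean height of the largest cluster after an n_sigma clip around its median."""
    labels, _ = kmeans_1d(heights, k)
    main = np.asarray(heights, dtype=float)[labels == np.bincount(labels).argmax()]
    keep = np.abs(main - np.median(main)) <= n_sigma * main.std()
    return float(main[keep].mean())
```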
Unbalance detection in rotor systems with active bearings using self-sensing piezoelectric actuators

NASA Astrophysics Data System (ADS)

Ambur, Ramakrishnan; Rinderknecht, Stephan

2018-03-01

Machines developed today are highly automated due to the increased use of mechatronic systems. To ensure their reliable operation, fault detection and isolation (FDI) is an important feature, along with better control. This research work aims to achieve and integrate both of these functions with a minimum number of components in a mechatronic system. This article investigates a rotating machine with active bearings equipped with piezoelectric actuators. There is an inherent coupling between their electrical and mechanical properties, because of which they can also be used as sensors. Mechanical deflection can be reconstructed from these self-sensing actuators using the measured voltage and current signals. These virtual sensor signals are utilised to detect unbalance in a rotor system. Parameters of unbalance, such as its magnitude and phase, are detected by a parametric estimation method in the frequency domain. Unbalance location has been identified using the hypothesis of localization of faults. Robustness of the estimates against outliers in measurements is improved using a weighted least squares method. Unbalances are detected on a real test bench in addition to simulations using its model. Experiments are performed in the stationary as well as the transient case. As a further step, unbalances are estimated during simultaneous actuation of the actuators in closed loop with an adaptive algorithm for vibration minimisation. This strategy could be used in systems that aim for both fault detection and control action.
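Estimating unbalance magnitude and phase amounts to fitting a sinusoid at the rotation frequency. Outlier-robust weighting of that fit can be sketched with iteratively reweighted least squares and Huber weights. The Huber weight function, the tuning constant 1.345 and the MAD-type scale are assumptions: the abstract says only that weighted least squares is used, and this time-domain fit simplifies the frequency-domain estimation described.

```python
import numpy as np

def huber_irls_harmonic(t, y, omega, c=1.345, n_iter=30):
    """Robustly fit y = A cos(omega t + phi); returns (A, phi)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.cos(omega * t), np.sin(omega * t)])
    w = np.ones_like(y)
    for _ in range(n_iter):
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)   # weighted LS step
        r = y - X @ coef
        s = np.median(np.abs(r)) / 0.6745 + 1e-12      # robust residual scale
        u = np.abs(r) / (c * s)
        w = np.where(u <= 1, 1.0, 1.0 / u)             # Huber weights: downweight large residuals
    a, b = coef                                        # a = A cos(phi), b = -A sin(phi)
    return float(np.hypot(a, b)), float(np.arctan2(-b, a))
```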
Addressing the issue of insufficient information in data-based bridge health monitoring : final report.

DOT National Transportation Integrated Search

2015-11-01

One of the most efficient ways to solve the damage detection problem using the statistical pattern recognition approach is that of exploiting the methods of outlier analysis. Cast within the pattern recognition framework, damage detection assesse...
Detecting New Pedestrian Facilities from VGI Data Sources

NASA Astrophysics Data System (ADS)

Zhong, S.; Xie, Z.

2017-12-01

Pedestrian facility information (e.g., footbridges, pedestrian crossings, and underground passages) is important basic data for location-based services (LBS) for pedestrians. However, keeping this information up to date is challenging because facilities change frequently. Collecting and updating pedestrian facility information has traditionally been done mainly by highly trained specialists, an approach with several disadvantages, such as high cost and long update cycles. Volunteered Geographic Information (VGI) has proven to be an efficient source of new, free, and fast-growing spatial data. Pedestrian trajectories, which can be seen as measurements of real pedestrian roads, are among the most valuable kinds of VGI data. Although individual trajectories are not highly accurate, the large number of measurements makes it possible to improve the quality of the road information. We therefore develop a method for detecting new pedestrian facilities based on the current road network and pedestrian trajectories. Specifically, 1) outliers in the pedestrian trajectories are removed by analyzing speed, distance, and direction; 2) a road network matching algorithm is developed to eliminate redundant trajectories; and 3) a space-time clustering algorithm is adopted to detect new walking facilities. The performance of the method is evaluated in a series of experiments on part of the road network of Hefei using a large number of real pedestrian trajectories, and the results were verified using Tencent Street map. The results show that the proposed method can accurately detect new pedestrian facilities from VGI data. We believe that the proposed method provides an alternative way of acquiring general road data and can improve the quality of LBS for pedestrians.
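The first step, removing trajectory outliers by speed, distance, and direction, can be sketched as a sequential filter. This is an illustrative, hypothetical implementation, not the authors' code. It assumes points are time-ordered `(t, x, y)` tuples in seconds and metres in a projected coordinate system, and the thresholds (3 m/s, 50 m, 150 degrees) are placeholder values.

```python
import math

def filter_trajectory(points, max_speed=3.0, max_step=50.0, max_turn_deg=150.0):
    """Drop fixes implying an implausible speed, jump, or heading reversal.

    points: time-ordered (t_seconds, x_m, y_m) tuples in a metric projection.
    Each candidate is compared with the last *kept* point, so a single spike
    does not cause its successor to be rejected as well.
    """
    kept = []
    for t, x, y in points:
        if kept:
            t0, x0, y0 = kept[-1]
            dt = t - t0
            step = math.hypot(x - x0, y - y0)
            if dt <= 0 or step > max_step or step / dt > max_speed:
                continue                                   # speed/distance outlier
            if len(kept) >= 2 and step > 0:
                _, xp, yp = kept[-2]
                if math.hypot(x0 - xp, y0 - yp) > 0:
                    h1 = math.atan2(y0 - yp, x0 - xp)      # previous heading
                    h2 = math.atan2(y - y0, x - x0)        # candidate heading
                    # Smallest absolute angle between the headings, in [0, pi].
                    turn = abs((h2 - h1 + math.pi) % (2 * math.pi) - math.pi)
                    if math.degrees(turn) > max_turn_deg:
                        continue                           # direction outlier
        kept.append((t, x, y))
    return kept
```

A walk at 1.2 m/s with a 500 m GPS spike and a final backward jump keeps only the plausible fixes; the spike fails the distance test and the reversal fails the direction test.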
Robust Spectral Unmixing of Sparse Multispectral Lidar Waveforms using Gamma Markov Random Fields

DOE PAGES

Altmann, Yoann; Maccarone, Aurora; McCarthy, Aongus; ...

2017-05-10

This paper presents a new Bayesian spectral unmixing algorithm to analyse remote scenes sensed via sparse multispectral Lidar measurements. To a first approximation, in the presence of a target, each Lidar waveform consists of a main peak whose position depends on the target distance and whose amplitude depends on the wavelength of the laser source considered (i.e., on the target reflectivity). Moreover, these temporal responses are usually assumed to be corrupted by Poisson noise in the low-photon-count regime. When multiple wavelengths are considered, spectral information can be used to identify and quantify the main materials in the scene, in addition to estimating the Lidar-based range profiles. Owing to its anomaly detection capability, the proposed hierarchical Bayesian model, coupled with an efficient Markov chain Monte Carlo algorithm, allows robust estimation of depth images together with abundance and outlier maps associated with the observed 3D scene. The proposed methodology is illustrated via experiments conducted with real multispectral Lidar data acquired in a controlled environment. The results demonstrate that spectral responses constructed from extremely sparse photon counts (fewer than 10 photons per pixel and band) can be unmixed.

[Gaussian process regression and its application in near-infrared spectroscopy analysis].

PubMed

Feng, Ai-Ming; Fang, Li-Min; Lin, Min

2011-06-01

A Gaussian process (GP) is applied in the present paper as a chemometric method to explore the complicated relationship between near-infrared (NIR) spectra and ingredients. After outliers were detected by the Monte Carlo cross-validation (MCCV) method and removed from the dataset, different preprocessing methods, such as multiplicative scatter correction (MSC), smoothing, and derivatives, were tried to obtain the best model performance. Furthermore, uninformative variable elimination (UVE) was introduced as a variable selection technique, and the characteristic wavelengths obtained were used as input for modeling. A public dataset of 80 NIR spectra of corn was used as an example to evaluate the new algorithm. The optimal models for oil, starch, and protein were obtained by the GP regression method. The performance of the final models was evaluated according to the root mean square error of calibration (RMSEC), root mean square error of cross-validation (RMSECV), root mean square error of prediction (RMSEP), and correlation coefficient (r). The models show good calibration ability, with r values above 0.99, and satisfactory prediction ability, with r values higher than 0.96. The overall results demonstrate that the GP algorithm is an effective chemometric method and is promising for NIR analysis.
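The MCCV outlier screen repeatedly splits the data at random, fits a model on the calibration part, and records each sample's prediction error whenever that sample falls in the validation part. Samples whose errors are consistently large (high mean) or erratic (high standard deviation) become outlier candidates. The sketch below is a minimal illustration that assumes an ordinary linear least-squares model as a stand-in for the GP or PLS model used in practice; the function name, split fraction, and number of repetitions are illustrative.

```python
import numpy as np

def mccv_error_stats(X, y, n_splits=500, train_frac=0.8, seed=0):
    """Per-sample mean and std of out-of-sample prediction errors over random splits.

    A linear least-squares model (with intercept) stands in for the real model.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    n = len(y)
    n_train = int(round(train_frac * n))
    errors = [[] for _ in range(n)]
    for _ in range(n_splits):
        perm = rng.permutation(n)
        train, test = perm[:n_train], perm[n_train:]
        coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        # Record the prediction error of every validation sample in this split.
        for i, e in zip(test, y[test] - X[test] @ coef):
            errors[i].append(e)
    mean = np.array([np.mean(e) if e else np.nan for e in errors])
    std = np.array([np.std(e) if e else np.nan for e in errors])
    return mean, std
```

In practice the per-sample mean and standard deviation are usually plotted against each other, and samples far from the main group are inspected before removal.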
Application of surface enhanced Raman scattering and competitive adaptive reweighted sampling on detecting furfural dissolved in transformer oil

NASA Astrophysics Data System (ADS)

Chen, Weigen; Zou, Jingxin; Wan, Fu; Fan, Zhou; Yang, Dingkun

2018-03-01

Detecting furfural dissolved in mineral oil is an essential technical method for evaluating the ageing condition of oil-paper insulation and the degradation of its mechanical properties. Compared with traditional detection methods, Raman spectroscopy is convenient and time-saving in operation. This study explored the application of surface enhanced Raman scattering (SERS) to the quantitative analysis of furfural dissolved in oil. Oil solutions with different concentrations of furfural were prepared and calibrated by high-performance liquid chromatography. Confocal laser Raman spectroscopy (CLRS) and SERS technology were employed to acquire Raman spectral data. Monte Carlo cross-validation (MCCV) was used to eliminate outliers from the sample set, and competitive adaptive reweighted sampling (CARS) was then applied to select an optimal combination of informative variables that best reflect the chemical properties of concern. Based on the selected Raman spectral features, a support vector machine (SVM) combined with particle swarm optimization (PSO) was used to set up a quantitative furfural analysis model. Finally, the generalization ability and prediction precision of the established method were verified using samples prepared in the laboratory. In summary, a new spectral method is proposed to quickly detect furfural in oil, which lays a foundation for evaluating the ageing of oil-paper insulation in oil-immersed electrical equipment.
Lower reference limits of quantitative cord glucose-6-phosphate dehydrogenase estimated from healthy term neonates according to the clinical and laboratory standards institute guidelines: a cross sectional retrospective study

PubMed Central

2013-01-01

Background: Previous studies have reported the lower reference limit (LRL) of quantitative cord glucose-6-phosphate dehydrogenase (G6PD), but they have not used approved international statistical methodology. Using common standards is expected to yield more reliable findings. Therefore, we aimed to estimate the LRL of quantitative G6PD detection in healthy term neonates using statistical analyses endorsed by the International Federation of Clinical Chemistry (IFCC) and the Clinical and Laboratory Standards Institute (CLSI) for reference interval estimation. Methods: This cross-sectional retrospective study was performed at King Abdulaziz Hospital, Saudi Arabia, between March 2010 and June 2012. The study monitored consecutive neonates born to mothers from one Arab Muslim tribe that was assumed to have a low prevalence of G6PD deficiency. Neonates who satisfied the following criteria were included: full-term birth (≥37 weeks); no admission to the special care nursery; no phototherapy treatment; a negative direct antiglobulin test; and, for female neonates, a father from the mother's tribe. G6PD activity (units/gram hemoglobin) was measured spectrophotometrically with an automated kit. The 2.5th percentiles and the corresponding 95% confidence intervals (CI) were estimated as LRLs, both in the presence and in the absence of outliers. Results: A total of 207 male and 188 female term neonates who had cord blood quantitative G6PD testing met the inclusion criteria. Horn's method detected 20 G6PD values as outliers (8 males and 12 females). Quantitative cord G6PD values were normally distributed only in the absence of the outliers. The Harris-Boyd method and proportion criteria showed that combined-gender LRLs were reliable. The combined bootstrap LRL in the presence of the outliers was 10.0 (95% CI: 7.5-10.7), and the combined parametric LRL in the absence of the outliers was 11.0 (95% CI: 10.5-11.3). Conclusion: These results contribute to the LRL of quantitative cord G6PD detection in full-term neonates. They are transferable to other laboratories when pre-analytical factors and testing methods are comparable and the IFCC-CLSI requirements for transference are satisfied. We suggest using the LRL estimated in the absence of outliers, because mislabeling G6PD-deficient neonates as normal is intolerable, whereas mislabeling G6PD-normal neonates as deficient is tolerable. PMID:24016342
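The Horn method of outlier detection mentioned above applies Tukey's interquartile fences to data that have first been transformed toward normality, in Horn's formulation with a Box-Cox transform. The sketch below simplifies by substituting a log transform (an assumption), then computes a non-parametric 2.5th-percentile LRL with a percentile-bootstrap 95% CI. It is an illustration of the general procedure, not the study's analysis code, and the function names are hypothetical.

```python
import numpy as np

def horn_like_outliers(values, k=1.5):
    """Tukey fences on log-transformed values (log stands in for Box-Cox here)."""
    v = np.log(np.asarray(values, dtype=float))      # requires positive values
    q1, q3 = np.percentile(v, [25, 75])
    iqr = q3 - q1
    return (v < q1 - k * iqr) | (v > q3 + k * iqr)   # True marks an outlier

def lower_reference_limit(values, n_boot=2000, seed=0):
    """Non-parametric 2.5th percentile with a percentile-bootstrap 95% CI."""
    v = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    lrl = np.percentile(v, 2.5)
    boots = [np.percentile(rng.choice(v, size=v.size, replace=True), 2.5)
             for _ in range(n_boot)]
    lo, hi = np.percentile(boots, [2.5, 97.5])
    return float(lrl), (float(lo), float(hi))
```

Typical use is `mask = horn_like_outliers(g6pd)` followed by `lower_reference_limit(g6pd[~mask])`, which mirrors the "absence of outliers" estimate the authors recommend.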
VizieR Online Data Catalog: SDSS-DR9 photometric redshifts (Brescia+, 2014)

NASA Astrophysics Data System (ADS)

Brescia, M.; Cavuoti, S.; Longo, G.; de Stefano, V.

2014-07-01

We present an application of a machine learning method to the estimation of photometric redshifts for the galaxies in the SDSS Data Release 9 (SDSS-DR9). Photometric redshifts were produced for more than 143 million galaxies. The MLPQNA (Multi Layer Perceptron with Quasi Newton Algorithm) model, provided within the framework of DAMEWARE (DAta Mining and Exploration Web Application REsource), is an interpolative method derived from machine learning models. The obtained redshifts have an overall uncertainty of σ=0.023, with a very small average bias of about 3×10⁻⁵ and a fraction of catastrophic outliers of about 5%. After removal of the catastrophic outliers, the uncertainty is about σ=0.017. The name of each catalogue file gives the declination (DEC) range, in degrees, of the objects it contains (60 data files).
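Quoted σ, bias, and catastrophic-outlier fractions are conventionally computed from the normalized residual Δz = (z_phot - z_spec)/(1 + z_spec). The sketch below assumes that convention and a catastrophic-outlier cut of |Δz| > 0.15. Both are common in the photometric-redshift literature but are assumptions here, not necessarily the catalogue's exact definitions, and the function name is hypothetical.

```python
import numpy as np

def photoz_stats(z_phot, z_spec, cat_threshold=0.15):
    """Bias, scatter, and catastrophic-outlier fraction of normalized residuals."""
    zp = np.asarray(z_phot, dtype=float)
    zs = np.asarray(z_spec, dtype=float)
    dz = (zp - zs) / (1.0 + zs)                  # normalized residual
    catastrophic = np.abs(dz) > cat_threshold    # assumed outlier definition
    return {
        "bias": float(dz.mean()),
        "sigma": float(dz.std()),                # population std (ddof=0)
        "outlier_fraction": float(catastrophic.mean()),
        "sigma_clean": float(dz[~catastrophic].std()),   # after removing outliers
    }
```

Reporting `sigma` alongside `sigma_clean` reproduces the kind of before/after comparison given above (0.023 versus 0.017).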
Density-based clustering of small peptide conformations sampled from a molecular dynamics simulation.

PubMed

Kim, Minkyoung; Choi, Seung-Hoon; Kim, Junhyoung; Choi, Kihang; Shin, Jae-Min; Kang, Sang-Kee; Choi, Yun-Jaie; Jung, Dong Hyun

2009-11-01

This study describes the application of a density-based algorithm to clustering small peptide conformations after a molecular dynamics simulation. We propose a clustering method for small peptide conformations that enables adjacent clusters to be separated more clearly on the basis of neighbor density. Neighbor density is the number of neighboring conformations: if a conformation has too few neighboring conformations, it is considered noise or an outlier and is excluded from the list of cluster members. With this approach, we can easily identify clusters whose members are densely crowded in conformational space, and we can avoid misclustering separate clusters that are linked by noise or outliers. Consideration of neighbor density significantly improves the efficiency of clustering small peptide conformations sampled from molecular dynamics simulations and can be used for predicting peptide structures.
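The neighbor-density rule can be sketched in a few lines. This is a minimal, DBSCAN-like illustration, not the authors' algorithm. It assumes each conformation is already encoded as a feature vector compared by Euclidean distance (angular periodicity is ignored), labels points with fewer than `min_neighbors` neighbours within `eps` as noise, and links the remaining dense points into clusters by `eps`-connectivity. The function name and parameters are hypothetical.

```python
import numpy as np

def density_clusters(X, eps, min_neighbors):
    """Return cluster labels; -1 marks low-density points treated as noise/outliers."""
    X = np.asarray(X, dtype=float)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbors = d <= eps
    counts = neighbors.sum(axis=1) - 1          # neighbour density, excluding self
    dense = counts >= min_neighbors
    labels = np.full(len(X), -1)
    current = 0
    for start in np.flatnonzero(dense):
        if labels[start] != -1:
            continue
        # Flood-fill through dense points only, so sparse "bridges" cannot merge clusters.
        stack = [start]
        labels[start] = current
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(neighbors[i] & dense):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels
```

Because only dense points propagate cluster membership, an isolated conformation lying between two groups stays labelled -1 instead of chaining them together.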
A method for separating seismo-ionospheric TEC outliers from heliogeomagnetic disturbances by using nu-SVR

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pattisahusiwa, Asis; Liong, The Houw; Purqon, Acep

Seismo-ionospheric research studies ionospheric disturbances associated with seismic activity. Many previous studies have shown that heliogeomagnetic activity or strong earthquakes can cause disturbances in the ionosphere. However, it is difficult to separate these disturbances according to their sources. In this research, we propose a method to separate these disturbances (outliers) using nu-SVR with worldwide GPS data. TEC data related to the 26 December 2004 Sumatra and the 11 March 2011 Honshu earthquakes were analyzed. After TEC data at several locations around the earthquake epicenters were analyzed and compared with geomagnetic data, the method showed, on average, good results in detecting the source of these outliers. The method is promising for future research.
An anomaly detection approach for the identification of DME patients using spectral domain optical coherence tomography images.

PubMed

Sidibé, Désiré; Sankar, Shrinivasan; Lemaître, Guillaume; Rastgoo, Mojdeh; Massich, Joan; Cheung, Carol Y; Tan, Gavin S W; Milea, Dan; Lamoureux, Ecosse; Wong, Tien Y; Mériaudeau, Fabrice

2017-02-01

This paper proposes a method for automatic classification of spectral domain OCT data for the identification of patients with retinal diseases such as Diabetic Macular Edema (DME). We address this issue as an anomaly detection problem and propose a method that not only allows the classification of the OCT volume, but also allows the identification of the individual diseased B-scans inside the volume. Our approach is based on modeling the appearance of normal OCT images with a Gaussian Mixture Model (GMM) and detecting abnormal OCT images as outliers. The classification of an OCT volume is based on the number of detected outliers. Experimental results with two different datasets show that the proposed method achieves a sensitivity and a specificity of 80% and 93% on the first dataset, and 100% and 80% on the second one. Moreover, the experiments show that the proposed method achieves better classification performance than other recently published works. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
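The decision logic (model normal appearance, flag low-likelihood B-scans, classify the volume by its outlier count) can be sketched with a simplification. The sketch below fits a single multivariate Gaussian, a one-component stand-in for the paper's GMM, to feature vectors of normal B-scans, sets the outlier threshold at a low quantile of the training log-likelihoods, and calls a volume abnormal when at least `min_outliers` B-scans fall below it. How B-scans are turned into feature vectors, the 1% quantile, and the count rule are all assumptions.

```python
import numpy as np

class GaussianNoveltyModel:
    """One-component stand-in for a GMM of normal OCT B-scan features."""

    def fit(self, X_normal, quantile=0.01, reg=1e-6):
        X = np.asarray(X_normal, dtype=float)
        self.mean = X.mean(axis=0)
        cov = np.cov(X, rowvar=False) + reg * np.eye(X.shape[1])  # regularized covariance
        self.prec = np.linalg.inv(cov)
        _, self.logdet = np.linalg.slogdet(cov)
        # Threshold: a low quantile of the log-likelihood of the normal training data.
        self.threshold = np.quantile(self.log_likelihood(X), quantile)
        return self

    def log_likelihood(self, X):
        D = np.asarray(X, dtype=float) - self.mean
        maha = np.einsum("ij,jk,ik->i", D, self.prec, D)   # squared Mahalanobis distance
        k = D.shape[1]
        return -0.5 * (maha + self.logdet + k * np.log(2 * np.pi))

    def is_outlier(self, X):
        return self.log_likelihood(X) < self.threshold

def classify_volume(model, bscan_features, min_outliers=5):
    """Hypothetical rule: a volume is abnormal if enough of its B-scans are outliers."""
    return int(model.is_outlier(bscan_features).sum()) >= min_outliers
```

A full GMM would replace the single Gaussian log-likelihood with the log of a weighted sum of component densities; the thresholding and counting steps stay the same.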
A Gross Error Elimination Method for Point Cloud Data Based on Kd-Tree

NASA Astrophysics Data System (ADS)

Kang, Q.; Huang, G.; Yang, S.

2018-04-01

Point cloud data have become a widely used data source in the field of remote sensing. Key steps in point cloud pre-processing are gross error elimination and quality control. Owing to the large volume of point cloud data, existing gross error elimination methods consume massive amounts of memory and time. This paper employs a new method that constructs a Kd-tree, searches each point's neighbours with a k-nearest neighbor algorithm, and applies an appropriate threshold to judge whether the target point is an outlier. Experimental results show that the proposed algorithm helps remove gross errors from point cloud data while decreasing memory consumption and improving efficiency.
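The build-tree, search-neighbours, apply-threshold pipeline can be sketched with SciPy's k-d tree. This is a minimal sketch, not the paper's implementation. The threshold rule (flag points whose mean k-NN distance exceeds the global mean by more than `n_std` standard deviations) is a common statistical-outlier-removal heuristic assumed here, and the function name and defaults are hypothetical. It requires SciPy.

```python
import numpy as np
from scipy.spatial import cKDTree

def remove_gross_errors(points, k=8, n_std=2.0):
    """Return (kept_points, outlier_mask) using mean k-nearest-neighbour distances."""
    pts = np.asarray(points, dtype=float)
    tree = cKDTree(pts)                         # k-d tree construction
    # Query k + 1 neighbours because each point's nearest neighbour is itself (distance 0).
    dists, _ = tree.query(pts, k=k + 1)
    mean_knn = dists[:, 1:].mean(axis=1)
    threshold = mean_knn.mean() + n_std * mean_knn.std()   # assumed global threshold
    outlier = mean_knn > threshold
    return pts[~outlier], outlier
```

The tree makes each neighbour query roughly logarithmic in the number of points instead of requiring a full pairwise distance matrix, which is where the memory and time savings over brute-force search come from.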
A Robust Method for Stereo Visual Odometry Based on Multiple Euclidean Distance Constraint and RANSAC Algorithm

NASA Astrophysics Data System (ADS)

Zhou, Q.; Tong, X.; Liu, S.; Lu, X.; Liu, S.; Chen, P.; Jin, Y.; Xie, H.

2017-07-01

Visual odometry (VO) is a critical component for planetary robot navigation and safety. It estimates ego-motion from stereo images frame by frame. Feature point extraction and matching is a key step in robotic motion estimation and largely determines its precision and robustness. In this work, we choose Oriented FAST and Rotated BRIEF (ORB) features, considering both accuracy and speed. To improve robustness in challenging environments, e.g., rough terrain or planetary surfaces, this paper presents a robust outlier elimination method based on the Euclidean Distance Constraint (EDC) and the Random Sample Consensus (RANSAC) algorithm. In the matching process, a set of ORB feature points is extracted from the current synchronous left and right images, and a Brute Force (BF) matcher is used to find correspondences between the two images for space intersection. The EDC and RANSAC algorithms are then applied to eliminate mismatches whose distances exceed a predefined threshold. Similarly, when the feature points of the next left image are matched with those of the current left image, EDC and RANSAC are applied iteratively. Even after these steps, some mismatched points may remain in certain cases, so RANSAC is applied a third time to eliminate the effect of those outliers on the estimation of the ego-motion parameters (interior orientation and exterior orientation). The proposed approach has been tested on a real-world vehicle dataset, and the results demonstrate its high robustness.
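The Euclidean distance constraint rests on the fact that a rigid motion preserves distances: if 3-D points P_i and P_j (from stereo intersection at time t) correspond to Q_i and Q_j at time t+1, then |P_i - P_j| should equal |Q_i - Q_j|. The sketch below is a hypothetical illustration of that idea, not the authors' code. It scores each match by the fraction of other matches whose pairwise distance is preserved within `tol` and keeps matches with enough support. The tolerance, support fraction, and function name are assumptions, and RANSAC would follow on the surviving matches.

```python
import numpy as np

def euclidean_distance_constraint(P, Q, tol=0.05, min_support=0.5):
    """Boolean mask of matches consistent with a rigid motion P -> Q.

    P, Q: (n, 3) arrays of matched 3-D points at consecutive epochs.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    dP = np.linalg.norm(P[:, None] - P[None, :], axis=-1)   # pairwise distances at t
    dQ = np.linalg.norm(Q[:, None] - Q[None, :], axis=-1)   # pairwise distances at t+1
    consistent = np.abs(dP - dQ) <= tol
    np.fill_diagonal(consistent, False)                      # ignore self-pairs
    support = consistent.sum(axis=1) / (len(P) - 1)
    return support >= min_support
```

Because the test uses only distances, it needs no motion estimate, which makes it a cheap pre-filter that reduces the outlier ratio RANSAC has to cope with.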
Genomic Changes Associated with Reproductive and Migratory Ecotypes in Sockeye Salmon (Oncorhynchus nerka).

PubMed

Veale, Andrew J; Russello, Michael A

2017-10-01

Mechanisms underlying adaptive evolution can best be explored using paired populations displaying similar phenotypic divergence, illuminating the genomic changes associated with specific life history traits. Here, we used paired migratory [anadromous vs. resident (kokanee)] and reproductive [shore- vs. stream-spawning] ecotypes of sockeye salmon (Oncorhynchus nerka) sampled from seven lakes and two rivers spanning three catchments (Columbia, Fraser, and Skeena) in British Columbia, Canada, to investigate the patterns and processes underlying their divergence. Restriction-site associated DNA sequencing was used to genotype these samples at 7,347 single nucleotide polymorphisms, 334 of which were identified as outlier loci and candidates for divergent selection within at least one ecotype comparison. Sixty-eight of these outliers were present in two or more comparisons, with 33 detected across multiple catchments. Of particular note, one locus was detected as the most significant outlier between shore- and stream-spawning ecotypes in multiple comparisons and across catchments (Columbia, Fraser, and Snake). We also detected several genomic islands of divergence, some shared among comparisons, potentially showing linked signals of differential selection. The single nucleotide polymorphisms and genomic regions identified in our study offer a range of mechanistic hypotheses associated with the genetic basis of O. nerka life history variation and provide novel tools for informing fisheries management. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Unsupervised universal steganalyzer for high-dimensional steganalytic features

NASA Astrophysics Data System (ADS)

Hou, Xiaodan; Zhang, Tao

2016-11-01

Research on developing steganalytic features has been highly successful. These features are extremely powerful when applied to supervised binary classification problems. However, they are incompatible with unsupervised universal steganalysis, because an unsupervised method cannot distinguish embedding distortion from the varying levels of noise caused by cover variation. This study attempts to alleviate the problem by introducing similarity retrieval of image statistical properties (SRISP), with the specific aim of mitigating the effect of cover variation on existing steganalytic features. First, cover images whose statistical properties are similar to those of a given test image are retrieved from a cover database to establish an aided sample set. Then, unsupervised outlier detection is performed on a test set composed of the given test image and its aided sample set to determine the type (cover or stego) of the test image. The proposed framework, called SRISP-aided unsupervised outlier detection, requires no training and therefore does not suffer from model mismatch. Compared with prior unsupervised outlier detectors that do not consider SRISP, the proposed framework not only retains universality but also exhibits superior performance when applied to high-dimensional steganalytic features.
Scalable Machine Learning for Massive Astronomical Datasets

NASA Astrophysics Data System (ADS)

Ball, Nicholas M.; Canadian Astronomy Data Centre

2014-01-01

We demonstrate the ability to perform data mining and machine learning operations on a catalog of half a billion astronomical objects. This is the result of combining robust, highly accurate machine learning algorithms with linear scalability, which makes applying these algorithms to massive astronomical datasets tractable. We demonstrate the core algorithms: kernel density estimation, K-means clustering, linear regression, nearest neighbors, random forest and gradient-boosted decision trees, singular value decomposition, support vector machines, and the two-point correlation function. Each of these is relevant to astronomical applications such as finding novel astrophysical objects, characterizing artifacts in data, object classification (including of rare objects), object distances, finding the important features describing objects, density estimation of distributions, probabilistic quantities, and exploring the unknown structure of new data. The software, Skytree Server, runs on any UNIX-based machine, on virtual machines, or on cloud-based and distributed systems including Hadoop. We have integrated it into the cloud computing system of the Canadian Astronomy Data Centre, the Canadian Advanced Network for Astronomical Research (CANFAR), creating the world's first cloud computing data mining system for astronomy. We present results showing the scaling of each of our major algorithms on large astronomical datasets, including the full 470,992,970 objects of the 2 Micron All-Sky Survey (2MASS) Point Source Catalog. We demonstrate the ability to find outliers in the full 2MASS dataset using multiple methods, e.g., nearest neighbors and the local outlier factor. 2MASS is used as a proof-of-concept dataset because of its convenience and availability. These results are of interest to any astronomical project with large and/or complex datasets that wishes to extract the full scientific value from its data.
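The local outlier factor (LOF) compares each object's local density with that of its neighbours: values near 1 mean "as dense as the neighbourhood", while values well above 1 mark outliers. The sketch below is a minimal brute-force NumPy version for small arrays, written only to show the definition. It is not Skytree's scalable implementation, ignores distance ties at the k-th neighbour, and assumes no duplicate points (which would give zero reachability distances).

```python
import numpy as np

def local_outlier_factor(X, k=5):
    """Brute-force LOF scores for a small (n, d) array; larger means more outlying."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)                     # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]               # indices of the k nearest neighbours
    k_dist = d[np.arange(n), nn[:, -1]]             # distance to the k-th neighbour
    # Reachability distance of p from neighbour o: max(k_dist(o), d(p, o)).
    reach = np.maximum(k_dist[nn], d[np.arange(n)[:, None], nn])
    lrd = 1.0 / reach.mean(axis=1)                  # local reachability density
    return lrd[nn].mean(axis=1) / lrd               # neighbours' density / own density
```

The quadratic distance matrix limits this version to small samples; at catalogue scale, the neighbour search is the part that needs a tree-based or distributed implementation.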
SweeD: likelihood-based detection of selective sweeps in thousands of genomes.

PubMed

Pavlidis, Pavlos; Živkovic, Daniel; Stamatakis, Alexandros; Alachiotis, Nikolaos

2013-09-01

The advent of modern DNA sequencing technology is the driving force in obtaining complete intra-specific genomes that can be used to detect loci that have been subject to positive selection in the recent past. Based on selective sweep theory, beneficial loci can be detected by examining the single nucleotide polymorphism patterns in intraspecific genome alignments. In the last decade, a plethora of algorithms for identifying selective sweeps have been developed. However, the majority of these algorithms have not been designed for analyzing whole-genome data. We present SweeD (Sweep Detector), an open-source tool for the rapid detection of selective sweeps in whole genomes. It analyzes site frequency spectra and represents a substantial extension of the widely used SweepFinder program. The sequential version of SweeD is up to 22 times faster than SweepFinder and, more importantly, is able to analyze thousands of sequences. We also provide a parallel implementation of SweeD for multi-core processors. Furthermore, we implemented a checkpointing mechanism that allows SweeD to be deployed on cluster systems with queue execution time restrictions and allows long-running analyses to be resumed after processor failures. In addition, the user can specify various demographic models via the command line to calculate their theoretically expected site frequency spectra. Therefore, in contrast to SweepFinder, the neutral site frequencies can optionally be calculated directly from a given demographic model. We show that increasing the sample size results in more precise detection of positive selection. Thus, the ability of SweeD to analyze substantially larger sample sizes leads to more accurate sweep detection. We validate SweeD via simulations and by scanning the first chromosome from the 1000 Genomes Project for selective sweeps. We compare SweeD results with results from a linkage-disequilibrium-based approach and identify common outliers.
Multiple-Beam Detection of Fast Transient Radio Sources

NASA Technical Reports Server (NTRS)

Thompson, David R.; Wagstaff, Kiri L.; Majid, Walid A.

2011-01-01

A method has been designed for using multiple independent stations to discriminate fast transient radio sources from local anomalies, such as antenna noise or radio frequency interference (RFI). This can improve the sensitivity of incoherent detection for geographically separated stations such as the Very Long Baseline Array (VLBA), the future Square Kilometre Array (SKA), or any other coincident observations by multiple separated receivers. The transients are short, broadband pulses of radio energy, often just a few milliseconds long, emitted by a variety of exotic astronomical phenomena. They generally represent rare, high-energy events, which makes them of great scientific value. A family of algorithms has been developed for RFI-robust adaptive detection of transients using multiple stations. The technique exploits the fact that the separated stations constitute statistically independent samples of the target, which can be used to adaptively ignore RFI events for superior sensitivity. If the antenna signals are independent and identically distributed (IID), then RFI events are simply outlier data points that can be removed through robust estimation, such as a trimmed or Winsorized estimator. The "trimmed" estimator considered here excises the strongest n signals from the list of short-beamed intensities. Because local RFI is independent at each antenna, this interference is unlikely to occur at many antennas on the same time step. Trimming the strongest signals provides robustness to RFI that can theoretically outperform even the detection performance of the same number of antennas at a single site. This algorithm requires sorting the signals at each time step and dispersion measure, an operation that is computationally tractable for existing array sizes. An alternative uses the various stations to form an ensemble estimate of the conditional density function (CDF) evaluated at each time step.
Both methods outperform standard detection strategies on a test sequence of VLBA data, and both are efficient enough for deployment in real-time, online transient detection applications.
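The trimmed and Winsorized estimators described above can be sketched for a single dispersion measure. This minimal sketch assumes the input is an array of dedispersed intensities with one row per station and one column per time step. At each step it sorts the station values and either drops the `n` strongest (trimmed) or replaces them with the next-largest value (Winsorized) before averaging. The function names and array layout are assumptions for illustration.

```python
import numpy as np

def trimmed_beam_statistic(intensities, n_trim=1):
    """Mean over stations after discarding the n_trim strongest values per time step.

    intensities: array of shape (n_stations, n_steps).
    """
    x = np.sort(np.asarray(intensities, dtype=float), axis=0)   # ascending per step
    return x[: x.shape[0] - n_trim].mean(axis=0)

def winsorized_beam_statistic(intensities, n_clip=1):
    """Mean over stations after clipping the n_clip strongest values to the next largest."""
    x = np.sort(np.asarray(intensities, dtype=float), axis=0)
    m = x.shape[0]
    x[m - n_clip:] = x[m - n_clip - 1]           # requires n_clip < n_stations
    return x.mean(axis=0)
```

With four stations, a 100-unit RFI spike at one station leaves the trimmed statistic at the baseline of 1.0, whereas a plain mean would jump to 25.75. A genuine transient seen by every station (5.0 at all four) still passes through unchanged.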
Evaluation of Two Outlier-Detection-Based Methods for Detecting Tissue-Selective Genes from Microarray Data

PubMed Central

Kadota, Koji; Konishi, Tomokazu; Shimizu, Kentaro

2007-01-01

Large-scale expression profiling using DNA microarrays enables identification of tissue-selective genes for which expression is considerably higher and/or lower in some tissues than in others. Among numerous possible methods, only two outlier-detection-based methods (an AIC-based method and Sprent’s non-parametric method) can treat various types of selective patterns equally, but they produce substantially different results. We investigated the performance of these two methods for different parameter settings and for a reduced number of samples, focusing on their ability to detect selective expression patterns robustly. We applied them to public microarray data collected from 36 normal human tissue samples and analyzed the effects of both changing the parameter settings and reducing the number of samples. The AIC-based method was more robust in both cases. The findings confirm that the use of the AIC-based method in the recently proposed ROKU method for detecting tissue-selective expression patterns is correct and that Sprent’s method is not suitable for ROKU. PMID:19936074
Real-time detection of organic contamination events in water distribution systems by principal components analysis of ultraviolet spectral data.

PubMed

Zhang, Jian; Hou, Dibo; Wang, Ke; Huang, Pingjie; Zhang, Guangxin; Loáiciga, Hugo

2017-05-01

The detection of organic contaminants in water distribution systems is essential to protect public health from potentially harmful compounds resulting from accidental spills or intentional releases. Existing methods for detecting organic contaminants are based on quantitative analyses such as chemical testing and gas/liquid chromatography, which are time- and reagent-consuming and involve costly maintenance. This study proposes a novel procedure based on the discrete wavelet transform and principal component analysis for detecting organic contamination events from ultraviolet spectral data. First, the spectrum of each observation is transformed using a discrete wavelet transform with a coiflet mother wavelet to capture abrupt changes along the wavelength. Principal component analysis is then employed to approximate the spectra based on the captured and fused features. Hotelling's T² statistic is calculated, and its significance is used to detect outliers. A contamination event alarm is triggered by sequential Bayesian analysis when outliers appear continuously over several observations. The effectiveness of the proposed procedure is tested online using a pilot-scale setup and experimental data.
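The PCA and Hotelling's T² stage can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' procedure. The wavelet step is omitted (rows are taken to be already-transformed spectra), the control limit is an empirical 99th percentile of training T² rather than an F-distribution limit, and the sequential Bayesian alarm is replaced by a hypothetical "n consecutive exceedances" rule.

```python
import numpy as np

class PCAHotelling:
    """PCA model of normal spectra with a Hotelling's T^2 statistic on the scores."""

    def fit(self, X_train, n_components=2):
        X = np.asarray(X_train, dtype=float)
        self.mean = X.mean(axis=0)
        _, s, Vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.P = Vt[:n_components].T                       # loadings (d, n_components)
        self.var = s[:n_components] ** 2 / (len(X) - 1)    # variance of each score
        return self

    def t2(self, X):
        T = (np.asarray(X, dtype=float) - self.mean) @ self.P   # PCA scores
        return np.sum(T ** 2 / self.var, axis=1)

def contamination_alarm(t2_values, limit, n_consecutive=3):
    """Hypothetical stand-in for the sequential Bayesian step: return the index at which
    n_consecutive successive observations have exceeded the limit, else None."""
    run = 0
    for i, v in enumerate(t2_values):
        run = run + 1 if v > limit else 0
        if run >= n_consecutive:
            return i
    return None
```

Requiring several consecutive exceedances trades a short detection delay for far fewer false alarms than alarming on any single outlying spectrum.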
Portraying the Expression Landscapes of B-Cell Lymphoma-Intuitive Detection of Outlier Samples and of Molecular Subtypes

PubMed Central

Hopp, Lydia; Lembcke, Kathrin; Binder, Hans; Wirth, Henry

2013-01-01

We present an analytic framework based on Self-Organizing Map (SOM) machine learning to study large-scale patient data sets. The potency of the approach is demonstrated in a case study using gene expression data from more than 200 patients with mature aggressive B-cell lymphoma. The method portrays each sample with individual resolution, characterizes the subtypes, disentangles the expression patterns into distinct modules, extracts their functional context using enrichment techniques, and enables investigation of the similarity relations between the samples. The method also allows outliers caused by contamination to be detected and corrected. Based on our analysis, we propose a refined classification of B-cell lymphoma into four molecular subtypes that are characterized by differential functional and clinical characteristics. PMID:24833231
Just-in-Time Correntropy Soft Sensor with Noisy Data for Industrial Silicon Content Prediction.

PubMed

Chen, Kun; Liang, Yu; Gao, Zengliang; Liu, Yi

2017-08-08

Development of accurate data-driven quality prediction models for industrial blast furnaces faces several challenges, mainly because the collected data are nonlinear, non-Gaussian, and unevenly distributed. In this work, a just-in-time correntropy-based local soft sensing approach is presented to predict the silicon content. Instead of a cumbersome separate outlier-detection step, a correntropy support vector regression (CSVR) modeling framework is proposed to handle soft sensor development and outlier detection simultaneously. Moreover, with a continuously updated database and a clustering strategy, a just-in-time CSVR (JCSVR) method is developed. Consequently, JCSVR achieves more accurate prediction and efficient implementation. The better prediction performance of JCSVR compared with traditional soft sensors is validated on online silicon content prediction.
Just-in-Time Correntropy Soft Sensor with Noisy Data for Industrial Silicon Content Prediction

PubMed Central

Chen, Kun; Liang, Yu; Gao, Zengliang; Liu, Yi

2017-01-01

Development of accurate data-driven quality prediction models for industrial blast furnaces faces several challenges, mainly because the collected data are nonlinear, non-Gaussian, and unevenly distributed. In this work, a just-in-time correntropy-based local soft sensing approach is presented to predict the silicon content. Instead of a cumbersome separate outlier-detection step, a correntropy support vector regression (CSVR) modeling framework is proposed to handle soft sensor development and outlier detection simultaneously. Moreover, with a continuously updated database and a clustering strategy, a just-in-time CSVR (JCSVR) method is developed. Consequently, JCSVR achieves more accurate prediction and efficient implementation. The better prediction performance of JCSVR compared with traditional soft sensors is validated on online silicon content prediction. PMID:28786957
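Why correntropy lets a regression model tolerate outliers without a separate detection step can be illustrated with the standard Gaussian-kernel correntropy and its induced loss. This is a generic sketch, not the CSVR formulation itself; the kernel width `sigma` is an assumed parameter.

```python
import numpy as np

def correntropy(y_true, y_pred, sigma=1.0):
    """Sample correntropy: mean Gaussian-kernel similarity of the residuals."""
    e = np.asarray(y_true, float) - np.asarray(y_pred, float)
    return float(np.mean(np.exp(-e ** 2 / (2.0 * sigma ** 2))))

def correntropy_loss(y_true, y_pred, sigma=1.0):
    """Correntropy-induced loss in [0, 1): each residual contributes at most 1/n."""
    return 1.0 - correntropy(y_true, y_pred, sigma)

y = np.zeros(4)
y_hat = np.array([0.0, 0.0, 0.0, 100.0])  # one gross outlier
mse = float(np.mean((y - y_hat) ** 2))     # dominated by the single outlier
closs = correntropy_loss(y, y_hat)         # bounded: the outlier adds at most 1/4
```

Because the kernel saturates, a gross residual stops influencing the fit, whereas its squared error grows without bound.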

Genetic variation of loci potentially under selection confounds species-genetic diversity correlations in a fragmented habitat.

PubMed

Bertin, Angeline; Gouin, Nicolas; Baumel, Alex; Gianoli, Ernesto; Serratosa, Juan; Osorio, Rodomiro; Manel, Stephanie

2017-01-01

Positive species-genetic diversity correlations (SGDCs) are often thought to result from the parallel influence of neutral processes on genetic and species diversity. Yet, confounding effects of non-neutral mechanisms have not been explored. Here, we investigate the impact of non-neutral genetic diversity on SGDCs in high Andean wetlands. We compare correlations between plant species diversity and genetic diversity (GD) calculated with and without loci potentially under selection (outlier loci). The study system includes 2188 specimens from five species (three common aquatic macroinvertebrate and two dominant plant species) that were genotyped for 396 amplified fragment length polymorphism loci. We also appraise the importance of neutral processes on SGDCs by investigating the influence of habitat fragmentation features. Significant positive SGDCs were detected for all five species (mean SGDC = 0.52 ± 0.05). While only a few outlier loci were detected in each species, they resulted in significant decreases in GD and in SGDCs. This supports the hypothesis that neutral processes drive species-genetic diversity relationships in high Andean wetlands. Unexpectedly, in two species, the effects of habitat fragmentation characteristics on GD increased when outlier loci were included. Overall, our results reveal pitfalls in using habitat features to infer processes driving SGDCs and show that a few loci potentially under selection are enough to cause a significant downward bias in SGDC. Investigating confounding effects of outlier loci thus represents a useful approach to demonstrate the contribution of neutral processes to species-genetic diversity relationships. © 2016 John Wiley & Sons Ltd.
Genome scanning for detecting adaptive genes along environmental gradients in the Japanese conifer, Cryptomeria japonica.

PubMed

Tsumura, Y; Uchiyama, K; Moriguchi, Y; Ueno, S; Ihara-Ujino, T

2012-12-01

Local adaptation is important in evolutionary processes and speciation. We used multiple tests to identify several candidate genes that may be involved in local adaptation from 1026 loci in 14 natural populations of Cryptomeria japonica, the most economically important forestry tree in Japan. We also studied the relationships between genotypes and environmental variables to obtain information on the selective pressures acting on individual populations. Outlier loci were mapped onto a linkage map, and the positions of loci associated with specific environmental variables are considered. The outlier loci were not randomly distributed on the linkage map; linkage group 11 was identified as a genomic island of divergence. Three loci in this region were also associated with environmental variables such as mean annual temperature, daily maximum temperature, maximum snow depth, and so on. Outlier loci identified with high significance levels will be essential for conservation purposes and for future work on molecular breeding.
An Automated Algorithm to Screen Massive Training Samples for a Global Impervious Surface Classification

NASA Technical Reports Server (NTRS)

Tan, Bin; Brown de Colstoun, Eric; Wolfe, Robert E.; Tilton, James C.; Huang, Chengquan; Smith, Sarah E.

2012-01-01

An algorithm is developed to automatically screen outliers from the massive training samples of the Global Land Survey - Imperviousness Mapping Project (GLS-IMP). GLS-IMP aims to produce a global 30 m spatial resolution impervious cover data set for the years 2000 and 2010 based on the Landsat Global Land Survey (GLS) data set. This unprecedented high-resolution impervious cover data set is not only significant for urbanization studies but is also needed for global carbon, hydrology, and energy balance research. A supervised classification method, regression tree, is applied in this project. A set of accurate training samples is the key to supervised classification. Here we developed global-scale training samples from roughly 1 m resolution satellite data (QuickBird and WorldView-2) and then aggregated the fine-resolution impervious cover map to 30 m resolution. To improve classification accuracy, the training samples should be screened before being used to train the regression tree. Manually screening 30 m resolution training samples collected globally is impossible: in Europe alone there are 174 training sites, ranging in size from 4.5 km by 4.5 km to 8.1 km by 3.6 km, and the number of training samples exceeds six million. Therefore, we developed this automated, statistics-based algorithm to screen the training samples at two levels: site and scene. At the site level, all training samples are divided into 10 groups according to the percentage of impervious surface within a sample pixel; the samples falling within each 10% interval form one group. For each group, both univariate and multivariate outliers are detected and removed. The screening process then escalates to the scene level, where a similar process with a looser threshold is applied to account for possible variance due to site differences. We do not screen across scenes, because scenes may differ due to factors such as phenology, sun-view geometry, and atmospheric conditions rather than actual land-cover differences. Finally, we will compare the classification results from screened and unscreened training samples to assess the improvement achieved by cleaning up the training samples.
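The site-level step (ten 10% impervious-cover groups, outliers removed within each group) might look like the sketch below. The abstract does not name the tests or thresholds, so the univariate z-score cut (`z_max`) and the optional squared-Mahalanobis cut (`d2_max`) are assumptions; the scene-level pass could reuse the same function with looser thresholds.

```python
import numpy as np

def screen_site(impervious_pct, features, z_max=3.0, d2_max=None):
    """Return a boolean mask of training samples to keep (hypothetical thresholds).
    Samples are grouped into ten 10%-wide bins of impervious percentage; within
    each bin, a sample is dropped if any feature has |z| > z_max (univariate) or,
    when d2_max is given, its squared Mahalanobis distance exceeds d2_max."""
    impervious_pct = np.asarray(impervious_pct, dtype=float)
    features = np.asarray(features, dtype=float)
    keep = np.ones(len(impervious_pct), dtype=bool)
    # bin 0 = [0, 10), ..., bin 9 = [90, 100]; exactly 100% goes in the last bin
    bins = np.minimum((impervious_pct // 10).astype(int), 9)
    for b in range(10):
        idx = np.flatnonzero(bins == b)
        if len(idx) < features.shape[1] + 2:
            continue  # too few samples to estimate group statistics
        X = features[idx]
        mu = X.mean(axis=0)
        sd = X.std(axis=0, ddof=1)
        z = np.abs((X - mu) / np.where(sd > 0, sd, 1.0))
        bad = (z > z_max).any(axis=1)
        if d2_max is not None:
            cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))
            diff = X - mu
            d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
            bad |= d2 > d2_max
        keep[idx[bad]] = False
    return keep
```

This is a single pass with statistics that still include the outliers; an iterative or robust-estimator version would reduce masking when a group contains many bad samples.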
Flexible methods for segmentation evaluation: results from CT-based luggage screening.

PubMed

Karimi, Seemeen; Jiang, Xiaoqian; Cosman, Pamela; Martz, Harry

2014-01-01

Imaging systems used in aviation security include segmentation algorithms in an automatic threat recognition pipeline. The segmentation algorithms evolve in response to emerging threats and changing performance requirements. Analysis of segmentation algorithms' behavior, including the nature of errors and feature recovery, facilitates their development. However, evaluation methods from the literature provide limited characterization of segmentation algorithms. Our objective was to develop segmentation evaluation methods that measure systematic errors such as oversegmentation and undersegmentation, outliers, and overall errors. The methods must also measure feature recovery and allow us to prioritize segments. We developed two complementary evaluation methods using statistical techniques and information theory. We also created a semi-automatic method to define ground truth from 3D images. We applied our methods to evaluate five segmentation algorithms developed for CT luggage screening. We validated our methods with synthetic problems and an observer evaluation. Both methods selected the same best segmentation algorithm, and human evaluation confirmed the findings. The measurement of systematic errors and prioritization helped in understanding the behavior of each segmentation algorithm. Our evaluation methods allow us to measure and explain the accuracy of segmentation algorithms.
Robust feature matching via support-line voting and affine-invariant ratios

NASA Astrophysics Data System (ADS)

Li, Jiayuan; Hu, Qingwu; Ai, Mingyao; Zhong, Ruofei

2017-10-01

Robust image matching is crucial for many applications of remote sensing and photogrammetry, such as image fusion, image registration, and change detection. In this paper, we propose a robust feature matching method based on support-line voting and affine-invariant ratios. We first use popular feature matching algorithms, such as SIFT, to obtain a set of initial matches. A support-line descriptor based on multiple adaptive binning gradient histograms is subsequently applied in the support-line voting stage to filter outliers. In addition, we use affine-invariant ratios computed by a two-line structure to refine the matching results and estimate the local affine transformation. The local affine model is more robust to distortions caused by elevation differences than the global affine transformation, especially for high-resolution remote sensing images and UAV images. Thus, the proposed method is suitable for both rigid and non-rigid image matching problems. Finally, we extract as many high-precision correspondences as possible based on the local affine extension and build a grid-wise affine model for remote sensing image registration. We compare the proposed method with six state-of-the-art algorithms on several data sets and show that our method significantly outperforms the other methods. The proposed method achieves 94.46% average precision on 15 challenging remote sensing image pairs, while the second-best method, RANSAC, only achieves 70.3%. In addition, the number of detected correct matches of the proposed method is approximately four times the number of initial SIFT matches.
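The abstract does not define the exact ratios built from its two-line structure, so the sketch below only illustrates the underlying principle: an affine map x ↦ Ax + t multiplies every triangle area by det(A), so a ratio of two areas is unchanged. The point choice is illustrative, not the authors' construction.

```python
import numpy as np

def triangle_area(p, q, r):
    """Signed area of the 2-D triangle (p, q, r)."""
    return 0.5 * ((q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]))

def area_ratio(p1, p2, p3, p4):
    """Ratio of the areas of triangles (p1, p2, p3) and (p1, p2, p4).
    Under any non-degenerate affine map both areas scale by det(A),
    so the ratio is affine-invariant."""
    return triangle_area(p1, p2, p3) / triangle_area(p1, p2, p4)
```

A candidate match whose ratio differs strongly from the ratio in the other image is inconsistent with a local affine model and can be treated as an outlier.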
Density-based penalty parameter optimization on C-SVM.

PubMed

Liu, Yun; Lian, Jie; Bartolacci, Michael R; Zeng, Qing-An

2014-01-01

The support vector machine (SVM) is one of the most widely used approaches for data classification and regression. SVM maximizes the margin between the positive and negative support vectors and, in doing so, neglects remote instances far from the SVM decision boundary. To prevent outliers arising from system errors from shifting the SVM decision boundary, C-SVM was introduced to reduce their influence. Traditional C-SVM uses a uniform parameter C for both positive and negative instances; however, because the two classes can differ in size and distribution, positive and negative instances should be given different weights for the penalty parameter of the error terms. Therefore, in this paper, we propose density-based penalty parameter optimization of C-SVM. The experimental results indicate that our proposed algorithm achieves outstanding performance in terms of both precision and recall.
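For reference, the standard soft-margin C-SVM with separate penalties for the two classes can be written as follows; setting $C_{+} = C_{-} = C$ recovers the uniform-C formulation. How the paper derives the class weights from data density is not given in the abstract and is not reproduced here.

```latex
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\;
  \frac{1}{2}\lVert\mathbf{w}\rVert^{2}
  + C_{+}\sum_{i:\,y_i=+1}\xi_i
  + C_{-}\sum_{i:\,y_i=-1}\xi_i
\quad\text{s.t.}\quad
  y_i\left(\mathbf{w}^{\top}\mathbf{x}_i + b\right) \ge 1-\xi_i,
  \qquad \xi_i \ge 0 .
```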
Robust dynamic 3-D measurements with motion-compensated phase-shifting profilometry

NASA Astrophysics Data System (ADS)

Feng, Shijie; Zuo, Chao; Tao, Tianyang; Hu, Yan; Zhang, Minliang; Chen, Qian; Gu, Guohua

2018-04-01

Phase-shifting profilometry (PSP) is a widely used approach to high-accuracy three-dimensional shape measurements. However, when it comes to moving objects, phase errors induced by the movement often result in severe artifacts even when a high-speed camera is used. From our observations, there are three kinds of motion artifacts: motion ripples, motion-induced phase unwrapping errors, and motion outliers. We present a novel motion-compensated PSP to remove the artifacts for dynamic measurements of rigid objects. The phase error of motion ripples is analyzed for the N-step phase-shifting algorithm and is compensated using the statistical nature of the fringes. The phase unwrapping errors are corrected by exploiting adjacent reliable pixels, and the outliers are removed by comparing the original phase map with a smoothed phase map. Compared with the three-step PSP, our method can improve the accuracy by more than 95% for objects in motion.
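The static N-step phase computation and the "compare with a smoothed phase map" outlier step can be sketched as below. The motion-ripple compensation is not reproduced. The 3×3 median filter, the `threshold` in radians, and the fringe model I_n = A + B·cos(φ + 2πn/N) are assumptions for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def wrapped_phase(frames):
    """N-step phase shifting (N >= 3) for frames[n] = A + B*cos(phi + 2*pi*n/N).
    Returns phi wrapped to (-pi, pi]."""
    frames = np.asarray(frames, dtype=float)
    N = frames.shape[0]
    delta = 2 * np.pi * np.arange(N) / N
    shape = (N,) + (1,) * (frames.ndim - 1)  # broadcast shifts over pixel axes
    s = np.sum(frames * np.sin(delta).reshape(shape), axis=0)  # = -B*sin(phi)*N/2
    c = np.sum(frames * np.cos(delta).reshape(shape), axis=0)  # =  B*cos(phi)*N/2
    return np.arctan2(-s, c)

def remove_phase_outliers(phase, threshold, size=3):
    """Set to NaN the pixels whose phase departs from a median-smoothed copy
    of the map by more than `threshold` radians."""
    pad = size // 2
    padded = np.pad(phase, pad, mode="edge")
    smoothed = np.median(sliding_window_view(padded, (size, size)), axis=(-2, -1))
    cleaned = phase.astype(float).copy()
    cleaned[np.abs(phase - smoothed) > threshold] = np.nan
    return cleaned
```

The median filter is used because a single spike barely moves the median of its neighbourhood, so the spike stands out clearly in the difference map.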
Community trees: Identifying codiversification in the Páramo dipteran community.

PubMed

Carstens, Bryan C; Gruenstaeudl, Michael; Reid, Noah M

2016-05-01

Groups of codistributed species that responded in a concerted manner to environmental events are expected to share patterns of evolutionary diversification. However, the identification of such groups has largely been based on qualitative, post hoc analyses. We develop here two methods (posterior predictive simulation [PPS], Kuhner-Felsenstein [K-F] analysis of variance [ANOVA]) for the analysis of codistributed species that, given a group of species with a shared pattern of diversification, allow empiricists to identify those taxa that do not codiversify (i.e., "outlier" species). The identification of outlier species makes it possible to jointly estimate the evolutionary history of co-diversifying taxa. To evaluate the approaches presented here, we collected data from Páramo dipterans, identified outlier species, and estimated a "community tree" from species that are identified as having codiversified. Our results demonstrate that dipteran communities from different Páramo habitats in the same mountain range are more closely related than communities in other ranges. We also conduct simulation testing to evaluate this approach. Results suggest that our approach provides a useful addition to comparative phylogeographic methods, while identifying aspects of the analysis that require careful interpretation. In particular, both the PPS and K-F ANOVA perform acceptably when there are one or two outlier species, but less so as the number of outliers increases. This is likely a function of the corresponding degradation of the signal of community divergence; without a strong signal from a codiversifying community, there is no dominant pattern from which to detect an outlier species. For this reason, both the magnitude of K-F distance distribution and outside knowledge about the phylogeographic history of each putative member of the community should be considered when interpreting the results. © 2016 The Author(s). Evolution © 2016 The Society for the Study of Evolution.
Power enhancement via multivariate outlier testing with gene expression arrays.

PubMed

Asare, Adam L; Gao, Zhong; Carey, Vincent J; Wang, Richard; Seyfert-Margolis, Vicki

2009-01-01

As the use of microarrays in human studies continues to increase, stringent quality assurance is necessary to ensure accurate experimental interpretation. We present a formal approach for microarray quality assessment that is based on dimension reduction of established measures of signal and noise components of expression followed by parametric multivariate outlier testing. We applied our approach to several data resources. First, as a negative control, we found that the Affymetrix and Illumina contributions to MAQC data were free from outliers at a nominal outlier flagging rate of α = 0.01. Second, we created a tunable framework for artificially corrupting intensity data from the Affymetrix Latin Square spike-in experiment to allow investigation of the sensitivity and specificity of quality assurance (QA) criteria. Third, we applied the procedure to 507 Affymetrix microarray GeneChips processed with RNA from human peripheral blood samples. We show that exclusion of arrays by this approach substantially increases inferential power, or the ability to detect differential expression, in large clinical studies. Availability: http://bioconductor.org/packages/2.3/bioc/html/arrayMvout.html and http://bioconductor.org/packages/2.3/bioc/html/affyContam.html (affyContam credentials: readonly/readonly).
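A parametric multivariate outlier test of the kind described can be sketched with squared Mahalanobis distances compared against a chi-square cutoff. This sketch assumes the QA measures have already been reduced to two dimensions, so the chi-square(2) upper-α quantile has the closed form −2·ln α. It ignores estimation error in the mean and covariance and uses classical (non-robust) estimates, which can mask clusters of bad arrays; the package's actual test is not reproduced.

```python
import math
import numpy as np

def flag_multivariate_outliers(Q, alpha=0.01):
    """Q: (n_arrays, 2) QA summaries already reduced to two dimensions (assumed).
    Flags rows whose squared Mahalanobis distance exceeds the chi-square(2)
    upper-alpha quantile, -2*ln(alpha)."""
    Q = np.asarray(Q, dtype=float)
    if Q.shape[1] != 2:
        raise ValueError("this sketch assumes exactly two reduced dimensions")
    diff = Q - Q.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(Q, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    cutoff = -2.0 * math.log(alpha)
    return d2 > cutoff, d2, cutoff
```

With α = 0.01 the cutoff is about 9.21. Arrays beyond it would be candidates for exclusion before differential-expression testing.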
The Space-Time Variation of Global Crop Yields, Detecting Simultaneous Outliers and Identifying the Teleconnections with Climatic Patterns

NASA Astrophysics Data System (ADS)

Najafi, E.; Devineni, N.; Pal, I.; Khanbilvardi, R.

2017-12-01

An understanding of the climate factors that influence the space-time variability of crop yields is important for food security and can help us predict global food availability. In this study, we address how the crop yield trends of countries around the world were related to each other over the last several decades and which climatic variables triggered simultaneous high/low crop yields across the world. Robust Principal Component Analysis (rPCA) is used to identify the primary modes of variation in wheat, maize, sorghum, rice, soybean, and barley yields. Relations between these modes of variability and important climatic variables, especially anomalous sea surface temperature (SSTa), are examined from 1964 to 2010. rPCA is also used to identify simultaneous outliers in each year, i.e. systematic high/low crop yields across the globe. The results demonstrate the spatiotemporal patterns of these crop yields, the climate-related events that caused them, and the connection of outliers with weather extremes. We find that, among climatic variables, SST has had the greatest impact on simultaneous crop yield variability and yield outliers in many countries. An understanding of this phenomenon can benefit global crop trade networks.
Genomic signatures of positive selection in humans and the limits of outlier approaches.

PubMed

Kelley, Joanna L; Madeoy, Jennifer; Calhoun, John C; Swanson, Willie; Akey, Joshua M

2006-08-01

Identifying regions of the human genome that have been targets of positive selection will provide important insights into recent human evolutionary history and may facilitate the search for complex disease genes. However, the confounding effects of population demographic history and selection on patterns of genetic variation complicate inferences of selection when a small number of loci are studied. To this end, identifying outlier loci from empirical genome-wide distributions of genetic variation is a promising strategy to detect targets of selection. Here, we evaluate the power and efficiency of a simple outlier approach and describe a genome-wide scan for positive selection using a dense catalog of 1.58 million SNPs that were genotyped in three human populations. In total, we analyzed 14,589 genes, 385 of which possess patterns of genetic variation consistent with the hypothesis of positive selection. Furthermore, several extended genomic regions were found, spanning >500 kb, that contained multiple contiguous candidate selection genes. More generally, these data provide important practical insights into the limits of outlier approaches in genome-wide scans for selection, provide strong candidate selection genes to study in greater detail, and may have important implications for disease related research.
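The basic "outlier approach" ranks each gene's summary statistic against the genome-wide empirical distribution and flags the extreme tail. In this sketch the statistic and the 99% cutoff are placeholders. The abstract does not say which summary of genetic variation or which tail threshold the authors used.

```python
import numpy as np

def empirical_outliers(stat_by_gene, quantile=0.99):
    """Return the genes whose statistic exceeds the genome-wide empirical
    `quantile` (hypothetical cutoff), together with the cutoff value."""
    names = list(stat_by_gene)
    values = np.array([stat_by_gene[g] for g in names], dtype=float)
    cutoff = np.quantile(values, quantile)
    return [g for g, v in zip(names, values) if v > cutoff], cutoff
```

By construction a fixed fraction of genes is always flagged, which is one reason the authors stress the limits of outlier approaches: tail membership alone does not distinguish selection from demographic effects.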
Topics in Statistical Calibration

DTIC Science & Technology

2014-03-27

on a parametric bootstrap where, instead of sampling directly from the residuals, samples are drawn from a normal distribution. This procedure will...addition to centering them (Davison and Hinkley, 1997). When there are outliers in the residuals, the bootstrap distribution of x̂₀ can become skewed or...based and inversion methods using the linear mixed-effects model. Then, a simple parametric bootstrap algorithm is proposed that can be used to either
Position Estimation Using Image Derivative

NASA Technical Reports Server (NTRS)

Mortari, Daniele; deDilectis, Francesco; Zanetti, Renato

2015-01-01

This paper describes an image processing algorithm to process Moon and/or Earth images. The theory presented is based on the fact that Moon hard edge points are characterized by the highest values of the image derivative. Outliers are eliminated by two sequential filters. Moon center and radius are then estimated by nonlinear least-squares using circular sigmoid functions. The proposed image processing has been applied and validated using real and synthetic Moon images.
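The paper fits the Moon's centre and radius by nonlinear least squares with circular sigmoid functions. As a simpler, swapped-in illustration of the fit-and-reject idea, the sketch below uses the linear algebraic (Kåsa) circle fit on edge points, drops points whose radial residual exceeds `k` times the MAD, and refits. Neither the MAD rule nor the value of `k` comes from the paper.

```python
import numpy as np

def fit_circle(x, y):
    """Algebraic (Kasa) least-squares circle fit. From (x-a)^2 + (y-b)^2 = r^2:
    x^2 + y^2 = 2*a*x + 2*b*y + c, with r^2 = c + a^2 + b^2."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    M = np.column_stack([2 * x, 2 * y, np.ones_like(x)])
    (a, b, c), *_ = np.linalg.lstsq(M, x ** 2 + y ** 2, rcond=None)
    return a, b, np.sqrt(c + a ** 2 + b ** 2)

def fit_circle_robust(x, y, k=3.0):
    """Fit once, drop points whose radial residual deviates from the median
    residual by more than k * MAD, then refit on the remaining points."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    a, b, r = fit_circle(x, y)
    resid = np.hypot(x - a, y - b) - r
    dev = np.abs(resid - np.median(resid))
    mad = np.median(dev)
    keep = dev <= k * max(mad, 1e-12)  # floor avoids rejecting everything on exact data
    return fit_circle(x[keep], y[keep])
```

The algebraic fit is not the geometric least-squares optimum, but it needs no initial guess and is often used to initialize a nonlinear fit.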
An experimental clinical evaluation of EIT imaging with ℓ1 data and image norms.

PubMed

Mamatjan, Yasin; Borsic, Andrea; Gürsoy, Doga; Adler, Andy

2013-09-01

Electrical impedance tomography (EIT) produces an image of the internal conductivity distribution of a body from current injection and electrical measurements at surface electrodes. Typically, image reconstruction is formulated using regularized schemes in which ℓ2-norms are used for both the data misfit and image prior terms. Such a formulation is computationally convenient, but it favours smooth conductivity solutions and is sensitive to outliers. Recent studies highlighted the potential of the ℓ1-norm and provided the mathematical basis to improve image quality and the robustness of the images to data outliers. In this paper, we (i) extended a primal-dual interior point method (PDIPM) algorithm to 2.5D EIT image reconstruction to solve ℓ1 and mixed ℓ1/ℓ2 formulations efficiently, (ii) evaluated the formulation on clinical and experimental data, and (iii) developed a practical strategy to select hyperparameters using the L-curve, which requires minimal user input. The PDIPM algorithm was evaluated using clinical and experimental scenarios of human lung and dog breathing with known electrode errors, which require rigorous regularization and cause reconstruction with an ℓ2-norm solution to fail. The results showed that an ℓ1 solution is not only more robust to unavoidable measurement errors in a clinical setting, but also provides high contrast resolution at organ boundaries.
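The robustness of an ℓ1 data misfit to outliers can be seen in a small linear problem. The sketch below minimizes ‖Ax − b‖₁ by iteratively reweighted least squares (IRLS). That solver is a swapped-in generic technique, not the paper's PDIPM, and the example is a line fit, not EIT reconstruction.

```python
import numpy as np

def l1_regression_irls(A, b, n_iter=100, eps=1e-8):
    """Approximately minimise ||A x - b||_1 by IRLS: each step solves a weighted
    l2 problem with weights 1 / max(|r_i|, eps), where r is the current residual."""
    A = np.asarray(A, float)
    b = np.asarray(b, float)
    x = np.linalg.lstsq(A, b, rcond=None)[0]  # start from the l2 solution
    for _ in range(n_iter):
        r = A @ x - b
        sw = np.sqrt(1.0 / np.maximum(np.abs(r), eps))
        x = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)[0]
    return x
```

Rows with large residuals receive small weights, so a single corrupted measurement barely influences the ℓ1 estimate, while it pulls the ℓ2 solution noticeably.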
Modeling the direction-continuous time-of-arrival in head-related transfer functions

PubMed Central

Ziegelwanger, Harald; Majdak, Piotr

2015-01-01

Head-related transfer functions (HRTFs) describe the filtering of incoming sound by the torso, head, and pinna. As a consequence of the propagation path from the source to the ear, each HRTF contains a direction-dependent, broadband time-of-arrival (TOA). TOAs are usually estimated independently for each direction from HRTFs, a method prone to artifacts and limited by the spatial sampling. In this study, a continuous-direction TOA model combined with an outlier-removal algorithm is proposed. The model is based on a simplified geometric representation of the listener and the listener's arbitrary position within the HRTF measurement setup. The outlier-removal procedure uses the extreme studentized deviate test to remove implausible TOAs. The model was evaluated for numerically calculated HRTFs of sphere, torso, and pinna under various conditions. The accuracy of the estimated parameters was within the resolution given by the sampling rate. Applied to acoustically measured HRTFs of 172 listeners, the estimated parameters were consistent with realistic listener geometry. The outlier removal further improved the goodness of fit, particularly for some problematic fits. Comparison with a simpler model that fixed the listener position to the center of the measurement geometry showed a clear advantage of including the listener position as an additional free model parameter. PMID:24606268
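Rosner's generalized extreme studentized deviate (ESD) test, which underlies this kind of outlier removal, can be sketched as below. It is shown on a plain sample. In the paper it is applied to TOA estimates relative to the model, and the `alpha` and `max_outliers` values here are assumptions. Critical values use `scipy.stats.t.ppf`, and `max_outliers` must be smaller than `len(x) - 2`.

```python
import numpy as np
from scipy import stats

def generalized_esd(x, max_outliers, alpha=0.05):
    """Rosner's generalized ESD test; returns sorted indices of detected outliers."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    idx = np.arange(n)
    values = x.copy()
    removed, R, lam = [], [], []
    for i in range(1, max_outliers + 1):
        mean, sd = values.mean(), values.std(ddof=1)
        j = int(np.argmax(np.abs(values - mean)))
        R.append(abs(values[j] - mean) / sd)          # test statistic R_i
        p = 1 - alpha / (2 * (n - i + 1))
        t = stats.t.ppf(p, n - i - 1)
        lam.append((n - i) * t / np.sqrt((n - i - 1 + t ** 2) * (n - i + 1)))
        removed.append(int(idx[j]))
        values = np.delete(values, j)
        idx = np.delete(idx, j)
    # the number of outliers is the largest i with R_i > lambda_i
    k = max((i + 1 for i in range(max_outliers) if R[i] > lam[i]), default=0)
    return sorted(removed[:k])
```

Unlike repeating a single Grubbs test, the generalized ESD test only needs an upper bound on the number of outliers. It is therefore less prone to masking when several implausible values are present.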
A 2D range Hausdorff approach to 3D facial recognition.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Koch, Mark William; Russ, Trina Denise; Little, Charles Quentin

2004-11-01

This paper presents a 3D facial recognition algorithm based on the Hausdorff distance metric. The standard 3D formulation of the Hausdorff matching algorithm has been modified to operate on a 2D range image, enabling a reduction in computation from O(N²) to O(N) without large storage requirements. The Hausdorff distance is known for its robustness to data outliers and inconsistent data between two data sets, making it a suitable choice for dealing with the inherent problems in many 3D datasets due to sensor noise and object self-occlusion. For optimal performance, the algorithm assumes a good initial alignment between probe and template datasets. However, to minimize the error between two faces, the alignment can be iteratively refined. Results from the algorithm are presented using 3D face images from the Face Recognition Grand Challenge database version 1.0.
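For reference, the standard (symmetric) Hausdorff distance between two point sets is the larger of the two directed distances. The brute-force sketch below performs the all-pairs nearest-neighbour search that makes the 3D formulation quadratic. The paper's 2D range-image variant, which avoids that search, is not reproduced.

```python
import numpy as np

def directed_hausdorff(A, B):
    """max over a in A of the distance from a to its nearest point in B."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)  # all-pairs distances
    return d.min(axis=1).max()

def hausdorff(A, B):
    """Symmetric Hausdorff distance between point sets A and B (rows are points)."""
    A = np.asarray(A, float)
    B = np.asarray(B, float)
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))
```

The plain maximum reacts to a single stray point. Face-matching systems therefore often use a ranked (partial) variant that takes a quantile of the nearest-neighbour distances instead of the maximum.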
Mobile robot motion estimation using Hough transform

NASA Astrophysics Data System (ADS)

Aldoshkin, D. N.; Yamskikh, T. N.; Tsarev, R. Yu

2018-05-01

This paper proposes an algorithm for estimating mobile robot motion. The geometry of the surrounding space is described by range scans (samples of distance measurements) taken by the mobile robot's range sensors. A similar sample of the space geometry taken at any preceding moment, or the environment map, can be used as a reference. The suggested algorithm is invariant to isotropic scaling of samples or maps, which allows the use of samples measured in different units and maps made at different scales. The algorithm is based on the Hough transform: it maps from measurement space to a straight-line parameter space. In the straight-line parameter space, the problems of estimating rotation, scaling, and translation are solved separately, breaking the problem of estimating mobile robot localization into three smaller independent problems. A specific feature of the algorithm is its robustness to noise and outliers, inherited from the Hough transform. A prototype of the mobile robot orientation system is described.
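The mapping from measurement space to straight-line parameter space is the classic Hough voting step, sketched below for 2-D scan points using ρ = x·cos θ + y·sin θ. The angular and ρ resolutions are assumed parameters, and the paper's subsequent separate estimation of rotation, scale, and translation is not shown.

```python
import numpy as np

def hough_lines(points, n_theta=180, rho_res=1.0):
    """Vote 2-D points into (theta, rho) space, rho = x*cos(theta) + y*sin(theta).
    Returns the accumulator and the theta / rho bin centres."""
    pts = np.asarray(points, float)
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    rho_max = np.abs(pts).sum(axis=1).max() + rho_res  # |rho| <= |x| + |y|
    rhos = np.arange(-rho_max, rho_max + rho_res, rho_res)
    acc = np.zeros((n_theta, len(rhos)), dtype=int)
    for x, y in pts:
        r = x * np.cos(thetas) + y * np.sin(thetas)
        bins = np.round((r + rho_max) / rho_res).astype(int)
        acc[np.arange(n_theta), bins] += 1  # one vote per theta for this point
    return acc, thetas, rhos
```

Outlying range returns scatter their votes over many bins, while points on a real wall pile up in one cell. This is where the robustness to noise and outliers comes from.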
kruX: matrix-based non-parametric eQTL discovery.

PubMed

Qi, Jianlong; Asl, Hassan Foroughi; Björkegren, Johan; Michoel, Tom

2014-01-14

The Kruskal-Wallis test is a popular non-parametric statistical test for identifying expression quantitative trait loci (eQTLs) from genome-wide data due to its robustness against variations in the underlying genetic model and expression trait distribution, but testing billions of marker-trait combinations one-by-one can become computationally prohibitive. We developed kruX, an algorithm implemented in Matlab, Python and R that uses matrix multiplications to simultaneously calculate the Kruskal-Wallis test statistic for several millions of marker-trait combinations at once. KruX is more than ten thousand times faster than computing associations one-by-one on a typical human dataset. We used kruX and a dataset of more than 500k SNPs and 20k expression traits measured in 102 human blood samples to compare eQTLs detected by the Kruskal-Wallis test to eQTLs detected by the parametric ANOVA and linear model methods. We found that the Kruskal-Wallis test is more robust against data outliers and heterogeneous genotype group sizes and detects a higher proportion of non-linear associations, but is more conservative for calling additive linear associations. kruX enables the use of robust non-parametric methods for massive eQTL mapping without the need for a high-performance computing infrastructure and is freely available from http://krux.googlecode.com.
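The matrix trick can be sketched as follows: with per-trait ranks R and a 0/1 indicator matrix per genotype group, all rank sums come from one product, and H = 12/(n(n+1))·Σ_g S_g²/n_g − 3(n+1). This is a minimal sketch assuming tie-free expression values and no tie correction. It is not the kruX code itself.

```python
import numpy as np

def kruskal_wallis_matrix(expr, genotypes, n_groups=3):
    """Kruskal-Wallis H for every (trait, marker) pair at once.
    expr:      (n_traits, n_samples) values, assumed tie-free
    genotypes: (n_markers, n_samples) integer codes 0..n_groups-1
    Returns H with shape (n_traits, n_markers); no tie correction."""
    expr = np.asarray(expr, float)
    geno = np.asarray(genotypes)
    n = expr.shape[1]
    ranks = expr.argsort(axis=1).argsort(axis=1) + 1.0  # ranks 1..n per trait
    total = np.zeros((expr.shape[0], geno.shape[0]))
    for g in range(n_groups):
        ind = (geno == g).astype(float)  # (n_markers, n_samples) group indicators
        sizes = ind.sum(axis=1)          # group size for each marker
        rank_sums = ranks @ ind.T        # (n_traits, n_markers) in one product
        with np.errstate(divide="ignore", invalid="ignore"):
            total += np.where(sizes > 0, rank_sums ** 2 / sizes, 0.0)
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
```

The loop runs only over genotype groups (three for biallelic SNPs), so the cost is dominated by a few large matrix multiplications rather than millions of per-pair tests.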
LSST Astroinformatics And Astrostatistics: Data-oriented Astronomical Research

NASA Astrophysics Data System (ADS)

Borne, Kirk D.; Stassun, K.; Brunner, R. J.; Djorgovski, S. G.; Graham, M.; Hakkila, J.; Mahabal, A.; Paegert, M.; Pesenson, M.; Ptak, A.; Scargle, J.; LSST Informatics and Statistics Team

2011-01-01

The LSST Informatics and Statistics Science Collaboration (ISSC) focuses on research and scientific discovery challenges posed by the very large and complex data collection that LSST will generate. Application areas include astroinformatics, machine learning, data mining, astrostatistics, visualization, scientific data semantics, time series analysis, and advanced signal processing. Research problems to be addressed with these methodologies include transient event characterization and classification, rare class discovery, correlation mining, outlier/anomaly/surprise detection, improved estimators (e.g., for photometric redshift or early onset supernova classification), exploration of highly dimensional (multivariate) data catalogs, and more. We present sample science results from these data-oriented approaches to large-data astronomical research. We present results from LSST ISSC team members, including the EB (Eclipsing Binary) Factory, the environmental variations in the fundamental plane of elliptical galaxies, and outlier detection in multivariate catalogs.
Robust detrending, rereferencing, outlier detection, and inpainting for multichannel data.

PubMed

de Cheveigné, Alain; Arzounian, Dorothée

2018-05-15

Electroencephalography (EEG), magnetoencephalography (MEG) and related techniques are prone to glitches, slow drift, steps, etc., that contaminate the data and interfere with the analysis and interpretation. These artifacts are usually addressed in a preprocessing phase that attempts to remove them or minimize their impact. This paper offers a set of useful techniques for this purpose: robust detrending, robust rereferencing, outlier detection, data interpolation (inpainting), step removal, and filter ringing artifact removal. These techniques provide a less wasteful alternative to discarding corrupted trials or channels, and they are relatively immune to artifacts that disrupt alternative approaches such as filtering. Robust detrending allows slow drifts and common mode signals to be factored out while avoiding the deleterious effects of glitches. Robust rereferencing reduces the impact of artifacts on the reference. Inpainting allows corrupt data to be interpolated from intact parts based on the correlation structure estimated over the intact parts. Outlier detection allows the corrupt parts to be identified. Step removal fixes the high-amplitude flux jump artifacts that are common with some MEG systems. Ringing removal allows the ringing response of the antialiasing filter to glitches (steps, pulses) to be suppressed. The performance of the methods is illustrated and evaluated using synthetic data and data from real EEG and MEG systems. These methods, which are mainly automatic and require little tuning, can greatly improve the quality of the data. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
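Robust detrending of the kind described (fit a slow trend while ignoring glitches) can be sketched as an iteratively reweighted polynomial fit. Samples whose residual exceeds a multiple of the residual spread get zero weight, and the trend is refitted. The polynomial basis, `thresh`, and `n_iter` are assumptions, and this single-channel sketch is not the authors' implementation.

```python
import numpy as np

def robust_detrend(x, order=1, thresh=3.0, n_iter=5):
    """Fit a polynomial trend, give zero weight to samples whose residual exceeds
    `thresh` times the residual s.d. of the trusted samples, and refit.
    Returns (detrended signal, final 0/1 weights)."""
    x = np.asarray(x, float)
    t = np.linspace(-1.0, 1.0, len(x))
    V = np.vander(t, order + 1)  # polynomial basis
    w = np.ones_like(x)
    for _ in range(n_iter):
        coef, *_ = np.linalg.lstsq(V * w[:, None], x * w, rcond=None)
        resid = x - V @ coef
        sd = max(np.sqrt(np.sum(w * resid ** 2) / np.sum(w)), 1e-12)
        new_w = (np.abs(resid) <= thresh * sd).astype(float)
        if np.array_equal(new_w, w):
            break
        w = new_w
    coef, *_ = np.linalg.lstsq(V * w[:, None], x * w, rcond=None)  # final fit
    return x - V @ coef, w
```

The glitch remains in the output but no longer distorts the trend estimate, and the zero-weight samples double as an outlier mask for later inpainting.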

A Weighted Closed-Form Solution for RGB-D Data Registration

NASA Astrophysics Data System (ADS)

Vestena, K. M.; Dos Santos, D. R.; Oilveira, E. M., Jr.; Pavan, N. L.; Khoshelham, K.

2016-06-01

Existing 3D indoor mapping methods for RGB-D data are predominantly point-based or feature-based. In most cases, the iterative closest point (ICP) algorithm and its variants are used for pairwise registration. Because ICP requires a relatively accurate initial transformation and high overlap, a weighted closed-form solution for RGB-D data registration is proposed. In this solution, the 3D points are weighted and normalized based on their theoretical random errors, and dual-number quaternions are used to represent the 3D rigid-body motion. Dual-number quaternions provide a closed-form solution by minimizing a cost function. The most important advantage of the closed-form solution is that it provides the optimal transformation in one step, does not need good initial estimates, and substantially decreases the demand for computing resources compared with iterative methods. First, our method exploits RGB information: we employed the scale-invariant feature transform (SIFT) to detect, describe, and match local features that are invariant to scaling and rotation. To detect and filter outliers, we used the random sample consensus (RANSAC) algorithm jointly with a measure of statistical dispersion, the interquartile range (IQR). Next, a new RGB-D loop-closure solution is implemented based on the volumetric information between pairs of point clouds and the dispersion of the random errors. Loop closure consists of recognizing when the sensor revisits a region. Finally, a globally consistent map is created to minimize the registration errors via graph-based optimization. The effectiveness of the proposed method is demonstrated with a Kinect dataset. The experimental results show that the proposed method can properly map the indoor environment with an absolute accuracy of around 1.5% of the distance travelled along the trajectory.
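A weighted, one-step closed-form rigid registration with an IQR-based outlier filter can be sketched as below. The paper parameterizes the motion with dual-number quaternions. This sketch swaps in the equivalent-purpose SVD (Kabsch/Umeyama) solution for the same weighted least-squares cost, and it omits RANSAC. The Tukey fence factor `k = 1.5` is an assumption.

```python
import numpy as np

def weighted_rigid_transform(P, Q, w=None):
    """Closed-form R, t minimising sum_i w_i * ||R p_i + t - q_i||^2 (SVD solution)."""
    P = np.asarray(P, float)
    Q = np.asarray(Q, float)
    w = np.ones(len(P)) if w is None else np.asarray(w, float)
    w = w / w.sum()
    p_bar, q_bar = w @ P, w @ Q                       # weighted centroids
    H = (P - p_bar).T @ ((Q - q_bar) * w[:, None])    # 3x3 weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T           # guard against reflections
    return R, q_bar - R @ p_bar

def iqr_inliers(residuals, k=1.5):
    """Keep residuals inside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(residuals, [25, 75])
    iqr = q3 - q1
    return (residuals >= q1 - k * iqr) & (residuals <= q3 + k * iqr)
```

A typical loop fits once, computes per-match residual norms, keeps the IQR inliers, and refits with zero weight on the rejected matches.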
Photogrammetric DSM denoising

NASA Astrophysics Data System (ADS)

Nex, F.; Gerke, M.

2014-08-01

Image matching techniques can nowadays provide very dense point clouds, and they are often considered a valid alternative to LiDAR point clouds. However, photogrammetric point clouds are often characterized by a higher level of random noise than LiDAR data and by the presence of large outliers. These problems limit the practical use of photogrammetric data in many applications, but an effective way to enhance the generated point cloud has yet to be found. In this paper we concentrate on the restoration of Digital Surface Models (DSM) computed from dense image matching point clouds. A photogrammetric DSM, i.e. a 2.5D representation of the surface, is still one of the major products derived from point clouds. Four different algorithms devoted to DSM denoising are presented: a standard median filter approach, a bilateral filter, a variational approach (TGV: Total Generalized Variation), and a newly developed algorithm, which is embedded into a Markov Random Field (MRF) framework and optimized through graph-cuts. The ability of each algorithm to recover the original DSM has been quantitatively evaluated. To do so, a synthetic DSM has been generated and different types of noise have been added to mimic the typical errors of photogrammetric DSMs. The evaluation reveals that standard filters, such as the median filter and edge-preserving smoothing with a bilateral filter, cannot sufficiently remove the typical errors occurring in a photogrammetric DSM. The TGV-based approach removes random noise much better, but large areas with outliers still remain. Our own method, which explicitly models the degradation properties of such DSMs, outperforms the others in all aspects.
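The standard median-filter baseline can be sketched in a few lines of Python for a DSM stored as a 2D grid of heights. The 3 × 3 window, the border handling (the window is clipped at the grid edges) and the toy grid are assumptions for illustration. This is only the baseline the paper compares against, not its MRF/graph-cut method.

```python
from statistics import median

def median_filter_dsm(dsm, radius=1):
    """Apply a (2*radius+1) x (2*radius+1) median filter to a DSM given as a list of rows.

    Windows are clipped at the grid borders (an assumed edge-handling choice).
    """
    rows, cols = len(dsm), len(dsm[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Collect all heights inside the (clipped) window around cell (r, c)
            window = [
                dsm[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            out[r][c] = median(window)
    return out

# Hypothetical 4 x 4 flat roof at 10 m with a single spike outlier at 35 m.
dsm = [[10.0] * 4 for _ in range(4)]
dsm[1][2] = 35.0
filtered = median_filter_dsm(dsm)
print(filtered[1][2])  # -> 10.0
```

An isolated spike is removed easily. Once an outlier region covers more than half of a window, however, the median itself becomes an outlier. This is consistent with the finding that simple filters cannot clean the large outlier areas typical of photogrammetric DSMs.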
Searching Remote Homology with Spectral Clustering with Symmetry in Neighborhood Cluster Kernels

PubMed Central

Maulik, Ujjwal; Sarkar, Anasua

2013-01-01

Remote homology detection among proteins utilizing only the unlabelled sequences is a central problem in comparative genomics. The existing cluster kernel methods based on neighborhoods and profiles and the Markov clustering algorithms are currently the most popular methods for protein family recognition. The deviation from random walks with inflation, or the dependency on a hard threshold in the similarity measure, in those methods calls for an enhancement for homology detection among multi-domain proteins. We propose to combine spectral clustering with neighborhood kernels in Markov similarity to enhance sensitivity in detecting homology independent of “recent” paralogs. The spectral clustering approach with new combined local alignment kernels more effectively exploits the unsupervised protein sequences globally, reducing inter-cluster walks. When combined with corrections based on a modified symmetry-based proximity norm that deemphasizes outliers, the technique proposed in this article outperforms the other state-of-the-art cluster kernels among all twelve implemented kernels. The comparison with the state-of-the-art string and mismatch kernels also shows the superior performance scores provided by the proposed kernels. A similar performance improvement is also found on an existing large dataset. Therefore, the proposed spectral clustering framework over combined local alignment kernels with modified symmetry-based correction achieves superior performance for unsupervised remote homolog detection, even in multi-domain and promiscuous-domain proteins from Genolevures database families, with better biological relevance. Source code available upon request. Contact: sarkar@labri.fr. PMID:23457439
Evaluating Effect of Albendazole on Trichuris trichiura Infection: A Systematic Review Article.

PubMed

Ahmadi Jouybari, Toraj; Najaf Ghobadi, Khadije; Lotfi, Bahare; Alavi Majd, Hamid; Ahmadi, Nayeb Ali; Rostami-Nejad, Mohammad; Aghaei, Abbas

2016-01-01

The aim of the study was to assess defaults and conduct a meta-analysis of the efficacy of single-dose oral albendazole against T. trichiura infection. We searched PubMed, ISI Web of Science, Science Direct, the Cochrane Central Register of Controlled Trials, and WHO library databases between 1983 and 2014. Data from 13 clinical trial articles were used. Each article included the effect of a single oral dose (400 mg) of albendazole and of placebo in treating two groups of patients with T. trichiura infection. For both groups in each article, the sample size, the number of those with T. trichiura infection, and the number of those who recovered following the intake of albendazole were identified and recorded. The relative risk and its variance were computed. A funnel plot and Begg's and Egger's tests were used to assess publication bias. The random-effects variance shift outlier model and the likelihood ratio test were applied to detect outliers. To detect influential studies, DFFITS values, Cook's distances and COVRATIO were used. Data were analyzed using STATA and R software. Articles 13 and 9 were identified as an outlier and an influential study, respectively. The outlier was diagnosed by the variance shift of the target study in the inferential method and by its RR value in the graphical method. The funnel plot and Begg's test did not show publication bias (P = 0.272); however, Egger's test confirmed it (P = 0.034). Meta-analysis after removal of article 13 showed a relative risk of 1.99 (95% CI 1.71-2.31). The estimated RR and our meta-analyses show that treatment of T. trichiura with single oral doses of albendazole is unsatisfactory. New anthelminthics are urgently needed.
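To make the relative-risk step concrete, here is a minimal Python sketch of three standard quantities:

- the per-study log relative risk;
- its large-sample variance, 1/a − 1/n₁ + 1/c − 1/n₂;
- a fixed-effect, inverse-variance pooled RR with a 95% confidence interval.

The trial counts are hypothetical. The random-effects variance shift outlier model used in the paper is not reproduced here.

```python
import math

def log_rr(events_t, n_t, events_c, n_c):
    """Log relative risk and its large-sample variance for one two-arm study."""
    lrr = math.log((events_t / n_t) / (events_c / n_c))
    var = 1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c
    return lrr, var

def pooled_rr(studies, z=1.96):
    """Fixed-effect inverse-variance pooled RR with a 95% confidence interval."""
    estimates = [log_rr(*s) for s in studies]
    weights = [1 / v for _, v in estimates]
    total_w = sum(weights)
    mean = sum(w * l for w, (l, _) in zip(weights, estimates)) / total_w
    se = math.sqrt(1 / total_w)
    return math.exp(mean), math.exp(mean - z * se), math.exp(mean + z * se)

# Hypothetical studies: (cured on albendazole, n, cured on placebo, n).
studies = [(30, 100, 15, 100), (45, 120, 20, 110), (25, 80, 14, 85)]
rr, low, high = pooled_rr(studies)
print(f"RR = {rr:.2f} (95% CI {low:.2f}-{high:.2f})")
```

Re-running `pooled_rr` with a flagged study left out of the list mirrors the "meta-analysis after removal" step described above.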
A global method for identifying dependences between helio-geophysical and biological series by filtering the precedents (outliers)

NASA Astrophysics Data System (ADS)

Ozheredov, V. A.; Breus, T. K.; Gurfinkel, Yu. I.; Matveeva, T. A.

2014-12-01

A new approach to finding the dependence between heliophysical and meteorological factors and physiological parameters is considered, based on the preliminary filtering of precedents (outliers). The sought-after dependence is masked by extraneous influences that cannot be taken into account. Therefore, the correlation typically calculated between the external-influence (x) and physiology (y) parameters is extremely low and does not allow their interdependence to be conclusively proved. A robust method for removing the precedents (outliers) from the database is proposed. It is based on the intelligent sorting of the polynomial curves of possible dependences y(x), followed by filtering out the precedents that lie far from y(x) and optimizing the coefficient of nonlinear correlation between the regular, i.e., remaining, precedents. This optimization problem is shown to be a search for a maximum where no gradient is available, which requires a genetic algorithm based on the Gray code. The relationships between various medical and biological parameters and characteristics of space and terrestrial weather are obtained and verified using cross-validation. It is proven that, by filtering out no more than 20% of precedents, it is possible to obtain a nonlinear correlation coefficient of no less than 0.5. The proposed method for filtering precedents (outliers) was compared with the least-squares method (LSM) for determining the optimal polynomial, using multiple independent tests (Monte Carlo method) of models that are as close as possible to real dependences. This comparison has shown that the LSM performs considerably worse than the proposed method.
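A genetic search over polynomial curves is beyond a short example. The underlying idea can still be sketched with an ordinary least-squares fit: discard the precedents farthest from a fitted curve y(x), then measure the correlation on the remaining points. This simplified stand-in assumes:

- a quadratic curve;
- a single trimming pass;
- the Pearson correlation between observed and fitted y as the "nonlinear correlation coefficient".

It is not the authors' Gray-code genetic algorithm, and `polyfit` and `trim_and_correlate` are hypothetical helpers.

```python
import math

def polyfit(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ...] of c0 + c1*x + ... via the normal equations."""
    m = degree + 1
    # Normal-equation matrix A[j][k] = sum x^(j+k), right-hand side b[j] = sum y * x^j
    a = [[sum(x ** (j + k) for x in xs) for k in range(m)] for j in range(m)]
    b = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for k in range(col, m):
                a[r][k] -= f * a[col][k]
            b[r] -= f * b[col]
    # Back substitution
    coeffs = [0.0] * m
    for r in range(m - 1, -1, -1):
        coeffs[r] = (b[r] - sum(a[r][k] * coeffs[k] for k in range(r + 1, m))) / a[r][r]
    return coeffs

def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    return cov / math.sqrt(sum((p - mu) ** 2 for p in u) * sum((q - mv) ** 2 for q in v))

def trim_and_correlate(xs, ys, degree=2, trim_fraction=0.2):
    """Fit y(x), drop the trim_fraction of points with the largest residuals, refit,
    and return (kept indices, correlation between observed and fitted y)."""
    c = polyfit(xs, ys, degree)
    fitted = [sum(ci * x ** i for i, ci in enumerate(c)) for x in xs]
    order = sorted(range(len(xs)), key=lambda i: abs(ys[i] - fitted[i]))
    keep = sorted(order[: len(xs) - int(trim_fraction * len(xs))])
    kx, ky = [xs[i] for i in keep], [ys[i] for i in keep]
    c = polyfit(kx, ky, degree)
    kfit = [sum(ci * x ** i for i, ci in enumerate(c)) for x in kx]
    return keep, pearson(ky, kfit)

xs = list(range(10))
ys = [x * x for x in xs]
ys[3], ys[7] = 60, 0  # two hypothetical gross outliers
keep, r = trim_and_correlate(xs, ys)
print(keep, round(r, 4))
```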
Detection and monitoring of emerald ash borer populations: trap trees and the factors that may influence their effectiveness

Treesearch

Andrew J. Storer; Jessica A. Metzger; Ivich Fraser; Deborah G. McCullough; Therese M. Poland; Robert L. Heyd

2007-01-01

The exotic emerald ash borer (EAB), Agrilus planipennis Fairmaire, was first identified in Michigan in 2002, though it had likely been established there for a number of years prior to detection. A key to management of EAB populations is the ability to detect this insect in order to accurately describe its distribution and to locate new outlier...
Multiple diagnosis based on photoplethysmography: hematocrit, SpO2, pulse, and respiration

NASA Astrophysics Data System (ADS)

Yoon, Gilwon; Lee, Jong Y.; Jeon, Kye Jin; Park, Kun-Kook; Yeo, Hyung S.; Hwang, Hyun T.; Kim, Hong S.; Hwang, In-Duk

2002-09-01

Photoplethysmography (PPG) measures pulsatile blood flow non-invasively and in real time. One of the most widely known applications of PPG is the measurement of arterial oxygen saturation (SpO2). In our work, using more wavelengths than a pulse oximeter uses, an algorithm and an instrument have been developed to measure hematocrit, oxygen saturation, pulse rate and respiratory rate simultaneously. To predict hematocrit, a dedicated algorithm is developed based on scattering by red blood cells (RBC), and a protocol for detecting outlier signals is used to increase accuracy and reliability. Digital filtering techniques are used to extract respiratory rate signals. Using wavelengths under 1000 nm, a multi-wavelength LED array chip and digital-oriented electronics enables a compact device. Our preliminary clinical trials show the following results:
- hematocrit: a percent error of +/-8.2% when tested with 594 persons;
- SpO2: an R2 of 0.99985 for the fit when tested with a Bi-Tek pulse oximeter simulator, and an in vivo error of +/-2.5% over the range of 75-100%;
- pulse rate: an error of less than +/-5%;
- respiratory rate: a positive predictive value of 96% in qualitative analysis.
The Effect of Personalization on Smartphone-Based Fall Detectors

PubMed Central

Medrano, Carlos; Plaza, Inmaculada; Igual, Raúl; Sánchez, Ángel; Castro, Manuel

2016-01-01

The risk of falling is high among different groups of people, such as older people, individuals with Parkinson's disease or patients in neuro-rehabilitation units. Developing robust fall detectors is important for acting promptly in case of a fall. Therefore, in this study we propose to personalize smartphone-based detectors to boost their performance as compared to a non-personalized system. Four algorithms were investigated using a public dataset: three novelty detection algorithms, namely Nearest Neighbor (NN), Local Outlier Factor (LOF) and One-Class Support Vector Machine (OneClass-SVM), and a traditional supervised algorithm, Support Vector Machine (SVM). The effect of personalization was studied for each subject by considering two different training conditions: data coming only from that subject or data coming from the remaining subjects. The area under the receiver operating characteristic curve (AUC) was selected as the primary figure of merit. The results show a general trend towards increased performance when the detector is personalized, but the effect depends on the individual being considered. A personalized NN can reach the performance of a non-personalized SVM (average AUC of 0.9861 and 0.9795, respectively), which is remarkable since NN uses only activities of daily living for training. PMID:26797614
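As an illustration of the nearest-neighbour (NN) novelty detector, the sketch below:

1. trains only on activities of daily living (ADL);
2. scores a new window by its Euclidean distance to the closest training vector;
3. summarizes performance with the AUC, computed as the Mann-Whitney probability that a fall scores higher than an ADL.

The two-dimensional feature vectors and their interpretation are hypothetical placeholders for real accelerometer features.

```python
import math

def nn_score(train, sample):
    """Novelty score: distance from a sample to its nearest ADL training vector."""
    return min(math.dist(sample, t) for t in train)

def auc(neg_scores, pos_scores):
    """AUC as the probability that a random fall outscores a random ADL (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical features, e.g. (peak acceleration in g, post-impact variance).
adl_train = [(1.1, 0.2), (1.3, 0.3), (0.9, 0.1), (1.2, 0.25)]
adl_test = [(1.0, 0.15), (1.25, 0.28)]
falls_test = [(3.5, 1.2), (2.8, 0.9)]

neg = [nn_score(adl_train, s) for s in adl_test]
pos = [nn_score(adl_train, s) for s in falls_test]
print(auc(neg, pos))  # -> 1.0
```

In this framing, personalization simply means building `adl_train` from the same subject's own recordings rather than from other subjects'. No fall examples are needed for training.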
Bilinear Factor Matrix Norm Minimization for Robust PCA: Algorithms and Applications.

PubMed

Shang, Fanhua; Cheng, James; Liu, Yuanyuan; Luo, Zhi-Quan; Lin, Zhouchen

2017-09-04

The heavy-tailed distributions of corrupted outliers and of the singular values of all channels in low-level vision have proven to be effective priors for many applications, such as background modeling, photometric stereo and image alignment, and they can be well modeled by a hyper-Laplacian. However, the use of such distributions generally leads to challenging non-convex, non-smooth and non-Lipschitz problems, and makes existing algorithms very slow for large-scale applications. Building on the analytic solutions to Lp-norm minimization for two specific values of p, i.e., p=1/2 and p=2/3, we propose two novel bilinear factor matrix norm minimization models for robust principal component analysis. We first define the double nuclear norm and Frobenius/nuclear hybrid norm penalties, and then prove that they are in essence the Schatten-1/2 and 2/3 quasi-norms, respectively, which lead to much more tractable and scalable Lipschitz optimization problems. Our experimental analysis shows that both our methods yield more accurate solutions than original Schatten quasi-norm minimization, even when the number of observations is very limited. Finally, we apply our penalties to various low-level vision problems, e.g. moving object detection, image alignment and inpainting, and show that our methods usually outperform the state-of-the-art methods.
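For reference, the two proposed penalties are shown to be equivalent to the Schatten-p quasi-norm, which is defined in the standard way from the singular values σᵢ(X) of an m × n matrix X. For p = 1 it is the nuclear norm. For p < 1 it is only a quasi-norm, because the triangle inequality fails, and p = 1/2 and p = 2/3 are the two cases targeted here.

```latex
\|X\|_{S_p} = \Big( \sum_{i=1}^{\min(m,n)} \sigma_i(X)^{p} \Big)^{1/p}, \quad 0 < p \le 1, \qquad
\|X\|_{S_{1/2}} = \Big( \sum_{i} \sigma_i(X)^{1/2} \Big)^{2}, \qquad
\|X\|_{S_{2/3}} = \Big( \sum_{i} \sigma_i(X)^{2/3} \Big)^{3/2}
```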
Detection, Location and Grasping Objects Using a Stereo Sensor on UAV in Outdoor Environments.

PubMed

Ramon Soria, Pablo; Arrue, Begoña C; Ollero, Anibal

2017-01-07

The article presents a vision system for the autonomous grasping of objects with Unmanned Aerial Vehicles (UAVs) in real time. Giving UAVs the capability to manipulate objects vastly extends their applications, as they are capable of accessing places that are difficult to reach or even unreachable for human beings. This work is focused on the grasping of known objects based on feature models. The system runs on an on-board computer on a UAV equipped with a stereo camera and a robotic arm. The algorithm learns a feature-based model in an offline stage; this model is then used online to detect the targeted object and estimate its position. This feature-based model was shown to be robust to both occlusions and the presence of outliers. The use of stereo cameras improves the learning stage, providing 3D information and helping to filter features in the online stage. An experimental system was built using a rotary-wing UAV and a small manipulator for final proof of concept. The robotic arm is designed with three degrees of freedom and is lightweight due to the payload limitations of the UAV. The system has been validated with different objects, both indoors and outdoors.
Detecting subject-specific activations using fuzzy clustering

PubMed Central

Seghier, Mohamed L.; Friston, Karl J.; Price, Cathy J.

2007-01-01

Inter-subject variability in evoked brain responses is attracting attention because it may reflect important variability in structure–function relationships over subjects. This variability could be a signature of degenerate (many-to-one) structure–function mappings in normal subjects or reflect changes that are disclosed by brain damage. In this paper, we describe a non-iterative fuzzy clustering algorithm (FCP: fuzzy clustering with fixed prototypes) for characterizing inter-subject variability in between-subject or second-level analyses of fMRI data. The approach identifies the contribution of each subject to response profiles in voxels surviving a classical F-statistic criterion. The output identifies subjects who drive activation in specific cortical regions (local effects) or in voxels distributed across neural systems (global effects). The sensitivity of the approach was assessed in 38 normal subjects performing an overt naming task. FCP revealed that several subjects had either abnormally high or abnormally low responses. FCP may be particularly useful for characterizing outlier responses in rare patients or heterogeneous populations. In such cases, atypical activations may not be detected by standard tests that rely on parametric assumptions. The advantage of using FCP is that it searches all voxels systematically and can identify atypical activation patterns in a quantitative and unsupervised manner. PMID:17478103
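Because the prototypes are fixed rather than re-estimated, fuzzy memberships can be computed in a single pass with no iteration. The sketch below uses the standard fuzzy c-means membership formula with fuzzifier m = 2, which is an assumption. The abstract does not give the paper's prototype definitions or distance, and the prototypes and subject profile here are hypothetical.

```python
import math

def fixed_prototype_memberships(profile, prototypes, m=2.0):
    """Fuzzy c-means memberships of one response profile to fixed prototypes (no iteration).

    m is the fuzzifier; m = 2 is an assumed, commonly used value.
    """
    dists = [math.dist(profile, p) for p in prototypes]
    zeros = dists.count(0.0)
    if zeros:  # the profile coincides with one or more prototypes
        return [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((dk / dj) ** exp for dj in dists) for dk in dists]

# Hypothetical prototypes over three conditions: typical, abnormally high, abnormally low.
prototypes = {"typical": (1.0, 1.0, 1.0), "high": (3.0, 3.0, 3.0), "low": (0.0, 0.0, 0.0)}
subject = (2.8, 3.1, 2.9)  # hypothetical subject response profile
u = fixed_prototype_memberships(subject, list(prototypes.values()))
for name, value in zip(prototypes, u):
    print(f"{name}: {value:.3f}")
```

The memberships always sum to one, so a subject whose largest membership is to an "abnormal" prototype stands out immediately as a candidate outlier.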
Estimation of cylinder orientation in three-dimensional point cloud using angular distance-based optimization

NASA Astrophysics Data System (ADS)

Su, Yun-Ting; Hu, Shuowen; Bethel, James S.

2017-05-01

Light detection and ranging (LIDAR) has become a widely used tool in remote sensing for mapping, surveying, modeling, and a host of other applications. The motivation behind this work is the modeling of piping systems in industrial sites, where cylinders are the most common primitive or shape. We focus on cylinder parameter estimation in three-dimensional point clouds, proposing a mathematical formulation based on angular distance to determine the cylinder orientation. We demonstrate the accuracy and robustness of the technique on synthetically generated cylinder point clouds (where the true axis orientation is known) as well as on real LIDAR data of piping systems. The proposed algorithm is compared with a discrete space Hough transform-based approach as well as a continuous space inlier approach, which iteratively discards outlier points to refine the cylinder parameter estimates. Results show that the proposed method is more computationally efficient than the Hough transform approach and is more accurate than both the Hough transform approach and the inlier method.
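One way to turn an angular-distance criterion into code relies on a simple geometric fact, assuming oriented surface normals are available. On an ideal cylinder every normal is perpendicular to the axis, so a candidate axis can be scored by how far each normal's angle to it departs from 90 degrees. The sketch minimizes the sum of squared angular deviations by brute-force search over the upper hemisphere. The cost, the grid resolution and the search strategy are assumptions, not the authors' formulation.

```python
import math

def angle_between(u, v):
    """Angle in radians between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / (nu * nv))))

def estimate_axis(normals, steps=45):
    """Return the unit direction whose angles to all normals are closest to 90 degrees."""
    best, best_cost = None, float("inf")
    for i in range(steps + 1):              # polar angle 0..90 degrees (upper hemisphere)
        theta = (math.pi / 2) * i / steps
        for j in range(4 * steps):          # azimuth 0..360 degrees
            phi = 2 * math.pi * j / (4 * steps)
            axis = (math.sin(theta) * math.cos(phi),
                    math.sin(theta) * math.sin(phi),
                    math.cos(theta))
            # Sum of squared deviations from perpendicularity (angular distance cost)
            cost = sum((angle_between(n, axis) - math.pi / 2) ** 2 for n in normals)
            if cost < best_cost:
                best, best_cost = axis, cost
    return best

# Hypothetical normals sampled around a cylinder whose axis is the z axis.
normals = [(math.cos(2 * math.pi * k / 12), math.sin(2 * math.pi * k / 12), 0.0) for k in range(12)]
print(estimate_axis(normals))
```

Because the axis direction is only defined up to sign, searching the upper hemisphere is enough. A real implementation would replace the grid with a continuous optimizer and would need robust handling of noisy or outlying normals.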
The Influence Function of Principal Component Analysis by Self-Organizing Rule.

PubMed

Higuchi; Eguchi

1998-07-28

This article is concerned with a neural network approach to principal component analysis (PCA). An algorithm for PCA by the self-organizing rule has been proposed, and its robustness was observed in a simulation study by Xu and Yuille (1995). In this article, the robustness of the algorithm against outliers is investigated using the theory of the influence function. The influence function of the principal component vector is given in an explicit form. Through this expression, the method is shown to be robust against any directions orthogonal to the principal component vector. In addition, a statistic generated by the self-organizing rule is proposed to assess the influence of data in PCA.
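For readers unfamiliar with self-organizing PCA, the sketch below implements Oja's single-neuron rule, w ← w + η y (x − y w) with y = wᵀx, which extracts the first principal direction of zero-mean data. It is the classical self-organizing rule, used here as a stand-in: the robust Xu-Yuille variant analysed in the article modifies this update. The learning rate, the number of epochs and the synthetic data are assumptions.

```python
import math
import random

def oja_first_component(data, lr=0.005, epochs=50, seed=0):
    """Estimate the first principal direction of zero-mean data with Oja's rule."""
    rng = random.Random(seed)
    dim = len(data[0])
    w = [rng.uniform(-1, 1) for _ in range(dim)]
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # one shuffled pass over the data
            y = sum(wi * xi for wi, xi in zip(w, x))
            # Hebbian term y*x minus the decay y^2*w that keeps ||w|| close to 1
            w = [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w]

# Synthetic, approximately zero-mean 2D data stretched along (1, 1)/sqrt(2).
rng = random.Random(1)
data = []
for _ in range(200):
    s, t = rng.gauss(0, 2.0), rng.gauss(0, 0.3)
    data.append(((s - t) / math.sqrt(2), (s + t) / math.sqrt(2)))
w = oja_first_component(data)
print(w)
```

Each update moves w only along the current sample and the current w. An outlier pulls w mainly through its projection y, which is why robustness can be studied by how a single point perturbs the learned direction.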
Cross-visit tumor sub-segmentation and registration with outlier rejection for dynamic contrast-enhanced MRI time series data.

PubMed

Buonaccorsi, G A; Rose, C J; O'Connor, J P B; Roberts, C; Watson, Y; Jackson, A; Jayson, G C; Parker, G J M

2010-01-01

Clinical trials of anti-angiogenic and vascular-disrupting agents often use biomarkers derived from DCE-MRI, typically reporting whole-tumor summary statistics and so overlooking spatial parameter variations caused by tissue heterogeneity. We present a data-driven segmentation method comprising tracer-kinetic model-driven registration for motion correction, conversion from MR signal intensity to contrast agent concentration for cross-visit normalization, iterative principal components analysis for imputation of missing data and dimensionality reduction, and statistical outlier detection using the minimum covariance determinant to obtain a robust Mahalanobis distance. After applying these techniques we cluster in the principal components space using k-means. We present results from a clinical trial of a VEGF inhibitor, using time-series data selected because of problems due to motion and outlier time series. We obtained spatially-contiguous clusters that map to regions with distinct microvascular characteristics. This methodology has the potential to uncover localized effects in trials using DCE-MRI-based biomarkers.
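The robust Mahalanobis distance based on the minimum covariance determinant can be illustrated in two dimensions with the concentration steps (C-steps) of the FAST-MCD algorithm:

1. fit the mean and covariance on a subset of h points;
2. recompute the distances of all points;
3. keep the h closest points;
4. repeat until the subset stops changing.

This sketch makes three assumptions. First, h = ⌈0.75 n⌉. Second, there is a single start from the points nearest the coordinate-wise median. Third, the cutoff is the chi-square (2 degrees of freedom) 97.5% quantile, −2 ln 0.025 ≈ 7.378. FAST-MCD proper uses many random starts and a consistency correction, both of which are omitted here, and the data are hypothetical.

```python
import math
from statistics import median

def mean_cov(points):
    """Sample mean and 2x2 covariance (sxx, sxy, syy) of a list of (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis_sq(p, mean, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

def mcd_outliers(points, h_frac=0.75, max_iter=50):
    """Flag outliers with a single-start, MCD-style estimate (C-steps only)."""
    n = len(points)
    h = math.ceil(h_frac * n)
    med = (median(p[0] for p in points), median(p[1] for p in points))
    subset = sorted(range(n), key=lambda i: math.dist(points[i], med))[:h]
    for _ in range(max_iter):
        mean, cov = mean_cov([points[i] for i in subset])
        d2 = [mahalanobis_sq(p, mean, cov) for p in points]
        new_subset = sorted(range(n), key=lambda i: d2[i])[:h]
        if set(new_subset) == set(subset):  # C-steps have converged
            break
        subset = new_subset
    cutoff = -2 * math.log(0.025)  # chi-square (2 df) 97.5% quantile
    return [i for i in range(n) if d2[i] > cutoff]

# Hypothetical 2D scores: a 5 x 5 grid of regular points plus one gross outlier.
points = [(0.5 * x, 0.5 * y) for x in range(-2, 3) for y in range(-2, 3)] + [(10.0, 10.0)]
print(mcd_outliers(points))
```

A classical mean and covariance would be inflated by the outlier itself. Estimating them from the h most concentrated points instead keeps the outlier's distance large.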
A computational study on outliers in world music.

PubMed

Panteli, Maria; Benetos, Emmanouil; Dixon, Simon

2017-01-01

The comparative analysis of world music cultures has been the focus of several ethnomusicological studies in the last century. With the advances of Music Information Retrieval and the increased accessibility of sound archives, large-scale analysis of world music with computational tools is today feasible. We investigate music similarity in a corpus of 8200 recordings of folk and traditional music from 137 countries around the world. In particular, we aim to identify music recordings that are most distinct compared to the rest of our corpus. We refer to these recordings as 'outliers'. We use signal processing tools to extract music information from audio recordings, data mining to quantify similarity and detect outliers, and spatial statistics to account for geographical correlation. Our findings suggest that Botswana is the country with the most distinct recordings in the corpus and China is the country with the most distinct recordings when considering spatial correlation. Our analysis includes a comparison of musical attributes and styles that contribute to the 'uniqueness' of the music of each country.
A RLS-SVM Aided Fusion Methodology for INS during GPS Outages

PubMed Central

Yao, Yiqing; Xu, Xiaosu

2017-01-01

In order to maintain a relatively high accuracy of navigation performance during global positioning system (GPS) outages, a novel robust least squares support vector machine (LS-SVM)-aided fusion methodology is explored to provide pseudo-GPS position information for the inertial navigation system (INS). The relationship between the yaw, specific force, velocity, and the position increment is modeled. Rather than giving all data the same weight, as in the traditional LS-SVM, the proposed algorithm allocates different weights to different data, which makes the system immune to outliers. Field test data were collected to evaluate the proposed algorithm. The comparison results indicate that the proposed algorithm can effectively provide position corrections for standalone INS during the 300 s GPS outage, outperforming the traditional LS-SVM method. Historical information is also involved to better represent the vehicle dynamics. PMID:28245549
A RLS-SVM Aided Fusion Methodology for INS during GPS Outages.

PubMed

Yao, Yiqing; Xu, Xiaosu

2017-02-24

In order to maintain a relatively high accuracy of navigation performance during global positioning system (GPS) outages, a novel robust least squares support vector machine (LS-SVM)-aided fusion methodology is explored to provide pseudo-GPS position information for the inertial navigation system (INS). The relationship between the yaw, specific force, velocity, and the position increment is modeled. Rather than giving all data the same weight, as in the traditional LS-SVM, the proposed algorithm allocates different weights to different data, which makes the system immune to outliers. Field test data were collected to evaluate the proposed algorithm. The comparison results indicate that the proposed algorithm can effectively provide position corrections for standalone INS during the 300 s GPS outage, outperforming the traditional LS-SVM method. Historical information is also involved to better represent the vehicle dynamics.
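The sketch below illustrates how per-sample weights can make an LS-SVM less sensitive to outliers. It computes the residual-based weights of the classical weighted LS-SVM of Suykens and colleagues:

1. take the residuals from an unweighted fit;
2. scale them by a robust standard deviation, 1.483 × MAD;
3. assign a weight of 1 to samples with small scaled residuals, a weight that falls linearly between c1 = 2.5 and c2 = 3, and a floor of 1e-4 beyond that.

Whether the RLS-SVM in this paper uses the same scheme is an assumption, and the residuals are hypothetical. In the weighted LS-SVM, each weight vᵢ enters the linear system through the per-sample regularization term 1/(γ vᵢ).

```python
from statistics import median

def robust_weights(residuals, c1=2.5, c2=3.0, floor=1e-4):
    """Residual-based sample weights in the style of the weighted LS-SVM."""
    med = median(residuals)
    s = 1.483 * median(abs(e - med) for e in residuals)  # robust scale (1.483 x MAD)
    weights = []
    for e in residuals:
        r = abs(e) / s if s > 0 else 0.0  # scaled residual |e / s|
        if r <= c1:
            weights.append(1.0)                       # ordinary sample: full weight
        elif r <= c2:
            weights.append((c2 - r) / (c2 - c1))      # suspicious: linearly down-weighted
        else:
            weights.append(floor)                     # outlier: almost ignored
    return weights

# Hypothetical residuals (m) of predicted position increments from an unweighted fit.
residuals = [0.02, -0.01, 0.03, -0.02, 0.01, 0.00, -0.03, 0.08, 0.60]
w = robust_weights(residuals)
print([round(v, 4) for v in w])
```

After the weights are computed, the LS-SVM is solved once more with them, so the outlying sample barely influences the regression.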
Analyzing contentious relationships and outlier genes in phylogenomics.

PubMed

Walker, Joseph F; Brown, Joseph W; Smith, Stephen A

2018-06-08

Recent studies have demonstrated that conflict is common among gene trees in phylogenomic studies, and that less than one percent of genes may ultimately drive species tree inference in supermatrix analyses. Here, we examined two datasets where supermatrix and coalescent-based species trees conflict. We identified two highly influential "outlier" genes in each dataset. When removed from each dataset, the inferred supermatrix trees matched the topologies obtained from coalescent analyses. We also demonstrate that, while the outlier genes in the vertebrate dataset have been shown in a previous study to be the result of errors in orthology detection, the outlier genes from a plant dataset did not exhibit any obvious systematic error and therefore may be the result of some biological process yet to be determined. While topological comparisons among a small set of alternate topologies can be helpful in discovering outlier genes, they can be limited in several ways, such as assuming all genes share the same topology. Coalescent species tree methods relax this assumption but do not explicitly facilitate the examination of specific edges. Coalescent methods often also assume that conflict is the result of incomplete lineage sorting (ILS). Here we explored a framework that allows for quickly examining alternative edges and support for large phylogenomic datasets that does not assume a single topology for all genes. For both datasets, these analyses provided detailed results confirming the support for coalescent-based topologies. This framework suggests that we can improve our understanding of the underlying signal in phylogenomic datasets by asking more targeted edge-based questions.
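Per-gene log-likelihoods under two competing topologies show how a handful of genes can drive a supermatrix result. The supermatrix prefers whichever topology has the larger summed lnL, so the genes with the largest differences dominate that sum. The sketch below removes genes supporting the currently preferred topology, most extreme first, and reports the smallest such set that flips the preference. The gene names and values are hypothetical, and this is an illustration of the idea rather than the authors' pipeline.

```python
def flipping_genes(delta_lnl):
    """Given per-gene lnL(A) - lnL(B), return the fewest genes (most extreme first)
    whose removal reverses the sign of the summed difference, or None if impossible."""
    original = sum(delta_lnl.values())
    # Only genes that support the currently preferred topology can reverse it when removed
    supporting = sorted(
        (g for g, d in delta_lnl.items() if d * original > 0),
        key=lambda g: abs(delta_lnl[g]),
        reverse=True,
    )
    remaining, removed = original, []
    for gene in supporting:
        removed.append(gene)
        remaining -= delta_lnl[gene]
        if remaining * original < 0:  # preference has flipped from A to B (or B to A)
            return removed
    return None

# Hypothetical per-gene differences: most genes weakly favour B, two genes strongly favour A.
delta = {"g1": -3.2, "g2": -1.5, "g3": -2.8, "g4": 48.0, "g5": 35.5,
         "g6": -4.1, "g7": -6.0, "g8": -5.4, "g9": -2.2}
print(flipping_genes(delta))
```

Here the summed difference favours topology A even though seven of nine genes favour B, which is the pattern described above for highly influential "outlier" genes.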
[Application of Stata software to test heterogeneity in meta-analysis method].

PubMed

Wang, Dan; Mou, Zhen-yun; Zhai, Jun-xia; Zong, Hong-xia; Zhao, Xiao-dong

2008-07-01

To introduce the application of Stata software to heterogeneity testing in meta-analysis. A data set was set up according to the example in the study, and the corresponding commands of the methods in Stata 9 software were applied to test the example. The methods used were the Q-test and I2 statistic attached to the fixed-effect model forest plot, the H statistic and the Galbraith plot. The existence of heterogeneity among studies could be detected by the Q-test and the H statistic, and its degree by the I2 statistic. The outliers that were the sources of the heterogeneity could be spotted from the Galbraith plot. Heterogeneity testing in meta-analysis can be completed simply and quickly with these four methods in Stata software. Among the four methods, the H and I2 statistics are more robust, and the outliers contributing to heterogeneity can be clearly seen in the Galbraith plot.
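The same four quantities can be reproduced outside Stata. The sketch below computes:

- Cochran's Q from inverse-variance weights;
- I² = max(0, (Q − df)/Q);
- H = √(Q/df);
- the Galbraith residuals (θᵢ − θ̂)/SEᵢ, i.e. the vertical distance of each point (1/SE, θ/SE) from the line through the origin whose slope is the fixed-effect pooled estimate θ̂.

Studies more than 2 units from that line are flagged. The study effects are hypothetical log odds ratios, and the ±2 band is the conventional choice, assumed here.

```python
import math

def heterogeneity(effects, ses):
    """Cochran's Q, I^2, H and Galbraith-plot outliers for an inverse-variance meta-analysis."""
    w = [1 / s ** 2 for s in ses]
    pooled = sum(wi * e for wi, e in zip(w, effects)) / sum(w)  # fixed-effect estimate
    # Galbraith residual: standardized estimate minus the line through the origin with slope = pooled
    resid = [(e - pooled) / s for e, s in zip(effects, ses)]
    q = sum(r ** 2 for r in resid)  # Cochran's Q equals the sum of squared Galbraith residuals
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    h = math.sqrt(q / df)
    outliers = [i for i, r in enumerate(resid) if abs(r) > 2]
    return pooled, q, i2, h, outliers

effects = [0.20, 0.25, 0.18, 0.22, 1.10]  # hypothetical log odds ratios
ses = [0.10, 0.12, 0.09, 0.11, 0.15]
pooled, q, i2, h, outliers = heterogeneity(effects, ses)
print(f"Q={q:.2f} I2={i2:.0%} H={h:.2f} outliers={outliers}")
```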

Ischemia episode detection in ECG using kernel density estimation, support vector machine and feature selection

PubMed Central

2012-01-01

Background Myocardial ischemia can develop into more serious diseases. Detecting the ischemic syndrome in the electrocardiogram (ECG) early, accurately and automatically can prevent it from developing into a catastrophic disease. To this end, we propose a new method, which employs wavelets and simple feature selection. Methods For training and testing, the European ST-T database is used, which comprises 367 ischemic ST episodes in 90 records. We first remove baseline wandering and detect the time positions of QRS complexes by a method based on the discrete wavelet transform. Next, for each heart beat, we extract three features that can be used to differentiate ST episodes from normal beats: 1) the area between the QRS offset and T-peak points, 2) the normalized and signed sum from the QRS offset to the effective zero-voltage point, and 3) the slope from the QRS onset to the offset point. We average the feature values over five successive beats to reduce the effects of outliers. Finally, we apply classifiers to those features. Results We evaluated the algorithm with kernel density estimation (KDE) and support vector machine (SVM) methods. Sensitivity and specificity for KDE were 0.939 and 0.912, respectively. The KDE classifier detects 349 ischemic ST episodes out of a total of 367 ST episodes. Sensitivity and specificity for SVM were 0.941 and 0.923, respectively. The SVM classifier detects 355 ischemic ST episodes. Conclusions We proposed a new method for detecting ischemia in ECG. It contains signal processing techniques for removing baseline wandering and detecting the time positions of QRS complexes by discrete wavelet transform, and explicit feature extraction from the morphology of ECG waveforms. It was shown that the selected features were sufficient to discriminate ischemic ST episodes from normal ones. We also showed how the proposed KDE classifier can automatically select kernel bandwidths, meaning that the algorithm does not require any numerical parameter values to be supplied in advance. In the case of the SVM classifier, one has to select a single parameter. PMID:22703641
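Two steps are easy to isolate and can be sketched in Python: averaging beat-level features over five successive beats, and classifying with class-conditional kernel density estimates. The following are all assumptions:

- the Gaussian kernel;
- Silverman's rule-of-thumb bandwidth (the abstract says only that the bandwidths are selected automatically, not how);
- equal class priors;
- the one-dimensional toy feature values.

```python
import math
from statistics import stdev

def moving_average(values, window=5):
    """Average a beat-level feature over `window` successive beats to damp outlier beats."""
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]

def silverman_bandwidth(samples):
    """Rule-of-thumb bandwidth for a 1D Gaussian KDE (an assumed selection rule)."""
    return 1.06 * stdev(samples) * len(samples) ** (-1 / 5)

def kde(x, samples, bw):
    """Gaussian kernel density estimate at x."""
    norm = 1 / (len(samples) * bw * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / bw) ** 2) for s in samples)

def classify(x, normal, ischemic):
    """Pick the class with the larger estimated density (equal priors assumed)."""
    p_norm = kde(x, normal, silverman_bandwidth(normal))
    p_isch = kde(x, ischemic, silverman_bandwidth(ischemic))
    return "ischemic" if p_isch > p_norm else "normal"

# Hypothetical ST-area feature values from labelled training beats.
normal = [0.10, 0.12, 0.09, 0.11, 0.13, 0.10]
ischemic = [0.30, 0.35, 0.28, 0.33, 0.31, 0.36]
beats = [0.31, 0.29, 0.90, 0.33, 0.30, 0.32]  # one outlier beat (0.90)
for avg in moving_average(beats):
    print(round(avg, 3), classify(avg, normal, ischemic))
```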
Flexible mixture modeling via the multivariate t distribution with the Box-Cox transformation: an alternative to the skew-t distribution

PubMed Central

Lo, Kenneth

2011-01-01

Cluster analysis is the automated search for groups of homogeneous observations in a data set. A popular modeling approach for clustering is based on finite normal mixture models, which assume that each cluster is modeled as a multivariate normal distribution. However, the normality assumption that each component is symmetric is often unrealistic. Furthermore, normal mixture models are not robust against outliers; they often require extra components for modeling outliers and/or give a poor representation of the data. To address these issues, we propose a new class of distributions, multivariate t distributions with the Box-Cox transformation, for mixture modeling. This class of distributions generalizes the normal distribution with the more heavy-tailed t distribution, and introduces skewness via the Box-Cox transformation. As a result, this provides a unified framework to simultaneously handle outlier identification and data transformation, two interrelated issues. We describe an Expectation-Maximization algorithm for parameter estimation along with transformation selection. We demonstrate the proposed methodology with three real data sets and simulation studies. Compared with a wealth of approaches including the skew-t mixture model, the proposed t mixture model with the Box-Cox transformation performs favorably in terms of accuracy in the assignment of observations, robustness against model misspecification, and selection of the number of components. PMID:22125375
Flexible mixture modeling via the multivariate t distribution with the Box-Cox transformation: an alternative to the skew-t distribution.

PubMed

Lo, Kenneth; Gottardo, Raphael

2012-01-01

Cluster analysis is the automated search for groups of homogeneous observations in a data set. A popular modeling approach for clustering is based on finite normal mixture models, which assume that each cluster is modeled as a multivariate normal distribution. However, the normality assumption that each component is symmetric is often unrealistic. Furthermore, normal mixture models are not robust against outliers; they often require extra components for modeling outliers and/or give a poor representation of the data. To address these issues, we propose a new class of distributions, multivariate t distributions with the Box-Cox transformation, for mixture modeling. This class of distributions generalizes the normal distribution with the more heavy-tailed t distribution, and introduces skewness via the Box-Cox transformation. As a result, this provides a unified framework to simultaneously handle outlier identification and data transformation, two interrelated issues. We describe an Expectation-Maximization algorithm for parameter estimation along with transformation selection. We demonstrate the proposed methodology with three real data sets and simulation studies. Compared with a wealth of approaches including the skew-t mixture model, the proposed t mixture model with the Box-Cox transformation performs favorably in terms of accuracy in the assignment of observations, robustness against model misspecification, and selection of the number of components.
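Two ingredients of the model are easy to show in isolation:

- the Box-Cox transformation, (y^λ − 1)/λ for λ ≠ 0 and log y for λ = 0 (defined for y > 0);
- the E-step weight of a t component, u = (ν + p)/(ν + δ), where δ is the squared Mahalanobis distance of an observation to the component.

The weight shrinks for observations far from the component, which is how the t model down-weights outliers. The sketch below is one-dimensional (p = 1), and ν, μ, σ and λ are hypothetical values rather than fitted ones.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a positive value y."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    return math.log(y) if lam == 0 else (y ** lam - 1) / lam

def t_weight(x, mu, sigma, nu):
    """E-step weight u = (nu + 1) / (nu + delta) for a univariate t component."""
    delta = ((x - mu) / sigma) ** 2  # squared Mahalanobis distance in 1D
    return (nu + 1) / (nu + delta)

lam, mu, sigma, nu = 0.5, 2.0, 0.5, 4.0  # hypothetical parameter values
for y in [3.0, 4.0, 5.0, 40.0]:
    z = box_cox(y, lam)
    print(y, round(z, 3), round(t_weight(z, mu, sigma, nu), 3))
```

In a full EM fit, these weights multiply each observation's contribution to the component mean and scale updates, so a value like y = 40 barely moves the estimates.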
High Performance Proactive Digital Forensics

NASA Astrophysics Data System (ADS)

Alharbi, Soltan; Moa, Belaid; Weber-Jahnke, Jens; Traore, Issa

2012-10-01

With the increase in the number of digital crimes and in their sophistication, High Performance Computing (HPC) is becoming a must in Digital Forensics (DF). According to the FBI annual report, the size of data processed during the 2010 fiscal year reached 3,086 TB (compared to 2,334 TB in 2009), and the number of agencies that requested Regional Computer Forensics Laboratory assistance increased from 689 in 2009 to 722 in 2010. Since most investigation tools are both I/O and CPU bound, next-generation DF tools need to be distributed and offer HPC capabilities. The need for HPC is even more evident when investigating crimes on clouds, or when performing proactive DF analysis and on-site investigation, which require semi-real-time processing. Although overcoming the performance challenge is a major goal in DF, as far as we know there is almost no research on HPC-DF apart from a few papers. In this work, we therefore extend our earlier work on the need for a proactive system and present a high performance automated proactive digital forensic system. The most expensive phase of the system, namely proactive analysis and detection, uses a parallel extension of the iterative z algorithm. The system also implements new parallel information-based outlier detection algorithms to proactively and forensically handle suspicious activities. To analyse a large number of targets and events, and to do so continuously (to capture the dynamics of the system), we rely on a multi-resolution approach to explore the digital forensic space. A data set from the 2001 Honeynet Forensic Challenge is used to evaluate the system from DF and HPC perspectives.
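The abstract names "the iterative z algorithm" without describing it. We assume it is the common iterative z-score procedure, and that reading is our assumption. The procedure works in passes:

1. Compute the mean and standard deviation of the values that remain.
2. Flag every value whose |z| exceeds a threshold.
3. Remove the flagged values and repeat until no new values are flagged.

Iterating matters because one extreme value inflates the standard deviation and can mask other outliers in the first pass. The serial sketch below uses a hypothetical function name and an assumed threshold of 3. It says nothing about the authors' parallel extension.

```python
import statistics

def iterative_z_outliers(values, threshold=3.0, max_iter=100):
    """Repeatedly flag points with |z| > threshold, recomputing the mean and
    (population) standard deviation on the surviving points after each pass."""
    remaining = list(values)
    outliers = []
    for _ in range(max_iter):
        if len(remaining) < 3:
            break
        mean = statistics.fmean(remaining)
        sd = statistics.pstdev(remaining)
        if sd == 0:
            break
        flagged = [v for v in remaining if abs(v - mean) / sd > threshold]
        if not flagged:
            break  # converged: nothing new exceeds the threshold
        outliers.extend(flagged)
        remaining = [v for v in remaining if abs(v - mean) / sd <= threshold]
    return outliers, remaining
```

Usage: `iterative_z_outliers([10.0] * 20 + [11.0] * 20 + [50.0])` flags `50.0` on the first pass. On the second pass no point exceeds the threshold, so the loop stops.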
Reduction of ZTD outliers through improved GNSS data processing and screening strategies

NASA Astrophysics Data System (ADS)

Stepniak, Katarzyna; Bock, Olivier; Wielgosz, Pawel

2018-03-01

Though Global Navigation Satellite System (GNSS) data processing has been significantly improved over the years, it is still commonly observed that zenith tropospheric delay (ZTD) estimates contain many outliers which are detrimental to meteorological and climatological applications. In this paper, we show that ZTD outliers in double-difference processing are mostly caused by sub-daily data gaps at reference stations, which cause disconnections of clusters of stations from the reference network and common mode biases due to the strong correlation between stations in short baselines. They can reach a few centimetres in ZTD and usually coincide with a jump in formal errors. The magnitude and sign of these biases are impossible to predict because they depend on different errors in the observations and on the geometry of the baselines. We elaborate and test a new baseline strategy which solves this problem and significantly reduces the number of outliers compared to the standard strategy commonly used for positioning (e.g. determination of national reference frame) in which the pre-defined network is composed of a skeleton of reference stations to which secondary stations are connected in a star-like structure. The new strategy is also shown to perform better than the widely used strategy maximizing the number of observations available in many GNSS programs. The reason is that observations are maximized before processing, whereas the final number of used observations can be dramatically lower because of data rejection (screening) during the processing. The study relies on the analysis of 1 year of GPS (Global Positioning System) data from a regional network of 136 GNSS stations processed using Bernese GNSS Software v.5.2. A post-processing screening procedure is also proposed to detect and remove a few outliers which may still remain due to short data gaps. It is based on a combination of range checks and outlier checks of ZTD and formal errors. 
The accuracy of the final screened GPS ZTD estimates is assessed by comparison to ERA-Interim reanalysis.
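The post-processing screening described above combines range checks and outlier checks on both the ZTD and its formal error. The abstract gives neither the thresholds nor the outlier statistic. The sketch below therefore assumes a median/MAD robust z-score and uses illustrative limits. The function names and every numeric default are hypothetical, and values are in metres.

```python
import statistics

def robust_z(series):
    """|x - median| / (1.4826 * MAD); all zeros when the MAD is zero."""
    med = statistics.median(series)
    mad = statistics.median([abs(x - med) for x in series])
    if mad == 0:
        return [0.0] * len(series)
    return [abs(x - med) / (1.4826 * mad) for x in series]

def screen_ztd(ztd, sigma, ztd_range=(1.0, 3.0), sigma_max=0.01, k=5.0):
    """Keep/reject flag per epoch for a ZTD series and its formal errors (m).

    Range checks: ZTD inside ztd_range and formal error <= sigma_max.
    Outlier checks: robust z-score of the ZTD and of the formal error <= k.
    All thresholds are illustrative assumptions, not values from the paper.
    """
    z_ztd = robust_z(ztd)
    z_sig = robust_z(sigma)
    keep = []
    for z, s, a, b in zip(ztd, sigma, z_ztd, z_sig):
        in_range = ztd_range[0] <= z <= ztd_range[1] and s <= sigma_max
        keep.append(in_range and a <= k and b <= k)
    return keep
```

The median and MAD are used instead of the mean and standard deviation so that the few outliers being hunted do not distort the reference level.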
Shear wave speed recovery in transient elastography and supersonic imaging using propagating fronts

NASA Astrophysics Data System (ADS)

McLaughlin, Joyce; Renzi, Daniel

2006-04-01

Transient elastography and supersonic imaging are promising new techniques for characterizing the elasticity of soft tissues. In these techniques, an 'ultrafast imaging' system (up to 10,000 frames s⁻¹) follows in real time the propagation of a low-frequency shear wave. The displacement of the propagating shear wave is measured as a function of time and space. The objective of this paper is to develop and test algorithms whose ultimate product is images of the shear wave speed of tissue-mimicking phantoms. The data used in the algorithms are the front of the propagating shear wave. Here, we first develop techniques to find the arrival time surface given the displacement data from a transient elastography experiment. The arrival time surface satisfies the Eikonal equation. We then propose a family of methods, called distance methods, to solve the inverse Eikonal equation: given the arrival times of a propagating wave, find the wave speed. Lastly, we explain why simple inversion schemes for the inverse Eikonal equation lead to large outliers in the wave speed, and we numerically demonstrate that the new scheme presented here does not produce any large outliers. We exhibit two recoveries using these methods: one with synthetic data, the other with laboratory data obtained by Mathias Fink's group (the Laboratoire Ondes et Acoustique, ESPCI, Université Paris VII).
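The Eikonal relation referred to above is |∇T(x)| = 1/c(x), where T is the arrival time and c is the shear wave speed. The simplest inversion differentiates T numerically and sets c = 1/|∇T| point by point.

The sketch below shows only that naive scheme. It uses Python with NumPy, the function name is hypothetical, and it is not one of the paper's distance methods. Because it divides by a numerical derivative of noisy data, small errors in T can produce very large values of c wherever the estimated gradient becomes small. That is one plausible source of the outliers the authors describe.

```python
import numpy as np

def naive_speed(T, dx):
    """Pointwise inversion of the Eikonal equation |grad T| = 1 / c.

    T is an arrival-time array on a uniform grid with spacing dx (1-D or N-D).
    np.gradient uses central differences inside and one-sided ones at the edges.
    """
    T = np.asarray(T, dtype=float)
    grads = np.gradient(T, dx)
    if T.ndim == 1:
        grads = [grads]  # np.gradient returns a bare array for 1-D input
    grad_norm = np.sqrt(sum(g ** 2 for g in grads))
    with np.errstate(divide="ignore"):
        return 1.0 / grad_norm
```

On noise-free arrival times of a plane wave, T = x / 3 (speed 3 m/s), the scheme recovers c = 3 everywhere. Adding even small random perturbations to T makes the recovered speeds scatter, with the largest deviations at points where the noisy gradient happens to be small.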
Prospective clinical validation of independent DVH prediction for plan QA in automatic treatment planning for prostate cancer patients.

PubMed

Wang, Yibing; Heijmen, Ben J M; Petit, Steven F

2017-12-01

To prospectively investigate the use of an independent DVH prediction tool to detect outliers in the quality of fully automatically generated treatment plans for prostate cancer patients. A plan QA tool was developed to predict rectum, anus and bladder DVHs, based on overlap volume histograms and principal component analysis (PCA). The tool was trained with 22 automatically generated clinical plans and independently validated with 21 plans. Its use was prospectively investigated for 50 new plans by replanning in case of detected outliers. For rectum Dmean, V65Gy and V75Gy, anus Dmean, and bladder Dmean, the difference between predicted and achieved values was within 0.4 Gy or 0.3% (SD within 1.8 Gy or 1.3%). Thirteen detected outliers were replanned, leading to moderate but statistically significant improvements (mean, max): rectum Dmean (1.3 Gy, 3.4 Gy), V65Gy (2.7%, 4.2%), anus Dmean (1.6 Gy, 6.9 Gy), and bladder Dmean (1.5 Gy, 5.1 Gy). The rectum V75Gy of the new plans slightly increased (0.2%, p = 0.087). A high-accuracy DVH prediction tool was developed and used for independent QA of automatically generated plans. In 28% of plans, minor dosimetric deviations were observed that could be improved by plan adjustments. Larger gains are expected for manually generated plans. Copyright © 2017 Elsevier B.V. All rights reserved.
Prospective casemix-based funding, analysis and financial impact of cost outliers in all-patient refined diagnosis related groups in three Belgian general hospitals.

PubMed

Pirson, Magali; Martins, Dimitri; Jackson, Terri; Dramaix, Michèle; Leclercq, Pol

2006-03-01

This study examined the impact of cost outliers in terms of hospital resource consumption, the financial impact of the outliers under the Belgian casemix-based system, and the validity of two "proxies" for costs: length of stay and charges. The costs of all hospital stays at three Belgian general hospitals were calculated for the year 2001. High resource use outliers were selected according to the following rule: 75th percentile + 1.5 × interquartile range. The frequency of cost outliers varied from 7% to 8% across hospitals. Explanatory factors were major or extreme severity of illness, longer length of stay, and intensive care unit stay. Cost outliers account for 22-30% of hospital costs. One-third of length-of-stay outliers are not cost outliers, and nearly one-quarter of charge outliers are not cost outliers. The current funding system in Belgium does not penalize hospitals having a high percentage of outliers: the billing generated by these patients largely compensates for the costs they generate. Length of stay and charges are not good proxies for selecting cost outliers.
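The selection rule above is the upper Tukey fence, Q3 + 1.5 × IQR. A minimal sketch follows, using the Python standard library and a hypothetical function name.

Quartile conventions differ between software packages, and the abstract does not say which one was used. `statistics.quantiles` with `method="inclusive"` interpolates linearly between order statistics. The abstract also does not say whether the rule was applied within each diagnosis group or to all stays at once, so the sketch simply takes one list of costs.

```python
import statistics

def high_cost_outliers(costs, k=1.5):
    """Return the fence Q3 + k * IQR and the costs lying above it (k = 1.5 in the study)."""
    q1, _, q3 = statistics.quantiles(costs, n=4, method="inclusive")
    fence = q3 + k * (q3 - q1)
    return fence, [c for c in costs if c > fence]
```

For the costs `[1, 2, 3, 4, 100]` the quartiles are 2 and 4, so the fence is 4 + 1.5 × 2 = 7 and only the stay costing 100 is flagged.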
Robust estimation of partially linear models for longitudinal data with dropouts and measurement error.

PubMed

Qin, Guoyou; Zhang, Jiajia; Zhu, Zhongyi; Fung, Wing

2016-12-20

Outliers, measurement error, and missing data are common in longitudinal data because of the way such data are collected. However, no existing method addresses all three of these issues simultaneously. This paper focuses on the robust estimation of partially linear models for longitudinal data with dropouts and measurement error. A new robust estimating equation, simultaneously tackling outliers, measurement error, and missingness, is proposed. The asymptotic properties of the proposed estimator are established under some regularity conditions. The proposed method is easy to implement in practice by utilizing existing standard generalized estimating equations algorithms. Comprehensive simulation studies show the strength of the proposed method in dealing with longitudinal data with all three features. Finally, the proposed method is applied to data from the Lifestyle Education for Activity and Nutrition study and confirms the effectiveness of the intervention in producing weight loss at month 9. Copyright © 2016 John Wiley & Sons, Ltd.
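Robust estimating equations bound the influence of any single observation, usually by replacing the raw residual with a bounded score function. As a much simpler illustration of that idea, the sketch below computes a Huber M-estimate of a single location parameter by iteratively reweighted least squares.

This is not the paper's estimator, which works within generalized estimating equations and also corrects for dropouts and measurement error. The function name, the tuning constant c = 1.345 and the MAD-based scale are conventional choices that we assume here.

```python
import statistics

def huber_location(x, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted least squares.

    Solves sum_i psi((x_i - mu) / s) = 0 with the Huber psi function and the
    scale s fixed at the normalized MAD. Residuals larger than c * s get
    weight c * s / |r|, so a gross outlier has bounded influence on mu.
    """
    mu = statistics.median(x)
    s = 1.4826 * statistics.median([abs(v - mu) for v in x]) or 1.0
    for _ in range(max_iter):
        w = [1.0 if abs(v - mu) <= c * s else c * s / abs(v - mu) for v in x]
        new_mu = sum(wi * vi for wi, vi in zip(w, x)) / sum(w)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu
```

For `[1, 2, 3, 4, 100]` the sample mean is 22, while the Huber estimate stays close to 3 because the value 100 receives a weight of only about 0.02.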
Regularized Filters for L1-Norm-Based Common Spatial Patterns.

PubMed

Wang, Haixian; Li, Xiaomeng

2016-02-01

The L1-norm-based common spatial patterns (CSP-L1) approach is a recently developed technique for optimizing spatial filters in electroencephalogram (EEG)-based brain-computer interfaces. The L1-norm-based expression of dispersion in CSP-L1 alleviates the negative impact of outliers. In this paper, we further improve the robustness of CSP-L1 by taking into account noise that does not necessarily deviate as much as outliers do. The noise is modelled using the waveform length of the EEG time course. With this noise model, we then regularize the objective function of CSP-L1, in which the L1-norm is used in two ways: for the dispersion and for the waveform length. An iterative algorithm is designed to solve the optimization problem of the regularized objective function. A toy illustration and classification experiments on real EEG data sets show the effectiveness of the proposed method.
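The two L1 quantities named above can be written down for a spatial filter w applied to a channels-by-samples segment X:

- the L1 dispersion, Σ_t |wᵀx_t|;
- the waveform length of the filtered signal, Σ_t |wᵀ(x_{t+1} - x_t)|.

The sketch below (Python with NumPy, hypothetical function names) computes only these two terms. How the paper weights and combines them in its regularized objective, and the iterative solver, are not given in the abstract and are not shown.

```python
import numpy as np

def l1_dispersion(w, X):
    """sum_t |w^T x_t| for a channels x samples EEG segment X."""
    return np.sum(np.abs(w @ X))

def l1_waveform_length(w, X):
    """sum_t |w^T (x_{t+1} - x_t)|: the waveform length of the filtered signal."""
    return np.sum(np.abs(w @ np.diff(X, axis=1)))
```

A rapidly fluctuating (noisy) filtered signal has a large waveform length even when its amplitude, and hence its dispersion, is modest. This is why waveform length can capture noise that does not deviate as much as outliers do.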
Flexible methods for segmentation evaluation: Results from CT-based luggage screening

PubMed Central

Karimi, Seemeen; Jiang, Xiaoqian; Cosman, Pamela; Martz, Harry

2017-01-01

BACKGROUND Imaging systems used in aviation security include segmentation algorithms in an automatic threat recognition pipeline. The segmentation algorithms evolve in response to emerging threats and changing performance requirements. Analysis of segmentation algorithms’ behavior, including the nature of errors and feature recovery, facilitates their development. However, evaluation methods from the literature provide limited characterization of the segmentation algorithms. OBJECTIVE To develop segmentation evaluation methods that measure systematic errors such as oversegmentation and undersegmentation, outliers, and overall errors. The methods must measure feature recovery and allow us to prioritize segments. METHODS We developed two complementary evaluation methods using statistical techniques and information theory. We also created a semi-automatic method to define ground truth from 3D images. We applied our methods to evaluate five segmentation algorithms developed for CT luggage screening. We validated our methods with synthetic problems and an observer evaluation. RESULTS Both methods selected the same best segmentation algorithm. Human evaluation confirmed the findings. The measurement of systematic errors and prioritization helped in understanding the behavior of each segmentation algorithm. CONCLUSIONS Our evaluation methods allow us to measure and explain the accuracy of segmentation algorithms. PMID:24699346
Improving the estimation of zenith dry tropospheric delays using regional surface meteorological data

NASA Astrophysics Data System (ADS)

Luo, X.; Heck, B.; Awange, J. L.

2013-12-01

Global Navigation Satellite Systems (GNSS) are emerging as possible tools for remote sensing of high-resolution atmospheric water vapour, which improves weather forecasting through numerical weather prediction models. Nowadays, the GNSS-derived tropospheric zenith total delay (ZTD), comprising the zenith dry delay (ZDD) and zenith wet delay (ZWD), is achievable with sub-centimetre accuracy. However, if no representative near-site meteorological information is available, the quality of the ZDD derived from tropospheric models is degraded, leading to inaccurate estimation of the water vapour component ZWD, which is obtained as the difference between ZTD and ZDD. On the basis of freely accessible regional surface meteorological data, this paper proposes a height-dependent linear correction model for a priori ZDD. By applying the ordinary least-squares estimation (OLSE), bootstrapping (BOOT), and leave-one-out cross-validation (CROS) methods, the model parameters are estimated and analysed with respect to outlier detection. The model validation is carried out using GNSS stations with near-site meteorological measurements. The results verify the efficiency of the proposed ZDD correction model, showing a significant reduction in the mean bias from several centimetres to about 5 mm. The OLSE method enables fast computation, while the CROS procedure allows for outlier detection. All three methods produce consistent results after outlier elimination, which improves the regression quality by about 20% and the model accuracy by up to 30%.
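Leave-one-out cross-validation does not require n separate refits for a linear least-squares model. The prediction error at station i with that station left out equals e_i / (1 - h_ii), where e_i is the ordinary residual and h_ii is the leverage (a diagonal element of the hat matrix).

The sketch below (Python with NumPy) assumes the simplest height-dependent form, correction = a + b · height. It also assumes a fixed 2 cm tolerance for flagging stations. The function name, the model form and the tolerance are all hypothetical, and this is not the authors' CROS implementation.

```python
import numpy as np

def fit_zdd_correction(height, dzdd, tol=0.02):
    """OLS fit of dzdd = a + b * height (metres) with leave-one-out screening.

    Returns the coefficients refitted without flagged stations, the
    leave-one-out prediction errors, and the boolean outlier flags.
    """
    h = np.asarray(height, dtype=float)
    y = np.asarray(dzdd, dtype=float)
    A = np.column_stack([np.ones_like(h), h])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    # Leverages h_ii: diagonal of the hat matrix A (A^T A)^-1 A^T = A pinv(A).
    lev = np.einsum("ij,ji->i", A, np.linalg.pinv(A))
    # Leave-one-out prediction errors, obtained without refitting n times.
    loo = resid / (1.0 - lev)
    flagged = np.abs(loo) > tol
    # Refit the correction model on the stations that passed the check.
    clean_coef, *_ = np.linalg.lstsq(A[~flagged], y[~flagged], rcond=None)
    return clean_coef, loo, flagged
```

Screening on the leave-one-out error rather than the ordinary residual matters because an outlying station pulls the full fit towards itself, which shrinks its own residual. Its leave-one-out error is not shrunk in this way.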
m-BIRCH: an online clustering approach for computer vision applications

NASA Astrophysics Data System (ADS)

Madan, Siddharth K.; Dana, Kristin J.

2015-03-01

We adapt a classic online clustering algorithm, Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH), to incrementally cluster large datasets of features commonly used in multimedia and computer vision. We call the adapted version modified-BIRCH (m-BIRCH). The algorithm uses only a fraction of the dataset memory to perform clustering and updates the clustering decisions as new data arrive. Modifications made in m-BIRCH enable data-driven parameter selection and effectively handle regions of varying density in the feature space. Data-driven parameter selection automatically controls the coarseness of the data summarization. Effective handling of varying-density regions is necessary to represent each density region well in the data summarization. We use m-BIRCH to cluster 840K color SIFT descriptors and 60K outlier-corrupted grayscale patches. We also use the algorithm to cluster datasets consisting of challenging non-convex clustering patterns. Our implementation of the algorithm provides a useful clustering tool and is publicly available.
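Classic BIRCH can cluster with only a fraction of the dataset in memory because it summarizes each subcluster by a clustering feature (CF). A CF is the triple of point count N, linear sum LS and sum of squared norms SS. CFs are additive, so absorbing a point or merging two subclusters is cheap, and centroid and radius follow directly from the triple.

The sketch below uses Python's standard library and a hypothetical class name. It shows only this CF data structure, not the CF tree or the m-BIRCH modifications.

```python
import math
from dataclasses import dataclass, field

@dataclass
class ClusteringFeature:
    """BIRCH clustering feature: count, linear sum and sum of squared norms."""
    n: int = 0
    ls: list = field(default_factory=list)
    ss: float = 0.0

    def add(self, x):
        """Absorb one point into the summary in O(d) time."""
        if not self.ls:
            self.ls = [0.0] * len(x)
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, x)]
        self.ss += sum(v * v for v in x)

    def merge(self, other):
        """CFs are additive, so merging two subclusters is element-wise addition."""
        if not self.ls:
            return ClusteringFeature(other.n, list(other.ls), other.ss)
        if not other.ls:
            return ClusteringFeature(self.n, list(self.ls), self.ss)
        return ClusteringFeature(self.n + other.n,
                                 [a + b for a, b in zip(self.ls, other.ls)],
                                 self.ss + other.ss)

    def centroid(self):
        return [a / self.n for a in self.ls]

    def radius(self):
        """Root-mean-square distance of the absorbed points from the centroid."""
        c = self.centroid()
        return math.sqrt(max(self.ss / self.n - sum(v * v for v in c), 0.0))
```

In BIRCH, a new point joins the closest leaf CF only if the resulting radius stays below a threshold. The abstract says m-BIRCH chooses this kind of coarseness parameter from the data.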
3D reconstruction from multi-view VHR-satellite images in MicMac

NASA Astrophysics Data System (ADS)

Rupnik, Ewelina; Pierrot-Deseilligny, Marc; Delorme, Arthur

2018-05-01

This work addresses the generation of high-quality digital surface models by fusing multiple depth maps calculated with the dense image matching method. The algorithm is adapted to very high resolution multi-view satellite images, and the main contributions of this work are in the multi-view fusion. The algorithm is insensitive to outliers, takes into account the matching quality indicators, handles non-correlated zones (e.g. occlusions), and is solved with a multi-directional dynamic programming approach. No geometric constraints (e.g. surface planarity) or auxiliary data in the form of ground control points are required for its operation. Prior to the fusion procedures, the RPC geolocation parameters of all images are improved in a bundle block adjustment routine. The performance of the algorithm is evaluated on two VHR (Very High Resolution) satellite image datasets (Pléiades, WorldView-3), revealing its good performance in reconstructing non-textured areas, repetitive patterns, and surface discontinuities.
The Exact Solution to Rank-1 L1-Norm TUCKER2 Decomposition

NASA Astrophysics Data System (ADS)

Markopoulos, Panos P.; Chachlakis, Dimitris G.; Papalexakis, Evangelos E.

2018-04-01

We study rank-1 L1-norm-based TUCKER2 (L1-TUCKER2) decomposition of 3-way tensors, treated as a collection of $N$ $D \times M$ matrices that are to be jointly decomposed. Our contributions are as follows. i) We prove that the problem is equivalent to combinatorial optimization over $N$ antipodal-binary variables. ii) We derive the first two algorithms in the literature for its exact solution. The first algorithm has cost exponential in $N$; the second one has cost polynomial in $N$ (under a mild assumption). Our algorithms are accompanied by formal complexity analysis. iii) We conduct numerical studies to compare the performance of exact L1-TUCKER2 (proposed) with standard HOSVD, HOOI, GLRAM, PCA, L1-PCA, and TPCA-L1. Our studies show that L1-TUCKER2 outperforms (in tensor approximation) all the above counterparts when the processed data are outlier corrupted.
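The equivalence to an antipodal-binary search can be seen from a short identity. First, Σ_n |uᵀX_n v| = max over b ∈ {±1}^N of Σ_n b_n uᵀX_n v. Second, for unit u and v, the maximum of uᵀMv is the largest singular value of M.

So the rank-1 L1-TUCKER2 optimum equals the maximum over b of σ_max(Σ_n b_n X_n), attained by the top singular vectors of the best combination. The sketch below (Python with NumPy, hypothetical function name) is our reconstruction of an exhaustive, exponential-cost search based on that identity. It fixes b_1 = +1 because flipping every sign leaves σ_max unchanged. The authors' polynomial-cost algorithm is not shown.

```python
import itertools
import numpy as np

def l1_tucker2_rank1_exhaustive(Xs):
    """Exact rank-1 L1-TUCKER2 of matrices X_1..X_N (each D x M).

    Maximizes sum_n |u^T X_n v| over unit u, v via
    max_b sigma_max(sum_n b_n X_n), b in {+1, -1}^N, with b_1 fixed to +1,
    i.e. 2^(N-1) SVDs.
    """
    best_val, best_u, best_v, best_b = -np.inf, None, None, None
    for tail in itertools.product((1.0, -1.0), repeat=len(Xs) - 1):
        b = (1.0,) + tail
        M = sum(bn * Xn for bn, Xn in zip(b, Xs))
        U, s, Vt = np.linalg.svd(M)
        if s[0] > best_val:
            best_val, best_u, best_v, best_b = s[0], U[:, 0], Vt[0], np.array(b)
    return best_u, best_v, best_b, best_val
```

At the returned u and v, the L1 objective Σ_n |uᵀX_n v| equals the returned value. No other pair of unit vectors can exceed it, which is what "exact" means here.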
Multisensor and Multispectral Approach in Documenting and Analyzing Liquefaction Hazard using Remote Sensing

NASA Astrophysics Data System (ADS)

Oommen, T.; Baise, L. G.; Gens, R.; Prakash, A.; Gupta, R. P.

2008-12-01

Seismic liquefaction is the loss of strength of soil due to shaking that leads to various ground failures such as lateral spreading, settlements, tilting, and sand boils. It is important to document these failures after earthquakes to advance our study of when and where liquefaction occurs. The current approach of mapping these failures by field investigation teams suffers from the inaccessibility of some sites immediately after the event, the short life of some of these failures, difficulties in mapping the areal extent of the failures, incomplete coverage, etc. After the 2001 Bhuj earthquake (India), researchers using the Indian remote sensing satellite illustrated that satellite remote sensing can provide a synoptic view of the terrain and offer unbiased estimates of liquefaction failures. However, a multisensor (data from different sensors on board the same or different satellites) and multispectral (data collected in different spectral regions) approach is needed to efficiently document liquefaction incidences and/or their potential occurrence, because a particular satellite may not be positioned to image an area shortly after an earthquake. The use of SAR satellite imagery ensures the acquisition of data in all weather conditions, by day and night, as well as information complementary to the optical data sets. In this study, we analyze the applicability of various satellites (Landsat, RADARSAT, Terra-MISR, IRS-1C, IRS-1D) in mapping liquefaction failures after the 2001 Bhuj earthquake using Support Vector Data Description (SVDD). SVDD is a kernel-based nonparametric outlier detection algorithm inspired by Support Vector Machines (SVMs), a newer generation of learning algorithms based on statistical learning theory. We present the applicability of SVDD for unsupervised change-detection studies (i.e. to identify post-earthquake liquefaction failures).
The liquefaction occurrences identified from the different satellites using SVDD have been compared to the ground truth in terms of documented liquefaction failures by other researchers. We present the applicability and appropriateness of the various satellites and spectral regions for documenting liquefaction related failures. Results illustrate that the SVDD is a promising unsupervised change-detection algorithm, which can help in automating the documentation of earthquake induced liquefaction failures.
Machine learning of molecular properties: Locality and active learning

NASA Astrophysics Data System (ADS)

Gubaev, Konstantin; Podryabinkin, Evgeny V.; Shapeev, Alexander V.

2018-06-01

In recent years, machine learning techniques have shown great potential in various problems from a multitude of disciplines, including materials design and drug discovery. Their high computational speed on the one hand, and accuracy comparable to that of density functional theory on the other, make machine learning algorithms efficient for high-throughput screening through chemical and configurational space. However, the machine learning algorithms available in the literature require large training datasets to reach chemical accuracy and also show large errors for the so-called outliers, that is, out-of-sample molecules that are not well represented in the training set. In the present paper, we propose a new machine learning algorithm for predicting molecular properties that addresses these two issues. It is based on a local model of interatomic interactions, which provides high accuracy when trained on relatively small training sets, and on an active learning algorithm for optimally choosing the training set, which significantly reduces the errors for the outliers. We compare our model to other state-of-the-art algorithms from the literature on widely used benchmark tests.
Asynchronous P300 classification in a reactive brain-computer interface during an outlier detection task

NASA Astrophysics Data System (ADS)

Krumpe, Tanja; Walter, Carina; Rosenstiel, Wolfgang; Spüler, Martin

2016-08-01

Objective. In this study, the feasibility of detecting a P300 via an asynchronous classification mode in a reactive EEG-based brain-computer interface (BCI) was evaluated. The P300 is one of the most popular BCI control signals and is therefore used in many applications, mostly for active communication purposes (e.g. the P300 speller). Because the majority of systems use a stimulus-locked (synchronous) mode of classification, the field of applications is limited. A new approach is needed for settings in which stimulus-locked classification cannot be used because the presented stimuli cannot be controlled or predicted by the system. Approach. A continuous observation task requiring the detection of outliers was implemented to test such an approach. The study was divided into an offline and an online part. Main results. Both parts of the study revealed that asynchronous detection of the P300 can successfully be used to detect single events with high specificity. No significant difference in performance was found between the synchronous and the asynchronous approach. Significance. The results encourage the use of an asynchronous classification approach in suitable applications without a potential loss in performance.
Atomic absorption spectrophotometric determination of tin in canned foods, using nitric acid-hydrochloric acid digestion and nitrous oxide-acetylene flame: collaborative study.

PubMed

Dabeka, R W; McKenzie, A D; Albert, R H

1985-01-01

Twenty-six collaborators participated in a study to evaluate an atomic absorption spectrophotometric (AAS) method for the determination of tin in canned foods. The 5 foods evaluated were meat, pineapple juice, tomato paste, evaporated milk, and green beans, each spiked at 2 levels. The concentration range of tin in the samples was 10-450 micrograms/g, and each level was sent as a blind duplicate. Statistical treatment of results revealed no laboratory outliers and 6 individual or replicate-total outliers, accounting for 3.3% of the data. The repeatability (within-laboratory) coefficient of variation (CVo) ranged from 2.2 to 48%, depending on the tin level and food evaluated. For samples containing ≥ 80 micrograms/g of tin, the repeatability CV averaged 5.6% including outliers and 3.7% after their rejection. The overall among-laboratories coefficient of variation (CVx) varied from 3.3 to 58%; at levels ≥ 80 micrograms/g, it averaged 7.3% with outliers and 5.3% after their rejection. Recovery of tin, based on spiking levels, ranged from 100.0 to 112.8% and averaged 105.4%. The detection limit ranges from 2 to 10 micrograms/g, and the lower quantitation limit is 40 micrograms/g. This method has been adopted as official first action.
Signatures of positive selection and local adaptation to urbanization in white-footed mice (Peromyscus leucopus).

PubMed

Harris, Stephen E; Munshi-South, Jason

2017-11-01

Urbanization significantly alters natural ecosystems and has accelerated globally. Urban wildlife populations are often highly fragmented by human infrastructure, and isolated populations may adapt in response to local urban pressures. However, relatively few studies have identified genomic signatures of adaptation in urban animals. We used a landscape genomic approach to examine signatures of selection in urban populations of white-footed mice (Peromyscus leucopus) in New York City. We analysed 154,770 SNPs identified from transcriptome data from 48 P. leucopus individuals from three urban and three rural populations and used outlier tests to identify evidence of urban adaptation. We accounted for demography by simulating a neutral SNP data set under an inferred demographic history as a null model for outlier analysis. We also tested whether candidate genes were associated with environmental variables related to urbanization. In total, we detected 381 outlier loci and after stringent filtering, identified and annotated 19 candidate loci. Many of the candidate genes were involved in metabolic processes and have well-established roles in metabolizing lipids and carbohydrates. Our results indicate that white-footed mice in New York City are adapting at the biomolecular level to local selective pressures in urban habitats. Annotation of outlier loci suggests selection is acting on metabolic pathways in urban populations, likely related to novel diets in cities that differ from diets in less disturbed areas. © 2017 John Wiley & Sons Ltd.
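The neutral SNP data set simulated under the inferred demographic history serves as a null model. One common way to turn such a null into outlier calls is an upper-tail empirical p-value for each observed locus statistic. The abstract names neither the statistic nor the exact procedure, so the sketch below is an assumption rather than the authors' pipeline. It uses Python's standard library and a hypothetical function name, and it adds 1 to the numerator and denominator so that no p-value is exactly zero.

```python
import bisect

def empirical_pvalues(observed, simulated_null):
    """Upper-tail empirical p-value of each observed statistic against a
    null distribution simulated under neutrality: (1 + #{null >= x}) / (1 + n)."""
    null = sorted(simulated_null)
    n = len(null)
    # n - bisect_left(null, x) counts the simulated values >= x.
    return [(1 + n - bisect.bisect_left(null, x)) / (1 + n) for x in observed]
```

Loci whose p-value falls below a chosen cutoff would be the candidate outliers, to be filtered further as described above.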

kruX: matrix-based non-parametric eQTL discovery

PubMed Central

2014-01-01

Background The Kruskal-Wallis test is a popular non-parametric statistical test for identifying expression quantitative trait loci (eQTLs) from genome-wide data due to its robustness against variations in the underlying genetic model and expression trait distribution, but testing billions of marker-trait combinations one-by-one can become computationally prohibitive. Results We developed kruX, an algorithm implemented in Matlab, Python and R that uses matrix multiplications to simultaneously calculate the Kruskal-Wallis test statistic for several millions of marker-trait combinations at once. KruX is more than ten thousand times faster than computing associations one-by-one on a typical human dataset. We used kruX and a dataset of more than 500k SNPs and 20k expression traits measured in 102 human blood samples to compare eQTLs detected by the Kruskal-Wallis test to eQTLs detected by the parametric ANOVA and linear model methods. We found that the Kruskal-Wallis test is more robust against data outliers and heterogeneous genotype group sizes and detects a higher proportion of non-linear associations, but is more conservative for calling additive linear associations. Conclusion kruX enables the use of robust non-parametric methods for massive eQTL mapping without the need for a high-performance computing infrastructure and is freely available from http://krux.googlecode.com. PMID:24423115
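The matrix trick can be sketched as follows. Rank every expression trait across samples once. For each genotype group g, the rank sums R_g of all trait-marker pairs then form a single matrix product of the rank matrix with a marker-by-sample indicator matrix. The Kruskal-Wallis statistic is H = 12/(n(n+1)) Σ_g R_g²/n_g - 3(n+1).

The sketch below uses Python with NumPy and is not kruX's own code. The function name is hypothetical, and for brevity it assumes no tied expression values (so no tie correction) and no missing genotypes.

```python
import numpy as np

def kruskal_wallis_matrix(expr, geno, groups=(0, 1, 2)):
    """Kruskal-Wallis H for every (trait, marker) pair at once.

    expr: traits x samples expression matrix (assumed free of ties).
    geno: markers x samples genotype matrix with values in `groups`.
    Returns a traits x markers matrix of H statistics.
    """
    n = expr.shape[1]
    # Rank each trait across samples (ranks 1..n); ties are not handled.
    ranks = np.argsort(np.argsort(expr, axis=1), axis=1) + 1.0
    total = np.zeros((expr.shape[0], geno.shape[0]))
    for g in groups:
        ind = (geno == g).astype(float)  # markers x samples indicator
        rank_sums = ranks @ ind.T        # traits x markers matrix of R_g
        counts = ind.sum(axis=1)         # group size n_g for each marker
        with np.errstate(divide="ignore", invalid="ignore"):
            # Groups absent for a marker contribute nothing to the sum.
            total += np.where(counts > 0, rank_sums ** 2 / counts, 0.0)
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
```

For one trait with values 1 to 6 and a marker whose genotypes are `[0, 0, 1, 1, 2, 2]`, the rank sums are 3, 7 and 11. This gives H = (2/7)(179/2) - 21 = 32/7.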
Improved MODIS aerosol retrieval in urban areas using a land classification approach and empirical orthogonal functions

NASA Astrophysics Data System (ADS)

Levitan, Nathaniel; Gross, Barry

2016-10-01

New, high-resolution aerosol products are required in urban areas to improve the spatial coverage of the products, in terms of both resolution and retrieval frequency. These new products will improve our understanding of the spatial variability of aerosols in urban areas and will be useful in the detection of localized aerosol emissions. Urban aerosol retrieval is challenging for existing algorithms because of the high spatial variability of the surface reflectance, indicating the need for improved urban surface reflectance models. This problem can be stated in the language of novelty detection as the problem of selecting aerosol parameters whose effective surface reflectance spectrum is not an outlier in some space. In this paper, empirical orthogonal functions (EOFs), a reconstruction-based novelty detection technique, are used to perform single-pixel aerosol retrieval using the single angular and temporal sample provided by the MODIS sensor. The empirical orthogonal basis functions are trained for different land classes using the MODIS BRDF MCD43 product. Existing land classification products are used in training and aerosol retrieval. The retrieval is compared against the existing operational MODIS 3 km Dark Target (DT) aerosol product and co-located AERONET data. Based on this comparison, our method allows a significant increase in retrieval frequency and a moderate decrease in the known biases of MODIS urban aerosol retrievals.
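Reconstruction-based novelty detection with EOFs follows a simple recipe:

1. Learn the leading k orthogonal directions (the principal components) of a training set.
2. Project a new vector onto them.
3. Score the vector by the norm of the part the projection fails to capture.

A small score means the vector looks like the training data; a large one marks it as novel. The sketch below uses Python with NumPy and hypothetical function names. In the paper's setting the vectors would be surface reflectance spectra for one land class, and the retrieval would favour aerosol parameters whose implied reflectance has a low score. The choice of k and any decision threshold are not given in the abstract.

```python
import numpy as np

def fit_eofs(train, k):
    """Mean and leading k EOFs (principal directions) of the rows of `train`."""
    train = np.asarray(train, dtype=float)
    mean = train.mean(axis=0)
    # Rows of vt are orthonormal directions sorted by explained variance.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:k]

def reconstruction_error(x, mean, eofs):
    """Norm of the part of x - mean not captured by the EOF subspace."""
    d = np.asarray(x, dtype=float) - mean
    recon = (d @ eofs.T) @ eofs
    return np.linalg.norm(d - recon, axis=-1)
```

For training spectra that all lie along one direction, a new spectrum on that direction is reconstructed exactly (error near zero). A spectrum pointing elsewhere leaves a large residual and would be treated as an outlier.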
Spacecraft alignment estimation. [for onboard sensors]

NASA Technical Reports Server (NTRS)

Shuster, Malcolm D.; Bierman, Gerald J.

1988-01-01

A numerically well-behaved factorized methodology is developed for estimating spacecraft sensor alignments from prelaunch and inflight data without the need to compute the spacecraft attitude or angular velocity. Such a methodology permits the estimation of sensor alignments (or other biases) in a framework free of unknown dynamical variables. In actual mission implementation such an algorithm is usually better behaved than one that must compute sensor alignments simultaneously with the spacecraft attitude, for example by means of a Kalman filter. In particular, such a methodology is less sensitive to data dropouts of long duration, and the derived measurement used in the attitude-independent algorithm usually makes data checking and editing of outliers much simpler than would be the case in the filter.
Geostationary Communications Satellites as Sensors for the Space Weather Environment: Telemetry Event Identification Algorithms

NASA Astrophysics Data System (ADS)

Carlton, A.; Cahoy, K.

2015-12-01

Reliability of geostationary communication satellites (GEO ComSats) is critical to many industries worldwide. The space radiation environment poses a significant threat, and manufacturers and operators expend considerable effort to maintain reliability for users. Knowledge of the space radiation environment at the orbital location of a satellite is of critical importance for diagnosing and resolving issues resulting from space weather, for optimizing cost and reliability, and for space situational awareness. For decades, operators and manufacturers have collected large amounts of telemetry from geostationary (GEO) communications satellites to monitor system health and performance, yet these data are rarely mined for scientific purposes. The goal of this work is to acquire and analyze archived data from commercial operators using new algorithms that can detect when a space weather (or non-space weather) event of interest has occurred or is in progress. We have developed algorithms, collectively called SEER (System Event Evaluation Routine), to statistically analyze power amplifier current and temperature telemetry by identifying deviations from nominal operations or other events and trends of interest. This paper focuses on our work in progress, which currently includes methods for detection of jumps ("spikes", outliers) and step changes (changes in the local mean) in the telemetry. We then examine available space weather data from the NOAA GOES and the NOAA-computed Kp index and sunspot numbers to see what role, if any, space weather might have played. By combining the results of the algorithm for many components, the spacecraft can be used as a "sensor" for the space radiation environment. Similar events occurring at one time across many component telemetry streams may be indicative of a space radiation event or system-wide health and safety concern.
Using SEER on representative datasets of telemetry from Inmarsat and Intelsat, we find events that occur across all or many of the telemetry files at certain dates. We compare these system-wide events to known space weather storms, such as the 2003 Halloween storms, and to spacecraft operational events, such as maneuvers. We also present future applications and expansions of SEER for robust space environment sensing and system health and safety monitoring.
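The abstract does not describe SEER's internals. As a generic illustration of the two event types it names, the sketch below flags spikes as samples far from a centered rolling median (scaled by the median absolute deviation, MAD) and locates the largest step change as the biggest difference between the means of adjacent windows, after the spikes have been replaced. The function names and the parameters `window`, `k`, `min_scale` and `w` are hypothetical choices for illustration, not SEER's.

```python
import statistics


def detect_spikes(x, window=5, k=5.0, min_scale=1e-9):
    """Return indices whose deviation from the centered rolling median exceeds k robust SDs."""
    half = window // 2
    spikes = []
    for i in range(len(x)):
        seg = x[max(0, i - half): i + half + 1]
        med = statistics.median(seg)
        mad = statistics.median(abs(v - med) for v in seg)
        # 1.4826 * MAD estimates the SD for Gaussian noise; the floor avoids a zero scale.
        scale = max(1.4826 * mad, min_scale)
        if abs(x[i] - med) > k * scale:
            spikes.append(i)
    return spikes


def despike(x, spikes, window=5):
    """Replace each flagged sample with the median of its window."""
    half = window // 2
    y = list(x)
    for i in spikes:
        y[i] = statistics.median(x[max(0, i - half): i + half + 1])
    return y


def largest_step(x, w=4):
    """Return (index, shift) of the largest change in mean between x[i-w:i] and x[i:i+w]."""
    best_i, best_shift = None, 0.0
    for i in range(w, len(x) - w + 1):
        shift = statistics.fmean(x[i:i + w]) - statistics.fmean(x[i - w:i])
        if abs(shift) > abs(best_shift):
            best_i, best_shift = i, shift
    return best_i, best_shift
```

Spikes are removed before the step search so that a single outlier does not masquerade as a level shift; a real monitor would also need a minimum-shift threshold and a rule for reporting several steps.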
[An Extraction and Recognition Method of the Distributed Optical Fiber Vibration Signal Based on EMD-AWPP and HOSA-SVM Algorithm].

PubMed

Zhang, Yanjun; Liu, Wen-zhe; Fu, Xing-hu; Bi, Wei-hong

2016-02-01

Given that traditional signal processing methods cannot effectively distinguish between different vibration intrusion signals, a feature extraction and recognition method for vibration information based on EMD-AWPP and HOSA-SVM is proposed for high-precision signal recognition in distributed fiber-optic intrusion detection systems. When dealing with different types of vibration, the method first applies an adaptive wavelet processing algorithm based on empirical mode decomposition to reduce the influence of abnormal values in the sensing signal and to improve the accuracy of signal feature extraction. Not only is the low-frequency part of the signal decomposed, but the details in the high-frequency part are also better handled through time-frequency localization. Second, it uses the bispectrum and bicoherence spectrum to accurately extract feature vectors that characterize different types of intrusion vibration. Finally, with a BPNN as the reference model, an SVM whose recognition parameters are optimized by particle swarm optimization distinguishes the signals of different intrusion vibrations, giving the identification model stronger adaptive and self-learning ability. This overcomes shortcomings such as a tendency to fall into local optima. Simulation results showed that the new method can effectively extract the feature vectors of the sensing information, eliminate the influence of random noise, and reduce the effects of outliers for different types of intrusion sources. The predicted categories agree with the output categories, and the vibration identification accuracy reaches more than 95%. The method therefore outperforms the BPNN recognition algorithm and effectively improves the accuracy of information analysis.
[Automatic Sleep Stage Classification Based on an Improved K-means Clustering Algorithm].

PubMed

Xiao, Shuyuan; Wang, Bei; Zhang, Jian; Zhang, Qunfeng; Zou, Junzhong

2016-10-01

Sleep stage scoring is an active research topic in medicine and neuroscience. Visual inspection of sleep is laborious, and the results may vary between clinicians. Automatic sleep stage classification algorithms can reduce the manual workload. However, they still have limitations when they encounter complicated and changeable clinical cases. The purpose of this paper is to develop an automatic sleep staging algorithm based on the characteristics of actual sleep data. In the proposed improved K-means clustering algorithm, points were selected as the initial centers using a concept of density, to avoid the randomness of the original K-means algorithm. Meanwhile, the cluster centers were updated according to the ‘Three-Sigma Rule’ during the iteration to reduce the influence of outliers. The proposed method was tested and analyzed on overnight sleep data from healthy persons and from patients with sleep disorders after continuous positive airway pressure (CPAP) treatment. The automatic sleep stage classification results were compared with visual inspection by qualified clinicians, and the average accuracy reached 76%. An analysis of the morphological diversity of the sleep data showed that the proposed improved K-means algorithm was feasible and valid for clinical practice.
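The abstract gives no formula for the three-sigma update. The sketch below assumes one plausible reading: when recomputing a center, points whose distance from the cluster's plain mean exceeds the mean distance plus three standard deviations are dropped before averaging. Initial centers are passed in explicitly; the paper's density-based initialization is not reproduced, and `three_sigma_centroid` and `robust_kmeans` are hypothetical names.

```python
import numpy as np


def three_sigma_centroid(points):
    """Mean of the points whose distance to the plain mean is within mean + 3 SD of all distances."""
    center = points.mean(axis=0)
    d = np.linalg.norm(points - center, axis=1)
    keep = d <= d.mean() + 3.0 * d.std()  # the closest point always survives
    return points[keep].mean(axis=0)


def robust_kmeans(X, init_centers, n_iter=50):
    """Lloyd-style K-means whose center update applies the three-sigma rule above."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(init_centers, dtype=float).copy()
    for _ in range(n_iter):
        # Assign every point to its nearest center (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            three_sigma_centroid(X[labels == k]) if np.any(labels == k) else centers[k]
            for k in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

With few points per cluster the rule rarely removes anything, because in a sample of n points no distance can lie more than sqrt(n - 1) population standard deviations above the mean.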
On Evaluating Brain Tissue Classifiers without a Ground Truth

PubMed Central

Martin-Fernandez, Marcos; Ungar, Lida; Nakamura, Motoaki; Koo, Min-Seong; McCarley, Robert W.; Shenton, Martha E.

2009-01-01

In this paper, we present a set of techniques for the evaluation of brain tissue classifiers on a large data set of MR images of the head. Due to the difficulty of establishing a gold standard for this type of data, we focus our attention on methods which do not require a ground truth, but instead rely on a common agreement principle. Three different techniques are presented: the Williams’ index, a measure of common agreement; STAPLE, an Expectation Maximization algorithm which simultaneously estimates performance parameters and constructs an estimated reference standard; and Multidimensional Scaling, a visualization technique to explore similarity data. We apply these different evaluation methodologies to a set of eleven different segmentation algorithms on forty MR images. We then validate our evaluation pipeline by building a ground truth based on human expert tracings. The evaluations with and without a ground truth are compared. Our findings show that comparing classifiers without a gold standard can provide a great deal of interesting information. In particular, outliers can be easily detected, strongly consistent or highly variable techniques can be readily discriminated, and the overall similarity between different techniques can be assessed. On the other hand, we also find that some information present in the expert segmentations is not captured by the automatic classifiers, suggesting that common agreement alone may not be sufficient for a precise performance evaluation of brain tissue classifiers. PMID:17532646
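A sketch of the Williams' index for one classifier j among n (n ≥ 3), assuming segmentations are represented as sets of voxel indices and that pairwise agreement is measured with the Dice coefficient; the agreement measure used in the paper is not stated in the abstract. The index divides j's average agreement with the other classifiers by the average agreement among all pairs that exclude j, so values well below 1 point to an outlying classifier.

```python
from itertools import combinations


def dice(a, b):
    """Dice overlap between two sets of labelled voxel indices."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def williams_index(segs, j, agree=dice):
    """Williams' index of classifier j; requires at least three classifiers."""
    n = len(segs)
    others = [k for k in range(n) if k != j]
    # Average agreement of j with every other classifier.
    num = sum(agree(segs[j], segs[k]) for k in others) / (n - 1)
    # Average agreement over the (n - 1)(n - 2) / 2 pairs that do not involve j.
    pairs = list(combinations(others, 2))
    den = sum(agree(segs[a], segs[b]) for a, b in pairs) / len(pairs)
    return num / den
```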
Live texturing of augmented reality characters from colored drawings.

PubMed

Magnenat, Stéphane; Ngo, Dat Tien; Zünd, Fabio; Ryffel, Mattia; Noris, Gioacchino; Rothlin, Gerhard; Marra, Alessia; Nitti, Maurizio; Fua, Pascal; Gross, Markus; Sumner, Robert W

2015-11-01

Coloring books capture the imagination of children and provide them with one of their earliest opportunities for creative expression. However, given the proliferation and popularity of digital devices, real-world activities like coloring can seem unexciting, and children become less engaged in them. Augmented reality holds unique potential to impact this situation by providing a bridge between real-world activities and digital enhancements. In this paper, we present an augmented reality coloring book app in which children color characters in a printed coloring book and inspect their work using a mobile device. The drawing is detected and tracked, and the video stream is augmented with an animated 3-D version of the character that is textured according to the child's coloring. This is possible thanks to several novel technical contributions. We present a texturing process that applies the captured texture from a 2-D colored drawing to both the visible and occluded regions of a 3-D character in real time. We develop a deformable surface tracking method designed for colored drawings that uses a new outlier rejection algorithm for real-time tracking and surface deformation recovery. We present a content creation pipeline to efficiently create the 2-D and 3-D content. And, finally, we validate our work with two user studies that examine the quality of our texturing algorithm and the overall app experience.
Accounting for regional background and population size in the detection of spatial clusters and outliers using geostatistical filtering and spatial neutral models: the case of lung cancer in Long Island, New York

PubMed Central

Goovaerts, Pierre; Jacquez, Geoffrey M

2004-01-01

Background: Complete Spatial Randomness (CSR) is the null hypothesis employed by many statistical tests for spatial pattern, such as local cluster or boundary analysis. CSR is, however, not a relevant null hypothesis for highly complex and organized systems such as those encountered in the environmental and health sciences, in which underlying spatial pattern is present. This paper presents a geostatistical approach to filter the noise caused by spatially varying population size and to generate spatially correlated neutral models that account for regional background obtained by geostatistical smoothing of observed mortality rates. These neutral models were used in conjunction with local Moran statistics to identify spatial clusters and outliers in the geographical distribution of male and female lung cancer in Nassau, Queens, and Suffolk counties, New York, USA.
Results: We developed a typology of neutral models that progressively relaxes the assumptions of null hypotheses, allowing for the presence of spatial autocorrelation, non-uniform risk, and incorporation of spatially heterogeneous population sizes. Incorporation of spatial autocorrelation led to fewer significant ZIP codes than found in previous studies, confirming earlier claims that CSR can lead to over-identification of the number of significant spatial clusters or outliers. Accounting for population size through geostatistical filtering increased the size of clusters while removing most of the spatial outliers. Integration of regional background into the neutral models yielded substantially different spatial clusters and outliers, leading to the identification of ZIP codes where SMR values significantly depart from their regional background.
Conclusion: The approach presented in this paper enables researchers to assess geographic relationships using appropriate null hypotheses that account for the background variation extant in real-world systems.
In particular, this new methodology allows one to identify geographic pattern above and beyond background variation. The implementation of this approach in spatial statistical software will facilitate the detection of spatial disparities in mortality rates, establishing the rationale for targeted cancer control interventions, including consideration of health services needs, and resource allocation for screening and diagnostic testing. It will allow researchers to systematically evaluate how sensitive their results are to assumptions implicit under alternative null hypotheses. PMID:15272930
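For reference, the sketch below computes Anselin's local Moran statistic with row-standardized weights from neighbor lists. It covers only the statistic itself; the paper's contribution, assessing significance against geostatistical neutral models instead of CSR, is not reproduced, and `local_moran` is a hypothetical helper.

```python
def local_moran(values, neighbors):
    """Local Moran's I_i = (z_i / m2) * sum_j w_ij z_j with row-standardized weights.

    values: one observation per area; neighbors: list of neighbor-index lists.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(zi * zi for zi in z) / n  # second moment of the deviations
    result = []
    for i, nbrs in enumerate(neighbors):
        lag = sum(z[j] for j in nbrs) / len(nbrs)  # w_ij = 1 / |N(i)|
        result.append(z[i] * lag / m2)
    return result
```

Positive values mark an area that resembles its neighbors (a high-high or low-low cluster); negative values mark a spatial outlier, such as a high value surrounded by low ones.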
Estimating magnitude and frequency of floods using the PeakFQ 7.0 program

USGS Publications Warehouse

Veilleux, Andrea G.; Cohn, Timothy A.; Flynn, Kathleen M.; Mason, Jr., Robert R.; Hummel, Paul R.

2014-01-01

Flood-frequency analysis provides information about the magnitude and frequency of flood discharges based on records of annual maximum instantaneous peak discharges collected at streamgages. The information is essential for defining flood-hazard areas, for managing floodplains, and for designing bridges, culverts, dams, levees, and other flood-control structures. Bulletin 17B (B17B) of the Interagency Advisory Committee on Water Data (IACWD; 1982) codifies the standard methodology for conducting flood-frequency studies in the United States. B17B specifies that annual peak-flow data are to be fit to a log-Pearson Type III distribution. Specific methods are also prescribed for improving skew estimates using regional skew information, tests for high and low outliers, adjustments for low outliers and zero flows, and procedures for incorporating historical flood information. The authors of B17B identified various needs for methodological improvement and recommended additional study. In response to these needs, the Advisory Committee on Water Information (ACWI, successor to IACWD; http://acwi.gov/) Subcommittee on Hydrology (SOH), Hydrologic Frequency Analysis Work Group (HFAWG), has recommended modest changes to B17B. These changes include adoption of a generalized method-of-moments estimator denoted the Expected Moments Algorithm (EMA) (Cohn and others, 1997) and a generalized version of the Grubbs-Beck test for low outliers (Cohn and others, 2013). The SOH requested that the USGS implement these changes in a user-friendly, publicly accessible program.
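For orientation, the sketch below implements the classic single Grubbs-Beck low-outlier screen used in B17B; it is not the generalized (multiple) Grubbs-Beck test adopted in PeakFQ 7.0. Peaks are log10-transformed, and any peak below 10**(mean - K_N * s) is flagged, where K_N is B17B's approximation of the one-sided 10 percent critical value for a sample of N peaks. The helper names are hypothetical.

```python
import math
import statistics


def grubbs_beck_kn(n):
    """B17B approximation of the one-sided 10% Grubbs-Beck critical value K_N."""
    log_n = math.log10(n)
    return -0.9043 + 3.345 * math.sqrt(log_n) - 0.4046 * log_n


def low_outliers(peaks):
    """Return (threshold, flagged peaks) for the single Grubbs-Beck low-outlier test."""
    logs = [math.log10(q) for q in peaks]  # requires positive discharges (zero flows excluded)
    mean = statistics.mean(logs)
    sd = statistics.stdev(logs)  # sample standard deviation of the logs
    threshold = 10 ** (mean - grubbs_beck_kn(len(logs)) * sd)
    return threshold, [q for q in peaks if q < threshold]
```

The single test can miss several low outliers that mask one another, which is one motivation for the generalized version described in the abstract.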
VisNAV 100: a robust, compact imaging sensor for enabling autonomous air-to-air refueling of aircraft and unmanned aerial vehicles

NASA Astrophysics Data System (ADS)

Katake, Anup; Choi, Heeyoul

2010-01-01

To enable autonomous air-to-air refueling of manned and unmanned vehicles, a robust, high-speed relative navigation sensor capable of providing high-accuracy 3DOF information in diverse operating conditions is required. To help address this problem, StarVision Technologies Inc. has been developing a compact, high-update-rate (100 Hz), wide-field-of-view (90 deg) direction and range estimation imaging sensor called VisNAV 100. The sensor is fully autonomous, requiring no communication from the tanker aircraft, and contains high-reliability embedded avionics to provide range, azimuth and elevation (a three-degree-of-freedom, 3DOF, solution) and closing speed relative to the tanker aircraft. The sensor is capable of providing 3DOF with an error of 1% in range and 0.1 deg in azimuth/elevation up to a range of 30 m, and 1 deg error in direction for ranges up to 200 m, at 100 Hz update rates. In this paper we discuss the algorithms that were developed in-house to enable robust beacon pattern detection, outlier rejection and 3DOF estimation in adverse conditions, and present the results of several outdoor tests. Results from the long-range single-beacon detection tests are also discussed.
Influence assessment in censored mixed-effects models using the multivariate Student’s-t distribution

PubMed Central

Matos, Larissa A.; Bandyopadhyay, Dipankar; Castro, Luis M.; Lachos, Victor H.

2015-01-01

In biomedical studies on HIV RNA dynamics, viral loads generate repeated measures that are often subjected to upper and lower detection limits, and hence these responses are either left- or right-censored. Linear and non-linear mixed-effects censored (LMEC/NLMEC) models are routinely used to analyse these longitudinal data, with normality assumptions for the random effects and residual errors. However, the derived inference may not be robust when these underlying normality assumptions are questionable, especially in the presence of outliers and heavy tails. Motivated by this, Matos et al. (2013b) recently proposed an exact EM-type algorithm for LMEC/NLMEC models using a multivariate Student’s-t distribution, with closed-form expressions at the E-step. In this paper, we develop influence diagnostics for LMEC/NLMEC models using the multivariate Student’s-t density, based on the conditional expectation of the complete data log-likelihood. This partially eliminates the complexity associated with the approach of Cook (1977, 1986) for censored mixed-effects models. The new methodology is illustrated via an application to a longitudinal HIV dataset. In addition, a simulation study explores the accuracy of the proposed measures in detecting possible influential observations for heavy-tailed censored data under different perturbation and censoring schemes. PMID:26190871
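The censored mixed-effects machinery is beyond a short example, but the mechanism that makes Student's-t models robust is easy to show. The sketch below fits a univariate t location and scale by EM with the degrees of freedom `nu` held fixed (an assumption; `nu` is often estimated as well). Each E-step gives observation i the weight (nu + 1) / (nu + r_i**2), where r_i is its standardized residual, so gross outliers are strongly downweighted. This is not the authors' algorithm, which handles censoring and random effects.

```python
import statistics


def t_location_em(x, nu=4.0, tol=1e-10, max_iter=500):
    """EM fit of a univariate Student's-t location and squared scale with fixed nu.

    Returns (mu, sigma2, weights) where weights come from the last E-step.
    Assumes the data are not degenerate (nonzero MAD).
    """
    # Robust starting values: median and MAD-based scale.
    mu = statistics.median(x)
    sigma2 = (1.4826 * statistics.median(abs(v - mu) for v in x)) ** 2
    for _ in range(max_iter):
        # E-step: expected latent precision weights.
        w = [(nu + 1) / (nu + (v - mu) ** 2 / sigma2) for v in x]
        # M-step: weighted mean and weighted variance (divisor n).
        new_mu = sum(wi * v for wi, v in zip(w, x)) / sum(w)
        new_sigma2 = sum(wi * (v - new_mu) ** 2 for wi, v in zip(w, x)) / len(x)
        done = abs(new_mu - mu) < tol and abs(new_sigma2 - sigma2) < tol
        mu, sigma2 = new_mu, new_sigma2
        if done:
            break
    return mu, sigma2, w
```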
Detection, Location and Grasping Objects Using a Stereo Sensor on UAV in Outdoor Environments

PubMed Central

Ramon Soria, Pablo; Arrue, Begoña C.; Ollero, Anibal

2017-01-01

The article presents a vision system for the autonomous grasping of objects with Unmanned Aerial Vehicles (UAVs) in real time. Giving UAVs the capability to manipulate objects vastly extends their applications, as they are capable of accessing places that are difficult to reach or even unreachable for human beings. This work is focused on the grasping of known objects based on feature models. The system runs in an on-board computer on a UAV equipped with a stereo camera and a robotic arm. The algorithm learns a feature-based model in an offline stage, which is then used online for detection of the targeted object and estimation of its position. This feature-based model was shown to be robust to both occlusions and the presence of outliers. The use of stereo cameras improves the learning stage, providing 3D information and helping to filter features in the online stage. An experimental system was built using a rotary-wing UAV and a small manipulator for final proof of concept. The robotic arm is designed with three degrees of freedom and is lightweight due to payload limitations of the UAV. The system has been validated with different objects, both indoors and outdoors. PMID:28067851
Regression Models for Identifying Noise Sources in Magnetic Resonance Images

PubMed Central

Zhu, Hongtu; Li, Yimei; Ibrahim, Joseph G.; Shi, Xiaoyan; An, Hongyu; Chen, Yashen; Gao, Wei; Lin, Weili; Rowe, Daniel B.; Peterson, Bradley S.

2009-01-01

Stochastic noise, susceptibility artifacts, magnetic field and radiofrequency inhomogeneities, and other noise components in magnetic resonance images (MRIs) can introduce serious bias into any measurements made with those images. We formally introduce three regression models including a Rician regression model and two associated normal models to characterize stochastic noise in various magnetic resonance imaging modalities, including diffusion-weighted imaging (DWI) and functional MRI (fMRI). Estimation algorithms are introduced to maximize the likelihood function of the three regression models. We also develop a diagnostic procedure for systematically exploring MR images to identify noise components other than simple stochastic noise, and to detect discrepancies between the fitted regression models and MRI data. The diagnostic procedure includes goodness-of-fit statistics, measures of influence, and tools for graphical display. The goodness-of-fit statistics can assess the key assumptions of the three regression models, whereas measures of influence can isolate outliers caused by certain noise components, including motion artifacts. The tools for graphical display permit graphical visualization of the values for the goodness-of-fit statistic and influence measures. Finally, we conduct simulation studies to evaluate performance of these methods, and we analyze a real dataset to illustrate how our diagnostic procedure localizes subtle image artifacts by detecting intravoxel variability that is not captured by the regression models. PMID:19890478
GPHMM: an integrated hidden Markov model for identification of copy number alteration and loss of heterozygosity in complex tumor samples using whole genome SNP arrays

PubMed Central

Li, Ao; Liu, Zongzhi; Lezon-Geyda, Kimberly; Sarkar, Sudipa; Lannin, Donald; Schulz, Vincent; Krop, Ian; Winer, Eric; Harris, Lyndsay; Tuck, David

2011-01-01

There is an increasing interest in using single nucleotide polymorphism (SNP) genotyping arrays for profiling chromosomal rearrangements in tumors, as they allow simultaneous detection of copy number and loss of heterozygosity with high resolution. Critical issues such as signal baseline shift due to aneuploidy, normal cell contamination, and the presence of GC content bias have been reported to dramatically alter SNP array signals and complicate accurate identification of aberrations in cancer genomes. To address these issues, we propose a novel Global Parameter Hidden Markov Model (GPHMM) to unravel tangled genotyping data generated from tumor samples. In contrast to other HMM methods, a distinct feature of GPHMM is that the issues mentioned above are quantitatively modeled by global parameters and integrated within the statistical framework. We developed an efficient EM algorithm for parameter estimation. We evaluated performance on three data sets and show that GPHMM can correctly identify chromosomal aberrations in tumor samples containing as few as 10% cancer cells. Furthermore, we demonstrated that the estimation of global parameters in GPHMM provides information about the biological characteristics of tumor samples and the quality of genotyping signal from SNP array experiments, which is helpful for data quality control and outlier detection in cohort studies. PMID:21398628
Dysmorphometrics: the modelling of morphological abnormalities.

PubMed

Claes, Peter; Daniels, Katleen; Walters, Mark; Clement, John; Vandermeulen, Dirk; Suetens, Paul

2012-02-06

The study of typical morphological variations using quantitative, morphometric descriptors has always interested biologists in general. However, unusual examples of form, such as abnormalities, are often encountered in the biomedical sciences. Despite the long history of morphometrics, the means to identify and quantify such unusual form differences remains limited. A theoretical concept, called dysmorphometrics, is introduced, augmenting current geometric morphometrics with a focus on identifying and modelling form abnormalities. Dysmorphometrics applies the paradigm of detecting form differences as outliers compared to an appropriate norm. To achieve this, the likelihood formulation of landmark superimpositions is extended with outlier processes explicitly introducing a latent variable coding for abnormalities. A tractable solution to this augmented superimposition problem is obtained using Expectation-Maximization. The topography of detected abnormalities is encoded in a dysmorphogram. We demonstrate the use of dysmorphometrics to measure abrupt changes in time, asymmetry and discordancy in a set of human faces presenting with facial abnormalities. The results clearly illustrate the unique power to reveal unusual form differences given only normative data, with clear applications in both biomedical practice and research.
Adaptive population divergence and directional gene flow across steep elevational gradients in a climate‐sensitive mammal

USGS Publications Warehouse

Waterhouse, Matthew D.; Erb, Liesl P.; Beever, Erik; Russello, Michael A.

2018-01-01

The American pika is a thermally sensitive, alpine lagomorph species. Recent climate-associated population extirpations and genetic signatures of reduced population sizes range-wide indicate the viability of this species is sensitive to climate change. To test for potential adaptive responses to climate stress, we sampled pikas along two elevational gradients (each ~470 to 1640 m) and employed three outlier detection methods, BAYESCAN, LFMM, and BAYPASS, to scan for genotype-environment associations in samples genotyped at 30,763 SNP loci. We resolved 173 loci with robust evidence of natural selection detected by either two independent analyses or replicated in both transects. A BLASTN search of these outlier loci revealed several genes associated with metabolic function and oxygen transport, indicating natural selection from thermal stress and hypoxia. We also found evidence of directional gene flow primarily downslope from large high-elevation populations and reduced gene flow at outlier loci, a pattern suggesting potential impediments to the upward elevational movement of adaptive alleles in response to contemporary climate change. Finally, we documented evidence of reduced genetic diversity associated with the south-facing transect and an increase in corticosterone stress levels associated with inbreeding. This study suggests the American pika is already undergoing climate-associated natural selection at multiple genomic regions. Further analysis is needed to determine if the rate of climate adaptation in the American pika and other thermally sensitive species will be able to keep pace with rapidly changing climate conditions.
Robust geostatistical analysis of spatial data

NASA Astrophysics Data System (ADS)

Papritz, Andreas; Künsch, Hans Rudolf; Schwierz, Cornelia; Stahel, Werner A.

2013-04-01

Most geostatistical software tools rely on non-robust algorithms. This is unfortunate, because outlying observations are the rule rather than the exception, in particular in environmental data sets. Outliers affect the modelling of the large-scale spatial trend, the estimation of the spatial dependence of the residual variation and the predictions by kriging. Identifying outliers manually is cumbersome and requires expertise, because one needs parameter estimates to decide which observation is a potential outlier. Moreover, inference after the rejection of some observations is problematic. A better approach is to use robust algorithms that automatically prevent outlying observations from having undue influence. Former studies on robust geostatistics focused on robust estimation of the sample variogram and ordinary kriging without external drift. Furthermore, Richardson and Welsh (1995) proposed a robustified version of (restricted) maximum likelihood ([RE]ML) estimation for the variance components of a linear mixed model, which was later used by Marchant and Lark (2007) for robust REML estimation of the variogram. We propose here a novel method for robust REML estimation of the variogram of a Gaussian random field that is possibly contaminated by independent errors from a long-tailed distribution. It is based on robustification of estimating equations for the Gaussian REML estimation (Welsh and Richardson, 1997). Besides robust estimates of the parameters of the external drift and of the variogram, the method also provides standard errors for the estimated parameters, robustified kriging predictions at both sampled and non-sampled locations and kriging variances. Apart from presenting our modelling framework, we shall present selected simulation results by which we explored the properties of the new method. This will be complemented by an analysis of a data set on heavy metal contamination of the soil in the vicinity of a metal smelter.

Marchant, B.P. and Lark, R.M. 2007. Robust estimation of the variogram by residual maximum likelihood. Geoderma 140: 62-72.
Richardson, A.M. and Welsh, A.H. 1995. Robust restricted maximum likelihood in mixed linear models. Biometrics 51: 1429-1439.
Welsh, A.H. and Richardson, A.M. 1997. Approaches to the robust estimation of mixed models. In: Handbook of Statistics Vol. 15, Elsevier, pp. 343-384.
Robust geostatistical analysis of spatial data

NASA Astrophysics Data System (ADS)

Papritz, A.; Künsch, H. R.; Schwierz, C.; Stahel, W. A.

2012-04-01

Most geostatistical software tools rely on non-robust algorithms. This is unfortunate, because outlying observations are the rule rather than the exception, in particular in environmental data sets. Outlying observations may result from errors (e.g. in data transcription) or from local perturbations in the processes that are responsible for a given pattern of spatial variation. As an example, the spatial distribution of some trace metal in the soils of a region may be distorted by emissions of local anthropogenic sources. Outliers affect the modelling of the large-scale spatial variation, the so-called external drift or trend, the estimation of the spatial dependence of the residual variation and the predictions by kriging. Identifying outliers manually is cumbersome and requires expertise, because one needs parameter estimates to decide which observation is a potential outlier. Moreover, inference after the rejection of some observations is problematic. A better approach is to use robust algorithms that automatically prevent outlying observations from having undue influence. Former studies on robust geostatistics focused on robust estimation of the sample variogram and ordinary kriging without external drift. Furthermore, Richardson and Welsh (1995) [2] proposed a robustified version of (restricted) maximum likelihood ([RE]ML) estimation for the variance components of a linear mixed model, which was later used by Marchant and Lark (2007) [1] for robust REML estimation of the variogram. We propose here a novel method for robust REML estimation of the variogram of a Gaussian random field that is possibly contaminated by independent errors from a long-tailed distribution. It is based on robustification of estimating equations for the Gaussian REML estimation.
Besides robust estimates of the parameters of the external drift and of the variogram, the method also provides standard errors for the estimated parameters, robustified kriging predictions at both sampled and unsampled locations and kriging variances. The method has been implemented in an R package. Apart from presenting our modelling framework, we shall present selected simulation results by which we explored the properties of the new method. This will be complemented by an analysis of the Tarrawarra soil moisture data set [3].
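The earlier robust-variogram work mentioned in the abstract can be illustrated with the Cressie-Hawkins estimator, which averages square roots of absolute increments instead of squared increments. The sketch below compares it with the classical (Matheron) semivariogram on a regularly spaced 1-D transect. It is not the robust REML method proposed here, and the transect values in the usage test are made up.

```python
def semivariograms(z, lag):
    """Classical and Cressie-Hawkins semivariance at an integer lag on an evenly spaced transect."""
    pairs = [(z[i], z[i + lag]) for i in range(len(z) - lag)]
    n = len(pairs)
    # Matheron: gamma(h) = (1 / 2N) * sum (z_i - z_j)^2
    classical = sum((a - b) ** 2 for a, b in pairs) / (2 * n)
    # Cressie-Hawkins: 2 gamma(h) = [mean |z_i - z_j|^(1/2)]^4 / (0.457 + 0.494 / N)
    robust = (sum(abs(a - b) ** 0.5 for a, b in pairs) / n) ** 4 / (0.457 + 0.494 / n) / 2
    return classical, robust
```

A single gross outlier inflates the classical estimate by two orders of magnitude in the test data, while the Cressie-Hawkins estimate grows far less; it is dampened rather than immune, which is part of why model-based robust methods remain of interest.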
Co-Registration Between Multisource Remote-Sensing Images

NASA Astrophysics Data System (ADS)

Wu, J.; Chang, C.; Tsai, H.-Y.; Liu, M.-C.

2012-07-01

Image registration is essential for geospatial information systems analysis, which usually involves integrating multitemporal and multispectral datasets from remote optical and radar sensors. An algorithm that deals with feature extraction, keypoint matching, outlier detection and image warping is tested in this study. The methods currently available in the literature rely on techniques such as the scale-invariant feature transform, between-edge cost minimization, normalized cross correlation, least-squares image matching, random sample consensus, iterated data snooping and thin-plate splines. Their basics are highlighted and encoded into a computer program. The test images are excerpts from digital files created by the multispectral SPOT-5 and Formosat-2 sensors, and by the panchromatic IKONOS and QuickBird sensors. Suburban areas, housing rooftops, the countryside and hilly plantations are studied. The co-registered images are displayed with block subimages in a criss-cross pattern. Besides the imagery, the registration accuracy is expressed by the root mean square error. Toward the end, this paper also includes a few opinions on issues that are believed to hinder a correct correspondence between diverse images.
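Random sample consensus (RANSAC), one of the techniques listed, is the step that rejects outlying keypoint matches. The sketch below estimates a 2-D affine transform between matched keypoint coordinates with NumPy. The tolerance, iteration count and seed are illustrative assumptions, and the paper's full pipeline (feature matching, iterated data snooping, thin-plate splines) is not reproduced.

```python
import numpy as np


def fit_affine(src, dst):
    """Least-squares affine map: dst ≈ [x, y, 1] @ A, with A of shape (3, 2)."""
    X = np.hstack([src, np.ones((len(src), 1))])
    A, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return A


def ransac_affine(src, dst, n_iter=500, tol=1.0, seed=0):
    """Return (affine refit on the largest consensus set, inlier mask)."""
    rng = np.random.default_rng(seed)
    X = np.hstack([src, np.ones((len(src), 1))])
    best = np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        # Minimal sample: three correspondences determine an affine map.
        idx = rng.choice(len(src), size=3, replace=False)
        A = fit_affine(src[idx], dst[idx])
        resid = np.linalg.norm(X @ A - dst, axis=1)  # reprojection error (e.g. pixels)
        inliers = resid < tol
        if inliers.sum() > best.sum():
            best = inliers
    return fit_affine(src[best], dst[best]), best
```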

42 CFR 413.237 - Outliers.

Code of Federal Regulations, 2010 CFR

2010-10-01

...-only drugs effective January 1, 2014. (2) Adult predicted ESRD outlier services Medicare allowable... furnished to an adult beneficiary by an ESRD facility. (3) Pediatric predicted ESRD outlier services... outlier services furnished to a pediatric beneficiary by an ESRD facility. (4) Adult fixed dollar loss...
Kepler AutoRegressive Planet Search: Motivation & Methodology

NASA Astrophysics Data System (ADS)

Caceres, Gabriel; Feigelson, Eric; Jogesh Babu, G.; Bahamonde, Natalia; Bertin, Karine; Christen, Alejandra; Curé, Michel; Meza, Cristian

2015-08-01

The Kepler AutoRegressive Planet Search (KARPS) project uses statistical methodology associated with autoregressive (AR) processes to model Kepler lightcurves in order to improve exoplanet transit detection in systems with high stellar variability. We also introduce a planet-search algorithm to detect transits in time-series residuals after application of the AR models. One of the main obstacles in detecting faint planetary transits is the intrinsic stellar variability of the host star. The variability displayed by many stars may have autoregressive properties, wherein later flux values are correlated with previous ones in some manner. Auto-Regressive Moving-Average (ARMA) models, Generalized Auto-Regressive Conditional Heteroskedasticity (GARCH), and related models are flexible, phenomenological methods used with great success to model stochastic temporal behaviors in many fields of study, particularly econometrics. Powerful statistical methods are implemented in the public statistical software environment R and its many packages. Modeling involves maximum likelihood fitting, model selection, and residual analysis. These techniques provide a useful framework to model stellar variability and are used in KARPS with the objective of reducing stellar noise to enhance opportunities to find as-yet-undiscovered planets. Our analysis procedure consists of three steps: pre-processing of the data to remove discontinuities, gaps and outliers; ARMA-type model selection and fitting; and transit signal search of the residuals using a new Transit Comb Filter (TCF) that replaces traditional box-finding algorithms. We apply the procedures to simulated Kepler-like time series with known stellar and planetary signals to evaluate the effectiveness of the KARPS procedures. The ARMA-type modeling is effective at reducing stellar noise, but also reduces and transforms the transit signal into ingress/egress spikes.
A periodogram based on the TCF is constructed to concentrate the signal of these periodic spikes. When a periodic transit is found, the model is displayed on a standard period-folded averaged light curve. We also illustrate the efficient coding in R.
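As a rough illustration of why AR filtering turns a box-shaped transit into ingress/egress spikes, the Python sketch below fits a zero-mean AR(p) model by ordinary least squares to a simulated red-noise light curve with an injected box dip. This is a simplification: KARPS itself uses maximum-likelihood ARMA-type fitting in R. The helper `fit_ar`, the AR(1) coefficient, the noise level, and the transit depth are all hypothetical choices for the demonstration.

```python
import numpy as np

def fit_ar(x, p):
    """Ordinary least-squares fit of a zero-mean AR(p) model.

    Returns the AR coefficients and the one-step-ahead residuals; residual i
    corresponds to time index i + p.
    """
    x = np.asarray(x, dtype=float) - np.mean(x)
    # Column k holds the lag-(k+1) values x[t-k-1] for t = p, ..., n-1
    X = np.column_stack([x[p - k - 1 : len(x) - k - 1] for k in range(p)])
    y = x[p:]
    phi, *_ = np.linalg.lstsq(X, y, rcond=None)
    return phi, y - X @ phi

rng = np.random.default_rng(0)
n, phi_true, depth = 2000, 0.9, 2e-3
flux = np.zeros(n)
eps = rng.normal(0.0, 1e-4, n)
for t in range(1, n):
    flux[t] = phi_true * flux[t - 1] + eps[t]   # red (autocorrelated) stellar noise
flux[1000:1020] -= depth                        # hypothetical 20-cadence box transit

phi, resid = fit_ar(flux, p=1)
ingress = int(np.argmin(resid)) + 1             # + p maps residual index to time index
egress = int(np.argmax(resid)) + 1
print(phi, ingress, egress)
```

With the fitted coefficient close to 1, the residual inside the transit is only about (1 - φ) times the depth, while ingress and egress leave spikes of roughly the full depth with opposite signs. This is consistent with the behaviour described above and with searching the residuals for paired spikes rather than for a box.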
On-line Machine Learning and Event Detection in Petascale Data Streams

NASA Astrophysics Data System (ADS)

Thompson, David R.; Wagstaff, K. L.

2012-01-01

Traditional statistical data mining involves off-line analysis in which all data are available and equally accessible. However, petascale datasets have challenged this premise, since it is often impossible to store, let alone analyze, the relevant observations. This has led the machine learning community to investigate adaptive processing chains where data mining is a continuous process. Here, pattern recognition permits triage and follow-up decisions at multiple stages of a processing pipeline. Such techniques can also benefit new astronomical instruments such as the Large Synoptic Survey Telescope (LSST) and Square Kilometre Array (SKA) that will generate petascale data volumes. We summarize some machine learning perspectives on real-time data mining, with representative cases of astronomical applications and event detection in high-volume datastreams. The first is a "supervised classification" approach currently used for transient event detection at the Very Long Baseline Array (VLBA). It injects known signals of interest (faint single-pulse anomalies) and tunes system parameters to recover these events. This permits meaningful event detection for diverse instrument configurations and observing conditions whose noise cannot be well characterized in advance. Second, "semi-supervised novelty detection" finds novel events based on statistical deviations from previous patterns. It detects outlier signals of interest while considering known examples of false-alarm interference. Applied to data from the Parkes pulsar survey, the approach identifies anomalous "peryton" phenomena that do not match previous event models. Finally, we consider online light curve classification that can trigger adaptive follow-up measurements of candidate events. Classifier performance analyses suggest optimal survey strategies and permit principled follow-up decisions from incomplete data. These examples trace a broad range of algorithm possibilities available for online astronomical data mining. This talk describes research performed at the Jet Propulsion Laboratory, California Institute of Technology. Copyright 2012, All Rights Reserved. U.S. Government support acknowledged.
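The injection-and-recovery idea behind the VLBA transient detector can be sketched generically: score pure-noise data and data containing injected synthetic pulses, set the detection threshold so that a chosen fraction of the injected events is recovered, and read off the false-alarm rate that threshold implies. The Gaussian detector scores, the 4σ pulse amplitude, the 90% recall target, and the helper `tune_threshold` are illustrative assumptions, not parameters of the actual system.

```python
import numpy as np

def tune_threshold(noise_scores, injected_scores, target_recall=0.9):
    """Return (approximately) the highest threshold that still recovers
    `target_recall` of the injected events, plus the false-alarm rate it
    implies on pure noise."""
    threshold = np.quantile(injected_scores, 1.0 - target_recall)
    false_alarm_rate = float(np.mean(noise_scores >= threshold))
    return threshold, false_alarm_rate

rng = np.random.default_rng(42)
noise_scores = rng.normal(0.0, 1.0, 100_000)     # detector statistic on noise only
injected_scores = rng.normal(4.0, 1.0, 1_000)    # same statistic with a faint injected pulse

threshold, far = tune_threshold(noise_scores, injected_scores)
recall = np.mean(injected_scores >= threshold)
print(f"threshold={threshold:.2f}  recall={recall:.3f}  false-alarm rate={far:.4f}")
```

Because the threshold is derived from injected signals rather than from a noise model, the same procedure can be rerun for each instrument configuration without characterizing the noise in advance.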
Identification and influence of spatio-temporal outliers in urban air quality measurements.

PubMed

O'Leary, Brendan; Reiners, John J; Xu, Xiaohong; Lemke, Lawrence D

2016-12-15

Forty-eight potential outliers in air pollution measurements taken simultaneously in Detroit, Michigan, USA and Windsor, Ontario, Canada in 2008 and 2009 were identified using four independent methods: box plots, variogram clouds, difference maps, and the Local Moran's I statistic. These methods were subsequently used in combination to reduce and select a final set of 13 outliers for nitrogen dioxide (NO2), volatile organic compounds (VOCs), total benzene, toluene, ethyl benzene, and xylene (BTEX), and particulate matter in two size fractions (PM2.5 and PM10). The selected outliers were excluded from the measurement datasets and used to revise air pollution models. In addition, a set of temporally-scaled air pollution models was generated using time series measurements from community air quality monitors, with and without the selected outliers. The influence of outlier exclusion on associations with asthma exacerbation rates aggregated at a postal zone scale in both cities was evaluated. Results demonstrate that the inclusion or exclusion of outliers influences the strength of observed associations between intraurban air quality and asthma exacerbation in both cities. The box plot, variogram cloud, and difference map methods largely determined the final list of outliers, due to the high degree of conformity among their results. The Moran's I approach was not useful for outlier identification in the datasets studied. Removing outliers changed the spatial distribution of modeled concentration values and derivative exposure estimates averaged over postal zones. Overall, associations between air pollution and acute asthma exacerbation rates were weaker with outliers removed, but improved with the addition of temporal information. Decreases in statistically significant associations between air pollution and asthma resulted, in part, from smaller pollutant concentration ranges used for linear regression. Nevertheless, the practice of identifying outliers through congruence among multiple methods strengthens confidence in the analysis of outlier presence and influence in environmental datasets. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
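The idea of keeping only outliers on which several methods agree can be sketched as a simple vote. The Python below combines a box-plot (Tukey fence) rule with a nearest-neighbour difference rule. The latter is only a rough, hypothetical stand-in for the paper's difference maps, and variogram clouds and local Moran's I are omitted. The monitoring grid, the spike, the thresholds, and all function names are illustrative assumptions.

```python
import numpy as np

def boxplot_flags(x, k=1.5):
    """Tukey box-plot rule: flag values more than k*IQR outside the quartiles."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

def neighbour_difference_flags(coords, x, n_neighbours=4, k=3.0):
    """Stand-in for a difference map: flag sites whose value differs strongly
    (robust z-score > k) from the mean of their nearest neighbours."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :n_neighbours]
    diff = x - x[nn].mean(axis=1)
    dev = np.abs(diff - np.median(diff))
    return dev > k * 1.4826 * np.median(dev)

def consensus(*flag_sets, min_votes=2):
    """Keep only points flagged by at least `min_votes` methods."""
    return np.sum(flag_sets, axis=0) >= min_votes

rng = np.random.default_rng(0)
gx, gy = np.meshgrid(np.arange(10), np.arange(10))
coords = np.column_stack([gx.ravel(), gy.ravel()]).astype(float)
conc = 20 + 0.1 * coords.sum(axis=1) + rng.normal(0, 1, 100)  # smooth trend + noise
conc[55] += 15  # hypothetical anomalous reading

flags = consensus(boxplot_flags(conc), neighbour_difference_flags(coords, conc))
print(np.flatnonzero(flags))
```

The neighbours of the spiked site are also flagged by the neighbour-difference rule (their local mean is pulled up), but not by the box plot, so the vote removes them. This mirrors how agreement among methods can narrow a candidate list.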
Treatment of Outliers via Interpolation Method with Neural Network Forecast Performances

NASA Astrophysics Data System (ADS)

Wahir, N. A.; Nor, M. E.; Rusiman, M. S.; Gopal, K.

2018-04-01

Outliers often lurk in many datasets, especially in real data. Such anomalous data can negatively affect statistical analyses, primarily with respect to normality, variance, and estimation. Hence, handling the occurrence of outliers requires special attention, and it is important to determine suitable ways of treating outliers so as to ensure that the quality of the analyzed data is high. This paper discusses an alternative method of treating outliers via linear interpolation. Treating an outlier as a missing value in the dataset allows the interpolation method to be applied to the outliers, which enables the comparison of data series using forecast accuracy before and after outlier treatment. The monthly time series of Malaysian tourist arrivals from January 1998 to December 2015 was used to construct the interpolated series. The results indicated that the linearly interpolated series, which comprised improved time series data, gave better forecasts than the original time series under both the Box-Jenkins and neural network approaches.
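Treating flagged outliers as missing values and filling them by linear interpolation takes only a few lines. In the sketch below the outlier mask is supplied by the caller, since the abstract does not say how the outliers were identified. The helper name `interpolate_outliers` and the toy series are hypothetical.

```python
import numpy as np

def interpolate_outliers(series, is_outlier):
    """Treat flagged points as missing and fill them by linear interpolation
    between the nearest unflagged neighbours (np.interp holds the end values
    constant if an outlier sits at either end of the series)."""
    series = np.asarray(series, dtype=float)
    t = np.arange(series.size)
    good = ~np.asarray(is_outlier, dtype=bool)
    return np.interp(t, t[good], series[good])

arrivals = np.array([10.0, 11.0, 50.0, 13.0, 14.0])  # hypothetical monthly values
flags = np.array([False, False, True, False, False])
print(interpolate_outliers(arrivals, flags))  # -> [10. 11. 12. 13. 14.]
```

The cleaned series can then be passed to any forecasting model, and its accuracy compared with a model fitted to the original series.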
Chemical quality of bottom sediments in selected streams, Jefferson County, Kentucky, April-July 1992

USGS Publications Warehouse

Moore, B.L.; Evaldi, R.D.

1995-01-01

Bottom sediments from 25 stream sites in Jefferson County, Ky., were analyzed for percent volatile solids and concentrations of nutrients, major metals, trace elements, miscellaneous inorganic compounds, and selected organic compounds. Statistical high outliers of the constituent concentrations analyzed for in the bottom sediments were defined as a measure of possible elevated concentrations. Statistical high outliers were determined for at least 1 constituent at each of 12 sampling sites in Jefferson County. Of the 10 stream basins sampled in Jefferson County, the Middle Fork Beargrass Basin, Cedar Creek Basin, and Harrods Creek Basin were the only three basins where a statistical high outlier was not found for any of the measured constituents. In the Pennsylvania Run Basin, total volatile solids, nitrate plus nitrite, and endrin constituents were statistical high outliers. Pond Creek was the only basin where five constituents were statistical high outliers: barium, beryllium, cadmium, chromium, and silver. Nitrate plus nitrite and copper constituents were the only statistical high outliers found in the Mill Creek Basin. In the Floyds Fork Basin, nitrate plus nitrite, phosphorus, mercury, and silver constituents were the only statistical high outliers. Ammonia was the only statistical high outlier found in the South Fork Beargrass Basin. In the Goose Creek Basin, mercury and silver constituents were the only statistical high outliers. Cyanide was the only statistical high outlier in the Muddy Fork Basin.
Improving the Accuracy of Cloud Detection Using Machine Learning

NASA Astrophysics Data System (ADS)

Craddock, M. E.; Alliss, R. J.; Mason, M.

2017-12-01

Cloud detection from geostationary satellite imagery has long been accomplished through multi-spectral channel differencing in comparison to the Earth's surface. The distinction of clear/cloud is then determined by comparing these differences to empirical thresholds. Using this methodology, the probability of detecting clouds exceeds 90%, but performance varies seasonally, regionally and temporally. The Cloud Mask Generator (CMG) database developed under this effort consists of 20 years of 4 km, 15-minute clear/cloud images based on GOES data over CONUS and Hawaii. The algorithms to determine cloudy pixels in the imagery are based on well-known multi-spectral techniques and defined thresholds. These thresholds were produced by manually studying thousands of images, over thousands of man-hours, to determine the success and failure of the algorithms and fine-tune the thresholds. This study aims to investigate the potential of improving cloud detection by using Random Forest (RF) ensemble classification. RF is the ideal methodology to employ for cloud detection as it runs efficiently on large datasets, is robust to outliers and noise, and is able to deal with highly correlated predictors, such as multi-spectral satellite imagery. The RF code was developed using Python in about 4 weeks. The region of focus selected was Hawaii, and the predictors include visible and infrared imagery, topography and multi-spectral image products. The development of the cloud detection technique is realized in three steps. First, the RF models are tuned to identify the optimal values of the number of trees and the number of predictors to employ for both day and night scenes. Second, the RF models are trained using the optimal number of trees and a select number of random predictors identified during the tuning phase. Lastly, the model is used to predict clouds for a time period independent of the one used during training, and the predictions are compared to truth, the CMG cloud mask. Initial results show 97% accuracy during the daytime, 94% accuracy at night, and 95% accuracy for all times. The total time to train, tune and test was approximately one week. The improved performance and reduced time to produce results are a testament to improved computer technology and the use of machine learning as a more efficient and accurate methodology of cloud detection.
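The tuning step (choosing the number of trees and the number of predictors tried at each split) can be sketched with scikit-learn's `RandomForestClassifier`, scoring each setting by out-of-bag accuracy. The abstract does not say which Python library or tuning criterion was used. scikit-learn, the OOB criterion, the grid values, and the synthetic stand-in for per-pixel predictors are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for per-pixel predictors (e.g. channel values and differences)
X, y = make_classification(n_samples=2000, n_features=12, n_informative=6,
                           n_redundant=4, random_state=0)

best = None
for n_trees in (50, 100, 200):
    for n_pred in (2, 4, 6):  # predictors sampled at each split ("mtry")
        rf = RandomForestClassifier(n_estimators=n_trees, max_features=n_pred,
                                    oob_score=True, random_state=0, n_jobs=-1)
        rf.fit(X, y)
        # Out-of-bag accuracy avoids setting aside a separate validation split
        if best is None or rf.oob_score_ > best[0]:
            best = (rf.oob_score_, n_trees, n_pred)

print("best OOB accuracy %.3f with %d trees and %d predictors per split" % best)
```

Separate grids would be run for day and night scenes, and the chosen settings then used to train the final models that are evaluated on an independent time period.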
Robust MST-Based Clustering Algorithm.

PubMed

Liu, Qidong; Zhang, Ruisheng; Zhao, Zhili; Wang, Zhenghai; Jiao, Mengyao; Wang, Guangjing

2018-06-01

Minimax similarity stresses the connectedness of points via mediating elements rather than favoring high mutual similarity. This grouping principle yields superior clustering results when mining arbitrarily shaped clusters in data. However, it is not robust against noise and outliers in the data. There are two main problems with the grouping principle: first, a single object that is far away from all other objects defines a separate cluster, and second, two connected clusters are regarded as two parts of one cluster. To solve these problems, we propose a robust minimum spanning tree (MST)-based clustering algorithm in this letter. First, we separate the connected objects by applying a density-based coarsening phase, resulting in a low-rank matrix in which each element denotes a supernode formed by combining a set of nodes. Then a greedy method is presented to partition those supernodes by working on the low-rank matrix. Instead of removing the longest edges from the MST, our algorithm groups the data set based on the minimax similarity. Finally, the assignment of all data points is obtained through their corresponding supernodes. Experimental results on many synthetic and real-world data sets show that our algorithm consistently outperforms the compared clustering algorithms.
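For contrast, the classical approach that the letter moves away from (cutting the longest edges of the MST) can be sketched in a few lines of Python with SciPy. The toy data reproduce the first problem named above: a single far-away point absorbs the only cut and becomes its own cluster, while the two genuine groups stay merged. This is the baseline, not the authors' robust algorithm, and `mst_cut_clustering` is a hypothetical helper name.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def mst_cut_clustering(X, n_clusters):
    """Cluster by deleting the (n_clusters - 1) longest edges of the Euclidean MST."""
    dist = squareform(pdist(X))
    mst = minimum_spanning_tree(dist).toarray()  # zero entries = no edge
    edges = np.argwhere(mst > 0)                 # same row-major order as mst[mst > 0]
    longest = np.argsort(mst[mst > 0])[::-1][: n_clusters - 1]
    for i, j in edges[longest]:
        mst[i, j] = 0.0
    _, labels = connected_components(mst, directed=False)
    return labels

rng = np.random.default_rng(0)
blob_a = rng.normal([0.0, 0.0], 0.5, size=(20, 2))
blob_b = rng.normal([10.0, 0.0], 0.5, size=(20, 2))
lone = np.array([[50.0, 50.0]])  # a single far-away object
labels = mst_cut_clustering(np.vstack([blob_a, blob_b, lone]), n_clusters=2)
print(np.bincount(labels))  # one cluster with both blobs (40 points) and one singleton
```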
An efficient identification approach for stable and unstable nonlinear systems using Colliding Bodies Optimization algorithm.

PubMed

Pal, Partha S; Kar, R; Mandal, D; Ghoshal, S P

2015-11-01

This paper presents an efficient approach to identifying different stable and practically useful Hammerstein models, as well as an unstable nonlinear process along with its stable closed-loop counterpart, with the help of an evolutionary algorithm, the Colliding Bodies Optimization (CBO) algorithm. The performance measures of the CBO-based optimization approach, such as precision and accuracy, are justified by the minimum output mean square error (MSE), which signifies that the bias and variance in the output domain are also the least. It is also observed that optimizing the output MSE in the presence of outliers consistently results in a very close estimation of the output parameters, which also justifies the general applicability of the CBO algorithm to the system identification problem and establishes the practical usefulness of the applied approach. The optimum MSE values, computational times, and statistical information of the MSEs are all found to be superior to those of other similar stochastic-algorithm-based approaches reported in recent literature, which establishes the robustness and efficiency of the applied CBO-based identification scheme. Copyright © 2015 ISA. Published by Elsevier Ltd. All rights reserved.
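A compact version of the standard Colliding Bodies Optimization update is sketched below. The better half of the population acts as stationary bodies, and the worse half collides with them. Masses are inversely proportional to cost, and a coefficient of restitution that decreases linearly over the iterations shifts the search from exploration to exploitation. It is applied to a hypothetical first-order Hammerstein model y(t) = a·y(t-1) + b·[u(t-1) + c·u(t-1)²], minimizing one-step-ahead output MSE. The model structure, bounds, population settings, and function names are illustrative assumptions, since the abstract does not give the paper's models or CBO settings.

```python
import numpy as np

def cbo_minimize(cost_fn, lower, upper, n_bodies=20, n_iter=200, seed=0):
    """Colliding Bodies Optimization: the better half of the population stays
    put, the worse half collides with it, and post-collision velocities move both."""
    if n_bodies % 2:
        raise ValueError("n_bodies must be even")
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    x = rng.uniform(lower, upper, size=(n_bodies, lower.size))
    half = n_bodies // 2
    best_x, best_cost = x[0].copy(), np.inf
    for it in range(n_iter):
        cost = np.array([cost_fn(xi) for xi in x])
        order = np.argsort(cost)
        x, cost = x[order], cost[order]          # stationary bodies first
        if cost[0] < best_cost:
            best_cost, best_x = cost[0], x[0].copy()
        mass = 1.0 / (cost + 1e-12)
        mass /= mass.sum()                       # lower cost -> heavier body
        eps = 1.0 - it / n_iter                  # coefficient of restitution
        m_s, m_m = mass[:half, None], mass[half:, None]
        v_m = x[half:] - x[:half]                # moving-body velocity before collision
        v_s_after = (1.0 + eps) * m_m * v_m / (m_s + m_m)
        v_m_after = (m_m - eps * m_s) * v_m / (m_s + m_m)
        new_s = x[:half] + rng.uniform(-1.0, 1.0, x[:half].shape) * v_s_after
        new_m = x[:half] + rng.uniform(-1.0, 1.0, x[half:].shape) * v_m_after
        x = np.clip(np.vstack([new_s, new_m]), lower, upper)
    return best_x, best_cost

rng = np.random.default_rng(1)
u = rng.uniform(-1, 1, 300)
a, b, c = 0.5, 1.0, 0.3                          # hypothetical "true" parameters
y = np.zeros_like(u)
for t in range(1, u.size):
    y[t] = a * y[t - 1] + b * (u[t - 1] + c * u[t - 1] ** 2)

def output_mse(theta):
    a_, b_, c_ = theta
    y_hat = a_ * y[:-1] + b_ * (u[:-1] + c_ * u[:-1] ** 2)
    return np.mean((y[1:] - y_hat) ** 2)

theta, mse = cbo_minimize(output_mse, [-2, -2, -2], [2, 2, 2])
print(theta, mse)
```

The search is derivative-free, so the same loop can minimize output MSE for model structures where gradients are awkward to obtain, at the cost of many cost-function evaluations.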
Influence of outliers on accuracy estimation in genomic prediction in plant breeding.

PubMed

Estaghvirou, Sidi Boubacar Ould; Ogutu, Joseph O; Piepho, Hans-Peter

2014-10-01

Outliers often pose problems in analyses of data in plant breeding, but their influence on the performance of methods for estimating predictive accuracy in genomic prediction studies has not yet been evaluated. Here, we evaluate the influence of outliers on the performance of methods for accuracy estimation in genomic prediction studies using simulation. We simulated 1000 datasets for each of 10 scenarios to evaluate the influence of outliers on the performance of seven methods for estimating accuracy. These scenarios are defined by the number of genotypes, marker effect variance, and magnitude of outliers. To mimic outliers, we added to one observation in each simulated dataset, in turn, 5-, 8-, and 10-times the error SD used to simulate small and large phenotypic datasets. The effect of outliers on accuracy estimation was evaluated by comparing deviations in the estimated and true accuracies for datasets with and without outliers. Outliers adversely influenced accuracy estimation, more so at small values of genetic variance or number of genotypes. A method for estimating heritability and predictive accuracy in plant breeding and another used to estimate accuracy in animal breeding were the most accurate and resistant to outliers across all scenarios and are therefore preferable for accuracy estimation in genomic prediction studies. The performances of the other five methods that use cross-validation were less consistent and varied widely across scenarios. The computing time for the methods increased as the size of outliers and sample size increased and the genetic variance decreased. Copyright © 2014 Ould Estaghvirou et al.
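The outlier-injection step of the simulation design can be sketched as follows: add k error SDs (k = 5, 8, 10) to a single observation and recompute an accuracy estimate. A plain Pearson correlation between predictions and observed phenotypes stands in for an accuracy estimator here. It is not one of the seven methods compared in the study, and the genotype count, variances, and prediction noise are hypothetical.

```python
import numpy as np

def corr(a, b):
    return float(np.corrcoef(a, b)[0, 1])

rng = np.random.default_rng(0)
n, sd_e = 100, 1.0
g = rng.normal(0.0, 1.0, n)                 # true genetic values
y = g + rng.normal(0.0, sd_e, n)            # observed phenotypes
pred = g + rng.normal(0.0, 0.7, n)          # stand-in genomic predictions

i = int(np.argmin(np.abs(pred - pred.mean())))  # observation with an average prediction
results = {}
for k in (0, 5, 8, 10):
    y_out = y.copy()
    y_out[i] += k * sd_e                    # add k error SDs to a single observation
    results[k] = corr(pred, y_out)
print(results)
```

Because the shifted observation has an average prediction, the outlier mainly inflates the phenotypic variance without adding covariance, so the correlation-based estimate falls. The effect is proportionally larger when the genetic variance, and hence the true signal, is small.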
On damage detection in wind turbine gearboxes using outlier analysis

NASA Astrophysics Data System (ADS)

Antoniadou, Ifigeneia; Manson, Graeme; Dervilis, Nikolaos; Staszewski, Wieslaw J.; Worden, Keith

2012-04-01

The proportion of installed wind power in power systems worldwide has increased over the years as a result of the steadily growing interest in renewable energy sources. Still, the advantages offered by the use of wind power are overshadowed by high operational and maintenance costs, resulting in the low competitiveness of wind power in the energy market. To reduce the costs of corrective maintenance, the application of condition monitoring to gearboxes becomes highly important, since gearboxes are among the wind turbine components with the most frequently observed failures. While condition monitoring of gearboxes in general is common practice, with various methods having been developed over the last few decades, wind turbine gearbox condition monitoring faces a major challenge: the detection of faults under the time-varying load conditions prevailing in wind turbine systems. Classical time- and frequency-domain methods fail to detect faults under variable load conditions, due to the temporary effect that these faults have on vibration signals. This paper uses the statistical discipline of outlier analysis for the damage detection of gearbox tooth faults. A simplified two-degree-of-freedom gearbox model, considering nonlinear backlash, time-periodic mesh stiffness and static transmission error, simulates the vibration signals to be analysed. Local stiffness reduction is used to simulate tooth faults, and statistical processes determine the existence of intermittencies. The lowest level of fault detection, the threshold value, is considered, and the Mahalanobis squared-distance is calculated for the novelty detection problem.
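Mahalanobis squared-distance (MSD) novelty detection can be sketched as follows: estimate the mean and covariance of feature vectors from the undamaged condition, score new vectors by their squared distance, and compare the score with a threshold. The Monte Carlo threshold used here (the 99th percentile of the largest MSD in many Gaussian samples of the same size and dimension) is one common choice in the outlier-analysis literature and is an assumption, as are the three-dimensional Gaussian features standing in for vibration features and the helper names.

```python
import numpy as np

def mahalanobis_sq(X, mean, cov):
    """Mahalanobis squared-distance of each row of X from `mean`."""
    diff = np.atleast_2d(X) - mean
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)

def monte_carlo_threshold(n, p, trials=1000, q=99.0, seed=0):
    """q-th percentile of the maximum MSD over `trials` standard-normal samples
    of n observations in p dimensions."""
    rng = np.random.default_rng(seed)
    maxima = np.empty(trials)
    for k in range(trials):
        Z = rng.standard_normal((n, p))
        maxima[k] = mahalanobis_sq(Z, Z.mean(axis=0), np.cov(Z, rowvar=False)).max()
    return np.percentile(maxima, q)

rng = np.random.default_rng(1)
healthy = rng.normal(0.0, 1.0, size=(200, 3))         # baseline (undamaged) features
mu, cov = healthy.mean(axis=0), np.cov(healthy, rowvar=False)
threshold = monte_carlo_threshold(n=200, p=3)

faulty = np.array([[4.0, -4.0, 4.0]])                  # hypothetical damaged-state feature
print(threshold, mahalanobis_sq(faulty, mu, cov))
```

Any new feature vector whose MSD exceeds the threshold is declared novel. Only data from the healthy condition are needed to train the detector, which suits fault detection when damaged-state examples are unavailable.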
Accuracy in breast shape alignment with 3D surface fitting algorithms.

PubMed

Riboldi, Marco; Gierga, David P; Chen, George T Y; Baroni, Guido

2009-04-01

Surface imaging is used in clinical radiotherapy practice for patient setup optimization and monitoring. Breast alignment is accomplished by searching for a tentative spatial correspondence between the reference and daily surface shape models. In this study, the authors quantify whole breast shape alignment by relying on texture features digitized on 3D surface models. Texture feature localization was validated through repeated measurements in a silicone breast phantom mounted on a high-precision mechanical stage. Clinical investigations on breast shape alignment included 133 fractions in 18 patients treated with accelerated partial breast irradiation. The breast shape was detected with a 3D video-based surface imaging system so that breathing was compensated. An in-house algorithm for breast alignment, based on surface fitting constrained by nipple matching (constrained surface fitting), was applied. Results were compared with commercial software in which no constraints are used (unconstrained surface fitting). Texture feature localization was validated within 2 mm in each anatomical direction. Clinical data show that unconstrained surface fitting achieves adequate accuracy in most cases, though nipple mismatch is considerably higher than residual surface distances (3.9 mm vs 0.6 mm on average). Outliers beyond 1 cm can occur as the result of a degenerate surface fit, where unconstrained surface fitting is not sufficient to establish spatial correspondence. In the constrained surface fitting algorithm, average surface mismatch within 1 mm was obtained when nipple position was forced to match in the [1.5; 5] mm range. In conclusion, optimal results can be obtained by trading off the desired overall surface congruence against matching of selected landmarks (constraint). Constrained surface fitting is proposed as an improvement in setup accuracy for those applications where whole breast positional reproducibility is an issue.
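The trade-off between overall surface congruence and landmark matching can be illustrated with a weighted rigid fit (a weighted Kabsch/Procrustes solution) in which one landmark point, standing in for the nipple, receives extra weight. The sketch assumes known point correspondences, unlike real surface registration, and is not the authors' in-house algorithm. The point cloud, the deformation, and the function names are synthetic or hypothetical.

```python
import numpy as np

def weighted_rigid_fit(P, Q, w):
    """Rotation R and translation t minimising sum_i w_i * ||R @ P[i] + t - Q[i]||^2."""
    w = np.asarray(w, float) / np.sum(w)
    p_c, q_c = w @ P, w @ Q                          # weighted centroids
    H = (P - p_c).T @ (w[:, None] * (Q - q_c))       # weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))           # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, q_c - R @ p_c

rng = np.random.default_rng(0)
P = rng.normal(0.0, 50.0, size=(200, 3))             # reference surface points (mm)
angle = np.deg2rad(5.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
Q = P @ R_true.T + np.array([3.0, -2.0, 1.0]) + rng.normal(0.0, 0.5, P.shape)
Q[-1] += np.array([4.0, 0.0, 0.0])                   # landmark displaced by deformation

def residuals(w_landmark):
    """Return (mean surface mismatch, landmark mismatch) in mm for a given landmark weight."""
    w = np.ones(len(P))
    w[-1] = w_landmark
    R, t = weighted_rigid_fit(P, Q, w)
    err = np.linalg.norm(P @ R.T + t - Q, axis=1)
    return err[:-1].mean(), err[-1]

for wl in (1.0, 100.0):
    print(wl, residuals(wl))
```

Raising the landmark weight can only reduce the landmark mismatch, at the price of a (weakly) larger surface mismatch. The weight therefore acts as the knob that trades one kind of congruence against the other.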
Data Mining Research with the LSST

NASA Astrophysics Data System (ADS)

Borne, Kirk D.; Strauss, M. A.; Tyson, J. A.

2007-12-01

The LSST catalog database will exceed 10 petabytes, comprising several hundred attributes for 5 billion galaxies, 10 billion stars, and over 1 billion variable sources (optical variables, transients, or moving objects), extracted from over 20,000 square degrees of deep imaging in 5 passbands with thorough time-domain coverage: 1000 visits over the 10-year LSST survey lifetime. The opportunities for novel scientific discoveries within this rich time-domain, ultra-deep, multi-band survey database are enormous. Data Mining, Machine Learning, and Knowledge Discovery research opportunities with the LSST are now under study, with a potential for new collaborations to develop to contribute to these investigations. We will describe features of the LSST science database that are amenable to scientific data mining, object classification, outlier identification, anomaly detection, image quality assurance, and survey science validation. We also give some illustrative examples of current scientific data mining research in astronomy, and point out where new research is needed. In particular, the data mining research community will need to address several issues in the coming years as we prepare for the LSST data deluge. The data mining research agenda includes: scalability (at petabyte scales) of existing machine learning and data mining algorithms; development of grid-enabled parallel data mining algorithms; designing a robust system for brokering classifications from the LSST event pipeline (which may produce 10,000 or more event alerts per night); multi-resolution methods for exploration of petascale databases; visual data mining algorithms for visual exploration of the data; indexing of multi-attribute multi-dimensional astronomical databases (beyond RA-Dec spatial indexing) for rapid querying of petabyte databases; and more. Finally, we will identify opportunities for synergistic collaboration between the data mining research group and the LSST Data Management and Science Collaboration teams.
Two-dimensional radial laser scanning for circular marker detection and external mobile robot tracking.

PubMed

Teixidó, Mercè; Pallejà, Tomàs; Font, Davinia; Tresanchez, Marcel; Moreno, Javier; Palacín, Jordi

2012-11-28

This paper presents the use of an external fixed two-dimensional laser scanner to detect cylindrical targets attached to moving devices, such as a mobile robot. This proposal is based on the detection of circular markers in the raw data provided by the laser scanner by applying an algorithm for outlier avoidance and a least-squares circular fitting. Experiments were conducted to validate the proposal empirically with different cylindrical targets in order to estimate the location and tracking errors achieved, which are generally less than 20 mm in the area covered by the laser sensor. As a result of the validation experiments, several error maps have been obtained that give an estimate of the uncertainty of any computed location. This proposal has been validated with a medium-sized mobile robot with an attached cylindrical target (diameter 200 mm). The trajectory of the mobile robot was estimated with an average location error of less than 15 mm, and the real location error in each individual circular fitting was similar to the error estimated with the obtained error maps. The radial area covered in this validation experiment was up to 10 m, a value that depends on the radius of the cylindrical target and the radial density of the distance range points provided by the laser scanner, but this area can be increased by combining the information of additional external laser scanners.
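A minimal version of circular-marker fitting is sketched below. It uses an algebraic (Kåsa) least-squares circle fit, wrapped in a simple outlier-avoidance loop that repeatedly drops the worst-fitting point while its residual exceeds k robust standard deviations. The abstract does not specify the authors' outlier-avoidance algorithm, so the rejection rule, the scan geometry, the noise level, and the helper names are assumptions. Units are millimetres, and the 100 mm radius corresponds to the 200 mm target diameter mentioned above.

```python
import numpy as np

def fit_circle(x, y):
    """Algebraic (Kasa) least-squares circle fit: x^2 + y^2 + D x + E y + F = 0."""
    A = np.column_stack([x, y, np.ones_like(x)])
    b = -(x ** 2 + y ** 2)
    (D, E, F), *_ = np.linalg.lstsq(A, b, rcond=None)
    cx, cy = -D / 2, -E / 2
    return cx, cy, np.sqrt(cx ** 2 + cy ** 2 - F)

def robust_circle(x, y, k=3.0, min_points=5):
    """Refit after dropping the worst point until all residuals are within k robust SDs."""
    keep = np.ones(len(x), dtype=bool)
    while True:
        cx, cy, r = fit_circle(x[keep], y[keep])
        resid = np.abs(np.hypot(x - cx, y - cy) - r)
        scale = 1.4826 * np.median(resid[keep])          # MAD-based noise estimate
        worst = int(np.argmax(np.where(keep, resid, -np.inf)))
        if resid[worst] <= k * scale or keep.sum() <= min_points:
            return cx, cy, r, keep
        keep[worst] = False

rng = np.random.default_rng(1)
theta = np.linspace(-np.pi / 2 - 1.2, -np.pi / 2 + 1.2, 60)  # side facing the scanner
x = 0.0 + 100.0 * np.cos(theta) + rng.normal(0, 1.0, theta.size)
y = 2000.0 + 100.0 * np.sin(theta) + rng.normal(0, 1.0, theta.size)
spurious = [10, 25, 40, 50]                                    # hypothetical spurious returns
x[spurious] += 40 * np.cos(theta[spurious])
y[spurious] += 40 * np.sin(theta[spurious])
cx, cy, r, keep = robust_circle(x, y)
print(round(cx, 1), round(cy, 1), round(r, 1), np.flatnonzero(~keep))
```

The algebraic fit is linear and therefore cheap enough to rerun after each rejection. Only the part of the cylinder facing the scanner is observed, which the fit handles as long as the visible arc is reasonably wide.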
Feature Detection in SAR Interferograms With Missing Data Displays Fault Slip Near El Mayor-Cucapah and South Napa Earthquakes

NASA Astrophysics Data System (ADS)

Parker, J. W.; Donnellan, A.; Glasscoe, M. T.; Stough, T.

2015-12-01

Edge detection identifies seismic or aseismic fault motion, as demonstrated in repeat-pass interferograms obtained by the Uninhabited Aerial Vehicle Synthetic Aperture Radar (UAVSAR) program. But this identification, demonstrated in 2010, was not robust: for best results, it requires a flattened background image, interpolation into missing data (holes) and outliers, and background noise that is either sufficiently small or roughly white Gaussian. Proper treatment of missing data, bursting noise patches, and tiny noise differences at short distances apart from bursts is essential to creating an acceptably reliable method sensitive to small near-surface fractures. Clearly, a robust method is needed for machine scanning of the thousands of UAVSAR repeat-pass interferograms for evidence of fault slip, landslides, and other local features: hand-crafted intervention will not do. Effective methods of identifying, removing and filling in bad pixels reveal significant features of surface fractures. A rich network of edges (probably fractures and subsidence) in difference images spanning the South Napa earthquake gives way to a simple set of postseismically slipping faults. Coseismic El Mayor-Cucapah interferograms compared to post-seismic difference images show nearly disjoint patterns of surface fractures in California's Sonoran Desert; the combined pattern reveals a network of near-perpendicular, probably conjugate faults not mapped before the earthquake. The current algorithms for UAVSAR interferogram edge detection are shown to be effective in difficult environments, including agricultural (Napa, Imperial Valley) and difficult urban areas (Orange County).
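The identify, remove, and fill step for bad pixels, followed by edge detection, can be sketched with SciPy's `ndimage`. NaN holes and pixels that deviate strongly from their local median are flagged and replaced by that median, and a Sobel gradient magnitude then highlights linear offsets. The UAVSAR processing is not described in enough detail to reproduce, so the median-based rule, the window size, the threshold k, the helper name, and the synthetic "fault step" image are all assumptions.

```python
import numpy as np
from scipy import ndimage

def clean_and_detect_edges(img, k=5.0, size=5):
    """Flag NaN holes and pixels far from their local median, replace them with
    that median, then return the flags and a Sobel gradient-magnitude image."""
    local_med = ndimage.generic_filter(img, np.nanmedian, size=size, mode="nearest")
    dev = np.abs(img - local_med)
    scale = 1.4826 * np.nanmedian(dev)           # robust noise estimate
    with np.errstate(invalid="ignore"):
        bad = np.isnan(img) | (dev > k * scale)
    filled = np.where(bad, local_med, img)
    edges = np.hypot(ndimage.sobel(filled, axis=1), ndimage.sobel(filled, axis=0))
    return bad, edges

rng = np.random.default_rng(0)
img = np.where(np.arange(40) < 20, 0.0, 1.0)[None, :].repeat(40, axis=0)  # step = fault offset
img = img + rng.normal(0.0, 0.05, img.shape)
img[10, 5] = 10.0            # isolated bad pixel
img[30, 30:32] = np.nan      # small hole of missing data
bad, edges = clean_and_detect_edges(img)
print(bad[10, 5], bad[30, 30], int(np.argmax(edges.mean(axis=0))))
```

Filling bad pixels before differentiation matters because a gradient operator turns a single spike or hole into a ring of strong false edges.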
Patient-specific instrumentation improved mechanical alignment, while early clinical outcome was comparable to conventional instrumentation in TKA.

PubMed

Anderl, Werner; Pauzenberger, Leo; Kölblinger, Roman; Kiesselbach, Gabriele; Brandl, Georg; Laky, Brenda; Kriegleder, Bernhard; Heuberer, Philipp; Schwameis, Eva

2016-01-01

The aim of this prospective study was to compare early clinical outcome, radiological limb alignment, and three-dimensional (3D)-component positioning between conventional and computed tomography (CT)-based patient-specific instrumentation (PSI) in primary mobile-bearing total knee arthroplasty (TKA). Two hundred ninety consecutive patients (300 knees) with severe, debilitating osteoarthritis scheduled for TKA were included in this study using either conventional instrumentation (CVI, n = 150) or PSI (n = 150). Patients were clinically assessed before and 2 years after surgery according to the Knee-Society-Score (KSS) and the visual-analog-scale for pain (VAS). Additionally, the Western Ontario McMaster Universities Osteoarthritis Index (WOMAC) and the Oxford-Knee-Score (OKS) were collected at follow-up. To evaluate accuracy of CVI and PSI, hip-knee-ankle angle (HKA) and 3D-component positioning were assessed on postoperative radiographs and CT. Data of 222 knees (CVI: n = 108, PSI: n = 114) were available for analysis after a mean follow-up of 28.6 ± 5.2 months. At the early follow-up, clinical outcome (KSS, VAS, WOMAC, OKS) was comparable between the two groups. Mean HKA-deviation from the targeted neutral mechanical axis (CVI: 2.2° ± 1.7°; PSI: 1.5° ± 1.4°; p < 0.001), rates of outliers (CVI: 22.2%; PSI: 9.6%; p = 0.016), and 3D-component positioning outliers were significantly lower in the PSI group. Non-outliers (HKA: 180° ± 3°) showed better clinical results than outliers at the 2-year follow-up. CT-based PSI compared with CVI improves accuracy of mechanical alignment restoration and 3D-component positioning in primary TKA. While clinical outcome was comparable between the two instrumentation groups at early follow-up, significantly inferior outcome was detected in the subgroup of HKA-outliers. Prospective comparative study, Level II.
Biomarker profiling in reef corals of Tonga’s Ha’apai and Vava’u archipelagos

PubMed Central

Chen, Chii-Shiarng; Dempsey, Alexandra C.

2017-01-01

Given the significant threats towards Earth’s coral reefs, there is an urgent need to document the current physiological condition of the resident organisms, particularly the reef-building scleractinians themselves. Unfortunately, most of the planet’s reefs are understudied, and some have yet to be seen. For instance, the Kingdom of Tonga possesses an extensive reef system, with thousands of hectares of unobserved reefs; little is known about their ecology, nor is there any information on the health of the resident corals. Given such knowledge deficiencies, 59 reefs across three Tongan archipelagos were surveyed herein, and pocilloporid corals were sampled from approximately half of these surveyed sites; 10 molecular-scale response variables were assessed in 88 of the sampled colonies, and 12 colonies were found to be outliers based on employment of a multivariate statistics-based aberrancy detection system. These outliers differed from the statistically normally behaving colonies in having not only higher RNA/DNA ratios but also elevated expression levels of three genes: 1) Symbiodinium zinc-induced facilitator-like 1-like, 2) host coral copper-zinc superoxide dismutase, and 3) host green fluorescent protein-like chromoprotein. Outliers were also characterized by significantly higher variation amongst the molecular response variables assessed, and the response variables that contributed most significantly to colonies being delineated as outliers differed between the two predominant reef coral species sampled, Pocillopora damicornis and P. acuta. These closely related species also displayed dissimilar temporal fluctuation patterns in their molecular physiologies, an observation that may have been driven by differences in their feeding strategies. Future works should attempt to determine whether corals displaying statistically aberrant molecular physiology, such as the 12 Tongan outliers identified herein, are indeed characterized by a diminished capacity for acclimating to the rapid changes in their abiotic milieu occurring as a result of global climate change. PMID:29091723
Outlier identification in urban soils and its implications for identification of potential contaminated land

NASA Astrophysics Data System (ADS)

Zhang, Chaosheng

2010-05-01

Outliers in urban soil geochemical databases may imply potential contaminated land. Different methodologies that can be easily implemented for the identification of global and spatial outliers were applied to Pb concentrations in urban soils of Galway City in Ireland. Because of the strongly skewed distribution of the data, a Box-Cox transformation was performed prior to further analyses. The graphic methods of histogram and box-and-whisker plot were effective in identifying global outliers at the original scale of the dataset. Spatial outliers could be identified by a local indicator of spatial association (local Moran's I), cross-validation of kriging, and a geographically weighted regression. The spatial locations of outliers were visualised using a geographical information system. Different methods showed generally consistent results, but differences existed. It is suggested that outliers identified by statistical methods should be confirmed and justified using scientific knowledge before they are properly dealt with.
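Two of the steps described above can be sketched together. The first is a Box-Cox transformation followed by the box-plot rule for global outliers (via `scipy.stats.boxcox`). The second is local Moran's I for spatial outliers, where a strongly negative value marks a site that differs from its neighbours. The 10 x 10 grid, the west-to-east trend, the injected high value, the rook-contiguity weights, and the helper names are hypothetical.

```python
import numpy as np
from scipy import stats

def global_outliers(x, k=1.5):
    """Box-Cox transform (x must be positive), then apply the box-plot rule."""
    y, lam = stats.boxcox(x)
    q1, q3 = np.percentile(y, [25, 75])
    iqr = q3 - q1
    return (y < q1 - k * iqr) | (y > q3 + k * iqr), y, lam

def local_morans_i(x, W):
    """Local Moran's I_i = z_i * sum_j w_ij z_j / m2, with row-standardised weights."""
    z = x - x.mean()
    m2 = np.mean(z ** 2)
    W = W / W.sum(axis=1, keepdims=True)
    return z * (W @ z) / m2

# Hypothetical 10 x 10 sampling grid with a west-to-east trend in log(Pb)
rng = np.random.default_rng(0)
rows, cols = np.divmod(np.arange(100), 10)
pb = np.exp(2.0 + 0.2 * cols + rng.normal(0.0, 0.05, 100))
pb[51] = np.exp(5.0)            # high value surrounded by low neighbours (row 5, col 1)

flags, y, lam = global_outliers(pb)
coords = np.column_stack([rows, cols])
W = (np.abs(coords[:, None, :] - coords[None, :, :]).sum(axis=-1) == 1).astype(float)  # rook neighbours
local_i = local_morans_i(y, W)
print("lambda=%.2f" % lam, "global:", np.flatnonzero(flags),
      "most negative local I:", int(np.argmin(local_i)))
```

Whether or not the injected value is also a global outlier after transformation, its local Moran's I is the most negative on the grid, because it sits in the low-valued part of the trend. This is the kind of site a purely global method can miss.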
Outliers: A Potential Data Problem.

ERIC Educational Resources Information Center

Douzenis, Cordelia; Rakow, Ernest A.

Outliers, extreme data values relative to others in a sample, may distort statistics that assume interval levels of measurement and normal distribution. The outlier may be a valid value or an error. Several procedures are available for identifying outliers, and each may be applied to errors of prediction from the regression lines for utility in a…
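One such procedure, applying a cutoff to studentized residuals (errors of prediction scaled by their estimated standard errors), is sketched below. The abstract does not say which procedures the authors compared, so the |r| > 3 cutoff, the simulated data, and the helper name are illustrative assumptions.

```python
import numpy as np

def studentized_residuals(X, y):
    """Internally studentized OLS residuals e_i / (s * sqrt(1 - h_ii)); an intercept is added."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    e = y - A @ beta                            # errors of prediction
    h = np.diag(A @ np.linalg.pinv(A))          # leverages from the hat matrix
    n, p = A.shape
    s = np.sqrt(e @ e / (n - p))
    return e / (s * np.sqrt(1.0 - h))

rng = np.random.default_rng(0)
x = np.arange(20.0)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, 20)
y[7] += 15.0                                    # hypothetical recording error
r = studentized_residuals(x, y)
print(np.flatnonzero(np.abs(r) > 3))
```

Scaling by sqrt(1 - h_ii) puts residuals from high- and low-leverage points on a common footing, so a single cutoff can be applied across the whole regression line.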
A robust interpolation method for constructing digital elevation models from remote sensing data

NASA Astrophysics Data System (ADS)

Chen, Chuanfa; Liu, Fengying; Li, Yanyan; Yan, Changqing; Liu, Guolin

2016-09-01

A digital elevation model (DEM) derived from remote sensing data often suffers from outliers due to various reasons, such as the physical limitations of sensors and low contrast of terrain textures. To reduce the effect of outliers on DEM construction, a robust algorithm of multiquadric (MQ) methodology based on M-estimators (MQ-M) was proposed. MQ-M adopts an adaptive three-part weight function: the weight is zero for large errors, one for small errors, and quadratic for the others. A mathematical surface was employed to comparatively analyze the robustness of MQ-M, and its performance was compared with those of the classical MQ and a recently developed robust MQ method based on least absolute deviation (MQ-L). Numerical tests show that MQ-M is comparable to the classical MQ and superior to MQ-L when sample points follow normal and Laplace distributions, and under the presence of outliers the former is more accurate than the latter. A real-world example of DEM construction using stereo images indicates that, compared with classical interpolation methods such as natural neighbor (NN), ordinary kriging (OK), ANUDEM, MQ-L and MQ, MQ-M has a better ability to preserve subtle terrain features. To assess its contribution to our recently developed multiresolution hierarchical classification method (MHC), MQ-M replaces thin plate spline (TPS) interpolation for reference DEM construction. Classifying the 15 groups of benchmark datasets provided by the ISPRS Commission demonstrates that MQ-M-based MHC is more accurate than MQ-L-based and TPS-based MHCs. MQ-M has high potential for DEM construction.
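The idea of a three-part M-estimator weight inside multiquadric (MQ) fitting can be sketched as iteratively reweighted least squares (IRLS) on an MQ basis. The abstract gives only the shape of the weight function, so several choices here are hypothetical. These are the thresholds (a = 2σ and b = 4σ, with σ a MAD-based scale), the quadratic taper between them, the MQ shape parameter, the choice of centres, and the function names. This is a one-dimensional toy, not the published MQ-M.

```python
import numpy as np

def three_part_weight(r, sigma, a=2.0, b=4.0):
    """1 for |r| <= a*sigma, 0 for |r| >= b*sigma, quadratic taper in between (assumed form)."""
    u = np.abs(r) / sigma
    w = np.clip(1.0 - ((u - a) / (b - a)) ** 2, 0.0, 1.0)
    w[u <= a] = 1.0
    w[u >= b] = 0.0
    return w

def mq_design(x, centres, c=0.5):
    """Multiquadric basis sqrt((x - x_j)^2 + c^2)."""
    return np.sqrt((x[:, None] - centres[None, :]) ** 2 + c ** 2)

def robust_mq_fit(x, y, centres, n_iter=20):
    """IRLS: weighted least-squares MQ fit, re-weighting residuals each pass."""
    A = mq_design(x, centres)
    w = np.ones_like(y)
    for _ in range(n_iter):
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        r = y - A @ coef
        sigma = 1.4826 * np.median(np.abs(r)) + 1e-12
        w = three_part_weight(r, sigma)
    return A @ coef

rng = np.random.default_rng(0)
x = np.linspace(0.0, 2 * np.pi, 60)
truth = np.sin(x)
y = truth + rng.normal(0.0, 0.05, x.size)
y[30] += 5.0                                  # a gross elevation blunder
centres = x[::4]

A = mq_design(x, centres)
ls_fit = A @ np.linalg.lstsq(A, y, rcond=None)[0]   # ordinary least-squares MQ
robust_fit = robust_mq_fit(x, y, centres)
print(abs(ls_fit[30] - truth[30]), abs(robust_fit[30] - truth[30]))
```

Unlike a least-absolute-deviation fit, this weight leaves small residuals at full weight (so behaviour under well-behaved noise matches ordinary least squares) while removing gross errors entirely.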

Comparison of cardiac TnI outliers using a contemporary and a high-sensitivity assay on the Abbott Architect platform.

PubMed

Ryan, J B; Southby, S J; Stuart, L A; Mackay, R; Florkowski, C M; George, P M

2014-07-01

Assays for cardiac troponin (cTn) have improved in sensitivity and precision in recent years. Increased rates of outliers, however, have been reported on various cTn platforms, typically giving irreproducible, falsely high results. We aimed to evaluate the outlier rate in patients with elevated cTnI using a contemporary and a high-sensitivity assay. Samples from all patients with elevated cTnI (up to 300 ng/L) over a 21-month period were assayed in duplicate. A contemporary assay (Abbott STAT Troponin-I) was used for the first part of the study and a high-sensitivity assay (Abbott STAT High-Sensitive Troponin-I) subsequently. Outliers exceeded a calculated critical difference, CD = z × √2 × SD(analytical), where z = 3.5 (for a probability of 0.0005), and critical outliers additionally fell on different sides of the decision level. The respective outlier and critical outlier rates were 0.22% and 0.10% for the contemporary assay (n = 4009) and 0.18% and 0.13% for the high-sensitivity assay (n = 3878). There was no significant reduction in outlier rate between the two assays (χ² = 0.034, P = 0.854). Fifty-six percent of outliers occurred in samples where cTn was an 'add-on' test (stored and refrigerated prior to assay). Despite recent improvements in cTn methods, outliers (including critical outliers) still occur at a low rate in both a contemporary and a high-sensitivity cTnI assay. Laboratory and clinical staff should be aware of this potential analytical error, particularly in samples with suboptimal handling such as add-on tests. © The Author(s) 2014.
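The outlier rule above can be written as a small classifier for one duplicate pair. A minimal sketch, assuming an illustrative SD(analytical) and decision level (the abstract gives neither); the helper names are hypothetical. Critical outliers are treated here as the subset of outliers whose two results straddle the decision level.

```python
import math


def critical_difference(sd_analytical, z=3.5):
    """CD = z * sqrt(2) * SD(analytical): the largest expected duplicate difference."""
    return z * math.sqrt(2) * sd_analytical


def classify_duplicate(r1, r2, sd_analytical, decision_level, z=3.5):
    """Label a duplicate pair as 'concordant', 'outlier' or 'critical outlier'."""
    cd = critical_difference(sd_analytical, z)
    if abs(r1 - r2) <= cd:
        return "concordant"
    # Critical outlier: the two results fall on opposite sides of the decision level.
    if (r1 < decision_level) != (r2 < decision_level):
        return "critical outlier"
    return "outlier"


# Illustrative values only: SD(analytical) = 2 ng/L, decision level = 40 ng/L.
for pair in [(20, 25), (20, 35), (30, 50)]:
    print(pair, classify_duplicate(*pair, sd_analytical=2.0, decision_level=40.0))
```

With these assumed values CD is about 9.9 ng/L, so a 15 ng/L discrepancy counts as an outlier, and it becomes critical only if the two results lie on either side of 40 ng/L.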
Assessing significance in a Markov chain without mixing.

PubMed

Chikina, Maria; Frieze, Alan; Pegden, Wesley

2017-03-14

We present a statistical test to detect that a presented state of a reversible Markov chain was not chosen from a stationary distribution. In particular, given a value function for the states of the Markov chain, we would like to show rigorously that the presented state is an outlier with respect to the values, by establishing a p value under the null hypothesis that it was chosen from a stationary distribution of the chain. A simple heuristic used in practice is to sample ranks of states from long random trajectories on the Markov chain and compare these with the rank of the presented state; if the presented state is a 0.1% outlier compared with the sampled ranks (its rank is in the bottom 0.1% of sampled ranks), then this observation should correspond to a p value of 0.001. This significance is not rigorous, however, without good bounds on the mixing time of the Markov chain. Our test is the following: Given the presented state in the Markov chain, take a random walk from the presented state for any number of steps. We prove that observing that the presented state is an ε-outlier on the walk is significant at p = √(2ε) under the null hypothesis that the state was chosen from a stationary distribution. We assume nothing about the Markov chain beyond reversibility and show that significance at p ≈ √ε is best possible in general. We illustrate the use of our test with a potential application to the rigorous detection of gerrymandering in Congressional districting.
Adaptive population divergence and directional gene flow across steep elevational gradients in a climate-sensitive mammal.

PubMed

Waterhouse, Matthew D; Erb, Liesl P; Beever, Erik A; Russello, Michael A

2018-06-01

The ecological effects of climate change have been shown in most major taxonomic groups; however, the evolutionary consequences are less well-documented. Adaptation to new climatic conditions offers a potential long-term mechanism for species to maintain viability in rapidly changing environments, but mammalian examples remain scarce. The American pika (Ochotona princeps) has been impacted by recent climate-associated extirpations and range-wide reductions in population sizes, establishing it as a sentinel mammalian species for climate change. To investigate evidence for local adaptation and reconstruct patterns of genomic diversity and gene flow across rapidly changing environments, we used a space-for-time design and restriction site-associated DNA sequencing to genotype American pikas along two steep elevational gradients at 30,966 SNPs and employed independent outlier detection methods that scanned for genotype-environment associations. We identified 338 outlier SNPs detected by two separate analyses and/or replicated in both transects, several of which were annotated to genes involved in metabolic function and oxygen transport. Additionally, we found evidence of directional gene flow primarily downslope from high-elevation populations, along with reduced gene flow at outlier loci. If this trend continues, elevational range contractions in American pikas will likely be from local extirpation rather than upward movement of low-elevation individuals; this, in turn, could limit the potential for adaptation within this landscape. These findings are of particular relevance for future conservation and management of American pikas and other elevationally restricted, thermally sensitive species. © 2018 John Wiley & Sons Ltd.
Assessing significance in a Markov chain without mixing

PubMed Central

Chikina, Maria; Frieze, Alan; Pegden, Wesley

2017-01-01

We present a statistical test to detect that a presented state of a reversible Markov chain was not chosen from a stationary distribution. In particular, given a value function for the states of the Markov chain, we would like to show rigorously that the presented state is an outlier with respect to the values, by establishing a p value under the null hypothesis that it was chosen from a stationary distribution of the chain. A simple heuristic used in practice is to sample ranks of states from long random trajectories on the Markov chain and compare these with the rank of the presented state; if the presented state is a 0.1% outlier compared with the sampled ranks (its rank is in the bottom 0.1% of sampled ranks), then this observation should correspond to a p value of 0.001. This significance is not rigorous, however, without good bounds on the mixing time of the Markov chain. Our test is the following: Given the presented state in the Markov chain, take a random walk from the presented state for any number of steps. We prove that observing that the presented state is an ε-outlier on the walk is significant at p = √(2ε) under the null hypothesis that the state was chosen from a stationary distribution. We assume nothing about the Markov chain beyond reversibility and show that significance at p ≈ √ε is best possible in general. We illustrate the use of our test with a potential application to the rigorous detection of gerrymandering in Congressional districting. PMID:28246331
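The test can be sketched directly: walk k steps from the presented state, compute the fraction ε of visited values (the presented state included) that are at or below the presented state's value, and report the paper's bound p = √(2ε). The sketch assumes a user-supplied step function for a reversible chain, uses a symmetric random walk on a path as the example chain, and caps p at 1; all names are hypothetical.

```python
import math
import random


def epsilon_of_walk(values):
    """Fraction of walk values (presented state first) at or below the presented value."""
    presented = values[0]
    at_or_below = sum(1 for v in values if v <= presented)
    return at_or_below / len(values)


def walk_outlier_test(start, step, value, k, seed=None):
    """Walk k steps from `start`; return (epsilon, p) with p = min(1, sqrt(2 * epsilon))."""
    rng = random.Random(seed)
    state = start
    values = [value(state)]
    for _ in range(k):
        state = step(state, rng)
        values.append(value(state))
    eps = epsilon_of_walk(values)
    return eps, min(1.0, math.sqrt(2.0 * eps))


def path_step(state, rng, n=100):
    """Symmetric random walk on {0, ..., n-1}; blocked moves stay put (reversible)."""
    proposal = state + rng.choice((-1, 1))
    return proposal if 0 <= proposal < n else state


# Example: is state 3 unusually low under the value function v(s) = s?
eps, p = walk_outlier_test(start=3, step=path_step, value=lambda s: s, k=10_000, seed=42)
print(f"epsilon={eps:.4f}, p={p:.3f}")
```

The walk does not need to mix: the guarantee holds for any number of steps, which is why no burn-in or convergence check appears in the sketch.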
75 FR 42835 - Medicare Program; Inpatient Rehabilitation Facility Prospective Payment System for Federal Fiscal...

Federal Register 2010, 2011, 2012, 2013, 2014

2010-07-22

... estimated cost of the case exceeds the adjusted outlier threshold. We calculate the adjusted outlier... to 80 percent of the difference between the estimated cost of the case and the outlier threshold. In... Federal Prospective Payment Rates VI. Update to Payments for High-Cost Outliers under the IRF PPS A...
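The payment rule quoted in this excerpt can be sketched as follows. The function name is hypothetical, and the calculation of the adjusted outlier threshold, which the excerpt elides, is taken as an input rather than reconstructed.

```python
def irf_outlier_payment(estimated_cost, adjusted_threshold, share=0.80):
    """Outlier payment: `share` (80 percent) of the cost above the adjusted threshold.

    Returns 0.0 when the estimated cost of the case does not exceed the threshold.
    """
    excess = estimated_cost - adjusted_threshold
    return share * excess if excess > 0 else 0.0


print(irf_outlier_payment(50_000, 40_000))  # case cost exceeds the threshold
print(irf_outlier_payment(30_000, 40_000))  # no outlier payment
```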
Indicator saturation: a novel approach to detect multiple breaks in geodetic time series.

NASA Astrophysics Data System (ADS)

Jackson, L. P.; Pretis, F.; Williams, S. D. P.

2016-12-01

Geodetic time series can record long-term trends, quasi-periodic signals at a variety of time scales from days to decades, and sudden breaks due to natural or anthropogenic causes. The causes of breaks range from instrument replacement to earthquakes to unknown (i.e. no attributable cause). Furthermore, breaks can be permanent or short-lived and span at least two orders of magnitude in size (from millimetres to hundreds of millimetres). Accounting for this range of possible signal characteristics requires a flexible time series method that can distinguish between true and false breaks, outliers and time-varying trends. One such method, Indicator Saturation (IS), comes from the field of econometrics, where analysing stochastic signals in these terms is a common problem. The IS approach differs from alternative break detection methods by treating every point in the time series as a break until it is demonstrated statistically that it is not. A linear model is constructed with a break function at every point in time, and all but the statistically significant breaks are removed through a general-to-specific model selection algorithm that handles more variables than observations. The IS method is flexible because it allows multiple breaks of different forms (e.g. impulses, shifts in the mean, and changing trends) to be detected, while simultaneously modelling any underlying variation driven by additional covariates. We apply the IS method to identify breaks in a suite of synthetic GPS time series used for the Detection of Offsets in GPS Experiments (DOGEX). We optimise the method to maximise the ratio of true-positive to false-positive detections, which improves estimates of errors in the long-term rates of land motion currently required by the GPS community.
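Impulse indicator saturation, the simplest form of IS, can be illustrated with the common split-half approach: add an impulse dummy for every observation in one half of the sample, keep the significant ones, repeat for the other half, and re-test the union. This is a simplified illustration assuming a constant-mean model (no trend, covariates, or step and trend indicators) and an arbitrary critical value of 2.576 (roughly a two-sided 1% normal threshold). It is not the authors' implementation, which uses a full general-to-specific search; the names are hypothetical.

```python
import numpy as np


def impulse_tstats(y, idx):
    """OLS of y on an intercept plus one impulse dummy per index in idx; dummy t-stats."""
    n = len(y)
    idx = np.asarray(idx, dtype=int)
    X = np.zeros((n, 1 + len(idx)))
    X[:, 0] = 1.0
    X[idx, np.arange(1, len(idx) + 1)] = 1.0  # dummy j is 1 only at time idx[j]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta[1:] / se[1:]


def split_half_iis(y, crit=2.576):
    """Return indices of impulses retained by a two-block split-half IIS pass."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    retained = []
    for block in (np.arange(0, n // 2), np.arange(n // 2, n)):
        t = impulse_tstats(y, block)
        retained.extend(int(i) for i, ti in zip(block, t) if abs(ti) > crit)
    if not retained:
        return []
    t = impulse_tstats(y, retained)  # re-test the union of retained impulses
    return [i for i, ti in zip(retained, t) if abs(ti) > crit]


series = np.tile([0.1, -0.1], 20)
series[25] = 5.0  # a single spike (outlier) in a flat series
print(split_half_iis(series))
```

Because each block saturates only half the sample, the other half always identifies the mean and error variance, which is what lets IS work with as many candidate indicators as observations.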
A method for rapid, targeted CNV genotyping identifies rare variants associated with neurocognitive disease.

PubMed

Mefford, Heather C; Cooper, Gregory M; Zerr, Troy; Smith, Joshua D; Baker, Carl; Shafer, Neil; Thorland, Erik C; Skinner, Cindy; Schwartz, Charles E; Nickerson, Deborah A; Eichler, Evan E

2009-09-01

Copy-number variants (CNVs) are substantial contributors to human disease. A central challenge in CNV-disease association studies is to characterize the pathogenicity of rare and possibly incompletely penetrant events, which requires the accurate detection of rare CNVs in large numbers of individuals. Cost and throughput issues limit our ability to perform these studies. We have adapted the Illumina BeadXpress SNP genotyping assay and developed an algorithm, SNP-Conditional OUTlier detection (SCOUT), to rapidly and accurately detect both rare and common CNVs in large cohorts. This approach is customizable, cost effective, highly parallelized, and largely automated. We applied this method to screen 69 loci in 1105 children with unexplained intellectual disability, identifying pathogenic variants in 3.1% of these individuals and potentially pathogenic variants in an additional 2.3%. We identified seven individuals (0.7%) with a deletion of 16p11.2, which has been previously associated with autism. Our results widen the phenotypic spectrum of these deletions to include intellectual disability without autism. We also detected 1.65-3.4 Mbp duplications at 16p13.11 in 1.1% of affected individuals and 350 kbp deletions at 15q11.2, near the Prader-Willi/Angelman syndrome critical region, in 0.8% of affected individuals. Compared to published CNVs in controls they are significantly (P = 4.7 × 10⁻⁵ and 0.003, respectively) enriched in these children, supporting previously published hypotheses that they are neurocognitive disease risk factors. More generally, this approach offers a previously unavailable balance between customization, cost, and throughput for analysis of CNVs and should prove valuable for targeted CNV detection in both research and diagnostic settings.
Atlas-guided cluster analysis of large tractography datasets.

PubMed

Ros, Christian; Güllmar, Daniel; Stenzel, Martin; Mentzel, Hans-Joachim; Reichenbach, Jürgen Rainer

2013-01-01

Diffusion Tensor Imaging (DTI) and fiber tractography are important tools to map the cerebral white matter microstructure in vivo and to model the underlying axonal pathways in the brain with three-dimensional fiber tracts. As the fast and consistent extraction of anatomically correct fiber bundles for multiple datasets is still challenging, we present a novel atlas-guided clustering framework for exploratory data analysis of large tractography datasets. The framework uses a hierarchical cluster analysis approach that exploits the inherent redundancy in large datasets to group fiber tracts time-efficiently. Structural information of a white matter atlas can be incorporated into the clustering to achieve an anatomically correct and reproducible grouping of fiber tracts. This approach not only facilitates the identification of the bundles corresponding to the classes of the atlas but also enables the extraction of bundles that are not present in the atlas. The new technique was applied to cluster datasets of 46 healthy subjects. Prospects of automatic, anatomically correct and reproducible clustering are explored. Reconstructed clusters were well separated and showed good correspondence to anatomical bundles. Using the atlas-guided cluster approach, we observed consistent results across subjects with high reproducibility. To investigate the outlier elimination performance of the clustering algorithm, scenarios with varying amounts of noise were simulated and clustered with three different outlier elimination strategies. By exploiting the multithreading capabilities of modern multiprocessor systems in combination with novel algorithms, our toolkit clusters large datasets in a couple of minutes. Experiments were conducted to investigate the achievable speedup and to demonstrate the high performance of the clustering framework in a multiprocessing environment.
Outlier Detection in High-Stakes Certification Testing. Research Report.

ERIC Educational Resources Information Center

Meijer, Rob R.

Recent developments of person-fit analysis in computerized adaptive testing (CAT) are discussed. Methods from statistical process control are presented that have been proposed to classify an item score pattern as fitting or misfitting the underlying item response theory (IRT) model in a CAT. Most person-fit research in CAT is restricted to…
Outlier Detection in High-Stakes Certification Testing.

ERIC Educational Resources Information Center

Meijer, Rob R.

2002-01-01

Used empirical data from a certification test to study methods from statistical process control that have been proposed to classify an item score pattern as fitting or misfitting the underlying item response theory model in computerized adaptive testing. Results for 1,392 examinees show that different types of misfit can be distinguished. (SLD)
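The report concerns statistical process control methods for person fit. One generic SPC tool of this kind is a two-sided CUSUM over an examinee's item residuals (observed 0/1 score minus the model-predicted probability of a correct answer). The sketch assumes a Rasch (1PL) model for the predicted probabilities and an illustrative reference value k and threshold h; it is not the exact statistic or critical values used in the report, and the names are hypothetical.

```python
import math


def rasch_probability(theta, b):
    """Assumed 1PL model: probability that ability theta answers an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def cusum_first_alarm(residuals, k=0.1, h=1.0):
    """Two-sided CUSUM on item residuals; index of the first alarm, or None."""
    c_plus = c_minus = 0.0
    for i, r in enumerate(residuals):
        c_plus = max(0.0, c_plus + r - k)    # accumulates unexpectedly good answers
        c_minus = min(0.0, c_minus + r + k)  # accumulates unexpectedly poor answers
        if c_plus > h or c_minus < -h:
            return i
    return None


# Usage with hypothetical responses: an examinee of ability 0 answering hard items correctly.
scores = [1, 1, 1, 1, 1]
difficulties = [1.5, 2.0, 1.8, 2.2, 1.9]
residuals = [x - rasch_probability(0.0, b) for x, b in zip(scores, difficulties)]
print(cusum_first_alarm(residuals))
```

Running sums like these are suited to adaptive testing because they react to a sustained run of unexpected answers (for example, a block of pre-known items) rather than to a single surprising response.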
Evaluation of statistical protocols for quality control of ecosystem carbon dioxide fluxes

Treesearch

Jorge F. Perez-Quezada; Nicanor Z. Saliendra; William E. Emmerich; Emilio A. Laca

2007-01-01

The process of quality control of micrometeorological and carbon dioxide (CO2) flux data can be subjective and may lack repeatability, which would undermine the results of many studies. Multivariate statistical methods and time series analysis were used together and independently to detect and replace outliers in CO2 flux...
Methodology to assess clinical liver safety data.

PubMed

Merz, Michael; Lee, Kwan R; Kullak-Ublick, Gerd A; Brueckner, Andreas; Watkins, Paul B

2014-11-01

Analysis of liver safety data has to be multivariate by nature and needs to take into account the time dependency of observations. Current standard tools for liver safety assessment, such as summary tables, individual data listings, and narratives, address these requirements to a limited extent only. Using graphics in the context of a systematic workflow including predefined graph templates is a valuable addition to standard instruments, helping to ensure completeness of evaluation and supporting both hypothesis generation and testing. Employing graphical workflows interactively allows analysis in a team-based setting and facilitates identification of the most suitable graphics for publishing and regulatory reporting. Another important tool is statistical outlier detection, reflecting the fact that, for assessment of Drug-Induced Liver Injury, identification and thorough evaluation of extreme values are much more relevant than measures of central tendency in the data. Taken together, systematic graphical data exploration and statistical outlier detection may have the potential to significantly improve assessment and interpretation of clinical liver safety data. A workshop was convened to discuss best practices for the assessment of drug-induced liver injury (DILI) in clinical trials.
Lesion identification using unified segmentation-normalisation models and fuzzy clustering

PubMed Central

Seghier, Mohamed L.; Ramlackhansingh, Anil; Crinion, Jenny; Leff, Alexander P.; Price, Cathy J.

2008-01-01

In this paper, we propose a new automated procedure for lesion identification from single images based on the detection of outlier voxels. We demonstrate the utility of this procedure using artificial and real lesions. The scheme rests on two innovations: First, we augment the generative model used for combined segmentation and normalization of images with an empirical prior for an atypical tissue class, which can be optimised iteratively. Second, we adopt a fuzzy clustering procedure to identify outlier voxels in normalised gray and white matter segments. These two advances, respectively, suppress misclassification of voxels and restrict lesion identification to gray/white matter lesions. Our analyses show a high sensitivity for detecting and delineating brain lesions with different sizes, locations, and textures. Our approach has important implications for the generation of lesion overlap maps of a given population and the assessment of lesion-deficit mappings. From a clinical perspective, our method should help to compute total lesion volume or to trace lesion boundaries precisely, which might be pertinent for surgical or diagnostic purposes. PMID:18482850
Cycle bases to the rescue

NASA Astrophysics Data System (ADS)

Tóbiás, Roland; Furtenbacher, Tibor; Császár, Attila G.

2017-12-01

Cycle bases of graph theory are introduced for the analysis of transition data deposited in line-by-line rovibronic spectroscopic databases. The principal advantage of using cycle bases is that outlier transitions, which are almost always present in spectroscopic databases built from experimental data originating from many different sources, can be detected and identified straightforwardly and automatically. The data available for six water isotopologues, H₂¹⁶O, H₂¹⁷O, H₂¹⁸O, HD¹⁶O, HD¹⁷O, and HD¹⁸O, in the HITRAN2012 and GEISA2015 databases are used to demonstrate the utility of cycle-basis-based outlier-detection approaches. The spectroscopic databases appear to be sufficiently complete that the great majority of the entries of the minimum cycle basis have the minimum possible length of four. More than 2000 transition conflicts have been identified for the isotopologue H₂¹⁶O in the HITRAN2012 database; the seven common conflict types are discussed. It is recommended to employ cycle bases, and especially a minimum cycle basis, for the analysis of transitions deposited in high-resolution spectroscopic databases.
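The idea can be sketched with energy levels as graph nodes and transitions as edges weighted by their wavenumbers (upper minus lower level energy). Around every cycle the signed wavenumbers must sum to zero, so a nonzero discrepancy flags a conflict among that cycle's transitions. For brevity the sketch builds a fundamental cycle basis from a breadth-first spanning tree rather than the minimum cycle basis the authors recommend; the input format and names are assumptions.

```python
from collections import deque


def cycle_discrepancies(transitions):
    """transitions: list of (lower, upper, wavenumber).

    Returns (index, discrepancy) for every non-tree transition; each such edge
    closes one fundamental cycle, and a nonzero value means that cycle is inconsistent.
    """
    adj = {}
    for i, (lo, up, nu) in enumerate(transitions):
        adj.setdefault(lo, []).append((up, nu, i))   # lower -> upper adds nu
        adj.setdefault(up, []).append((lo, -nu, i))  # upper -> lower subtracts nu
    energy, tree_edges = {}, set()
    for root in adj:  # one BFS spanning tree per connected component
        if root in energy:
            continue
        energy[root] = 0.0  # arbitrary zero of energy for this component
        queue = deque([root])
        while queue:
            node = queue.popleft()
            for nbr, delta, idx in adj[node]:
                if nbr not in energy:
                    energy[nbr] = energy[node] + delta
                    tree_edges.add(idx)
                    queue.append(nbr)
    return [
        (i, nu - (energy[up] - energy[lo]))
        for i, (lo, up, nu) in enumerate(transitions)
        if i not in tree_edges
    ]


lines = [("A", "B", 10.0), ("B", "C", 15.0), ("A", "C", 25.0),
         ("C", "D", 15.0), ("B", "D", 30.5)]  # the B-D line is off by 0.5
print(cycle_discrepancies(lines))
```

A flagged cycle only says that one of its transitions is wrong; locating the culprit requires comparing overlapping cycles, which is where short cycles (as in a minimum cycle basis) help. With real data the discrepancy is compared against the combined measurement uncertainties rather than exactly zero.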
Local sparse bump hunting reveals molecular heterogeneity of colon tumors

PubMed Central

Dazard, Jean-Eudes; Rao, J. Sunil; Markowitz, Sanford

2013-01-01

The question of molecular heterogeneity and of tumoral phenotype in cancer remains unresolved. To understand the underlying molecular basis of this phenomenon, we analyzed genome-wide expression data of colon cancer metastasis samples, as these tumors are the most advanced and hence would be anticipated to be the group of tumors most likely to be heterogeneous, potentially exhibiting the maximum amount of genetic heterogeneity. Casting a statistical net around such a complex problem proves difficult because of the high dimensionality and multi-collinearity of the gene expression space, combined with the fact that genes act in concert with one another and that not all genes surveyed might be involved. We devise a strategy to identify distinct subgroups of samples and determine the genetic/molecular signature that defines them. This involves use of the local sparse bump hunting algorithm, which provides a better-suited and more biologically faithful transformed space within which to search for bumps. In addition, thanks to the variable selection feature of the algorithm, we derived a novel sparse gene expression signature, which appears to divide all colon cancer patients into two populations: a population whose expression pattern can be molecularly encompassed within the bump and an outlier population that cannot be. Although all patients within any given stage of the disease, including the metastatic group, appear clinically homogeneous, our procedure revealed two subgroups in each stage with distinct genetic/molecular profiles. We also discuss implications of such a finding in terms of early detection, diagnosis and prognosis. PMID:22052459
Improvement of statistical methods for detecting anomalies in climate and environmental monitoring systems

NASA Astrophysics Data System (ADS)

Yakunin, A. G.; Hussein, H. M.

2018-01-01

The article shows how known statistical methods, widely used in finance and a number of other fields of science and technology, can be applied effectively, after minor modification, to problems in climate and environmental monitoring systems, such as detecting anomalies in the form of abrupt changes in signal level, the occurrence of positive and negative outliers, and violations of the cycle shape in periodic processes.
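The abstract does not name the specific methods adapted. As one standard technique for flagging positive and negative outliers in a monitoring signal, a Hampel filter compares each sample with the median of a sliding window and flags it when the deviation exceeds a multiple of the scaled median absolute deviation (MAD). This is a swapped-in illustration, not necessarily a method from the article; the window half-width, the threshold of 3 and the function name are assumptions.

```python
import statistics


def hampel_outliers(x, half_window=2, n_sigmas=3.0):
    """Flag samples far from their local median; returns (index, 'positive'|'negative')."""
    flagged = []
    for i, xi in enumerate(x):
        window = x[max(0, i - half_window): i + half_window + 1]  # truncated at the edges
        med = statistics.median(window)
        mad = statistics.median([abs(w - med) for w in window])
        scale = 1.4826 * mad  # MAD scaled to estimate a Gaussian standard deviation
        if abs(xi - med) > n_sigmas * scale:
            flagged.append((i, "positive" if xi > med else "negative"))
    return flagged


signal = [1.0, 1.2, 0.9, 1.1, 10.0, 1.0, 0.8, 1.1, 1.0, -8.0, 1.2, 0.9]
print(hampel_outliers(signal))
```

The median and MAD are used instead of the mean and standard deviation so that a spike does not inflate its own threshold. If a window is perfectly flat (MAD of zero), any deviation at all is flagged, so real sensor data may need a small floor on the scale.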
Single nucleotide polymorphisms unravel hierarchical divergence and signatures of selection among Alaskan sockeye salmon (Oncorhynchus nerka) populations.

PubMed

Gomez-Uchida, Daniel; Seeb, James E; Smith, Matt J; Habicht, Christopher; Quinn, Thomas P; Seeb, Lisa W

2011-02-18

Disentangling the roles of geography and ecology driving population divergence and distinguishing adaptive from neutral evolution at the molecular level have been common goals among evolutionary and conservation biologists. Using single nucleotide polymorphism (SNP) multilocus genotypes for 31 sockeye salmon (Oncorhynchus nerka) populations from the Kvichak River, Alaska, we assessed the relative roles of geography (discrete boundaries or continuous distance) and ecology (spawning habitat and timing) driving genetic divergence in this species at varying spatial scales within the drainage. We also evaluated two outlier detection methods to characterize candidate SNPs responding to environmental selection, emphasizing which mechanism(s) may maintain the genetic variation of outlier loci. For the entire drainage, Mantel tests suggested a greater role of geographic distance on population divergence than differences in spawn timing when each variable was correlated with pairwise genetic distances. Clustering and hierarchical analyses of molecular variance indicated that the largest genetic differentiation occurred between populations from distinct lakes or subdrainages. Within one population-rich lake, however, Mantel tests suggested a greater role of spawn timing than geographic distance on population divergence when each variable was correlated with pairwise genetic distances. Variable spawn timing among populations was linked to specific spawning habitats as revealed by principal coordinate analyses. We additionally identified two outlier SNPs located in the major histocompatibility complex (MHC) class II that appeared robust to violations of demographic assumptions from an initial pool of eight candidates for selection. First, our results suggest that geography and ecology have influenced genetic divergence between Alaskan sockeye salmon populations in a hierarchical manner depending on the spatial scale. 
Second, we found consistent evidence for diversifying selection in two loci located in the MHC class II by means of outlier detection methods; yet, alternative scenarios for the evolution of these loci were also evaluated. Both conclusions argue that historical contingency and contemporary adaptation have likely driven differentiation between Kvichak River sockeye salmon populations, as revealed by a suite of SNPs. Our findings highlight the need for conservation of complex population structure, because it provides resilience in the face of environmental change, both natural and anthropogenic.
Single nucleotide polymorphisms unravel hierarchical divergence and signatures of selection among Alaskan sockeye salmon (Oncorhynchus nerka) populations

PubMed Central

2011-01-01

Background: Disentangling the roles of geography and ecology driving population divergence and distinguishing adaptive from neutral evolution at the molecular level have been common goals among evolutionary and conservation biologists. Using single nucleotide polymorphism (SNP) multilocus genotypes for 31 sockeye salmon (Oncorhynchus nerka) populations from the Kvichak River, Alaska, we assessed the relative roles of geography (discrete boundaries or continuous distance) and ecology (spawning habitat and timing) driving genetic divergence in this species at varying spatial scales within the drainage. We also evaluated two outlier detection methods to characterize candidate SNPs responding to environmental selection, emphasizing which mechanism(s) may maintain the genetic variation of outlier loci. Results: For the entire drainage, Mantel tests suggested a greater role of geographic distance on population divergence than differences in spawn timing when each variable was correlated with pairwise genetic distances. Clustering and hierarchical analyses of molecular variance indicated that the largest genetic differentiation occurred between populations from distinct lakes or subdrainages. Within one population-rich lake, however, Mantel tests suggested a greater role of spawn timing than geographic distance on population divergence when each variable was correlated with pairwise genetic distances. Variable spawn timing among populations was linked to specific spawning habitats as revealed by principal coordinate analyses. We additionally identified two outlier SNPs located in the major histocompatibility complex (MHC) class II that appeared robust to violations of demographic assumptions from an initial pool of eight candidates for selection. Conclusions: First, our results suggest that geography and ecology have influenced genetic divergence between Alaskan sockeye salmon populations in a hierarchical manner depending on the spatial scale. 
Second, we found consistent evidence for diversifying selection in two loci located in the MHC class II by means of outlier detection methods; yet, alternative scenarios for the evolution of these loci were also evaluated. Both conclusions argue that historical contingency and contemporary adaptation have likely driven differentiation between Kvichak River sockeye salmon populations, as revealed by a suite of SNPs. Our findings highlight the need for conservation of complex population structure, because it provides resilience in the face of environmental change, both natural and anthropogenic. PMID:21332997
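The Mantel tests mentioned in this abstract correlate two distance matrices (for example, pairwise genetic distance against geographic distance or spawn-timing difference) and assess significance by permuting population labels. A minimal pure-Python sketch, assuming symmetric matrices given as nested lists and a one-sided test for positive association; the function names are hypothetical and this is not the study's software.

```python
import random


def _pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def _upper(m, order):
    """Upper-triangle entries of m after relabelling rows and columns by `order`."""
    n = len(order)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(d1, d2, permutations=999, seed=None):
    """Mantel r between two distance matrices and a one-sided permutation p value."""
    rng = random.Random(seed)
    identity = list(range(len(d1)))
    x = _upper(d1, identity)
    r_obs = _pearson(x, _upper(d2, identity))
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)  # permute population labels of the second matrix
        if _pearson(x, _upper(d2, perm)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)  # observed ordering counts as one


positions = [0, 1, 3, 7, 12]  # hypothetical site positions along a river, in km
geo = [[abs(a - b) for b in positions] for a in positions]
print(mantel(geo, geo, permutations=199, seed=0))
```

Rows and columns are permuted together rather than shuffling the flattened distances, because pairwise distances sharing a population are not independent; that dependence is the reason the Mantel test exists.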
Development of a computerized monitoring program to identify narcotic diversion in a pediatric anesthesia practice.

PubMed

Brenn, B Randall; Kim, Margaret A; Hilmas, Elora

2015-08-15

Development of an operational reporting dashboard designed to correlate data from multiple sources to help detect potential drug diversion by automated dispensing cabinet (ADC) users is described. A commercial business intelligence platform was used to create a dashboard tool for rapid detection of unusual patterns of ADC transactions by anesthesia service providers at a large pediatric hospital. By linking information from the hospital's pharmacy information management system (PIMS) and anesthesia information management system (AIMS) in an associative data model, the "narcotic reconciliation dashboard" can generate various reports to help spot outlier activity associated with ADC dispensing of controlled substances and documentation of medication waste processing. The dashboard's utility was evaluated by "back-testing" the program with historical data on an actual episode of diversion by an anesthesia provider that had not been detected through traditional methods of PIMS and AIMS data monitoring. Dashboard-generated reports on key metrics (e.g., ADC transaction counts, discrepancies in dispensed versus reconciled amounts of narcotics, PIMS-AIMS documentation mismatches) over various time frames during the period of known diversion clearly indicated the diverter's outlier status relative to other authorized ADC users. A dashboard program for correlating ADC transaction data with pharmacy and patient care data may be an effective tool for detecting patterns of ADC use that suggest drug diversion. Copyright © 2015 by the American Society of Health-System Pharmacists, Inc. All rights reserved.
Sensor Fusion of Position- and Micro-Sensors (MEMS) integrated in a Wireless Sensor Network for movement detection in landslide areas

NASA Astrophysics Data System (ADS)

Arnhardt, Christian; Fernández-Steeger, Tomas; Azzam, Rafig

2010-05-01

Monitoring systems in landslide areas are important elements of effective early warning structures. Data acquisition and retrieval allow the detection of movement processes and are thus essential for generating warnings in time. Apart from precise measurement, the reliability of the data is fundamental, because outliers can trigger false alarms and lead to a loss of acceptance of such systems. For the monitoring of mass movements and their risk, it is important to know whether there is movement, how fast it is, and how trustworthy the information is. The joint project "Sensorbased landslide early warning system" (SLEWS) addresses these questions and aims to improve data quality and reduce false alarm rates by combining sensor data (sensor fusion). The project concentrates on the development of a prototype alarm and early warning system (EWS) for different types of landslides using various low-cost sensors integrated in a wireless sensor network (WSN). The network consists of numerous connection points (nodes) that transfer data directly or via other nodes (multi-hop) in real time to a data collection point (gateway). From there, all data packages are transmitted to a spatial data infrastructure (SDI) for further processing, analysis and visualization according to end-user specifications. The ad-hoc nature of the network allows the nodes to link autonomously according to existing connections and communication strength. Because nodes independently find new or more stable connections (self-healing), a breakdown of the whole system is avoided. The bidirectional data stream enables not only the receipt of data from the network but also the transfer of commands and targeted requests into the WSN.
For the detection of surface deformations in landslide areas, small low-cost Micro-Electro-Mechanical Systems (MEMS) and position sensors from the automotive industry, various industrial applications and other measurement technologies were chosen. The MEMS sensors are acceleration, tilt and barometric pressure sensors. The position sensors are draw-wire and linear displacement transducers. In initial laboratory tests, the accuracy and resolution were investigated. The tests showed good results for all sensors. For example, tilt movements can be monitored with an accuracy of ±0.06° and a resolution of 0.1°, and the displacement transducer can detect changes in length greater than 0.1 mm. Apart from laboratory tests, field tests in southern France and Germany were carried out to prove data stability and movement detection under real conditions. The results obtained were also very satisfactory. In the next step, the combination of numerous sensors (sensor fusion) of the same type (redundancy) or of different types (complementarity) was researched. Different experiments showed a high concordance between identical sensor types. Depending on sensor parameters (sensitivity, accuracy, resolution), some sensor types can identify changes earlier. Taking this into consideration, good correlations between different kinds of sensors were also achieved. The experiments thus showed that combining sensors is feasible and could improve the detection of movement and movement rate, as well as of outliers. Based on these results, various algorithms were set up that include different statistical methods (outlier tests, hypothesis testing) and procedures from decision theory (the Hurwicz criterion). These calculation formulas will be implemented in the spatial data infrastructure (SDI) for further data processing and validation.
In comparison with existing monitoring systems, which mainly work at individual points, the application of wireless sensor networks in combination with low-cost but precise micro-sensors provides an inexpensive and easy-to-set-up monitoring system, even for large areas. Correlating sensors of the same and of different types permits good data control. Sensor fusion is thus a promising tool for detecting movement more reliably and thereby contributes substantially to the improvement of early warning systems.
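The abstract names outlier tests on redundant sensors and the Hurwicz criterion but gives no formulas. As a hedged illustration only, the sketch below assumes a median/MAD robust z-score for flagging one disagreeing sensor among redundant ones, and the textbook Hurwicz rule (score each action by alpha times its best payoff plus 1 - alpha times its worst, then take the maximum). The function names, the threshold k = 3, the readings and the payoff table are all hypothetical, not taken from SLEWS.

```python
from statistics import median


def mad_outliers(readings, k=3.0):
    """Flag readings far from the median of redundant sensors (robust z-score)."""
    med = median(readings)
    mad = median(abs(r - med) for r in readings)
    if mad == 0:
        # At least half the readings coincide exactly; flag any that differ.
        return [r != med for r in readings]
    scale = 1.4826 * mad  # makes MAD consistent with sigma for Gaussian noise
    return [abs(r - med) > k * scale for r in readings]


def hurwicz_choice(payoffs, alpha):
    """Pick the action maximising alpha*best + (1 - alpha)*worst payoff.

    payoffs maps each action to its payoffs under the possible states.
    alpha = 1 is fully optimistic (maximax), alpha = 0 fully pessimistic (maximin).
    """
    def score(values):
        return alpha * max(values) + (1 - alpha) * min(values)

    return max(payoffs, key=lambda action: score(payoffs[action]))


# Hypothetical tilt readings (degrees) from four redundant sensors.
tilt = [0.10, 0.12, 0.11, 0.95]
print(mad_outliers(tilt))  # [False, False, False, True]

# Hypothetical payoffs under the states (no movement, movement).
payoffs = {"issue_alarm": [-1, 5], "stay_silent": [2, -10]}
print(hurwicz_choice(payoffs, alpha=0.5))  # issue_alarm
```

The median/MAD check is used here because a single faulty sensor cannot drag the reference value the way it would drag a mean; the choice of alpha encodes how optimistic the operator is about the unknown state.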

Outlier identification and visualization for Pb concentrations in urban soils and its implications for identification of potential contaminated land.

PubMed

Zhang, Chaosheng; Tang, Ya; Luo, Lin; Xu, Weilin

2009-11-01

Outliers in urban soil geochemical databases may indicate potentially contaminated land. Several easily implemented methods for identifying global and spatial outliers were applied to Pb concentrations in urban soils of Galway City, Ireland. Because the data were strongly skewed, a Box-Cox transformation was performed prior to further analyses. The graphical methods of histogram and box-and-whisker plot were effective in identifying global outliers at the original scale of the dataset. Spatial outliers could be identified by a local indicator of spatial association (local Moran's I), cross-validation of kriging, and geographically weighted regression. The spatial locations of outliers were visualised using a geographical information system. The methods gave generally consistent results, although differences existed. It is suggested that outliers identified by statistical methods should be confirmed and justified using scientific knowledge before they are properly dealt with.
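Two of the steps above can be sketched in standard-library Python: the box-and-whisker rule for global outliers on the original scale, and a Box-Cox transformation whose lambda is chosen by maximising the profile log-likelihood. The grid range [-2, 2], the 0.01 step, the 1.5 × IQR whisker convention and the Pb values are assumptions for illustration; the authors' software and settings are not stated, and quartile conventions differ between packages.

```python
import math
from statistics import pvariance, quantiles


def boxcox(values, lam):
    """Box-Cox transform; requires strictly positive values."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def boxcox_loglik(values, lam):
    """Profile log-likelihood of lambda under normality (constants dropped)."""
    n = len(values)
    y = boxcox(values, lam)
    return -0.5 * n * math.log(pvariance(y)) + (lam - 1.0) * sum(math.log(v) for v in values)


def fit_boxcox_lambda(values):
    """Grid search for lambda in [-2, 2] with step 0.01 (assumed range)."""
    grid = [k / 100 - 2.0 for k in range(401)]
    return max(grid, key=lambda lam: boxcox_loglik(values, lam))


def tukey_outliers(values, whisker=1.5):
    """Values beyond the box-plot fences Q1 - w*IQR and Q3 + w*IQR."""
    q1, _, q3 = quantiles(values, n=4)  # default 'exclusive' quartile method
    iqr = q3 - q1
    lo, hi = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if v < lo or v > hi]


# Made-up Pb concentrations (mg/kg), not data from the study.
pb = [20, 25, 30, 35, 40, 45, 50, 60, 70, 2000]
print(tukey_outliers(pb))  # [2000]
lam = fit_boxcox_lambda(pb)
transformed = boxcox(pb, lam)
```

The spatial methods (local Moran's I, kriging cross-validation, geographically weighted regression) need coordinates and a spatial weights model and are not shown.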
Outliers to the peak energy-isotropic energy relation in gamma-ray bursts

NASA Astrophysics Data System (ADS)

Nakar, Ehud; Piran, Tsvi

2005-06-01

The peak energy-isotropic energy (EpEi) relation is among the most intriguing recent discoveries concerning gamma-ray bursts (GRBs). It can have numerous implications for our understanding of the emission mechanism of the bursts and for the application of GRBs to cosmological studies. However, this relation has been verified only for a small sample of bursts with measured redshifts. We propose here a test of whether a burst with an unknown redshift can potentially satisfy the EpEi relation. Applying this test to a large sample of BATSE bursts, we find that a significant fraction of those bursts cannot satisfy this relation. Our test is sensitive only to dim and hard bursts, and therefore this relation might still hold as an inequality (i.e., there are no intrinsically bright and soft bursts). We conclude that the relation observed in the sample of bursts with known redshift might be influenced by observational biases and by the inability to locate and accurately localize hard, weak bursts that have only a small number of photons. In particular, we point out that the threshold for detection, localization and redshift measurement is substantially higher than the threshold for detection alone. We predict that Swift will detect some hard and weak bursts that would be outliers to the EpEi relation, although we cannot quantify this prediction. We stress the importance of understanding the detection-localization-redshift threshold for the coming Swift detections.
Robust Kriged Kalman Filtering

DOE Office of Scientific and Technical Information (OSTI.GOV)

Baingana, Brian; Dall'Anese, Emiliano; Mateos, Gonzalo

2015-11-11

Although the kriged Kalman filter (KKF) has well-documented merits for prediction of spatial-temporal processes, its performance degrades in the presence of outliers caused by anomalous events or measurement equipment failures. This paper proposes a robust KKF model that explicitly accounts for the presence of measurement outliers. Exploiting outlier sparsity, it puts forth a novel ℓ1-regularized estimator that jointly predicts the spatial-temporal process at unmonitored locations and identifies measurement outliers. Numerical tests are conducted on a synthetic Internet protocol (IP) network and on real transformer load data. Test results corroborate the effectiveness of the novel estimator in joint spatial prediction and outlier identification.
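The core idea of exploiting outlier sparsity can be shown on a much simpler model than the KKF. The sketch below assumes a static straight-line model y = a + b·x + o + noise, where the outlier vector o is sparse, and minimises 0.5·||y - a - b·x - o||² + lam·||o||₁ by alternating a least-squares fit with soft-thresholding of the residuals. This is a simplified stand-in, not the paper's estimator: there is no Kalman recursion and no kriging, and lam = 1.0 and the data are hypothetical.

```python
def soft_threshold(r, lam):
    """Proximal operator of lam*|o|: shrinks r toward zero by lam."""
    if r > lam:
        return r - lam
    if r < -lam:
        return r + lam
    return 0.0


def fit_line(x, y):
    """Ordinary least-squares intercept and slope."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def robust_line_with_sparse_outliers(x, y, lam=1.0, iters=200):
    """Minimise 0.5*||y - a - b*x - o||^2 + lam*||o||_1 by block coordinate descent."""
    o = [0.0] * len(y)
    for _ in range(iters):
        # Fit the clean model to the outlier-corrected data.
        a, b = fit_line(x, [yi - oi for yi, oi in zip(y, o)])
        # Residuals larger than lam are attributed (partly) to outliers.
        o = [soft_threshold(yi - a - b * xi, lam) for xi, yi in zip(x, y)]
    return a, b, o


x = list(range(10))
y = [1 + 2 * xi for xi in x]
y[5] += 20  # one gross measurement error
a, b, o = robust_line_with_sparse_outliers(x, y)
print([i for i, oi in enumerate(o) if oi != 0.0])  # [5]
```

Larger lam declares fewer points to be outliers; lam plays the role of the sparsity-controlling weight on the ℓ1 term.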
Universal Linear Fit Identification: A Method Independent of Data, Outliers and Noise Distribution Model and Free of Missing or Removed Data Imputation.

PubMed

Adikaram, K K L B; Hussein, M A; Effenberger, M; Becker, T

2015-01-01

Data processing requires a robust linear fit identification method. In this paper, we introduce a non-parametric robust linear fit identification method for time series. The method uses the indicator 2/n to identify a linear fit, where n is the number of terms in a series. Let amax be the maximum element, amin the minimum element and Sn the sum of all elements. For a series that follows a linear fit, the ratios Rmax = (amax - amin)/(Sn - n·amin) and Rmin = (amax - amin)/(n·amax - Sn) are always equal to 2/n. If a series expected to follow y = c contains data that do not agree with the form y = c, then Rmax > 2/n and Rmin > 2/n indicate that the maximum and minimum elements, respectively, do not agree with the linear fit. We define the threshold values for outlier and noise detection as (2/n)(1 + k1) and (2/n)(1 + k2), respectively, where k1 > k2 and 0 ≤ k1 ≤ n/2 - 1. Given this relation and a transformation technique that transforms data into the form y = c, we show that it is possible to remove all data that do not agree with the linear fit. Furthermore, the method is independent of the number of data points, missing data, removed data points, and the nature of the distribution (Gaussian or non-Gaussian) of outliers, noise and clean data. These are major advantages over existing linear fit methods. Because a perfect linear relation between two variables is impossible in the real world, we used artificial data sets with extreme conditions to verify the method. The method detects the correct linear fit even when less than 50% of the data agree with the linear fit and the deviation of the data that do not agree is very small, of the order of ±10⁻⁴%. The method gives incorrect detections only when numerical accuracy in the calculation is insufficient.
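The ratio test can be checked directly. For an evenly spaced linear series, Sn = n(amax + amin)/2, so Sn - n·amin = n·amax - Sn = n(amax - amin)/2 and both ratios reduce to 2/n. The sketch below computes Rmax and Rmin as defined in the abstract and compares them with the threshold (2/n)(1 + k). It covers only this indicator, not the paper's transformation to the form y = c or its iterative removal of data; the function names are hypothetical. With k = 0, floating-point rounding can push an exactly linear non-integer series just over 2/n, which is one reason a small positive k is useful in practice.

```python
def linear_fit_ratios(series):
    """Return (Rmax, Rmin) as defined in the abstract, or None for a constant series."""
    n = len(series)
    a_max, a_min, s_n = max(series), min(series), sum(series)
    span = a_max - a_min
    if span == 0:
        return None  # constant series: both ratios are 0/0
    r_max = span / (s_n - a_min * n)  # denominator = sum(a_i - a_min) > 0
    r_min = span / (a_max * n - s_n)  # denominator = sum(a_max - a_i) > 0
    return r_max, r_min


def flag_extremes(series, k=0.0):
    """Flag the max/min element when its ratio exceeds (2/n)*(1 + k)."""
    ratios = linear_fit_ratios(series)
    if ratios is None:
        return False, False
    threshold = 2.0 / len(series) * (1.0 + k)
    r_max, r_min = ratios
    return r_max > threshold, r_min > threshold


print(flag_extremes([1, 2, 3, 4, 5]))    # (False, False): both ratios equal 2/5
print(flag_extremes([1, 2, 3, 4, 20]))   # (True, False): the maximum is suspect
print(flag_extremes([-20, 2, 3, 4, 5]))  # (False, True): the minimum is suspect
```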
Features of Cross-Correlation Analysis in a Data-Driven Approach for Structural Damage Assessment

PubMed Central

Camacho Navarro, Jhonatan; Ruiz, Magda; Villamizar, Rodolfo; Mujica, Luis

2018-01-01

This work discusses the advantages of using cross-correlation analysis in a data-driven approach based on principal component analysis (PCA) and piezodiagnostics to diagnose events successfully in structural health monitoring (SHM). In this context, the identification of noisy data and outliers, as well as the management of data-cleansing stages, can be facilitated by a preprocessing stage based on cross-correlation functions. Additionally, this work demonstrates an improvement in damage detection when cross-correlation is included as part of the overall damage assessment approach. The proposed methodology is validated by processing measurements from piezoelectric devices (PZT), which are used in a piezodiagnostics approach based on PCA and baseline modeling. The influence of the cross-correlation analysis used in the preprocessing stage is then evaluated for damage detection by means of statistical plots and self-organizing maps. Three laboratory specimens were used as test structures to demonstrate the validity of the methodology: (i) a carbon steel pipe section with leak and mass damage types, (ii) an aircraft wing specimen, and (iii) a blade of a commercial aircraft turbine, where damages are specified as mass-added. The main conclusion is that cross-correlation features combined with a PCA-based piezodiagnostic approach are suitable for building a more robust damage assessment algorithm for SHM tasks. PMID:29762505
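The preprocessing stage above rests on the cross-correlation between a baseline (undamaged) signal and a current signal. The sketch below computes a normalized cross-correlation over all lags and extracts the peak value and its lag as simple features; treating 1 - peak as a damage index is a hypothetical choice for illustration. The PCA baseline model and the self-organizing maps used in the paper are not reproduced, and the signals are made up.

```python
import math


def normalized_xcorr(a, b):
    """r[k] = sum_i (a_i - mean_a)(b_{i+k} - mean_b) / sqrt(Saa * Sbb), k in [-(n-1), n-1]."""
    n = len(a)
    if len(b) != n:
        raise ValueError("signals must have equal length")
    ma, mb = sum(a) / n, sum(b) / n
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    norm = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    result = {}
    for k in range(-(n - 1), n):
        # Only indices where both a_i and b_{i+k} exist contribute.
        total = sum(da[i] * db[i + k] for i in range(max(0, -k), min(n, n - k)))
        result[k] = total / norm
    return result


def xcorr_features(baseline, current):
    """Return (peak correlation, lag of the peak)."""
    r = normalized_xcorr(baseline, current)
    lag = max(r, key=r.get)
    return r[lag], lag


baseline = [0, 1, 0, -1, 0, 1, 0, -1]
current = [0, 1, 0, -1, 0, 3, 0, -1]  # hypothetical altered response
peak, lag = xcorr_features(baseline, current)
damage_index = 1.0 - peak  # hypothetical index: 0 for an unchanged signal
```

By the Cauchy-Schwarz inequality every r[k] lies in [-1, 1], and the peak reaches 1 only when the current signal is a positive scaling of the baseline at that lag, so any change in the response lowers it.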
Features of Cross-Correlation Analysis in a Data-Driven Approach for Structural Damage Assessment.

PubMed

Camacho Navarro, Jhonatan; Ruiz, Magda; Villamizar, Rodolfo; Mujica, Luis; Quiroga, Jabid

2018-05-15

This work discusses the advantages of using cross-correlation analysis in a data-driven approach based on principal component analysis (PCA) and piezodiagnostics to diagnose events successfully in structural health monitoring (SHM). In this context, the identification of noisy data and outliers, as well as the management of data-cleansing stages, can be facilitated by a preprocessing stage based on cross-correlation functions. Additionally, this work demonstrates an improvement in damage detection when cross-correlation is included as part of the overall damage assessment approach. The proposed methodology is validated by processing measurements from piezoelectric devices (PZT), which are used in a piezodiagnostics approach based on PCA and baseline modeling. The influence of the cross-correlation analysis used in the preprocessing stage is then evaluated for damage detection by means of statistical plots and self-organizing maps. Three laboratory specimens were used as test structures to demonstrate the validity of the methodology: (i) a carbon steel pipe section with leak and mass damage types, (ii) an aircraft wing specimen, and (iii) a blade of a commercial aircraft turbine, where damages are specified as mass-added. The main conclusion is that cross-correlation features combined with a PCA-based piezodiagnostic approach are suitable for building a more robust damage assessment algorithm for SHM tasks.
All-sky search for periodic gravitational waves in the full S5 LIGO data

NASA Astrophysics Data System (ADS)

Abadie, J.; Abbott, B. P.; Abbott, R.; Abbott, T. D.; Abernathy, M.; Accadia, T.; Acernese, F.; Adams, C.; Adhikari, R.; Affeldt, C.; Ajith, P.; Allen, B.; Allen, G. S.; Amador Ceron, E.; Amariutei, D.; Amin, R. S.; Anderson, S. B.; Anderson, W. G.; Arai, K.; Arain, M. A.; Araya, M. C.; Aston, S. M.; Astone, P.; Atkinson, D.; Aufmuth, P.; Aulbert, C.; Aylott, B. E.; Babak, S.; Baker, P.; Ballardin, G.; Ballmer, S.; Barker, D.; Barone, F.; Barr, B.; Barriga, P.; Barsotti, L.; Barsuglia, M.; Barton, M. A.; Bartos, I.; Bassiri, R.; Bastarrika, M.; Basti, A.; Batch, J.; Bauchrowitz, J.; Bauer, Th. S.; Bebronne, M.; Behnke, B.; Beker, M. G.; Bell, A. S.; Belletoile, A.; Belopolski, I.; Benacquista, M.; Berliner, J. M.; Bertolini, A.; Betzwieser, J.; Beveridge, N.; Beyersdorf, P. T.; Bilenko, I. A.; Billingsley, G.; Birch, J.; Biswas, R.; Bitossi, M.; Bizouard, M. A.; Black, E.; Blackburn, J. K.; Blackburn, L.; Blair, D.; Bland, B.; Blom, M.; Bock, O.; Bodiya, T. P.; Bogan, C.; Bondarescu, R.; Bondu, F.; Bonelli, L.; Bonnand, R.; Bork, R.; Born, M.; Boschi, V.; Bose, S.; Bosi, L.; Bouhou, B.; Braccini, S.; Bradaschia, C.; Brady, P. R.; Braginsky, V. B.; Branchesi, M.; Brau, J. E.; Breyer, J.; Briant, T.; Bridges, D. O.; Brillet, A.; Brinkmann, M.; Brisson, V.; Britzger, M.; Brooks, A. F.; Brown, D. A.; Brummit, A.; Bulik, T.; Bulten, H. J.; Buonanno, A.; Burguet–Castell, J.; Burmeister, O.; Buskulic, D.; Buy, C.; Byer, R. L.; Cadonati, L.; Cagnoli, G.; Calloni, E.; Camp, J. B.; Campsie, P.; Cannizzo, J.; Cannon, K.; Canuel, B.; Cao, J.; Capano, C. D.; Carbognani, F.; Caride, S.; Caudill, S.; Cavaglià, M.; Cavalier, F.; Cavalieri, R.; Cella, G.; Cepeda, C.; Cesarini, E.; Chaibi, O.; Chalermsongsak, T.; Chalkley, E.; Charlton, P.; Chassande-Mottin, E.; Chelkowski, S.; Chen, Y.; Chincarini, A.; Chiummo, A.; Cho, H.; Christensen, N.; Chua, S. S. Y.; Chung, C. T. Y.; Chung, S.; Ciani, G.; Clara, F.; Clark, D. E.; Clark, J.; Clayton, J. 
H.; Cleva, F.; Coccia, E.; Cohadon, P.-F.; Colacino, C. N.; Colas, J.; Colla, A.; Colombini, M.; Conte, A.; Conte, R.; Cook, D.; Corbitt, T. R.; Cordier, M.; Cornish, N.; Corsi, A.; Costa, C. A.; Coughlin, M.; Coulon, J.-P.; Couvares, P.; Coward, D. M.; Coyne, D. C.; Creighton, J. D. E.; Creighton, T. D.; Cruise, A. M.; Cumming, A.; Cunningham, L.; Cuoco, E.; Cutler, R. M.; Dahl, K.; Danilishin, S. L.; Dannenberg, R.; D'Antonio, S.; Danzmann, K.; Dattilo, V.; Daudert, B.; Daveloza, H.; Davier, M.; Davies, G.; Daw, E. J.; Day, R.; Dayanga, T.; de Rosa, R.; Debra, D.; Debreczeni, G.; Degallaix, J.; Del Pozzo, W.; Del Prete, M.; Dent, T.; Dergachev, V.; Derosa, R.; Desalvo, R.; Dhurandhar, S.; di Fiore, L.; Diguglielmo, J.; di Lieto, A.; di Palma, I.; di Paolo Emilio, M.; di Virgilio, A.; Díaz, M.; Dietz, A.; Donovan, F.; Dooley, K. L.; Dorsher, S.; Drago, M.; Drever, R. W. P.; Driggers, J. C.; Du, Z.; Dumas, J.-C.; Dwyer, S.; Eberle, T.; Edgar, M.; Edwards, M.; Effler, A.; Ehrens, P.; Endrőczi, G.; Engel, R.; Etzel, T.; Evans, K.; Evans, M.; Evans, T.; Factourovich, M.; Fafone, V.; Fairhurst, S.; Fan, Y.; Farr, B. F.; Farr, W.; Fazi, D.; Fehrmann, H.; Feldbaum, D.; Ferrante, I.; Fidecaro, F.; Finn, L. S.; Fiori, I.; Fisher, R. P.; Flaminio, R.; Flanigan, M.; Foley, S.; Forsi, E.; Forte, L. A.; Fotopoulos, N.; Fournier, J.-D.; Franc, J.; Frasca, S.; Frasconi, F.; Frede, M.; Frei, M.; Frei, Z.; Freise, A.; Frey, R.; Fricke, T. T.; Friedrich, D.; Fritschel, P.; Frolov, V. V.; Fulda, P. J.; Fyffe, M.; Galimberti, M.; Gammaitoni, L.; Ganija, M. R.; Garcia, J.; Garofoli, J. A.; Garufi, F.; Gáspár, M. E.; Gemme, G.; Geng, R.; Genin, E.; Gennai, A.; Gergely, L. Á.; Ghosh, S.; Giaime, J. A.; Giampanis, S.; Giardina, K. D.; Giazotto, A.; Gill, C.; Goetz, E.; Goggin, L. M.; González, G.; Gorodetsky, M. L.; Goßler, S.; Gouaty, R.; Graef, C.; Granata, M.; Grant, A.; Gras, S.; Gray, C.; Gray, N.; Greenhalgh, R. J. S.; Gretarsson, A. 
M.; Greverie, C.; Grosso, R.; Grote, H.; Grunewald, S.; Guidi, G. M.; Guido, C.; Gupta, R.; Gustafson, E. K.; Gustafson, R.; Ha, T.; Hage, B.; Hallam, J. M.; Hammer, D.; Hammond, G.; Hanks, J.; Hanna, C.; Hanson, J.; Harms, J.; Harry, G. M.; Harry, I. W.; Harstad, E. D.; Hartman, M. T.; Haughian, K.; Hayama, K.; Hayau, J.-F.; Hayler, T.; Heefner, J.; Heidmann, A.; Heintze, M. C.; Heitmann, H.; Hello, P.; Hendry, M. A.; Heng, I. S.; Heptonstall, A. W.; Herrera, V.; Hewitson, M.; Hild, S.; Hoak, D.; Hodge, K. A.; Holt, K.; Hong, T.; Hooper, S.; Hosken, D. J.; Hough, J.; Howell, E. J.; Hughey, B.; Husa, S.; Huttner, S. H.; Huynh-Dinh, T.; Ingram, D. R.; Inta, R.; Isogai, T.; Ivanov, A.; Izumi, K.; Jacobson, M.; Jang, H.; Jaranowski, P.; Johnson, W. W.; Jones, D. I.; Jones, G.; Jones, R.; Ju, L.; Kalmus, P.; Kalogera, V.; Kamaretsos, I.; Kandhasamy, S.; Kang, G.; Kanner, J. B.; Katsavounidis, E.; Katzman, W.; Kaufer, H.; Kawabe, K.; Kawamura, S.; Kawazoe, F.; Kells, W.; Keppel, D. G.; Keresztes, Z.; Khalaidovski, A.; Khalili, F. Y.; Khazanov, E. A.; Kim, B.; Kim, C.; Kim, D.; Kim, H.; Kim, K.; Kim, N.; Kim, Y.-M.; King, P. J.; Kinsey, M.; Kinzel, D. L.; Kissel, J. S.; Klimenko, S.; Kokeyama, K.; Kondrashov, V.; Kopparapu, R.; Koranda, S.; Korth, W. Z.; Kowalska, I.; Kozak, D.; Kringel, V.; Krishnamurthy, S.; Krishnan, B.; Królak, A.; Kuehn, G.; Kumar, R.; Kwee, P.; Lam, P. K.; Landry, M.; Lang, M.; Lantz, B.; Lastzka, N.; Lawrie, C.; Lazzarini, A.; Leaci, P.; Lee, C. H.; Lee, H. M.; Leindecker, N.; Leong, J. R.; Leonor, I.; Leroy, N.; Letendre, N.; Li, J.; Li, T. G. F.; Liguori, N.; Lindquist, P. E.; Lockerbie, N. A.; Lodhia, D.; Lorenzini, M.; Loriette, V.; Lormand, M.; Losurdo, G.; Luan, J.; Lubinski, M.; Lück, H.; Lundgren, A. P.; MacDonald, E.; Machenschalk, B.; Macinnis, M.; MacLeod, D. 
M.; Mageswaran, M.; Mailand, K.; Majorana, E.; Maksimovic, I.; Man, N.; Mandel, I.; Mandic, V.; Mantovani, M.; Marandi, A.; Marchesoni, F.; Marion, F.; Márka, S.; Márka, Z.; Markosyan, A.; Maros, E.; Marque, J.; Martelli, F.; Martin, I. W.; Martin, R. M.; Marx, J. N.; Mason, K.; Masserot, A.; Matichard, F.; Matone, L.; Matzner, R. A.; Mavalvala, N.; Mazzolo, G.; McCarthy, R.; McClelland, D. E.; McGuire, S. C.; McIntyre, G.; McIver, J.; McKechan, D. J. A.; Meadors, G. D.; Mehmet, M.; Meier, T.; Melatos, A.; Melissinos, A. C.; Mendell, G.; Menendez, D.; Mercer, R. A.; Meshkov, S.; Messenger, C.; Meyer, M. S.; Miao, H.; Michel, C.; Milano, L.; Miller, J.; Minenkov, Y.; Mitrofanov, V. P.; Mitselmakher, G.; Mittleman, R.; Miyakawa, O.; Moe, B.; Moesta, P.; Mohan, M.; Mohanty, S. D.; Mohapatra, S. R. P.; Moraru, D.; Moreno, G.; Morgado, N.; Morgia, A.; Mori, T.; Mosca, S.; Mossavi, K.; Mours, B.; Mow-Lowry, C. M.; Mueller, C. L.; Mueller, G.; Mukherjee, S.; Mullavey, A.; Müller-Ebhardt, H.; Munch, J.; Murphy, D.; Murray, P. G.; Mytidis, A.; Nash, T.; Naticchioni, L.; Nawrodt, R.; Necula, V.; Nelson, J.; Newton, G.; Nishizawa, A.; Nocera, F.; Nolting, D.; Nuttall, L.; Ochsner, E.; O'Dell, J.; Oelker, E.; Ogin, G. H.; Oh, J. J.; Oh, S. H.; Oldenburg, R. G.; O'Reilly, B.; O'Shaughnessy, R.; Osthelder, C.; Ott, C. D.; Ottaway, D. J.; Ottens, R. S.; Overmier, H.; Owen, B. J.; Page, A.; Pagliaroli, G.; Palladino, L.; Palomba, C.; Pan, Y.; Pankow, C.; Paoletti, F.; Papa, M. A.; Parisi, M.; Pasqualetti, A.; Passaquieti, R.; Passuello, D.; Patel, P.; Pedraza, M.; Peiris, P.; Pekowsky, L.; Penn, S.; Peralta, C.; Perreca, A.; Persichetti, G.; Phelps, M.; Pickenpack, M.; Piergiovanni, F.; Pietka, M.; Pinard, L.; Pinto, I. M.; Pitkin, M.; Pletsch, H. J.; Plissi, M. V.; Poggiani, R.; Pöld, J.; Postiglione, F.; Prato, M.; Predoi, V.; Price, L. R.; Prijatelj, M.; Principe, M.; Privitera, S.; Prix, R.; Prodi, G. 
A.; Prokhorov, L.; Puncken, O.; Punturo, M.; Puppo, P.; Quetschke, V.; Raab, F. J.; Rabeling, D. S.; Rácz, I.; Radkins, H.; Raffai, P.; Rakhmanov, M.; Ramet, C. R.; Rankins, B.; Rapagnani, P.; Raymond, V.; Re, V.; Redwine, K.; Reed, C. M.; Reed, T.; Regimbau, T.; Reid, S.; Reitze, D. H.; Ricci, F.; Riesen, R.; Riles, K.; Robertson, N. A.; Robinet, F.; Robinson, C.; Robinson, E. L.; Rocchi, A.; Roddy, S.; Rodriguez, C.; Rodruck, M.; Rolland, L.; Rollins, J.; Romano, J. D.; Romano, R.; Romie, J. H.; Rosińska, D.; Röver, C.; Rowan, S.; Rüdiger, A.; Ruggi, P.; Ryan, K.; Ryll, H.; Sainathan, P.; Sakosky, M.; Salemi, F.; Samblowski, A.; Sammut, L.; Sancho de La Jordana, L.; Sandberg, V.; Sankar, S.; Sannibale, V.; Santamaría, L.; Santiago-Prieto, I.; Santostasi, G.; Sassolas, B.; Sathyaprakash, B. S.; Sato, S.; Saulson, P. R.; Savage, R. L.; Schilling, R.; Schlamminger, S.; Schnabel, R.; Schofield, R. M. S.; Schulz, B.; Schutz, B. F.; Schwinberg, P.; Scott, J.; Scott, S. M.; Searle, A. C.; Seifert, F.; Sellers, D.; Sengupta, A. S.; Sentenac, D.; Sergeev, A.; Shaddock, D. A.; Shaltev, M.; Shapiro, B.; Shawhan, P.; Shoemaker, D. H.; Sibley, A.; Siemens, X.; Sigg, D.; Singer, A.; Singer, L.; Sintes, A. M.; Skelton, G.; Slagmolen, B. J. J.; Slutsky, J.; Smith, J. R.; Smith, M. R.; Smith, N. D.; Smith, R. J. E.; Somiya, K.; Sorazu, B.; Soto, J.; Speirits, F. C.; Sperandio, L.; Stefszky, M.; Stein, A. J.; Steinert, E.; Steinlechner, J.; Steinlechner, S.; Steplewski, S.; Stochino, A.; Stone, R.; Strain, K. A.; Strigin, S.; Stroeer, A. S.; Sturani, R.; Stuver, A. L.; Summerscales, T. Z.; Sung, M.; Susmithan, S.; Sutton, P. J.; Swinkels, B.; Tacca, M.; Taffarello, L.; Talukder, D.; Tanner, D. B.; Tarabrin, S. P.; Taylor, J. R.; Taylor, R.; Thomas, P.; Thorne, K. A.; Thorne, K. S.; Thrane, E.; Thüring, A.; Titsler, C.; Tokmakov, K. V.; Toncelli, A.; Tonelli, M.; Torre, O.; Torres, C.; Torrie, C. 
I.; Tournefier, E.; Travasso, F.; Traylor, G.; Trias, M.; Tseng, K.; Ugolini, D.; Urbanek, K.; Vahlbruch, H.; Vajente, G.; Vallisneri, M.; van den Brand, J. F. J.; van den Broeck, C.; van der Putten, S.; van Veggel, A. A.; Vass, S.; Vasuth, M.; Vaulin, R.; Vavoulidis, M.; Vecchio, A.; Vedovato, G.; Veitch, J.; Veitch, P. J.; Veltkamp, C.; Verkindt, D.; Vetrano, F.; Viceré, A.; Villar, A. E.; Vinet, J.-Y.; Vitale, S.; Vitale, S.; Vocca, H.; Vorvick, C.; Vyatchanin, S. P.; Wade, A.; Waldman, S. J.; Wallace, L.; Wan, Y.; Wang, X.; Wang, Z.; Wanner, A.; Ward, R. L.; Was, M.; Wei, P.; Weinert, M.; Weinstein, A. J.; Weiss, R.; Wen, L.; Wen, S.; Wessels, P.; West, M.; Westphal, T.; Wette, K.; Whelan, J. T.; Whitcomb, S. E.; White, D.; Whiting, B. F.; Wilkinson, C.; Willems, P. A.; Williams, H. R.; Williams, L.; Willke, B.; Winkelmann, L.; Winkler, W.; Wipf, C. C.; Wiseman, A. G.; Wittel, H.; Woan, G.; Wooley, R.; Worden, J.; Yablon, J.; Yakushin, I.; Yamamoto, H.; Yamamoto, K.; Yang, H.; Yeaton-Massey, D.; Yoshida, S.; Yu, P.; Yvert, M.; Zadroźny, A.; Zanolin, M.; Zendri, J.-P.; Zhang, F.; Zhang, L.; Zhang, W.; Zhang, Z.; Zhao, C.; Zotov, N.; Zucker, M. E.; Zweizig, J.

2012-01-01

We report on an all-sky search for periodic gravitational waves in the frequency band 50-800 Hz and with frequency time derivatives in the range of 0 through −6×10⁻⁹ Hz/s. Such a signal could be produced by a nearby spinning and slightly nonaxisymmetric isolated neutron star in our Galaxy. After recent improvements in the search program that yielded a 10× increase in computational efficiency, we searched two years of data collected during LIGO’s fifth science run and obtained the most sensitive all-sky upper limits on gravitational-wave strain to date. Near 150 Hz our upper limit on worst-case linearly polarized strain amplitude h0 is 1×10⁻²⁴, while at the high end of our frequency range we achieve a worst-case upper limit of 3.8×10⁻²⁴ for all polarizations and sky locations. These results constitute a factor-of-2 improvement over previously published results. A new detection pipeline using a loosely coherent algorithm was able to follow up weaker outliers, increasing the volume of space where signals can be detected by a factor of 10, but it has not revealed any gravitational-wave signals. The pipeline has been tested for robustness with respect to deviations from the model of an isolated neutron star, such as those caused by a low-mass or long-period binary companion.
Factors influencing hospital high length of stay outliers

PubMed Central

2012-01-01

Background: The study of length of stay (LOS) outliers is important for the management and financing of hospitals. Our aim was to study variables associated with high LOS outliers and their evolution over time. Methods: We used hospital administrative data from inpatient episodes in public acute care hospitals in the Portuguese National Health Service (NHS), with discharges between 2000 and 2009, together with some hospital characteristics. The dependent variable, LOS outliers, was calculated for each diagnosis related group (DRG) using a trim point defined for each year as the geometric mean plus two standard deviations. Hospitals were classified on the basis of administrative, economic and teaching characteristics. We also studied the influence of comorbidities and readmissions. Logistic regression models, including a multivariable logistic regression, were used in the analysis. All the logistic regressions were fitted using generalized estimating equations (GEE). Results: In nearly nine million inpatient episodes analysed, we found a proportion of 3.9% high LOS outliers, accounting for 19.2% of total inpatient days. The number of hospital patient discharges increased between 2000 and 2005 and decreased slightly after that. The proportion of outliers ranged from a lowest value of 3.6% (in 2001 and 2002) to a highest value of 4.3% in 2009. Teaching hospitals with over 1,000 beds have significantly more outliers than other hospitals, even after adjustment for readmissions and several patient characteristics. Conclusions: In recent years both average LOS and high LOS outliers have been increasing in Portuguese NHS hospitals. As high LOS outliers represent an important proportion of total inpatient days, this should be seen as an important alert for hospital management and for national health policies. As expected, age, type of admission, and hospital type were significantly associated with high LOS outliers.
The proportion of high outliers does not seem to be related to their financial coverage; they should be studied in order to highlight areas for further investigation. The increasing complexity of both hospitals and patients may be the single most important determinant of high LOS outliers and must therefore be taken into account by health managers when considering hospital costs. PMID:22906386
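The trim-point rule described in the Methods can be sketched as follows, reading "geometric mean plus two standard deviations" literally: the geometric mean of LOS plus twice the sample standard deviation of LOS, both on the original scale. The study may compute the spread differently (for example on the log scale), and it defines the trim point per DRG and per year, whereas this sketch groups by DRG only; the episodes are made up. The function also returns the share of inpatient days falling in outlier episodes, the quantity reported as 19.2% in the Results.

```python
from collections import defaultdict
from statistics import geometric_mean, stdev


def los_outliers(episodes):
    """Flag high-LOS outliers per DRG using trim point = geometric mean + 2 SD.

    episodes: list of (drg, length_of_stay_days) with LOS > 0 and at least
    two episodes per DRG. Returns (flags, share_of_days): one flag per episode
    and the fraction of all inpatient days that fall in flagged episodes.
    """
    by_drg = defaultdict(list)
    for drg, los in episodes:
        by_drg[drg].append(los)
    trim = {drg: geometric_mean(v) + 2 * stdev(v) for drg, v in by_drg.items()}
    flags = [los > trim[drg] for drg, los in episodes]
    outlier_days = sum(los for (_, los), flagged in zip(episodes, flags) if flagged)
    total_days = sum(los for _, los in episodes)
    return flags, outlier_days / total_days


# Made-up episodes: (DRG code, LOS in days).
episodes = [("A", 2), ("A", 3), ("A", 3), ("A", 4), ("A", 30),
            ("B", 5), ("B", 6), ("B", 7)]
flags, share = los_outliers(episodes)
print(flags)  # only the 30-day episode in DRG "A" is flagged
print(share)  # 0.5: that single episode holds half of all inpatient days
```

The example mirrors the pattern in the Results on a tiny scale: a small proportion of outlier episodes can account for a disproportionate share of inpatient days.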
Epigenetic variability in cells of normal cytology is associated with the risk of future morphological transformation.

PubMed

Teschendorff, Andrew E; Jones, Allison; Fiegl, Heidi; Sargent, Alexandra; Zhuang, Joanna J; Kitchener, Henry C; Widschwendter, Martin

2012-03-27

Recently, it has been proposed that epigenetic variation may contribute to the risk of complex genetic diseases like cancer. We aimed to demonstrate that epigenetic changes in normal cells, collected years in advance of the first signs of morphological transformation, can predict the risk of such transformation. We analyzed DNA methylation (DNAm) profiles of over 27,000 CpGs in cytologically normal cells of the uterine cervix from 152 women in a prospective nested case-control study. We used statistics based on differential variability to identify CpGs associated with the risk of transformation and a novel statistical algorithm called EVORA (Epigenetic Variable Outliers for Risk prediction Analysis) to make predictions. We observed many CpGs that were differentially variable between women who developed a non-invasive cervical neoplasia within 3 years of sample collection and those that remained disease-free. These CpGs exhibited heterogeneous outlier methylation profiles and overlapped strongly with CpGs undergoing age-associated DNA methylation changes in normal tissue. Using EVORA, we demonstrate that the risk of cervical neoplasia can be predicted in blind test sets (AUC = 0.66 (0.58 to 0.75)), and that assessment of DNAm variability allows more reliable identification of risk-associated CpGs than statistics based on differences in mean methylation levels. In independent data, EVORA showed high sensitivity and specificity for detecting pre-invasive neoplasia and cervical cancer (AUC = 0.93 (0.86 to 1) and AUC = 1, respectively). We demonstrate that the risk of neoplastic transformation can be predicted from DNA methylation profiles in the morphologically normal cell of origin of an epithelial cancer. Because only 0.1% of CpGs in the human genome were profiled, studies with wider coverage are likely to yield improved predictive and diagnostic models with the accuracy needed for clinical application.
The ARTISTIC trial is registered with the International Standard Randomised Controlled Trial Number ISRCTN25417821.
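The abstract does not give EVORA's formulas. As a hedged illustration of "statistics based on differential variability", the sketch below applies Bartlett's test for equal variances to cases versus controls at each CpG and ranks CpGs by p-value. Whether EVORA uses exactly this test is an assumption here. With two groups the statistic is approximately chi-square with 1 degree of freedom, so the upper-tail p-value equals erfc(sqrt(T/2)). The CpG identifiers and methylation values are hypothetical.

```python
import math
from statistics import variance


def bartlett_two_groups(x, y):
    """Bartlett's test for equal variances in two groups; returns (T, p)."""
    groups = [x, y]
    k = len(groups)
    ns = [len(g) for g in groups]
    n_total = sum(ns)
    vars_ = [variance(g) for g in groups]  # sample variances (n - 1 denominator)
    pooled = sum((n - 1) * v for n, v in zip(ns, vars_)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(ns, vars_))
    correction = 1 + (sum(1 / (n - 1) for n in ns) - 1 / (n_total - k)) / (3 * (k - 1))
    t_stat = max(numerator / correction, 0.0)  # guard against tiny negative rounding
    # For k = 2, T ~ chi-square(1), so P(chi2_1 > T) = P(|Z| > sqrt(T)) = erfc(sqrt(T/2)).
    p_value = math.erfc(math.sqrt(t_stat / 2))
    return t_stat, p_value


def rank_differentially_variable(cpgs):
    """Sort CpG ids by Bartlett p-value, most differentially variable first."""
    return sorted(cpgs, key=lambda c: bartlett_two_groups(*cpgs[c])[1])


# Hypothetical beta values: (cases, controls) for two CpGs.
cpgs = {
    "cg_stable": ([0.41, 0.43, 0.42, 0.44], [0.40, 0.42, 0.43, 0.41]),
    "cg_variable": ([0.10, 0.65, 0.30, 0.90], [0.40, 0.42, 0.41, 0.43]),
}
print(rank_differentially_variable(cpgs))  # cg_variable is ranked first
```

Unlike a t-test on means, this statistic responds when a few cases show outlying methylation in either direction, which matches the heterogeneous outlier profiles described above.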
Epigenetic variability in cells of normal cytology is associated with the risk of future morphological transformation

PubMed Central

2012-01-01

Background: Recently, it has been proposed that epigenetic variation may contribute to the risk of complex genetic diseases like cancer. We aimed to demonstrate that epigenetic changes in normal cells, collected years in advance of the first signs of morphological transformation, can predict the risk of such transformation. Methods: We analyzed DNA methylation (DNAm) profiles of over 27,000 CpGs in cytologically normal cells of the uterine cervix from 152 women in a prospective nested case-control study. We used statistics based on differential variability to identify CpGs associated with the risk of transformation and a novel statistical algorithm called EVORA (Epigenetic Variable Outliers for Risk prediction Analysis) to make predictions. Results: We observed many CpGs that were differentially variable between women who developed a non-invasive cervical neoplasia within 3 years of sample collection and those that remained disease-free. These CpGs exhibited heterogeneous outlier methylation profiles and overlapped strongly with CpGs undergoing age-associated DNA methylation changes in normal tissue. Using EVORA, we demonstrate that the risk of cervical neoplasia can be predicted in blind test sets (AUC = 0.66 (0.58 to 0.75)), and that assessment of DNAm variability allows more reliable identification of risk-associated CpGs than statistics based on differences in mean methylation levels. In independent data, EVORA showed high sensitivity and specificity for detecting pre-invasive neoplasia and cervical cancer (AUC = 0.93 (0.86 to 1) and AUC = 1, respectively). Conclusions: We demonstrate that the risk of neoplastic transformation can be predicted from DNA methylation profiles in the morphologically normal cell of origin of an epithelial cancer. Because only 0.1% of CpGs in the human genome were profiled, studies with wider coverage are likely to yield improved predictive and diagnostic models with the accuracy needed for clinical application.
Trial registration: The ARTISTIC trial is registered with the International Standard Randomised Controlled Trial Number ISRCTN25417821. PMID:22453031
Video Denoising via Dynamic Video Layering

NASA Astrophysics Data System (ADS)

Guo, Han; Vaswani, Namrata

2018-07-01

Video denoising refers to the problem of removing "noise" from a video sequence. Here the term "noise" is used in a broad sense to refer to any corruption, outlier, or interference that is not the quantity of interest. In this work, we develop a novel approach to video denoising based on the idea that many noisy or corrupted videos can be split into three parts: the "low-rank layer", the "sparse layer", and a small, bounded residual. We show, using extensive experiments, that our denoising approach outperforms state-of-the-art denoising algorithms.
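To make the three-layer split concrete, the sketch below uses a deliberately crude stand-in for the low-rank layer: a static background equal to the per-pixel temporal median, repeated in every frame. Stacked as a pixels-by-frames matrix, this background has rank 1. Deviations larger than a threshold tau go to the sparse layer, and everything else goes to the residual, which is therefore bounded by tau. This is not the authors' dynamic video layering algorithm. The frame data and tau are hypothetical.

```python
from statistics import median


def layer_decompose(frames, tau):
    """Split frames into background (low-rank), sparse and residual layers.

    frames: list of frames, each a flat list of pixel intensities.
    Returns (background, sparse, residual) such that
    frames[t][p] == background[p] + sparse[t][p] + residual[t][p].
    """
    n_pix = len(frames[0])
    # Static background: per-pixel temporal median. Repeating it in every frame
    # gives a rank-1 pixels-by-frames matrix (a crude low-rank layer).
    background = [median(f[p] for f in frames) for p in range(n_pix)]
    sparse, residual = [], []
    for f in frames:
        diff = [f[p] - background[p] for p in range(n_pix)]
        s = [d if abs(d) > tau else 0.0 for d in diff]  # large deviations only
        sparse.append(s)
        residual.append([d - sv for d, sv in zip(diff, s)])  # |residual| <= tau
    return background, sparse, residual


# Five frames of a three-pixel "video" with one corrupted pixel in frame 3.
frames = [[10, 20, 30], [10.5, 19.5, 30], [9.5, 20.5, 30.5],
          [10, 80, 29.5], [10, 20, 30]]
background, sparse, residual = layer_decompose(frames, tau=2.0)
print(background)  # [10, 20, 30]
print(sparse[3])   # [0.0, 60, 0.0]
```

A real low-rank layer would allow slow changes such as lighting or camera motion, which a single static background cannot represent.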
Robust neural network with applications to credit portfolio data analysis.

PubMed

Feng, Yijia; Li, Runze; Sudjianto, Agus; Zhang, Yiyun

2010-01-01

In this article, we study nonparametric conditional quantile estimation via a neural network structure. We propose an estimation method that combines quantile regression and a neural network (robust neural network, RNN). It provides good smoothing performance in the presence of outliers and can be used to construct prediction bands. A Majorization-Minimization (MM) algorithm was developed for optimization. A Monte Carlo simulation study is conducted to assess the performance of RNN. Comparisons with other nonparametric regression methods (e.g., local linear regression and regression splines) in a real-data application demonstrate the advantage of the newly proposed procedure.
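The robustness comes from the quantile (check) loss, and MM handles that loss's kink by repeatedly minimising a smooth quadratic surrogate. The sketch below shows the mechanism for the simplest possible "network", a single constant θ. It uses a perturbed quadratic majorizer of the kind Hunter and Lange proposed for quantile regression. Minimising the surrogate gives the closed-form update θ = (Σ wᵢyᵢ + n(2τ - 1)) / Σ wᵢ with weights wᵢ = 1/(ε + |yᵢ - θ|). The paper's network architecture and its exact majorizer are not reproduced; eps, the iteration count and the data are assumptions.

```python
def check_loss(r, tau):
    """Quantile (check) loss: tau*r for r >= 0, (tau - 1)*r for r < 0."""
    return r * (tau - (1.0 if r < 0 else 0.0))


def mm_quantile(y, tau, eps=1e-6, iters=500):
    """MM iterations for the tau-quantile of y (constant-only model).

    Each step minimises the quadratic surrogate
    r^2 / (4*(eps + |r_old|)) + (tau - 0.5)*r, summed over r = y_i - theta.
    """
    n = len(y)
    theta = sum(y) / n  # start from the mean
    for _ in range(iters):
        w = [1.0 / (eps + abs(yi - theta)) for yi in y]
        theta = (sum(wi * yi for wi, yi in zip(w, y)) + n * (2 * tau - 1)) / sum(w)
    return theta


data = [1, 2, 3, 4, 100]  # one gross outlier
print(round(mm_quantile(data, 0.5), 3))  # about 3.0: the median ignores the outlier
```

For tau = 0.5 the update is an iteratively reweighted mean whose weights shrink for distant points, which is why the outlier at 100 has almost no influence. In a network, the same weighted least-squares step would be taken over the network parameters instead of a single θ.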
Guidelines for determining flood flow frequency—Bulletin 17C

USGS Publications Warehouse

England, John F.; Cohn, Timothy A.; Faber, Beth A.; Stedinger, Jery R.; Thomas, Wilbert O.; Veilleux, Andrea G.; Kiang, Julie E.; Mason, Robert R.

2018-03-29

Accurate estimates of flood frequency and magnitude are a key component of any effective nationwide flood risk management and flood damage abatement program. In addition to accuracy, methods for estimating flood risk must be uniformly and consistently applied because management of the Nation’s water and related land resources is a collaborative effort involving multiple actors, including most levels of government and the private sector. Flood frequency guidelines have been published in the United States since 1967 and have undergone periodic revisions. In 1967, the U.S. Water Resources Council presented a coherent approach to flood frequency with Bulletin 15, “A Uniform Technique for Determining Flood Flow Frequencies.” The method it recommended involved fitting the log-Pearson Type III distribution to annual peak flow data by the method of moments. The first extension and update of Bulletin 15 was published in 1976 as Bulletin 17, “Guidelines for Determining Flood Flow Frequency” (Guidelines). It extended the Bulletin 15 procedures by introducing methods for dealing with outliers, historical flood information, and regional skew. Bulletin 17A was published the following year to clarify the computation of weighted skew. The next revision, Bulletin 17B, provided a host of improvements and new techniques designed to address situations that often arise in practice, including better methods for estimating and using regional skew, weighting station and regional skew, detection of outliers, and use of the conditional probability adjustment. The current version of these Guidelines is presented in this document, denoted Bulletin 17C. It incorporates changes motivated by four of the items listed as “Future Work” in Bulletin 17B and 30 years of post-17B research on flood processes and statistical methods.
The updates include: adoption of a generalized representation of flood data that allows for interval and censored data types; a new method, called the Expected Moments Algorithm, which extends the method of moments so that it can accommodate interval data; a generalized approach to identification of low outliers in flood data; and an improved method for computing confidence intervals. Federal agencies are requested to use these Guidelines in all planning activities involving water and related land resources. State, local, and private organizations are encouraged to use these Guidelines to assure uniformity in the flood frequency estimates that all agencies concerned with flood risk should use for Federal planning decisions. This revision is adopted with the knowledge and understanding that review of these procedures will be ongoing. Updated methods will be adopted when warranted by experience and by examination and testing of new techniques.
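The Bulletin 15 approach mentioned above fits a log-Pearson Type III distribution to annual peaks by the method of moments. The sketch below illustrates that basic idea under stated assumptions. It uses station skew only and approximates the Pearson III frequency factor with the Wilson-Hilferty formula. It does not implement the Expected Moments Algorithm, regional skew weighting, or the low-outlier test of Bulletin 17C. The function names and the peak-flow values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def station_skew(logs):
    """Sample skew coefficient of log-transformed peaks (bias-adjusted form)."""
    n = len(logs)
    m, s = mean(logs), stdev(logs)
    return n * sum((x - m) ** 3 for x in logs) / ((n - 1) * (n - 2) * s ** 3)


def frequency_factor(g, p_exceed):
    """Approximate Pearson III frequency factor K using Wilson-Hilferty (assumption)."""
    z = NormalDist().inv_cdf(1.0 - p_exceed)  # standard normal quantile
    if abs(g) < 1e-9:
        return z  # zero skew: log-Pearson III reduces to log-normal
    return (2.0 / g) * ((1.0 + g * z / 6.0 - g * g / 36.0) ** 3 - 1.0)


def lp3_quantile(peaks, p_exceed):
    """Flow with annual exceedance probability p_exceed, fit by moments of log10(Q)."""
    logs = [math.log10(q) for q in peaks]
    m, s = mean(logs), stdev(logs)
    k = frequency_factor(station_skew(logs), p_exceed)
    return 10 ** (m + k * s)


# Hypothetical annual peak flows, for illustration only
peaks = [120, 340, 95, 410, 230, 180, 560, 150, 275, 310]
print(round(lp3_quantile(peaks, 0.01)))  # rough 1%-chance (100-year) flood estimate
```

Because K grows as the exceedance probability shrinks, rarer floods map to larger quantiles. With near-zero skew the fit collapses to a log-normal one.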
L-moments and TL-moments of the generalized lambda distribution

USGS Publications Warehouse

Asquith, W.H.

2007-01-01

The 4-parameter generalized lambda distribution (GLD) is a flexible distribution capable of mimicking the shapes of many distributions and data samples, including those with heavy tails. The method of L-moments and the recently developed method of trimmed L-moments (TL-moments) are attractive techniques for parameter estimation for heavy-tailed distributions for which the L- and TL-moments have been defined. Analytical solutions for the first five L- and TL-moments in terms of GLD parameters are derived. Unfortunately, numerical methods are needed to compute the parameters from the L- or TL-moments. Algorithms are suggested for parameter estimation. Application of the GLD using both L- and TL-moment parameter estimates from example data is demonstrated, and a comparison with the L-moment fit of the 4-parameter kappa distribution is made. A small simulation study of the 98th percentile (far-right tail) is conducted for a heavy-tail GLD with high-outlier contamination. The simulations show, with respect to estimation of the 98th percentile, that TL-moments are less biased (more robust) in the presence of high-outlier contamination. However, the robustness comes at the expense of considerably more sampling variability. © 2006 Elsevier B.V. All rights reserved.
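For context on the L-moments mentioned above, the first four sample L-moments can be computed from unbiased probability-weighted moments b_r. The sketch below shows only that estimation step. It does not compute TL-moments or solve for GLD parameters, which the abstract notes requires numerical methods. The function name is hypothetical.

```python
from math import comb


def sample_l_moments(data):
    """First four sample L-moments via unbiased probability-weighted moments b_r."""
    x = sorted(data)
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations")
    # b_r = (1/n) * sum_i [C(i, r) / C(n-1, r)] * x_(i), with 0-based rank i
    b = [sum(comb(i, r) * xi for i, xi in enumerate(x)) / (n * comb(n - 1, r))
         for r in range(4)]
    l1 = b[0]                                      # location (mean)
    l2 = 2 * b[1] - b[0]                           # scale
    l3 = 6 * b[2] - 6 * b[1] + b[0]                # related to L-skewness t3 = l3 / l2
    l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]  # related to L-kurtosis t4 = l4 / l2
    return l1, l2, l3, l4


l1, l2, l3, l4 = sample_l_moments([1, 1, 1, 1, 10])
print(l3 / l2)  # positive L-skewness for a sample with one high value
```

Because each b_r is a linear combination of order statistics, a single large value moves these estimates less than it moves conventional third and fourth moments.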
Using a Linear Regression Method to Detect Outliers in IRT Common Item Equating

ERIC Educational Resources Information Center

He, Yong; Cui, Zhongmin; Fang, Yu; Chen, Hanwei

2013-01-01

Common test items play an important role in equating alternate test forms under the common item nonequivalent groups design. When the item response theory (IRT) method is applied in equating, inconsistent item parameter estimates among common items can lead to large bias in equated scores. It is prudent to evaluate inconsistency in parameter…
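The abstract above is cut off before the method is described, so the following is only a generic illustration, not necessarily the authors' procedure. It regresses common-item parameter estimates from one form on those from the other and flags items with large standardized residuals. The function name and the cutoff `z_crit = 2.0` are hypothetical.

```python
import math


def flag_outlying_items(old, new, z_crit=2.0):
    """Flag common items whose new-form estimate departs from a linear fit on old-form estimates."""
    n = len(old)
    mx, my = sum(old) / n, sum(new) / n
    sxx = sum((x - mx) ** 2 for x in old)
    sxy = sum((x - mx) * (y - my) for x, y in zip(old, new))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(old, new)]
    s = math.sqrt(sum(e * e for e in resid) / (n - 2))  # residual standard error
    return [i for i, e in enumerate(resid) if abs(e / s) > z_crit]


# Hypothetical difficulty estimates for 8 common items on two forms
old = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
new = [x + 0.1 for x in old]
new[3] = 1.2  # one item drifts
print(flag_outlying_items(old, new))
```

A single aberrant item also pulls the fitted line toward itself, which is why practical procedures often refit after removing flagged items.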
Iterative Ellipsoidal Trimming.

DTIC Science & Technology

1980-02-11

to above. Iterative ellipsoidal trimming has been investigated before by other statisticians, most notably by Gnanadesikan and his coworkers...J., Gnanadesikan, R., and Kettenring, J. R. (1975). "Robust estimation and outlier detection with correlation coefficients." Biometrika, 62, 531-45. [6...Duda, Richard, and Hart, Peter (1973). Pattern Classification and Scene Analysis. Wiley, New York. [7] Gnanadesikan, R. (1977). Methods for
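The record above is only a fragment, so the exact rule used in the report is not visible. The sketch below shows the general idea of iterative ellipsoidal trimming. Mean and covariance are re-estimated from the points inside the current ellipsoid, and the points with the largest Mahalanobis distances are trimmed, until the retained set stops changing. The retained fraction `keep` and the function name are assumptions.

```python
import numpy as np


def ellipsoidal_trim(X, keep=0.9, max_iter=50):
    """Iterative ellipsoidal trimming: re-fit mean/covariance on the innermost points until stable."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    m = int(keep * n)  # number of points retained inside the ellipsoid (assumed fraction)
    inside = np.ones(n, dtype=bool)  # start from all points
    for _ in range(max_iter):
        mu = X[inside].mean(axis=0)
        cov = np.cov(X[inside], rowvar=False)
        diff = X - mu
        # squared Mahalanobis distance of every point from the current centre
        d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
        new_inside = np.zeros(n, dtype=bool)
        new_inside[np.argsort(d2)[:m]] = True
        if np.array_equal(new_inside, inside):
            break  # retained set no longer changes
        inside = new_inside
    mu = X[inside].mean(axis=0)
    cov = np.cov(X[inside], rowvar=False)
    return mu, cov, ~inside  # trimmed points are the outlier candidates


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
X[:5] += 10.0  # plant five gross outliers
mu, cov, flags = ellipsoidal_trim(X)
print(np.flatnonzero(flags))
```

The trimmed mean and covariance give robust location and scatter estimates. The flagged points are candidates for inspection, not automatic deletions.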
Patterns of Care for Biologic-Dosing Outliers and Nonoutliers in Biologic-Naive Patients with Rheumatoid Arthritis.

PubMed

Delate, Thomas; Meyer, Roxanne; Jenkins, Daniel

2017-08-01

Although most biologic medications for patients with rheumatoid arthritis (RA) have recommended fixed dosing, actual biologic dosing may vary among real-world patients, since some patients can receive higher (high-dose outliers) or lower (low-dose outliers) doses than what is recommended in medication package inserts. To describe the patterns of care for biologic-dosing outliers and nonoutliers in biologic-naive patients with RA. This was a retrospective, longitudinal cohort study of patients with RA who were not pregnant and were aged ≥ 18 and < 90 years from an integrated health care delivery system. Patients were newly initiated on adalimumab (ADA), etanercept (ETN), or infliximab (IFX) as index biologic therapy between July 1, 2006, and February 28, 2014. Outlier status was defined as a patient having received at least 1 dose < 90% or > 110% of the approved dose in the package insert at any time during the study period. Baseline patient profiles, treatment exposures, and outcomes were collected during the 180 days before and up to 2 years after biologic initiation and compared across index biologic outlier groups. Patients were followed for at least 1 year, with a subanalysis of those patients who remained as members for 2 years. This study included 434 RA patients with 1 year of follow-up and 372 RA patients with 2 years of follow-up. Overall, the vast majority of patients were female (≈75%) and had similar baseline characteristics. Approximately 10% of patients were outliers in both follow-up cohorts. ETN patients were least likely to become outliers, and ADA patients were most likely to become outliers. Of all outliers during the 1-year follow-up, patients were more likely to be a high-dose outlier (55%) than a low-dose outlier (45%). Median 1- and 2-year adjusted total biologic costs (based on wholesale acquisition costs) were higher for ADA and ETN nonoutliers than for IFX nonoutliers. Biologic persistence was highest for IFX patients.
Charlson Comorbidity Index score, ETN and IFX index biologic, and treatment with a nonbiologic disease-modifying antirheumatic drug (DMARD) before biologic initiation were associated with becoming high- or low-dose outliers (c-statistic = 0.79). Approximately 1 in 10 study patients with RA was identified as a biologic-dosing outlier. Dosing outliers did not appear to have better clinical outcomes compared with nonoutliers. Before initiating outlier biologic dosing, health care providers may better serve their RA patients by prescribing alternate DMARD therapy. This study was sponsored by Janssen Scientific Affairs. It is the policy of Janssen Scientific Affairs to publish all sponsored studies unless they are exploratory studies or are determined a priori for internal use only (e.g., to inform business decisions). Meyer is an employee of Janssen Scientific Affairs and a stockholder in Johnson and Johnson, its parent company. Delate and Jenkins have nothing to disclose. Study concept and design were contributed by Delate and Meyer. Delate took the lead in data collection, along with Jenkins. All authors participated in data analysis. The manuscript was written primarily by Delate, along with Meyer and Jenkins, and was revised by Meyer, along with Delate and Jenkins.
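The outlier definition stated above is simple: a patient qualifies after at least one dose below 90% or above 110% of the package-insert dose. It can be expressed as a small check. The function name and the dose values in the usage example are hypothetical. How the study labeled patients who had both high and low doses is not stated, so the sketch just reports both flags.

```python
def dosing_outlier_flags(doses, approved_dose, low=0.90, high=1.10):
    """Flag a patient as a dosing outlier if any dose is <90% or >110% of the approved dose."""
    ratios = [d / approved_dose for d in doses]
    low_flag = any(r < low for r in ratios)    # at least one low dose
    high_flag = any(r > high for r in ratios)  # at least one high dose
    return {"outlier": low_flag or high_flag, "low": low_flag, "high": high_flag}


# Hypothetical dose histories against a hypothetical approved dose of 40 units
print(dosing_outlier_flags([40, 40, 40], 40))  # nonoutlier
print(dosing_outlier_flags([40, 50, 40], 40))  # high-dose outlier (125%)
```

Doses of exactly 90% or 110% fall inside the allowed band, matching the strict inequalities in the definition.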
RANdom SAmple Consensus (RANSAC) algorithm for material-informatics: application to photovoltaic solar cells.

PubMed

Kaspi, Omer; Yosipof, Abraham; Senderowitz, Hanoch

2017-06-06

An important aspect of chemoinformatics and material-informatics is the usage of machine learning algorithms to build Quantitative Structure Activity Relationship (QSAR) models. The RANdom SAmple Consensus (RANSAC) algorithm is a predictive modeling tool widely used in the image processing field for removing noise from datasets. RANSAC could be used as a "one stop shop" algorithm for developing and validating QSAR models, performing outlier removal, descriptors selection, model development and predictions for test set samples using applicability domain. For "future" predictions (i.e., for samples not included in the original test set) RANSAC provides a statistical estimate for the probability of obtaining reliable predictions, i.e., predictions within a pre-defined number of standard deviations from the true values. In this work we describe the first application of RANSAC in material informatics, focusing on the analysis of solar cells. We demonstrate that for three datasets representing different metal oxide (MO) based solar cell libraries RANSAC-derived models select descriptors previously shown to correlate with key photovoltaic properties and lead to good predictive statistics for these properties. These models were subsequently used to predict the properties of virtual solar cells libraries highlighting interesting dependencies of PV properties on MO compositions.
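To make the RANSAC idea above concrete, here is the classic algorithm for fitting a straight line. It repeatedly fits a model to a minimal random sample, counts the points that agree within a tolerance, and keeps the largest consensus set. It then refits by least squares on that set. This is a generic sketch, not the QSAR workflow from the paper; `tol`, `n_iter` and the function name are assumptions.

```python
import random


def ransac_line(points, n_iter=200, tol=0.5, seed=0):
    """Fit y = a*x + b robustly: sample 2 points, count inliers, refit on the best consensus set."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_iter):
        (x1, y1), (x2, y2) = rng.sample(points, 2)  # minimal sample for a line
        if x1 == x2:
            continue  # a vertical pair cannot define y = a*x + b
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Ordinary least-squares refit on the consensus set only
    n = len(best_inliers)
    mx = sum(x for x, _ in best_inliers) / n
    my = sum(y for _, y in best_inliers) / n
    sxx = sum((x - mx) ** 2 for x, _ in best_inliers)
    sxy = sum((x - mx) * (y - my) for x, y in best_inliers)
    a = sxy / sxx
    return a, my - a * mx, best_inliers


pts = [(x, 2 * x + 1) for x in range(20)] + [(5, 40), (10, -30), (15, 100)]
a, b, inliers = ransac_line(pts)
print(a, b, len(inliers))
```

Points outside the final consensus set are treated as outliers. In a QSAR setting the "line" would be replaced by whatever regression model is being fit.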
Non-negative Matrix Factorization for Self-calibration of Photometric Redshift Scatter in Weak-lensing Surveys

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Le; Yu, Yu; Zhang, Pengjie, E-mail: lezhang@sjtu.edu.cn

Photo-z error is one of the major sources of systematics degrading the accuracy of weak-lensing cosmological inferences. Zhang et al. proposed a self-calibration method combining galaxy–galaxy correlations and galaxy–shear correlations between different photo-z bins. Fisher matrix analysis shows that it can determine the rate of photo-z outliers at a level of 0.01%–1% merely using photometric data, without relying on any prior knowledge. In this paper, we develop a new algorithm to implement this method by solving a constrained nonlinear optimization problem arising in the self-calibration process. Based on the techniques of fixed-point iteration and non-negative matrix factorization, the proposed algorithm can efficiently and robustly reconstruct the scattering probabilities between the true-z and photo-z bins. The algorithm has been tested extensively by applying it to mock data from simulated stage IV weak-lensing projects. We find that the algorithm provides a successful recovery of the scatter rates at the level of 0.01%–1%, and the true mean redshifts of photo-z bins at the level of 0.001, which may satisfy the requirements in future lensing surveys.
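As background for the non-negative matrix factorization mentioned above, the textbook Lee-Seung multiplicative updates factor a non-negative matrix V into non-negative W and H. Each update has a fixed-point form that keeps entries non-negative. This is only the generic technique. It does not include the constraints specific to photo-z scattering probabilities used in the paper, and the function name and iteration count are assumptions.

```python
import numpy as np


def nmf(V, rank, n_iter=500, seed=0, eps=1e-12):
    """Generic Lee-Seung multiplicative updates minimizing ||V - W H||_F with W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + eps  # strictly positive start
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Multiplicative (fixed-point style) updates keep every entry non-negative
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


rng = np.random.default_rng(3)
V = rng.random((6, 2)) @ rng.random((2, 5))  # exactly rank-2, non-negative
W, H = nmf(V, 2)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))  # relative reconstruction error
```

Multiplicative updates are simple and never produce negative entries, but they can converge slowly. Constrained problems such as the one in the paper typically add normalization or projection steps.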
Kfits: a software framework for fitting and cleaning outliers in kinetic measurements.

PubMed

Rimon, Oded; Reichmann, Dana

2018-01-01

Kinetic measurements have played an important role in elucidating biochemical and biophysical phenomena for over a century. While many tools for analysing kinetic measurements exist, most require low noise levels in the data, leaving outlier measurements to be cleaned manually. This is particularly true for protein misfolding and aggregation processes, which are extremely noisy and hence difficult to model. Understanding these processes is paramount, as they are associated with diverse physiological processes and disorders, most notably neurodegenerative diseases. Therefore, a better tool for analysing and cleaning protein aggregation traces is required. Here we introduce Kfits, an intuitive graphical tool for detecting and removing noise caused by outliers in protein aggregation kinetics data. Following its workflow allows the user to quickly and easily clean large quantities of data and receive kinetic parameters for assessment of the results. With minor adjustments, the software can be applied to any type of kinetic measurements, not restricted to protein aggregation. Kfits is implemented in Python and available online at http://kfits.reichmannlab.com, in source at https://github.com/odedrim/kfits/, or by direct installation from PyPI (`pip install kfits`). Contact: danare@mail.huji.ac.il. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

Robustly detecting differential expression in RNA sequencing data using observation weights

PubMed Central

Zhou, Xiaobei; Lindsay, Helen; Robinson, Mark D.

2014-01-01

A popular approach for comparing gene expression levels between (replicated) conditions of RNA sequencing data relies on counting reads that map to features of interest. Within such count-based methods, many flexible and advanced statistical approaches now exist and offer the ability to adjust for covariates (e.g. batch effects). Often, these methods include some sort of ‘sharing of information’ across features to improve inferences in small samples. It is important to achieve an appropriate tradeoff between statistical power and protection against outliers. Here, we study the robustness of existing approaches for count-based differential expression analysis and propose a new strategy based on observation weights that can be used within existing frameworks. The results suggest that outliers can have a global effect on differential analyses. We demonstrate the effectiveness of our new approach with real data and simulated data that reflects properties of real datasets (e.g. dispersion-mean trend) and develop an extensible framework for comprehensive testing of current and future methods. In addition, we explore the origin of such outliers, in some cases highlighting additional biological or technical factors within the experiment. Further details can be downloaded from the project website: http://imlspenticton.uzh.ch/robinson_lab/edgeR_robust/. PMID:24753412
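The observation-weight idea above can be illustrated with a generic Huber-type scheme, not necessarily the exact weighting used in the paper. Each count's Pearson residual is computed under a negative binomial variance mu + phi * mu², where phi = 0 gives the Poisson case. Observations whose residual exceeds a tuning constant k are down-weighted in proportion to k/|r|. The fitted means, the dispersion `phi` and the function names are assumptions. k = 1.345 is the conventional Huber constant.

```python
import math


def pearson_residuals(counts, means, phi=0.0):
    """Pearson residuals (y - mu) / sqrt(mu + phi * mu**2); phi = 0 gives the Poisson case."""
    return [(y - mu) / math.sqrt(mu + phi * mu * mu) for y, mu in zip(counts, means)]


def huber_weights(residuals, k=1.345):
    """Huber-type observation weights: 1 inside [-k, k], k/|r| outside."""
    return [1.0 if abs(r) <= k else k / abs(r) for r in residuals]


# Hypothetical counts for one gene across 5 samples, with made-up fitted means
counts = [10, 12, 9, 11, 60]
means = [10.0] * 5
w = huber_weights(pearson_residuals(counts, means))
print([round(x, 3) for x in w])  # the extreme sample receives a small weight
```

Such weights can be passed back into a weighted model fit, so a single extreme sample no longer dominates the estimate for that gene.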
MIDAS robust trend estimator for accurate GPS station velocities without step detection

PubMed Central

Kreemer, Corné; Hammond, William C.; Gazeaux, Julien

2016-01-01

Automatic estimation of velocities from GPS coordinate time series is becoming required to cope with the exponentially increasing flood of available data, but problems detectable to the human eye are often overlooked. This motivates us to find an automatic and accurate estimator of trend that is resistant to common problems such as step discontinuities, outliers, seasonality, skewness, and heteroscedasticity. Developed here, Median Interannual Difference Adjusted for Skewness (MIDAS) is a variant of the Theil‐Sen median trend estimator, for which the ordinary version is the median of slopes $v_{ij} = (x_j - x_i)/(t_j - t_i)$ computed between all data pairs i > j. For normally distributed data, Theil‐Sen and least squares trend estimates are statistically identical, but unlike least squares, Theil‐Sen is resistant to undetected data problems. To mitigate both seasonality and step discontinuities, MIDAS selects data pairs separated by 1 year. This condition is relaxed for time series with gaps so that all data are used. Slopes from data pairs spanning a step function produce one‐sided outliers that can bias the median. To reduce bias, MIDAS removes outliers and recomputes the median. MIDAS also computes a robust and realistic estimate of trend uncertainty. Statistical tests using GPS data in the rigid North American plate interior show ±0.23 mm/yr root‐mean‐square (RMS) accuracy in horizontal velocity. In blind tests using synthetic data, MIDAS velocities have an RMS accuracy of ±0.33 mm/yr horizontal, ±1.1 mm/yr up, with a 5th percentile range smaller than all 20 automatic estimators tested. Considering its general nature, MIDAS has the potential for broader application in the geosciences. PMID:27668140
MIDAS robust trend estimator for accurate GPS station velocities without step detection.

PubMed

Blewitt, Geoffrey; Kreemer, Corné; Hammond, William C; Gazeaux, Julien

2016-03-01

Automatic estimation of velocities from GPS coordinate time series is becoming required to cope with the exponentially increasing flood of available data, but problems detectable to the human eye are often overlooked. This motivates us to find an automatic and accurate estimator of trend that is resistant to common problems such as step discontinuities, outliers, seasonality, skewness, and heteroscedasticity. Developed here, Median Interannual Difference Adjusted for Skewness (MIDAS) is a variant of the Theil-Sen median trend estimator, for which the ordinary version is the median of slopes $v_{ij} = (x_j - x_i)/(t_j - t_i)$ computed between all data pairs i > j. For normally distributed data, Theil-Sen and least squares trend estimates are statistically identical, but unlike least squares, Theil-Sen is resistant to undetected data problems. To mitigate both seasonality and step discontinuities, MIDAS selects data pairs separated by 1 year. This condition is relaxed for time series with gaps so that all data are used. Slopes from data pairs spanning a step function produce one-sided outliers that can bias the median. To reduce bias, MIDAS removes outliers and recomputes the median. MIDAS also computes a robust and realistic estimate of trend uncertainty. Statistical tests using GPS data in the rigid North American plate interior show ±0.23 mm/yr root-mean-square (RMS) accuracy in horizontal velocity. In blind tests using synthetic data, MIDAS velocities have an RMS accuracy of ±0.33 mm/yr horizontal, ±1.1 mm/yr up, with a 5th percentile range smaller than all 20 automatic estimators tested. Considering its general nature, MIDAS has the potential for broader application in the geosciences.
MIDAS robust trend estimator for accurate GPS station velocities without step detection

NASA Astrophysics Data System (ADS)

Blewitt, Geoffrey; Kreemer, Corné; Hammond, William C.; Gazeaux, Julien

2016-03-01

Automatic estimation of velocities from GPS coordinate time series is becoming required to cope with the exponentially increasing flood of available data, but problems detectable to the human eye are often overlooked. This motivates us to find an automatic and accurate estimator of trend that is resistant to common problems such as step discontinuities, outliers, seasonality, skewness, and heteroscedasticity. Developed here, Median Interannual Difference Adjusted for Skewness (MIDAS) is a variant of the Theil-Sen median trend estimator, for which the ordinary version is the median of slopes $v_{ij} = (x_j - x_i)/(t_j - t_i)$ computed between all data pairs i > j. For normally distributed data, Theil-Sen and least squares trend estimates are statistically identical, but unlike least squares, Theil-Sen is resistant to undetected data problems. To mitigate both seasonality and step discontinuities, MIDAS selects data pairs separated by 1 year. This condition is relaxed for time series with gaps so that all data are used. Slopes from data pairs spanning a step function produce one-sided outliers that can bias the median. To reduce bias, MIDAS removes outliers and recomputes the median. MIDAS also computes a robust and realistic estimate of trend uncertainty. Statistical tests using GPS data in the rigid North American plate interior show ±0.23 mm/yr root-mean-square (RMS) accuracy in horizontal velocity. In blind tests using synthetic data, MIDAS velocities have an RMS accuracy of ±0.33 mm/yr horizontal, ±1.1 mm/yr up, with a 5th percentile range smaller than all 20 automatic estimators tested. Considering its general nature, MIDAS has the potential for broader application in the geosciences.
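The MIDAS abstracts above contrast the ordinary Theil-Sen estimator with a variant that uses only pairs about one year apart and trims outlying slopes before re-taking the median. The sketch below is a simplified illustration of that contrast. It omits MIDAS's relaxation for gappy series and its uncertainty estimate. The 2-sigma cutoff with sigma = 1.4826 × MAD, the pairing tolerance `tol`, and the function names are assumptions.

```python
from statistics import median


def theil_sen(t, x):
    """Ordinary Theil-Sen trend: median slope over all pairs with distinct times."""
    slopes = [(x[j] - x[i]) / (t[j] - t[i])
              for i in range(len(t)) for j in range(i + 1, len(t)) if t[j] != t[i]]
    return median(slopes)


def midas_like(t, x, period=1.0, tol=0.01, k=2.0):
    """Simplified MIDAS-style trend: pairs ~1 period apart, median, MAD trim, re-median."""
    slopes = [(x[j] - x[i]) / (t[j] - t[i])
              for i in range(len(t)) for j in range(i + 1, len(t))
              if abs((t[j] - t[i]) - period) <= tol]
    v0 = median(slopes)
    sigma = 1.4826 * median(abs(s - v0) for s in slopes)  # robust scale from MAD
    if sigma == 0:
        return v0
    kept = [s for s in slopes if abs(s - v0) <= k * sigma]  # drop one-sided step outliers
    return median(kept)


# Synthetic series: 3 mm/yr trend plus a 10 mm step at t = 1.5 yr (made-up data)
t = [0.25 * i for i in range(13)]
x_step = [3.0 * ti + (10.0 if ti >= 1.5 else 0.0) for ti in t]
print(theil_sen(t, x_step), midas_like(t, x_step))
```

In this toy case most all-pairs slopes span the step, so ordinary Theil-Sen is biased upward. Restricting to one-year pairs leaves a majority of unaffected slopes, so the median recovers the true trend.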
The variance of length of stay and the optimal DRG outlier payments.

PubMed

Felder, Stefan

2009-09-01

Prospective payment schemes in health care often include supply-side insurance for cost outliers. In hospital reimbursement, prospective payments for patient discharges, based on their classification into diagnosis-related groups (DRGs), are complemented by outlier payments for long-stay patients. The outlier scheme fixes the length of stay (LOS) threshold, constraining the profit risk of the hospitals. In most DRG systems, this threshold increases with the standard deviation of the LOS distribution. The present paper addresses the adequacy of this DRG outlier threshold rule for risk-averse hospitals with preferences depending on the expected value and the variance of profits. It first shows that the optimal threshold solves the hospital's tradeoff between higher profit risk and lower premium loading payments. It then demonstrates for normally distributed truncated LOS that the optimal outlier threshold indeed decreases with an increase in the standard deviation.
Direct endoscopic video registration for sinus surgery

NASA Astrophysics Data System (ADS)

Mirota, Daniel; Taylor, Russell H.; Ishii, Masaru; Hager, Gregory D.

2009-02-01

Advances in computer vision have made possible robust 3D reconstruction of monocular endoscopic video. These reconstructions accurately represent the visible anatomy and, once registered to pre-operative CT data, enable a navigation system to track directly through video, eliminating the need for an external tracking system. Video registration provides the means for a direct interface between an endoscope and a navigation system and allows a shorter chain of rigid-body transformations to be used to solve the patient/navigation-system registration. To solve this registration step we propose a new 3D-3D registration algorithm based on Trimmed Iterative Closest Point (TrICP) [1] and the z-buffer algorithm [2]. The algorithm takes as input a 3D point cloud of relative scale with the origin at the camera center, an isosurface from the CT, and an initial guess of the scale and location. Our algorithm utilizes only the visible polygons of the isosurface from the current camera location during each iteration to minimize the search area of the target region and robustly reject outliers of the reconstruction. We present example registrations in the sinus passage applicable to both sinus surgery and transnasal surgery. To evaluate our algorithm's performance we compare it to registration via Optotrak and report the closest-point-to-surface distance error. We show our algorithm has a mean closest distance error of 0.2268 mm.
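The Trimmed ICP step named above can be sketched in a few lines. Each iteration matches every source point to its nearest target point and keeps only the best-matching fraction, which discards outliers and non-overlapping points. It then re-solves the rigid pose by the SVD (Kabsch) method. This is a minimal illustration under stated assumptions: it uses brute-force nearest neighbours, a fixed `overlap` fraction and a fixed iteration count. It omits the scale estimation and z-buffer visibility test described in the abstract. Function names are hypothetical.

```python
import numpy as np


def best_rigid_transform(A, B):
    """Least-squares rotation R and translation t mapping points A onto B (Kabsch/SVD)."""
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    H = (A - ca).T @ (B - cb)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # guard against reflections
    R = Vt.T @ D @ U.T
    return R, cb - R @ ca


def trimmed_icp(src, dst, overlap=0.9, n_iter=50):
    """Trimmed ICP: match closest points, keep the best `overlap` fraction, re-solve the pose."""
    R, t = np.eye(3), np.zeros(3)
    n_keep = max(3, int(overlap * len(src)))
    for _ in range(n_iter):
        moved = src @ R.T + t
        # brute-force nearest neighbour in dst for every moved source point
        d2 = ((moved[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        nn = d2.argmin(axis=1)
        resid = d2[np.arange(len(src)), nn]
        keep = np.argsort(resid)[:n_keep]  # trim the worst matches (outliers / non-overlap)
        R, t = best_rigid_transform(src[keep], dst[nn[keep]])
    return R, t


rng = np.random.default_rng(1)
src = rng.random((60, 3))
c, s = np.cos(0.01), np.sin(0.01)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([0.005, -0.003, 0.004])
dst = src @ R_true.T + t_true
src_noisy = np.vstack([src, rng.random((5, 3)) + 3.0])  # 5 spurious reconstruction points
R, t = trimmed_icp(src_noisy, dst)
print(np.round(R, 4), np.round(t, 4))
```

Like any ICP variant, this relies on a reasonable initial guess. The abstract's algorithm likewise takes an initial estimate of scale and location as input.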
Demographically-Based Evaluation of Genomic Regions under Selection in Domestic Dogs

PubMed Central

Freedman, Adam H.; Schweizer, Rena M.; Ortega-Del Vecchyo, Diego; Han, Eunjung; Davis, Brian W.; Gronau, Ilan; Silva, Pedro M.; Galaverni, Marco; Fan, Zhenxin; Marx, Peter; Lorente-Galdos, Belen; Ramirez, Oscar; Hormozdiari, Farhad; Alkan, Can; Vilà, Carles; Squire, Kevin; Geffen, Eli; Kusak, Josip; Boyko, Adam R.; Parker, Heidi G.; Lee, Clarence; Tadigotla, Vasisht; Siepel, Adam; Bustamante, Carlos D.; Harkins, Timothy T.; Nelson, Stanley F.; Marques-Bonet, Tomas; Ostrander, Elaine A.; Wayne, Robert K.; Novembre, John

2016-01-01

Controlling for background demographic effects is important for accurately identifying loci that have recently undergone positive selection. To date, the effects of demography have not yet been explicitly considered when identifying loci under selection during dog domestication. To investigate positive selection on the dog lineage early in domestication, we examined patterns of polymorphism in six canid genomes that were previously used to infer a demographic model of dog domestication. Using an inferred demographic model, we computed false discovery rates (FDR) and identified 349 outlier regions consistent with positive selection at a low FDR. The signals in the top 100 regions were frequently centered on candidate genes related to brain function and behavior, including LHFPL3, CADM2, GRIK3, SH3GL2, MBP, PDE7B, NTAN1, and GLRA1. These regions contained significant enrichments in behavioral ontology categories. The 3rd top hit, CCRN4L, plays a major role in lipid metabolism, which is supported by additional metabolism-related candidates revealed in our scan, including SCP2D1 and PDXC1. Comparing our method to an empirical outlier approach that does not directly account for demography, we found only modest overlap between the two methods, with 60% of empirical outliers having no overlap with our demography-based outlier detection approach. Demography-aware approaches have lower rates of false discovery. Our top candidates for selection, in addition to expanding the set of neurobehavioral candidate genes, include genes related to lipid metabolism, suggesting a dietary target of selection that was important during the period when proto-dogs hunted and fed alongside hunter-gatherers. PMID:26943675
VizieR Online Data Catalog: Outliers and similarity in APOGEE (Reis+, 2018)

NASA Astrophysics Data System (ADS)

Reis, I.; Poznanski, D.; Baron, D.; Zasowski, G.; Shahaf, S.

2017-11-01

t-SNE is a dimensionality reduction algorithm that is particularly well suited for the visualization of high-dimensional datasets. We use t-SNE to visualize our distance matrix. A priori, these distances could define a space with almost as many dimensions as objects, i.e., tens of thousands of dimensions. Obviously, since many stars are quite similar, and their spectra are defined by a few physical parameters, the minimal spanning space might be smaller. By using t-SNE we can examine the structure of our sample projected into 2D. We use our distance matrix as input to the t-SNE algorithm and in return get a 2D map of the objects in our dataset. For each star in a sample of 183,232 APOGEE stars, the catalog lists the APOGEE IDs of the 99 stars with the most similar spectra (according to the method described in the paper), ordered by similarity. (3 data files).
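The catalog described above lists, for each star, the most similar stars ordered by similarity. Given a precomputed distance matrix, such a list follows from sorting each row while excluding the object itself. The sketch below assumes a small dense matrix; the paper's actual distance definition is not reproduced, and the function name is hypothetical. At the catalog's scale (183,232 stars) a dense float64 matrix would need on the order of 270 GB, so a blocked or approximate approach would be needed in practice.

```python
import numpy as np


def most_similar(dist, k):
    """For each object, indices of its k nearest neighbours in a precomputed distance matrix."""
    d = np.array(dist, dtype=float)  # copy so the caller's matrix is untouched
    np.fill_diagonal(d, np.inf)      # an object is not its own neighbour
    return np.argsort(d, axis=1, kind="stable")[:, :k]  # ordered by increasing distance


# Tiny made-up distance matrix for three objects
dist = [[0, 1, 4],
        [1, 0, 2],
        [4, 2, 0]]
print(most_similar(dist, 2))
```

With k = 99 this would reproduce the shape of the catalog's neighbour lists, given the same distance matrix.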
DOE Office of Scientific and Technical Information (OSTI.GOV)

Delaney, Alexander R., E-mail: a.delaney@vumc.nl; Tol, Jim P.; Dahele, Max

Purpose: RapidPlan, a commercial knowledge-based planning solution, uses a model library containing the geometry and associated dosimetry of existing plans. This model predicts achievable dosimetry for prospective patients that can be used to guide plan optimization. However, it is unknown how suboptimal model plans (outliers) influence the predictions or resulting plans. We investigated the effect of, first, removing outliers from the model (cleaning it) and subsequently adding deliberate dosimetric outliers. Methods and Materials: Clinical plans from 70 head and neck cancer patients comprised the uncleaned (UC) Model_UC, from which outliers were cleaned (C) to create Model_C. The last 5 to 40 patients of Model_C were replanned with no attempt to spare the salivary glands. These substantial dosimetric outliers were reintroduced to the model in increments of 5, creating Model_5 to Model_40 (Model_5-40). These models were used to create plans for a 10-patient evaluation group. Plans from Model_UC and Model_C, and Model_C and Model_5-40 were compared on the basis of boost (B) and elective (E) target volume homogeneity indexes (HI_B/HI_E) and mean doses to oral cavity, composite salivary glands (comp_sal) and swallowing (comp_swal) structures. Results: On average, outlier removal (Model_C vs Model_UC) had minimal effects on HI_B/HI_E (0%-0.4%) and sparing of organs at risk (mean dose differences to oral cavity and comp_sal/comp_swal were ≤0.4 Gy). Model_5-10 marginally improved comp_sal sparing, whereas adding a larger number of outliers (Model_20-40) led to deteriorations in comp_sal up to 3.9 Gy, on average. These increases are modest compared to the 14.9 Gy dose increases in the added outlier plans, due to the placement of optimization objectives below the inferior boundary of the dose-volume histogram-predicted range.
Conclusions: Overall, dosimetric outlier removal from, or addition of 5 to 10 outliers to, a 70-patient model had marginal effects on resulting plan quality. Although the addition of >20 outliers deteriorated plan quality, the effect was modest. In this study, RapidPlan demonstrated robustness for moderate proportions of salivary gland dosimetric outliers.
Quality control of the soil moisture probe response patterns from a green infrastructure site using Dynamic Time Warping (DTW) and association rule learning

NASA Astrophysics Data System (ADS)

Yu, Z.; Bedig, A.; Quigley, M.; Montalto, F. A.

2017-12-01

In-situ field monitoring can help to improve the design and management of decentralized Green Infrastructure (GI) systems in urban areas. Because of the vast quantity of continuous data generated from multi-site sensor systems, cost-effective post-construction opportunities for real-time control are limited, and the physical processes that influence the observed phenomena (e.g. soil moisture) are hard to track and control. To derive knowledge efficiently from real-time monitoring data, there is currently a need to develop more efficient approaches to data quality control. In this paper, we employ the dynamic time warping method to compare the similarity of two soil moisture patterns without ignoring the inherent autocorrelation. We also use a rule-based machine learning method to investigate the feasibility of detecting anomalous responses from soil moisture probes. The data were generated from both individual probes and clusters of probes deployed in a GI site in Milwaukee, WI. In contrast to traditional QA/QC methods, which seek to detect outliers at individual time steps, the new method presented here converts the continuous time series into event-based symbolic sequences from which unusual response patterns can be detected. Different matching rules are developed based on different physical characteristics for different seasons. The results suggest that this method could be used as an alternative means to detect sensor failure, to identify extreme events, and to call out abnormal change patterns, compared to intra-probe and inter-probe historical observations. Though this algorithm was developed for soil moisture probes, the same approach could easily be extended to advance QA/QC efficiency for any continuous environmental dataset.
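The dynamic time warping comparison mentioned above can be illustrated with the classic dynamic-programming recurrence. It aligns two series by allowing one-to-many matches in time, so a delayed but otherwise identical response is not penalized the way a point-by-point comparison would penalize it. This sketch uses an absolute-difference local cost and no warping window (both assumptions). It does not include the paper's event-based symbolic sequences or association rules.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: insertion, deletion, or match
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


# Made-up soil moisture responses: the second is the same wetting event, one step earlier
probe_a = [0, 0, 1, 2, 1, 0]
probe_b = [0, 1, 2, 1, 0, 0]
print(dtw_distance(probe_a, probe_b))                     # 0.0: shapes match after warping
print(sum(abs(x - y) for x, y in zip(probe_a, probe_b)))  # pointwise difference is nonzero
```

A large DTW distance between a probe and its own history, or its neighbours, would then be one signal of an anomalous response.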
Automated X-ray quality control of catalytic converters

NASA Astrophysics Data System (ADS)

Shashishekhar, N.; Veselitza, D.

2017-02-01

Catalytic converters are devices attached to the exhaust system of automobile or other engines to eliminate or substantially reduce polluting emissions. They consist of coated substrates enclosed in a stainless steel housing. The substrate is typically made of ceramic honeycombs; however, stainless steel foil honeycombs are also used. The coating is usually a slurry of alumina, silica, rare earth oxides and platinum group metals. The slurry, also known as the washcoat, is applied to the substrate in two doses, one on each end of the substrate; in some cases multiple layers of coating are applied. X-ray imaging is used to inspect the applied coating depth on a substrate to confirm compliance with quality requirements. Automated image analysis techniques are employed to measure the coating depth from the X-ray image. Coating depth is assessed by analysis of attenuation line profiles in the image. Edge detection algorithms with noise reduction and outlier rejection are used to calculate the coating depth at a specified point along an attenuation line profile. Quality control of the product is accomplished using several attenuation line profile regions for coating depth measurements, with individual pass or fail criteria specified for each region.
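The measurement chain described above has three stages: noise reduction, edge detection on each attenuation line profile, and outlier rejection across profiles. It might be sketched as follows. The sketch rests on several assumptions:

- Each profile starts at the substrate end face.
- The edge is taken as the first point where a moving-average-smoothed profile crosses halfway between its extreme levels.
- Profiles whose edge lies more than `k` MADs from the median are rejected.

None of these specific choices, nor the names `smooth`, `edge_index`, `coating_depth` or `pixel_mm`, come from the paper.

```python
from statistics import median


def smooth(profile, w=5):
    """Centered moving average (odd window w) for noise reduction; ends use a shorter window."""
    h = w // 2
    out = []
    for i in range(len(profile)):
        seg = profile[max(0, i - h): i + h + 1]
        out.append(sum(seg) / len(seg))
    return out


def edge_index(profile, w=5):
    """First index where the smoothed profile crosses the midpoint of its extreme levels."""
    s = smooth(profile, w)
    mid = (max(s) + min(s)) / 2.0
    start_above = s[0] > mid
    for i, v in enumerate(s):
        if (v > mid) != start_above:
            return i
    raise ValueError("no edge found in profile")


def coating_depth(profiles, pixel_mm, k=3.0):
    """Median/MAD outlier rejection over several line profiles, then mean edge depth in mm."""
    edges = [edge_index(p) for p in profiles]
    med = median(edges)
    mad = median(abs(e - med) for e in edges)
    if mad == 0:
        kept = [e for e in edges if e == med]
    else:
        kept = [e for e in edges if abs(e - med) <= k * mad]
    return pixel_mm * sum(kept) / len(kept)


# Made-up profiles: coated pixels attenuate more (darker, ~40) than uncoated ones (~100)
profiles = [[40] * 20 + [100] * 20, [40] * 21 + [100] * 19,
            [40] * 19 + [100] * 21, [40] * 35 + [100] * 5]  # last one is aberrant
print(coating_depth(profiles, pixel_mm=0.1))
```

A pass/fail rule per region could then compare the returned depth against that region's limits.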
Tensor voting for image correction by global and local intensity alignment.

PubMed

Jia, Jiaya; Tang, Chi-Keung

2005-01-01

This paper presents a voting method to perform image correction by global and local intensity alignment. The key to our modeless approach is the estimation of global and local replacement functions by reducing the complex estimation problem to the robust 2D tensor voting in the corresponding voting spaces. No complicated model for replacement function (curve) is assumed. Subject to the monotonic constraint only, we vote for an optimal replacement function by propagating the curve smoothness constraint using a dense tensor field. Our method effectively infers missing curve segments and rejects image outliers. Applications using our tensor voting approach are proposed and described. The first application consists of image mosaicking of static scenes, where the voted replacement functions are used in our iterative registration algorithm for computing the best warping matrix. In the presence of occlusion, our replacement function can be employed to construct a visually acceptable mosaic by detecting occlusion which has large and piecewise constant color. Furthermore, by the simultaneous consideration of color matches and spatial constraints in the voting space, we perform image intensity compensation and high contrast image correction using our voting framework, when only two defective input images are given.
Multivariate-$t$ nonlinear mixed models with application to censored multi-outcome AIDS studies.

PubMed

Lin, Tsung-I; Wang, Wan-Lun

2017-10-01

In multivariate longitudinal HIV/AIDS studies, multi-outcome repeated measures on each patient over time may contain outliers, and the viral loads are often subject to an upper or lower limit of detection depending on the quantification assays. In this article, we consider an extension of the multivariate nonlinear mixed-effects model by adopting a joint multivariate-$t$ distribution for random effects and within-subject errors and taking the censoring information of multiple responses into account. The proposed model is called the multivariate-$t$ nonlinear mixed-effects model with censored responses (MtNLMMC), and it allows the analysis of multi-outcome longitudinal data exhibiting nonlinear growth patterns with censorship and fat-tailed behavior. Utilizing the Taylor-series linearization method, a pseudo-data version of the expectation conditional maximization either (ECME) algorithm is developed for iteratively carrying out maximum likelihood estimation. We illustrate our techniques with two data examples from HIV/AIDS studies. Experimental results indicate that the MtNLMMC performs favorably compared to its Gaussian analogue and some existing approaches. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
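Part of the reason a multivariate-$t$ model tolerates outliers can be seen in the standard EM/ECME E-step for any multivariate-$t$ model: each observation receives a weight (ν + p)/(ν + δ²), where δ² is its squared Mahalanobis distance, p the dimension, and ν the degrees of freedom. Distant points are therefore downweighted automatically. The sketch below shows only this generic weight for a 2-D case. It is not the authors' MtNLMMC estimator, which also handles censoring and nonlinear random effects, and the function names are hypothetical.

```python
def mahalanobis_sq_2d(x, mu, cov):
    """Squared Mahalanobis distance for a 2-D point; cov is assumed symmetric."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # explicit 2x2 inverse: (1/det) * [[d, -b], [-b, a]]
    return (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det

def t_weight(delta_sq, nu, p):
    """E-step weight of a multivariate-t observation: large distances get small weights."""
    return (nu + p) / (nu + delta_sq)
```

With ν = 4 and p = 2, a point at δ² = 2 keeps weight 1, while a point at δ² = 100 contributes less than a tenth as much to the parameter updates.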
Aerial video mosaicking using binary feature tracking

NASA Astrophysics Data System (ADS)

Minnehan, Breton; Savakis, Andreas

2015-05-01

Unmanned aerial vehicles are becoming an increasingly attractive platform for many applications as their cost decreases and their capabilities increase. Creating detailed maps from aerial data requires fast and accurate video mosaicking methods. Traditional mosaicking techniques rely on inter-frame homography estimations that are cascaded through the video sequence. Computationally expensive keypoint matching algorithms are often used to determine the correspondence of keypoints between frames. This paper presents a video mosaicking method that uses an object tracking approach for matching keypoints between frames to improve both efficiency and robustness. The proposed tracking method matches local binary descriptors between frames and leverages the spatial locality of the keypoints to simplify the matching process. Our method avoids cascaded errors by determining the homography between each frame and the ground plane rather than the prior frame. The frame-to-ground homography is calculated from the relationship between each point's image coordinates and its estimated location on the ground plane. Robustness to moving objects is integrated into the homography estimation step by detecting anomalies in the motion of keypoints and eliminating the influence of outliers. The resulting mosaics are highly accurate and can be computed in real time.
Investigating outliers to improve conceptual models of bedrock aquifers

NASA Astrophysics Data System (ADS)

Worthington, Stephen R. H.

2018-06-01

Numerical models play a prominent role in hydrogeology, and simplifying assumptions are inevitable when implementing these models. However, there is a risk of oversimplification, where important processes become neglected. Such processes may be associated with outliers, and consideration of outliers can lead to an improved scientific understanding of bedrock aquifers. Using rigorous logic to investigate outliers can help to answer fundamental scientific questions, such as why there are large variations in permeability between different bedrock lithologies.
Outlier removal, sum scores, and the inflation of the Type I error rate in independent samples t tests: the power of alternatives and recommendations.

PubMed

Bakker, Marjan; Wicherts, Jelte M

2014-09-01

In psychology, outliers are often excluded before running an independent samples t test, and data are often nonnormal because of the use of sum scores based on tests and questionnaires. This article concerns the handling of outliers in the context of independent samples t tests applied to nonnormal sum scores. After reviewing common practice, we present results of simulations of artificial and actual psychological data, which show that the removal of outliers based on commonly used Z value thresholds severely increases the Type I error rate. We found Type I error rates above 20% after removing outliers with a threshold value of Z = 2 in a short and difficult test. Inflations of Type I error rates are particularly severe when researchers are given the freedom to alter threshold values of Z after having seen their effects on outcomes. We recommend the use of nonparametric Mann-Whitney-Wilcoxon tests or robust Yuen-Welch tests without removing outliers. These alternatives to independent samples t tests are found to have nominal Type I error rates with a minimal loss of power when no outliers are present in the data, and to have nominal Type I error rates and good power when outliers are present. PsycINFO Database Record (c) 2014 APA, all rights reserved.
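The Yuen-Welch test recommended above compares trimmed means and uses winsorized variances, so no observation has to be deleted by hand. The sketch below computes the textbook Yuen statistic and its Welch-type degrees of freedom with a default 20% trim. It stops at the statistic; a p-value would come from a t distribution with `df` degrees of freedom. The function names are hypothetical. (SciPy users can obtain the same test via `scipy.stats.ttest_ind(a, b, equal_var=False, trim=0.2)`.)

```python
import math
from statistics import mean

def trimmed_stats(x, trim=0.2):
    """Trimmed mean, Yuen's squared standard error d, and effective size h."""
    xs = sorted(x)
    n = len(xs)
    g = int(math.floor(trim * n))        # number trimmed from each tail
    h = n - 2 * g                        # observations left after trimming
    tmean = mean(xs[g:n - g])
    # winsorize: pull the g lowest/highest values in to the nearest retained value
    w = [xs[g]] * g + xs[g:n - g] + [xs[n - g - 1]] * g
    wmean = mean(w)
    wvar = sum((v - wmean) ** 2 for v in w) / (n - 1)
    d = (n - 1) * wvar / (h * (h - 1))
    return tmean, d, h

def yuen_welch(a, b, trim=0.2):
    """Yuen-Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    m1, d1, h1 = trimmed_stats(a, trim)
    m2, d2, h2 = trimmed_stats(b, trim)
    t = (m1 - m2) / math.sqrt(d1 + d2)
    df = (d1 + d2) ** 2 / (d1 ** 2 / (h1 - 1) + d2 ** 2 / (h2 - 1))
    return t, df
```

Replacing the largest value of a ten-point sample by 1000 leaves the statistic unchanged here, because that value is trimmed from the mean and winsorized in the variance. An ordinary t test would be dominated by it.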
Detection of anomalous signals in temporally correlated data (Invited)

NASA Astrophysics Data System (ADS)

Langbein, J. O.

2010-12-01

Detection of transient tectonic signals in data obtained from large geodetic networks requires the ability to detect signals that are both temporally and spatially coherent. In this report I will describe a modification to an existing method that estimates both the coefficients of a temporally correlated noise model and an efficient filter based on that noise model. This filter, when applied to the original time-series, effectively whitens (or flattens) the power spectrum. The filtered data provide the means to calculate running averages, which are then used to detect deviations from the background trends. For large networks, time-series of signal-to-noise ratio (SNR) can be easily constructed since, by filtering, each of the original time-series has been transformed into one that is closer to having a Gaussian distribution with a variance of 1.0. Anomalous intervals may be identified by counting the number of GPS sites for which the SNR exceeds a specified value. For example, if during one time interval 5 out of 20 time-series had SNR>2, this would be considered anomalous; at the roughly 95% confidence level that SNR>2 represents, one would typically expect only about 1 out of 20 time-series to exceed that value by chance. For time intervals with an anomalously large number of high-SNR sites, the spatial distribution of the SNR is mapped to identify the location of the anomalous signal(s) and their degree of spatial clustering. Estimating the filter that should be used to whiten the data requires modification of the existing methods that employ maximum likelihood estimation to determine the temporal covariance of the data. In these methods, it is assumed that the noise components in the data are a combination of white, flicker and random-walk processes and that they are derived from three different and independent sources.
Instead, in this new method, the covariance matrix is constructed assuming that only one source is responsible for the noise and that this source can be represented as a white-noise random-number generator convolved with a filter whose spectral properties are frequency (f) independent at its highest frequencies, 1/f at the middle frequencies, and 1/f² at the lowest frequencies. For data sets with no gaps in their time-series, construction of the covariance and inverse covariance matrices is extremely efficient. Application of the above algorithm to real data potentially involves several iterations, as small tectonic signals of interest are often indistinguishable from background noise. Consequently, simple plots of the time-series of each GPS site are used to identify the largest outliers and signals, independent of their cause. Any analysis of the background noise levels must account for these other signals, and the gross outliers need to be removed.
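The counting rule described above can be made explicit with a binomial tail probability. Each whitened series is treated as approximately standard normal, so each one exceeds |SNR| > 2 with probability of about 0.0455. An interval is flagged when the observed count would be improbable under independent noise. This is a minimal sketch assuming independence between sites and a two-sided threshold. The function names and the `alpha` cutoff are hypothetical, not taken from the report.

```python
import math

def exceed_prob(threshold=2.0):
    """Two-sided probability that a standard normal |Z| exceeds the threshold."""
    return math.erfc(threshold / math.sqrt(2.0))

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(k, n + 1))

def is_anomalous_interval(snrs, threshold=2.0, alpha=0.05):
    """Flag an interval if this many high-SNR sites is unlikely under pure noise."""
    n = len(snrs)
    k = sum(1 for s in snrs if abs(s) > threshold)
    return binomial_tail(k, n, exceed_prob(threshold)) < alpha
```

With 20 sites, one exceedance is entirely ordinary (tail probability of about 0.6), whereas five exceedances have a tail probability near 0.001, matching the example in the text.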
Atlas-Guided Cluster Analysis of Large Tractography Datasets

PubMed Central

Ros, Christian; Güllmar, Daniel; Stenzel, Martin; Mentzel, Hans-Joachim; Reichenbach, Jürgen Rainer

2013-01-01

Diffusion Tensor Imaging (DTI) and fiber tractography are important tools to map the cerebral white matter microstructure in vivo and to model the underlying axonal pathways in the brain with three-dimensional fiber tracts. As the fast and consistent extraction of anatomically correct fiber bundles for multiple datasets is still challenging, we present a novel atlas-guided clustering framework for exploratory data analysis of large tractography datasets. The framework uses a hierarchical cluster analysis approach that exploits the inherent redundancy in large datasets to group fiber tracts time-efficiently. Structural information of a white matter atlas can be incorporated into the clustering to achieve an anatomically correct and reproducible grouping of fiber tracts. This approach not only facilitates the identification of the bundles corresponding to the classes of the atlas; it also enables the extraction of bundles that are not present in the atlas. The new technique was applied to cluster datasets of 46 healthy subjects. Prospects of automatic, anatomically correct, and reproducible clustering are explored. Reconstructed clusters were well separated and showed good correspondence to anatomical bundles. Using the atlas-guided cluster approach, we observed consistent results across subjects with high reproducibility. In order to investigate the outlier elimination performance of the clustering algorithm, scenarios with varying amounts of noise were simulated and clustered with three different outlier elimination strategies. By exploiting the multithreading capabilities of modern multiprocessor systems in combination with novel algorithms, our toolkit clusters large datasets in a couple of minutes. Experiments were conducted to investigate the achievable speedup and to demonstrate the high performance of the clustering framework in a multiprocessing environment. PMID:24386292
Muscle Activity Map Reconstruction from High Density Surface EMG Signals With Missing Channels Using Image Inpainting and Surface Reconstruction Methods.

PubMed

Ghaderi, Parviz; Marateb, Hamid R

2017-07-01

The aim of this study was to reconstruct low-quality high-density surface EMG (HDsEMG) signals, recorded with 2-D electrode arrays, using image inpainting and surface reconstruction methods. It is common for some fraction of the electrodes to provide low-quality signals. We used a variety of image inpainting methods, based on partial differential equations (PDEs), and surface reconstruction methods to reconstruct the time-averaged or instantaneous muscle activity maps of those outlier channels. Two novel reconstruction algorithms were also proposed. HDsEMG signals were recorded from the biceps femoris and brachial biceps muscles during low-to-moderate-level isometric contractions, and some of the channels (5-25%) were randomly marked as outliers. The root-mean-square error (RMSE) between the original and reconstructed maps was then calculated. Overall, the proposed Poisson and wave PDE methods outperformed the other methods (average RMSE 8.7 ± 6.1 μV rms and 7.5 ± 5.9 μV rms) for time-averaged single-differential and monopolar map reconstruction, respectively. Biharmonic spline, the discrete cosine transform, and the Poisson PDE outperformed the other methods for instantaneous map reconstruction. The running times of the proposed Poisson and wave PDE methods, implemented using a Vectorization package, were 4.6 ± 5.7 ms and 0.6 ± 0.5 ms, respectively, for each signal epoch or time sample in each channel. The proposed reconstruction algorithms could be promising new tools for reconstructing muscle activity maps in real-time applications. Proper reconstruction methods could recover the information of low-quality recorded channels in HDsEMG signals.
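The simplest member of the PDE inpainting family is Laplace (harmonic) interpolation. Each outlier channel on the electrode grid is repeatedly replaced by the average of its 4-connected neighbours until the values settle. The sketch below uses plain Jacobi iteration on a small 2-D activity map. It is only an illustration of that idea, not the authors' Poisson or wave PDE methods. The function name, the initial guess (mean of good channels), and the tolerance are assumptions.

```python
def inpaint_channels(grid, missing, max_iter=5000, tol=1e-10):
    """Laplace inpainting of outlier channels on a 2-D electrode map (Jacobi iteration)."""
    rows, cols = len(grid), len(grid[0])
    g = [list(map(float, row)) for row in grid]
    miss = set(missing)
    good = [g[r][c] for r in range(rows) for c in range(cols) if (r, c) not in miss]
    start = sum(good) / len(good)
    for r, c in miss:
        g[r][c] = start                    # discard the corrupted values
    for _ in range(max_iter):
        updated = {}
        for r, c in miss:
            # average of the in-bounds 4-neighbours (discrete Laplace equation)
            nbrs = [g[rr][cc] for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= rr < rows and 0 <= cc < cols]
            updated[(r, c)] = sum(nbrs) / len(nbrs)
        change = 0.0
        for (r, c), v in updated.items():
            change = max(change, abs(v - g[r][c]))
            g[r][c] = v
        if change < tol:
            break
    return g
```

A linear activity gradient is harmonic, so two adjacent interior channels marked as outliers are recovered exactly (to the iteration tolerance), whatever corrupted values they held.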
Privacy-preserving outlier detection through random nonlinear data distortion.

PubMed

Bhaduri, Kanishka; Stefanski, Mark D; Srivastava, Ashok N

2011-02-01

Consider a scenario in which the data owner has some private or sensitive data and wants a data miner to access them for studying important patterns without revealing the sensitive information. Privacy-preserving data mining aims to solve this problem by randomly transforming the data prior to their release to the data miners. Previous works considered only the case of linear data perturbations (additive, multiplicative, or a combination of both) for studying the usefulness of the perturbed output. In this paper, we discuss nonlinear data distortion using potentially nonlinear random data transformation and show how it can be useful for privacy-preserving anomaly detection from sensitive data sets. We develop bounds on the expected accuracy of the nonlinear distortion and also quantify privacy by using standard definitions. The highlight of this approach is that it allows a user to control the amount of privacy by varying the degree of nonlinearity. We show how our general transformation can be used for anomaly detection in practice for two specific problem instances: a linear model and a popular nonlinear model using the sigmoid function. We also analyze the proposed nonlinear transformation in full generality and then show that, for specific cases, it is distance preserving. A main contribution of this paper is the discussion of the relationship between the invertibility of a transformation and privacy preservation, and the application of these techniques to outlier detection. The experiments conducted on real-life data sets demonstrate the effectiveness of the approach.

The search for loci under selection: trends, biases and progress.

PubMed

Ahrens, Collin W; Rymer, Paul D; Stow, Adam; Bragg, Jason; Dillon, Shannon; Umbers, Kate D L; Dudaniec, Rachael Y

2018-03-01

Detecting genetic variants under selection using FST outlier analysis (OA) and environmental association analyses (EAAs) are popular approaches that provide insight into the genetic basis of local adaptation. Despite the frequent use of OA and EAA approaches and their increasing attractiveness for detecting signatures of selection, their application to field-based empirical data has not been synthesized. Here, we review 66 empirical studies that use Single Nucleotide Polymorphisms (SNPs) in OA and EAA. We report trends and biases across biological systems, sequencing methods, approaches, parameters, environmental variables and their influence on detecting signatures of selection. We found striking variability in both the use and reporting of environmental data and statistical parameters. For example, linkage disequilibrium among SNPs and numbers of unique SNP associations identified with EAA were rarely reported. The proportion of putatively adaptive SNPs detected varied widely among studies and decreased with the number of SNPs analysed. We found that genomic sampling effort had a greater impact than biological sampling effort on the proportion of identified SNPs under selection. OA identified a higher proportion of outliers when more individuals were sampled, but this was not the case for EAA. To facilitate repeatability, interpretation and synthesis of studies detecting selection, we recommend that future studies consistently report geographical coordinates, environmental data, model parameters, linkage disequilibrium, and measures of genetic structure. Identifying standards for how OA and EAA studies are designed and reported will aid future transparency and comparability of SNP-based selection studies and help to progress landscape and evolutionary genomics. © 2018 John Wiley & Sons Ltd.
Accurate motion parameter estimation for colonoscopy tracking using a regression method

NASA Astrophysics Data System (ADS)

Liu, Jianfei; Subramanian, Kalpathi R.; Yoo, Terry S.

2010-03-01

Co-located optical and virtual colonoscopy images have the potential to provide important clinical information during routine colonoscopy procedures. In our earlier work, we presented an optical flow based algorithm to compute egomotion from live colonoscopy video, permitting navigation and visualization of the corresponding patient anatomy. In the original algorithm, motion parameters were estimated using the traditional least-squares (LS) procedure, which can be unstable in the presence of optical flow vectors with large errors. In the improved algorithm, we use the Least Median of Squares (LMS) method, a robust regression method, for motion parameter estimation. Using the LMS method, we iteratively analyze and converge toward the main distribution of the flow vectors while disregarding outliers. We show through three experiments the improvement in tracking results obtained using the LMS method in comparison with the LS estimator. The first experiment demonstrates better spatial accuracy in positioning the virtual camera in the sigmoid colon. The second and third experiments demonstrate the robustness of this estimator, resulting in longer tracked sequences: from 300 to 1310 in the ascending colon, and from 410 to 1316 in the transverse colon.
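Least Median of Squares replaces the sum of squared residuals with their median. As a result, the fit is unaffected as long as fewer than half of the points are gross outliers. The sketch below applies the LMS criterion to a two-parameter line fit, searching exhaustively over lines through pairs of points. This is a common way to approximate LMS on small problems. It is an illustration of the criterion only, not the authors' egomotion model or their iterative procedure, and `lms_line` is a hypothetical name.

```python
from itertools import combinations
from statistics import median

def lms_line(xs, ys):
    """Fit y = a + b*x by minimising the median squared residual over candidate lines."""
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue                       # vertical candidate line, skip
        b = (ys[j] - ys[i]) / (xs[j] - xs[i])
        a = ys[i] - b * xs[i]
        crit = median([(y - (a + b * x)) ** 2 for x, y in zip(xs, ys)])
        if best is None or crit < best[0]:
            best = (crit, a, b)
    return best[1], best[2]
```

With points on y = 2x + 1 and three of ten replaced by wild values, `lms_line` still returns intercept 1 and slope 2, whereas an LS fit would be pulled toward the outliers.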
Classifier dependent feature preprocessing methods

NASA Astrophysics Data System (ADS)

Rodriguez, Benjamin M., II; Peterson, Gilbert L.

2008-04-01

In mobile applications, computational complexity is an issue that prevents sophisticated algorithms from being implemented on these devices. This paper provides an initial solution to applying pattern recognition systems on mobile devices by combining existing preprocessing algorithms for recognition. In pattern recognition systems, it is essential to properly apply feature preprocessing tools prior to training classification models in order to reduce computational complexity and improve the overall classification accuracy. The feature preprocessing tools extended to the mobile environment are feature ranking, feature extraction, data preparation and outlier removal. Most desktop systems today can run the majority of the available classification algorithms without processing concerns, but the same is not true of mobile platforms. As an application of pattern recognition for mobile devices, the recognition system targets the problem of steganalysis: determining whether an image contains hidden information. The performance measurements show that feature preprocessing increases the overall steganalysis classification accuracy by an average of 22%. The methods in this paper are tested on a workstation and a Nokia 6620 (Symbian operating system) camera phone with similar results.
42 CFR 484.240 - Methodology used for the calculation of the outlier payment.

Code of Federal Regulations, 2010 CFR

2010-10-01

... for each case-mix group. (b) The outlier threshold for each case-mix group is the episode payment... the same for all case-mix groups. (c) The outlier payment is a proportion of the amount of estimated...
Online gesture spotting from visual hull data.

PubMed

Peng, Bo; Qian, Gang

2011-06-01

This paper presents a robust framework for online full-body gesture spotting from visual hull data. Using view-invariant pose features as observations, hidden Markov models (HMMs) are trained for gesture spotting from continuous movement data streams. Two major contributions of this paper are 1) view-invariant pose feature extraction from visual hulls, and 2) a systematic approach to automatically detecting and modeling specific nongesture movement patterns and using their HMMs for outlier rejection in gesture spotting. The experimental results have shown the view-invariance property of the proposed pose features for both training poses and new poses unseen in training, as well as the efficacy of using specific nongesture models for outlier rejection. Using the IXMAS gesture data set, the proposed framework has been extensively tested and the gesture spotting results are superior to those reported on the same data set obtained using existing state-of-the-art gesture spotting methods.
On Visualizing Mixed-Type Data: A Joint Metric Approach to Profile Construction and Outlier Detection

ERIC Educational Resources Information Center

Grané, Aurea; Romera, Rosario

2018-01-01

Survey data are usually of mixed type (quantitative, multistate categorical, and/or binary variables). Multidimensional scaling (MDS) is one of the most widely used methodologies to visualize the profile structure of the data. Since the 1960s, MDS methods have been introduced in the literature, initially in publications in the psychometrics area.…
Identification of Differential Item Functioning in Multiple-Group Settings: A Multivariate Outlier Detection Approach

ERIC Educational Resources Information Center

Magis, David; De Boeck, Paul

2011-01-01

We focus on the identification of differential item functioning (DIF) when more than two groups of examinees are considered. We propose to consider items as elements of a multivariate space, where DIF items are outlying elements. Following this approach, the situation of multiple groups is a quite natural case. A robust statistics technique is…
BKG/DGFI Combination Center Annual Report 2012

NASA Technical Reports Server (NTRS)

Bachmann, Sabine; Loesler, Michael; Heinkelmann, Robert; Gerstl, Michael

2013-01-01

This report summarizes the activities of the BKG/DGFI Combination Center of the Federal Agency for Cartography and Geodesy (Bundesamt fuer Kartographie und Geodaesie, BKG) and the German Geodetic Research Institute (Deutsches Geodaetisches Forschungsinstitut, DGFI) in 2011 and outlines the planned activities for the year 2012. The main focus was to stabilize outlier detection and to update the Web presentation of the combined products.
Extensive cross-talk and global regulators identified from an analysis of the integrated transcriptional and signaling network in Escherichia coli.

PubMed

Antiqueira, Lucas; Janga, Sarath Chandra; Costa, Luciano da Fontoura

2012-11-01

To understand the regulatory dynamics of transcription factors (TFs) and their interplay with other cellular components, we have integrated transcriptional, protein-protein and the allosteric or equivalent interactions which mediate the physiological activity of TFs in Escherichia coli. To study this integrated network, we computed a set of network measurements followed by principal component analysis (PCA), investigated the correlations between network structure and dynamics, and carried out a procedure for motif detection. In particular, we show that outliers identified in the integrated network based on their network properties correspond to previously characterized global transcriptional regulators. Furthermore, outliers are highly and widely expressed across conditions, thus supporting their global nature in controlling many genes in the cell. Motifs revealed that TFs not only interact physically with each other but also obtain feedback from signals delivered by signaling proteins, supporting the extensive cross-talk between different types of networks. Our analysis can lead to the development of a general framework for detecting and understanding global regulatory factors in regulatory networks and reinforces the importance of integrating multiple types of interactions in underpinning the interrelationships between them.
Simple automatic strategy for background drift correction in chromatographic data analysis.

PubMed

Fu, Hai-Yan; Li, He-Dong; Yu, Yong-Jie; Wang, Bing; Lu, Peng; Cui, Hua-Peng; Liu, Ping-Ping; She, Yuan-Bin

2016-06-03

Chromatographic background drift correction, which influences peak detection and time shift alignment results, is a critical stage in chromatographic data analysis. In this study, an automatic background drift correction methodology was developed. Local minimum values in a chromatogram were first detected and organized as a new baseline vector. Iterative optimization was then employed to identify outliers in this vector, which belong to the chromatographic peaks, and to update these outliers in the baseline until convergence. The optimized baseline vector was finally expanded to the length of the original chromatogram, with linear interpolation used to estimate the background drift in the chromatogram. The principle underlying the proposed method was confirmed using a complex gas chromatographic dataset. Finally, the proposed approach was applied to eliminate background drift in liquid chromatography quadrupole time-of-flight samples used in a metabolic study of Escherichia coli. The proposed method performed comparably to three classical techniques: morphological weighted penalized least squares, the moving window minimum value strategy, and background drift correction by orthogonal subspace projection. The proposed method allows almost automatic implementation of background drift correction, which is convenient for practical use. Copyright © 2016 Elsevier B.V. All rights reserved.
Identification of unusual events in multichannel bridge monitoring data using wavelet transform and outlier analysis

NASA Astrophysics Data System (ADS)

Omenzetter, Piotr; Brownjohn, James M. W.; Moyo, Pilate

2003-08-01

Continuously operating instrumented structural health monitoring (SHM) systems are becoming a practical alternative to visual inspection for assessing the condition and soundness of civil infrastructure. However, converting the large amount of data from an SHM system into usable information is a great challenge that requires special signal processing techniques. This study is devoted to the identification of abrupt, anomalous and potentially onerous events in the time histories of static, hourly sampled strains recorded by a multi-sensor SHM system installed in a major bridge structure in Singapore and operating continuously for a long time. Such events may result, among other causes, from sudden settlement of foundations, ground movement, excessive traffic load or failure of post-tensioning cables. A method of outlier detection in multivariate data has been applied to the problem of finding and localizing sudden events in the strain data. A wavelet transform has been used to sharply discriminate abrupt strain changes from slowly varying ones. The proposed method has been successfully tested using known events recorded during construction of the bridge, and later effectively used for detection of anomalous post-construction events.
Measurement Consistency from Magnetic Resonance Images

PubMed Central

Chung, Dongjun; Chung, Moo K.; Durtschi, Reid B.; Lindell, R. Gentry; Vorperian, Houri K.

2010-01-01

Rationale and Objectives: In quantifying medical images, length-based measurements are still obtained manually. Because of possible human error, a measurement protocol is required to guarantee the consistency of measurements. In this paper, we review various statistical techniques that can be used to determine measurement consistency. The focus is on detecting a possible measurement bias and determining the robustness of the procedures to outliers. Materials and Methods: We review correlation analysis, linear regression, the Bland-Altman method, the paired t-test, and analysis of variance (ANOVA). These techniques were applied to measurements, obtained by two raters, of head and neck structures from magnetic resonance images (MRI). Results: Correlation analysis and linear regression were shown to be insufficient for detecting measurement inconsistency; they are also very sensitive to outliers. The widely used Bland-Altman method is a visualization technique, so it lacks numerical quantification. The paired t-test tends to be sensitive to small measurement bias. On the other hand, ANOVA performs well even under small measurement bias. Conclusion: In almost all cases, using only one method is insufficient, and it is recommended to use several methods simultaneously. In general, ANOVA performs the best. PMID:18790405
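Two of the reviewed techniques reduce to simple summaries of the between-rater differences. The Bland-Altman bias with its conventional 95% limits of agreement (bias ± 1.96 SD) is one. The paired t statistic (mean difference divided by its standard error) is the other. The sketch below computes both for two raters' measurements of the same structures. The function names are hypothetical, and the 1.96 multiplier is the customary normal approximation.

```python
import math
from statistics import mean, stdev

def bland_altman(r1, r2):
    """Mean difference (bias) and 95% limits of agreement between two raters."""
    diffs = [a - b for a, b in zip(r1, r2)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

def paired_t(r1, r2):
    """Paired t statistic for a systematic difference between raters."""
    diffs = [a - b for a, b in zip(r1, r2)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
```

For rater readings `[10, 12, 14, 16]` and `[9, 11, 13, 15.5]`, the bias is 0.875 with limits of about 0.385 and 1.365, and the paired t statistic is 7.0. A single discordant pair would widen the limits considerably, which illustrates the outlier sensitivity discussed above.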
Moving standard deviation and moving sum of outliers as quality tools for monitoring analytical precision.

PubMed

Liu, Jiakai; Tan, Chin Hon; Badrick, Tony; Loh, Tze Ping

2018-02-01

An increase in analytical imprecision (expressed as CVa) can introduce additional variability (i.e. noise) into patient results, which poses a challenge to the optimal management of patients. Relatively little work has been done to address the need for continuous monitoring of analytical imprecision. Through numerical simulations, we describe the use of the moving standard deviation (movSD) and a recently described moving sum of outlier (movSO) patient results as means for detecting increased analytical imprecision, and compare their performance against internal quality control (QC) and the average of normals (AoN) approaches. The power of detecting an increase in CVa is suboptimal under routine internal QC procedures. The AoN technique almost always had the highest average number of patient results affected before error detection (ANPed), indicating that it generally had the worst capability for detecting an increased CVa. On the other hand, the movSD and movSO approaches were able to detect an increased CVa at significantly lower ANPed, particularly for measurands that displayed a relatively small ratio of biological variation to CVa. CONCLUSION: The movSD and movSO approaches are effective in detecting an increase in CVa for high-risk measurands with small biological variation. Their performance is relatively poor when the biological variation is large. However, the clinical risk of an increase in analytical imprecision is attenuated for these measurands, as increased analytical imprecision will add only marginally to the total variation and is less likely to affect clinical care. Copyright © 2017 The Canadian Society of Clinical Chemists. Published by Elsevier Inc. All rights reserved.
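Both monitoring statistics are rolling-window summaries over the stream of patient results. The sketch below computes a moving standard deviation and a moving count of "outlier" results. Here an outlier is assumed, for illustration only, to be any result outside a fixed pair of limits; the actual movSO definition and the alarm thresholds would come from the cited method. Function names, window size, and limits are hypothetical, and windows that are not yet full yield `None`.

```python
from collections import deque
from statistics import stdev

def moving_sd(values, window):
    """movSD: standard deviation of the last `window` patient results."""
    buf, out = deque(maxlen=window), []
    for v in values:
        buf.append(v)
        out.append(stdev(list(buf)) if len(buf) == window else None)
    return out

def moving_sum_outliers(values, window, low, high):
    """movSO sketch: number of results outside [low, high] in the last `window` results."""
    buf, out = deque(maxlen=window), []
    for v in values:
        buf.append(1 if (v < low or v > high) else 0)
        out.append(sum(buf) if len(buf) == window else None)
    return out
```

An alarm would be raised when either series crosses a control limit chosen, for example, by simulation as in the study. A rise in imprecision shows up as a growing movSD and a growing count of out-of-limit results.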
Baseline Estimation and Outlier Identification for Halocarbons

NASA Astrophysics Data System (ADS)

Wang, D.; Schuck, T.; Engel, A.; Gallman, F.

2017-12-01

The aim of this paper is to build a baseline model for halocarbons and to statistically identify outliers under specific conditions. Time series of regional CFC-11 and chloromethane measurements taken over the last 4 years at two locations, a monitoring station northwest of Frankfurt am Main (Germany) and the Mace Head station (Ireland), are discussed. In addition to analyzing the time series of CFC-11 and chloromethane, and more importantly, a statistical approach to outlier identification is introduced in order to obtain a better estimate of the baseline. A second-order polynomial plus harmonics is fitted to the CFC-11 and chloromethane mixing ratio data. Measurements at a large distance from the fitted curve are regarded as outliers and flagged. Subject to a specified requirement, the fit is repeated iteratively without the flagged measurements until no additional outliers are found. Both the model fitting and the proposed outlier identification method are implemented in Python. During the period, CFC-11 shows a gradual downward trend, whereas the mixing ratios of chloromethane show a slight upward trend. The concentration of chloromethane also has a strong seasonal variation, mostly due to the seasonal cycle of OH. The use of this statistical method has a considerable effect on the results. This method efficiently identifies a series of outliers according to the standard deviation requirements. After removing the outliers, the fitted curves and trend estimates are more reliable.
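The fitting loop described above can be sketched as follows. A quadratic trend plus annual harmonics is fitted, points farther than k residual standard deviations from the curve are flagged, and the fit is repeated on the remaining points until nothing new is flagged. This minimal sketch assumes time in years, two harmonics, and k = 2; the authors' exact harmonic count and flagging rule are not given in the abstract, and all function names are hypothetical. To keep it dependency-free, it solves the normal equations with a small Gaussian elimination instead of a numerical library.

```python
import math

def design_row(t, n_harm=2):
    """Quadratic trend plus n_harm annual harmonics; t is time in years."""
    row = [1.0, t, t * t]
    for k in range(1, n_harm + 1):
        row += [math.cos(2 * math.pi * k * t), math.sin(2 * math.pi * k * t)]
    return row

def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x

def least_squares(rows, y):
    """Ordinary least squares via the normal equations."""
    p = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(ata, aty)

def baseline_flags(t, y, k=2.0, n_harm=2, max_iter=50):
    """Iteratively refit without flagged points; returns (keep mask, coefficients)."""
    keep = [True] * len(y)
    for _ in range(max_iter):
        idx = [i for i in range(len(y)) if keep[i]]
        coef = least_squares([design_row(t[i], n_harm) for i in idx], [y[i] for i in idx])
        fit = [sum(c * v for c, v in zip(coef, design_row(ti, n_harm))) for ti in t]
        resid = [y[i] - fit[i] for i in idx]
        sd = math.sqrt(sum(r * r for r in resid) / (len(resid) - len(coef)))
        new_out = [i for i in idx if abs(y[i] - fit[i]) > k * sd]
        if not new_out:
            break
        for i in new_out:
            keep[i] = False
    return keep, coef
```

Flagged points are excluded from the next fit but keep their place in the mask, so the final curve and trend estimate rest only on the baseline measurements.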
Efficient super-resolution image reconstruction applied to surveillance video captured by small unmanned aircraft systems

NASA Astrophysics Data System (ADS)

He, Qiang; Schultz, Richard R.; Chu, Chee-Hung Henry

2008-04-01

The concept underlying super-resolution image reconstruction is to recover a high-resolution image from a series of low-resolution images via between-frame subpixel image registration. In this paper, we propose a novel and efficient super-resolution algorithm and then apply it to the reconstruction of real video data captured by a small Unmanned Aircraft System (UAS). Small UAS aircraft generally have a wingspan of less than four meters, so these vehicles and their payloads can be buffeted by even light winds, resulting in potentially unstable video. This algorithm is based on a coarse-to-fine strategy, in which a coarsely super-resolved image sequence is first built from the original video data by image registration and bicubic interpolation between a fixed reference frame and every additional frame. It is well known that the median filter is robust to outliers. By calculating pixel-wise medians in the coarsely super-resolved image sequence, we can restore a refined super-resolved image. The primary advantage is that this is a noniterative algorithm, unlike traditional approaches based on computationally intensive iterative algorithms. Experimental results show that our coarse-to-fine super-resolution algorithm is not only robust but also very efficient. In comparison with five well-known super-resolution algorithms, namely the robust super-resolution algorithm, bicubic interpolation, projection onto convex sets (POCS), the Papoulis-Gerchberg algorithm, and the iterated back projection algorithm, our proposed algorithm offers both high efficiency and robustness, as well as good visual performance. This is particularly useful for applying super-resolution to UAS surveillance video, where real-time processing is highly desired.
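The refinement step described above is a pixel-wise median across the coarsely super-resolved frames. The sketch below shows only that step. It assumes the frames have already been registered to the reference frame and interpolated onto a common high-resolution grid (represented as equal-sized nested lists), and `pixelwise_median` is a hypothetical name.

```python
from statistics import median

def pixelwise_median(frames):
    """Combine registered, equal-sized frames by taking the median at every pixel."""
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[median([f[r][c] for f in frames]) for c in range(cols)] for r in range(rows)]
```

With three frames in which one pixel is corrupted to 255 in a single frame, the median ignores the corrupted value, whereas a pixel-wise mean would carry roughly a third of it into the result.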
A real-time, practical sensor fault-tolerant module for robust EMG pattern recognition.

PubMed

Zhang, Xiaorong; Huang, He

2015-02-19

Unreliability of surface EMG recordings over time is a challenge for applying EMG pattern recognition (PR)-controlled prostheses in clinical practice. Our previous study proposed a sensor fault-tolerant module (SFTM) that utilizes redundant information in multiple EMG signals. The SFTM consists of multiple sensor fault detectors and a self-recovery mechanism that identifies anomalies in EMG signals and removes the recordings of the disturbed signals from the input of the pattern classifier to recover PR performance. While the SFTM showed great promise, the previous design was impractical. A practical SFTM has to be fast, lightweight, automatic, and robust under different conditions, with or without disturbances. This paper presents a real-time, practical SFTM for robust EMG PR. A novel fast LDA retraining algorithm and a fully automatic sensor fault detector based on outlier detection were developed, which allowed the SFTM to promptly detect disturbances and recover PR performance immediately. These SFTM components were then integrated with the EMG PR module and tested in real time on five able-bodied subjects and a transradial amputee, classifying multiple hand and wrist motions under conditions with different disturbance types and levels. The proposed fast LDA retraining algorithm significantly shortened the retraining time from nearly 1 s to less than 4 ms when tested on the embedded system prototype, which demonstrated the feasibility of a nearly "zero-delay" SFTM that is imperceptible to users. The results of the real-time tests suggested that the SFTM was able to handle the different types of disturbances investigated in this study and significantly improved classification performance when one or multiple EMG signals were disturbed. In addition, the SFTM could also maintain the system's classification performance when there was no disturbance.
This paper presents a real-time, lightweight, and automatic SFTM, which paves the way for reliable and robust EMG PR for prosthesis control.
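The abstract does not say how the fast LDA retraining works. One plausible approach, offered purely as an assumption, exploits the fact that an LDA classifier with a pooled covariance depends only on class means, priors, and that covariance: when disturbed channels are removed, the new classifier follows from sub-selecting the stored statistics, with no pass over the raw training data. The class `SharedCovLDA`, its `drop_features` method, and the synthetic data are hypothetical.

```python
import numpy as np


class SharedCovLDA:
    """Hypothetical pooled-covariance LDA whose stored statistics allow fast
    retraining after features (e.g. EMG channels) are removed."""

    def __init__(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes = np.unique(y)
        # Per-class means, shape (num_classes, num_features).
        self.means = np.stack([X[y == c].mean(axis=0) for c in self.classes])
        # Class priors estimated from class frequencies.
        self.priors = np.array([np.mean(y == c) for c in self.classes])
        # Pooled within-class covariance, computed once from the training data.
        centered = X - self.means[np.searchsorted(self.classes, y)]
        self.cov = centered.T @ centered / (len(X) - len(self.classes))
        self.active = np.arange(X.shape[1])
        self._solve()

    def _solve(self):
        """Recompute weights w_k = S^-1 m_k and biases for the active features."""
        idx = self.active
        cov = self.cov[np.ix_(idx, idx)]
        mu = self.means[:, idx]
        self.W = np.linalg.solve(cov, mu.T)  # shape (num_active, num_classes)
        # b_k = -0.5 * m_k^T S^-1 m_k + log(prior_k)
        self.b = -0.5 * np.sum(mu * self.W.T, axis=1) + np.log(self.priors)

    def drop_features(self, bad):
        """Exclude flagged features using only the stored statistics."""
        bad = set(bad)
        self.active = np.array([i for i in self.active if i not in bad])
        self._solve()

    def predict(self, X):
        X = np.asarray(X, dtype=float)[:, self.active]
        return self.classes[np.argmax(X @ self.W + self.b, axis=1)]


# Demo: three classes, four features; feature 3 is disturbed at test time.
rng = np.random.default_rng(1)
y_train = np.repeat([0, 1, 2], 200)
X_train = 4.0 * y_train[:, None] + rng.normal(0, 1, (600, 4))
y_test = np.repeat([0, 1, 2], 100)
X_test = 4.0 * y_test[:, None] + rng.normal(0, 1, (300, 4))
X_test[:, 3] = rng.normal(0, 50, 300)  # simulated sensor fault on feature 3

model = SharedCovLDA(X_train, y_train)
acc_before = np.mean(model.predict(X_test) == y_test)
model.drop_features([3])
acc_after = np.mean(model.predict(X_test) == y_test)
fresh = SharedCovLDA(X_train[:, :3], y_train)  # full retrain, for comparison
```

Because the pooled covariance of a feature subset is exactly the corresponding submatrix of the full pooled covariance, `drop_features` yields the same classifier as a full retrain (`fresh`) while only solving a small linear system. How the actual SFTM detects faulty channels is not modeled here.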
Kepler AutoRegressive Planet Search

NASA Astrophysics Data System (ADS)

Caceres, Gabriel Antonio; Feigelson, Eric

2016-01-01

The Kepler AutoRegressive Planet Search (KARPS) project uses statistical methodology associated with autoregressive (AR) processes to model Kepler light curves in order to improve exoplanet transit detection in systems with high stellar variability. We also introduce a planet-search algorithm to detect transits in time-series residuals after application of the AR models. One of the main obstacles in detecting faint planetary transits is the intrinsic variability of the host star. The variability displayed by many stars may have autoregressive properties, wherein later flux values are correlated with previous ones. Our analysis procedure consists of three steps: pre-processing of the data to remove discontinuities, gaps, and outliers; AR-type model selection and fitting; and transit signal search of the residuals using a new Transit Comb Filter (TCF) that replaces traditional box-finding algorithms. The analysis procedures are applied to a portion of the publicly available Kepler light curve data for the full 4-year mission duration. The methods have been tested on a subset of Kepler Objects of Interest (KOI) systems, classified as either planetary 'candidates' or 'false positives' by the Kepler Team, as well as on a random sample of unclassified systems. We find that ARMA-type modeling successfully reduces stellar variability, by a factor of 10 or more in active stars and by smaller factors in more quiescent stars. A typical quiescent Kepler star has an interquartile range (IQR) of ~10 e-/sec, which may improve slightly after modeling, while stars with IQR ranging from 20 to 50 e-/sec show improvements of 20% up to 70%. High-activity stars (IQR exceeding 100 e-/sec) improve markedly. A periodogram based on the TCF is constructed to concentrate the signal of the periodic spikes that transits produce in the TCF output. When a periodic transit is found, the model is displayed on a standard period-folded averaged light curve.
Our findings to date on real-data tests of the KARPS methodology will be discussed, including confirmation of some Kepler Team 'candidate' planets. We also present cases of new possible planetary signals.
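The AR-modeling step, and the IQR used above to quantify the reduction in variability, can be illustrated with an ordinary least-squares AR(p) fit. This is only a sketch under simplifying assumptions: KARPS uses ARMA/ARIMA-type models with model selection, while this example fits a plain AR model of fixed order to a synthetic series; `fit_ar`, `iqr`, and the simulated light curve are hypothetical, and the TCF search is not shown.

```python
import numpy as np


def fit_ar(flux, p):
    """Least-squares AR(p) fit: x_t = c + sum_{i=1..p} phi_i * x_{t-i} + e_t.

    Returns (c, phi, residuals); residuals has length len(flux) - p.
    """
    x = np.asarray(flux, dtype=float)
    n = len(x)
    if p < 1 or n <= 2 * p:
        raise ValueError("need p >= 1 and a series longer than 2 * p")
    # Column i-1 of `lags` holds x_{t-i} for t = p, ..., n-1.
    lags = np.column_stack([x[p - i:n - i] for i in range(1, p + 1)])
    design = np.column_stack([np.ones(n - p), lags])
    target = x[p:]
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residuals = target - design @ coef
    return coef[0], coef[1:], residuals


def iqr(values):
    """Interquartile range, the variability measure quoted in the abstract."""
    q75, q25 = np.percentile(values, [75, 25])
    return q75 - q25


# Demo: a synthetic AR(2) series standing in for a variable stellar light curve.
rng = np.random.default_rng(2)
n, burn = 5000, 500
x = np.zeros(n + burn)
noise = rng.normal(0, 1, n + burn)
for t in range(2, n + burn):
    x[t] = 1.5 * x[t - 1] - 0.75 * x[t - 2] + noise[t]
flux = 100.0 + x[burn:]  # discard burn-in, add a constant baseline

c, phi, residuals = fit_ar(flux, p=2)
```

The residuals, not the raw flux, would then be searched for transits; for this synthetic series their IQR is several times smaller than that of the input, which is the kind of variability reduction the abstract reports for active stars.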
The influence of outliers on results of wet deposition measurements as a function of measurement strategy

NASA Astrophysics Data System (ADS)

Slanina, J.; Möls, J. J.; Baard, J. H.

The results of a wet deposition monitoring experiment, carried out with eight identical wet-only precipitation samplers operating on the basis of 24 h samples, have been used to investigate the accuracy and uncertainties of wet deposition measurements. The experiment was conducted near Lelystad, The Netherlands, over the period 1 March 1983 to 31 December 1985. By rearranging the data for one to eight samplers and sampling periods of 1 day to 1 month, both systematic and random errors were investigated as a function of measuring strategy. The results were observed to follow a Gaussian distribution. Outliers, detected by a Dixon test (α = 0.05), strongly influenced both the yearly averaged results and the standard deviation of this average as a function of the number of samplers and the length of the sampling period. Using one sampler, the systematic bias typically varies from 2 to 20% for bulk elements and from 10 to 500% for trace elements. Severe problems are encountered in the case of Zn, Cu, Cr, Ni, and especially Cd. For the sensitive detection of trends, more than one sampler per measuring station is generally necessary, as the relative standard deviation of the yearly averaged wet deposition is typically 10-20% for one sampler. Using three identical samplers, trends of, e.g., 3% per year will generally be detected in 6 years.
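The Dixon test mentioned above compares the gap between a suspect value and its nearest neighbour with the full range of the sample. A minimal sketch of the simple Q (r10) form for one set of parallel sampler results follows, using the commonly tabulated 95% critical values for n = 3 to 10. The abstract does not state which Dixon variant was applied (Dixon also defined r11, r21, and r22 statistics for larger samples), so this is an illustration rather than the study's procedure; `dixon_q_test` and the sample values are hypothetical.

```python
# Commonly tabulated 95% critical values for Dixon's Q (r10 statistic), n = 3..10.
Q_CRIT_95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625,
             7: 0.568, 8: 0.526, 9: 0.493, 10: 0.466}


def dixon_q_test(values):
    """Test the more extreme end of a small sample with Dixon's Q (r10).

    Returns (suspect_value, Q, is_outlier) at the 95% level.
    """
    x = sorted(values)
    n = len(x)
    if n not in Q_CRIT_95:
        raise ValueError("critical-value table covers 3 <= n <= 10 only")
    spread = x[-1] - x[0]
    if spread == 0:
        return None, 0.0, False
    # Gap between each extreme and its nearest neighbour, relative to the range.
    q_low = (x[1] - x[0]) / spread
    q_high = (x[-1] - x[-2]) / spread
    if q_high >= q_low:
        suspect, q = x[-1], q_high
    else:
        suspect, q = x[0], q_low
    return suspect, q, q > Q_CRIT_95[n]


# Demo: hypothetical concentrations from eight parallel samplers on one day.
suspect, q, flagged = dixon_q_test([1.00, 1.05, 1.08, 1.10, 1.12, 1.15, 1.20, 3.00])
clean = dixon_q_test([1.00, 1.05, 1.08, 1.10, 1.12, 1.15, 1.20, 1.22])
```

In the first sample the value 3.00 gives Q = 1.8 / 2.0 = 0.9, well above the n = 8 critical value of 0.526, so it is flagged; the second sample has no flagged value.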
An approach to automated particle picking from electron micrographs based on reduced representation templates.

PubMed

Volkmann, Niels

2004-01-01

Reduced representation templates are used in a real-space pattern matching framework to facilitate automatic particle picking from electron micrographs. The procedure consists of five parts. First, reduced templates are constructed either from models or directly from the data. Second, a real-space pattern matching algorithm is applied using the reduced representations as templates. Third, peaks are selected from the resulting score map using peak-shape characteristics. Fourth, the surviving peaks are tested for distance constraints. Fifth, a correlation-based outlier screening is applied. Test applications to a data set of keyhole limpet hemocyanin particles indicate that the method is robust and reliable.
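The fourth and fifth steps above (distance constraints on surviving peaks, then correlation-based outlier screening) can be sketched generically. The abstract does not specify either rule, so both are assumptions: the distance test here greedily keeps the highest-scoring peak and drops any peak closer than a minimum distance to one already kept, and the screen rejects particle images whose correlation with the mean image falls more than `z_cut` standard deviations below the average correlation. Function names, parameters, and the synthetic data are hypothetical.

```python
import numpy as np


def enforce_min_distance(peaks, scores, min_dist):
    """Greedy distance constraint: visit peaks by descending score and keep a
    peak only if it lies at least min_dist from every peak already kept.
    peaks: (n, 2) array of (row, col) positions. Returns kept indices."""
    peaks = np.asarray(peaks, dtype=float)
    order = np.argsort(scores)[::-1]
    kept = []
    for i in order:
        if all(np.hypot(*(peaks[i] - peaks[j])) >= min_dist for j in kept):
            kept.append(i)
    return sorted(kept)


def correlation_screen(patches, z_cut=2.0):
    """Keep patches whose Pearson correlation with the mean patch is not
    unusually low. Returns indices of accepted patches."""
    flat = np.asarray(patches, dtype=float).reshape(len(patches), -1)
    reference = flat.mean(axis=0)
    corr = np.array([np.corrcoef(p, reference)[0, 1] for p in flat])
    threshold = corr.mean() - z_cut * corr.std()
    return np.flatnonzero(corr >= threshold)


# Demo 1: two peaks closer than 5 pixels; only the higher-scoring one survives.
peaks = [(10, 10), (12, 11), (40, 40), (41, 60)]
scores = [0.9, 0.8, 0.7, 0.6]
kept_peaks = enforce_min_distance(peaks, scores, min_dist=5.0)

# Demo 2: ten noisy copies of a blob-shaped "particle" plus one pure-noise patch.
rng = np.random.default_rng(3)
yy, xx = np.mgrid[-4:5, -4:5]
template = np.exp(-(xx**2 + yy**2) / (2 * 2.0**2))
patches = [template + rng.normal(0, 0.05, template.shape) for _ in range(10)]
patches.append(rng.normal(0, 1, template.shape))  # index 10: not a particle
kept_patches = correlation_screen(patches)
```

The greedy rule resolves overlapping detections in favour of the better match; a stricter alternative would discard both members of a close pair, which may suit crowded micrographs better.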
Convex relaxations of spectral sparsity for robust super-resolution and line spectrum estimation

NASA Astrophysics Data System (ADS)

Chi, Yuejie

2017-08-01

We consider recovering the amplitudes and locations of spikes in a point source signal from its low-pass spectrum, which may suffer from missing data and arbitrary outliers. We first review and provide a unified view of several recently proposed convex relaxations that characterize and capitalize on the spectral sparsity of the point source signal without discretization, under the framework of atomic norms. Next, we propose a new algorithm for the case where the spikes are known a priori to be positive, motivated by applications such as neural spike sorting and fluorescence microscopy imaging. Numerical experiments are provided to demonstrate the effectiveness of the proposed approach.
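To make the setting above concrete, one standard way to write the point source model, its observed low-pass spectrum with missing entries and sparse outliers, and an atomic-norm relaxation that demixes the two is sketched below. The notation (r spikes at locations τ_j with amplitudes a_j, cutoff N, observed index set Ω, outlier vector s, weight λ) is assumed for illustration; the exact programs reviewed and proposed in the paper, including how positivity of the a_j is exploited, may differ.

```latex
x(t) = \sum_{j=1}^{r} a_j \,\delta(t - \tau_j), \qquad \tau_j \in [0, 1)

g_k = \sum_{j=1}^{r} a_j \, e^{-i 2\pi k \tau_j}, \qquad |k| \le N

y_k = g_k + s_k, \qquad k \in \Omega \subseteq \{-N, \dots, N\}, \quad s \text{ sparse}

[\mathbf{a}(\tau)]_k = e^{-i 2\pi k \tau}, \qquad
\|g\|_{\mathcal{A}} = \inf \Big\{ \sum_j |c_j| \;:\; g = \sum_j c_j \, \mathbf{a}(\tau_j) \Big\}

\min_{g,\, s} \; \|g\|_{\mathcal{A}} + \lambda \|s\|_1
\quad \text{subject to} \quad g_k + s_k = y_k, \; k \in \Omega
```

Because the atoms a(τ) are parameterized by a continuous τ, no grid over spike locations is needed, which is the sense in which these relaxations avoid discretization.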
